AI-powered ETL & ML Microservice System for Financial Data Processing
Production-deployed on AWS EC2 (Ubuntu 22.04, Docker Compose).

Live URL (deployed on AWS EC2): https://etl.tonys-dev.com/docs
QuantStox is a robust, containerized microservice platform for end-to-end financial data processing, analysis, and prediction. It automates extraction, transformation, loading (ETL), and machine learning (ML) for stock data, integrating with Alpha Vantage, AWS S3, and Google Gemini LLM for sentiment analysis. Built for reliability, observability, and easy deployment.
Production deployment: Runs on AWS EC2 (Ubuntu 22.04) using Docker Compose for orchestration.
```mermaid
graph LR
    A[Client] -->|POST /extract| B(API Gateway)
    B -->|Async Task| C{ETL Worker}
    C -->|Fetch| D[Alpha Vantage]
    C -->|Process| E[Pandas/Clean]
    C -->|Inference| F[ML Service]
    F -->|Sentiment| G[Gemini/OpenAI]
    C -->|Store| H[(PostgreSQL)]
    C -->|Archive| I[AWS S3]
```
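The flow in the diagram can be sketched in a few lines of Python. This is a schematic only: the stage functions below are hypothetical stand-ins, since in QuantStox each step runs as a separate containerized service (worker, ML service, database).

```python
# Schematic sketch of the ETL flow above. Each function is a stand-in
# for a service in the diagram, not the project's actual code.

def extract(symbols):
    # Stand-in for the Alpha Vantage fetch performed by the ETL worker.
    return [{"symbol": s, "close": 100.0} for s in symbols]

def transform(rows):
    # Stand-in for the Pandas cleaning step (drop rows with no price).
    return [r for r in rows if r["close"] is not None]

def load(rows, store):
    # Stand-in for the PostgreSQL insert / S3 archive.
    store.extend(rows)
    return len(rows)

def run_pipeline(symbols, store):
    return load(transform(extract(symbols)), store)

if __name__ == "__main__":
    db = []
    print(run_pipeline(["AAPL", "GOOGL"], db))  # 2 rows loaded
```

In the real system the hand-off between stages is asynchronous (the API gateway enqueues a task for the worker), which this linear sketch deliberately omits.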
Backend:
- Python 3.14
- FastAPI 0.128.0 (API, Worker, ML Service)
- Uvicorn
- Pandas, NumPy
- TensorFlow 2.20.0, scikit-learn
- Google Gemini LLM (via `google-genai`)
- Requests, Pydantic
Database & Infrastructure:
- PostgreSQL 15 (Dockerized)
- AWS S3 (Data Lake)
- Docker Compose (multi-service orchestration)
Monitoring & Observability:
- Prometheus (metrics)
- Grafana (dashboards)
- Prometheus Python client
Testing & Quality:
- Pytest, pytest-cov, pytest-asyncio
DevOps & Deployment:
- Docker, Docker Compose
- Nginx (reverse proxy, SSL)
- Bash/PowerShell scripts for setup and management
```
quantstox/
├── api/                  # FastAPI API Gateway
├── worker/               # ETL Processor (Extract, Transform, Load)
├── ml-service/           # ML & Sentiment Microservice
├── monitoring/           # Prometheus & Grafana configs
├── scripts/              # Setup, run, deploy scripts
├── tests/                # Pytest-based test suite
├── docker-compose.yml
├── init-db.sql
├── nginx.conf.template
├── .env.example
├── README.md
└── ...
```
- Docker (v24+)
- Docker Compose (v2.20+)
- Git
- (Optional) AWS CLI for S3 integration
```bash
git clone https://github.com/TonyS-dev/quantstox.git
cd quantstox
cp .env.example .env
# Edit .env with your credentials (DB, API keys, AWS, etc.)
nano .env
docker compose up --build
```

| Service | URL/Port | Publicly Accessible | Auth Required | Notes |
|---|---|---|---|---|
| API Gateway | http://localhost:8000 | ✅ Yes | ✅ API Key | Main entrypoint; all sensitive endpoints require an API key |
| Worker | http://localhost:8001 | ❌ No (internal) | N/A | Only accessible via the Docker network |
| ML Service | http://localhost:8002 | ❌ No (internal) | N/A | Only accessible via the Docker network |
| Prometheus | http://localhost:9090 | ❌ No (internal) | N/A | Only accessible via the Docker network |
| Grafana | http://localhost:3000 | ✅ Yes | ✅ Password | Public dashboard, password protected |
- API Gateway and Grafana are the only services exposed to the public network (with authentication).
- All other services communicate securely via the Docker internal network and are not exposed externally.
- Internal endpoints (e.g., `/internal/metrics`) are only accessible from within the Docker network.
All protected endpoints require API key authentication via the `X-API-KEY` or `Authorization: Bearer <key>` header.
- How to get an API key:

  Run the setup script to generate admin and client keys:

  ```bash
  # If running locally
  python3 scripts/setup_keys.py

  # Or inside Docker Compose (recommended for production)
  docker compose exec api python3 scripts/setup_keys.py

  # Or use the management tool for advanced options
  docker compose exec api python3 scripts/manage_keys.py
  ```

  The script will print your keys. Store them securely (e.g., in your `.env` file or a password manager).
- How to use the API key:

  ```bash
  curl -H "X-API-KEY: your_key" http://localhost:8000/extract \
    -d '{"symbols": ["AAPL"]}'
  ```

  or

  ```bash
  curl -H "Authorization: Bearer your_key" http://localhost:8000/extract \
    -d '{"symbols": ["AAPL"]}'
  ```
- API key types:
  - Admin key: unlimited requests (for the owner/maintainer)
  - Client key: limited to 5 requests per day (for demo/external users)
- Endpoints requiring an API key:
  - `POST /extract`
  - `POST /predict`
  - `POST /sentiment/{symbol}`
  - `GET /metrics` (does NOT consume usage quota)
- Endpoints NOT requiring an API key:
  - `GET /` (root)
  - `GET /health`
  - `GET /stocks/{symbol}` (read-only stock data)
  - `GET /internal/metrics` (internal Docker network only)
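The dual-header scheme above (accept either `X-API-KEY` or `Authorization: Bearer <key>`) can be sketched as a small helper. The header names match the README; the verification logic itself is an assumption for illustration, not the project's actual implementation.

```python
# Minimal sketch of the API-key check: accept either supported header
# and compare against the configured key in constant time.
import hmac

def extract_api_key(headers):
    """Pull the client key from X-API-KEY or Authorization: Bearer."""
    if "X-API-KEY" in headers:
        return headers["X-API-KEY"]
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return None

def is_authorized(headers, valid_key):
    key = extract_api_key(headers)
    # compare_digest avoids leaking the key via timing differences.
    return key is not None and hmac.compare_digest(key, valid_key)
```

In a FastAPI service like this one, a check of this shape would typically live in a dependency so every protected route shares it.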
- Automated ETL: Extracts, transforms, and loads stock data
- ML Price Prediction: LSTM-based forecasting per symbol
- Sentiment Analysis: Google Gemini LLM integration
- Data Lake: Archives to AWS S3
- RESTful API: FastAPI endpoints for ETL, prediction, sentiment
- Monitoring: Prometheus metrics, Grafana dashboards
- Dockerized: Easy local or cloud deployment
- Secure: Environment-based secrets, production-ready configs
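The LSTM-based forecasting listed above needs its input reshaped into supervised pairs first. Below is a sketch of that sliding-window preparation step only; the window size is illustrative, and the model training itself (TensorFlow in this project) is omitted.

```python
# Sketch of sliding-window prep for next-day price forecasting:
# turn a closing-price series into (input window, next-day target) pairs.

def make_windows(prices, window=3):
    """Each X is `window` consecutive prices; y is the price after them."""
    pairs = []
    for i in range(len(prices) - window):
        pairs.append((prices[i:i + window], prices[i + window]))
    return pairs

# make_windows([1, 2, 3, 4, 5], window=3)
# -> [([1, 2, 3], 4), ([2, 3, 4], 5)]
```

Each pair becomes one training sample: the model sees `window` days of prices and learns to predict the following day.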
```bash
curl -X POST http://localhost:8000/extract \
  -H "Content-Type: application/json" \
  -d '{"symbols": ["AAPL", "GOOGL"]}'
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"symbol": "AAPL", "days_ahead": 1}'
```

```bash
curl -X POST http://localhost:8002/sentiment \
  -H "Content-Type: application/json" \
  -d '{"text": "Apple stock is performing well"}'
```

- Swagger UI: http://localhost:8000/docs
- Key Endpoints:
  - `POST /extract` → Trigger the ETL pipeline
  - `POST /predict` → Predict the next-day price
  - `POST /sentiment` → Analyze news/text sentiment
  - `GET /metrics` → Prometheus metrics (all services)
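For callers who prefer Python over curl, the same `POST /extract` call can be built with just the standard library. This is an illustrative sketch, not an official client; the base URL and key are placeholders.

```python
# Build a POST /extract request with stdlib urllib (no extra deps).
import json
import urllib.request

def build_extract_request(base_url, api_key, symbols):
    """Build (but do not send) an authenticated POST /extract request."""
    body = json.dumps({"symbols": symbols}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )

# To actually send it:
#   req = build_extract_request("http://localhost:8000", "your_key", ["AAPL"])
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read())
```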
```bash
cd tests
pytest --cov=..
```

To rebuild and run all services locally:

```bash
docker compose up --build
```

This project is deployed and tested on AWS EC2 (Ubuntu 22.04, t3.small, Docker Compose).

- Launch an Ubuntu EC2 instance and open ports 22, 8000, 8001, 8002, 9090, 3000
- Install Docker, Docker Compose, and Git
- Clone the repo, copy `.env.example` to `.env.prod`, and fill it with production values
- Deploy: `docker compose --env-file .env.prod up -d`
- (Optional) Set up Nginx + SSL (see `nginx.conf.template`)
Pull requests and issues are welcome! Please open an issue to discuss major changes first.
MIT
More detailed documentation can be found in the following files:
- `scripts/README.md`: deployment and automation scripts guide
Antonio Santiago (TonyS-dev)
GitHub
Email: santiagor.acarlos@gmail.com
