A full-stack, production-grade ML system for real-time e-commerce recommendations using a two-stage pipeline: ALS (Collaborative Filtering) + LightGBM (Ranking).
Key Components:
- Two-stage ML Pipeline: ALS candidate generation + LightGBM ranking
- Real-time Streaming: Kafka → Flink → Redis feature store
- Production Ready: FastAPI + ONNX + Docker + Azure deployment
- Data Pipeline: DVC versioning with Azure Blob Storage
- Monitoring: Prometheus + Grafana + MLflow tracking
- Python (main language)
- Pandas, NumPy (data processing)
- Scikit-learn (ML utilities)
- Great Expectations (data validation)
- DVC (data versioning)
- MLflow (experiment tracking)
- Kafka (event streaming)
- Flink (stream processing)
- Feast (feature store)
- Redis (caching)
- ALS (collaborative filtering)
- LightGBM (ranking model)
- ONNX (model optimization)
- ONNX Runtime (fast inference)
- Faiss (vector similarity)
- FastAPI (Python API server)
- Next.js (frontend framework)
- Swagger UI (API documentation)
- Azure (cloud platform)
- Docker (containerization)
- GitHub Actions (CI/CD)
- Azure Container Apps (deployment)
- Git (version control)
- Pytest (testing)
- Prometheus + Grafana (monitoring)
- Azure Monitor (cloud monitoring)
- Azure Key Vault (secrets management)
- JWT (authentication)
src/
├── data_generation/ # Synthetic data generation
├── processing/ # Data processing & feature engineering
├── retrieval/ # ALS candidate generation (Phase 3)
├── ranking/ # LightGBM ranking (Phase 4)
├── similarity/ # Item similarity engine (Phase 5)
├── serving/ # FastAPI serving layer (Phase 6)
├── streaming/ # Kafka + Flink real-time processing
└── validation/ # Data validation with Great Expectations
models/ # Trained ML models (ALS, LightGBM, ONNX)
data/ # DVC-tracked data (users, products, interactions)
frontend/ # React application
docker/ # Docker configuration
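The two stages that live under `src/retrieval/` and `src/ranking/` can be pictured with a small, dependency-free sketch. It assumes ALS has already produced one factor vector per user and per item, and it uses a hand-written `toy_ranker` as a stand-in for the trained LightGBM model. All function names, feature names, and weights below are hypothetical and are not taken from the repository.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(
    user_vec: Vector, item_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Stage 1: score every item by user/item factor dot product, keep the top k."""
    scored = [(item_id, dot(user_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rank_candidates(
    candidates: List[Tuple[str, float]],
    features: Dict[str, Dict[str, float]],
    ranker: Callable[[Dict[str, float]], float],
    n: int,
) -> List[str]:
    """Stage 2: rescore the candidates with a ranking model, return the top n item ids."""
    rescored = []
    for item_id, retrieval_score in candidates:
        row = dict(features.get(item_id, {}))
        row["retrieval_score"] = retrieval_score  # pass the stage-1 score on as a feature
        rescored.append((item_id, ranker(row)))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in rescored[:n]]


def toy_ranker(row: Dict[str, float]) -> float:
    """Hypothetical stand-in for a trained ranker; the weights are made up."""
    return (
        0.5 * row["retrieval_score"]
        + 2.0 * row.get("in_stock", 0.0)
        + row.get("popularity", 0.0)
    )


if __name__ == "__main__":
    user = [1.0, 0.0]
    items = {"a": [0.9, 0.1], "b": [0.8, 0.5], "c": [0.1, 0.9], "d": [0.7, 0.0]}
    feats = {
        "a": {"in_stock": 0.0, "popularity": 0.1},
        "b": {"in_stock": 1.0, "popularity": 0.2},
        "d": {"in_stock": 1.0, "popularity": 0.5},
    }
    top = generate_candidates(user, items, k=3)
    print(rank_candidates(top, feats, toy_ranker, n=2))  # prints ['d', 'b']
```

The split keeps the expensive model off the full catalogue: the cheap dot product narrows the catalogue to a short list, and only that list is scored with the richer contextual features.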
- Python 3.9+, Docker, Git
# Clone and setup
git clone <your-repo-url>
cd recommndr
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Run data pipeline
dvc repro
# Start services
docker-compose up -d
# Test API
python test_phase6_api.py
- Phase 1: Data Generation & Validation (10K users, 1K products, 100K interactions)
- Phase 2: Streaming Pipeline (Kafka + Flink + Redis)
- Phase 3: ALS Candidate Generation (Collaborative Filtering)
- Phase 4: LightGBM Ranking (27+ contextual features)
- Phase 5: Similarity Engine (Item-item recommendations)
- Phase 6: FastAPI Serving Layer (Production endpoints)
- Phase 7: Frontend Development (React application)
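For the item-item recommendations of Phase 5, one common approach is cosine similarity over item embedding vectors. The sketch below is a brute-force version in plain Python, assuming each item already has a factor vector (for example from ALS). It does not use Faiss, and the function names and sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def similar_items(
    item_id: str, item_vecs: Dict[str, Sequence[float]], top_n: int
) -> List[Tuple[str, float]]:
    """Return the top_n items most similar to item_id, excluding the item itself."""
    query = item_vecs[item_id]
    scored = [
        (other, cosine(query, vec))
        for other, vec in item_vecs.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    vecs = {"x": [1.0, 0.0], "y": [2.0, 0.0], "z": [0.0, 1.0], "w": [1.0, 1.0]}
    for other, score in similar_items("x", vecs, top_n=2):
        print(other, round(score, 3))
```

A full scan like this costs one comparison per item for every query, which is fine for a catalogue of about 1K products. A vector index becomes worthwhile when the catalogue grows well beyond that.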
# Data pipeline
dvc repro # Run complete pipeline
dvc push # Push to Azure storage
# ML pipeline
python -m src.ranking.main --train # Train LightGBM model
python demo_complete_lifecycle.py # End-to-end demo
# API testing
python test_phase6_api.py # Test all endpoints
- End-to-End Latency: <200ms
- Data Quality: 88.75% validation score
- Real-time Updates: 5-15 seconds from event to recommendation
- Container Apps: API deployment
- Static Web Apps: Frontend hosting
- Blob Storage: DVC data versioning
- Redis Cache: Feature store
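To illustrate the kind of value the streaming path might keep fresh in the feature store, here is a minimal sketch of a sliding-window interaction counter per user. It is an assumption about what such a feature could look like, not the project's Flink job: an in-memory dictionary stands in for Redis, timestamps are plain seconds, and events are assumed to arrive in time order for each user. The class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class WindowedEventCounter:
    """Counts each user's events that fall inside the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # user_id -> event timestamps, oldest first (stand-in for a Redis structure)
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def add(self, user_id: str, timestamp: float) -> None:
        """Record one interaction event for the user."""
        self._events[user_id].append(timestamp)

    def count(self, user_id: str, now: float) -> int:
        """Drop events at or before the window cutoff, then return what remains."""
        events = self._events[user_id]
        cutoff = now - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


if __name__ == "__main__":
    counter = WindowedEventCounter(window_seconds=60)
    for ts in (0, 30, 70):
        counter.add("u1", ts)
    print(counter.count("u1", now=75))   # prints 2 (the event at t=0 has expired)
    print(counter.count("u1", now=140))  # prints 0
```

A ranking model could read a count like this at request time as one of its contextual features.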
# Deploy to Azure
az containerapp up --name recommndr-api --resource-group recommndr-rg
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request

