Production-ready ML model monitoring system with automated drift detection, retraining, and a real-time dashboard.
- Real-time Monitoring: Track model performance metrics (accuracy, precision, recall, F1, ROC-AUC)
- Drift Detection: Automated data drift monitoring with Evidently AI
- Auto-Retraining: Retraining triggered by data drift, accuracy degradation, or a fixed schedule
- REST API: 15+ FastAPI endpoints with Swagger documentation
- Modern Dashboard: Next.js frontend with glassmorphism UI
- MLflow Integration: Experiment tracking and model registry
- Database Persistence: PostgreSQL/SQLite with SQLAlchemy
- Docker Deployment: Complete stack with docker-compose
- CI/CD Pipeline: GitHub Actions for testing and deployment
- Comprehensive Testing: 50+ unit and integration tests
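The auto-retraining feature combines three triggers: drift, accuracy, and schedule. Below is a minimal sketch of how a policy could combine them. The names (`RetrainPolicy`, `should_retrain`) and the thresholds (overall drift score 0.3, accuracy 0.85, 7-day interval) are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class RetrainPolicy:
    """Hypothetical thresholds; the project's actual values are not documented here."""
    max_drift_score: float = 0.3                  # overall drift score above this fires a retrain
    min_accuracy: float = 0.85                    # accuracy below this fires a retrain
    max_model_age: timedelta = timedelta(days=7)  # scheduled retrain interval


def should_retrain(
    policy: RetrainPolicy,
    drift_score: float,
    accuracy: float,
    last_trained: datetime,
    now: Optional[datetime] = None,
) -> Tuple[bool, List[str]]:
    """Return (retrain?, reasons), where reasons lists every trigger that fired."""
    now = now or datetime.now()
    reasons: List[str] = []
    if drift_score > policy.max_drift_score:
        reasons.append(f"drift {drift_score:.2f} > {policy.max_drift_score}")
    if accuracy < policy.min_accuracy:
        reasons.append(f"accuracy {accuracy:.2f} < {policy.min_accuracy}")
    if now - last_trained >= policy.max_model_age:
        reasons.append("scheduled retrain due")
    return bool(reasons), reasons


if __name__ == "__main__":
    fired, why = should_retrain(
        RetrainPolicy(),
        drift_score=0.42,
        accuracy=0.90,
        last_trained=datetime(2024, 1, 8),
        now=datetime(2024, 1, 10),
    )
    print(fired, why)  # True ['drift 0.42 > 0.3']
```

The function returns the reasons as well as the boolean. That way, a retraining job or alert can record *why* it ran, not only that it ran.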
Backend (FastAPI) → Database (PostgreSQL) → MLflow
        ↓                                     ↓
 Predictions API                       Model Registry
        ↓                                     ↓
 Monitoring API           ←            Drift Detection
        ↓
Frontend Dashboard (Next.js)
# Clone repository
git clone https://github.com/Isuruigi/ml-monitoring-system.git
cd ml-monitoring-system
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Start all services
docker-compose up -d
# Access services:
# - Frontend: http://localhost:3000
# - Backend API: http://localhost:8000/docs
# - MLflow: http://localhost:5000
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3001

Backend:
cd backend
pip install -r requirements.txt
python -m uvicorn api.main:app --reload --port 8000

Frontend:
cd frontend
npm install
npm run dev

- Quick Start Guide: Detailed setup instructions
- API Documentation: Interactive Swagger UI
- Project Summary: Complete feature overview
Backend:
- FastAPI 0.104.1
- XGBoost 2.0.3
- scikit-learn 1.3.2
- Evidently 0.4 (Drift Detection)
- MLflow 2.9.2
- SQLAlchemy 2.0.23
- Prometheus Client
Frontend:
- Next.js 14
- TypeScript
- Tailwind CSS
- Recharts
- Lucide Icons
Infrastructure:
- Docker & Docker Compose
- PostgreSQL 15
- Redis 7
- Prometheus
- Grafana
ml-monitoring-system/
├── backend/ # FastAPI application
│ ├── api/ # REST API routes
│ ├── ml/ # ML models & training
│ ├── data/ # Data loading & database
│ └── monitoring/ # Metrics & drift detection
├── frontend/ # Next.js dashboard
│ ├── app/ # Pages & layouts
│ └── components/ # React components
├── tests/ # Test suite
├── docker/ # Dockerfiles
└── .github/ # CI/CD workflows
# Run all tests
pytest tests/ -v --cov=backend
# Run specific test file
pytest tests/test_model.py -v
# Run with coverage report
pytest tests/ --cov=backend --cov-report=html

The system includes:
- CI/CD Pipeline: Automated testing and deployment via GitHub Actions
- Docker Support: Multi-stage builds for optimized images
- Health Checks: Configured for every service in the stack
- Database Migrations: Automatic on deployment
See the deployment guide for details.
- Model Performance: Accuracy, Precision, Recall, F1, ROC-AUC
- Drift Metrics: PSI per feature, overall drift score
- System Metrics: API latency, prediction volume, uptime
- Business Metrics: Prediction accuracy over time
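The drift metrics above are reported as PSI per feature plus an overall drift score. Below is a hand-rolled sketch of the PSI arithmetic over equal-width bins taken from the reference data. Evidently provides its own drift statistics, and this code does not show how the project calls Evidently. The binning scheme and the `eps` floor are assumptions. The overall score here is defined as the share of features whose PSI exceeds 0.25, which is also an assumption, since the README does not define it. `drift_report` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def _bin_shares(values: Sequence[float], lo: float, hi: float, n_bins: int, eps: float) -> List[float]:
    """Share of `values` in each equal-width bin over [lo, hi].

    Values outside the reference range are clipped into the first or last bin,
    and empty bins are floored at `eps` so the log term in `psi` stays finite.
    """
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant column
    counts = [0] * n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index: sum over bins of (actual - expected) * ln(actual / expected)."""
    lo, hi = min(reference), max(reference)
    expected = _bin_shares(reference, lo, hi, n_bins, eps)
    actual = _bin_shares(current, lo, hi, n_bins, eps)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_report(
    reference: Dict[str, Sequence[float]],
    current: Dict[str, Sequence[float]],
    threshold: float = 0.25,
) -> Dict[str, object]:
    """PSI per feature plus a hypothetical overall score: the share of features above `threshold`."""
    per_feature = {name: psi(reference[name], current[name]) for name in reference}
    drifted = [name for name, value in per_feature.items() if value > threshold]
    return {
        "psi": per_feature,
        "drifted_features": drifted,
        "drift_score": len(drifted) / len(per_feature),
    }
```

Each PSI term is non-negative, so identical distributions score 0. A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as significant drift.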
- API key authentication
- Admin key for model management
- Environment-based secrets
- Rate limiting
- Input validation
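The README lists API key authentication and rate limiting but does not describe the mechanism. Below is a framework-free sketch of both: a constant-time key comparison with `hmac.compare_digest` and a per-key token bucket. The token-bucket algorithm, the defaults (5-request burst, 1 request/second refill), and the names `api_key_valid`, `TokenBucket`, and `RateLimiter` are assumptions, not the project's implementation.

```python
import hmac
import time
from typing import Callable, Dict


def api_key_valid(provided: str, expected: str) -> bool:
    """Constant-time comparison so response timing does not leak how much of the key matched."""
    return hmac.compare_digest(provided.encode(), expected.encode())


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per API key (hypothetical defaults: burst of 5, 1 request/second)."""

    def __init__(self, capacity: float = 5, rate: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow()
```

In a FastAPI app, checks like these would typically run in a request dependency. That dependency would reject a bad key with HTTP 401 and an exhausted bucket with HTTP 429 (Too Many Requests). The injectable `clock` keeps the limiter testable without sleeping. An in-process dictionary only works for a single worker, so a multi-worker deployment would need shared state, for example in the Redis instance already in the stack.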
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License; see the LICENSE file.
- Advanced model comparison
- Email/Slack alerts
- Multi-model support
- A/B testing framework
- Cloud deployment guides (AWS/GCP/Azure)
Built with ❤️ using modern MLOps practices
⭐ Star this repo if you find it useful!