End-to-end machine learning pipeline for road accident risk prediction with comprehensive monitoring, deployment, and CI/CD automation.
This project implements a complete MLOps pipeline for predicting road accident risk using a Gradient Boosting Regressor. The pipeline covers data ingestion, validation, transformation, feature engineering, model training, evaluation, and monitoring with Evidently AI.
graph LR
A[Data Ingestion] --> B[Data Validation]
B --> C[Feature Engineering]
C --> D[Data Transformation]
D --> E[Model Training]
E --> F[Model Evaluation]
F --> G[Monitoring]
G --> H[MLflow Tracking]
I[main.py] --> A
J[airflow_dag.py] --> A
K[app.py] --> L[Flask Web App]
E --> L
- Loads raw dataset from source
- Splits data into train and test sets
- Stores in artifacts directory
- Validates schema and data types
- Checks for missing values
- Ensures data quality
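A minimal sketch of the kind of checks the validation stage performs. The `REQUIRED_COLUMNS` schema here is illustrative, not the project's actual schema file:

```python
import pandas as pd

# Illustrative schema: column name -> expected dtype kind
REQUIRED_COLUMNS = {"speed_limit": "number", "num_lanes": "number", "road_type": "object"}

def validate(df: pd.DataFrame) -> list:
    """Return a list of data-quality problems (empty list means the frame passed)."""
    problems = []
    for col, kind in REQUIRED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif kind == "number" and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"wrong dtype for {col}")
    # Flag any column containing nulls
    problems += [f"null values in: {c}" for c in df.columns[df.isna().any()]]
    return problems

df = pd.DataFrame({"speed_limit": [60, 80], "num_lanes": [2, 4],
                   "road_type": ["urban", "highway"]})
print(validate(df))  # []
```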
- Creates interaction features (lanes_speed, curvature_speed)
- Generates risk indicators (high_speed, few_lanes, no_signs)
- Builds categorical features (speed_category, curvature_category)
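The engineered features listed above can be built with plain pandas operations; the thresholds and bin edges below are illustrative, not the project's actual cut-points:

```python
import pandas as pd

df = pd.DataFrame({"num_lanes": [2, 4], "speed_limit": [60, 100],
                   "curvature": [0.1, 0.8], "road_signs_present": [1, 0]})

# Interaction features
df["lanes_speed"] = df["num_lanes"] * df["speed_limit"]
df["curvature_speed"] = df["curvature"] * df["speed_limit"]

# Binary risk indicators (thresholds are illustrative)
df["high_speed"] = (df["speed_limit"] >= 80).astype(int)
df["few_lanes"] = (df["num_lanes"] <= 2).astype(int)
df["no_signs"] = (df["road_signs_present"] == 0).astype(int)

# Binned categorical feature (bin edges are illustrative)
df["speed_category"] = pd.cut(df["speed_limit"], bins=[0, 50, 90, 200],
                              labels=["low", "medium", "high"])
print(df[["lanes_speed", "high_speed", "speed_category"]])
```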
- Encodes categorical features using LabelEncoder
- Scales numerical features using StandardScaler
- Prepares data for model training
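A sketch of the transformation step with the two preprocessors named above; the toy frame stands in for the prepared dataset:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.DataFrame({"road_type": ["urban", "highway", "urban"],
                   "speed_limit": [40.0, 100.0, 60.0]})

# Encode categorical features (LabelEncoder sorts classes alphabetically)
le = LabelEncoder()
df["road_type"] = le.fit_transform(df["road_type"])  # highway=0, urban=1

# Scale numerical features to zero mean / unit variance
scaler = StandardScaler()
df[["speed_limit"]] = scaler.fit_transform(df[["speed_limit"]])

print(df["road_type"].tolist())
print(df["speed_limit"].round(3).tolist())
```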
- Trains Gradient Boosting Regressor
- Performs hyperparameter tuning
- Saves trained model to artifacts
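The training stage reduces to fitting a `GradientBoostingRegressor` under a hyperparameter search and persisting the winner. This sketch uses synthetic data and a deliberately small grid; the project's actual search space and artifact path may differ:

```python
import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared accident-risk data
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Small illustrative grid for hyperparameter tuning
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3, scoring="neg_mean_absolute_error",
)
grid.fit(X_train, y_train)

# Persist the best model to the artifacts directory
joblib.dump(grid.best_estimator_, "model.joblib")
print(grid.best_params_)
```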
- Evaluates model performance (MAE, RMSE, R2 Score)
- Logs metrics to MLflow
- Stores evaluation results
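The three metrics logged by the evaluation stage can be computed directly from scikit-learn; the arrays here are made-up predictions, and in the real pipeline the values would then be sent to MLflow:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([0.2, 0.5, 0.8, 0.4])
y_pred = np.array([0.25, 0.45, 0.75, 0.50])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root of MSE
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```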
- Generates data drift reports using Evidently AI
- Tracks model performance over time
- Creates interactive HTML dashboards
- ML Framework: scikit-learn
- Monitoring: Evidently AI
- Experiment Tracking: MLflow, DagsHub
- Orchestration: Apache Airflow
- Web Framework: Flask
- Containerization: Docker
- CI/CD: GitHub Actions
- Deployment: Kubernetes
ML_PipeLine_Evidently/
├── src/heartpipeline/
│ ├── components/ # Pipeline components
│ ├── pipeline/ # Stage implementations
│ ├── config/ # Configuration management
│ ├── entity/ # Entity definitions
│ └── utils/ # Utility functions
├── config/ # YAML configurations
├── artifacts/ # Generated artifacts
├── monitoring/ # Evidently monitoring
├── deployment/ # Kubernetes manifests
├── templates/ # Flask HTML templates
├── static/ # CSS/JS/Images
├── .github/workflows/ # CI/CD pipelines
├── app.py # Flask web application
├── main.py # Pipeline executor
├── airflow_dag.py # Airflow DAG definition
├── Dockerfile # Container image
└── requirements.txt # Python dependencies
- Interactive prediction interface
- Real-time risk assessment (Low/Medium/High)
- Monitoring dashboard with drift reports
- REST API endpoints for predictions
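The Low/Medium/High assessment shown by the web app is a bucketing of the continuous `accident_risk` score. A minimal sketch, assuming illustrative 0.33/0.66 cut-points rather than the app's actual thresholds:

```python
def risk_label(score: float) -> str:
    """Map a continuous accident-risk score to a display bucket.

    The 0.33 / 0.66 cut-points are illustrative only.
    """
    if score < 0.33:
        return "Low"
    if score < 0.66:
        return "Medium"
    return "High"

print(risk_label(0.2), risk_label(0.5), risk_label(0.9))  # Low Medium High
```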
- Automated pipeline execution
- Task dependencies and scheduling
- Failure handling and retries
- Data drift detection
- Model performance tracking
- Feature-level analysis
- Interactive visualizations
- Experiment tracking
- Model versioning
- Metric logging
- DagsHub remote tracking
git clone https://github.com/Abeshith/ML_PipeLine_Evidently.git
cd ML_PipeLine_Evidently
cp -r /mnt/d/ML\ PipeLine\ \(Evidently\) ~/
cd ~/ML\ PipeLine\ \(Evidently\)
# Install Python3
sudo apt update
sudo apt install python3 python3-pip python3-venv
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set Airflow home directory
export AIRFLOW_HOME=~/airflow
echo $AIRFLOW_HOME
# Initialize Airflow database
airflow db init
# Configure authentication manager
vim ~/airflow/airflow.cfg
Edit the configuration file:
- Press `i` to enter insert mode
- Replace `auth_manager = airflow.api.fastapi.auth.managers.simple.simple_auth_manager.SimpleAuthManager`
- With `auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager`
- Press `ESC`, then type `:wq!` and press `ENTER`
# Create DAGs directory
mkdir -p ~/airflow/dags
# Copy DAG file to Airflow directory
cp airflow_dag.py ~/airflow/dags/
# Test DAG configuration
python ~/airflow/dags/airflow_dag.py
# Start Airflow standalone mode
airflow standalone
- Open a browser and navigate to: http://0.0.0.0:8080
- Search for `ml_pipeline_dag` in the DAGs list
- Click on the DAG and trigger execution
- Monitor workflow progress through the Airflow UI
python main.py
python app.py
Access at: http://localhost:5000
airflow dags trigger ml_pipeline_dag
docker build -t ml-pipeline .
docker run -p 5000:5000 ml-pipeline
The GitHub Actions workflow automates:
- Checkout - Retrieves code from repository
- Build - Creates Docker images from application code
- Scan - Performs security vulnerability scans with Trivy
- Deliver - Pushes validated images to DockerHub registry
The pipeline triggers automatically on push to main branch.
# Initialize Minikube cluster
minikube start
# Deploy application using deployment manifest
kubectl apply -f deployment/deployment.yaml
# Check pod status
kubectl get pods
# Apply service configuration
kubectl apply -f deployment/service.yaml
# Verify service deployment
kubectl get svc
# Forward service port to local machine
kubectl port-forward svc/ml-pipeline-service 8000:80
# Access application in browser at:
# http://localhost:8000
# Edit service configuration
kubectl edit svc ml-pipeline-service
# Change service type from NodePort to LoadBalancer
# Press ESC, type :wq! and press ENTER to save
# Open new terminal and create tunnel
minikube tunnel
# In original terminal, check for external IP
kubectl get svc
# Access application using external IP (127.0.0.1) in browser
# Deploy ingress configuration
kubectl apply -f deployment/ingress.yaml
# Install Ingress Controller (nginx)
minikube addons enable ingress
# Check Ingress Controller pods
kubectl get pods -A | grep nginx
# Check Ingress deployment and get address
kubectl get ingress
# Configure local DNS
sudo vim /etc/hosts
Add the following lines to /etc/hosts:
127.0.0.1 localhost
192.168.49.2 ml-pipeline.example.com
Press ESC, type :wq! and press ENTER to save.
# Verify configuration
ping ml-pipeline.example.com
# Access application via domain
# http://ml-pipeline.example.com
kubectl apply -f deployment/deployment.yaml
kubectl apply -f deployment/service.yaml
kubectl apply -f deployment/ingress.yaml
- Algorithm: Gradient Boosting Regressor
- Features: 12 input features + 10 engineered features
- Target: accident_risk (continuous)
- Metrics: MAE, RMSE, R2 Score
- road_type, num_lanes, curvature, speed_limit
- lighting, weather, road_signs_present, public_road
- time_of_day, holiday, school_season, num_reported_accidents
POST /api/predict
Content-Type: application/json
{
"road_type": "highway",
"speed_limit": 60,
"num_lanes": 2,
...
}
- `/` - Home page
- `/predict` - Prediction form
- `/dashboard` - Monitoring dashboard
- `/reports/drift` - Data drift report
- `/reports/performance` - Performance metrics
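A minimal sketch of how `POST /api/predict` can be wired up in Flask and exercised with the built-in test client. The fixed score and the response fields are illustrative stand-ins for the real app, which would call the trained model:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # The real app would run model.predict(...) on the payload;
    # a fixed score stands in here.
    score = 0.42
    return jsonify({"accident_risk": score, "risk_level": "Medium"})

# Exercise the endpoint without starting a server
client = app.test_client()
resp = client.post("/api/predict", json={"road_type": "highway", "speed_limit": 60})
print(resp.get_json())  # {'accident_risk': 0.42, 'risk_level': 'Medium'}
```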
Access Evidently AI reports:
- Data Drift Analysis
- Feature Distribution Changes
- Model Performance Metrics
- MLflow Experiments: https://dagshub.com/abheshith7/ML-Pipeline-Evidently.mlflow
Edit config/config.yaml for pipeline settings:
- Artifact paths
- Model parameters
- Data sources
- Fork the repository
- Create feature branch
- Commit changes
- Push to branch
- Create pull request