ML Pipeline with Evidently AI Monitoring

End-to-end machine learning pipeline for road accident risk prediction with comprehensive monitoring, deployment, and CI/CD automation.

Project Overview

This project implements a complete MLOps pipeline for predicting road accident risk with a Gradient Boosting Regressor. The pipeline covers data ingestion, validation, feature engineering, transformation, model training, evaluation, and monitoring with Evidently AI.

Architecture

graph LR
    A[Data Ingestion] --> B[Data Validation]
    B --> C[Feature Engineering]
    C --> D[Data Transformation]
    D --> E[Model Training]
    E --> F[Model Evaluation]
    F --> G[Monitoring]
    G --> H[MLflow Tracking]
    I[main.py] --> A
    J[airflow_dag.py] --> A
    K[app.py] --> L[Flask Web App]
    E --> L

Pipeline Stages

Stage 1: Data Ingestion

  • Loads raw dataset from source
  • Splits data into train and test sets
  • Stores in artifacts directory
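
A minimal sketch of this stage; the dataset path and split ratio are assumptions, not the project's actual values:

# Hypothetical ingestion step: load, split, and store the dataset.
import os
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/road_accidents.csv")  # source path is an assumption
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
os.makedirs("artifacts/data_ingestion", exist_ok=True)
train_df.to_csv("artifacts/data_ingestion/train.csv", index=False)
test_df.to_csv("artifacts/data_ingestion/test.csv", index=False)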

Stage 2: Data Validation

  • Validates schema and data types
  • Checks for missing values
  • Ensures data quality
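
A hedged sketch of the checks this stage performs; the expected column set below is illustrative, not the project's actual schema:

# Hypothetical validation helper; EXPECTED_COLUMNS is an assumed subset.
import pandas as pd

EXPECTED_COLUMNS = {"road_type", "num_lanes", "curvature", "speed_limit"}

def validate(df: pd.DataFrame) -> None:
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if df.isnull().any().any():
        raise ValueError("Dataset contains missing values")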

Stage 3: Feature Engineering

  • Creates interaction features (lanes_speed, curvature_speed)
  • Generates risk indicators (high_speed, few_lanes, no_signs)
  • Builds categorical features (speed_category, curvature_category)
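
The sketch below shows how the named features could be derived; the thresholds and bin edges are assumptions:

# Illustrative feature engineering; cutoffs and bins are assumed values.
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["lanes_speed"] = df["num_lanes"] * df["speed_limit"]        # interaction features
    df["curvature_speed"] = df["curvature"] * df["speed_limit"]
    df["high_speed"] = (df["speed_limit"] >= 80).astype(int)       # risk indicators
    df["few_lanes"] = (df["num_lanes"] <= 2).astype(int)
    df["no_signs"] = (df["road_signs_present"] == 0).astype(int)
    df["speed_category"] = pd.cut(df["speed_limit"],               # categorical features
                                  bins=[0, 40, 80, 200], labels=["low", "medium", "high"])
    df["curvature_category"] = pd.cut(df["curvature"], bins=3,
                                      labels=["straight", "moderate", "sharp"])
    return df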

Stage 4: Data Transformation

  • Encodes categorical features using LabelEncoder
  • Scales numerical features using StandardScaler
  • Prepares data for model training
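
A minimal sketch using the encoders named above; the column lists are illustrative, and train_df/test_df come from the ingestion sketch:

# Encode categoricals with LabelEncoder, scale numericals with StandardScaler.
from sklearn.preprocessing import LabelEncoder, StandardScaler

cat_cols = ["road_type", "lighting", "weather"]
num_cols = ["num_lanes", "curvature", "speed_limit"]

for col in cat_cols:
    enc = LabelEncoder().fit(train_df[col])
    train_df[col] = enc.transform(train_df[col])
    test_df[col] = enc.transform(test_df[col])  # assumes no unseen categories in test

scaler = StandardScaler().fit(train_df[num_cols])
train_df[num_cols] = scaler.transform(train_df[num_cols])
test_df[num_cols] = scaler.transform(test_df[num_cols])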

Stage 5: Model Training

  • Trains Gradient Boosting Regressor
  • Performs hyperparameter tuning
  • Saves trained model to artifacts
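
A hedged sketch of training with light tuning; the parameter grid and artifact path are assumptions:

# Train a Gradient Boosting Regressor with a small illustrative grid search.
import joblib
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1], "max_depth": [3, 5]}
search = GridSearchCV(GradientBoostingRegressor(random_state=42), param_grid,
                      cv=3, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)  # X_train/y_train come from the transformation stage
joblib.dump(search.best_estimator_, "artifacts/model_trainer/model.joblib")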

Stage 6: Model Evaluation

  • Evaluates model performance (MAE, RMSE, R2 Score)
  • Logs metrics to MLflow
  • Stores evaluation results
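
The metrics above map directly to scikit-learn and MLflow calls; a sketch, assuming model, X_test, and y_test from the previous stages:

# Compute MAE, RMSE, and R2 Score, then log them to MLflow.
import mlflow
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

preds = model.predict(X_test)
mae = mean_absolute_error(y_test, preds)
rmse = np.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

with mlflow.start_run():
    mlflow.log_metrics({"mae": mae, "rmse": rmse, "r2": r2})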

Stage 7: Monitoring

  • Generates data drift reports using Evidently AI
  • Tracks model performance over time
  • Creates interactive HTML dashboards
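
A sketch of drift-report generation; Evidently's API has changed across versions, so this follows the older Report interface and is an assumption about the project's setup:

# Generate an HTML data drift report (Evidently pre-0.7 style API).
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=test_df)
report.save_html("monitoring/data_drift_report.html")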

Technologies Used

  • ML Framework: scikit-learn
  • Monitoring: Evidently AI
  • Experiment Tracking: MLflow, DagsHub
  • Orchestration: Apache Airflow
  • Web Framework: Flask
  • Containerization: Docker
  • CI/CD: GitHub Actions
  • Deployment: Kubernetes

Project Structure

ML_PipeLine_Evidently/
├── src/heartpipeline/
│   ├── components/          # Pipeline components
│   ├── pipeline/            # Stage implementations
│   ├── config/              # Configuration management
│   ├── entity/              # Entity definitions
│   └── utils/               # Utility functions
├── config/                  # YAML configurations
├── artifacts/               # Generated artifacts
├── monitoring/              # Evidently monitoring
├── deployment/              # Kubernetes manifests
├── templates/               # Flask HTML templates
├── static/                  # CSS/JS/Images
├── .github/workflows/       # CI/CD pipelines
├── app.py                   # Flask web application
├── main.py                  # Pipeline executor
├── airflow_dag.py           # Airflow DAG definition
├── Dockerfile               # Container image
└── requirements.txt         # Python dependencies

Features

Web Application (app.py)

  • Interactive prediction interface
  • Real-time risk assessment (Low/Medium/High)
  • Monitoring dashboard with drift reports
  • REST API endpoints for predictions
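
The Low/Medium/High labels are derived from the model's continuous output; a hypothetical thresholding sketch (the cutoffs are assumptions, not the app's actual values):

# Map a continuous accident_risk score to a display label; thresholds assumed.
def risk_label(score: float) -> str:
    if score < 0.33:
        return "Low"
    if score < 0.66:
        return "Medium"
    return "High"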

Orchestration (airflow_dag.py)

  • Automated pipeline execution
  • Task dependencies and scheduling
  • Failure handling and retries
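
A minimal sketch of what airflow_dag.py might define; only the DAG id matches the one used elsewhere in this README, and the imports follow Airflow 2.x (module paths differ in Airflow 3):

# Hypothetical DAG wrapping the pipeline; schedule and retries are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ml_pipeline_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2},
) as dag:
    run_pipeline = BashOperator(
        task_id="run_pipeline",
        bash_command="python ~/ML_PipeLine_Evidently/main.py",  # path is an assumption
    )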

Monitoring (Evidently AI)

  • Data drift detection
  • Model performance tracking
  • Feature-level analysis
  • Interactive visualizations

MLflow Integration

  • Experiment tracking
  • Model versioning
  • Metric logging
  • DagsHub remote tracking
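
A sketch of pointing MLflow at DagsHub; the tracking URI is a placeholder, not the project's actual endpoint:

# Configure remote tracking; replace <user>/<repo> with the DagsHub repo.
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")
mlflow.set_experiment("ml_pipeline_evidently")  # experiment name is an assumption

with mlflow.start_run():
    mlflow.log_param("model", "GradientBoostingRegressor")
    mlflow.log_metric("rmse", rmse)  # rmse as computed in the evaluation stage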

Installation

Clone Repository

git clone https://github.com/Abeshith/ML_PipeLine_Evidently.git
cd ML_PipeLine_Evidently

Environment Setup (Linux/WSL)

1. Copy Project to Linux Environment (Windows Users)

cp -r /mnt/d/ML\ PipeLine\ \(Evidently\) ~/
cd ~/ML\ PipeLine\ \(Evidently\)

2. Install Python and Setup Virtual Environment

# Install Python3
sudo apt update
sudo apt install python3 python3-pip python3-venv

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure Apache Airflow

# Set Airflow home directory
export AIRFLOW_HOME=~/airflow
echo $AIRFLOW_HOME

# Initialize Airflow database
airflow db init

# Configure authentication manager
vim ~/airflow/airflow.cfg

Edit the configuration file:

  • Press i to enter insert mode
  • Replace auth_manager = airflow.api.fastapi.auth.managers.simple.simple_auth_manager.SimpleAuthManager
  • With auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
  • Press ESC, then type :wq! and press ENTER

# Create DAGs directory
mkdir -p ~/airflow/dags

# Copy DAG file to Airflow directory
cp airflow_dag.py ~/airflow/dags/

# Test DAG configuration
python ~/airflow/dags/airflow_dag.py

4. Launch Airflow Webserver

# Start Airflow standalone mode
airflow standalone

5. Execute Pipeline via Airflow UI

  1. Open a browser and navigate to: http://localhost:8080
  2. Search for ml_pipeline_dag in the DAGs list
  3. Click on the DAG and trigger execution
  4. Monitor workflow progress through Airflow UI

Usage

Run Complete Pipeline

python main.py

Run Flask Web App

python app.py

Access at: http://localhost:5000

Run with Airflow

airflow dags trigger ml_pipeline_dag

Run with Docker

docker build -t ml-pipeline .
docker run -p 5000:5000 ml-pipeline

CI/CD Pipeline

GitHub Actions workflow automates:

  1. Checkout - Retrieves code from repository
  2. Build - Creates Docker images from application code
  3. Scan - Performs security vulnerability scans with Trivy
  4. Deliver - Pushes validated images to DockerHub registry

The pipeline triggers automatically on push to main branch.

Kubernetes Deployment

Local Deployment with Minikube

1. Start Minikube Cluster

# Initialize Minikube cluster
minikube start

2. Deploy Application

# Deploy application using deployment manifest
kubectl apply -f deployment/deployment.yaml

# Check pod status
kubectl get pods

3. Deploy Services

# Apply service configuration
kubectl apply -f deployment/service.yaml

# Verify service deployment
kubectl get svc

4. Access Application - Method 1 (Port Forwarding)

# Forward service port to local machine
kubectl port-forward svc/ml-pipeline-service 8000:80

# Access application in browser at:
# http://localhost:8000

5. Access Application - Method 2 (Load Balancer)

# Edit service configuration
kubectl edit svc ml-pipeline-service

# Change service type from NodePort to LoadBalancer
# Press ESC, type :wq! and press ENTER to save

# Open new terminal and create tunnel
minikube tunnel

# In original terminal, check for external IP
kubectl get svc

# Access the application at the EXTERNAL-IP shown (typically 127.0.0.1 with minikube tunnel)

6. Configure Ingress (Optional)

# Deploy ingress configuration
kubectl apply -f deployment/ingress.yaml

# Install Ingress Controller (nginx)
minikube addons enable ingress

# Check Ingress Controller pods
kubectl get pods -A | grep nginx

# Check Ingress deployment and get address
kubectl get ingress

# Configure local DNS
sudo vim /etc/hosts

Add the following lines to /etc/hosts:

127.0.0.1       localhost
192.168.49.2    ml-pipeline.example.com

Press ESC, type :wq! and press ENTER to save.

# Verify configuration
ping ml-pipeline.example.com

# Access application via domain
# http://ml-pipeline.example.com

Production Deployment

kubectl apply -f deployment/deployment.yaml
kubectl apply -f deployment/service.yaml
kubectl apply -f deployment/ingress.yaml

Model Details

  • Algorithm: Gradient Boosting Regressor
  • Features: 12 input features + 10 engineered features
  • Target: accident_risk (continuous)
  • Metrics: MAE, RMSE, R2 Score

Input Features

  • road_type, num_lanes, curvature, speed_limit
  • lighting, weather, road_signs_present, public_road
  • time_of_day, holiday, school_season, num_reported_accidents

API Endpoints

Prediction

POST /api/predict
Content-Type: application/json

{
  "road_type": "highway",
  "speed_limit": 60,
  "num_lanes": 2,
  ...
}
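
A client-side usage sketch for this endpoint (only a few of the input features are shown; the remaining fields follow the same pattern):

# Call the prediction endpoint from Python; field values are illustrative.
import requests

payload = {"road_type": "highway", "speed_limit": 60, "num_lanes": 2}
resp = requests.post("http://localhost:5000/api/predict", json=payload)
print(resp.json())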

Dashboard

  • / - Home page
  • /predict - Prediction form
  • /dashboard - Monitoring dashboard
  • /reports/drift - Data drift report
  • /reports/performance - Performance metrics

Monitoring Dashboard

Access Evidently AI reports through the Flask app: the /dashboard route links to the generated drift and performance reports (/reports/drift and /reports/performance).

Configuration

Edit config/config.yaml for pipeline settings:

  • Artifact paths
  • Model parameters
  • Data sources
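
A sketch of how pipeline code might read these settings; the key name shown is hypothetical:

# Load the YAML configuration; "artifacts_root" is an assumed key.
import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

print(config.get("artifacts_root"))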

Contributing

  1. Fork the repository
  2. Create feature branch
  3. Commit changes
  4. Push to branch
  5. Create pull request
