
Keerthik1622/truck-delay-prediction


🚚 Truck Delay Prediction — End-to-End ML Pipeline

A production-grade machine learning pipeline that predicts truck shipment delays, built for deployment on Lightning.ai with a Flask REST API.


🏗️ Architecture

MySQL DB ──┐
           ├─→ ETL Pipeline ─→ Feature Engineering ─→ Model Training ─→ Flask API
Postgres ──┘                                           (RF / XGB / LGBM)
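The diagram shows two source databases feeding a single ETL stage. Below is a minimal sketch of the "extract & merge" step that `extractor.py` is described as doing. It uses two in-memory SQLite databases as stand-ins for MySQL and PostgreSQL so it runs with the standard library alone. The table names (`trucks`, `shipments`), the `truck_id` join key and the columns are assumptions for illustration, not the repository's schema.

```python
import sqlite3

# Stand-ins for the two source databases. In the real pipeline these would be
# MySQL and PostgreSQL connections; in-memory SQLite keeps the sketch runnable.
# Table and column names here are hypothetical.
mysql_db = sqlite3.connect(":memory:")
mysql_db.execute(
    "CREATE TABLE trucks (truck_id INTEGER PRIMARY KEY, truck_type TEXT, truck_age_years INTEGER)"
)
mysql_db.executemany("INSERT INTO trucks VALUES (?, ?, ?)", [(1, "Large", 9), (2, "Small", 3)])

postgres_db = sqlite3.connect(":memory:")
postgres_db.execute(
    "CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, truck_id INTEGER, distance_km REAL)"
)
postgres_db.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)", [(100, 1, 850.0), (101, 2, 120.0), (102, 3, 40.0)]
)


def extract_rows(conn: sqlite3.Connection, query: str) -> list[dict]:
    """Run a query and return each row as a dict keyed by column name."""
    cur = conn.execute(query)
    columns = [desc[0] for desc in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def merge_sources(shipments: list[dict], trucks: list[dict], key: str = "truck_id") -> list[dict]:
    """Left join: keep every shipment, attach truck attributes (None if no match)."""
    trucks_by_key = {t[key]: t for t in trucks}
    truck_columns = [c for c in (trucks[0] if trucks else {}) if c != key]
    merged = []
    for shipment in shipments:
        truck = trucks_by_key.get(shipment[key], {})
        merged.append({**shipment, **{c: truck.get(c) for c in truck_columns}})
    return merged


trucks = extract_rows(mysql_db, "SELECT * FROM trucks ORDER BY truck_id")
shipments = extract_rows(postgres_db, "SELECT * FROM shipments ORDER BY shipment_id")
rows = merge_sources(shipments, trucks)
```

A left join keeps shipments whose truck is missing from the other database, so gaps show up as `None` values for the cleaning step instead of silently dropping rows. With pandas the same join is `shipments_df.merge(trucks_df, on="truck_id", how="left")`.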

📁 Project Structure

truck_delay_ml/
├── config.yaml                  # Central config (no hardcoded values)
├── run_pipeline.py              # One command: ETL + Training
├── requirements.txt
├── .env.example                 # Secret template (never commit .env!)
│
├── ml_pipeline/
│   ├── etl/
│   │   ├── db_connector.py      # MySQL + PostgreSQL connections + mock data
│   │   ├── extractor.py         # Extract & merge from both DBs
│   │   ├── transformer.py       # Feature engineering & cleaning
│   │   └── loader.py            # Save/load parquet files
│   ├── modeling/
│   │   └── trainer.py           # Multi-model training + MLflow tracking
│   └── utils/
│       ├── config_loader.py     # YAML + env var loader
│       └── logger.py            # Rotating file + console logger
│
├── deployment/
│   └── flask_app.py             # REST API with /predict and /predict/batch
│
└── tests/
    └── test_pipeline.py         # pytest unit tests
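`config_loader.py` is described as a "YAML + env var loader". One common way to keep secrets out of `config.yaml` is to write placeholders such as `${MYSQL_PASSWORD}` in the YAML and resolve them from the environment at load time. The sketch below assumes that convention (the placeholder syntax and all key names are hypothetical, not taken from the repository) and operates on the dict that `yaml.safe_load` would return, so it needs only the standard library.

```python
import os
import re

# Matches ${VAR_NAME} placeholders (assumed convention, not confirmed by the repo).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings, lists and dicts."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: resolve_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env(v, env) for v in value]
    if isinstance(value, str):
        def _sub(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_sub, value)
    return value  # numbers, booleans and None pass through unchanged


# In a real loader this dict would come from yaml.safe_load(); keys are hypothetical.
raw_config = {
    "mysql": {"host": "localhost", "password": "${MYSQL_PASSWORD}"},
    "training": {"models": ["rf", "xgb", "lgbm"], "cv_folds": 5},
}
config = resolve_env(raw_config, env={"MYSQL_PASSWORD": "s3cret"})
```

Raising on a missing variable is a deliberate choice in this sketch: an empty password substituted silently tends to surface much later as a confusing database connection error.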

🚀 Quick Start on Lightning.ai

1. Clone & setup

git clone https://github.com/YOUR_USERNAME/truck_delay_ml.git
cd truck_delay_ml
pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env with your DB credentials
# Or set MOCK_DATA=true to skip DB and use synthetic data
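The `.env` file is a list of `KEY=VALUE` lines, and `MOCK_DATA=true` switches the pipeline to synthetic data. The project may well load it with a library such as python-dotenv; the standard-library sketch below only illustrates the idea. It parses simple `.env` text (blank lines, `#` comments, optional surrounding quotes) and reads `MOCK_DATA` as a boolean. The variable names other than `MOCK_DATA` and the accepted truthy spellings are assumptions.

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def is_truthy(value) -> bool:
    """Interpret common truthy spellings (an assumed set) as True."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


# Placeholder values; MYSQL_HOST and MYSQL_PASSWORD are hypothetical names.
example = """
# Database credentials
MYSQL_HOST=localhost
MYSQL_PASSWORD="change-me"
MOCK_DATA=true
"""
env = parse_dotenv(example)
use_mock = is_truthy(env.get("MOCK_DATA", "false"))
```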

3. Run the full pipeline

# With mock data (no database needed):
python run_pipeline.py --mock

# With real databases:
python run_pipeline.py
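In mock mode the database extract is replaced by synthetic data, which `db_connector.py` is described as providing. As an illustration only, the sketch below generates reproducible rows with the same fields as the `/predict` request in step 5. The value ranges, the extra category values, the `delayed` target column and the rule used to label a row as delayed are all invented for this sketch; they are not the repository's generator.

```python
import random

# Hypothetical category lists: values from the /predict example plus invented alternatives.
TRUCK_TYPES = ["Small", "Medium", "Large"]
WEATHER = ["Clear", "Rain", "Fog", "Snow"]
ROUTE_TYPES = ["Urban", "Highway", "Rural"]
ROAD_QUALITY = ["Good", "Average", "Poor"]


def make_mock_shipments(n: int, seed: int = 42) -> list[dict]:
    """Generate n synthetic shipments; the same seed always yields the same rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {
            "distance_km": round(rng.uniform(50, 2000), 1),
            "truck_type": rng.choice(TRUCK_TYPES),
            "truck_age_years": rng.randint(0, 15),
            "driver_experience": rng.randint(0, 30),
            "cargo_weight_kg": rng.randint(500, 25000),
            "weather_condition": rng.choice(WEATHER),
            "route_type": rng.choice(ROUTE_TYPES),
            "traffic_index": round(rng.random(), 2),
            "road_quality": rng.choice(ROAD_QUALITY),
            "num_stops": rng.randint(0, 8),
        }
        # Toy labelling rule (an assumption): bad weather, heavy traffic and
        # poor roads raise the chance of a delay. Max probability is 0.9.
        p_delay = (
            0.15
            + 0.25 * (row["weather_condition"] != "Clear")
            + 0.3 * row["traffic_index"]
            + 0.2 * (row["road_quality"] == "Poor")
        )
        row["delayed"] = int(rng.random() < p_delay)
        rows.append(row)
    return rows
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the mock data reproducible even if other code touches the global random state.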

4. Start the Flask API

python deployment/flask_app.py

5. Test the API

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "distance_km": 850,
    "truck_type": "Large",
    "truck_age_years": 9,
    "driver_experience": 2,
    "cargo_weight_kg": 15000,
    "weather_condition": "Rain",
    "route_type": "Rural",
    "traffic_index": 0.85,
    "road_quality": "Poor",
    "num_stops": 4
  }'
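The same request can be sent from Python. This standard-library sketch builds exactly the POST that the `curl` command above sends; `build_predict_request` and `predict` are hypothetical helper names, and the base URL assumes the default `localhost:5000` used in the example.

```python
import json
import urllib.request


def build_predict_request(payload: dict, base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build a JSON POST to /predict, equivalent to the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, base_url: str = "http://localhost:5000", timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires the API to be running)."""
    req = build_predict_request(payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


shipment = {
    "distance_km": 850, "truck_type": "Large", "truck_age_years": 9,
    "driver_experience": 2, "cargo_weight_kg": 15000, "weather_condition": "Rain",
    "route_type": "Rural", "traffic_index": 0.85, "road_quality": "Poor", "num_stops": 4,
}
req = build_predict_request(shipment)
# With the API from step 4 running: print(predict(shipment))
```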

Example response (exact values depend on the trained model):

{
  "prediction": 1,
  "label": "Delayed",
  "probability": 0.8231,
  "confidence": "82.3%",
  "risk_level": "High"
}
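The response fields are linked: `label` follows from `prediction`, `confidence` is `probability` formatted as a percentage, and `risk_level` buckets the probability. The sketch below derives all of them from a single delay probability. The 0.5 decision threshold, the Low/Medium/High cut-offs (0.4 and 0.7) and the `"On Time"` label are assumptions, not values documented by the repository; the sketch simply reproduces the example response above for a probability of 0.8231.

```python
def build_response(probability: float, threshold: float = 0.5) -> dict:
    """Turn a delay probability into the /predict response shape."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    prediction = int(probability >= threshold)
    # Assumed cut-offs for the risk buckets.
    if probability >= 0.7:
        risk = "High"
    elif probability >= 0.4:
        risk = "Medium"
    else:
        risk = "Low"
    return {
        "prediction": prediction,
        "label": "Delayed" if prediction == 1 else "On Time",  # "On Time" is assumed
        "probability": round(probability, 4),
        # Mirrors the example: confidence is the delay probability as a percentage.
        "confidence": f"{probability * 100:.1f}%",
        "risk_level": risk,
    }
```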

🧪 Run Tests

pytest tests/ -v
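pytest collects any `test_*` function and needs nothing beyond plain `assert` statements. As an illustration of that style (not the repository's actual tests), the sketch below checks a hypothetical `validate_payload` helper that reports missing `/predict` fields. Treating every field from the request example as required is an assumption.

```python
# Fields from the /predict example; treating all of them as required is an assumption.
REQUIRED_FIELDS = {
    "distance_km", "truck_type", "truck_age_years", "driver_experience",
    "cargo_weight_kg", "weather_condition", "route_type", "traffic_index",
    "road_quality", "num_stops",
}


def validate_payload(payload: dict) -> list[str]:
    """Return the sorted names of missing fields (empty list if the payload is complete)."""
    return sorted(REQUIRED_FIELDS - set(payload))


def _valid_payload() -> dict:
    return {
        "distance_km": 850, "truck_type": "Large", "truck_age_years": 9,
        "driver_experience": 2, "cargo_weight_kg": 15000,
        "weather_condition": "Rain", "route_type": "Rural",
        "traffic_index": 0.85, "road_quality": "Poor", "num_stops": 4,
    }


def test_valid_payload_has_no_missing_fields():
    assert validate_payload(_valid_payload()) == []


def test_missing_fields_are_reported():
    payload = _valid_payload()
    del payload["num_stops"]
    del payload["distance_km"]
    assert validate_payload(payload) == ["distance_km", "num_stops"]
```

Saved under `tests/` (for example as a hypothetical `tests/test_payload.py`), both functions would be picked up by `pytest tests/ -v`.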

📊 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | Health check |
| POST | /predict | Single prediction |
| POST | /predict/batch | Batch predictions (max 1000) |
| GET | /model/info | Feature list and model type |
| POST | /reload | Hot-reload the model after retraining |
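`/predict/batch` caps a request at 1000 records. Below is a minimal sketch of the check such an endpoint might run before calling the model. The README does not document the batch request body, so the `{"records": [...]}` envelope and the error messages are hypothetical.

```python
from typing import Optional, Tuple

MAX_BATCH_SIZE = 1000  # limit stated in the endpoint table


def check_batch(body: object) -> Tuple[list, Optional[str]]:
    """Validate a batch body; return (records, None) or ([], error_message)."""
    if not isinstance(body, dict) or not isinstance(body.get("records"), list):
        return [], 'expected a JSON object with a "records" list'
    records = body["records"]
    if not records:
        return [], "records must not be empty"
    if len(records) > MAX_BATCH_SIZE:
        return [], f"batch too large: {len(records)} > {MAX_BATCH_SIZE}"
    if not all(isinstance(r, dict) for r in records):
        return [], "every record must be a JSON object"
    return records, None
```

In a Flask handler, a non-`None` error would typically be returned as an HTTP 400 response before any model call, so an oversized batch never reaches the model.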

🔬 Models Compared

| Model | CV F1 | Notes |
|-------|-------|-------|
| Random Forest | ~0.84 | Robust, good baseline |
| XGBoost | ~0.86 | Fast, handles missing values |
| LightGBM | ~0.87 | Best; used in production |
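"CV F1" reads as the F1 score (the harmonic mean of precision and recall, F1 = 2PR / (P + R)) averaged over cross-validation folds, with the highest-scoring model going to production. The sketch below spells out that arithmetic with the standard library, assuming binary F1 with "Delayed" (1) as the positive class. The fold predictions are toy values; the three model scores are the approximate figures from the table.

```python
from statistics import mean


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1 = 2PR / (P + R) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cv_f1(folds: list[tuple[list[int], list[int]]]) -> float:
    """Mean F1 across (y_true, y_pred) pairs, one pair per CV fold."""
    return mean(f1_score(t, p) for t, p in folds)


def pick_best(scores: dict[str, float]) -> str:
    """Return the model name with the highest CV F1."""
    return max(scores, key=scores.get)


# Toy fold predictions (illustrative only).
fold_a = ([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])  # tp=2, fp=0, fn=1: P = 1, R = 2/3, F1 = 4/5
fold_b = ([1, 1, 0, 0], [1, 1, 1, 0])        # tp=2, fp=1, fn=0: P = 2/3, R = 1, F1 = 4/5

table_scores = {"Random Forest": 0.84, "XGBoost": 0.86, "LightGBM": 0.87}
best = pick_best(table_scores)
```

F1 is a reasonable selection metric here because it ignores true negatives, so a model cannot score well just by predicting "on time" for every shipment when delays are the minority class.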

✨ Key Features

  • No hardcoded values: everything lives in config.yaml
  • MLflow experiment tracking: compare all runs visually
  • Mock data mode: run the full pipeline without any database
  • Production Flask API: /predict and /predict/batch endpoints
  • Automatic logging: predictions are logged to CSV for monitoring
  • Unit tests: pytest coverage for all pipeline stages
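Prediction logging to CSV can be as simple as appending one row per request. The sketch below uses `csv.DictWriter` and writes a header only when the file is new or empty. The column set, the UTC timestamp format and the file location are assumptions rather than the repository's actual log format.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical log columns; a real log would likely record every input feature.
LOG_FIELDS = ["timestamp", "distance_km", "truck_type", "prediction", "probability"]


def log_prediction(path: str, features: dict, prediction: int, probability: float) -> None:
    """Append one prediction row, writing the header if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "distance_km": features.get("distance_km"),
        "truck_type": features.get("truck_type"),
        "prediction": prediction,
        "probability": probability,
    }
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)


# Usage: log two predictions to a temporary file and read them back.
log_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
log_prediction(log_path, {"distance_km": 850, "truck_type": "Large"}, 1, 0.8231)
log_prediction(log_path, {"distance_km": 120, "truck_type": "Small"}, 0, 0.12)
with open(log_path, newline="", encoding="utf-8") as fh:
    logged = list(csv.DictReader(fh))
```

An append-only CSV is easy to load later with pandas to watch for drift in inputs or in the share of "Delayed" predictions.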

🛠️ Tech Stack

Python · scikit-learn · XGBoost · LightGBM · MLflow · Flask · SQLAlchemy · pandas · pytest

About

End-to-end Machine Learning pipeline for Truck Delay Prediction using XGBoost, Flask API, MLflow, and Lightning AI deployment.
