# VRP Validation Pipeline Results

This notebook runs the end-to-end validation pipeline for the vehicle routing problem (VRP) project. It covers:
1.  **Synthetic Data Generation**: Creates realistic datasets for customers, vehicles, and historical deliveries.
2.  **ML Model Training**: Trains and validates models to predict travel time and customer availability.
3.  **Optimization**: Formulates the VRP as a MILP and solves it with CBC, using the ML predictions as inputs.
4.  **Analysis & Reporting**: Generates a final report with metrics, visualizations, and recommendations.

In [1]:
import os
import sys
import yaml
import pandas as pd
from IPython.display import display, Markdown

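# Add <project_root>/src to sys.path so the project modules can be imported.
# Assumes the notebook is run from a direct subdirectory of the project root.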
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
src_path = os.path.join(project_root, 'src')
if src_path not in sys.path:
    sys.path.insert(0, src_path)

from data_generator import SyntheticDataGenerator
from ml_pipeline import VRPPredictor
from optimization_model import VRPOptimizer
from analysis_dashboard import ValidationDashboard

# Load config
config_path = os.path.join(project_root, 'config', 'validation_config.yaml')
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

print("Setup complete. All modules and configuration loaded.")

Setup complete. All modules and configuration loaded.


## Step 1: Generate Synthetic Data

In [2]:
data_dir = os.path.join(project_root, 'data', 'synthetic')
os.makedirs(data_dir, exist_ok=True)

generator = SyntheticDataGenerator(config)
generator.generate_all(data_dir)

print(f"Data generated in {data_dir}")

Generated 50 customers.
Generated 3 vehicles.
Generated 1457 historical deliveries.
Data generated in /app/vrp_validation/data/synthetic


## Step 2: Train and Validate ML Models

In [3]:
historical_data_path = os.path.join(data_dir, 'historical_deliveries.csv')
historical_data = pd.read_csv(historical_data_path)

predictor = VRPPredictor(config)
ml_metrics = predictor.train_and_validate(historical_data)


Training travel time prediction model...


Training customer availability prediction model...


Models trained successfully.

Validating models...
Validation complete. Metrics:
- travel_time_mae: 3.6296
- travel_time_r2: 0.9235
- availability_auc: 0.5282
- availability_accuracy: 0.5205
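
The availability metrics above sit close to 0.5. For AUC, 0.5 is what a model scores when its outputs carry no ranking information, so 0.5282 is barely better than chance. Accuracy needs a reference point as well: the class balance of the synthetic data is not shown in this notebook, so 0.5205 should be compared with the accuracy of always predicting the majority class. The sketch below illustrates both ideas in plain Python. The helper names `roc_auc` and `majority_baseline_accuracy` and the toy labels are hypothetical and are not part of `ml_pipeline`.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Toy labels and scores, made up for illustration (not pipeline data).
toy_labels = [0, 0, 1, 1]
informative_scores = [0.1, 0.4, 0.35, 0.8]
constant_scores = [0.5, 0.5, 0.5, 0.5]

print(roc_auc(toy_labels, informative_scores))  # 0.75
print(roc_auc(toy_labels, constant_scores))     # 0.5
print(majority_baseline_accuracy(toy_labels))   # 0.5
```

If the majority-class baseline on the real validation split is at or above 0.52, the availability model adds nothing over a constant prediction.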


## Step 3: Formulate and Solve the VRP

In [4]:
customers = pd.read_csv(os.path.join(data_dir, 'customers.csv'))
vehicles = pd.read_csv(os.path.join(data_dir, 'vehicles.csv'))

# Use only the first 10 customers so the optimization run stays fast
customers_subset = customers.head(10)

optimizer = VRPOptimizer(config)
optimizer.formulate_milp(customers_subset, vehicles, predictor)
opt_metrics = optimizer.solve()

Formulating MILP model...


MILP model formulated.

Solving with PULP_CBC_CMD...
Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /home/jules/.pyenv/versions/3.12.11/lib/python3.12/site-packages/pulp/apis/../solverdir/cbc/linux/i64/cbc /tmp/dca4faa254d443f48b131531b110066e-pulp.mps -sec 600 -timeMode elapsed -branch -printingOptions all -solution /tmp/dca4faa254d443f48b131531b110066e-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 407 COLUMNS
At line 3851 RHS
At line 4254 BOUNDS
At line 4615 ENDATA
Problem MODEL has 402 rows, 374 columns and 2333 elements
Coin0008I MODEL read with 0 errors
seconds was changed from 1e+100 to 600
Option for timeMode changed from cpu to elapsed
Continuous objective value is 682.38 - 0.00 seconds
Cgl0000I Cut generators found to be infeasible! (or unbounded)
Pre-processing says infeasible or unbounded
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock secon
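
In the solver log above, `Continuous objective value is 682.38` is the objective of the LP relaxation, and the next lines report that pre-processing found the problem infeasible or unbounded. The report generated later in this notebook lists `total_cost: 682.379842837847` next to `solver_status: Infeasible`, so that cost appears to be the relaxation value rather than the cost of any actual route plan. A minimal sketch of a guard follows. It assumes `opt_metrics` is a dict with `solver_status` and `total_cost` keys, mirroring the fields printed in the report; the helper name `usable_opt_metrics` is hypothetical.

```python
def usable_opt_metrics(opt_metrics):
    """Return a copy of the metrics with the cost blanked unless the solve was optimal.

    Assumption: opt_metrics is a dict with 'solver_status' and 'total_cost'
    keys, matching the fields shown in the validation report.
    """
    cleaned = dict(opt_metrics)
    # "Optimal" is the status string PuLP reports for a successful solve.
    if cleaned.get("solver_status") != "Optimal":
        cleaned["total_cost"] = None
    return cleaned


example = {"solver_status": "Infeasible", "total_cost": 682.379842837847}
print(usable_opt_metrics(example))
# {'solver_status': 'Infeasible', 'total_cost': None}
```

Passing the cleaned dict to the dashboard would keep a meaningless cost out of the report. Returning a copy leaves the original metrics available for debugging.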

## Step 4: Generate Analysis Dashboard

In [5]:
dashboard = ValidationDashboard(
    config=config,
    ml_metrics=ml_metrics,
    opt_metrics=opt_metrics,
    ml_predictor=predictor,
    customers=customers_subset,
    solution_routes=None,  # No routes to pass: route extraction from the solved model is not implemented in this notebook
)

report_path = dashboard.generate_full_report()


Full report generated at: /app/vrp_validation/data/results/validation_report.md


## Step 5: Display Final Report

In [6]:
with open(report_path, 'r') as f:
    report_content = f.read()

display(Markdown(report_content))

# VRP Validation Report

## 1. Executive Summary
WARNING: The pipeline ran successfully, but the optimization problem was INFEASIBLE. This suggests conflicting constraints (e.g., time windows, capacity).
- High R^2 score for travel time prediction (>0.9) suggests the synthetic data might be too predictable.
- Low AUC score for availability prediction (0.528) indicates the model is not learning effectively. This is a RED FLAG.

## 2. ML Performance Analysis
- travel_time_mae: 3.6296
- travel_time_r2: 0.9235
- availability_auc: 0.5282
- availability_accuracy: 0.5205
![ML Performance](ml_performance.png)
![Feature Importance](feature_importance.png)

## 3. Optimization Performance Analysis
- solver_status: Infeasible
- total_cost: 682.379842837847

## 4. Recommendations
- **Proceed with Caution**. The pipeline is technically functional, but the results show significant weaknesses.
- **Revisit Data Generation**: The availability model's failure and the travel model's high performance suggest the synthetic data needs more complexity and noise.
- **Feature Engineering**: More complex features are needed for the availability model.
- **Constraint Analysis**: The 'Infeasible' result requires analyzing which constraints are too tight.
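
The constraint analysis recommended above can start with cheap necessary-condition checks on the input data before the MILP is built. The sketch below is one possible starting point, not the project's method. The field names (`id`, `demand`, `tw_start`, `tw_end`, `capacity`) are hypothetical and would need mapping to the real columns of `customers.csv` and `vehicles.csv`. It also assumes every customer must be served, each vehicle makes a single trip, and deliveries are not split across vehicles.

```python
def quick_feasibility_checks(customers, vehicles):
    """Return human-readable reasons why the instance cannot be feasible.

    Assumptions (not confirmed by the pipeline): all customers must be served,
    one trip per vehicle, no split deliveries. Field names are hypothetical.
    """
    if not vehicles:
        return ["no vehicles available"]

    problems = []
    total_demand = sum(c["demand"] for c in customers)
    total_capacity = sum(v["capacity"] for v in vehicles)
    if total_demand > total_capacity:
        problems.append(
            f"total demand {total_demand} exceeds fleet capacity {total_capacity}"
        )

    largest_vehicle = max(v["capacity"] for v in vehicles)
    for c in customers:
        if c["demand"] > largest_vehicle:
            problems.append(f"customer {c['id']}: demand exceeds every vehicle")
        if c["tw_start"] > c["tw_end"]:
            problems.append(f"customer {c['id']}: time window is empty")
    return problems


# Toy instance, made up for illustration.
toy_customers = [
    {"id": 1, "demand": 5, "tw_start": 8, "tw_end": 10},
    {"id": 2, "demand": 12, "tw_start": 14, "tw_end": 13},
]
toy_vehicles = [{"capacity": 10}, {"capacity": 6}]

for reason in quick_feasibility_checks(toy_customers, toy_vehicles):
    print(reason)
```

An empty result does not prove feasibility, since these checks are necessary conditions only. Because the CBC log reports "infeasible or unbounded", it is also worth confirming that every variable in the objective is bounded.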