# 🚢 ETA Prediction Model: Live Training Demo

### Overview
This notebook demonstrates the end-to-end training pipeline for the **Predictive ETA System**.
It shows real-time connectivity to **Snowflake**, runs **Data Quality Checks**, trains the **XGBoost Model**, and generates **Context-Aware Predictions**.

### Architecture
- **Source**: Snowflake (`HACKATHON.DT_INGESTION`)
- **Validation**: Automated Quality Gate
- **Model**: XGBoost Regressor (Features: Route, Mode, Risk Severity)
- **Target**: `Actual_Duration_Hours`

---

## 1. Environment Setup & Snowflake Connection
Initialize the backend engines and connect to the secure Snowflake Data Warehouse.

In [None]:
import sys
import os
import pandas as pd
sys.path.append(os.getcwd())  # Make the current working directory (assumed to be the project root) importable

from backend.data_loader import DataLoader
from backend.model_engine import ETAModel
from backend.quality_check import QualityCheck

# Initialize Loader
print("Initializing Data Loader...")
loader = DataLoader()
print("✅ Connected to Snowflake.")

## 2. Data Ingestion & Transformation
Fetch live data from Snowflake tables (`DIM_LANE`, `DIM_VEHICLE`, `FACT_TRIP`, `FACT_EXT_CONDITIONS`) and perform in-memory joins to create the Training View.

In [None]:
print("Fetching and Joining Data Tables...")
df_training = loader.get_training_view()

print(f"\nSuccessfully Loaded {len(df_training)} Shipment Records.")
print("Sample Data (First 3 Rows):")
display(df_training[['TID', 'PolCode', 'PodCode', 'VNm', 'Actual_Duration_Hours', 'Severity_Score']].head(3))
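
For readers without access to `backend.data_loader`, the in-memory join described above can be sketched with `pandas.merge`. The join keys (`LaneID`, `VehicleID`), the tiny inline tables, and the choice to fill a missing severity with 0 are assumptions for illustration; the real schema in `HACKATHON.DT_INGESTION` and the logic inside `get_training_view()` may differ.

```python
import pandas as pd

# Hypothetical miniature versions of the four Snowflake tables.
# The key columns LaneID and VehicleID are assumed, not taken from the real schema.
dim_lane = pd.DataFrame({
    "LaneID": [1, 2],
    "PolCode": ["CNDLC", "CNSHA"],
    "PodCode": ["ARBUE", "NLRTM"],
})
dim_vehicle = pd.DataFrame({"VehicleID": [10, 11], "VNm": ["Vessel A", "Vessel B"]})
fact_trip = pd.DataFrame({
    "TID": [100, 101, 102],
    "LaneID": [1, 2, 1],
    "VehicleID": [10, 11, 11],
    "Actual_Duration_Hours": [840.0, 720.0, 865.5],
})
fact_ext_conditions = pd.DataFrame({"TID": [100, 102], "Severity_Score": [2, 7]})


def build_training_view(trips, lanes, vehicles, conditions):
    """Left-join the trip facts to their dimensions and external conditions."""
    view = (
        trips
        .merge(lanes, on="LaneID", how="left")
        .merge(vehicles, on="VehicleID", how="left")
        .merge(conditions, on="TID", how="left")
    )
    # Assumption: a trip with no recorded external condition has severity 0.
    view["Severity_Score"] = view["Severity_Score"].fillna(0)
    return view


training_view = build_training_view(fact_trip, dim_lane, dim_vehicle, fact_ext_conditions)
print(training_view[["TID", "PolCode", "PodCode", "VNm", "Actual_Duration_Hours", "Severity_Score"]])
```

Left joins keep every trip even when a dimension or condition row is missing, which makes gaps visible to the quality gate instead of silently dropping records.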

## 3. Data Quality Gate 🛡️
Before training, the system runs automated checks to ensure data integrity:
- **Completeness**: Critical columns must not be Null.
- **Logic**: Arrival Time must be after Departure Time.
- **Volume**: Minimum row count requirement.

In [None]:
print("Running Quality Gate...")
dq_result = QualityCheck.run_checks(df_training)

if dq_result['passed']:
    print("✅ QA PASSED: Data is valid for training.")
    print(f"Metrics: {dq_result['metrics']}")
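else:
    print(f"❌ QA FAILED: {dq_result['reason']}")
    # Stop here so the training cell below does not run on invalid data
    raise RuntimeError("Quality gate failed; aborting before training.")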
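
The three checks listed above can be sketched as one function that returns the same `passed`, `reason`, and `metrics` keys the cell reads. The column names `DepartureTime` and `ArrivalTime`, the list of critical columns, and the `min_rows` threshold are assumptions; `QualityCheck.run_checks` itself is not shown here and may be implemented differently.

```python
import pandas as pd


def run_quality_gate(df, critical_cols, min_rows=100):
    """Hypothetical stand-in for a quality gate: volume, completeness, time logic."""
    metrics = {"row_count": len(df)}

    # Volume: require a minimum number of rows.
    if len(df) < min_rows:
        reason = f"Only {len(df)} rows; need at least {min_rows}."
        return {"passed": False, "reason": reason, "metrics": metrics}

    # Completeness: critical columns must not contain nulls.
    null_counts = df[critical_cols].isna().sum()
    metrics["null_counts"] = null_counts.to_dict()
    if null_counts.any():
        bad_cols = null_counts[null_counts > 0].index.tolist()
        reason = f"Null values in critical columns: {bad_cols}"
        return {"passed": False, "reason": reason, "metrics": metrics}

    # Logic: arrival must be strictly after departure.
    invalid_rows = int((df["ArrivalTime"] <= df["DepartureTime"]).sum())
    metrics["invalid_time_rows"] = invalid_rows
    if invalid_rows:
        reason = f"{invalid_rows} rows arrive at or before departure."
        return {"passed": False, "reason": reason, "metrics": metrics}

    return {"passed": True, "reason": None, "metrics": metrics}


# Illustrative data only: the third row arrives before it departs.
demo = pd.DataFrame({
    "DepartureTime": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
    "ArrivalTime": pd.to_datetime(["2024-02-05", "2024-02-01", "2024-01-09"]),
    "Actual_Duration_Hours": [840.0, 648.0, 24.0],
})
critical = ["DepartureTime", "ArrivalTime", "Actual_Duration_Hours"]
result = run_quality_gate(demo, critical, min_rows=3)
print(result)
```

Returning at the first failed check keeps the `reason` message specific; a variant that collects every failure before returning would suit a fuller report.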

## 4. Model Training (XGBoost)
Train the gradient boosting model on the verified dataset. The model learns to predict travel duration from **Route**, **Mode**, and **Risk Severity**.

In [None]:
print("Starting Model Training...")
model_engine = ETAModel()
train_result = model_engine.train()

print("Training Complete.")
print(f"Result: {train_result}")

## 5. Live Prediction Demo
Let's test the model with a sample route to see the **ETA** and the **Context-Aware Explanation**.

In [None]:
# Test Case 1: Standard Route
print("--- Prediction 1: Route without major risks ---")
pred1 = model_engine.predict("CNDLC", "ARBUE", "OCEAN")
print(f"ETA Date: {pred1['eta_date']}")
print(f"Explanation: {pred1['explanation']}")

# Test Case 2: High-risk route, if the data contains one
# Pick a route whose Severity_Score exceeded 5 in the training data
print("\n--- Prediction 2: Route with high risk severity ---")
sample_high_risk = df_training[df_training['Severity_Score'] > 5].head(1)
if not sample_high_risk.empty:
    r = sample_high_risk.iloc[0]
    pred2 = model_engine.predict(r['PolCode'], r['PodCode'], r['ModeOfTransport'])
    print(f"Route: {r['PolCode']} -> {r['PodCode']}")
    print(f"ETA Date: {pred2['eta_date']}")
    print(f"Explanation: {pred2['explanation']}")
else:
    print("No high risk routes found in current sample for demo.")
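
The prediction cells above read two keys, `eta_date` and `explanation`. One plausible way to produce them from a predicted duration is sketched below. The function name `build_prediction`, the departure timestamp, the explanation wording, and the reuse of the `> 5` severity threshold from the cell above are assumptions; the internals of `model_engine.predict` are not shown in this notebook.

```python
from datetime import datetime, timedelta


def build_prediction(departure, predicted_hours, severity_score, high_risk_threshold=5):
    """Hypothetical helper: turn a predicted duration into an ETA and a short explanation."""
    eta = departure + timedelta(hours=predicted_hours)

    # Context-aware text: mention risk only when severity exceeds the threshold.
    if severity_score > high_risk_threshold:
        explanation = f"High risk severity ({severity_score}) on this route; expect delays."
    else:
        explanation = "No major risk factors on this route."

    return {"eta_date": eta.strftime("%Y-%m-%d"), "explanation": explanation}


# 840 predicted hours is 35 days after the assumed departure time.
pred = build_prediction(datetime(2024, 1, 1, 8, 0), 840.0, severity_score=7)
print(pred["eta_date"])
print(pred["explanation"])
```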