# ETA Prediction Model Evaluation

## Objective
In this notebook, we analyze our trained **ETA (Estimated Time of Arrival)** prediction model.
We want to understand how well the model predicts ride time.

### Key Concepts:
- **ETA**: How much time the ride will take (in minutes).
- **MAE (Mean Absolute Error)**: The average number of minutes by which our predictions are off.
- **RMSE (Root Mean Squared Error)**: Penalizes large errors more heavily, which helps reveal occasional big mistakes.
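
For reference, the two metrics can be written out as follows. The symbols are notation introduced here, not names from the code: $y_i$ is the actual ride time, $\hat{y}_i$ is the predicted ride time, and $n$ is the number of test rides.

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
$$

Because each error is squared before averaging, a single 10-minute miss adds 100 to the RMSE sum but only 10 to the MAE sum.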

---

In [None]:
# Step 2: Import Libraries
# First, import the necessary libraries

import pandas as pd  # Data handling
import numpy as np   # Math calculations
import matplotlib.pyplot as plt  # Graphs
import seaborn as sns  # Advanced graphs
from sklearn.metrics import mean_absolute_error, mean_squared_error  # Error metrics
import joblib  # Model loading
import sys
import os

# Add project root to path so we can import src modules
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import custom preprocessing functions
from src.features.temporal import extract_temporal_features
from src.features.encoders import encode_vehicle_type

# Set style for graphs
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

In [None]:
# Step 3: Load Dataset
# Let's load the data

data_path = '../data/raw/rides.csv'
df = pd.read_csv(data_path)

# Show first 5 rows
print("First 5 rows of our dataset:")
display(df.head())

# Use only part of the dataset for this analysis:
# take the last 20% of the rows as the test set
test_size = int(len(df) * 0.2)
test_df = df.tail(test_size).copy()

print(f"\nTotal Data: {len(df)}")
print(f"Test Data used for analysis: {len(test_df)}")

In [None]:
# Step 4: Load Trained ETA Model
# Load the model from the models folder

model_path = '../models/saved/eta_lgbm.pkl'
try:
    model = joblib.load(model_path)
    print("✅ Model loaded successfully!")
    print("Model type:", type(model))
except FileNotFoundError:
    print("❌ Model file not found. Please check path.")

In [None]:
# Step 5: Feature Preparation
# The data must be preprocessed exactly as it was for training

# 1. Extract Temporal Features (Hour, Day, etc.)
test_df = extract_temporal_features(test_df)

# 2. Encode Vehicle Type
test_df = encode_vehicle_type(test_df, method='label')

# Define features used in training (MUST BE SAME)
feature_cols = [
    'trip_distance',
    'hour',
    'day_of_week',
    'is_rush_hour',
    'is_weekend',
    'is_morning_rush',
    'is_evening_rush',
    'is_late_night',
    'vehicle_encoded'
]

target_col = 'trip_duration'

print(f"Target Column (y): {target_col}")

# Prepare X and y
X_test = test_df[feature_cols].copy()
y_test = test_df[target_col].copy()

print("Features (X) columns:")
print(X_test.columns.tolist())
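
Since the feature list has to match training exactly, it can help to confirm that every expected column exists before selecting them. The sketch below is only an illustration: `find_missing_columns` is a hypothetical helper, not part of the project's `src` package, and the column lists are toy values.

```python
def find_missing_columns(available_cols, required_cols):
    """Return the required columns that are absent, in their original order."""
    available = set(available_cols)
    return [col for col in required_cols if col not in available]


# Toy example: 'vehicle_encoded' has not been created yet.
available = ['trip_distance', 'hour', 'day_of_week', 'trip_duration']
required = ['trip_distance', 'hour', 'vehicle_encoded']

missing = find_missing_columns(available, required)
print(missing)  # ['vehicle_encoded']
```

In this notebook the call would look like `find_missing_columns(test_df.columns, feature_cols + [target_col])`, run just before building `X_test`.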

In [None]:
# Step 6: Prediction
# Use the model to make predictions

y_pred = model.predict(X_test)

# Store predictions in dataframe for comparison
results_df = test_df.copy()
results_df['Predicted_ETA'] = y_pred
# Error = actual - predicted (positive means the model underestimated the ride time)
results_df['Error'] = results_df[target_col] - results_df['Predicted_ETA']

display(results_df[[target_col, 'Predicted_ETA', 'Error']].head())

In [None]:
# Step 7: Evaluation Metrics
# Calculate MAE and RMSE

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("---------------------------------------")
print("📊 MODEL PERFORMANCE REPORT")
print("---------------------------------------")
print(f"Mean Absolute Error (MAE): {mae:.4f} minutes")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f} minutes")
print("---------------------------------------")

### Step 8: PICTORIAL VISUALIZATION
Let's look at some graphs to understand the results better.

In [None]:
# Graph 1: Actual vs Predicted Scatter Plot

plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test, y=y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2) # Perfect prediction line
plt.xlabel('Actual ETA (min)')
plt.ylabel('Predicted ETA (min)')
plt.title('Actual vs Predicted ETA')
plt.show()

In [None]:
# Graph 2: Error Distribution
# Let's see how the errors are distributed

plt.figure(figsize=(8, 6))
sns.histplot(results_df['Error'], bins=30, kde=True, color='purple')
plt.title('Error Distribution (Residuals)')
plt.xlabel('Error (Minutes)')
plt.show()
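
The histogram can be complemented with a few summary numbers. The sketch below uses made-up error values and the standard-library `statistics` module; `summarize_errors` is a hypothetical helper, not something defined in the project. Since `Error` is actual minus predicted, a clearly positive mean would suggest that the model tends to underestimate ride time.

```python
import statistics


def summarize_errors(errors):
    """Summarize signed errors (actual - predicted), in minutes."""
    return {
        'mean': statistics.mean(errors),      # systematic bias
        'median': statistics.median(errors),  # typical signed error
        'stdev': statistics.pstdev(errors),   # spread around the mean
    }


# Toy signed errors in minutes (not real model output)
toy_errors = [0.5, -0.25, 1.0, -0.5, 0.75]
summary = summarize_errors(toy_errors)
print(summary)
```

In this notebook the helper could be applied as `summarize_errors(results_df['Error'].tolist())`.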

In [None]:
# Graph 3: Line Plot for first 50 samples
# Take the first 50 rides for comparison

sample_data = results_df.head(50)

plt.figure(figsize=(12, 6))
plt.plot(sample_data.index, sample_data[target_col], label='Actual ETA', marker='o')
plt.plot(sample_data.index, sample_data['Predicted_ETA'], label='Predicted ETA', marker='x', linestyle='--')
plt.title('Actual vs Predicted (First 50 Rides)')
plt.ylabel('Time (Minutes)')
plt.legend()
plt.show()

## Step 9: Interpretation

**1. What does MAE mean?**
- MAE is the average absolute difference between the predicted and the actual ride time.
- Example: If MAE is **0.72 minutes**, our predictions are off by about 43 seconds on average.

**2. What does RMSE tell us?**
- RMSE is always at least as large as MAE.
- If RMSE is much higher than MAE, a few predictions are very far off (**outliers**).
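
The effect of outliers on the two metrics can be seen on toy numbers. The sketch below computes both metrics by hand on two made-up sets of errors that share the same MAE; the values are illustrative, not taken from the model.

```python
import math


def mae(errors):
    """Mean absolute error of a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two toy sets of errors (minutes), both with an MAE of 1.0
even_errors = [1.0, -1.0, 1.0, -1.0]    # every ride is off by 1 minute
outlier_errors = [0.0, 0.0, 0.0, 4.0]   # one ride is off by 4 minutes

for name, errors in [('even', even_errors), ('outlier', outlier_errors)]:
    print(f"{name}: MAE={mae(errors):.2f}, RMSE={rmse(errors):.2f}")
# even: MAE=1.00, RMSE=1.00
# outlier: MAE=1.00, RMSE=2.00
```

When all errors have the same size, RMSE equals MAE; the gap opens up only when the errors are uneven.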

**3. Is the model performing well?**
- If MAE is low relative to typical ride times (e.g., < 1 min), the model is performing well.
- Check the **scatter plot**: the closer the dots are to the red dashed line, the better the predictions.
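
Whether an MAE counts as low depends on how long typical rides are, so one possible sanity check is to compare it against a naive baseline. The sketch below uses made-up numbers; `baseline_mae` is a hypothetical helper that always predicts the median of some reference ride times.

```python
import statistics


def mae(actual, predicted):
    """Mean absolute error between two equally long lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def baseline_mae(actual, reference):
    """MAE of always predicting the median of the reference ride times."""
    constant = statistics.median(reference)
    return mae(actual, [constant] * len(actual))


# Toy ride times in minutes (not real data)
reference_durations = [8.0, 10.0, 12.0, 15.0, 30.0]  # median is 12.0
actual = [9.0, 12.0, 20.0, 11.0]
model_pred = [10.0, 11.0, 18.0, 12.0]

model_mae = mae(actual, model_pred)
naive_mae = baseline_mae(actual, reference_durations)
print(f"Model MAE: {model_mae:.2f}, baseline MAE: {naive_mae:.2f}")
# Model MAE: 1.25, baseline MAE: 3.00
```

A model MAE that is far below the baseline MAE is a stronger sign of a useful model than a small absolute number alone. The reference values would ideally come from the rides the model was trained on.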

## Step 10: Conclusion

We have analyzed the ETA model:
- Loaded the test data and the trained model.
- Calculated error metrics (MAE: ~0.72 min).
- Visualized the errors.

These results help us decide whether the model is ready to be deployed to the production app.