## Model Evaluation Checklist
- *Document Purpose*: Standardized rubric for evaluating each model consistently and against the project goals.
- *Goal*: Exact replication of the legacy system's outputs, measured as model accuracy on public_cases.
- *Checklist Objectives*:
  - Exact-match reimbursement outputs (model outputs compared with the expected outputs in public_cases.json).
  - Avoidance of overfitting and underfitting.
  - Inclusion of all constructed metrics and their behavior under each model.
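
The exact-match objective can be sketched as a small helper. This is only an illustration: the function names `to_cents` and `exact_match_rate` are hypothetical, and comparing at cent precision with half-up rounding is an assumption about how "exact match" should be judged, not something this checklist has fixed yet.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(value):
    """Round a dollar amount to 2 decimals (half-up rounding is an assumption)."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def exact_match_rate(predicted, expected):
    """Fraction of cases where predicted equals expected at cent precision."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(to_cents(p) == to_cents(e) for p, e in zip(predicted, expected))
    return matches / len(expected)


# Illustrative values only: the third prediction is off by one cent.
predicted = [364.51, 126.06, 128.90]
expected = [364.51, 126.06, 128.91]
rate = exact_match_rate(predicted, expected)
print(f"Exact match rate: {rate:.3f}")
```

Comparing `Decimal` values instead of raw floats avoids false mismatches caused by binary floating-point representation.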

## Failure Types
- *Low Exact Match Rate*: the model does not accurately reproduce the legacy system's outputs.
- *Overfitting*: the model memorizes the public dataset and does not generalize to unseen cases.
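
One possible way to turn these two failure types into flags is sketched below. The thresholds `min_exact_rate` and `max_gap` are placeholder assumptions, and the helper names are hypothetical; the idea is simply to compare the error on training data with the error on held-out data.

```python
def overfit_gap(train_mae, holdout_mae):
    """Relative increase in MAE from training data to held-out data."""
    if train_mae == 0:
        return float("inf") if holdout_mae > 0 else 0.0
    return (holdout_mae - train_mae) / train_mae


def flag_failures(exact_rate, train_mae, holdout_mae,
                  min_exact_rate=0.95, max_gap=0.25):
    """Return the failure types triggered by the given metrics.

    min_exact_rate and max_gap are assumed thresholds, not project-approved values.
    """
    flags = []
    if exact_rate < min_exact_rate:
        flags.append("Low Exact Match Rate")
    if overfit_gap(train_mae, holdout_mae) > max_gap:
        flags.append("Overfitting")
    return flags


# Illustrative numbers only.
flags = flag_failures(exact_rate=0.40, train_mae=10.0, holdout_mae=105.0)
print(flags)
```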


## Model Evaluation Framework

| Evaluation Area | Purpose |
|---|---|
| **Data Integrity** | Ensure inputs are valid and consistent |
| **Feature Behavior** | Ensure engineered features capture known relationships observed in the legacy system |
| **Core Metrics** | Ensure the model matches the legacy system's output behavior and statistical characteristics |
| **Slice Analysis** | Detect hidden threshold breakpoints where the legacy logic changes behavior |
| **Rounding & Quirks** | Capture and preserve edge-case artifacts, inconsistencies, and rounding patterns |
| **Generalization** | Avoid overfitting to the public dataset and ensure stability across unseen cases |
| **Explainability** | Ensure rules and behavior are interpretable, documented, and can be explained to stakeholders |
| **Final Success Gate** | Decide whether the model meets reproduction requirements and is ready to move to the next phase |
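
As a rough sketch of the Slice Analysis row, the mean absolute error can be grouped by a slicing column such as `trip_duration_days`; a slice with a much larger error than its neighbors may indicate a threshold where the legacy logic changes. The function name `mae_by_slice` and the `predicted` values are made up for illustration.

```python
from collections import defaultdict


def mae_by_slice(rows, slice_key, actual_key="reimbursement", predicted_key="predicted"):
    """Mean absolute error per distinct value of slice_key, sorted by slice value."""
    errors = defaultdict(list)
    for row in rows:
        errors[row[slice_key]].append(abs(row[actual_key] - row[predicted_key]))
    return {key: sum(vals) / len(vals) for key, vals in sorted(errors.items())}


# The "predicted" values below are invented for the example.
rows = [
    {"trip_duration_days": 1, "reimbursement": 126.06, "predicted": 120.06},
    {"trip_duration_days": 1, "reimbursement": 128.91, "predicted": 130.91},
    {"trip_duration_days": 3, "reimbursement": 364.51, "predicted": 334.51},
]
slice_mae = mae_by_slice(rows, "trip_duration_days")
for days, mae in slice_mae.items():
    print(f"trip_duration_days={days}: MAE={mae:.2f}")
```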

## Next Steps
- *Automate tests for the model evaluation checklist*: once we have model output, I will create an automated version of this checklist so that different models can be compared.
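
A first cut of that automation might look like the sketch below. Everything here is hypothetical: the `run_checklist` function, the check names, the thresholds, and the metric values are placeholders until real model output exists.

```python
def run_checklist(model_metrics, checks):
    """Return {model_name: {check_name: passed}} for every model and check."""
    return {
        model_name: {name: bool(check(metrics)) for name, check in checks.items()}
        for model_name, metrics in model_metrics.items()
    }


# Each check takes a metrics dict and returns True when the model passes.
checks = {
    "exact_match_rate >= 0.95": lambda m: m["exact_match_rate"] >= 0.95,
    "cv_mae <= 1.25 * train_mae": lambda m: m["cv_mae"] <= 1.25 * m["train_mae"],
}

# Placeholder metrics for two hypothetical models.
model_metrics = {
    "model_a": {"exact_match_rate": 0.97, "train_mae": 1.0, "cv_mae": 1.1},
    "model_b": {"exact_match_rate": 0.40, "train_mae": 1.0, "cv_mae": 5.0},
}

report = run_checklist(model_metrics, checks)
for name, results in report.items():
    status = "PASS" if all(results.values()) else "FAIL"
    print(name, status, results)
```

Keeping the checks in a plain dictionary makes it easy to add one entry per row of the evaluation framework as each metric becomes available.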






In [15]:
#====================================================
# Verify file paths to ensure correct path is used
#====================================================
import os

print("Working directory:", os.getcwd())
print("\nRoot files:", os.listdir())
try:
    print("\nData folder contents:", os.listdir("data"))
except FileNotFoundError:
    print("\n🔴 'data' folder not found from current directory")


Working directory: C:\Users\ferna\OneDrive - East Carolina University\Documents\GitHub\BlackBox\Notebooks

Root files: ['.gitkeep', '01_EDA_Reimbursement (3).ipynb', '02_Feature_Engineering_and_Baseline_Model.ipynb', 'Feature Correlation and Visualization.ipynb', 'Model Evaluation Checklist.ipynb', 'Performance Summary.ipynb', 'week1_data_cleaning.ipynb']

🔴 'data' folder not found from current directory


In [26]:
# ==========================================================
#  Model Comparison with 5-Fold Cross-Validation
# ==========================================================
# Author: Matthew Fernald
# Project: ACME Corp - Reimbursement BlackBox
# ==========================================================

# ----------------------------------------------------------
# Step 0: Import Libraries
# ----------------------------------------------------------
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import KFold, cross_val_score

# ----------------------------------------------------------
# Step 1: Load Enhanced Dataset with Engineered Features
# ----------------------------------------------------------

combined_df = pd.read_csv("../data/phase2_features_baseline_models.csv", low_memory=False)
print(" Clean dataset with engineered features loaded successfully!")
print("Shape:", combined_df.shape)
display(combined_df.head(10))

# Immediately after loading combined_df verify the values for trip_duration_days
value_counts = combined_df["trip_duration_days"].value_counts().sort_index()
print("trip_duration_days counts:")
print(value_counts)

assert combined_df["trip_duration_days"].max() > 3, \
    "Expected some trips longer than 3 days"



# ----------------------------------------------------------
# Step 2: Define Features & Target
# ----------------------------------------------------------
features = [
    "trip_duration_days",
    "miles_traveled",
    "total_receipts_amount",
    "cost_per_day",
    "cost_per_mile",
    "miles_per_day",
    "cost_ratio"
]

target = "reimbursement"

X = combined_df[features]
y = combined_df[target]

print("\n Feature & Target Shapes:")
print("X:", X.shape, " | y:", y.shape)

# ----------------------------------------------------------
# Step 3: Configure 5-Fold Cross-Validation
# ----------------------------------------------------------
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "Polynomial (deg=2)": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LinearRegression()
    ),
}

cv_results = []

# ----------------------------------------------------------
# Step 4: Run Evaluation Across All Models
# ----------------------------------------------------------
for model_name, model in models.items():

    cv_r2 = cross_val_score(model, X, y, cv=kfold, scoring="r2")
    cv_mae = -cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_absolute_error")
    cv_rmse = np.sqrt(-cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_squared_error"))

    cv_results.append({
        "Model": model_name,
        "CV R² (mean)": cv_r2.mean(),
        "CV R² (std)": cv_r2.std(),
        "CV MAE (mean)": cv_mae.mean(),
        "CV MAE (std)": cv_mae.std(),
        "CV RMSE (mean)": cv_rmse.mean(),
        "CV RMSE (std)": cv_rmse.std(),
    })

cv_summary = pd.DataFrame(cv_results)

# ----------------------------------------------------------
# Step 5: Display Results
# ----------------------------------------------------------
print("\n📊 5-Fold Cross-Validation Summary (All Models):")
display(
    cv_summary.style
        .background_gradient(cmap="YlGnBu")
        .format({
            "CV R² (mean)": "{:.3f}",
            "CV R² (std)": "{:.3f}",
            "CV MAE (mean)": "{:.3f}",
            "CV MAE (std)": "{:.3f}",
            "CV RMSE (mean)": "{:.3f}",
            "CV RMSE (std)": "{:.3f}",
        })
)


 Clean dataset with engineered features loaded successfully!
Shape: (1000, 9)


|   | trip_duration_days | miles_traveled | total_receipts_amount | reimbursement | dataset | cost_per_day | cost_per_mile | miles_per_day | cost_ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 93.0 | 1.42 | 364.51 | public | 0.473333 | 0.015269 | 31.0 | 31.0 |
| 1 | 1 | 55.0 | 3.6 | 126.06 | public | 3.6 | 0.065455 | 55.0 | 55.0 |
| 2 | 1 | 47.0 | 17.97 | 128.91 | public | 17.97 | 0.38234 | 47.0 | 47.0 |
| 3 | 2 | 13.0 | 4.67 | 203.52 | public | 2.335 | 0.359231 | 6.5 | 6.5 |
| 4 | 3 | 88.0 | 5.78 | 380.37 | public | 1.926667 | 0.065682 | 29.333333 | 29.333333 |
| 5 | 1 | 76.0 | 13.74 | 158.35 | public | 13.74 | 0.180789 | 76.0 | 76.0 |
| 6 | 3 | 41.0 | 4.52 | 320.12 | public | 1.506667 | 0.110244 | 13.666667 | 13.666667 |
| 7 | 1 | 140.0 | 22.71 | 199.68 | public | 22.71 | 0.162214 | 140.0 | 140.0 |
| 8 | 3 | 121.0 | 21.17 | 464.07 | public | 7.056667 | 0.174959 | 40.333333 | 40.333333 |
| 9 | 3 | 117.0 | 21.99 | 359.1 | public | 7.33 | 0.187949 | 39.0 | 39.0 |


trip_duration_days counts:
trip_duration_days
1      92
2      59
3      83
4      67
5     112
6      62
7      67
8      82
9      71
10     63
11     68
12     74
13     46
14     54
Name: count, dtype: int64

 Feature & Target Shapes:
X: (1000, 7)  | y: (1000,)

📊 5-Fold Cross-Validation Summary (All Models):

|   | Model | CV R² (mean) | CV R² (std) | CV MAE (mean) | CV MAE (std) | CV RMSE (mean) | CV RMSE (std) |
|---|---|---|---|---|---|---|---|
| 0 | Linear | 0.786 | 0.01 | 174.639 | 6.337 | 216.222 | 7.635 |
| 1 | Ridge | 0.786 | 0.01 | 174.639 | 6.336 | 216.222 | 7.635 |
| 2 | Lasso | 0.786 | 0.01 | 174.639 | 6.337 | 216.222 | 7.635 |
| 3 | Polynomial (deg=2) | 0.891 | 0.016 | 105.56 | 10.811 | 154.266 | 16.29 |


Use this cell to verify the correct path for importing phase2_features_baseline_models.csv.

In [21]:
import os

for root, dirs, files in os.walk("../", topdown=True):
    for file in files:
        if "phase2_features_baseline_models" in file.lower():
            print(os.path.join(root, file))


../data\phase2_features_baseline_models.csv

