# Modeling & Evaluation

This notebook trains and evaluates machine learning models for predicting **patient appointment no-shows**.

Focus:
- Strong baseline models
- Healthcare-relevant metrics
- Clear interpretation of results

In [ ]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    RocCurveDisplay
)

import matplotlib.pyplot as plt
import seaborn as sns

# set_theme is the current name for the older sns.set alias
sns.set_theme(style="whitegrid")

## Load Engineered Dataset

We recreate the engineered features exactly as defined in `02_feature_engineering.ipynb`.

In [ ]:
# Load raw data
df = pd.read_csv("../data/raw/KaggleV2-May-2016.csv")

# Target
df['no_show'] = df['No-show'].map({'No': 0, 'Yes': 1})

# Dates
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])

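# Note: ScheduledDay includes a time of day while AppointmentDay is at midnight,
# so same-day bookings come out as -1 here
df['days_between'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days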
df['appointment_weekday'] = df['AppointmentDay'].dt.weekday
df['is_weekend'] = df['appointment_weekday'].isin([5, 6]).astype(int)

# Demographics
df['gender'] = df['Gender'].map({'F': 0, 'M': 1})
df.loc[df['Age'] < 0, 'Age'] = np.nan

# Neighborhood frequency encoding
neighborhood_freq = df['Neighbourhood'].value_counts(normalize=True)
df['neighborhood_freq'] = df['Neighbourhood'].map(neighborhood_freq)

feature_cols = [
    'Age', 'gender', 'days_between', 'appointment_weekday', 'is_weekend',
    'SMS_received', 'Hypertension', 'Diabetes', 'Alcoholism', 'Handcap',
    'neighborhood_freq'
]

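# Drop rows with missing feature values (e.g. the negative ages set to NaN above);
# LogisticRegression cannot be fit on NaN inputs
model_df = df.dropna(subset=feature_cols + ['no_show'])
X = model_df[feature_cols]
y = model_df['no_show']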

X.head()
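One detail of `days_between` is worth checking. In this dataset `ScheduledDay` carries a time of day while `AppointmentDay` is stamped at midnight, so a same-day booking produces a negative timedelta and `.dt.days` floors it to -1. The sketch below reproduces this on two hand-written timestamps (illustrative values, not rows read from the file) and shows one possible fix: normalizing `ScheduledDay` to midnight before subtracting. Whether to adopt it depends on how `02_feature_engineering.ipynb` defines the feature.

```python
import pandas as pd

# Illustrative timestamps in the same ISO format as the raw CSV (made-up values)
demo = pd.DataFrame({
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-27T08:36:51Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z"],
})
demo["ScheduledDay"] = pd.to_datetime(demo["ScheduledDay"])
demo["AppointmentDay"] = pd.to_datetime(demo["AppointmentDay"])

# Naive difference: the same-day booking is floored to -1
naive_days = (demo["AppointmentDay"] - demo["ScheduledDay"]).dt.days

# Calendar-day difference: drop the time of day before subtracting
calendar_days = (
    demo["AppointmentDay"] - demo["ScheduledDay"].dt.normalize()
).dt.days

print(naive_days.tolist())     # [-1, 1]
print(calendar_days.tolist())  # [0, 2]
```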

## Train / Validation Split

- Stratified split to preserve no-show ratio
- 80% train / 20% validation

In [ ]:
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train target distribution:\n", y_train.value_counts(normalize=True))
print("Validation target distribution:\n", y_val.value_counts(normalize=True))
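Note that `neighborhood_freq` is computed on the full dataset before this split, so validation rows contribute to their own encoding. No target information is involved, so the effect is probably small, but a stricter variant learns the frequencies from the training rows only. A minimal sketch on a toy frame follows; the column values and the fallback of 0.0 for unseen neighbourhoods are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the train and validation frames (hypothetical values)
train = pd.DataFrame({"Neighbourhood": ["A", "A", "B", "A"]})
val = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})

# Learn relative frequencies from the training rows only
freq = train["Neighbourhood"].value_counts(normalize=True)

# Apply the same mapping to both splits; unseen categories fall back to 0.0
train["neighborhood_freq"] = train["Neighbourhood"].map(freq)
val["neighborhood_freq"] = val["Neighbourhood"].map(freq).fillna(0.0)

print(val["neighborhood_freq"].tolist())  # [0.75, 0.25, 0.0]
```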

## Baseline Model: Logistic Regression

Why Logistic Regression?
- Interpretable
- Strong baseline
- Suits structured (tabular) data such as appointment records

In [ ]:
# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Logistic Regression; class_weight='balanced' weights each class
# inversely to its frequency to offset the no-show imbalance
log_reg = LogisticRegression(max_iter=1000, class_weight='balanced')
log_reg.fit(X_train_scaled, y_train)

y_pred_lr = log_reg.predict(X_val_scaled)
y_prob_lr = log_reg.predict_proba(X_val_scaled)[:, 1]

In [ ]:
print("Logistic Regression Classification Report")
print(classification_report(y_val, y_pred_lr))

print("ROC-AUC:", roc_auc_score(y_val, y_prob_lr))

In [ ]:
# Confusion Matrix
cm = confusion_matrix(y_val, y_pred_lr)

plt.figure(figsize=(5,4))
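# Rows are actual classes, columns are predicted classes (0 = show, 1 = no-show)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Show', 'No-show'], yticklabels=['Show', 'No-show'])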
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Logistic Regression Confusion Matrix")
plt.show()
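The interpretability claim for logistic regression can be made concrete by reading the fitted coefficients. Because the features are standardized, `exp(coefficient)` is the multiplicative change in the odds of the positive class for a one-standard-deviation increase in that feature. The sketch below uses synthetic data from `make_classification` and hypothetical feature names (`f0` to `f3`), not the no-show dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (not the no-show dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=42
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names

X_demo_scaled = StandardScaler().fit_transform(X_demo)
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_demo_scaled, y_demo)

# One coefficient per feature; exp(coef) is the change in odds
# per one-standard-deviation increase in that feature
coef_table = pd.DataFrame(
    {
        "coefficient": model.coef_[0],
        "odds_ratio": np.exp(model.coef_[0]),
    },
    index=feature_names,
).sort_values("coefficient", key=np.abs, ascending=False)

print(coef_table)
```

In this notebook the same table could be built from `log_reg.coef_[0]` with `feature_cols` as the index.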

## Random Forest Model

Why Random Forest?
- Captures non-linear patterns
- Handles feature interactions
- Strong performance on tabular data

In [ ]:
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    random_state=42,
    class_weight='balanced'
)

rf.fit(X_train, y_train)

y_pred_rf = rf.predict(X_val)
y_prob_rf = rf.predict_proba(X_val)[:, 1]

In [ ]:
print("Random Forest Classification Report")
print(classification_report(y_val, y_pred_rf))

print("ROC-AUC:", roc_auc_score(y_val, y_prob_rf))

In [ ]:
# Confusion Matrix
cm = confusion_matrix(y_val, y_pred_rf)

plt.figure(figsize=(5,4))
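# Rows are actual classes, columns are predicted classes (0 = show, 1 = no-show)
sns.heatmap(cm, annot=True, fmt='d', cmap='Greens',
            xticklabels=['Show', 'No-show'], yticklabels=['Show', 'No-show'])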
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Random Forest Confusion Matrix")
plt.show()

## ROC Curve Comparison

In [ ]:
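# Draw both curves on the same axes so they can be compared directly
fig, ax = plt.subplots(figsize=(6, 5))
RocCurveDisplay.from_predictions(y_val, y_prob_lr, name="Logistic Regression", ax=ax)
RocCurveDisplay.from_predictions(y_val, y_prob_rf, name="Random Forest", ax=ax)
plt.show()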

## Healthcare Interpretation

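- **Recall for the no-show class (class 1)** is the critical metric
- A false negative (a no-show predicted as attending) is a missed opportunity to intervene
- A false positive (an attending patient flagged as a no-show) only costs an extra reminder

**Therefore, it is acceptable to trade some precision for higher recall on the no-show class.**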
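One way to act on this trade-off is to lower the decision threshold below the default of 0.5, which can only keep or raise recall on the positive class, usually at the cost of precision. The sketch below illustrates the effect on synthetic, imbalanced data; the thresholds 0.5, 0.3 and 0.2 are arbitrary illustrative values, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 20% positives), not the no-show dataset
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=6, weights=[0.8, 0.2], random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_va)[:, 1]

# Precision and recall for the positive class at several thresholds
results = {}
for threshold in (0.5, 0.3, 0.2):
    preds = (probs >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_va, preds, zero_division=0),
        recall_score(y_va, preds),
    )
    precision, recall = results[threshold]
    print(f"threshold={threshold:.1f}  precision={precision:.3f}  recall={recall:.3f}")
```

In this notebook the same loop could be run over `y_prob_lr` or `y_prob_rf` against `y_val`.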

## Next Steps

1. Feature importance & SHAP explainability
2. Threshold tuning for recall optimization
3. Optional: XGBoost or Neural Network
4. FastAPI endpoint for real-time prediction
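As a possible starting point for step 1, the sketch below compares a random forest's built-in impurity-based importances with permutation importance measured on held-out data. It uses synthetic data and hypothetical feature names (`f0` to `f4`). SHAP is not shown here because it needs an additional package.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data, not the no-show dataset
X_demo, y_demo = make_classification(
    n_samples=600, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=42
)
feature_names = [f"f{i}" for i in range(5)]  # hypothetical names

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

forest = RandomForestClassifier(
    n_estimators=100, max_depth=10, random_state=42, class_weight="balanced"
)
forest.fit(X_tr, y_tr)

# Impurity-based importances come from the training process and sum to 1
impurity = pd.Series(forest.feature_importances_, index=feature_names)

# Permutation importance: drop in validation ROC-AUC when a feature is shuffled
perm = permutation_importance(
    forest, X_va, y_va, scoring="roc_auc", n_repeats=5, random_state=42
)
permutation = pd.Series(perm.importances_mean, index=feature_names)

importance_table = pd.DataFrame({"impurity": impurity, "permutation": permutation})
print(importance_table.sort_values("permutation", ascending=False))
```

Permutation importance is computed on the validation split, so it reflects what the model relies on for held-out predictions rather than how often a feature was used for splitting.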