# Emergency Department Model Evaluation

**Author:** Suk Jin Mun  
**Course:** DS 5110, Fall 2025  
**Date:** November 10, 2025

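This notebook evaluates the trained models for the Emergency Department analysis: three classifiers that predict ESI (Emergency Severity Index) triage level, and two regression models that predict wait time and hourly patient volume.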

In [None]:
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (
    confusion_matrix, classification_report, roc_curve, auc,
    r2_score, mean_squared_error, mean_absolute_error
)

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

print('Imports successful!')

## 1. Load Trained Models

In [None]:
# Load classification models
with open('../trained_models/esi_logistic.pkl', 'rb') as f:
    clf_logistic = pickle.load(f)

with open('../trained_models/esi_lda.pkl', 'rb') as f:
    clf_lda = pickle.load(f)

with open('../trained_models/esi_naive_bayes.pkl', 'rb') as f:
    clf_nb = pickle.load(f)

# Load regression models
with open('../trained_models/wait_time_predictor.pkl', 'rb') as f:
    wait_time_model = pickle.load(f)

with open('../trained_models/volume_predictor.pkl', 'rb') as f:
    volume_model = pickle.load(f)

print('All models loaded successfully!')
print(f'\nWait time model features: {len(wait_time_model["feature_names"])}')

## 2. Model Performance Summary

### Classification Models (ESI Prediction)

**Results:**
- **Logistic Regression:** 54.84% accuracy
- **Linear Discriminant Analysis:** 54.84% accuracy  
- **Naive Bayes:** 46.98% accuracy

**Key Findings:**
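- Logistic Regression and LDA reach 54.84% accuracy on 5-class ESI prediction, which is about the share of ESI level 3 in the data (55% of samples); in accuracy terms they add little over always predicting level 3, and Naive Bayes (46.98%) falls below that baseline
- Predictions are strongly biased toward ESI level 3
- Class imbalance hurts prediction of the rare ESI levels 1 and 5
- ESI level 1 (most critical): 0% precision and recall, so no level 1 patient is correctly identified; this requires improvement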
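
One way to quantify the bias toward ESI level 3 is to compare accuracy with the majority-class rate and to look at recall per ESI level. The sketch below uses made-up labels (not the project's data) and a hypothetical predictor that always outputs level 3; `majority_baseline_accuracy` and `per_class_recall` are illustrative helper names, not part of the project code.

```python
import numpy as np

def majority_baseline_accuracy(y_true):
    """Accuracy of a model that always predicts the most common class."""
    _, counts = np.unique(y_true, return_counts=True)
    return counts.max() / counts.sum()

def per_class_recall(y_true, y_pred, labels):
    """Fraction of each true class that was predicted correctly."""
    recalls = {}
    for label in labels:
        mask = y_true == label
        # Recall is undefined for a class with no true samples.
        recalls[label] = float(np.mean(y_pred[mask] == label)) if mask.any() else float("nan")
    return recalls

# Toy labels (illustrative only): 100 patients, 55 of them ESI level 3.
y_true = np.repeat([1, 2, 3, 4, 5], [3, 17, 55, 20, 5])
# Hypothetical degenerate classifier that always predicts level 3.
y_pred = np.full_like(y_true, 3)

accuracy = float(np.mean(y_true == y_pred))
baseline = float(majority_baseline_accuracy(y_true))
recalls = per_class_recall(y_true, y_pred, labels=[1, 2, 3, 4, 5])

print(f"accuracy={accuracy:.2f}, majority baseline={baseline:.2f}")
print(recalls)
```

In this toy case accuracy equals the baseline and recall is zero for every level except 3, which is the pattern to watch for when a model's accuracy sits close to the majority-class share.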

### Regression Models

**Wait Time Prediction:**
- **R² Score:** 0.8146 (explains 81.46% of variance)
- **RMSE:** 14.05 minutes
- **MAE:** 11.19 minutes
- Mean actual wait time: 78.0 minutes
- Relative error: ~14% as MAE / mean wait time, or ~18% as RMSE / mean wait time (good performance)

**Patient Volume Prediction (Poisson GLM):**
- **RMSE:** 0.84 patients/hour
- **MAE:** 0.66 patients/hour
- Mean volume: 1.58 patients/hour
- Relative error: ~53% as RMSE / mean volume, or ~42% as MAE / mean volume (needs improvement)
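
The relative errors can be recomputed directly from the metrics reported above. This is a small arithmetic sketch; `relative_error` is an illustrative helper name, and the numbers are the ones listed in this section.

```python
def relative_error(error, mean_actual):
    """Error expressed as a fraction of the mean observed value."""
    return error / mean_actual

# Metrics as reported in this notebook.
reported = {
    "wait_time": {"rmse": 14.05, "mae": 11.19, "mean": 78.0},
    "volume": {"rmse": 0.84, "mae": 0.66, "mean": 1.58},
}

relative = {
    name: {
        "rmse_pct": 100 * relative_error(m["rmse"], m["mean"]),
        "mae_pct": 100 * relative_error(m["mae"], m["mean"]),
    }
    for name, m in reported.items()
}

for name, r in relative.items():
    print(f"{name}: RMSE/mean = {r['rmse_pct']:.1f}%, MAE/mean = {r['mae_pct']:.1f}%")
```

The ~14% wait time figure corresponds to MAE / mean, while the ~53% volume figure corresponds to RMSE / mean, so the two models are best compared on the same ratio.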

## 3. Feature Importance Analysis

### Wait Time Regression - Top Predictors

From the statsmodels OLS summary:
- **ESI Level (x1):** Coefficient = 29.66, p < 0.001 (highly significant)
  - Each one-level increase in ESI (less urgent) is associated with about 29.7 more minutes of waiting, holding the other features fixed
- **Feature x3:** Coefficient = 0.41, p = 0.047 (significant at the 0.05 level, but only marginally)
- **Other features:** Not statistically significant (p > 0.05)

**Model Statistics:**
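- Adjusted R² = 0.826 (strong fit). Adjusted R² cannot exceed R² on the same data, so this value and the R² of 0.8146 reported in Section 2 must come from different samples, most likely the training fit here and a held-out set there
- F-statistic: 1738 (the model as a whole is highly significant)
- Durbin-Watson: 2.033 (close to 2, so no evidence of first-order autocorrelation in the residuals)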
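
For reference, the two statistics above follow simple formulas. The sketch below is a minimal NumPy version, not the statsmodels implementation; the helper names and the example inputs are illustrative, and the sample size and predictor count are not taken from this project.

```python
import numpy as np

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2))

# Illustrative inputs only; n_obs and n_predictors are not from this project.
print(adjusted_r2(0.5, n_obs=11, n_predictors=5))   # 0.0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))        # 3.0 (negative autocorrelation)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))          # 0.0 (positive autocorrelation)
```

The Durbin-Watson statistic ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation, which is why 2.033 is read as a clean result.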

## 4. Recommendations for Improvement

### Classification Models:
1. **Address class imbalance:**
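   - Use SMOTE (Synthetic Minority Over-sampling Technique) for ESI levels 1, 2, 5, applied to the training split only so that synthetic samples do not leak into evaluation
   - Apply class weights in model training
   - Use stratified k-fold cross-validation so every fold contains the rare ESI levels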

2. **Feature engineering:**
   - Add interaction terms (e.g., age × vital signs)
   - Include chief complaint embeddings or clustering
   - Add historical patient data (prior ED visits)

3. **Alternative models:**
   - Random Forest (can be combined with class weights to counter imbalance)
   - Gradient Boosting (XGBoost, LightGBM)
   - Ensemble methods combining all models
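
The class-weight and stratified cross-validation suggestions can be sketched with scikit-learn as follows. The data is synthetic (generated by `make_classification` as a stand-in for the ESI features), the class mix is an assumption chosen to resemble an imbalanced 5-level target, and the scores it produces say nothing about the real models.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 5-class data with an imbalanced class mix (stand-in for ESI levels).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, n_classes=5,
    weights=[0.03, 0.17, 0.55, 0.20, 0.05], random_state=0,
)

# Stratified folds keep the class proportions roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency.
weighted_clf = LogisticRegression(class_weight="balanced", max_iter=1000)

# Balanced accuracy is the mean of per-class recall, so rare classes count equally.
scores = cross_val_score(weighted_clf, X, y, cv=cv, scoring="balanced_accuracy")
print(f"mean balanced accuracy: {scores.mean():.3f}")
```

Scoring with balanced accuracy rather than plain accuracy is a deliberate choice here: plain accuracy can look acceptable even when the rare levels are never predicted.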

### Regression Models:
1. **Wait time model:** Already performs well; possible minor improvements:
   - Add ED occupancy features (current patient count)
   - Include staff availability metrics
   - Consider time-series lag features

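2. **Volume model:** Needs improvement; options to try:
   - Add holiday indicators
   - Include weather data if available
   - Try negative binomial regression if the counts are overdispersed (variance larger than the mean, which the Poisson GLM cannot represent)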
   - Add seasonal decomposition features
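
Before switching the volume model to a negative binomial, it helps to check whether the counts are overdispersed at all. The sketch below is a rough check on synthetic hourly counts: it compares the sample variance to the sample mean, which should be close to 1 for Poisson data. The `dispersion_ratio` helper and the shape parameter are hypothetical, and the check ignores covariates, so it is only a first look and not a substitute for a dispersion statistic computed from the fitted GLM.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    return float(counts.var(ddof=1) / counts.mean())

rng = np.random.default_rng(0)
mean_volume = 1.58  # mean patients/hour reported above

# Synthetic hourly counts: one Poisson series, one overdispersed series.
poisson_counts = rng.poisson(lam=mean_volume, size=5000)
shape = 2.0  # hypothetical negative binomial shape parameter
overdispersed_counts = rng.negative_binomial(
    n=shape, p=shape / (shape + mean_volume), size=5000
)

poisson_ratio = dispersion_ratio(poisson_counts)
overdispersed_ratio = dispersion_ratio(overdispersed_counts)
print(f"Poisson: {poisson_ratio:.2f}, negative binomial: {overdispersed_ratio:.2f}")
```

A ratio well above 1 on the real hourly counts would support trying the negative binomial model; a ratio near 1 would suggest the Poisson assumption is not the main problem.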

## 5. Clinical Implications

### Wait Time Prediction:
- **Accuracy:** An RMSE of about 14 minutes (MAE about 11 minutes) against a mean wait of 78 minutes is acceptable for ED planning
- **Use case:** Can inform patients of expected wait times at triage
- **Strong ESI effect:** Consistent with clinical practice (higher acuity, i.e. a lower ESI level, means faster treatment)

### ESI Classification:
- **Current limitation:** Cannot reliably detect ESI level 1 (most critical patients)
- **Safety concern:** False negatives for level 1 patients are unacceptable
- **Recommendation:** Use model only as decision support, not replacement for triage nurses

### Volume Forecasting:
- **Current accuracy:** Moderate, suitable for rough staffing estimates
- **Improvement needed:** Better prediction crucial for shift planning
- **Next steps:** Incorporate external factors (weather, local events, flu season)

## Next Steps

1. Create confusion matrix visualizations
2. Generate one-vs-rest ROC curves for each ESI level (ROC analysis is binary, so the 5-class problem has to be binarized first)
3. Plot residual diagnostics for regression models
4. Implement SMOTE and retrain classification models
5. Add model prediction API endpoints to Flask backend
6. Create interactive dashboard for model outputs
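
As a starting point for step 1, the confusion matrix can be computed with `sklearn.metrics.confusion_matrix`, which is already imported at the top of this notebook. The labels below are made up for illustration (a hypothetical model that sends every level 1 patient to level 2 and every level 5 patient to level 3); they are not the project's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = [1, 2, 3, 4, 5]

# Toy ground truth (illustrative only): 100 patients across the five ESI levels.
y_true = np.repeat(labels, [3, 17, 55, 20, 5])

# Hypothetical predictions: rare levels are absorbed by their neighbours.
y_pred = y_true.copy()
y_pred[y_true == 1] = 2
y_pred[y_true == 5] = 3

# Raw counts: rows are true levels, columns are predicted levels.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Row-normalized version: each row sums to 1, so the diagonal is per-class recall.
cm_norm = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")

print(cm)
print(cm_norm.round(2))
```

Passing `cm_norm` to `sns.heatmap(cm_norm, annot=True)` would give the visualization; the row-normalized form is usually easier to read under class imbalance because the level 3 row no longer dominates the colour scale.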