# Statistical Models vs Machine Learning Models Comparison
## Comprehensive MAE-based Performance Analysis

This notebook compares the performance of statistical forecasting models (StatsForecast) against machine learning models (MLForecast) using Mean Absolute Error (MAE) as the primary metric.

### Gray Zone Threshold:
- **5% threshold**: If the ML and Stats MAEs differ by less than 5%, measured relative to the Stats MAE, the difference is treated as **not significant**. This is a practical tolerance, not a statistical significance test.
- This creates three categories: ML wins, Stats wins, and No Significant Difference

### Analysis Components:
1. **Data Loading and Merging**: Combine ML and Stats results by unique_id
2. **Best Model Frequency Analysis**: Which models are selected most often
3. **Comparison Levels**: Individual, State, Metric, and Overall comparisons
4. **Visualizations**: Comprehensive dashboards for each analysis level
5. **Export Results**: Excel files with detailed analysis

## Setup and Configuration

In [126]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
from matplotlib.gridspec import GridSpec
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


In [127]:
# Configuration and Style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'font.family': 'sans-serif',
    'font.sans-serif': ['Arial', 'DejaVu Sans'],
    'font.size': 11,
    'axes.titlesize': 13,
    'axes.titleweight': 'bold',
    'axes.labelsize': 11,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 16,
    'figure.titleweight': 'bold',
    'axes.spines.top': False,
    'axes.spines.right': False,
})

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

# Gray zone threshold
GRAY_ZONE_THRESHOLD = 5.0  # If MAE difference is less than 5%, it's not significant

# Color schemes
ML_COLOR = '#1f77b4'      # Blue for ML
STATS_COLOR = '#ff7f0e'   # Orange for Stats
GRAY_COLOR = '#CCCCCC'    # Gray for no significant difference

# States and metrics
STATES = ['MI', 'IN', 'IL', 'OH']
METRICS = ['UR', 'NoP']  # Units Reimbursed, Number of Prescriptions

print("✓ Configuration complete")
print(f"  Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")
print(f"  States: {', '.join(STATES)}")
print("  Metrics: Units Reimbursed (UR), Number of Prescriptions (NoP)")

✓ Configuration complete
  Gray zone threshold: 5.0%
  States: MI, IN, IL, OH
  Metrics: Units Reimbursed (UR), Number of Prescriptions (NoP)


## 1. Load and Merge Data

In [128]:
# Load ML and Stats models results
user="Lilian"

path_ml=rf"C:\Users\{user}\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\ML_general.xlsx"
path_stats=rf"C:\Users\{user}\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Stats_General.xlsx"

ml_file = pd.ExcelFile(path_ml)
stats_file = pd.ExcelFile(path_stats)

print(f"✓ ML file sheets: {ml_file.sheet_names}")
print(f"✓ Stats file sheets: {stats_file.sheet_names}")

✓ ML file sheets: ['MI_ml_UR', 'MI_ml_NoP', 'IN_ml_UR', 'IN_ml_NoP', 'IL_ml_UR', 'IL_ml_NoP', 'OH_ml_UR', 'OH_ml_NoP']
✓ Stats file sheets: ['IN_stats_UR', 'IN_stats_NoP', 'IL_stats_UR', 'IL_stats_NoP', 'MI_stats_NoP', 'MI_stats_UR', 'OH_stats_UR', 'OH_stats_NoP']


In [129]:
# Load all sheets into dictionaries
ml_data = {}
stats_data = {}

for state in STATES:
    for metric in METRICS:
        ml_sheet = f"{state}_ml_{metric}"
        stats_sheet = f"{state}_stats_{metric}"
        
        ml_data[ml_sheet] = pd.read_excel(ml_file, sheet_name=ml_sheet)
        stats_data[stats_sheet] = pd.read_excel(stats_file, sheet_name=stats_sheet)

print(f"✓ Loaded {len(ml_data)} ML sheets")
print(f"✓ Loaded {len(stats_data)} Stats sheets")

✓ Loaded 8 ML sheets
✓ Loaded 8 Stats sheets
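
The loading loop assumes that every `{state}_{approach}_{metric}` sheet exists in both workbooks; a missing sheet would only surface as an error inside `pd.read_excel`. Below is a minimal sketch of an up-front check, assuming the naming pattern visible in the sheet lists above. `missing_sheets` is a hypothetical helper, not part of the notebook.

```python
def missing_sheets(available, states, metrics, approach):
    """Return expected sheet names (e.g. 'MI_ml_UR') that are absent from `available`."""
    expected = [f"{state}_{approach}_{metric}" for state in states for metric in metrics]
    present = set(available)
    return [name for name in expected if name not in present]


# Example with a deliberately incomplete list of sheet names
sheets = ['MI_ml_UR', 'MI_ml_NoP', 'IN_ml_UR']
print(missing_sheets(sheets, ['MI', 'IN'], ['UR', 'NoP'], 'ml'))
```

In the notebook this could be called with `ml_file.sheet_names` and `stats_file.sheet_names` before the loop runs, so that a naming mismatch is reported once rather than failing partway through.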


In [130]:
# Function to determine winner with gray zone threshold
def determine_winner(row, threshold=GRAY_ZONE_THRESHOLD):
    """
    Determine the winner based on MAE difference with gray zone threshold.
    If the percentage difference is less than threshold, it's not significant.
    """
    pct_diff = abs(row['pct_improvement'])
    if pct_diff < threshold:
        return 'No Significant Difference'
    elif row['ml_avg_mae'] < row['stats_avg_mae']:
        return 'ML'
    else:
        return 'Stats'

print("✓ Helper functions defined")

✓ Helper functions defined
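
The same gray-zone rule can also be written without a row-wise `apply`. The sketch below is one possible vectorized form using `np.select`; `classify_winners` is a hypothetical name, and it recomputes the percentage from the two MAE columns rather than reading `pct_improvement`. The three sample rows are copied from this notebook's output tables.

```python
import numpy as np
import pandas as pd


def classify_winners(df, threshold=5.0):
    """Vectorized gray-zone rule; returns a Series of winner labels."""
    pct = (df['stats_avg_mae'] - df['ml_avg_mae']) / df['stats_avg_mae'] * 100
    # Conditions are evaluated in order: gray zone first, then ML, otherwise Stats
    conditions = [pct.abs() < threshold, df['ml_avg_mae'] < df['stats_avg_mae']]
    choices = ['No Significant Difference', 'ML']
    labels = np.select(conditions, choices, default='Stats')
    return pd.Series(labels, index=df.index, name='winner')


sample = pd.DataFrame({
    'unique_id': ['MI_A01', 'MI_A02', 'MI_B03'],
    'ml_avg_mae': [1876084.10, 1301668.45, 504068.96],
    'stats_avg_mae': [1479399.07, 1945343.32, 513680.14],
})
print(classify_winners(sample).tolist())
```

As in `determine_winner`, a row whose percentage cannot be computed (for example a missing MAE) falls through both conditions and is labeled `Stats`, which may or may not be the desired behavior.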


In [131]:
# Merge ML and Stats data for each state-metric combination
print("Merging data by unique_id...")
merged_data = {}

for state in STATES:
    for metric in METRICS:
        sheet_name = f"{state}_ml_{metric}"
        stats_sheet = f"{state}_stats_{metric}"
        
        # Get ML and Stats dataframes
        ml_df = ml_data[sheet_name][['unique_id', 'best_model', 'avg_mae']].copy()
        stats_df = stats_data[stats_sheet][['unique_id', 'best_model', 'avg_mae']].copy()
        
        # Rename columns to distinguish between ML and Stats
        ml_df.columns = ['unique_id', 'ml_best_model', 'ml_avg_mae']
        stats_df.columns = ['unique_id', 'stats_best_model', 'stats_avg_mae']
        
        # Merge on unique_id
        merged = pd.merge(ml_df, stats_df, on='unique_id', how='inner')
        
        # Add state and metric columns
        merged['state'] = state
        merged['metric'] = metric
        
        # Calculate MAE difference (positive = Stats is worse, negative = ML is worse)
        merged['mae_diff'] = merged['stats_avg_mae'] - merged['ml_avg_mae']
        
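        # Calculate percentage improvement of ML over Stats, relative to the Stats MAE
        # (positive = ML is better; assumes stats_avg_mae is never zero)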
        merged['pct_improvement'] = ((merged['stats_avg_mae'] - merged['ml_avg_mae']) / 
                                     merged['stats_avg_mae'] * 100)
        
        # Determine winner with gray zone threshold
        merged['winner'] = merged.apply(determine_winner, axis=1)
        
        merged_data[sheet_name] = merged

# Combine all merged data
all_comparisons = pd.concat(merged_data.values(), ignore_index=True)

print(f"✓ Merged {len(merged_data)} datasets")
print(f"✓ Total comparisons: {len(all_comparisons)}")
print(f"✓ Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")
print(f"\nSample merged data:")
display(all_comparisons.head(10))

Merging data by unique_id...
✓ Merged 8 datasets
✓ Total comparisons: 634
✓ Gray zone threshold: 5.0%

Sample merged data:


| | unique_id | ml_best_model | ml_avg_mae | stats_best_model | stats_avg_mae | state | metric | mae_diff | pct_improvement | winner |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MI_A01 | RandomForest | 1876084.1 | WindowAverage | 1479399.07 | MI | UR | -396685.04 | -26.81 | Stats |
| 1 | MI_A02 | XGBoost | 1301668.45 | Naive | 1945343.32 | MI | UR | 643674.87 | 33.09 | ML |
| 2 | MI_A03 | XGBoost | 441752.14 | SARIMAX | 176297.92 | MI | UR | -265454.22 | -150.57 | Stats |
| 3 | MI_A04 | LightGBM | 314088.13 | HistoricAverage | 199760.56 | MI | UR | -114327.57 | -57.23 | Stats |
| 4 | MI_A05 | XGBoost | 13822.52 | SARIMAX | 16040.63 | MI | UR | 2218.11 | 13.83 | ML |
| 5 | MI_A06 | RandomForest | 6268475.66 | WindowAverage | 4801812.01 | MI | UR | -1466663.66 | -30.54 | Stats |
| 6 | MI_A07 | XGBoost | 566551.76 | ARIMAX | 509905.12 | MI | UR | -56646.64 | -11.11 | Stats |
| 7 | MI_A09 | XGBoost | 82450.83 | HistoricAverage | 64937.39 | MI | UR | -17513.44 | -26.97 | Stats |
| 8 | MI_A10 | XGBoost | 3164579.72 | SARIMAX | 822034.95 | MI | UR | -2342544.77 | -284.97 | Stats |
| 9 | MI_A11 | LightGBM | 64255.33 | WindowAverage | 113493.95 | MI | UR | 49238.62 | 43.38 | ML |
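
Because the merge uses `how='inner'`, any `unique_id` present in only one workbook is dropped without notice. The sample above jumps from `MI_A07` to `MI_A09`, which may simply mean that id is absent from both files, but it is worth checking. Below is a minimal sketch of an audit step, assuming one row per `unique_id` in each sheet; `merge_with_audit` and the toy ids are hypothetical.

```python
import pandas as pd


def merge_with_audit(ml_df, stats_df):
    """Outer-merge on unique_id and separate matched rows from one-sided ids."""
    audit = pd.merge(ml_df, stats_df, on='unique_id', how='outer',
                     indicator=True, validate='one_to_one')
    unmatched = audit.loc[audit['_merge'] != 'both', ['unique_id', '_merge']]
    matched = audit.loc[audit['_merge'] == 'both'].drop(columns='_merge')
    return matched.reset_index(drop=True), unmatched.reset_index(drop=True)


# Toy frames: one id exists only on the ML side, one only on the Stats side
ml = pd.DataFrame({'unique_id': ['MI_A07', 'MI_A08'], 'ml_avg_mae': [10.0, 20.0]})
stats = pd.DataFrame({'unique_id': ['MI_A07', 'MI_A09'], 'stats_avg_mae': [12.0, 8.0]})
matched, unmatched = merge_with_audit(ml, stats)
print(len(matched))
print(unmatched)
```

The `validate='one_to_one'` argument makes pandas raise a `MergeError` if either side contains duplicate ids, which would otherwise inflate the number of comparisons.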


## 2. Best Model Frequency Analysis

In [132]:
# Overall model frequency
print("="*80)
print("BEST MODEL FREQUENCY ANALYSIS")
print("="*80)
print(f"\nTotal comparisons: {len(all_comparisons)}")
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")

BEST MODEL FREQUENCY ANALYSIS

Total comparisons: 634
Gray zone threshold: 5.0%


In [133]:
# ML Models Frequency
print("\n" + "="*80)
print("ML MODELS - SELECTION FREQUENCY")
print("="*80)
ml_model_freq = all_comparisons['ml_best_model'].value_counts()
ml_model_pct = (ml_model_freq / len(all_comparisons) * 100).round(2)

ml_freq_df = pd.DataFrame({
    'Frequency': ml_model_freq,
    'Percentage': ml_model_pct
})
display(ml_freq_df)
print(f"\nTotal unique ML models: {len(ml_model_freq)}")


ML MODELS - SELECTION FREQUENCY


| ml_best_model | Frequency | Percentage |
|---|---|---|
| LightGBM | 255 | 40.22 |
| XGBoost | 206 | 32.49 |
| RandomForest | 143 | 22.56 |
| Ridge | 30 | 4.73 |



Total unique ML models: 4


In [134]:
# Stats Models Frequency
print("\n" + "="*80)
print("STATISTICAL MODELS - SELECTION FREQUENCY")
print("="*80)
stats_model_freq = all_comparisons['stats_best_model'].value_counts()
stats_model_pct = (stats_model_freq / len(all_comparisons) * 100).round(2)

stats_freq_df = pd.DataFrame({
    'Frequency': stats_model_freq,
    'Percentage': stats_model_pct
})
display(stats_freq_df)
print(f"\nTotal unique Stats models: {len(stats_model_freq)}")


STATISTICAL MODELS - SELECTION FREQUENCY


| stats_best_model | Frequency | Percentage |
|---|---|---|
| HistoricAverage | 225 | 35.49 |
| SARIMAX | 157 | 24.76 |
| Naive | 117 | 18.45 |
| WindowAverage | 89 | 14.04 |
| ARIMAX | 24 | 3.79 |
| SeasonalNaive | 22 | 3.47 |



Total unique Stats models: 6


In [135]:
# Model frequency by state and metric
print("\n" + "="*80)
print("BEST MODEL FREQUENCY BY STATE")
print("="*80)

for state in STATES:
    print(f"\n{state}:")
    state_data = all_comparisons[all_comparisons['state'] == state]
    
    print("  ML Models:")
    ml_state_freq = state_data['ml_best_model'].value_counts()
    for model, count in ml_state_freq.head(3).items():
        pct = count / len(state_data) * 100
        print(f"    {model}: {count} ({pct:.1f}%)")
    
    print("  Stats Models:")
    stats_state_freq = state_data['stats_best_model'].value_counts()
    for model, count in stats_state_freq.head(3).items():
        pct = count / len(state_data) * 100
        print(f"    {model}: {count} ({pct:.1f}%)")


BEST MODEL FREQUENCY BY STATE

MI:
  ML Models:
    XGBoost: 62 (39.7%)
    LightGBM: 52 (33.3%)
    RandomForest: 36 (23.1%)
  Stats Models:
    HistoricAverage: 51 (32.7%)
    Naive: 38 (24.4%)
    SARIMAX: 32 (20.5%)

IN:
  ML Models:
    LightGBM: 84 (52.5%)
    XGBoost: 41 (25.6%)
    RandomForest: 28 (17.5%)
  Stats Models:
    SARIMAX: 63 (39.4%)
    HistoricAverage: 45 (28.1%)
    Naive: 22 (13.8%)

IL:
  ML Models:
    LightGBM: 53 (34.0%)
    XGBoost: 51 (32.7%)
    RandomForest: 43 (27.6%)
  Stats Models:
    HistoricAverage: 61 (39.1%)
    SARIMAX: 38 (24.4%)
    WindowAverage: 22 (14.1%)

OH:
  ML Models:
    LightGBM: 66 (40.7%)
    XGBoost: 52 (32.1%)
    RandomForest: 36 (22.2%)
  Stats Models:
    HistoricAverage: 68 (42.0%)
    Naive: 39 (24.1%)
    SARIMAX: 24 (14.8%)


In [136]:
print("\n" + "="*80)
print("BEST MODEL FREQUENCY BY METRIC")
print("="*80)

for metric in METRICS:
    metric_name = 'Units Reimbursed' if metric == 'UR' else 'Number of Prescriptions'
    print(f"\n{metric_name}:")
    metric_data = all_comparisons[all_comparisons['metric'] == metric]
    
    print("  ML Models:")
    ml_metric_freq = metric_data['ml_best_model'].value_counts()
    for model, count in ml_metric_freq.head(3).items():
        pct = count / len(metric_data) * 100
        print(f"    {model}: {count} ({pct:.1f}%)")
    
    print("  Stats Models:")
    stats_metric_freq = metric_data['stats_best_model'].value_counts()
    for model, count in stats_metric_freq.head(3).items():
        pct = count / len(metric_data) * 100
        print(f"    {model}: {count} ({pct:.1f}%)")


BEST MODEL FREQUENCY BY METRIC

Units Reimbursed:
  ML Models:
    LightGBM: 122 (38.5%)
    XGBoost: 103 (32.5%)
    RandomForest: 75 (23.7%)
  Stats Models:
    HistoricAverage: 114 (36.0%)
    SARIMAX: 80 (25.2%)
    Naive: 56 (17.7%)

Number of Prescriptions:
  ML Models:
    LightGBM: 133 (42.0%)
    XGBoost: 103 (32.5%)
    RandomForest: 68 (21.5%)
  Stats Models:
    HistoricAverage: 111 (35.0%)
    SARIMAX: 77 (24.3%)
    Naive: 61 (19.2%)


## 3. Individual Drug Class Comparison

In [137]:
print("="*80)
print("INDIVIDUAL DRUG CLASS COMPARISON SUMMARY")
print("="*80)
print(f"Total drug classes compared: {len(all_comparisons)}")
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")
print(f"\nWinner distribution:")
winner_counts = all_comparisons['winner'].value_counts()
display(winner_counts)

print(f"\nWinner percentage:")
winner_pct = all_comparisons['winner'].value_counts(normalize=True) * 100
for category, pct in winner_pct.items():
    print(f"  {category}: {pct:.2f}%")

INDIVIDUAL DRUG CLASS COMPARISON SUMMARY
Total drug classes compared: 634
Gray zone threshold: 5.0%

Winner distribution:


winner
Stats                        415
ML                           150
No Significant Difference     69
Name: count, dtype: int64


Winner percentage:
  Stats: 65.46%
  ML: 23.66%
  No Significant Difference: 10.88%


In [138]:
# Winner distribution by state and metric
print("\n" + "="*80)
print("Winner Distribution by State and Metric")
print("="*80)
winner_summary = all_comparisons.groupby(['state', 'metric', 'winner']).size().unstack(fill_value=0)
display(winner_summary)

# Calculate each category's share of all comparisons in the state-metric group
if 'ML' in winner_summary.columns and 'Stats' in winner_summary.columns:
    # The denominator includes gray-zone cases, so the three percentages sum to 100
    group_total = winner_summary['ML'] + winner_summary['Stats']
    if 'No Significant Difference' in winner_summary.columns:
        group_total = group_total + winner_summary['No Significant Difference']
    winner_summary['ML_Pct'] = (winner_summary['ML'] / group_total * 100).round(2)
    winner_summary['Stats_Pct'] = (winner_summary['Stats'] / group_total * 100).round(2)
    if 'No Significant Difference' in winner_summary.columns:
        winner_summary['Gray_Pct'] = (winner_summary['No Significant Difference'] / group_total * 100).round(2)
    
    print("\nPercentage Distribution:")
    pct_cols = [col for col in ['ML_Pct', 'Stats_Pct', 'Gray_Pct'] if col in winner_summary.columns]
    display(winner_summary[pct_cols])


Winner Distribution by State and Metric


| state | metric | ML | No Significant Difference | Stats |
|---|---|---|---|---|
| IL | NoP | 19 | 11 | 48 |
| IL | UR | 19 | 5 | 54 |
| IN | NoP | 19 | 6 | 55 |
| IN | UR | 15 | 10 | 55 |
| MI | NoP | 18 | 11 | 49 |
| MI | UR | 26 | 12 | 40 |
| OH | NoP | 20 | 5 | 56 |
| OH | UR | 14 | 9 | 58 |



Percentage Distribution:


| state | metric | ML_Pct | Stats_Pct | Gray_Pct |
|---|---|---|---|---|
| IL | NoP | 24.36 | 61.54 | 14.1 |
| IL | UR | 24.36 | 69.23 | 6.41 |
| IN | NoP | 23.75 | 68.75 | 7.5 |
| IN | UR | 18.75 | 68.75 | 12.5 |
| MI | NoP | 23.08 | 62.82 | 14.1 |
| MI | UR | 33.33 | 51.28 | 15.38 |
| OH | NoP | 24.69 | 69.14 | 6.17 |
| OH | UR | 17.28 | 71.6 | 11.11 |


## 4. State-Level Comparison

In [139]:
# State-level aggregation
state_comparison = all_comparisons.groupby('state').agg({
    'ml_avg_mae': 'mean',
    'stats_avg_mae': 'mean',
    'mae_diff': 'mean',
    'pct_improvement': 'mean'
}).round(2)

state_comparison['winner'] = state_comparison.apply(
    lambda x: 'No Significant Difference' if abs(x['pct_improvement']) < GRAY_ZONE_THRESHOLD
    else ('ML' if x['ml_avg_mae'] < x['stats_avg_mae'] else 'Stats'), 
    axis=1
)

# Count winners by state
state_winner_counts = all_comparisons.groupby(['state', 'winner']).size().unstack(fill_value=0)
state_comparison = state_comparison.join(state_winner_counts, rsuffix='_count')

print("="*80)
print("STATE-LEVEL COMPARISON")
print("="*80)
display(state_comparison)
print(f"\nNote: Positive pct_improvement means ML performs better")
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")

STATE-LEVEL COMPARISON


| state | ml_avg_mae | stats_avg_mae | mae_diff | pct_improvement | winner | ML | No Significant Difference | Stats |
|---|---|---|---|---|---|---|---|---|
| IL | 606701.57 | 503790.6 | -102910.97 | -38.41 | Stats | 38 | 16 | 102 |
| IN | 436354.68 | 309272.39 | -127082.29 | -608.98 | Stats | 34 | 16 | 110 |
| MI | 574509.28 | 496830.65 | -77678.63 | -50.11 | Stats | 44 | 23 | 89 |
| OH | 801168.65 | 668411.17 | -132757.48 | -61.07 | Stats | 34 | 14 | 114 |



Note: Positive pct_improvement means ML performs better
Gray zone threshold: 5.0%
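
The state-level `pct_improvement` is a mean of per-series percentages, so a few series with a very small Stats MAE can dominate it. For IN the mean is -608.98%, while the two average MAEs in the same row (436354.68 vs 309272.39) differ by only about 41%. The sketch below shows two alternative summaries that are less sensitive to such series: the median percentage, and a pooled percentage computed from the mean MAEs. `robust_summary` is a hypothetical helper; the three rows are copied from this notebook's output tables and mix UR and NoP series purely for illustration.

```python
import pandas as pd


def robust_summary(df):
    """Mean, median, and pooled percentage improvement of ML over Stats."""
    pct = (df['stats_avg_mae'] - df['ml_avg_mae']) / df['stats_avg_mae'] * 100
    stats_mean = df['stats_avg_mae'].mean()
    ml_mean = df['ml_avg_mae'].mean()
    pooled = (stats_mean - ml_mean) / stats_mean * 100
    return pd.Series({'mean_pct': pct.mean(), 'median_pct': pct.median(), 'pooled_pct': pooled})


rows = pd.DataFrame({
    'unique_id': ['IN_H04', 'IN_D07', 'IN_A11'],
    'ml_avg_mae': [99925.75, 16128.92, 3287.29],
    'stats_avg_mae': [152.99, 32718.91, 6031.67],
})
summary = robust_summary(rows)
print(summary.round(2))
```

In this small example ML wins two of the three series, yet the single extreme series pulls the mean percentage far below zero while the median stays positive. Which summary is appropriate depends on whether large-volume series should carry more weight, so this is a suggestion rather than a correction of the table above.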


## 5. Metric-Level Comparison

In [140]:
# Metric-level aggregation
metric_comparison = all_comparisons.groupby('metric').agg({
    'ml_avg_mae': 'mean',
    'stats_avg_mae': 'mean',
    'mae_diff': 'mean',
    'pct_improvement': 'mean'
}).round(2)

metric_comparison['winner'] = metric_comparison.apply(
    lambda x: 'No Significant Difference' if abs(x['pct_improvement']) < GRAY_ZONE_THRESHOLD
    else ('ML' if x['ml_avg_mae'] < x['stats_avg_mae'] else 'Stats'), 
    axis=1
)

# Count winners by metric
metric_winner_counts = all_comparisons.groupby(['metric', 'winner']).size().unstack(fill_value=0)
metric_comparison = metric_comparison.join(metric_winner_counts, rsuffix='_count')

print("="*80)
print("METRIC-LEVEL COMPARISON")
print("="*80)
display(metric_comparison)
print("\nUR: Units Reimbursed")
print("NoP: Number of Prescriptions")
print(f"\nNote: Positive pct_improvement means ML performs better")
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")

METRIC-LEVEL COMPARISON


| metric | ml_avg_mae | stats_avg_mae | mae_diff | pct_improvement | winner | ML | No Significant Difference | Stats |
|---|---|---|---|---|---|---|---|---|
| NoP | 16954.13 | 13619.22 | -3334.91 | -43.6 | Stats | 76 | 33 | 208 |
| UR | 1194007.89 | 976485.22 | -217522.68 | -338.54 | Stats | 74 | 36 | 207 |



UR: Units Reimbursed
NoP: Number of Prescriptions

Note: Positive pct_improvement means ML performs better
Gray zone threshold: 5.0%


In [141]:
# Detailed metric-level analysis by state
metric_state_comparison = all_comparisons.groupby(['metric', 'state']).agg({
    'ml_avg_mae': 'mean',
    'stats_avg_mae': 'mean',
    'pct_improvement': 'mean'
}).round(2)

print("\n" + "="*80)
print("METRIC-LEVEL COMPARISON BY STATE")
print("="*80)
display(metric_state_comparison)


METRIC-LEVEL COMPARISON BY STATE


| metric | state | ml_avg_mae | stats_avg_mae | pct_improvement |
|---|---|---|---|---|
| NoP | IL | 15940.38 | 13489.45 | -26.43 |
| NoP | IN | 12067.7 | 8219.92 | -74.67 |
| NoP | MI | 17594.16 | 14022.62 | -45.18 |
| NoP | OH | 22140.12 | 18688.38 | -27.9 |
| UR | IL | 1197462.75 | 994091.75 | -50.38 |
| UR | IN | 860641.66 | 610324.86 | -1143.28 |
| UR | MI | 1131424.4 | 979638.68 | -55.03 |
| UR | OH | 1580197.18 | 1318133.96 | -94.24 |


## 6. Overall Comparison

In [142]:
# Overall comparison across all dimensions
overall_stats = {
    'Total Drug Classes': len(all_comparisons),
    'ML Wins': (all_comparisons['winner'] == 'ML').sum(),
    'Stats Wins': (all_comparisons['winner'] == 'Stats').sum(),
    'No Significant Difference': (all_comparisons['winner'] == 'No Significant Difference').sum(),
    'ML Win Rate (%)': (all_comparisons['winner'] == 'ML').mean() * 100,
    'Stats Win Rate (%)': (all_comparisons['winner'] == 'Stats').mean() * 100,
    'Gray Zone Rate (%)': (all_comparisons['winner'] == 'No Significant Difference').mean() * 100,
    'Avg ML MAE': all_comparisons['ml_avg_mae'].mean(),
    'Avg Stats MAE': all_comparisons['stats_avg_mae'].mean(),
    'Avg MAE Difference': all_comparisons['mae_diff'].mean(),
    'Avg Pct Improvement': all_comparisons['pct_improvement'].mean()
}

print("="*80)
print("OVERALL COMPARISON SUMMARY")
print("="*80)
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")
print()
for key, value in overall_stats.items():
    # Percentage keys are checked first so 'Avg Pct Improvement' gets a % sign;
    # plain counts such as 'No Significant Difference' fall through to the last branch
    if 'Rate' in key or 'Pct' in key:
        print(f"{key:.<50} {value:>20.2f}%")
    elif key.startswith('Avg'):
        print(f"{key:.<50} {value:>20,.2f}")
    else:
        print(f"{key:.<50} {value:>20}")

print("\n" + "="*80)
print("INTERPRETATION")
print("="*80)
ml_clear_wins = overall_stats['ML Wins']
stats_clear_wins = overall_stats['Stats Wins']
gray_zone = overall_stats['No Significant Difference']

print(f"✓ ML models clearly outperform Stats in {ml_clear_wins} cases ({overall_stats['ML Win Rate (%)']:.1f}%)")
print(f"✓ Stats models clearly outperform ML in {stats_clear_wins} cases ({overall_stats['Stats Win Rate (%)']:.1f}%)")
print(f"✓ No significant difference in {gray_zone} cases ({overall_stats['Gray Zone Rate (%)']:.1f}%)")
# A negative average improvement means ML has the higher MAE, so word the message accordingly
avg_pct = overall_stats['Avg Pct Improvement']
direction = 'lower' if avg_pct > 0 else 'higher'
print(f"✓ On average, ML MAE is {abs(avg_pct):.2f}% {direction} than Stats MAE")

OVERALL COMPARISON SUMMARY
Gray zone threshold: 5.0%

Total Drug Classes................................                  634
ML Wins...........................................                  150
Stats Wins........................................                  415
No Significant Difference.........................                   69
ML Win Rate (%)...................................                23.66%
Stats Win Rate (%)................................                65.46%
Gray Zone Rate (%)................................                10.88%
Avg ML MAE........................................           605,481.01
Avg Stats MAE.....................................           495,052.22
Avg MAE Difference................................          -110,428.79
Avg Pct Improvement...............................              -191.07%

INTERPRETATION
✓ ML models clearly outperform Stats in 150 cases (23.7%)
✓ Stats models clearly outperform ML in 415 cases (65.5%)
✓ No significant difference in 69 cases (10.9%)
✓ On average, ML MAE is 191.07% higher than Stats MAE

## 7. Model Performance Tables

In [143]:
# Top 20 ML improvements (excluding gray zone)
ml_clear_wins_df = all_comparisons[all_comparisons['winner'] == 'ML']
print("="*80)
print("TOP 20 DRUG CLASSES WHERE ML CLEARLY OUTPERFORMS STATS")
print(f"(Improvement > {GRAY_ZONE_THRESHOLD}%)")
print("="*80)
if len(ml_clear_wins_df) > 0:
    top_ml = ml_clear_wins_df.nlargest(20, 'pct_improvement')[[
        'unique_id', 'state', 'metric', 'ml_best_model', 'stats_best_model',
        'ml_avg_mae', 'stats_avg_mae', 'pct_improvement'
    ]]
    display(top_ml)
else:
    print("No clear ML wins found.")

TOP 20 DRUG CLASSES WHERE ML CLEARLY OUTPERFORMS STATS
(Improvement > 5.0%)


| | unique_id | state | metric | ml_best_model | stats_best_model | ml_avg_mae | stats_avg_mae | pct_improvement |
|---|---|---|---|---|---|---|---|---|
| 508 | OH_G02 | OH | UR | LightGBM | HistoricAverage | 1216122.11 | 3692248.76 | 67.06 |
| 589 | OH_G02 | OH | NoP | XGBoost | WindowAverage | 19918.11 | 59853.67 | 66.72 |
| 610 | OH_M05 | OH | NoP | XGBoost | Naive | 522.2 | 1234.38 | 57.7 |
| 49 | MI_L01 | MI | UR | RandomForest | Naive | 59325.4 | 129655.33 | 54.24 |
| 120 | MI_H04 | MI | NoP | RandomForest | WindowAverage | 121.05 | 256.65 | 52.83 |
| 267 | IN_D07 | IN | NoP | XGBoost | SARIMAX | 16128.92 | 32718.91 | 50.7 |
| 328 | IL_B01 | IL | UR | XGBoost | WindowAverage | 462471.8 | 929992.0 | 50.27 |
| 252 | IN_B06 | IN | NoP | XGBoost | WindowAverage | 223.03 | 446.62 | 50.06 |
| 429 | IL_G02 | IL | NoP | RandomForest | HistoricAverage | 17313.7 | 32392.84 | 46.55 |
| 245 | IN_A11 | IN | NoP | Ridge | SARIMAX | 3287.29 | 6031.67 | 45.5 |


In [144]:
# Top 20 Stats improvements (excluding gray zone)
stats_clear_wins_df = all_comparisons[all_comparisons['winner'] == 'Stats']
print("\n" + "="*80)
print("TOP 20 DRUG CLASSES WHERE STATS CLEARLY OUTPERFORMS ML")
print(f"(Improvement > {GRAY_ZONE_THRESHOLD}%)")
print("="*80)
if len(stats_clear_wins_df) > 0:
    top_stats = stats_clear_wins_df.nsmallest(20, 'pct_improvement')[[
        'unique_id', 'state', 'metric', 'ml_best_model', 'stats_best_model',
        'ml_avg_mae', 'stats_avg_mae', 'pct_improvement'
    ]]
    display(top_stats)
else:
    print("No clear Stats wins found.")


TOP 20 DRUG CLASSES WHERE STATS CLEARLY OUTPERFORMS ML
(Improvement > 5.0%)


| | unique_id | state | metric | ml_best_model | stats_best_model | ml_avg_mae | stats_avg_mae | pct_improvement |
|---|---|---|---|---|---|---|---|---|
| 198 | IN_H04 | IN | UR | LightGBM | SARIMAX | 99925.75 | 152.99 | -65216.7 |
| 207 | IN_L03 | IN | UR | LightGBM | Naive | 99284.11 | 735.41 | -13400.54 |
| 176 | IN_C04 | IN | UR | XGBoost | Naive | 64277.18 | 2181.96 | -2845.85 |
| 476 | OH_A05 | OH | UR | LightGBM | SARIMAX | 251642.5 | 9078.04 | -2671.99 |
| 199 | IN_H05 | IN | UR | LightGBM | Naive | 83188.14 | 3141.55 | -2547.99 |
| 222 | IN_P02 | IN | UR | LightGBM | SeasonalNaive | 99429.16 | 4874.73 | -1939.69 |
| 135 | MI_M05 | MI | NoP | XGBoost | HistoricAverage | 4255.35 | 275.33 | -1445.54 |
| 354 | IL_H01 | IL | UR | XGBoost | HistoricAverage | 106539.78 | 9454.88 | -1026.82 |
| 11 | MI_A16 | MI | UR | XGBoost | WindowAverage | 297728.73 | 27482.63 | -983.33 |
| 181 | IN_C10 | IN | UR | RandomForest | SARIMAX | 1225046.41 | 172754.61 | -609.13 |


In [145]:
# Gray zone cases
gray_zone_cases = all_comparisons[all_comparisons['winner'] == 'No Significant Difference']
print("\n" + "="*80)
print(f"GRAY ZONE CASES (MAE difference < {GRAY_ZONE_THRESHOLD}%)")
print("="*80)
print(f"Total: {len(gray_zone_cases)}")
if len(gray_zone_cases) > 0:
    print("\nSample cases:")
    display(gray_zone_cases.head(20)[[
        'unique_id', 'state', 'metric', 'ml_best_model', 'stats_best_model',
        'ml_avg_mae', 'stats_avg_mae', 'pct_improvement'
    ]])
else:
    print("No gray zone cases found.")


GRAY ZONE CASES (MAE difference < 5.0%)
Total: 69

Sample cases:


| | unique_id | state | metric | ml_best_model | stats_best_model | ml_avg_mae | stats_avg_mae | pct_improvement |
|---|---|---|---|---|---|---|---|---|
| 14 | MI_B03 | MI | UR | LightGBM | HistoricAverage | 504068.96 | 513680.14 | 1.87 |
| 23 | MI_C08 | MI | UR | XGBoost | Naive | 330720.04 | 324453.8 | -1.93 |
| 24 | MI_C09 | MI | UR | XGBoost | Naive | 1194435.76 | 1202443.59 | 0.67 |
| 27 | MI_D02 | MI | UR | LightGBM | SARIMAX | 290361.48 | 278081.67 | -4.42 |
| 30 | MI_D06 | MI | UR | XGBoost | HistoricAverage | 363870.43 | 360028.14 | -1.07 |
| 31 | MI_D07 | MI | UR | LightGBM | HistoricAverage | 1062457.9 | 1014864.55 | -4.69 |
| 38 | MI_G04 | MI | UR | LightGBM | Naive | 454447.61 | 438574.55 | -3.62 |
| 48 | MI_J06 | MI | UR | XGBoost | HistoricAverage | 49159.42 | 51359.12 | 4.28 |
| 54 | MI_M02 | MI | UR | XGBoost | HistoricAverage | 4844608.84 | 4788038.23 | -1.18 |
| 58 | MI_N01 | MI | UR | XGBoost | WindowAverage | 90621.76 | 86682.36 | -4.54 |


In [146]:
# Model performance breakdown
print("\n" + "="*80)
print("BEST MODEL PERFORMANCE BREAKDOWN")
print("="*80)

# ML models
print("\nML Models Win Rate:")
ml_model_wins = all_comparisons[all_comparisons['winner'] == 'ML']['ml_best_model'].value_counts()
ml_model_total = all_comparisons['ml_best_model'].value_counts()
ml_model_perf = pd.DataFrame({
    'Times Selected': ml_model_total,
    'Times Won (Clear)': ml_model_wins,
    'Win Rate (%)': (ml_model_wins / ml_model_total * 100).fillna(0).round(2)
}).sort_values('Times Won (Clear)', ascending=False)
display(ml_model_perf)

# Stats models
print("\n\nStats Models Win Rate:")
stats_model_wins = all_comparisons[all_comparisons['winner'] == 'Stats']['stats_best_model'].value_counts()
stats_model_total = all_comparisons['stats_best_model'].value_counts()
stats_model_perf = pd.DataFrame({
    'Times Selected': stats_model_total,
    'Times Won (Clear)': stats_model_wins,
    'Win Rate (%)': (stats_model_wins / stats_model_total * 100).fillna(0).round(2)
}).sort_values('Times Won (Clear)', ascending=False)
display(stats_model_perf)


BEST MODEL PERFORMANCE BREAKDOWN

ML Models Win Rate:


| ml_best_model | Times Selected | Times Won (Clear) | Win Rate (%) |
|---|---|---|---|
| LightGBM | 255 | 67 | 26.27 |
| XGBoost | 206 | 41 | 19.9 |
| RandomForest | 143 | 31 | 21.68 |
| Ridge | 30 | 11 | 36.67 |




Stats Models Win Rate:


| stats_best_model | Times Selected | Times Won (Clear) | Win Rate (%) |
|---|---|---|---|
| HistoricAverage | 225 | 143 | 63.56 |
| SARIMAX | 157 | 118 | 75.16 |
| Naive | 117 | 71 | 60.68 |
| WindowAverage | 89 | 51 | 57.3 |
| SeasonalNaive | 22 | 17 | 77.27 |
| ARIMAX | 24 | 15 | 62.5 |


## 8. Create Visualizations and Dashboards

In [147]:
# Function to create model-specific dashboard
def create_model_dashboard(data, metric_name, approach, output_file):
    """
    Create a four-panel dashboard for one metric ('UR' or 'NoP') and one
    approach ('ML' or 'Stats'), then save it to output_file (an image file path).
    """
    fig = plt.figure(figsize=(18, 10))
    gs = GridSpec(2, 3, figure=fig, hspace=0.3, wspace=0.3)
    
    # Get appropriate model column
    model_col = 'ml_best_model' if approach == 'ML' else 'stats_best_model'
    mae_col = 'ml_avg_mae' if approach == 'ML' else 'stats_avg_mae'
    
    # Filter data by metric
    metric_data = data[data['metric'] == metric_name].copy()
    
    # Color palette for models
    models = metric_data[model_col].unique()
    n_models = len(models)
    if approach == 'ML':
        colors = plt.cm.viridis(np.linspace(0.2, 0.9, n_models))
    else:
        colors = plt.cm.plasma(np.linspace(0.2, 0.9, n_models))
    
    # ----- Plot 1: Model Selection Frequency (Pie Chart) -----
    ax1 = fig.add_subplot(gs[0, 0])
    model_freq = metric_data[model_col].value_counts()
    wedges, texts, autotexts = ax1.pie(model_freq, labels=model_freq.index, autopct='%1.1f%%',
                                        colors=colors, startangle=90, 
                                        textprops={'fontsize': 10, 'weight': 'bold'})
    
    # Add total count in center
    centre_circle = plt.Circle((0, 0), 0.70, fc='white')
    ax1.add_artist(centre_circle)
    ax1.text(0, 0, f'{len(metric_data)}', ha='center', va='center', 
             fontsize=32, weight='bold')
    ax1.text(0, -0.15, 'series', ha='center', va='center', fontsize=12)
    
    ax1.set_title('Model Selection Frequency', fontsize=14, weight='bold', pad=20)
    
    # ----- Plot 2: Model Frequency by State -----
    ax2 = fig.add_subplot(gs[0, 1])
    
    state_model_freq = {}
    for state in STATES:
        state_data = metric_data[metric_data['state'] == state]
        state_model_freq[state] = state_data[model_col].value_counts()
    
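    # Order columns like the pie chart so each model keeps the same color in both panels
    state_freq_df = pd.DataFrame(state_model_freq).fillna(0).T.reindex(columns=model_freq.index)
    state_freq_df.plot(kind='bar', ax=ax2, color=colors, width=0.8)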
    
    ax2.set_title('Model Frequency by State', fontsize=14, weight='bold', pad=20)
    ax2.set_xlabel('State', fontsize=12, weight='bold')
    ax2.set_ylabel('Frequency', fontsize=12, weight='bold')
    ax2.tick_params(axis='x', rotation=0, labelsize=11)
    ax2.tick_params(axis='y', labelsize=10)
    ax2.legend(title='Model', fontsize=9, title_fontsize=10, loc='upper right')
    ax2.grid(axis='y', alpha=0.3)
    
    # ----- Plot 3: MAE Distribution by Model -----
    ax3 = fig.add_subplot(gs[0, 2])
    
    model_order = model_freq.index.tolist()
    mae_by_model = [metric_data[metric_data[model_col] == model][mae_col].values 
                    for model in model_order]
    
    bp = ax3.boxplot(mae_by_model, labels=model_order, patch_artist=True,
                     showfliers=True, widths=0.6)
    
    for patch, color in zip(bp['boxes'], colors):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    
    for element in ['whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color='black', linewidth=1.5)
    
    ax3.set_title('MAE Distribution by Model', fontsize=14, weight='bold', pad=20)
    ax3.set_xlabel('Model', fontsize=12, weight='bold')
    ax3.set_ylabel('MAE', fontsize=12, weight='bold')
    ax3.tick_params(axis='x', rotation=45, labelsize=9)
    ax3.tick_params(axis='y', labelsize=10)
    ax3.grid(axis='y', alpha=0.3)
    ax3.set_yscale('log')
    
    # ----- Plot 4: Mean MAE by Model -----
    ax4 = fig.add_subplot(gs[1, :])
    
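    mean_mae = metric_data.groupby(model_col)[mae_col].mean().sort_values(ascending=False)
    counts = metric_data[model_col].value_counts()[mean_mae.index]
    
    # Bars are sorted by MAE, so look up each model's color by its pie-chart position
    bar_colors = [colors[model_freq.index.get_loc(model)] for model in mean_mae.index]
    bars = ax4.barh(range(len(mean_mae)), mean_mae.values, color=bar_colors)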
    
    ax4.set_yticks(range(len(mean_mae)))
    ax4.set_yticklabels(mean_mae.index, fontsize=11, weight='bold')
    ax4.set_xlabel('Mean MAE', fontsize=12, weight='bold')
    ax4.set_title('Mean MAE by Model', fontsize=14, weight='bold', pad=20)
    ax4.grid(axis='x', alpha=0.3)
    ax4.invert_yaxis()
    
    # Add count labels
    for i, (mae_val, count) in enumerate(zip(mean_mae.values, counts.values)):
        ax4.text(mae_val, i, f'  {mae_val:,.0f} (n={count})', 
                va='center', ha='left', fontsize=10, weight='bold')
    
    # Main title
    metric_full = 'Units Reimbursed' if metric_name == 'UR' else 'Number of Prescriptions'
    approach_full = 'Machine Learning' if approach == 'ML' else 'Statistical'
    fig.suptitle(f'{metric_full} - {approach_full} Models Analysis', 
                 fontsize=18, weight='bold', y=0.98)
    
    plt.savefig(output_file, dpi=300, bbox_inches='tight', facecolor='white')
    print(f"✓ Saved: {output_file}")
    plt.close()

print("✓ Dashboard creation function defined")

✓ Dashboard creation function defined
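
The final summary lists the four model dashboards under distinct file names (`Dashboard_UR_ML.png` and so on). A minimal sketch of deriving such names from an output directory plus the metric and approach codes is shown below. `dashboard_path` is a hypothetical helper used only for illustration, and the relative `"Plots"` directory stands in for the real output folder.

```python
from pathlib import Path

def dashboard_path(output_dir, metric_name, approach):
    """Hypothetical helper: one distinct PNG path per (metric, approach) pair."""
    return Path(output_dir) / f"Dashboard_{metric_name}_{approach}.png"

# The same four metric/approach combinations that the notebook plots
paths = [dashboard_path("Plots", metric, approach)
         for metric in ("UR", "NoP")
         for approach in ("ML", "Stats")]

for p in paths:
    print(p.name)
```

Building the name from the same codes that are passed to the plotting function keeps the saved files in step with the list printed at the end of the notebook.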


In [148]:
# Create 4 individual model-specific dashboards
print("Creating individual model dashboards...")
# Directory prefix for the dashboard images (the trailing separator is kept
# because a file name is appended to this string later on)
output_file = rf'C:\Users\{user}\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Plots\\'

# 1. Units Reimbursed - ML Models
create_model_dashboard(all_comparisons, 'UR', 'ML', output_file)

# 2. Units Reimbursed - Stats Models
create_model_dashboard(all_comparisons, 'UR', 'Stats', output_file)

# 3. Number of Prescriptions - ML Models
create_model_dashboard(all_comparisons, 'NoP', 'ML', output_file)

# 4. Number of Prescriptions - Stats Models
create_model_dashboard(all_comparisons, 'NoP', 'Stats', output_file)

print("\n✓ All individual model dashboards created")

Creating individual model dashboards...
✓ Saved: C:\Users\Lilian\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Plots\\
✓ Saved: C:\Users\Lilian\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Plots\\
✓ Saved: C:\Users\Lilian\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Plots\\
✓ Saved: C:\Users\Lilian\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Plots\\

✓ All individual model dashboards created


In [149]:
# Create overall comparison dashboard
print("Creating overall comparison dashboard...")

fig = plt.figure(figsize=(20, 12))
gs = GridSpec(2, 3, figure=fig, hspace=0.3, wspace=0.3)

# ----- Row 1: Units Reimbursed -----
ur_data = all_comparisons[all_comparisons['metric'] == 'UR']

# Overall Distribution
ax1 = fig.add_subplot(gs[0, 0])
winner_counts = ur_data['winner'].value_counts()
colors_winner = [ML_COLOR if w == 'ML' else (GRAY_COLOR if w == 'No Significant Difference' else STATS_COLOR) 
                 for w in winner_counts.index]

wedges, texts, autotexts = ax1.pie(winner_counts, labels=winner_counts.index, 
                                    autopct='%1.1f%%', colors=colors_winner,
                                    startangle=90, textprops={'fontsize': 11, 'weight': 'bold'})

centre_circle = plt.Circle((0, 0), 0.70, fc='white')
ax1.add_artist(centre_circle)
ax1.text(0, 0, f'{len(ur_data)}', ha='center', va='center', fontsize=32, weight='bold')
ax1.text(0, -0.15, 'series', ha='center', va='center', fontsize=12)

ax1.set_title('Units Reimbursed\nOverall Distribution', fontsize=14, weight='bold', pad=20)

# Win Rate by State
ax2 = fig.add_subplot(gs[0, 1])
ur_state_winners = ur_data.groupby(['state', 'winner']).size().unstack(fill_value=0)
ur_state_pct = ur_state_winners.div(ur_state_winners.sum(axis=1), axis=0) * 100

winner_order = ['ML', 'No Significant Difference', 'Stats']
ur_state_pct = ur_state_pct.reindex(columns=winner_order, fill_value=0)

ur_state_pct.plot(kind='bar', stacked=True, ax=ax2, 
                  color=[ML_COLOR, GRAY_COLOR, STATS_COLOR], width=0.7)

ax2.set_title('Units Reimbursed\nWin Rate by State', fontsize=14, weight='bold', pad=20)
ax2.set_xlabel('State', fontsize=12, weight='bold')
ax2.set_ylabel('Percentage (%)', fontsize=12, weight='bold')
ax2.tick_params(axis='x', rotation=0, labelsize=11)
ax2.legend(title='Winner', fontsize=9, title_fontsize=10)
ax2.grid(axis='y', alpha=0.3)

for container in ax2.containers:
    ax2.bar_label(container, fmt='%.0f%%', label_type='center', fontsize=9, weight='bold')

# Model Selection Frequency
ax3 = fig.add_subplot(gs[0, 2])

ur_ml_freq = ur_data['ml_best_model'].value_counts().head(8)
ur_stats_freq = ur_data['stats_best_model'].value_counts().head(8)

all_models = sorted(set(ur_ml_freq.index) | set(ur_stats_freq.index))
model_freq_df = pd.DataFrame({
    'ML Models': ur_ml_freq.reindex(all_models, fill_value=0),
    'Stats Models': ur_stats_freq.reindex(all_models, fill_value=0)
})

model_freq_df.plot(kind='barh', ax=ax3, color=[ML_COLOR, STATS_COLOR], width=0.7)

ax3.set_title('Units Reimbursed\nModel Selection Frequency', fontsize=14, weight='bold', pad=20)
ax3.set_xlabel('Times Selected as Best', fontsize=12, weight='bold')
ax3.set_ylabel('')
ax3.tick_params(axis='y', labelsize=10)
ax3.legend(fontsize=10, loc='lower right')
ax3.grid(axis='x', alpha=0.3)
ax3.invert_yaxis()

# ----- Row 2: Number of Prescriptions -----
nop_data = all_comparisons[all_comparisons['metric'] == 'NoP']

# Overall Distribution
ax4 = fig.add_subplot(gs[1, 0])
winner_counts_nop = nop_data['winner'].value_counts()
colors_winner_nop = [ML_COLOR if w == 'ML' else (GRAY_COLOR if w == 'No Significant Difference' else STATS_COLOR) 
                     for w in winner_counts_nop.index]

wedges, texts, autotexts = ax4.pie(winner_counts_nop, labels=winner_counts_nop.index,
                                    autopct='%1.1f%%', colors=colors_winner_nop,
                                    startangle=90, textprops={'fontsize': 11, 'weight': 'bold'})

centre_circle = plt.Circle((0, 0), 0.70, fc='white')
ax4.add_artist(centre_circle)
ax4.text(0, 0, f'{len(nop_data)}', ha='center', va='center', fontsize=32, weight='bold')
ax4.text(0, -0.15, 'series', ha='center', va='center', fontsize=12)

ax4.set_title('Num Prescriptions\nOverall Distribution', fontsize=14, weight='bold', pad=20)

# Win Rate by State
ax5 = fig.add_subplot(gs[1, 1])
nop_state_winners = nop_data.groupby(['state', 'winner']).size().unstack(fill_value=0)
nop_state_pct = nop_state_winners.div(nop_state_winners.sum(axis=1), axis=0) * 100

nop_state_pct = nop_state_pct.reindex(columns=winner_order, fill_value=0)

nop_state_pct.plot(kind='bar', stacked=True, ax=ax5,
                   color=[ML_COLOR, GRAY_COLOR, STATS_COLOR], width=0.7)

ax5.set_title('Num Prescriptions\nWin Rate by State', fontsize=14, weight='bold', pad=20)
ax5.set_xlabel('State', fontsize=12, weight='bold')
ax5.set_ylabel('Percentage (%)', fontsize=12, weight='bold')
ax5.tick_params(axis='x', rotation=0, labelsize=11)
ax5.legend(title='Winner', fontsize=9, title_fontsize=10)
ax5.grid(axis='y', alpha=0.3)

for container in ax5.containers:
    ax5.bar_label(container, fmt='%.0f%%', label_type='center', fontsize=9, weight='bold')

# Model Selection Frequency
ax6 = fig.add_subplot(gs[1, 2])

nop_ml_freq = nop_data['ml_best_model'].value_counts().head(8)
nop_stats_freq = nop_data['stats_best_model'].value_counts().head(8)

all_models_nop = sorted(set(nop_ml_freq.index) | set(nop_stats_freq.index))
model_freq_df_nop = pd.DataFrame({
    'ML Models': nop_ml_freq.reindex(all_models_nop, fill_value=0),
    'Stats Models': nop_stats_freq.reindex(all_models_nop, fill_value=0)
})

model_freq_df_nop.plot(kind='barh', ax=ax6, color=[ML_COLOR, STATS_COLOR], width=0.7)

ax6.set_title('Num Prescriptions\nModel Selection Frequency', fontsize=14, weight='bold', pad=20)
ax6.set_xlabel('Times Selected as Best', fontsize=12, weight='bold')
ax6.set_ylabel('')
ax6.tick_params(axis='y', labelsize=10)
ax6.legend(fontsize=10, loc='lower right')
ax6.grid(axis='x', alpha=0.3)
ax6.invert_yaxis()

# Main title
fig.suptitle('Overall Model Comparison Dashboard', fontsize=20, weight='bold', y=0.98)

# Add footer note
fig.text(0.5, 0.01, f'Gray Zone Threshold: < {GRAY_ZONE_THRESHOLD}% difference in MAE between approaches',
         ha='center', fontsize=11, style='italic', color='gray')

plt.savefig(output_file + "Dashboard_Overall_Comparison.png", 
            dpi=300, bbox_inches='tight', facecolor='white')
print("✓ Saved: Dashboard_Overall_Comparison.png")
plt.close()

print("\n✓ Overall comparison dashboard created")

Creating overall comparison dashboard...
✓ Saved: Dashboard_Overall_Comparison.png

✓ Overall comparison dashboard created
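
The "Win Rate by State" panels turn per-state winner counts into row percentages before plotting. The sketch below repeats that `groupby` / `unstack` / `reindex` / `div` pattern on a small made-up table, so the numbers are illustrative only and are not taken from the analysis.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "state": ["MI", "MI", "MI", "MI", "OH", "OH"],
    "winner": ["ML", "Stats", "Stats", "No Significant Difference",
               "Stats", "Stats"],
})

winner_order = ["ML", "No Significant Difference", "Stats"]

# Count winners per state; categories missing in a state become 0
counts = toy.groupby(["state", "winner"]).size().unstack(fill_value=0)
counts = counts.reindex(columns=winner_order, fill_value=0)

# Divide each row by its total so every state sums to 100%
pct = counts.div(counts.sum(axis=1), axis=0) * 100
print(pct.round(1))
```

Reindexing to a fixed column order keeps the stacked bars and their colors aligned even when a category never occurs for a given metric.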


## 9. Export Results to Excel

In [150]:
# Export detailed comparison to Excel
print("Exporting results to Excel...")
output_file = rf"C:\Users\{user}\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Detailed_Comparison_Results.xlsx"

with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    # All comparisons
    all_comparisons.to_excel(writer, sheet_name='All_Comparisons', index=False)
    
    # State-level summary
    state_comparison.to_excel(writer, sheet_name='State_Summary')
    
    # Metric-level summary
    metric_comparison.to_excel(writer, sheet_name='Metric_Summary')
    
    # Winner distribution
    winner_summary.to_excel(writer, sheet_name='Winner_Distribution')
    
    # Top ML improvements (clear wins only)
    if len(ml_clear_wins_df) > 0:
        ml_clear_wins_df.nlargest(50, 'pct_improvement').to_excel(
            writer, sheet_name='Top_ML_Wins', index=False
        )
    
    # Top Stats improvements (clear wins only)
    if len(stats_clear_wins_df) > 0:
        stats_clear_wins_df.nsmallest(50, 'pct_improvement').to_excel(
            writer, sheet_name='Top_Stats_Wins', index=False
        )
    
    # Gray zone cases
    if len(gray_zone_cases) > 0:
        gray_zone_cases.to_excel(writer, sheet_name='Gray_Zone_Cases', index=False)
    
    # Model performance
    ml_model_perf.to_excel(writer, sheet_name='ML_Model_Performance')
    stats_model_perf.to_excel(writer, sheet_name='Stats_Model_Performance')
    
    # Model frequency
    ml_freq_df.to_excel(writer, sheet_name='ML_Model_Frequency')
    stats_freq_df.to_excel(writer, sheet_name='Stats_Model_Frequency')
    
    # Overall summary
    overall_summary = pd.DataFrame([overall_stats]).T
    overall_summary.columns = ['Value']
    overall_summary.to_excel(writer, sheet_name='Overall_Summary')

print(f"✓ Results exported to: {output_file}")

Exporting results to Excel...
✓ Results exported to: C:\Users\Lilian\OneDrive - purdue.edu\VS code\Data\ATC\Comparison\Detailed_Comparison_Results.xlsx
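
The export ranks the top ML wins with `nlargest` and the top Stats wins with `nsmallest` on the same `pct_improvement` column, which suggests that positive values favor ML and negative values favor Stats. A small sketch on made-up numbers, under that assumed sign convention (the notebook filters to clear wins first; that step is omitted here):

```python
import pandas as pd

# Invented values; positive = ML better, negative = Stats better (assumption)
toy = pd.DataFrame({
    "unique_id": ["a", "b", "c", "d", "e"],
    "pct_improvement": [30.0, 8.0, -12.0, -250.0, 2.0],
})

top_ml = toy.nlargest(2, "pct_improvement")      # largest positive values first
top_stats = toy.nsmallest(2, "pct_improvement")  # most negative values first

print(top_ml["unique_id"].tolist())
print(top_stats["unique_id"].tolist())
```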


## 10. Summary and Key Findings
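
The summary below reports an average improvement above 100% when Stats clearly wins, which can only happen if the percentage is measured relative to the smaller of the two errors. The sketch assumes `pct_improvement = (stats_mae - ml_mae) / stats_mae * 100` together with the 5% gray zone from the configuration cell. Both the formula and the `classify` helper are assumptions made for illustration, since the notebook's own definition lies outside this section.

```python
GRAY_ZONE_THRESHOLD = 5.0  # percent, as set in the configuration cell

def pct_improvement(ml_mae, stats_mae):
    """Assumed definition: positive when ML has the lower MAE.

    Requires stats_mae > 0; a zero Stats MAE would need separate handling.
    """
    return (stats_mae - ml_mae) / stats_mae * 100

def classify(ml_mae, stats_mae, threshold=GRAY_ZONE_THRESHOLD):
    """Hypothetical three-way label based on the gray zone threshold."""
    pct = pct_improvement(ml_mae, stats_mae)
    if abs(pct) < threshold:
        return "No Significant Difference"
    return "ML" if pct > 0 else "Stats"

print(classify(80.0, 100.0))    # ML MAE is 20% lower than the Stats MAE
print(classify(97.0, 100.0))    # 3% gap falls inside the gray zone
print(classify(400.0, 100.0))   # ML MAE is four times the Stats MAE
print(pct_improvement(400.0, 100.0))
```

Under this assumed formula an ML win can never exceed 100% (reached only when the ML MAE is zero), while a Stats win has no bound, so the two averages are not directly comparable.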

In [151]:
print("="*80)
print("KEY FINDINGS SUMMARY")
print("="*80)
print(f"Gray zone threshold: {GRAY_ZONE_THRESHOLD}%")

print("\n1. OVERALL PERFORMANCE:")
print(f"   - ML models clearly won {overall_stats['ML Wins']} out of {overall_stats['Total Drug Classes']} comparisons")
print(f"   - Stats models clearly won {overall_stats['Stats Wins']} comparisons")
print(f"   - No significant difference in {overall_stats['No Significant Difference']} comparisons")
print(f"   - ML win rate: {overall_stats['ML Win Rate (%)']:.1f}%")
print(f"   - Stats win rate: {overall_stats['Stats Win Rate (%)']:.1f}%")
print(f"   - Gray zone rate: {overall_stats['Gray Zone Rate (%)']:.1f}%")
if len(ml_clear_wins_df) > 0:
    print(f"   - Average improvement when ML clearly wins: {ml_clear_wins_df['pct_improvement'].mean():.2f}%")
if len(stats_clear_wins_df) > 0:
    print(f"   - Average improvement when Stats clearly wins: {abs(stats_clear_wins_df['pct_improvement'].mean()):.2f}%")

print("\n2. BY STATE:")
for state in STATES:
    state_data = all_comparisons[all_comparisons['state'] == state]
    ml_wins = (state_data['winner'] == 'ML').sum()
    stats_wins = (state_data['winner'] == 'Stats').sum()
    gray = (state_data['winner'] == 'No Significant Difference').sum()
    total = len(state_data)
    print(f"   - {state}: ML={ml_wins} ({ml_wins/total*100:.1f}%), Stats={stats_wins} ({stats_wins/total*100:.1f}%), Gray={gray} ({gray/total*100:.1f}%)")

print("\n3. BY METRIC:")
for metric in METRICS:
    metric_data = all_comparisons[all_comparisons['metric'] == metric]
    ml_wins = (metric_data['winner'] == 'ML').sum()
    stats_wins = (metric_data['winner'] == 'Stats').sum()
    gray = (metric_data['winner'] == 'No Significant Difference').sum()
    total = len(metric_data)
    metric_name = 'Units Reimbursed' if metric == 'UR' else 'Number of Prescriptions'
    print(f"   - {metric_name}: ML={ml_wins} ({ml_wins/total*100:.1f}%), Stats={stats_wins} ({stats_wins/total*100:.1f}%), Gray={gray} ({gray/total*100:.1f}%)")

print("\n4. MOST FREQUENTLY SELECTED MODELS:")
print("   ML Models:")
for model, count in ml_model_freq.head(3).items():
    pct = count / len(all_comparisons) * 100
    print(f"      - {model}: {count} ({pct:.1f}%)")
print("   Stats Models:")
for model, count in stats_model_freq.head(3).items():
    pct = count / len(all_comparisons) * 100
    print(f"      - {model}: {count} ({pct:.1f}%)")

print("\n5. BEST PERFORMING MODELS (Clear Wins):")
print("   ML Models:")
if len(ml_model_wins) > 0:
    for model, count in ml_model_wins.head(3).items():
        print(f"      - {model}: {count} clear wins")
print("   Stats Models:")
if len(stats_model_wins) > 0:
    for model, count in stats_model_wins.head(3).items():
        print(f"      - {model}: {count} clear wins")

print("\n6. LARGEST IMPROVEMENTS:")
if len(ml_clear_wins_df) > 0:
    best_ml = ml_clear_wins_df.nlargest(1, 'pct_improvement').iloc[0]
    print(f"   - Best ML improvement: {best_ml['unique_id']} ({best_ml['pct_improvement']:.2f}%)")
if len(stats_clear_wins_df) > 0:
    best_stats = stats_clear_wins_df.nsmallest(1, 'pct_improvement').iloc[0]
    print(f"   - Best Stats improvement: {best_stats['unique_id']} ({abs(best_stats['pct_improvement']):.2f}%)")

print("\n" + "="*80)
print("ANALYSIS COMPLETE")
print("="*80)
print("\nGenerated files:")
print("1. Dashboard_UR_ML.png - Units Reimbursed ML Models")
print("2. Dashboard_UR_Stats.png - Units Reimbursed Stats Models")
print("3. Dashboard_NoP_ML.png - Number of Prescriptions ML Models")
print("4. Dashboard_NoP_Stats.png - Number of Prescriptions Stats Models")
print("5. Dashboard_Overall_Comparison.png - Overall Comparison")
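print("6. Detailed_Comparison_Results.xlsx - Detailed Excel Report")
# `output_file` still holds the Excel path set in the export cell above
print(f"\nExcel report saved to: {output_file}")
print(f"Dashboards saved to: {Path(output_file).parent / 'Plots'}")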
print("="*80)

KEY FINDINGS SUMMARY
Gray zone threshold: 5.0%

1. OVERALL PERFORMANCE:
   - ML models clearly won 150 out of 634 comparisons
   - Stats models clearly won 415 comparisons
   - No significant difference in 69 comparisons
   - ML win rate: 23.7%
   - Stats win rate: 65.5%
   - Gray zone rate: 10.9%
   - Average improvement when ML clearly wins: 24.17%
   - Average improvement when Stats clearly wins: 300.54%

2. BY STATE:
   - MI: ML=44 (28.2%), Stats=89 (57.1%), Gray=23 (14.7%)
   - IN: ML=34 (21.2%), Stats=110 (68.8%), Gray=16 (10.0%)
   - IL: ML=38 (24.4%), Stats=102 (65.4%), Gray=16 (10.3%)
   - OH: ML=34 (21.0%), Stats=114 (70.4%), Gray=14 (8.6%)

3. BY METRIC:
   - Units Reimbursed: ML=74 (23.3%), Stats=207 (65.3%), Gray=36 (11.4%)
   - Number of Prescriptions: ML=76 (24.0%), Stats=208 (65.6%), Gray=33 (10.4%)

4. MOST FREQUENTLY SELECTED MODELS:
   ML Models:
      - LightGBM: 255 (40.2%)
      - XGBoost: 206 (32.5%)
      - RandomForest: 143 (22.6%)
   Stats Models:
      - Hist