# 🏆 Notebook 05B: Model Tournament (4 Algorithms)

**Objective:** Select top 2 models for Phase 3 sampling optimization

**Critical Questions:**
1. Which algorithms handle 4.8:1 class imbalance best?
2. Do boosting methods outperform bagging (Random Forest)?
3. Which 2 models should we optimize in Phase 3?

**Models to Test:**
1. **Random Forest** - Bagging ensemble (baseline: 21.0% recall)
2. **XGBoost** - Gradient boosting (often best for tabular data)
3. **AdaBoost** - Adaptive boosting (sequential error correction)
4. **CatBoost** - Category-optimized gradient boosting

**Configuration:**
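- Default hyperparameters (no tuning), except that every model is capped at 100 trees/iterations
- Class imbalance handling (balanced weights / scale_pos_weight); AdaBoost has no equivalent setting and runs unweighted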
- 3-fold stratified cross-validation
- Same features (18 raw features from Phase 1)
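
As a rough illustration of the imbalance handling listed above, the sketch below computes the two quantities involved: the "balanced" class weights (`n_samples / (n_classes * class_count)`) and the negative-to-positive ratio passed as `scale_pos_weight`. The counts are hypothetical values chosen to give a 4.8:1 ratio, and the helper names `balanced_class_weights` and `neg_pos_ratio` are made up for this example.

```python
def balanced_class_weights(counts):
    """Weight per class: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def neg_pos_ratio(counts, negative=0, positive=1):
    """Ratio of negative to positive samples (the usual scale_pos_weight value)."""
    return counts[negative] / counts[positive]


# Hypothetical class counts (not the real dataset), chosen to give 4.8:1
counts = {0: 4800, 1: 1000}

weights = balanced_class_weights(counts)
ratio = neg_pos_ratio(counts)

print(f"Balanced weights: {weights}")
print(f"Negative/positive ratio: {ratio:.1f}")
print(f"Weight ratio (class 1 / class 0): {weights[1] / weights[0]:.1f}")
```

Both approaches end up weighting the minority class about 4.8 times more heavily than the majority class; they differ only in how the weights are normalized.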

**Selection Criteria:**
- **Primary:** Recall (catch At-Risk patients)
- **Secondary:** ROC-AUC (overall discrimination)
- **Tiebreaker:** Training time
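
One way to apply all three criteria in order is a composite sort key. This is only a sketch: the scores are invented for illustration, the model labels A to D are placeholders, and rounding to three decimals before comparing is an assumption about what counts as a tie. The ranking cell in Step 6 sorts by recall alone.

```python
# Invented scores for illustration only (not tournament results)
candidates = [
    {"model": "A", "recall": 0.62, "roc_auc": 0.81, "time_sec": 40.0},
    {"model": "B", "recall": 0.62, "roc_auc": 0.83, "time_sec": 95.0},
    {"model": "C", "recall": 0.55, "roc_auc": 0.85, "time_sec": 12.0},
    {"model": "D", "recall": 0.62, "roc_auc": 0.83, "time_sec": 30.0},
]


def selection_key(row, ndigits=3):
    """Higher recall first, then higher ROC-AUC, then lower training time."""
    return (
        -round(row["recall"], ndigits),
        -round(row["roc_auc"], ndigits),
        row["time_sec"],
    )


ranked = sorted(candidates, key=selection_key)
ranked_names = [row["model"] for row in ranked]
top_2_names = ranked_names[:2]

print("Ranking:", ranked_names)
print("Top 2:", top_2_names)
```

Here D and B tie on recall and ROC-AUC, so the faster model D ranks first; C has the best ROC-AUC but loses on the primary metric.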

---

## 📦 Step 1: Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.metrics import (
    make_scorer, recall_score, precision_score, f1_score, roc_auc_score,
    confusion_matrix, classification_report
)
from sklearn.pipeline import Pipeline
import time
import json
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("✅ Setup complete")

## 📊 Step 2: Load Selected Features from Phase 1

In [None]:
print("=" * 60)
print("LOADING DATA FROM PHASE 1 DECISION")
print("=" * 60)

# Load selected features (Phase 1 output)
df = pd.read_csv('selected_features_05A.csv')

# Separate features and target
X = df.drop('Target', axis=1)
y = df['Target']

print(f"\n📊 Dataset: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"   Features: {X.shape[1]}")
print(f"\n📋 Selected Features ({X.shape[1]}):")
for i, col in enumerate(X.columns, 1):
    print(f"   {i:2d}. {col}")

# Check target distribution
class_counts = y.value_counts().sort_index()
print(f"\n📊 Target Distribution:")
print(f"   Class 0 (Healthy):  {class_counts[0]:7,} ({class_counts[0]/len(y)*100:.2f}%)")
print(f"   Class 1 (At Risk):  {class_counts[1]:7,} ({class_counts[1]/len(y)*100:.2f}%)")
print(f"   Imbalance ratio: {class_counts[0]/class_counts[1]:.1f}:1")

# Load Phase 1 decision metadata
with open('feature_engineering_decision.json', 'r') as f:
    phase1_decision = json.load(f)

print(f"\n📋 Phase 1 Baseline (Random Forest):")
print(f"   Recall: {phase1_decision['baseline_recall']:.3f}")
print(f"   → This is the benchmark to beat!")

## 🔀 Step 3: Train-Test Split

In [None]:
print("=" * 60)
print("CREATING TRAIN-TEST SPLIT")
print("=" * 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\n📊 Split:")
print(f"   Train: {X_train.shape[0]:,} samples")
print(f"   Test:  {X_test.shape[0]:,} samples")

# Verify stratification
train_dist = y_train.value_counts(normalize=True).sort_index()
test_dist = y_test.value_counts(normalize=True).sort_index()

print(f"\n✅ Stratification verified:")
print(f"   Train At-Risk: {train_dist[1]:.3f}")
print(f"   Test At-Risk:  {test_dist[1]:.3f}")
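
To show what `stratify=y` is doing conceptually, here is a simplified pure-Python sketch that splits each class separately so both partitions keep the same At-Risk rate. It is not scikit-learn's implementation; the function name `stratified_split` and the label counts are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class contributes the same fraction to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels with a 4.8:1 imbalance
labels = [0] * 480 + [1] * 100

train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)

print(f"Train At-Risk rate: {train_rate:.3f}")
print(f"Test At-Risk rate:  {test_rate:.3f}")
```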

## 🤖 Step 4: Define 4 Models with Default Parameters

In [None]:
print("=" * 60)
print("DEFINING 4 MODELS (DEFAULT PARAMS + BALANCED WEIGHTS)")
print("=" * 60)

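# Calculate scale_pos_weight for XGBoost (negatives / positives in the training split;
# CatBoost uses auto_class_weights instead)
train_counts = y_train.value_counts()
scale_pos_weight = train_counts[0] / train_counts[1]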

# Define models
models = {
    'Random Forest': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', RandomForestClassifier(
            n_estimators=100,        # Default
            class_weight='balanced', # Handle imbalance
            random_state=42,
            n_jobs=-1
        ))
    ]),
    
    'XGBoost': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', XGBClassifier(
            n_estimators=100,                      # Default
            scale_pos_weight=scale_pos_weight,     # Handle imbalance
            random_state=42,
            n_jobs=-1,
            eval_metric='logloss'                  # Suppress warning
        ))
    ]),
    
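    'AdaBoost': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', AdaBoostClassifier(
            n_estimators=100,        # Matches the other models (library default is 50)
            algorithm='SAMME',       # Discrete boosting; argument is deprecated in recent scikit-learn
            random_state=42          # Note: no class_weight option, so no imbalance handling here
        ))
    ]),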
    
    'CatBoost': Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', CatBoostClassifier(
            iterations=100,                  # Matches the other models (library default is 1000)
            auto_class_weights='Balanced',   # Handle imbalance
            random_state=42,
            verbose=0                        # Suppress output
        ))
    ])
}

print(f"\n📋 Models defined:")
print(f"   1. Random Forest (Bagging ensemble)")
print(f"   2. XGBoost (Gradient boosting)")
print(f"   3. AdaBoost (Adaptive boosting)")
print(f"   4. CatBoost (Categorical boosting)")

print(f"\n💡 All models use:")
print(f"   • Default hyperparameters apart from 100 trees/iterations (no tuning)")
print(f"   • Class weighting for imbalance (RF, XGBoost, CatBoost; AdaBoost is unweighted)")
print(f"   • StandardScaler (feature normalization)")
print(f"   • n_estimators=100 (fair comparison)")

## 🧪 Step 5: Cross-Validation (3-Fold)

In [None]:
print("=" * 60)
print("RUNNING 3-FOLD CROSS-VALIDATION ON ALL MODELS")
print("=" * 60)

# Define scoring metrics (built-in scorer names work across scikit-learn versions;
# 'roc_auc' scores predicted probabilities/decision values, not hard labels)
scoring = {
    'recall': 'recall',
    'precision': 'precision',
    'f1': 'f1',
    'roc_auc': 'roc_auc'
}

cv_results = {}

print(f"\n⏳ Running cross-validation (this may take 5-10 minutes)...\n")

for model_name, pipeline in models.items():
    print(f"{'='*60}")
    print(f"Testing: {model_name}")
    print(f"{'='*60}")
    
    start_time = time.time()
    
    # Run cross-validation
    cv_scores = cross_validate(
        pipeline, X_train, y_train,
        cv=3,
        scoring=scoring,
        n_jobs=-1,
        return_train_score=False
    )
    
    elapsed_time = time.time() - start_time
    
    # Calculate mean and std for each metric
    recall_mean = cv_scores['test_recall'].mean()
    recall_std = cv_scores['test_recall'].std()
    precision_mean = cv_scores['test_precision'].mean()
    f1_mean = cv_scores['test_f1'].mean()
    roc_auc_mean = cv_scores['test_roc_auc'].mean()
    
    print(f"\n📊 Cross-Validation Results (3 folds):")
    print(f"   Recall:    {recall_mean:.3f} ± {recall_std:.3f} ← PRIMARY")
    print(f"   Precision: {precision_mean:.3f}")
    print(f"   F1-Score:  {f1_mean:.3f}")
    print(f"   ROC-AUC:   {roc_auc_mean:.3f}")
    print(f"   Time:      {elapsed_time:.1f} seconds")
    
    # Store results
    cv_results[model_name] = {
        'recall_mean': recall_mean,
        'recall_std': recall_std,
        'precision_mean': precision_mean,
        'f1_mean': f1_mean,
        'roc_auc_mean': roc_auc_mean,
        'time': elapsed_time,
        'recall_scores': cv_scores['test_recall']
    }
    
    print()

print("=" * 60)
print("✅ CROSS-VALIDATION COMPLETE")
print("=" * 60)
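
A note on the `±` values printed above: calling `.std()` on a NumPy array uses the population formula (`ddof=0`) by default. With only three folds, the sample standard deviation (`ddof=1`) is noticeably larger. The sketch below shows the difference with the standard library; the per-fold recall values are hypothetical.

```python
import statistics

# Hypothetical per-fold recall scores (not actual results)
fold_recalls = [0.60, 0.64, 0.62]

mean_recall = statistics.fmean(fold_recalls)
population_std = statistics.pstdev(fold_recalls)  # ddof=0, same convention as ndarray.std()
sample_std = statistics.stdev(fold_recalls)       # ddof=1

print(f"Mean recall:    {mean_recall:.3f}")
print(f"Std (ddof=0):   {population_std:.4f}")
print(f"Std (ddof=1):   {sample_std:.4f}")
```

Either convention is defensible as long as it is reported consistently; the sample version is the more conservative choice when judging whether two models really differ.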

## 📊 Step 6: Comparison & Ranking

In [None]:
print("=" * 60)
print("MODEL TOURNAMENT RESULTS")
print("=" * 60)

# Create comparison dataframe
comparison_df = pd.DataFrame([
    {
        'Model': model_name,
        'Recall_Mean': results['recall_mean'],
        'Recall_Std': results['recall_std'],
        'Precision': results['precision_mean'],
        'F1': results['f1_mean'],
        'ROC_AUC': results['roc_auc_mean'],
        'Time_Sec': results['time']
    }
    for model_name, results in cv_results.items()
]).sort_values('Recall_Mean', ascending=False)

print("\n📊 Ranked by Recall (Primary Metric):\n")
print(comparison_df.to_string(index=False))

# Identify top 2
top_2 = comparison_df.head(2)
winner = top_2.iloc[0]
runner_up = top_2.iloc[1]

print("\n" + "=" * 60)
print("TOP 2 MODELS")
print("=" * 60)

print(f"\n🥇 WINNER: {winner['Model']}")
print(f"   Recall: {winner['Recall_Mean']:.3f} ± {winner['Recall_Std']:.3f}")
print(f"   Precision: {winner['Precision']:.3f}")
print(f"   F1-Score: {winner['F1']:.3f}")
print(f"   ROC-AUC: {winner['ROC_AUC']:.3f}")

print(f"\n🥈 RUNNER-UP: {runner_up['Model']}")
print(f"   Recall: {runner_up['Recall_Mean']:.3f} ± {runner_up['Recall_Std']:.3f}")
print(f"   Gap: {winner['Recall_Mean'] - runner_up['Recall_Mean']:.3f} ({(winner['Recall_Mean'] - runner_up['Recall_Mean'])*100:.1f} percentage points)")

# Compare to Phase 1 baseline
baseline_recall = phase1_decision['baseline_recall']
winner_improvement = winner['Recall_Mean'] - baseline_recall

print(f"\n📈 Improvement over Phase 1 Baseline:")
print(f"   Baseline (Random Forest, single run): {baseline_recall:.3f}")
print(f"   Winner ({winner['Model']}, 3-fold CV): {winner['Recall_Mean']:.3f}")
print(f"   Improvement: {winner_improvement:+.3f} ({winner_improvement/baseline_recall*100:+.1f}%)")
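
The output above mixes two units: the gap between models is reported in percentage points, while the improvement over baseline is reported as a relative percentage. A small worked example, using the 21.0% baseline from the introduction and a hypothetical new recall of 0.60, makes the difference concrete.

```python
baseline = 0.21   # Phase 1 baseline recall quoted in the introduction
candidate = 0.60  # hypothetical recall, for illustration only

absolute_gain = candidate - baseline            # difference in recall
percentage_points = absolute_gain * 100         # same difference, in points
relative_gain_pct = absolute_gain / baseline * 100

print(f"Absolute gain:   {absolute_gain:+.3f}")
print(f"Percentage pts:  {percentage_points:+.1f}")
print(f"Relative gain:   {relative_gain_pct:+.1f}%")
```

A low baseline inflates the relative figure, so the percentage-point value is usually the safer number to quote.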

## 📈 Step 7: Visualization

In [None]:
# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

models_list = comparison_df['Model'].tolist()
colors = ['#2ecc71', '#3498db', '#e74c3c', '#f39c12']

# 1. Recall with error bars
axes[0, 0].barh(models_list, comparison_df['Recall_Mean'], xerr=comparison_df['Recall_Std'],
               color=colors, capsize=5, alpha=0.7)
axes[0, 0].set_xlabel('Recall (At-Risk)', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Recall with Standard Deviation\n(Higher is Better)', 
                    fontsize=13, fontweight='bold')
axes[0, 0].axvline(x=0.70, color='green', linestyle='--', alpha=0.5, label='Target (70%)')
axes[0, 0].axvline(x=baseline_recall, color='red', linestyle='--', alpha=0.5, 
                  label=f'Baseline ({baseline_recall:.3f})')
axes[0, 0].legend()
for i, (v, std) in enumerate(zip(comparison_df['Recall_Mean'], comparison_df['Recall_Std'])):
    axes[0, 0].text(v + std + 0.01, i, f'{v:.3f}', va='center', fontweight='bold')

# 2. Precision
axes[0, 1].barh(models_list, comparison_df['Precision'], color=colors, alpha=0.7)
axes[0, 1].set_xlabel('Precision (At-Risk)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('Precision\n(Higher is Better)', fontsize=13, fontweight='bold')
for i, v in enumerate(comparison_df['Precision']):
    axes[0, 1].text(v + 0.01, i, f'{v:.3f}', va='center', fontweight='bold')

# 3. F1-Score
axes[1, 0].barh(models_list, comparison_df['F1'], color=colors, alpha=0.7)
axes[1, 0].set_xlabel('F1-Score', fontsize=12, fontweight='bold')
axes[1, 0].set_title('F1-Score (Harmonic Mean)\n(Higher is Better)', 
                    fontsize=13, fontweight='bold')
for i, v in enumerate(comparison_df['F1']):
    axes[1, 0].text(v + 0.01, i, f'{v:.3f}', va='center', fontweight='bold')

# 4. ROC-AUC
axes[1, 1].barh(models_list, comparison_df['ROC_AUC'], color=colors, alpha=0.7)
axes[1, 1].set_xlabel('ROC-AUC', fontsize=12, fontweight='bold')
axes[1, 1].set_title('ROC-AUC Score\n(Higher is Better)', fontsize=13, fontweight='bold')
axes[1, 1].axvline(x=0.5, color='red', linestyle='--', alpha=0.3, label='Random (0.5)')
axes[1, 1].legend()
for i, v in enumerate(comparison_df['ROC_AUC']):
    axes[1, 1].text(v + 0.01, i, f'{v:.3f}', va='center', fontweight='bold')

plt.tight_layout()
plt.show()

## 🔍 Step 8: Critical Analysis - Why These Results?

In [None]:
print("=" * 60)
print("CRITICAL ANALYSIS: WHY DID EACH MODEL PERFORM THIS WAY?")
print("=" * 60)

for rank, (_, row) in enumerate(comparison_df.iterrows(), start=1):
    model_name = row['Model']
    recall = row['Recall_Mean']
    
    # comparison_df is already sorted by recall, so position equals rank
    rank_emoji = {1: '🥇', 2: '🥈', 3: '🥉', 4: '4️⃣'}[rank]
    
    print(f"\n{rank_emoji} {model_name} (Recall: {recall:.3f})")
    print(f"{'='*60}")
    
    if model_name == 'Random Forest':
        if recall > 0.25:
            print("✅ Strengths:")
            print("   • Bagging ensemble reduces variance")
            print("   • Handles non-linear relationships well")
            print("   • Balanced class weights work effectively")
            print("   • Robust to overfitting with 100 trees")
        else:
            print("⚠️ Limitations:")
            print("   • Each tree trained independently (no sequential learning)")
            print("   • May not capture complex patterns as well as boosting")
            print("   • Default trees are fully grown (max_depth=None); depth tuning may help")
    
    elif model_name == 'XGBoost':
        if recall > 0.25:
            print("✅ Strengths:")
            print("   • Sequential tree building corrects previous errors")
            print("   • scale_pos_weight handles imbalance effectively")
            print("   • Regularization prevents overfitting")
            print("   • Often best-in-class for tabular data")
        else:
            print("⚠️ Possible Issues:")
            print("   • Default learning_rate (0.3) may be too high")
            print("   • May need more trees (n_estimators)")
            print("   • scale_pos_weight alone may not be enough")
    
    elif model_name == 'AdaBoost':
        if recall < 0.20:
            print("❌ Known Limitations:")
            print("   • Uses simple decision stumps (depth=1) by default")
            print("   • Sensitive to noisy data and outliers")
            print("   • No native class_weight support")
            print("   • Often underperforms on imbalanced data")
        else:
            print("⚠️ Moderate Performance:")
            print("   • Sequential learning helps but limited by weak learners")
            print("   • SAMME boosts on hard class votes rather than probabilities")
            print("   • May struggle with 4.8:1 imbalance")
    
    elif model_name == 'CatBoost':
        if recall > 0.25:
            print("✅ Strengths:")
            print("   • Ordered boosting reduces overfitting")
            print("   • Auto class weights handle imbalance")
            print("   • Good default hyperparameters")
            print("   • Handles categorical features well")
        else:
            print("⚠️ Possible Issues:")
            print("   • May need more iterations (100 used here; library default is 1000)")
            print("   • auto_class_weights='Balanced' may not be aggressive enough")
            print("   • Could benefit from depth tuning")
    
    print()

print("=" * 60)
print("KEY INSIGHTS")
print("=" * 60)

# Check if there's a clear winner
top_2_gap = winner['Recall_Mean'] - runner_up['Recall_Mean']
if top_2_gap < 0.01:
    print("\n⚖️ Top 2 are VERY close:")
    print(f"   Gap of only {top_2_gap:.3f} means both models are viable")
    print(f"   Sampling strategies (Phase 3) will be the real differentiator")
else:
    print(f"\n🏆 Clear winner: {winner['Model']}")
    print(f"   {top_2_gap*100:.1f} percentage point lead (above the 0.01 threshold; not a statistical test)")
    print(f"   But both top 2 proceed to Phase 3 for fair comparison")
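
The fixed 0.01 threshold above ignores how much recall varies between folds. One rough alternative, sketched here as a heuristic rather than a formal significance test, is to express the gap in units of the combined fold spread. The function name `gap_in_std_units` and the input numbers are hypothetical.

```python
import math


def gap_in_std_units(mean_a, std_a, mean_b, std_b):
    """Absolute gap between two mean scores divided by their combined spread."""
    combined = math.sqrt(std_a ** 2 + std_b ** 2)
    if combined == 0:
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined


# Hypothetical means and fold standard deviations for two models
ratio = gap_in_std_units(0.64, 0.02, 0.61, 0.02)
print(f"Gap is {ratio:.2f}x the combined fold spread")
```

A ratio near or below 1 suggests the lead could easily be fold noise. With only three folds the spread estimate itself is unreliable, so this should be read as a sanity check only.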

## 💾 Step 9: Save Results & Selection for Phase 3

In [None]:
print("=" * 60)
print("SAVING RESULTS FOR PHASE 3")
print("=" * 60)

# Save comparison dataframe
comparison_df.to_csv('model_tournament_results.csv', index=False)
print(f"\n✅ Saved: model_tournament_results.csv")

# Save top 2 models selection
top_2_models = top_2['Model'].tolist()
selection_data = {
    'top_2_models': top_2_models,
    'winner': {
        'name': winner['Model'],
        'recall_mean': float(winner['Recall_Mean']),
        'recall_std': float(winner['Recall_Std']),
        'precision': float(winner['Precision']),
        'f1': float(winner['F1']),
        'roc_auc': float(winner['ROC_AUC'])
    },
    'runner_up': {
        'name': runner_up['Model'],
        'recall_mean': float(runner_up['Recall_Mean']),
        'recall_std': float(runner_up['Recall_Std']),
        'precision': float(runner_up['Precision']),
        'f1': float(runner_up['F1']),
        'roc_auc': float(runner_up['ROC_AUC'])
    },
    'baseline_recall': baseline_recall,
    'winner_improvement': float(winner_improvement)
}

with open('top_2_models_selection.json', 'w') as f:
    json.dump(selection_data, f, indent=2)

print(f"✅ Saved: top_2_models_selection.json")

print("\n" + "=" * 60)
print("🎯 PHASE 2 COMPLETE: MODEL TOURNAMENT")
print("=" * 60)

print(f"\n🏆 Selected for Phase 3 (Sampling Strategies):")
print(f"   1️⃣ {top_2_models[0]}")
print(f"   2️⃣ {top_2_models[1]}")

print(f"\n📊 Current Performance:")
print(f"   Best Recall: {winner['Recall_Mean']:.3f}")
print(f"   Baseline:    {baseline_recall:.3f}")
print(f"   Improvement: {winner_improvement:+.3f} ({winner_improvement/baseline_recall*100:+.1f}%)")

print(f"\n🎯 Next: Phase 3 - Sampling Strategy Battle")
print(f"   Notebook: 05C_sampling_strategies.ipynb")
print(f"   Will test 3 sampling methods on both selected models")
print(f"   Goal: Improve recall from {winner['Recall_Mean']:.3f} toward 0.70+")
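
For the hand-off to Phase 3, the saved `top_2_models_selection.json` can be read back with a small loader. This is a sketch only: how `05C_sampling_strategies.ipynb` actually loads the file is not shown here, the helper name `load_selection` is hypothetical, and the demo writes placeholder values to a temporary directory so it never touches the real output.

```python
import json
import tempfile
from pathlib import Path


def load_selection(path):
    """Read the Phase 2 selection file and return (model names, baseline recall)."""
    with open(path, "r") as f:
        data = json.load(f)
    selected_models = data["top_2_models"]
    if len(selected_models) != 2:
        raise ValueError(f"Expected 2 models, found {len(selected_models)}")
    return selected_models, data["baseline_recall"]


# Demo with placeholder content in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "top_2_models_selection.json"
    demo_payload = {"top_2_models": ["Model A", "Model B"], "baseline_recall": 0.21}
    demo_path.write_text(json.dumps(demo_payload, indent=2))

    selected, baseline = load_selection(demo_path)

print(f"Selected models: {selected}")
print(f"Baseline recall: {baseline:.3f}")
```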