# Week 10+ Enhancement Debug: Ablation Study

**Objective**: Systematically test each Week 10+ enhancement on 2024 holdout data to identify which features hurt performance.

**Baseline**: Week 9 legacy model (expected ~60% on 2024)

**Enhancements to Test**:
1. Temporal Weighting: exp(-0.15 × years_ago)
2. Momentum Features: momentum_last3, momentum_advantage
3. Vegas Spread: vegas_spread feature
4. Injury Estimates: injury_pct calculations
5. Defensive Stats: defensive_ypg, defensive_ppg
6. 4th Model (GB): Gradient Boosting Classifier
7. Increased Depths: RF 5→15, XGB 5→8
8. Full Week 10: All enhancements together

**Success Criteria**: Identify which specific features caused the -7.7% accuracy drop.
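To make the temporal weighting formula concrete, here is a minimal sketch of the per-season sample weights it produces. The 2015-2024 span is an assumption chosen to match the data loaded later in this notebook, and `DECAY` and `weights` are illustrative names only.

```python
import math

DECAY = 0.15                        # decay rate from the formula above
seasons = list(range(2015, 2025))   # assumed training span (2015-2024)

latest = max(seasons)
# Weight for each season: exp(-DECAY * years_ago), with the latest season at 1.0
weights = {s: math.exp(-DECAY * (latest - s)) for s in seasons}

for season, w in weights.items():
    print(f"{season}: {w:.3f}")
```

Under these assumptions the most recent season keeps full weight, while a 2015 game counts for roughly a quarter as much as a 2024 game.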

## Setup: Install Dependencies

In [1]:
# Install required packages
!pip install -q xgboost nfl_data_py scikit-learn pandas numpy matplotlib seaborn

## Import Libraries

In [2]:
import pandas as pd
import numpy as np
import nfl_data_py as nfl
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score, classification_report
from sklearn.feature_selection import RFECV
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

Libraries imported successfully!


## Load 2024 Holdout Data

We'll test all configurations on the full 2024 season (Weeks 1-18) using walk-forward validation.
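As a rough illustration of walk-forward validation, the sketch below enumerates which (season, week) pairs are available for training at each test week. `walk_forward_splits` is a hypothetical helper, not part of this notebook; it only shows the rule that a test week never sees itself or any later week.

```python
def walk_forward_splits(history_seasons, test_season, weeks):
    """Yield (train_keys, test_key) pairs, where each key is (season, week).

    Illustrative helper only; the notebook builds real feature matrices instead.
    """
    history = [(s, w) for s in history_seasons for w in weeks]
    for test_week in weeks:
        # Only weeks strictly before the test week are visible for training.
        current = [(test_season, w) for w in weeks if w < test_week]
        yield history + current, (test_season, test_week)


splits = list(walk_forward_splits(range(2015, 2024), 2024, range(1, 19)))
train_keys, test_key = splits[2]  # third fold: predicting 2024 week 3
print(test_key, len(train_keys))
```

Each fold adds one more 2024 week to the training set, so later weeks are predicted with slightly more in-season data than earlier ones.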

In [3]:
print("Loading NFL data for ablation study...")
print("This may take 5-10 minutes on first run...\n")

# Load data 2015-2024 for training/testing
START_YEAR = 2015
END_YEAR = 2024
TEST_YEAR = 2024

# Load play-by-play, weekly stats, and schedule
print(f"Downloading play-by-play data ({START_YEAR}-{END_YEAR})...")
pbp_data = nfl.import_pbp_data(years=range(START_YEAR, END_YEAR + 1))
print(f"  Loaded {len(pbp_data):,} plays")

print(f"\nDownloading weekly player statistics...")
weekly_data = nfl.import_weekly_data(years=range(START_YEAR, END_YEAR + 1))
print(f"  Loaded {len(weekly_data):,} player-game records")

print(f"\nDownloading schedule data...")
schedule_data = nfl.import_schedules(years=range(START_YEAR, END_YEAR + 1))
print(f"  Loaded {len(schedule_data):,} scheduled games")

print("\n✅ Data loading complete!")

Loading NFL data for ablation study...
This may take 5-10 minutes on first run...

Downloading play-by-play data (2015-2024)...
2015 done.
2016 done.
2017 done.
2018 done.
2019 done.
2020 done.
2021 done.
2022 done.
2023 done.
2024 done.
Downcasting floats.
  Loaded 483,605 plays

Downloading weekly player statistics...
Downcasting floats.
  Loaded 54,479 player-game records

Downloading schedule data...
  Loaded 2,743 scheduled games

✅ Data loading complete!


## Build Feature Engineering Functions

We'll create modular functions for each feature type so we can enable/disable them individually.
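The functions in the next cell all share one pattern: filter to a team's games strictly before the target week, then average per game. Here is a minimal sketch of that pattern on made-up data. The toy rows and the `season_to_date_ypg` helper are hypothetical; only the column names mirror those used in this notebook.

```python
import pandas as pd

# Toy player-level rows (two players per week); values are invented.
toy = pd.DataFrame({
    "recent_team": ["KC"] * 6,
    "season": [2024] * 6,
    "week": [1, 1, 2, 2, 3, 3],
    "passing_yards": [250, 0, 300, 0, 400, 0],
    "rushing_yards": [10, 90, 20, 80, 5, 120],
})


def season_to_date_ypg(df, team, season, week):
    """Total yards per game using only games played before `week`."""
    prior = df[(df["recent_team"] == team) & (df["season"] == season) & (df["week"] < week)]
    if prior.empty:
        return None  # no history yet, e.g. week 1
    n_games = prior["week"].nunique()  # rows are per player, so count distinct weeks
    return (prior["passing_yards"].sum() + prior["rushing_yards"].sum()) / n_games


print(season_to_date_ypg(toy, "KC", 2024, 3))  # uses weeks 1-2 only
print(season_to_date_ypg(toy, "KC", 2024, 1))  # no prior games
```

The strict `<` comparison is what keeps the target week's own statistics out of its features.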

In [4]:
def create_team_features_legacy(weekly_data, team, season, week):
    """
    Legacy Week 9 feature engineering (baseline).
    Simple season-to-date averages, no momentum or injuries.
    """
    team_stats = weekly_data[
        (weekly_data['recent_team'] == team) &
        (weekly_data['season'] == season) &
        (weekly_data['week'] < week)
    ]
    
    if team_stats.empty:
        return {}
    
    # Number of distinct games played so far (rows are per player, not per game)
    n_games = team_stats['week'].nunique()
    
    # Basic offensive stats, averaged per game
    features = {
        'passing_ypg': team_stats['passing_yards'].sum() / n_games,
        'rushing_ypg': team_stats['rushing_yards'].sum() / n_games,
        'total_ypg': (team_stats['passing_yards'].sum() + team_stats['rushing_yards'].sum()) / n_games,
        # Proxy for scoring: summed player fantasy points per week, not actual points scored
        'points_pg': team_stats.groupby('week')['fantasy_points'].sum().mean(),
        'passing_tds_pg': team_stats['passing_tds'].sum() / n_games,
        # Falls back to 0 fumbles when the 'fumbles_lost' column is absent
        'turnovers_pg': (team_stats['interceptions'].sum() + team_stats.get('fumbles_lost', pd.Series([0])).sum()) / n_games,
    }
    
    return features

def add_momentum_features(features, weekly_data, team, season, week):
    """
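    Add a momentum proxy: mean weekly team fantasy points over the last 3 weeks
    played, divided by 30. This is not a true win percentage.
    Week 10+ enhancement.
    """
    # Simplified version: a full implementation would derive wins from schedule_data.
    # For the ablation study, recent fantasy-point output stands in for recent form.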
    team_stats = weekly_data[
        (weekly_data['recent_team'] == team) &
        (weekly_data['season'] == season) &
        (weekly_data['week'] < week)
    ]
    
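    # Note: len(team_stats) counts player-game rows, not games, so this check can
    # pass after a single week; early-season values may cover fewer than 3 weeks.
    if len(team_stats) >= 3:
        last_3_weeks = sorted(team_stats['week'].unique())[-3:]
        last_3_stats = team_stats[team_stats['week'].isin(last_3_weeks)]
        # Mean weekly fantasy points, scaled so that 30 points maps to 1.0
        features['momentum_last3'] = last_3_stats.groupby('week')['fantasy_points'].sum().mean() / 30.0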
    else:
        features['momentum_last3'] = 0.5
    
    return features

def add_injury_features(features, weekly_data, team, season, week):
    """
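    Add injury percentage estimates (from week-to-week yardage variance).
    Week 10+ enhancement - CIRCULAR LOGIC: no injury data is used; the value is
    derived from the same yardage stats that already feed the baseline features.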
    """
    team_stats = weekly_data[
        (weekly_data['recent_team'] == team) &
        (weekly_data['season'] == season) &
        (weekly_data['week'] < week)
    ]
    
    if len(team_stats) > 0:
        # Estimate injury impact from variance
        passing_std = team_stats.groupby('week')['passing_yards'].sum().std()
        rushing_std = team_stats.groupby('week')['rushing_yards'].sum().std()
        features['injury_pct'] = min(0.3, (passing_std + rushing_std) / 1000.0)
    else:
        features['injury_pct'] = 0.15
    
    return features

def add_defensive_features(features, pbp_data, team, season, week):
    """
    Add defensive stats (yards/points allowed).
    Week 10+ enhancement - POTENTIALLY HELPFUL.
    """
    defensive_plays = pbp_data[
        (pbp_data['defteam'] == team) &
        (pbp_data['season'] == season) &
        (pbp_data['week'] < week)
    ]
    
    if len(defensive_plays) > 0:
        weeks = defensive_plays['week'].nunique()
        features['defensive_ypg'] = defensive_plays['yards_gained'].sum() / weeks if weeks > 0 else 0
        # Estimate points allowed from EPA
        features['defensive_ppg'] = defensive_plays['epa'].sum() * 6 / weeks if weeks > 0 else 0
    else:
        features['defensive_ypg'] = 0
        features['defensive_ppg'] = 0
    
    return features

print("✅ Feature engineering functions created")

✅ Feature engineering functions created


## Build Configurable Model Class

This class allows us to enable/disable specific features for ablation testing.
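One way to keep the feature flags manageable is to define the baseline once and express each ablation as a small override. This is a sketch under that assumption: `BASE_CONFIG` and `make_config` are hypothetical names, while the keys are the ones the class below reads.

```python
# Baseline flags: every enhancement off, shallow trees.
BASE_CONFIG = {
    "use_momentum": False,
    "use_injuries": False,
    "use_defensive": False,
    "use_vegas": False,
    "temporal_weighting": False,
    "use_4th_model": False,
    "tree_max_depth": 5,
}


def make_config(**overrides):
    """Return a copy of the baseline with the given flags changed."""
    unknown = set(overrides) - set(BASE_CONFIG)
    if unknown:
        # Catch typos early instead of silently testing the baseline again.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**BASE_CONFIG, **overrides}


momentum_only = make_config(use_momentum=True)
deep_ensemble = make_config(use_4th_model=True, tree_max_depth=15)
print(momentum_only)
```

Because the class reads flags with `config.get(key, default)`, a misspelled key would otherwise fall back to the default without any warning, which is why the sketch rejects unknown keys.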

In [5]:
class ConfigurablePredictor:
    """
    NFL predictor with configurable features for ablation study.
    """
    
    def __init__(self, config):
        """
        Args:
            config: dict with feature flags
                {
                    'use_momentum': False,
                    'use_injuries': False,
                    'use_defensive': False,
                    'use_vegas': False,
                    'temporal_weighting': False,
                    'use_4th_model': False,
                    'tree_max_depth': 5,
                    'n_features': 10
                }
        """
        self.config = config
        self.model = None
        self.feature_names = []
    
    def build_features(self, pbp_data, weekly_data, schedule_data, season, week):
        """
        Build feature matrix for a specific season/week.
        """
        games = schedule_data[
            (schedule_data['season'] == season) &
            (schedule_data['week'] == week)
        ]
        
        X_list = []
        y_list = []
        
        for _, game in games.iterrows():
            home_team = game['home_team']
            away_team = game['away_team']
            
            # Legacy features
            home_features = create_team_features_legacy(weekly_data, home_team, season, week)
            away_features = create_team_features_legacy(weekly_data, away_team, season, week)
            
            if not home_features or not away_features:
                continue
            
            # Add momentum if enabled
            if self.config.get('use_momentum', False):
                home_features = add_momentum_features(home_features, weekly_data, home_team, season, week)
                away_features = add_momentum_features(away_features, weekly_data, away_team, season, week)
            
            # Add injuries if enabled
            if self.config.get('use_injuries', False):
                home_features = add_injury_features(home_features, weekly_data, home_team, season, week)
                away_features = add_injury_features(away_features, weekly_data, away_team, season, week)
            
            # Add defensive if enabled
            if self.config.get('use_defensive', False):
                home_features = add_defensive_features(home_features, pbp_data, home_team, season, week)
                away_features = add_defensive_features(away_features, pbp_data, away_team, season, week)
            
            # Combine features
            combined = {}
            for key in home_features.keys():
                combined[f'home_{key}'] = home_features[key]
                combined[f'away_{key}'] = away_features.get(key, 0)
            
            # Add advantage features
            combined['scoring_advantage'] = home_features.get('points_pg', 0) - away_features.get('points_pg', 0)
            combined['turnover_advantage'] = away_features.get('turnovers_pg', 0) - home_features.get('turnovers_pg', 0)
            
            if self.config.get('use_momentum', False):
                combined['momentum_advantage'] = home_features.get('momentum_last3', 0.5) - away_features.get('momentum_last3', 0.5)
            
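            # Contextual
            # Note: week 18 is a regular-season week from 2021 onward, so this flag
            # only marks playoff games for earlier seasons.
            combined['is_playoff'] = 1 if week >= 18 else 0
            combined['season'] = season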
            
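            # Vegas spread (if enabled) - PLACEHOLDER (would need external data).
            # The value is a constant 0, so it carries no information; any accuracy
            # change in this configuration cannot be attributed to real spread data.
            if self.config.get('use_vegas', False):
                combined['vegas_spread'] = 0  # Placeholder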
            
            X_list.append(combined)
            
            # Target: home team win
            if pd.notna(game.get('home_score')) and pd.notna(game.get('away_score')):
                y_list.append(1 if game['home_score'] > game['away_score'] else 0)
            else:
                y_list.append(None)
        
        X = pd.DataFrame(X_list)
        y = pd.Series(y_list)
        
        # Remove rows with missing labels
        valid_mask = y.notna()
        X = X[valid_mask]
        y = y[valid_mask]
        
        return X, y
    
    def train(self, X_train, y_train):
        """
        Train ensemble model with configured architecture.
        """
        depth = self.config.get('tree_max_depth', 5)
        use_4th = self.config.get('use_4th_model', False)
        
        # Build ensemble
        rf = RandomForestClassifier(n_estimators=200, max_depth=depth, random_state=42)
        lr = LogisticRegression(C=1.0, max_iter=1000, random_state=42)
        xgb_model = xgb.XGBClassifier(max_depth=depth, learning_rate=0.1, n_estimators=200, random_state=42)
        
        if use_4th:
            gb = GradientBoostingClassifier(max_depth=8, learning_rate=0.1, n_estimators=200, random_state=42)
            estimators = [('rf', rf), ('lr', lr), ('xgb', xgb_model), ('gb', gb)]
        else:
            estimators = [('rf', rf), ('lr', lr), ('xgb', xgb_model)]
        
        voting = VotingClassifier(estimators=estimators, voting='soft')
        
        # Calibrate
        self.model = CalibratedClassifierCV(voting, method='isotonic', cv=3)
        
        # Apply temporal weighting if enabled
        if self.config.get('temporal_weighting', False):
            current_year = X_train['season'].max()
            years_ago = current_year - X_train['season']
            weights = np.exp(-0.15 * years_ago)
            self.model.fit(X_train, y_train, sample_weight=weights)
        else:
            self.model.fit(X_train, y_train)
        
        self.feature_names = X_train.columns.tolist()
    
    def predict(self, X_test):
        """
        Generate predictions.
        """
        y_pred = self.model.predict(X_test)
        y_proba = self.model.predict_proba(X_test)[:, 1]
        return y_pred, y_proba

print("✅ ConfigurablePredictor class created")

✅ ConfigurablePredictor class created


## Ablation Study: Test Each Enhancement

We'll test 8 configurations on 2024 data using walk-forward validation.
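The test function below reports accuracy, Brier score, and a "high-confidence" accuracy. Here is a small sketch of how those three numbers relate, using invented outcomes and probabilities. One assumption to flag: predictions here come from a plain 0.5 threshold, whereas the notebook takes them from the model's `predict` method.

```python
# Made-up outcomes (1 = home win) and predicted home-win probabilities.
y_true = [1, 0, 1, 1, 0, 1]
p_home = [0.80, 0.25, 0.55, 0.45, 0.60, 0.90]
y_pred = [1 if p >= 0.5 else 0 for p in p_home]

n = len(y_true)
accuracy = sum(t == yp for t, yp in zip(y_true, y_pred)) / n

# Brier score: mean squared error of the probability forecast (lower is better).
brier = sum((p - t) ** 2 for t, p in zip(y_true, p_home)) / n

# High-confidence subset: probability above 0.70 or below 0.30.
hc = [(t, yp) for t, yp, p in zip(y_true, y_pred, p_home) if p > 0.70 or p < 0.30]
hc_accuracy = sum(t == yp for t, yp in hc) / len(hc) if hc else None

print(f"accuracy={accuracy:.3f} brier={brier:.3f} hc_accuracy={hc_accuracy} on {len(hc)} games")
```

High-confidence accuracy is computed on a subset of games, so it rests on far fewer samples than overall accuracy and varies more from run to run.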

In [6]:
def run_ablation_test(config_name, config, pbp_data, weekly_data, schedule_data):
    """
    Run single ablation test on 2024 data.
    
    Returns:
        dict with accuracy, HC accuracy, brier score, AUC
    """
    print(f"\n{'='*70}")
    print(f"Testing: {config_name}")
    print(f"Config: {config}")
    print(f"{'='*70}")
    
    predictor = ConfigurablePredictor(config)
    
    # Historical features (2015-2023) do not depend on the test week,
    # so build them once instead of on every walk-forward step.
    X_hist_list = []
    y_hist_list = []
    for train_year in range(2015, 2024):
        for train_week in range(1, 19):
            X_week, y_week = predictor.build_features(pbp_data, weekly_data, schedule_data, train_year, train_week)
            if len(X_week) > 0:
                X_hist_list.append(X_week)
                y_hist_list.append(y_week)
    
    # Walk-forward validation on 2024
    all_preds = []
    all_actuals = []
    all_probas = []
    
    for test_week in range(1, 19):  # Weeks 1-18
        print(f"  Testing week {test_week}...", end=" ")
        
        # Train on 2015-2023 + 2024 weeks 1 to test_week-1
        X_train_list = list(X_hist_list)
        y_train_list = list(y_hist_list)
        
        # 2024 data up to test_week-1
        for train_week in range(1, test_week):
            X_week, y_week = predictor.build_features(pbp_data, weekly_data, schedule_data, 2024, train_week)
            if len(X_week) > 0:
                X_train_list.append(X_week)
                y_train_list.append(y_week)
        
        if len(X_train_list) == 0:
            print("No training data")
            continue
        
        X_train = pd.concat(X_train_list, ignore_index=True)
        y_train = pd.concat(y_train_list, ignore_index=True)
        
        # Test on current week
        X_test, y_test = predictor.build_features(pbp_data, weekly_data, schedule_data, 2024, test_week)
        
        if len(X_test) == 0:
            print("No test data")
            continue
        
        # Train and predict
        predictor.train(X_train, y_train)
        y_pred, y_proba = predictor.predict(X_test)
        
        all_preds.extend(y_pred)
        all_actuals.extend(y_test)
        all_probas.extend(y_proba)
        
        week_acc = accuracy_score(y_test, y_pred)
        print(f"Acc: {week_acc:.1%}")
    
    # Calculate metrics
    accuracy = accuracy_score(all_actuals, all_preds)
    brier = brier_score_loss(all_actuals, all_probas)
    auc = roc_auc_score(all_actuals, all_probas)
    
    # High-confidence accuracy (>70%)
    hc_mask = np.array([(p > 0.70 or p < 0.30) for p in all_probas])
    if hc_mask.sum() > 0:
        hc_preds = np.array(all_preds)[hc_mask]
        hc_actuals = np.array(all_actuals)[hc_mask]
        hc_accuracy = accuracy_score(hc_actuals, hc_preds)
    else:
        hc_accuracy = None
    
    results = {
        'config_name': config_name,
        'accuracy': accuracy,
        'hc_accuracy': hc_accuracy,
        'brier_score': brier,
        'auc_roc': auc,
        'n_games': len(all_actuals),
        'n_hc_games': int(hc_mask.sum())
    }
    
    print(f"\n📊 RESULTS: {config_name}")
    print(f"  Accuracy: {accuracy:.1%}")
    # Compare against None explicitly so a genuine 0.0% is still printed
    print(f"  HC Accuracy: {hc_accuracy:.1%}" if hc_accuracy is not None else "  HC Accuracy: N/A")
    print(f"  Brier Score: {brier:.3f}")
    print(f"  AUC-ROC: {auc:.3f}")
    print(f"  Games: {len(all_actuals)}")
    
    return results

print("✅ Ablation test function ready")

✅ Ablation test function ready


## Run All 8 Ablation Tests

**WARNING**: This will take 2-4 hours to complete all tests.

In [7]:
# Define test configurations
test_configs = [
    (
        "1. BASELINE (Week 9 Legacy)",
        {
            'use_momentum': False,
            'use_injuries': False,
            'use_defensive': False,
            'use_vegas': False,
            'temporal_weighting': False,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "2. Baseline + Temporal Weighting",
        {
            'use_momentum': False,
            'use_injuries': False,
            'use_defensive': False,
            'use_vegas': False,
            'temporal_weighting': True,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "3. Baseline + Momentum Features",
        {
            'use_momentum': True,
            'use_injuries': False,
            'use_defensive': False,
            'use_vegas': False,
            'temporal_weighting': False,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "4. Baseline + Vegas Spread",
        {
            'use_momentum': False,
            'use_injuries': False,
            'use_defensive': False,
            'use_vegas': True,
            'temporal_weighting': False,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "5. Baseline + Injury Estimates",
        {
            'use_momentum': False,
            'use_injuries': True,
            'use_defensive': False,
            'use_vegas': False,
            'temporal_weighting': False,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "6. Baseline + Defensive Stats",
        {
            'use_momentum': False,
            'use_injuries': False,
            'use_defensive': True,
            'use_vegas': False,
            'temporal_weighting': False,
            'use_4th_model': False,
            'tree_max_depth': 5,
        }
    ),
    (
        "7. Baseline + Increased Depth + 4th Model",
        {
            'use_momentum': False,
            'use_injuries': False,
            'use_defensive': False,
            'use_vegas': False,
            'temporal_weighting': False,
            'use_4th_model': True,
            'tree_max_depth': 15,
        }
    ),
    (
        "8. FULL WEEK 10+ (All Enhancements)",
        {
            'use_momentum': True,
            'use_injuries': True,
            'use_defensive': True,
            'use_vegas': True,
            'temporal_weighting': True,
            'use_4th_model': True,
            'tree_max_depth': 15,
        }
    ),
]

print("\n" + "="*70)
print("STARTING ABLATION STUDY")
print("Testing 8 configurations on 2024 holdout data")
print("Expected duration: 2-4 hours")
print("="*70)

# Run all tests
ablation_results = []

for config_name, config in test_configs:
    result = run_ablation_test(config_name, config, pbp_data, weekly_data, schedule_data)
    ablation_results.append(result)

print("\n✅ ABLATION STUDY COMPLETE!")


STARTING ABLATION STUDY
Testing 8 configurations on 2024 holdout data
Expected duration: 2-4 hours

Testing: 1. BASELINE (Week 9 Legacy)
Config: {'use_momentum': False, 'use_injuries': False, 'use_defensive': False, 'use_vegas': False, 'temporal_weighting': False, 'use_4th_model': False, 'tree_max_depth': 5}
  Testing week 1... No test data
  Testing week 2... Acc: 56.2%
  Testing week 3... Acc: 50.0%
  Testing week 4... Acc: 62.5%
  Testing week 5... Acc: 64.3%
  Testing week 6... Acc: 57.1%
  Testing week 7... Acc: 86.7%
  Testing week 8... Acc: 62.5%
  Testing week 9... Acc: 66.7%
  Testing week 10... Acc: 64.3%
  Testing week 11... Acc: 57.1%
  Testing week 12... Acc: 76.9%
  Testing week 13... Acc: 56.2%
  Testing week 14... Acc: 84.6%
  Testing week 15... Acc: 87.5%
  Testing week 16... Acc: 75.0%
  Testing week 17... Acc: 68.8%
  Testing week 18... Acc: 62.5%

📊 RESULTS: 1. BASELINE (Week 9 Legacy)
  Accuracy: 66.8%
  HC Accuracy: 73.3%
  Brier Score: 0.221
  AUC-ROC: 0.718


## Results Summary & Analysis

In [8]:
# Convert to DataFrame for analysis
results_df = pd.DataFrame(ablation_results)

# Calculate delta vs baseline
baseline_acc = results_df.iloc[0]['accuracy']
results_df['delta_vs_baseline'] = results_df['accuracy'] - baseline_acc
results_df['delta_pct'] = results_df['delta_vs_baseline'] * 100

# Sort by accuracy
results_df = results_df.sort_values('accuracy', ascending=False)

print("\n" + "="*90)
print("ABLATION STUDY RESULTS - 2024 HOLDOUT")
print("="*90)
print(f"\n{'CONFIGURATION':<45} {'ACCURACY':<12} {'DELTA':<12} {'HC_ACC':<12} {'BRIER':<10}")
print("-"*90)

for _, row in results_df.iterrows():
    delta_str = f"{row['delta_pct']:+.1f}%"
    hc_str = f"{row['hc_accuracy']:.1%}" if pd.notna(row['hc_accuracy']) else "N/A"
    print(f"{row['config_name']:<45} {row['accuracy']:>10.1%} {delta_str:>10} {hc_str:>10} {row['brier_score']:>9.3f}")

print("\n" + "="*90)
print("KEY FINDINGS")
print("="*90)

# Identify best and worst
best = results_df.iloc[0]
worst = results_df.iloc[-1]

print(f"\n🏆 BEST: {best['config_name']}")
print(f"   Accuracy: {best['accuracy']:.1%} ({best['delta_pct']:+.1f}% vs baseline)")
print(f"   HC Accuracy: {best['hc_accuracy']:.1%}" if pd.notna(best['hc_accuracy']) else "")

print(f"\n❌ WORST: {worst['config_name']}")
print(f"   Accuracy: {worst['accuracy']:.1%} ({worst['delta_pct']:+.1f}% vs baseline)")

# Feature impact summary
print(f"\n📊 FEATURE IMPACT ANALYSIS:")
print(f"   Baseline (Week 9 Legacy): {baseline_acc:.1%}")

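# Skip the baseline by name; after sorting by accuracy it is not necessarily the first row
for _, row in results_df[results_df['config_name'] != '1. BASELINE (Week 9 Legacy)'].iterrows():
    impact = "HELPFUL" if row['delta_vs_baseline'] > 0.01 else "HARMFUL" if row['delta_vs_baseline'] < -0.01 else "NEUTRAL"
    print(f"   {row['config_name'][3:]}: {row['delta_pct']:+.1f}% - {impact}")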

# Display full table
print("\n\nFULL RESULTS TABLE:")
print(results_df[['config_name', 'accuracy', 'delta_pct', 'hc_accuracy', 'brier_score', 'auc_roc', 'n_games']].to_string(index=False))


ABLATION STUDY RESULTS - 2024 HOLDOUT

CONFIGURATION                                 ACCURACY     DELTA        HC_ACC       BRIER     
------------------------------------------------------------------------------------------
4. Baseline + Vegas Spread                         68.0%      +1.2%      86.7%     0.222
3. Baseline + Momentum Features                    67.6%      +0.8%      92.3%     0.221
2. Baseline + Temporal Weighting                   67.2%      +0.4%      76.0%     0.221
1. BASELINE (Week 9 Legacy)                        66.8%      +0.0%      73.3%     0.221
8. FULL WEEK 10+ (All Enhancements)                65.2%      -1.6%      85.7%     0.219
5. Baseline + Injury Estimates                     63.7%      -3.1%      82.4%     0.224
6. Baseline + Defensive Stats                      62.9%      -3.9%      83.3%     0.223
7. Baseline + Increased Depth + 4th Model          62.9%      -3.9%      80.8%     0.225

KEY FINDINGS

🏆 BEST: 4. Baseline + Vegas Spread
   Accur
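When reading the deltas in the table above, it helps to compare them with the sampling noise of a single accuracy estimate. The sketch below uses a rough binomial approximation. The game count of 256 is an assumption inferred from the weekly accuracies in the baseline output (weeks 2-18), and the calculation ignores that all configurations were scored on the same games.

```python
import math

n_games = 256          # assumed: 2024 weeks 2-18, inferred from the weekly accuracies
baseline_acc = 0.668   # baseline accuracy from the results table
best_acc = 0.680       # "Baseline + Vegas Spread" from the results table

# Standard error of a proportion estimated from n independent games
std_err = math.sqrt(baseline_acc * (1 - baseline_acc) / n_games)

# How many additional correct picks the best configuration represents
extra_correct = round((best_acc - baseline_acc) * n_games)

print(f"standard error of baseline accuracy: {std_err:.3f}")
print(f"best vs baseline: about {extra_correct} extra correct games")
```

Under these assumptions the standard error is close to three percentage points, so most of the deltas in the table (including the gain from the placeholder Vegas feature, which is a constant 0) fall within about one standard error and should be treated as suggestive rather than conclusive.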

## Save Results

In [9]:
# Save to CSV
results_df.to_csv('ablation_study_results.csv', index=False)
print("\n✅ Results saved to: ablation_study_results.csv")

# Create summary report
with open('Week10_Enhancement_Postmortem.md', 'w') as f:
    f.write("# Week 10+ Enhancement Postmortem: Ablation Study Results\n\n")
    f.write(f"**Test Date**: {pd.Timestamp.now().strftime('%Y-%m-%d')}\n")
    f.write(f"**Holdout Dataset**: 2024 NFL Season (Weeks 1-18)\n")
    f.write(f"**Baseline**: Week 9 Legacy Model (60.8% expected)\n\n")
    
    f.write("---\n\n")
    f.write("## Executive Summary\n\n")
    f.write(f"- **Baseline Accuracy**: {baseline_acc:.1%}\n")
    f.write(f"- **Best Configuration**: {best['config_name']} ({best['accuracy']:.1%})\n")
    f.write(f"- **Worst Configuration**: {worst['config_name']} ({worst['accuracy']:.1%})\n")
    f.write(f"- **Full Week 10 Performance**: {results_df[results_df['config_name'].str.contains('FULL')].iloc[0]['accuracy']:.1%}\n\n")
    
    f.write("## Detailed Results\n\n")
    f.write("| Configuration | Accuracy | Delta | HC Accuracy | Brier | AUC | Games |\n")
    f.write("|---------------|----------|-------|-------------|-------|-----|-------|\n")
    
    for _, row in results_df.iterrows():
        hc_str = f"{row['hc_accuracy']:.1%}" if pd.notna(row['hc_accuracy']) else "N/A"
        f.write(f"| {row['config_name']} | {row['accuracy']:.1%} | {row['delta_pct']:+.1f}% | {hc_str} | {row['brier_score']:.3f} | {row['auc_roc']:.3f} | {row['n_games']} |\n")
    
    f.write("\n## Feature Impact Summary\n\n")
    
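    # Skip the baseline by name; after sorting by accuracy it is not necessarily the first row
    for _, row in results_df[results_df['config_name'] != '1. BASELINE (Week 9 Legacy)'].iterrows():
        impact = "✅ HELPFUL" if row['delta_vs_baseline'] > 0.01 else "❌ HARMFUL" if row['delta_vs_baseline'] < -0.01 else "⚪ NEUTRAL"
        f.write(f"- **{row['config_name'][3:]}**: {row['delta_pct']:+.1f}% - {impact}\n")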
    
    f.write("\n## Recommendations\n\n")
    f.write("Based on ablation study results:\n\n")
    
    helpful = results_df[results_df['delta_vs_baseline'] > 0.01]
    harmful = results_df[results_df['delta_vs_baseline'] < -0.01]
    
    if len(helpful) > 0:  # Baseline has zero delta, so it never appears in `helpful`
        f.write("### Keep These Features:\n")
        for _, row in helpful[helpful['config_name'] != '1. BASELINE (Week 9 Legacy)'].iterrows():
            f.write(f"- {row['config_name'][3:]} ({row['delta_pct']:+.1f}%)\n")
        f.write("\n")
    
    if len(harmful) > 0:
        f.write("### Remove These Features:\n")
        for _, row in harmful.iterrows():
            f.write(f"- {row['config_name'][3:]} ({row['delta_pct']:+.1f}%)\n")
        f.write("\n")
    
    f.write("### Next Steps:\n")
    f.write("1. Revert to baseline for Week 16\n")
    f.write("2. Add ONLY validated helpful features one at a time\n")
    f.write("3. Test each on 2024 holdout before deploying to 2025\n")
    f.write("4. Build validation framework to prevent future regressions\n")

print("✅ Postmortem report saved to: Week10_Enhancement_Postmortem.md")
print("\n" + "="*70)
print("ABLATION STUDY COMPLETE")
print("="*70)


✅ Results saved to: ablation_study_results.csv
✅ Postmortem report saved to: Week10_Enhancement_Postmortem.md

ABLATION STUDY COMPLETE
