# sWARm Future Projections (SYSTEM 2)

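ZiPS-style future performance projections built on joint longitudinal-survival modeling: a year-to-year performance model paired with a retirement-risk (survival) model.
Produces 1-3 year forecasts of WAR and WARP, validated with season-based temporal cross-validation.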
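One simple way to combine a year-to-year performance model with a retirement-risk model is to iterate the performance model one season at a time and weight each projected season by the probability that the player is still active. The sketch below only illustrates that shape of computation: `TransitionModel`, its coefficients, `project`, and the constant `yearly_retirement_hazard` are hypothetical placeholders, not the models fitted in `projection_modules`. Survival weighting is also a simplification of formal joint modeling, where the two submodels are typically linked through shared parameters.

```python
from dataclasses import dataclass


@dataclass
class TransitionModel:
    """Hypothetical linear year-to-year model:
    next_war = intercept + slope * war - age_penalty * max(0, age - peak_age)."""
    intercept: float = 0.3
    slope: float = 0.6
    age_penalty: float = 0.1
    peak_age: int = 28

    def predict_next(self, war: float, age: int) -> float:
        decline = self.age_penalty * max(0, age - self.peak_age)
        return self.intercept + self.slope * war - decline


def project(war: float, age: int, model: TransitionModel,
            yearly_retirement_hazard: float, years_ahead: int = 3) -> list[dict]:
    """Iterate the transition model and weight each season by P(still active)."""
    survival = 1.0
    rows = []
    for step in range(1, years_ahead + 1):
        war = model.predict_next(war, age)  # feed the prediction back in (iterative projection)
        age += 1
        survival *= 1.0 - yearly_retirement_hazard  # constant hazard: an assumption
        rows.append({
            "year_ahead": step,
            "conditional_war": war,          # projected WAR if the player is still active
            "p_active": survival,
            "expected_war": survival * war,  # survival-weighted expectation
        })
    return rows


for row in project(war=5.0, age=30, model=TransitionModel(), yearly_retirement_hazard=0.1):
    print(row)
```

Because each step consumes the previous step's prediction, errors compound with the horizon, so year-3 projections carry more uncertainty than year-1 projections.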

## Architecture
- **Cell 1**: Module imports and configuration
- **Cell 2**: Data loading and model training
- **Cell 3**: Model validation and performance metrics
- **Cell 4**: Future player projections generation
- **Cell 5**: Visualizations and analysis

All substantive logic (data loading, feature preparation, model fitting, validation, and plotting) lives in the `projection_modules/` package; the notebook cells only configure and call it.

In [1]:
# Cell 1: Module Imports and Configuration
# =====================================

import pandas as pd
import numpy as np
import warnings
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Import SYSTEM 2 projection modules
from projection_modules import (
    ExpectedStatsCalculator,
    FutureProjectionAgeCurve,
    AgeCurveValidator,
    System2Pipeline
)

# Configuration
warnings.filterwarnings('ignore', category=FutureWarning)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Data and model paths (relative to the notebook's working directory)
BP_DATA_PATH = "MLB Player Data/BP_Data"
FG_DATA_PATH = "MLB Player Data/FanGraphs_Data"
MODEL_SAVE_PATH = "system2_future_projections_model.pkl"

# Analysis parameters
ANALYSIS_YEARS = list(range(2016, 2025))  # 2016-2024
MAX_PROJECTION_YEARS = 3  # Project 1-3 years ahead
VALIDATION_SPLITS = 5  # Temporal cross-validation folds
TARGET_PROJECTION_SEASON = 2024  # Generate 2025-2027 projections

print("SYSTEM 2: Future Performance Projections")
print("=" * 50)
print(f"Analysis years: {ANALYSIS_YEARS[0]}-{ANALYSIS_YEARS[-1]}")
print(f"Max projection years: {MAX_PROJECTION_YEARS}")
print(f"Target projection season: {TARGET_PROJECTION_SEASON}")
print(f"Validation folds: {VALIDATION_SPLITS}")
print("\n✓ Modules imported and configuration set")

SYSTEM 2: Future Performance Projections
Analysis years: 2016-2024
Max projection years: 3
Target projection season: 2024
Validation folds: 5

✓ Modules imported and configuration set
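The Cell 3 log shows how the `VALIDATION_SPLITS` folds are laid out: each fold trains on every season up to a cutoff (at least three seasons) and validates on the following season. A minimal sketch of that expanding-window scheme, assuming one validation season per fold and that the earliest `n_splits` candidate folds are kept (the log shows only folds 1-3, so the choice of folds 4-5 is an assumption). `expanding_window_splits` is a hypothetical helper, not part of `projection_modules`.

```python
def expanding_window_splits(seasons, n_splits, min_train_seasons=3):
    """Return (train_seasons, val_season) pairs for expanding-window validation.

    Each fold trains on every season up to a cutoff and validates on the next
    season, so no fold ever trains on data from after its validation season.
    """
    seasons = sorted(set(seasons))
    candidates = [
        (seasons[:i], seasons[i])
        for i in range(min_train_seasons, len(seasons))
    ]
    # Assumption: keep the earliest n_splits folds (the pipeline's exact choice is not shown).
    return candidates[:n_splits]


for k, (train, val) in enumerate(expanding_window_splits(range(2016, 2025), n_splits=5), start=1):
    print(f"Fold {k}: Train {train[0]}-{train[-1]} -> Val {val}")
```

To apply a fold to the notebook's data, filter on the `Season` column, e.g. `training_data[training_data['Season'].isin(train)]` for training and `training_data[training_data['Season'] == val]` for validation. Unlike a shuffled K-fold split, this never lets a model see seasons later than the one it is scored on.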


In [2]:
# Cell 2: Data Loading and Model Training
# =====================================

print("CELL 2: DATA LOADING AND MODEL TRAINING")
print("=" * 50)

# Initialize SYSTEM 2 pipeline
pipeline = System2Pipeline(
    bp_data_path=BP_DATA_PATH,
    fg_data_path=FG_DATA_PATH,
    max_projection_years=MAX_PROJECTION_YEARS
)

# Step 1: Load complete dataset
print("\n1. Loading complete dataset...")
raw_data = pipeline.load_complete_dataset(
    years=ANALYSIS_YEARS,
    player_types=['hitters', 'pitchers']
)

# Step 2: Prepare projection features
print("\n2. Preparing projection features...")
processed_data = pipeline.prepare_projection_features(raw_data)

# Step 3: Prepare training data
print("\n3. Preparing training data...")
training_data = pipeline.prepare_training_data(processed_data)

# Step 4: Train joint longitudinal-survival model
print("\n4. Training joint longitudinal-survival model...")
training_metrics = pipeline.train_projection_model(training_data)

# Display training results
print("\n" + "=" * 50)
print("TRAINING RESULTS")
print("=" * 50)

# Display WAR and WARP model results (the two models are trained independently)
for model_name, key in [('WAR', 'war_model'), ('WARP', 'warp_model')]:
    if getattr(pipeline, key, None) is None:
        print(f"\n{model_name} Model: Not trained (insufficient data)")
        continue

    print(f"\n{model_name} Model Performance:")
    model_metrics = training_metrics.get(key)
    if model_metrics is None:
        continue

    for section, label in [('longitudinal_performance', 'Longitudinal Model (Year-to-year prediction)'),
                           ('survival_performance', 'Survival Model (Retirement risk)')]:
        if section in model_metrics:
            print(f"  {label}:")
            for metric, value in model_metrics[section].items():
                if not np.isnan(value):  # skip metrics that could not be computed
                    print(f"    {metric}: {value:.3f}")

    print(f"  Training samples: {model_metrics.get('training_samples', 'N/A')}")
    print(f"  Survival observations: {model_metrics.get('survival_observations', 'N/A')}")

# Save training results
training_summary = {
    'data_records': len(training_data),
    'unique_players': training_data['mlbid'].nunique(),
    'season_range': f"{training_data['Season'].min()}-{training_data['Season'].max()}",
    'training_metrics': training_metrics
}

print(f"\nModel training complete!")
print(f"  Dataset: {training_summary['data_records']} records, {training_summary['unique_players']} players")
print(f"  Season range: {training_summary['season_range']}")
print(f"  Separate model architecture: WAR and WARP trained independently")
print(f"  Iterative projection approach: {MAX_PROJECTION_YEARS} years using year-to-year transitions")

CELL 2: DATA LOADING AND MODEL TRAINING

1. Loading complete dataset...
SYSTEM 2: Loading complete dataset...
  Loading FanGraphs hitters data...
    Loaded 5760 hitters player-seasons from 9 years
  Loading BP hitters data...
    Loaded 6145 hitters player-seasons from 9 years
  Loading FanGraphs pitchers data...
    Loaded 7237 pitchers player-seasons from 9 years
  Loading BP pitchers data...
    Loaded 7345 pitchers player-seasons from 9 years
  Filling missing Age data from BP records...
    Filled Age for 12979/12997 WAR records from WARP data
      Primary matches (mlbid): 12978
      Fallback matches (name): 1
    18 WAR records still missing Age (no matching WARP record)
  Adding position data from FanGraphs defensive files...
    Filled Position for 26287/26487 records from defensive data
    200 records still missing Position (no matching defensive record)
  Loading expected stats from Statcast data...




Complete dataset loaded: 43187 records
  Years: 2016-2024
  Players: 3537 unique
  Data sources: {'WARP': 21896, 'WAR': 21291}

2. Preparing projection features...

Preparing projection features...
  Adding default regression factor...
Projection features prepared for 43187 records

3. Preparing training data...

Preparing training data...
  Dropped 52 incomplete records (see dropped_players_log.txt)
Training data prepared: 42279 records
  Age range: 19.0-45.0
  Target metric range: -0.9-5.2
  Seasons: 2016-2024
  Data sources: WAR=20627, WARP=21652

4. Training joint longitudinal-survival model...

Training separate WAR and WARP projection models...
  WAR training data: 20627 records
  WARP training data: 21652 records
Fitting joint longitudinal-survival model for future projections...
Preparing longitudinal data for year-to-year WAR prediction...
  Valid year-to-year transitions: 8045
  Transitions with NaN features: 0
  Transitions with NaN targets: 0
  Final training examples: 8045
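The training log counts "valid year-to-year transitions": each training example pairs a player-season with the same player's following season. A minimal pandas sketch of that pairing, assuming one row per `mlbid` and `Season` and a target column named `WAR`; the column name, the consecutive-season rule, and `build_transitions` itself are assumptions for illustration.

```python
import pandas as pd


def build_transitions(df: pd.DataFrame, value_col: str = "WAR") -> pd.DataFrame:
    """Pair each player-season with the same player's next season.

    Only consecutive seasons count as a valid transition (a 2020 -> 2022 gap is
    dropped), and rows with a missing feature or target are removed, mirroring
    the NaN checks reported in the training log.
    """
    df = df.sort_values(["mlbid", "Season"])
    nxt = df.groupby("mlbid")[["Season", value_col]].shift(-1)  # next row within each player
    target = "target_" + value_col
    df = df.assign(next_season=nxt["Season"], **{target: nxt[value_col]})
    consecutive = df["next_season"] == df["Season"] + 1
    out = df[consecutive].dropna(subset=["Age", value_col, target])
    return out.drop(columns="next_season").reset_index(drop=True)


seasons = pd.DataFrame({
    "mlbid":  [1, 1, 1, 2, 2],
    "Season": [2019, 2020, 2022, 2021, 2022],
    "Age":    [25, 26, 28, 30, 31],
    "WAR":    [2.0, 3.0, 1.5, 0.5, None],
})
print(build_transitions(seasons))
```

In the demo, only player 1's 2019 -> 2020 pair survives: the 2020 -> 2022 gap is not consecutive, and player 2's 2021 -> 2022 pair has a missing target.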

In [3]:
# Cell 3: Model Validation and Performance Metrics
# ===============================================

print("CELL 3: MODEL VALIDATION AND PERFORMANCE METRICS")
print("=" * 50)

# Perform temporal cross-validation
print(f"\nPerforming {VALIDATION_SPLITS}-fold temporal cross-validation...")
validation_results = pipeline.validate_model(training_data, VALIDATION_SPLITS)

# Display validation results
print("\n" + "=" * 50)
print("CROSS-VALIDATION RESULTS")
print("=" * 50)

# Display WAR and WARP cross-validation results
for model_name, model_key in [('WAR', 'war_model_validation'), ('WARP', 'warp_model_validation')]:
    if model_key not in validation_results:
        continue
    print(f"\n{model_name} Model Cross-Validation:")
    model_val = validation_results[model_key]

    if 'longitudinal_performance' in model_val:
        print("  Longitudinal Model Performance:")
        for metric, stats in model_val['longitudinal_performance'].items():
            print(f"    {metric.upper()}: {stats['mean']:.3f} ± {stats['std']:.3f} "
                  f"(range: {stats['min']:.3f}-{stats['max']:.3f})")

    if 'survival_performance' in model_val:
        print("  Survival Model Performance:")
        for metric, stats in model_val['survival_performance'].items():
            print(f"    {metric}: {stats['mean']:.3f} ± {stats['std']:.3f} "
                  f"(range: {stats['min']:.3f}-{stats['max']:.3f})")

# Qualitative assessment of each model against rule-of-thumb thresholds
print("\n" + "-" * 30)
print("PERFORMANCE ASSESSMENT")
print("-" * 30)

# Assess performance for each model
for model_name, model_key in [('WAR', 'war_model_validation'), ('WARP', 'warp_model_validation')]:
    if model_key in validation_results:
        val_results = validation_results[model_key]
        print(f"\n{model_name} Model Assessment:")
        
        if 'longitudinal_performance' in val_results:
            long_perf = val_results['longitudinal_performance']
            
            if 'r2' in long_perf:
                r2_mean = long_perf['r2']['mean']
                if r2_mean > 0.15:
                    r2_assessment = "GOOD - Strong predictive power"
                elif r2_mean > 0.08:
                    r2_assessment = "FAIR - Moderate predictive power"
                else:
                    r2_assessment = "POOR - Limited predictive power"
                print(f"  Longitudinal R²: {r2_mean:.3f} - {r2_assessment}")
            
            if 'rmse' in long_perf:
                rmse_mean = long_perf['rmse']['mean']
                if rmse_mean < 1.0:
                    rmse_assessment = "GOOD - Low prediction error"
                elif rmse_mean < 1.5:
                    rmse_assessment = "FAIR - Moderate prediction error"
                else:
                    rmse_assessment = "POOR - High prediction error"
                print(f"  Longitudinal RMSE: {rmse_mean:.3f} - {rmse_assessment}")

        if 'survival_performance' in val_results:
            surv_perf = val_results['survival_performance']
            
            if 'concordance_index' in surv_perf:
                c_index_mean = surv_perf['concordance_index']['mean']
                if c_index_mean > 0.75:
                    c_assessment = "GOOD - Strong discrimination"
                elif c_index_mean > 0.65:
                    c_assessment = "FAIR - Moderate discrimination"
                else:
                    c_assessment = "POOR - Limited discrimination"
                print(f"  Survival C-Index: {c_index_mean:.3f} - {c_assessment}")

# Generate detailed validation report
validation_report = pipeline.validator.generate_validation_report("validation_report.csv")

print(f"\nModel validation complete!")
print(f"  Validation folds: {validation_results.get('n_folds', 'N/A')}")
print(f"  Detailed report saved: validation_report.csv")

# Always save the model: a very low longitudinal R² triggers a warning but does not block saving
save_model = True

# Warn if either model's longitudinal R² falls below a minimum useful level
for model_name, model_key in [('WAR', 'war_model_validation'), ('WARP', 'warp_model_validation')]:
    if model_key in validation_results:
        val_results = validation_results[model_key]
        if 'longitudinal_performance' in val_results:
            long_perf = val_results['longitudinal_performance']
            if 'r2' in long_perf and long_perf['r2']['mean'] < 0.02:
                print(f"\nWARNING: {model_name} longitudinal R² is very low; consider model improvements")

if save_model:
    pipeline.projection_model.save_model(MODEL_SAVE_PATH)
    print(f"Model saved to: {MODEL_SAVE_PATH}")

CELL 3: MODEL VALIDATION AND PERFORMANCE METRICS

Performing 5-fold temporal cross-validation...

Validating both models with 5-fold temporal cross-validation...
Validating WAR model...
Validating joint model with 5-fold temporal cross-validation...
Creating 5 temporal splits:
  Total seasons: 9 (2016-2024)
  Min training seasons: 3
  Fold 1: Train 2016-2018 → Val 2019-2019 (6502 → 2359 records)

Validating fold 1/5...
Preparing survival data for temporal validation (cutoff: 2018)...
Survival data prepared: 1925 observations, 596 retirement events
  Event rate: 0.310
  Censored (active): 1329
  Fold 2: Train 2016-2019 → Val 2020-2020 (8861 → 1899 records)

Validating fold 2/5...
Preparing survival data for temporal validation (cutoff: 2019)...
Survival data prepared: 2196 observations, 851 retirement events
  Event rate: 0.388
  Censored (active): 1345
  Fold 3: Train 2016-2020 → Val 2021-2021 (10760 → 2547 records)

Validating fold 3/5...
Preparing survival data for temporal validatio
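Cell 3 prints, for each metric, a `mean`, `std`, `min`, and `max` across folds. The internals of `validate_model` are not shown, so the sketch below is only an assumed reconstruction of those numbers in plain Python: RMSE and R² for the year-to-year model, Harrell's concordance index for the retirement model (scored here with risk values, where higher means earlier retirement), and a per-metric fold summary. All function names are hypothetical.

```python
import math
import statistics


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_index(durations, events, risk_scores):
    """Harrell's C: share of comparable pairs where the higher-risk player retires first.

    A pair (i, j) is comparable when i's retirement is observed (event = 1) and
    happens before j's last observed season. Tied risk scores count as half
    concordant; pairs with tied durations are skipped for simplicity.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(durations)):
        if not events[i]:
            continue  # censored (still active) players cannot anchor a pair
        for j in range(len(durations)):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def summarize_folds(fold_values):
    """Collapse per-fold values into the mean/std/min/max dict printed in Cell 3."""
    return {
        "mean": statistics.fmean(fold_values),
        "std": statistics.stdev(fold_values) if len(fold_values) > 1 else 0.0,
        "min": min(fold_values),
        "max": max(fold_values),
    }


# Example: summarize hypothetical per-fold R² values the way Cell 3 prints them.
stats = summarize_folds([0.10, 0.12, 0.08])
print(f"R2: {stats['mean']:.3f} ± {stats['std']:.3f} (range: {stats['min']:.3f}-{stats['max']:.3f})")
```

If `lifelines` is installed, `lifelines.utils.concordance_index` computes the same statistic with fuller tie handling, but it expects scores where higher means longer survival, so a risk score would be passed with its sign flipped.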

In [4]:
# Cell 4: Future Player Projections Generation
# ===========================================

print("CELL 4: FUTURE PLAYER PROJECTIONS GENERATION")
print("=" * 50)

# Generate batch projections for all eligible players
print(f"\nGenerating {MAX_PROJECTION_YEARS}-year projections from {TARGET_PROJECTION_SEASON}...")

batch_projections = pipeline.batch_generate_projections(
    target_season=TARGET_PROJECTION_SEASON,
    years_ahead=MAX_PROJECTION_YEARS,
    min_career_length=2
)

# Display projection summary
print("\n" + "=" * 50)
print("PROJECTION SUMMARY")
print("=" * 50)

print(f"Total players projected: {len(batch_projections)}")
print(f"Age range: {batch_projections['Age'].min():.0f}-{batch_projections['Age'].max():.0f}")
print(f"Position distribution:")
position_counts = batch_projections['Position'].value_counts()
for pos, count in position_counts.head(10).items():
    print(f"  {pos}: {count}")

# Show projection statistics for both WAR and WARP
war_projection_cols = [col for col in batch_projections.columns if col.startswith('projected_WAR_')]
warp_projection_cols = [col for col in batch_projections.columns if col.startswith('projected_WARP_')]

print(f"\nWAR Projection statistics:")
for col in war_projection_cols:
    year = col.split('_')[-1]
    values = batch_projections[col].dropna()
    if len(values) > 0:
        print(f"  {year}: Mean={values.mean():.2f}, Std={values.std():.2f}, "
              f"Range=[{values.min():.2f}, {values.max():.2f}]")

print(f"\nWARP Projection statistics:")
for col in warp_projection_cols:
    year = col.split('_')[-1]
    values = batch_projections[col].dropna()
    if len(values) > 0:
        print(f"  {year}: Mean={values.mean():.2f}, Std={values.std():.2f}, "
              f"Range=[{values.min():.2f}, {values.max():.2f}]")

# Featured player analysis
print("\n" + "-" * 30)
print("FEATURED PLAYER PROJECTIONS")
print("-" * 30)

# Select interesting players for detailed analysis
featured_players = [
    'Mike Trout', 'Shohei Ohtani', 'Ronald Acuña Jr.', 
    'Mookie Betts', 'Aaron Judge', 'Juan Soto'
]

featured_projections = []

for player_name in featured_players:
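    # Literal substring match (regex=False) so the '.' in names like "Ronald Acuña Jr." is not a regex wildcard
    player_proj = batch_projections[batch_projections['Name'].str.contains(player_name, case=False, regex=False, na=False)]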
    
    if not player_proj.empty:
        player_row = player_proj.iloc[0]
        
        print(f"\n{player_row['Name']} (Age {player_row['Age']:.0f}, {player_row['Position']}):")
        
        # Show current values
        if pd.notna(player_row['Current_WAR']):
            print(f"  Current WAR ({TARGET_PROJECTION_SEASON}): {player_row['Current_WAR']:.1f}")
        if pd.notna(player_row['Current_WARP']):
            print(f"  Current WARP ({TARGET_PROJECTION_SEASON}): {player_row['Current_WARP']:.1f}")
        
        # Show WAR and WARP projections where available
        any_projection = False
        for label, cols in [('WAR', war_projection_cols), ('WARP', warp_projection_cols)]:
            if not any(pd.notna(player_row[col]) for col in cols):
                continue
            any_projection = True
            print(f"  {label} Projections:")
            for col in cols:
                if pd.notna(player_row[col]):
                    # Column 'projected_<METRIC>_year_<k>' maps to season TARGET_PROJECTION_SEASON + k
                    years_ahead = int(col.split('_')[-1])
                    print(f"    {TARGET_PROJECTION_SEASON + years_ahead}: {player_row[col]:.1f}")

        # If neither WAR nor WARP projections exist, indicate this
        if not any_projection:
            print("  No projections available for this player")
        
        featured_projections.append(player_row)

# Save projections
projections_filename = f"future_projections_{TARGET_PROJECTION_SEASON}.csv"
batch_projections.to_csv(projections_filename, index=False)

print(f"\n✓ Projections generation complete!")
print(f"  Total projections: {len(batch_projections)}")
print(f"  Featured players: {len(featured_projections)}")
print(f"  Results saved to: {projections_filename}")

# Projection quality check: coverage and plausibility (-2 to 12 WAR) of year-1 projections
war_valid_projections = batch_projections['projected_WAR_year_1'].dropna()
reasonable_projections = war_valid_projections[
    (war_valid_projections >= -2) & 
    (war_valid_projections <= 12)
]
total_with_war_data = len(batch_projections[batch_projections['Current_WAR'].notna()])

print(f"\nProjection quality check:")
print(f"  Players with WAR data: {total_with_war_data}/{len(batch_projections)} ({total_with_war_data/len(batch_projections)*100:.1f}%)")
print(f"  Valid WAR projections: {len(war_valid_projections)}/{total_with_war_data} ({len(war_valid_projections)/max(1, total_with_war_data)*100:.1f}%)")
print(f"  Reasonable WAR projections: {len(reasonable_projections)}/{len(war_valid_projections)} ({len(reasonable_projections)/max(1, len(war_valid_projections))*100:.1f}%)")

# Check WARP projections too
warp_valid_projections = batch_projections['projected_WARP_year_1'].dropna()
total_with_warp_data = len(batch_projections[batch_projections['Current_WARP'].notna()])

print(f"  Players with WARP data: {total_with_warp_data}/{len(batch_projections)} ({total_with_warp_data/len(batch_projections)*100:.1f}%)")
print(f"  Valid WARP projections: {len(warp_valid_projections)}/{total_with_warp_data} ({len(warp_valid_projections)/max(1, total_with_warp_data)*100:.1f}%)")

if len(war_valid_projections) < total_with_war_data * 0.85:
    print("⚠ WARNING: Many players missing WAR projections")
elif len(reasonable_projections) / max(1, len(war_valid_projections)) < 0.85:
    print("⚠ WARNING: High percentage of unreasonable WAR projections")
else:
    print("✓ WAR projection quality looks good")

if len(warp_valid_projections) < total_with_warp_data * 0.85:
    print("⚠ WARNING: Many players missing WARP projections")
else:
    print("✓ WARP projection quality looks good")

CELL 4: FUTURE PLAYER PROJECTIONS GENERATION

Generating 3-year projections from 2024...

Generating 3-year projections from 2024...
Projections generated for 1430 players

PROJECTION SUMMARY
Total players projected: 1430
Age range: 20-44
Position distribution:
  P: 792
  1B: 166
  2B: 127
  CF: 122
  C: 80
  LF: 52
  3B: 35
  SS: 27
  RF: 19
  OF: 10

WAR Projection statistics:
  1: Mean=0.66, Std=0.63, Range=[-0.06, 3.28]
  2: Mean=0.64, Std=0.54, Range=[-0.09, 2.92]
  3: Mean=0.61, Std=0.44, Range=[-0.03, 2.58]

WARP Projection statistics:
  1: Mean=0.66, Std=0.68, Range=[-0.09, 3.75]
  2: Mean=0.64, Std=0.54, Range=[-0.02, 2.85]
  3: Mean=0.65, Std=0.42, Range=[0.03, 2.45]

------------------------------
FEATURED PLAYER PROJECTIONS
------------------------------

Mike Trout (Age 32, CF):
  Current WAR (2024): 0.9
  Current WARP (2024): 1.0
  WAR Projections:
    2025: 1.6
    2026: 0.5
    2027: 0.8
  WARP Projections:
    2025: 0.7
    2026: 0.6
    2027: 0.5

Ronald Acuña Jr. (Ag
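Cell 4 stores projections in wide form, one `projected_<METRIC>_year_<k>` column per metric and horizon, and maps horizon k to the season `TARGET_PROJECTION_SEASON + k`. For per-season plots and summaries, a long table with one row per player, metric, and season is often easier to work with. A sketch, assuming every projection column follows that naming pattern and keeping only `Name` as the identifier; `projections_to_long` and the demo rows are hypothetical.

```python
import pandas as pd


def projections_to_long(wide: pd.DataFrame, base_season: int) -> pd.DataFrame:
    """Reshape projected_<WAR|WARP>_year_<k> columns into one row per player, metric, and season."""
    value_cols = [c for c in wide.columns if c.startswith("projected_")]
    tidy = wide.melt(id_vars=["Name"], value_vars=value_cols,
                     var_name="column", value_name="projection")
    # Assumes every projected_ column matches this pattern; WARP? matches WAR or WARP.
    parts = tidy["column"].str.extract(r"^projected_(?P<metric>WARP?)_year_(?P<k>\d+)$")
    tidy["metric"] = parts["metric"]
    tidy["season"] = base_season + parts["k"].astype(int)
    return (tidy.dropna(subset=["projection"])
                .drop(columns="column")
                .sort_values(["Name", "metric", "season"])
                .reset_index(drop=True))


demo = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "projected_WAR_year_1": [2.0, 0.5],
    "projected_WAR_year_2": [1.8, None],
    "projected_WARP_year_1": [2.2, 0.4],
})
print(projections_to_long(demo, base_season=2024))
```

Applied to the notebook, `projections_to_long(batch_projections, TARGET_PROJECTION_SEASON)` would yield a table ready for `groupby(['metric', 'season'])` summaries.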

In [5]:
# Cell 5: Visualizations and Analysis
# ===================================

if hasattr(pipeline.validator, 'validation_results') and pipeline.validator.validation_results:
    # Plot validation metrics across folds
    pipeline.validator.plot_validation_metrics()
else:
    print("   Note: Validation plots not available - run validation first")