# NBA DFS Walk-Forward Backtest - Google Colab

This notebook runs a walk-forward backtest of per-player models on Google Colab, with optional Bayesian hyperparameter optimization.

## Architecture
- **Code**: Cloned from GitHub (or synced from Drive)
- **Data**: Google Drive at `/content/delapan-fantasy/MyDrive/dfs/data/`
- **Database**: SQLite file `nba_dfs.db` in the Drive data directory
- **Outputs**: Saved to `outputs/` under the Drive data directory, with full reports

## Features
- **Bayesian Optimization**: Optuna-based hyperparameter tuning with TPE sampler
- **Per-player XGBoost models**: Individual models for each player with parallel training
- **Season average benchmark**: Statistical comparison with baseline predictions
- **Statistical testing**: Paired t-test, Cohen's d effect size
- **Salary tier analysis**: Performance breakdown by player salary ranges
- **Comprehensive visualizations**: Interactive Plotly charts
- **Model persistence**: Save models and predictions for reproducibility
- **Incremental checkpointing**: Resume from interruptions without losing progress
- **Automatic resume**: Set RESUME_FROM_RUN to continue interrupted backtests

## Resume Capability
If Colab disconnects or the run is interrupted:
1. Run cell 8a to check existing runs and their progress
2. Set RESUME_FROM_RUN in section 7 to the run timestamp (e.g., '20250205_143022') and re-run that cell
3. Re-run from section 9 onwards (if the Colab runtime was reset, re-run the notebook from section 1 first)
4. Completed slates are skipped automatically

## Setup Requirements
1. Data synced to Google Drive: `MyDrive/dfs/data/inputs/`
2. Database populated: `MyDrive/dfs/data/nba_dfs.db`
3. Run all cells in order
4. Results saved with detailed reports

## Estimated Time
- **Bayesian Optimization**: up to 30 minutes with the default timeout (if enabled)
- **Per-slate backtest**:
  - Free Colab: ~21 min per slate
  - Colab Pro: ~10 min per slate
  - Colab Pro+: ~5 min per slate
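
As a rough planning aid, the per-slate figures above can be turned into a total runtime estimate. The sketch below is illustrative only: `estimate_backtest_hours` is a hypothetical helper and the slate count of 60 is an assumed value, not a figure from this project.

```python
# Rough per-slate timings quoted above, in minutes
MINUTES_PER_SLATE = {"free": 21, "pro": 10, "pro_plus": 5}

def estimate_backtest_hours(num_slates, tier, optimization_minutes=0):
    """Return an approximate total runtime in hours (hypothetical helper)."""
    total_minutes = num_slates * MINUTES_PER_SLATE[tier] + optimization_minutes
    return total_minutes / 60

# Example: 60 slates (an assumed count) plus 30 minutes of optimization
for tier in MINUTES_PER_SLATE:
    hours = estimate_backtest_hours(60, tier, optimization_minutes=30)
    print(f"{tier}: ~{hours:.1f} h")
```

Estimates of this size can exceed a single Colab session, which is where the checkpointing and resume features become relevant.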

## 1. Mount Google Drive

Mount your Google Drive to access data and save results.

In [None]:
from google.colab import drive
drive.mount('/content/delapan-fantasy')

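import os

project_dir = '/content/delapan-fantasy/MyDrive/delapan-fantasy'
if os.path.isdir(project_dir):
    os.chdir(project_dir)
else:
    # Section 3 creates the project folder if it is missing
    print(f"Project directory not found yet: {project_dir}")
print(f"Working directory: {os.getcwd()}")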

## 2. Install Dependencies

Install required packages including Optuna for Bayesian optimization.

In [None]:
!pip install -q xgboost==1.7.6 pyarrow fastparquet pyyaml python-dotenv joblib scipy tqdm plotly optuna

## 3. Setup Project Structure

Verify project files exist and add to Python path.

In [None]:
import sys
from pathlib import Path

project_root = Path('/content/delapan-fantasy/MyDrive/delapan-fantasy')

if not project_root.exists():
    print("Creating project structure...")
    project_root.mkdir(parents=True, exist_ok=True)
    (project_root / 'data' / 'inputs').mkdir(parents=True, exist_ok=True)
    (project_root / 'data' / 'outputs').mkdir(parents=True, exist_ok=True)
    print("Project structure created. Please upload your src/ and config/ folders.")
else:
    print(f"Project exists at {project_root}")

sys.path.insert(0, str(project_root))
print(f"Python path: {sys.path[0]}")

## 4. Verify Data

Check that required data files are present in Google Drive.

In [None]:
from pathlib import Path

data_dir = Path('/content/delapan-fantasy/MyDrive/dfs/data')
print(f"Data directory: {data_dir}")

print("\nGames directory:")
games_dir = data_dir / 'games'
if games_dir.exists():
    count = len(list(games_dir.rglob('*.parquet')))
    print(f"  Total game files: {count}")
else:
    print(f"  games: does not exist")

print("\nInputs directories:")
inputs_dir = data_dir
for subdir in ['dfs_salaries', 'betting_odds', 'schedule', 'projections']:
    path = inputs_dir / subdir
    if path.exists():
        count = len(list(path.rglob('*.parquet')))
        print(f"  {subdir}: {count} files")
    else:
        print(f"  {subdir}: does not exist")

## 5. Check System Resources

Verify available CPU and RAM for parallel processing.

In [None]:
import psutil
import multiprocessing

cpu_count = multiprocessing.cpu_count()
ram_gb = psutil.virtual_memory().total / (1024**3)

print(f"CPU Cores: {cpu_count}")
print(f"RAM: {ram_gb:.1f} GB")
print(f"Recommended n_jobs: {cpu_count}")

if ram_gb < 12:
    print("WARNING: Low RAM detected. Consider reducing n_jobs or processing fewer players.")

## 6a. Bayesian Optimization Configuration

Configure Optuna hyperparameter optimization settings.

### Parameters:
- **RUN_OPTIMIZATION**: Set to `True` to enable Bayesian optimization before backtest
- **OPTIMIZATION_N_TRIALS**: Number of hyperparameter combinations to try (default: 50)
- **OPTIMIZATION_CV_FOLDS**: Cross-validation folds for each trial (default: 3)
- **OPTIMIZATION_SAMPLE_SIZE**: Training samples for optimization (default: 5000)
  - Larger = more accurate but slower
  - 5000 provides good balance for 30-minute optimization
- **OPTIMIZATION_TIMEOUT**: Maximum optimization time in seconds (default: 1800 = 30 min)
- **OPTIMIZATION_EARLY_STOPPING_PATIENCE**: Stop if no improvement for N trials (default: 10)
  - Prevents wasting compute on converged hyperparameter space
  - Counts consecutive trials that fail to improve the best MAPE by at least 0.01 (the callback's `min_delta`)

### How it works:
1. Loads training data and builds features
2. Samples records for faster optimization
3. Uses TPE (Tree-structured Parzen Estimator) sampler
4. Minimizes MAPE via 3-fold cross-validation
5. Early stops if no improvement for patience window
6. Stores the best hyperparameters in OPTIMIZED_MODEL_PARAMS, which section 7 then uses as MODEL_PARAMS

In [None]:
RUN_OPTIMIZATION = False
OPTIMIZATION_N_TRIALS = 50
OPTIMIZATION_CV_FOLDS = 3
OPTIMIZATION_SAMPLE_SIZE = 5000
OPTIMIZATION_TIMEOUT = 1800
OPTIMIZATION_EARLY_STOPPING_PATIENCE = 10

print("Bayesian Optimization Configuration:")
print(f"  Run optimization: {RUN_OPTIMIZATION}")
print(f"  Number of trials: {OPTIMIZATION_N_TRIALS}")
print(f"  CV folds: {OPTIMIZATION_CV_FOLDS}")
print(f"  Sample size: {OPTIMIZATION_SAMPLE_SIZE} records")
print(f"  Timeout: {OPTIMIZATION_TIMEOUT}s ({OPTIMIZATION_TIMEOUT/60:.1f} min)")
print(f"  Early stopping patience: {OPTIMIZATION_EARLY_STOPPING_PATIENCE} trials")
print("\nSet RUN_OPTIMIZATION = True to enable hyperparameter tuning")

## 6b. Define Optimization Objective Function

Defines the hyperparameter search space and evaluation metric.

### Hyperparameters optimized:
- **max_depth**: Tree depth (3-10)
- **learning_rate**: Step size shrinkage (0.01-0.3, log scale)
- **n_estimators**: Number of boosting rounds (100-500)
- **min_child_weight**: Minimum sum of instance weight in child (1-10)
- **subsample**: Row sampling ratio (0.6-1.0)
- **colsample_bytree**: Column sampling ratio (0.6-1.0)
- **gamma**: Minimum loss reduction for split (0.0-5.0)
- **reg_alpha**: L1 regularization (0.0-10.0)
- **reg_lambda**: L2 regularization (0.0-10.0)
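
The "log scale" note on `learning_rate` means candidates are spread uniformly in log space, so small values such as 0.01 to 0.03 get as much attention as 0.1 to 0.3. The sketch below illustrates only that idea using the standard library. It is not Optuna's sampler (TPE adapts its proposals based on earlier trials), and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
draws = [sample_log_uniform(rng, 0.01, 0.3) for _ in range(10_000)]

# Under log-uniform sampling, about half the draws fall below the geometric mean
geometric_mean = math.sqrt(0.01 * 0.3)
share_below = sum(d < geometric_mean for d in draws) / len(draws)
print(f"geometric mean: {geometric_mean:.4f}, share below: {share_below:.2f}")
```

With plain uniform sampling on the same range, only about 15% of draws would fall below that geometric mean, so low learning rates would be explored far less often.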

### Evaluation:
- Uses K-fold cross-validation (default: 3 folds)
- Minimizes Mean Absolute Percentage Error (MAPE), computed only on rows with positive actual fantasy points and with predictions floored at 0
- Returns average MAPE across folds

In [None]:
import optuna
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_percentage_error
import xgboost as xgb
import numpy as np

def objective(trial, X_sample, y_sample):
    """
    Optuna objective function for XGBoost hyperparameter optimization.
    
    Args:
        trial: Optuna trial object
        X_sample: Feature matrix (sampled training data)
        y_sample: Target vector (fantasy points)
    
    Returns:
        float: Mean MAPE across CV folds (lower is better)
    """
    params = {
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'gamma': trial.suggest_float('gamma', 0.0, 5.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 0.0, 10.0),
        'reg_lambda': trial.suggest_float('reg_lambda', 0.0, 10.0),
        'objective': 'reg:squarederror',
        'random_state': 42,
        'n_jobs': -1
    }
    
    kf = KFold(n_splits=OPTIMIZATION_CV_FOLDS, shuffle=True, random_state=42)
    mape_scores = []
    
    for train_idx, val_idx in kf.split(X_sample):
        X_train_fold = X_sample.iloc[train_idx]
        y_train_fold = y_sample.iloc[train_idx]
        X_val_fold = X_sample.iloc[val_idx]
        y_val_fold = y_sample.iloc[val_idx]
        
        model = xgb.XGBRegressor(**params)
        model.fit(X_train_fold, y_train_fold, verbose=False)
        
        preds = model.predict(X_val_fold)
        preds = np.maximum(preds, 0)
        
        mask = y_val_fold > 0
        if mask.sum() > 0:
            mape = mean_absolute_percentage_error(y_val_fold[mask], preds[mask]) * 100
            mape_scores.append(mape)
    
    return np.mean(mape_scores) if mape_scores else float('inf')

print("Objective function defined")

## 6c. Run Bayesian Optimization

Execute Optuna optimization if enabled. This cell:
1. Loads training data from database
2. Builds features using configured pipeline
3. Samples data for efficient optimization
4. Runs TPE-based hyperparameter search
5. Stores the best hyperparameters in OPTIMIZED_MODEL_PARAMS for use in section 7

**Note**: With default settings this takes up to 30 minutes (the timeout), or less if early stopping triggers.

In [None]:
if RUN_OPTIMIZATION:
    print("Starting Bayesian optimization...")
    print("="*80)
    
    from pathlib import Path
    from src.data.storage.sqlite_storage import SQLiteStorage
    from src.data.loaders.historical_loader import HistoricalDataLoader
    from src.utils.feature_config import load_feature_config
    from src.features.pipeline import FeaturePipeline
    from src.utils.fantasy_points import calculate_dk_fantasy_points
    import pandas as pd
    
    DATA_DIR = '/content/delapan-fantasy/MyDrive/dfs/data'
    DB_PATH = 'nba_dfs.db'
    TRAIN_START = '20241001'
    TRAIN_END = '20250204'
    FEATURE_CONFIG = 'default_features'
    
    data_path = Path(DATA_DIR)
    db_path_full = data_path / DB_PATH if not Path(DB_PATH).is_absolute() else DB_PATH
    
    storage = SQLiteStorage(str(db_path_full))
    loader = HistoricalDataLoader(storage)
    
    print("Loading training data...")
    training_data = loader.load_historical_player_logs(
        start_date=TRAIN_START,
        end_date=TRAIN_END,
        num_seasons=1
    )
    print(f"  Loaded {len(training_data)} records")
    
    print("Building features...")
    feature_config = load_feature_config(FEATURE_CONFIG)
    pipeline = feature_config.build_pipeline(FeaturePipeline)
    
    df = training_data.copy()
    df['gameDate'] = pd.to_datetime(df['gameDate'], format='%Y%m%d', errors='coerce')
    df = df.sort_values(['playerID', 'gameDate'])
    
    if 'fpts' not in df.columns:
        df['fpts'] = df.apply(calculate_dk_fantasy_points, axis=1)
    
    df['target'] = df.groupby('playerID')['fpts'].shift(-1)
    train_features = pipeline.fit_transform(df)
    train_features = train_features.dropna(subset=['target'])
    
    metadata_cols = [
        'playerID', 'playerName', 'longName', 'team', 'teamAbv', 'teamID',
        'pos', 'gameDate', 'gameID', 'fpts', 'fantasyPoints', 'fantasyPts',
        'target', 'pts', 'reb', 'ast', 'stl', 'blk', 'TOV', 'mins',
        'tech', 'created_at', 'updated_at'
    ]
    feature_cols = [col for col in train_features.columns if col not in metadata_cols]
    
    X_full = train_features[feature_cols].fillna(0)
    y_full = train_features['target']
    
    if len(X_full) > OPTIMIZATION_SAMPLE_SIZE:
        print(f"Sampling {OPTIMIZATION_SAMPLE_SIZE} records from {len(X_full)} for optimization...")
        # Seeded generator so the optimization sample is reproducible across runs
        sample_idx = np.random.default_rng(42).choice(len(X_full), OPTIMIZATION_SAMPLE_SIZE, replace=False)
        X_sample = X_full.iloc[sample_idx]
        y_sample = y_full.iloc[sample_idx]
    else:
        X_sample = X_full
        y_sample = y_full
    
    print(f"Optimization dataset: {len(X_sample)} samples, {len(feature_cols)} features")
    print()
    
    class EarlyStoppingCallback:
        def __init__(self, patience=10, min_delta=0.01):
            self.patience = patience
            self.min_delta = min_delta
            self.best_value = float('inf')
            self.trials_without_improvement = 0
        
        def __call__(self, study, trial):
            current_value = trial.value
            
            if current_value < self.best_value - self.min_delta:
                self.best_value = current_value
                self.trials_without_improvement = 0
            else:
                self.trials_without_improvement += 1
            
            if self.trials_without_improvement >= self.patience:
                print(f"\nEarly stopping triggered after {trial.number + 1} trials")
                print(f"No improvement for {self.patience} consecutive trials")
                print(f"Best MAPE: {self.best_value:.2f}%")
                study.stop()
    
    early_stopping = EarlyStoppingCallback(patience=OPTIMIZATION_EARLY_STOPPING_PATIENCE)
    
    study = optuna.create_study(
        direction='minimize',
        sampler=optuna.samplers.TPESampler(seed=42)
    )
    
    print(f"Running optimization: {OPTIMIZATION_N_TRIALS} trials, {OPTIMIZATION_TIMEOUT}s timeout")
    print(f"Early stopping: patience={OPTIMIZATION_EARLY_STOPPING_PATIENCE} trials")
    print("="*80)
    
    study.optimize(
        lambda trial: objective(trial, X_sample, y_sample),
        n_trials=OPTIMIZATION_N_TRIALS,
        timeout=OPTIMIZATION_TIMEOUT,
        callbacks=[early_stopping],
        show_progress_bar=True
    )
    
    print()
    print("="*80)
    print("OPTIMIZATION COMPLETE")
    print("="*80)
    print(f"Best trial: #{study.best_trial.number}")
    print(f"Best MAPE: {study.best_value:.2f}%")
    print(f"Total trials completed: {len(study.trials)}")
    print()
    print("Best hyperparameters:")
    for key, value in study.best_params.items():
        print(f"  {key}: {value}")
    print()
    
    OPTIMIZED_MODEL_PARAMS = {
        **study.best_params,
        'objective': 'reg:squarederror',
        'random_state': 42
    }
    
    print("Section 7 will use these optimized values for MODEL_PARAMS")
    
else:
    print("Bayesian optimization skipped (RUN_OPTIMIZATION = False)")
    print("Using default MODEL_PARAMS from section 7")
    OPTIMIZED_MODEL_PARAMS = None

## 6d. Visualize Optimization Results

Display interactive visualizations of the optimization process:
- **Optimization History**: MAPE over trials with best value tracking
- **Parameter Importance**: Which hyperparameters matter most
- **Best Trial Parameters**: Optimal hyperparameter values
- **Hyperparameter Relationships**: 2D projection of search space

In [None]:
if RUN_OPTIMIZATION and 'study' in locals():
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=(
            '<b>Optimization History</b>',
            '<b>Parameter Importance</b>',
            '<b>Best Trial Parameters</b>',
            '<b>Hyperparameter Relationships</b>'
        ),
        specs=[
            [{'type': 'scatter'}, {'type': 'bar'}],
            [{'type': 'bar'}, {'type': 'scatter'}]
        ],
        vertical_spacing=0.12,
        horizontal_spacing=0.12
    )
    
    trials_df = study.trials_dataframe()
    
    # Optimization history
    fig.add_trace(
        go.Scatter(
            x=trials_df['number'],
            y=trials_df['value'],
            mode='lines+markers',
            name='Trial MAPE',
            line=dict(color='#00D7FF', width=2),
            marker=dict(size=6, color='#00D7FF')
        ),
        row=1, col=1
    )
    
    best_value_so_far = trials_df['value'].cummin()
    fig.add_trace(
        go.Scatter(
            x=trials_df['number'],
            y=best_value_so_far,
            mode='lines',
            name='Best MAPE',
            line=dict(color='#9AFF6E', width=3)
        ),
        row=1, col=1
    )
    
    # Parameter importance
    importance = optuna.importance.get_param_importances(study)
    param_names = list(importance.keys())
    param_values = list(importance.values())
    
    fig.add_trace(
        go.Bar(
            x=param_values,
            y=param_names,
            orientation='h',
            marker=dict(color='#FF53A1', opacity=0.8),
            showlegend=False
        ),
        row=1, col=2
    )
    
    # Best parameters
    best_params = study.best_params
    param_names_best = list(best_params.keys())
    param_values_best = [str(v) if isinstance(v, (int, float)) else v for v in best_params.values()]
    
    fig.add_trace(
        go.Bar(
            y=param_names_best,
            x=[1]*len(param_names_best),
            orientation='h',
            text=param_values_best,
            textposition='inside',
            marker=dict(color='#18FF6D', opacity=0.8),
            showlegend=False
        ),
        row=2, col=1
    )
    
    # Hyperparameter relationships (learning_rate vs max_depth)
    if 'params_learning_rate' in trials_df.columns and 'params_max_depth' in trials_df.columns:
        fig.add_trace(
            go.Scatter(
                x=trials_df['params_learning_rate'],
                y=trials_df['params_max_depth'],
                mode='markers',
                marker=dict(
                    size=8,
                    color=trials_df['value'],
                    colorscale='RdYlGn_r',
                    showscale=True,
                    colorbar=dict(title='MAPE', x=1.15)
                ),
                text=[f"Trial {i}<br>MAPE: {v:.2f}%" 
                      for i, v in zip(trials_df['number'], trials_df['value'])],
                hoverinfo='text',
                showlegend=False
            ),
            row=2, col=2
        )
    
    fig.update_xaxes(title_text='<b>Trial Number</b>', row=1, col=1, color='white')
    fig.update_yaxes(title_text='<b>MAPE (%)</b>', row=1, col=1, color='white')
    fig.update_xaxes(title_text='<b>Importance</b>', row=1, col=2, color='white')
    fig.update_yaxes(title_text='<b>Parameter</b>', row=1, col=2, color='white')
    fig.update_xaxes(row=2, col=1, showticklabels=False)
    fig.update_yaxes(title_text='<b>Parameter</b>', row=2, col=1, color='white')
    fig.update_xaxes(title_text='<b>Learning Rate</b>', row=2, col=2, color='white')
    fig.update_yaxes(title_text='<b>Max Depth</b>', row=2, col=2, color='white')
    
    fig.update_layout(
        height=900,
        plot_bgcolor='#22272e',
        paper_bgcolor='#22272e',
        font=dict(color='white', size=12),
        title_text='<b>Bayesian Optimization Results</b>',
        title_x=0.5,
        showlegend=True
    )
    
    fig.show()
    
    print(f"\nOptimization Summary:")
    print(f"  Total trials: {len(study.trials)}")
    print(f"  Best trial: #{study.best_trial.number}")
    print(f"  Best MAPE: {study.best_value:.2f}%")
    print(f"  Improvement from first trial: {trials_df['value'].iloc[0] - study.best_value:.2f}%")
    
else:
    print("No optimization results to visualize")
    print("Set RUN_OPTIMIZATION = True in section 6a to enable optimization")

## 7. Configure Backtest Parameters

Set training/testing periods, model configuration, and backtest settings.

### Key Parameters:
- **Training Period**: Historical data for model training
- **Testing Period**: Out-of-sample evaluation dates
- **MODEL_PARAMS**: Uses optimized parameters if optimization was run, otherwise uses defaults
- **RECALIBRATE_DAYS**: How often to retrain models (default: 7 days)
- **SALARY_TIERS**: Bins for salary-based performance analysis
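
To illustrate how the `SALARY_TIERS` edges partition players, the sketch below assigns a salary to its bin using the standard library. The helper `salary_tier_label` and the lower-inclusive, upper-exclusive convention are assumptions made for this example. The framework's actual edge handling is not shown in this notebook (`pandas.cut`, for instance, defaults to right-inclusive bins).

```python
from bisect import bisect_right

SALARY_TIERS = [0, 4000, 6000, 8000, 15000]

def salary_tier_label(salary, edges=SALARY_TIERS):
    """Return a label such as '4000-6000' for the bin containing salary.

    Assumed convention: lower edge inclusive, upper edge exclusive.
    """
    if salary < edges[0] or salary >= edges[-1]:
        return "out_of_range"
    idx = bisect_right(edges, salary) - 1
    return f"{edges[idx]}-{edges[idx + 1]}"

for salary in (3500, 4000, 7900, 11200):
    print(salary, "->", salary_tier_label(salary))
```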

## Resume Capability

The backtest supports incremental saving and resume functionality:

### Features:
- **Checkpoint Saving**: After each slate completes, results are saved to checkpoints directory
- **Automatic Resume**: If run is interrupted, set RESUME_FROM_RUN to the timestamp and re-run
- **Progress Tracking**: progress.json file tracks completed slates and timestamps
- **Predictions Saved**: Each slate's predictions and actuals saved as parquet files

### How to Resume:
1. Run cell 8a to check existing run progress
2. Copy the run timestamp you want to resume (e.g., '20250205_143022')
3. Set RESUME_FROM_RUN = '20250205_143022' in section 7 and re-run that cell
4. Re-run from section 9 onwards (if the Colab runtime was reset, re-run the notebook from section 1 first)
5. The backtest skips completed slates and continues from where it left off
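
The skip-on-resume idea can be sketched as follows. This is not the `WalkForwardBacktest` implementation, which is not shown in this notebook. `remaining_slates` is a hypothetical helper, and the `YYYYMMDD` file names are an assumption about how `{date}` is rendered in checkpoint file names.

```python
import json
import tempfile
from pathlib import Path

def remaining_slates(all_dates, checkpoint_dir):
    """Return dates that have no {date}.json checkpoint yet (hypothetical helper)."""
    done = {
        p.stem
        for p in Path(checkpoint_dir).glob("*.json")
        if p.name != "progress.json"  # run-level tracker, not a slate checkpoint
    }
    return [d for d in all_dates if d not in done]

# Demo against a temporary directory standing in for outputs/{timestamp}/checkpoints
with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = Path(tmp)
    for finished in ("20250205", "20250206"):
        (checkpoint_dir / f"{finished}.json").write_text(json.dumps({"date": finished}))
    (checkpoint_dir / "progress.json").write_text(json.dumps({}))

    todo = remaining_slates(["20250205", "20250206", "20250207"], checkpoint_dir)
    print(todo)  # ['20250207']
```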

### What Gets Saved:
- Predictions: `outputs/{timestamp}/predictions/{date}.parquet`
- Predictions with actuals: `outputs/{timestamp}/predictions/{date}_with_actuals.parquet`
- Checkpoints: `outputs/{timestamp}/checkpoints/{date}.json`
- Progress: `outputs/{timestamp}/checkpoints/progress.json`
- Models: `data/models/per_player/` (per-player) or `data/models/per_slate/` (slate-level)
- Training inputs: `outputs/{timestamp}/inputs/`
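
For locating a specific slate's artifacts after a run, the layout above can be expressed as a small path builder. `run_artifact_paths` is a hypothetical helper. It assumes, as the configuration printout in this section suggests, that `outputs` sits under the data directory.

```python
from pathlib import Path

def run_artifact_paths(data_dir, output_dir, timestamp, date):
    """Build the per-slate artifact paths listed above (hypothetical helper)."""
    run_dir = Path(data_dir) / output_dir / timestamp
    return {
        "predictions": run_dir / "predictions" / f"{date}.parquet",
        "with_actuals": run_dir / "predictions" / f"{date}_with_actuals.parquet",
        "checkpoint": run_dir / "checkpoints" / f"{date}.json",
        "progress": run_dir / "checkpoints" / "progress.json",
    }

paths = run_artifact_paths(
    "/content/delapan-fantasy/MyDrive/dfs/data", "outputs", "20250205_143022", "20250205"
)
for name, path in paths.items():
    print(f"{name}: {path}")
```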

In [None]:
TRAIN_START = '20241001'
TRAIN_END = '20250204'
TEST_START = '20250205'
TEST_END = '20250430'

MODEL_TYPE = 'xgboost'
FEATURE_CONFIG = 'default_features'
MIN_PLAYER_GAMES = 10
MIN_GAMES_FOR_BENCHMARK = 5
RECALIBRATE_DAYS = 7
REWRITE_MODELS = False
SALARY_TIERS = [0, 4000, 6000, 8000, 15000]

N_JOBS = -1

DB_PATH = 'nba_dfs.db'
DATA_DIR = '/content/delapan-fantasy/MyDrive/dfs/data'
OUTPUT_DIR = 'outputs'

RESUME_FROM_RUN = None

if 'OPTIMIZED_MODEL_PARAMS' in locals() and OPTIMIZED_MODEL_PARAMS is not None:
    MODEL_PARAMS = OPTIMIZED_MODEL_PARAMS
    print("Using OPTIMIZED hyperparameters from Bayesian optimization")
else:
    MODEL_PARAMS = {
        'max_depth': 6,
        'learning_rate': 0.05,
        'n_estimators': 200,
        'min_child_weight': 5,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'objective': 'reg:squarederror',
        'random_state': 42
    }
    print("Using DEFAULT hyperparameters (optimization not run)")

print("\nBacktest Configuration:")
print(f"  Training: {TRAIN_START} to {TRAIN_END}")
print(f"  Testing: {TEST_START} to {TEST_END}")
print(f"  Model: {MODEL_TYPE}")
print(f"  Features: {FEATURE_CONFIG}")
print(f"  Min games: {MIN_PLAYER_GAMES}")
print(f"  Benchmark min games: {MIN_GAMES_FOR_BENCHMARK}")
print(f"  Recalibrate: every {RECALIBRATE_DAYS} days")
print(f"  Rewrite models: {REWRITE_MODELS}")
print(f"  Parallel jobs: {N_JOBS} (all cores)")
print(f"  Salary tiers: {SALARY_TIERS}")
print(f"  Resume from run: {RESUME_FROM_RUN if RESUME_FROM_RUN else 'None (fresh start)'}")
print(f"\nPaths (Separated Architecture):")
print(f"  Data directory: {DATA_DIR}")
print(f"  Database: {DATA_DIR}/{DB_PATH}")
print(f"  Output: {DATA_DIR}/{OUTPUT_DIR}")
print(f"\nMODEL_PARAMS:")
for key, value in MODEL_PARAMS.items():
    print(f"  {key}: {value}")

## 8. Import Dependencies

Import core backtest framework and configure logging.

**Important**: Logging is set to WARNING level to prevent notebook lag. With 500+ models being trained, INFO-level logging generates thousands of messages that freeze Colab's output rendering.

In [None]:
import logging
import pandas as pd
from datetime import datetime

from src.walk_forward_backtest import WalkForwardBacktest

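# Reduce logging verbosity to prevent notebook lag.
# force=True replaces handlers already attached to the root logger (common in
# notebook environments), where basicConfig would otherwise have no effect.
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    force=True
)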

# Suppress verbose library logs
logging.getLogger('src').setLevel(logging.WARNING)
logging.getLogger('xgboost').setLevel(logging.ERROR)
logging.getLogger('optuna').setLevel(logging.WARNING)

print("Imports successful - Logging set to WARNING level to prevent lag")

## 9. Initialize Backtest

Create WalkForwardBacktest instance with configured parameters.

In [None]:
backtest = WalkForwardBacktest(
    db_path=DB_PATH,
    data_dir=DATA_DIR,
    train_start=TRAIN_START,
    train_end=TRAIN_END,
    test_start=TEST_START,
    test_end=TEST_END,
    model_type=MODEL_TYPE,
    model_params=MODEL_PARAMS,
    feature_config=FEATURE_CONFIG,
    output_dir=OUTPUT_DIR,
    per_player_models=True,
    min_player_games=MIN_PLAYER_GAMES,
    min_games_for_benchmark=MIN_GAMES_FOR_BENCHMARK,
    recalibrate_days=RECALIBRATE_DAYS,
    num_seasons=1,
    salary_tiers=SALARY_TIERS,
    rewrite_models=REWRITE_MODELS,
    save_models=True,
    save_predictions=True,
    n_jobs=N_JOBS,
    resume_from_run=RESUME_FROM_RUN
)

print("Backtest initialized with separated architecture")
if RESUME_FROM_RUN:
    print(f"Will resume from run: {RESUME_FROM_RUN}")

## 8a. Check Existing Run Progress (Optional)

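Use this cell to inspect existing runs before choosing a value for RESUME_FROM_RUN. It only reads files under the output directory and leaves the `backtest` object from section 9 untouched.

In [None]:
import json
from pathlib import Path

# Read-only check of prior runs. Assumes the layout documented in section 7:
# {DATA_DIR}/{OUTPUT_DIR}/{timestamp}/checkpoints/
runs_dir = Path(DATA_DIR) / OUTPUT_DIR

if not runs_dir.exists():
    print(f"No output directory at {runs_dir}")
else:
    run_dirs = sorted(p for p in runs_dir.iterdir() if (p / 'checkpoints').is_dir())
    if not run_dirs:
        print(f"No runs with checkpoints found in {runs_dir}")
    for run_dir in run_dirs:
        checkpoint_dir = run_dir / 'checkpoints'
        # One {date}.json per completed slate; progress.json is the run-level tracker
        slate_files = [f for f in checkpoint_dir.glob('*.json') if f.name != 'progress.json']
        print(f"{run_dir.name}: {len(slate_files)} slate checkpoints")
        progress_file = checkpoint_dir / 'progress.json'
        if progress_file.exists():
            # The schema of progress.json is not documented here, so print it as-is
            print(json.dumps(json.loads(progress_file.read_text()), indent=2)[:1000])

print("\nTo resume, set RESUME_FROM_RUN in section 7 to one of the run timestamps above")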

## 10. Run Backtest

Execute walk-forward backtest across all test dates.

**Time estimate per slate**:
- Free Colab: ~21 minutes
- Colab Pro: ~10 minutes
- Colab Pro+: ~5 minutes
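
In a walk-forward backtest each slate is predicted using only data available before it, and models are refreshed periodically (here every `RECALIBRATE_DAYS`). The sketch below shows one plausible retraining schedule. The rule in `retrain_dates` is an assumption for illustration, since the internals of `WalkForwardBacktest` are not shown in this notebook.

```python
from datetime import date, timedelta

def retrain_dates(test_dates, recalibrate_days=7):
    """Return the test dates on which models would be retrained.

    Assumed rule: retrain on the first slate, then whenever at least
    recalibrate_days have passed since the last retrain.
    """
    retrain_on = []
    last_trained = None
    for slate in sorted(test_dates):
        if last_trained is None or (slate - last_trained).days >= recalibrate_days:
            retrain_on.append(slate)
            last_trained = slate
    return retrain_on

# Fifteen consecutive daily slates starting on the TEST_START date
slates = [date(2025, 2, 5) + timedelta(days=i) for i in range(15)]
for d in retrain_dates(slates):
    print(d.isoformat())
```

Under this rule the fifteen slates trigger retraining on 2025-02-05, 2025-02-12, and 2025-02-19. Slates in between reuse the most recently trained models.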

In [None]:
start_time = datetime.now()
print(f"Starting backtest at {start_time.strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)

results = backtest.run()

end_time = datetime.now()
elapsed = end_time - start_time
print("="*80)
print(f"Backtest completed at {end_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Total time: {elapsed}")

## 11. View Results Summary

Display aggregated performance metrics and statistical tests.
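
For readers who want to sanity-check the significance figures, the sketch below computes a paired t statistic and a paired-samples Cohen's d (mean difference divided by the standard deviation of the differences) with the standard library. The sample values are invented, the function name is hypothetical, and the framework may use a different variant of Cohen's d. Differences are taken as model minus benchmark, so negative values favor the model. `scipy.stats.ttest_rel` can be used to obtain the corresponding p-value.

```python
import math
import statistics

def paired_t_and_cohens_d(model_errors, benchmark_errors):
    """Return (t statistic, Cohen's d) for paired error samples.

    Differences are model minus benchmark, so negative values favor the model.
    """
    diffs = [m - b for m, b in zip(model_errors, benchmark_errors)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    cohens_d = mean_diff / sd_diff
    return t_stat, cohens_d

# Invented per-slate MAPE values, for illustration only
model_mape = [28.0, 30.0, 27.0, 31.0, 29.0]
benchmark_mape = [31.0, 32.0, 30.0, 32.0, 33.0]
t_stat, d = paired_t_and_cohens_d(model_mape, benchmark_mape)
print(f"t = {t_stat:.2f}, d = {d:.2f}")
```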

In [None]:
print("\n" + "="*80)
print("BACKTEST RESULTS SUMMARY")
print("="*80)
print(f"Slates processed: {results['num_slates']}")
print(f"Date range: {results['date_range']}")
print(f"Total players evaluated: {results['total_players_evaluated']:.0f}")
print(f"Average players per slate: {results['avg_players_per_slate']:.1f}")
print()
print("Model Performance:")
print(f"  Mean MAPE: {results['model_mean_mape']:.2f}%")
print(f"  Median MAPE: {results['model_median_mape']:.2f}%")
print(f"  Std MAPE: {results['model_std_mape']:.2f}%")
print(f"  Mean RMSE: {results['model_mean_rmse']:.2f}")
print(f"  Mean MAE: {results['model_mean_mae']:.2f}")
print(f"  Mean Correlation: {results['model_mean_correlation']:.3f}")
print()
print("Benchmark Performance:")
print(f"  Mean MAPE: {results['benchmark_mean_mape']:.2f}%")
print(f"  Improvement: {results['mape_improvement']:+.2f}%")
print()
if 'statistical_test' in results:
    test = results['statistical_test']
    print("Statistical Significance:")
    print(f"  p-value: {test['p_value']:.6f}")
    print(f"  Cohen's d: {test['cohens_d']:.4f}")
    print(f"  Effect size: {test['effect_size']}")
    if test['p_value'] < 0.05:
        status = "Model BETTER" if test['cohens_d'] < 0 else "Model WORSE"
        print(f"  Result: {status} (statistically significant)")
    else:
        print(f"  Result: No significant difference")
print("="*80)

## 12. View Daily Results

Examine per-slate performance breakdown.

In [None]:
daily_df = results['daily_results']

print("Daily Results:")
print("="*80)
display(daily_df[[
    'date', 'num_players', 'model_mape', 'model_rmse', 'model_mae',
    'model_corr', 'benchmark_mape', 'mean_actual', 'mean_projected'
]])

if 'tier_comparison' in results:
    print("\n\nPerformance by Salary Tier:")
    print("="*80)
    tier_df = results['tier_comparison']
    display(tier_df[['salary_tier', 'count', 'model_mape', 'benchmark_mape', 'mape_improvement']])

## 13. Visualize Results

Interactive charts showing:
- MAPE over time (model vs benchmark)
- RMSE progression
- Correlation trends
- Players evaluated per slate

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio

pio.templates.default = "plotly_dark"

vibrant_colors = {
    "model": "#9AFF6E",
    "benchmark": "#3ABEFF", 
    "model_mean": "#FFFF35",
    "benchmark_mean": "#FD5A66",
    "rmse": "#00FFC2",
    "rmse_mean": "#EA00FF",
    "corr": "#FBBF24",
    "corr_mean": "#FF7A00",
    "players_bar": "#FF53A1",
    "players_mean": "#25FFF1",
}

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        '<b>MAPE Over Time</b>', 
        '<b>RMSE Over Time</b>', 
        '<b>Correlation Over Time</b>', 
        '<b>Players Evaluated Per Slate</b>'
    ),
    vertical_spacing=0.15,
    horizontal_spacing=0.1
)

# MAPE Over Time
fig.add_trace(
    go.Scatter(
        x=daily_df['date'], 
        y=daily_df['model_mape'], 
        mode='lines+markers', 
        name='Model', 
        line=dict(width=2, color=vibrant_colors["model"]), 
        marker=dict(size=10, color=vibrant_colors["model"], symbol='circle')
    ),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(
        x=daily_df['date'], 
        y=daily_df['benchmark_mape'], 
        mode='lines+markers', 
        name='Benchmark', 
        line=dict(width=2, color=vibrant_colors["benchmark"]),
        marker=dict(size=10, color=vibrant_colors["benchmark"], symbol='square')
    ),
    row=1, col=1
)
fig.add_hline(
    y=daily_df['model_mape'].mean(),
    line_dash="dash",
    line_color=vibrant_colors["model_mean"],
    row=1, col=1
)
fig.add_hline(
    y=daily_df['benchmark_mape'].mean(),
    line_dash="dash",
    line_color=vibrant_colors["benchmark_mean"],
    row=1, col=1
)

# RMSE Over Time  
fig.add_trace(
    go.Scatter(
        x=daily_df['date'], 
        y=daily_df['model_rmse'], 
        mode='lines+markers', 
        name='RMSE',
        line=dict(color=vibrant_colors["rmse"], width=2), 
        marker=dict(size=10, color=vibrant_colors["rmse"], symbol='diamond')
    ),
    row=1, col=2
)
fig.add_hline(
    y=daily_df['model_rmse'].mean(), 
    line_dash="dash", 
    line_color=vibrant_colors["rmse_mean"], 
    row=1, col=2
)

# Correlation Over Time
fig.add_trace(
    go.Scatter(
        x=daily_df['date'], 
        y=daily_df['model_corr'], 
        mode='lines+markers', 
        name='Correlation',
        line=dict(color=vibrant_colors["corr"], width=2), 
        marker=dict(size=10, color=vibrant_colors["corr"], symbol='cross')
    ),
    row=2, col=1
)
fig.add_hline(
    y=daily_df['model_corr'].mean(), 
    line_dash="dash", 
    line_color=vibrant_colors["corr_mean"], 
    row=2, col=1
)

# Players Evaluated
fig.add_trace(
    go.Bar(
        x=daily_df['date'], 
        y=daily_df['num_players'], 
        name='Players', 
        marker=dict(color=vibrant_colors["players_bar"], opacity=0.85)
    ),
    row=2, col=2
)
fig.add_hline(
    y=daily_df['num_players'].mean(), 
    line_dash="dash", 
    line_color=vibrant_colors["players_mean"], 
    row=2, col=2
)

axis_style = dict(color="white", showline=True, linewidth=1.5, linecolor='#666')
fig.update_xaxes(title_text="<b>Date</b>", tickangle=45, **axis_style)
fig.update_yaxes(title_text="<b>MAPE (%)</b>", row=1, col=1, **axis_style)
fig.update_yaxes(title_text="<b>RMSE</b>", row=1, col=2, **axis_style)
fig.update_yaxes(title_text="<b>Correlation</b>", row=2, col=1, **axis_style)
fig.update_yaxes(title_text="<b>Players</b>", row=2, col=2, **axis_style)

fig.update_layout(
    height=800,
    showlegend=True,
    legend=dict(orientation="h", yanchor="bottom", y=1.05, xanchor="center", x=0.5),
    plot_bgcolor="#22272e",
    paper_bgcolor="#22272e",
    font=dict(color="white", size=14),
    title_text="<b>Backtest Performance Metrics</b>",
    title_x=0.5,
    margin=dict(t=110)
)

fig.show()

## 14. Error Distribution Analysis

Compare model errors vs benchmark errors.

In [None]:
if 'all_predictions' in results and not results['all_predictions'].empty:
    all_preds = results['all_predictions']
    comparison_df = all_preds[(all_preds['projected_fpts'] > 0) & (all_preds['benchmark_pred'] > 0)].copy()
    
    vibrant_colors_error = {
        "scatter": "#00D7FF",
        "diagonal": "#FF0080",
        "histogram": "#18FF6D",
        "vline_zero": "#FF0080",
        "vline_mean": "#FFD700",
    }
    
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=(
            '<b>Model vs Benchmark Error</b>', 
            '<b>Error Difference (Positive = Model Better)</b>'
        ),
        horizontal_spacing=0.15
    )
    
    # Per-row absolute errors for the model and the benchmark
    comparison_df['model_error'] = (comparison_df['projected_fpts'] - comparison_df['actual_fpts']).abs()
    comparison_df['benchmark_error'] = (comparison_df['benchmark_pred'] - comparison_df['actual_fpts']).abs()
    
    fig.add_trace(
        go.Scatter(
            x=comparison_df['benchmark_error'], 
            y=comparison_df['model_error'],
            mode='markers',
            marker=dict(size=7, opacity=0.7, color=vibrant_colors_error["scatter"]),
            name='Errors'
        ),
        row=1, col=1
    )
    
    max_error = max(comparison_df['benchmark_error'].max(), comparison_df['model_error'].max()) * 1.03
    fig.add_trace(
        go.Scatter(
            x=[0, max_error], y=[0, max_error],
            mode='lines',
            line=dict(color=vibrant_colors_error["diagonal"], dash='dash', width=2),
            name='Equal error'
        ),
        row=1, col=1
    )
    
    error_diff = comparison_df['benchmark_error'] - comparison_df['model_error']
    fig.add_trace(
        go.Histogram(
            x=error_diff, nbinsx=30,
            marker=dict(color=vibrant_colors_error["histogram"], opacity=0.85),
            name='Error Difference'
        ),
        row=1, col=2
    )
    
    fig.add_vline(x=0, line_dash="dash", line_color=vibrant_colors_error["vline_zero"], row=1, col=2)
    fig.add_vline(x=error_diff.mean(), line_dash="dash", line_color=vibrant_colors_error["vline_mean"], row=1, col=2)
    
    fig.update_xaxes(title_text="<b>Benchmark Error</b>", row=1, col=1, color="white")
    fig.update_yaxes(title_text="<b>Model Error</b>", row=1, col=1, color="white")
    fig.update_xaxes(title_text="<b>Error Difference</b>", row=1, col=2, color="white")
    fig.update_yaxes(title_text="<b>Frequency</b>", row=1, col=2, color="white")
    
    fig.update_layout(
        height=600,
        plot_bgcolor="#22272e",
        paper_bgcolor="#22272e",
        font=dict(color="white", size=14),
        title_text="<b>Error Distribution Analysis</b>",
        title_x=0.5
    )
    
    fig.show()
    
    print("\nError Statistics:")
    print(f"  Model Mean Absolute Error: {comparison_df['model_error'].mean():.2f}")
    print(f"  Benchmark Mean Absolute Error: {comparison_df['benchmark_error'].mean():.2f}")
    print(f"  Mean Difference: {error_diff.mean():.2f} (positive = model better)")
else:
    print("No prediction data available for error analysis")

## 15. Salary Tier Performance

Compare model and benchmark MAPE within each salary tier. A positive MAPE improvement means the model outperformed the benchmark in that tier.
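
How `results['tier_comparison']` is built is not shown in this notebook. The sketch below is one plausible way to produce a table with the same columns (`salary_tier`, `model_mape`, `benchmark_mape`, `mape_improvement`, `count`). The `salary` column name, the tier edges and labels, and the definition of improvement as benchmark MAPE minus model MAPE are all assumptions made for illustration.

```python
import pandas as pd

# Toy predictions: hypothetical values, not real backtest results
preds = pd.DataFrame({
    "salary":         [3500, 4200, 6000, 6500, 9000, 9800],
    "actual_fpts":    [10.0, 20.0, 25.0, 40.0, 50.0, 40.0],
    "projected_fpts": [12.0, 18.0, 30.0, 38.0, 45.0, 44.0],
    "benchmark_pred": [15.0, 16.0, 30.0, 32.0, 40.0, 48.0],
})

# Hypothetical tier edges; the backtest's real boundaries may differ
bins = [0, 5000, 8000, float("inf")]
labels = ["Low (<5k)", "Mid (5k-8k)", "High (8k+)"]
preds["salary_tier"] = pd.cut(preds["salary"], bins=bins, labels=labels, right=False)

# Absolute percentage error per row, for model and benchmark
actual = preds["actual_fpts"]
preds["model_ape"] = (preds["projected_fpts"] - actual).abs() / actual * 100
preds["benchmark_ape"] = (preds["benchmark_pred"] - actual).abs() / actual * 100

# Mean APE (MAPE) and row count per tier
tier_df = (
    preds.groupby("salary_tier", observed=True)
    .agg(
        model_mape=("model_ape", "mean"),
        benchmark_mape=("benchmark_ape", "mean"),
        count=("salary", "size"),
    )
    .reset_index()
)

# Assumed sign convention: positive means the model's MAPE is lower
tier_df["mape_improvement"] = tier_df["benchmark_mape"] - tier_df["model_mape"]
print(tier_df)
```

Rows with `actual_fpts` equal to zero would make the percentage error undefined, so a real implementation would need to filter or otherwise handle them before this step.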

In [None]:
if 'tier_comparison' in results:
    tier_df = results['tier_comparison']
    
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('<b>MAPE by Salary Tier</b>', '<b>Model Improvement Over Benchmark</b>'),
        horizontal_spacing=0.15
    )
    
    x_labels = tier_df['salary_tier'].astype(str)
    
    fig.add_trace(
        go.Bar(x=x_labels, y=tier_df['model_mape'],
               name='Model', marker=dict(opacity=0.8, color='#9AFF6E')),
        row=1, col=1
    )
    fig.add_trace(
        go.Bar(x=x_labels, y=tier_df['benchmark_mape'],
               name='Benchmark', marker=dict(opacity=0.8, color='#3ABEFF')),
        row=1, col=1
    )
    
    colors = ['green' if x > 0 else 'red' for x in tier_df['mape_improvement']]
    fig.add_trace(
        go.Bar(x=x_labels, y=tier_df['mape_improvement'],
               marker=dict(color=colors, opacity=0.7),
               showlegend=False),
        row=1, col=2
    )
    
    fig.add_hline(y=0, line_color="white", line_width=0.8, row=1, col=2)
    
    fig.update_xaxes(title_text="<b>Salary Tier</b>", color="white")
    fig.update_yaxes(title_text="<b>MAPE (%)</b>", row=1, col=1, color="white")
    fig.update_yaxes(title_text="<b>MAPE Improvement (%)</b>", row=1, col=2, color="white")
    
    fig.update_layout(
        height=500,
        barmode='group',
        plot_bgcolor="#22272e",
        paper_bgcolor="#22272e",
        font=dict(color="white", size=14),
        title_text="<b>Performance by Salary Tier</b>",
        title_x=0.5
    )
    
    fig.show()
    
    print("\nSalary Tier Breakdown:")
    print("="*80)
    for _, row in tier_df.iterrows():
        improvement = row['mape_improvement']
        status = 'BETTER' if improvement > 0 else 'WORSE'
        print(f"{str(row['salary_tier']):20} {improvement:+6.1f}% {status:8} "
              f"(Model: {row['model_mape']:.1f}%, Benchmark: {row['benchmark_mape']:.1f}%, n={row['count']:.0f})")
else:
    print("No tier comparison data available")

## 16. Save Results

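List the files written during the run, then export the daily summary CSV and, if available, the tier comparison to the `outputs` directory on Google Drive.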
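
The export file names are derived from the backtest's date range. The sketch below shows that naming step in isolation, writing to a temporary directory instead of Drive. The date string is a hypothetical example of the `"<start> to <end>"` format that the `replace(' to ', '_')` call implies, and the CSV header is a placeholder.

```python
import tempfile
from pathlib import Path

# Hypothetical date range in the "<start> to <end>" form
date_range = "2025-01-01 to 2025-01-31"
date_range_clean = date_range.replace(" to ", "_")

# Temporary stand-in for the Drive data directory (assumption for this sketch)
data_dir = Path(tempfile.mkdtemp())
out_dir = data_dir / "outputs"
out_dir.mkdir(parents=True, exist_ok=True)  # safe to call if it already exists

summary_path = out_dir / f"summary_{date_range_clean}.csv"
summary_path.write_text("date,model_mape\n")  # placeholder content

print(summary_path.name)
```

Creating the directory with `exist_ok=True` before writing keeps the export from failing on a fresh Drive folder where `outputs` does not exist yet.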

In [None]:
from pathlib import Path

if 'report_path' in results:
    print(f"Comprehensive report: {results['report_path']}")

print(f"\nAll outputs saved to: {backtest.run_output_dir}")
print(f"  - Predictions: {backtest.run_predictions_dir}")
print(f"  - Training inputs: {backtest.run_inputs_dir}")
print(f"  - Features: {backtest.run_features_dir}")

# Count the files written during the run, grouped by type
run_dir = Path(backtest.run_output_dir)
parquet_files = list(run_dir.rglob('*.parquet'))
json_files = list(run_dir.rglob('*.json'))
pkl_files = list(run_dir.rglob('*.pkl'))

print("\nOutput files:")
print(f"  - Parquet: {len(parquet_files)}")
print(f"  - JSON: {len(json_files)}")
print(f"  - Models: {len(pkl_files)}")
print(f"  - Total: {len(parquet_files) + len(json_files) + len(pkl_files)}")

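# Save summary CSV (file name is derived from the backtest date range)
date_range_clean = results['date_range'].replace(' to ', '_')
summary_path = Path(DATA_DIR) / 'outputs' / f"summary_{date_range_clean}.csv"
summary_path.parent.mkdir(parents=True, exist_ok=True)  # create outputs/ if it is missing
daily_df.to_csv(summary_path, index=False)
print(f"\nSummary CSV: {summary_path}")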

# Save tier comparison if available
if 'tier_comparison' in results:
    tier_path = Path(DATA_DIR) / 'outputs' / f"tier_comparison_{date_range_clean}.csv"
    results['tier_comparison'].to_csv(tier_path, index=False)
    print(f"Tier comparison: {tier_path}")