# Fantasy Football Quantile Predictions

This notebook predicts **floor, median, and ceiling** fantasy points for your ESPN roster using quantile regression.

## What You'll Get
- **pred_10 (Floor)**: Safe minimum - player should score above this about 90% of the time
- **pred_50 (Median)**: Typical score - player is as likely to finish above this as below it
- **pred_90 (Ceiling)**: Upside - player should reach or exceed this only about 10% of the time

## Setup Instructions

1. **Install dependencies**: `pip install espn_api nfl_data_py scikit-learn pandas numpy`

2. **Get your ESPN credentials**:
   - `league_id`: From your league URL (e.g., `leagueId=567575`)
   - `espn_s2` and `swid`: Browser cookies from [this guide](https://github.com/cwendt94/espn-api/discussions/150)

3. **Run all cells in order** - predictions will appear at the end!
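
One way to keep the `espn_s2` and `swid` cookies out of the notebook itself is to read them from environment variables. The sketch below is only an illustration: the variable names `ESPN_LEAGUE_ID`, `ESPN_S2`, and `ESPN_SWID` and the helper `load_espn_credentials` are assumptions of this example, not part of `espn_api`.

```python
import os

def load_espn_credentials(env=os.environ):
    """Read ESPN settings from environment variables.

    The variable names are this sketch's own convention (hypothetical).
    """
    required = ("ESPN_LEAGUE_ID", "ESPN_S2", "ESPN_SWID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "league_id": int(env["ESPN_LEAGUE_ID"]),
        "espn_s2": env["ESPN_S2"],
        "swid": env["ESPN_SWID"],
    }

# Usage (assumes the variables were exported before starting Jupyter):
# from espn_api.football import League
# league = League(year=2025, **load_espn_credentials())
```

Because these cookies grant access to your ESPN account, treat them like passwords and avoid saving them in a shared notebook.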

## Why Quantile Regression?

**The Problem**: Standard projections give you one number (e.g., "14.5 points"). But two players can both project 14.5 points with very different risk profiles:
- Player A: Consistent, always scores 13-16 points
- Player B: Boom/bust, scores 5-24 points

**The Solution**: Instead of one prediction, we give you THREE:
- **Floor (10th percentile)**: "There's a 90% chance they score at least this much"
- **Median (50th percentile)**: "They are as likely to score above this as below it"
- **Ceiling (90th percentile)**: "There's a 10% chance they score this much or more"
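
To make the Player A / Player B contrast concrete, here is a minimal sketch that computes the three percentiles from two made-up score histories. The numbers are illustrative only, chosen so both players share a 14.5-point median.

```python
import numpy as np

# Hypothetical weekly scores (illustrative numbers, not real data).
player_a = np.array([13.0, 14.0, 14.5, 15.0, 16.0, 14.5, 13.5, 15.5, 14.5, 14.5])
player_b = np.array([5.0, 24.0, 9.0, 20.0, 14.5, 7.0, 22.0, 14.5, 11.0, 18.0])

def summarize(scores):
    """Return the empirical 10th, 50th, and 90th percentiles of a score history."""
    floor, median, ceiling = np.percentile(scores, [10, 50, 90])
    return {"floor": float(floor), "median": float(median), "ceiling": float(ceiling)}

summary_a = summarize(player_a)
summary_b = summarize(player_b)
print("Player A:", summary_a)
print("Player B:", summary_b)
```

Both medians come out the same, but Player B's floor-to-ceiling range is several times wider, which is exactly the information a single projection hides.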

**How it works**:
1. Load 3+ years of NFL play-by-play data
2. Calculate weekly fantasy points for each player
3. Build features for each player-week:
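   - **Game stats** (aggregated per player-week for inspection, not fed to the models): yards, TDs, interceptions, EPA, CPOE
   - **Historical stats** (the model inputs): last week's points, 3-week rolling avg, career avg, 3-week volatility, plus an inferred position
4. Train 3 separate Gradient Boosting models (one per quantile)
5. **At prediction time**: Feed each player's historical patterns to the models; ESPN's projected points are shown alongside for comparison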

---
## Step 1: Connect to ESPN League


In [87]:
# Connect to ESPN Fantasy League
from espn_api.football import League
import nfl_data_py as nfl

league = League(
    league_id=567575, 
    year=2025, 
    espn_s2='YOUR_ESPN_S2_COOKIE',  # <-- paste your own cookie; never share or commit the real value
    swid='{YOUR-SWID-COOKIE}'       # <-- paste your own SWID (keep the curly braces)
)
team = league.teams[9]  # <-- Change index to select your team
print(f"Connected to league. Your team: {team.team_name}")



Connected to league. Your team: Math Guys


---
## Step 2: Load Data & Train Model

The next few cells:
1. **Define scoring rules** (Half-PPR: 0.5 pts per reception)
2. **Load NFL play-by-play data** (2022-2025 seasons)
3. **Build features** for each player-week:
   - **Game stats**: passing/rushing/receiving yards, TDs, interceptions, EPA, CPOE (aggregated per player-week, not used as model inputs)
   - **Historical stats**: `Y_lag_1` (last week), `Y_roll_avg_3` (3-week avg), `Y_cum_avg` (career), plus `Y_std_3`, `Y_min_3`, `Y_max_3` and an inferred position
4. **Train 3 Gradient Boosting models** (one for each quantile: 10%, 50%, 90%)

*Note: The models use only features known before kickoff (historical, volatility, and position features), because actual game stats are not available at prediction time.*
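
The historical features rely on `shift(1)` so that each row only sees earlier weeks. Here is a small sketch on a toy frame; the players and scores are made up, and only the column names follow the notebook.

```python
import pandas as pd

# Toy weekly scores for two hypothetical players (illustrative values only).
toy = pd.DataFrame({
    "player_id": ["A", "A", "A", "A", "B", "B", "B"],
    "week":      [1, 2, 3, 4, 1, 2, 3],
    "points":    [10.0, 20.0, 30.0, 40.0, 5.0, 7.0, 9.0],
}).sort_values(["player_id", "week"])

grouped = toy.groupby("player_id")["points"]

# shift(1) moves each score down one row within a player, so the features for
# a given week are built only from that player's earlier games (no leakage).
toy["Y_lag_1"] = grouped.shift(1)
toy["Y_roll_avg_3"] = grouped.transform(
    lambda s: s.shift(1).rolling(window=3, min_periods=1).mean()
)
toy["Y_cum_avg"] = grouped.transform(
    lambda s: s.shift(1).expanding(min_periods=1).mean()
)

print(toy)
```

A player's first row has no history, so its lag features are `NaN`; the notebook fills those with 0.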


In [88]:
# Helpers: fantasy scoring per play and weekly target aggregation
import pandas as pd
import numpy as np

# Half-PPR scoring rules
scoring_rules = {
    'pass_yd': 0.04, 'pass_td': 4,
    'rush_yd': 0.1, 'rush_td': 6,
    'rec_yd': 0.1, 'rec_td': 6,
    'rec': 0.5,
    'int': -2, 'fumble_lost': -2,
    'qb_kneel_yd': -0.1
}

def calculate_fantasy_points_per_play(df: pd.DataFrame, scoring_rules: dict) -> pd.DataFrame:
    df = df.copy()
    df['fp_pass'] = (df['passing_yards'].fillna(0) * scoring_rules['pass_yd']) + \
                    (df['pass_touchdown'].fillna(0) * scoring_rules['pass_td'])
    df['fp_rush'] = (df['rushing_yards'].fillna(0) * scoring_rules['rush_yd']) + \
                    (df['rush_touchdown'].fillna(0) * scoring_rules['rush_td'])
    df['fp_rec'] = (df['receiving_yards'].fillna(0) * scoring_rules['rec_yd']) + \
                   (df['pass_touchdown'].fillna(0) * scoring_rules['rec_td']) + \
                   (df['complete_pass'].fillna(0) * scoring_rules['rec'])
    df['fp_int'] = df['interception'].fillna(0) * scoring_rules['int']
    df['fp_fumble_lost'] = df['fumble_lost'].fillna(0) * scoring_rules['fumble_lost']
    df['fp_kneel'] = np.where(df['play_type'] == 'qb_kneel',
                              df['yards_gained'].fillna(0) * scoring_rules['qb_kneel_yd'], 0)
    return df

def calculate_weekly_fantasy_points_final(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df['fp_pass_total'] = df['fp_pass'].fillna(0) + df['fp_int'].fillna(0) + df['fp_kneel'].fillna(0)
    df['fp_rush_total'] = df['fp_rush'].fillna(0)
    df['fp_rec_total'] = df['fp_rec'].fillna(0)
    df['fp_fumble_1_total'] = df['fp_fumble_lost'].fillna(0)

    id_vars = ['season', 'week']
    contributions_map = {
        'passer_player_id': 'fp_pass_total',
        'rusher_player_id': 'fp_rush_total',
        'receiver_player_id': 'fp_rec_total',
        'fumbled_1_player_id': 'fp_fumble_1_total',
    }
    contributions = []
    for id_col, point_col in contributions_map.items():
        tmp = df.rename(columns={id_col: 'player_id', point_col: 'points'})
        contributions.append(tmp[id_vars + ['player_id', 'points']])

    df_all_points = pd.concat(contributions, ignore_index=True)
    df_all_points.dropna(subset=['player_id'], inplace=True)

    out = df_all_points.groupby(id_vars + ['player_id'])['points'].sum().reset_index()
    return out.rename(columns={'points': 'Y_target_points'})


In [89]:
# Build modeling dataframe (df_final) from play-by-play
import nfl_data_py as nfl

# Load plays (same years as modeling notebook)
df = nfl.import_pbp_data([2022, 2023, 2024, 2025])

# Keep fantasy-relevant plays
fantasy_play_types = ['pass', 'run', 'qb_kneel']
exclude_play_types = ['no_play', 'qb_spike', 'field_goal', 'extra_point', 'punt', 'kickoff']

df_fantasy_plays = df[
    df['play_type'].isin(fantasy_play_types) & ~df['play_type'].isin(exclude_play_types)
].copy()

# Select modeling columns
mdl_cols = [
    'game_id','play_id','season','week','posteam','defteam','home_team','away_team',
    'passer_player_id','passer','rusher_player_id','rusher','receiver_player_id','receiver','fumbled_1_player_id','fumbled_2_player_id',
    'play_type','down','ydstogo','yardline_100','shotgun','no_huddle',
    'passing_yards','pass_touchdown','pass_attempt','complete_pass',
    'rushing_yards','rush_touchdown','rush_attempt',
    'receiving_yards','yards_after_catch',
    'penalty_yards','interception','fumble_lost',
    'yards_gained','epa','cpoe','td_prob'
]

df_mdl0 = df_fantasy_plays[mdl_cols].copy()

# Per-play fantasy points
df_mdl1 = calculate_fantasy_points_per_play(df_mdl0, scoring_rules)

# Weekly target Y per player
df_target_Y = calculate_weekly_fantasy_points_final(df_mdl1)

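# Aggregate features per player-week
# NOTE: the melt below copies every stat on a play to each involved player (passer,
# rusher, receiver), so a QB's row also accumulates receiving_yards, as the df_final
# preview shows. These aggregates are not used as model inputs.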
feature_agg_rules = {
    'passing_yards': 'sum',
    'rushing_yards': 'sum',
    'receiving_yards': 'sum',
    'pass_touchdown': 'sum',
    'rush_touchdown': 'sum',
    'interception': 'sum',
    'epa': 'mean',
    'cpoe': 'mean',
}

id_vars = ['season','week']
feature_cols = list(feature_agg_rules.keys())

df_select = df_mdl1[id_vars + feature_cols + ['passer_player_id','rusher_player_id','receiver_player_id']].copy()

df_X_long = pd.melt(
    df_select,
    id_vars=id_vars + feature_cols,
    value_vars=['passer_player_id','rusher_player_id','receiver_player_id'],
    var_name='role_type',
    value_name='player_id'
)
df_X_long.dropna(subset=['player_id'], inplace=True)

df_features_X = df_X_long.groupby(['season','week','player_id']).agg(feature_agg_rules).reset_index()

df_counts = df_X_long.groupby(['season','week','player_id']).size().reset_index(name='total_plays_involved')
df_features_X = pd.merge(df_features_X, df_counts, on=['season','week','player_id'], how='left')

# Merge with target and create time features
df_final = pd.merge(df_features_X, df_target_Y, on=['season','week','player_id'], how='left')
df_final['Y_target_points'] = df_final['Y_target_points'].fillna(0)

df_final.sort_values(by=['player_id','season','week'], inplace=True)

df_final['Y_lag_1'] = df_final.groupby('player_id')['Y_target_points'].shift(1)
df_final['Y_roll_avg_3'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).rolling(window=3, min_periods=1).mean()
)
df_final['Y_cum_avg'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).expanding(min_periods=1).mean()
)

# NEW: Volatility features to help tighten prediction intervals for consistent players
df_final['Y_std_3'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).rolling(window=3, min_periods=2).std()
)
df_final['Y_min_3'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).rolling(window=3, min_periods=1).min()
)
df_final['Y_max_3'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).rolling(window=3, min_periods=1).max()
)

for col in ['Y_lag_1','Y_roll_avg_3','Y_cum_avg','Y_std_3','Y_min_3','Y_max_3']:
    df_final[col] = df_final[col].fillna(0)

# Add POSITION feature - crucial for different scoring distributions
# Infer position from play involvement patterns
role_counts = df_X_long.groupby(['player_id', 'role_type']).size().reset_index(name='count')
player_role_summary = role_counts.pivot(index='player_id', columns='role_type', values='count').fillna(0)

def infer_position(row):
    pass_count = row.get('passer_player_id', 0)
    rush_count = row.get('rusher_player_id', 0)
    rec_count = row.get('receiver_player_id', 0)
    if pass_count >= 50:
        return 'QB'
    if rush_count >= 10 and rush_count > rec_count * 2:
        return 'RB'
    if rec_count >= 10:
        return 'WR_TE'
    return 'OTHER'

player_role_summary['position'] = player_role_summary.apply(infer_position, axis=1)
position_map = player_role_summary['position'].to_dict()

# Add position to df_final and create dummy variables
df_final['position'] = df_final['player_id'].map(position_map).fillna('OTHER')
df_final = pd.get_dummies(df_final, columns=['position'], prefix='pos')

print('df_final ready:', df_final.shape)
print('Position columns added:', [c for c in df_final.columns if c.startswith('pos_')])
print(df_final.head())


2022 done.
2023 done.
2024 done.
2025 done.
Downcasting floats.
df_final ready: (20842, 23)
Position columns added: ['pos_OTHER', 'pos_QB', 'pos_RB', 'pos_WR_TE']
      season  week   player_id  passing_yards  rushing_yards  receiving_yards  \
0       2022     1  00-0019596          212.0           -1.0            212.0   
318     2022     2  00-0019596          190.0           -2.0            190.0   
634     2022     3  00-0019596          271.0           -1.0            271.0   
943     2022     4  00-0019596          385.0            0.0            371.0   
1254    2022     5  00-0019596          351.0           -3.0            351.0   

      pass_touchdown  rush_touchdown  interception       epa  ...    Y_lag_1  \
0                1.0             0.0           1.0 -0.012462  ...   0.000000   
318              1.0             0.0           0.0 -0.123334  ...  10.379999   
634              1.0             0.0           0.0 -0.165223  ...   9.400000   
943              3.0          

In [90]:
# Train quantile GBM models using ONLY PRE-GAME FEATURES
# These are the only features we actually know before the game!
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

id_cols = ['season','week','player_id']
target_col = 'Y_target_points'

# Historical + volatility + POSITION features
model_features = [
    'Y_lag_1',       # Last week's points
    'Y_roll_avg_3',  # 3-week rolling average
    'Y_cum_avg',     # Season/career average
    'Y_std_3',       # 3-week volatility (helps identify consistent vs boom/bust)
    'Y_min_3',       # Recent floor (minimum of last 3 weeks)
    'Y_max_3',       # Recent ceiling (maximum of last 3 weeks)
    'pos_QB',        # Position indicators - crucial for scoring distributions
    'pos_RB',
    'pos_WR_TE',
    'pos_OTHER',
]

print("=" * 60)
print("TRAINING PURE QUANTILE REGRESSION MODEL")
print("=" * 60)
print(f"Features used: {model_features}")
print("(Position features help model learn different distributions per position)")

PREDICTION_WEEK = 13  # We want to predict week 13

# Include all historical data + 2025 data before the prediction week
df_train_data = df_final[
    (df_final['season'] < 2025) | 
    ((df_final['season'] == 2025) & (df_final['week'] < PREDICTION_WEEK))
].copy()

# Hold out week 13+ for true out-of-sample evaluation
df_holdout = df_final[
    (df_final['season'] == 2025) & (df_final['week'] >= PREDICTION_WEEK)
].copy()

print(f"\nTraining on: 2022-2024 + 2025 weeks 1-{PREDICTION_WEEK-1}")
print(f"Train data size: {len(df_train_data)}")
print(f"Holdout (week {PREDICTION_WEEK}+): {len(df_holdout)} player-weeks")

# Use historical, volatility, and position features only (no game stats, EPA, or CPOE)
X_train = df_train_data[model_features].fillna(0)
y_train = df_train_data[target_col]

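# For validation metrics, use the last 10% of training rows.
# NOTE: the models below are fit on all of X_train (these rows included), and rows are
# ordered by player_id rather than by time, so these metrics are in-sample and likely
# optimistic. df_holdout is the out-of-sample set.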
split_point = int(len(X_train) * 0.90)
X_val = X_train.iloc[split_point:]
y_val = y_train.iloc[split_point:]

print(f"Full train size: {len(X_train)}, Validation subset: {len(X_val)}")

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

quantiles = [0.10, 0.50, 0.90]
models = {}
y_preds = {}

for q in quantiles:
    model = HistGradientBoostingRegressor(
        loss='quantile', quantile=q, max_iter=500,
        learning_rate=0.05, max_depth=6, random_state=42
    )
    model.fit(X_train_scaled, y_train)
    models[q] = model
    y_preds[q] = model.predict(X_val_scaled)

import pandas as pd

df_predictions = pd.DataFrame({
    'y_val': y_val.values,
    'pred_10': y_preds[0.10],
    'pred_50': y_preds[0.50],
    'pred_90': y_preds[0.90]
}, index=y_val.index)

mae = mean_absolute_error(df_predictions['y_val'], df_predictions['pred_50'])
print(f"\nMAE (median) on validation set: {mae:.2f}")

# Attach identifiers to the predictions (align by index)
ids_val = df_train_data.loc[y_val.index, id_cols].reset_index(drop=True)
df_preds_full = pd.concat([ids_val, df_predictions.reset_index(drop=True)], axis=1)
print('Validation predictions shape:', df_preds_full.shape)
df_preds_full.head()


TRAINING PURE QUANTILE REGRESSION MODEL
Features used: ['Y_lag_1', 'Y_roll_avg_3', 'Y_cum_avg', 'Y_std_3', 'Y_min_3', 'Y_max_3', 'pos_QB', 'pos_RB', 'pos_WR_TE', 'pos_OTHER']
(Position features help model learn different distributions per position)

Training on: 2022-2024 + 2025 weeks 1-12
Train data size: 20348
Holdout (week 13+): 494 player-weeks
Full train size: 20348, Validation subset: 2035

MAE (median) on validation set: 4.16
Validation predictions shape: (2035, 7)


|   | season | week | player_id | y_val | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|---|
| 0 | 2023 | 22 | 00-0039067 | 7.4 | 2.392292 | 8.944598 | 19.436721 |
| 1 | 2024 | 1 | 00-0039067 | 13.8 | 2.776388 | 9.85409 | 18.071674 |
| 2 | 2024 | 2 | 00-0039067 | 15.999999 | 2.460981 | 8.910694 | 19.193157 |
| 3 | 2024 | 3 | 00-0039067 | 23.1 | 3.175089 | 9.453867 | 19.724315 |
| 4 | 2025 | 7 | 00-0039067 | 19.700001 | 2.687285 | 10.435074 | 23.244787 |


---
## Step 3: Evaluate Model Performance

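Before using the model, let's check how accurate it is. The validation subset is the last 10% of the training rows, and the models were fit on those rows too, so these numbers are in-sample and likely optimistic:
- **MAE**: How far off is our median prediction on average?
- **PICP**: How often do actual scores fall within our floor-ceiling range? A well-calibrated 10th-90th percentile range should cover about 80%.
- **Floor Accuracy**: How reliable is our "safe floor" prediction? Actual scores should fall below `pred_10` about 10% of the time.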
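
As a quick illustration of how these three metrics are computed, here is a sketch on five made-up player-weeks. The arrays are hypothetical and chosen to be easy to check by hand.

```python
import numpy as np

# Hypothetical actual scores and quantile predictions for five player-weeks.
y_true = np.array([12.0, 3.0, 25.0, 8.0, 15.0])
pred_10 = np.array([4.0, 5.0, 6.0, 2.0, 7.0])
pred_50 = np.array([11.0, 9.0, 14.0, 8.0, 13.0])
pred_90 = np.array([20.0, 18.0, 24.0, 16.0, 22.0])

# MAE of the median prediction.
mae = np.mean(np.abs(y_true - pred_50))

# PICP: share of actual scores that land inside [pred_10, pred_90].
picp = np.mean((y_true >= pred_10) & (y_true <= pred_90))

def pinball_loss(y, y_hat, q):
    """Average quantile loss: under-predictions cost q, over-predictions cost (1 - q)."""
    err = y - y_hat
    return np.mean(np.where(err >= 0, q * err, (1 - q) * -err))

floor_loss = pinball_loss(y_true, pred_10, 0.10)
share_below_floor = np.mean(y_true < pred_10)

print(f"MAE={mae:.2f}  PICP={picp:.0%}  pinball(floor)={floor_loss:.2f}  "
      f"below floor={share_below_floor:.0%}")
```

For the floor, the pinball loss penalizes a prediction that sits above the actual score nine times more heavily than one that sits below it, which is what pushes the model toward the 10th percentile.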


In [91]:
# Model Evaluation Metrics (with simple explanations)
print("=" * 60)
print("MODEL EVALUATION (tested on held-out data)")
print("=" * 60)

# 1. MAE - How accurate is our median prediction?
mae = mean_absolute_error(df_predictions['y_val'], df_predictions['pred_50'])
print(f"\n1. Mean Absolute Error (MAE): {mae:.2f} points")
print("   What it means: On average, our prediction is off by this much.")
print("   Lower is better. Under 3 pts is good for fantasy.")

# 2. PICP - How well-calibrated are our uncertainty ranges?
lower = df_predictions['pred_10']
upper = df_predictions['pred_90']
covered = (df_predictions['y_val'] >= lower) & (df_predictions['y_val'] <= upper)
picp = covered.mean() * 100

print(f"\n2. Prediction Interval Coverage (PICP): {picp:.1f}%")
print("   What it means: How often actual scores fall within our floor-ceiling range.")

# 3. Pinball Loss - How good is our floor prediction?
def pinball_loss(y_true, y_pred, q):
    err = y_true - y_pred
    return np.where(err >= 0, q * err, (1 - q) * (-err)).mean()

pb_10 = pinball_loss(df_predictions['y_val'].values, df_predictions['pred_10'].values, 0.10)

# Calculate how often actual was BELOW our floor (bad - we were overconfident)
pct_below_floor = (df_predictions['y_val'] < df_predictions['pred_10']).mean() * 100

print(f"\n3. Pinball Loss (floor): {pb_10:.4f}")
print("   What it means: Measures floor prediction accuracy (lower = better).")
print(f"   - Players scored BELOW our floor {pct_below_floor:.1f}% of the time")


MODEL EVALUATION (tested on held-out data)

1. Mean Absolute Error (MAE): 4.16 points
   What it means: On average, our prediction is off by this much.
   Lower is better. Under 3 pts is good for fantasy.

2. Prediction Interval Coverage (PICP): 77.1%
   What it means: How often actual scores fall within our floor-ceiling range.

3. Pinball Loss (floor): 0.6728
   What it means: Measures floor prediction accuracy (lower = better).
   - Players scored BELOW our floor 11.2% of the time


In [83]:
# Inspect every non-ID column in df_final and basic counts.
# Note: the trained models use only `model_features`, not this full list.
feature_cols_model = sorted([c for c in df_final.columns if c not in ['season','week','player_id','Y_target_points']])
print("Features used (X):", feature_cols_model)
print("Num features:", len(feature_cols_model))

num_players = df_final['player_id'].nunique()
num_player_weeks = len(df_final)
print("Unique players in df_final:", num_players)
print("Total player-weeks:", num_player_weeks)

# Optional: quick sanity on weeks per player
weeks_per_player = df_final.groupby('player_id').size()
print("Median weeks per player:", weeks_per_player.median())


Features used (X): ['Y_cum_avg', 'Y_lag_1', 'Y_max_3', 'Y_min_3', 'Y_roll_avg_3', 'Y_std_3', 'cpoe', 'epa', 'interception', 'pass_touchdown', 'passing_yards', 'receiving_yards', 'rush_touchdown', 'rushing_yards', 'total_plays_involved']
Num features: 15
Unique players in df_final: 1028
Total player-weeks: 20842
Median weeks per player: 13.0


---
## Step 4: Prepare for Roster Predictions

Set up player ID mapping and get each player's historical performance patterns (lagged features) for matching with ESPN roster.
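
Name matching is the fragile part of this step, so it helps to see which roster players fail to match. The sketch below uses a simplified stand-in for the normalizer (`normalize_name_sketch`) together with made-up names and IDs; `indicator=True` on the merge flags rows that found no partner.

```python
import re
import unicodedata
import pandas as pd

SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "v"}

def normalize_name_sketch(name: str) -> str:
    """Lowercase, strip accents and punctuation, and drop generational suffixes."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^a-zA-Z\s]", "", ascii_name).lower()
    return " ".join(p for p in letters_only.split() if p not in SUFFIXES)

# Hypothetical roster and crosswalk rows (made-up names and IDs).
roster = pd.DataFrame({"espn_name": ["A.J. Example Jr.", "José Sample", "Nick Name"]})
crosswalk = pd.DataFrame({
    "player_id": ["00-0000001", "00-0000002", "00-0000003"],
    "player_name": ["AJ Example", "Jose Sample", "Given Name"],
})
roster["name_norm"] = roster["espn_name"].map(normalize_name_sketch)
crosswalk["name_norm"] = crosswalk["player_name"].map(normalize_name_sketch)

# indicator=True adds a _merge column: 'both' for matches, 'left_only' for misses.
merged_names = roster.merge(crosswalk, on="name_norm", how="left", indicator=True)
unmatched = merged_names.loc[merged_names["_merge"] == "left_only", "espn_name"].tolist()
print("Unmatched:", unmatched)
```

Punctuation, accents, and suffixes are handled by normalization, but a player listed under a nickname in one source and a given name in the other will still fail to match and would need a manual override.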


In [92]:
# Build player ID mapping for ESPN roster
import pandas as pd
import re
import unicodedata

# Get player ID crosswalk
ids = nfl.import_ids()
name_col_ids = 'name' if 'name' in ids.columns else 'full_name'
ids_map = ids[['gsis_id', name_col_ids]].dropna().drop_duplicates('gsis_id')
ids_map = ids_map.rename(columns={'gsis_id': 'player_id', name_col_ids: 'player_name'})

# Name normalization for matching
SUFFIXES = {'jr','sr','ii','iii','iv','v'}
def normalize_name(s: str) -> str:
    if pd.isna(s):
        return ''
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')
    s = re.sub(r"[^a-zA-Z\s]", "", s).strip().lower()
    parts = [p for p in s.split() if p and p not in SUFFIXES]
    return " ".join(parts)

ids_map['name_norm'] = ids_map['player_name'].apply(normalize_name)

# Get latest season/week info
latest_season = df_final['season'].max()
df_latest_season = df_final[df_final['season'] == latest_season].copy()
latest_week_in_data = int(df_latest_season['week'].max())
upcoming_week = latest_week_in_data + 1

print(f"Latest data: {latest_season} week {latest_week_in_data}")
print(f"Predicting for: week {upcoming_week}")

# Get each player's lagged features (from their most recent actual game)
df_player_hist = df_latest_season.sort_values(['player_id', 'week']).groupby('player_id').last().reset_index()
df_player_hist = df_player_hist.merge(ids_map[['player_id', 'player_name', 'name_norm']], on='player_id', how='left')

print(f"Players with historical data: {len(df_player_hist)}")


Latest data: 2025 week 14
Predicting for: week 15
Players with historical data: 552


---
## Step 5: Your Roster Predictions

Generate predictions for your ESPN roster using:
- **Historical patterns** (lagged features: last week, 3-week avg, career avg, volatility, position) as the model inputs
- **ESPN projected points** for each week, shown for comparison and used to drop bye-week/inactive players

**How to use these predictions**:
- **Need a safe floor?** Start players with high `pred_10` values
- **Need upside?** Start players with high `pred_90` values  
- **Best overall?** Sort by `pred_50` (median expected points)
- **Compare to ESPN**: `espn_proj` shows ESPN's single-point projection
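
The start/sit rules above can be sketched with a few rows copied from this notebook's Week 13 output. The `spread` column is an addition of this example, a rough boom/bust indicator rather than something the models produce.

```python
import pandas as pd

# A few rows copied from the Week 13 predictions in this notebook.
preds = pd.DataFrame({
    "player":  ["Jalen Hurts", "Bijan Robinson", "Trey McBride", "Davante Adams", "DJ Moore"],
    "pred_10": [6.0, 6.0, 3.6, 3.7, 1.9],
    "pred_50": [20.1, 15.2, 10.7, 10.2, 5.9],
    "pred_90": [28.9, 26.2, 24.7, 21.3, 18.2],
})

# Width of the 10-90 range: wider means more boom/bust.
preds["spread"] = preds["pred_90"] - preds["pred_10"]

safest = preds.sort_values("pred_10", ascending=False)  # highest floor first
upside = preds.sort_values("pred_90", ascending=False)  # highest ceiling first

print(upside[["player", "pred_10", "pred_50", "pred_90", "spread"]])
```

Trey McBride and Davante Adams have nearly the same median here, but McBride's ceiling is higher while Adams's floor is marginally higher, so the choice depends on whether you need upside or safety that week.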


In [93]:
# DEBUG: Let's see what keys ESPN actually uses in the breakdown
print("Inspecting ESPN stats structure for Week 13...\n")

for p in team.roster[:5]:  # Check first 5 players
    name = getattr(p, 'name', None)
    stats = getattr(p, 'stats', {}) or {}
    week_data = stats.get(13, {})
    
    print(f"=== {name} ===")
    print(f"  Points: {week_data.get('points', 'N/A')}")
    print(f"  Projected Points: {week_data.get('projected_points', 'N/A')}")
    
    actual_breakdown = week_data.get('breakdown', {})
    proj_breakdown = week_data.get('projected_breakdown', {})
    
    print(f"  Actual breakdown keys: {list(actual_breakdown.keys())}")
    print(f"  Actual breakdown: {actual_breakdown}")
    print(f"  Projected breakdown keys: {list(proj_breakdown.keys())}")
    print()


Inspecting ESPN stats structure for Week 13...

=== Bijan Robinson ===
  Points: 28.8
  Projected Points: 18.87
  Actual breakdown keys: ['rushingAttempts', 'rushingYards', 'rushingTouchdowns', '27', '28', '29', '30', '31', '32', '33', '34', 'rushing100To199YardGame', 'rushingYardsPerAttempt', 'receivingReceptions', 'receivingYards', '47', '48', '49', '50', '51', '54', 'receivingTargets', 'receivingYardsAfterCatch', 'receivingYardsPerReception', 'teamLoss', 'pointsScored', '179', '210', '212', '213']
  Actual breakdown: {'rushingAttempts': 23.0, 'rushingYards': 142.0, 'rushingTouchdowns': 1.0, '27': 28.0, '28': 14.0, '29': 7.0, '30': 5.0, '31': 2.0, '32': 1.0, '33': 4.0, '34': 2.0, 'rushing100To199YardGame': 1.0, 'rushingYardsPerAttempt': 6.174, 'receivingReceptions': 5.0, 'receivingYards': 51.0, '47': 10.0, '48': 5.0, '49': 2.0, '50': 2.0, '51': 1.0, '54': 1.0, 'receivingTargets': 7.0, 'receivingYardsAfterCatch': 68.0, 'receivingYardsPerReception': 10.2, 'teamLoss': 1.0, 'pointsScored

In [95]:
# Get ESPN roster with weekly projected points for Weeks 13, 14, and 15
# ESPN stats structure: stats[week_num] contains 'projected_points' and 'projected_breakdown'

# Find available weeks with projections
sample_stats = getattr(team.roster[0], 'stats', {})
available_weeks = sorted([w for w in sample_stats.keys() if w > 0 and 'projected_points' in sample_stats.get(w, {})])
print(f"Available ESPN projection weeks: {available_weeks}")

# We want to predict for Week 13, 14, and 15
weeks_to_predict = [13, 14, 15]

# Use historical + volatility + position features (same as training)
feature_order = ['Y_lag_1', 'Y_roll_avg_3', 'Y_cum_avg', 'Y_std_3', 'Y_min_3', 'Y_max_3', 
                 'pos_QB', 'pos_RB', 'pos_WR_TE', 'pos_OTHER']
print(f"Using features: {feature_order}")

for pred_week in weeks_to_predict:
    print(f"\n{'='*60}")
    print(f"WEEK {pred_week} PREDICTIONS ({latest_season} Season)")
    print(f"{'='*60}")
    
    # Get ESPN roster info (we only need names and points for display)
    roster_data = []
    for p in team.roster:
        name = getattr(p, 'name', None)
        if name and 'D/ST' not in name:
            stats = getattr(p, 'stats', {}) or {}
            week_data = stats.get(pred_week, {})
            
            espn_weekly_proj = week_data.get('projected_points', 0) or 0
            actual_pts = week_data.get('points', None)
            
            roster_data.append({
                'player_name_espn': name,
                'espn_proj_pts': espn_weekly_proj,
                'actual_pts': actual_pts,
            })
    
    espn_df = pd.DataFrame(roster_data)
    espn_df['name_norm'] = espn_df['player_name_espn'].apply(normalize_name)
    
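    # Get lagged features from the week BEFORE the prediction week.
    # NOTE: lag features are already shifted by one game, so the lag_week row's Y_lag_1
    # holds the score from the player's game before lag_week (lag_week's own score is
    # not included), and players with no plays in lag_week show up as unmatched below.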
    lag_week = pred_week - 1
    df_lag_data = df_latest_season[df_latest_season['week'] == lag_week].copy()
    df_lag_data = df_lag_data.merge(ids_map[['player_id', 'player_name', 'name_norm']], on='player_id', how='left')
    
    # If no data for lag_week, use the most recent available week
    if len(df_lag_data) == 0:
        print(f"  Note: No week {lag_week} data, using most recent available data")
        df_lag_data = df_player_hist.copy()
    
    # Match ESPN roster to NFL data (to get lagged features + position)
    lag_cols = ['player_id', 'name_norm', 'Y_lag_1', 'Y_roll_avg_3', 'Y_cum_avg', 'Y_std_3', 'Y_min_3', 'Y_max_3',
                'pos_QB', 'pos_RB', 'pos_WR_TE', 'pos_OTHER']
    # Only use columns that exist in df_lag_data
    available_cols = [c for c in lag_cols if c in df_lag_data.columns]
    merged = espn_df.merge(
        df_lag_data[available_cols],
        on='name_norm',
        how='left'
    )
    
    matched = merged[merged['player_id'].notna()].copy()
    unmatched = merged[merged['player_id'].isna()]['player_name_espn'].tolist()
    
    print(f"Using week {lag_week} historical features (Y_lag_1, Y_roll_avg_3, Y_cum_avg)")
    print(f"Matched {len(matched)} / {len(espn_df)} roster players")
    if unmatched:
        print(f"Unmatched: {unmatched}")
    
    # Build feature matrix using ONLY pre-game features
    X_roster = matched[feature_order].fillna(0)
    
    # Scale and predict
    X_roster_scaled = scaler.transform(X_roster)
    
    # PURE QUANTILE REGRESSION - all predictions from the trained models
    matched['pred_10'] = models[0.10].predict(X_roster_scaled)
    matched['pred_50'] = models[0.50].predict(X_roster_scaled)
    matched['pred_90'] = models[0.90].predict(X_roster_scaled)
    
    # Fix quantile crossing: sort each row so that pred_10 <= pred_50 <= pred_90
    q_cols = ['pred_10', 'pred_50', 'pred_90']
    matched[q_cols] = np.sort(matched[q_cols].to_numpy(), axis=1)
    
    # Display results
    print(f"\npred_10 = floor (90% chance to beat), pred_50 = median, pred_90 = ceiling")
    
    output = matched[['player_name_espn', 'espn_proj_pts', 'actual_pts', 'pred_10', 'pred_50', 'pred_90']].copy()
    output = output.rename(columns={'player_name_espn': 'player', 'espn_proj_pts': 'espn_proj', 'actual_pts': 'actual'})
    
    # Remove players with 0 projection (bye week or not playing)
    bye_players = output[output['espn_proj'] == 0]['player'].tolist()
    if bye_players:
        print(f"Excluding bye week/inactive players: {bye_players}")
    output = output[output['espn_proj'] > 0]
    
    output = output.sort_values('pred_50', ascending=False).reset_index(drop=True)
    
    # Round for readability
    for col in ['pred_10', 'pred_50', 'pred_90', 'espn_proj']:
        output[col] = output[col].round(1)
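    # Round actual if it exists (None/NaN for weeks not yet played);
    # to_numeric guards against an object-dtype column of None values
    if 'actual' in output.columns:
        output['actual'] = pd.to_numeric(output['actual'], errors='coerce').round(1)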
    
    display(output)


Available ESPN projection weeks: [13, 14]
Using features: ['Y_lag_1', 'Y_roll_avg_3', 'Y_cum_avg', 'Y_std_3', 'Y_min_3', 'Y_max_3', 'pos_QB', 'pos_RB', 'pos_WR_TE', 'pos_OTHER']

WEEK 13 PREDICTIONS (2025 Season)
Using week 12 historical features (Y_lag_1, Y_roll_avg_3, Y_cum_avg)
Matched 11 / 16 roster players
Unmatched: ['Josh Jacobs', 'Tucker Kraft', 'Hollywood Brown', 'Justin Herbert', 'Troy Franklin']

pred_10 = floor (90% chance to beat), pred_50 = median, pred_90 = ceiling
Excluding bye week/inactive players: ['Tee Higgins']


|   | player | espn_proj | actual | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|
| 0 | Jalen Hurts | 23.9 | 16.3 | 6.0 | 20.1 | 28.9 |
| 1 | Bijan Robinson | 18.9 | 28.8 | 6.0 | 15.2 | 26.2 |
| 2 | Brock Purdy | 15.0 | 17.1 | 6.0 | 15.1 | 23.0 |
| 3 | Kyren Williams | 13.1 | 13.2 | 4.5 | 12.6 | 22.7 |
| 4 | Trey McBride | 14.0 | 18.2 | 3.6 | 10.7 | 24.7 |
| 5 | Davante Adams | 14.3 | 19.8 | 3.7 | 10.2 | 21.3 |
| 6 | Javonte Williams | 14.1 | 15.5 | 2.4 | 10.1 | 20.1 |
| 7 | Kirk Cousins | 12.7 | 13.4 | 2.3 | 10.0 | 21.0 |
| 8 | DJ Moore | 8.1 | 3.7 | 1.9 | 5.9 | 18.2 |
| 9 | Theo Johnson | 7.0 | 4.4 | 1.2 | 5.5 | 14.5 |



WEEK 14 PREDICTIONS (2025 Season)
Using week 13 historical features (Y_lag_1, Y_roll_avg_3, Y_cum_avg)
Matched 13 / 16 roster players
Unmatched: ['Tee Higgins', 'Tucker Kraft', 'Hollywood Brown']

pred_10 = floor (90% chance to beat), pred_50 = median, pred_90 = ceiling
Excluding bye week/inactive players: ['Brock Purdy', 'Theo Johnson']


|   | player | espn_proj | actual | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|
| 0 | Jalen Hurts | 19.8 |  | 7.8 | 20.7 | 28.1 |
| 1 | Bijan Robinson | 18.4 | 8.4 | 5.4 | 14.9 | 27.1 |
| 2 | Kirk Cousins | 10.7 | 2.5 | 4.6 | 14.1 | 21.8 |
| 3 | Justin Herbert | 16.5 |  | 4.9 | 14.1 | 26.9 |
| 4 | Josh Jacobs | 20.3 | 16.2 | 4.7 | 12.3 | 24.1 |
| 5 | Davante Adams | 14.5 | 4.9 | 4.0 | 11.8 | 22.6 |
| 6 | Kyren Williams | 13.0 | 16.7 | 3.7 | 10.8 | 20.3 |
| 7 | Trey McBride | 12.3 | 8.3 | 2.5 | 9.8 | 22.7 |
| 8 | Javonte Williams | 13.2 | 13.7 | 2.3 | 9.2 | 20.1 |
| 9 | DJ Moore | 10.7 | 0.1 | 1.9 | 7.6 | 18.9 |



WEEK 15 PREDICTIONS (2025 Season)
Using week 14 historical features (Y_lag_1, Y_roll_avg_3, Y_cum_avg)
Matched 4 / 16 roster players
Unmatched: ['Trey McBride', 'Kyren Williams', 'Josh Jacobs', 'Davante Adams', 'DJ Moore', 'Tucker Kraft', 'Hollywood Brown', 'Jalen Hurts', 'Brock Purdy', 'Justin Herbert', 'Theo Johnson', 'Troy Franklin']

pred_10 = floor (90% chance to beat), pred_50 = median, pred_90 = ceiling
Excluding bye week/inactive players: ['Bijan Robinson', 'Tee Higgins', 'Javonte Williams', 'Kirk Cousins']


|   | player | espn_proj | actual | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|
