# Fantasy Football Quantile Predictions

This notebook predicts **floor, median, and ceiling** fantasy points for your ESPN roster using quantile regression.

## What You'll Get
- **pred_10 (Floor)**: Safe minimum - player should score above this 90% of the time
- **pred_50 (Median)**: Typical score - player should land above this about half the time and below it the other half
- **pred_90 (Ceiling)**: Upside - player should reach or exceed this only about 10% of the time

## Setup Instructions

1. **Install dependencies**: `pip install espn_api nfl_data_py scikit-learn`

2. **Get your ESPN credentials**:
   - `league_id`: From your league URL (e.g., `leagueId=567575`)
   - `espn_s2` and `swid`: Browser cookies from [this guide](https://github.com/cwendt94/espn-api/discussions/150)

3. **Run all cells in order** - predictions will appear at the end!
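To keep the cookies out of the notebook itself, one option is to read them from environment variables. The sketch below assumes hypothetical variable names (`ESPN_LEAGUE_ID`, `ESPN_S2`, `ESPN_SWID`) and a hypothetical helper `load_espn_credentials`; neither is part of `espn_api`.

```python
import os

def load_espn_credentials(env=os.environ):
    """Read league credentials from environment variables (names are assumptions)."""
    required = ("ESPN_LEAGUE_ID", "ESPN_S2", "ESPN_SWID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {missing}")
    return {
        "league_id": int(env["ESPN_LEAGUE_ID"]),
        "espn_s2": env["ESPN_S2"],
        "swid": env["ESPN_SWID"],
    }

# Usage (keyword names match the League(...) call in Step 1):
# creds = load_espn_credentials()
# league = League(year=2025, **creds)
```

The returned keys mirror the keyword arguments that the `League(...)` call takes, so the dictionary can be unpacked directly into it.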

## Why Quantile Regression?

**The Problem**: Standard projections give you one number (e.g., "14.5 points"). But two players can both project 14.5 points with very different risk profiles:
- Player A: Consistent, always scores 13-16 points
- Player B: Boom/bust, scores 5-24 points

**The Solution**: Instead of one prediction, we give you THREE:
- **Floor (10th percentile)**: "There's a 90% chance they score at least this much"
- **Median (50th percentile)**: "They score above this half the time and below it half the time"
- **Ceiling (90th percentile)**: "There's a 10% chance they score this much or more"
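The Player A / Player B contrast can be made concrete with empirical percentiles. The weekly scores below are made up for illustration; they were chosen so both players average 14.5 points.

```python
import numpy as np

# Hypothetical weekly scores for two players with the same average (14.5)
player_a = np.array([13, 14, 14, 15, 15, 16, 14, 15, 13, 16], dtype=float)   # consistent
player_b = np.array([5, 24, 8, 21, 10, 19, 14.5, 6, 23, 14.5], dtype=float)  # boom/bust

def floor_median_ceiling(scores):
    """Return the empirical 10th, 50th, and 90th percentiles."""
    return np.quantile(scores, [0.10, 0.50, 0.90])

for name, scores in [("Player A", player_a), ("Player B", player_b)]:
    floor, median, ceiling = floor_median_ceiling(scores)
    print(f"{name}: mean={scores.mean():.1f} floor={floor:.1f} "
          f"median={median:.1f} ceiling={ceiling:.1f}")
```

The means match, but Player B's floor is far lower and ceiling far higher, which is exactly the information a single projection hides.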

**How it works**:
1. Load 3+ years of NFL play-by-play data
2. Calculate weekly fantasy points for each player
3. Build features: yards, TDs, rolling averages, career stats
4. Train 3 separate Gradient Boosting models (one per quantile)
5. Generate predictions for your roster

---
## Step 1: Connect to ESPN League


In [8]:
# Connect to ESPN Fantasy League
from espn_api.football import League
import nfl_data_py as nfl

league = League(
    league_id=567575, 
    year=2025, 
    espn_s2='YOUR_ESPN_S2_COOKIE',  # placeholder: paste your own cookie; never share or commit the real value
    swid='{YOUR-SWID-COOKIE}'       # placeholder: keep the surrounding curly braces
)
team = league.teams[9]  # <-- Change index to select your team
print(f"Connected to league. Your team: {team.team_name}")



Player(Bijan Robinson)
{'rushingAttempts': 23.0, 'rushingYards': 142.0, 'rushingTouchdowns': 1.0, '27': 28.0, '28': 14.0, '29': 7.0, '30': 5.0, '31': 2.0, '32': 1.0, '33': 4.0, '34': 2.0, 'rushing100To199YardGame': 1.0, 'rushingYardsPerAttempt': 6.174, 'receivingReceptions': 5.0, 'receivingYards': 51.0, '47': 10.0, '48': 5.0, '49': 2.0, '50': 2.0, '51': 1.0, '54': 1.0, 'receivingTargets': 7.0, 'receivingYardsAfterCatch': 68.0, 'receivingYardsPerReception': 10.2, 'teamLoss': 1.0, 'pointsScored': 6.0, '179': 1.0, '210': 1.0, '212': 9.0, '213': 1.0}
6.174
18.87
28.8


---
## Step 2: Load Data & Train Model

The next few cells:
1. **Define scoring rules** (Half-PPR: 0.5 pts per reception)
2. **Load NFL play-by-play data** (2022-2025 seasons)
3. **Build features** for each player-week:
   - Same-week stats: passing/rushing/receiving yards, TDs, interceptions, mean EPA and CPOE, and number of plays involved
   - Historical stats: last week's points, 3-week rolling average, career average
4. **Train 3 Gradient Boosting models** (one for each quantile: 10%, 50%, 90%)
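When the goal is to forecast a week that has not been played yet, only information from earlier games is available. The sketch below shows one way to build previous-game features and hold out the latest season; it is not what the cells in this notebook do. The helper names `add_lagged_features` and `chronological_split` and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy player-week table (values are made up for illustration)
toy = pd.DataFrame({
    "player_id": ["A", "A", "A", "B", "B", "B"],
    "season":    [2024, 2024, 2025, 2024, 2024, 2025],
    "week":      [1, 2, 1, 1, 2, 1],
    "rushing_yards": [50.0, 80.0, 65.0, 10.0, 0.0, 30.0],
    "Y_target_points": [8.0, 14.0, 9.5, 3.0, 1.0, 6.0],
})

def add_lagged_features(df, stat_cols):
    """Add previous-game versions of stat_cols so no same-week stat is a feature."""
    df = df.sort_values(["player_id", "season", "week"]).copy()
    for col in stat_cols:
        # shift(1) within each player moves every value down by one game
        df[f"{col}_lag_1"] = df.groupby("player_id")[col].shift(1)
    return df

def chronological_split(df, test_season):
    """Hold out one whole season so the test rows come after the training rows."""
    train = df[df["season"] < test_season]
    test = df[df["season"] == test_season]
    return train, test

lagged = add_lagged_features(toy, ["rushing_yards", "Y_target_points"])
train, test = chronological_split(lagged, test_season=2025)
print(lagged[["player_id", "season", "week", "rushing_yards_lag_1"]])
```

Each player's first row has no previous game, so its lagged values are missing and need either a fill value or to be dropped before training.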


In [56]:
# Helpers: fantasy scoring per play and weekly target aggregation
import pandas as pd
import numpy as np

# Half-PPR scoring rules
scoring_rules = {
    'pass_yd': 0.04, 'pass_td': 4,
    'rush_yd': 0.1, 'rush_td': 6,
    'rec_yd': 0.1, 'rec_td': 6,
    'rec': 0.5,
    'int': -2, 'fumble_lost': -2,
    'qb_kneel_yd': -0.1
}

def calculate_fantasy_points_per_play(df: pd.DataFrame, scoring_rules: dict) -> pd.DataFrame:
    df = df.copy()
    df['fp_pass'] = (df['passing_yards'].fillna(0) * scoring_rules['pass_yd']) + \
                    (df['pass_touchdown'].fillna(0) * scoring_rules['pass_td'])
    df['fp_rush'] = (df['rushing_yards'].fillna(0) * scoring_rules['rush_yd']) + \
                    (df['rush_touchdown'].fillna(0) * scoring_rules['rush_td'])
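    # On a passing-TD play the receiver scores as well, so pass_touchdown doubles as the receiving-TD flag;
    # complete_pass marks a reception for the Half-PPR bonus.
    df['fp_rec'] = (df['receiving_yards'].fillna(0) * scoring_rules['rec_yd']) + \
                   (df['pass_touchdown'].fillna(0) * scoring_rules['rec_td']) + \
                   (df['complete_pass'].fillna(0) * scoring_rules['rec'])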
    df['fp_int'] = df['interception'].fillna(0) * scoring_rules['int']
    df['fp_fumble_lost'] = df['fumble_lost'].fillna(0) * scoring_rules['fumble_lost']
    df['fp_kneel'] = np.where(df['play_type'] == 'qb_kneel',
                              df['yards_gained'].fillna(0) * scoring_rules['qb_kneel_yd'], 0)
    return df

def calculate_weekly_fantasy_points_final(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df['fp_pass_total'] = df['fp_pass'].fillna(0) + df['fp_int'].fillna(0) + df['fp_kneel'].fillna(0)
    df['fp_rush_total'] = df['fp_rush'].fillna(0)
    df['fp_rec_total'] = df['fp_rec'].fillna(0)
    df['fp_fumble_1_total'] = df['fp_fumble_lost'].fillna(0)

    id_vars = ['season', 'week']
    contributions_map = {
        'passer_player_id': 'fp_pass_total',
        'rusher_player_id': 'fp_rush_total',
        'receiver_player_id': 'fp_rec_total',
        'fumbled_1_player_id': 'fp_fumble_1_total',
    }
    contributions = []
    for id_col, point_col in contributions_map.items():
        tmp = df.rename(columns={id_col: 'player_id', point_col: 'points'})
        contributions.append(tmp[id_vars + ['player_id', 'points']])

    df_all_points = pd.concat(contributions, ignore_index=True)
    df_all_points.dropna(subset=['player_id'], inplace=True)

    out = df_all_points.groupby(id_vars + ['player_id'])['points'].sum().reset_index()
    return out.rename(columns={'points': 'Y_target_points'})


In [45]:
# Build modeling dataframe (df_final) from play-by-play
import nfl_data_py as nfl

# Load plays (same years as modeling notebook)
df = nfl.import_pbp_data([2022, 2023, 2024, 2025])

# Keep fantasy-relevant plays (no_play, spikes, kicks, and punts are not in this list, so they drop out)
fantasy_play_types = ['pass', 'run', 'qb_kneel']

df_fantasy_plays = df[df['play_type'].isin(fantasy_play_types)].copy()

# Select modeling columns
mdl_cols = [
    'game_id','play_id','season','week','posteam','defteam','home_team','away_team',
    'passer_player_id','passer','rusher_player_id','rusher','receiver_player_id','receiver','fumbled_1_player_id','fumbled_2_player_id',
    'play_type','down','ydstogo','yardline_100','shotgun','no_huddle',
    'passing_yards','pass_touchdown','pass_attempt','complete_pass',
    'rushing_yards','rush_touchdown','rush_attempt',
    'receiving_yards','yards_after_catch',
    'penalty_yards','interception','fumble_lost',
    'yards_gained','epa','cpoe','td_prob'
]

df_mdl0 = df_fantasy_plays[mdl_cols].copy()

# Per-play fantasy points
df_mdl1 = calculate_fantasy_points_per_play(df_mdl0, scoring_rules)

# Weekly target Y per player
df_target_Y = calculate_weekly_fantasy_points_final(df_mdl1)

# Aggregate features per player-week
feature_agg_rules = {
    'passing_yards': 'sum',
    'rushing_yards': 'sum',
    'receiving_yards': 'sum',
    'pass_touchdown': 'sum',
    'rush_touchdown': 'sum',
    'interception': 'sum',
    'epa': 'mean',
    'cpoe': 'mean',
}

id_vars = ['season','week']
feature_cols = list(feature_agg_rules.keys())

df_select = df_mdl1[id_vars + feature_cols + ['passer_player_id','rusher_player_id','receiver_player_id']].copy()

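# Note: melting on the player-id columns copies each play's full stat line to every player involved,
# so a passer's row also accumulates receiving_yards and a receiver's row accumulates passing_yards.
df_X_long = pd.melt(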
    df_select,
    id_vars=id_vars + feature_cols,
    value_vars=['passer_player_id','rusher_player_id','receiver_player_id'],
    var_name='role_type',
    value_name='player_id'
)
df_X_long.dropna(subset=['player_id'], inplace=True)

df_features_X = df_X_long.groupby(['season','week','player_id']).agg(feature_agg_rules).reset_index()

df_counts = df_X_long.groupby(['season','week','player_id']).size().reset_index(name='total_plays_involved')
df_features_X = pd.merge(df_features_X, df_counts, on=['season','week','player_id'], how='left')

# Merge with target and create time features
df_final = pd.merge(df_features_X, df_target_Y, on=['season','week','player_id'], how='left')
df_final['Y_target_points'] = df_final['Y_target_points'].fillna(0)

df_final.sort_values(by=['player_id','season','week'], inplace=True)

df_final['Y_lag_1'] = df_final.groupby('player_id')['Y_target_points'].shift(1)
df_final['Y_roll_avg_3'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).rolling(window=3, min_periods=1).mean()
)
df_final['Y_cum_avg'] = df_final.groupby('player_id')['Y_target_points'].transform(
    lambda x: x.shift(1).expanding(min_periods=1).mean()
)

for col in ['Y_lag_1','Y_roll_avg_3','Y_cum_avg']:
    df_final[col] = df_final[col].fillna(0)

print('df_final ready:', df_final.shape)
print(df_final.head())


2022 done.
2023 done.
2024 done.
2025 done.
Downcasting floats.
df_final ready: (20676, 16)
      season  week   player_id  passing_yards  rushing_yards  receiving_yards  \
0       2022     1  00-0019596          212.0           -1.0            212.0   
318     2022     2  00-0019596          190.0           -2.0            190.0   
634     2022     3  00-0019596          271.0           -1.0            271.0   
943     2022     4  00-0019596          385.0            0.0            371.0   
1254    2022     5  00-0019596          351.0           -3.0            351.0   

      pass_touchdown  rush_touchdown  interception       epa       cpoe  \
0                1.0             0.0           1.0 -0.012462   3.290235   
318              1.0             0.0           0.0 -0.123334 -13.116215   
634              1.0             0.0           0.0 -0.165223   4.887816   
943              3.0             0.0           0.0  0.179459   5.982167   
1254             1.0             0.0          

In [46]:
# Train quantile GBM models and compute predictions
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

id_cols = ['season','week','player_id']
target_col = 'Y_target_points'

X = df_final.drop(columns=id_cols + [target_col]).fillna(0)
y = df_final[target_col]

split_point = int(len(X) * 0.90)
X_train, X_test = X.iloc[:split_point], X.iloc[split_point:]
y_train, y_test = y.iloc[:split_point], y.iloc[split_point:]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

quantiles = [0.10, 0.50, 0.90]
models = {}
y_preds = {}

for q in quantiles:
    model = HistGradientBoostingRegressor(
        loss='quantile', quantile=q, max_iter=500,
        learning_rate=0.05, max_depth=6, random_state=42
    )
    model.fit(X_train_scaled, y_train)
    models[q] = model
    y_preds[q] = model.predict(X_test_scaled)

import pandas as pd

df_predictions = pd.DataFrame({
    'y_test': y_test.values,
    'pred_10': y_preds[0.10],
    'pred_50': y_preds[0.50],
    'pred_90': y_preds[0.90]
}, index=y_test.index)

mae = mean_absolute_error(df_predictions['y_test'], df_predictions['pred_50'])
print(f"MAE (median): {mae:.2f}")

# Attach identifiers to the predictions (look up by index, then align positionally)
ids_test = df_final.loc[df_predictions.index, id_cols].reset_index(drop=True)
df_preds_full = pd.concat([ids_test, df_predictions.reset_index(drop=True)], axis=1)
print('Predictions shape:', df_preds_full.shape)
df_preds_full.head()


MAE (median): 0.48
Predictions shape: (2068, 7)


|   | season | week | player_id | y_test | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|---|
| 0 | 2025 | 2 | 00-0039075 | 23.6 | 16.87648 | 20.892123 | 21.98287 |
| 1 | 2025 | 3 | 00-0039075 | 17.300001 | 9.352937 | 15.747823 | 17.976025 |
| 2 | 2025 | 4 | 00-0039075 | 29.5 | 19.64161 | 28.019555 | 30.648956 |
| 3 | 2025 | 5 | 00-0039075 | 19.5 | 16.969888 | 19.05987 | 19.759117 |
| 4 | 2025 | 6 | 00-0039075 | 3.8 | 3.689791 | 4.022108 | 4.057427 |


---
## Step 3: Evaluate Model Performance

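Before using the model, let's check how accurate it is on held-out test data:
- **MAE**: How far off is our median prediction on average?
- **PICP**: How often do actual scores fall within our floor-ceiling range? A 10%-90% interval should cover about 80% of outcomes.
- **Floor Accuracy**: How reliable is our "safe floor" prediction? Actual scores should fall below `pred_10` about 10% of the time.

Two things to keep in mind when reading these numbers. First, the feature set includes same-week yards and touchdowns, which largely determine that week's fantasy points, so a very low MAE reflects that overlap rather than forecasting skill. Second, the 90/10 split is taken after sorting by `player_id`, so the held-out rows are the last players by ID rather than the most recent weeks.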
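The coverage and pinball-loss checks can be written as standalone functions and verified by hand on a few numbers. The values below are made up for illustration; `interval_coverage` is a hypothetical helper name.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for quantile level q."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(err >= 0, q * err, (q - 1) * err)))

def interval_coverage(y_true, lower, upper):
    """Share of actual values that land inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return float(np.mean(inside))

# Made-up values for illustration
y_true  = np.array([10.0, 20.0, 5.0, 15.0])
pred_10 = np.array([12.0, 15.0, 4.0, 10.0])
pred_90 = np.array([18.0, 25.0, 9.0, 14.0])

print("coverage:", interval_coverage(y_true, pred_10, pred_90))
print("pinball (q=0.10):", pinball_loss(y_true, pred_10, 0.10))
```

For `q = 0.10`, an actual score that lands below the predicted floor costs 0.9 per point while one that lands above it costs only 0.1 per point, which is what pushes the fitted floor toward the 10th percentile.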


In [55]:
# Model Evaluation Metrics (with simple explanations)
print("=" * 60)
print("MODEL EVALUATION (tested on held-out data)")
print("=" * 60)

# 1. MAE - How accurate is our median prediction?
mae = mean_absolute_error(df_predictions['y_test'], df_predictions['pred_50'])
print(f"\n1. Mean Absolute Error (MAE): {mae:.2f} points")
print("   What it means: On average, our prediction is off by this much.")
print("   Lower is better. Under 3 pts is good for fantasy.")

# 2. PICP - How well-calibrated are our uncertainty ranges?
lower = df_predictions['pred_10']
upper = df_predictions['pred_90']
covered = (df_predictions['y_test'] >= lower) & (df_predictions['y_test'] <= upper)
picp = covered.mean() * 100

print(f"\n2. Prediction Interval Coverage (PICP): {picp:.1f}%")
print("   What it means: How often actual scores fall within our floor-ceiling range.")

# 3. Pinball Loss - How good is our floor prediction?
def pinball_loss(y_true, y_pred, q):
    err = y_true - y_pred
    return np.where(err >= 0, q * err, (1 - q) * (-err)).mean()

pb_10 = pinball_loss(df_predictions['y_test'].values, df_predictions['pred_10'].values, 0.10)

# Calculate how often actual was BELOW our floor (bad - we were overconfident)
pct_below_floor = (df_predictions['y_test'] < df_predictions['pred_10']).mean() * 100

print(f"\n3. Pinball Loss (floor): {pb_10:.4f}")
print("   What it means: Measures floor prediction accuracy (lower = better).")
print(f"   - Players scored BELOW our floor {pct_below_floor:.1f}% of the time")


MODEL EVALUATION (tested on held-out data)

1. Mean Absolute Error (MAE): 0.48 points
   What it means: On average, our prediction is off by this much.
   Lower is better. Under 3 pts is good for fantasy.

2. Prediction Interval Coverage (PICP): 71.7%
   What it means: How often actual scores fall within our floor-ceiling range.

3. Pinball Loss (floor): 0.1537
   What it means: Measures floor prediction accuracy (lower = better).
   - Players scored BELOW our floor 16.3% of the time


In [None]:
# Inspect features and counts used for modeling
feature_cols_model = sorted([c for c in df_final.columns if c not in ['season','week','player_id','Y_target_points']])
print("Features used (X):", feature_cols_model)
print("Num features:", len(feature_cols_model))

num_players = df_final['player_id'].nunique()
num_player_weeks = len(df_final)
print("Unique players in df_final:", num_players)
print("Total player-weeks:", num_player_weeks)

# Optional: quick sanity on weeks per player
weeks_per_player = df_final.groupby('player_id').size()
print("Median weeks per player:", weeks_per_player.median())


---
## Step 4: Generate Predictions for All Players

Now we use the trained models to predict floor/median/ceiling for every NFL player, using each player's most recent week in the latest season as the input row.
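Selecting one input row per player can be sketched on a toy table. The helper name `latest_row_per_player`, the `is_stale` flag, and the values are assumptions for illustration.

```python
import pandas as pd

# Toy player-week rows (made-up values); player "B" last played in week 12
toy = pd.DataFrame({
    "player_id": ["A", "A", "B", "B", "C"],
    "season":    [2025, 2025, 2025, 2025, 2024],
    "week":      [13, 14, 11, 12, 18],
    "Y_roll_avg_3": [12.0, 15.0, 7.0, 6.5, 9.0],
})

def latest_row_per_player(df):
    """Keep each player's most recent week within the latest season in the data."""
    latest_season = df["season"].max()
    current = df[df["season"] == latest_season]
    latest = current.sort_values(["player_id", "week"]).groupby("player_id").tail(1)
    return latest.reset_index(drop=True)

latest = latest_row_per_player(toy)
# Flag rows that are older than the newest week in the data
latest["is_stale"] = latest["week"] < latest["week"].max()
print(latest)
```

Players with no rows in the latest season (player "C" here) drop out entirely, and players who missed recent games are scored from an older row, which is why a staleness flag is useful.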


In [50]:
# Generate predictions for ALL players using their MOST RECENT week's features
import pandas as pd

# Get each player's most recent week from the latest season
latest_season = df_final['season'].max()
print(f"Latest season in data: {latest_season}")

df_latest_season = df_final[df_final['season'] == latest_season].copy()
latest_week_in_data = df_latest_season['week'].max()
print(f"Latest week in {latest_season} data: {latest_week_in_data}")

# For each player, get their most recent week
df_player_latest = df_latest_season.sort_values(['player_id', 'week']).groupby('player_id').last().reset_index()
print(f"Generating predictions for {len(df_player_latest)} players")

# Prepare features (same as training)
id_cols = ['season','week','player_id']
target_col = 'Y_target_points'
X_latest = df_player_latest.drop(columns=id_cols + [target_col]).fillna(0)

# Scale features using the same scaler from training
X_latest_scaled = scaler.transform(X_latest)

# Generate predictions for all quantiles
preds_latest = df_player_latest[id_cols + [target_col]].copy()
preds_latest = preds_latest.rename(columns={target_col: 'y_actual'})
preds_latest['pred_10'] = models[0.10].predict(X_latest_scaled)
preds_latest['pred_50'] = models[0.50].predict(X_latest_scaled)
preds_latest['pred_90'] = models[0.90].predict(X_latest_scaled)

# Add player names from import_ids
ids = nfl.import_ids()
name_col_ids = 'name' if 'name' in ids.columns else 'full_name'
ids_names = ids[['gsis_id', name_col_ids]].dropna().drop_duplicates('gsis_id')
ids_names = ids_names.rename(columns={'gsis_id': 'player_id', name_col_ids: 'player_name'})

preds_latest = preds_latest.merge(ids_names, on='player_id', how='left')
print(f"Players with names: {preds_latest['player_name'].notna().sum()}")

# Add upcoming week column
preds_latest['predict_week'] = preds_latest['week'].astype(int) + 1

# Show top predictions
cols_out = ['player_name', 'week', 'predict_week', 'pred_10', 'pred_50', 'pred_90']
print(f"\nTop 25 predictions for upcoming weeks:")
print(f"(week = last played, predict_week = prediction for)")
preds_latest[cols_out].sort_values('pred_50', ascending=False).head(25)


Latest season in data: 2025
Latest week in 2025 data: 14
Generating predictions for 547 players
Players with names: 540

Top 25 predictions for upcoming weeks:
(week = last played, predict_week = prediction for)


|   | player_name | week | predict_week | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|
| 152 | A.J. Brown | 13 | 14 | 24.469809 | 30.20573 | 31.077868 |
| 390 | Jahmyr Gibbs | 14 | 15 | 23.855391 | 29.737709 | 32.814078 |
| 68 | Patrick Mahomes | 13 | 14 | 27.066502 | 28.665233 | 31.165898 |
| 328 | Bijan Robinson | 13 | 14 | 25.781046 | 28.399004 | 28.549774 |
| 20 | Jameis Winston | 12 | 13 | 25.504196 | 27.554422 | 30.869954 |
| 177 | Jordan Love | 13 | 14 | 24.690412 | 25.908947 | 27.564412 |
| 387 | Rashee Rice | 13 | 14 | 21.236795 | 25.051715 | 25.093495 |
| 317 | Dontayvion Wicks | 13 | 14 | 23.298573 | 25.023628 | 25.684353 |
| 26 | Marcus Mariota | 13 | 14 | 22.250072 | 24.475818 | 25.406345 |
| 393 | Bryce Young | 13 | 14 | 21.743504 | 22.665745 | 26.146599 |


---
## Step 5: Your Roster Predictions

Finally, we match the predictions to your ESPN fantasy roster and show floor/median/ceiling for each player.

**How to use these predictions**:
- **Need a safe floor?** Start players with high `pred_10` values
- **Need upside?** Start players with high `pred_90` values
- **Best overall?** Sort by `pred_50` (the median projection)
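The three sorting strategies can give different orderings for the same players. The sketch below uses three rows copied from this notebook's roster output; the `rank_by` helper and the `range` column are assumptions added for illustration.

```python
import pandas as pd

# Three rows copied from the roster output in this notebook
roster = pd.DataFrame({
    "player":  ["Justin Herbert", "Kirk Cousins", "Javonte Williams"],
    "pred_10": [12.6, 12.4, 12.6],
    "pred_50": [14.0, 13.5, 13.1],
    "pred_90": [17.8, 14.3, 14.0],
})

def rank_by(df, column):
    """Return player names ordered from highest to lowest on one prediction column."""
    return df.sort_values(column, ascending=False, kind="stable")["player"].tolist()

# Spread between ceiling and floor as a rough volatility indicator
roster["range"] = (roster["pred_90"] - roster["pred_10"]).round(1)

print("By floor:  ", rank_by(roster, "pred_10"))
print("By ceiling:", rank_by(roster, "pred_90"))
print(roster[["player", "range"]])
```

Two players with the same floor can have very different ceilings, so the `range` column is a quick way to see who carries more upside for a similar downside.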


In [52]:
# Filter latest-week predictions to your ESPN roster
import re
import unicodedata

# Helper: normalize names
SUFFIXES = {'jr','sr','ii','iii','iv','v'}

def normalize_name(s: str) -> str:
    if pd.isna(s):
        return ''
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')
    s = re.sub(r"[^a-zA-Z\s]", "", s).strip().lower()
    parts = [p for p in s.split() if p and p not in SUFFIXES]
    return " ".join(parts)

# ESPN roster names (skip D/ST)
espn_names = []
for p in team.roster:
    name = getattr(p, 'name', None)
    if name and 'D/ST' not in name:
        espn_names.append(name)

espn_df = pd.DataFrame({'player_name_espn': espn_names})
espn_df['player_name_norm'] = espn_df['player_name_espn'].apply(normalize_name)

# Build normalized name map from preds_latest
preds_latest['player_name_norm'] = preds_latest['player_name'].apply(normalize_name)

# Match by normalized name (include week info)
merged = espn_df.merge(
    preds_latest[['player_id', 'player_name', 'player_name_norm', 'season', 'week', 'y_actual', 'pred_10', 'pred_50', 'pred_90']],
    on='player_name_norm', 
    how='left'
)

matched = merged[merged['player_id'].notna()]
unmatched = merged[merged['player_id'].isna()]['player_name_espn'].tolist()

print(f"Matched {len(matched)} / {len(espn_df)} ESPN roster players.")
if unmatched:
    print(f"Unmatched: {unmatched}")

# Display predictions for your roster
if len(matched) > 0:
    latest_season = int(matched['season'].max())
    # The upcoming week is the latest week in data + 1
    upcoming_week = int(latest_week_in_data) + 1
    
    matched = matched.copy()
    matched['data_week'] = matched['week'].astype(int)  # Which week's data was used
    
    print(f"\n=== Your Roster Predictions for Week {upcoming_week} ({latest_season} Season) ===")
    print(f"pred_10 = floor (10%), pred_50 = median (50%), pred_90 = ceiling (90%)")
    
    cols_show = ['player_name_espn', 'data_week', 'pred_10', 'pred_50', 'pred_90']
    output = matched[cols_show].sort_values('pred_50', ascending=False).reset_index(drop=True)
    output = output.rename(columns={'player_name_espn': 'player', 'data_week': 'last_played'})
    output.insert(1, 'predict_week', upcoming_week)  # Add prediction week column
    
    # Round predictions for readability
    for col in ['pred_10', 'pred_50', 'pred_90']:
        output[col] = output[col].round(1)
    
    # Flag players who haven't played recently
    stale_data = output[output['last_played'] < int(latest_week_in_data)]
    if len(stale_data) > 0:
        print(f"\n⚠️  Note: Some players haven't played recently (last_played < week {int(latest_week_in_data)}).")
        print(f"   Their predictions may be less accurate.")
    
    display(output)


Matched 15 / 16 ESPN roster players.
Unmatched: ['Hollywood Brown']

=== Your Roster Predictions for Week 15 (2025 Season) ===
pred_10 = floor (10%), pred_50 = median (50%), pred_90 = ceiling (90%)

⚠️  Note: Some players haven't played recently (last_played < week 14).
   Their predictions may be less accurate.


|   | player | predict_week | last_played | pred_10 | pred_50 | pred_90 |
|---|---|---|---|---|---|---|
| 0 | Bijan Robinson | 15 | 13 | 25.8 | 28.4 | 28.5 |
| 1 | Davante Adams | 15 | 13 | 19.1 | 20.1 | 21.2 |
| 2 | Trey McBride | 15 | 13 | 17.0 | 17.8 | 18.8 |
| 3 | Brock Purdy | 15 | 13 | 15.4 | 17.5 | 18.9 |
| 4 | Jalen Hurts | 15 | 13 | 15.6 | 16.7 | 18.1 |
| 5 | Justin Herbert | 15 | 13 | 12.6 | 14.0 | 17.8 |
| 6 | Kirk Cousins | 15 | 13 | 12.4 | 13.5 | 14.3 |
| 7 | Kyren Williams | 15 | 13 | 11.3 | 13.3 | 14.1 |
| 8 | Javonte Williams | 15 | 14 | 12.6 | 13.1 | 14.0 |
| 9 | Josh Jacobs | 15 | 13 | 9.5 | 10.4 | 10.7 |
