# NFL Fantasy Football Projection Model Using XGBoost

This notebook walks through building an NFL fantasy football projection model with XGBoost. Data sources include Yahoo, Pro Football Reference, and SportsDataIO. The goal is to combine weekly player game stats with prior-season stats to generate player projections for fantasy football analysis and decision-making.

This notebook will use weekly data from the 2024 NFL Regular Season. 

In [1]:
import pandas as pd
import numpy as np
import os
import sys

project_root = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
print(f"Project Root: {project_root}")
print("Sys Path Before:", sys.path)
if project_root not in sys.path:
    print("Inserting project root to sys.path")
    sys.path.insert(0, project_root)

# Now import
from data_api import SportsDataIO, Yahoo, PFR
from utils import describe_endpoint
from yahoo_helpers import get_all_players, get_player_details, get_player_stats

from dotenv import load_dotenv
load_dotenv()

Project Root: c:\Users\bengu\Documents\Sports Analysis Project\clairvoyent-raven-sports-analysis\src
Sys Path Before: ['C:\\Users\\bengu\\AppData\\Local\\Programs\\Python\\Python310\\python310.zip', 'C:\\Users\\bengu\\AppData\\Local\\Programs\\Python\\Python310\\DLLs', 'C:\\Users\\bengu\\AppData\\Local\\Programs\\Python\\Python310\\lib', 'C:\\Users\\bengu\\AppData\\Local\\Programs\\Python\\Python310', 'c:\\Users\\bengu\\.virtualenvs\\cfeproj-oIABPDjj', '', 'c:\\Users\\bengu\\.virtualenvs\\cfeproj-oIABPDjj\\lib\\site-packages', 'c:\\Users\\bengu\\.virtualenvs\\cfeproj-oIABPDjj\\lib\\site-packages\\win32', 'c:\\Users\\bengu\\.virtualenvs\\cfeproj-oIABPDjj\\lib\\site-packages\\win32\\lib', 'c:\\Users\\bengu\\.virtualenvs\\cfeproj-oIABPDjj\\lib\\site-packages\\Pythonwin']
Inserting project root to sys.path


True

In [2]:
# Initialize API wrappers
sdio_api = SportsDataIO(api_key=os.getenv("SPORTS_DATA_IO_API_KEY"))
yahoo_api = Yahoo(os.getenv("YAHOO_OAUTH_KEYS_PATH"))
pfr_api = PFR()

[2025-09-29 13:14:19,758 DEBUG] [yahoo_oauth.oauth.__init__] Checking 
[2025-09-29 13:14:19,763 DEBUG] [yahoo_oauth.oauth.token_is_valid] ELAPSED TIME : 1987.6880526542664
[2025-09-29 13:14:19,765 DEBUG] [yahoo_oauth.oauth.token_is_valid] TOKEN IS STILL VALID


Getting league key
League key: 461.l.242497


In [3]:
"""
Define position groups.

Maps each stat group to the set of position abbreviations it covers.
"""

position_groups = {
    "rushing_and_receiving": { 'RB', 'WR', 'TE', 'FB' },
    "kicking": { 'K' },
    "passing": { 'QB' },
    "defense": { 'DE', 'CB', 'LB', 'FS', 'OLB', 'S', 'DT', 'ILB', 'DL', 'DB', 'SS', 'NT' }
}


## Extract and Clean Necessary Data

Needed data

* 2024 Regular Season Weekly Stats (SportsDataIO)
* 2023 Regular Season Yearly Stats (Pro Football Reference)

Paradigm:
1. Get player game data for the week before the week of interest from SportsDataIO (GameFrame),
   and split it by positional group.
2. For each name in the GameFrame, get the previous season's stats from Pro Football Reference (SeasonFrame).
3. Merge the SeasonFrame with the GameFrame, matching Name in the GameFrame to query_name in the SeasonFrame.
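
The merge in step 3 can be sketched with pandas as below. This is a minimal illustration, not the notebook's final implementation: the player names and stat values are made-up placeholders, and the `season_stats_` prefix on non-key columns is an assumption carried over from how the season columns are prefixed later in the notebook. A left join keeps every GameFrame row, and `indicator=True` makes it easy to list players with no prior-season match (rookies, or names spelled differently between sources).

```python
import pandas as pd

# Placeholder GameFrame: one row per player for the week (values are illustrative only).
game_frame = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Position": ["QB", "QB", "QB"],
    "Week": [1, 1, 1],
})

# Placeholder SeasonFrame: prior-season stats keyed by query_name.
season_frame = pd.DataFrame({
    "query_name": ["Player A", "Player B"],
    "pass_yds": [4000, 3500],
    "pass_td": [30, 25],
})

# Prefix every season column except the join key to avoid name collisions.
season_frame = season_frame.rename(
    columns=lambda c: c if c == "query_name" else f"season_stats_{c}"
)

# Left join: keep all game rows, attach season stats where a name matches.
merged = game_frame.merge(
    season_frame,
    how="left",
    left_on="Name",
    right_on="query_name",
    indicator=True,
)

# Players with no prior-season row end up with NaN season stats.
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print("No prior-season match:", unmatched)
```

Joining on raw names is fragile when two sources format names differently, so checking the unmatched list before training is a cheap sanity test.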


In [4]:
all_players = get_all_players()
print(f"Total players loaded: {len(all_players)}")

Total players loaded: 2165


In [5]:
player_game_stats_2024 = sdio_api.fantasy.get_player_game_stats(season="2024REG", week=1)

In [6]:
# Initialize position separated dataframes
rushing_and_receiving_df = player_game_stats_2024[player_game_stats_2024["Position"].isin(position_groups["rushing_and_receiving"])].reset_index(drop=True)
kicking_df = player_game_stats_2024[player_game_stats_2024["Position"].isin(position_groups["kicking"])].reset_index(drop=True)
passing_df = player_game_stats_2024[player_game_stats_2024["Position"].isin(position_groups["passing"])].reset_index(drop=True)
# May use down the line
defense_df = player_game_stats_2024[player_game_stats_2024["Position"].isin(position_groups["defense"])].reset_index(drop=True)
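
The four near-identical filters above could also be expressed as one pass over the data. The sketch below is one possible approach under stated assumptions: the small `stats` frame is a made-up stand-in for `player_game_stats_2024`, and the position sets are assumed to be disjoint so the mapping can be inverted safely.

```python
import pandas as pd

position_groups = {
    "rushing_and_receiving": {"RB", "WR", "TE", "FB"},
    "kicking": {"K"},
    "passing": {"QB"},
    "defense": {"DE", "CB", "LB", "FS", "OLB", "S", "DT", "ILB", "DL", "DB", "SS", "NT"},
}

# Invert the mapping: position abbreviation -> group name.
position_to_group = {
    pos: group for group, positions in position_groups.items() for pos in positions
}

# Made-up stand-in for player_game_stats_2024 (placeholder names only).
stats = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Position": ["QB", "RB", "K", "WR", "LS"],
})

# Positions outside every group (here "LS") map to NaN and are dropped by groupby.
stats["PositionGroup"] = stats["Position"].map(position_to_group)
frames = {
    group: df.drop(columns="PositionGroup").reset_index(drop=True)
    for group, df in stats.groupby("PositionGroup")
}

print({group: len(df) for group, df in frames.items()})
```

A group with no rows in the data (here, `defense`) simply has no key in `frames`, so `frames.get("defense")` is the safer lookup.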

In [7]:
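# Prototype the join on a single player: the first quarterback row
name = passing_df.loc[0, "Name"]
# Selecting with a list keeps a one-row DataFrame and preserves column dtypes
player_stats_df = passing_df.loc[[0]]
# 2023 season stats for the same player from Pro Football Reference
season_stats_df = pfr_api.get_player_stats(name, 2023)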

/players/A/AlleJo02.htm


In [8]:
# Prefix season columns so they cannot collide with the weekly game-stat columns
season_stats_df = season_stats_df.add_prefix("season_stats_")

In [13]:
pd.concat([player_stats_df, season_stats_df.reset_index(drop=True)], axis=1)

Unnamed: 0,GameKey,PlayerID,SeasonType,Season,GameDate,Week,Team,Opponent,HomeOrAway,Number,...,season_stats_qbr,season_stats_pass_sacked,season_stats_pass_sacked_yds,season_stats_pass_sacked_pct,season_stats_pass_net_yds_per_att,season_stats_pass_adj_net_yds_per_att,season_stats_comebacks,season_stats_gwd,season_stats_av,season_stats_awards
0,202410104,19801,1,2024,2024-09-08T13:00:00,1,BUF,ARI,HOME,17,...,69.6,24,152,3.98,6.89,6.51,2,4,18,"AP MVP-5,AP OPoY-6"
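
The single-player prototype above could be extended to every name in a positional frame along the following lines. This is a sketch under assumptions: `fetch_season_stats` is a hypothetical stand-in for `pfr_api.get_player_stats` (assumed here to return a one-row DataFrame, or nothing when the player has no prior-season page), and the lookup values are placeholders rather than real stats.

```python
import pandas as pd

def fetch_season_stats(name, year):
    """Hypothetical stand-in for pfr_api.get_player_stats.

    Returns a one-row DataFrame of season stats, or None if nothing is found.
    """
    lookup = {"Player A": {"pass_yds": 4000, "pass_td": 30}}  # placeholder values
    row = lookup.get(name)
    return None if row is None else pd.DataFrame([row])

def build_season_frame(names, year):
    """Collect prefixed season stats for each unique name into one SeasonFrame."""
    rows = []
    for name in dict.fromkeys(names):  # de-duplicate while keeping order
        stats = fetch_season_stats(name, year)
        if stats is None or stats.empty:
            continue  # no prior-season data (e.g. a rookie); skip this player
        stats = stats.add_prefix("season_stats_")
        stats.insert(0, "query_name", name)  # key used later to merge with Name
        rows.append(stats)
    if not rows:
        return pd.DataFrame(columns=["query_name"])
    return pd.concat(rows, ignore_index=True)

season_frame = build_season_frame(["Player A", "Player B", "Player A"], 2023)
print(season_frame)
```

Because each lookup against a live site is a network request, a real version would likely also need rate limiting or caching between calls.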


## Apply the XGBoost Model

In [None]:
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing