# NFL Game Winner Prediction - Brandon He

This notebook implements a machine learning pipeline to **predict the winners of NFL games** from Week 5 onward in the 2025 NFL season, using team statistics and advanced metrics.

A **Random Forest Classifier**, trained on games already played, generates predictions for upcoming games. The model uses features such as:

- Points per game scored
- Points per game allowed
- Turnover differential
- 3rd down conversion % (offense & defense)
- Offensive EPA/play
- Defensive EPA/play
- Home-field advantage
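One common way to turn per-team features like the ones listed above into a single row per game is to take home-minus-away differences. The sketch below uses hypothetical toy averages (`team_avgs`) and a hypothetical helper `matchup_features`; it illustrates the idea and is not necessarily how this notebook builds its matchup features. Home-field advantage is not a separate column here: each row is written from the home team's perspective and the label is "home team won", so the model's baseline rate absorbs it. That is one design choice among several.

```python
import pandas as pd

# Hypothetical per-team averages entering a given week (toy numbers, not real 2025 data).
team_avgs = pd.DataFrame(
    {
        "team": ["KC", "BUF"],
        "ppg_for": [27.0, 30.0],
        "ppg_against": [20.0, 18.0],
        "turnover_diff": [0.5, 1.0],
        "off_epa_per_play": [0.10, 0.15],
    }
).set_index("team")


def matchup_features(home: str, away: str, avgs: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: home-minus-away difference for every team feature."""
    diff = avgs.loc[home] - avgs.loc[away]
    return diff.add_prefix("diff_")


row = matchup_features("KC", "BUF", team_avgs)
print(row)
```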

The pipeline is designed to run **week-by-week**, updating rolling averages for each team and generating matchup-based features to predict the next week's outcomes. It also tracks **actual results and prediction accuracy** over the season.
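The week-by-week loop described above can be sketched as a walk-forward evaluation: for each week `w`, fit on every game before `w`, predict week `w`, and record that week's accuracy. This is a minimal sketch on synthetic data; the `matchups` table, its `diff_*` columns, the `walk_forward` helper, and the `n_estimators`/`random_state` values are all assumptions for illustration, not the notebook's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic matchup table (hypothetical): one row per game, home-minus-away
# feature differences, and a label of 1 if the home team won.
rng = np.random.default_rng(0)
n_weeks, games_per_week = 10, 14
n = n_weeks * games_per_week
matchups = pd.DataFrame({
    "week": np.repeat(np.arange(1, n_weeks + 1), games_per_week),
    "diff_ppg": rng.normal(0, 5, n),
    "diff_epa": rng.normal(0, 0.1, n),
})
# Toy label: driven mostly by the features, plus noise.
signal = matchups["diff_ppg"] / 5 + matchups["diff_epa"] / 0.1 + rng.normal(0, 1, n)
matchups["home_win"] = (signal > 0).astype(int)

FEATURES = ["diff_ppg", "diff_epa"]


def walk_forward(df: pd.DataFrame, first_week: int = 5) -> pd.DataFrame:
    """Train on all weeks before `week`, predict `week`, and record accuracy."""
    results = []
    for week in range(first_week, df["week"].max() + 1):
        train = df[df["week"] < week]
        test = df[df["week"] == week]
        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(train[FEATURES], train["home_win"])
        preds = model.predict(test[FEATURES])
        accuracy = (preds == test["home_win"].to_numpy()).mean()
        results.append({"week": week, "accuracy": accuracy})
    return pd.DataFrame(results)


weekly_acc = walk_forward(matchups)
print(weekly_acc)
```

Refitting every week means each prediction only sees games that had already been played, which mirrors how the model would be used during a live season.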

In [2]:
# Import libraries
import pandas as pd
import nflreadpy as nfl
import nfl_data_py as nfl_data
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt

In [None]:
# Load NFL data for the 2025 season
season = 2025
schedule = nfl.load_schedules(seasons=[season])
schedule = schedule.to_pandas()

# Select only regular season games, create target variable
# (use ==, since <= 'REG' compares strings and would also keep 'CON' and 'DIV' playoff games)
games = schedule[schedule['game_type'] == 'REG'].copy()
# 1 if the home team wins, 0 otherwise (away win or tie).
# Unplayed games have NaN scores, which also evaluate to 0, so keep only completed games for training.
games["winner"] = (games["home_score"] > games["away_score"]).astype(int)
team_stats = nfl.load_team_stats([season])
team_stats = team_stats.to_pandas()

In [None]:
# Add column for offensive EPA per play
# passing_epa and rushing_epa are totals, so divide their sum by the number of plays
team_stats["pass_plays"] = team_stats["attempts"] + team_stats["sacks_suffered"]  # dropbacks
team_stats["rush_plays"] = team_stats["carries"]
team_stats["total_plays"] = team_stats["pass_plays"] + team_stats["rush_plays"]
team_stats["off_epa_per_play"] = (
    team_stats["passing_epa"] + team_stats["rushing_epa"]
) / team_stats["total_plays"]
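As a worked check of the EPA/play arithmetic, assuming (as in the nflverse team stats) that `passing_epa` and `rushing_epa` are sums over plays rather than per-play rates: EPA/play is the combined total divided by the combined play count. Weighting by play counts only makes sense when the inputs are already per-play rates, and in that case it reduces to the same number. The helper `off_epa_per_play` and the toy numbers are hypothetical.

```python
def off_epa_per_play(passing_epa, rushing_epa, attempts, sacks_suffered, carries):
    """Offensive EPA per play, assuming both EPA inputs are totals over plays."""
    pass_plays = attempts + sacks_suffered  # dropbacks: pass attempts plus sacks
    total_plays = pass_plays + carries
    return (passing_epa + rushing_epa) / total_plays


# Toy game: 35 attempts + 2 sacks, 25 carries, +6.2 total passing EPA, -1.0 total rushing EPA
per_play = off_epa_per_play(6.2, -1.0, 35, 2, 25)
print(per_play)  # 5.2 / 62, about 0.084

# Equivalent form when starting from per-play rates instead of totals
pass_rate, rush_rate = 6.2 / 37, -1.0 / 25
weighted = (pass_rate * 37 + rush_rate * 25) / 62
print(weighted)
```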

In [None]:
# Add points_for and points_against columns to team_stats (one row per team per game)
home = schedule[["season", "week", "home_team", "away_team", "home_score", "away_score"]].copy()
home["team"] = home["home_team"]
home["points_for"] = home["home_score"]
home["points_against"] = home["away_score"]

away = schedule[["season", "week", "home_team", "away_team", "home_score", "away_score"]].copy()
away["team"] = away["away_team"]
away["points_for"] = away["away_score"]
away["points_against"] = away["home_score"]

schedule_long = pd.concat([home, away], ignore_index=True)

# Merge into team_stats
team_stats = team_stats.merge(
    schedule_long[["season", "week", "team", "points_for", "points_against"]],
    on=["season", "week", "team"],
    how="left"
)

In [25]:
# Add column for turnover differential
# Offensive giveaways
team_stats["giveaways"] = (
    team_stats["passing_interceptions"] + 
    team_stats["rushing_fumbles_lost"] + 
    team_stats["receiving_fumbles_lost"] + 
    team_stats["sack_fumbles_lost"]
)

# Defensive takeaways
team_stats["takeaways"] = (
    team_stats["def_interceptions"] + 
    team_stats["fumble_recovery_opp"]   # opp fumbles recovered by your team
)

# Turnover differential
team_stats["turnover_diff"] = team_stats["takeaways"] - team_stats["giveaways"]

In [None]:
# Web scrape defensive EPA/play from https://sumersports.com/teams/defensive/

              points_for  epa_per_play
points_for      1.000000      0.752358
epa_per_play    0.752358      1.000000
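The output above is a correlation matrix between `points_for` and `epa_per_play`; the code that produced it is not shown. A matrix of this shape can be computed with pandas `DataFrame.corr()`, which uses Pearson correlation by default. The per-team numbers below are hypothetical toy values, so the result will not match the 0.752358 reported above.

```python
import pandas as pd

# Hypothetical per-team averages (toy numbers, not the notebook's data).
df = pd.DataFrame({
    "points_for": [17.5, 21.0, 24.3, 28.8, 31.2],
    "epa_per_play": [-0.12, -0.03, 0.02, 0.09, 0.15],
})

# Pearson correlation matrix: 1.0 on the diagonal, symmetric off the diagonal.
corr = df[["points_for", "epa_per_play"]].corr()
print(corr)
```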


In [None]:
# Create dataframe with rolling averages of relevant features
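A minimal sketch of per-team rolling averages, assuming a long table with one row per team per week (like `team_stats` after the merges above). The toy `weekly` data, the `add_rolling_averages` helper, and the 3-game window are hypothetical choices. The `shift(1)` is the key step: each week's feature only averages games played *before* that week, so the result being predicted never leaks into its own features.

```python
import pandas as pd

# Toy weekly team stats (hypothetical values).
weekly = pd.DataFrame(
    {
        "team": ["KC"] * 4 + ["BUF"] * 4,
        "week": [1, 2, 3, 4] * 2,
        "points_for": [20, 30, 25, 35, 24, 31, 17, 28],
        "turnover_diff": [1, -1, 0, 2, 0, 1, -2, 1],
    }
)

FEATURES = ["points_for", "turnover_diff"]


def add_rolling_averages(df: pd.DataFrame, cols, window: int = 3) -> pd.DataFrame:
    """Add <col>_avg columns: each team's mean over its previous `window` games.

    shift(1) drops the current week from the window, so week 1 has no history (NaN).
    """
    df = df.sort_values(["team", "week"]).copy()
    rolled = df.groupby("team")[cols].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return df.join(rolled.add_suffix("_avg"))


out = add_rolling_averages(weekly, FEATURES)
print(out)
```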

In [22]:
schedule.columns

Index(['game_id', 'season', 'game_type', 'week', 'gameday', 'weekday',
       'gametime', 'away_team', 'away_score', 'home_team', 'home_score',
       'location', 'result', 'total', 'overtime', 'old_game_id', 'gsis',
       'nfl_detail_id', 'pfr', 'pff', 'espn', 'ftn', 'away_rest', 'home_rest',
       'away_moneyline', 'home_moneyline', 'spread_line', 'away_spread_odds',
       'home_spread_odds', 'total_line', 'under_odds', 'over_odds', 'div_game',
       'roof', 'surface', 'temp', 'wind', 'away_qb_id', 'home_qb_id',
       'away_qb_name', 'home_qb_name', 'away_coach', 'home_coach', 'referee',
       'stadium_id', 'stadium'],
      dtype='object')