<h1 style="text-align:center">NFL March Madness</h1>

A model that uses historical data to predict the winner of a matchup between two NFL teams, for use in a March Madness-style prediction game.


In [1]:
# General Imports
import polars as pl
import numpy as np
import matplotlib.pyplot as plt

# Data Import
import nflreadpy as nfl

# EDA

`nflreadpy` will be used to obtain the data for this project. The `load_team_stats` function loads team data for each game in any season or range of seasons, from week 1 to the Super Bowl. 

There are 102 features, ranging from completions and attempts to field goal data. Each row describes one team in one game and names its opponent in `opponent_team`. This means that every game appears twice in the dataset, once from each team's perspective.

# Data Wrangling

## Dropped Columns

Some columns will be dropped. These are:
- `season`, `week`: These are identifiers rather than performance stats, so they are not directly relevant to the prediction task. (`week` is still needed to build the `win` feature, so it is dropped only after that step.)
- `season_type`: Only regular season games are kept (see Dropped Rows below), so this column ends up holding a single value.
- `receiving_tds`, `receiving_2pt_conversions`: The `passing_tds` and `passing_2pt_conversions` features already give us this information.
- `fg_made_list`, `fg_missed_list`, `fg_blocked_list`, `pat_pct`: The list features have a lot of null values. `pat_pct` is redundant because `pat_made` and `pat_att` already carry the same information.
- `gwfg_made`, `gwfg_att`, `gwfg_missed`, `gwfg_blocked`, `gwfg_distance`: These features look like a case of data leakage, since a made game-winning field goal clearly indicates a win (and a missed or blocked one usually a loss).

## Dropped Rows
Some rows will also be dropped. Only some teams reach the postseason, so to keep all 32 teams on equal footing this project uses regular season games only. All postseason games (`season_type` of "POST") will be removed from the sample.

## Null Value Treatment

Two of the remaining features have null values: `fg_long` and `fg_pct`.

`fg_long` has null values if a team didn't make a field goal in that game. This includes teams that tried at least one but missed. Therefore, the null values here will be replaced with zeros, for now. This may also punish teams so good that they only scored touchdowns and didn't need to attempt a single field goal. The impact of this will be examined later.

`fg_pct` is slightly different: in the 2024 data it is null only when a team attempted no field goals in the game. A team that attempted at least one and missed them all already has a value of 0.0. As above, for now the nulls will be replaced by zeros and the impact will be measured later on.

## New Features
Because this is a predictive model, a new target feature called `win` will be added. It takes one of three values: "yes", "no" or "tied". This is how it will be obtained.

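First, Team A's total scoring will be calculated using 9 features. These are:
- `passing_tds`, `rushing_tds`, `def_tds` and `fumble_recovery_tds` (+6 for each)
- `fg_made` (+3 for each)
- `passing_2pt_conversions`, `rushing_2pt_conversions` and `def_safeties` (+2 for each)
- `pat_made` (+1 for each) *

Touchdowns on kick or punt returns are also worth six points; if the data reports them in a column of their own, that column belongs in the first group too.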

This gives us Team A's score. The same calculation is then applied to Team B, whose row is found by matching on `week` and `opponent_team` (plus `season` when more than one season is loaded). This is the one step where the `week` column is useful.

If Team A has more points than Team B, Team A's `win` value will be "yes" and Team B's will be "no", and vice versa. If the points are equal, both rows get "tied".

*There is also the one-point safety, but since it has never happened in the NFL and doesn't appear in our data, it is ignored here.

In order, the preprocessing flow will go like this:
- Get new `win` feature
- Drop the now-unnecessary columns
- Drop postseason games
- Treat null values.

In [2]:
# Loading data from 2024
df = nfl.load_team_stats(seasons=2024, summary_level="week")
df

season,week,team,season_type,opponent_team,completions,attempts,passing_yards,passing_tds,passing_interceptions,sacks_suffered,sack_yards_lost,sack_fumbles,sack_fumbles_lost,passing_air_yards,passing_yards_after_catch,passing_first_downs,passing_epa,passing_cpoe,passing_2pt_conversions,carries,rushing_yards,rushing_tds,rushing_fumbles,rushing_fumbles_lost,rushing_first_downs,rushing_epa,rushing_2pt_conversions,receptions,targets,receiving_yards,receiving_tds,receiving_fumbles,receiving_fumbles_lost,receiving_air_yards,receiving_yards_after_catch,receiving_first_downs,…,punt_return_yards,kickoff_returns,kickoff_return_yards,fg_made,fg_att,fg_missed,fg_blocked,fg_long,fg_pct,fg_made_0_19,fg_made_20_29,fg_made_30_39,fg_made_40_49,fg_made_50_59,fg_made_60_,fg_missed_0_19,fg_missed_20_29,fg_missed_30_39,fg_missed_40_49,fg_missed_50_59,fg_missed_60_,fg_made_list,fg_missed_list,fg_blocked_list,fg_made_distance,fg_missed_distance,fg_blocked_distance,pat_made,pat_att,pat_missed,pat_blocked,pat_pct,gwfg_made,gwfg_att,gwfg_missed,gwfg_blocked,gwfg_distance
i32,i32,str,str,str,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,f64,f64,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,…,i32,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,str,str,str,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32
2024,1,"""ARI""","""REG""","""BUF""",21,31,162,1,0,4,-16,1,1,191,100,10,-1.18485,1.196628,0,25,124,1,0,0,7,4.98495,1,21,31,162,1,0,0,191,100,10,…,6,3,123,2,2,0,0,31,1.0,0,1,1,0,0,0,0,0,0,0,0,0,"""29;31""",,,60,0,0,2,2,0,0,1.0,0,0,0,0,0
2024,1,"""ATL""","""REG""","""PIT""",16,26,155,1,2,2,-18,0,0,140,73,8,-11.084853,-1.985487,0,22,89,0,0,0,7,-6.224773,0,16,23,155,1,0,0,140,73,8,…,28,2,56,1,1,0,0,24,1.0,0,1,0,0,0,0,0,0,0,0,0,0,"""24""",,,24,0,0,1,1,0,0,1.0,0,0,0,0,0
2024,1,"""BAL""","""REG""","""KC""",26,41,273,1,0,1,-6,1,1,267,166,11,-2.49116,-3.686818,0,32,185,1,0,0,13,8.459208,0,26,40,273,1,0,0,267,166,11,…,3,0,0,2,3,1,0,32,0.666667,0,1,1,0,0,0,0,0,0,0,1,0,"""25;32""","""53""",,57,53,0,2,2,0,0,1.0,0,0,0,0,0
2024,1,"""BUF""","""REG""","""ARI""",18,23,232,2,0,2,-10,1,1,166,125,11,9.161098,9.891942,0,33,130,2,0,0,10,3.49771,0,18,23,232,2,0,0,166,125,11,…,7,1,53,2,2,0,0,39,1.0,0,0,2,0,0,0,0,0,0,0,0,0,"""37;39""",,,76,0,0,4,4,0,0,1.0,0,0,0,0,0
2024,1,"""CAR""","""REG""","""NO""",13,31,161,0,2,4,-26,0,0,383,50,7,-18.467295,-17.186285,0,20,58,1,1,1,2,-8.230615,0,13,29,161,0,0,0,383,50,7,…,0,9,232,1,1,0,0,43,1.0,0,0,0,1,0,0,0,0,0,0,0,0,"""43""",,,43,0,0,1,1,0,0,1.0,0,0,0,0,0
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
2024,21,"""KC""","""POST""","""BUF""",18,26,245,1,0,2,-12,0,0,193,141,14,17.219217,5.19079,1,35,135,3,1,1,11,-4.106223,0,18,25,245,1,0,0,193,141,14,…,41,4,107,1,1,0,0,35,1.0,0,0,1,0,0,0,0,0,0,0,0,0,"""35""",,,35,0,0,3,3,0,0,1.0,0,0,0,0,0
2024,21,"""PHI""","""POST""","""WAS""",20,28,246,1,0,2,-16,0,0,161,146,12,16.010679,15.384739,0,36,229,7,0,0,14,7.140148,0,20,24,246,1,0,0,161,146,12,…,10,6,151,0,1,1,0,,0.0,0,0,0,0,0,0,0,0,0,0,1,0,,"""54""",,0,54,0,7,7,0,0,1.0,0,0,0,0,0
2024,21,"""WAS""","""POST""","""PHI""",30,49,278,1,1,3,-27,0,0,364,119,18,3.99305,-5.136315,1,25,99,1,0,0,4,-1.150234,0,30,47,278,1,2,2,364,119,18,…,0,4,98,3,3,0,0,46,1.0,0,0,1,2,0,0,0,0,0,0,0,0,"""34;46;42""",,,122,0,0,0,0,0,0,,0,0,0,0,0
2024,22,"""KC""","""POST""","""PHI""",21,32,257,3,2,6,-31,1,1,246,90,11,-15.294486,-3.069745,2,11,49,0,0,0,2,-2.119523,0,21,32,257,3,0,0,246,90,11,…,5,3,84,0,0,0,0,,,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,,0,0,0,0,0


In [12]:
# Pre-Processing Pipeline

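from collections.abc import Iterator

# Placeholder for the win logic: for now it only walks the rows and prints
# the first field of each one (`season`) to confirm the iteration works.
def win_qm(rows: Iterator[tuple]) -> str:
    for row in rows:
        print(row[0])
    return ""


def preprocess_raw_data(df: pl.DataFrame) -> pl.DataFrame:
    processed_df = df
    # iter_rows() yields each row as a tuple of values in column order.
    win_qm(processed_df.iter_rows())
    return processed_df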

In [15]:
preprocess_raw_data(df[0:5])

2024
2024
2024
2024
2024


season,week,team,season_type,opponent_team,completions,attempts,passing_yards,passing_tds,passing_interceptions,sacks_suffered,sack_yards_lost,sack_fumbles,sack_fumbles_lost,passing_air_yards,passing_yards_after_catch,passing_first_downs,passing_epa,passing_cpoe,passing_2pt_conversions,carries,rushing_yards,rushing_tds,rushing_fumbles,rushing_fumbles_lost,rushing_first_downs,rushing_epa,rushing_2pt_conversions,receptions,targets,receiving_yards,receiving_tds,receiving_fumbles,receiving_fumbles_lost,receiving_air_yards,receiving_yards_after_catch,receiving_first_downs,…,punt_return_yards,kickoff_returns,kickoff_return_yards,fg_made,fg_att,fg_missed,fg_blocked,fg_long,fg_pct,fg_made_0_19,fg_made_20_29,fg_made_30_39,fg_made_40_49,fg_made_50_59,fg_made_60_,fg_missed_0_19,fg_missed_20_29,fg_missed_30_39,fg_missed_40_49,fg_missed_50_59,fg_missed_60_,fg_made_list,fg_missed_list,fg_blocked_list,fg_made_distance,fg_missed_distance,fg_blocked_distance,pat_made,pat_att,pat_missed,pat_blocked,pat_pct,gwfg_made,gwfg_att,gwfg_missed,gwfg_blocked,gwfg_distance
i32,i32,str,str,str,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,f64,f64,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,…,i32,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,i32,str,str,str,i32,i32,i32,i32,i32,i32,i32,f64,i32,i32,i32,i32,i32
2024,1,"""ARI""","""REG""","""BUF""",21,31,162,1,0,4,-16,1,1,191,100,10,-1.18485,1.196628,0,25,124,1,0,0,7,4.98495,1,21,31,162,1,0,0,191,100,10,…,6,3,123,2,2,0,0,31,1.0,0,1,1,0,0,0,0,0,0,0,0,0,"""29;31""",,,60,0,0,2,2,0,0,1.0,0,0,0,0,0
2024,1,"""ATL""","""REG""","""PIT""",16,26,155,1,2,2,-18,0,0,140,73,8,-11.084853,-1.985487,0,22,89,0,0,0,7,-6.224773,0,16,23,155,1,0,0,140,73,8,…,28,2,56,1,1,0,0,24,1.0,0,1,0,0,0,0,0,0,0,0,0,0,"""24""",,,24,0,0,1,1,0,0,1.0,0,0,0,0,0
2024,1,"""BAL""","""REG""","""KC""",26,41,273,1,0,1,-6,1,1,267,166,11,-2.49116,-3.686818,0,32,185,1,0,0,13,8.459208,0,26,40,273,1,0,0,267,166,11,…,3,0,0,2,3,1,0,32,0.666667,0,1,1,0,0,0,0,0,0,0,1,0,"""25;32""","""53""",,57,53,0,2,2,0,0,1.0,0,0,0,0,0
2024,1,"""BUF""","""REG""","""ARI""",18,23,232,2,0,2,-10,1,1,166,125,11,9.161098,9.891942,0,33,130,2,0,0,10,3.49771,0,18,23,232,2,0,0,166,125,11,…,7,1,53,2,2,0,0,39,1.0,0,0,2,0,0,0,0,0,0,0,0,0,"""37;39""",,,76,0,0,4,4,0,0,1.0,0,0,0,0,0
2024,1,"""CAR""","""REG""","""NO""",13,31,161,0,2,4,-26,0,0,383,50,7,-18.467295,-17.186285,0,20,58,1,1,1,2,-8.230615,0,13,29,161,0,0,0,383,50,7,…,0,9,232,1,1,0,0,43,1.0,0,0,0,1,0,0,0,0,0,0,0,0,"""43""",,,43,0,0,1,1,0,0,1.0,0,0,0,0,0
