# NFL Draft Success Analysis

In this notebook we analyze how NFL teams' draft performance relates to their success in the seasons that follow. Three data sets are used for this project:

1. NFL Team Draft Data (who each team drafted which year)
2. NFL Player Performance Data (how each player performed each year in the NFL)
3. NFL Team Performance Data (how each team performed each year)

## Import

In [1]:
#Import Packages
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import datetime as dt
from pandas.plotting import scatter_matrix

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.cluster import Birch
from sklearn.cluster import KMeans

In [2]:
#Import files
ddf = pd.read_excel('team_draft_data.xlsx')
pdf = pd.read_excel('player_data.xlsx', sheet_name='final')
tpf = pd.read_excel('team_historical_records.xlsx', sheet_name='Cleaned')

## Analysis Description

To analyze the data, we first sum each drafted player's approximate value (AV) over the first five years of their career, counting only the seasons the player spent with the team that drafted them. Players who were traded within that window therefore stop adding AV to the drafting team once they leave. We then sum these player totals per team and draft year, which produces a cumulative 5-year team AV score for each team and each draft class.

The team AV score is then compared with the team records for those years to look for trends linking draft performance and team performance.
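
The aggregation described above can be sketched on toy data as follows. This is only an illustration under stated assumptions: the helper name `five_year_team_av`, the toy players, and the shared team codes are hypothetical (the real draft table uses full team names while the player table uses abbreviations, so a name-to-code mapping would be needed first). It also assumes player names are unique, and it counts any season played for the drafting team inside the window, including a later return to that team.

```python
import pandas as pd

# Toy draft table: one row per drafted player (hypothetical values).
draft = pd.DataFrame({
    'draft_year': [2015, 2015, 2016],
    'name': ['Player A', 'Player B', 'Player C'],
    'team': ['ARI', 'ARI', 'DEN'],
})

# Toy season table: one row per player-season (hypothetical values).
seasons = pd.DataFrame({
    'name': ['Player A', 'Player A', 'Player A',
             'Player B', 'Player B', 'Player C', 'Player C'],
    'year': [2015, 2016, 2021, 2015, 2016, 2016, 2017],
    'team': ['ARI', 'ARI', 'ARI', 'ARI', 'PHI', 'DEN', 'DEN'],
    'av':   [5, 7, 9, 3, 4, 6, 2],
})

def five_year_team_av(draft, seasons, window=5):
    """Sum AV per (team, draft_year) over the first `window` seasons,
    keeping only seasons played for the drafting team."""
    merged = draft.merge(seasons, on='name', suffixes=('_draft', '_season'))
    # Seasons from the draft year through draft year + window - 1.
    in_window = merged['year'].between(merged['draft_year'],
                                       merged['draft_year'] + window - 1)
    # Drop seasons played for a different team (for example after a trade).
    same_team = merged['team_draft'] == merged['team_season']
    kept = merged[in_window & same_team]
    return (kept.groupby(['team_draft', 'draft_year'])['av'].sum()
                .rename('team_av_5yr')
                .reset_index()
                .rename(columns={'team_draft': 'team'}))

team_av = five_year_team_av(draft, seasons)
print(team_av)
```

In this toy example Player A's 2021 season falls outside the window and Player B's 2016 season with PHI is dropped, so neither adds to ARI's 2015 draft class score.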

### Clean the dataframes: Player Yearly AV Data

In [3]:
#Remove the career totals
pdf = pdf[pdf.year != 'Career']
pdf

Unnamed: 0,year,age,team,av,name
0,2021,22,ARI,0,Zaven Collins
2,2021,21,ARI,0,Rondale Moore
4,2021,22,ARI,0,Marco Wilson
6,2021,22,ARI,0,Victor Dimukeje
7,2021,23,PHI,0,Tay Gowan
...,...,...,...,...,...
50056,2 yrs,2 yrs,DEN,8,Gus Frerotte
50057,2 yrs,2 yrs,STL,-2,Gus Frerotte
50058,1 yr,1 yr,CIN,1,Gus Frerotte
50059,1 yr,1 yr,DET,7,Gus Frerotte


In [4]:
#Remove per-team total rows (year values such as '1 yr' or '2 yrs') and keep only single-season rows.
#Matching 'yr' also matches 'yrs'; na=False keeps rows whose year is not a string.
pdf = pdf[~pdf.year.str.contains('yr', na=False)]
pdf

Unnamed: 0,year,age,team,av,name
0,2021,22,ARI,0,Zaven Collins
2,2021,21,ARI,0,Rondale Moore
4,2021,22,ARI,0,Marco Wilson
6,2021,22,ARI,0,Victor Dimukeje
7,2021,23,PHI,0,Tay Gowan
...,...,...,...,...,...
50048,2004,33,MIN,0,Gus Frerotte
50049,2005,34,MIA,9,Gus Frerotte
50050,2006,35,STL,0,Gus Frerotte
50051,2007,36,STL,-2,Gus Frerotte
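
As a small illustration of the two filtering steps above, the following sketch runs them on a toy frame with a mixed-type `year` column. The frame `toy` and its values are made up for the example. It relies on `str.contains` returning a missing value for non-string entries, which `na=False` turns into `False`, so integer years are kept.

```python
import pandas as pd

# Toy frame mixing integer years, annotated years, and summary rows
# (hypothetical values).
toy = pd.DataFrame({
    'year': [2004, '2005*', 'Career', '2 yrs', '1 yr'],
    'av': [0, 9, 40, 8, 1],
})

# 'yr' matches both '1 yr' and '2 yrs'; non-string years give NaN,
# which na=False treats as "no match".
is_summary = toy['year'].str.contains('yr', na=False) | (toy['year'] == 'Career')
season_rows = toy[~is_summary]
print(season_rows)
```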


In [5]:
#Remove any weird characters (* and +) from the string values, including the year column
pdf = pdf.replace(to_replace=r'[*+]', value='', regex=True)
pdf[pdf.name == 'Budda Baker']

Unnamed: 0,year,age,team,av,name
68,2017,21,ARI,9,Budda Baker
69,2018,22,ARI,5,Budda Baker
70,2019,23,ARI,6,Budda Baker
71,2020,24,ARI,14,Budda Baker
72,2021,25,ARI,0,Budda Baker


In [6]:
#Check the data types and ensure that year, age, and av are not strings.
#If they are, convert them into numeric values.

for column in ['year', 'age', 'av']:
    # Use positional access: index labels are no longer contiguous after filtering
    if isinstance(pdf[column].iloc[0], str):
        pdf[column] = pd.to_numeric(pdf[column])
        print(column)
        print(type(pdf[column].iloc[0]))

year
<class 'numpy.float64'>
age
<class 'numpy.float64'>
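
A slightly more defensive variant of the conversion above is sketched below on toy data. It checks the column dtype rather than only the first value, and it uses `errors='coerce'` so that unparseable entries become `NaN` instead of raising. The frame `toy` and the `'n/a'` entry are hypothetical and are not taken from the real data.

```python
import pandas as pd

# Toy frame: year and age stored as strings, av already numeric
# (hypothetical values).
toy = pd.DataFrame({
    'year': ['2017', '2018', 'n/a'],
    'age': ['21', '22', '23'],
    'av': [9, 5, 6],
})

for column in ['year', 'age', 'av']:
    if not pd.api.types.is_numeric_dtype(toy[column]):
        # Unparseable strings become NaN rather than raising an error.
        converted = pd.to_numeric(toy[column], errors='coerce')
        n_bad = int(converted.isna().sum() - toy[column].isna().sum())
        print(f'{column}: {n_bad} value(s) could not be parsed')
        toy[column] = converted

print(toy.dtypes)
```

Reporting the number of coerced values makes it easier to notice when a cleaning step upstream has missed a marker.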


### Team Historical Data

In [7]:
# Only taking into consideration the years from 1994 to 2020, and relevant columns. 
tpf = tpf[(tpf['Year'] >= 1994) & (tpf['Year'] <= 2020)]
tpf = tpf[['Year','Lg','Tm','W','L','T','Div. Finish','Playoffs']]

In [8]:
# Remove the asterisk from team names and convert previous team names to current ones.
tpf = tpf.replace(to_replace=r'\*', value='', regex=True)
tpf = tpf.replace(to_replace='Washington Redskins', value='Washington Football Team')
tpf = tpf.replace(to_replace='Oakland Raiders', value='Las Vegas Raiders')
tpf = tpf.replace(to_replace='San Diego Chargers', value='Los Angeles Chargers')
tpf = tpf.replace(to_replace='St. Louis Rams', value='Los Angeles Rams')
tpf = tpf.replace(to_replace='Houston Oilers', value='Tennessee Titans')

In [9]:
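# Assign a numeric value to each playoff result (teams that missed the playoffs get 0 below).
# Assign the result back instead of using inplace=True on a selected column.
tpf['Playoffs'] = tpf['Playoffs'].replace({'Won SB':5,'Lost SB':4,'Lost Conf':3,'Lost Div':2,'Lost WC':1})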
tpf['Playoffs'] = tpf['Playoffs'].fillna(0)
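
The playoff encoding can also be written with `Series.map`, sketched here on toy data. The name `PLAYOFF_SCORE` and the toy teams are hypothetical. Unlike `replace`, `map` turns any label missing from the dictionary into `NaN`, so after `fillna(0)` an unexpected label would silently score 0; whether that is desirable depends on how clean the source column is.

```python
import pandas as pd

# Same ordinal scale as in the cell above.
PLAYOFF_SCORE = {'Won SB': 5, 'Lost SB': 4, 'Lost Conf': 3,
                 'Lost Div': 2, 'Lost WC': 1}

# Toy records (hypothetical values); None marks a missed postseason.
toy = pd.DataFrame({
    'Tm': ['Team A', 'Team B', 'Team C'],
    'Playoffs': ['Won SB', None, 'Lost WC'],
})

# Map labels to scores, treat missing values as 0, and store integers.
toy['Playoffs'] = toy['Playoffs'].map(PLAYOFF_SCORE).fillna(0).astype(int)
print(toy)
```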

### Team Draft Data

In [10]:
# Keep only the relevant columns; .copy() avoids modifying a view of the original frame.
ddf = ddf[['draft_year','round','name','pick','team']].copy()

In [11]:
# Collapse combined historical franchise labels into the current team names.
# Assign the result back instead of using inplace=True on a selected column.
ddf['team'] = ddf['team'].replace({
    'Chicago/St. Louis/Arizona Cardinals': 'Arizona Cardinals',
    'Baltimore/Indianapolis Colts': 'Indianapolis Colts',
    'Cleveland/LA/St. Louis Rams': 'Los Angeles Rams',
    'Houston Oilers/Tennessee Titans': 'Tennessee Titans',
    'Las Vegas/LA/Oakland Raiders': 'Las Vegas Raiders',
    'San Diego/Los Angeles Chargers': 'Los Angeles Chargers',
})
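
After renaming, it can be worth checking that the team names in the draft table line up with those in the team records table. The sketch below does this on toy data; the frames `draft` and `records`, the `TEAM_RENAMES` subset, and the `unmatched` variable are hypothetical and only illustrate the check.

```python
import pandas as pd

# Two of the renames from the cell above, used on toy data.
TEAM_RENAMES = {
    'Baltimore/Indianapolis Colts': 'Indianapolis Colts',
    'Cleveland/LA/St. Louis Rams': 'Los Angeles Rams',
}

# Toy draft and team-record tables (hypothetical rows).
draft = pd.DataFrame({'team': ['Baltimore/Indianapolis Colts',
                               'Green Bay Packers',
                               'Cleveland/LA/St. Louis Rams']})
records = pd.DataFrame({'Tm': ['Indianapolis Colts',
                               'Green Bay Packers',
                               'Los Angeles Rams']})

draft['team'] = draft['team'].replace(TEAM_RENAMES)

# Any name listed here would fail to match when the tables are joined.
unmatched = sorted(set(draft['team']) - set(records['Tm']))
print(unmatched)
```

An empty list means every drafted team name has a counterpart in the records table.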