# NFL Draft Success Analysis

This notebook analyzes how NFL teams' draft performance relates to their success in the following years. Three data sets are used for this project:

1. NFL Team Draft Data (who each team drafted which year)
2. NFL Player Performance Data (how each player performed each year in the NFL)
3. NFL Team Performance Data (how each team performed each year)

## Import

In [55]:
#Import Packages
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.cluster import Birch
from sklearn.cluster import KMeans

In [48]:
#Import files
ddf = pd.read_excel('team_draft_data.xlsx')
pdf = pd.read_excel('player_data.xlsx', sheet_name='final')
tpf = pd.read_excel('team_historical_records.xlsx')

## Analysis Description

For each NFL draft class, we sum the approximate value (AV) of each player over the first five years of his career. We then sum those totals across all players drafted by a single team, counting only the seasons each player spent with that team. Players who were traded within the five-year window therefore stop adding AV to the drafting team once they have left. The result is a cumulative five-year team AV score for each team and each draft year.

The team AV score is then compared with the team's record over those years to look for trends linking draft performance and team performance.
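
The aggregation described above can be sketched on toy data. This is a minimal illustration, not the notebook's final implementation: the `seasons` frame mirrors the columns of `pdf`, while the `draft` frame and its column names (`draft_team`, `draft_year`) are assumptions, since the schema of `ddf` is not shown here. The helper name `five_year_team_av` is also hypothetical.

```python
import pandas as pd

# Toy player-season rows; the columns mirror pdf (name, team, year, av).
seasons = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B"],
    "team": ["ARI", "ARI", "PHI", "ARI", "ARI", "ARI"],
    "year": [2015, 2016, 2017, 2015, 2019, 2020],
    "av":   [4.0, 6.0, 8.0, 3.0, 5.0, 7.0],
})

# Hypothetical draft table; these column names are assumptions, not ddf's real schema.
draft = pd.DataFrame({
    "name": ["A", "B"],
    "draft_team": ["ARI", "ARI"],
    "draft_year": [2015, 2015],
})

def five_year_team_av(seasons, draft, window=5):
    """Sum AV per drafting team and draft year over the first `window` seasons,
    counting only seasons played for the drafting team."""
    merged = seasons.merge(draft, on="name", how="inner")
    # Seasons that fall inside the window (draft year through draft year + 4 for window=5).
    in_window = merged["year"].between(
        merged["draft_year"], merged["draft_year"] + window - 1
    )
    # Seasons spent with the team that drafted the player.
    with_drafting_team = merged["team"] == merged["draft_team"]
    kept = merged[in_window & with_drafting_team]
    return (
        kept.groupby(["draft_team", "draft_year"], as_index=False)["av"]
        .sum()
        .rename(columns={"av": "team_av_5yr"})
    )

team_av = five_year_team_av(seasons, draft)
print(team_av)
```

Here player A's 2017 season with PHI and player B's 2020 season (outside the window) are excluded, so the ARI 2015 class scores 18.0. Two caveats of this sketch: joining on `name` alone would mix up players who share a name, and a player who left and later returned to the drafting team inside the window would still be counted for the seasons after his return.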

### Clean the dataframes: Player Yearly AV Data

In [49]:
#Remove the career totals
pdf = pdf[pdf.year != 'Career']
pdf

Unnamed: 0,name,team,pos,year,age,av
0,Zaven Collins,ARI,lb,2021,22,0.0
2,Rondale Moore,ARI,wr,2021,21,0.0
4,Marco Wilson,ARI,cb,2021,22,0.0
6,Victor Dimukeje,ARI,,2021,22,0.0
7,Tay Gowan,PHI,,2021,23,0.0
...,...,...,...,...,...,...
24715,Scott Turner,WAS,,1995,23,0.0
24716,Scott Turner,SDG,,1999,27,0.0
24718,Scott Turner,SDG,,4 yrs,4 yrs,0.0
24719,Scott Turner,WAS,,3 yrs,3 yrs,0.0


In [50]:
#Remove per-team total rows (year values such as '3 yrs' or '1 yr') and keep only single-season rows
#Matching 'yr' also matches 'yrs', so a single condition is enough
pdf = pdf[~pdf.year.str.contains('yr', na=False)]
pdf

Unnamed: 0,name,team,pos,year,age,av
0,Zaven Collins,ARI,lb,2021,22,0.0
2,Rondale Moore,ARI,wr,2021,21,0.0
4,Marco Wilson,ARI,cb,2021,22,0.0
6,Victor Dimukeje,ARI,,2021,22,0.0
7,Tay Gowan,PHI,,2021,23,0.0
...,...,...,...,...,...,...
24710,Jamie Asher,WAS,TE,1997,25,0.0
24711,Jamie Asher,WAS,TE,1998,26,0.0
24713,Rich Owens,MIA,"DE,LDE",1999,27,0.0
24715,Scott Turner,WAS,,1995,23,0.0
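
As a small check of the filter above, the following sketch runs the same kind of `str.contains` test on a toy `year` column that mixes numbers, strings, and a missing value. The sample values are made up for illustration. With `na=False`, entries that are not strings are treated as non-matches rather than producing missing values in the mask.

```python
import pandas as pd

# Toy year column with mixed types, similar in spirit to pdf.year before cleaning.
year = pd.Series([2021, "2020", "4 yrs", "1 yr", "2019*", None], dtype=object)

# 'yr' is a substring of 'yrs', so one pattern catches both summary labels.
# astype(bool) is a defensive cast so that ~ acts as a logical NOT.
is_summary = year.str.contains("yr", na=False).astype(bool)

kept = year[~is_summary]
print(is_summary.tolist())
print(kept.tolist())
```

Only the `'4 yrs'` and `'1 yr'` entries are flagged; the numeric entry, the missing value, and `'2019*'` are kept for the later cleaning steps.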


In [58]:
#Remove the * and + markers from string values (they appear in the year column)
pdf = pdf.replace(to_replace=r'[*+]', value='', regex=True)
pdf[pdf.name == 'Budda Baker']

Unnamed: 0,name,team,pos,year,age,av
71,Budda Baker,ARI,ss,2017,21,9.0
72,Budda Baker,ARI,SS,2018,22,5.0
73,Budda Baker,ARI,FS,2019,23,6.0
74,Budda Baker,ARI,SS,2020,24,18.0
75,Budda Baker,ARI,saf,2021,25,0.0


In [59]:
#Check the data types and ensure that year, age, and av are not stored as strings.
#If a column is stored as strings (object dtype), convert it to a numeric type.

for column in ['year', 'age', 'av']:
    if pdf[column].dtype == object:
        pdf[column] = pd.to_numeric(pdf[column])
        print(column)
        #Use .iloc[0] (first row by position) because index label 0 may not survive filtering
        print(type(pdf[column].iloc[0]))

year
<class 'numpy.float64'>
age
<class 'numpy.float64'>


numpy.float64
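
The conversion step above can be made more defensive. The sketch below uses toy data and `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into `NaN` instead of raising an error. The sample values, including the `'n/a'` entry, are assumptions for illustration and do not come from the player data.

```python
import pandas as pd

# Toy frame: year and age arrive as strings, av is already numeric.
raw = pd.DataFrame({
    "year": ["2017", "2018", "n/a"],
    "age": ["21", "22", "23"],
    "av": [9.0, 5.0, 6.0],
})

converted = raw.copy()
for col in ["year", "age", "av"]:
    # errors="coerce" replaces values that cannot be parsed with NaN.
    converted[col] = pd.to_numeric(converted[col], errors="coerce")

# Count the entries that could not be parsed, so they can be inspected.
n_bad = converted["year"].isna().sum()
print(converted.dtypes)
print(n_bad)
```

A column that ends up containing `NaN` is stored as `float64`, while a fully parseable integer column stays an integer type. That may be one reason `year` and `age` show up as `numpy.float64` in the output above, although the notebook does not confirm it.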