We want to provide a thorough analysis of past NFL Combine test statistics for NFL coaches, owners, and scouts to use going into the NFL draft. To do so, we'll create a dashboard that allows users to get summary statistics and a visualization of results from past Combines. They'll also be given a recommendation for what Combine statistics they may want to look for when considering players at a certain position.

Throughout this notebook, we are going to clean the NFL Combine data from 2000-2017 to accomplish those goals and help NFL owners, coaches, and scouts make informed decisions during the draft. 

Once we have a clean dataframe, we'll put it into a Python file and use that to create a dashboard.

We first import all the necessary packages used throughout the project. 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

We have all of the packages that we'll need in the notebook.

Now let's load in our data file with combine statistics for 2000-2017, save it to a dataframe, and inspect the first 5 rows to get an idea of the data.

In [None]:
all_data = pd.read_csv('combine_data_since_2000.csv') # read in the csv file with the dataset
all_data.head() # print out the first 5 rows

Unnamed: 0,Player,Pos,Ht,Wt,Forty,Vertical,BenchReps,BroadJump,Cone,Shuttle,Year,Pfr_ID,AV,Team,Round,Pick
0,John Abraham,OLB,76,252,4.55,,,,,,2000,AbraJo00,26,New York Jets,1.0,13.0
1,Shaun Alexander,RB,72,218,4.58,,,,,,2000,AlexSh00,26,Seattle Seahawks,1.0,19.0
2,Darnell Alford,OT,76,334,5.56,25.0,23.0,94.0,8.48,4.98,2000,AlfoDa20,0,Kansas City Chiefs,6.0,188.0
3,Kyle Allamon,TE,74,253,4.97,29.0,,104.0,7.29,4.49,2000,,0,,,
4,Rashard Anderson,CB,74,206,4.55,34.0,,123.0,7.18,4.15,2000,AndeRa21,6,Carolina Panthers,1.0,23.0


The data is now in the all_data dataframe, where each row represents a player. The columns give the player's name, position, height, weight, a unique identifier (Pfr_ID), and the results of the Combine tests. There are also columns for the year they were drafted in, the team that drafted them, and their round and pick numbers.

The forty, cone, and shuttle columns give results for speed and agility tests at the Combine. The vertical, bench reps, and broad jump columns give results for strength and explosiveness tests at the Combine.

All of the columns except Round have an appropriate datatype: Round holds whole numbers but is stored as a float (1.0, 6.0). Some players have NaN values for some of the tests. There are also some players with NaN values for Team, Round, and Pick; these players were not drafted that year.
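
As a quick check of the points above, here is a minimal sketch that inspects datatypes and missing values. The `sample` dataframe is a stand-in built from three of the rows shown above with only a few of the columns; in the notebook the same two calls would be run on all_data.

```python
import io
import pandas as pd

# Small sample copied from the first rows shown above (not the full dataset)
sample_csv = io.StringIO(
    "Player,Pos,Forty,Vertical,Team,Round,Pick\n"
    "John Abraham,OLB,4.55,,New York Jets,1.0,13.0\n"
    "Kyle Allamon,TE,4.97,29.0,,,\n"
    "Rashard Anderson,CB,4.55,34.0,Carolina Panthers,1.0,23.0\n"
)
sample = pd.read_csv(sample_csv)

dtypes = sample.dtypes          # datatype of each column
missing = sample.isna().sum()   # number of NaN values per column
print(dtypes)
print(missing)
```

In this sample, Round is read as a float because the undrafted row leaves a missing value in that column.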

We only want to look at players who have been drafted, so we'll make a dataframe with only the rows that do not have a NaN value in the Team column. We're also going to change Round to an integer datatype. We'll print the dataframe out after doing so.

In [None]:
nfl_data = all_data[all_data['Team'].notna()].copy() # keep only players who have a team in the Team column
nfl_data['Round'] = nfl_data['Round'].astype(int) # change round to integer (works now that undrafted rows are gone)
nfl_data # print out the dataframe

Unnamed: 0,Player,Pos,Ht,Wt,Forty,Vertical,BenchReps,BroadJump,Cone,Shuttle,Year,Pfr_ID,AV,Team,Round,Pick
0,John Abraham,OLB,76,252,4.55,,,,,,2000,AbraJo00,26,New York Jets,1.0,13.0
1,Shaun Alexander,RB,72,218,4.58,,,,,,2000,AlexSh00,26,Seattle Seahawks,1.0,19.0
2,Darnell Alford,OT,76,334,5.56,25.0,23.0,94.0,8.48,4.98,2000,AlfoDa20,0,Kansas City Chiefs,6.0,188.0
4,Rashard Anderson,CB,74,206,4.55,34.0,,123.0,7.18,4.15,2000,AndeRa21,6,Carolina Panthers,1.0,23.0
6,LaVar Arrington,OLB,75,250,4.53,,,,,,2000,ArriLa00,31,Washington Redskins,1.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5876,Quincy Wilson-02,CB,73,211,4.54,32.0,14.0,118.0,6.86,4.02,2017,WilsQu01,2,Indianapolis Colts,2.0,46.0
5878,Howard Wilson,CB,73,184,4.57,33.5,,119.0,6.68,3.94,2017,WilsHo00,0,Cleveland Browns,4.0,126.0
5879,Ahkello Witherspoon,CB,75,198,4.45,40.5,,127.0,6.93,4.13,2017,WithAh00,3,San Francisco 49ers,3.0,66.0
5880,Xavier Woods,SS,71,197,4.54,33.5,19.0,122.0,6.72,4.13,2017,WoodXa00,3,Dallas Cowboys,6.0,191.0


So now our dataframe has data only for the players who were drafted, and the Round column is an integer datatype. There are still NaN values for some tests for some players, but we'll keep them for now.
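
The order of the two steps matters: undrafted players have NaN in Round, and NaN cannot be converted to an integer. The sketch below illustrates this on a small stand-in series (`rounds` is made up for illustration and is not part of the notebook's data).

```python
import numpy as np
import pandas as pd

rounds = pd.Series([1.0, 6.0, np.nan], name="Round")  # NaN marks an undrafted player

try:
    rounds.astype(int)          # fails: NaN has no integer representation
    cast_failed = False
except ValueError:
    cast_failed = True

drafted_rounds = rounds.dropna().astype(int)  # works once the NaN rows are gone
print(cast_failed, drafted_rounds.tolist())
```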

We're going to focus on only five of the more well-known positions in the draft: quarterback (QB), defensive tackle (DT), running back (RB), wide receiver (WR), and outside linebacker (OLB).

We'll make five new dataframes that contain data for players for each of these five positions so we can analyze them separately.

In [None]:
qb_df = nfl_data.loc[nfl_data['Pos'] == 'QB'] # get only rows for quarterbacks
dt_df = nfl_data.loc[nfl_data['Pos'] == 'DT'] # get only rows for defensive tackles
rb_df = nfl_data.loc[nfl_data['Pos'] == 'RB'] # get only rows for running backs
wr_df = nfl_data.loc[nfl_data['Pos'] == 'WR'] # get only rows for wide receivers
olb_df = nfl_data.loc[nfl_data['Pos'] == 'OLB'] # get only rows for outside linebackers

There are now five dataframes: one for quarterbacks (qb_df), one for defensive tackles (dt_df), one for running backs (rb_df), one for wide receivers (wr_df), and one for outside linebackers (olb_df).

We want to look at only the three most important Combine tests for each of these positions, so we'll keep only the columns for those statistics in each dataframe. We'll also keep the player's name, height, and weight, the year he was drafted, and his round and pick numbers. The Pos column is not kept, since each dataframe now holds a single position.

We're going to look at the following statistics for these positions:

Running Back = Forty, Cone, Shuttle

Quarterback = Forty, Shuttle, BenchReps

Defensive Tackle = BenchReps, Cone, Forty

Wide Receiver = Forty, Vertical, Shuttle

Outside Linebacker = Shuttle, Cone, Vertical
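
The list above can also be written as a dictionary, so that one function handles every position. This is only a sketch of an alternative, not the approach the notebook takes: `POSITION_TESTS` and `position_frame` are hypothetical names, and `demo` is a tiny stand-in for nfl_data built from two of the rows shown earlier.

```python
import pandas as pd

# Tests per position, taken from the list above
POSITION_TESTS = {
    "RB": ["Forty", "Cone", "Shuttle"],
    "QB": ["Forty", "Shuttle", "BenchReps"],
    "DT": ["BenchReps", "Cone", "Forty"],
    "WR": ["Forty", "Vertical", "Shuttle"],
    "OLB": ["Shuttle", "Cone", "Vertical"],
}

def position_frame(data, pos):
    """Hypothetical helper: rows for one position, with only its three tests."""
    cols = ["Player", "Ht", "Wt"] + POSITION_TESTS[pos] + ["Year", "Round", "Pick"]
    return data.loc[data["Pos"] == pos, cols].copy()

# Tiny stand-in for nfl_data (two players from the output above)
demo = pd.DataFrame({
    "Player": ["John Abraham", "Shaun Alexander"],
    "Pos": ["OLB", "RB"],
    "Ht": [76, 72], "Wt": [252, 218],
    "Forty": [4.55, 4.58], "Vertical": [None, None], "BenchReps": [None, None],
    "BroadJump": [None, None], "Cone": [None, None], "Shuttle": [None, None],
    "Year": [2000, 2000], "Round": [1, 1], "Pick": [13, 19],
})
rb_demo = position_frame(demo, "RB")
print(list(rb_demo.columns))
```

The test columns come out in the order given in the dictionary, which can differ from the column order used in the cells that follow.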

Let's change our quarterback dataframe to have only the columns for the statistics we care about. We'll make a copy of that dataframe to manipulate when it comes time to explore and analyze the data, and then we'll print out the quarterback dataframe.

In [None]:
qb_df = qb_df[['Player', 'Ht', 'Wt', 'Forty', 'Shuttle', 'BenchReps', 'Year', 'Round', 'Pick']] # keep columns for forty, shuttle run, bench reps
qb_df2 = qb_df.copy()
qb_df

The QB dataframe has only statistics for the forty, shuttle test, and bench reps. Again, some of the players have NaN values for some of the statistics, but we'll leave them for now. We also have a copy of the dataframe that we can manipulate later on.
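
Leaving the NaN values in place is workable for summary statistics, because pandas skips them by default. A minimal sketch, using shuttle times taken from the first rows of the raw data purely for illustration (they are not quarterback results):

```python
import numpy as np
import pandas as pd

# Shuttle times copied from the first rows of the raw data, plus one missing value
shuttle = pd.Series([4.98, 4.49, 4.15, np.nan], name="Shuttle")

n_recorded = shuttle.count()   # counts only non-NaN entries
mean_time = shuttle.mean()     # NaN is skipped by default (skipna=True)
print(n_recorded, round(mean_time, 2))
```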

Let's change our wide receiver dataframe to make sure it has just the three combine test statistics we want. We'll again make a copy of the dataframe and print it out.

In [None]:
wr_df = wr_df[['Player', 'Ht', 'Wt', 'Forty', 'Shuttle', 'Vertical', 'Year', 'Round', 'Pick']] # keep columns for forty, shuttle run, and vertical jump
wr_df2 = wr_df.copy()
wr_df

The wide receiver dataframe now has statistics for just the forty, shuttle, and vertical Combine tests. We'll leave the NaN values, and we have a copy.

We'll do the same for the running back dataframe and keep columns with the forty, shuttle, and cone test statistics. Again, we'll create a copy of it and print out the results.

In [None]:
rb_df = rb_df[['Player', 'Ht', 'Wt', 'Forty', 'Shuttle', 'Cone', 'Year', 'Round', 'Pick']] # keep columns for forty, cone, and shuttle run
rb_df2 = rb_df.copy()
rb_df

Ignoring the NaN values, the running back dataframe now has columns for the forty, shuttle, and cone tests. Again, we've made a copy which we will manipulate later.

We'll turn to the defensive tackle dataframe now. We're going to keep the forty, cone, and bench reps statistics here and make a copy.

In [None]:
dt_df = dt_df[['Player', 'Ht', 'Wt', 'Forty', 'Cone', 'BenchReps', 'Year', 'Round', 'Pick']] # keep columns for forty, cone, and bench reps
dt_df2 = dt_df.copy()
dt_df

The defensive tackle dataframe has just forty, cone, and bench reps statistics now. We're still ignoring the nan values and have a copy of the dataframe. 

We'll finish off with the outside linebacker dataframe. We'll keep only statistics for the vertical, shuttle, and cone tests, make a copy of this one too, and then print it out.

In [None]:
olb_df = olb_df[['Player', 'Ht', 'Wt', 'Vertical', 'Shuttle', 'Cone', 'Year', 'Round', 'Pick']] # keep columns for vertical jump, shuttle run, and cone
olb_df2 = olb_df.copy()
olb_df

So now the outside linebacker dataframe contains columns with just vertical jump, shuttle, and cone statistics (ignoring NaN values). We have a copy of this one as well, which we will use later.

The five individual dataframes have only the most important combine test statistics that NFL coaches and owners would be interested in analyzing for each position.

These dataframes will now be used in the Python file to create the dashboard.

You can access the dashboard using this link: https://secret-gorge-55796.herokuapp.com/