# 2 QB Data Wrangling

## 2.1 Contents<a id='2.1_Contents'></a>
* [2 Data wrangling](#2_Data_wrangling)
  * [2.1 Contents](#2.1_Contents)
  * [2.2 Introduction](#2.2_Introduction)
    * [2.2.1 Recap Of Data Science Problem](#2.2.1_Recap_Of_Data_Science_Problem)
    * [2.2.2 Introduction To Notebook](#2.2.2_Introduction_To_Notebook)
  * [2.3 Imports](#2.3_Imports)
  * [2.4 Objectives](#2.4_Objectives)
  * [2.5 Load The QB Data](#2.5_Load_The_QB_Data)
    * [2.5.1 Clean the QB Data](#2.5.1_Clean_the_QB_Data)
    * [2.5.2 Adding Rows](#2.5.2_Adding_Rows)
    * [2.5.3 Rearrange Columns](#2.5.3_Rearrange_Columns)
  * [2.6 Explore the Data](#2.6_Explore_the_Data)
    * [2.6.1 Create Values for the NaN Columns](#2.6.1_Create_Values_for_the_NaN_Columns)
      * [2.6.1.1 Create Values for td per cmp](#2.6.1.1_Create_Values_for_td_per_cmp)
      * [2.6.1.2 Create Values for ypc](#2.6.1.2_Create_Values_for_ypc)
      * [2.6.1.3 Create Values for QB Rating](#2.6.1.3_Create_Values_for_QB_Rating)
    * [2.6.2 Looking at NaN Values](#2.6.2_Looking_at_NaN_Values)
  * [2.7 Explore the Columns](#2.7_Explore_the_Columns)
    * [2.7.1 QB Column](#2.7.1_QB_Column)
    * [2.7.2 CMP Column](#2.7.2_CMP_Column)
    * [2.7.3 ATT Column](#2.7.3_ATT_Column)
    * [2.7.4 Comp % Column](#2.7.4_Comp_%_Column)
   


## 2.2 Introduction<a id='2.2_Introduction'></a>

This section is about organizing the QB data and making sure it is well defined. Doing this now will help immensely later when we need to get specific answers out of the data. Some data cleaning is done at this stage, and more will follow in later parts of the data exploration as we learn what we need.

### 2.2.1 Recap Of Data Science Problem<a id='2.2.1_Recap_Of_Data_Science_Problem'></a>

The purpose of this data science project is to evaluate quarterbacks (QBs) as effectively as possible. An NFL team needs help assessing the quarterback position with more information than the usual passer rating. The QB is arguably the most important player on the field because they touch the ball on nearly every offensive play, and their decisions can have a huge effect on the outcome of the game. This model will help teams decide which QBs to offer a contract, draft, bench or start.

### 2.2.2 Introduction To Notebook<a id='2.2.2_Introduction_To_Notebook'></a>

In Jupyter notebooks you can work in sequence. You can add, edit and rearrange cells without needing to cross out code or wording. Every new cell is an opportunity to express an idea.

## 2.3 Imports<a id='2.3_Imports'></a>

Here we will use os, pandas, matplotlib, seaborn and numpy to access and manipulate the QB data. Placing your imports all together at the start of your notebook means you only need to check one place for your notebook's dependencies.

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 2.4 Objectives<a id='2.4_Objectives'></a>

There are four objectives:

1. Do we think we have the correct data for our desired question?

2. Have we identified the required target value?

3. Do we have features that are potentially useful?

4. Are there fundamental issues with the data?

## 2.5 Load The QB Data<a id='2.5_Load_The_QB_Data'></a>

In [2]:
# this is the supplied QB CSV data file
df = pd.read_csv('QBStats_all.csv')

Now let's look at the first five rows to make sure everything looks OK. When I called .info(), I noticed NaN values in the 'lg' column. We will fix that soon.

In [3]:
#call .head() to see the first few rows 
df.head()

Unnamed: 0,qb,att,cmp,yds,ypa,td,int,lg,sack,loss,rate,game_points,home_away,year
0,Boomer EsiasonB. Esiason,38,25,237.0,6.2,0,0,20,2.0,11.0,82.9,13,away,1996
1,Jim HarbaughJ. Harbaugh,25,16,196.0,7.8,2,1,35t,0.0,0.0,98.1,20,home,1996
2,Paul JustinP. Justin,8,5,53.0,6.6,0,0,30,1.0,11.0,81.8,20,home,1996
3,Jeff GeorgeJ. George,35,16,215.0,6.1,0,0,55,7.0,53.0,65.8,6,away,1996
4,Kerry CollinsK. Collins,31,17,198.0,6.4,2,0,30,4.0,12.0,95.9,29,home,1996


### 2.5.1 Clean the QB Data<a id='2.5.1_Clean_the_QB_Data'></a>

First, let's rename 'lg' to 'long' and 'loss' to 'loss_yds' to make clear what each column represents.

In [4]:
#rename columns (the mapping is not called dict, which would shadow the built-in)
column_renames = {'lg': 'long',
                  'loss': 'loss_yds'}

df.rename(columns=column_renames,
          inplace=True)

Here we can see that some values in the 'long' column (formerly 'lg') end in 't', as in '35t'. The 't' means the QB's longest pass of the game, 35 yards in this case, went for a touchdown. We need to remove every 't' and convert the column to an integer. Let's also change the NaN values of long to 0: a QB who did not complete a pass was recorded as NaN, and the column needs to be numeric.

In [5]:
#remove the 't' touchdown marker from long (literal match, not a regex)
df['long'] = df['long'].str.replace('t', '', regex=False)
#replace all NaN values with 0
df['long'] = df['long'].fillna(0)
#convert long to int
df['long'] = df['long'].astype(int)
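
Stripping the 't' discards whether the longest pass scored a touchdown. The sketch below, run on a small made-up Series rather than the real file, shows one way to keep that information in a separate flag while still ending up with an integer column. The name `long_td` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# toy values mimicking the raw 'lg' column: plain yardage, a 't' suffix, and a missing entry
raw_long = pd.Series(['20', '35t', None, '55'])

# flag rows whose longest pass was a touchdown (missing entries count as False)
long_td = raw_long.str.endswith('t', na=False)

# drop the trailing marker, convert to numbers, then treat missing as 0 yards
long_yards = pd.to_numeric(raw_long.str.rstrip('t')).fillna(0).astype(int)

print(long_yards.tolist())
print(long_td.tolist())
```

Whether the flag is worth keeping depends on whether later modeling uses it; dropping it matches what the cell above does.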

### 2.5.2 Adding Rows<a id='2.5.2_Adding_Rows'></a>

We also need to create a few more columns to help us grade a QB on their performance. We will add a column for each of the following:
1. Completion % 
2. Yards per completion
3. Touchdown passes per attempts %
4. Touchdown passes per completion %

In [6]:
#create completion %
df['comp %'] = df['cmp'] / df['att']
df['comp %'] = round(df['comp %'] * 100, 1)

#create yards per completion
df['ypc'] = df['yds'] / df['cmp']
df['ypc'] = round(df['ypc'], 1)

#Touchdown Passes per attempts % = Touchdown Passes / Pass Attempts
df['td_per_att'] = df['td'] / df['att']
df['td_per_att'] = round(df['td_per_att'], 3)

#Touchdown Passes per completion % = Touchdown Passes / Pass completions
df['td_per_cmp'] = df['td'] / df['cmp']
df['td_per_cmp'] = round(df['td_per_cmp'], 3)

### 2.5.3 Rearrange Columns<a id='2.5.3_Rearrange_Columns'></a>

Let's rearrange the columns so that relevant columns sit next to each other.

In [7]:
#rearrange columns
df = df[['qb', 'cmp', 'att', 'comp %', 'yds', 'td', 'int', 'rate', 'long', 'sack', 'game_points',
         'ypa', 'ypc', 'td_per_cmp', 'td_per_att', 'loss_yds', 'home_away', 'year']]

Let's take a look at the four new columns and the rearranged column order, and check that the 't' was removed from the long column.

In [8]:
df.head()

Unnamed: 0,qb,cmp,att,comp %,yds,td,int,rate,long,sack,game_points,ypa,ypc,td_per_cmp,td_per_att,loss_yds,home_away,year
0,Boomer EsiasonB. Esiason,25,38,65.8,237.0,0,0,82.9,20,2.0,13,6.2,9.5,0.0,0.0,11.0,away,1996
1,Jim HarbaughJ. Harbaugh,16,25,64.0,196.0,2,1,98.1,35,0.0,20,7.8,12.2,0.125,0.08,0.0,home,1996
2,Paul JustinP. Justin,5,8,62.5,53.0,0,0,81.8,30,1.0,20,6.6,10.6,0.0,0.0,11.0,home,1996
3,Jeff GeorgeJ. George,16,35,45.7,215.0,0,0,65.8,55,7.0,6,6.1,13.4,0.0,0.0,53.0,away,1996
4,Kerry CollinsK. Collins,17,31,54.8,198.0,2,0,95.9,30,4.0,29,6.4,11.6,0.118,0.065,12.0,home,1996


Looks good!

## 2.6 Explore the Data<a id='2.6_Explore_the_Data'></a>

Count the number of missing values in each column and sort them.

In [9]:
missing = pd.concat([df.isnull().sum(), 100 * df.isnull().mean()], axis=1)
missing.columns=["count", '%']
missing.sort_values(by='count', ascending=False)

Unnamed: 0,count,%
td_per_cmp,719,5.451926
ypc,719,5.451926
comp %,97,0.735517
td_per_att,97,0.735517
sack,17,0.128905
loss_yds,17,0.128905
rate,17,0.128905
game_points,0,0.0
home_away,0,0.0
ypa,0,0.0


td_per_cmp and ypc (yards per completion) have the most missing values, at 5.45% each. comp % and td_per_att are each missing 0.74% of their data. sack, loss_yds and rate are each missing 0.13%. We need to investigate the missing data and decide what to enter in its place, such as zero, the mean or something else.

In [10]:
df[df.isnull().any(axis=1)]

Unnamed: 0,qb,cmp,att,comp %,yds,td,int,rate,long,sack,game_points,ypa,ypc,td_per_cmp,td_per_att,loss_yds,home_away,year
19,Kordell StewartK. Stewart,0,2,0.0,0.0,0,0,39.6,0,0.0,9,0.0,,,0.0,0.0,away,1996
28,Elvis GrbacE. Grbac,0,2,0.0,0.0,0,0,39.6,0,0.0,27,0.0,,,0.0,0.0,home,1996
32,Bill MusgraveB. Musgrave,0,1,0.0,0.0,0,0,39.6,0,0.0,31,0.0,,,0.0,0.0,home,1996
55,Tom TupaT. Tupa,0,1,0.0,0.0,0,0,39.6,0,0.0,10,0.0,,,0.0,0.0,away,1996
66,Sean SalisburyS. Salisbury,0,1,0.0,0.0,0,1,0.0,0,0.0,27,0.0,,,0.0,0.0,home,1996
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13126,Larry FitzgeraldL. Fitzgerald,0,1,0.0,0.0,0,0,39.6,0,0.0,34,0.0,,,0.0,0.0,away,2016
13156,Antonio MorrisonA. Morrison,0,1,0.0,0.0,0,0,39.6,0,0.0,24,0.0,,,0.0,0.0,home,2016
13158,Jimmy GaroppoloJ. Garoppolo,0,1,0.0,0.0,0,0,39.6,0,0.0,35,0.0,,,0.0,0.0,away,2016
13164,Shaun HillS. Hill,0,1,0.0,0.0,0,0,39.6,0,0.0,38,0.0,,,0.0,0.0,home,2016


Here we can see that the 97 NaN values in comp % and td_per_att come from players who did not attempt a pass (att is 0) but were credited with a sack, meaning they dropped back to pass but never got the throw off. These 97 data points don't really help us build a QB rating system. Since they are outliers, we should remove them during the data cleaning phase.

The loss_yds and sack columns both have 17 NaN values, which makes sense because a QB is charged with lost yards for every sack. A sack is when a QB drops back to pass but is tackled behind the line of scrimmage. We will fill these in with 0s since the rows still contain other relevant data.

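The td_per_cmp and ypc columns are NaN in the rows shown above where cmp is 0, because both are computed by dividing by the number of completions. The rate column is missing in 17 rows according to the missing-value table.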
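
The NaN pattern in the rows above follows from how pandas handles division by zero: it does not raise an error, it returns NaN for 0/0 and inf for a non-zero value divided by 0. A minimal illustration on made-up numbers (not the real data), reusing the notebook's formulas:

```python
import pandas as pd

# toy rows: a normal game, a game with attempts but no completions, and a game with no attempts
toy = pd.DataFrame({'cmp': [16, 0, 0],
                    'att': [25, 2, 0],
                    'yds': [196.0, 0.0, 0.0]})

# same formulas as the derived columns in section 2.5.2
toy['comp %'] = round(toy['cmp'] / toy['att'] * 100, 1)
toy['ypc'] = round(toy['yds'] / toy['cmp'], 1)
print(toy)

# a non-zero numerator over zero gives inf rather than NaN
inf_example = (pd.Series([5.0]) / pd.Series([0])).iloc[0]
print(inf_example)
```

So a NaN in comp % implies zero attempts, while a NaN in ypc only implies zero completions.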

### 2.6.1 Create Values for the NaN Columns<a id='2.6.1_Create_Values_for_the_NaN_Columns'></a>

1. Let's take a look at the interception ('int') column first. An interception is when a QB throws a pass that is caught by the other team.

In [11]:
df['int'].value_counts()

0     6384
1     4089
2     1870
3      624
4      166
5       32
--      17
6        5
7        1
Name: int, dtype: int64

Here we will enter 0 for the '--' values. We assume they mean that the QB did not throw an interception in that particular game.

In [12]:
df['int'] = df['int'].replace('--', 0)

We need to turn 'int' into an integer as well.

In [13]:
#convert int to an integer dtype
df['int'] = df['int'].astype(int)
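
Before replacing a placeholder such as '--', it can help to list every non-numeric token in the column so that nothing unexpected is silently converted. A small sketch on invented values, using pd.to_numeric with errors='coerce':

```python
import pandas as pd

# toy stand-in for the raw 'int' column, read in as strings
raw_int = pd.Series(['0', '1', '--', '2', '--'])

# anything that cannot be parsed as a number becomes NaN
parsed = pd.to_numeric(raw_int, errors='coerce')

# list the offending tokens and how often each occurs
bad_tokens = raw_int[parsed.isna()].value_counts()
print(bad_tokens)

# once the tokens are understood, fill and cast in one step
clean_int = parsed.fillna(0).astype(int)
print(clean_int.tolist())
```

Filling with 0 here carries the same assumption as the notebook: that '--' means no interception was thrown.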

#### 2.6.1.1 Create Values for td per cmp<a id='2.6.1.1_Create_Values_for_td_per_cmp'></a>

2. Now let's look at the touchdowns per completion ('td_per_cmp') column.

In [14]:
pd.set_option("display.max_rows", None)
df['td_per_cmp'].value_counts()

0.000    4077
0.100     308
0.083     304
0.111     304
0.125     291
0.077     286
0.067     285
0.059     283
0.091     283
0.143     283
0.071     281
0.062     278
0.056     265
0.053     246
0.045     240
0.050     234
0.048     232
0.043     207
0.167     205
0.095     181
0.105     177
0.087     162
0.118     162
0.042     160
0.133     153
0.200     139
1.000     136
0.080     127
0.040     127
0.038     104
0.074      99
0.136      86
0.250      77
0.036      74
0.037      74
0.150      71
0.154      71
0.182      70
0.034      68
0.130      65
0.069      63
0.120      61
0.158      61
0.115      57
0.188      55
0.176      50
0.033      49
0.103      43
0.222      42
0.333      41
0.107      37
0.032      37
0.065      29
0.029      29
0.031      29
0.174      27
0.214      24
0.097      22
0.190      22
0.160      21
0.057      21
0.231      21
0.061      20
0.148      20
0.273      19
0.235      19
0.500      18
0.138      17
0.030      17
0.211      15
0.286      15
0.094 

In [15]:
#replace all td per cmp NaN values with 0
df['td_per_cmp'] = df['td_per_cmp'].fillna(0)

The 'td_per_cmp' column looks good!

#### 2.6.1.2 Create Values for ypc<a id='2.6.1.2_Create_Values_for_ypc'></a>

3. Let's look at the yards per completion ('ypc') column.

In [16]:
pd.set_option("display.max_rows", None)
df['ypc'].value_counts()

 11.0    227
 10.0    225
 11.1    221
 11.5    203
 10.8    198
 11.2    197
 10.6    195
 12.2    195
 10.4    195
 12.0    193
 11.6    186
 10.5    183
 11.4    183
 11.3    182
 11.8    179
 10.2    176
 10.9    176
 9.8     174
 10.7    171
 11.7    169
 9.6     167
 9.0     166
 10.1    165
 9.9     162
 12.1    162
 9.4     159
 12.4    158
 11.9    156
 10.3    156
 13.0    152
 12.8    150
 9.5     147
 9.3     144
 12.7    140
 12.5    139
 12.9    138
 9.2     135
 12.3    134
 9.7     134
 12.6    130
 8.8     130
 9.1     129
 13.2    126
 14.0    125
 8.0     122
 8.9     116
 13.6    115
 13.5    113
 13.4    112
 13.8    111
 13.1    106
 8.6     103
 8.7     101
 8.2      97
 8.5      94
 14.2     93
 13.3     93
 13.7     92
 8.4      88
 13.9     87
 14.4     81
 14.1     79
 14.8     78
 7.0      77
 15.0     77
 14.3     75
 7.8      73
 14.5     72
 14.7     71
 14.6     71
 8.3      68
 8.1      65
 14.9     61
 7.5      60
 15.2     60
 6.0      60
 7.6      60

In [17]:
#replace all ypc NaN values with 0
df['ypc'] = df['ypc'].fillna(0)

Looks like this column is good! There are no NaN values.

#### 2.6.1.3 Create Values for QB Rating<a id='2.6.1.3_Create_Values_for_QB_Rating'></a>

4. Let's look at the QB rating ('rate') data. A standard QB rating (passer rating) is based on four metrics: completion percentage, yards per passing attempt, touchdown percentage and interception percentage.

In [18]:
pd.set_option("display.max_rows", None)
df['rate'].value_counts()

39.6     579
0.0      221
118.8    198
158.3    130
56.2      83
79.2      79
95.8      51
91.7      47
87.5      41
42.4      40
104.2     37
68.8      35
97.9      34
108.3     34
83.3      33
77.1      32
100.0     32
85.4      31
72.9      30
89.6      30
60.4      29
116.7     29
81.1      28
78.5      27
81.8      27
112.5     27
81.2      26
70.1      26
87.8      25
76.5      25
89.2      25
64.6      25
88.0      25
93.1      25
47.9      25
58.3      24
75.4      24
85.3      23
72.0      23
109.7     23
68.2      22
81.9      22
72.8      22
57.0      22
88.4      22
74.8      22
87.0      21
74.1      21
68.5      21
85.0      21
102.1     21
94.8      21
80.2      21
84.8      21
81.7      21
82.3      21
79.0      21
70.8      21
79.6      21
99.3      21
82.6      20
53.0      20
94.4      20
105.2     20
85.7      20
68.9      20
71.5      20
69.0      20
82.9      20
93.8      20
65.6      20
104.9     20
96.6      20
84.2      20
91.2      20
80.6      20
82.1      20

This output does not list NaN, but value_counts() leaves out NaN by default, and the missing-value table in section 2.6 counted 17 missing rate values, so we fill those with 0. The column has a lot of different rate values, which is good for our data: it gives us a wide range for determining what makes a QB good or bad.

In [19]:
#replace all rate NaN values with 0
df['rate'] = df['rate'].fillna(0)
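
To see NaN entries in a frequency table, value_counts() needs dropna=False; by default it leaves them out. A quick sketch on made-up ratings:

```python
import numpy as np
import pandas as pd

# toy ratings with one missing value
toy_rate = pd.Series([39.6, np.nan, 39.6, 0.0])

default_counts = toy_rate.value_counts()           # NaN is not listed
all_counts = toy_rate.value_counts(dropna=False)   # NaN gets its own row
n_missing = toy_rate.isna().sum()                  # direct count of NaN values

print(default_counts)
print(all_counts)
print(n_missing)
```

For a yes/no check, isna().sum() is the more direct tool; value_counts(dropna=False) is useful when the distribution matters too.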

### 2.6.2 Looking at NaN Values<a id='2.6.2_Looking_at_NaN_Values'></a>

In [20]:
missing = pd.concat([df.isnull().sum(), 100 * df.isnull().mean()], axis=1)
missing.columns=["count", '%']
missing.sort_values(by='count', ascending=False)

Unnamed: 0,count,%
comp %,97,0.735517
td_per_att,97,0.735517
sack,17,0.128905
loss_yds,17,0.128905
game_points,0,0.0
home_away,0,0.0
td_per_cmp,0,0.0
ypc,0,0.0
ypa,0,0.0
qb,0,0.0


During the data cleaning phase we will determine what to do with the comp %, td_per_att, sack and loss_yds columns.

## 2.7 Explore the Columns<a id='2.7_Explore_the_Columns'></a>

Here we will find the values, range, mean, counts, unique values, median, mode and standard deviation for each column.
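
The statistics listed above can be gathered in one place instead of one cell per statistic. The helper below is a hypothetical convenience function (`summarize_column` is not part of the original notebook), shown on a toy Series; it is a sketch for numeric columns only.

```python
import pandas as pd

def summarize_column(series: pd.Series) -> pd.Series:
    """Collect basic summary statistics for one numeric column."""
    return pd.Series({
        'min': series.min(),
        'max': series.max(),
        'range': series.max() - series.min(),
        'mean': series.mean(),
        'median': series.median(),
        'mode': series.mode().iloc[0],   # first mode if there are ties
        'std': series.std(),             # sample standard deviation (ddof=1)
        'n_unique': series.nunique(),
    })

# toy completion counts, not the real data
toy_cmp = pd.Series([0, 0, 2, 4])
summary = summarize_column(toy_cmp)
print(summary)
```

On the real data this would be called as `summarize_column(df['cmp'])`, and similarly for the other numeric columns.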

### 2.7.1 QB Column<a id='2.7.1_QB_Column'></a>

In [42]:
df['qb'].dtypes

dtype('O')

The QB column is an object.

In [84]:
df['qb'].unique()

array(['Boomer EsiasonB.\xa0Esiason', 'Jim HarbaughJ.\xa0Harbaugh',
       'Paul JustinP.\xa0Justin', 'Jeff GeorgeJ.\xa0George',
       'Kerry CollinsK.\xa0Collins', 'Jeff BlakeJ.\xa0Blake',
       'Steve WalshS.\xa0Walsh', 'Scott MitchellS.\xa0Mitchell',
       'Mark RoyalsM.\xa0Royals', 'Brad JohnsonB.\xa0Johnson',
       'Warren MoonW.\xa0Moon', 'Steve BonoS.\xa0Bono',
       'Chris ChandlerC.\xa0Chandler', 'Billy Joe HobertB.\xa0Hobert',
       'Vinny TestaverdeV.\xa0Testaverde', 'Rodney PeeteR.\xa0Peete',
       'Gus FrerotteG.\xa0Frerotte', 'Jim MillerJ.\xa0Miller',
       'Mike TomczakM.\xa0Tomczak', 'Kordell StewartK.\xa0Stewart',
       'Mark BrunellM.\xa0Brunell', 'Brett FavreB.\xa0Favre',
       'Jim McMahonJ.\xa0McMahon', 'Trent DilferT.\xa0Dilfer',
       'Drew BledsoeD.\xa0Bledsoe', 'Dan MarinoD.\xa0Marino',
       'Jim EverettJ.\xa0Everett', 'Steve YoungS.\xa0Young',
       'Elvis GrbacE.\xa0Grbac', "Neil O'DonnellN.\xa0O'Donnell",
       'Frank ReichF.\xa0Reich', 'John 

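Note that 13,187 is roughly the number of rows in the data, not the number of distinct QBs. Each name appears once per game played, so the count of unique values is much smaller; df['qb'].nunique() gives the exact figure.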
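
Each qb value repeats the player's name: the full name is immediately followed by an abbreviated form such as 'B.\xa0Esiason'. The sketch below strips the abbreviated part and then counts distinct players. It assumes the abbreviated form always starts with a capital letter, a period and a non-breaking space (\xa0), as in the values printed above; names that break this pattern would need separate handling. The name `qb_clean` is hypothetical.

```python
import pandas as pd

# a few values copied from the unique() output above, with one repeated
qb = pd.Series(['Boomer EsiasonB.\xa0Esiason',
                'Billy Joe HobertB.\xa0Hobert',
                "Neil O'DonnellN.\xa0O'Donnell",
                'Boomer EsiasonB.\xa0Esiason'])

# remove everything from the first "X.<nbsp>" onward, leaving the full name
qb_clean = qb.str.replace(r'[A-Z]\.\xa0.*$', '', regex=True)

print(qb_clean.tolist())
print(qb_clean.nunique())
```

Cleaning the names first also makes later groupby output easier to read.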

In [37]:
df['qb'].value_counts().head()

Peyton ManningP. Manning    265
Tom BradyT. Brady           237
Brett FavreB. Favre         235
Drew BreesD. Brees          232
Eli ManningE. Manning       199
Name: qb, dtype: int64

Peyton Manning played in the most games from 1996-2016.

In [38]:
qb_rating_means = df.groupby(by='qb')['rate'].mean().sort_values(ascending=False)
qb_rating_means.head()

qb
Josh MillerJ. Miller          158.3
Tyrone WheatleyT. Wheatley    158.3
Brandon BanksB. Banks         158.3
Ken DilgerK. Dilger           158.3
Doug BaldwinD. Baldwin        158.3
Name: rate, dtype: float64

Looks like a few people have a perfect QB rating of 158.3.

### 2.7.2 CMP Column<a id='2.7.2_CMP_Column'></a>

In [47]:
df['cmp'].dtypes

dtype('int64')

It is an integer, which it should be!

In [51]:
df['cmp'].max()

58

In [52]:
df['cmp'].min()

-6

We will have to get rid of the -6 value because it is impossible to throw -6 completions. Let's see if there are any other rows with a value below 0.

In [49]:
df['cmp'].max()-df['cmp'].min()

64

In [64]:
df[df['cmp'] < 0]

Unnamed: 0,qb,cmp,att,comp %,yds,td,int,rate,long,sack,game_points,ypa,ypc,td_per_cmp,td_per_att,loss_yds,home_away,year
8597,Chris GreisenC. Greisen,-1,1,-100.0,-1,0,0,0.0,0,0,24,-1.0,1.0,-0.0,0.0,0,away,2009
8602,LaBrandon ToefieldL. Toefield,-1,3,-33.3,0,0,0,0.0,0,0,6,1.0,0.3,-0.0,0.0,0,home,2009
8647,Brooks BollingerB. Bollinger,-6,3,-200.0,-2,0,0,0.0,0,0,34,0.0,0.3,-0.0,0.0,0,away,2009


We will have to get rid of these three rows as they are incorrect.

In [70]:
df.drop([8597,8602,8647], axis=0, inplace=True)

In [72]:
df['cmp'].min()

0

It worked!
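
Hard-coded index labels only work for this exact file. A more general sketch, on a made-up frame, filters with a boolean mask instead. The extra cmp <= att check is an added assumption about what counts as a valid row, not something the original notebook enforces.

```python
import pandas as pd

# toy rows: B has negative completions, D has more completions than attempts
toy = pd.DataFrame({'qb': ['A', 'B', 'C', 'D'],
                    'cmp': [17, -1, 0, 5],
                    'att': [31, 1, 2, 3]})

# a valid row has a non-negative completion count that does not exceed attempts
valid = (toy['cmp'] >= 0) & (toy['cmp'] <= toy['att'])

removed = toy[~valid]                          # keep these for inspection
toy_clean = toy[valid].reset_index(drop=True)  # rows that pass both checks

print(removed)
print(toy_clean)
```

Keeping the removed rows in their own frame makes it easy to review what was dropped before committing to the change.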

In [77]:
df['cmp'].mean() 

16.122515551509636

In [76]:
df['cmp'].median()

17.0

In [98]:
df['cmp'].mode()

0    0
dtype: int64

In [79]:
df['cmp'].std()

8.767621606454494

In [86]:
df['cmp'].unique()

array([25, 16,  5, 17, 23, 13, 20,  1, 21, 19, 12,  9,  3,  0, 18,  7,  4,
       24, 11, 22, 14, 27, 15, 26,  2, 10,  6,  8, 37, 29, 28, 32, 31, 33,
       30, 35, 34, 38, 42, 36, 39, 40, 43, 46, 58, 41], dtype=int64)

In [101]:
df['cmp'].value_counts().head()

0     719
18    687
17    676
1     666
21    661
Name: cmp, dtype: int64

The mean (16.1) and median (17) are very close. The most frequent number of completions is 0. The standard deviation of about 8.8 is large relative to the mean, so the values are widely spread. The values range from 0 to 58.

### 2.7.3 ATT Column<a id='2.7.3_ATT_Column'></a>

In [88]:
df['att'].dtypes

dtype('int64')

In [90]:
df['att'].max()

69

In [91]:
df['att'].min()

0

In [92]:
df['att'].mean() 

26.877712031558186

In [93]:
df['att'].median() 

29.0

In [99]:
df['att'].mode() 

0    1
dtype: int64

In [103]:
df['att'].value_counts().head()

1     1003
30     495
33     479
35     474
34     472
Name: att, dtype: int64

In [94]:
df['att'].std() 

13.581235442088403

In [96]:
df['att'].unique()

array([38, 25,  8, 35, 31, 40, 41,  1, 23, 14, 37, 29, 26, 33, 34, 17,  4,
        2, 27, 30, 22, 13, 12, 24, 21, 32, 20, 46, 19,  3, 28, 11,  6,  0,
       18, 44, 39, 36,  5,  7, 42, 54, 58, 16, 45, 10, 43, 48, 50, 15, 61,
       52, 51,  9, 49, 59, 47, 60, 53, 56, 57, 63, 55, 69, 62, 68, 64, 67,
       65], dtype=int64)

The mean (26.9) and median (29) are relatively close. The most frequent value is 1 attempt. The standard deviation of about 13.6 shows the values are widely spread as well. The values range from 0 to 69.

### 2.7.4 Comp % Column<a id='2.7.4_Comp_%_Column'></a>

In [43]:


#convert yds to int
df['yds'] = df['yds'].astype(int)

#replace all sack NaN values with 0, then convert to int
df['sack'] = df['sack'].fillna(0)
df['sack'] = df['sack'].astype(int)

#replace all comp % NaN values with 0
df['comp %'] = df['comp %'].fillna(0)

#replace all td per att NaN values with 0
df['td_per_att'] = df['td_per_att'].fillna(0)

#replace all loss yds NaN values with 0, then convert to int
df['loss_yds'] = df['loss_yds'].fillna(0)
df['loss_yds'] = df['loss_yds'].astype(int)
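
The same fill-and-cast steps can be written once over lists of column names. A sketch on a toy frame; which columns belong in each list is a judgment call, and filling with 0 is only one of the options discussed earlier.

```python
import numpy as np
import pandas as pd

# toy frame with one complete row and one row of missing values
toy = pd.DataFrame({'sack': [2.0, np.nan],
                    'loss_yds': [11.0, np.nan],
                    'comp %': [65.8, np.nan]})

fill_zero = ['sack', 'loss_yds', 'comp %']   # columns where NaN becomes 0
to_int = ['sack', 'loss_yds']                # columns that hold whole numbers

# fill first, since NaN cannot be stored in a plain int column
toy[fill_zero] = toy[fill_zero].fillna(0)
toy = toy.astype({col: int for col in to_int})

print(toy.dtypes)
```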

In [40]:
df.T.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,13178,13179,13180,13181,13182,13183,13184,13185,13186,13187
count,18,18,18,18,18,18,18,18,18,18,...,18,18,18,18,18,18,18,18,18,18
unique,15,17,15,15,18,18,16,17,9,17,...,9,6,17,12,17,16,16,17,15,18
top,0,0,0,0,Kerry CollinsK. Collins,Jeff BlakeJ. Blake,0,13,0,1,...,0,0,1,0,0,2,2,1,0,Colin KaepernickC. Kaepernick
freq,4,2,4,4,1,1,3,2,6,2,...,6,12,2,6,2,2,2,2,4,1


In [31]:
#check whether the att column contains any null values
df['att'].isnull().values.any()

False

In [32]:
df['long'].isnull().values.any()

False

In [33]:
print(type(df['sack']))

<class 'pandas.core.series.Series'>


In [34]:
df['year'].isnull().values.any()

False

In [35]:
df['long'].dtypes


dtype('int32')

In [36]:
df['sack'].dtypes

dtype('int32')