### All days of the challenge:

Here's what we're going to do today:

* [Take a first look at the data](#Take-a-first-look-at-the-data)
* [See how many missing data points we have](#See-how-many-missing-data-points-we-have)
* [Figure out why the data is missing](#Figure-out-why-the-data-is-missing)
* [Drop missing values](#Drop-missing-values)
* [Filling in missing values automatically](#Filling-in-missing-values-automatically)

Let's get started!

# Take a first look at the data
________

The first thing we'll need to do is load in the libraries and datasets we'll be using. For today, I'll be using a dataset of events that occurred in American Football games for demonstration, and you'll be using a dataset of building permits issued in San Francisco.

> **Important!** Make sure you run this cell yourself or the rest of your code won't work!

In [26]:
# libraries we'll use
import pandas as pd
import numpy as np

# read in all our data: nfl1 -> NFL Play by Play 2009-2016 (v3), nfl2 -> NFL Play by Play 2009-2017 (v4)
nfl1 = pd.read_csv("NFL Play by Play 2009-2016 (v3).csv", low_memory=False)
nfl2 = pd.read_csv("NFL Play by Play 2009-2017 (v4).csv", low_memory=False)

pd.set_option('display.max_columns', 102)

The first thing I do when I get a new dataset is take a look at some of it. This lets me see that it all read in correctly and get an idea of what's going on with the data. In this case, I'm looking to see whether there are any missing values, which will be represented with `NaN` or `None`.
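As a quick sanity check on how pandas treats those two markers, here is a minimal sketch on a made-up frame. The `toy` name and its values are hypothetical, not rows from the NFL files; the point is only that `isna()` flags both `NaN` and `None` as missing.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame: one numeric column with NaN, one text column with None
toy = pd.DataFrame({
    "down": [1.0, np.nan, 3.0],
    "PuntResult": ["Clean", None, None],
})

# isna() returns True wherever a cell is NaN or None
missing_mask = toy.isna()
print(missing_mask)
```

The same call works on the full dataframes, which is what the counting cells further down rely on.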

In [29]:
# look at a few rows of the nfl1 dataframe. I can see a handful of missing data already!
nfl1.head()

Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season
0,2009-09-10,2009091000,1,1,,15:00,15,3600.0,0.0,TEN,30.0,30.0,0,0,0.0,,PIT,TEN,R.Bironas kicks 67 yards from TEN 30 to PIT 3....,1,39,0,0,,,,0,0,,Kickoff,,,0,,,0,0,0,,0,,,,0,,,,,0,,S.Logan,,M.Griffin,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001506,0.179749,0.006639,0.281138,0.2137,0.003592,0.313676,0.0,0.0,0.323526,2.014474,,,0.485675,0.514325,0.546433,0.453567,0.485675,0.060758,,,2009
1,2009-09-10,2009091000,1,1,1.0,14:53,15,3593.0,7.0,PIT,42.0,58.0,10,5,0.0,0.0,PIT,TEN,(14:53) B.Roethlisberger pass short left to H....,1,5,0,0,,,,0,0,,Pass,B.Roethlisberger,00-0022924,1,Complete,Short,-3,8,0,left,0,,,,0,,,H.Ward,00-0017162,1,,,,C.Hope,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.000969,0.108505,0.001061,0.169117,0.2937,0.003638,0.423011,0.0,0.0,2.338,0.077907,-1.068169,1.146076,0.546433,0.453567,0.551088,0.448912,0.546433,0.004655,-0.032244,0.036899,2009
2,2009-09-10,2009091000,1,1,2.0,14:16,15,3556.0,37.0,PIT,47.0,53.0,5,2,0.0,0.0,PIT,TEN,(14:16) W.Parker right end to PIT 44 for -3 ya...,1,-3,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,W.Parker,00-0022250,1,right,end,,,0,,,,S.Tulloch,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001057,0.105106,0.000981,0.162747,0.304805,0.003826,0.421478,0.0,0.0,2.415907,-1.40276,,,0.551088,0.448912,0.510793,0.489207,0.551088,-0.040295,,,2009
3,2009-09-10,2009091000,1,1,3.0,13:35,14,3515.0,41.0,PIT,44.0,56.0,8,2,0.0,0.0,PIT,TEN,(13:35) (Shotgun) B.Roethlisberger pass incomp...,1,0,0,0,,,,0,0,,Pass,B.Roethlisberger,00-0022924,1,Incomplete Pass,Deep,34,0,0,right,0,,,,0,,,M.Wallace,00-0026901,0,,,,,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001434,0.149088,0.001944,0.234801,0.289336,0.004776,0.318621,0.0,0.0,1.013147,-1.712583,3.318841,-5.031425,0.510793,0.489207,0.461217,0.538783,0.510793,-0.049576,0.106663,-0.156239,2009
4,2009-09-10,2009091000,1,1,4.0,13:27,14,3507.0,8.0,PIT,44.0,56.0,8,2,0.0,1.0,PIT,TEN,(13:27) (Punt formation) D.Sepulveda punts 54 ...,1,0,0,0,,,,0,0,Clean,Punt,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001861,0.21348,0.003279,0.322262,0.244603,0.006404,0.208111,0.0,0.0,-0.699436,2.097796,,,0.461217,0.538783,0.558929,0.441071,0.461217,0.097712,,,2009


Yep, it looks like there are some missing values. What about in the nfl2 dataset?

In [31]:
# your turn! Look at a couple of rows from the nfl2 dataset. Do you notice any missing data?
nfl2.sample(10)
# your code goes here :)

Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season
196131,2013-10-13,2013101312,17,4,2.0,08:56,9,536.0,40.0,WAS,1.0,1.0,1,3,1.0,0.0,DAL,WAS,"(8:56) J.Randle left tackle for 1 yard, TOUCHD...",1,1,1,1,,,,0,0,,Run,,,0,,,0,0,0,,0,,J.Randle,00-0030388,1,left,tackle,,,0,,,,,,,,0,,,0,0,,0,,,,0,23.0,16.0,7.0,7.0,DAL,WAS,0,,3,3,3,3,3,0.013986,0.010296,0.000196,0.013626,0.121286,0.004403,0.836206,0.0,0.0,6.099445,0.900555,,,0.914448,0.085552,0.926598,0.073402,0.914448,0.01215,,,2013
274955,2015-09-20,2015092003,9,2,2.0,13:22,14,2602.0,44.0,TEN,23.0,77.0,7,3,0.0,0.0,TEN,CLE,(13:22) (Shotgun) M.Mariota pass incomplete sh...,1,0,0,0,,,,0,0,,Pass,M.Mariota,00-0032268,1,Incomplete Pass,Short,12,0,1,middle,0,,,,0,,,C.Coffman,00-0027069,0,,,,,,,,0,,,0,0,,0,,,,0,0.0,14.0,-14.0,14.0,CLE,TEN,0,,2,2,2,2,2,0.075947,0.171844,0.005315,0.257152,0.193735,0.003531,0.292477,0.0,0.0,0.309377,-0.890353,1.594084,-2.484437,0.888608,0.111392,0.899628,0.100372,0.111392,-0.011021,0.022303,-0.033324,2015
306845,2015-12-13,2015121305,13,3,1.0,04:41,5,1181.0,33.0,KC,34.0,66.0,10,17,0.0,0.0,KC,SD,(4:41) (Shotgun) A.Smith up the middle to KC 3...,1,3,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,A.Smith,00-0023436,1,middle,,,,0,,,,M.Te'o,,,,0,,,0,0,,0,,,,0,10.0,3.0,7.0,7.0,KC,SD,0,,3,3,3,3,3,0.0135,0.127547,0.002004,0.194469,0.266185,0.003531,0.392765,0.0,0.0,1.807045,-0.252488,,,0.788491,0.211509,0.781661,0.218339,0.788491,-0.00683,,,2015
359824,2017-01-01,2017010110,4,1,2.0,06:12,7,3072.0,19.0,HOU,45.0,45.0,19,-1,0.0,0.0,TEN,HOU,(6:12) M.Cassel pass short left to D.Murray to...,1,-2,0,0,,,,0,0,,Pass,M.Cassel,00-0023662,1,Complete,Short,-4,2,0,left,0,,,,0,,,D.Murray,00-0028009,1,,,,A.Bouye,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,TEN,HOU,0,,3,3,3,3,3,0.005998,0.092826,0.00073,0.145508,0.36095,0.003712,0.390277,0.0,0.0,2.523723,-1.147361,-1.370408,0.223048,0.57104,0.42896,0.535091,0.464909,0.57104,-0.035948,-0.042111,0.006162,2016
323621,2016-09-25,2016092503,2,1,1.0,04:15,5,2955.0,43.0,GB,17.0,17.0,10,52,0.0,0.0,DET,GB,(4:15) (Shotgun) T.Riddick left guard to GB 20...,1,-3,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,T.Riddick,00-0030107,1,left,guard,,,0,,,,N.Perry,,,,0,,,0,0,,0,,,,0,0.0,7.0,-7.0,7.0,GB,DET,0,,3,3,3,3,3,0.003711,0.029201,2.4e-05,0.04416,0.381098,0.002559,0.539248,0.0,0.0,4.526374,-0.587718,,,0.584423,0.415577,0.60621,0.39379,0.415577,-0.021787,,,2016
213980,2013-12-02,2013120200,19,4,3.0,06:46,7,406.0,0.0,SEA,40.0,60.0,5,23,0.0,1.0,SEA,NO,"(6:46) (Shotgun) PENALTY on NO-C.Jordan, Neutr...",1,0,0,0,,,,0,0,,No Play,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,,,0,,,0,0,,1,NO,Neutral Zone Infraction,C.Jordan,5,27.0,7.0,20.0,20.0,SEA,NO,0,,3,3,0,3,0,0.230064,0.115004,0.001735,0.169963,0.203618,0.003157,0.276458,0.0,0.0,1.014155,1.336818,,,0.988229,0.011771,0.992748,0.007252,0.988229,0.004519,,,2013
327496,2016-10-02,2016100210,22,4,1.0,06:50,7,410.0,9.0,SD,10.0,90.0,10,3,0.0,1.0,SD,NO,(6:50) M.Gordon right guard to SD 13 for 3 yar...,1,3,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,M.Gordon,00-0032144,1,right,guard,,,0,,,,V.Bell,,,,1,NO,D.Tapp,0,0,,0,,,,0,34.0,21.0,13.0,13.0,SD,NO,0,,2,2,2,2,2,0.325446,0.139428,0.009233,0.207197,0.124624,0.002187,0.191885,0.0,0.0,-0.165691,-4.253158,,,0.928756,0.071244,0.856883,0.143117,0.928756,-0.071873,,,2016
231913,2014-09-21,2014092101,6,2,4.0,08:43,9,2323.0,31.0,CIN,47.0,53.0,10,13,0.0,1.0,CIN,TEN,"(8:43) K.Huber punts 51 yards to TEN 2, Center...",1,0,0,0,,,,0,0,Clean,Punt,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,,,0,,,0,0,,0,,,,0,10.0,0.0,10.0,10.0,CIN,TEN,0,,3,3,3,3,3,0.175194,0.160106,0.002298,0.230676,0.250125,0.005207,0.176393,0.0,0.0,-0.104102,0.885308,,,0.80601,0.19399,0.83554,0.16446,0.80601,0.02953,,,2014
331254,2016-10-16,2016101606,3,1,2.0,09:27,10,3267.0,36.0,NYG,22.0,22.0,2,7,0.0,0.0,BAL,NYG,(9:27) T.West left end to NYG 23 for -1 yards ...,1,-1,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,T.West,00-0031375,1,left,end,,,0,,,,L.Collins,D.Kennard,,,0,,,0,0,,0,,,,0,3.0,0.0,3.0,3.0,NYG,BAL,0,,3,3,3,3,3,0.001313,0.034251,6.1e-05,0.051781,0.379872,0.00267,0.530051,0.0,0.0,4.38997,-0.803086,,,0.269062,0.730938,0.294426,0.705574,0.730938,-0.025364,,,2016
253565,2014-11-16,2014111611,14,3,2.0,09:59,10,1499.0,6.0,NE,44.0,44.0,10,35,0.0,0.0,IND,NE,(9:59) (Run formation) A.Luck pass short left ...,1,4,0,0,,,,0,0,,Pass,A.Luck,00-0029668,1,Complete,Short,3,1,0,left,0,,,,0,,,C.Fleener,00-0029697,1,,,,J.Collins,,,,0,,,0,0,,0,,,,0,10.0,21.0,-11.0,11.0,IND,NE,0,,3,3,3,3,3,0.002346,0.085728,0.000579,0.133704,0.353601,0.003658,0.420384,0.0,0.0,2.816533,-0.336813,-0.485294,0.148481,0.213518,0.786482,0.20326,0.79674,0.213518,-0.010258,-0.013246,0.002988,2014


# See how many missing data points we have
___

Ok, now we know that we do have some missing values. Let's see how many we have in each column. 

In [33]:
# get the number of missing data points per column
print(nfl1.isna().sum() , nfl2.isna().sum())

print("-" * 25)
# look at the # of missing points in the first ten columns
print(nfl1.isna().sum().head(10) , nfl2.isna().sum().head(10))

Date             0
GameID           0
Drive            0
qtr              0
down         54218
             ...  
Win_Prob     21993
WPA           4817
airWPA      220738
yacWPA      220956
Season           0
Length: 102, dtype: int64 Date             0
GameID           0
Drive            0
qtr              0
down         61154
             ...  
Win_Prob     25009
WPA           5541
airWPA      248501
yacWPA      248762
Season           0
Length: 102, dtype: int64
-------------------------
Date                0
GameID              0
Drive               0
qtr                 0
down            54218
time              188
TimeUnder           0
TimeSecs          188
PlayTimeDiff      374
SideofField       450
dtype: int64 Date                0
GameID              0
Drive               0
qtr                 0
down            61154
time              224
TimeUnder           0
TimeSecs          224
PlayTimeDiff      444
SideofField       528
dtype: int64


That seems like a lot! It might be helpful to see what percentage of the values in our dataset were missing to give us a better sense of the scale of this problem:

In [55]:
# how many total missing values do we have?
print(nfl1.isna().sum().sum() , nfl2.isna().sum().sum())

# percent of data that is missing
print(nfl1.isna().sum().sum() / nfl1.size * 100, nfl2.isna().sum().sum() / nfl2.size * 100)

10222931 11505187
27.652267428200588 27.66722370547874


Wow, more than a quarter of the cells in this dataset are empty! In the next step, we're going to take a closer look at some of the columns with missing values and try to figure out what might be going on with them.
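One way to decide which columns deserve that closer look is to rank them by the share of cells that are missing. This is a minimal sketch on a made-up frame (the `toy` data and the `pct_missing` name are hypothetical); it relies on the fact that the mean of a boolean column is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame standing in for the NFL data
toy = pd.DataFrame({
    "qtr": [1, 1, 2, 2],
    "down": [np.nan, 1.0, 2.0, np.nan],
    "PenalizedTeam": [None, None, None, "NO"],
})

# isna().mean() is the fraction of missing cells per column; scale to percent
pct_missing = (toy.isna().mean() * 100).sort_values(ascending=False)
print(pct_missing)
```

Swapping `toy` for `nfl1` or `nfl2` should give the same kind of ranking for the real columns.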

# Figure out why the data is missing
____
 
This is the point at which we get into the part of data science that I like to call "data intuition", by which I mean "really looking at your data and trying to figure out why it is the way it is and how that will affect your analysis". It can be a frustrating part of data science, especially if you're newer to the field and don't have a lot of experience. For dealing with missing values, you'll need to use your intuition to figure out why the value is missing. One of the most important questions you can ask yourself to help figure this out is this:

> **Is this value missing because it wasn't recorded or because it doesn't exist?**

If a value is missing because it doesn't exist (like the height of the oldest child of someone who doesn't have any children) then it doesn't make sense to try and guess what it might be. You probably do want to keep these values as `NaN`. On the other hand, if a value is missing because it wasn't recorded, then you can try to guess what it might have been based on the other values in that column and row. (This is called "imputation" and we'll learn how to do it next! :)

Let's work through an example. Looking at the number of missing values in the `nfl1` dataframe, I notice that the column `TimeSecs` has some missing values in it (188 of them):

In [23]:
# look at the # of missing points in the first ten columns

# << Same one above >>

By looking at [the documentation](https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016), I can see that this column has information on the number of seconds left in the game when the play was made. This means that these values are probably missing because they were not recorded, rather than because they don't exist. So, it would make sense for us to try and guess what they should be rather than just leaving them as NA's.
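To make "guessing" concrete, here is one possible approach, sketched on made-up numbers: linear interpolation of `TimeSecs` between neighbouring plays of the same game. This is an assumption on my part rather than a method taken from the dataset documentation, and it only makes sense if the rows within each game are in chronological order. The `toy` frame and the `filled` name are hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame: two games, each with one unrecorded TimeSecs value
toy = pd.DataFrame({
    "GameID": [1, 1, 1, 2, 2, 2],
    "TimeSecs": [3600.0, np.nan, 3550.0, 3600.0, np.nan, 3580.0],
})

# interpolate inside each game so values never leak across games
filled = toy.groupby("GameID")["TimeSecs"].transform(lambda s: s.interpolate())
print(filled)
```

Grouping by `GameID` is the design choice that matters here: without it, a gap at the start of one game could be filled using the end of the previous one.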

On the other hand, there are other fields, like `PenalizedTeam`, that also have a lot of missing values. In this case, though, the field is missing because if there was no penalty then it doesn't make sense to say *which* team was penalized. For this column, it would make more sense to either leave it empty or to add a third value like "neither" and use that to replace the NA's.
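The "neither" option can be sketched in one line with `fillna()` applied to that single column. The `toy` frame below is made up for illustration; only the column names come from the NFL data.

```python
import pandas as pd

# hypothetical toy frame: a penalty was accepted only on the second play
toy = pd.DataFrame({
    "Accepted.Penalty": [0, 1, 0],
    "PenalizedTeam": [None, "NO", None],
})

# replace missing team names with an explicit "neither" label
toy["PenalizedTeam"] = toy["PenalizedTeam"].fillna("neither")
print(toy)
```

Filling one column at a time keeps the placeholder out of columns where it would not make sense.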

> **Tip:** This is a great place to read over the dataset documentation if you haven't already! If you're working with a dataset that you've gotten from another person, you can also try reaching out to them to get more information.

If you're doing very careful data analysis, this is the point at which you'd look at each column individually to figure out the best strategy for filling those missing values. For the rest of this notebook, we'll cover some "quick and dirty" techniques that can help you with missing values but will probably also end up removing some useful information or adding some noise to your data.

## Your turn!

* Look at the columns `Street Number Suffix` and `Zipcode` from the `sf_permits` dataset. Both of these contain missing values. Which, if either, of these are missing because they don't exist? Which, if either, are missing because they weren't recorded?

# Drop missing values
___

If you're in a hurry or don't have a reason to figure out why your values are missing, one option you have is to just remove any rows or columns that contain missing values. (Note: I don't generally recommend this approach for important projects! It's usually worth it to take the time to go through your data and really look at all the columns with missing values one-by-one to really get to know your dataset.)

If you're sure you want to drop rows with missing values, pandas does have a handy function, `dropna()`, to help you do this. Let's try it out on our NFL dataset!

In [57]:
# remove all the rows that contain a missing value
nfl1.dropna()

Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season


Oh dear, it looks like that's removed all our data! 😱 This is because every row in our dataset had at least one missing value. We might have better luck removing all the *columns* that have at least one missing value instead.

In [63]:
# remove all columns with at least one missing value
nfl1_col_dropped = nfl1.dropna(axis = 1)
nfl1_col_dropped

Unnamed: 0,Date,GameID,Drive,qtr,TimeUnder,ydstogo,ydsnet,PlayAttempted,Yards.Gained,sp,Touchdown,Safety,Onsidekick,PlayType,PassAttempt,AirYards,YardsAfterCatch,QBHit,InterceptionThrown,RushAttempt,Reception,Fumble,Sack,Challenge.Replay,Accepted.Penalty,Penalty.Yards,HomeTeam,AwayTeam,Timeout_Indicator,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,ExPoint_Prob,TwoPoint_Prob,Season
0,2009-09-10,2009091000,1,1,15,0,0,1,39,0,0,0,0,Kickoff,0,0,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.000000,0.0,2009
1,2009-09-10,2009091000,1,1,15,10,5,1,5,0,0,0,0,Pass,1,-3,8,0,0,0,1,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.000000,0.0,2009
2,2009-09-10,2009091000,1,1,15,5,2,1,-3,0,0,0,0,Run,0,0,0,0,0,1,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.000000,0.0,2009
3,2009-09-10,2009091000,1,1,14,8,2,1,0,0,0,0,0,Pass,1,34,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.000000,0.0,2009
4,2009-09-10,2009091000,1,1,14,8,2,1,0,0,0,0,0,Punt,0,0,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.000000,0.0,2009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
362442,2017-01-01,2017010102,20,4,1,10,35,1,35,1,1,0,0,Pass,1,35,0,0,0,0,1,0,0,0,0,0,DET,GB,0,0,0,0,0,0,0.000000,0.0,2016
362443,2017-01-01,2017010102,20,4,1,0,35,1,0,1,0,0,0,Extra Point,0,0,0,0,0,0,0,0,0,0,0,0,DET,GB,0,0,0,0,0,0,0.931115,0.0,2016
362444,2017-01-01,2017010102,21,4,1,0,0,1,0,0,0,0,1,Kickoff,0,0,0,0,0,0,0,0,0,0,0,0,DET,GB,0,0,0,0,0,0,0.000000,0.0,2016
362445,2017-01-01,2017010102,21,4,1,10,-1,1,-1,0,0,0,0,QB Kneel,0,0,0,0,0,0,0,0,0,0,0,0,DET,GB,0,0,0,0,0,0,0.000000,0.0,2016


In [65]:
# just how many columns did we lose?
nfl1.shape[1] - nfl1_col_dropped.shape[1]

65

We've lost quite a bit of data, but at this point we have successfully removed all the `NaN`'s from our data. 
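If dropping everything that has a single gap feels too blunt, `dropna()` also takes `thresh` and `subset` arguments that give finer control. The sketch below uses a made-up frame (the `toy` data and variable names are hypothetical) to compare the three behaviours.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame with one sparse column and one nearly complete column
toy = pd.DataFrame({
    "GameID": [1, 1, 1, 1],
    "down": [1.0, 2.0, np.nan, 4.0],
    "PenalizedTeam": [None, None, None, "NO"],
})

# default: drop every row that has any missing value
rows_all = toy.dropna()

# keep only the columns that have at least 3 non-missing values
mostly_full_cols = toy.dropna(axis=1, thresh=3)

# drop a row only when 'down' itself is missing
rows_subset = toy.dropna(subset=["down"])

print(len(rows_all), list(mostly_full_cols.columns), len(rows_subset))
```

Which threshold or subset is sensible depends on the analysis, so treat the numbers here as placeholders.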

In [69]:
# Your turn! Try removing all the rows from the nfl2 dataset that contain missing values. How many are left?
nfl2.dropna()


Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season


In [71]:
# Now try removing all the columns with empty values. Now how much of your data is left?
nfl2_col_dropped = nfl2.dropna(axis = 1)
nfl2_col_dropped

Unnamed: 0,Date,GameID,Drive,qtr,TimeUnder,ydstogo,ydsnet,PlayAttempted,Yards.Gained,sp,Touchdown,Safety,Onsidekick,PlayType,PassAttempt,AirYards,YardsAfterCatch,QBHit,InterceptionThrown,RushAttempt,Reception,Fumble,Sack,Challenge.Replay,Accepted.Penalty,Penalty.Yards,HomeTeam,AwayTeam,Timeout_Indicator,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,ExPoint_Prob,TwoPoint_Prob,Season
0,2009-09-10,2009091000,1,1,15,0,0,1,39,0,0,0,0,Kickoff,0,0,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.0,0.0,2009
1,2009-09-10,2009091000,1,1,15,10,5,1,5,0,0,0,0,Pass,1,-3,8,0,0,0,1,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.0,0.0,2009
2,2009-09-10,2009091000,1,1,15,5,2,1,-3,0,0,0,0,Run,0,0,0,0,0,1,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.0,0.0,2009
3,2009-09-10,2009091000,1,1,14,8,2,1,0,0,0,0,0,Pass,1,34,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.0,0.0,2009
4,2009-09-10,2009091000,1,1,14,8,2,1,0,0,0,0,0,Punt,0,0,0,0,0,0,0,0,0,0,0,0,PIT,TEN,0,3,3,3,3,3,0.0,0.0,2009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
407683,2017-12-31,2017123101,29,4,1,0,-4,1,0,0,0,0,0,Timeout,0,0,0,0,0,0,0,0,0,0,0,0,BAL,CIN,1,0,3,0,2,0,0.0,0.0,2017
407684,2017-12-31,2017123101,29,4,1,14,-4,1,0,0,0,0,0,Pass,1,12,0,0,0,0,0,0,0,0,0,0,BAL,CIN,0,2,2,0,2,0,0.0,0.0,2017
407685,2017-12-31,2017123101,29,4,1,14,9,1,13,0,0,0,0,Pass,1,10,3,0,0,0,1,0,0,0,0,0,BAL,CIN,0,2,2,0,2,0,0.0,0.0,2017
407686,2017-12-31,2017123101,30,4,1,10,-1,1,-1,0,0,0,0,QB Kneel,0,0,0,0,0,0,0,0,0,0,0,0,BAL,CIN,0,0,2,0,2,0,0.0,0.0,2017


# Filling in missing values automatically
_____

Another option is to try and fill in the missing values. For this next bit, I'm getting a small sub-section of the NFL data so that it will print well.

In [73]:
# get a small subset of the NFL dataset
df = nfl1.head(5000)
df

Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season
0,2009-09-10,2009091000,1,1,,15:00,15,3600.0,0.0,TEN,30.0,30.0,0,0,0.0,,PIT,TEN,R.Bironas kicks 67 yards from TEN 30 to PIT 3....,1,39,0,0,,,,0,0,,Kickoff,,,0,,,0,0,0,,0,,,,0,,,,,0,,S.Logan,,M.Griffin,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001506,0.179749,0.006639,0.281138,0.213700,0.003592,0.313676,0.0,0.0,0.323526,2.014474,,,0.485675,0.514325,0.546433,0.453567,0.485675,0.060758,,,2009
1,2009-09-10,2009091000,1,1,1.0,14:53,15,3593.0,7.0,PIT,42.0,58.0,10,5,0.0,0.0,PIT,TEN,(14:53) B.Roethlisberger pass short left to H....,1,5,0,0,,,,0,0,,Pass,B.Roethlisberger,00-0022924,1,Complete,Short,-3,8,0,left,0,,,,0,,,H.Ward,00-0017162,1,,,,C.Hope,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.000969,0.108505,0.001061,0.169117,0.293700,0.003638,0.423011,0.0,0.0,2.338000,0.077907,-1.068169,1.146076,0.546433,0.453567,0.551088,0.448912,0.546433,0.004655,-0.032244,0.036899,2009
2,2009-09-10,2009091000,1,1,2.0,14:16,15,3556.0,37.0,PIT,47.0,53.0,5,2,0.0,0.0,PIT,TEN,(14:16) W.Parker right end to PIT 44 for -3 ya...,1,-3,0,0,,,,0,0,,Run,,,0,,,0,0,0,,0,,W.Parker,00-0022250,1,right,end,,,0,,,,S.Tulloch,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001057,0.105106,0.000981,0.162747,0.304805,0.003826,0.421478,0.0,0.0,2.415907,-1.402760,,,0.551088,0.448912,0.510793,0.489207,0.551088,-0.040295,,,2009
3,2009-09-10,2009091000,1,1,3.0,13:35,14,3515.0,41.0,PIT,44.0,56.0,8,2,0.0,0.0,PIT,TEN,(13:35) (Shotgun) B.Roethlisberger pass incomp...,1,0,0,0,,,,0,0,,Pass,B.Roethlisberger,00-0022924,1,Incomplete Pass,Deep,34,0,0,right,0,,,,0,,,M.Wallace,00-0026901,0,,,,,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001434,0.149088,0.001944,0.234801,0.289336,0.004776,0.318621,0.0,0.0,1.013147,-1.712583,3.318841,-5.031425,0.510793,0.489207,0.461217,0.538783,0.510793,-0.049576,0.106663,-0.156239,2009
4,2009-09-10,2009091000,1,1,4.0,13:27,14,3507.0,8.0,PIT,44.0,56.0,8,2,0.0,1.0,PIT,TEN,(13:27) (Punt formation) D.Sepulveda punts 54 ...,1,0,0,0,,,,0,0,Clean,Punt,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,,,0,,,0,0,,0,,,,0,0.0,0.0,0.0,0.0,PIT,TEN,0,,3,3,3,3,3,0.001861,0.213480,0.003279,0.322262,0.244603,0.006404,0.208111,0.0,0.0,-0.699436,2.097796,,,0.461217,0.538783,0.558929,0.441071,0.461217,0.097712,,,2009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,2009-09-20,2009092012,11,2,3.0,00:32,1,1832.0,4.0,CLE,21.0,21.0,10,73,0.0,0.0,DEN,CLE,(:32) (Shotgun) K.Orton pass incomplete deep l...,1,0,0,0,,,,0,0,,Pass,K.Orton,00-0023541,1,Incomplete Pass,Deep,16,0,0,left,0,,,,0,,,E.Royal,00-0026182,0,,,,,,,,0,,,0,0,,0,,,,0,10.0,6.0,4.0,4.0,DEN,CLE,0,,1,1,3,1,3,0.348271,0.021585,0.000120,0.016588,0.446939,0.000835,0.165662,0.0,0.0,2.321009,0.078900,1.733169,-1.654270,0.719814,0.280186,0.722549,0.277451,0.719814,0.002735,0.055464,-0.052728,2009
4996,2009-09-20,2009092012,11,2,4.0,00:28,1,1828.0,4.0,CLE,21.0,21.0,10,73,0.0,1.0,DEN,CLE,"(:28) M.Prater 39 yard field goal is No Good, ...",1,0,0,0,,,,0,0,,Field Goal,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,No Good,39.0,0,,,0,0,,0,,,,0,10.0,6.0,4.0,4.0,DEN,CLE,0,,1,1,3,1,3,0.108184,0.019628,0.000071,0.017468,0.850027,0.000245,0.004377,0.0,0.0,2.399908,-2.399908,,,0.722549,0.277451,0.638579,0.361421,0.722549,-0.083971,,,2009
4997,2009-09-20,2009092012,12,2,1.0,00:23,1,1823.0,5.0,CLE,29.0,71.0,10,-1,0.0,0.0,CLE,DEN,(:23) B.Quinn kneels to CLE 28 for -1 yards.,1,-1,0,0,,,,0,0,,QB Kneel,,,0,,,0,0,0,,0,,,00-0025409,0,,,,,0,,,,,,,,0,,,0,0,,0,,,,0,6.0,10.0,-4.0,4.0,DEN,CLE,0,,3,1,3,1,3,1.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,,,0.638579,0.361421,,,0.361421,,,,2009
4998,2009-09-20,2009092012,12,2,,00:00,0,1800.0,23.0,CLE,29.0,29.0,0,-1,0.0,0.0,,,END QUARTER 2,1,0,0,0,,,,0,0,,Quarter End,,,0,,,0,0,0,,0,,,,0,,,,,0,,,,,,,,0,,,0,0,,0,,,,0,,,,,DEN,CLE,0,,3,1,3,1,3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,,,,,,,,0.000000,,,2009


We can use the pandas `fillna()` function to fill in missing values in a dataframe for us. One option we have is to specify what we want the `NaN` values to be replaced with. Here, I'm saying that I would like to replace all the `NaN` values with 0.
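Before running that on the NFL subset, here is a small sketch of two details of `fillna()` on a made-up frame (the `toy` data and the `per_column` name are hypothetical). First, it returns a new dataframe and leaves the original untouched unless you assign the result. Second, it accepts a dictionary mapping column names to replacement values, which is one way to avoid putting the number 0 into text columns.

```python
import numpy as np
import pandas as pd

# hypothetical toy frame: one numeric and one text column, each with a gap
toy = pd.DataFrame({
    "FieldGoalDistance": [np.nan, 39.0],
    "PenalizedTeam": [None, "NO"],
})

# a dict gives each column its own replacement value
per_column = toy.fillna({"FieldGoalDistance": 0, "PenalizedTeam": "neither"})

print(per_column)
print(toy.isna().sum().sum())  # the original frame still has its gaps
```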

In [75]:
# replace all NA's with 0
df.fillna(0)


Unnamed: 0,Date,GameID,Drive,qtr,down,time,TimeUnder,TimeSecs,PlayTimeDiff,SideofField,yrdln,yrdline100,ydstogo,ydsnet,GoalToGo,FirstDown,posteam,DefensiveTeam,desc,PlayAttempted,Yards.Gained,sp,Touchdown,ExPointResult,TwoPointConv,DefTwoPoint,Safety,Onsidekick,PuntResult,PlayType,Passer,Passer_ID,PassAttempt,PassOutcome,PassLength,AirYards,YardsAfterCatch,QBHit,PassLocation,InterceptionThrown,Interceptor,Rusher,Rusher_ID,RushAttempt,RunLocation,RunGap,Receiver,Receiver_ID,Reception,ReturnResult,Returner,BlockingPlayer,Tackler1,Tackler2,FieldGoalResult,FieldGoalDistance,Fumble,RecFumbTeam,RecFumbPlayer,Sack,Challenge.Replay,ChalReplayResult,Accepted.Penalty,PenalizedTeam,PenaltyType,PenalizedPlayer,Penalty.Yards,PosTeamScore,DefTeamScore,ScoreDiff,AbsScoreDiff,HomeTeam,AwayTeam,Timeout_Indicator,Timeout_Team,posteam_timeouts_pre,HomeTimeouts_Remaining_Pre,AwayTimeouts_Remaining_Pre,HomeTimeouts_Remaining_Post,AwayTimeouts_Remaining_Post,No_Score_Prob,Opp_Field_Goal_Prob,Opp_Safety_Prob,Opp_Touchdown_Prob,Field_Goal_Prob,Safety_Prob,Touchdown_Prob,ExPoint_Prob,TwoPoint_Prob,ExpPts,EPA,airEPA,yacEPA,Home_WP_pre,Away_WP_pre,Home_WP_post,Away_WP_post,Win_Prob,WPA,airWPA,yacWPA,Season
0,2009-09-10,2009091000,1,1,0.0,15:00,15,3600.0,0.0,TEN,30.0,30.0,0,0,0.0,0.0,PIT,TEN,R.Bironas kicks 67 yards from TEN 30 to PIT 3....,1,39,0,0,0,0,0,0,0,0,Kickoff,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,S.Logan,0,M.Griffin,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,PIT,TEN,0,0,3,3,3,3,3,0.001506,0.179749,0.006639,0.281138,0.213700,0.003592,0.313676,0.0,0.0,0.323526,2.014474,0.000000,0.000000,0.485675,0.514325,0.546433,0.453567,0.485675,0.060758,0.000000,0.000000,2009
1,2009-09-10,2009091000,1,1,1.0,14:53,15,3593.0,7.0,PIT,42.0,58.0,10,5,0.0,0.0,PIT,TEN,(14:53) B.Roethlisberger pass short left to H....,1,5,0,0,0,0,0,0,0,0,Pass,B.Roethlisberger,00-0022924,1,Complete,Short,-3,8,0,left,0,0,0,0,0,0,0,H.Ward,00-0017162,1,0,0,0,C.Hope,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,PIT,TEN,0,0,3,3,3,3,3,0.000969,0.108505,0.001061,0.169117,0.293700,0.003638,0.423011,0.0,0.0,2.338000,0.077907,-1.068169,1.146076,0.546433,0.453567,0.551088,0.448912,0.546433,0.004655,-0.032244,0.036899,2009
2,2009-09-10,2009091000,1,1,2.0,14:16,15,3556.0,37.0,PIT,47.0,53.0,5,2,0.0,0.0,PIT,TEN,(14:16) W.Parker right end to PIT 44 for -3 ya...,1,-3,0,0,0,0,0,0,0,0,Run,0,0,0,0,0,0,0,0,0,0,0,W.Parker,00-0022250,1,right,end,0,0,0,0,0,0,S.Tulloch,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,PIT,TEN,0,0,3,3,3,3,3,0.001057,0.105106,0.000981,0.162747,0.304805,0.003826,0.421478,0.0,0.0,2.415907,-1.402760,0.000000,0.000000,0.551088,0.448912,0.510793,0.489207,0.551088,-0.040295,0.000000,0.000000,2009
3,2009-09-10,2009091000,1,1,3.0,13:35,14,3515.0,41.0,PIT,44.0,56.0,8,2,0.0,0.0,PIT,TEN,(13:35) (Shotgun) B.Roethlisberger pass incomp...,1,0,0,0,0,0,0,0,0,0,Pass,B.Roethlisberger,00-0022924,1,Incomplete Pass,Deep,34,0,0,right,0,0,0,0,0,0,0,M.Wallace,00-0026901,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,PIT,TEN,0,0,3,3,3,3,3,0.001434,0.149088,0.001944,0.234801,0.289336,0.004776,0.318621,0.0,0.0,1.013147,-1.712583,3.318841,-5.031425,0.510793,0.489207,0.461217,0.538783,0.510793,-0.049576,0.106663,-0.156239,2009
4,2009-09-10,2009091000,1,1,4.0,13:27,14,3507.0,8.0,PIT,44.0,56.0,8,2,0.0,1.0,PIT,TEN,(13:27) (Punt formation) D.Sepulveda punts 54 ...,1,0,0,0,0,0,0,0,0,Clean,Punt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,PIT,TEN,0,0,3,3,3,3,3,0.001861,0.213480,0.003279,0.322262,0.244603,0.006404,0.208111,0.0,0.0,-0.699436,2.097796,0.000000,0.000000,0.461217,0.538783,0.558929,0.441071,0.461217,0.097712,0.000000,0.000000,2009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,2009-09-20,2009092012,11,2,3.0,00:32,1,1832.0,4.0,CLE,21.0,21.0,10,73,0.0,0.0,DEN,CLE,(:32) (Shotgun) K.Orton pass incomplete deep l...,1,0,0,0,0,0,0,0,0,0,Pass,K.Orton,00-0023541,1,Incomplete Pass,Deep,16,0,0,left,0,0,0,0,0,0,0,E.Royal,00-0026182,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,10.0,6.0,4.0,4.0,DEN,CLE,0,0,1,1,3,1,3,0.348271,0.021585,0.000120,0.016588,0.446939,0.000835,0.165662,0.0,0.0,2.321009,0.078900,1.733169,-1.654270,0.719814,0.280186,0.722549,0.277451,0.719814,0.002735,0.055464,-0.052728,2009
4996,2009-09-20,2009092012,11,2,4.0,00:28,1,1828.0,4.0,CLE,21.0,21.0,10,73,0.0,1.0,DEN,CLE,"(:28) M.Prater 39 yard field goal is No Good, ...",1,0,0,0,0,0,0,0,0,0,Field Goal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,No Good,39.0,0,0,0,0,0,0,0,0,0,0,0,10.0,6.0,4.0,4.0,DEN,CLE,0,0,1,1,3,1,3,0.108184,0.019628,0.000071,0.017468,0.850027,0.000245,0.004377,0.0,0.0,2.399908,-2.399908,0.000000,0.000000,0.722549,0.277451,0.638579,0.361421,0.722549,-0.083971,0.000000,0.000000,2009
4997,2009-09-20,2009092012,12,2,1.0,00:23,1,1823.0,5.0,CLE,29.0,71.0,10,-1,0.0,0.0,CLE,DEN,(:23) B.Quinn kneels to CLE 28 for -1 yards.,1,-1,0,0,0,0,0,0,0,0,QB Kneel,0,0,0,0,0,0,0,0,0,0,0,0,00-0025409,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,6.0,10.0,-4.0,4.0,DEN,CLE,0,0,3,1,3,1,3,1.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.638579,0.361421,0.000000,0.000000,0.361421,0.000000,0.000000,0.000000,2009
4998,2009-09-20,2009092012,12,2,0.0,00:00,0,1800.0,23.0,CLE,29.0,29.0,0,-1,0.0,0.0,0,0,END QUARTER 2,1,0,0,0,0,0,0,0,0,0,Quarter End,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,DEN,CLE,0,0,3,1,3,1,3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2009


I could also be a bit more savvy and replace each missing value with whatever value comes directly after it in the same column. (This makes a lot of sense for datasets where the observations have some sort of logical order to them.)

In [82]:
# replace all NA's with the value that comes directly after it in the same column,
# then replace all the remaining NA's with 0
# (df.bfill() does the same thing as fillna(method = "bfill"), which newer pandas versions deprecate)
df = df.bfill()
df = df.fillna(0)
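
To see why the second step is needed, here is a minimal sketch on a tiny made-up dataframe. The name `toy` and its columns `a` and `b` are assumptions for illustration only and are not part of the NFL data. A backfill can only borrow from rows further down, so any missing values at the bottom of a column are left over and still need a fallback such as 0.

```python
import numpy as np
import pandas as pd

# hypothetical toy data, not from the NFL dataset
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 5.0, np.nan, np.nan],
})

# backfill: each NaN takes the next non-missing value below it in the same column
backfilled = toy.bfill()

# NaNs at the bottom of a column have nothing after them, so they survive the backfill;
# replace whatever is left with 0
filled = backfilled.fillna(0)

print(backfilled)
print(filled)
```

Whether 0 is a sensible fallback depends on the column, so it is worth checking how many values the backfill left unfilled before choosing one.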



Filling in missing values is also known as "imputation", and you can find more exercises on it [in this lesson, also linked under the "More practice!" section](https://www.kaggle.com/dansbecker/handling-missing-values). First, however, why don't you try replacing some of the missing values in the sf_permit dataset?