# Dealing with Missing Data
At some point in your work with data, you will run into missing values.

Depending on the situation, you might need to fill these gaps or delete the affected rows or columns entirely. Luckily, pandas makes both easy.

Let’s get our modules and dataset prepared before we look at dropping rows and columns or filling gaps.

In [1]:
import numpy as np
import pandas as pd

In [5]:
# DataFrame holding the contract details for our transfer targets, where known.
# np.nan is NumPy's "not a number" value; pandas treats it as a missing value.

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
    columns=['Wage', 'GoalBonus', 'ImageRights'],
)
df

           Wage  GoalBonus  ImageRights
Konda  150000.0     4000.0        50000
Makho  123000.0        NaN        70000
Grey        NaN        NaN       100000
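
Before choosing between dropping and filling, it can help to count how much is actually missing. The sketch below rebuilds the same DataFrame and uses ‘.isna()’ to count the gaps per column and per player; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame(
    {
        "Wage": [150000, 123000, np.nan],
        "GoalBonus": [4000, np.nan, np.nan],
        "ImageRights": [50000, 70000, 100000],
    },
    index=["Konda", "Makho", "Grey"],
)

# isna() marks each missing cell as True; summing the booleans counts them.
missing_per_column = df.isna().sum()        # one count per column
missing_per_player = df.isna().sum(axis=1)  # one count per row

print(missing_per_column)
print(missing_per_player)
```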


## Removing rows & columns with missing data
If you decide to bin the players with missing data, the ‘.dropna()’ method makes it simple. By default, it drops every row that contains at least one missing value:

In [6]:
df.dropna()

           Wage  GoalBonus  ImageRights
Konda  150000.0     4000.0        50000


In [7]:
# Drop columns (rather than rows) that contain missing values
df.dropna(axis=1)


       ImageRights
Konda        50000
Makho        70000
Grey        100000
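
Note that ‘.dropna()’ returns a new DataFrame and leaves the original untouched, so the result needs to be assigned to a name if you want to keep it. A minimal sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame(
    {
        "Wage": [150000, 123000, np.nan],
        "GoalBonus": [4000, np.nan, np.nan],
        "ImageRights": [50000, 70000, 100000],
    },
    index=["Konda", "Makho", "Grey"],
)

# Each call returns a new DataFrame; df itself is not modified.
complete_players = df.dropna()        # rows with no missing values
complete_columns = df.dropna(axis=1)  # columns with no missing values

print(df.shape)                # the original still has every row and column
print(complete_players.shape)
print(complete_columns.shape)
```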


‘.dropna()’ can also take the argument ‘thresh’, which sets the minimum number of non-missing values a row needs in order to be kept. Konda has 3 known values, Makho has 2 and Grey has only 1. Below, ‘thresh=2’ allows Makho into our dataset but continues to exclude Grey:

In [8]:
df.dropna(thresh=2)

           Wage  GoalBonus  ImageRights
Konda  150000.0     4000.0        50000
Makho  123000.0        NaN        70000
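
Because ‘thresh’ counts known values rather than missing ones, it is easy to read backwards. As a rough check, the sketch below counts the non-missing values per player by hand and compares a manual filter with ‘thresh=2’; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame(
    {
        "Wage": [150000, 123000, np.nan],
        "GoalBonus": [4000, np.nan, np.nan],
        "ImageRights": [50000, 70000, 100000],
    },
    index=["Konda", "Makho", "Grey"],
)

# Count the known (non-missing) values in each row: Konda 3, Makho 2, Grey 1.
non_missing = df.notna().sum(axis=1)

# thresh=2 keeps rows that have at least 2 known values...
kept_by_thresh = df.dropna(thresh=2)

# ...which should match filtering on the count ourselves.
kept_manually = df[non_missing >= 2]

print(non_missing)
print(kept_by_thresh.equals(kept_manually))
```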


## Filling missing data
Sometimes, deleting rows and columns is a bit drastic. You may want to simply fill in the gaps instead. Rather than ‘.dropna()’, we can use ‘.fillna()’, passing the desired value as the argument.

In [9]:
df.fillna(value=0)

           Wage  GoalBonus  ImageRights
Konda  150000.0     4000.0        50000
Makho  123000.0        0.0        70000
Grey        0.0        0.0       100000


You might want to be a bit smarter than filling with 0s. As an example, you might take a column and use its average to fill the gaps in that column:

In [10]:
df['Wage'].fillna(value=df['Wage'].mean())

Konda    150000.0
Makho    123000.0
Grey     136500.0
Name: Wage, dtype: float64
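
The same idea can be applied to every column at once. As a minimal sketch, passing the Series of column means to ‘.fillna()’ fills each column's gaps with that column's own average. Like ‘.dropna()’, it returns a new object, so the result is assigned to a new name here (‘filled’ is just an illustrative choice).

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame(
    {
        "Wage": [150000, 123000, np.nan],
        "GoalBonus": [4000, np.nan, np.nan],
        "ImageRights": [50000, 70000, 100000],
    },
    index=["Konda", "Makho", "Grey"],
)

# mean() skips missing values by default, giving one average per column.
column_means = df.mean()

# Passing a Series to fillna() fills each column with the value stored
# under that column's name.
filled = df.fillna(column_means)

print(filled)
```

Bear in mind that the ‘GoalBonus’ average here rests on a single known value, so whether a mean is a sensible fill depends on how much data sits behind it.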