<a href="https://colab.research.google.com/github/DeanPhillipsOKC/pandas-notes/blob/master/Pandas_Missing_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pandas Missing Data

```python
import numpy as np
import pandas as pd
```

Create a Pandas DataFrame with some missing values (represented by `np.nan`):

```python
df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})
```

## Strategy One: Just leave the missing data

```python
df
```

|   | A   | B   | C  |
|---|-----|-----|----|
| 0 | 1.0 | 5.0 | 10 |
| 1 | 2.0 | NaN | 20 |
| 2 | NaN | NaN | 30 |
| 3 | 4.0 | 8.0 | 40 |

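Leaving the data in place often works because many pandas operations are NaN-aware. A short sketch, rebuilding the same `df` so it runs on its own: `isna()` counts the gaps in each column, and `mean()` skips `NaN` by default (`skipna=True`).

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# isna() marks each missing cell as True; summing counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)  # A 1, B 2, C 0

# Aggregations skip NaN by default, so the mean of 'B' uses only
# the two values that are present: (5 + 8) / 2
print(df['B'].mean())  # 6.5
```
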

## Strategy Two: Remove the missing data

By default, `dropna` drops every row that contains at least one missing value. It returns a new DataFrame and leaves `df` itself unchanged.

```python
df.dropna()
```

|   | A   | B   | C  |
|---|-----|-----|----|
| 0 | 1.0 | 5.0 | 10 |
| 3 | 4.0 | 8.0 | 40 |

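`dropna` has a few other parameters worth knowing. The sketch below (a self-contained rebuild of the same `df`) confirms that the original is left untouched and illustrates the `how` and `subset` parameters; `cleaned` and `only_a` are just illustrative variable names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# dropna returns a new DataFrame; df itself still has all 4 rows
cleaned = df.dropna()               # how='any' is the default
print(len(df), len(cleaned))        # 4 2

# how='all' drops a row only if every value in it is missing (none here)
print(len(df.dropna(how='all')))    # 4

# subset= limits the check to certain columns: only row 2 lacks 'A'
only_a = df.dropna(subset=['A'])
print(only_a.index.tolist())        # [0, 1, 3]
```
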

If we specify `axis=1` (columns), `dropna` drops columns instead of rows, so only the columns with no missing data remain.

```python
df.dropna(axis=1)
```

|   | C  |
|---|----|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |

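As a rough check of what `axis=1` does, the sketch below compares it with the string alias `axis='columns'` and with a manual selection of the fully populated columns; `by_name`, `complete_cols`, and `by_hand` are illustrative names only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# axis='columns' is an equivalent spelling of axis=1
by_name = df.dropna(axis='columns')

# The same result by hand: keep the columns where every value is present
complete_cols = [col for col in df.columns if df[col].notna().all()]
by_hand = df[complete_cols]

print(by_name.columns.tolist())  # ['C']
print(by_name.equals(by_hand))   # True
```
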

We can pass in a threshold with `thresh`, which tells Pandas to keep only the columns that have at least that many non-missing values (3 in this case). Column B has only two, so it is dropped.

```python
df.dropna(axis=1, thresh=3)
```

|   | A   | C  |
|---|-----|----|
| 0 | 1.0 | 10 |
| 1 | 2.0 | 20 |
| 2 | NaN | 30 |
| 3 | 4.0 | 40 |

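The threshold is compared against a per-column count of non-missing values. This sketch makes that count explicit and rebuilds the same selection by hand with a boolean mask (`non_missing` and `manual` are illustrative names).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# Count the non-missing values in each column: A=3, B=2, C=4
non_missing = df.notna().sum()
print(non_missing)

# thresh=3 keeps the columns whose count is at least 3, i.e. A and C
manual = df.loc[:, non_missing >= 3]
print(manual.equals(df.dropna(axis=1, thresh=3)))  # True
```
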

## Strategy Three: Fill in the missing data

The simplest way is to use the `fillna` method and pass in the value that should replace the missing entries. The value does not have to match the column's data type, but filling a numeric column with a string changes that column to the `object` dtype.

```python
df.fillna(value="Fill Value")
```

|   | A          | B          | C  |
|---|------------|------------|----|
| 0 | 1          | 5          | 10 |
| 1 | 2          | Fill Value | 20 |
| 2 | Fill Value | Fill Value | 30 |
| 3 | 4          | 8          | 40 |

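To see what happens to the data types, here is a quick sketch comparing `dtypes` before and after filling with a string (on a fresh copy of the same data).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

print(df.dtypes)  # A, B float64; C int64

filled = df.fillna(value="Fill Value")

# A string cannot be stored in a float64 column, so pandas upcasts
# A and B to object; C had no gaps and keeps int64
print(filled.dtypes)
```
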

We do not have to do the replacement across the entire DataFrame; we can instead do it one column at a time. `fillna` returns a new Series, so we assign the result back to the column:

```python
df['A'] = df['A'].fillna(value=0)
```

```python
df
```

|   | A   | B   | C  |
|---|-----|-----|----|
| 0 | 1.0 | 5.0 | 10 |
| 1 | 2.0 | NaN | 20 |
| 2 | 0.0 | NaN | 30 |
| 3 | 4.0 | 8.0 | 40 |

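`fillna` also accepts a dict that maps column names to fill values, which fills selected columns in a single call and leaves the others alone. A sketch on a fresh copy of the original data (`filled` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# Only the columns named in the dict are filled;
# 'B' is not listed, so it keeps its missing values
filled = df.fillna(value={'A': 0})

print(filled['A'].tolist())           # [1.0, 2.0, 0.0, 4.0]
print(int(filled['B'].isna().sum()))  # 2
```

Unlike the column assignment above, this returns a new DataFrame, so `df` keeps its gaps unless the result is assigned back.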

We can also use statistical functions to choose reasonable substitutions. In this case we replace the missing values in column 'B' with the mean of column 'B', which `mean` computes from the non-missing values only: (5 + 8) / 2 = 6.5.

```python
df['B'] = df['B'].fillna(value=df['B'].mean())
```

```python
df
```

|   | A   | B   | C  |
|---|-----|-----|----|
| 0 | 1.0 | 5.0 | 10 |
| 1 | 2.0 | 6.5 | 20 |
| 2 | 0.0 | 6.5 | 30 |
| 3 | 4.0 | 8.0 | 40 |

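Passing a Series of statistics fills each column with its own value, because `fillna` matches the Series index to the column names. This sketch starts again from the original data (before column 'A' was filled with 0) and also shows the median as an alternative; `filled_mean` and `filled_median` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, np.nan, 8],
                   'C': [10, 20, 30, 40]})

# df.mean() is a Series indexed by column name
# (A about 2.333, B 6.5, C 25.0); each column is filled with its own mean
filled_mean = df.fillna(df.mean())

# The median is less sensitive to outliers than the mean
# (A 2.0, B 6.5, C 25.0)
filled_median = df.fillna(df.median())

print(filled_mean)
print(filled_median)
```
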