# Filling in Missing Data

Rather than filtering out missing data (and potentially discarding other data along with it), you may want to fill in the “holes” in any number of ways. For most purposes, the fillna method is the workhorse. Calling fillna with a constant replaces every missing value with that constant:

In [70]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [71]:
df = DataFrame(np.random.randn(6,3))

df

,0,1,2
0,0.27304,-0.881983,1.394145
1,0.265555,0.022098,-0.953081
2,-1.424244,0.130817,1.238333
3,1.28601,0.981607,-0.951626
4,0.422663,1.443615,-0.100987
5,-1.580793,-0.041105,0.505117


In [72]:
df.iloc[:4, 0] = np.nan

df.iloc[:2, 2] = np.nan
df

,0,1,2
0,,-0.881983,
1,,0.022098,
2,,0.130817,1.238333
3,,0.981607,-0.951626
4,0.422663,1.443615,-0.100987
5,-1.580793,-0.041105,0.505117


In [73]:
df.fillna(2)

,0,1,2
0,2.0,-0.881983,2.0
1,2.0,0.022098,2.0
2,2.0,0.130817,1.238333
3,2.0,0.981607,-0.951626
4,0.422663,1.443615,-0.100987
5,-1.580793,-0.041105,0.505117


Calling fillna with a dict lets you use a different fill value for each column:

In [74]:
df.fillna({0: 0.5, 2: -1})

,0,1,2
0,0.5,-0.881983,-1.0
1,0.5,0.022098,-1.0
2,0.5,0.130817,1.238333
3,0.5,0.981607,-0.951626
4,0.422663,1.443615,-0.100987
5,-1.580793,-0.041105,0.505117
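
As a minimal, self-contained sketch of the dict form: the keys are column labels and the values are the fill values, and a column that does not appear in the dict keeps its missing values. The frame and the column names "a", "b", and "c" below are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in every column.
frame = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
    "c": [7.0, np.nan, 9.0],
})

# Keys are column labels; each value fills the NaNs in that column only.
filled = frame.fillna({"a": 0.0, "b": -1.0})
print(filled)

# Column "c" is not in the dict, so its NaN is left untouched.
print(filled["c"].isna().sum())
```

If every column should receive a value, the dict needs an entry for each one; otherwise a second fillna call with a constant can cover the rest.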


By default fillna returns a new object and leaves the original unchanged, but you can pass inplace=True to modify the existing object instead:

In [79]:
# inplace=True modifies df directly; the call returns None in pandas 1.x and 2.x
df.fillna({0: 0.5, 2: -1}, inplace=True)

df

,0,1,2
0,0.5,-0.881983,-1.0
1,0.5,0.022098,-1.0
2,0.5,0.130817,1.238333
3,0.5,0.981607,-0.951626
4,0.422663,1.443615,-0.100987
5,-1.580793,-0.041105,0.505117
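
The difference between the default behaviour and in-place modification can be checked directly. The sketch below uses a small hypothetical frame (the name `original` and the column "x" are made up for illustration) and shows the common alternative to inplace=True, which is simply to keep the returned object.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame with a single gap.
original = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Default call: a new, filled object comes back.
result = original.fillna(0.0)

# The original still has its NaN; only the returned copy is filled.
print(original["x"].isna().sum())
print(result["x"].isna().sum())
```

Binding the result to a name (or back to the same name) avoids depending on what an inplace=True call returns.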


Instead of a constant, fillna can propagate the last valid value forward with method='ffill', and the limit argument caps how many consecutive missing values are filled:

In [89]:
df = DataFrame(np.random.randn(6,3))

df

,0,1,2
0,0.426222,1.719846,-0.191757
1,0.724249,1.077314,-1.117824
2,1.230607,0.339966,-0.419597
3,0.976984,0.750932,0.381184
4,-0.878631,-0.156063,1.007992
5,0.161557,-0.041599,-0.239105


In [90]:
df.iloc[-4:, 1] = np.nan

df.iloc[-2:, 2] = np.nan

df

,0,1,2
0,0.426222,1.719846,-0.191757
1,0.724249,1.077314,-1.117824
2,1.230607,,-0.419597
3,0.976984,,0.381184
4,-0.878631,,
5,0.161557,,


In [91]:
# forward fill; pandas 2.1+ deprecates method= in favor of df.ffill()
df.fillna(method='ffill')

,0,1,2
0,0.426222,1.719846,-0.191757
1,0.724249,1.077314,-1.117824
2,1.230607,1.077314,-0.419597
3,0.976984,1.077314,0.381184
4,-0.878631,1.077314,0.381184
5,0.161557,1.077314,0.381184


In [92]:
# fill at most 2 consecutive missing values in each gap
df.fillna(method='ffill', limit=2)

,0,1,2
0,0.426222,1.719846,-0.191757
1,0.724249,1.077314,-1.117824
2,1.230607,1.077314,-0.419597
3,0.976984,1.077314,0.381184
4,-0.878631,,0.381184
5,0.161557,,0.381184
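
For readers on recent pandas releases, the same forward fill is available through the dedicated ffill method, with bfill as its backward counterpart. The following is a sketch on a hypothetical Series `s`, not a replacement for the cells above; it assumes a pandas version in which ffill and bfill accept a limit argument.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a run of three missing values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Carry the last valid value forward through the whole gap.
forward = s.ffill()

# Same, but fill at most 2 consecutive missing values.
forward_limited = s.ffill(limit=2)

# Pull the next valid value backward instead.
backward = s.bfill()

print(forward.tolist())
print(forward_limited.tolist())
print(backward.tolist())
```

With limit=2 the third missing value in the run stays NaN, which mirrors what the limit=2 cell above shows for column 1.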


With fillna you can do lots of other things with a little creativity. For example, you might pass the mean or median value of a Series:

In [98]:
data = Series([1, np.nan, 3.5, np.nan, 7])

data

0    1.0
1    NaN
2    3.5
3    NaN
4    7.0
dtype: float64

In [99]:
data.fillna(data.mean())

0    1.000000
1    3.833333
2    3.500000
3    3.833333
4    7.000000
dtype: float64
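
The same idea extends to the median and to whole DataFrames. In the sketch below, `data` repeats the Series from the cell above, while `frame` and its column names are hypothetical. Passing the Series returned by frame.mean() to fillna fills each column with its own mean, because the fill values are matched on column labels.

```python
import numpy as np
import pandas as pd

# Same values as the Series used above.
data = pd.Series([1, np.nan, 3.5, np.nan, 7])

# The median skips NaN by default, just like the mean.
median_filled = data.fillna(data.median())
print(median_filled.tolist())

# Hypothetical frame: fill each column with that column's mean.
frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 8.0]})
col_means = frame.mean()          # Series indexed by column label
filled = frame.fillna(col_means)  # values are aligned on column labels
print(filled)
```

Whether the mean or the median is the better fill value depends on the data; the median is less sensitive to outliers.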

![fillna function arguments](../../Pictures/fillna%20function%20arguments.png)