In [1]:
# Load libraries
import pandas as pd
import numpy as np
# Create a monthly date index
time_index = pd.date_range("01/01/2010", periods=5, freq="M")
# Create data frame, set index
dataframe = pd.DataFrame(index=time_index)
# Create feature with a gap of missing values
dataframe["Sales"] = [1.0,2.0,np.nan,np.nan,5.0]

In [2]:
dataframe

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,NaN
2010-04-30,NaN
2010-05-31,5.0


In [3]:
# Interpolate missing values
dataframe.interpolate()

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,3.0
2010-04-30,4.0
2010-05-31,5.0


In [5]:
# Alternatively, we can replace missing values with the last known value (i.e., forward-filling):
# Forward-fill
dataframe.ffill()

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,2.0
2010-04-30,2.0
2010-05-31,5.0


In [6]:
# We can also replace missing values with the next known value (i.e., back-filling):
# Back-fill
dataframe.bfill()

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,5.0
2010-04-30,5.0
2010-05-31,5.0


Interpolation is a technique for filling in gaps caused by missing values by, in effect, drawing a line or curve between the known values bordering the gap and
using that line or curve to predict reasonable values. Interpolation can be
particularly useful when the time intervals between observations are constant, the data is not
prone to noisy fluctuations, and the gaps caused by missing values are small. For
example, in our solution a gap of two missing values was bordered by 2.0 and
5.0. By fitting a line starting at 2.0 and ending at 5.0, we can make reasonable
guesses of 3.0 and 4.0 for the two missing values in between.
If we believe the relationship between the two known points is nonlinear, we can use
interpolate's method parameter to specify the interpolation method (methods such as "quadratic" require SciPy to be installed):

In [7]:
# Interpolate missing values
dataframe.interpolate(method="quadratic")

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,3.059808
2010-04-30,4.038069
2010-05-31,5.0

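The default method="linear" ignores the index and treats every row as equally spaced, which is one reason interpolation works best when the intervals between observations are constant. For irregularly spaced timestamps, pandas also accepts method="time", which weights the fill by elapsed time. A minimal sketch on hypothetical, made-up dates:

```python
import numpy as np
import pandas as pd

# Hypothetical irregular observations: 1 day, then 8 days between rows
index = pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-10"])
series = pd.Series([1.0, np.nan, 10.0], index=index)

# Linear: rows treated as equally spaced, so the gap is filled halfway (5.5)
linear = series.interpolate(method="linear")

# Time: Jan 2 is 1 of the 9 days between the known points,
# so the fill is about 1.0 + 9.0 * (1 / 9) = 2.0
by_time = series.interpolate(method="time")

print(linear.iloc[1], by_time.iloc[1])
```

On evenly spaced data such as the monthly-ish Sales series the two methods give similar results; the difference only matters when the gaps between timestamps vary.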

Finally, there might be cases when we have large gaps of missing values and do
not want to interpolate values across the entire gap. In these cases we can use
limit to restrict the number of interpolated values and limit_direction to set
whether to interpolate forward from the last known value before the gap or
backward from the first known value after it:

In [8]:
# Interpolate missing values
dataframe.interpolate(limit=1, limit_direction="forward")

Date,Sales
2010-01-31,1.0
2010-02-28,2.0
2010-03-31,3.0
2010-04-30,NaN
2010-05-31,5.0
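For comparison, here is a sketch of the other two limit_direction values on the same Sales data (recreated from explicit month-end dates, so it does not depend on the "M" frequency alias). With limit=1, "backward" fills only the value just before the known 5.0, while "both" fills one value from each side, which here closes the two-value gap:

```python
import numpy as np
import pandas as pd

# Recreate the Sales data with its two-value gap
index = pd.to_datetime(
    ["2010-01-31", "2010-02-28", "2010-03-31", "2010-04-30", "2010-05-31"]
)
sales = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0], index=index, name="Sales")

# Fill at most one value, working backward from the known value after the gap
backward = sales.interpolate(limit=1, limit_direction="backward")

# Fill at most one value from each side of the gap
both = sales.interpolate(limit=1, limit_direction="both")

print(pd.DataFrame({"backward": backward, "both": both}))
```
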


Back-filling and forward-filling can be thought of as a form of naive
interpolation, where we draw a flat line from a known value and use it to fill in
missing values. One (minor) advantage back- and forward-filling have over
interpolation is that they do not require known values on both sides of the
missing value(s).
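To make that advantage concrete, here is a sketch on a hypothetical series with missing values at both ends as well as in the middle. Passing limit_area="inside" restricts interpolate() to gaps that are bordered by known values, so the edges stay empty, while forward-filling needs only an earlier value and back-filling needs only a later one:

```python
import numpy as np
import pandas as pd

# Hypothetical series: missing values at the start, middle, and end
s = pd.Series([np.nan, 2.0, np.nan, 5.0, np.nan])

# Interpolate only gaps that have known values on both sides
inside_only = s.interpolate(limit_area="inside")

# Forward-fill handles the trailing gap, back-fill handles the leading one
filled = s.ffill().bfill()

print(inside_only.tolist())  # [nan, 2.0, 3.5, 5.0, nan]
print(filled.tolist())       # [2.0, 2.0, 2.0, 5.0, 5.0]
```

Chaining ffill() and bfill() in that order is one simple way to guarantee no missing values remain, at the cost of flat, possibly unrealistic fills at the edges.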