In [1]:
import pandas as pd

### Pandas Fill NA

Pandas Fill NA (`DataFrame.fillna`) has a ton of functionality and flexibility once you dive into the parameters. Let's start off simple, then explore around.

We will run through 4 examples:
1. Default Fill NA
2. Fill NA based off of the index - specific values for rows and columns
3. Fill NA - Backfill Forward fill
4. Fill NA - Backfill Forward fill w/ limits

But first, let's create our DataFrame with NA values. Luckily, Pandas has a `pd.NA` that we can use.

In [6]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', pd.NA),
                   ('Liho Liho', 'Restaurant', 224.0),
                   (pd.NA, 'bar', 80.5),
                   (pd.NA, 'bar', pd.NA),
                   (pd.NA, 'bar', 65.23),
                   ('Blue Barn', pd.NA, 361.98)],
           columns=('name', 'type', 'AvgBill')
                 )
df

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,
1,Liho Liho,Restaurant,224.0
2,,bar,80.5
3,,bar,
4,,bar,65.23
5,Blue Barn,,361.98


### 1. Default Fill NA

To start off, let's fill in our NA values with a string, "No Value Available". You can also fill with a number, a timestamp, or any other scalar value.

Notice how all of the NAs have been replaced.

In [7]:
df.fillna("No Value Available")

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,No Value Available
1,Liho Liho,Restaurant,224
2,No Value Available,bar,80.5
3,No Value Available,bar,No Value Available
4,No Value Available,bar,65.23
5,Blue Barn,No Value Available,361.98
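
As a small sketch of filling with a number instead of a string, the snippet below fills a single column with 0.0. The frame `bills` is a made-up miniature of the DataFrame above, built with plain NaN/None so that `AvgBill` stays a float column. It also shows that `fillna` returns a new object and leaves the original untouched.

```python
import pandas as pd

# Hypothetical miniature of the DataFrame above (illustration only)
bills = pd.DataFrame({"name": ["Foreign Cinema", "Liho Liho", None],
                      "AvgBill": [float("nan"), 224.0, 80.5]})

# Fill just the numeric column with a number; 'name' keeps its NA
filled = bills.copy()
filled["AvgBill"] = bills["AvgBill"].fillna(0.0)

# fillna returned a new Series, so the original frame still has its NA
print(filled)
print(bills["AvgBill"].isna().sum())
```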


### 2. Fill NA based off of the index - specific values for rows and columns

However, a single string like "No Value Available" is a weird fill value to share across numeric and string columns. Luckily, Pandas lets us set a separate fill value per column with a dict, Series, or DataFrame.

**dict** = {key: value}, where key = column name and value = fill value

Notice how columns that I don't specify do not get filled in.

In [13]:
df.fillna({'name': 'No Name Rest.', 'type': 'No Name Type'})

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,
1,Liho Liho,Restaurant,224.0
2,No Name Rest.,bar,80.5
3,No Name Rest.,bar,
4,No Name Rest.,bar,65.23
5,Blue Barn,No Name Type,361.98
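
The dict values do not have to be hard-coded. As one possible sketch, the snippet below fills a numeric column with its own mean. The frame `bills` is a made-up example with a float `AvgBill` column, not the DataFrame from this notebook.

```python
import pandas as pd

# Hypothetical frame with a float column (illustration only)
bills = pd.DataFrame({"name": ["Foreign Cinema", "Liho Liho", None],
                      "AvgBill": [float("nan"), 224.0, 80.5]})

# mean() skips NaN by default, so this is (224.0 + 80.5) / 2
avg = bills["AvgBill"].mean()

# Only the column named in the dict is filled; 'name' is left alone
filled = bills.fillna({"AvgBill": avg})
print(filled)
```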


To fill with a **Series**, make the Series index the column names you want to fill, and the Series values the fill values.

In [15]:
s = pd.Series(data=["No Name Type2", 100], index=["type", 'AvgBill'])
s

type       No Name Type2
AvgBill              100
dtype: object

In [16]:
df.fillna(s)

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,100.0
1,Liho Liho,Restaurant,224.0
2,,bar,80.5
3,,bar,100.0
4,,bar,65.23
5,Blue Barn,No Name Type2,361.98
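
Filling with a **DataFrame** is mentioned above but not shown, so here is a minimal sketch. The frames `scores` and `fallback` are invented for illustration. Each NA takes the value from the matching row and column of `fallback`, and columns missing from `fallback` are not filled.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data with holes in both columns
scores = pd.DataFrame({"AvgBill": [nan, 224.0, nan],
                       "tips": [10.0, nan, 5.0]})

# Hypothetical per-cell fallback values; it has no 'tips' column
fallback = pd.DataFrame({"AvgBill": [50.0, 60.0, 70.0]})

# NAs in 'AvgBill' take the value at the same row label in fallback
filled = scores.fillna(fallback)
print(filled)
```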


### 3. Fill NA - Backfill Forward fill

Next up are Backfill and Forward Fill. These awesome *methods* help you fill in null values with neighboring values from your DataFrame.

**Backfill** = each NA takes the next valid value after it (values step backward)

**Forward Fill** = each NA takes the last valid value before it (values step forward)

Notice below how Blue Barn replaces the 3 missing restaurant names above it. Blue Barn is stepped back and filled in. There is no row after Row 5 in the type column, so nothing gets filled in there.

In [20]:
# Note: pandas 2.1+ deprecates fillna(method=...); df.bfill(axis=0) is the equivalent call
df.fillna(method='bfill', axis=0)

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,224.0
1,Liho Liho,Restaurant,224.0
2,Blue Barn,bar,80.5
3,Blue Barn,bar,65.23
4,Blue Barn,bar,65.23
5,Blue Barn,,361.98


Here the inverse happens: the values that do the filling are propagated forward. Row 0 of AvgBill stays NA because there is no row above it to fill from.

In [23]:
df.fillna(method='ffill', axis=0)

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,
1,Liho Liho,Restaurant,224.0
2,Liho Liho,bar,80.5
3,Liho Liho,bar,80.5
4,Liho Liho,bar,65.23
5,Blue Barn,bar,361.98
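
The same two fills can also be written with the dedicated `ffill()` and `bfill()` methods. The sketch below uses a made-up Series, `names`, that mimics the name column.

```python
import pandas as pd

# Hypothetical column with a gap between two known names
names = pd.Series(["Liho Liho", None, None, "Blue Barn"], name="name")

# Forward fill: each NA takes the last valid value before it
forward = names.ffill()

# Backfill: each NA takes the next valid value after it
backward = names.bfill()

print(forward.tolist())
print(backward.tolist())
```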


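You can also back/forward fill along the column axis (`axis=1`), which fills across each row instead of down each column. Notice how 'bar' fills the NAs of the 'name' column, and how 361.98 fills the missing 'type' in row 5.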

In [25]:
df.fillna(method='bfill', axis=1)

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,
1,Liho Liho,Restaurant,224.0
2,bar,bar,80.5
3,bar,bar,
4,bar,bar,65.23
5,Blue Barn,361.98,361.98
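
Filling across columns tends to make more sense when the columns hold the same kind of data. As a hedged sketch, the made-up frame `quarters` below has one numeric column per quarter, and each row is forward filled from left to right.

```python
import pandas as pd

nan = float("nan")

# Hypothetical quarterly figures, one row per restaurant
quarters = pd.DataFrame({"q1": [1.0, nan],
                         "q2": [nan, 2.0],
                         "q3": [3.0, nan]})

# axis=1 carries values along each row, from q1 toward q3
filled = quarters.ffill(axis=1)
print(filled)
```

The first cell of the second row stays NA, because nothing sits to its left.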


### 4. Fill NA - Backfill Forward fill w/ limits

Say you have a ton of NAs and you want to forward fill or backfill them, but you don't want a value carried across too many cells. You can set a *limit*, which tells pandas the maximum number of consecutive NAs to fill.

Here we will set the limit to 2, so the 3rd consecutive NA (row 4 of name) will not get forward filled.

In [26]:
df.fillna(method='ffill', axis=0, limit=2)

Unnamed: 0,name,type,AvgBill
0,Foreign Cinema,Restaurant,
1,Liho Liho,Restaurant,224.0
2,Liho Liho,bar,80.5
3,Liho Liho,bar,80.5
4,,bar,65.23
5,Blue Barn,bar,361.98
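
To see the limit in isolation, here is a minimal sketch on a made-up Series, `names`, with three consecutive NAs. It uses the dedicated `ffill()` and `bfill()` methods, which accept the same `limit` argument.

```python
import pandas as pd

# Hypothetical column with a run of three NAs
names = pd.Series(["Liho Liho", None, None, None, "Blue Barn"], name="name")

# Forward fill at most 2 consecutive NAs; the 3rd one stays NA
limited_forward = names.ffill(limit=2)

# Backfill at most 1 NA; only the cell right before "Blue Barn" is filled
limited_backward = names.bfill(limit=1)

print(limited_forward.isna().tolist())
print(limited_backward.isna().tolist())
```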
