# Removing Null Values in a Pandas Data Frame

The dropna method, shown below with its parameters, drops null (NaN) values from a DataFrame (df):

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameter Definition:
    
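axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 drops rows that contain missing values; 1 drops columns.
Older pandas releases also accepted a tuple or list to drop on multiple axes; that form was deprecated and has since been removed, so pass a single axis.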

-------------------------------------------------------------

how : {‘any’, ‘all’}, default ‘any’

any : if any NA values are present, drop that label (row or column)
all : if all values are NA, drop that label (row or column)
    
-------------------------------------------------------------

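thresh : int, default None
int value : keep only the labels (rows or columns) that have at least that many non-NA values
In recent pandas versions, thresh cannot be combined with how.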

-------------------------------------------------------------

subset : array-like, default None
Labels along the other axis to consider, e.g. if you are dropping rows, these would be a list of columns to check for missing values.

-------------------------------------------------------------

inplace : boolean, default False
If True, do operation inplace and return None.

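Because inplace is False by default, dropna returns a new DataFrame and leaves df unchanged.
To keep the result we need to assign it, e.g. df = df.dropna(...).
Only with inplace=True are the changes made directly in the DataFrame.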
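
To illustrate the inplace parameter, here is a minimal sketch. The names cleaned, df_copy and result are chosen only for this illustration. It compares the default call, which returns a new DataFrame, with inplace=True, which modifies the DataFrame and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Default (inplace=False): a new DataFrame is returned, df is not modified
cleaned = df.dropna(axis=1, how='all')
print(df.shape)        # df still has all 4 columns
print(cleaned.shape)   # the all-NaN column C is missing here

# inplace=True: the DataFrame itself is modified and the call returns None
df_copy = df.copy()    # work on a copy so df stays available
result = df_copy.dropna(axis=1, how='all', inplace=True)
print(result)
print(df_copy.shape)
```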

In [18]:
#Suppose you created the following DataFrame that contains NaN values:

import pandas as pd
import numpy as np

df = pd.DataFrame([[np.nan, 2, np.nan, 0], 
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                   columns=list('ABCD'))

print(df)

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
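
Before dropping anything, it can help to count how many NaN values each column and each row contains. The sketch below uses isna() together with sum(); the variable names nan_per_column and nan_per_row are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# isna() marks every missing cell with True; summing counts them
nan_per_column = df.isna().sum()        # one count per column
nan_per_row = df.isna().sum(axis=1)     # one count per row

print(nan_per_column)
print(nan_per_row)
```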


In [None]:
Now we will see how to drop NaN values in the following 6 cases:

Dropping Columns:
1) Drop the columns where any of the elements are NaN
2) Drop the columns where all of the elements are NaN

Dropping Rows:
1) Drop the rows where all of the elements are NaN (there is no row to drop, so df stays the same)
2) Drop the rows where any of the elements are NaN
3) Keep only the rows with at least 2 non-NA values
4) Drop the rows based on null values present in specific columns

Dropping Columns:

1) Drop the columns where any of the elements are NaN

In [4]:
df.dropna(axis=1, how='any')

   D
0  0
1  1
2  5
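
As a cross-check, the same result can be obtained with a boolean mask over the columns. This is only a sketch of an equivalent approach, not a replacement for dropna; the names complete_columns, by_mask and by_dropna are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# True for each column in which every value is present
complete_columns = df.notna().all()

# Keep only the complete columns
by_mask = df.loc[:, complete_columns]
by_dropna = df.dropna(axis=1, how='any')

print(list(by_mask.columns))
print(by_mask.equals(by_dropna))
```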


Dropping Columns:

2) Drop the columns where all of the elements are NaN

In [6]:
df.dropna(axis=1, how='all')

     A    B  D
0  NaN  2.0  0
1  3.0  4.0  1
2  NaN  NaN  5


Dropping Rows:

1) Drop the rows where all of the elements are NaN (there is no row to drop, so df stays the same):

In [8]:
df.dropna(axis=0, how='all')

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5


Dropping Rows:

2) Drop the rows where any of the elements are NaN (every row contains at least one NaN, so the result is empty):

In [10]:
df.dropna(axis=0, how='any')

Empty DataFrame
Columns: [A, B, C, D]
Index: []


Dropping Rows:

3) Keep only the rows with at least 2 non-na values:

In [14]:
df.dropna(thresh=2)

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
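
One way to see what thresh does is to count the non-NA values per row and filter on that count. The sketch below assumes the same df as above; the names kept and same are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Rows with at least 2 non-NA values survive
kept = df.dropna(thresh=2)

# count(axis=1) gives the number of non-NA values in each row,
# so this filter selects the same rows
same = df[df.count(axis=1) >= 2]

print(kept.index.tolist())
print(kept.equals(same))
```

Row 2 has only one non-NA value (column D), so it is the only row removed.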


In [None]:
Dropping Rows:

4) Drop the rows based on null values present in specific columns

In [None]:
df.dropna(subset=["B", "C"])
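
The cell above has no recorded output, so here is a small sketch of how subset behaves on the example df. Because column C is entirely NaN and how defaults to 'any', restricting the check to B and C removes every row. The other two calls are added for comparison, and all variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Only columns B and C are checked; C is NaN in every row, so nothing is left
by_b_and_c = df.dropna(subset=["B", "C"])
print(len(by_b_and_c))

# Only column B is checked; row 2 has NaN in B and is dropped
by_b = df.dropna(subset=["B"])
print(by_b.index.tolist())

# With how='all', a row is dropped only if B and C are both NaN
by_b_and_c_all = df.dropna(subset=["B", "C"], how='all')
print(by_b_and_c_all.index.tolist())
```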