# Handling Missing Data with Pandas

Pandas builds on NumPy's capabilities for selecting data and adds a number of convenient methods for handling missing values.

In [1]:
import numpy as np
import pandas as pd

## Pandas utility functions

Like NumPy, pandas has a few utility functions to identify and detect null values:

In [2]:
pd.isnull(np.nan)

True

In [3]:
pd.isnull(None)

True

In [5]:
pd.isna(np.nan)

True

In [6]:
pd.isna(None)

True

In [7]:
pd.notnull(None)

False

In [8]:
pd.notnull(np.nan)

False

In [9]:
pd.notna(np.nan)

False

In [10]:
pd.notnull(3)

True

These functions also work with Series and DataFrames:

In [13]:
pd.isnull(pd.Series([1, np.nan, 7]))

0    False
1     True
2    False
dtype: bool

In [14]:
pd.notnull(pd.Series([1, np.nan, 7]))

0     True
1    False
2     True
dtype: bool

In [15]:
pd.isnull(pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
    'Column C': [np.nan, 2, np.nan]
}))

   Column A  Column B  Column C
0     False      True      True
1      True     False     False
2     False     False      True


### NOTES

NaN: stands for "Not a Number" and represents missing values in pandas.

pd.isnull: checks for missing values and returns True where a value is NaN or None.

pd.notnull: checks for non-missing values and returns True where a value is present.
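As a quick check of the notes above, the following sketch compares the four functions on one Series. The data and variable names are illustrative only; the point is that `isna`/`notna` give the same results as `isnull`/`notnull`, and that the "null" and "not null" masks are complements of each other.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing value in the middle
values = pd.Series([1, np.nan, 7])

null_mask = pd.isnull(values)      # True where the value is missing
present_mask = pd.notnull(values)  # True where the value is present

# isna/notna should produce the same masks as isnull/notnull
same_null = null_mask.equals(pd.isna(values))
same_present = present_mask.equals(pd.notna(values))

# Each element is flagged by exactly one of the two masks
complementary = bool((null_mask == ~present_mask).all())

print(same_null, same_present, complementary)
```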

# Pandas Operations with Missing Values

In [16]:
pd.Series([1, 2, np.nan]).count()

2

In [17]:
pd.Series([1, 2, np.nan]).sum()

3.0

In [18]:
pd.Series([1, 2, np.nan]).mean()

1.5
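The results above suggest that these aggregations skip missing values by default: `count()` ignores the NaN, and `sum()` and `mean()` are computed over the two remaining values. A minimal sketch of that behaviour follows, using the `skipna` parameter to show what happens when NaN is not skipped. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

total_len = len(s)       # counts every element, including NaN
non_null = s.count()     # counts only non-missing elements

default_sum = s.sum()               # NaN is skipped by default
strict_sum = s.sum(skipna=False)    # NaN propagates into the result
strict_mean = s.mean(skipna=False)  # likewise for the mean

print(total_len, non_null, default_sum, strict_sum, strict_mean)
```

Passing `skipna=False` can be useful when a missing value should invalidate the whole aggregate rather than be silently ignored.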

# Filtering Missing Data

In [19]:
s = pd.Series([1, 2, 3, np.nan, np.nan, 4])

In [20]:
pd.notnull(s)

0     True
1     True
2     True
3    False
4    False
5     True
dtype: bool

In [21]:
pd.notnull(s).sum()

4

In [22]:
pd.isnull(s).sum()

2

In [24]:
s[pd.notnull(s)]

0    1.0
1    2.0
2    3.0
5    4.0
dtype: float64

Here `pd.notnull(s)` returns a boolean mask, and indexing `s` with that mask keeps only the elements where the mask is True (index labels 0, 1, 2 and 5). The entries holding NaN are dropped from the result.
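The same filtering can be written with the method form `s.notnull()`. As a sketch, the block below also compares the result with `Series.dropna()`, which pandas provides for this purpose. The comparison is an addition for illustration and not part of the original notebook cells.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan, 4])

mask = s.notnull()    # boolean mask: True where a value is present
filtered = s[mask]    # boolean indexing keeps only the True positions

dropped = s.dropna()  # built-in method that drops missing entries

# Both approaches keep the original index labels of the surviving rows
print(filtered.index.tolist())
print(filtered.equals(dropped))
```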

In [26]:
s = pd.Series(['a', 3, np.nan, 1, np.nan])
print(s.notnull().sum())

3
