## Handling Missing Data with Pandas

Pandas builds on numpy's boolean selection for handling missing values and adds a number of convenient methods of its own. Let's go through them one at a time:

<hr style="border: 3px solid SlateGray"> </hr>

In [1]:
import numpy as np
import pandas as pd

### Pandas utility functions

Similar to `numpy`, pandas also has a few utility functions to detect null values:

In [2]:
pd.isnull(np.nan)

True

In [3]:
pd.isnull(None)

True

In [4]:
pd.isna(np.nan)

True

In [5]:
pd.isna(None)

True
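
A quick way to see what pandas treats as "null" is to run a few scalars through both functions. This is a small sketch assuming a reasonably recent pandas version; the candidate values are illustrative. `pd.NaT` (the missing-value marker pandas uses for datetimes) is detected too, while `0`, the empty string and the string `"nan"` are ordinary values.

```python
import numpy as np
import pandas as pd

# Illustrative candidates: the first three are missing-value markers
values = [np.nan, None, pd.NaT, 0, "", "nan"]

for v in values:
    # isnull is documented as an alias of isna, so both columns agree
    print(repr(v), pd.isna(v), pd.isnull(v))
```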

Their opposites, `pd.notnull` and its alias `pd.notna`, also exist:

In [6]:
pd.notnull(None)

False

In [7]:
pd.notnull(np.nan)

False

In [8]:
pd.notnull(3)

True

These functions also work with `Series` and `DataFrame`s, returning a boolean mask of the same shape:

In [9]:
pd.isnull(pd.Series([1, np.nan, 7]))

0    False
1     True
2    False
dtype: bool

In [10]:
pd.notnull(pd.Series([1, np.nan, 7]))

0     True
1    False
2     True
dtype: bool

In [11]:
pd.isnull(pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
    'Column C': [np.nan, 2, np.nan]
}))

   Column A  Column B  Column C
0     False      True      True
1      True     False     False
2     False     False      True
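
Because `True` counts as 1 when summed, that boolean DataFrame can be reduced to a count of missing values per column. A minimal sketch, rebuilding the same DataFrame under the variable name `df` (a name chosen here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
    'Column C': [np.nan, 2, np.nan]
})

# Summing the boolean mask column-wise gives the number of nulls per column
missing_per_column = pd.isnull(df).sum()
print(missing_per_column)

# Summing once more collapses it into a single total for the whole DataFrame
total_missing = pd.isnull(df).sum().sum()
print(total_missing)
```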


<hr style='border:3px solid SlateGray'></hr>

### Pandas Operations with Missing Values

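Pandas handles missing values more gracefully than numpy. Where numpy lets a `nan` propagate through a calculation, pandas aggregations skip missing values by default: `count` counts only the non-null values, and `sum` and `mean` ignore `nan`s unless you pass `skipna=False`: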

In [12]:
pd.Series([1, 2, np.nan]).count()

2

In [13]:
pd.Series([1, 2, np.nan]).sum()

3.0

In [14]:
pd.Series([1, 2, np.nan]).mean()

1.5
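
To make the contrast with numpy concrete, here is a sketch running the same reduction on a numpy array and on a `Series`. It also shows `skipna=False`, which brings back numpy-style propagation, and numpy's own nan-aware `np.nansum`:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, np.nan])
s = pd.Series([1, 2, np.nan])

# numpy propagates nan through the reduction
print(arr.sum())            # nan
# numpy's nan-aware variant skips it explicitly
print(np.nansum(arr))       # 3.0
# pandas skips missing values by default (skipna=True)
print(s.sum())              # 3.0
# Opting out restores numpy-like propagation
print(s.sum(skipna=False))  # nan
# mean divides by the number of non-null values (2), not the length (3)
print(s.mean(), s.sum() / s.count())
```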

---

### Filtering missing data

As we saw with numpy, we can combine boolean selection with `pd.notnull` to filter out `nan`s and other null values:

In [15]:
s = pd.Series([1, 2, 3, np.nan, np.nan, 4])

In [16]:
pd.notnull(s)

0     True
1     True
2     True
3    False
4    False
5     True
dtype: bool

In [17]:
pd.notnull(s).sum()

4

In [18]:
s[pd.notnull(s)]

0    1.0
1    2.0
2    3.0
5    4.0
dtype: float64

Both `notnull` and `isnull` are also available as methods on `Series` and `DataFrame` objects, so we can call them directly:

In [19]:
s.isnull()

0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool

In [20]:
s.notnull()

0     True
1     True
2     True
3    False
4    False
5     True
dtype: bool

In [21]:
s[s.notnull()]

0    1.0
1    2.0
2    3.0
5    4.0
dtype: float64
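
The same idea works on a `DataFrame` when you want to keep only the rows where one particular column has a value. A minimal sketch, reusing two columns from the earlier example (the `df` variable name is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
})

# Keep only the rows where 'Column A' has a value
rows_with_a = df[df['Column A'].notnull()]
print(rows_with_a)

# isnull selects the complementary rows, where 'Column A' is missing
rows_missing_a = df[df['Column A'].isnull()]
print(rows_missing_a)
```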

<hr style='border: 3px solid SlateGray'></hr>

### Dropping null values

Boolean selection with `notnull()` is a little verbose and repetitive. As we said before, any repetitive task probably has a better, more DRY alternative; in this case, it's the `dropna` method:

In [22]:
s.dropna()

0    1.0
1    2.0
2    3.0
5    4.0
dtype: float64
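
Two details of `dropna` are worth checking: it returns a new `Series` instead of modifying `s`, and it keeps the original index labels, so the result matches the boolean-selection version. A short sketch, assuming the same `s` as before:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan, 4])

cleaned = s.dropna()

# dropna returns a new Series; the original still holds its nans
print(len(s), len(cleaned))            # 6 4
# Same values and index labels (0, 1, 2, 5) as boolean selection
print(cleaned.equals(s[s.notnull()]))  # True
```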

---

### Dropping null values on DataFrames