# CHAPTER 7: Data Cleaning and Preparation

In [1]:
import pandas as pd
import numpy as np

## 7.1 Handling Missing Data

In [2]:
float_data = pd.Series([1.2, -3.5, np.nan, 0])
float_data

0    1.2
1   -3.5
2    NaN
3    0.0
dtype: float64

The `isna` method gives us a Boolean Series with True where values are null:

In [3]:
float_data.isna()

0    False
1    False
2     True
3    False
dtype: bool
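
Because Boolean values count as 0 and 1 in arithmetic, the mask returned by `isna` can also be used to tally missing entries. The following is a minimal sketch; the names `mask`, `n_missing`, and `frac_missing` are illustrative and do not come from the text.

```python
import numpy as np
import pandas as pd

float_data = pd.Series([1.2, -3.5, np.nan, 0])

# isna() marks each null entry with True
mask = float_data.isna()

# True counts as 1 when summed, so this is the number of missing values
n_missing = int(mask.sum())

# The mean of a Boolean mask is the fraction of entries that are missing
frac_missing = float(mask.mean())

print(n_missing, frac_missing)
```

Checking these two numbers before cleaning is a cheap way to see how much data a later filtering step would discard.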

The built-in Python None value is also treated as NA:

In [None]:
string_data = pd.Series(["aardvark", np.nan, None, "avocado"])
string_data

0    aardvark
1         NaN
2        None
3     avocado
dtype: object


In [None]:
string_data.isna()

0    False
1     True
2     True
3    False
dtype: bool


When a float64 dtype is requested, None is stored as NaN and is still detected by `isna`:

In [None]:
float_data = pd.Series([1, 2, None], dtype='float64')
float_data

0    1.0
1    2.0
2    NaN
dtype: float64


In [None]:
float_data.isna()

0    False
1    False
2     True
dtype: bool
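
The same check is available as the top-level function `pandas.isna`, which also accepts scalars. The sketch below is illustrative rather than exhaustive: the lists `sentinels` and `ordinary` are hypothetical names, and `pd.NA` assumes pandas 1.0 or later.

```python
import numpy as np
import pandas as pd

# Values that pandas treats as missing (pd.NA assumes pandas 1.0 or later)
sentinels = [None, np.nan, pd.NaT, pd.NA]
sentinel_flags = [bool(pd.isna(value)) for value in sentinels]

# An empty string and zero are ordinary values, not NA
ordinary = ["", 0]
ordinary_flags = [bool(pd.isna(value)) for value in ordinary]

print(sentinel_flags)
print(ordinary_flags)
```

Note that empty strings are not considered missing, so text data may need a separate replacement step before `isna` finds every gap.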


### Filtering Out Missing Data
There are a few ways to filter out missing data. While you can always do it by hand
using `pandas.isna` and Boolean indexing, `dropna` is more concise. On a Series, it
returns only the non-null values together with their index labels:

In [4]:
data = pd.Series([1, np.nan, 3.5, np.nan, 7])
data.dropna()

0    1.0
2    3.5
4    7.0
dtype: float64

This is equivalent to selecting with the Boolean mask from `notna`:

In [5]:
data[data.notna()]

0    1.0
2    3.5
4    7.0
dtype: float64
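
The equivalence of the two approaches can be checked directly. This is a small sketch under the assumption that comparing with `Series.equals` is an acceptable test of "the same thing"; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 3.5, np.nan, 7])

dropped = data.dropna()
masked = data[data.notna()]

# Series.equals compares both the values and the index labels
same = dropped.equals(masked)

# Both results keep the original labels 0, 2, 4;
# reset_index(drop=True) renumbers them from 0 if that is preferred
renumbered = dropped.reset_index(drop=True)

# dropna returns a new Series, so the original still has all five entries
print(same, list(dropped.index), list(renumbered.index), len(data))
```

Keeping the original labels is often useful, because they record which observations were removed.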

In [6]:
data = pd.DataFrame([[1., 6.5, 3.], [1., np.nan, np.nan],
                     [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.]])
data

     0    1    2
0  1.0  6.5  3.0
1  1.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  6.5  3.0
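
Before filtering a DataFrame like this one, it can help to see where the gaps are. The sketch below counts missing values per column and per row with `isna`; the names `na_per_column` and `na_per_row` are illustrative and not from the text.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1., 6.5, 3.], [1., np.nan, np.nan],
                     [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.]])

# DataFrame.isna returns a Boolean DataFrame of the same shape
mask = data.isna()

# Summing down each column gives the number of missing values per column
na_per_column = mask.sum()

# Summing across each row (axis=1) gives the number of missing values per row
na_per_row = mask.sum(axis=1)

print(na_per_column.tolist())
print(na_per_row.tolist())
```

Here row 0 is complete, row 2 is entirely missing, and rows 1 and 3 are partially filled, which is the distinction that matters when deciding which rows to drop.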
