# Handling Missing Data

In [1]:
import numpy as np
import pandas as pd

**None: Pythonic missing data**  
**None** cannot be used in arbitrary NumPy/Pandas arrays; it can only appear in arrays with data type 'object' (i.e., arrays of Python objects).  
Because operations on object arrays run at the Python level, aggregations like _sum()_ or _min()_ across an array containing a **None** value will generally raise an error.
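
As a minimal sketch of the behaviour described above (the variable names `vals_obj` and `error_name` are illustrative only), the following shows the 'object' dtype being inferred and the aggregation failing:

```python
import numpy as np

# An array holding None falls back to the generic 'object' dtype.
vals_obj = np.array([1, None, 3, 4])
print(vals_obj.dtype)

# The sum is computed with Python-level addition, and int + None is undefined.
try:
    vals_obj.sum()
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)
```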

**NaN: Missing numerical data**  
**NaN** (short for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.  
Any arithmetic involving NaN yields NaN, so ordinary aggregations return NaN as well. NumPy does provide some special aggregations that will ignore these missing values:

In [4]:
vals = np.array([1, np.nan, 3, 4])
np.nansum(vals), np.nanmin(vals), np.nanmax(vals)

(8.0, 1.0, 4.0)
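
For contrast with the NaN-aware aggregations above, here is a small sketch of how NaN behaves in ordinary operations (the name `plain_sum` is illustrative only):

```python
import numpy as np

vals = np.array([1, np.nan, 3, 4])

# The presence of NaN forces a floating-point dtype.
print(vals.dtype)

# A plain aggregation propagates the NaN instead of skipping it.
plain_sum = vals.sum()
print(plain_sum)

# NaN never compares equal to anything, itself included,
# so np.isnan is the reliable way to locate it.
print(np.nan == np.nan)
print(np.isnan(vals))
```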

**NaN and None in Pandas**  
NaN and None both have their place, and Pandas is built to handle the two of them nearly interchangeably, converting between them where appropriate:

In [5]:
pd.Series([1, np.nan, 2, None])

0    1.0
1    NaN
2    2.0
3    NaN
dtype: float64

For types that don't have an available sentinel value, Pandas automatically type-casts when NA values are present. For example, if we set a value in an integer array to np.nan, it will automatically be upcast to a floating-point type to accommodate the NA:

In [7]:
x = pd.Series(range(2), dtype=int)
x[0] = None
x

0    NaN
1    1.0
dtype: float64
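
The same upcasting can be observed at construction time. The sketch below uses illustrative names (`ints`, `bools`, `floats`) and assumes default dtype inference:

```python
import pandas as pd

# Integers have no NaN representation, so the data is upcast to float64.
ints = pd.Series([1, 2, None])

# Booleans with a missing value fall back to the object dtype.
bools = pd.Series([True, False, None])

# Floats already support NaN, so the dtype is unchanged.
floats = pd.Series([1.5, None])

print(ints.dtype, bools.dtype, floats.dtype)
```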

![](1.jpg)

Keep in mind that in Pandas, string data is stored with an object dtype by default (newer versions also offer a dedicated string dtype).

**Detecting null values**

In [None]:
data = pd.Series([1, np.nan, 'hello', None])
data.isnull()
data[data.notnull()]
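
To make the results of these calls concrete, here is a short sketch; `mask` and `kept` are illustrative names. isnull() returns a Boolean mask, which notnull() inverts, so either can be used directly as an index:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])

# Boolean mask: True wherever the value is NaN or None.
mask = data.isnull()
print(mask.tolist())

# Indexing with the inverted mask keeps only the non-null entries.
kept = data[data.notnull()]
print(kept.tolist())

# isna() is an alias of isnull() and gives the same mask.
print(data.isna().equals(mask))
```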

**Dropping null values**

In [None]:
data.dropna()

By default, dropna() will drop all rows in which any null value is present:

In [13]:
df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])
df.dropna()

     0    1  2
1  2.0  3.0  5


In [None]:
df.dropna(axis='columns')
df.dropna(axis=1)

In [None]:
df[3] = np.nan
df.dropna(axis='columns', how='any')
df.dropna(axis='columns', how='all')
df.dropna(axis='rows', thresh=3)     # keep only rows with at least 3 non-null values
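
The effect of the how and thresh parameters is easiest to see by checking which labels survive. The following sketch rebuilds the same DataFrame; `any_cols`, `all_cols`, and `full_rows` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])
df[3] = np.nan  # a column made up entirely of missing values

# how='any' drops every column that contains at least one null.
any_cols = df.dropna(axis='columns', how='any')
print(list(any_cols.columns))

# how='all' drops only columns in which every value is null.
all_cols = df.dropna(axis='columns', how='all')
print(list(all_cols.columns))

# thresh=3 keeps rows that have at least three non-null values.
full_rows = df.dropna(axis='rows', thresh=3)
print(list(full_rows.index))
```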

**Filling null values**

In [None]:
data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))
data.fillna(0)
data.ffill()        # forward fill: propagate the previous value forward
data.bfill()        # back fill: propagate the next value backward
df.ffill(axis=1)    # forward fill along each row
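
As a sketch of what each fill strategy produces, the example below uses the ffill() and bfill() methods, which perform the same forward and backward fills as fillna with a method argument. The name `row_filled` is illustrative only:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))

# Replace every null with a constant.
print(data.fillna(0).tolist())

# Forward fill copies the last valid value into each gap.
print(data.ffill().tolist())

# Back fill copies the next valid value into each gap.
print(data.bfill().tolist())

# Along rows, a leading null has no previous value and therefore stays null.
df = pd.DataFrame([[1, np.nan, 2], [2, 3, 5], [np.nan, 4, 6]])
row_filled = df.ffill(axis=1)
print(row_filled)
```

Note that a forward fill cannot repair a null at the start of a row or Series, and a back fill cannot repair one at the end.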