## Introduction to Missing Data

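Real-world datasets often contain gaps, and handling them correctly is a routine part of data analysis. Pandas marks missing entries with two sentinel values: Python's `None` object and the floating-point `NaN` value.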
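As a quick illustration (a minimal sketch, not part of the original notebook), pandas treats `None` in numeric data as missing and stores it as `NaN`, upcasting the integers to `float64` along the way:

```python
import numpy as np
import pandas as pd

# Integers plus None: pandas stores the None as NaN and upcasts to float64
s = pd.Series([1, None, 3])

print(s.dtype)          # float64
print(s.tolist())       # [1.0, nan, 3.0]
print(np.isnan(s[1]))   # True: the None became a floating-point NaN
```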


## None

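`None` is a Python singleton object commonly used to represent missing data. Because it is not a number, a NumPy array containing `None` must use the generic `object` dtype, which stores references to Python objects and processes them at Python speed instead of with NumPy's fast compiled loops.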

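```python
import numpy as np
import pandas as pd

# None forces NumPy to fall back to the generic object dtype
arr = np.array([2, None, 5, 6])
print(arr)
print(arr.dtype)
```

```
[2 None 5 6]
object
```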


Arithmetic involving `None` raises an error: summing an array or list that contains `None` fails with a `TypeError`, because Python does not define `+` between a number and `None`.
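A minimal sketch of that failure: on an `object` array, NumPy's `sum()` falls back to Python's `+` for each pair of elements, so it reaches `2 + None` and raises. The helper `try_sum` is a hypothetical name used only for this illustration.

```python
import numpy as np

def try_sum(values):
    """Hypothetical helper: return (total, None) on success, (None, error) on TypeError."""
    try:
        return values.sum(), None
    except TypeError as exc:
        return None, exc

arr = np.array([2, None, 5, 6])   # dtype=object because of the None
total, error = try_sum(arr)
print(error)   # a TypeError message mentioning 'int' and 'NoneType'
```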

## NaN: Missing Numerical Data

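`NaN` (Not a Number) is a special IEEE 754 floating-point value used to represent missing numerical data. Because it is a float, an array containing `NaN` keeps a fast numeric dtype such as `float64`. Unlike `None`, operations involving `NaN` do not raise errors; the `NaN` simply propagates to the result.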

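```python
# np.nan is a float, so the array keeps a numeric float64 dtype
# (the np.NaN alias was removed in NumPy 2.0; np.nan works in all versions)
arr = np.array([1, 2, np.nan, 5])
print(arr)
print(arr.dtype)
```

```
[ 1.  2. nan  5.]
float64
```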


Any arithmetic operation involving `NaN` returns `NaN`, so a single missing value turns an aggregate such as a sum into `NaN` as well.
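The propagation is easy to see with a few operations on the same array. This sketch also uses NumPy's `np.nansum`, which the original notebook does not use but which is a standard NaN-aware aggregate that skips missing entries. The last line shows why plain equality checks cannot find missing values: `NaN` never compares equal, even to itself.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 5])

total = arr.sum()            # NaN propagates, so the whole sum is nan
scaled = arr * 10            # element-wise: only the missing slot stays nan
safe_total = np.nansum(arr)  # NaN-aware variant treats NaN as absent: 8.0

print(total, safe_total)
print(scaled)
print(np.nan == np.nan)      # False: NaN is not equal even to itself
```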

## Detecting Null Values

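`isnull()` and `notnull()`: `isnull()` returns a Boolean mask that is `True` wherever a value is missing (`None` or `NaN`), and `notnull()` returns the inverse mask. Pandas also provides the aliases `isna()` and `notna()`.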

```python
# In a numeric Series, pandas converts None to NaN, so both are detected
data = pd.Series([1, 2, np.nan, 5, None])
print(data.isnull())
```

```
0    False
1    False
2     True
3    False
4     True
dtype: bool
```
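The Boolean masks returned by these functions are useful beyond inspection. A minimal sketch on the same Series: `notnull()` selects the present values through boolean indexing, and summing `isnull()` counts the missing ones.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, np.nan, 5, None])

mask = data.notnull()                # inverse of isnull(): True where a value is present
present = data[mask]                 # boolean indexing keeps only the non-null entries
missing_count = data.isnull().sum()  # True counts as 1, so this counts the missing values

print(mask.tolist())
print(present.tolist())
print(missing_count)
```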


## Dropping Null Values

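`dropna()`: returns a new Series or DataFrame with the missing values removed. The original object is not modified (unless `inplace=True` is passed), and the surviving entries keep their original index labels.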

```python
# Entries 2 (NaN) and 4 (None, stored as NaN) are removed; labels 0, 1, 3 are kept
print(data.dropna())
```

```
0    1.0
1    2.0
3    5.0
dtype: float64
```


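On a DataFrame, `dropna()` by default drops every row that contains at least one missing value. Pass `axis=1` (or `axis='columns'`) to drop columns instead, `how='all'` to drop only rows or columns that are entirely missing, or `thresh=n` to keep only those with at least `n` non-missing values.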
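A short sketch of those options on a small, made-up DataFrame (the column names and values are illustrative only). Note the pitfall in the first call: a single all-missing column makes the default row-wise `dropna()` discard every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [np.nan, np.nan, np.nan],   # a column with no data at all
})

# Default (axis=0, how="any"): every row has a NaN in C, so nothing survives
any_rows = df.dropna()

# axis="columns" drops columns instead; with how="any", both A and C go
any_cols = df.dropna(axis="columns")

# how="all" only drops columns that are entirely missing, i.e. C
all_cols = df.dropna(axis="columns", how="all")

# thresh=2 keeps rows with at least two non-missing values (rows 0 and 2)
thresh_rows = df.dropna(thresh=2)

print(len(any_rows), list(any_cols.columns), list(all_cols.columns), list(thresh_rows.index))
```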

## Filling Null Values

`fillna()`: returns a copy in which every missing value is replaced with a specified value.

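```python
# Mixed types (int and str) give an object dtype, so None is stored as-is;
# isnull() still flags both None and NaN as missing
data = pd.Series([1, "Hi", np.nan, None, 5], index=['A', 'B', 'C', 'D', 'E'])
print(data.isnull())
print(data.fillna(0))
```

```
A    False
B    False
C     True
D     True
E    False
dtype: bool
A     1
B    Hi
C     0
D     0
E     5
dtype: object
```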
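A single constant is not always appropriate. The sketch below, on illustrative data, passes `fillna()` a dict to choose a fill value per column, and a Series of column means for a simple mean imputation. Whether mean imputation suits a given dataset is a modeling choice, not a default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, np.nan]})

# A dict maps column names to their own fill values
per_column = df.fillna({"A": 0.0, "B": -1.0})

# A Series indexed by column name works the same way; here, each column's mean
# (mean() skips NaN by default, so A -> 2.0 and B -> 5.0)
mean_filled = df.fillna(df.mean())

print(per_column)
print(mean_filled)
```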


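Instead of a constant, missing values can also be filled from their neighbors: forward fill (`ffill()`) propagates the last valid value forward, and backward fill (`bfill()`) propagates the next valid value backward.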

## Forward Fill (`ffill`)

- Propagates the last valid observation forward to fill missing values.
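- Each `NaN` is replaced by the most recent non-missing value above it in the same column, so a run of consecutive `NaN`s all receive that same value; a `NaN` with no valid value above it stays `NaN`.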

```python
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, np.nan, 8]})
print(df)
print(df.ffill())
```

```
     A    B
0  1.0  5.0
1  NaN  6.0
2  3.0  NaN
3  NaN  8.0
```

```
     A    B
0  1.0  5.0
1  1.0  6.0
2  3.0  6.0
3  3.0  8.0
```
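Two edge cases are worth checking on a small illustrative Series: a run of consecutive `NaN`s is filled from the same earlier value, and a leading `NaN` stays missing because nothing precedes it. The `limit` parameter caps how many consecutive `NaN`s are filled in each gap.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

filled = s.ffill()           # the leading NaN has nothing before it, so it stays NaN
limited = s.ffill(limit=1)   # fill at most one consecutive NaN per gap

print(filled.tolist())
print(limited.tolist())
```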


## Backward Fill (`bfill`)

- Propagates the next valid observation backward to fill missing values.
- Each `NaN` is replaced by the next non-missing value below it in the same column; a trailing `NaN` with no valid value below it stays `NaN`.

```python
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, np.nan, 8]})
print(df)
print(df.bfill())
```

```
     A    B
0  1.0  5.0
1  NaN  6.0
2  3.0  NaN
3  NaN  8.0
```

```
     A    B
0  1.0  5.0
1  3.0  6.0
2  3.0  8.0
3  NaN  8.0
```
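Because `bfill()` leaves trailing `NaN`s untouched (row 3 of column `A` above), one common pattern, sketched here, is to chain `ffill()` after it so that both interior and trailing gaps are filled. Whether filling the ends this way is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, np.nan, 8]})

# bfill first; anything still missing at the end (row 3 of A) is then forward-filled
filled = df.bfill().ffill()

print(filled)
```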
