## **Pandas Library - Continuation**

* **Missing Data**: In **Pandas**, a missing value is represented as **NaN** (Not a Number). Missing values can distort calculations, so we use functions like **`isnull()`, `dropna()`, and `fillna()`** to detect, remove, or fill them. Handling missing data explicitly keeps results consistent and accurate.
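
Detection with `isnull()` can be sketched as follows. This is a minimal illustration on a small DataFrame built for the purpose; the variable names `mask`, `missing_per_column`, and `rows_with_missing` are illustrative choices.

```python
import numpy as np
import pandas as pd

# Small DataFrame with a few missing values (NaN)
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan], 'C': [7, 8, 9]})

# Boolean mask: True wherever a value is missing
mask = df.isnull()

# Number of missing values in each column
missing_per_column = mask.sum()

# Rows that contain at least one missing value
rows_with_missing = df[mask.any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Counting missing values per column before choosing between `dropna()` and `fillna()` shows how much data each option would discard or alter.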

In [37]:
import numpy as np
import pandas as pd
from numpy.random import randn

In [38]:
# First example
tab = {'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan], 'C': [7, 8, 9]}

In [39]:
df = pd.DataFrame(tab)
df

Unnamed: 0,A,B,C
0,1.0,4.0,7
1,2.0,,8
2,,,9


In [40]:
df.dropna()

Unnamed: 0,A,B,C
0,1.0,4.0,7


In [41]:
df.dropna(axis=0)

Unnamed: 0,A,B,C
0,1.0,4.0,7


In [42]:
df.dropna(axis=1)

Unnamed: 0,C
0,7
1,8
2,9
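
The calls above give the same result for `df.dropna()` and `df.dropna(axis=0)` because rows are the default axis, and a row is dropped as soon as it contains any missing value. The sketch below contrasts that default with the `how` and `subset` parameters of `dropna()`, using the same example data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan], 'C': [7, 8, 9]})

# Default behaviour (how='any'): drop a row if ANY of its values is missing
dropped_any = df.dropna()

# how='all': drop a row only if ALL of its values are missing
# (no row here is entirely NaN, so every row is kept)
dropped_all = df.dropna(how='all')

# subset: only look at the listed columns when deciding which rows to drop
dropped_subset = df.dropna(subset=['A'])

print(dropped_any)
print(dropped_all)
print(dropped_subset)
```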


* The command **`df.dropna(thresh=2)`** drops rows (or columns, if `axis=1` is used) that have fewer than 2 non-null values. In other words, a row is kept only if it has at least 2 valid values.

In [43]:
df.dropna(thresh=2)

Unnamed: 0,A,B,C
0,1.0,4.0,7
1,2.0,,8
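
To see why row 2 is dropped, it can help to count the non-null values per row by hand. The sketch below reproduces `thresh=2` with an explicit count on the same example data. The names `non_null_counts` and `kept_manually` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan], 'C': [7, 8, 9]})

# Count the non-null values in each row
non_null_counts = df.notna().sum(axis=1)

# Keep only the rows with at least 2 non-null values
kept_manually = df[non_null_counts >= 2]

# The same selection using the thresh parameter
kept_with_thresh = df.dropna(thresh=2)

print(non_null_counts)
print(kept_with_thresh.equals(kept_manually))
```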


In [45]:
df

Unnamed: 0,A,B,C
0,1.0,4.0,7
1,2.0,,8
2,,,9


In [51]:
df.fillna(value='FILL VAL')

Unnamed: 0,A,B,C
0,1.0,4.0,7
1,2.0,FILL VAL,8
2,FILL VAL,FILL VAL,9


In [53]:
df['A'].fillna(value=df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
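
The same idea can be applied to every column at once. The sketch below assumes all columns are numeric, as in the example data, and passes the Series of column means to `fillna()`, so each column is filled with its own mean. It also shows that `fillna()` returns a new object and leaves `df` unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan], 'C': [7, 8, 9]})

# df.mean() skips NaN by default and returns one mean per column
column_means = df.mean()

# Each column's missing values are replaced by that column's mean
filled = df.fillna(column_means)

print(filled)

# The original DataFrame still contains its missing values
print(df.isnull().sum().sum())
```

To keep the filled values, assign the result to a variable (for example `df = df.fillna(df.mean())`).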