___

<a href='http://www.pieriandata.com'> <img src='./Pierian_Data_Logo.png' /></a>
___

# Missing Data

Here are a few convenient methods for dealing with missing data in pandas:

In [11]:
import numpy as np
import pandas as pd

In [12]:
df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})
# np.nan comes from the NumPy library and represents a missing or undefined value in numerical data.
# Because of np.nan, pandas automatically sets the dtype of 'A' and 'B' to float, even though the original numbers were integers.

In [13]:
df

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


📉 1. Detect missing values — .isna()   → This tells you which values are missing (True) in each cell.

In [14]:
df.isna()

Unnamed: 0,A,B,C
0,False,False,False
1,False,True,False
2,True,True,False
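
The boolean mask returned by `.isna()` can also be reduced to answer "which rows or columns contain any missing value?". The following is a minimal sketch on the same `df`; the variable names `present`, `rows_with_missing`, and `cols_with_missing` are illustrative only and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# notna() is the element-wise complement of isna()
present = df.notna()

# Rows that contain at least one missing value (any() across the columns)
rows_with_missing = df[df.isna().any(axis=1)]

# Names of the columns that contain at least one missing value
cols_with_missing = df.columns[df.isna().any()].tolist()

print(rows_with_missing.index.tolist())  # [1, 2]
print(cols_with_missing)                 # ['A', 'B']
```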


📉 2. Drop rows with missing values — .dropna() → Only keeps rows where no column has missing values.

In [None]:
df.dropna() #by default axis=0

Unnamed: 0,A,B,C
0,1.0,5.0,1
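
If only some columns matter for deciding whether a row is usable, `dropna` accepts a `subset` argument. The sketch below assumes, purely for illustration, that column `'A'` is the one that must be present; `cleaned` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Drop a row only when 'A' is missing; NaNs in other columns are tolerated
cleaned = df.dropna(subset=['A'])

print(cleaned.index.tolist())  # [0, 1]
```

Row 1 survives even though its `'B'` value is NaN, because only `'A'` is checked.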


📉 3. Drop columns with all missing values → how='all' removes only the columns in which every value is missing. Here it keeps all three columns because none of them is entirely NaN.

In [20]:
df.dropna(axis=1, how='all')

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2
2,,,3


In [23]:
df.dropna(axis=1, how='any') #Drops every column that contains at least one NaN, so only 'C' remains.

Unnamed: 0,C
0,1
1,2
2,3


In [None]:
df.dropna(axis=1) #how='any' is the default, so this is the column-wise counterpart of 'df.dropna()'.

Unnamed: 0,C
0,1
1,2
2,3


In [None]:
df.dropna(thresh=2) #This drops rows that have fewer than 2 non-NaN (non-missing) values.

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,,2


In [24]:
df.dropna(axis=1, thresh=2) #This drops columns that have fewer than 2 non-NaN values, so 'B' is removed.

Unnamed: 0,A,C
0,1.0,1
1,2.0,2
2,,3
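
To see why `thresh=2` keeps exactly these rows and columns, it can help to count the non-missing values directly. This is a small sketch using `DataFrame.count`; the names `non_missing_per_row`, `non_missing_per_col`, and `kept` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# count() ignores NaN, so it gives the number of non-missing values
non_missing_per_row = df.count(axis=1)   # rows:    3, 2, 1
non_missing_per_col = df.count()         # columns: A=2, B=1, C=3

# Keeping rows with at least 2 non-missing values matches dropna(thresh=2)
kept = df[non_missing_per_row >= 2]
print(kept.equals(df.dropna(thresh=2)))  # True
```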


📉 4. Fill missing values — .fillna()

    a. Fill with a fixed value:

In [18]:
df.fillna(value='FILL VALUE')

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,FILL VALUE,2
2,FILL VALUE,FILL VALUE,3
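
`fillna` also accepts a dictionary that maps column names to fill values, which is useful when each column needs a different replacement. The fill values `0` and `-1` below are arbitrary placeholders chosen for illustration, not recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# One fill value per column; columns not listed are left untouched
filled = df.fillna({'A': 0, 'B': -1})

print(filled['A'].tolist())  # [1.0, 2.0, 0.0]
print(filled['B'].tolist())  # [5.0, -1.0, -1.0]
```

Because the fill values are numeric, columns `'A'` and `'B'` stay float columns.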


    b. Fill with column-wise means:

In [21]:
df.fillna(df.mean(numeric_only=True))   #→ Replaces each NaN with the mean of its column.

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,5.0,2
2,1.5,5.0,3


In [25]:
df.fillna(df.mean()) #Same result as above, because every column in df is numeric.

Unnamed: 0,A,B,C
0,1.0,5.0,1
1,2.0,5.0,2
2,1.5,5.0,3


In [19]:
df['A'].fillna(value=df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
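
Note that `fillna` returns a new object and leaves `df` unchanged, as do the calls above. A minimal sketch of persisting the result by assigning it back, working on a copy (the name `df_filled` is illustrative) so the original stays available for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Work on a copy and assign the filled column back to keep the result
df_filled = df.copy()
df_filled['A'] = df_filled['A'].fillna(df_filled['A'].mean())

print(df_filled['A'].tolist())      # [1.0, 2.0, 1.5]
print(int(df['A'].isna().sum()))    # 1, the original df still has its NaN
```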

📉 5. Count missing values per column

In [22]:
df.isna().sum() #→ Tells you how many missing values are in each column.

A    1
B    2
C    0
dtype: int64
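
The same boolean mask gives the share of missing values per column and the overall total. This sketch relies on `True` counting as 1 when averaged or summed; `missing_fraction` and `total_missing` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# The mean of a boolean column is the fraction of True values
missing_fraction = df.isna().mean()   # A: 1/3, B: 2/3, C: 0

# Summing twice gives the number of missing cells in the whole DataFrame
total_missing = int(df.isna().sum().sum())
print(total_missing)  # 3
```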

# Great Job!