
# STEP 4: Handle Missing Data – FILL (Complete Pandas Guide)

This notebook covers **the most common practical, real-world ways** to FILL missing (null) values
in a Pandas DataFrame.

Focus: **what to fill, how to fill, and when to use which method**.


```python
import pandas as pd
import numpy as np
```


## 1. Sample Dataset with Missing Values

```python
df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "department": ["IT", "HR", "IT", None, "Finance"],
    "salary": [60000, None, 75000, 50000, None],
    "bonus": [5000, None, None, 3000, None],
    "join_date": ["2020-01-10", "2019-06-15", None, "2021-03-01", None]
})
df
```
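Before picking a fill strategy, it helps to know how many values are missing in each column. The sketch below rebuilds the same sample DataFrame so it runs on its own. `isna()` counts both `None` and `NaN` as missing.

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "department": ["IT", "HR", "IT", None, "Finance"],
    "salary": [60000, None, 75000, 50000, None],
    "bonus": [5000, None, None, 3000, None],
    "join_date": ["2020-01-10", "2019-06-15", None, "2021-03-01", None],
})

# isna() marks None and NaN as True; sum() counts them per column
missing_counts = df.isna().sum()
print(missing_counts)
```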


## 2. Fill Missing Values with Fixed Value

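```python
# Replaces every NaN/None in every column with 0, including the text
# columns (department, join_date), which then mix strings and numbers
df_fixed = df.fillna(0)
df_fixed
```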
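`df.fillna(0)` also writes `0` into the text columns. A more targeted option is to pass `fillna()` a dict that maps column names to fill values. The sketch below does this. The `"Unknown"` label is an illustrative choice, not a convention from this guide.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "HR", "IT", None, "Finance"],
    "salary": [60000, None, 75000, 50000, None],
    "bonus": [5000, None, None, 3000, None],
})

# One fill value per column; columns missing from the dict (salary) are left untouched
df_fixed = df.fillna({"department": "Unknown", "bonus": 0})
print(df_fixed)
```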


## 3. Fill Numerical Data using Mean

```python
df_mean = df.copy()
# mean() skips NaN by default, so this is the mean of the known salaries
df_mean['salary'] = df_mean['salary'].fillna(df_mean['salary'].mean())
df_mean
```


## 4. Fill Numerical Data using Median (Robust to Outliers)

```python
df_median = df.copy()
df_median['salary'] = df_median['salary'].fillna(df_median['salary'].median())
df_median
```
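The median is the safer default for skewed numbers because one extreme value moves the mean far more than the median. Here is a small worked example on hypothetical salaries that include one executive-level outlier:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries: one extreme value plus one missing entry
salaries = pd.Series([50000, 55000, 60000, 1_000_000, np.nan])

print(salaries.mean())    # 291250.0, pulled far above typical salaries
print(salaries.median())  # 57500.0, close to the typical values
```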


## 5. Fill Categorical Data using Mode

```python
df_mode = df.copy()
# mode() returns a Series because there can be ties; [0] takes the first mode
df_mode['department'] = df_mode['department'].fillna(df_mode['department'].mode()[0])
df_mode
```
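If a column has no non-missing values, `mode()` returns an empty Series and `mode()[0]` raises an error. The sketch below adds a guard for that case. `fill_with_mode` and its `"Unknown"` default are hypothetical names chosen for illustration.

```python
import pandas as pd


def fill_with_mode(series: pd.Series, default="Unknown") -> pd.Series:
    """Hypothetical helper: fill NaN with the most frequent value, or `default` if none exists."""
    modes = series.mode(dropna=True)
    # Use the first mode when one exists; otherwise fall back to the default label
    fill_value = modes.iloc[0] if not modes.empty else default
    return series.fillna(fill_value)


print(fill_with_mode(pd.Series(["IT", "HR", "IT", None, "Finance"])).tolist())
print(fill_with_mode(pd.Series([None, None], dtype=object)).tolist())
```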


## 6. Fill Date Columns after Conversion

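```python
df_dates = df.copy()
# Convert strings to datetime64; unparseable values become NaT
df_dates['join_date'] = pd.to_datetime(df_dates['join_date'], errors='coerce')
# mode() returns its values sorted; when every date is unique (as here),
# mode()[0] is simply the earliest date
df_dates['join_date'] = df_dates['join_date'].fillna(df_dates['join_date'].mode()[0])
df_dates
```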
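In this dataset every known join date is unique, so no date is more frequent than the others. In that case `mode()` returns all the tied values in sorted order, and `mode()[0]` just picks the earliest one. A quick check:

```python
import pandas as pd

join_date = pd.to_datetime(
    pd.Series(["2020-01-10", "2019-06-15", None, "2021-03-01", None]), errors="coerce"
)

# Each known date appears once, so all three are modes (NaT is dropped)
modes = join_date.mode()
print(modes.tolist())
# The value mode()[0] would use as the fill: the earliest date
print(modes[0])
```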


## 7. Forward Fill (Time-Series / Sequential Data)

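```python
# ffill() replaces the deprecated fillna(method='ffill') (pandas >= 2.1);
# each gap takes the last valid value above it in the same column
df_ffill = df.ffill()
df_ffill
```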
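Forward fill copies the value from the previous *row*, so the rows must already be in time order. The sketch below uses hypothetical sensor readings. It sorts the rows first and passes `limit=1` so that only a single missing step is bridged.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings, deliberately stored out of time order
readings = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
    "value": [np.nan, 10.0, np.nan, np.nan, 20.0],
})

# Sort by time first so "previous row" means "previous timestamp"
readings = readings.sort_values("ts").reset_index(drop=True)
# limit=1: carry a value forward at most one row; longer gaps stay NaN
readings["value_ffill"] = readings["value"].ffill(limit=1)
print(readings)
```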


## 8. Backward Fill (Time-Series / Sequential Data)

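```python
# bfill() replaces the deprecated fillna(method='bfill'); each gap takes the
# next valid value below it, so trailing gaps (e.g. Eve's row) stay NaN
df_bfill = df.bfill()
df_bfill
```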
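Backward fill has no later value to copy into a trailing gap. In the sample data, Eve's salary in the last row stays `NaN`. A common pattern is to chain `bfill()` with `ffill()` so that both edges get filled. A sketch on the sample `salary` column:

```python
import numpy as np
import pandas as pd

salary = pd.Series([60000, np.nan, 75000, 50000, np.nan])

# bfill() fills the inner gap from below but cannot fill the trailing gap;
# the follow-up ffill() fills it from the value above
filled = salary.bfill().ffill()
print(filled.tolist())  # [60000.0, 75000.0, 75000.0, 50000.0, 50000.0]
```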


## 9. Group-wise Fill using Mean

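```python
df_group_mean = df.copy()
# Per-row department mean; HR and Finance have no known salary, so Bob and Eve stay NaN
dept_mean = df_group_mean.groupby('department')['salary'].transform('mean')
# fillna only touches missing salaries, so known values (e.g. David's, whose department is missing) are kept
df_group_mean['salary'] = df_group_mean['salary'].fillna(dept_mean)
df_group_mean
```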


## 10. Group-wise Fill using Median

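```python
df_group_median = df.copy()
# Per-row department median; HR and Finance have no known salary, so Bob and Eve stay NaN
dept_median = df_group_median.groupby('department')['salary'].transform('median')
# fillna only touches missing salaries, so known values such as David's are kept
df_group_median['salary'] = df_group_median['salary'].fillna(dept_median)
df_group_median
```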
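In the sample data, HR and Finance have no known salary, so a purely group-wise fill leaves Bob and Eve missing. The sketch below uses hypothetical data to show a two-step fallback. It fills from the group median first, then from the overall median for groups that have no data at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Sales" has no known salary at all
df = pd.DataFrame({
    "department": ["IT", "IT", "IT", "HR", "HR", "Sales"],
    "salary": [60000, 80000, np.nan, 40000, np.nan, np.nan],
})

# Step 1: department median for every row (NaN for an all-missing department)
dept_median = df.groupby("department")["salary"].transform("median")
# Step 2: fall back to the overall median where the group had nothing to offer
overall_median = df["salary"].median()
df["salary_filled"] = df["salary"].fillna(dept_median).fillna(overall_median)
print(df)
```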


## 11. Fill Missing Values using Interpolation

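```python
df_interp = df.copy()
# Linear interpolation between neighboring rows (method='linear' is the default);
# only meaningful when row order carries meaning, e.g. sorted timestamps
df_interp['salary'] = df_interp['salary'].interpolate()
df_interp
```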
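By default `interpolate()` uses `method='linear'`, which ignores the index and treats rows as equally spaced. When the index is a `DatetimeIndex` with uneven gaps, `method='time'` weights each fill by the actual elapsed time. A sketch on a hypothetical series:

```python
import numpy as np
import pandas as pd

# Hypothetical readings; Jan 3 and Jan 4 are absent from the index
s = pd.Series(
    [10.0, np.nan, 40.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)

by_position = s.interpolate()           # halfway between rows: 25.0
by_time = s.interpolate(method="time")  # 1 of 4 days elapsed: 10 + 30 * 1/4 = 17.5
print(by_position.iloc[1], by_time.iloc[1])
```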


## 12. Conditional Fill using Business Rules

```python
df_rule = df.copy()
# Rule: a missing bonus means "no bonus", so set it to 0
df_rule.loc[df_rule['bonus'].isnull(), 'bonus'] = 0
df_rule
```
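The rule above sets every missing bonus to 0, which is the same as calling `fillna(0)` on that column. A truly conditional rule combines masks from several columns. The sketch below uses a hypothetical rule that is not from this guide: a missing bonus defaults to 10% of salary when the salary is known, and to 0 otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "salary": [60000, None, 75000, 50000, None],
    "bonus": [5000, None, None, 3000, None],
})

# Hypothetical business rule, evaluated with masks computed before any change
missing_bonus = df["bonus"].isna()
has_salary = df["salary"].notna()

# Missing bonus with a known salary: 10% of salary (aligned by index)
df.loc[missing_bonus & has_salary, "bonus"] = df["salary"] * 0.10
# Missing bonus and missing salary: 0
df.loc[missing_bonus & ~has_salary, "bonus"] = 0
print(df)
```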


## 13. Inplace Fill (Modifies the DataFrame Directly)

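```python
df_inplace = df.copy()
# Call fillna on the DataFrame, not on df_inplace['salary']: under pandas
# Copy-on-Write, a chained df['col'].fillna(..., inplace=True) may not update df.
# inplace=True does not guarantee memory savings; reassignment is usually preferred.
df_inplace.fillna({'salary': df_inplace['salary'].median()}, inplace=True)
df_inplace
```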



## ✅ Best Practices & Interview Notes
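- Use **median** for skewed data such as salaries and other financial amounts (it is robust to outliers)
- Use **mode** for categorical data
- Use **group-wise fill** when values depend on a category (e.g. salary by department)
- Use **ffill/bfill** for time-series, after sorting rows by time
- Avoid filling blindly: first understand why values are missing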



## ✔ Summary
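- `fillna()` is the core filling method; `ffill()`, `bfill()`, and `interpolate()` cover sequential data
- Different data types require different strategies
- Filling keeps every row, whereas removal (`dropna()`) discards data
- Always validate after filling, e.g. check `df.isna().sum()` and compare summary statistics before and after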
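Here is one way to make "validate after filling" concrete, sketched on the `salary` column with a median fill. It confirms that no nulls remain and that the row count is unchanged. It then compares summary statistics before and after to show how much the fill shifted the distribution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"salary": [60000, np.nan, 75000, 50000, np.nan]})
filled = df.copy()
filled["salary"] = filled["salary"].fillna(filled["salary"].median())

# 1) No missing values should remain in the filled column
assert filled["salary"].isna().sum() == 0
# 2) Filling never drops rows
assert len(filled) == len(df)
# 3) Side-by-side summary statistics (count, mean, std, quartiles)
comparison = pd.DataFrame({
    "before": df["salary"].describe(),
    "after": filled["salary"].describe(),
})
print(comparison)
```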
