# Pandas: Handling Missing Values

This notebook covers:
1. Detecting missing values
2. Dropping rows/columns with missing values
3. Filling missing values with constants, mean, or mode
4. Interpolating missing numerical values

In [10]:
import pandas as pd
import numpy as np

df = pd.read_csv(r"../Datasets/employees.csv", encoding="latin1")
# Work on an independent copy so later in-place edits do not target a slice of df
dataset = df.head(10).copy()

## Detect Missing Values
- `isnull()` returns a boolean DataFrame of the same shape, with `True` wherever a value is missing.
- Chaining `.sum()` onto it counts the missing values in each column.

In [7]:
dataset.isnull()
dataset.isnull().sum()

First Name           1
Gender               0
Start Date           0
Last Login Time      0
Salary               0
Bonus %              0
Senior Management    1
Team                 1
dtype: int64
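
The same checks can be tried on a small frame where the answers are easy to verify by eye. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from `employees.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from employees.csv
toy = pd.DataFrame({
    "Name": ["Ann", None, "Cy"],
    "Salary": [50000.0, 62000.0, np.nan],
    "Team": ["Sales", "Sales", "Legal"],
})

# Count of missing values in each column
missing_per_column = toy.isnull().sum()

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isnull().any(axis=1)]

# Total number of missing values in the whole frame
total_missing = int(toy.isnull().sum().sum())

print(missing_per_column)
print(rows_with_missing)
print(total_missing)
```

`any(axis=1)` reduces across the columns of each row, which is why it selects rows rather than columns.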

## Drop Missing Values
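- `dropna(axis=0)` removes every row that contains at least one missing value.
- `axis=0` → drop rows, `axis=1` → drop columns.
- `inplace=True` modifies the DataFrame in place and returns `None`; without it, `dropna()` returns a new DataFrame and leaves the original unchanged.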

In [None]:
dataset.dropna(axis=0, inplace=True)
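
Dropping every row with any gap can discard a lot of data, so it may help to compare a few narrower options. The sketch below uses a made-up `toy` frame (an assumption, not the employees data) and the non-inplace form, so each result can be inspected separately.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from employees.csv
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["Ann", None, "Cy", "Di"],
    "Salary": [50000.0, 62000.0, np.nan, 48000.0],
    "Team": ["Sales", None, "Legal", "Legal"],
})

# Drop rows that have at least one missing value (the default behaviour)
complete_rows = toy.dropna()

# Only look at the Salary column when deciding which rows to drop
salary_known = toy.dropna(subset=["Salary"])

# Keep rows that have at least 3 non-missing values
mostly_complete = toy.dropna(thresh=3)

# Drop columns that contain any missing value
complete_cols = toy.dropna(axis=1)

print(len(complete_rows), len(salary_known), len(mostly_complete))
print(list(complete_cols.columns))
```

None of these calls changes `toy` itself, because `inplace` is left at its default of `False`.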

## Fill Missing Values
- Replace all missing values with a constant:
    ```python
    dataset.fillna(0, inplace=True)
    ```
- Fill a numerical column with its mean, assigning the result back to the column:
    ```python
    dataset["Salary"] = dataset["Salary"].fillna(dataset["Salary"].mean())
    ```
- Fill a string column with its mode (most common value):
    ```python
    dataset["Team"] = dataset["Team"].fillna(dataset["Team"].mode()[0])
    ```

In [None]:
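# Column-specific fills come first: a blanket fillna(0) would leave nothing for them to fill.
# Assign results back; fillna(..., inplace=True) on a selected column warns in recent pandas
dataset["Salary"] = dataset["Salary"].fillna(dataset["Salary"].mean())
dataset["Team"] = dataset["Team"].fillna(dataset["Team"].mode()[0])
dataset.fillna(0, inplace=True)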
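
As an alternative, `fillna()` also accepts a dictionary that maps column names to fill values, which handles several columns in one call. A minimal sketch on a made-up `toy` frame (the data is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from employees.csv
toy = pd.DataFrame({
    "Salary": [50000.0, np.nan, 70000.0],
    "Team": ["Sales", None, "Sales"],
})

# One fill value per column: mean for the numeric column, mode for the string column
fill_values = {
    "Salary": toy["Salary"].mean(),
    "Team": toy["Team"].mode()[0],
}

# Returns a new DataFrame; toy keeps its missing values
filled = toy.fillna(fill_values)

print(filled)
```

`mean()` skips missing values by default, so the fill value here is the average of the two known salaries.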

## Interpolate Missing Values
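- `interpolate()` estimates missing numerical values from the known values around them.
- `method="linear"` performs linear interpolation along the specified axis, treating the values as equally spaced.
- With the default settings, a missing value at the very start of a column is not filled, because there is no earlier value to interpolate from.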

In [None]:
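# NaN needs a float column, so cast Salary before blanking a value
dataset["Salary"] = dataset["Salary"].astype("float64")
# Blank out the second row: a leading NaN would not be filled by default
dataset.loc[dataset.index[1], "Salary"] = np.nan
dataset["Salary"] = dataset["Salary"].interpolate(method="linear")
dataset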
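
To see how linear interpolation treats interior gaps and a leading gap, a short Series is easier to follow than the full table. The values below are made up for illustration; `limit_direction="both"` is a real `interpolate()` option, shown here as one possible way to fill the leading gap too.

```python
import numpy as np
import pandas as pd

# Hypothetical values: one leading gap and two interior gaps
s = pd.Series([np.nan, 10.0, np.nan, np.nan, 40.0])

# Default direction: interior gaps are filled, the leading NaN stays
forward = s.interpolate(method="linear")

# limit_direction="both" also fills the leading gap, using the nearest known value
both = s.interpolate(method="linear", limit_direction="both")

print(forward.tolist())
print(both.tolist())
```

The interior gaps are filled with evenly spaced steps between 10 and 40, since the linear method ignores the index and treats the values as equally spaced.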