# Handling Different Types of Missing Data in Python

This notebook demonstrates several techniques for handling missing data with Python's `pandas` library, plus one example using scikit-learn's `SimpleImputer`.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np

## Sample DataFrame with Missing Values

In [None]:
# Create a sample DataFrame with NaN values
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, np.nan, 40, 35, np.nan, 45, 50, np.nan, 55],
    'gender': ['F', 'M', np.nan, 'M', 'F', 'M', 'F', np.nan, 'F', 'M']
}
df = pd.DataFrame(data)
df

## 1. Identifying Missing Values

In [None]:
# Check for missing values
df.isnull()

## 2. Counting Missing Values in Each Column

In [None]:
# Count missing values in each column
df.isnull().sum()
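
Raw counts are easier to compare across columns when expressed as a share of the rows. The following is a minimal sketch that rebuilds the same sample data; the name `missing_share` is illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample data as the notebook's DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, np.nan, 40, 35, np.nan, 45, 50, np.nan, 55],
    'gender': ['F', 'M', np.nan, 'M', 'F', 'M', 'F', np.nan, 'F', 'M'],
})

# isna() yields booleans; their column-wise mean is the fraction of missing rows
missing_share = df.isna().mean()
print(missing_share)
```

With this data, 3 of 10 ages and 2 of 10 genders are missing, so the shares are 0.3 and 0.2.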

## 3. Dropping Missing Values

In [None]:
# Drop rows with any missing values
df_drop_any = df.dropna()
df_drop_any

In [None]:
# Drop rows only if all values are missing
df_drop_all = df.dropna(how='all')
df_drop_all
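
`dropna` also accepts `subset` and `thresh`, which sit between the "any" and "all" extremes. This sketch rebuilds the sample data so it runs on its own; the variable names `df_drop_age` and `df_drop_thresh` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as the notebook's DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, np.nan, 40, 35, np.nan, 45, 50, np.nan, 55],
    'gender': ['F', 'M', np.nan, 'M', 'F', 'M', 'F', np.nan, 'F', 'M'],
})

# Drop rows only when 'age' is missing; gaps in other columns are tolerated
df_drop_age = df.dropna(subset=['age'])

# Keep rows that have at least 2 non-missing values across all columns
df_drop_thresh = df.dropna(thresh=2)

print(len(df_drop_age), len(df_drop_thresh))
```

Here only Charlie's row (index 2) has fewer than two non-missing values, so `thresh=2` removes that row alone.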

## 4. Filling Missing Values

In [None]:
# Fill missing values in 'age' with the mean (returns a new Series; df is unchanged)
age_mean = df['age'].mean()
age_filled_mean = df['age'].fillna(age_mean)
age_filled_mean

In [None]:
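# Fill missing values in 'age' with the median
# (work on a copy so the original NaNs remain available for the later sections)
age_median = df['age'].median()
df_filled_median = df.copy()
df_filled_median['age'] = df_filled_median['age'].fillna(age_median)
df_filled_median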

In [None]:
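# Fill missing values in 'gender' with the mode
# ('F' and 'M' are tied at four each; mode() returns both, sorted, so [0] picks 'F')
gender_mode = df['gender'].mode()[0]
df_filled_mode = df.copy()
df_filled_mode['gender'] = df_filled_mode['gender'].fillna(gender_mode)
df_filled_mode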

## 5. Filling Missing Values Using Forward Fill and Backward Fill

In [None]:
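# Forward fill: propagate the last valid value downward
# (fillna(method='ffill') is deprecated in recent pandas; use ffill() instead)
df_ffill = df.ffill()
df_ffill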

In [None]:
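# Backward fill: propagate the next valid value upward
# (fillna(method='bfill') is deprecated in recent pandas; use bfill() instead)
df_bfill = df.bfill()
df_bfill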
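
Two details of forward and backward fill are easy to miss: a forward fill cannot fill a leading gap, and the `limit` argument caps how many consecutive gaps are filled. The sketch below uses a small hypothetical Series (not the notebook's data) to show both.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading gap and a two-value gap in the middle
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Forward fill: the leading NaN has no earlier value, so it stays NaN
filled_forward = s.ffill()

# limit=1 fills at most one consecutive NaN after each valid value
filled_forward_limited = s.ffill(limit=1)

# Backward fill: every gap here has a later value, so nothing stays NaN
filled_backward = s.bfill()

print(filled_forward.tolist())
print(filled_forward_limited.tolist())
print(filled_backward.tolist())
```

Chaining the two, for example `s.ffill().bfill()`, is one common way to cover leading gaps as well.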

## 6. Interpolating Missing Values

In [None]:
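# Linear interpolation (numeric column only; text columns cannot be interpolated)
df_interpolated = df.copy()
df_interpolated['age'] = df['age'].interpolate(method='linear')
df_interpolated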
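
Linear interpolation places each missing value on the straight line between its nearest known neighbours, treating rows as equally spaced. The sketch below applies it to the sample ages as a standalone Series; the name `ages` is illustrative.

```python
import numpy as np
import pandas as pd

# The 'age' values from the sample data, with gaps at positions 2, 5 and 8
ages = pd.Series([25, 30, np.nan, 40, 35, np.nan, 45, 50, np.nan, 55])

# Each gap sits between two known values, so it becomes their midpoint
ages_interpolated = ages.interpolate(method='linear')

print(ages_interpolated.tolist())
```

The three gaps become 35.0 (between 30 and 40), 40.0 (between 35 and 45), and 52.5 (between 50 and 55).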

## 7. Replacing NaN in 'age' with the Mean of 4 Previous and 4 Next Values

In [None]:
# Function to replace NaN with the mean of up to 4 previous and 4 next values
def replace_na_with_neighborhood_mean(column):
    result = column.copy()  # avoid modifying the caller's Series in place
    for i in range(len(result)):
        if pd.isna(result.iloc[i]):  # Check if the current value is NaN
            # Window covers up to 4 previous and 4 next positions
            start = max(0, i - 4)
            end = min(len(result), i + 5)
            # Series.mean() skips NaN; values filled earlier in the loop are included
            result.iloc[i] = result.iloc[start:end].mean()
    return result

# Apply the function to the 'age' column and store the result in a copy
df_neighborhood = df.copy()
df_neighborhood['age'] = replace_na_with_neighborhood_mean(df['age'])
df_neighborhood
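
A similar neighborhood fill can be sketched without an explicit loop using a centered rolling window. This is an alternative, not the notebook's function: it averages only the originally observed values, whereas the loop above also sees values it filled earlier, so the results can differ slightly. The names `ages`, `window_mean`, and `ages_filled` are illustrative.

```python
import numpy as np
import pandas as pd

# The 'age' values from the sample data
ages = pd.Series([25, 30, np.nan, 40, 35, np.nan, 45, 50, np.nan, 55])

# A centered window of 9 spans 4 previous values, the current one, and 4 next;
# min_periods=1 allows shorter windows at the edges, and NaN values are skipped
window_mean = ages.rolling(window=9, center=True, min_periods=1).mean()

# Use the window mean only where the original value is missing
ages_filled = ages.fillna(window_mean)

print(ages_filled.tolist())
```

With this data the gaps at positions 2, 5, and 8 are filled with 35.0, 42.5, and 46.25 respectively.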

## 8. Using Scikit-Learn's `SimpleImputer` for Advanced Imputation

In [None]:
from sklearn.impute import SimpleImputer

# Using SimpleImputer to fill NaN values with the median in 'age'
imputer = SimpleImputer(strategy='median')
# fit_transform returns a 2D array of shape (n_rows, 1); flatten it for the column
df['age'] = imputer.fit_transform(df[['age']]).ravel()
df
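
One practical reason to prefer `SimpleImputer` over `fillna` is that it stores the fill value learned during `fit`, so the same value can be reused on new data. The sketch below assumes a hypothetical train/test split (`train` and `test` are not part of the original notebook).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical split: the imputer should learn its statistic from training data only
train = pd.DataFrame({'age': [25.0, 30.0, np.nan, 40.0]})
test = pd.DataFrame({'age': [np.nan, 50.0]})

imputer = SimpleImputer(strategy='median')

# fit() computes the median of the observed training ages (25, 30, 40)
imputer.fit(train[['age']])
learned_median = imputer.statistics_[0]

# transform() reuses that median on unseen data instead of recomputing it
test_filled = imputer.transform(test[['age']]).ravel()

print(learned_median, test_filled.tolist())
```

Reusing the training statistic keeps information from the test set out of the preprocessing step.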

## Summary
This notebook covered various methods to handle missing data, including:
- Identifying and counting missing values
- Dropping missing data
- Filling with specific values (mean, median, mode)
- Forward and backward filling
- Interpolating missing values
- Custom filling based on neighborhood values
- Using Scikit-Learn's `SimpleImputer` for advanced imputation options

Choose the method that best suits your dataset and analysis needs.