## Strategies for Handling Missing Values

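### Filling Missing Values

Filling (imputation) replaces each missing value with a substitute, such as the column mean, the column median, or a fixed constant.
Mean or median filling suits numeric columns with only a few gaps, and the median is the more robust choice when the column has outliers. A constant is the better choice when the missingness itself is informative, because it keeps the gap visible instead of hiding it behind a typical value.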
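
As a minimal sketch of these options, the snippet below fills a small hand-made DataFrame using pandas `fillna` with the median, the mean, and a constant. The frame `df_example`, its column names, and its values are assumptions made up for illustration; they are not the data used later in this notebook.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df_example = pd.DataFrame({
    "Age": [25.0, np.nan, 40.0, 31.0],
    "Income": [30000.0, 52000.0, np.nan, 41000.0],
    "City": ["Lima", None, "Oslo", "Pune"],
})

df_filled = df_example.copy()

# Numeric columns: the median is robust to outliers, the mean uses every value
df_filled["Age"] = df_example["Age"].fillna(df_example["Age"].median())
df_filled["Income"] = df_example["Income"].fillna(df_example["Income"].mean())

# Categorical column: a constant placeholder keeps the gap visible
df_filled["City"] = df_example["City"].fillna("Unknown")

# Optional indicator column recording where Age was originally missing
df_filled["Age_was_missing"] = df_example["Age"].isna()

print(df_filled)
```

The indicator column is one way to preserve the information that a value was absent after the gap itself has been filled.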

### Dropping Missing Values

Dropping removes the rows or columns that contain missing values.
This strategy is suitable when the values are missing at random and only a small share of the data is affected, so that removing them neither biases the dataset nor discards much of it.
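
To make the trade-off concrete, the sketch below uses pandas `dropna` on a small made-up DataFrame to drop rows, drop columns, and restrict the check to a single column, then reports how much data the row-wise drop discards. The frame `df_example` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch
df_example = pd.DataFrame({
    "Age": [25.0, np.nan, 40.0, 31.0, 58.0],
    "Income": [30000.0, 52000.0, np.nan, 41000.0, 67000.0],
    "Score": [0.7, 0.4, 0.9, 0.5, 0.8],
})

# Drop every row that has at least one missing value
rows_dropped = df_example.dropna()

# Drop every column that has at least one missing value
cols_dropped = df_example.dropna(axis=1)

# Only drop rows where 'Age' is missing; gaps in other columns are kept
age_required = df_example.dropna(subset=["Age"])

# Share of rows lost by the row-wise drop
lost_fraction = 1 - len(rows_dropped) / len(df_example)
print(f"Rows kept: {len(rows_dropped)} of {len(df_example)} ({lost_fraction:.0%} lost)")
print("Columns kept:", list(cols_dropped.columns))
```

Checking the lost fraction before committing to a drop is a quick way to judge whether the data loss is acceptable.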

### Interpolation

Interpolation estimates each missing value from the values around it.
It is useful for ordered data, such as a time series, that follows a trend or pattern, so that a gap can be reasonably estimated from neighboring data points. When the row order is arbitrary, the neighbors carry little information about the gap.
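
As an illustration, the sketch below applies pandas `interpolate` with its default linear method to a short made-up series of daily readings. The series `readings`, its dates, and its values are assumptions for this example. Each gap is filled from the nearest known values on either side.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with a two-day gap
readings = pd.Series(
    [10.0, np.nan, np.nan, 16.0, 18.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# The default method="linear" treats the points as equally spaced
filled = readings.interpolate()

print(filled)
```

For unevenly spaced timestamps, `method="time"` is an alternative that weights the estimate by the actual time gaps.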

In [4]:
import pandas as pd
from faker import Faker
import numpy as np

# Initialize Faker
fake = Faker()

# Generate fake data
data = {'Name': [fake.name() for _ in range(10)],
        'Age': [fake.random_int(min=18, max=80) for _ in range(10)],
        'Income': [fake.random_number(digits=5) for _ in range(10)]}

# Introduce missing values
data['Age'][2] = np.nan
data['Income'][5] = np.nan

# Create DataFrame
df = pd.DataFrame(data)

# Display DataFrame
print("Original DataFrame:")
print(df)


Original DataFrame:
              Name   Age   Income
0    Kelly Jackson  62.0  44351.0
1  Michael Pearson  55.0  14797.0
2      Andre Smith   NaN  84850.0
3     Grace Wright  55.0  60998.0
4  Jonathon Butler  51.0  48900.0
5   Charles Little  64.0      NaN
6     Kevin Rivera  39.0   2524.0
7   Raymond Walker  72.0  24060.0
8   Daniel Miranda  26.0  57391.0
9        Kerri Fox  22.0  62402.0


## Dropping Missing Values

In [5]:
# Drop rows with missing values
df_dropped = df.dropna()

print("\nDataFrame after dropping rows with missing values:")
print(df_dropped)



DataFrame after dropping rows with missing values:
              Name   Age   Income
0    Kelly Jackson  62.0  44351.0
1  Michael Pearson  55.0  14797.0
3     Grace Wright  55.0  60998.0
4  Jonathon Butler  51.0  48900.0
6     Kevin Rivera  39.0   2524.0
7   Raymond Walker  72.0  24060.0
8   Daniel Miranda  26.0  57391.0
9        Kerri Fox  22.0  62402.0


## Interpolation

In [6]:
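# Interpolate only the numeric columns. Calling interpolate() on the whole
# DataFrame also touches the object-dtype 'Name' column, which recent pandas
# versions warn about.
df_interpolated = df.copy()
numeric_cols = df.select_dtypes(include="number").columns
df_interpolated[numeric_cols] = df[numeric_cols].interpolate()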

print("\nDataFrame after interpolating missing values:")
print(df_interpolated)



DataFrame after interpolating missing values:
              Name   Age   Income
0    Kelly Jackson  62.0  44351.0
1  Michael Pearson  55.0  14797.0
2      Andre Smith  55.0  84850.0
3     Grace Wright  55.0  60998.0
4  Jonathon Butler  51.0  48900.0
5   Charles Little  64.0  25712.0
6     Kevin Rivera  39.0   2524.0
7   Raymond Walker  72.0  24060.0
8   Daniel Miranda  26.0  57391.0
9        Kerri Fox  22.0  62402.0


