## 🔹 2. Handling Missing Values
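Missing values can bias a model or stop it from training at all, since many scikit-learn estimators raise an error when the input contains NaN. Scikit-learn's SimpleImputer fills each missing entry with a statistic computed from its column, or with a fixed value.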

In [1]:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


In [3]:
# Sample dataset with missing values
data = {'Age': [25, np.nan, 30, 35, np.nan, 40], 
        'Salary': [50000, 60000, np.nan, 80000, 90000, 100000]}


In [5]:
df = pd.DataFrame(data)
print("Original Data:\n", df)


Original Data:
     Age    Salary
0  25.0   50000.0
1   NaN   60000.0
2  30.0       NaN
3  35.0   80000.0
4   NaN   90000.0
5  40.0  100000.0


In [7]:

# Replace missing values with the column mean
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print("\nAfter Imputation:\n", df_imputed)


After Imputation:
     Age    Salary
0  25.0   50000.0
1  32.5   60000.0
2  30.0   76000.0
3  35.0   80000.0
4  32.5   90000.0
5  40.0  100000.0
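To see where the filled values 32.5 and 76000.0 come from, you can inspect the per-column statistics the fitted imputer learned, which it stores in its `statistics_` attribute. The snippet below is a minimal sketch that rebuilds the same sample data so it runs on its own; the `learned_means` name is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same sample data as above, rebuilt so this cell is self-contained
df = pd.DataFrame({
    'Age': [25, np.nan, 30, 35, np.nan, 40],
    'Salary': [50000, 60000, np.nan, 80000, 90000, 100000],
})

imputer = SimpleImputer(strategy='mean')
imputer.fit(df)

# statistics_ holds one fill value per column, in column order;
# NaN entries are ignored when the mean is computed
learned_means = {col: float(v) for col, v in zip(df.columns, imputer.statistics_)}
print(learned_means)  # {'Age': 32.5, 'Salary': 76000.0}
```

Age has four observed values (25, 30, 35, 40), so its mean is 130 / 4 = 32.5. Salary has five, so its mean is 380000 / 5 = 76000.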


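## 📌 Other strategies:

- "median": replaces missing values with the column median, which is less sensitive to outliers than the mean
- "most_frequent": fills with the most common value in the column; this also works for string or categorical columns
- "constant": fills with a fixed value supplied through the fill_value parameter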