### Garbage In, Garbage Out (GIGO): Cleaning Missing Data
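**Description**: Load or build a dataset (e.g., the Titanic dataset) and identify its missing values.
Handle them with a technique suited to each column: drop columns that are too sparse, impute
numerical columns with the median or mean, and impute categorical columns with the mode.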

In [2]:
# Write your code from here

import pandas as pd

# Step 1: Sample Titanic-like dataset with intentional missing values
data = {
    "survived": [1, 0, 1, 0, 1],
    "pclass": [1, 3, 2, 3, 1],
    "sex": ["female", "male", "female", "male", "female"],
    "age": [29.0, None, 24.0, 35.0, None],
    "embarked": ["C", "S", "Q", None, "C"],
    "fare": [72.0, 7.25, 13.0, 8.05, None],
    "deck": ["B", None, "E", None, None],  # too sparse
    "embark_town": ["Cherbourg", "Southampton", "Queenstown", None, "Cherbourg"]
}

df = pd.DataFrame(data)

print("🔍 Original data:")
print(df)

# Step 2: Check missing values
print("\n❓ Missing values:")
print(df.isnull().sum())

# Step 3: Handle missing values
df.drop(columns=["deck"], inplace=True)  # 3 of 5 values missing: drop rather than impute

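# Impute numerical: age (median), fare (mean)
# Assign the result back to the column. Calling fillna(inplace=True) on a
# selected column is chained assignment, which recent pandas versions
# deprecate and which may leave df unchanged.
df["age"] = df["age"].fillna(df["age"].median())
df["fare"] = df["fare"].fillna(df["fare"].mean())

# Impute categorical: embarked, embark_town (most frequent value)
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])
df["embark_town"] = df["embark_town"].fillna(df["embark_town"].mode()[0])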

# Step 4: Confirm data is clean
print("\n✅ Cleaned data:")
print(df)

print("\n✅ Remaining missing values (should be 0):")
print(df.isnull().sum().sum())


🔍 Original data:
   survived  pclass     sex   age embarked   fare  deck  embark_town
0         1       1  female  29.0        C  72.00     B    Cherbourg
1         0       3    male   NaN        S   7.25  None  Southampton
2         1       2  female  24.0        Q  13.00     E   Queenstown
3         0       3    male  35.0     None   8.05  None         None
4         1       1  female   NaN        C    NaN  None    Cherbourg

❓ Missing values:
survived       0
pclass         0
sex            0
age            2
embarked       1
fare           1
deck           3
embark_town    1
dtype: int64

✅ Cleaned data:
   survived  pclass     sex   age embarked    fare  embark_town
0         1       1  female  29.0        C  72.000    Cherbourg
1         0       3    male  29.0        S   7.250  Southampton
2         1       2  female  24.0        Q  13.000   Queenstown
3         0       3    male  35.0        C   8.050    Cherbourg
4         1       1  female  29.0        C  25.075    Cherbourg
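
The cell above hard-codes which column to drop and which statistic to use for each column. As a rough sketch, the same decisions could instead be driven by each column's missing fraction and dtype. The function name `clean_missing`, the `drop_threshold` default of 0.5, and the use of the median for every numeric column are assumptions made for illustration. They are not part of the original exercise, which imputes `fare` with the mean.

```python
import pandas as pd


def clean_missing(df: pd.DataFrame, drop_threshold: float = 0.5) -> pd.DataFrame:
    """Drop sparse columns, then impute the rest (hypothetical helper).

    Columns whose missing fraction exceeds drop_threshold are dropped.
    Remaining numeric columns are filled with the median, all others
    with the most frequent value (mode).
    """
    out = df.copy()  # leave the caller's DataFrame untouched

    # Fraction of missing values per column, between 0.0 and 1.0
    frac_missing = out.isnull().mean()
    out = out.drop(columns=frac_missing[frac_missing > drop_threshold].index)

    for col in out.columns:
        if not out[col].isnull().any():
            continue  # nothing to impute
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode()[0])
    return out


# Small sample in the spirit of the data above (assumed values)
sample = pd.DataFrame({
    "age": [29.0, None, 24.0, 35.0, None],
    "embarked": ["C", "S", "Q", None, "C"],
    "deck": ["B", None, "E", None, None],
})

cleaned = clean_missing(sample)
print(cleaned)
print("Remaining missing values:", cleaned.isnull().sum().sum())
```

With this sample, `deck` is 60% missing, so it exceeds the 0.5 threshold and is dropped, while `age` and `embarked` are imputed. A threshold of 1.0 or higher would keep fully empty columns, for which `mode()[0]` fails, so the sketch assumes a threshold below 1.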
