# 10. Mini-Project: Melbourne Housing Data Preprocessing 🏡

This notebook provides a complete, working solution for preprocessing the `melb_data.csv` dataset.

**Tasks:**
1.  Load the data.
2.  Handle inappropriate data (e.g., columns with too many missing values).
3.  Handle missing data (imputation).
4.  Handle categorical data.

In [None]:
import pandas as pd

### 1. Load the Data

In [None]:
df = pd.read_csv('melb_data.csv')
print("Dataset loaded successfully!")
print("Original shape:", df.shape)
display(df.head())

### 2. & 3. Handle Inappropriate & Missing Data

In [None]:
print("--- Handling Missing Data ---")
print("Missing values before handling:\n", df.isnull().sum().sort_values(ascending=False).head())

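# Strategy: fill numeric columns with the median and all other columns with the mode.
for col in df.columns:
    if df[col].isnull().any():
        if pd.api.types.is_numeric_dtype(df[col]):
            fill_value = df[col].median()
        else:
            fill_value = df[col].mode()[0]
        # Assign the result back rather than calling fillna(inplace=True) on a
        # column selection, which is chained assignment and deprecated in recent pandas.
        df[col] = df[col].fillna(fill_value)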

print("\nTotal missing values after handling:", df.isnull().sum().sum())
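Task 2 mentions columns with too many missing values, but the cell above imputes every column regardless of how sparse it is. A minimal sketch of a drop step, which would run before the imputation loop, might look like the following. The toy DataFrame, its column names and values, and the 50% cutoff `MISSING_THRESHOLD` are illustrative assumptions, not values taken from `melb_data.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (names and values are made up).
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2],
    "BuildingArea": [np.nan, np.nan, 120.0, np.nan, np.nan],  # 80% missing
    "Car": [1.0, np.nan, 2.0, 1.0, 2.0],                      # 20% missing
})

# Hypothetical cutoff: drop any column with more than half of its values missing.
MISSING_THRESHOLD = 0.5

# isnull().mean() gives the fraction of missing values per column.
missing_fraction = toy.isnull().mean()
cols_to_drop = missing_fraction[missing_fraction > MISSING_THRESHOLD].index.tolist()
toy_reduced = toy.drop(columns=cols_to_drop)

print("Dropped columns:", cols_to_drop)
print("Remaining columns:", toy_reduced.columns.tolist())
```

The right threshold depends on the dataset and the downstream model, so it is worth inspecting `missing_fraction` before settling on a value.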

### 4. Handle Categorical Data

In [None]:
print("--- Handling Categorical Data ---")
df_processed = pd.get_dummies(df, drop_first=True)
print("Applied One-Hot Encoding on categorical columns.")
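Calling `pd.get_dummies` on the whole DataFrame creates one column per distinct value of every non-numeric column, so free-text or near-unique columns can inflate the result considerably. One possible safeguard, sketched below on a toy frame, is to check the cardinality of each categorical column first and only one-hot encode the low-cardinality ones. The column names, the values, and the cutoff `MAX_CATEGORIES` are assumptions for illustration, and dropping the high-cardinality columns is only one of several reasonable choices.

```python
import pandas as pd

# Toy frame (made-up values): one low-cardinality and one near-unique text column.
toy = pd.DataFrame({
    "Type": ["h", "u", "h", "t", "u", "h"],
    "Address": ["1 A St", "2 B St", "3 C St", "4 D St", "5 E St", "6 F St"],
    "Price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Hypothetical cutoff for how many distinct values a column may have
# before one-hot encoding it is considered too expensive.
MAX_CATEGORIES = 3

# Count distinct values in every non-numeric column.
cat_cols = toy.select_dtypes(exclude="number").columns
cardinality = toy[cat_cols].nunique()

low_card_cols = cardinality[cardinality <= MAX_CATEGORIES].index.tolist()
high_card_cols = cardinality[cardinality > MAX_CATEGORIES].index.tolist()

# Drop the high-cardinality columns and one-hot encode the rest.
encoded = pd.get_dummies(
    toy.drop(columns=high_card_cols),
    columns=low_card_cols,
    drop_first=True,
)

print("One-hot encoded:", low_card_cols)
print("Dropped as high-cardinality:", high_card_cols)
print("Resulting columns:", encoded.columns.tolist())
```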

### Preprocessing Complete ✅

In [None]:
print("Final shape of the processed dataset:", df_processed.shape)
print("First 5 rows of the processed dataset:")
display(df_processed.head())
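Before passing the result to a model, it can help to confirm that no missing values remain and that every column is numeric or boolean. The helper below is a hypothetical sketch of such a check (`check_model_ready` is not part of pandas or of this notebook), demonstrated on a toy frame rather than on `df_processed`.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def check_model_ready(frame):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: no missing values anywhere in the frame.
    n_missing = int(frame.isnull().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} missing values remain")

    # Check 2: every column is numeric or boolean.
    non_numeric = [
        col for col in frame.columns
        if not (is_numeric_dtype(frame[col]) or is_bool_dtype(frame[col]))
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")

    return problems


# Toy example (made-up values) that mimics the one-hot encoding step.
toy_processed = pd.get_dummies(
    pd.DataFrame({"Rooms": [2, 3], "Type": ["h", "u"]}),
    drop_first=True,
)
print(check_model_ready(toy_processed))
```

In the notebook itself, the same call could be made as `check_model_ready(df_processed)`.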