# Task 2: Data Cleaning & Missing Value Handling
**Dataset:** House Prices

## Load Dataset

```python
import pandas as pd

df = pd.read_csv("house_prices.csv")
df.head()
```


|   | Unnamed: 0 | property_type | price | location | city | baths | purpose | bedrooms | Area_in_Marla |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Flat | 10000000 | G-10 | Islamabad | 2 | For Sale | 2 | 4.0 |
| 1 | 1 | Flat | 6900000 | E-11 | Islamabad | 3 | For Sale | 3 | 5.6 |
| 2 | 2 | House | 16500000 | G-15 | Islamabad | 6 | For Sale | 5 | 8.0 |
| 3 | 3 | House | 43500000 | Bani Gala | Islamabad | 4 | For Sale | 4 | 40.0 |
| 4 | 4 | House | 7000000 | DHA Defence | Islamabad | 3 | For Sale | 3 | 8.0 |
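The `Unnamed: 0` column just repeats the row number, which suggests it is a row index written by an earlier `to_csv` call without `index=False`; that is an assumption, not something the notebook states. A minimal sketch of dropping such columns, using a small inline CSV as a hypothetical stand-in for `house_prices.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for house_prices.csv: the first header cell is empty,
# so pandas names that column "Unnamed: 0".
raw = io.StringIO(
    ",property_type,price,city\n"
    "0,Flat,10000000,Islamabad\n"
    "1,House,16500000,Islamabad\n"
)
df = pd.read_csv(raw)

# Keep only the columns whose name does not start with "Unnamed".
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(df.columns.tolist())
```

Passing `index_col=0` to `pd.read_csv` is an alternative that turns the column into the DataFrame index instead of discarding it.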


## Identify Missing Values

```python
df.isnull().sum()
```

```
Unnamed: 0       0
property_type    0
price            0
location         0
city             0
baths            0
purpose          0
bedrooms         0
Area_in_Marla    0
dtype: int64
```
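Raw counts are harder to compare across datasets of different sizes, and the drop-or-impute decision usually depends on the *share* of missing values. A small sketch that reports both, run on a toy frame with deliberately inserted gaps (not the House Prices data, which has none):

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries.
df = pd.DataFrame({
    "price": [10_000_000, np.nan, 16_500_000, 7_000_000],
    "city": ["Islamabad", "Islamabad", None, None],
    "baths": [2, 3, 6, 3],
})

# Absolute count and percentage of missing values per column.
report = pd.DataFrame({
    "missing": df.isnull().sum(),
    "percent": df.isnull().mean() * 100,
})
print(report)
```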

## Separate Numerical & Categorical Columns

```python
num_cols = df.select_dtypes(include=["int64", "float64"]).columns
cat_cols = df.select_dtypes(include=["object"]).columns
num_cols, cat_cols
```

```
(Index(['Unnamed: 0', 'price', 'baths', 'bedrooms', 'Area_in_Marla'], dtype='object'),
 Index(['property_type', 'location', 'city', 'purpose'], dtype='object'))
```
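Listing `"int64"` and `"float64"` explicitly misses other numeric dtypes such as `int32`, and the output above shows that `Unnamed: 0` is treated as a numerical feature. A sketch of a more general split, on a hypothetical toy frame: `include="number"` matches every numeric dtype, and `exclude="number"` picks up the remaining columns whether strings are stored as `object` or as a string dtype.

```python
import pandas as pd

# Toy frame; "baths" is deliberately int32 to show the broader match.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "price": [10_000_000, 6_900_000],
    "baths": pd.Series([2, 3], dtype="int32"),
    "Area_in_Marla": [4.0, 5.6],
    "city": ["Islamabad", "Islamabad"],
})

# All numeric columns, minus the index-like "Unnamed: 0".
num_cols = df.select_dtypes(include="number").columns.drop("Unnamed: 0")
# Every column that is not numeric.
cat_cols = df.select_dtypes(exclude="number").columns
print(list(num_cols), list(cat_cols))
```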

## Handle Missing Values

```python
# Numerical: median imputation (only columns that contain NaN are touched)
for col in num_cols:
    if df[col].isnull().any():
        df[col] = df[col].fillna(df[col].median())

# Categorical: mode imputation (most frequent value per column)
for col in cat_cols:
    if df[col].isnull().any():
        df[col] = df[col].fillna(df[col].mode()[0])
```
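Because the House Prices data has no gaps, the loops above never change anything. The same median/mode imputation applied to a hypothetical toy frame that does contain missing values shows what it would do:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; not taken from house_prices.csv.
df = pd.DataFrame({
    "price": [10_000_000, np.nan, 16_500_000, 7_000_000],
    "bedrooms": [2, 3, np.nan, 3],
    "property_type": ["Flat", "House", None, "House"],
})

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

for col in num_cols:
    # The median is less affected than the mean by a few very expensive listings.
    df[col] = df[col].fillna(df[col].median())

for col in cat_cols:
    # mode() returns the most frequent value(s) sorted; [0] keeps the first on a tie.
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

The missing price becomes 10,000,000 (median of the three known prices), the missing bedroom count becomes 3, and the missing property type becomes "House".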


## Validate After Cleaning

```python
df.isnull().sum()
```

```
Unnamed: 0       0
property_type    0
price            0
location         0
city             0
baths            0
purpose          0
bedrooms         0
Area_in_Marla    0
dtype: int64
```
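Reading the printed counts works interactively, but the check can also be written as an assertion so that rerunning the notebook fails loudly if any gaps remain. A minimal sketch, with a small hypothetical frame standing in for the cleaned `df`:

```python
import pandas as pd

# Stand-in for the cleaned DataFrame.
df = pd.DataFrame({
    "price": [10_000_000, 6_900_000],
    "city": ["Islamabad", "Islamabad"],
})

# Total number of missing cells across all columns.
missing_total = int(df.isnull().sum().sum())
assert missing_total == 0, f"{missing_total} missing values remain"
print("no missing values")
```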

## Save Cleaned Dataset

```python
df.to_csv("House_Prices_Cleaned_Task2.csv", index=False)
```
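`index=False` stops pandas from writing the row index as an extra column, which on re-reading would appear as yet another `Unnamed: 0`. A sketch of a round-trip check, writing a hypothetical toy frame to a temporary directory and reading it back:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "property_type": ["Flat", "House"],
    "price": [10_000_000, 16_500_000],
    "Area_in_Marla": [4.0, 8.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "House_Prices_Cleaned_Task2.csv")
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises AssertionError if columns, dtypes, or values differ after the round trip.
pd.testing.assert_frame_equal(df, reloaded)
print(reloaded.columns.tolist())
```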


## Summary
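- The dataset contains no missing values (every per-column count is 0), so the median (numerical) and mode (categorical) imputation loops left the data unchanged; they are kept so the notebook also handles inputs that do contain gaps.
- No columns were removed: with zero missing values, no column has a high missing-value share.
- The data was re-checked for missing values and saved to `House_Prices_Cleaned_Task2.csv` for the ML stage; it still contains the `Unnamed: 0` column.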
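Dropping columns with a high share of missing values can be sketched as follows. The helper name `drop_sparse_columns` and the 50% threshold are hypothetical choices for illustration, not values taken from this notebook:

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose missing fraction exceeds max_missing."""
    missing_share = df.isnull().mean()
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]


# Toy frame: price is 25% missing, baths 75% missing, city complete.
df = pd.DataFrame({
    "price": [10_000_000, 6_900_000, np.nan, 7_000_000],
    "baths": [np.nan, np.nan, np.nan, 3],
    "city": ["Islamabad"] * 4,
})

cleaned = drop_sparse_columns(df, max_missing=0.5)
print(cleaned.columns.tolist())
```

pandas' built-in `DataFrame.dropna(axis=1, thresh=...)` achieves the same effect, except that `thresh` is the minimum *number* of non-missing values a column must have rather than a fraction.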
