# Pakistan Housing Market — Exploratory Data Analysis

An in-depth exploration of **16,000+ property listings** from [Zameen.com](https://www.zameen.com/), Pakistan's largest real estate marketplace. This analysis examines pricing patterns, geographic trends, and property characteristics across 12 major Pakistani cities to uncover actionable insights for real estate investors and market analysts.

**Dataset:** Zameen.com Housing Prices (Kaggle)  
**Tools:** Python, pandas, matplotlib, seaborn, plotly

## 1. Setup & Data Loading

In [1]:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Visualization defaults
sns.set_style('whitegrid')
sns.set_palette('Set2')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 150

# Show every column and format floats with thousands separators
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.2f}'.format)
```

In [2]:
```python
df = pd.read_csv('../data/raw/archive_2/zameen.csv', sep='|')
print(f'Dataset: {df.shape[0]:,} listings × {df.shape[1]} features')
```

```
Dataset: 16,044 listings × 6 features
```


## 2. Initial Exploration

### 2.1 Sample Listings

In [3]:
```python
df.head(10)
```

| | city | location | price | bedrooms | baths | size |
|---|---|---|---:|---:|---:|---:|
| 0 | Lahore | DHA Phase 6, DHA Defence | 74500000 | 5 | 6 | 4500.0 |
| 1 | Lahore | DHA Phase 7, DHA Defence | 51500000 | 5 | 6 | 4500.0 |
| 2 | Lahore | Dream Gardens, Defence Road | 7500000 | 1 | 1 | 518.0 |
| 3 | Lahore | DHA Phase 6, DHA Defence | 73000000 | 5 | 6 | 4500.0 |
| 4 | Lahore | Bahria Town - Sector B, Bahria Town | 5700000 | 1 | 1 | 472.0 |
| 5 | Lahore | DHA Phase 5 - Block L, DHA Phase 5 | 53500000 | 5 | 6 | 2250.0 |
| 6 | Lahore | Bahria Town - Overseas A, Bahria Town - Overse... | 97500000 | 5 | 6 | 4500.0 |
| 7 | Lahore | Bahria Town - Jasmine Block, Bahria Town - Sec... | 47000000 | 5 | 7 | 4500.0 |
| 8 | Lahore | Bahria Town - Sector E, Bahria Town | 5000000 | 1 | 1 | 450.0 |
| 9 | Lahore | Raiwind Road, Lahore | 8299000 | 1 | 1 | 630.0 |


### 2.2 Data Types & Memory

In [4]:
```python
df.info()
```

```
<class 'pandas.DataFrame'>
RangeIndex: 16044 entries, 0 to 16043
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   city      16044 non-null  str    
 1   location  16044 non-null  str    
 2   price     16044 non-null  int64  
 3   bedrooms  16044 non-null  int64  
 4   baths     16044 non-null  int64  
 5   size      16044 non-null  float64
dtypes: float64(1), int64(3), str(2)
memory usage: 752.2 KB
```
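
A text column with few distinct values, such as `city`, is a common candidate for the `category` dtype when memory matters. The sketch below illustrates the idea on a small synthetic frame; the `toy` data and its values are made up for illustration and are not drawn from the dataset.

```python
import pandas as pd

# Synthetic stand-in for the listings frame (illustrative values only)
toy = pd.DataFrame({
    "city": ["Lahore", "Karachi", "Islamabad", "Lahore"] * 2500,
    "price": range(10_000),
})

# Measure the column before and after converting it to a categorical
before = toy["city"].memory_usage(deep=True)
toy["city"] = toy["city"].astype("category")
after = toy["city"].memory_usage(deep=True)

print(f"city column: {before:,} bytes -> {after:,} bytes")
```

Whether the conversion is worthwhile on the real data depends on how the column is used later; high-cardinality columns such as `location` usually benefit less.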


### 2.3 Numerical Summary

In [5]:
```python
df.describe()
```

| | price | bedrooms | baths | size |
|---|---:|---:|---:|---:|
| count | 16044.0 | 16044.0 | 16044.0 | 16044.0 |
| mean | 45711134.19 | 3.78 | 4.06 | 2540.64 |
| std | 85041953.69 | 1.99 | 2.14 | 10607.07 |
| min | 70000.0 | 0.0 | 0.0 | 0.0 |
| 25% | 14000000.0 | 3.0 | 3.0 | 1125.0 |
| 50% | 24000000.0 | 4.0 | 4.0 | 1575.0 |
| 75% | 45000000.0 | 5.0 | 6.0 | 2700.0 |
| max | 2100000000.0 | 11.0 | 10.0 | 1215000.0 |
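
The summary shows minimum values of zero and maxima far above the upper quartile, so two quick follow-up checks are worth sketching: how many zeros each numeric column holds, and how far the mean sits from the median. The example runs on a synthetic `toy` frame whose values are made up for illustration, so its numbers say nothing about the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the numeric columns (illustrative values only)
toy = pd.DataFrame({
    "price": [5_000_000, 14_000_000, 24_000_000, 45_000_000, 2_100_000_000],
    "bedrooms": [1, 3, 4, 5, 0],
    "baths": [1, 3, 4, 6, 0],
    "size": [450.0, 1125.0, 1575.0, 0.0, 1_215_000.0],
})
numeric_cols = ["price", "bedrooms", "baths", "size"]

# Count exact zeros per column; zeros in size or bedrooms are suspicious
zero_counts = (toy[numeric_cols] == 0).sum()

# A mean well above the median hints at right skew
mean_to_median = toy[numeric_cols].mean() / toy[numeric_cols].median()

print(zero_counts)
print(mean_to_median.round(2))
```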


### 2.4 Categorical Summary

In [6]:
```python
# Select text columns under both the legacy object dtype and the pandas 3 string dtype
df.describe(include=['object', 'string'])
```

| | city | location |
|---|---|---|
| count | 16044 | 16044 |
| unique | 12 | 2038 |
| top | Lahore | DHA Villas, DHA Defence |
| freq | 2500 | 481 |
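
`describe` reports only the single most frequent value per column. To see the whole distribution, `value_counts` is a natural next step. A minimal sketch on a synthetic `toy` frame (the city labels and counts are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for the city column (illustrative values only)
toy = pd.DataFrame({
    "city": ["Lahore", "Lahore", "Karachi", "Islamabad", "Lahore", "Karachi"],
})

# Share of listings per city, sorted from most to least frequent
city_share = toy["city"].value_counts(normalize=True)

print(city_share.round(2))
```

On the real frame the same call would be applied to `df['city']`, or to `df['location']` with `.head(10)` to keep the output short.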


### 2.5 First Impressions

- **16,044 listings** across **12 cities** and **2,038 unique locations**: a broad snapshot of Pakistan's urban housing market
- **No missing values** in any column, though data quality issues exist (zero values in size, bedrooms, and baths; extreme outliers)
- **Price range** spans 70,000 PKR to 2.1 billion PKR; the high end likely includes commercial or incorrectly entered listings
- **Median price** (24M PKR) is roughly half the mean (45.7M PKR), indicating the strong right skew typical of real estate markets
- **Lahore is the most common city** (2,500 listings, about 16% of the total), and the most frequent location is "DHA Villas, DHA Defence" (481 listings); DHA and Bahria Town are Pakistan's premium housing societies
- **Size column** contains zero values and an extreme maximum of about 1.2M sq ft; cleaning is required before analysis
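
The issues listed above suggest a cleaning pass along these lines. This is only a sketch under stated assumptions: the 1.5 × IQR fence and the log transform are common defaults chosen here for illustration, not the notebook's actual cleaning rules, and the `toy` frame contains made-up values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the listings frame (illustrative values only)
toy = pd.DataFrame({
    "price": [5_000_000, 14_000_000, 24_000_000, 45_000_000,
              51_500_000, 2_100_000_000],
    "size": [450.0, 1125.0, 0.0, 2700.0, 4500.0, 1_215_000.0],
})

# Step 1: drop listings with a non-positive size
cleaned = toy[toy["size"] > 0].copy()

# Step 2: flag prices above the upper IQR fence (assumed threshold: 1.5 x IQR)
q1, q3 = cleaned["price"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
cleaned["price_outlier"] = cleaned["price"] > upper_fence

# Step 3: add a log10 price column to tame the right skew in later plots
cleaned["log_price"] = np.log10(cleaned["price"])

print(cleaned)
```

Flagging outliers rather than dropping them keeps the decision reversible, which helps when some extreme listings turn out to be legitimate.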