Objective

Analyze Bengaluru housing data to understand how location, area, and BHK affect house prices and to generate real-life insights for buyers and investors.

Data Loading & Basic Exploration

In [None]:
import numpy as np
import pandas as pd 
data=pd.read_csv(r"c:\Users\Wajiz.pk\Downloads\bengaluru_house_prices.csv")
data.head(10)
data.columns
data.dtypes
data.isnull().sum()

Insight
Initial exploration helps understand:
Dataset structure
Available columns
Data types
Data quality (missing values)
This step is crucial before starting any analysis.
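
A notebook cell displays only its last expression, so the checks above are easier to read when combined into one summary. The following is a minimal sketch on a hypothetical toy frame (the rows and the names `sample` and `missing` are illustrative assumptions, not the real CSV):

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV (values are illustrative).
sample = pd.DataFrame({
    "location": ["Whitefield", None, "Jakkur", "Hennur Road"],
    "size": ["2 BHK", "3 BHK", None, "4 Bedroom"],
    "price": [94.0, 120.0, 465.0, 224.0],
})

# Count and percentage of missing values per column, worst first.
missing = pd.DataFrame({
    "missing": sample.isnull().sum(),
    "percent": (sample.isnull().mean() * 100).round(2),
}).sort_values("missing", ascending=False)

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # data type of each column
print(missing)
```

Printing each result explicitly keeps every check visible instead of only the last one.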

Handling Missing Values


In [2]:
data["location"]=data["location"].fillna(data["location"].mode()[0])
data.isnull().sum()

area_type          0
availability       0
location           0
size              16
society         5502
total_sqft         0
bath              73
balcony          609
price              0
dtype: int64

Insight
Missing values in location were filled with the mode (the most frequent value) because location is a categorical variable.
This avoids dropping rows while keeping the analysis consistent.
As the output above shows, size, society, bath, and balcony still contain missing values.
Business Meaning:
Accurate location data is critical for pricing and demand analysis.
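
The mode fill can be sketched on a toy column as follows. The data and the helper column `location_was_missing` are assumptions added for illustration; keeping such a flag is one possible way to remember which rows were imputed.

```python
import pandas as pd

# Hypothetical toy column with one missing location.
toy = pd.DataFrame({"location": ["Whitefield", "Whitefield", None, "Jakkur"]})

# mode() returns a Series (there can be ties); [0] takes the first mode.
most_common = toy["location"].mode()[0]

# Record which rows were imputed before overwriting them (illustrative extra column).
toy["location_was_missing"] = toy["location"].isnull()
toy["location_filled"] = toy["location"].fillna(most_common)

print(toy)
```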

Duplicate Rows Handling


In [3]:
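data.duplicated().sum()   # number of fully identical rows
# Caution: subset= keeps one row per (area_type, availability) pair, not just exact duplicates.
data=data.drop_duplicates(subset=["area_type","availability"])

Insight
Duplicate rows can inflate demand and price trends.
Removing exact duplicates improves reliability and avoids biased insights.
Note that the subset used above treats any two listings with the same area_type and availability as duplicates, so it also discards many distinct listings; only 151 listings remain in the BHK counts further down.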
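
The difference between dropping exact duplicates and dropping by a column subset can be seen on a small example. This is a sketch on hypothetical rows, not the real data:

```python
import pandas as pd

# Hypothetical listings: rows 0 and 1 are exact duplicates, row 2 only
# shares area_type and availability with them.
toy = pd.DataFrame({
    "area_type": ["Plot Area", "Plot Area", "Plot Area", "Built-up Area"],
    "availability": ["Ready To Move", "Ready To Move", "Ready To Move", "18-May"],
    "location": ["Jakkur", "Jakkur", "Whitefield", "Hennur Road"],
    "price": [100.0, 100.0, 45.0, 224.0],
})

# Drops only rows that match on every column.
exact = toy.drop_duplicates()

# Keeps the first row for each (area_type, availability) pair.
by_subset = toy.drop_duplicates(subset=["area_type", "availability"])

print(len(toy), len(exact), len(by_subset))
```

In the toy frame the subset version also removes the distinct Whitefield listing, which is the behavior to watch for on the full dataset.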

Data Type Conversion

In [4]:
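def convert_sqr(value):
    """Convert a total_sqft entry to float; a range like "2000-3000" becomes its midpoint."""
    try:
        if "-" in value:
            new=value.split("-")
            if len(new)==2:
                return (float(new[0])+float(new[1]))/2
        return float(value)
    except (TypeError, ValueError):
        # Non-string input (e.g. NaN) or text that cannot be parsed as a number.
        return None

data["total_sqft"]=data["total_sqft"].apply(convert_sqr)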


Insight
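Area values like "2000-3000" were converted to numeric by taking the average of the two bounds (here 2500).
Entries that cannot be parsed as a number are returned as None and become missing values.
This enables proper statistical analysis.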
Business Meaning:
Without numeric area values, price-per-area and size comparisons are impossible.
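
Once the area is numeric, a price-per-area figure is a single division. The sketch below assumes price is recorded in lakhs (the later filters treat 100 as ₹1 crore) and uses a hypothetical column name `price_per_sqft`; the two rows are the Whitefield 2 BHK listings shown later in this notebook.

```python
import pandas as pd

# The two Whitefield 2 BHK rows from this notebook's output.
listings = pd.DataFrame({
    "location": ["Whitefield", "Whitefield"],
    "total_sqft": [1459.0, 1200.0],
    "price": [94, 45],  # assumed to be in lakhs
})

LAKH = 100_000  # rupees per lakh

# Rupees per square foot; the column name is an illustrative choice.
listings["price_per_sqft"] = (
    listings["price"] * LAKH / listings["total_sqft"]
).round(2)

print(listings)
```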

Descriptive Statistics

In [5]:
# Only the last expression in a cell is displayed, so the output below is the standard deviation.
data[["price","total_sqft"]].mean()
data[["price","total_sqft"]].min()
data[["price","total_sqft"]].max()
data[["price","total_sqft"]].median()
data[["price","total_sqft"]].std()


price          184.449846
total_sqft    2569.653391
dtype: float64

Insight
The dataset includes both budget and luxury properties.
The standard deviation is large (about 184 lakhs for price and about 2,570 sqft for area), which shows how widely price and area vary in the market.
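
To see all five statistics in one table, `agg` accepts a list of function names. A minimal sketch on hypothetical values (the frame `toy` is illustrative, not the real data):

```python
import pandas as pd

# Hypothetical prices (lakhs) and areas (sqft).
toy = pd.DataFrame({
    "price": [45.0, 94.0, 120.0, 465.0],
    "total_sqft": [1200.0, 1459.0, 2600.0, 5230.0],
})

# One row per statistic, one column per variable.
summary = toy[["price", "total_sqft"]].agg(["mean", "min", "max", "median", "std"])

print(summary)
```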

BHK Extraction & Distribution

In [6]:
data["Num_Bhk"]=data["size"].str.extract(r"(\d+)")
data=data.dropna(subset=["Num_Bhk"])
data["Num_Bhk"]=data["Num_Bhk"].astype(int)

data["Num_Bhk"].value_counts()

Num_Bhk
3    58
2    54
4    23
1    15
8     1
Name: count, dtype: int64

Insight
BHK values were extracted from text format.
3 BHK emerged as the most common configuration (58 listings), closely followed by 2 BHK (54 listings).
Business Meaning:
Shows buyer preference and demand trend.
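
The extraction step can be sketched on a few hypothetical size strings. With `expand=False`, `str.extract` returns a Series rather than a one-column DataFrame, which is a small variation on the cell above, and `idxmax` reads the most common configuration directly from the counts.

```python
import pandas as pd

# Hypothetical size entries, including one missing value.
sizes = pd.Series(["2 BHK", "3 BHK", "3 Bedroom", "4 Bedroom", None], name="size")

# Pull the first run of digits, drop rows without one, then convert to int.
bhk = sizes.str.extract(r"(\d+)", expand=False).dropna().astype(int)

counts = bhk.value_counts()
most_common_bhk = counts.idxmax()

print(counts)
print(most_common_bhk)
```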

Filtering Analysis
3 BHK and Above

In [8]:

filtered=len(data.loc[data["Num_Bhk"]>=3])
total=len(data)
percentage=(filtered/total)*100
round(percentage,2)

54.3

Insight:
About 54.3% of the listings have 3 or more bedrooms, which shows how much of the market is aimed at larger families or premium buyers.
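
The same share can be computed as the mean of a boolean mask, since True counts as 1. The sketch below rebuilds a series from the BHK counts shown earlier (58, 54, 23, 15, and 1 listings) to check the arithmetic: 82 of 151 listings have 3 or more bedrooms.

```python
import pandas as pd

# Series rebuilt from the value_counts output shown earlier in this notebook.
bhk = pd.Series([3] * 58 + [2] * 54 + [4] * 23 + [1] * 15 + [8], name="Num_Bhk")

# Mean of a boolean mask is the fraction of True values.
share = round((bhk >= 3).mean() * 100, 2)

print(share)
```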

Price > 1 Crore

In [9]:

# price is in lakhs, so 100 = ₹1 crore. Note: astype(int) truncates any fractional lakhs.
data["price"]=data["price"].astype(int)
data.loc[data["price"]>100]


Unnamed: 0,area_type,availability,location,size,society,total_sqft,bath,balcony,price,Num_Bhk
1,Plot Area,Ready To Move,Chikka Tirupathi,4 Bedroom,Theanmp,2600.0,5.0,3.0,120,4
6,Super built-up Area,18-May,Old Airport Road,4 BHK,Jaades,2732.0,4.0,,204,4
56,Built-up Area,20-Feb,Devanahalli,4 Bedroom,BrereAt,3210.0,,,192,4
81,Built-up Area,18-Oct,Hennur Road,4 Bedroom,Gollela,3203.5,,,224,4
183,Super built-up Area,21-Jun,Vijayanagar,3 BHK,Saahaat,1704.0,3.0,2.0,110,3
190,Built-up Area,18-Dec,Kanakpura Road,4 Bedroom,,2250.0,4.0,1.0,110,4
210,Super built-up Area,20-May,1st Block Jayanagar,4 BHK,,2850.0,4.0,1.0,428,4
248,Plot Area,17-May,Meenakunte,3 Bedroom,Sreat R,4050.0,3.0,2.0,280,3
520,Built-up Area,19-Jul,"Yemlur, Old Airport Road,",3 BHK,,1595.0,3.0,2.0,115,3
524,Super built-up Area,17-Dec,Jakkur,4 BHK,Lecco C,5230.0,6.0,1.0,465,4


Insight:
Identifies the luxury housing segment: listings priced above ₹1 crore (price > 100 lakhs).

Whitefield – 2 BHK Houses

In [10]:

data.loc[(data["location"]=="Whitefield") & (data["Num_Bhk"]==2)]

Unnamed: 0,area_type,availability,location,size,society,total_sqft,bath,balcony,price,Num_Bhk
47,Super built-up Area,20-Sep,Whitefield,2 BHK,Goted U,1459.0,2.0,1.0,94,2
1072,Plot Area,18-Sep,Whitefield,2 Bedroom,,1200.0,2.0,1.0,45,2


Insight:
Helps buyers targeting a specific location and configuration.

Area < 800 sqft

In [11]:
len(data.loc[data["total_sqft"]<800])


19

Insight:
19 listings are smaller than 800 sqft; these are the compact, typically more affordable units.

Grouping & Aggregation

In [12]:

house_price_average=data.groupby("location")["price"].mean().reset_index()
sort_avg=house_price_average.sort_values(by="price",ascending=False)
sort_avg.head(5)

Unnamed: 0,location,price
25,Dodsworth Layout,2100.0
54,Langford Gardens,650.0
44,Jakkur,357.0
32,HBR Layout,320.0
59,Meenakunte,280.0


Insight
The five locations with the highest average price highlight the premium real estate zones.
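
An average based on a single listing can put a location at the top of the ranking, so it may help to report the listing count next to the mean. The sketch below uses hypothetical locations and prices; the threshold of 2 listings and the column names `mean_price` and `listings` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical listings: "Location A" has a single very expensive property.
toy = pd.DataFrame({
    "location": ["Location A", "Location B", "Location B", "Location B",
                 "Location C", "Location C"],
    "price": [2100.0, 100.0, 120.0, 140.0, 300.0, 340.0],
})

# Mean price and number of listings per location (named aggregation).
stats = (
    toy.groupby("location")["price"]
    .agg(mean_price="mean", listings="count")
    .reset_index()
)

# Rank only locations backed by at least 2 listings (illustrative threshold).
reliable = stats[stats["listings"] >= 2].sort_values("mean_price", ascending=False)

print(stats)
print(reliable)
```

Here "Location A" would rank first on the raw mean but drops out once the count filter is applied.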

Sorting Analysis

In [None]:
data.sort_values(by="price",ascending=False)[["location","price"]].head(10)

Insight:
Shows top luxury properties and their locations.

Statistical Analysis

In [None]:

data["price"].min()
data["price"].max()
data["price"].std()
data["Num_Bhk"].var()
data[["price","Num_Bhk"]].corr()


Insight
Positive correlation exists between BHK and price.
However, location also plays a major role.
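
A short sketch of how the BHK and price relationship might be checked: the correlation matrix gives the Pearson coefficient, and the median price per BHK shows the trend in price terms. The values are hypothetical and serve only to illustrate the calls.

```python
import pandas as pd

# Hypothetical listings (price in lakhs).
toy = pd.DataFrame({
    "Num_Bhk": [1, 2, 2, 3, 4],
    "price": [30.0, 45.0, 94.0, 110.0, 204.0],
})

# Pearson correlation matrix; the off-diagonal entry is the coefficient of interest.
matrix = toy[["price", "Num_Bhk"]].corr()
r = matrix.loc["price", "Num_Bhk"]

# Median price for each BHK count.
median_by_bhk = toy.groupby("Num_Bhk")["price"].median()

print(round(r, 3))
print(median_by_bhk)
```

A coefficient near 1 would indicate a strong linear relationship; repeating the calculation within a single location is one way to separate the BHK effect from the location effect.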

Real-Life Scenario


# First three matching rows in dataset order (price is in lakhs).
data.loc[(data["price"]<=75)&(data["Num_Bhk"]==2),"location"].head(3)

Insight
For a ₹75 lakh budget and a 2 BHK requirement, the filter returns candidate locations, which is useful for buyer recommendations.
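
Because head(3) returns the first three matches in dataset order, a buyer-facing shortlist might instead summarize the matches per location. The sketch below is one possible approach: the two Whitefield rows come from this notebook's output, while "Location X", "Location Y", and the names `BUDGET_LAKH` and `options` are hypothetical.

```python
import pandas as pd

# Two Whitefield rows from this notebook plus hypothetical extra listings.
toy = pd.DataFrame({
    "location": ["Whitefield", "Whitefield", "Location X", "Location Y", "Location Y"],
    "Num_Bhk": [2, 2, 2, 2, 3],
    "price": [94, 45, 60, 70, 65],  # lakhs
})

BUDGET_LAKH = 75

# Listings that fit both the budget and the 2 BHK requirement.
mask = (toy["price"] <= BUDGET_LAKH) & (toy["Num_Bhk"] == 2)

# Matching listings per location, cheapest median price first.
options = (
    toy.loc[mask]
    .groupby("location")["price"]
    .agg(listings="count", median_price="median")
    .sort_values("median_price")
)

print(options)
```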