# House Price Analysis 

## Import Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


**Task:** Import pandas, numpy, matplotlib, and seaborn.

**Reflection:** Why do we need seaborn and matplotlib for data analysis?

## Load the Cleaned Dataset

In [4]:
# Load the cleaned dataset (CSV exported from previous notebook)
data = pd.read_csv('cleaned_house_price_data.csv')

In [5]:
# show the first 5 rows
data.head(5)

| Unnamed: 0 | Index | Title | Description | Amount(in rupees) | Price (in rupees) | location | Carpet Area | Status | Floor | Transaction | Furnishing | facing | overlooking | Society | Bathroom | Balcony | Car Parking | Ownership | Super Area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 181234 | 3 BHK Ready to Occupy Flat for sale in Nest Ha... | Have a look at this immaculate 3 BHK flat for ... | 1400.30 Cr | 6700000.0 | vadodara | 1252.0 | Ready to Move | 5 out of 9 | New Property | Unfurnished | | | Nest Harmony | 3 | 5.0 | 1 Covered | | |
| 1 | 180679 | 2 BHK Ready to Occupy Flat for sale Fatehpura | This attractive 2 BHK apartment can be found f... | 45 Lac | 4500000.0 | udaipur | 1000.0 | Ready to Move | 4 out of 4 | Resale | Semi-Furnished | | | | 2 | | | | 1 sqft |
| 2 | 147912 | 5 BHK Ready to Occupy Flat for sale Dayal Bagh | Discover this immaculate 5 BHK flat for sale a... | 1.35 Cr | 4500000.0 | agra | 3.0 | Ready to Move | 1 out of 2 | Resale | Furnished | | Main Road | | 5 | 1.0 | | | |
| 3 | 176536 | 3 BHK Ready to Occupy Flat for sale Hill Cart ... | This ready to move-in 3 BHK flat is available ... | 510.04 Cr | 4041600.0 | siliguri | 970.0 | Ready to Move | 1 out of 4 | New Property | Unfurnished | North | Main Road | | 2 | 1.0 | 1 Covered | Freehold | |
| 4 | 174894 | 2 BHK Ready to Occupy Flat for sale Kachna Road | Kachna Road, Raipur has an attractive 2 BHK Fl... | 396.75 Cr | 3450000.0 | raipur | 920.0 | Ready to Move | 2 out of 2 | New Property | Unfurnished | East | Garden/Park, Pool, Main Road | | 2 | 2.0 | 1 Covered, | Freehold | |


**Task:** Load the cleaned dataset and explore it.

**Reflection:** Are all columns clean? Anything you notice about the data?

## Overview of Dataset

In [10]:
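# Shape, column names, and missing values
# Note: a notebook cell displays only its last expression, so only the
# missing-value counts appear below. Wrap data.shape and data.columns
# in print() to see them as well.
data.shape
data.columns
data.isnull().sum()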

Index                     0
Title                     0
Description            3023
Amount(in rupees)         0
Price (in rupees)         0
location                  0
Carpet Area               0
Status                    0
Floor                  7077
Transaction              83
Furnishing             2897
facing                70233
overlooking           81436
Society              109678
Bathroom                828
Balcony               48935
Car Parking          103357
Ownership             65517
Super Area           107685
dtype: int64

**Task:** Understand dataset structure and completeness.

**Reflection:** Are there still any missing values? Which columns are most important for analysis?
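
Raw missing-value counts are hard to compare without the total row count. A minimal sketch, assuming it is acceptable to express each count as a percentage of rows; the helper name `missing_share` and the toy frame are illustrative and not part of the original notebook:

```python
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False).round(1)

# Tiny synthetic frame for illustration only (not the real dataset).
toy = pd.DataFrame({
    "Society": [None, None, None, "Nest Harmony"],
    "Bathroom": [3, 2, None, 2],
    "location": ["vadodara", "udaipur", "agra", "siliguri"],
})
print(missing_share(toy))
```

In the notebook, `missing_share(data)` would give the same view for the real columns, which may help decide which ones are too sparse to analyze.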

## Basic Aggregations

In [12]:
# Average Price
data['Price (in rupees)'].mean()

np.float64(7583.771884897507)

In [14]:
# Min and Max Carpet Area
min_carpet_area = data['Carpet Area'].min()
max_carpet_area = data['Carpet Area'].max()
print(f"Minimum Carpet Area: {min_carpet_area} sqft")
print(f"Maximum Carpet Area: {max_carpet_area} sqft")

Minimum Carpet Area: 1.0 sqft
Maximum Carpet Area: 709222.0 sqft


In [15]:
# Average Price by Status
average_price_by_status = data.groupby('Status')['Price (in rupees)'].mean()
print("Average Price by Status:", average_price_by_status)

Average Price by Status: Status
Ready to Move    7589.451646
Unknown          5857.530793
Name: Price (in rupees), dtype: float64


**Task:** Calculate basic statistics and run a groupby analysis (you can try anything you want).

**Reflection:** What patterns do you notice in Price and Carpet Area? How does Status affect average price?

## Visualizations

### Price vs Carpet Area

**Task:** Visualize relationship between Carpet Area and Price.

**Reflection:** Do bigger flats always cost more? Are there any outliers?
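
One possible way to approach this task is a scatter plot. Because Carpet Area ranges from 1 to 709222 sqft in the output above, the sketch below assumes log-scaled axes are a reasonable choice; the function name `plot_price_vs_area` and the toy rows are illustrative assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_price_vs_area(df, area_col="Carpet Area", price_col="Price (in rupees)"):
    """Scatter price against carpet area on log-log axes and return the Axes."""
    subset = df[[area_col, price_col]].dropna()
    # Log axes cannot show zero or negative values, so keep positive rows only.
    subset = subset[(subset[area_col] > 0) & (subset[price_col] > 0)]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(subset[area_col], subset[price_col], s=8, alpha=0.3)
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_xlabel("Carpet Area (sqft, log scale)")
    ax.set_ylabel("Price (rupees, log scale)")
    ax.set_title("Price vs Carpet Area")
    return ax

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Carpet Area": [1.0, 920.0, 1000.0, 1252.0, 709222.0],
    "Price (in rupees)": [4500.0, 3450.0, 0.0, 6700.0, 5000.0],
})
ax = plot_price_vs_area(toy)
```

Points far from the main cloud are candidates for outliers and are worth inspecting row by row before drawing conclusions.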

### Average Price by Status

In [16]:
# Average Price by Status
average_price_by_status = data.groupby('Status')['Price (in rupees)'].mean()
print("Average Price by Status:", average_price_by_status)

Average Price by Status: Status
Ready to Move    7589.451646
Unknown          5857.530793
Name: Price (in rupees), dtype: float64


**Task:** Compare average prices by Status.

**Reflection:** Which type of Status is most expensive? Least expensive? Any surprises?
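
The comparison could be drawn as a grouped bar chart. This sketch also reports the median and the row count per group, on the assumption that a group with few rows or a few extreme prices can make the mean misleading; `price_summary_by_status` and the toy data are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

def price_summary_by_status(df):
    """Mean, median, and row count of price for each Status value."""
    return df.groupby("Status")["Price (in rupees)"].agg(["mean", "median", "count"])

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Status": ["Ready to Move", "Ready to Move", "Ready to Move", "Unknown"],
    "Price (in rupees)": [4000.0, 6000.0, 11000.0, 5000.0],
})
summary = price_summary_by_status(toy)

# One pair of bars (mean, median) per Status value.
ax = summary[["mean", "median"]].plot(kind="bar", rot=0, figsize=(6, 4))
ax.set_ylabel("Price (rupees)")
ax.set_title("Mean and median price by Status")
print(summary)
```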

### Flats Count per Location

In [18]:
# Display the Flats Count Per Location
flats_count_per_location = data['location'].value_counts()
print("Flats Count Per Location:", flats_count_per_location)

Flats Count Per Location: location
new-delhi      27599
bangalore      24030
kolkata        22380
gurgaon        20070
ahmedabad      12750
               ...  
ahmadnagar        30
pondicherry       30
madurai           30
palakkad          30
navsari           30
Name: count, Length: 81, dtype: int64


**Task:** See distribution of flats across locations.

**Reflection:** Which locations have the most flats? The least? Why might that be?
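
With 81 locations, a chart of every location would be hard to read. A minimal sketch, assuming that showing only the `n` most frequent locations is enough for this task; `top_locations` and the toy data are illustrative:

```python
import pandas as pd
import matplotlib.pyplot as plt

def top_locations(df, n=10):
    """Return the n locations with the most listings, largest first."""
    return df["location"].value_counts().head(n)

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({"location": ["new-delhi"] * 3 + ["bangalore"] * 2 + ["navsari"]})
counts = top_locations(toy, n=2)

fig, ax = plt.subplots(figsize=(6, 4))
# Reverse the order so the largest bar ends up at the top of the chart.
ax.barh(counts.index[::-1], counts.values[::-1])
ax.set_xlabel("Number of flats")
ax.set_title("Locations with the most flats")
```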

### Top Expensive Flats

In [19]:
# Identify the top 5 most expensive flats in the dataset.
top_5_expensive_flats = data.nlargest(5, 'Price (in rupees)')

# Display their prices and locations.
print(top_5_expensive_flats[['Price (in rupees)', 'location']])

# Create a visualization that helps compare these flats clearly.


   Price (in rupees)  location
0          6700000.0  vadodara
1          4500000.0   udaipur
2          4500000.0      agra
3          4041600.0  siliguri
4          3450000.0    raipur



***Hint:** Think about which chart type best shows comparison between prices.*
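
Following the hint, a horizontal bar chart is one reasonable option for comparing a handful of prices. The sketch below assumes each bar can be labelled with the location plus the row index; `top_flat_labels` is a hypothetical helper:

```python
import pandas as pd
import matplotlib.pyplot as plt

def top_flat_labels(df, n=5):
    """Return the n most expensive rows and a readable label for each."""
    top = df.nlargest(n, "Price (in rupees)")
    labels = [f"{loc} (row {idx})" for idx, loc in top["location"].items()]
    return top, labels

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "location": ["city_a", "city_b", "city_c"],
    "Price (in rupees)": [100.0, 500.0, 300.0],
})
top, labels = top_flat_labels(toy, n=2)

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(labels, top["Price (in rupees)"])
ax.invert_yaxis()  # most expensive flat at the top
ax.set_xlabel("Price (rupees)")
ax.set_title("Most expensive flats")
```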

### Large Flats Analysis

In [20]:
# Filter flats with Carpet Area greater than 700 sqft.
flats_large_carpet_area = data[data['Carpet Area'] > 700]

# Analyze how these flats are distributed in terms of price or location.
price_distribution_large_carpet_area = flats_large_carpet_area['Price (in rupees)']
location_distribution_large_carpet_area = flats_large_carpet_area['location']
print("Price Distribution for Flats with Carpet Area > 700 sqft:")
print(price_distribution_large_carpet_area.describe())
print("\nLocation Distribution for Flats with Carpet Area > 700 sqft:")
print(location_distribution_large_carpet_area.value_counts())

# Visualize your findings using an appropriate chart.


Price Distribution for Flats with Carpet Area > 700 sqft:
count    1.632400e+05
mean     7.667887e+03
std      2.539655e+04
min      0.000000e+00
25%      4.521000e+03
50%      6.549000e+03
75%      9.000000e+03
max      6.700000e+06
Name: Price (in rupees), dtype: float64

Location Distribution for Flats with Carpet Area > 700 sqft:
location
bangalore     23916
new-delhi     20423
kolkata       18752
gurgaon       18147
hyderabad     11616
              ...  
nellore          29
solapur          28
navsari          26
madurai          25
ahmadnagar       23
Name: count, Length: 81, dtype: int64


***Hint:** Consider whether you want to show distribution, comparison, or trends.*
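
The `describe()` output above shows a 75th percentile of 9000 against a maximum of 6700000, so a plain histogram would squeeze almost every flat into one bin. A sketch of one workaround, assuming it is acceptable to hide the top 1% of prices in the plot only (the cutoff is an arbitrary illustrative choice, and `trimmed_large_flat_prices` is a hypothetical helper):

```python
import pandas as pd
import matplotlib.pyplot as plt

def trimmed_large_flat_prices(df, min_area=700, upper_quantile=0.99):
    """Prices of flats above min_area, without values above the given quantile."""
    large = df[df["Carpet Area"] > min_area]
    prices = large["Price (in rupees)"].dropna()
    cap = prices.quantile(upper_quantile)
    return prices[prices <= cap]

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Carpet Area": [500.0, 800.0, 900.0, 1000.0, 1200.0],
    "Price (in rupees)": [4000.0, 5000.0, 6000.0, 7000.0, 6700000.0],
})
trimmed = trimmed_large_flat_prices(toy)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(trimmed, bins=30)
ax.set_xlabel("Price (rupees)")
ax.set_ylabel("Number of flats")
ax.set_title("Price distribution, Carpet Area > 700 sqft (top 1% hidden)")
```

The trimming affects only the chart; the summary statistics should still be computed on the full filtered data.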

### Mean vs Median Price

In [21]:
# Calculate the mean price and median price of flats.
mean_price = data['Price (in rupees)'].mean()
median_price = data['Price (in rupees)'].median()
print(f"Mean Price: {mean_price} rupees")
print(f"Median Price: {median_price} rupees")
# Compare the two values.
if mean_price > median_price:
    print("The mean price is higher than the median price, indicating a right-skewed distribution.")

# Visualize the comparison in a clear and simple way.


Mean Price: 7583.771884897507 rupees
Median Price: 6499.0 rupees
The mean price is higher than the median price, indicating a right-skewed distribution.


**Reflection:** What does the difference between mean and median tell you about the data?
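
One simple way to show the gap is a histogram with the mean and median drawn as vertical lines. This is a sketch under the assumption that hiding the top 1% of prices keeps the chart readable; `plot_mean_vs_median` is a hypothetical helper, and the sample skewness from `Series.skew()` is included only as a rough numeric check:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_mean_vs_median(prices, upper_quantile=0.99):
    """Histogram of prices with vertical lines at the mean and the median."""
    prices = prices.dropna()
    mean_price, median_price = prices.mean(), prices.median()
    # Hide extreme values in the histogram only; the lines use the full data.
    visible = prices[prices <= prices.quantile(upper_quantile)]
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.hist(visible, bins=50, alpha=0.6)
    ax.axvline(mean_price, color="red", linestyle="--", label=f"Mean: {mean_price:,.0f}")
    ax.axvline(median_price, color="green", label=f"Median: {median_price:,.0f}")
    ax.set_xlabel("Price (rupees)")
    ax.legend()
    return ax

# Synthetic prices for illustration only; in the notebook, pass
# data['Price (in rupees)'] instead.
toy_prices = pd.Series([4000.0, 5000.0, 6000.0, 7000.0, 50000.0])
ax = plot_mean_vs_median(toy_prices)
skewness = toy_prices.skew()  # positive values suggest a longer right tail
```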

### Floor vs Price Relationship

In [23]:
# Explore the relationship between Floor number and Price.
floor_price_relationship = data.groupby('Floor')['Price (in rupees)'].mean()
print("Average Price by Floor Number:", floor_price_relationship)

# Create a visualization to represent this relationship.

Average Price by Floor Number: Floor
1                           6231.461538
1 out of 1                 11672.562154
1 out of 10                 8167.762932
1 out of 11                 5579.812577
1 out of 12                 5845.949060
                               ...     
Upper Basement out of 5     5201.254053
Upper Basement out of 6     8590.000000
Upper Basement out of 7     9794.692308
Upper Basement out of 8     4894.800000
Upper Basement out of 9    10521.000000
Name: Price (in rupees), Length: 947, dtype: float64


**Task:** Describe any patterns or trends you observe.

***Hint:** Think about which plots are best for showing relationships between two numeric variables.*
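
The groupby above produces 947 groups because `Floor` holds strings such as `5 out of 9`, not numbers. A minimal sketch of splitting that text into a numeric floor level and a building height; the numeric codes for non-numeric labels are assumptions (only "Upper Basement" is visible in the output above, while "Ground" and "Lower Basement" are guesses), and `parse_floor` and `add_floor_columns` are hypothetical helpers:

```python
import pandas as pd

# Assumed numeric codes for text labels; adjust after checking data['Floor'].unique().
LABEL_TO_LEVEL = {"ground": 0, "upper basement": -1, "lower basement": -2}

def parse_floor(value):
    """Return (floor_level, total_floors); either is None when not parseable."""
    if not isinstance(value, str):
        return (None, None)
    level_part, _, total_part = value.partition(" out of ")
    level_part = level_part.strip().lower()
    if level_part.isdigit():
        level = int(level_part)
    else:
        level = LABEL_TO_LEVEL.get(level_part)
    total_part = total_part.strip()
    total = int(total_part) if total_part.isdigit() else None
    return (level, total)

def add_floor_columns(df):
    """Return a copy of df with numeric floor_level and total_floors columns."""
    parsed = [parse_floor(v) for v in df["Floor"]]
    out = df.copy()
    out["floor_level"] = pd.Series([p[0] for p in parsed], index=df.index, dtype="float64")
    out["total_floors"] = pd.Series([p[1] for p in parsed], index=df.index, dtype="float64")
    return out

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Floor": ["5 out of 9", "1", "Upper Basement out of 5", None, "5 out of 12"],
    "Price (in rupees)": [6700.0, 4500.0, 5200.0, 4000.0, 7300.0],
})
parsed = add_floor_columns(toy)
median_by_level = parsed.groupby("floor_level")["Price (in rupees)"].median()
print(median_by_level)
```

With `floor_level` numeric, a scatter or line plot of price against floor becomes possible, which matches the hint about two numeric variables.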

### Average Carpet Area by Location

In [24]:
# Calculate the average Carpet Area for each location.
average_carpet_area_by_location = data.groupby('location')['Carpet Area'].mean()
print("Average Carpet Area by Location:", average_carpet_area_by_location)

# Visualize the results to compare locations.

Average Carpet Area by Location: location
agra             1972.780000
ahmadnagar        935.200000
ahmedabad         984.767765
allahabad         960.827778
aurangabad       1027.766667
                    ...     
varanasi          993.059259
vijayawada       1060.454386
visakhapatnam    1109.985556
vrindavan         789.166667
zirakpur         1175.022222
Name: Carpet Area, Length: 81, dtype: float64


***Note:** Focus on clarity and readability of your chart.*

**Reflection:** Which locations tend to have larger flats on average?
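
A single extreme value, such as the 709222 sqft maximum reported earlier, can pull one location's mean far above the rest. The sketch below assumes that comparing the median next to the mean is a fair way to check for this; `carpet_area_by_location`, the city names, and the toy values are all hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

def carpet_area_by_location(df):
    """Mean, median, and row count of Carpet Area per location, sorted by median."""
    stats = df.groupby("location")["Carpet Area"].agg(["mean", "median", "count"])
    return stats.sort_values("median", ascending=False)

# Synthetic rows for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "location": ["city_a", "city_a", "city_a", "city_b", "city_b"],
    "Carpet Area": [900.0, 1000.0, 709222.0, 920.0, 1100.0],
})
stats = carpet_area_by_location(toy)

fig, ax = plt.subplots(figsize=(6, 3))
top = stats.head(10)
ax.barh(top.index[::-1], top["median"].values[::-1])  # largest median at the top
ax.set_xlabel("Median Carpet Area (sqft)")
ax.set_title("Median Carpet Area by location")
print(stats)
```

In the toy data, `city_a` has the larger mean but the smaller median, which is the pattern to look for in the real results.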

## Summary & Insights

For each task above, write a short summary including:

- What you did

- Which visualization you chose and why

- One or two insights you discovered from the data

### Final Note

This dataset is part of a Mini Data Analysis Project.

You are encouraged to:

- Add your own questions based on the data

- Create additional visualizations

- Explore relationships that you personally find interesting

***⚠️ Bonus:** Adding meaningful questions and visualizations will be considered in your task evaluation.*