# House Pricing Analysis & Prediction

# Team Members

| Name  | Student ID | Steps Performed |
| :- | -: | :- |
| Sai Kumar Adulla   | C0863741 | Data Selection, GitHub Repository and Cloud Deployment |
| Jenny Jitender Joshi | C0862907 | Data Cleaning, Data Exploration and Flask Web Application |
| Kanika Kataria  | C0866652 | Data Exploration, Feature Engineering and Flask Web Application |
| Christin Paul | C0863254 | Feature Engineering, Model Selection and GitHub Repository |
| Abbas Ismail | C0867092 | Hyperparameter Tuning, Pickle Files and Cloud Deployment |


## Importing All Dependencies & Dataset

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone
import warnings

# Suppress all library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Why We Selected This Dataset

- **Real-world Relevance:**
  - Focuses on the Bangalore real estate market.
  - Has practical applications for investors and homebuyers.

- **Data Size and Diversity:**
  - 13,320 records with diverse features.
  - Includes area type, availability, location, size, society, and more.

- **Prediction Challenge:**
  - Complex relationships influence property prices.
  - A challenging problem for predictive modeling.

- **Data Quality Issues:**
  - Null values in multiple columns.
  - Reflects real-world data scenarios.

- **Economic Indicators:**
  - Property prices serve as indicators of economic health.
  - Offers insights into regional economic trends.

- **Practical Significance:**
  - Affects real estate developers, investors, and individuals.
  - Aids decision-making in property transactions.

- **Interdisciplinary Aspect:**
  - Combines data analysis, cleaning, and predictive modeling.
  - Applies skills in statistics, machine learning, and domain knowledge.

- **Data Types:**
  - Mix of categorical and numerical data.
  - Calls for diverse preprocessing and modeling techniques.

- **Data Exploration Opportunities:**
  - Variety of features for exploratory data analysis.
  - Room to uncover patterns, trends, and correlations.

- **Potential Business Impact:**
  - Predicted property prices influence decisions.
  - Significant impact on business and investment strategies.


In [27]:
# Load the dataset from the CSV file into a DataFrame
house = pd.read_csv('House Prices.csv')

# Output a message indicating the dataset for house pricing
print("House Pricing Dataset:")

# Display the contents of the dataset using the 'display' function
display(house)

House Pricing Dataset:


| | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
| -: | :- | :- | :- | :- | :- | -: | -: | -: | -: |
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 2 BHK | Coomee | 1056 | 2.0 | 1.0 | 39.07 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 4 Bedroom | Theanmp | 2600 | 5.0 | 3.0 | 120.00 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 3 BHK | NaN | 1440 | 2.0 | 3.0 | 62.00 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 3 BHK | Soiewre | 1521 | 3.0 | 1.0 | 95.00 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 2 BHK | NaN | 1200 | 2.0 | 1.0 | 51.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13315 | Built-up Area | Ready To Move | Whitefield | 5 Bedroom | ArsiaEx | 3453 | 4.0 | 0.0 | 231.00 |
| 13316 | Super built-up Area | Ready To Move | Richards Town | 4 BHK | NaN | 3600 | 5.0 | NaN | 400.00 |
| 13317 | Built-up Area | Ready To Move | Raja Rajeshwari Nagar | 2 BHK | Mahla T | 1141 | 2.0 | 1.0 | 60.00 |
| 13318 | Super built-up Area | 18-Jun | Padmanabhanagar | 4 BHK | SollyCl | 4689 | 4.0 | 1.0 | 488.00 |
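
The `size` column stores the bedroom count as text ("2 BHK", "4 Bedroom"). One possible way to turn it into a number is sketched below. The helper name `bedrooms_from_size` is hypothetical (it is not part of this notebook), and the sketch assumes the count is always the first token of the string.

```python
import math


def bedrooms_from_size(size):
    """Extract the leading integer from strings such as '2 BHK' or '4 Bedroom'.

    Returns None for missing values or text without a leading number.
    """
    # pandas represents missing values as float NaN; treat None the same way.
    if size is None or (isinstance(size, float) and math.isnan(size)):
        return None
    tokens = str(size).split()
    if tokens and tokens[0].isdigit():
        return int(tokens[0])
    return None


samples = ["2 BHK", "4 Bedroom", "3 BHK", None, float("nan")]
print([bedrooms_from_size(s) for s in samples])
```

On the DataFrame, this could be applied with `house['size'].apply(bedrooms_from_size)`; where to store the result is left open.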


# Reviewing the Dataset

In [28]:
# Print a message displaying the counts of null values
print("Null Value Counts:")

# Use the 'isnull().sum()' method to show the sum of null values for each column in the 'house' dataset
display(house.isnull().sum())

Null Value Counts:


area_type          0
availability       0
location           1
size              16
society         5502
total_sqft         0
bath              73
balcony          609
price              0
dtype: int64
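
Raw null counts are easier to judge as a share of the 13,320 rows. The sketch below uses only the counts printed above; the names `null_counts` and `missing_share` are illustrative and do not come from the notebook.

```python
# Null counts copied from the output above; the row count is the dataset size.
TOTAL_ROWS = 13320
null_counts = {
    "area_type": 0, "availability": 0, "location": 1, "size": 16,
    "society": 5502, "total_sqft": 0, "bath": 73, "balcony": 609, "price": 0,
}


def missing_share(counts, total):
    """Return {column: percent missing}, sorted from most to least missing."""
    shares = {col: 100.0 * n / total for col, n in counts.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


shares = missing_share(null_counts, TOTAL_ROWS)
for column, pct in shares.items():
    if pct > 0:
        print(f"{column:<12} {pct:6.2f}%")
```

`society` is missing in roughly 41% of rows, while every other column is below 5%. On the DataFrame itself, `house.isnull().mean() * 100` yields the same percentages directly.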

In [29]:
# Print a message providing a description of the dataset
print("Dataset Description:")

# Use the 'describe()' method to display the statistical summary of the 'house' dataset
display(house.describe())

Dataset Description:


| | bath | balcony | price |
| :- | -: | -: | -: |
| count | 13247.0 | 12711.0 | 13320.0 |
| mean | 2.69261 | 1.584376 | 112.565627 |
| std | 1.341458 | 0.817263 | 148.971674 |
| min | 1.0 | 0.0 | 8.0 |
| 25% | 2.0 | 1.0 | 50.0 |
| 50% | 2.0 | 2.0 | 72.0 |
| 75% | 3.0 | 2.0 | 120.0 |
| max | 40.0 | 3.0 | 3600.0 |
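
The summary hints at outliers: the maximum `price` (3600) and `bath` (40) sit far above their 75th percentiles. A common rule of thumb, Tukey's 1.5 × IQR fences, can be computed from the quartiles printed above. This is only an illustrative sketch: the notebook does not state that this rule is used, and `iqr_fences` is a hypothetical helper.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the describe() output above.
price_low, price_high = iqr_fences(q1=50.0, q3=120.0)
bath_low, bath_high = iqr_fences(q1=2.0, q3=3.0)

print(f"price fences: ({price_low}, {price_high})")  # (-55.0, 225.0)
print(f"bath fences:  ({bath_low}, {bath_high})")    # (0.5, 4.5)
```

Under this rule, any price above 225 or bath count above 4.5 would be flagged. Whether such rows are data errors or genuinely large properties needs a domain check before anything is dropped.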


In [30]:
# Print a message conveying dataset information
print("Dataset Information:")

# 'info()' prints its summary directly and returns None, so call it without 'display'
house.info()

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13320 entries, 0 to 13319
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   area_type     13320 non-null  object 
 1   availability  13320 non-null  object 
 2   location      13319 non-null  object 
 3   size          13304 non-null  object 
 4   society       7818 non-null   object 
 5   total_sqft    13320 non-null  object 
 6   bath          13247 non-null  float64
 7   balcony       12711 non-null  float64
 8   price         13320 non-null  float64
dtypes: float64(3), object(6)
memory usage: 936.7+ KB
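
`total_sqft` is reported as `object` even though the sample rows hold numbers, which suggests that some entries are not plain numerals. A hedged sketch of a converter follows. It assumes, without confirmation from the output above, that non-numeric entries may be ranges written like `"1133 - 1384"`; the helper name `sqft_to_float` is hypothetical.

```python
def sqft_to_float(value):
    """Convert a total_sqft entry to float.

    Plain numbers are parsed directly; 'low - high' ranges (an assumed
    format) become their midpoint; anything else returns None.
    """
    text = str(value).strip()
    try:
        return float(text)
    except ValueError:
        pass
    parts = text.split(" - ")
    if len(parts) == 2:
        try:
            low, high = float(parts[0]), float(parts[1])
        except ValueError:
            return None
        return (low + high) / 2
    return None


print(sqft_to_float("1056"))         # 1056.0
print(sqft_to_float("1133 - 1384"))  # 1258.5
print(sqft_to_float("n/a"))          # None
```

Taking the midpoint of a range is one design choice among several; inspecting the entries that fail plain `float` parsing first would show which formats actually occur.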

In [31]:
# Displaying the value counts for each column in the 'house' dataset
print("Checking the Value Counts of Each Column:")
for column in house.columns:
    print(house[column].value_counts())
    print('*' * 30)

Checking the Value Counts of Each Column:
area_type
Super built-up  Area    8790
Built-up  Area          2418
Plot  Area              2025
Carpet  Area              87
Name: count, dtype: int64
******************************
availability
Ready To Move    10581
18-Dec             307
18-May             295
18-Apr             271
18-Aug             200
                 ...  
15-Aug               1
17-Jan               1
16-Nov               1
16-Jan               1
14-Jul               1
Name: count, Length: 81, dtype: int64
******************************
location
Whitefield                        540
Sarjapur  Road                    399
Electronic City                   302
Kanakpura Road                    273
Thanisandra                       234
                                 ... 
Bapuji Layout                       1
1st Stage Radha Krishna Layout      1
BEML Layout 5th stage               1
singapura paradise                  1
Abshot Layout                       1
Name: count, 