## Bangalore House Price Prediction with K-Nearest Neighbors (KNN) Regression

### Introduction:

Predicting house prices in Bangalore's fast-moving real estate market is both a challenge and an opportunity. This notebook applies K-Nearest Neighbors (KNN) regression to the problem. The dataset includes area type, availability, location, size, society, total square footage, number of bathrooms, number of balconies, and the target variable, house price.

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")



**Data Collection:**<br>
The foundation of our project is a comprehensive dataset capturing diverse features influencing house prices in Bangalore. Features range from categorical aspects like area type and availability to numerical metrics like total square footage, bathrooms, and balconies. The dataset is labeled with the corresponding house prices.

In [2]:
df = pd.read_csv("Bengaluru_House_Data.csv")

In [3]:
df.head(5)

|  | area_type | availability | location | size | society | total_sqft | bath | balcony | price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 2 BHK | Coomee | 1056 | 2.0 | 1.0 | 39.07 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 4 Bedroom | Theanmp | 2600 | 5.0 | 3.0 | 120.0 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 3 BHK | NaN | 1440 | 2.0 | 3.0 | 62.0 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 3 BHK | Soiewre | 1521 | 3.0 | 1.0 | 95.0 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 2 BHK | NaN | 1200 | 2.0 | 1.0 | 51.0 |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13320 entries, 0 to 13319
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   area_type     13320 non-null  object 
 1   availability  13320 non-null  object 
 2   location      13319 non-null  object 
 3   size          13304 non-null  object 
 4   society       7818 non-null   object 
 5   total_sqft    13320 non-null  object 
 6   bath          13247 non-null  float64
 7   balcony       12711 non-null  float64
 8   price         13320 non-null  float64
dtypes: float64(3), object(6)
memory usage: 936.7+ KB


In [5]:
df.describe().T

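|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| bath | 13247.0 | 2.69261 | 1.341458 | 1.0 | 2.0 | 2.0 | 3.0 | 40.0 |
| balcony | 12711.0 | 1.584376 | 0.817263 | 0.0 | 1.0 | 2.0 | 2.0 | 3.0 |
| price | 13320.0 | 112.565627 | 148.971674 | 8.0 | 50.0 | 72.0 | 120.0 | 3600.0 |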


**Data Preprocessing:**<br>
The data must be cleaned before KNN regression can be applied. This involves handling missing values, encoding categorical variables, and scaling numerical features. Scaling matters because KNN relies on distances between points, so features with large ranges would otherwise dominate the result.

In [6]:
df.isna().sum()

area_type          0
availability       0
location           1
size              16
society         5502
total_sqft         0
bath              73
balcony          609
price              0
dtype: int64

In [7]:
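# `methods` is a helper module whose source is not shown in this notebook.
# catconsep returns two lists of column names: categorical and continuous.
import methods
cat, con = methods.catconsep(df)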

In [8]:
cat

['area_type', 'availability', 'location', 'size', 'society', 'total_sqft']

In [9]:
con

['bath', 'balcony', 'price']
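The `cat` list above contains `total_sqft`, because pandas read that column as `object` rather than as a number. One way to handle this is to parse the column before modelling. The sketch below is only an illustration: the helper name `to_sqft` is hypothetical, and the idea that some entries are ranges such as `"2100 - 2850"` or carry unit suffixes is an assumption about the raw data, not something shown in this notebook.

```python
import math

def to_sqft(value):
    """Convert a total_sqft entry to a float, or NaN if it cannot be parsed."""
    text = str(value).strip()
    # Assumed range format such as "2100 - 2850": use the midpoint.
    parts = [p.strip() for p in text.split(" - ")]
    try:
        numbers = [float(p) for p in parts]
    except ValueError:
        # Anything that is not a plain number or a two-number range.
        return math.nan
    if len(numbers) == 1:
        return numbers[0]
    if len(numbers) == 2:
        return (numbers[0] + numbers[1]) / 2
    return math.nan

# Illustrative inputs, not rows taken from the dataset.
samples = ["1056", "2600", "2100 - 2850", "34.46Sq. Meter"]
parsed = [to_sqft(s) for s in samples]
print(parsed)
```

In the notebook this could be applied with `df["total_sqft"] = df["total_sqft"].map(to_sqft)`, after which the column would be treated as continuous and any unparsed entries would need to be imputed or dropped.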

In [10]:
# replacer comes from the same helper module and fills missing values in place.
from methods import replacer
replacer(df)

In [11]:
df.isna().sum()

area_type          0
availability       0
location           1
size              16
society         5502
total_sqft         0
bath               0
balcony            0
price              0
dtype: int64
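The source of `methods` is not included, so the exact behaviour of `catconsep` and `replacer` is unknown. The output above shows that `bath` and `balcony` were filled while `location`, `size`, and `society` still have missing values. A minimal sketch of helpers that would behave this way, assuming a dtype-based split and mean imputation for numeric columns, might look like this (the function bodies are hypothetical, not the author's code):

```python
import numpy as np
import pandas as pd

def catconsep(frame):
    """Return (categorical, continuous) column-name lists based on dtype."""
    con = [c for c in frame.columns if pd.api.types.is_numeric_dtype(frame[c])]
    cat = [c for c in frame.columns if c not in con]
    return cat, con

def replacer(frame):
    """Fill missing values in numeric columns with the column mean, in place."""
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            frame[col] = frame[col].fillna(frame[col].mean())

# Small illustrative frame, not taken from the dataset.
toy = pd.DataFrame({
    "location": ["Whitefield", np.nan, "Kothanur"],
    "bath": [2.0, np.nan, 4.0],
    "price": [51.0, 62.0, 95.0],
})

cat_cols, con_cols = catconsep(toy)
replacer(toy)
print(cat_cols, con_cols)
print(toy)
```

Mean imputation is consistent with the value 1.584376 (the mean of `balcony`) appearing in the feature table later in the notebook, but whether `bath` was filled the same way cannot be confirmed from the outputs.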

In [12]:
# Features: every column except the target (price), society, and size
X = df.drop(columns=['price', 'society', 'size'])
y = df['price']

In [13]:
X

|  | area_type | availability | location | total_sqft | bath | balcony |
|---|---|---|---|---|---|---|
| 0 | Super built-up Area | 19-Dec | Electronic City Phase II | 1056 | 2.0 | 1.000000 |
| 1 | Plot Area | Ready To Move | Chikka Tirupathi | 2600 | 5.0 | 3.000000 |
| 2 | Built-up Area | Ready To Move | Uttarahalli | 1440 | 2.0 | 3.000000 |
| 3 | Super built-up Area | Ready To Move | Lingadheeranahalli | 1521 | 3.0 | 1.000000 |
| 4 | Super built-up Area | Ready To Move | Kothanur | 1200 | 2.0 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 13315 | Built-up Area | Ready To Move | Whitefield | 3453 | 4.0 | 0.000000 |
| 13316 | Super built-up Area | Ready To Move | Richards Town | 3600 | 5.0 | 1.584376 |
| 13317 | Built-up Area | Ready To Move | Raja Rajeshwari Nagar | 1141 | 2.0 | 1.000000 |
| 13318 | Super built-up Area | 18-Jun | Padmanabhanagar | 4689 | 4.0 | 1.000000 |


In [14]:
from methods import preprocessing
Xnew = preprocessing(X)
Xnew

Unnamed: 0,bath,balcony,area_type_Built-up Area,area_type_Carpet Area,area_type_Plot Area,area_type_Super built-up Area,availability_14-Jul,availability_14-Nov,availability_15-Aug,availability_15-Dec,...,total_sqft_990,total_sqft_991,total_sqft_992,total_sqft_993,total_sqft_994,total_sqft_995,total_sqft_996,total_sqft_997,total_sqft_998,total_sqft_999
0,-0.517751,-7.319972e-01,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1.724859,1.773231e+00,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,-0.517751,1.773231e+00,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0.229786,-7.319972e-01,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,-0.517751,-7.319972e-01,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13315,0.977323,-1.984611e+00,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13316,1.724859,-2.781362e-16,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13317,-0.517751,-7.319972e-01,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13318,0.977323,-7.319972e-01,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
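The `preprocessing` helper is also not shown. Its output suggests that numeric columns were standardized and every text column was one-hot encoded, including `total_sqft` (note the `total_sqft_990` to `total_sqft_999` columns). A comparable pipeline can be sketched with scikit-learn's `ColumnTransformer`. This is an assumption about what the helper does rather than its actual code, and the toy frame below treats `total_sqft` as already numeric.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small illustrative frame with total_sqft already converted to numbers.
toy = pd.DataFrame({
    "area_type": ["Plot Area", "Built-up Area", "Plot Area", "Carpet Area"],
    "total_sqft": [2600.0, 1440.0, 1200.0, 1056.0],
    "bath": [5.0, 2.0, 2.0, 2.0],
})

pre = ColumnTransformer([
    # Standardize numeric features so no single one dominates the distance.
    ("num", StandardScaler(), ["total_sqft", "bath"]),
    # One indicator column per category; unseen categories encode as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["area_type"]),
])

encoded = pre.fit_transform(toy)
print(encoded.shape)  # 2 scaled columns + 3 area_type indicators
```

Keeping `total_sqft` numeric would replace thousands of indicator columns with a single scaled feature, which is usually a better fit for a distance-based model.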


#### KNN Regression Model:
**Training the KNN Regression Model:**<br>
The data is split into training and test sets (80/20), and a KNN regressor is fitted on the training portion. KNN predicts a house's price by averaging the prices of the k closest training points in feature space, so the choice of k (the number of neighbors) directly affects the model's accuracy.
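To make the neighbor idea concrete, here is a minimal NumPy sketch of the prediction rule, assuming Euclidean distance and an unweighted average (the defaults of `KNeighborsRegressor`). The function name `knn_predict` and the toy data are illustrative only.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Euclidean distance from the query to every training row.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

# Toy data: one feature, four training houses.
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])

prediction = knn_predict(X_train, y_train, np.array([2.5]), k=2)
print(prediction)  # 25.0, the mean of the targets at x=2 and x=3
```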


In [15]:
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(Xnew, y, test_size=0.2, random_state=31)

In [16]:
xtrain.shape

(10656, 3509)

In [17]:
ytrain.shape

(10656,)

In [18]:
xtest.shape

(2664, 3509)

In [19]:
ytest.shape

(2664,)

In [20]:
from sklearn.neighbors import KNeighborsRegressor

# Default settings use n_neighbors=5
knn_res = KNeighborsRegressor()
knn_res.fit(xtrain, ytrain)

KNeighborsRegressor()

In [21]:
knn_res.score(xtest,ytest)

0.4270419826886519

In [22]:
knn_res.score(xtrain,ytrain)

0.5789438493502366

In [23]:
# Refit with k=6 instead of the default k=5; score() returns R^2
knn_res = KNeighborsRegressor(n_neighbors=6)
knn_res.fit(xtrain, ytrain)
knn_res.score(xtest, ytest)


0.4489336076628193

In [24]:
knn_res.score(xtrain,ytrain)

0.5425268904554567
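The cells above compare the default k = 5 with k = 6 by hand and score each on the test set. A more systematic option is a cross-validated search over k on the training data only. The sketch below uses synthetic data from `make_regression` so that it runs on its own; the grid of 1 to 15 neighbors and the variable names are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the housing features and prices.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=31
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=31
)

# Try k = 1..15 with 5-fold cross-validation on the training split.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="r2",
)
search.fit(X_tr, y_tr)

best_k = search.best_params_["n_neighbors"]
print(best_k, search.best_score_)
# The test split is used once, after k has been chosen.
print(search.score(X_te, y_te))
```

In the notebook, the same search could be run by calling `search.fit(xtrain, ytrain)`, which keeps `xtest` untouched until the final evaluation.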

**Price Predictions and Evaluation:**<br>
Use the fitted KNN model to predict prices for the held-out test set, then evaluate the predictions with R², Mean Absolute Error (MAE), and Mean Squared Error (MSE).

In [25]:
y_pred = knn_res.predict(xtest)
y_pred

array([ 76.93333333, 337.        ,  85.48916667, ...,  99.5       ,
       114.83333333,  62.91833333])

In [26]:
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
r2_score(ytest, y_pred)

0.4489336076628193

In [27]:
mean_absolute_error(ytest, y_pred)

40.11484484484484

In [28]:
mean_squared_error(ytest,y_pred)

8436.776271074199
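The root mean squared error (RMSE) is often easier to interpret than MSE because it is expressed in the same units as `price`. A minimal sketch, starting from the MSE value reported above:

```python
import math

# MSE reported by mean_squared_error(ytest, y_pred) in the cell above.
mse = 8436.776271074199

# RMSE is the square root of MSE.
rmse = math.sqrt(mse)
print(round(rmse, 2))  # about 91.85
```

With the notebook's variables, the equivalent is `np.sqrt(mean_squared_error(ytest, y_pred))`. The RMSE is more than twice the MAE of about 40.1, which suggests that a small number of large errors contributes heavily to the squared loss.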

**Conclusion:**<br>
This project applied KNN regression to Bangalore house prices. With k = 6, the model reaches an R² of about 0.45 on the test set and 0.54 on the training set, with a mean absolute error of about 40.1 in the units of `price` and a mean squared error of about 8436.8. It therefore explains a little under half of the variance in price, so it is best read as a baseline rather than an accurate predictor. One likely contributor is that `total_sqft` is stored as text and ends up one-hot encoded, which leaves the model with 3509 mostly binary features. With better feature handling and a tuned k, data-driven models like this one can *help both buyers and sellers make more informed decisions.*