# HOUSE PRICE PREDICTION PROJECT

![image.png](attachment:5bbb13bf-7716-490a-b45d-bd74b8511577.png)

**INTRODUCTION ABOUT THE PROJECT**

Welcome to the House Price Prediction Challenge. You will test your regression skills by designing an algorithm that accurately predicts house prices in India. Accurately predicting house prices can be a daunting task. Buyers are not concerned only with the size (square feet) of the house; various other factors play a key role in deciding the price of a house or property. It can be extremely difficult to figure out the right set of attributes that explain buyer behavior. This dataset has been collected from various property aggregators across India. In this competition, given the 12 influencing factors, your role as a data scientist is to predict the prices as accurately as possible.

This competition also leaves plenty of room for feature engineering and for practicing advanced regression techniques such as Random Forest, deep neural networks, and various ensembling techniques.

**IMPORTING ALL LIBRARIES**

In [1]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

**LOADING THE DATA**

In [4]:
csv_url1 = r"C:\Users\ASHUTOSH YADAV\OneDrive\Desktop\DATASET\msc_training_dataset.csv"
csv_url2 = r"C:\Users\ASHUTOSH YADAV\OneDrive\Desktop\DATASET\msc_testing_dataset.csv"

train = pd.read_csv(csv_url1)
test = pd.read_csv(csv_url2)

In [None]:
train

In [None]:
test

In [None]:
train.head()

In [None]:
train.shape

In [None]:
train.columns

In [None]:
# check the non-null count and dtype of each column
train.info()

In [None]:
# check the data type of each column
train.dtypes

In [None]:
# count the distinct values in each column
train.nunique()

In [None]:
# count the fully duplicated rows
train.duplicated().sum()

In [None]:
# drop duplicate rows, keeping the first occurrence of each
train.drop_duplicates(inplace=True)

In [None]:
train.duplicated().sum()
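The duplicate check and removal above operate on whole rows. A minimal sketch on a toy frame (the values are hypothetical, not taken from the dataset) shows the effect. Resetting the index afterwards is an optional extra step that the notebook itself does not perform.

```python
import pandas as pd

# Toy frame with hypothetical values; the third row repeats the first.
toy = pd.DataFrame({
    "room": [2, 3, 2, 4],
    "bathroom": [1, 2, 1, 2],
    "price": [50.0, 80.0, 50.0, 120.0],
})

# duplicated() flags rows that are identical to an earlier row.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes)
print(deduped.shape)
```

Because only exact row matches are removed, two houses that differ in any single column are both kept.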

In [None]:
train.shape

In [None]:
# show complete info of data set
train.info()

In [None]:
# summary statistics: count, mean, std, and the five-number summary
train.describe()

In [None]:
train.head()

In [None]:
train.sample(2)

In [None]:
pie = train.groupby('bathroom')['price'].mean().reset_index()
pie

# EDA (Exploratory Data Analysis)

In [None]:
sns.set_style('dark')

plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
sns.barplot(data=train,x='room',y='price',palette='rainbow')
plt.title('AVERAGE PRICE BY ROOM',fontweight='bold')

plt.subplot(1,2,2)
plt.pie(pie['price'],labels=pie['bathroom'],autopct="%0.2f%%")
plt.title('AVERAGE PRICE BY BATHROOM')

plt.show()

**INSIGHTS:**
1. The bar chart shows how the average house price depends on the number of rooms.
2. The pie chart shows each bathroom count's share of the summed average prices.
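The numbers behind both charts can be inspected directly. The sketch below uses a toy frame with hypothetical values: a group-wise mean gives the bar heights (seaborn's `barplot` aggregates with the mean by default), and dividing each mean by the sum of the means gives the percentages a pie chart would display.

```python
import pandas as pd

# Toy data with hypothetical values, standing in for the training set.
toy = pd.DataFrame({
    "room": [1, 1, 2, 2, 3],
    "price": [40.0, 60.0, 90.0, 110.0, 150.0],
})

# Mean price per group: the quantity drawn as bar heights.
avg_by_room = toy.groupby("room")["price"].mean()

# A pie chart normalises its inputs, so each wedge is mean / sum of means.
share = avg_by_room / avg_by_room.sum()

print(avg_by_room)
print((share * 100).round(2))
```

Note that the pie percentages are shares of the summed group averages, not shares of the houses in each group.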

In [None]:
plt.figure(figsize=(10,5))
sns.histplot(data=train,x='price',kde=True,color='red')
plt.title('Check The Frequency Of Price',fontweight='bold')
plt.show()

In [None]:
plt.figure(figsize=(10,5))
# 'viridis' is a palette name, not a single color, so pass it as palette
sns.histplot(data=train,x='price',kde=True,hue='kitchen',palette='viridis')
plt.title('Check The Frequency Of Price By Kitchen',fontweight='bold')
plt.show()

In [None]:
train.head()

In [None]:
train.nunique().reset_index().T

In [None]:
plt.figure(figsize=(36,27))
c = 1
for i in train:
    # skip the target and the bathroom/kitchen columns
    if i in ('bathroom','kitchen','price'):
        continue
    plt.subplot(3,3,c)
    sns.histplot(data=train,x='price',hue=i,kde=True)
    plt.title(f'Frequency of Price By {i}',fontweight='bold')
    c += 1
plt.show()

In [None]:
train.shape

In [None]:
train.head(3)

In [None]:
for i in train:
    if i == 'price':
        continue
    else:
        print('-'*40)
        val =  train[i].value_counts().reset_index()
        print(val)
        print('-'*40)

In [None]:
train.head()

In [None]:
train.corr()

In [None]:
plt.figure(figsize=(10,5))
sns.heatmap(train.corr(),annot=True)
plt.show()
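A heatmap can be hard to read when the goal is only to see which features track the target. One possible complement, sketched here on a toy frame (all values and the `age` column are hypothetical), is to pull the `price` column out of the correlation matrix and sort it by absolute value.

```python
import pandas as pd

# Toy data with hypothetical values; 'age' is an illustrative column.
toy = pd.DataFrame({
    "room": [1, 2, 3, 4, 5],
    "bathroom": [1, 1, 2, 2, 3],
    "age": [30, 5, 20, 10, 15],
    "price": [50.0, 90.0, 160.0, 200.0, 260.0],
})

# Pearson correlation of every feature with the target,
# strongest relationship (positive or negative) first.
corr_with_price = (
    toy.corr()["price"]
    .drop("price")
    .sort_values(key=abs, ascending=False)
)

print(corr_with_price)
```

The same expression should work on `train` in place of `toy`, provided every column is numeric.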

# LINEAR REGRESSION

In [None]:
train.head()

In [None]:
x_train = train.drop(columns = 'price')
y_train = train[['price']]

In [None]:
x_train

In [None]:
y_train

In [None]:
x_train.head()

In [None]:
y_train.head()

In [None]:
test.head()

In [None]:
x_test =  test.drop(columns = 'price')
y_test = test[['price']]

In [None]:
x_test.head()

In [None]:
y_test.head()

# BUILDING THE MACHINE LEARNING MODEL

In [None]:
linear = LinearRegression()
linear.fit(x_train,y_train)

In [None]:
# score() returns R² (coefficient of determination), not classification accuracy
ac = linear.score(x_train,y_train)
print(f'R² Score Of Training Dataset : {ac}')

In [None]:
# R² on the held-out test set
ac2 = linear.score(x_test,y_test)
print(f'R² Score Of Testing Dataset : {ac2}')
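For a regressor, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below checks this on synthetic data (the data-generating formula and the seed are assumptions made for illustration only) by computing R² by hand and comparing it with `LinearRegression.score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a noisy linear relationship (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(50, 2))
y = 10 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model.score(X, y))
```

An R² of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean price. On unseen data R² can also be negative.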

In [None]:
pred = linear.predict(x_test)
pred.T
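The error metrics imported at the top of the notebook can quantify how far predictions fall from the true prices. This is a minimal sketch on small hypothetical arrays. RMSE is taken here as the square root of `mean_squared_error`, so the snippet does not depend on `root_mean_squared_error`, which is only available in recent scikit-learn releases.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted prices, for illustration only.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = float(np.sqrt(mse))                  # back in the units of price
r2 = r2_score(y_true, y_pred)               # coefficient of determination

print(f"MAE  : {mae}")
print(f"MSE  : {mse}")
print(f"RMSE : {rmse}")
print(f"R2   : {r2}")
```

In the notebook, the same calls would take `y_test` and `pred` in place of the toy arrays.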