**Problem Statement**

This notebook trains a regression model to predict house prices in the Boston area from features such as crime rate, number of rooms, and other neighbourhood attributes. The model learns the relationship between these input variables and the corresponding house prices, so it can estimate the price of a house from its features. Such estimates can help real estate professionals, homeowners, and potential buyers with pricing decisions, property valuation, and investment analysis in the Boston housing market.

In [None]:
# Importing Dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import metrics
from xgboost import XGBRegressor

In [None]:
# Loading the dataset as pandas DataFrame
house_dataset = pd.read_csv('/content/BostonHousing.csv')

In [None]:
# Checking the first 5 rows of the DataFrame
house_dataset.head()

,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,price
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


In [None]:
# Checking the number of rows and columns in the DataFrame
house_dataset.shape

(506, 14)

In [None]:
# Checking the statistical measures
house_dataset.describe()

,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,price
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,3.613524,11.363636,11.136779,0.06917,0.554695,6.284634,68.574901,3.795043,9.549407,408.237154,18.455534,356.674032,12.653063,22.532806
std,8.601545,23.322453,6.860353,0.253994,0.115878,0.702617,28.148861,2.10571,8.707259,168.537116,2.164946,91.294864,7.141062,9.197104
min,0.00632,0.0,0.46,0.0,0.385,3.561,2.9,1.1296,1.0,187.0,12.6,0.32,1.73,5.0
25%,0.082045,0.0,5.19,0.0,0.449,5.8855,45.025,2.100175,4.0,279.0,17.4,375.3775,6.95,17.025
50%,0.25651,0.0,9.69,0.0,0.538,6.2085,77.5,3.20745,5.0,330.0,19.05,391.44,11.36,21.2
75%,3.677083,12.5,18.1,0.0,0.624,6.6235,94.075,5.188425,24.0,666.0,20.2,396.225,16.955,25.0
max,88.9762,100.0,27.74,1.0,0.871,8.78,100.0,12.1265,24.0,711.0,22.0,396.9,37.97,50.0


In [None]:
# Checking if there are any null values
house_dataset.isnull().sum()

crim       0
zn         0
indus      0
chas       0
nox        0
rm         0
age        0
dis        0
rad        0
tax        0
ptratio    0
b          0
lstat      0
price      0
dtype: int64

Separating the features from the labels


In [None]:
# Features: every column except the target
x = house_dataset.drop(columns='price')
# Label: the house price to predict
y = house_dataset['price']

Splitting the training and testing data

In [None]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
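
As a rough sketch of what the split does (not scikit-learn's actual implementation, and the row order will differ from what `train_test_split` produces), a seeded shuffle of the row positions can be cut into a test part and a training part. The helper name `split_indices` is hypothetical, and rounding the test share up is an assumption chosen to mirror scikit-learn's behaviour.

```python
import math
import numpy as np

def split_indices(n_rows, test_size=0.2, seed=2):
    """Return (train_idx, test_idx) from a seeded shuffle of row positions."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_test = math.ceil(test_size * n_rows)  # assumption: round the test share up
    return shuffled[n_test:], shuffled[:n_test]

# 506 rows, as reported by house_dataset.shape
train_idx, test_idx = split_indices(506)
print(len(train_idx), len(test_idx))  # 404 102
```

Because the seed is fixed, rerunning the cell gives the same partition, which keeps the reported scores comparable between runs.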

Model building and Training


XGBoost Regressor

In [None]:
model = XGBRegressor()

In [None]:
# Training the model with training data
model.fit(x_train,y_train)
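
To give an intuition for what a gradient-boosted regressor does during `fit`, here is a heavily simplified sketch: each round fits a one-split tree (a "stump") to the current residuals and adds a scaled copy of it to the prediction. This is not XGBoost's algorithm, which uses deeper trees, regularization, and second-order gradient information. All function names are hypothetical, and the data is synthetic rather than the housing dataset.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(x.shape[1]):
        # Skip the largest value so both sides of the split are non-empty
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    j, t, lv, rv = stump
    return np.where(x[:, j] <= t, lv, rv)

def fit_boosting(x, y, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps

def predict_boosting(base, stumps, x, learning_rate=0.3):
    pred = np.full(x.shape[0], base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, x)
    return pred

# Synthetic data: a nonlinear target with a little noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(size=(100, 3))
y_demo = 3 * x_demo[:, 0] + np.sin(6 * x_demo[:, 1]) + rng.normal(scale=0.1, size=100)

base, stumps = fit_boosting(x_demo, y_demo)
baseline_mse = np.mean((y_demo - base) ** 2)
boosted_mse = np.mean((y_demo - predict_boosting(base, stumps, x_demo)) ** 2)
print(baseline_mse, boosted_mse)
```

The training error of the boosted model falls below that of the constant mean prediction, which is the same mechanism that lets a boosted model fit its training data very closely.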

In [None]:
# R² score on the training data
training_prediction = model.predict(x_train)
score_1 = metrics.r2_score(y_train, training_prediction)

In [None]:
print("Model's R² score on training data is:", score_1)

Model's R² score on training data is: 0.9999980039471451


In [None]:
# R² score on the testing data
testing_prediction = model.predict(x_test)
score_2 = metrics.r2_score(y_test, testing_prediction)

In [None]:
print("Model's R² score on testing data is:", score_2)

Model's R² score on testing data is: 0.9051721149855378
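
The training and test scores above are computed with `metrics.r2_score`, the coefficient of determination R², which is a regression metric rather than a classification accuracy. The gap between roughly 1.0 on the training data and roughly 0.905 on the test data suggests the model fits the training rows much more closely than unseen ones. As a minimal sketch of what the metric measures, R² can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The helper `r2_by_hand` is hypothetical, and the predictions in the example are made up for illustration.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after prediction
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [24.0, 21.6, 34.7, 33.4, 36.2]   # prices from the first five rows
y_pred = [25.0, 20.6, 33.7, 34.4, 35.2]   # made-up predictions, each off by 1.0
r2 = r2_by_hand(y_true, y_pred)
print(r2)
```

A score of 1.0 means perfect predictions, and a score of 0.0 means the model does no better than always predicting the mean price.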


Making our own Prediction System

In [None]:
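# Sample feature values for the prediction system (the first row of the dataset),
# in the same column order as the training features
input_data = [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296, 15.3, 396.90, 4.98]
input_array = np.asarray(input_data)
# Reshape to a single row, since the model expects a 2D array of samples
input_array_reshaped = input_array.reshape(1, -1)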

In [None]:
# Calculating prediction
prediction = model.predict(input_array_reshaped)
print(prediction)

[23.99494]
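
The reshape step matters because the model expects a 2D array with one row per sample. The sketch below shows the shapes before and after, then compares the printed prediction with the `price` value of 24.0 in the first row of the dataset. The variable names are illustrative. This row may well have been part of the training split, so the small error says little about accuracy on unseen houses.

```python
import numpy as np

features = [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.09, 1, 296, 15.3, 396.9, 4.98]
row = np.asarray(features)
print(row.shape)      # (13,): a flat vector of 13 feature values

batch = row.reshape(1, -1)
print(batch.shape)    # (1, 13): one sample with 13 features

predicted, actual = 23.99494, 24.0   # values taken from the outputs above
abs_error = abs(predicted - actual)
print(abs_error)
```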
