## Navigating Real Estate: Boston House Price Prediction with Random Forest Regression

**Introduction:**<br>
Boston's real estate market, with its diverse neighborhoods and housing stock, makes house price prediction a challenging problem. In this post we tackle it with data science and machine learning. Our tool of choice is the Random Forest Regression algorithm. Using a dataset with features such as crime rate, accessibility to highways, average number of rooms, and more, we aim to model the patterns that determine Boston house prices.

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt

In [2]:
import warnings
warnings.filterwarnings("ignore")

**Data Collection:**<br>
The foundation of our project is a dataset of features that potentially influence house prices in Boston, from crime rates to proximity to employment centers and accessibility to highways. Crucially, each row is labeled with the corresponding house price in the `MEDV` column (median home value, in thousands of dollars).

In [3]:
df = pd.read_csv("boston.csv")

In [4]:
df.head(5)

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296.0,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242.0,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242.0,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222.0,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222.0,18.7,396.9,5.33,36.2


**Data Preprocessing:**<br>
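A clean dataset is pivotal for accurate predictions. Typical preprocessing covers handling missing values and, for some algorithms, normalizing numerical features. Random forests split on feature thresholds, so they do not need normalized inputs; here the only step we take is to separate the feature columns (`x`) from the target column `MEDV` (`y`).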
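
As a minimal sketch of what a missing-value check and a feature/target split can look like, consider the snippet below. The small DataFrame `toy` is made up for illustration (it is not the Boston data), and filling with the column median is one assumed strategy among several, not something this notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (values are illustrative).
toy = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1, 6.9],
    "LSTAT": [4.98, 9.14, 4.03, 2.94],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})

# Count missing values in each column.
missing_counts = toy.isna().sum()
print(missing_counts)

# One possible fix (an assumption for this sketch): fill gaps with the column median.
toy_filled = toy.fillna(toy.median(numeric_only=True))

# Split into features (all columns but the last) and target (the MEDV column).
features = toy_filled.iloc[:, :-1]
target = toy_filled["MEDV"]
```

Running the same `isna().sum()` check on `df` would show whether any imputation is needed before training.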

In [5]:
x = df.iloc[:, :-1]  # all columns except the last one (MEDV)

In [6]:
x

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296.0,15.3,396.90,4.98
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242.0,17.8,396.90,9.14
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242.0,17.8,392.83,4.03
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222.0,18.7,394.63,2.94
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222.0,18.7,396.90,5.33
...,...,...,...,...,...,...,...,...,...,...,...,...,...
501,0.06263,0.0,11.93,0,0.573,6.593,69.1,2.4786,1,273.0,21.0,391.99,9.67
502,0.04527,0.0,11.93,0,0.573,6.120,76.7,2.2875,1,273.0,21.0,396.90,9.08
503,0.06076,0.0,11.93,0,0.573,6.976,91.0,2.1675,1,273.0,21.0,396.90,5.64
504,0.10959,0.0,11.93,0,0.573,6.794,89.3,2.3889,1,273.0,21.0,393.45,6.48


In [7]:
y = df["MEDV"]

In [16]:
type(y)

pandas.core.series.Series

**Training the Random Forest Regression Model:**<br>
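The Random Forest model is an ensemble of decision trees: each tree is trained on a bootstrap sample of the training data, and the forest's prediction is the average of the individual tree predictions. We split the dataset into training and testing sets (80% and 20%) and train a Random Forest Regression model with 21 trees. Averaging many trees reduces the variance of any single tree and helps capture non-linear patterns in the data.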
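
To make the ensemble idea concrete, here is a small sketch on synthetic data (not the Boston dataset). It fits a forest, predicts with each tree separately through the fitted `estimators_` attribute, and compares the average with the forest's own prediction. The names `X_demo`, `y_demo`, and `forest` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only, not the Boston dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

forest = RandomForestRegressor(n_estimators=21, random_state=31)
forest.fit(X_demo, y_demo)

# Predict with every individual tree, then average across trees.
per_tree = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare with the forest's own prediction for the same rows.
forest_prediction = forest.predict(X_demo[:5])
print(np.allclose(manual_average, forest_prediction))
```

With 21 trees, every forest prediction is a sum of 21 tree outputs divided by 21.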


In [9]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=31)

In [10]:
xtrain.shape

(404, 13)

In [11]:
ytest.shape

(102,)

In [12]:
xtest.shape

(102, 13)
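
The shapes above follow from simple arithmetic. The sketch below assumes, as I understand scikit-learn's behavior, that a fractional `test_size` is rounded up to a whole number of rows and the remaining rows go to the training set; the variable names are illustrative.

```python
import math

n_samples = 506   # total rows: 404 training rows + 102 test rows
test_size = 0.2

# Assumed rounding rule: the test count is rounded up,
# and the training set receives whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 404 102
```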

In [13]:
from sklearn.ensemble import RandomForestRegressor
reg_rf = RandomForestRegressor(n_estimators=21, random_state=31)
reg_rf.fit(xtrain,ytrain)

RandomForestRegressor(n_estimators=21, random_state=31)

**Price Predictions and Evaluation:**<br>
With a trained model in place, we apply it to the held-out test set to predict Boston house prices. The model's performance is evaluated with Mean Absolute Error, Mean Squared Error, and the R² score, which together indicate how accurate the predictions are.
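
For reference, the error metrics can be computed by hand in a few lines. In this sketch the true values are the first four `MEDV` entries from `df.head()`, while the predictions in `y_hat` are hypothetical numbers chosen for illustration. It also derives the Root Mean Squared Error, which is the square root of the Mean Squared Error.

```python
import numpy as np

# True prices: first four MEDV values from df.head(); predictions are hypothetical.
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_hat = np.array([22.0, 23.6, 30.7, 33.4])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, same units as MEDV

print(mae, mse, rmse)
```

Applied to the value 14.16 returned by `mean_squared_error` in this notebook, the square root is roughly 3.76, which is easier to compare with the prices themselves because it is in the same units as `MEDV`.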

In [14]:
y_pred = reg_rf.predict(xtest)

In [15]:
y_pred

array([34.91428571, 10.27619048, 22.1       , 19.97619048, 22.3047619 ,
       23.9       , 12.5952381 , 22.35714286, 21.94285714, 20.48571429,
        9.56190476, 19.14285714, 14.18571429, 16.99047619, 18.03333333,
       21.91904762, 21.24285714, 23.08095238, 24.16190476, 25.90952381,
       19.76666667, 15.49047619, 26.06666667, 34.97619048, 21.76666667,
       18.05238095, 19.04761905, 19.28095238, 18.61904762, 26.72857143,
       24.64285714, 14.5047619 , 14.85714286, 18.98095238, 22.71904762,
       24.64285714, 33.97619048, 11.27142857, 13.21428571, 49.2047619 ,
       11.22857143, 15.40952381, 13.95238095, 30.9       , 20.45714286,
       24.57142857, 26.55238095, 19.55238095, 23.34285714, 25.04285714,
       20.24761905, 17.8952381 , 15.07142857, 32.53809524, 24.16190476,
       20.92380952, 18.66190476, 16.16190476, 10.55714286, 24.11428571,
       15.42380952, 12.53333333, 22.44761905, 30.87142857,  9.27619048,
        7.87619048, 24.5047619 , 14.62380952, 35.86666667, 21.82

In [29]:
#evaluate model

In [30]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [31]:
print("mean absolute error :", mean_absolute_error(ytest,y_pred))


mean absolute error : 2.3811391223155938


In [32]:
print("mean squared error :", mean_squared_error(ytest,y_pred))

mean squared error : 14.163937575030019


In [33]:
# find the R² score (coefficient of determination) on the test set

In [34]:
reg_rf.score(xtest,ytest)

0.815784141106844
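
For a regressor, `score` returns the coefficient of determination, R². The sketch below checks this on synthetic data (not the Boston dataset) by computing R² from its definition and comparing it with `score`; `X_demo`, `y_demo`, and `model` are illustrative names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only (not the Boston dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=31)
model = RandomForestRegressor(n_estimators=21, random_state=31).fit(X_tr, y_tr)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
pred = model.predict(X_te)
ss_res = np.sum((y_te - pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(np.isclose(r2_manual, model.score(X_te, y_te)))
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the test targets.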

**Conclusion:**<br>
This project shows how Random Forest Regression can be applied to Boston house price prediction. On the held-out test set the model reaches an R² of about 0.82, with a Mean Absolute Error of about 2.38 and a Mean Squared Error of about 14.16 (`MEDV` is measured in thousands of dollars). Results like these can give stakeholders in the real estate market a data-driven starting point for informed decisions, and they illustrate the potential of predictive analytics in dynamic markets like Boston.