# **House Price Prediction**

Build a machine learning model that predicts the median house price (MEDV) from 13 independent variables describing each Boston census tract.

Each case (row) of the dataset has 14 attributes:

- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS - proportion of non-retail business acres per town
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million); this column is named `NX` in the CSV used here
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per $10,000
- PTRATIO - pupil-teacher ratio by town
- B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
- LSTAT - % lower status of the population
- MEDV - median value of owner-occupied homes in $1000s

Dataset: https://github.com/ybifoundation/Dataset/raw/main/Boston.csv

**Import Libraries**

In [122]:
import pandas as pd


In [123]:
import warnings
warnings.filterwarnings('ignore')
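`warnings.filterwarnings('ignore')` silences every warning for the rest of the session. Among other things, this hides scikit-learn's `UserWarning` about missing feature names, which later cells such as `scaler.transform(num)` trigger because the scaler was fitted on a DataFrame but receives a NumPy array. A narrower option is to scope the suppression with `warnings.catch_warnings()`. The sketch below uses a hypothetical `noisy_step` function as a stand-in for a call that warns.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a UserWarning
    warnings.warn("X does not have valid feature names", UserWarning)
    return 42


# Ignore UserWarning only inside this block; the previous filters are restored on exit
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    result = noisy_step()

# Outside the scoped block the same warning is reported again (recorded here for inspection)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(caught))
```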

**Importing Dataset**

In [124]:
House = pd.read_csv("https://github.com/ybifoundation/Dataset/raw/main/Boston.csv")

**Exploring Dataset**

In [125]:
House.head()

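| | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |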


In [126]:
House.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   CRIM     506 non-null    float64
 1   ZN       506 non-null    float64
 2   INDUS    506 non-null    float64
 3   CHAS     506 non-null    int64  
 4   NX       506 non-null    float64
 5   RM       506 non-null    float64
 6   AGE      506 non-null    float64
 7   DIS      506 non-null    float64
 8   RAD      506 non-null    int64  
 9   TAX      506 non-null    float64
 10  PTRATIO  506 non-null    float64
 11  B        506 non-null    float64
 12  LSTAT    506 non-null    float64
 13  MEDV     506 non-null    float64
dtypes: float64(12), int64(2)
memory usage: 55.5 KB


In [127]:
House.describe()

| | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |
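A few facts can be read straight off the `describe()` summary. Because CHAS is a 0/1 dummy, its mean is the fraction of tracts that bound the Charles River, so mean × count recovers how many such tracts there are; the spread of MEDV can be summarized by its interquartile range. The values below are copied from the table above; the arithmetic is only an illustration.

```python
# Summary statistics copied from the describe() output above
count = 506
chas_mean = 0.06917
medv_q1, medv_q3 = 17.025, 25.0  # 25% and 75% quantiles of MEDV ($1000s)

# CHAS is 0/1, so its mean times the row count = number of tracts bounding the river
river_tracts = round(chas_mean * count)

# Interquartile range of MEDV: the middle 50% of tract median values
medv_iqr = medv_q3 - medv_q1

print(river_tracts, round(medv_iqr, 3))
```

This should print `35 7.975`: about 35 of the 506 tracts border the river, and the middle half of median home values spans roughly $8,000.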


**Initializing target and feature data**

In [128]:
House.columns

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'MEDV'],
      dtype='object')

In [129]:
y = House["MEDV"]
X = House[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT']]
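Listing the 13 feature names by hand works, but an equivalent and less error-prone idiom is to drop the target column, so every remaining column becomes a feature. The sketch below uses a tiny stand-in DataFrame (two rows and a subset of columns copied from `House.head()`) rather than the full dataset.

```python
import pandas as pd

# Tiny stand-in for House: two rows and four columns copied from head() above
House = pd.DataFrame({
    "CRIM": [0.00632, 0.02731],
    "RM": [6.575, 6.421],
    "LSTAT": [4.98, 9.14],
    "MEDV": [24.0, 21.6],
})

y = House["MEDV"]
# Every column except the target becomes a feature; no hand-written list needed
X = House.drop(columns="MEDV")

print(list(X.columns))
```

On the real `House` frame, `House.drop(columns="MEDV")` yields the same 13 columns as the explicit list.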

**Splitting data into train and test sets**

In [130]:
from sklearn.model_selection import train_test_split

In [131]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8, random_state = 2529)
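With `train_size=0.8`, scikit-learn keeps `floor(0.8 * 506) = 404` rows for training and puts the remaining 102 in the test set; `random_state=2529` fixes the shuffle so the split is reproducible. The sketch checks this on synthetic data of the same shape (506 rows, 13 features), not on the Boston data itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data: 506 rows, 13 features
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=2529)

# Re-running with the same random_state reproduces exactly the same split
X_train2, _, _, _ = train_test_split(X, y, train_size=0.8, random_state=2529)

print(X_train.shape, X_test.shape)
```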

**Standardize the data**

In [132]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [133]:
X_train = scaler.fit_transform(X_train)

In [134]:
X_test = scaler.transform(X_test)
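The scaler is fitted on the training rows only and then reused for the test rows, so no information from the test set leaks into preprocessing. Each value becomes z = (x − training mean) / training standard deviation. The sketch below verifies that on synthetic columns with TAX-like and NX-like scales (assumed values, not the real data).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic features on very different scales (a TAX-like and an NX-like column)
X_train = np.column_stack([rng.normal(400, 170, 404), rng.normal(0.55, 0.12, 404)])
X_test = np.column_stack([rng.normal(400, 170, 102), rng.normal(0.55, 0.12, 102)])

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training rows only
X_test_s = scaler.transform(X_test)        # reuses those training statistics

# Manual equivalent: subtract the training mean, divide by the training std (ddof=0)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_test_s, manual))
```

The training columns end up with mean 0 and standard deviation 1; the test columns are only approximately standardized, which is expected.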

**Creating a linear regression model**

In [135]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [136]:
model.fit(X_train, y_train)

LinearRegression()
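`LinearRegression` fits ordinary least squares: it finds the intercept and coefficients that minimize the sum of squared residuals. The sketch reproduces the fit with `np.linalg.lstsq` on a design matrix with a leading column of ones, using synthetic data with assumed true coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(404, 3))
true_coef = np.array([1.5, -2.0, 0.5])  # assumed values for the synthetic target
y = 22.0 + X @ true_coef + rng.normal(scale=0.1, size=404)

model = LinearRegression().fit(X, y)

# Ordinary least squares by hand: a column of ones carries the intercept
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(beta[0], model.intercept_), np.allclose(beta[1:], model.coef_))
```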

In [137]:
print(model.intercept_)

22.733168316831684
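Because every feature was centered by `StandardScaler` before fitting, the least-squares intercept should equal the mean of the training target, so 22.733 here is the average MEDV of the 404 training rows (close to the overall mean of 22.53 in `describe()`). The sketch checks the property on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_train = rng.normal(loc=5.0, scale=3.0, size=(404, 4))
y_train = 20.0 + X_train @ np.array([0.8, -1.2, 0.3, 2.0]) + rng.normal(size=404)

# Standardize, then fit: every feature column now has mean 0
X_train_s = StandardScaler().fit_transform(X_train)
model = LinearRegression().fit(X_train_s, y_train)

# With centered features the OLS intercept equals the mean of the training target
print(np.isclose(model.intercept_, y_train.mean()))
```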


In [138]:
print(model.coef_)

[-1.1015004   0.76145504 -0.10838294  0.5612375  -2.13193813  2.93852996
  0.02285122 -3.07025055  2.11621638 -1.62259577 -2.12886797  0.91700488
 -3.58121428]
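The raw `coef_` array is easier to read when labeled with feature names. Because the features were standardized, each coefficient is roughly the change in predicted MEDV ($1000s) for a one-standard-deviation increase in that feature, holding the others fixed, so magnitudes are loosely comparable (correlated features can blur this). The sketch uses the values printed above; in the notebook itself, `pd.Series(model.coef_, index=X.columns)` builds the same Series.

```python
import pandas as pd

# Coefficients printed above, in the same order as the columns of X
features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
            'PTRATIO', 'B', 'LSTAT']
coef = [-1.1015004, 0.76145504, -0.10838294, 0.5612375, -2.13193813, 2.93852996,
        0.02285122, -3.07025055, 2.11621638, -1.62259577, -2.12886797, 0.91700488,
        -3.58121428]

coefficients = pd.Series(coef, index=features)

# Order by absolute size, keeping the sign of each coefficient
ranked = coefficients.reindex(coefficients.abs().sort_values(ascending=False).index)

print(ranked.head(3))
```

LSTAT, DIS and RM come out on top, while AGE contributes almost nothing once the other features are included.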


**Prediction and error analysis**

In [139]:
y_pred = model.predict(X_test)
y_pred

array([31.40302131, 22.20306757, 21.14129681, 39.02572558, 19.94724785,
       22.52706384, 18.38247303, 15.06724303, 22.23085598, 21.50511568,
       18.63185223, 27.78256005, 29.55845678,  6.74561575, 10.96733427,
       26.11838345, 21.76193498, 25.09461618,  4.12752192, 35.43847464,
       24.26042681, 22.70045425, 14.54163914, 20.78847359, 24.36918454,
       17.30350272, 19.28542938, 21.34193595, 28.47441715, 20.57479566,
        8.99248642, 17.25832287, 22.2614938 , 21.93764391, 38.42965866,
       25.85926261, 41.7078269 , 19.78972139, 33.52651144, 13.99129785,
       14.0719068 , 22.70094941, 12.13026681,  8.79448627, 21.96110153,
       24.64099694, 18.12113371, 16.79062927, 14.54524738, 15.52114801,
       33.08125641, 32.94674536, 15.73994046, 23.89216744, 27.24147586,
       19.35695509, 44.33688165, 20.86690196, 20.08716421, 27.68066845,
       33.88475442, 13.03812091, 23.79341986, 31.50808784, 28.56817454,
       32.07529249, 13.62020008, 35.00351931, 19.4318934 , 19.13
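For a linear model, `predict()` is just the scaled feature matrix times the coefficients plus the intercept, which yields one prediction per test row. The sketch confirms this on synthetic data with the same train/test sizes as above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X_train, X_test = rng.normal(size=(404, 13)), rng.normal(size=(102, 13))
y_train = rng.normal(loc=22.5, scale=9.0, size=404)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The same predictions computed by hand: X @ coef_ + intercept_
manual = X_test @ model.coef_ + model.intercept_

print(y_pred.shape, np.allclose(y_pred, manual))
```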

In [140]:
from sklearn.metrics import mean_absolute_percentage_error

In [141]:
error = mean_absolute_percentage_error(y_test, y_pred)
# Round to 3 decimals, then format as a percentage without floating-point noise
print(f"{round(error, 3) * 100:.1f} %")

16.4 %
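Mean absolute percentage error averages |y_true − y_pred| / |y_true| over all test rows and is returned as a fraction, which is why it is multiplied by 100 for display. The sketch computes it by hand and compares with scikit-learn; the true values are the first four MEDV entries from `head()`, and the predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# MEDV values from head() above, paired with illustrative (made-up) predictions
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_pred = np.array([26.4, 19.44, 34.7, 30.06])

# MAPE = mean(|y_true - y_pred| / |y_true|), a fraction rather than a percentage
manual = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

print(manual, mean_absolute_percentage_error(y_true, y_pred))
```

Three of the four predictions miss by 10% and one is exact, so both values come out at about 0.075, i.e. 7.5%.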


**New Prediction**

In [150]:
import numpy as np
num = np.array(X.iloc[0]).reshape(1, -1)
num

array([[6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00,
        6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02,
        4.980e+00]])

In [143]:
model.predict(scaler.transform(num))

array([30.56842644])
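The scaler was fitted on a DataFrame, so passing a bare NumPy row makes scikit-learn warn about missing feature names (hidden here by the global warning filter). One way to avoid this is to describe a new house as a one-row DataFrame with the training column names. The sketch fits a scaler on synthetic stand-in data and uses the feature values of the first house shown above.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
            'PTRATIO', 'B', 'LSTAT']
rng = np.random.default_rng(5)

# Synthetic frame standing in for the unscaled training features
X_train = pd.DataFrame(rng.normal(size=(50, 13)), columns=features)
scaler = StandardScaler().fit(X_train)

# Feature values of the first house, copied from the array shown above
new_house = pd.DataFrame([[0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.09, 1, 296.0,
                           15.3, 396.9, 4.98]], columns=features)

# Record warnings so we can confirm no feature-name warning is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scaled = scaler.transform(new_house)  # one row, 13 named columns

print(scaled.shape)
```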

## **Deployment**

In [144]:
import pickle

In [145]:
with open('House_predictor.pkl', 'wb') as f:
    pickle.dump(model, f)  # the context manager flushes and closes the file

In [146]:
with open('Scaling.pkl', 'wb') as f:
    pickle.dump(scaler, f)  # the context manager flushes and closes the file

In [147]:
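with open('House_predictor.pkl', 'rb') as f:
    pickle_model = pickle.load(f)
with open('Scaling.pkl', 'rb') as f:
    pickle_scaler = pickle.load(f)

In [148]:
# Use the reloaded scaler too, so this check exercises both saved files
pickle_model.predict(pickle_scaler.transform(np.array(X.iloc[0]).reshape(1, -1)))

array([30.56842644])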
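Shipping the scaler and the model as two separate pickles means the deployed code must apply them in the right order. An alternative design, sketched below under the assumption that a single file is preferred, wraps both steps in a scikit-learn `Pipeline` so raw feature rows go in and predictions come out. The data is synthetic and `house_pipeline.pkl` is a hypothetical file name.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X_train = rng.normal(size=(100, 13))
y_train = 22.5 + X_train @ rng.normal(size=13)

# One object holds both steps: scaling is applied automatically before the regression
pipeline = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "house_pipeline.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The restored pipeline takes raw (unscaled) rows directly
print(np.allclose(loaded.predict(X_train[:1]), pipeline.predict(X_train[:1])))
```

Keep in mind that unpickling can execute arbitrary code, so only load `.pkl` files from sources you trust.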