# **House Price Prediction**

Build a machine learning model that predicts the median value of owner-occupied homes (MEDV) from the other attributes in the Boston housing dataset.

Each record in the dataset has 14 attributes:

- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS - proportion of non-retail business acres per town
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million); this CSV names the column NX
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per $10,000
- PTRATIO - pupil-teacher ratio by town
- B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
- LSTAT - percentage of the population with lower socioeconomic status
- MEDV - median value of owner-occupied homes in $1000s (the target variable)
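The B attribute is a transformed quantity, not a raw proportion. A small sketch of the formula as listed (the function name `b_score` is ours, for illustration only) shows that it is a parabola with its minimum at Bk = 0.63, so two different values of Bk can produce the same B:

```python
def b_score(bk: float) -> float:
    """Compute B = 1000 * (Bk - 0.63)**2 for a proportion Bk in [0, 1]."""
    return 1000 * (bk - 0.63) ** 2

# B is zero at Bk = 0.63 and grows symmetrically on either side,
# e.g. Bk = 0.33 and Bk = 0.93 give the same B.
for bk in (0.0, 0.33, 0.63, 0.93, 1.0):
    print(f"Bk={bk:.2f} -> B={b_score(bk):.2f}")
```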

Dataset: https://github.com/ybifoundation/Dataset/raw/main/Boston.csv

In [1]:
import pandas as pd

In [2]:
house = pd.read_csv('https://github.com/ybifoundation/Dataset/raw/main/Boston.csv')

In [3]:
house.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   CRIM     506 non-null    float64
 1   ZN       506 non-null    float64
 2   INDUS    506 non-null    float64
 3   CHAS     506 non-null    int64  
 4   NX       506 non-null    float64
 5   RM       506 non-null    float64
 6   AGE      506 non-null    float64
 7   DIS      506 non-null    float64
 8   RAD      506 non-null    int64  
 9   TAX      506 non-null    float64
 10  PTRATIO  506 non-null    float64
 11  B        506 non-null    float64
 12  LSTAT    506 non-null    float64
 13  MEDV     506 non-null    float64
dtypes: float64(12), int64(2)
memory usage: 55.5 KB


In [5]:
house.columns

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'MEDV'],
      dtype='object')

In [7]:
y = house['MEDV']  # target: median home value in $1000s
x = house[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
           'PTRATIO', 'B', 'LSTAT']]  # the 13 predictor columns
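Listing the 13 feature names by hand works but is easy to mistype, and this CSV spells the nitric-oxide column `NX` rather than `NOX`. A minimal alternative sketch selects every column except the target with `DataFrame.drop`; the tiny `demo` frame below is made-up stand-in data (the MEDV values are invented) so the snippet runs offline, and the names `demo`, `target`, and `features` are chosen so they do not overwrite the notebook's `house`, `x`, and `y`:

```python
import pandas as pd

# Made-up stand-in for the Boston CSV: two rows, four of its columns.
demo = pd.DataFrame({
    "CRIM": [0.33983, 0.08221],
    "NX": [0.431, 0.431],
    "RM": [6.108, 6.957],
    "MEDV": [24.0, 29.0],
})

# Optional: restore the documented column name NOX.
demo = demo.rename(columns={"NX": "NOX"})

target = demo["MEDV"]                   # the value the model predicts
features = demo.drop(columns=["MEDV"])  # every other column is a predictor
print(list(features.columns))
```

On the real frame, `house.drop(columns=['MEDV'])` returns the same 13 columns, in the same order, as the hand-written list; if you also rename `NX`, refer to it as `NOX` in later cells.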

In [8]:
from sklearn.model_selection import train_test_split

In [10]:
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2529)  # default test_size=0.25; fixed seed for a reproducible split
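With neither `test_size` nor `train_size` given, `train_test_split` holds out 25% of the rows for testing. The arithmetic below is a sketch of how scikit-learn turns that fraction into row counts (the test count is rounded up and the training set gets the remainder) for the 506 rows reported by `house.info()`:

```python
import math

n_samples = 506   # rows reported by house.info()
test_size = 0.25  # scikit-learn's default when neither size is given

n_test = math.ceil(test_size * n_samples)  # 126.5 rounded up
n_train = n_samples - n_test               # the remainder goes to training
print(n_train, n_test)
```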

In [12]:
x_train

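|     | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|-----|------|----|-------|------|----|----|-----|-----|-----|-----|---------|---|-------|
| 246 | 0.33983 | 22.0 | 5.86 | 0 | 0.431 | 6.108 | 34.9 | 8.0555 | 7 | 330.0 | 19.1 | 390.18 | 9.16 |
| 252 | 0.08221 | 22.0 | 5.86 | 0 | 0.431 | 6.957 | 6.8 | 8.9067 | 7 | 330.0 | 19.1 | 386.09 | 3.53 |
| 198 | 0.03768 | 80.0 | 1.52 | 0 | 0.404 | 7.274 | 38.3 | 7.3090 | 2 | 329.0 | 12.6 | 392.20 | 6.62 |
| 94 | 0.04294 | 28.0 | 15.04 | 0 | 0.464 | 6.249 | 77.3 | 3.6150 | 4 | 270.0 | 18.2 | 396.90 | 10.59 |
| 271 | 0.16211 | 20.0 | 6.96 | 0 | 0.464 | 6.240 | 16.3 | 4.4290 | 3 | 223.0 | 18.6 | 396.90 | 6.59 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 228 | 0.29819 | 0.0 | 6.20 | 0 | 0.504 | 7.686 | 17.0 | 3.3751 | 8 | 307.0 | 17.4 | 377.51 | 3.92 |
| 399 | 9.91655 | 0.0 | 18.10 | 0 | 0.693 | 5.852 | 77.8 | 1.5004 | 24 | 666.0 | 20.2 | 338.16 | 29.97 |
| 316 | 0.31827 | 0.0 | 9.90 | 0 | 0.544 | 5.914 | 83.2 | 3.9986 | 4 | 304.0 | 18.4 | 390.70 | 18.33 |
| 50 | 0.08873 | 21.0 | 5.64 | 0 | 0.439 | 5.963 | 45.7 | 6.8147 | 4 | 243.0 | 16.8 | 395.56 | 13.45 |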


In [11]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [13]:
model.fit(x_train, y_train)

LinearRegression()
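`LinearRegression` fits ordinary least squares: it chooses an intercept and one coefficient per feature that minimise the sum of squared errors on the training data, and exposes them as `model.intercept_` and `model.coef_`. For a single feature the solution has a closed form, sketched below in plain Python on made-up numbers (the helper `fit_simple_ols` is hypothetical); the 13-feature model here solves the multivariate version of the same problem.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimising sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 2 + 3x, so the fit should recover it.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
a, b = fit_simple_ols(xs, ys)
print(a, b)
```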

In [14]:
y_pred = model.predict(x_test)
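Once the model is fitted, `predict` evaluates the linear equation y_hat = intercept + sum(coef_i * x_i) for every test row. A plain-Python sketch of that step, using a hypothetical helper `predict_row` and invented coefficients rather than the ones this model learned:

```python
def predict_row(intercept, coefs, row):
    """Linear-model prediction for one row: intercept + sum(coef_i * x_i)."""
    if len(coefs) != len(row):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(c * v for c, v in zip(coefs, row))

# Hypothetical two-feature model and two rows, for illustration only.
intercept = 1.5
coefs = [2.0, -0.5]
rows = [[3.0, 4.0], [0.0, 2.0]]
preds = [predict_row(intercept, coefs, r) for r in rows]
print(preds)
```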

In [15]:
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

In [16]:
mean_absolute_percentage_error(y_test, y_pred)

0.17028034096449765
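`mean_absolute_percentage_error` returns a fraction rather than a percentage, so 0.170 means the predictions miss the true value by about 17% on average. It averages |y - y_hat| / |y| over the test rows. A from-scratch sketch on made-up values (the helper `mape` and the variable names are ours, chosen so they do not overwrite the notebook's `y_test` and `y_pred`):

```python
def mape(actual, predicted):
    """Mean of |y - y_hat| / |y|, returned as a fraction (0.17 means 17%)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(actual, predicted)) / len(actual)

actual = [20.0, 25.0, 40.0]     # made-up MEDV values ($1000s)
predicted = [22.0, 20.0, 40.0]
print(mape(actual, predicted))
```

scikit-learn additionally guards the denominator with a tiny epsilon to avoid dividing by zero; the sketch omits that, which is harmless for strictly positive targets such as house values.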

In [17]:
mean_absolute_error(y_test, y_pred)

3.2479052911120188

In [18]:
mean_squared_error(y_test, y_pred)

22.777996510720225
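MAE is in the target's units, so 3.25 means the model is off by roughly $3,250 per home on average. MSE is in squared units ($1000s squared), which makes 22.78 hard to read directly; its square root, the RMSE, is about 4.77, back in $1000s. A plain-Python sketch of all three metrics on made-up values (helper and variable names are ours):

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error, in squared target units."""
    return sum((t - p) ** 2 for t, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: sqrt(MSE), back in target units."""
    return math.sqrt(mse(actual, predicted))

actual = [20.0, 25.0, 40.0]     # made-up MEDV values ($1000s)
predicted = [22.0, 20.0, 40.0]
print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))

# Square root of the MSE reported above, i.e. this model's RMSE in $1000s.
rmse_notebook = math.sqrt(22.777996510720225)
print(rmse_notebook)
```

In the notebook itself, `mean_squared_error(y_test, y_pred) ** 0.5` gives the same RMSE.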