# Housing Values in Suburbs of Boston
The target variable is `medv`, the median value of owner-occupied homes in $1000s.

### Data description

The Boston data frame has 506 rows and 14 columns:

- `crim`: per capita crime rate by town.
- `zn`: proportion of residential land zoned for lots over 25,000 sq. ft.
- `indus`: proportion of non-retail business acres per town.
- `chas`: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- `nox`: nitrogen oxides concentration (parts per 10 million).
- `rm`: average number of rooms per dwelling.
- `age`: proportion of owner-occupied units built prior to 1940.
- `dis`: weighted mean of distances to five Boston employment centres.
- `rad`: index of accessibility to radial highways.
- `tax`: full-value property-tax rate per $10,000.
- `ptratio`: pupil-teacher ratio by town.
- `black`: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town (this column is named `b` in the CSV).
- `lstat`: lower status of the population (percent).
- `medv`: median value of owner-occupied homes in $1000s.
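The `black` entry is a quadratic in Bk, so the stored value is not the proportion itself. The minimal sketch below (the helpers `b_from_bk` and `bk_candidates` are hypothetical, written only for illustration) applies the formula in both directions. It shows that the value 396.9, which recurs in the sample rows further down, corresponds to Bk = 0, and that an interior value of b maps back to two possible proportions.

```python
import math

def b_from_bk(bk):
    """Forward transform from the data description: b = 1000 * (Bk - 0.63)^2."""
    return 1000 * (bk - 0.63) ** 2

def bk_candidates(b, tol=1e-9):
    """Invert the quadratic: Bk = 0.63 +/- sqrt(b / 1000).

    Only roots inside [0, 1] are valid proportions; tol absorbs float round-off
    at the edges, and the result is clamped back into [0, 1].
    """
    r = math.sqrt(b / 1000)
    roots = [0.63 - r, 0.63 + r]
    return [min(1.0, max(0.0, x)) for x in roots if -tol <= x <= 1 + tol]

print(round(b_from_bk(0.0), 3))   # 396.9
print(bk_candidates(396.9))       # [0.0]
print(bk_candidates(10.0))        # two candidates, roughly 0.53 and 0.73
```

Because one b value can correspond to two proportions, the column cannot be converted back to Bk without extra information.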

### Source

Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102.

Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
    


### Reading the File using pandas

In [1]:
import pandas as pd
import numpy as np
import warnings

# Silences every warning for the rest of the notebook, including deprecation notices.
warnings.filterwarnings('ignore')


In [2]:
boston = pd.read_csv("BostonHousingData.csv")

boston.shape

(506, 14)
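A quick way to confirm that the file matches the data description is to assert its shape and column names right after loading. This is a sketch: `check_boston_schema` and `EXPECTED_COLUMNS` are hypothetical names, and the column list is taken from the `boston.columns` output of this notebook.

```python
import pandas as pd

# Column names as they appear in the CSV header (the description's `black` is `b` here).
EXPECTED_COLUMNS = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age',
                    'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat', 'medv']

def check_boston_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not have the documented 506 x 14 layout."""
    if df.shape != (506, 14):
        raise ValueError(f"expected shape (506, 14), got {df.shape}")
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    extra = set(df.columns) - set(EXPECTED_COLUMNS)
    if missing or extra:
        raise ValueError(f"missing columns: {sorted(missing)}, unexpected: {sorted(extra)}")
```

Calling `check_boston_schema(boston)` directly after `read_csv` would, for example, catch a file saved with its index as an extra column.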

In [3]:
boston.head()

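|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |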


In [4]:
boston.columns

Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'b', 'lstat', 'medv'],
      dtype='object')

### Renaming columns

In [5]:
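# Give the abbreviated columns descriptive names; columns not listed keep their names.
# Note: "pupil-teacher ratio " keeps a trailing space, and later cells refer to it that way.
boston = boston.rename(columns={"nox": "nitrogen oxides concentration",
                                "rm": "room",
                                "dis": "distance",
                                "rad": "radial highways",
                                "ptratio": "pupil-teacher ratio ",
                                "b": "Black",
                                "lstat": "lower status in %",
                                "medv": "prices"})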
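`DataFrame.rename` silently skips mapping keys that are not existing columns, so a misspelled key in a cell like the one above would go unnoticed. A small sketch on a toy frame (not the Boston data) shows the default behaviour and the `errors="raise"` option, which makes pandas raise a `KeyError` instead.

```python
import pandas as pd

toy = pd.DataFrame({"nox": [0.538], "rm": [6.575]})

# Default (errors="ignore"): the typo "rmm" matches nothing, and no warning is given.
silent = toy.rename(columns={"rmm": "room"})
print(list(silent.columns))   # ['nox', 'rm']

# errors="raise": the same typo raises a KeyError.
raised = False
try:
    toy.rename(columns={"rmm": "room"}, errors="raise")
except KeyError:
    raised = True
print(raised)                 # True
```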

### Checking for null values

In [6]:
boston.isnull().any()

crim                             False
zn                               False
indus                            False
chas                             False
nitrogen oxides concentration    False
room                             False
age                              False
distance                         False
radial highways                  False
tax                              False
pupil-teacher ratio              False
Black                            False
lower status in %                False
prices                           False
dtype: bool
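`isnull().any()` only answers yes or no per column. Replacing `any()` with `sum()` counts the missing cells, which is more informative when something does turn up. The sketch below uses a toy frame with two deliberately missing values rather than the Boston data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"crim": [0.006, np.nan, 0.027],
                    "room": [6.575, 6.421, np.nan]})

# Number of missing cells per column, then across the whole frame.
null_counts = toy.isnull().sum()
print(null_counts)
print("total missing:", int(null_counts.sum()))   # total missing: 2
```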

In [7]:
boston.head()

|   | crim | zn | indus | chas | nitrogen oxides concentration | room | age | distance | radial highways | tax | pupil-teacher ratio | Black | lower status in % | prices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |


### Checking for duplicate column names

In [8]:
boston.columns.duplicated()

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])
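`boston.columns.duplicated()` only compares column labels. Duplicate observations are a separate check on the rows, done with `DataFrame.duplicated()`. A sketch on a toy frame in which the third row repeats the first:

```python
import pandas as pd

toy = pd.DataFrame({"crim": [0.006, 0.027, 0.006],
                    "room": [6.575, 6.421, 6.575]})

# duplicated() flags every row that repeats an earlier row (keep="first" by default).
dup_mask = toy.duplicated()
print(dup_mask.tolist())   # [False, False, True]
print(int(dup_mask.sum())) # 1

# drop_duplicates() keeps only the first occurrence of each row.
deduped = toy.drop_duplicates()
```

On the notebook's data the equivalent check would be `boston.duplicated().sum()`.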

In [9]:
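# Round every numeric column to 3 decimals. This coarsens small values,
# e.g. crim 0.00632 becomes 0.006.
boston = boston.round(3)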


In [10]:
boston.head()

|   | crim | zn | indus | chas | nitrogen oxides concentration | room | age | distance | radial highways | tax | pupil-teacher ratio | Black | lower status in % | prices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.006 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.027 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.967 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.027 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.967 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.032 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.062 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.069 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.062 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |
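Comparing the two `head()` outputs shows that `round(3)` changes the stored data, not just its display: the first `crim` value loses about 5% of its magnitude. If the goal is only tidier output, pandas can round at display time instead. A sketch on a toy column:

```python
import pandas as pd

toy = pd.DataFrame({"crim": [0.00632, 0.02731]})

# Rounding the data itself discards precision.
rounded = toy.round(3)
rel_err = abs(toy["crim"][0] - rounded["crim"][0]) / toy["crim"][0]
print(f"{rel_err:.3f}")   # 0.051

# Rounding only for display leaves the stored values intact.
with pd.option_context("display.precision", 3):
    print(toy)
print(toy["crim"][0])     # 0.00632
```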


In [11]:
set(boston['chas'])

{0, 1}
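`set()` confirms that `chas` only takes the values 0 and 1. For a dummy variable it can also help to see how often each value occurs, because a rare category carries little information for the model. A sketch on a toy series standing in for `boston['chas']`:

```python
import pandas as pd

toy = pd.Series([0, 0, 1, 0, 1], name="chas")

print(sorted(set(toy)))             # [0, 1]
print(toy.isin([0, 1]).all())       # True

# Frequency of each category.
counts = toy.value_counts()
print(counts)
```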

In [12]:
features = boston.drop('prices', axis = 1)

### Splitting data

In [13]:
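from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing. No random_state is set, so the split
# (and the scores reported below) change from run to run.
train, test = train_test_split(boston, test_size=0.3)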

print(train.shape, test.shape)

(354, 14) (152, 14)
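The shapes follow from how scikit-learn sizes the split: with a float `test_size`, the test set gets ceil(0.3 × 506) = ceil(151.8) = 152 rows and the training set gets the remaining 354. The sketch below checks that arithmetic on a toy 506-row frame and shows that fixing `random_state` (42 is an arbitrary choice) makes the split, and every score computed on it, repeatable.

```python
import math

import pandas as pd
from sklearn.model_selection import train_test_split

toy = pd.DataFrame({"x": range(506)})

n_test = math.ceil(0.3 * len(toy))
print(n_test, len(toy) - n_test)   # 152 354

# Two calls with the same random_state select exactly the same rows.
a_train, a_test = train_test_split(toy, test_size=0.3, random_state=42)
b_train, b_test = train_test_split(toy, test_size=0.3, random_state=42)
print(a_test.index.equals(b_test.index))   # True
```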


In [14]:
boston.columns

Index(['crim', 'zn', 'indus', 'chas', 'nitrogen oxides concentration', 'room',
       'age', 'distance', 'radial highways', 'tax', 'pupil-teacher ratio ',
       'Black', 'lower status in %', 'prices'],
      dtype='object')

In [15]:
# Every column except the target is used as a feature.
train_x = train.drop(columns='prices')
train_y = train.prices

test_x = test.drop(columns='prices')
test_y = test.prices

## Applying Machine Learning Algorithms
### Linear Regression

In [25]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

model = LinearRegression()
model.fit(train_x, train_y)

# predict prices for the test set
y_pred = model.predict(test_x)

# root mean squared error (in $1000s) and coefficient of determination (R^2)
mse = mean_squared_error(test_y, y_pred)
print('RMSE :', np.sqrt(mse))
print('R^2 score: %.2f' % r2_score(test_y, y_pred))

RMSE : 3.8297180084878795
R^2 score: 0.76
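Both metrics are short formulas. RMSE = sqrt(mean((y - ŷ)²)) is in the target's units, so the RMSE of about 3.83 above corresponds to roughly $3,800. R² = 1 - SS_res / SS_tot is the share of the target's variance explained by the model. The sketch below computes both by hand on three made-up values and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 8.0])

# RMSE: square root of the mean squared residual.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R^2: one minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(round(rmse, 4), round(r2, 4))   # 0.8165 0.75
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2, r2_score(y_true, y_pred))
```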
