# Multiple Linear Regression - Boston House Price

This project demonstrates, in a simple way, how to use a multiple linear regression model to predict house prices in Boston.

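The dataset used in this project is "Boston House Prices" from the Scikit-Learn datasets module. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cells below need an older version of the library to run as written.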

# Libraries

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Data

In [2]:
data = datasets.load_boston()

print(data.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu

# Model - Multiple Linear Regression

### Variables

In [3]:
X = data.data[:, 5:8] # Selecting just three features from our dataset (RM, AGE, DIS)

y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
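
Because `train_test_split` shuffles the rows randomly, each run of the cell above produces a different split, so the coefficients and scores reported later will vary slightly from run to run. A minimal sketch of a reproducible split follows. It uses a synthetic stand-in array with the same shape as the selected features, and `random_state=42` is an arbitrary illustrative choice, not a setting taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the notebook's data:
# 506 rows and 3 feature columns (values are arbitrary, for illustration only).
X_demo = np.arange(506 * 3, dtype=float).reshape(506, 3)
y_demo = np.arange(506, dtype=float)

# Fixing random_state makes the shuffle repeatable across runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Splitting again with the same seed returns exactly the same rows.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

With 506 rows and `test_size=0.2`, the test set holds 102 rows and the training set the remaining 404.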

### Multiple Linear Regression

In [4]:
multiple_linear_reg = LinearRegression()

multiple_linear_reg.fit(X_train, y_train)

LinearRegression()
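
To make the fitting step less of a black box, the sketch below solves an ordinary least squares problem directly with NumPy. It is an illustration of the technique on synthetic data generated from a known rule (an assumption made so the recovered coefficients can be checked), not a description of how `LinearRegression` is implemented internally.

```python
import numpy as np

# Synthetic data generated from a known linear rule: y = 2*x1 - 3*x2 + 5
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 5.0

# Append a column of ones so the intercept is estimated as one more coefficient.
A = np.column_stack([X_demo, np.ones(len(X_demo))])

# Solve min ||A w - y||^2 for w = [slope_1, slope_2, intercept].
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
slopes_demo, intercept_demo = w[:2], w[2]

print('Slopes:', slopes_demo)
print('Intercept:', intercept_demo)
```

Since the synthetic targets contain no noise, the recovered slopes and intercept match the generating rule up to floating-point error.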

### Prediction

In [5]:
y_pred = multiple_linear_reg.predict(X_test)

### Results - Linear Regression Equation

In [6]:
coef_a = multiple_linear_reg.coef_

coef_b = multiple_linear_reg.intercept_

print('Slope values:', coef_a)

print('\nIntercept value:', coef_b)

print('\nEstimated model: y = %.2fx1 + (%.2fx2) + (%.2fx3) + (%.2f)' %(coef_a[0], coef_a[1], coef_a[2], coef_b))

Slope values: [ 8.66050613 -0.09928741 -0.60991669]

Intercept value: -22.58322491914025

Estimated model: y = 8.66x1 + (-0.10x2) + (-0.61x3) + (-22.58)
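
As a worked example, the estimated equation can be applied by hand. The sketch below plugs a hypothetical tract into the printed coefficients, where x1, x2 and x3 correspond to RM, AGE and DIS. The feature values are assumptions chosen for illustration and are not taken from the dataset.

```python
# Coefficients and intercept as printed by the cell above.
slopes = [8.66050613, -0.09928741, -0.60991669]  # RM, AGE, DIS
intercept = -22.58322491914025

# Hypothetical tract (assumed values, not taken from the dataset):
# 6 rooms on average, 70% of units built before 1940, weighted distance of 4.
features = [6.0, 70.0, 4.0]

# y = b1*x1 + b2*x2 + b3*x3 + intercept
prediction = intercept + sum(b * x for b, x in zip(slopes, features))

print('Predicted target: %.2f' % prediction)
```

The result is roughly 19.99 in the units of the target, which for this dataset is the median home value in thousands of dollars. The signs also read naturally: more rooms raise the prediction, while older housing stock and greater distance from employment centres lower it.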


### Results - Model evaluation

In [7]:
mse = mean_squared_error(y_test, y_pred)

rsq = r2_score(y_test, y_pred)

print('Mean Squared Error :', mse)

print('\nR square :', rsq)

Mean Squared Error : 22.68100421398416

R square : 0.5816803449248215
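
For readers who want to see what these two metrics measure, here is a minimal sketch that computes them from their definitions in plain Python. The four target and prediction values are made up for illustration, and the `_demo` names are hypothetical, chosen to avoid clashing with the notebook's variables.

```python
import math

# Tiny made-up example (assumed values, for illustration only).
y_true_demo = [24.0, 21.6, 34.7, 33.4]
y_hat_demo = [25.0, 20.6, 32.7, 33.4]

n = len(y_true_demo)
mean_true = sum(y_true_demo) / n

# Residual sum of squares: squared errors of the predictions.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true_demo, y_hat_demo))
# Total sum of squares: squared errors of always predicting the mean.
ss_tot = sum((t - mean_true) ** 2 for t in y_true_demo)

mse_demo = ss_res / n             # mean squared error
rmse_demo = math.sqrt(mse_demo)   # same units as the target
r2_demo = 1.0 - ss_res / ss_tot   # share of variance explained

print('MSE :', mse_demo)
print('RMSE:', rmse_demo)
print('R2  :', r2_demo)
```

Applied to the results above, the MSE of about 22.68 corresponds to a root mean squared error of about 4.76 in target units, and the R squared of about 0.58 means the three selected features explain a little over half of the variance in the test-set prices.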
