<div class="alert" style="background-color:#fff; color:white; padding:0px 10px; border-radius:5px;"><h1 style='margin:15px 15px; color:#006a79; font-size:40px'> Evaluation Metrics - Regression</h1>
Copyright Machine Learning Plus
</div>

### Table of contents
1. Build a linear regression model
2. Model Evaluation
    * [Mean Squared Error]
    * [Mean Absolute Error]
    * [Max Error]
    * [R2 Score]
    * [Median Absolute Error]
    * [Mean Squared Log Error]
    * [Root Mean Squared Error]


<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>1. Build a linear regression model</h2>
</div>

In [None]:
# Basic libraries required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# libraries required for regression and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, median_absolute_error,
                             r2_score, max_error)


### Business Problem

The price of a house depends on various factors such as the locality, the area of the house, and the number of rooms. This makes it difficult to know whether the price quoted by a broker is fair or overpriced.

Let's build a linear regression model to predict the price of a house from multiple variables.

In this tutorial, you will learn various evaluation metrics for linear regression.

### Load Data

In [None]:
# reading the data from csv file to dataframe
data = pd.read_csv('USA_Housing.csv')
data.head()

Unnamed: 0,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Avg. Area Number of Bedrooms,Area Population,Price,Address
0,79545.458574,5.682861,7.009188,4.09,23086.800503,1059034.0,"208 Michael Ferry Apt. 674\nLaurabury, NE 3701..."
1,79248.642455,6.0029,6.730821,3.09,40173.072174,1505891.0,"188 Johnson Views Suite 079\nLake Kathleen, CA..."
2,61287.067179,5.86589,8.512727,5.13,36882.1594,1058988.0,"9127 Elizabeth Stravenue\nDanieltown, WI 06482..."
3,63345.240046,7.188236,5.586729,3.26,34310.242831,1260617.0,USS Barnett\nFPO AP 44820
4,59982.197226,5.040555,7.839388,4.23,26354.109472,630943.5,USNS Raymond\nFPO AE 09386


Let's remove the Address column, since it is free text and cannot be used directly as a numeric feature.

In [None]:
# drop 'Address' column
data.drop('Address',axis=1,inplace=True)

In [None]:
data.head()

Unnamed: 0,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Avg. Area Number of Bedrooms,Area Population,Price
0,79545.458574,5.682861,7.009188,4.09,23086.800503,1059034.0
1,79248.642455,6.0029,6.730821,3.09,40173.072174,1505891.0
2,61287.067179,5.86589,8.512727,5.13,36882.1594,1058988.0
3,63345.240046,7.188236,5.586729,3.26,34310.242831,1260617.0
4,59982.197226,5.040555,7.839388,4.23,26354.109472,630943.5


### Model Building

Split the data into the feature matrix `X` and the target `y`.

In [None]:
X = data[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
               'Avg. Area Number of Bedrooms', 'Area Population']]
y = data['Price']


Now the **X dataframe** holds all the features, whereas the **y series** holds the target values.

In [None]:
# split the data such that 70% is used for training and the remaining 30% for testing
# (no random_state is set, so the split and every number below change from run to run)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [None]:
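# model initialization
# (the `normalize` argument was removed from LinearRegression in scikit-learn 1.2;
# ordinary least squares predictions do not depend on feature scaling)
model = LinearRegression()

# train the model on training data
model.fit(X_train, y_train)

LinearRegression()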

In [None]:
# predict house prices for the test data
test_pred = model.predict(X_test)

In [None]:
# print the intercept
print(model.intercept_)

-2637752.7494629105


In [None]:
# coefficients
coeff_df = pd.DataFrame(model.coef_, X.columns, columns=['Coefficient'])
coeff_df

Unnamed: 0,Coefficient
Avg. Area Income,21.615726
Avg. Area House Age,165539.267859
Avg. Area Number of Rooms,120570.222615
Avg. Area Number of Bedrooms,978.375844
Area Population,15.273446
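
To make the table concrete: a linear model's prediction is the intercept plus each coefficient times its feature value. The sketch below redoes that arithmetic in plain Python for row 0 of `data.head()`, using the intercept and coefficients printed above. The helper name `predict_price` is only an illustration and is not part of scikit-learn, and the numbers will differ on another train/test split.

```python
# Intercept and coefficients copied from the outputs above
intercept = -2637752.7494629105
coefficients = {
    'Avg. Area Income': 21.615726,
    'Avg. Area House Age': 165539.267859,
    'Avg. Area Number of Rooms': 120570.222615,
    'Avg. Area Number of Bedrooms': 978.375844,
    'Area Population': 15.273446,
}

# Feature values of row 0 as shown by data.head()
first_row = {
    'Avg. Area Income': 79545.458574,
    'Avg. Area House Age': 5.682861,
    'Avg. Area Number of Rooms': 7.009188,
    'Avg. Area Number of Bedrooms': 4.09,
    'Area Population': 23086.800503,
}

def predict_price(features, coefs, bias):
    """Hypothetical helper: bias + sum(coefficient * feature value)."""
    return bias + sum(coefs[name] * value for name, value in features.items())

manual_prediction = predict_price(first_row, coefficients, intercept)
print(manual_prediction)
```

Up to the rounding of the displayed coefficients, this is the same calculation `model.predict` performs for that row.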


Let's check the performance of the model using various evaluation metrics.


<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>2. Model Evaluation using evaluation metrics</h2>
</div>

<a id="1"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.1. Mean Squared Error </h3>
</div>

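MSE is the mean of the squared differences between the actual and the predicted values. Because the errors are squared, the result is in squared units of the target, and large errors are penalized much more heavily than small ones.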

In [None]:
# test data error
test_mse = mean_squared_error(y_test,test_pred)

In [None]:
print('test data error :',test_mse)

test data error : 10645837125.790102
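
As a rough illustration of the definition, the same quantity can be computed by hand. The sketch below uses four made-up values (not rows from `USA_Housing.csv`) and a hypothetical helper `mse_manual`; `mean_squared_error` should return the same number for the same inputs.

```python
# Toy values for illustration only (not taken from the housing data)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

def mse_manual(actual, predicted):
    """Average of the squared residuals."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# residuals: 0.5, 0.0, -1.5, -1.0 -> squares: 0.25, 0.0, 2.25, 1.0
toy_mse = mse_manual(y_true, y_pred)
print(toy_mse)  # 0.875
```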


<a id="2"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.2. Mean Absolute Error </h3>
</div>

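MAE is the mean of the absolute differences between the actual and the predicted values. It is expressed in the same units as the target, so it reads directly as the typical size of a prediction error.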

In [None]:
# test data error
test_mae = mean_absolute_error(y_test,test_pred)

In [None]:
print('test data error :',test_mae)

test data error : 83291.40523552582
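
One way to see how MAE differs from MSE is to make a single prediction much worse and watch both metrics react. The sketch below does this on made-up numbers; `mae_manual` and `mse_manual` are hypothetical helpers written for this comparison, not library functions.

```python
def mae_manual(actual, predicted):
    """Average of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse_manual(actual, predicted):
    """Average of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 20.0, 30.0, 40.0]
even_misses = [11.0, 21.0, 31.0, 41.0]   # every prediction is off by 1
one_big_miss = [11.0, 21.0, 31.0, 45.0]  # the last prediction is off by 5

print(mae_manual(actual, even_misses), mse_manual(actual, even_misses))    # 1.0 1.0
print(mae_manual(actual, one_big_miss), mse_manual(actual, one_big_miss))  # 2.0 7.0
```

The single large miss doubles the MAE but multiplies the MSE by seven, which is why MAE is often preferred when a few extreme errors should not dominate the score.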


<a id="3"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.3. Max Error </h3>
</div>

Max error is the largest absolute difference between an actual value and its prediction, i.e. the single worst prediction on the test set.

In [None]:
# test data error
test_max_error = max_error(y_test,test_pred)

In [None]:
print('test data error :',test_max_error)

test data error : 328699.31573108
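
Max error describes only the single worst prediction. The R2 score listed in the table of contents (imported above as `r2_score`) summarizes the whole fit instead: it is the share of the target's variance that the model explains, with 1 meaning a perfect fit and 0 meaning no better than always predicting the mean. Below is a minimal sketch of that definition on made-up numbers, with the hypothetical helper `r2_manual` standing in for the library function.

```python
def r2_manual(actual, predicted):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy values for illustration only (not taken from the housing data)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

toy_r2 = r2_manual(y_true, y_pred)
baseline_r2 = r2_manual(y_true, [4.375] * 4)  # always predicting the mean of y_true
print(toy_r2, baseline_r2)
```

On this notebook's data the corresponding call would be `r2_score(y_test, test_pred)`.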


<a id="5"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.5. Median Absolute Error </h3>
</div>

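Median absolute error is the median of the absolute differences between the actual and the predicted values. Because it uses the median rather than the mean, it is robust to a few very large errors.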

In [None]:
# test data error
test_median_ae = median_absolute_error(y_test,test_pred)

In [None]:
print('test data error :',test_median_ae)

test data error : 71441.464019562
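
The robustness to outliers is easy to see on a small example. The sketch below uses made-up numbers in which one prediction is wildly off, and compares the median of the absolute errors with their mean (the MAE); it relies only on Python's standard `statistics` module.

```python
from statistics import mean, median

actual = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [11.0, 18.0, 33.0, 40.0, 150.0]  # the last prediction is wildly off

# absolute errors: [1.0, 2.0, 3.0, 0.0, 100.0]
absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]

toy_median_ae = median(absolute_errors)  # middle value of the sorted errors
toy_mean_ae = mean(absolute_errors)      # pulled up by the single large error
print(toy_median_ae, toy_mean_ae)  # 2.0 21.2
```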


<a id="6"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.6. Mean Squared Log Error </h3>
</div>

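MSLE is the mean of the squared differences between the logarithms of the actual and the predicted values (scikit-learn applies the logarithm to 1 + value). It therefore measures relative rather than absolute error and suits non-negative targets such as prices.

In [None]:
# test data error
test_msle = mean_squared_log_error(y_test,test_pred)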

In [None]:
print('test data error :',test_msle)

test data error : 0.010682507590407342
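
To illustrate the "relative error" behaviour, the sketch below applies the definition to a 10% overestimate at two very different price levels. The helper `msle_manual` and the values are made up for this example; it follows the log(1 + value) form described above.

```python
import math

def msle_manual(actual, predicted):
    """Mean of squared differences between log(1 + actual) and log(1 + predicted)."""
    squared_log_errors = [
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)

# A 10% overestimate on a small and on a large value (hypothetical numbers)
small_scale = msle_manual([100.0], [110.0])
large_scale = msle_manual([1_000_000.0], [1_100_000.0])
print(small_scale, large_scale)
```

Both results come out close to each other even though the absolute errors are 10 and 100,000, whereas MSE would differ by many orders of magnitude between the two cases.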


<a id="7"></a>
<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h3 style='margin:10px 5px'>2.7. Root Mean Squared Error </h3>
</div>

RMSE is the square root of the MSE, which brings the error back to the same units as the target.

In [None]:
# test data error
test_rmse = np.sqrt(test_mse)

In [None]:
print('test data error :',test_rmse)

test data error : 103178.66603998184
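
As a quick check, the RMSE can be recovered from the MSE printed in section 2.1 with nothing but a square root. The sketch also compares it with the MAE from section 2.2: on the same predictions RMSE is never smaller than MAE, and a large gap between the two hints at a few big errors. The two constants are copied from the outputs above and will change with a different train/test split.

```python
import math

# Values copied from the outputs printed earlier in this notebook
reported_mse = 10645837125.790102
reported_mae = 83291.40523552582

rmse_from_mse = math.sqrt(reported_mse)
print(rmse_from_mse)

# Ratio of RMSE to MAE; 1.0 would mean all errors have the same magnitude
print(rmse_from_mse / reported_mae)
```

Recent scikit-learn releases also provide `root_mean_squared_error` in `sklearn.metrics`; whether it is available depends on the installed version.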
