### Regression Evaluation Metrics

Wine Dataset\
Target Variable: quality (the quality rating of the wine)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

In [2]:
wine = pd.read_csv("https://raw.githubusercontent.com/dphi-official/ML_Models/master/Performance_Evaluation/winequality.csv", sep=";")
wine.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


**Separate Input and Output Variables**

In [3]:
X = wine.drop('quality', axis = 1)
y = wine.quality

**Split into training and testing (80:20)**

In [4]:
# Split the already separated inputs (X) and output (y) into training and test sets.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

In [5]:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)
y_pred[:10]

array([5.44455619, 5.57868309, 5.99091469, 5.19864346, 6.0666099 ,
       5.01639077, 5.68416174, 6.26611011, 5.97010538, 5.65519351])

### Performance Measurement

**Mean Absolute Error**

- MAE is the average of the absolute differences between the target values and the values predicted by the model.
- MAE is more robust to outliers than MSE because it does not square the errors, so large errors are not penalized as heavily.

In [6]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, y_pred)

0.5972358558776483
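
To make the definition concrete, here is a minimal sketch that computes MAE by hand with NumPy. The arrays `y_true` and `y_hat` are hypothetical toy values chosen for illustration, not rows from the wine data.

```python
import numpy as np

# Hypothetical toy values, not taken from the wine data.
y_true = np.array([5.0, 6.0, 7.0, 5.0])
y_hat = np.array([5.5, 6.0, 6.0, 7.0])

# MAE: mean of the absolute residuals (0.5, 0, 1, 2).
abs_errors = np.abs(y_true - y_hat)
mae = abs_errors.mean()
print(mae)  # 0.875
```

Each error contributes in proportion to its size, so the single error of 2 accounts for 2 / 3.5 of the total.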

**Mean Squared Error**

- MSE is the average of the squared differences between the target values and the values predicted by the regression model.
- Because it squares the differences, it penalizes large errors much more heavily than small ones, so a few outliers can make the model look worse than it is on typical samples.
- MSE is one of the most widely used metrics for regression tasks. Note that it is expressed in squared units of the target.

In [7]:
from sklearn.metrics import mean_squared_error
print("Mean Squared Error: ",mean_squared_error(y_test, y_pred))

Mean Squared Error:  0.5906658099548077
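
The effect of squaring can be seen on a small example. The sketch below uses hypothetical toy arrays (not the wine data) and compares how much of the total penalty comes from the single largest error under MSE and under MAE.

```python
import numpy as np

# Hypothetical toy values, not taken from the wine data.
y_true = np.array([5.0, 6.0, 7.0, 5.0])
y_hat = np.array([5.5, 6.0, 6.0, 7.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)  # squared errors: 0.25, 0, 1, 4
print(mse)  # 1.3125

# Share of the total penalty contributed by the single largest error.
share_squared = (residuals ** 2).max() / (residuals ** 2).sum()
share_absolute = np.abs(residuals).max() / np.abs(residuals).sum()
print(share_squared > share_absolute)  # True
```

Under squaring, the largest error accounts for roughly three quarters of the total penalty, compared with a bit over half when absolute values are used.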


**Root Mean Square Error**

- RMSE is the square root of the average squared difference between the target values and the values predicted by the model.
- Because the errors are squared before averaging, large errors receive a high penalty, while taking the square root brings the result back to the same units as the target.
- This makes RMSE useful when large errors are particularly undesirable.

In [8]:
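# Take the square root explicitly; the `squared=False` argument is deprecated in recent scikit-learn releases.
print("Root Mean Squared Error: ", np.sqrt(mean_squared_error(y_test, y_pred)))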

Root Mean Squared Error:  0.7685478579469256
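
As a quick check of the relationship between the metrics, the following sketch computes RMSE by hand on hypothetical toy arrays (not the wine data) and compares it with MAE on the same values.

```python
import numpy as np

# Hypothetical toy values, not taken from the wine data.
y_true = np.array([5.0, 6.0, 7.0, 5.0])
y_hat = np.array([5.5, 6.0, 6.0, 7.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)               # back in the units of the target
mae = np.mean(np.abs(residuals))

# RMSE is never smaller than MAE; the gap widens as errors become more uneven.
print(rmse >= mae)  # True
```

A large gap between RMSE and MAE is a hint that a few large errors dominate the total.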


**R Squared**

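R squared (the coefficient of determination) compares our model with a constant baseline that always predicts the mean of the target. It is the fraction of the variance in the target that the model explains: 1 means perfect predictions, 0 means the model is no better than the baseline, and negative values mean it is worse.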

In [9]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.2832037191111023
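
The comparison with the constant baseline can be written out directly as 1 - SS_res / SS_tot. The sketch below is an illustration on hypothetical toy arrays, and `r_squared` is a helper defined here for the example, not a scikit-learn function.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, not taken from the wine data.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([4.0, 5.0, 6.0, 10.0])

r2_model = r_squared(y_true, y_hat)  # 1 - 3/20
print(r2_model)  # 0.85

# A baseline that always predicts the mean of y_true scores exactly 0.
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = r_squared(y_true, baseline)
print(r2_baseline)  # 0.0
```

Read this way, the score of about 0.28 above means the linear model explains roughly 28% of the variance in quality on the test set.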