# Linear Regression - Performance Metrics

It is common to build several different linear regression models using different features when attempting to predict a response. In the case of a single feature, it is sometimes possible to look at the models and determine which is better. However, in many situations, it is not possible to visually determine which model is best, especially in the case of multiple linear regression. To select the best model, we need to evaluate our models and select the one with the least amount of error when predicting the output.
  

**Error Function:**

<figure align="center">
<!-- <img src="https://drive.google.com/uc?id=1T609cjRRLk4-ANWq1sePDF__WdfG99uY" height="500px", width="600px">  -->
<img src="https://i.postimg.cc/qvbPf4ty/image.png" height=500px>
</figure>

An error function represents the difference between the actual and predicted values; the higher the error value, the worse the model. When performing linear regression, you obtain a line of best fit as shown by the red line in the figure above. Points that lie on the line will be predicted correctly. However, as you can see from the figure above, many points can lie away from the regression line.

A residual is the difference between the actual response variable $y_{i}$ and the predicted outcome $\hat{y_{i}}$, and is given by $y_{i}-\hat{y_{i}}$. The residuals can be thought of as the error that is unexplained by the regression line. A residual can be positive or negative, denoted by $r_{+}$, and $r_{-}$ respectively. The residual sum of squares (RSS) is one of several error measures used in linear regression.

$$ \text {Residual = actual value – predicted value}

**Residual Sum of Squares (RSS):**

$$ \text{RSS} = \sum_{i=1}^{m} (y_{i} - \hat{y_{i}})^{2} $$

**Mean Squared Error (MSE):**


One issue with the residual sum of squares is that it does not account for the number of samples used when calculating the error; if we double the number of samples, the RSS will increase significantly. One way to address this is to divide the RSS by the number of sample points. This is referred to as the mean squared error (MSE).


$$ \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^{2} $$

**Root Mean Squared Error (RMSE):**


There are times when large errors are particularly undesirable. By taking the square root of the MSE, larger errors are penalized more than smaller errors. This error measurement is referred to as the root mean squared error (RMSE).


$$ \text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^{2}} $$

**Mean Absolute Error (MAE):**


The previous three error measures mentioned are sensitive to the presence of outliers. In the presence of outliers, one may want to consider the mean absolute error, which is the average of the absolute values of the residuals.


$$ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_{i} - \hat{y_{i}}|$$

# Evaluation


In [1]:
# import libraries

import numpy as np
import pandas as pd
from IPython.display import display, HTML

In [2]:
data_path = "https://www.statlearning.com/s/Advertising.csv"

# Read the CSV data from the link
data_df = pd.read_csv(data_path,index_col=0)

# Print out first 5 samples from the DataFrame
data_df.head()

Unnamed: 0,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9


### Scikit-Learn's Error Metrics

Scikit-Learn's [metrics](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics) package has several metrics for [regression](https://scikit-learn.org/stable/modules/classes.html#regression-metrics).

* **Residual Sum of Squares (RSS):**

  Scikit-Learn does not provide a directed implementation to calculate the RSS. However, since the MSE is equal to the RSS divided by the number of samples, we can use the [`mean_squared_error`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) function times the number of samples to obtain the RSS.


* **Mean Squared Error (MSE):**

  The [`mean_squared_error`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) function implements the MSE.


* **Root Mean Squared Error (RMSE):**

  Again, Scikit-Learn does not directly implement the RMSE, but it can be achieved by taking the square root of the MSE.


* **Mean Absolute Error (MAE):**

  The [`mean_absolute_error`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error) function implements the MSE.

We are going to calculate and analyze these performance metrics on the advertising dataset. This dataset was introduced in the previous notebook where we build simple and multiple linear regression models.

In [5]:
# Import LinearRegression and necessary evaluation metrics from sklearn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, median_absolute_error, r2_score

# Create a dictionary of lists for storing the values of all evaluation metrics
results = {
    "Residual Sum of Squares":list(),
    "Mean Squared Error":list(),
    "Root Mean Squared Error":list(),
    "Mean Absolute Error":list(),
}


def train_lr(X, y_true):

    lr = LinearRegression()
    lr.fit(X, y_true)
    y_pred = lr.predict(X)


    # Evaluate different metrics
    results['Residual Sum of Squares'].append(len(y_true * mean_squared_error(y_true, y_pred)))
    results["Root Mean Squared Error"].append( np.sqrt(mean_squared_error(y_true, y_pred)))
    results["Mean Absolute Error"].append( median_absolute_error(y_true, y_pred))


# Train and analyze performance metric over each of the following feature groups
feature_list = ["TV",
            "radio",
            "newspaper",
            "TV, radio",
            "TV, radio, newspaper"]
for features in feature_list:
    feature = features.split(', ')
    train_lr(data_df[feature], data_df[["sales"]])






In [9]:
results_fixed = {k: v for k, v in results.items() if len(v) == len(feature_list)}
error_df = pd.DataFrame(results_fixed, index=feature_list).transpose()
display(error_df)

Unnamed: 0,TV,radio,newspaper,TV + radio,TV + radio + newspaper
Residual Sum of Squares,200.0,200.0,200.0,200.0,200.0
Root Mean Squared Error,3.242322,4.253516,5.066954,1.668703,1.66857
Mean Absolute Error,2.026365,2.61417,3.440421,1.079819,1.075512
