# Regression Metrics

In this notebook we examine several metrics used to quantify the error of a model on regression tasks. Most of them are computed from the residuals, the differences between each observed value $y_i$ and the model's prediction $\hat{y}_i$:

\begin{equation*}
e_i= y_i-\hat{y}_i
\end{equation*}

### Root Mean Squared Error
The most commonly used metric for regression is the *Root Mean Squared Error* (RMSE) which is defined as follows:

\begin{equation*}
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}e_i^{2}}
\end{equation*}

Below we implement RMSE on data generated from [Linear Regression](https://github.com/AlbinFranzen/Machine-Learning-Portfolio/blob/master/ML%20algorithms%20from%20scratch/Linear%20Methods/Simple%20Linear%20Regression.ipynb).

In [None]:
import numpy as np
np.random.seed(42)

In [9]:
# Generate synthetic data: y = 4 + 3x + Gaussian noise, with x uniform in [0, 2)
X = 2 * np.random.rand(100,1)
y = 4 + 3 * X + np.random.randn(100,1)

In [10]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression() #apply model and make predictions
lin_reg.fit(X, y)
y_pred = lin_reg.predict(X)

residuals = y - y_pred #calculate residuals as e_i = y_i - y_hat_i
root_mean_squared_error = np.sqrt(1/len(y)*np.sum(residuals**2)) #RMSE formula
root_mean_squared_error

0.8915256019083877
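
As a sanity check, the hand-written formula can be compared against scikit-learn's `mean_squared_error`. The sketch below regenerates data in the same way as above so that it runs on its own; the helper `rmse` and the `*_demo` variable names are illustrative choices, not part of the original notebook or of any library.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_hat):
    """Root mean squared error computed directly from the residuals."""
    residuals = y_true - y_hat
    return float(np.sqrt(np.mean(residuals ** 2)))


# Local random generator so the notebook's global seed is left untouched
rng = np.random.RandomState(42)
X_demo = 2 * rng.rand(100, 1)
y_demo = 4 + 3 * X_demo + rng.randn(100, 1)

model = LinearRegression().fit(X_demo, y_demo)
y_demo_pred = model.predict(X_demo)

manual_rmse = rmse(y_demo, y_demo_pred)
# scikit-learn returns the mean squared error; take the square root ourselves
sklearn_rmse = float(np.sqrt(metrics.mean_squared_error(y_demo, y_demo_pred)))

print(manual_rmse, sklearn_rmse)
```

The two values should agree up to floating-point rounding, which is a quick way to catch mistakes in a from-scratch implementation.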

### Mean Absolute Error

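The squared error underlying the RMSE is convex and differentiable everywhere, which makes many optimization problems (ordinary least squares, for example) easy to solve; this is one reason it is generally preferred. Another metric that can be used is the *Mean Absolute Error* (MAE):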

\begin{equation*}
MAE = \frac{1}{n}\sum_{i=1}^{n}|e_i|
\end{equation*}

We implement it below:

In [11]:
mean_absolute_error = 1/len(y)*np.sum(abs(residuals)) #MAE formula
mean_absolute_error

0.7036701730509569
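
The MAE formula can be checked by hand on a small example. The sketch below uses four made-up target and prediction values (an assumption for illustration, not data from this notebook) and compares the manual result with scikit-learn's `mean_absolute_error`. The function is accessed through the `metrics` module so that it does not overwrite the `mean_absolute_error` variable defined above.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy values chosen so the arithmetic is easy to follow
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

toy_residuals = y_true - y_hat  # [0.5, -0.5, 0.0, -1.0]

# MAE: (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
manual_mae = float(np.mean(np.abs(toy_residuals)))
sklearn_mae = float(metrics.mean_absolute_error(y_true, y_hat))

# RMSE on the same values, for comparison
toy_rmse = float(np.sqrt(metrics.mean_squared_error(y_true, y_hat)))

print(manual_mae, sklearn_mae)  # both 0.5
print(toy_rmse)
```

On any data set the MAE is never larger than the RMSE, because squaring gives the larger residuals more weight before averaging.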

Both the RMSE and the MAE measure the distance between two vectors: the vector of predictions and the vector of target values. Different distance measures, or *norms*, can be used for this:

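- The MAE corresponds to the $\ell_1$ norm, also called the *Manhattan norm*: it is the sum of the absolute residuals divided by $n$

- The RMSE corresponds to the $\ell_2$ norm, also called the *Euclidean norm*: it is the square root of the sum of squared residuals, divided by $\sqrt{n}$

- In general, the $\ell_k$ norm of the residual vector is the $k$-th root of the sum of the $k$-th powers of the absolute residuals, $\left(\sum_{i=1}^{n}|e_i|^{k}\right)^{1/k}$; averaging the sum over $n$ before taking the root gives the corresponding error metric

- The higher the norm index, the more the norm focuses on large values and neglects small ones, because the exponent gives outliers more weight. The RMSE is therefore more sensitive to outliers than the MAE, but it is usually a reasonable choice as a prediction metric when outliers are rare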
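
The relationship between the two metrics and the vector norms listed above can be illustrated numerically. This is a minimal sketch: `lk_norm` is a hypothetical helper written for this example, the residual values are made up, and `np.linalg.norm` is used only as a cross-check.

```python
import numpy as np


def lk_norm(v, k):
    """l_k norm: k-th root of the sum of the k-th powers of |v_i|."""
    return float(np.sum(np.abs(v) ** k) ** (1.0 / k))


# Made-up residual vector for illustration
e = np.array([0.5, -0.5, 0.0, -1.0])
n = e.size

mae = float(np.mean(np.abs(e)))
rmse = float(np.sqrt(np.mean(e ** 2)))

# MAE is the l1 norm divided by n; RMSE is the l2 norm divided by sqrt(n)
mae_from_norm = lk_norm(e, 1) / n
rmse_from_norm = lk_norm(e, 2) / np.sqrt(n)
print(mae, mae_from_norm)
print(rmse, rmse_from_norm)

# Add one large residual and measure how much of sum(|e_i|^k) it accounts for
e_outlier = np.append(e, 10.0)
outlier_share = {
    k: float(10.0 ** k / np.sum(np.abs(e_outlier) ** k)) for k in (1, 2, 4)
}
print(outlier_share)  # the share grows as k increases
```

The outlier's share of the total rises with `k`, which is the sense in which higher norm indices give large residuals more weight.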