# Regression metrics

The most commonly used evaluation metrics for regression are:
- Mean absolute error (MAE)
- Mean squared error (MSE)
- Root mean squared error (RMSE)
- Root mean squared logarithmic error (RMSLE)
- Mean percentage error (MPE)
- Mean absolute percentage error (MAPE)
- $ R^2 $

In [1]:
import numpy as np
from sklearn import metrics

### Mean Absolute Error (MAE)

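**Error** is the difference between the true value and the predicted value.<br>
$\text{Error} = \text{True Value} - \text{Predicted Value}$<hr>
**Absolute error** is the magnitude of the error, ignoring its sign.<br>
$\text{Absolute Error} = |\text{True Value} - \text{Predicted Value}|$<hr>
**Mean absolute error** is the mean of all absolute errors: the sum of the absolute errors divided by the number of samples.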

In [2]:
def mean_absolute_error(y_true, y_pred):
    mae = 0
    for yt, yp in zip(y_true, y_pred):
        mae += abs(yt - yp)
    return mae / len(y_true)
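
As a quick check of the definition, here is a small worked example with made-up values. The name `mae_numpy` is hypothetical; it is a vectorized sketch of the same calculation, not part of the original notebook.

```python
import numpy as np

def mae_numpy(y_true, y_pred):
    # Convert to float arrays so the subtraction is element-wise
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Illustrative values only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute errors: 0.5, 0.0, 2.0, 1.0 -> mean = 3.5 / 4 = 0.875
mae_value = mae_numpy(y_true, y_pred)
print(mae_value)
```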

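### Mean Squared Error (MSE)

$\text{Squared Error} = (\text{True Value} - \text{Predicted Value})^2$

Mean squared error is the mean of the squared errors over all samples.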

In [3]:
def mean_squared_error(y_true, y_pred):
    mse = 0
    for yt, yp in zip(y_true, y_pred):
        mse += (yt - yp)**2
    return mse / len(y_true)
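
Because each error is squared, one large error weighs more than several small ones. The sketch below uses made-up values and hypothetical helper names (`mae`, `mse`) to compare two sets of predictions that have the same mean absolute error but different mean squared error.

```python
def mae(y_true, y_pred):
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only
y_true = [10.0, 10.0, 10.0, 10.0]
pred_a = [9.0, 11.0, 9.0, 11.0]    # four errors of size 1
pred_b = [10.0, 10.0, 10.0, 14.0]  # a single error of size 4

mae_a, mae_b = mae(y_true, pred_a), mae(y_true, pred_b)  # 1.0 and 1.0
mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)  # 1.0 and 4.0
print(mae_a, mae_b, mse_a, mse_b)
```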

### Root Mean Squared Error

$\text{RMSE} = \sqrt{\text{MSE}}$

In [4]:
def root_mean_squared_error(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return np.sqrt(mse)
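
MSE is expressed in squared units of the target, and taking the square root brings the value back to the target's own units. A minimal sketch with made-up values follows; the function name `rmse` and the dollar unit are assumptions for illustration.

```python
import math

def rmse(y_true, y_pred):
    # Mean of the squared errors, then the square root
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Illustrative values only (say, prices in dollars)
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [1.0, 5.0, 4.0, 8.0]

# Squared errors: 1, 1, 4, 0 -> MSE = 6 / 4 = 1.5 (dollars squared)
rmse_value = rmse(y_true, y_pred)  # sqrt(1.5), roughly 1.22 dollars
print(rmse_value)
```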

### Mean Squared Log Error

$\text{MSLE} = \text{mean}\left( (\log(1 + \text{True Value}) - \log(1 + \text{Predicted Value}))^2 \right)$

In [5]:
def mean_squared_log_error(y_true, y_pred):
    msle = 0
    for yt, yp in zip(y_true, y_pred):
        # log(1 + x) keeps the metric defined when a value is 0
        msle += (np.log(1 + yt) - np.log(1 + yp)) ** 2
    return msle / len(y_true)

### Root Mean Squared Log Error

$\text{RMSLE} = \sqrt{\text{MSLE}}$

In [6]:
def root_mean_squared_log_error(y_true, y_pred):
    msle = mean_squared_log_error(y_true, y_pred)
    return np.sqrt(msle)
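
Since the errors are taken on a log scale, RMSLE depends mostly on the ratio between prediction and true value rather than on their absolute difference. The sketch below illustrates this with made-up values. The name `rmsle` is hypothetical, and `np.log1p(x)` is used as an equivalent of `np.log(1 + x)`.

```python
import numpy as np

def rmsle(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(1 + x); inputs must be greater than -1
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# Both predictions are 10% too high, but the absolute errors are 10 and 100
rmsle_small = rmsle([100], [110])
rmsle_large = rmsle([1000], [1100])

# The two values are close to each other despite the 10x gap in absolute error
print(rmsle_small, rmsle_large)
```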

### Mean Percentage Error

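$\text{Percentage Error} = \frac{\text{True Value} - \text{Predicted Value}}{\text{True Value}} \times 100$

Mean percentage error is the mean of the percentage errors over all samples.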

In [7]:
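def mean_percentage_error(y_true, y_pred):
    pe = 0
    for yt, yp in zip(y_true, y_pred):
        # undefined when a true value is 0
        pe += (yt - yp) / yt
    # multiply by 100 to express the result as a percentage, as in the formula
    return 100 * pe / len(y_true)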

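### Mean Absolute Percentage Error

$\text{Absolute Percentage Error} = \frac{|\text{True Value} - \text{Predicted Value}|}{|\text{True Value}|} \times 100$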

In [8]:
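def mean_abs_percentage_error(y_true, y_pred):
    ape = 0
    for yt, yp in zip(y_true, y_pred):
        # abs(yt) keeps each term non-negative; undefined when a true value is 0
        ape += abs(yt - yp) / abs(yt)
    # multiply by 100 to express the result as a percentage
    return 100 * ape / len(y_true)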
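
The difference between the signed and the absolute version is easiest to see on a small example. In the sketch below, the values are made up, the helper names `mpe` and `mape` are hypothetical, and both return percentages. One prediction is 10% too high and the other 10% too low, so the signed errors cancel while the absolute errors do not.

```python
def mpe(y_true, y_pred):
    # Signed percentage errors: positive and negative errors can cancel
    return 100 * sum((yt - yp) / yt for yt, yp in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Absolute percentage errors: every error counts toward the total
    return 100 * sum(abs(yt - yp) / abs(yt) for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only; true values must be non-zero
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]

mpe_value = mpe(y_true, y_pred)    # -10% and +10% cancel out
mape_value = mape(y_true, y_pred)  # both errors contribute 10%
print(mpe_value, mape_value)
```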

### $R^2$ (R-squared)

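$R^2$, also known as the coefficient of determination, measures how well the model fits the data. The closer $R^2$ is to 1.0, the better the fit.

$R^2$ is 0 when the model does no better than always predicting the mean of the true values, and it becomes negative when the predictions are worse than that baseline.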

![image.png](attachment:image.png)

In [9]:
def r_squared(y_true, y_pred):
    numerator = 0    # residual sum of squares
    denominator = 0  # total sum of squares
    mean = np.mean(y_true)
    for yt, yp in zip(y_true, y_pred):
        numerator += (yt - yp)**2
        denominator += (yt - mean)**2
    return 1 - (numerator / denominator)
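
The range of values described above can be checked on a toy example. The sketch below uses made-up data and a hypothetical vectorized helper, `r2_sketch`, which computes one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

def r2_sketch(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

# Illustrative values only; the mean of y_true is 3 and ss_tot is 10
y_true = [1, 2, 3, 4, 5]

r2_perfect = r2_sketch(y_true, [1, 2, 3, 4, 5])  # ss_res = 0  -> 1.0
r2_mean = r2_sketch(y_true, [3, 3, 3, 3, 3])     # ss_res = 10 -> 0.0
r2_bad = r2_sketch(y_true, [5, 4, 3, 2, 1])      # ss_res = 40 -> -3.0
print(r2_perfect, r2_mean, r2_bad)
```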

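Some more advanced metrics are available in scikit-learn.

**Quadratic weighted kappa** (QWK) is widely used. It is **Cohen’s kappa** computed with quadratic weights, so disagreements between distant categories are penalized more heavily than disagreements between neighbouring ones.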

In [10]:
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]
metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")

0.33333333333333337
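
To show what the score is made of, here is a from-scratch sketch of quadratic weighted kappa. It assumes the labels are ordinal and that their sorted order is the intended category order. The function name `quadratic_weighted_kappa` is hypothetical, and this is not scikit-learn's implementation.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred):
    # Map the labels that appear in either list to indices 0..k-1
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)

    # Observed confusion matrix: rows are true labels, columns are predictions
    observed = np.zeros((k, k))
    for yt, yp in zip(y_true, y_pred):
        observed[index[yt], index[yp]] += 1

    # Expected counts under chance agreement, from the row and column totals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic disagreement weights: (i - j)^2
    grid = np.arange(k)
    weights = (grid[:, None] - grid[None, :]) ** 2

    return 1 - (weights * observed).sum() / (weights * expected).sum()

y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2]
qwk = quadratic_weighted_kappa(y_true, y_pred)
print(qwk)
```

For these labels the weighted observed disagreement is 8 and the weighted expected disagreement is 12, so the score is 1 - 8/12, roughly 0.333, in line with the value shown above.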

Another important metric is the **Matthews correlation coefficient (MCC)**. MCC ranges from -1 to 1: 1 is a perfect prediction, -1 is a completely inverted prediction (total disagreement), and 0 is no better than random prediction.

![image.png](attachment:image.png)
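
A minimal sketch of MCC for binary labels follows, computed from the confusion-matrix counts with the standard formula `(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))`. The function name `mcc_binary` and the example labels are made up. Returning 0 when the denominator is 0 is a convention assumed here.

```python
import math

def mcc_binary(y_true, y_pred):
    # Count the four cells of the binary confusion matrix (labels are 0 or 1)
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 1)
    tn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 0)
    fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1)
    fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: treat the undefined case as 0
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Illustrative labels only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

mcc_value = mcc_binary(y_true, y_pred)                      # TP=3, TN=3, FP=1, FN=1 -> 0.5
mcc_perfect = mcc_binary(y_true, y_true)                    # 1.0
mcc_inverted = mcc_binary(y_true, [1 - y for y in y_true])  # -1.0
print(mcc_value, mcc_perfect, mcc_inverted)
```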