# Regression Evaluation

## Mean Absolute Error (MAE)

- MAE calculates the mean of the absolute differences between actual and predicted values
- $MAE = \frac{1}{n} \sum |y - \hat{y}|$
- Advantages:
    - MAE is in the same units as the dependent variable
    - More robust to outliers than the squared-error metrics, meaning a few large errors don't overpower a lot of smaller ones
- Disadvantages:
    - MAE is not differentiable where the error is zero, so it is less convenient as a loss function; gradient-based optimizers such as gradient descent have to fall back on a subgradient or a smooth approximation at that point
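
As a rough illustration of the formula above, here is a minimal plain-Python sketch. The function name `mae` and the sample values are made up for this example; they are not taken from any particular library.

```python
def mae(y_true, y_pred):
    """Mean of |y - y_hat| over all observations."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# Absolute errors are 1, 0, 2, 1, so the mean is 4 / 4 = 1.0
mae_value = mae(y_true, y_pred)
print(mae_value)
```

The result is in the same units as `y_true`, which is what makes MAE easy to read off directly.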

## Mean Squared Error (MSE)

- MSE calculates the mean of the squared differences between actual and predicted values
- As a loss function, it's commonly referred to as "least squares"
- $MSE = \frac{1}{n} \sum (y - \hat{y})^2$
- Advantages:
    - MSE is differentiable everywhere, so you can easily use it as a loss function
- Disadvantages:
    - MSE is in squared units of the dependent variable, so interpretation can be harder
    - It is not robust to outliers. It punishes the model more for large errors because they're squared.
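
To make the outlier point concrete, the sketch below compares MSE with MAE on the same predictions, with and without one large miss. The helper names `mse` and `mae` and all data values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def mse(y_true, y_pred):
    """Mean of (y - y_hat)^2 over all observations."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of |y - y_hat| over all observations."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 11.0, 13.0]
pred_clean = [11.0, 11.0, 12.0, 12.0]    # every prediction is off by 1
pred_outlier = [11.0, 11.0, 12.0, 23.0]  # last prediction is off by 10

mse_clean = mse(y_true, pred_clean)      # (1 + 1 + 1 + 1) / 4 = 1.0
mse_outlier = mse(y_true, pred_outlier)  # (1 + 1 + 1 + 100) / 4 = 25.75
mae_clean = mae(y_true, pred_clean)      # (1 + 1 + 1 + 1) / 4 = 1.0
mae_outlier = mae(y_true, pred_outlier)  # (1 + 1 + 1 + 10) / 4 = 3.25

print(mse_clean, mse_outlier)
print(mae_clean, mae_outlier)
```

With this data a single bad prediction multiplies MSE by about 26 but MAE by only about 3, because the error of 10 is squared to 100 before averaging.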

## Root Mean Squared Error (RMSE)

- RMSE takes the square root of MSE
- $RMSE= \sqrt{\frac{1}{n} \sum (y-\hat{y})^2}$
- Advantages:
    - RMSE is in the same units as the dependent variable
- Disadvantages:
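    - It is less robust to outliers than MAE. Because it is a monotonic transform of MSE, it ranks models the same way MSE does, although the square root brings the reported value back to the original scale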
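
A small sketch of the calculation, assuming plain Python lists; the function name `rmse` and the numbers are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    mean_sq = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical data: three perfect predictions and one that is off by 4
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [10.0, 20.0, 30.0, 44.0]

# Squared errors are 0, 0, 0, 16, so MSE = 4.0 and RMSE = 2.0
rmse_value = rmse(y_true, y_pred)
print(rmse_value)
```

For the same data MAE would be 1.0, so the single large error still pulls RMSE above MAE even though both are in the units of `y_true`.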

## R-Squared ($R^2$)

- $R^2$ measures the proportion of the variance in the dependent variable that the model explains using the independent variables. It is often used in model comparison.
- It is also known as the coefficient of determination and is used as a measure of goodness of fit.
- $R^2= 1 - \frac{SSR}{SST}$
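- $SSR = \sum (y - \hat{y})^2$ is the sum of squared residuals
- $SST = \sum (y - \bar{y})^2$ is the total sum of squares, where $\bar{y}$ is the mean of the actual values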
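- The range is usually 0-1, but $R^2$ can be negative when the model fits worse than always predicting $\bar{y}$ (for example, on held-out data). Interpret an $R^2 = 0.8$ to mean that the model explains 80% of the variance in the data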
- Advantages:
    - Unitless, so it can be used to compare different models of the same dependent variable
- Disadvantages:
    - It is not as robust to outliers as MAE.
    - For a least-squares model scored on its training data, adding new features makes $R^2$ either stay the same or increase, even if the features are irrelevant.
    - Not interpretable in the same units as the dependent variable

## Adjusted R-Squared (Adj $R^2$)

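- $R^2_a = 1 - [\frac{(1 - R^2)(n - 1)}{n - k - 1}]$, where $n$ is the number of observations and $k$ is the number of independent variables
- Penalizes additional features: it goes up only when a new feature improves the fit by more than would be expected by chance.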
- Advantages:
    - Can be used to compare models with different numbers of features
    - Much less affected by adding irrelevant features
- Disadvantages:
    - Not interpretable in the same units as the dependent variable