# Regression Metrics Review I

#### Plan for the video

#### 1) Regression

* **MSE, RMSE, R-squared**
* **MAE**
* (R)MSPE, MAPE
* (R)MSLE


#### 2) Classification

* Accuracy, LogLoss, AUC
* Cohen's (Quadratic weighted) Kappa

### Notation

![reg-metric-notation](../img/regression-metric-notation.png)

### MSE : Mean Square Error

$$MSE = \frac{1}{N}\sum^{N}_{i=1}(y_{i} - \hat{y_i})^2$$
#### Optimal constant prediction = mean of the target values
![mse1](../img/mse1.png)
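The claim that the best constant prediction under MSE is the mean can be checked numerically. The following is a minimal NumPy sketch on made-up toy targets. It scores every constant on a fine grid and keeps the one with the lowest MSE. `mse` is a helper written for this sketch, not a library function.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative targets (made up for this sketch)
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Score every constant prediction alpha on a fine grid and keep the best one
alphas = np.linspace(0.0, 10.0, 10001)
losses = np.array([mse(y, np.full_like(y, a)) for a in alphas])
best_alpha = alphas[np.argmin(losses)]

print(best_alpha, y.mean())  # both are (approximately) 4.0
```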



### MSE notes: RMSE

`RMSE` = Root mean square error

$$RMSE = \sqrt{\frac{1}{N}\sum^{N}_{i=1}(y_{i} - \hat{y_i})^2} = \sqrt{MSE}$$
$$MSE(a) > MSE(b) \Leftrightarrow RMSE(a) > RMSE(b)$$

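* Because the square root is monotonically increasing, both metrics rank models identically, so minimizing one also minimizes the other.
* The gradient of RMSE, however, is the gradient of MSE multiplied by a factor that is the same for every sample but depends on the current value of MSE: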

$$\frac{\partial{RMSE}}{\partial{\hat{y_i}}} = \frac{1}{2\sqrt{MSE}}\frac{\partial{MSE}}{\partial{\hat{y_i}}}$$

#### Even though they are very similar metrics, they are not immediately interchangeable for `gradient-based methods`.
* We will probably need to adjust parameters such as the `learning rate`.
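To see why the learning rate may need adjusting, this sketch implements the chain-rule formula for the RMSE gradient with NumPy and confirms that, sample by sample, it is the MSE gradient scaled by the single factor 1 / (2·RMSE). The residual values are made up, and `grad_mse` / `grad_rmse` are hypothetical helper names. A finite-difference check is included as a sanity test of the analytic gradient.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

def grad_mse(y, y_hat):
    # d MSE / d y_hat_i = (2 / N) * (y_hat_i - y_i)
    return 2.0 * (y_hat - y) / len(y)

def grad_rmse(y, y_hat):
    # Chain rule: d sqrt(u) / du = 1 / (2 sqrt(u))
    return grad_mse(y, y_hat) / (2.0 * np.sqrt(mse(y, y_hat)))

# Illustrative values (no residual is exactly zero, so the ratio below is defined)
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 1.5, 8.0])

# The per-sample ratio of the two gradients is one shared scalar: 1 / (2 * RMSE)
ratio = grad_rmse(y, y_hat) / grad_mse(y, y_hat)
print(ratio)

# Sanity check of the analytic RMSE gradient with a forward finite difference
eps = 1e-6
bumped = y_hat.copy()
bumped[0] += eps
numeric = (rmse(y, bumped) - rmse(y, y_hat)) / eps
print(numeric, grad_rmse(y, y_hat)[0])
```

Because the shared factor changes as the model improves (it grows as RMSE shrinks), a gradient step on RMSE is a rescaled MSE step, which is why a step size tuned for one metric does not transfer directly to the other.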

#### R-squared
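* Equals `1` for perfect predictions and `0` for predicting the constant mean of the targets; it can be negative when a model does worse than that constant
* Maximizing R-squared is equivalent to minimizing MSE, because the denominator does not depend on the predictions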

$$
R^2 = 1 - \frac{\frac{1}{N}\sum^{N}_{i=1}(y_{i} - \hat{y_i})^2}{\frac{1}{N}\sum^{N}_{i=1}(y_{i} - \bar{y})^2} = 1 - \frac{MSE}{\frac{1}{N}\sum^{N}_{i=1}(y_{i} - \bar{y})^2}
$$

$$
\bar{y} = \frac{1}{N}\sum^{N}_{i=1}y_{i}
$$
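A minimal NumPy sketch of the R-squared formula, using made-up targets, to check three reference cases: perfect predictions, the constant-mean prediction, and a prediction that is worse than the mean. `r2` is a helper defined here for illustration.

```python
import numpy as np

def r2(y, y_hat):
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mse = np.mean((y - y_hat) ** 2)
    # Denominator: the MSE of always predicting the target mean y_bar
    baseline = np.mean((y - y.mean()) ** 2)
    return 1.0 - mse / baseline

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y, y))                          # 1.0  (perfect predictions)
print(r2(y, np.full_like(y, y.mean())))  # 0.0  (constant mean prediction)
print(r2(y, y[::-1]))                    # -3.0 (worse than the mean)
```

Since `baseline` is fixed by the targets alone, any change in the predictions moves R-squared only through `mse`, which is why optimizing one optimizes the other.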

### MAE : Mean Absolute Error
* This metric **penalizes large errors less severely than MSE does**
  * Thus it is less sensitive to outliers than MSE
* It also has somewhat different applications than MSE
  * **`MAE` is widely used in the finance sector, where a `$10` error is usually exactly twice as bad as a `$5` error.**
  * **`MSE`**, on the other hand, treats a `$10` error as **`four times`** worse than a `$5` error.
  * With `RMSE`, it becomes much harder to explain to your boss how the model was evaluated.
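The `$5` versus `$10` comparison is plain arithmetic: an absolute-error penalty grows linearly with the error, while a squared-error penalty grows quadratically. A tiny sketch:

```python
errors = [5.0, 10.0]  # the two dollar errors from the example

abs_ratio = abs(errors[1]) / abs(errors[0])  # how much worse under MAE
sq_ratio = errors[1] ** 2 / errors[0] ** 2   # how much worse under MSE

print(abs_ratio, sq_ratio)  # 2.0 4.0
```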

#### Optimal constant prediction = median of the target values
#### `MAE` is more robust than `MSE`
* Outliers have less influence on it

$$MAE = \frac{1}{N}\sum^N_{i=1}{|y_i - \hat{y_i}|}$$

![mae](../img/mae1.png)
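The same grid search used for MSE can be repeated for MAE. In this NumPy sketch on made-up targets, the constant that minimizes MAE lands on the median (3.0), not the mean (4.0). `mae` is a helper written for this sketch.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of the absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Illustrative targets (made up for this sketch)
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Score every constant prediction alpha on a fine grid and keep the best one
alphas = np.linspace(0.0, 10.0, 10001)
losses = np.array([mae(y, np.full_like(y, a)) for a in alphas])
best_alpha = alphas[np.argmin(losses)]

print(best_alpha, np.median(y), y.mean())  # ~3.0, 3.0, 4.0
```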

### MAE : derivatives
Another important property of `MAE`:
* Its gradient with respect to each prediction (ignoring the constant factor 1/N) is
  * `+1` when the prediction is larger than the target
  * `-1` when the prediction is smaller than the target
  * not defined when the prediction is perfect
    * So, formally, **MAE is not differentiable** at that point
  * In practice, the case where the prediction exactly matches the target can be handled by returning `0` as the gradient.
  * Also notice that the second derivative is zero everywhere except at that point, where it is not defined.

![mae2](../img/mae2.png)
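The per-sample gradient rule above maps directly onto the sign function. This sketch assumes the common convention of returning `0` at a perfect prediction, which `np.sign` does naturally. `mae_grad` is a hypothetical helper name, and the 1/N factor is omitted as in the notes.

```python
import numpy as np

def mae_grad(y_true, y_pred):
    """Per-sample (sub)gradient of |y - y_hat| with respect to y_hat.

    +1 where the prediction is above the target, -1 where it is below,
    and 0 (a conventional choice) where it matches exactly.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sign(y_pred - y_true)

print(mae_grad([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # +1, 0, -1
```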



### `MAE` vs `MSE`

#### Do you have outliers in the data? Are you sure they are outliers?
* If they are genuine outliers that should not influence model training:
* Use `MAE`

#### Or are they just unexpected values we should still care about?
* They are not outliers, just rare samples.
* Use `MSE`
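A short sketch of why this choice matters, using made-up data: a single extreme value barely moves the MAE-optimal constant (the median) but drags the MSE-optimal constant (the mean) far from the bulk of the data. Whether that pull is desirable depends on whether the value is a data error or a rare but real sample.

```python
import numpy as np

y_clean = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
y_outlier = np.append(y_clean, 1000.0)  # one extreme value (made up)

# Best constant under MSE is the mean; under MAE it is the median
for name, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    print(f"{name:>12}: MSE-optimal = {y.mean():.2f}, MAE-optimal = {np.median(y):.2f}")
# clean:        12.00 vs 12.00
# with outlier: 176.67 vs 12.50
```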

### Conclusion

* Discussed the following metrics:
  - **MSE, RMSE, R-squared**
    * They are about the same from an optimization perspective
  - **MAE**
    * Robust to outliers