# Regression Metrics Review II

#### Plan for the video

#### 1) Regression
* MSE, RMSE, R-squared
* MAE
* **(R)MSPE, MAPE**
* **(R)MSLE**


#### 2) Classification
* Accuracy, LogLoss, AUC
* Cohen's (Quadratic weighted) Kappa

## From MSE and MAE to MSPE and MAPE

#### Say ...
* We need to predict how many laptops two shops will sell
* In the train set for a particular date, we see that the first shop sold 10 items and the second shop sold 1000 items

#### Then suppose ...
* Our model predicts 9 items instead of 10 for the first shop and 999 instead of 1000 for the second shop.
* It could happen that an **off-by-one error in the first case is much more critical than in the second case**
* **BUT `MSE` and `MAE` are both equal to `1` for each shop's prediction**, and thus these **off-by-one errors are indistinguishable**

----
* Shop 1 : predicted 9, sold 10, `MSE` = 1
* Shop 2 : predicted 999, sold 1000, `MSE` = 1

<br/>
* Shop 1 : predicted 9, sold 10, `MSE` = 1
* Shop 2 : predicted 900, sold 1000, `MSE` = 10000

<br/>
* Shop 1 : predicted 9, sold 10, relative_metric = 1
* Shop 2 : predicted 900, sold 1000, relative_metric = 1
----
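
The three comparisons above can be reproduced with a minimal sketch. The function names are illustrative, and the `100%` scaling in `mspe` and `mape` is an assumed (though common) convention that the notes do not spell out.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mspe(y_true, y_pred):
    # Squared error relative to the target, in percent (assumes no zero targets).
    return float(100.0 * np.mean(((y_true - y_pred) / y_true) ** 2))

def mape(y_true, y_pred):
    # Absolute error relative to the target, in percent (assumes no zero targets).
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# (sold, predicted) pairs from the shop example
cases = {
    "shop 1: 9 vs 10": (np.array([10.0]), np.array([9.0])),
    "shop 2: 999 vs 1000": (np.array([1000.0]), np.array([999.0])),
    "shop 2: 900 vs 1000": (np.array([1000.0]), np.array([900.0])),
}

for name, (y_true, y_pred) in cases.items():
    print(
        f"{name:20s} MSE={mse(y_true, y_pred):9.1f} "
        f"MAE={mae(y_true, y_pred):6.1f} "
        f"MSPE={mspe(y_true, y_pred):6.3f} "
        f"MAPE={mape(y_true, y_pred):6.2f}%"
    )
```

Under this scaling, `mspe` treats "9 instead of 10" and "900 instead of 1000" identically, while `mse` differs by four orders of magnitude.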

#### `MSPE` = Mean Square Percentage Error
#### `MAPE` = Mean Absolute Percentage Error
* `MAE` and `MSE` work with absolute errors
* `MSPE` and `MAPE` work with relative errors (the error divided by the target value)

#### `MSPE` and `MAPE` can be thought of as `weighted versions of MSE and MAE`
* For `MAPE`, the weight of each sample is **inversely proportional to its target**
* For `MSPE`, the weight of each sample is **inversely proportional to the square of its target**
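
Written out, the weighting looks as follows. The `100%` factor is an assumed (common) scaling that the notes do not state explicitly, and $\hat{y}_i$ denotes the prediction for sample $i$.

$$MSPE = \frac{100\%}{N}\sum^N_{i=1}\left(\frac{y_i-\hat{y}_i}{y_i}\right)^2 = \frac{100\%}{N}\sum^N_{i=1}\frac{1}{y_i^2}\left(y_i-\hat{y}_i\right)^2$$

$$MAPE = \frac{100\%}{N}\sum^N_{i=1}\left|\frac{y_i-\hat{y}_i}{y_i}\right| = \frac{100\%}{N}\sum^N_{i=1}\frac{1}{|y_i|}\left|y_i-\hat{y}_i\right|$$

The right-hand sides are a squared error and an absolute error with per-sample weights $1/y_i^2$ and $1/|y_i|$ respectively.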

#### We see the curves become flatter as the target value increases
* It means that the cost we pay for a fixed absolute error depends on the target value
* As the target increases, **we pay less**

![mspe-mape](../img/mspemape.png)

### MSPE : constant

#### Best constant = `weighted target mean`
* The best constant for the example below = about `6.6`
  * we see that it's **biased towards small targets**
  * since the absolute error for them is weighted with the highest weight and thus affects the metric the most
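
A minimal sketch of the weighted-mean rule. The targets in `y` are assumed toy values chosen for illustration (the notes do not list the data behind the plot), and the grid search is only a sanity check.

```python
import numpy as np

# Assumed toy targets; not given in the notes.
y = np.array([5.0, 9.0, 8.0, 6.0, 27.0])

# MSPE weights each squared error by 1 / y**2, so the best constant
# is the mean of the targets under those weights.
weights = 1.0 / y ** 2
best_constant = float(np.average(y, weights=weights))

# Sanity check: brute-force search over candidate constants.
grid = np.linspace(y.min(), y.max(), 22001)
mspe_on_grid = np.mean(((y[:, None] - grid[None, :]) / y[:, None]) ** 2, axis=0)
grid_best = float(grid[np.argmin(mspe_on_grid)])

print(f"weighted mean: {best_constant:.3f}, grid search: {grid_best:.3f}")
print(f"plain mean:    {y.mean():.3f}")
```

For these assumed targets the weighted mean lands near `6.6`, well below the plain mean, because the small targets carry the largest weights.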


![mspe-constant](../img/mspe-constant.png)

### MAPE : constant

#### Best constant = `weighted target median`
* Optimal value here = `6`
  * even smaller than the best constant for `MSPE`
  * BUT do not try to explain it using outliers
    * If an outlier had an extremely small value, `MAPE` would be very biased towards it, **since the extremely small outlier will have the highest weight!**
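
The weighted median can be sketched as below. `weighted_median` is a hypothetical helper (it returns the lower weighted median when there is a tie), and the targets in `y` are assumed toy values, not data from the notes.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cumulative = np.cumsum(weights)
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return float(values[idx])

# Assumed toy targets; not given in the notes.
y = np.array([5.0, 9.0, 8.0, 6.0, 27.0])

# MAPE weights each absolute error by 1 / y.
best_constant = weighted_median(y, 1.0 / y)
print(best_constant, np.median(y))

# A single extremely small target takes almost all of the weight.
y_with_outlier = np.append(y, 0.01)
print(weighted_median(y_with_outlier, 1.0 / y_with_outlier))
```

With the tiny value `0.01` appended, its weight of `100` exceeds the combined weight of all other samples, so the best constant collapses onto it.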

![mape-constant](../img/mape-constant.png)

### (R)MSLE : Root Mean Square Logarithmic Error

* The target value can be `0`, in which case the logarithm is not defined.
  * That is why a constant (here `1`) is usually added to the predictions and the targets before applying the logarithm

$$\begin{aligned}RMSLE &= \sqrt{\frac{1}{N}\sum^N_{i=1}\left(\log(y_i+1)-\log(\hat{y}_i+1)\right)^2}\\&=RMSE\left(\log(y_i+1),\log(\hat{y}_i+1)\right)\\&=\sqrt{MSE\left(\log(y_i+1),\log(\hat{y}_i+1)\right)}\end{aligned}$$


![rmsle](../img/rmsle.png)

* As it also cares about relative errors more than about absolute ones, its use case is similar to `MSPE` and `MAPE`
* **NOTE** the asymmetry of the error curves
  * **from the perspective of `RMSLE`, it is always better to overshoot the target than to undershoot it by the same amount**
* Just as `RMSE` does not differ much from `MSE`, `RMSLE` can be calculated without the root operation
  * But the rooted version is more widely used!
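
A small numeric check of the "better to predict more" claim, using the unrooted per-sample error. The target of `10` and the offset of `5` are arbitrary illustrative values.

```python
import numpy as np

def squared_log_error(y_true, y_pred):
    # Per-sample term of MSLE: squared difference of log(1 + x).
    return float((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

target = 10.0
offset = 5.0

under = squared_log_error(target, target - offset)  # predict 5
over = squared_log_error(target, target + offset)   # predict 15

print(f"under-prediction: {under:.4f}")
print(f"over-prediction:  {over:.4f}")
```

The same absolute miss is penalised more when it falls below the target, because the logarithm compresses large values more than small ones.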

#### It is important to know that
* the plot we see here on the slide is built for the version without the root
* for the rooted version, an analogous plot would be misleading.

### (R)MSLE : constant $\alpha$

$$\begin{aligned}RMSLE &= \sqrt{\frac{1}{N}\sum^N_{i=1}\left(\log(y_i+1)-\log(\alpha+1)\right)^2}\\&=RMSE\left(\log(y_i+1),\log(\alpha+1)\right)\\&=\sqrt{MSE\left(\log(y_i+1),\log(\alpha+1)\right)}\end{aligned}$$

* Best constant **in log space** is the **mean of the log-transformed targets** $\log(y_i+1)$
* We need to **exponentiate** it (and subtract the added constant `1`) to get an answer
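
The two steps can be sketched with `np.log1p` and `np.expm1`, which compute `log(1 + x)` and `exp(x) - 1`. The targets in `y` are assumed toy values, and `rmsle_of_constant` is a hypothetical helper used only to compare candidates.

```python
import numpy as np

# Assumed toy targets; not given in the notes.
y = np.array([5.0, 9.0, 8.0, 6.0, 27.0])

# Step 1: best constant in log space is the mean of log(y + 1).
log_space_mean = np.mean(np.log1p(y))

# Step 2: map back to the original scale with exp(.) - 1.
best_constant = float(np.expm1(log_space_mean))

def rmsle_of_constant(alpha):
    return float(np.sqrt(np.mean((np.log1p(y) - np.log1p(alpha)) ** 2)))

print(f"best constant:       {best_constant:.2f}")
print(f"RMSLE at best:       {rmsle_of_constant(best_constant):.4f}")
print(f"RMSLE at plain mean: {rmsle_of_constant(y.mean()):.4f}")
```

For these assumed targets the result is about `9.11`, between the plain mean and the median.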

![rmsle-constant](../img/rmsle-constant.png)



### Compare the constants

Metric|Constant
---|---
MSE|11
RMSLE|9.11
MAE|8
MSPE|6.6
MAPE|6

* `MSE` is quite biased towards the huge value in our dataset
* `MAE` is much less biased
* `MSPE` and `MAPE` are biased towards small targets
  * because they assign high weights to objects with small targets.
* `RMSLE` is usually considered a better metric than `MAPE`
  * since it is less biased towards small targets yet still works with relative errors

### Conclusion

We discussed the metrics that are sensitive to relative errors
* **(R)MSPE**
  - Weighted version of `MSE`

* **MAPE**
  - Weighted version of `MAE`

* **(R)MSLE**
  - `MSE` in log space