# General Approaches for Metrics Optimization

### Overview

* Loss vs. metric
* Approaches to metrics optimization in general

## Loss vs. metric

* **Target metric** is what we want to optimize
* **Optimization loss** is what *the model optimizes*
  * for example, logarithmic loss is widely used as an optimization loss
  * while the accuracy score is how the solution is eventually evaluated
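
To make the distinction concrete, here is a minimal sketch that computes both quantities for two made-up sets of predictions. The helper names `log_loss` and `accuracy` and the toy data are assumptions for illustration, not part of the course material.

```python
import math

def log_loss(y_true, y_prob):
    """Mean binary logarithmic loss (the optimization loss)."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_prob)
    )
    return -total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of correct labels after thresholding (the target metric)."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.90, 0.10, 0.80, 0.20]  # hypothetical model A
hesitant = [0.60, 0.40, 0.55, 0.45]   # hypothetical model B

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "logloss:", log_loss(y_true, probs), "accuracy:", accuracy(y_true, probs))
```

Both sets of predictions classify every object correctly, so the accuracy is identical, while the logloss still tells them apart. The loss and the metric measure different things.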


### Sometimes we want to optimize metrics that ...
* are really hard or even impossible to optimize directly
* In this case, we usually set the model to optimize a loss that is different from the target metric
  * BUT after the model is trained, we use **hacks and heuristics** to reduce the discrepancy
  * and adjust the model to better fit the target metric
* The terms are often used interchangeably: it is completely okay to say *target loss* and *optimization metric*
  * but we will stick to the wording above for clarity
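
One common example of such a post-training adjustment, shown here as an assumption rather than as the course's own recipe, is tuning the decision threshold on a validation set when the model was trained on logloss but is evaluated on accuracy. The function names and the validation data below are hypothetical.

```python
def accuracy(y_true, y_prob, threshold):
    """Accuracy after converting probabilities to labels with `threshold`."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest accuracy (first one wins ties)."""
    return max(candidates, key=lambda t: accuracy(y_true, y_prob, t))

# Hypothetical validation labels and predicted probabilities
y_val = [0, 0, 1, 1, 1, 0]
p_val = [0.10, 0.32, 0.37, 0.45, 0.80, 0.20]

grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
t = best_threshold(y_val, p_val, grid)

print("accuracy at 0.5:", accuracy(y_val, p_val, 0.5))
print("best threshold:", t, "accuracy:", accuracy(y_val, p_val, t))
```

The threshold should be chosen on held-out data; picking it on the training set would simply overfit the heuristic.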
  

## Approaches for target metric optimization

The approaches can be broadly divided into several categories, depending on the metric we need to optimize. Some metrics can be optimized directly; others need one of the workarounds below.

* **Just run the right model!**
  - MSE, Logloss
* **Preprocess train and optimize another metric**
  - MSPE, MAPE, RMSLE ...
  - for example, while `MSPE` cannot be optimized directly with XGBoost, we will see later that we can `resample` the train set and optimize `MSE loss` instead, which XGBoost can optimize.
* **Optimize another metric, postprocess predictions**
  - Accuracy, Kappa
* **Write a custom loss function**
  - Any, if you can
  - for example, quadratic-weighted Kappa
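
The resampling trick for MSPE can be sketched as follows, assuming the usual definition MSPE = mean(((y - ŷ) / y)²). Under that definition MSPE equals MSE with sample weights proportional to 1 / y², so we can either pass those weights to the model or resample the train set with those probabilities. All function names here are illustrative, and the data is made up.

```python
import math
import random

def mspe(y_true, y_pred):
    """Mean squared percentage error (as a fraction, not multiplied by 100)."""
    return sum(((y - p) / y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mspe_weights(y_true):
    """Normalized sample weights w_i = (1 / y_i^2) / sum_j (1 / y_j^2)."""
    raw = [1.0 / y ** 2 for y in y_true]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mse(y_true, y_pred, weights):
    """MSE where each object contributes according to its weight."""
    return sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights))

def resample(rows, weights, k, seed=0):
    """Draw k rows with replacement, with probability proportional to the weights."""
    rng = random.Random(seed)
    return rng.choices(rows, weights=weights, k=k)

y_true = [10.0, 100.0, 1000.0]  # hypothetical targets
y_pred = [12.0, 90.0, 1100.0]   # hypothetical predictions

w = mspe_weights(y_true)
scale = sum(1.0 / y ** 2 for y in y_true) / len(y_true)
print(mspe(y_true, y_pred), scale * weighted_mse(y_true, y_pred, w))

# Objects with small targets dominate the resampled train set
sample = resample(list(range(len(y_true))), w, k=1000)
```

The two printed numbers agree, which is why minimizing plain MSE on the resampled (or reweighted) train set targets MSPE on the original one.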

## Custom loss for XGBoost

* **Define an 'objective'**:
  - function that computes *first and second order derivatives w.r.t. predictions*
  
```python
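import numpy as np

def logregobj(preds, dtrain):
    """Logloss objective: gradient and hessian w.r.t. the raw predictions."""
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # raw margin -> probability
    grad = preds - labels                 # first-order derivative
    hess = preds * (1.0 - preds)          # second-order derivative
    return grad, hess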
```

### We only need to implement a single function that
* takes the `predictions` and the `target values`
* computes the first and second-order derivatives of the loss function with respect to the model's predictions
  * For example, here you see one for the Logloss
  * Of course, the loss function should be smooth enough and have well-behaved derivatives
    * otherwise XGBoost will go crazy!
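
When writing a custom objective, a cheap sanity check is to compare the analytic derivatives against finite differences. The sketch below does this for the Logloss using only the standard library; the helper names and the step size `eps` are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logloss(margin, label):
    """Logloss of a single object as a function of the raw prediction (margin)."""
    p = sigmoid(margin)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def analytic_derivatives(margin, label):
    """Same formulas as in the objective: grad = p - y, hess = p * (1 - p)."""
    p = sigmoid(margin)
    return p - label, p * (1.0 - p)

def numeric_derivatives(margin, label, eps=1e-4):
    """Central finite differences for the first and second derivative."""
    lo, mid, hi = (logloss(margin + d, label) for d in (-eps, 0.0, eps))
    grad = (hi - lo) / (2 * eps)
    hess = (hi - 2 * mid + lo) / eps ** 2
    return grad, hess

for margin in (-2.0, 0.0, 1.5):
    for label in (0, 1):
        print(margin, label, analytic_derivatives(margin, label), numeric_derivatives(margin, label))
```

If the two pairs disagree noticeably, the objective is most likely wrong, and the booster will be fed misleading gradients.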
    


### In this course, we will consider only a small set of metrics
* BUT in fact there are plenty of them!
* For some of them, it is really hard to come up with a neat optimization procedure or to write a custom loss function
* Thankfully, there is a method that always works:
  * **`EARLY STOPPING`**

## Early Stopping

![early-stopping](../img/early-stopping.png)

* You set the model to optimize any loss function it can optimize
* You monitor the desired metric on a validation set
* And you stop the training when the model starts to overfit
  * **`according to the desired metric and not according to the metric the model is truly optimizing`**
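
The stopping rule itself can be sketched independently of any library. The function below takes the per-iteration values of the desired metric on the validation set and returns the iteration to keep; the name `early_stopping_round`, the `patience` parameter, and the score values are assumptions made for this example.

```python
def early_stopping_round(val_scores, patience, higher_is_better=True):
    """Return the index of the best iteration seen before training would stop.

    Training stops once the metric has not improved for `patience`
    consecutive iterations; later scores are never looked at.
    """
    best_iter, best_score = 0, val_scores[0]
    for i, score in enumerate(val_scores):
        improved = score > best_score if higher_is_better else score < best_score
        if improved:
            best_iter, best_score = i, score
        elif i - best_iter >= patience:
            break
    return best_iter

# Hypothetical validation accuracy after each boosting round
val_accuracy = [0.70, 0.74, 0.78, 0.77, 0.76, 0.75, 0.90]

print(early_stopping_round(val_accuracy, patience=3))
print(early_stopping_round(val_accuracy, patience=10))
```

With a small `patience` the late jump in the metric is never reached, which shows the trade-off: a small value saves time, a large one is more robust to noisy validation scores.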

#### Some metrics cannot even be evaluated easily
* for example, if the metric is based on human assessors' opinions, you cannot evaluate it on every iteration
  * for such metrics, we cannot use early stopping
  * but we will hardly ever meet such metrics in competitions!
  


## Conclusion

* **Loss vs. metric**
* **Approaches in general**
  - Just run the right model
  - Preprocess train and optimize another metric
  - Optimize another metric, postprocess predictions
  - Write a custom loss function
  - Optimize another metric, use early stopping