## If your model is scored with some metric, you get the best results by optimizing exactly that metric

### Exploratory metric analysis

#### 1) Regression
     * MSE, RMSE, R-squared
       MSE: the best constant prediction for a baseline model is the target mean.

<img src="files/Images/MSE.png" width="400" height="100">

       RMSE: the error is on the same scale as the target values. Because the square root is monotonic, minimizing MSE also minimizes RMSE, so we can optimize MSE instead of RMSE.
       
<img src="files/Images/RMSE.png" width="400" height="100">
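Below is a minimal NumPy sketch of MSE and RMSE. The helper names `mse` and `rmse` are our own and do not come from a library. A grid search over constant predictions shows that the mean is the best constant for MSE. RMSE is a monotonic function of MSE, so it picks the same constant.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return np.sqrt(mse(y_true, y_pred))

y = np.array([1.0, 2.0, 3.0, 10.0])

# Score every constant prediction c on a grid with step 0.01
grid = np.linspace(0.0, 12.0, 1201)
mse_scores = [mse(y, np.full_like(y, c)) for c in grid]
rmse_scores = [rmse(y, np.full_like(y, c)) for c in grid]

best_mse_const = grid[np.argmin(mse_scores)]
best_rmse_const = grid[np.argmin(rmse_scores)]
print(best_mse_const, best_rmse_const, y.mean())  # all approximately 4.0
```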

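        R-squared normalizes MSE against the constant mean baseline: it equals 1 for perfect predictions and 0 for the baseline. Since R-squared = 1 - MSE / (MSE of the mean) and the denominator does not depend on the predictions, optimizing MSE also optimizes R-squared.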
        
<img src="files/Images/rsquered.png" width="400" height="100">
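The sketch below writes R-squared directly as 1 minus the ratio of the model's MSE to the MSE of the mean baseline. The function name `r_squared` is hypothetical. It shows the three reference points: a perfect model, the baseline, and a model worse than the baseline.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R-squared = 1 - MSE(model) / MSE(best constant, i.e. the mean)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_baseline = np.mean((y_true - y_true.mean()) ** 2)  # variance of the target
    return 1.0 - mse_model / mse_baseline

y = np.array([1.0, 2.0, 3.0, 10.0])
print(r_squared(y, y))                          # 1.0: perfect predictions
print(r_squared(y, np.full_like(y, y.mean())))  # 0.0: constant mean baseline
print(r_squared(y, np.zeros_like(y)))           # negative: worse than the baseline
```

The denominator depends only on the targets, so a lower MSE always means a higher R-squared.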
        
     * MAE
     Less sensitive to outliers than MSE, i.e. more robust. Often used in finance. The best constant for a baseline model is the target median.
     
<img src="files/Images/MAE.png" width="400" height="100">

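    Non-differentiable at zero: the derivative of |y - ŷ| with respect to ŷ jumps from -1 to +1 when the prediction equals the target.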


<img src="files/Images/MAE_grad.png" width="400" height="100">
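Here is a small sketch of MAE and its per-object gradient. The helper names are our own. The gradient is just the sign of the residual, and it is undefined exactly at zero. `np.sign` returns 0 there, which is one common convention. A grid search confirms that the best constant is the median rather than the mean.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def mae_grad(y_true, y_pred):
    """Per-object derivative of |y_pred - y_true| with respect to y_pred.

    It is the sign of the residual (-1 or +1); at y_pred == y_true the true
    derivative is undefined and np.sign returns 0.
    """
    return np.sign(np.asarray(y_pred, float) - np.asarray(y_true, float))

y = np.array([1.0, 2.0, 3.0, 10.0, 100.0])

# The best constant for MAE is the median (3.0), not the mean (23.2)
grid = np.linspace(0.0, 100.0, 10001)
best_const = grid[np.argmin([mae(y, np.full_like(y, c)) for c in grid])]
print(best_const, np.median(y), y.mean())

print(mae_grad(y, [0.0, 2.0, 5.0, 10.0, 50.0]))  # signs: -1, 0, +1, 0, -1
```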
       
       
#### Summary: MSE vs MAE
    Do you have outliers in the data?
        Use MAE.
    Are you sure they are outliers?
        Use MAE.
    Or are they just unexpected values we should still care about?
        Use MSE.
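A tiny illustration of the choice above, using made-up numbers. It relies on the fact that the best constant is the mean for MSE and the median for MAE. One extreme value drags the MSE-optimal constant far away, while the MAE-optimal constant barely moves.

```python
import numpy as np

clean = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
with_outlier = np.append(clean, 200.0)  # one extreme target value

# Best constant under MSE (mean) vs under MAE (median)
print(clean.mean(), np.median(clean))                # 12.0 12.0
print(with_outlier.mean(), np.median(with_outlier))  # ~43.33 12.5
```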
        
     * (R)MSPE, MAPE
     
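     These are relative-error metrics: each error is weighted inversely to its target value, so the same absolute error costs more on a small target than on a large one.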
     
<img src="files/Images/All_reg.png" width="400" height="100">
<img src="files/Images/MSPE.png" width="400" height="100">
<img src="files/Images/MAPE.png" width="400" height="100">
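The sketch below assumes the usual percentage definitions, each with a 100% factor: MSPE = 100% · mean(((y - ŷ)/y)²) and MAPE = 100% · mean(|y - ŷ|/|y|). Both require non-zero targets. The function names are our own. The example gives the same absolute error of 1.0 on a small target and on a large one.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared percentage error, in percent (targets must be non-zero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(((y_true - y_pred) / y_true) ** 2)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (targets must be non-zero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# The same absolute error (1.0) on a small and on a large target
small_err = mape([2.0], [3.0])      # 50% error
large_err = mape([100.0], [101.0])  # 1% error
print(small_err, large_err)
print(mspe([2.0, 100.0], [3.0, 101.0]), mape([2.0, 100.0], [3.0, 101.0]))
```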

     * (R)MSLE
     
     Cares about relative errors (like MSPE and MAPE) more than about absolute ones.
<img src="files/Images/MSLE.png" width="400" height="100">   

     RMSLE is asymmetric: predicting more than the target by some amount is always penalized less than predicting less than the target by the same amount.
<img src="files/Images/MSLE2.png" width="400" height="100">  
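A minimal sketch of RMSLE, assuming the common form with log(1 + y), i.e. `np.log1p`. The name `rmsle` is our own. Missing the target of 100 by +50 and by -50 shows the asymmetry described above.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y = np.array([100.0])
over = rmsle(y, y + 50.0)   # predict 150 for a target of 100
under = rmsle(y, y - 50.0)  # predict 50 for a target of 100
print(over, under)          # over-prediction is penalized less
```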
     
 
 
#### So:
    MSE is strongly influenced by the largest target values in the dataset, while MAE is much less so. MSPE and MAPE are biased towards small targets because they assign higher weights to objects with small targets. RMSLE is frequently considered a better metric than MAPE, since it is less biased towards small targets, yet still works with relative errors.

<img src="files/Images/compare.png" width="400" height="100">  

#### 2) Classification
      * Accuracy, LogLoss, AUC
      
      Accuracy
<img src="files/Images/accuracy.png" width="400" height="100"> 
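The best constant for accuracy is to always predict the most frequent class. On imbalanced labels this trivial model already scores high, which is why raw accuracy can be hard to interpret. The toy labels and the `accuracy` helper below are our own.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted label equals the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Imbalanced toy labels: 9 objects of class 0, 1 object of class 1
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

# Best constant for accuracy: always predict the most frequent class
majority = np.bincount(y).argmax()
print(accuracy(y, np.full_like(y, majority)))  # 0.9
```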

      LogLoss
<img src="files/Images/logloss.png" width="400" height="100">  

      Best constant for logloss: set α_i to the frequency of the i-th class
<img src="files/Images/loglossacc.png" width="400" height="100">  
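A sketch of multiclass logloss. It clips probabilities with a small `eps`, which is an assumption of this sketch, and then renormalizes each row. It compares two constant predictions: the class frequencies and the uniform distribution. The frequency constant wins, as stated above. The labels are made up.

```python
import numpy as np

def logloss(y_true, probs, eps=1e-15):
    """Multiclass logloss; probs has shape (n_objects, n_classes)."""
    y_true = np.asarray(y_true)
    probs = np.clip(np.asarray(probs, float), eps, 1.0 - eps)
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalize rows after clipping
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

y = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])   # class frequencies 0.3, 0.2, 0.5
freq = np.bincount(y, minlength=3) / len(y)

const_freq = np.tile(freq, (len(y), 1))        # predict class frequencies for everyone
const_uniform = np.full((len(y), 3), 1.0 / 3.0)
print(logloss(y, const_freq), logloss(y, const_uniform))  # ~1.030 vs ~1.099
```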

      AUC
      Best constant: all constants give the same score
      Random predictions lead to AUC = 0.5
<img src="files/Images/rocauc.png" width="400" height="100">  

      Examples of computing the area under the ROC curve
<img src="files/Images/rocex.png" width="400" height="100">  
<img src="files/Images/rocex2.png" width="400" height="100">  
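ROC AUC can also be computed as the fraction of (positive, negative) pairs that the scores order correctly, with ties counting as half a pair. The sketch below uses that pairwise definition instead of tracing the curve. It takes O(n_pos · n_neg) memory, so it only suits small data. The name `auc_pairwise` is our own. The example shows that only the ordering matters and that any constant scores 0.5.

```python
import numpy as np

def auc_pairwise(y_true, scores):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    # Correctly ordered pairs count 1, ties count 0.5
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = [0, 0, 1, 1]
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(auc_pairwise(y, scores))               # 0.75
print(auc_pairwise(y, np.exp(10 * scores)))  # 0.75: a monotonic transform keeps the ordering
print(auc_pairwise(y, np.full(4, 0.3)))      # 0.5: any constant
```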
      
      
      
      * Cohen's Kappa (and quadratic weighted kappa)
      
<img src="files/Images/kappa.png" width="400" height="100">  
<img src="files/Images/kappa1.png" width="400" height="100">  
<img src="files/Images/kappa2.png" width="400" height="100">  
<img src="files/Images/kappa3.png" width="400" height="100">  
<img src="files/Images/kappa4.png" width="400" height="100">  
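Here is a sketch of the standard weighted-kappa formula, kappa = 1 - Σ W·O / Σ W·E:

- O is the observed confusion matrix.
- E is the matrix expected under independence, built from the same marginals. Both are normalized to sum to 1.
- W holds the disagreement weights. With W = 1 off the diagonal, this reduces to plain Cohen's kappa. With W = (i - j)², it gives quadratic weighted kappa. Rescaling W, e.g. by (K - 1)², cancels in the ratio.

The function name and toy labels are our own. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` is a library alternative.

```python
import numpy as np

def weighted_kappa(y_true, y_pred, n_classes, weights="quadratic"):
    """Quadratic weighted kappa, or plain Cohen's kappa for any other `weights` value."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    O = np.zeros((n_classes, n_classes))
    np.add.at(O, (y_true, y_pred), 1.0)         # observed confusion matrix
    O /= O.sum()
    E = np.outer(O.sum(axis=1), O.sum(axis=0))  # expected if independent
    i, j = np.indices((n_classes, n_classes))
    if weights == "quadratic":
        W = (i - j) ** 2.0                      # cost grows with squared class distance
    else:
        W = (i != j).astype(float)              # every disagreement costs 1
    return 1.0 - (W * O).sum() / (W * E).sum()

y_true = [0, 1, 2, 3, 3, 2, 1, 0]
near = [0, 1, 2, 2, 3, 2, 1, 1]  # two mistakes, each off by one class
far = [0, 1, 2, 0, 3, 2, 1, 3]   # two mistakes, each off by three classes
print(weighted_kappa(y_true, near, 4, "none"), weighted_kappa(y_true, far, 4, "none"))  # both 2/3
print(weighted_kappa(y_true, near, 4), weighted_kappa(y_true, far, 4))                  # 0.875 vs 0.1
```

Plain kappa cannot tell the two prediction sets apart. The quadratic weights punish the distant mistakes much harder.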


   #### So:
        Accuracy is an essential metric for classification. But a simple model that always predicts the same value can have a very high accuracy, which makes the metric hard to interpret. The score also depends on the threshold we choose to convert soft predictions to hard labels. Logloss, as opposed to accuracy, depends on soft predictions rather than on hard labels, and it forces the model to predict the probability of an object belonging to each class. AUC, the area under the receiver operating characteristic curve, doesn't depend on the absolute values predicted by the classifier; it only considers the ordering of the objects. It also implicitly tries all thresholds for converting soft predictions to hard labels, and thus removes the dependence of the score on the threshold. Finally, Cohen's Kappa rescales accuracy so that the baseline score is zero. In spirit this is similar to how R-squared rescales MSE to make it easier to interpret. If we use weighted accuracy instead of plain accuracy, we get weighted kappa. Weighted kappa with quadratic weights is called quadratic weighted kappa and is commonly used on Kaggle.

### Metrics optimization

#### 1) General approach
    Loss and metric
    Target metric: what we want to optimize (how the solution is evaluated)
    Optimization loss: what the model actually optimizes
    
    Approaches for target metric optimization
    
       * Just run the right model
           - MSE, Logloss
       * Preprocess train and optimize another metric
           - MSPE, MAPE, RMSLE
       * Optimize another metric, postprocess predictions
           - Accuracy, Kappa
       * Write custom loss function
           - Any
           
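       Early stopping: optimize the loss, but monitor the target metric on a validation set and stop training when that metric stops improving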
          
<img src="files/Images/es.png" width="400" height="100">  
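The toy loop below sketches early stopping when the optimized loss and the target metric differ. It fits y ≈ w·x by gradient descent on training MSE, tracks validation MAE as the target metric, and stops once that metric has not improved for `patience` steps. The data, learning rate, and patience are made-up assumptions. Real libraries expose this through their own early-stopping options.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 3x + noise
rng = np.random.default_rng(0)
x_train, x_val = rng.normal(size=200), rng.normal(size=100)
y_train = 3.0 * x_train + rng.normal(size=200)
y_val = 3.0 * x_val + rng.normal(size=100)

w, lr, patience = 0.0, 0.01, 5
best_w, best_val_mae, best_step, bad_steps = w, np.inf, 0, 0
history = []

for step in range(1000):
    # Optimization loss: MSE on the training set (one gradient step for y = w * x)
    grad = np.mean(2.0 * (w * x_train - y_train) * x_train)
    w -= lr * grad

    # Target metric: MAE on the validation set
    val_mae = np.mean(np.abs(w * x_val - y_val))
    history.append(val_mae)

    if val_mae < best_val_mae:
        best_w, best_val_mae, best_step, bad_steps = w, val_mae, step, 0
    else:
        bad_steps += 1
        if bad_steps >= patience:  # no improvement for `patience` steps: stop
            break

print(f"ran {len(history)} steps; best step {best_step}, val MAE {best_val_mae:.3f}, w {best_w:.3f}")
```

Keeping `best_w`, rather than the last `w`, is what makes the target metric drive the final model.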

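```python
import numpy as np

# How to write a custom loss function: an XGBoost-style objective for binary logloss.
# `preds` are raw margins; the function returns the per-object gradient and hessian
# of logloss with respect to those margins.
def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # sigmoid: margins -> probabilities
    grad = preds - labels                 # first derivative of logloss
    hess = preds * (1.0 - preds)          # second derivative of logloss
    return grad, hess
```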
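A custom objective is easy to get subtly wrong, so it is worth checking against finite differences of the loss it claims to optimize. The sketch below does that without XGBoost. `FakeDMatrix` is a hypothetical stand-in that only provides the `get_label()` method the objective uses. `logregobj` is repeated so the block runs on its own. In XGBoost itself, such a function is typically passed as `obj=` to `xgb.train`.

```python
import numpy as np

class FakeDMatrix:
    """Hypothetical stand-in for xgboost.DMatrix; only get_label() is used."""
    def __init__(self, labels):
        self._labels = np.asarray(labels, float)

    def get_label(self):
        return self._labels

def logregobj(preds, dtrain):
    # Same objective as above: gradient and hessian w.r.t. raw margins
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))
    return preds - labels, preds * (1.0 - preds)

def logloss_from_margin(margin, labels):
    """Per-object binary logloss as a function of the raw margin."""
    p = 1.0 / (1.0 + np.exp(-margin))
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

labels = np.array([0.0, 1.0, 1.0, 0.0])
margins = np.array([-1.5, 0.3, 2.0, 0.7])
grad, hess = logregobj(margins, FakeDMatrix(labels))

# Central finite differences of the per-object loss
h = 1e-4
f_plus = logloss_from_margin(margins + h, labels)
f_0 = logloss_from_margin(margins, labels)
f_minus = logloss_from_margin(margins - h, labels)
num_grad = (f_plus - f_minus) / (2 * h)
num_hess = (f_plus - 2 * f_0 + f_minus) / h ** 2

print(np.allclose(grad, num_grad, atol=1e-6), np.allclose(hess, num_hess, atol=1e-4))
```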

#### 2) Regression metrics 
    * RMSE, MSE, R-squared
    
    Libraries that support MSE as a loss function
    
<img src="files/Images/mse_op.png" width="400" height="100">  

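    Libraries that support MAE as a loss function. MAE has no usable second derivative: it is zero wherever it is defined, which is a problem for methods that rely on the Hessian.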
<img src="files/Images/mae_op.png" width="400" height="100">  

    Ways to make MAE smooth
<img src="files/Images/mae_op2.png" width="400" height="100">  
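One smooth substitute for MAE is the pseudo-Huber loss, δ²(√(1 + (r/δ)²) - 1), where r = prediction - target. It is named here as one option, not necessarily the one in the figure. It behaves like MSE near zero and like δ·|r| for large residuals. Its gradient r/s and hessian 1/s³, with s = √(1 + (r/δ)²), exist everywhere and the hessian is strictly positive. Because r = prediction - target, these are also the derivatives with respect to the prediction. The function names and `delta=1.0` are assumptions of this sketch.

```python
import numpy as np

def pseudo_huber(residual, delta=1.0):
    """Pseudo-Huber loss: MSE-like near zero, MAE-like (times delta) for large residuals."""
    r = np.asarray(residual, float)
    return delta ** 2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)

def pseudo_huber_grad_hess(residual, delta=1.0):
    """First and second derivatives with respect to the residual (prediction - target)."""
    r = np.asarray(residual, float)
    s = np.sqrt(1.0 + (r / delta) ** 2)
    grad = r / s         # tends to delta * sign(r) for large |r|, like MAE
    hess = 1.0 / s ** 3  # strictly positive everywhere, unlike MAE
    return grad, hess

r = np.array([-100.0, -1.0, 0.0, 0.5, 100.0])
grad, hess = pseudo_huber_grad_hess(r)
print(grad)
print(hess)
```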

    
    
    * MSPE and MAPE
    
    MSPE (MAPE) can be viewed as a weighted version of MSE (MAE)
<img src="files/Images/weights.png" width="400" height="100">   
    
    Approaches:
<img src="files/Images/mspe_op.png" width="400" height="100"> 
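The sketch below checks the weighted view numerically. MSPE equals 100 times an MSE whose per-object weights are 1/y². MAPE works the same way with weights 1/|y|. The best constant for MSPE is the weighted mean of the targets with those weights. That is why it sits far below the plain mean and is pulled towards small targets. The data are made up. Passing such weights as sample weights to a library is one way to apply this, assuming the library supports them.

```python
import numpy as np

y = np.array([1.0, 2.0, 4.0, 50.0])
y_pred = np.array([1.5, 1.5, 5.0, 40.0])

# MSPE computed directly ...
mspe_direct = 100.0 * np.mean(((y - y_pred) / y) ** 2)
# ... equals 100 * MSE with per-object weights 1 / y^2
w = 1.0 / y ** 2
mspe_weighted = 100.0 * np.mean(w * (y - y_pred) ** 2)
print(mspe_direct, mspe_weighted)

# Best constant for MSPE: the weighted mean of the targets with weights 1 / y^2
best_const = np.sum(w * y) / np.sum(w)
print(best_const, y.mean())  # ~1.35 vs 14.25: pulled towards small targets
```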

     
    * RMSLE
    
    Approaches:
<img src="files/Images/rmsle_op.png" width="400" height="100">   
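A common way to optimize RMSLE is to transform the target with z = log(1 + y), optimize plain MSE on z, and map predictions back with exp(z) - 1. The sketch below uses a constant model so that the MSE optimum in log space is just the mean of z. It shows that RMSLE in the original space equals RMSE in log space. The data and names are illustrative assumptions.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + y)."""
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y = np.array([1.0, 10.0, 100.0, 1000.0])

# 1) Transform the target: z = log(1 + y)
z = np.log1p(y)
# 2) Optimize MSE in log space; for a constant model the optimum is the mean of z
c_log = z.mean()
# 3) Transform predictions back: y_hat = exp(c_log) - 1
y_hat = np.full_like(y, np.expm1(c_log))

rmse_log = np.sqrt(np.mean((z - c_log) ** 2))
print(rmsle(y, y_hat), rmse_log)            # equal: RMSLE is RMSE in log space
print(rmsle(y, np.full_like(y, y.mean())))  # the plain mean is a worse constant here
```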

#### 3) Classification metrics
    