# Metrics

**Metrics** are used to evaluate a trained model and to judge how well it predicts unseen data.

* **Classification** metrics:
    - Confusion matrix: A confusion matrix is a summary of the predictions made by a classification model organized into a table by class. Each row of the table indicates the actual class and each column represents the predicted class. The confusion matrix provides more insight into not only the accuracy of a predictive model, but also which classes are being predicted correctly, which incorrectly, and what type of errors are being made. The simplest confusion matrix is for a two-class classification problem, with negative (class 0) and positive (class 1) classes.
        - True positives (TP): you predict an observation belongs to a class and it actually does belong to that class.
        - True negatives (TN): you predict an observation does not belong to a class and it actually does not belong to that class.
        - False positives (FP): you predict an observation belongs to a class when in reality it does not.
        - False negatives (FN): you predict an observation does not belong to a class when in fact it does.
        
        
    
    
* **Regression** metrics:
    - r2_score
    - mean_absolute_error
    - mean_squared_error
    - median_absolute_error
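
As a minimal sketch of the two groups above, the snippet below builds a two-class confusion matrix and computes the four regression metrics with scikit-learn. The labels and values are made up purely for illustration; only the function names come from `sklearn.metrics`.

```python
from sklearn.metrics import (
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)

# Toy binary labels (illustrative only): 0 = negative, 1 = positive.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
# For a binary problem, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=4 FP=1 FN=1 TP=4

# Toy regression targets and predictions (illustrative only).
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true_reg, y_pred_reg)
mae = mean_absolute_error(y_true_reg, y_pred_reg)
mse = mean_squared_error(y_true_reg, y_pred_reg)
medae = median_absolute_error(y_true_reg, y_pred_reg)
print(f"R2={r2:.3f} MAE={mae:.3f} MSE={mse:.3f} MedAE={medae:.3f}")
```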

What we can extract from the **confusion matrix**:

- <span style='color:purple'> Accuracy: </span>
    - $\frac{TN+TP}{TN+TP+FN+FP}$
    
    - the percentage of correct predictions.
    
    
    
- <span style='color:purple'> Classification Error: </span>
    
    - $\frac{FN+FP}{TN+TP+FN+FP}$
    
    - 1 - Accuracy


- <span style='color:purple'> False positive rate (FPR) </span>
    - $\frac{FP}{TN+FP}$
    
    - The fraction of all negative instances that the classifier incorrectly identifies as positive.

    
- <span style='color:purple'> Recall </span>
    - $\frac{TP}{TP+FN}$
    
    - Also known as **sensitivity** or **True Positive Rate**
    
    - The fraction of examples correctly predicted to belong to a class out of all the examples that truly belong to that class.
    - High recall means many TP and few FN. We favor it when we do not want to lose any positive result, even at the cost of more FP (for example, tumor detection).
    
    
- <span style='color:purple'> Precision </span>
    - $\frac{TP}{TP+FP}$
    
    - The fraction of relevant examples (true positives) among all of the examples which were predicted to belong in a certain class.
    
    - High precision means few FP. We favor it when false positives are costly.
    
Precision and recall are useful in cases where classes aren't evenly distributed.
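
The formulas above can be checked directly from the four counts. This is a small sketch in plain Python; the counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
# Hypothetical confusion-matrix counts (illustrative only).
tn, fp, fn, tp = 50, 10, 5, 35

total = tn + fp + fn + tp

accuracy = (tn + tp) / total              # fraction of correct predictions
classification_error = (fn + fp) / total  # equals 1 - accuracy
false_positive_rate = fp / (tn + fp)      # negatives wrongly flagged as positive
recall = tp / (tp + fn)                   # sensitivity / true positive rate
precision = tp / (tp + fp)                # correct among predicted positives

print(f"accuracy={accuracy:.3f}")
print(f"classification_error={classification_error:.3f}")
print(f"FPR={false_positive_rate:.3f}")
print(f"recall={recall:.3f}")
print(f"precision={precision:.3f}")
```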




    
    
- <span style='color:purple'> Precision-Recall tradeoff </span>
    - $F_1 = 2 \frac{Precision*Recall}{Precision+Recall} = \frac{2 TP}{2 TP+FN+FP}$
    
    - In general, there is a tradeoff between recall and precision: increasing one tends to decrease the other. So we should look at the problem and decide which one matters more.
        * Examples of recall-oriented tasks: information extraction in legal discovery, tumor detection. These tasks are often paired with a human expert who filters out the FP.
        * Examples of precision-oriented tasks: search engine ranking, query suggestion, document classification, customer-facing tasks.
    
    
- <span style='color:purple'> F-score </span>
    
    - $F_\beta = (1+\beta^2) \frac{Precision*Recall}{\beta^2 Precision+Recall} = \frac{(1+\beta^2) TP}{(1+\beta^2) TP+\beta^2 FN+FP}$
    
    - Generalizes $F_1$ score

The $\beta$ parameter allows us to control the tradeoff of importance between precision and recall. $\beta<1$ focuses more on precision while $\beta>1$ focuses more on recall.
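
To see the effect of $\beta$, here is a small sketch in plain Python that evaluates $F_\beta$ both from precision and recall and from the raw counts. The helper names `f_beta` and `f_beta_from_counts` and the counts are hypothetical, introduced only for this illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score from precision and recall (hypothetical helper)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_beta_from_counts(tp, fp, fn, beta=1.0):
    """The same score written in terms of TP, FP and FN (hypothetical helper)."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


# Illustrative counts where recall (0.875) is higher than precision (about 0.778).
tp, fp, fn = 35, 10, 5
precision = tp / (tp + fp)
recall = tp / (tp + fn)

for beta in (0.5, 1.0, 2.0):
    score = f_beta(precision, recall, beta)
    print(f"beta={beta}: F_beta={score:.4f}")
```

Because recall is the larger of the two here, the score rises as $\beta$ grows and the weight shifts toward recall.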




To access some of these scores:
### <span style='color:green'>scikit-learn:</span>

- sklearn.metrics.classification_report
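
A minimal sketch of how `classification_report` might be used, with made-up labels. The class names passed as `target_names` are assumptions for this example.

```python
from sklearn.metrics import classification_report

# Toy binary labels (illustrative only): 0 = negative, 1 = positive.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# Text table with precision, recall, F1 and support for each class.
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))

# The same numbers as a nested dict, which is easier to use in code.
report = classification_report(
    y_true, y_pred, target_names=["negative", "positive"], output_dict=True
)
print(report["positive"]["recall"])
```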

### Classifier decision function

- Many classifiers can report the uncertainty of their predictions through the methods:
    * **predict_proba**
    * **decision_function**
    
### <span style='color:green'>scikit-learn:</span>

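- predict_proba:
    * returns the class probabilities for each data point, each a value in [0, 1].
- decision_function:
    * returns a confidence score for each sample; for linear models the score is proportional to the signed distance between the test instance and the hyperplane.
    * The score is not bounded. In the binary case its sign gives the predicted class and its magnitude the confidence.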

Moreover, we can sweep the decision threshold (depending on the problem) to tune our metric. For example, if we do not want to lose any positive instance, we set the threshold to a low value, even at the risk of decreasing precision. Recording the classification outcome at every threshold then gives a curve.
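
The sketch below illustrates both methods and a small threshold sweep. The synthetic dataset, the choice of `LogisticRegression`, and the three thresholds are all assumptions made for this example, not part of the notes above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Probability of the positive class for each test point, in [0, 1].
proba = clf.predict_proba(X_test)[:, 1]
# Unbounded confidence score; positive values favor the positive class.
scores = clf.decision_function(X_test)

# Sweep the threshold on the predicted probability.
recall_at = {}
for threshold in (0.2, 0.5, 0.8):
    y_pred = (proba >= threshold).astype(int)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred)
    recall_at[threshold] = recall
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases as the threshold drops. Precision usually moves in the opposite direction.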

## Precision-Recall curves

- X-axis: Precision
- Y-axis: Recall
- The goal is to maximize precision while maximizing recall.
    * The top right corner, where both precision and recall are 1, is the ideal point.


### <span style='color:green'>scikit-learn:</span>
- sklearn.metrics.precision_recall_curve
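
A minimal sketch of `precision_recall_curve` on four made-up scores. The function returns one more precision and recall value than thresholds, because a final point with precision 1 and recall 0 is appended.

```python
from sklearn.metrics import precision_recall_curve

# Toy labels and classifier scores (illustrative only).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

print("thresholds:", thresholds)
print("precision: ", precision)
print("recall:    ", recall)
```

The `precision` and `recall` arrays can then be passed to any plotting library to draw the curve.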




## ROC (Receiver operating characteristic) curves

- X-axis: False positive rate
- Y-axis: True positive rate
- We want a curve that maximizes the TPR and minimizes the FPR.
    * The top left corner is the ideal point, where the FP rate is zero and the TP rate is one.
- A random classifier's ROC curve is a line on the diagonal. So a bad classifier's curve lies under that line and a good one lies far above it.
- The scikit-learn implementation (sklearn.metrics.roc_curve) is restricted to the binary classification task.

### <span style='color:green'>scikit-learn:</span>
* sklearn.metrics.roc_curve
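
A minimal sketch of `roc_curve` on made-up scores. Each returned pair of false positive rate and true positive rate corresponds to one decision threshold.

```python
from sklearn.metrics import roc_curve

# Toy labels and classifier scores (illustrative only).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Walk along the curve from the strictest to the loosest threshold.
for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f} TPR={t:.2f}")
```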

## AUC (Area under the curve)
It is the area under the receiver operating characteristic curve (ROC AUC), computed from prediction scores. For a random classifier it is 0.5, and it approaches 1 as the classifier becomes better.

### <span style='color:green'>scikit-learn:</span>
- sklearn.metrics.roc_auc_score
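
As a sketch, `roc_auc_score` can be compared with integrating the ROC curve by hand using `sklearn.metrics.auc`. The scores below are made up. In the first set, one negative is ranked above one positive, so the area falls short of 1.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and classifier scores (illustrative only).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Direct computation from the scores.
auc_direct = roc_auc_score(y_true, y_scores)

# Equivalent route: build the ROC curve, then integrate it.
fpr, tpr, _ = roc_curve(y_true, y_scores)
auc_from_curve = auc(fpr, tpr)

# A scorer that ranks every positive above every negative reaches 1.0.
auc_perfect = roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9])

print(auc_direct, auc_from_curve, auc_perfect)
```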