# Metric Guides

Just as in training, it is recommended to use several metrics to evaluate prediction quality, especially when the classes are imbalanced:

### Accuracy
- Fraction of all predictions that are correct; can be misleading when one class dominates the data

### Recall
- Fraction of actual positives that the model correctly identifies: TP / (TP + FN)

### Precision
- Fraction of predicted positives that are actually positive: TP / (TP + FP); prioritize it when the cost of false positives is high

### F1 Score
- Harmonic mean of precision and recall, summarizing the trade-off between them in a single number
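The four definitions above can be made concrete with a minimal from-scratch sketch, assuming binary labels where `1` marks the positive class. The helper name `classification_metrics` is hypothetical; in practice scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` compute the same quantities.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive).

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when there are no predicted / actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 95 negatives, 5 positives.
# The model raises 1 false alarm, finds 2 positives and misses 3.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 94 + [1] + [1, 1, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

On this hypothetical 95/5 split, accuracy is 0.96 even though recall is only 0.4, which is why accuracy alone is not enough on imbalanced data.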

## Precision-Recall Curve
What it is: The precision-recall curve shows the trade-off between precision (the ratio of true positives to the sum of true and false positives) and recall (the ratio of true positives to the sum of true positives and false negatives) across different classification thresholds.

When to use it: This curve is particularly useful for imbalanced datasets in which the positive class is rare. It shows how the model performs at different levels of sensitivity to the positive class.
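To show how the curve is traced, here is a minimal sketch that sweeps every distinct score as a threshold and records one (precision, recall) point per threshold. It assumes binary labels, at least one positive, and the convention that a sample is predicted positive when its score is at or above the threshold. The function name is hypothetical; scikit-learn's `precision_recall_curve` is the usual library route.

```python
def precision_recall_curve_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score used as a threshold.

    Hypothetical helper: a sample is predicted positive when score >= threshold.
    """
    total_pos = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        precision = tp / (tp + fp)  # safe: the threshold itself is one of the scores
        recall = tp / total_pos
        points.append((threshold, precision, recall))
    return points


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
for threshold, p, r in precision_recall_curve_points(y_true, scores):
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold can only raise recall, while precision moves up and down depending on whether the newly admitted samples are positives or negatives.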

## ROC Curve (Receiver Operating Characteristic Curve)
What it is: The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. It plots the true positive rate (TPR, or recall) against the false positive rate (FPR, equal to 1 - specificity).

When to use it: This curve is widely used in binary classification to evaluate the trade-off between sensitivity (the ability to detect positives) and specificity (the ability to reject negatives) across thresholds. It is most informative when the classes are reasonably balanced; because it shows every operating point, it also helps choose a threshold when false positives and false negatives carry different costs.
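The same threshold sweep produces the ROC curve if you record the false positive rate and true positive rate instead. This sketch assumes both classes are present and uses a hypothetical function name; scikit-learn's `roc_curve` is the library equivalent.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) pairs from (0, 0) to (1, 1).

    Hypothetical helper: a sample is predicted positive when score >= threshold.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # A threshold above every score predicts nothing positive: the (0, 0) corner.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR = recall)
    return points


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
for fpr, tpr in roc_curve_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest threshold labels every sample positive, so the curve always ends at (1, 1).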

## ROC AUC Score (Area Under the ROC Curve Score)

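What it is: This score measures the two-dimensional area underneath the ROC curve from (0,0) to (1,1), giving an aggregate measure of performance across all possible classification thresholds. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to random ranking and 1.0 to perfect separation.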

When to use it: The AUC is useful as a single scalar value for comparing multiple classifiers; a higher AUC indicates a model that separates the classes better. Because it is independent of the classification threshold, it summarizes how well the classifier ranks positives above negatives without committing to an operating point.
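The probabilistic reading of the AUC gives a simple way to compute it without drawing the curve: count the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half. The sketch below assumes both classes are present and uses a hypothetical function name. It is O(P × N) and meant for illustration; scikit-learn's `roc_auc_score` is the practical choice.

```python
from itertools import product


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Hypothetical helper; a tie between a positive and a negative counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(roc_auc(y_true, scores))  # 11 of 15 positive-negative pairs ranked correctly, about 0.733
```

The pairwise count gives the same value as the trapezoidal area under the ROC points of the same scores.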

Precision-recall vs. ROC: Precision-recall curves should be used when the positive class is rare or when false positives and false negatives have very different consequences. ROC curves are better suited to more balanced classes.

ROC AUC score: Provides a simple, threshold-independent way to compare models, whereas precision-recall curves give more detail about behavior at specific operating thresholds.

## Precision-Recall Curve on Imbalanced Data
This is typically the most informative tool for imbalanced datasets. Such datasets usually center on the minority class, which is often the class of greater interest (for example, fraud in banking transactions), so it is crucial to measure how precisely the model predicts this class (precision) and how many of the actual positive cases it captures (recall). The precision-recall curve shows the trade-off between these two metrics across thresholds, giving a clear picture of performance when positive-class predictions matter most.

## ROC Curve and ROC AUC Score on Imbalanced Data
While the ROC curve and the AUC score provide a good measure of a model's ability to discriminate between the classes, they can be less informative on their own when the data are highly imbalanced. The false positive rate divides false positives by the total number of negatives, so when the negative (majority) class vastly outnumbers the positive class, even a large absolute number of false positives yields a small false positive rate. The ROC curve can therefore look optimistic while precision, which compares false positives with true positives, is poor.
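A worked example with hypothetical numbers shows how a small false positive rate can coexist with low precision when negatives dominate. Assume 100 positives, 10,000 negatives, and a threshold at which the model reaches 90% recall with a 1% false positive rate:

```python
# Hypothetical test set: 100 positives, 10,000 negatives (about 1% prevalence).
positives, negatives = 100, 10_000

# Assumed operating point: 90% recall and a 1% false positive rate.
tp = 90
fp = 100  # 1% of the 10,000 negatives

fpr = fp / negatives
recall = tp / positives
precision = tp / (tp + fp)

print(f"FPR={fpr:.3f}  recall={recall:.2f}  precision={precision:.2f}")
```

This prints `FPR=0.010  recall=0.90  precision=0.47`: the ROC point (0.01, 0.90) looks excellent, yet more than half of the flagged cases are false alarms.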

## Area Under the Precision-Recall Curve (AUC-PR)
Definition: The AUC-PR summarizes the precision-recall curve, which plots precision against recall across thresholds. Unlike the F1 score, which is computed at a single threshold, the AUC-PR provides an aggregate measure of performance across all possible thresholds.
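Interpretation: A higher AUC-PR indicates that the model keeps precision high as recall increases, across all thresholds. It does not depend on a specific threshold; instead it evaluates the model's ability to discriminate between classes across all levels of sensitivity. Unlike ROC AUC, whose random baseline is 0.5, the AUC-PR of a random classifier is roughly the fraction of positive examples, so scores should be judged against that prevalence.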
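AUC-PR can be computed in more than one way. One common choice, sketched below from scratch, is step-wise "average precision": the sum of (R_n - R_{n-1}) * P_n over decreasing thresholds, which avoids the optimistic linear interpolation of a trapezoidal area. scikit-learn's `average_precision_score` documents this definition. The sketch assumes at least one positive, and the function name is hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise AUC-PR: sum over thresholds of (R_n - R_{n-1}) * P_n.

    Hypothetical helper: a sample is predicted positive when score >= threshold.
    """
    total_pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Only thresholds that admit a new positive add area (recall must increase).
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(average_precision(y_true, scores))  # 13/18, about 0.722
print(sum(y_true) / len(y_true))          # no-skill baseline: prevalence = 0.375
```

On this toy data the average precision is about 0.72 against a no-skill baseline of 0.375, the share of positives.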

## F1 Score
Best used when precision and recall are equally important and you want a single number that balances them. Because F1 is computed at one threshold, it is most useful when you know, or can determine, the optimal operating threshold.
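When the threshold is not fixed in advance, one common way to determine it is to sweep candidate thresholds on a validation set and keep the one with the highest F1. The sketch below assumes binary labels with at least one positive and uses a hypothetical helper name; choosing the threshold on the test set itself would bias the reported score.

```python
def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) with the highest F1, predicting positive when score >= threshold.

    Hypothetical helper; ties keep the higher threshold because candidates are scanned
    from the highest score down and only a strictly better F1 replaces the best so far.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
        f1 = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R), and 0 when tp == 0
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Hypothetical validation labels and scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(best_f1_threshold(y_true, scores))  # (0.6, 0.75)
```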

## AUC-PR
More informative for evaluating models on imbalanced datasets where positive cases are rare and the cost of false negatives is high. It is especially useful when you are uncertain about the threshold or when the operational threshold might change.