### Classification Metrics: Detailed Explanation

#### 1. Accuracy
- **Definition**: Accuracy is the ratio of correctly predicted instances to the total instances.
  
  $$
  \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
  $$
  
- **When to Use**: Use accuracy when class distribution is balanced and all classes are equally important.
- **Advantages**: Simple and easy to interpret.
- **Disadvantages**: Misleading for imbalanced datasets as it can give a high value for models that simply predict the majority class.
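
The majority-class problem can be illustrated with a minimal sketch. The counts below are hypothetical (95 negatives, 5 positives), and the `accuracy` helper and its zero-total convention are assumptions made for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances that were classified correctly."""
    total = tp + tn + fp + fn
    # Assumed convention: return 0.0 when there are no instances at all.
    return (tp + tn) / total if total else 0.0


# Hypothetical model that always predicts the negative (majority) class.
majority_baseline = accuracy(tp=0, tn=95, fp=0, fn=5)
print(majority_baseline)  # 0.95, although no positive is ever detected
```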

#### 2. Precision
- **Definition**: Precision is the ratio of correctly predicted positive observations to the total predicted positives.
  
  $$
  \text{Precision} = \frac{TP}{TP + FP}
  $$
  
- **When to Use**: Use precision when the cost of false positives is high.
- **Advantages**: Highlights how many of the predicted positive cases are actually positive.
- **Disadvantages**: Does not consider false negatives.

#### 3. Recall (Sensitivity)
- **Definition**: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
  
  $$
  \text{Recall} = \frac{TP}{TP + FN}
  $$
  
- **When to Use**: Use recall when the cost of false negatives is high.
- **Advantages**: Measures how well the model captures the actual positives.
- **Disadvantages**: Does not consider false positives.
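
Precision and recall can be computed side by side from the same confusion counts. This is a small sketch with hypothetical counts; returning 0.0 for an empty denominator is an assumed convention, not something the formulas themselves define.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)  # 8 of 10 predicted positives are correct
r = recall(tp=8, fn=8)     # 8 of 16 actual positives are found
print(f"precision={p:.2f} recall={r:.2f}")
```

The same model looks strong on precision and weak on recall, which is why the two are usually reported together.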

#### 4. F1 Score
- **Definition**: F1 Score is the harmonic mean of precision and recall.
  
  $$
  F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
  $$
  
- **When to Use**: Use F1 Score when you need a balance between precision and recall.
- **Advantages**: Provides a single metric balancing both false positives and false negatives.
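- **Disadvantages**: Ignores true negatives and weights precision and recall equally, so it can be misleading when the class distribution is skewed or when the two error types have different costs.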

#### 5. F2 Score
- **Definition**: F2 Score is the F-beta score with $\beta = 2$, a weighted variant of the F1 Score that gives more importance to recall than to precision.
  
  $$
  F2 = (1 + 2^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{4 \cdot \text{Precision} + \text{Recall}}
  $$
  
- **When to Use**: Use F2 Score when recall is more important than precision.
- **Advantages**: Balances precision and recall with greater emphasis on recall.
- **Disadvantages**: Less intuitive and less commonly used than F1 Score.
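
Both F1 and F2 are special cases of the general F-beta formula, $(1 + \beta^2) \cdot PR / (\beta^2 P + R)$. The sketch below assumes precision and recall are already computed; the `fbeta` helper name and the zero-denominator convention are illustrative choices.

```python
def fbeta(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-beta score; beta=1 gives F1, beta=2 gives F2."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Assumed convention: 0.0 when precision and recall are both zero.
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Two hypothetical models with mirrored precision and recall.
high_precision = (0.8, 0.5)
high_recall = (0.5, 0.8)

f1_a, f1_b = fbeta(*high_precision), fbeta(*high_recall)
f2_a, f2_b = fbeta(*high_precision, beta=2), fbeta(*high_recall, beta=2)
print(f"F1: {f1_a:.3f} vs {f1_b:.3f}")  # identical, F1 is symmetric
print(f"F2: {f2_a:.3f} vs {f2_b:.3f}")  # F2 prefers the high-recall model
```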

#### 6. Specificity (True Negative Rate)
- **Definition**: Specificity is the ratio of correctly predicted negative observations to all the observations in the actual negative class.
  
  $$
  \text{Specificity} = \frac{TN}{TN + FP}
  $$
  
- **When to Use**: Use specificity when the cost of false positives is high.
- **Advantages**: Measures the proportion of actual negatives correctly identified.
- **Disadvantages**: Does not consider false negatives.

#### 7. Balanced Accuracy
- **Definition**: Balanced Accuracy is the average of recall obtained on each class.
  
  $$
  \text{Balanced Accuracy} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)
  $$
  
- **When to Use**: Use when dealing with imbalanced datasets.
- **Advantages**: Accounts for imbalanced datasets better than simple accuracy.
- **Disadvantages**: Can be less intuitive than accuracy.
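
A short sketch of the binary formula above, using hypothetical counts for a model that always predicts the negative class. The helper name and the zero-denominator handling are assumptions.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (sensitivity + specificity)


# Hypothetical majority-class predictor on 95 negatives and 5 positives.
score = balanced_accuracy(tp=0, tn=95, fp=0, fn=5)
print(score)  # 0.5, the same as random guessing
```

Plain accuracy on the same counts would be 0.95, so balanced accuracy exposes the failure on the minority class.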

#### 8. ROC AUC Score
- **Definition**: ROC AUC Score measures the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate at various threshold settings.
  
  $$
  \text{ROC AUC} = \int_{0}^{1} TPR(FPR) \, dFPR
  $$
  
- **When to Use**: Use to evaluate how well a binary classifier ranks positives above negatives, independent of any single decision threshold.
- **Advantages**: Provides a single metric for model performance across all classification thresholds.
- **Disadvantages**: Can be misleading in highly imbalanced datasets.
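
ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below uses that pairwise interpretation rather than integrating the curve; it is quadratic in the number of samples and is meant only as an illustration. The labels and scores are made up.

```python
def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Pairwise (rank-based) ROC AUC for binary labels 0/1."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```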

#### 9. Precision-Recall AUC
- **Definition**: Precision-Recall AUC measures the area under the precision-recall curve.
  
  $$
  \text{PR AUC} = \int_{0}^{1} \text{Precision}(\text{Recall}) \, d\text{Recall}
  $$
  
- **When to Use**: Use when you have imbalanced datasets and care more about the minority class.
- **Advantages**: Focuses on the performance of the positive class.
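- **Disadvantages**: Can be less intuitive than ROC AUC, and its baseline is not fixed: a random classifier scores roughly the positive-class prevalence rather than 0.5.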
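
One common way to summarize the precision-recall curve is average precision, which approximates the integral with a step-wise sum: precision is evaluated at each rank where a positive is retrieved and then averaged. The sketch below takes that approach, not trapezoidal integration, and assumes there are no tied scores. The data is made up.

```python
def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Step-wise approximation of the area under the precision-recall curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at the rank of this positive
    return total / n_pos


ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(ap, 4))  # (1/1 + 2/3) / 2
```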

#### 10. Logarithmic Loss (Log Loss)
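- **Definition**: Log Loss (binary cross-entropy) measures the performance of a classification model whose output is a probability between 0 and 1, penalizing each prediction by the negative log of the probability assigned to the true label. Here $y_i \in \{0, 1\}$ is the true label and $\hat{y}_i$ is the predicted probability of the positive class.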
  
  $$
  \text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right)
  $$
  
- **When to Use**: Use when you need to evaluate the probabilities predicted by a model.
- **Advantages**: Accounts for the uncertainty of the predictions.
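- **Disadvantages**: Sensitive to extreme probabilities: confident wrong predictions are penalized heavily, and a predicted probability of exactly 0 or 1 for the wrong class makes the loss infinite.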
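
A minimal sketch of the binary formula above. Clipping probabilities to `[eps, 1 - eps]` is a common practical safeguard against `log(0)`, but the specific `eps` value here is an assumption.

```python
import math


def log_loss(y_true: list[int], y_prob: list[float], eps: float = 1e-15) -> float:
    """Binary cross-entropy averaged over samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(f"{confident_right:.3f} vs {confident_wrong:.3f}")
```

Swapping the two probabilities turns a small loss into a large one, which shows how strongly confident mistakes are penalized.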

#### 11. Matthews Correlation Coefficient (MCC)
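- **Definition**: MCC is a correlation coefficient between the actual and predicted binary labels, computed from all four cells of the confusion matrix. It ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction).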
  
  $$
  MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}
  $$
  
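- **When to Use**: Use when you need a single balanced measure for binary classification, especially with imbalanced classes.
- **Advantages**: Stays informative on imbalanced data, because a high score requires good performance on both the positive and the negative class.
- **Disadvantages**: Harder to interpret than accuracy, and undefined when any row or column of the confusion matrix sums to zero.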
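
A sketch of the MCC formula from raw confusion counts. Returning 0.0 when the denominator is zero is a convention assumed here, since the formula itself is undefined in that case. All counts are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


reasonable_model = mcc(tp=40, tn=45, fp=5, fn=10)
majority_baseline = mcc(tp=0, tn=95, fp=0, fn=5)
print(f"{reasonable_model:.3f} vs {majority_baseline:.3f}")
```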

#### 12. Cohen’s Kappa
- **Definition**: Cohen’s Kappa measures the agreement between two raters or classifiers.
  
  $$
  \kappa = \frac{p_o - p_e}{1 - p_e}
  $$
  
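  where $p_o$ is the observed agreement (the fraction of instances on which the two raters agree) and $p_e$ is the agreement expected by chance given each rater's label frequencies.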
  
- **When to Use**: Use to evaluate the agreement between two raters/classifiers.
- **Advantages**: Accounts for the possibility of the agreement occurring by chance.
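- **Disadvantages**: Can be difficult to interpret, because its value depends on class prevalence; kappa values from datasets with different class balances are not directly comparable.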
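
For a binary classifier compared against ground truth, $p_o$ and $p_e$ can both be read off the confusion matrix. The sketch below treats the true labels as one "rater" and the predictions as the other. The counts are hypothetical, and returning 0.0 when $p_e = 1$ is an assumed convention.

```python
def cohens_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen's kappa between true and predicted binary labels."""
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement: both say "positive" plus both say "negative".
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


kappa = cohens_kappa(tp=40, tn=45, fp=5, fn=10)
print(round(kappa, 3))  # p_o = 0.85, p_e = 0.5
```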

#### 13. Jaccard Index
- **Definition**: Jaccard Index measures the similarity between the actual and predicted sets.
  
  $$
  \text{Jaccard Index} = \frac{TP}{TP + FP + FN}
  $$
  
- **When to Use**: Use for binary and multiclass classification to evaluate similarity.
- **Advantages**: Provides a straightforward measure of similarity.
- **Disadvantages**: Ignores true negatives, and it is a monotonic function of the F1 Score, so it adds little when F1 is already reported.
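
The Jaccard Index and the F1 Score are tied together by the identity $J = F1 / (2 - F1)$, which the sketch below checks numerically on hypothetical counts. The helper names are illustrative.

```python
def jaccard_index(tp: int, fp: int, fn: int) -> float:
    """Intersection over union of the predicted and actual positive sets."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


j = jaccard_index(tp=40, fp=5, fn=10)
f1 = f1_from_counts(tp=40, fp=5, fn=10)
print(f"jaccard={j:.4f} f1={f1:.4f} f1/(2-f1)={f1 / (2 - f1):.4f}")
```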

#### 14. Brier Score
- **Definition**: Brier Score measures the mean squared difference between predicted probabilities and the actual outcome.
  
  $$
  \text{Brier Score} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2
  $$
  
- **When to Use**: Use to evaluate the accuracy of probabilistic predictions.
- **Advantages**: Bounded between 0 and 1 for binary outcomes (lower is better), so unlike Log Loss it never becomes infinite.
- **Disadvantages**: Less intuitive than classification metrics like accuracy.
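
A direct sketch of the formula above, using made-up labels and probabilities. The `brier_score` name is illustrative.

```python
def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


score = brier_score([1, 0, 1], [0.9, 0.2, 0.6])
print(round(score, 4))  # (0.01 + 0.04 + 0.16) / 3
```

A model that always predicts 0.5 scores 0.25 on any binary dataset, which gives a simple reference point.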

#### 15. Hamming Loss
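- **Definition**: Hamming Loss is the fraction of individual labels that are predicted incorrectly. For $n$ samples with $L$ labels each:
  
  $$
  \text{Hamming Loss} = \frac{1}{nL} \sum_{i=1}^{n} \sum_{j=1}^{L} \mathbf{1}(\hat{y}_{ij} \neq y_{ij})
  $$
  
  With a single label per sample ($L = 1$), this reduces to the misclassification rate, $1 - \text{Accuracy}$.
  
- **When to Use**: Use for multi-label classification problems, where each sample can carry several labels at once.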
- **Advantages**: Simple and easy to understand.
- **Disadvantages**: May not capture the severity of misclassifications.
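
In the multi-label setting each sample has a vector of binary labels, and Hamming Loss counts mismatches label by label rather than sample by sample. The sketch below assumes every row has the same number of labels; the data is made up.

```python
def hamming_loss(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of individual labels that differ between truth and prediction."""
    n_labels = sum(len(row) for row in y_true)
    mismatches = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return mismatches / n_labels


y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
print(round(loss, 4))  # 2 wrong labels out of 6
```

Both samples contain an error, so exact-match accuracy would be 0, while Hamming Loss gives partial credit for the four correct labels.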

#### 16. Fowlkes-Mallows Index (FMI)
- **Definition**: FMI is the geometric mean of precision and recall.
  
  $$
  FMI = \sqrt{\frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}
  $$
  
- **When to Use**: Use for clustering and binary classification.
- **Advantages**: Combines precision and recall in a single measure.
- **Disadvantages**: Less commonly used than F1 Score.

#### 17. G-Mean (Geometric Mean)
- **Definition**: G-Mean is the geometric mean of sensitivity and specificity.
  
  $$
  G\text{-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}
  $$
  
- **When to Use**: Use for imbalanced datasets to balance sensitivity and specificity.
- **Advantages**: Balances performance on both classes.
- **Disadvantages**: Can be less intuitive to interpret.
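
Because G-Mean is a product, it collapses to zero whenever either class is entirely missed. The sketch below shows this on hypothetical counts; the zero-denominator handling is an assumed convention.

```python
import math


def g_mean(tp: int, tn: int, fp: int, fn: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


majority_baseline = g_mean(tp=0, tn=95, fp=0, fn=5)  # misses every positive
balanced_model = g_mean(tp=4, tn=76, fp=19, fn=1)    # 80% on both classes
print(f"{majority_baseline:.2f} vs {balanced_model:.2f}")
```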

#### 18. Balanced Error Rate (BER)
- **Definition**: BER is the average of the per-class error rates, that is, the mean of the false positive rate and the false negative rate. It equals one minus Balanced Accuracy.
  
  $$
  \text{BER} = \frac{1}{2} \left( \frac{FP}{FP + TN} + \frac{FN}{FN + TP} \right)
  $$

- **When to Use**: Use when evaluating classifiers on imbalanced datasets where one class dominates the other(s).
- **Advantages**: Provides a balanced measure of error across different classes.
- **Disadvantages**: May not capture specific performance characteristics of individual classes.
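
A sketch of the BER formula on hypothetical counts for a model that always predicts the negative class. The helper name and the zero-denominator convention are assumptions.

```python
def balanced_error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of the false positive rate and the false negative rate."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return 0.5 * (fpr + fnr)


# Hypothetical majority-class predictor on 95 negatives and 5 positives.
ber = balanced_error_rate(tp=0, tn=95, fp=0, fn=5)
plain_error = (0 + 5) / 100  # (FP + FN) / total
print(f"BER={ber:.2f} plain error rate={plain_error:.2f}")
```

The plain error rate is only 0.05, while BER is 0.5 because every positive is misclassified.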

Together, these metrics cover the main ways to evaluate a classification model: thresholded predictions (such as accuracy, precision, recall, and MCC), rankings (ROC AUC and Precision-Recall AUC), and predicted probabilities (Log Loss and Brier Score). Which metric is most appropriate depends on the goals of the classification task and the characteristics of the dataset, in particular the class balance and the relative cost of false positives and false negatives.