# Classification Metrics & Handling Imbalanced Data

A practical guide to evaluation metrics and techniques for working with imbalanced datasets in machine learning.

---

# 1. Core Evaluation Metrics

## 1.1 Confusion Matrix

The foundation of most classification metrics.

|                      | Predicted Positive | Predicted Negative |
|----------------------|-------------------|-------------------|
| **Actual Positive**  | TP                | FN                |
| **Actual Negative**  | FP                | TN                |

- **TP** = True Positive  
- **TN** = True Negative  
- **FP** = False Positive (Type I error)  
- **FN** = False Negative (Type II error)  
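
As a minimal sketch, the four counts can be read off with scikit-learn's `confusion_matrix`. The labels below are made-up illustration data. Note that scikit-learn orders rows and columns by label value, so for labels `0`/`1` its layout is `[[TN, FP], [FN, TP]]`, which differs from the table above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

# With labels=[0, 1], ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=2 FN=1 FP=1 TN=4
```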

---

## 1.2 Accuracy

```math
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
```

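→ **Misleading when classes are imbalanced:** a model that always predicts the majority class scores high accuracy while detecting no minority instances.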
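
A quick way to see the problem is to score a baseline that never predicts the positive class. The sketch below uses a hypothetical test set with 1% positives; the numbers are illustrative only.

```python
# Hypothetical test set: 990 negatives (0) and 10 positives (1)
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts the majority class
y_pred = [0] * len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
positives_found = tp

print(accuracy)         # 0.99
print(positives_found)  # 0
```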

---

## 1.3 Precision (Positive Predictive Value)

```math
Precision = \frac{TP}{TP + FP}
```

→ Of all instances predicted as positive, how many were actually positive?  
→ Important when **false positives are expensive** (spam detection, fraud alerts).

---

## 1.4 Recall (Sensitivity, True Positive Rate)

```math
Recall = \frac{TP}{TP + FN}
```

→ Of all actual positive instances, how many did we correctly identify?  
→ Critical when **false negatives are expensive** (cancer detection, defect detection).
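
Precision and recall share the same numerator and differ only in which error they penalize. A small sketch from raw counts (the helper name `precision_recall` and the counts are hypothetical):

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); a ratio with a zero denominator is reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 10 alerts raised, 8 of them correct; 16 real positives in total
precision, recall = precision_recall(tp=8, fp=2, fn=8)
print(precision, recall)  # 0.8 0.5
```

Here most alerts are correct (high precision), yet half of the real positives are missed (low recall).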

---

## 1.5 F1 Score

Harmonic mean of Precision and Recall:

```math
F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}
```

→ A good single metric when Precision and Recall matter equally. Note that it ignores true negatives.

---

## 1.6 Fβ Score

Generalization of F1 in which recall is treated as β times as important as precision:

```math
F_\beta = (1 + \beta^2) \times 
\frac{Precision \times Recall}{\beta^2 \times Precision + Recall}
```

- β > 1 → favors recall (e.g., F2)  
- β < 1 → favors precision (e.g., F0.5)
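
The effect of β is easiest to see on one fixed precision/recall pair. The sketch below implements the formula directly; `f_beta` is a hypothetical helper, and the precision and recall values are made up.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score from precision and recall; returns 0.0 when both are zero."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Hypothetical classifier with precision 0.8 and recall 0.5
p, r = 0.8, 0.5
f1 = f_beta(p, r)
f2 = f_beta(p, r, beta=2)
f05 = f_beta(p, r, beta=0.5)
print(round(f1, 3), round(f2, 3), round(f05, 3))  # 0.615 0.541 0.714
```

Because recall is the weaker of the two here, F2 lands below F1 and F0.5 lands above it.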

---

## 1.7 Matthews Correlation Coefficient (MCC)

A robust metric that uses all four confusion-matrix cells, which makes it well suited to imbalanced data:

```math
MCC = \frac{TP \times TN - FP \times FN}
{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
```

Range: **−1 to +1**  
- +1 → perfect classifier  
- 0 → no better than random guessing  
- −1 → total disagreement  
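
A minimal sketch of the formula from raw counts. The helper name `mcc` and the counts are hypothetical, and returning 0 when the denominator is zero is a common convention that is assumed here rather than taken from the text above.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    When any marginal sum is zero the formula is undefined; returning 0.0
    in that case is a common convention, adopted here as an assumption.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced test set: 16 positives, 984 negatives
score = mcc(tp=8, tn=982, fp=2, fn=8)
# An always-negative classifier on 1% positives: 99% accurate, but MCC is 0
baseline = mcc(tp=0, tn=990, fp=0, fn=10)
print(round(score, 3), baseline)  # 0.628 0.0
```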

---

## 1.8 Balanced Accuracy

```math
Balanced\ Accuracy = \frac{Recall + Specificity}{2}
```

Where:

```math
Specificity = \frac{TN}{TN + FP}
```

→ Weights both classes equally, so it is far more informative than plain accuracy on imbalanced datasets.
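
To make the contrast with plain accuracy concrete, here is a small sketch from raw counts. The helper `balanced_accuracy` and the counts are hypothetical, and the code assumes both classes are present (no zero denominators).

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of recall (TPR) and specificity (TNR), from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2

# Hypothetical counts: 16 positives, 984 negatives
tp, tn, fp, fn = 8, 982, 2, 8
accuracy = (tp + tn) / (tp + tn + fp + fn)
bal_acc = balanced_accuracy(tp, tn, fp, fn)
print(round(accuracy, 3), round(bal_acc, 3))  # 0.99 0.749
```

Accuracy is dominated by the 984 negatives, while balanced accuracy exposes that half of the positives are missed.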

---

## 1.9 ROC-AUC vs PR-AUC

| Metric  | When to Prefer | Sensitive to Imbalance? | Typical Use Case |
|----------|---------------|--------------------------|------------------|
| ROC-AUC  | Balanced to moderately imbalanced | No (unaffected by class ratio, so it can look optimistic when positives are rare) | General model comparison |
| PR-AUC   | Highly imbalanced (positives below ~5–10%) | Yes (its random baseline equals the positive rate) | Rare event detection (fraud, defects) |
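
The sketch below shows how the two summaries respond when the same score distributions are evaluated at a lower positive rate. The data is synthetic (two overlapping Gaussians, an assumption made purely for illustration), and PR-AUC is summarized with scikit-learn's `average_precision_score`.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)

def simulate(n_pos, n_neg):
    """Scores from two overlapping Gaussians (synthetic, for illustration only)."""
    y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])
    scores = np.concatenate([rng.normal(1.0, 1.0, n_pos), rng.normal(0.0, 1.0, n_neg)])
    return y, scores

results = {}
for name, (n_pos, n_neg) in {"balanced": (1000, 1000), "rare": (100, 9900)}.items():
    y, scores = simulate(n_pos, n_neg)
    results[name] = (roc_auc_score(y, scores), average_precision_score(y, scores))
    print(name, "ROC-AUC=%.2f" % results[name][0], "PR-AUC(AP)=%.2f" % results[name][1])
```

In runs like this, ROC-AUC stays roughly the same across both settings, while average precision drops sharply once positives make up only 1% of the data.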

---

# 2. Handling Imbalanced Datasets

## 2.1 Resampling Techniques

| Method | Type | Pros | Cons | Library Example |
|--------|------|------|------|----------------|
| Random Under-sampling | Under | Fast, simple | Loss of information | `imblearn.under_sampling.RandomUnderSampler` |
| NearMiss | Under | Smarter selection | Can discard useful samples | `imblearn.under_sampling.NearMiss` |
| Random Over-sampling | Over | No information loss | Risk of overfitting to duplicated samples | `imblearn.over_sampling.RandomOverSampler` |
| SMOTE | Over (synthetic) | Interpolates new samples between minority neighbors | Can create noisy samples | `imblearn.over_sampling.SMOTE` |
| ADASYN | Over (synthetic) | Focuses on difficult examples | More complex | `imblearn.over_sampling.ADASYN` |
| SMOTE + Tomek / ENN | Hybrid | Cleans boundary after synthesis | Computationally heavier | `imblearn.combine.SMOTETomek`, `imblearn.combine.SMOTEENN` |
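
To show what the simplest of these methods does, here is a plain NumPy sketch of random over-sampling. It is not imblearn's implementation; the function name `random_oversample` and the toy data are assumptions for illustration.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes
    match the largest one. A plain NumPy sketch, not imblearn's implementation."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Hypothetical training data: 95 negatives, 5 positives
X_train = np.arange(200, dtype=float).reshape(100, 2)
y_train = np.array([0] * 95 + [1] * 5)

X_res, y_res = random_oversample(X_train, y_train)
print(np.bincount(y_res))  # [95 95]
```

Whatever method is used, resample only the training split, after splitting, so that duplicated or synthetic rows never leak into validation or test data.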

---

## 2.2 Algorithm-Level Solutions

- **Class weights**  
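  Many scikit-learn classifiers (e.g. `LogisticRegression`, `SVC`, `RandomForestClassifier`) accept:  
  ```python
  class_weight='balanced'
  ```
  or a manual dictionary mapping each class label to a weight, such as `{0: 1, 1: 10}`.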

- **Cost-sensitive learning**  
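  Assign a different cost to each type of error (for example, a false negative costing more than a false positive) and train or choose thresholds to minimize expected cost.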

- **Ensemble methods designed for imbalance** (all in `imblearn.ensemble`)
  - `BalancedRandomForestClassifier`
  - `RUSBoostClassifier`
  - `EasyEnsembleClassifier`
  - `BalancedBaggingClassifier`
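
For the class-weights option, it can help to see what `'balanced'` actually assigns. The sketch below uses scikit-learn's `compute_class_weight` on made-up labels and compares the result with the `n_samples / (n_classes * count)` rule.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)

# 'balanced' uses n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
manual = len(y) / (len(np.unique(y)) * np.bincount(y))
print(weights, manual)
```

With a 9:1 ratio, each positive sample ends up weighted nine times as heavily as each negative one.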

---

## 2.3 Extremely Rare Positive Class (<1–2%)

Consider treating the problem as **anomaly detection**:

- Isolation Forest  
- One-Class SVM  
- Autoencoders (reconstruction error)  
- Local Outlier Factor (LOF)  
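
As a rough sketch of the anomaly-detection framing, the example below fits scikit-learn's `IsolationForest` without using any labels. The data is synthetic, and the `contamination` value is an assumed outlier share chosen for this toy case.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data for illustration: 200 "normal" points plus 4 distant ones
normal = rng.normal(0.0, 1.0, size=(200, 2))
rare = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0]])
X = np.vstack([normal, rare])

# The model is fit without labels; contamination is the assumed outlier share
clf = IsolationForest(contamination=0.02, random_state=0).fit(X)

scores = clf.score_samples(X)   # lower = more anomalous
flags = clf.predict(X)          # -1 = flagged as anomaly, 1 = normal
print("flagged:", int((flags == -1).sum()))
print("mean score, normal vs rare:", scores[:200].mean(), scores[200:].mean())
```

If some labels are available, they are still useful for evaluation: the anomaly scores can be ranked and assessed with PR-AUC like any other classifier output.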

---

# 3. Quick Reference – Which Metric to Use?

| Situation | Recommended Metric(s) | Why? |
|------------|----------------------|------|
| Roughly balanced classes | Accuracy + F1 + ROC-AUC | All are reasonable |
| Moderate imbalance (minority ~5–30%) | F1 + Balanced Accuracy + MCC | Better reflect minority performance |
| Severe imbalance (minority <5%) | PR-AUC + MCC + F2 | ROC-AUC becomes overly optimistic |
| False negatives very costly | Recall, F2 score | Prioritize catching positives |
| False positives very costly | Precision, F0.5 score | Avoid false alarms |
| Need one interpretable number | MCC | Accounts for all four confusion-matrix cells |

---

# 4. Final Practical Tips

- Always stratify when splitting data:
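  ```python
  from sklearn.model_selection import train_test_split
  X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)  # keeps the class ratio in both splits
  ```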

- Use appropriate scoring in cross-validation:
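  ```python
  from sklearn.model_selection import cross_val_score

  cross_val_score(model, X, y, scoring='f1')                 # F1 of the positive class
  cross_val_score(model, X, y, scoring='roc_auc')            # area under the ROC curve
  cross_val_score(model, X, y, scoring='average_precision')  # PR-AUC summary
  cross_val_score(model, X, y, scoring='matthews_corrcoef')  # MCC (needs a scikit-learn version that ships this scorer name)
  ```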

- Plot **both ROC and Precision-Recall curves**
- Tune the decision threshold on validation data after training (the default 0.5 is often a poor choice for imbalanced classes)
- When in doubt → report multiple metrics (Precision, Recall, F1, MCC, PR-AUC)
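
Threshold tuning can be sketched in a few lines: try each observed score as a candidate cut-off and keep the one with the best F1. The helper `f1_at_threshold` and the validation data below are hypothetical; F1 is just one possible target, and an Fβ or cost-based criterion would slot in the same way.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 of the positive class when predicting 1 for score >= threshold."""
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, scores))
    fp = sum(t == 0 and s >= threshold for t, s in zip(y_true, scores))
    fn = sum(t == 1 and s < threshold for t, s in zip(y_true, scores))
    # 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean form
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical validation labels and predicted probabilities
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.90]

# Try each observed score as a candidate threshold and keep the best F1
best_threshold = max(sorted(set(p_val)), key=lambda t: f1_at_threshold(y_val, p_val, t))

print(f1_at_threshold(y_val, p_val, 0.5))                       # 0.5
print(best_threshold)                                           # 0.35
print(round(f1_at_threshold(y_val, p_val, best_threshold), 3))  # 0.857
```

Pick the threshold on a validation split and report final metrics on a separate test split, otherwise the tuned threshold makes the results look better than they are.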

---
