# Lesson 19 - Evaluation: ROC, PR, Thresholding


## Objectives
- Compute the confusion matrix and the ROC and PR curves.
- Explore how thresholds impact precision/recall.
- Visualize tradeoffs between metrics.


## From the notes

**Evaluation**
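- Confusion matrix: counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) at a fixed threshold.
- ROC plots the true positive rate (TPR) against the false positive rate (FPR); PR plots precision against recall.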
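
For reference, the rates on both curves follow from the four counts. These are the standard textbook definitions and match the code in the Data cell; they have not been checked against the CS229 notes.

$$
\mathrm{TPR} = \mathrm{recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{precision} = \frac{TP}{TP + FP}
$$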

_TODO: Validate evaluation definitions in the CS229 main notes PDF._


## Intuition
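A classifier that outputs scores only produces hard predictions once a threshold is chosen, so every confusion-matrix metric depends on that operating threshold. ROC and PR curves summarize the tradeoffs by sweeping the threshold across its range.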
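
To make the threshold dependence concrete, here is a minimal sketch on hand-picked toy data. The arrays `y_toy` and `s_toy` and the helper `precision_recall_at` are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Hypothetical toy data: 4 positives, 4 negatives, hand-picked scores.
y_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0])
s_toy = np.array([0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1])

def precision_recall_at(y, s, thresh):
    """Precision and recall when predicting positive for s >= thresh."""
    pred = s >= thresh
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    # Assumption: report precision as 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn)
    return float(precision), float(recall)

# Same scores, two thresholds, two different operating points.
for t in (0.35, 0.75):
    p, r = precision_recall_at(y_toy, s_toy, t)
    print(f"thresh={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# thresh=0.35  precision=0.80  recall=1.00
# thresh=0.75  precision=1.00  recall=0.50
```

The low threshold catches every positive but lets one negative through; the high threshold makes no false alarms but misses half the positives.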


## Data
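We use synthetic scores in [0, 1] to illustrate the metrics: 80 positives drawn from Beta(5, 2) and 120 negatives drawn from Beta(2, 5), so the classes overlap but positives tend to score higher.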


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

y_true = np.hstack([np.ones(80), np.zeros(120)])
scores = np.hstack([
    np.random.beta(5, 2, 80),
    np.random.beta(2, 5, 120)
])

def confusion_at_thresh(y_true, scores, thresh):
    """Return (tp, fp, tn, fn) when predicting positive for scores >= thresh."""
    preds = scores >= thresh  # boolean predictions
    pos = y_true == 1         # boolean masks for the true classes
    neg = y_true == 0
    tp = np.sum(preds & pos)
    fp = np.sum(preds & neg)
    tn = np.sum(~preds & neg)
    fn = np.sum(~preds & pos)
    return tp, fp, tn, fn

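def roc_pr(y_true, scores, thresholds):
    """Sweep thresholds and return arrays (tprs, fprs, precs, recalls)."""
    tprs, fprs, precs, recalls = [], [], [], []
    for t in thresholds:
        tp, fp, tn, fn = confusion_at_thresh(y_true, scores, t)
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        # Precision is undefined when nothing is predicted positive; use 1.0 by
        # convention so the PR curve ends at (recall=0, precision=1), not (0, 0).
        prec = tp / (tp + fp) if tp + fp > 0 else 1.0
        rec = tpr  # recall is the same quantity as TPR
        tprs.append(tpr)
        fprs.append(fpr)
        precs.append(prec)
        recalls.append(rec)
    return np.array(tprs), np.array(fprs), np.array(precs), np.array(recalls)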

thresholds = np.linspace(0, 1, 50)
tprs, fprs, precs, recalls = roc_pr(y_true, scores, thresholds)


## Experiments


In [None]:
# Confusion counts (tp, fp, tn, fn) at an example threshold of 0.5
confusion_at_thresh(y_true, scores, 0.5)


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.plot(fprs, tprs)
plt.plot([0,1],[0,1], linestyle="--", color="gray")
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.show()

plt.figure(figsize=(6,4))
plt.plot(recalls, precs)
plt.title("Precision-Recall curve")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()


## Takeaways
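- ROC curves reflect how well the scores rank positives above negatives and do not change with class balance; PR curves focus on the positive class, and precision does shift with class balance.
- Raising the threshold never increases recall and usually increases precision; lowering it does the reverse.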
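
The ranking interpretation of ROC can be checked numerically. The sketch below draws fresh synthetic scores (the seed and the helper names `auroc_pairwise` and `auroc_trapezoid` are assumptions for illustration, not from the notes). It compares the area under the ROC curve with the fraction of positive/negative pairs in which the positive receives the higher score.

```python
import numpy as np

# Assumed setup: same Beta shapes as the lesson, but a separate generator and seed.
rng = np.random.default_rng(0)
y = np.hstack([np.ones(80), np.zeros(120)])
s = np.hstack([rng.beta(5, 2, 80), rng.beta(2, 5, 120)])

def auroc_pairwise(y, s):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def auroc_trapezoid(y, s):
    """Area under the ROC curve, sweeping every distinct score as a threshold."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == 0)
    tpr, fpr = [0.0], [0.0]  # start at (0, 0): nothing predicted positive
    for t in np.sort(np.unique(s))[::-1]:  # thresholds from high to low
        pred = s >= t
        tpr.append(np.sum(pred & (y == 1)) / n_pos)
        fpr.append(np.sum(pred & (y == 0)) / n_neg)
    tpr, fpr = np.array(tpr), np.array(fpr)
    # Trapezoid rule written out by hand to avoid NumPy version differences.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

print(auroc_pairwise(y, s), auroc_trapezoid(y, s))
```

The two numbers agree up to floating-point error, which is why AUROC is often read as a measure of ranking quality rather than of any single threshold.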


## Explain it in an interview
- Explain when PR is more informative than ROC.
- Describe how to pick a threshold for deployment.
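
One possible answer to the threshold question, offered as an assumption rather than something from the notes: fix a minimum acceptable precision and take the threshold that maximizes recall subject to it, measured on held-out validation data. A minimal sketch with a hypothetical helper `pick_threshold` and toy validation arrays:

```python
import numpy as np

# Hypothetical validation labels and scores, for illustration only.
y_val = np.array([1, 1, 1, 1, 0, 0, 0, 0])
s_val = np.array([0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1])

def pick_threshold(y, s, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    observed-score thresholds whose precision is at least min_precision.
    Returns None if no threshold meets the target."""
    n_pos = np.sum(y == 1)
    best = None
    for t in np.unique(s):  # ascending, so ties in recall keep the lowest t
        pred = s >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        recall = tp / n_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (float(t), float(precision), float(recall))
    return best

print(pick_threshold(y_val, s_val, min_precision=0.75))
# (0.4, 0.8, 1.0)
```

Other criteria, such as maximizing F1 or minimizing an explicit cost for false positives and false negatives, fit the same loop by changing the selection rule.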


## Exercises
- Compute AUROC and average precision from the curves.
- Simulate a more imbalanced dataset and compare curves.
- Plot F1 vs threshold and pick the best point.
