# ROC and Precision-Recall Curves

In this notebook, you'll learn:
- How to plot and interpret ROC (Receiver Operating Characteristic) curves
- How to calculate ROC-AUC (Area Under the Curve)
- How to plot and interpret Precision-Recall curves
- When to use ROC-AUC vs PR-AUC
- How to find the optimal threshold from these curves

## Why Curves?

So far we've looked at metrics for a single threshold. But curves show us:
- Model performance across **all possible thresholds**
- The trade-offs between different metrics
- A single summary metric (AUC) independent of threshold choice
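
To make the threshold dependence concrete, here is a minimal sketch on a small, made-up dataset (the arrays `y_true_toy` and `y_prob_toy` and the helper `precision_recall_at` are hypothetical and not part of the course data). It shows how precision and recall move as the threshold changes:

```python
import numpy as np

# Hypothetical toy data: 1 = positive, 0 = negative
y_true_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90])

def precision_recall_at(threshold, y_true, y_prob):
    """Precision and recall when predicting positive for scores >= threshold."""
    y_hat = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Evaluate the same scores at three different thresholds
threshold_table = {
    t: precision_recall_at(t, y_true_toy, y_prob_toy) for t in (0.3, 0.5, 0.75)
}
for t, (p, r) in threshold_table.items():
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Each threshold gives a different (precision, recall) pair for the same model, which is why a single-threshold metric tells only part of the story.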

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (
    roc_curve, roc_auc_score, auc,
    precision_recall_curve, average_precision_score,
    RocCurveDisplay, PrecisionRecallDisplay
)

# Set random seed
np.random.seed(42)

# Display settings
pd.set_option('display.max_columns', None)
plt.style.use('default')
sns.set_palette('colorblind')

## Load Data

In [None]:
# Load the classification data
df = pd.read_csv('../../fixtures/input/classification_data.csv')

# Extract labels and probabilities
y_true = df['true_label'].values
y_pred = df['predicted_label'].values
y_prob = df['predicted_probability'].values  # Probability of class 1

print(f"Dataset shape: {df.shape}")
print(f"Class distribution: {np.bincount(y_true)}")
print(f"Probability range: [{y_prob.min():.3f}, {y_prob.max():.3f}]")

## ROC Curve (Receiver Operating Characteristic)

The **ROC curve** plots:
- **X-axis**: False Positive Rate (FPR) = FP / (FP + TN)
- **Y-axis**: True Positive Rate (TPR) = TP / (TP + FN) = **Recall**

Each point on the curve represents a different threshold.

### Interpretation:
- **Top-left corner** (0, 1): Perfect classifier
- **Diagonal line**: Random classifier (no better than guessing)
- **Below diagonal**: Worse than random (flipping the predictions would give a better-than-random classifier)
- **Higher curve**: Better model
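
As a quick check of the definitions above, the following sketch computes one ROC point by hand and compares it with `roc_curve`. The toy arrays and the helper `roc_point` are hypothetical, chosen only for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical toy data: 1 = positive, 0 = negative
y_true_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90])

def roc_point(threshold, y_true, y_prob):
    """Return (FPR, TPR) when predicting positive for scores >= threshold."""
    y_hat = y_prob >= threshold
    tp = np.sum(y_hat & (y_true == 1))
    fp = np.sum(y_hat & (y_true == 0))
    fn = np.sum(~y_hat & (y_true == 1))
    tn = np.sum(~y_hat & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

fpr_manual, tpr_manual = roc_point(0.5, y_true_toy, y_prob_toy)

# drop_intermediate=False keeps one curve point per distinct score
fpr_toy, tpr_toy, thr_toy = roc_curve(y_true_toy, y_prob_toy, drop_intermediate=False)

# Thresholds come back in decreasing order, so the last one that is still >= 0.5
# yields the same predictions as thresholding at 0.5
idx = np.where(thr_toy >= 0.5)[0][-1]

print(f"Manual:   FPR={fpr_manual:.2f}, TPR={tpr_manual:.2f}")
print(f"sklearn:  FPR={fpr_toy[idx]:.2f}, TPR={tpr_toy[idx]:.2f}")
```

Both lines should agree, since every point on the ROC curve is just the (FPR, TPR) pair obtained at one threshold.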

In [None]:
# Calculate ROC curve
fpr, tpr, roc_thresholds = roc_curve(y_true, y_prob)

# Calculate ROC-AUC
roc_auc = roc_auc_score(y_true, y_prob)

print(f"ROC-AUC Score: {roc_auc:.4f}")
print(f"Number of thresholds evaluated: {len(roc_thresholds)}")

In [None]:
# Plot ROC curve
plt.figure(figsize=(10, 8))

# Plot ROC curve
plt.plot(fpr, tpr, 'b-', linewidth=2, label=f'ROC curve (AUC = {roc_auc:.3f})')

# Plot diagonal (random classifier)
plt.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random classifier (AUC = 0.5)')

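# Mark some interesting points
# Find the evaluated threshold closest to 0.5 (roc_curve keeps only a subset of thresholds)
idx_05 = np.argmin(np.abs(roc_thresholds - 0.5))
plt.plot(fpr[idx_05], tpr[idx_05], 'ro', markersize=10,
         label=f'Threshold = {roc_thresholds[idx_05]:.3f} (closest to 0.5)')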

# Find optimal threshold (Youden's J statistic)
j_scores = tpr - fpr
optimal_idx = np.argmax(j_scores)
optimal_threshold = roc_thresholds[optimal_idx]
plt.plot(fpr[optimal_idx], tpr[optimal_idx], 'go', markersize=10, 
         label=f'Optimal (T = {optimal_threshold:.3f})')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (FPR)', fontsize=12)
plt.ylabel('True Positive Rate (TPR / Recall)', fontsize=12)
plt.title('ROC Curve', fontsize=14, pad=20)
plt.legend(loc='lower right', fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nOptimal threshold (Youden's J): {optimal_threshold:.4f}")
print(f"  TPR (Recall): {tpr[optimal_idx]:.4f}")
print(f"  FPR: {fpr[optimal_idx]:.4f}")

## Understanding ROC-AUC

**ROC-AUC** (Area Under the ROC Curve) summarizes the entire curve into a single number:
- **1.0**: Perfect classifier
- **0.5**: Random classifier
- **< 0.5**: Worse than random

**Interpretation**: The probability that the model scores a randomly chosen positive example higher than a randomly chosen negative example (ties count as half).
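
On a small dataset this interpretation can be checked exactly by comparing every positive with every negative. The sketch below uses hypothetical toy arrays (not the course data) and counts ties as half a win:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: 1 = positive, 0 = negative
y_true_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90])

pos_scores = y_prob_toy[y_true_toy == 1]
neg_scores = y_prob_toy[y_true_toy == 0]

# Broadcasting builds the full (n_pos x n_neg) grid of score differences
diff = pos_scores[:, None] - neg_scores[None, :]

# Fraction of pairs ranked correctly, with ties counted as half
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(f"Pairwise AUC:  {pairwise_auc:.4f}")
print(f"roc_auc_score: {roc_auc_score(y_true_toy, y_prob_toy):.4f}")
```

The all-pairs grid needs memory proportional to n_pos × n_neg, so on larger datasets random sampling of pairs is the more practical way to illustrate the same idea.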

In [None]:
# Demonstrate ROC-AUC interpretation
# Take 10000 random pairs of (positive, negative) samples
np.random.seed(42)
n_pairs = 10000

pos_indices = np.where(y_true == 1)[0]
neg_indices = np.where(y_true == 0)[0]

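correct_ranking = 0.0
for _ in range(n_pairs):
    pos_idx = np.random.choice(pos_indices)
    neg_idx = np.random.choice(neg_indices)
    
    if y_prob[pos_idx] > y_prob[neg_idx]:
        correct_ranking += 1
    elif y_prob[pos_idx] == y_prob[neg_idx]:
        correct_ranking += 0.5  # Ties count as half, matching the AUC definition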

empirical_auc = correct_ranking / n_pairs

print(f"ROC-AUC from sklearn: {roc_auc:.4f}")
print(f"Empirical AUC (from random pairs): {empirical_auc:.4f}")
print(f"\nInterpretation: In {empirical_auc*100:.1f}% of random (positive, negative) pairs,")
print(f"the model assigns a higher probability to the positive sample.")

## Precision-Recall Curve

The **Precision-Recall (PR) curve** plots:
- **X-axis**: Recall = TP / (TP + FN)
- **Y-axis**: Precision = TP / (TP + FP)

### When to use PR curves:
- **Imbalanced datasets**: PR curves are more informative than ROC curves
- **When you care about positive class**: PR focuses on positive predictions

### Interpretation:
- **Top-right corner** (1, 1): Perfect classifier
- **Horizontal line at y = (positive ratio)**: Random classifier
- **Higher curve**: Better model
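
The area under the PR curve is usually summarized as Average Precision (AP). One way to see what AP measures: rank the examples by score and average the precision obtained at the rank of each positive. The sketch below does this on hypothetical toy arrays and assumes there are no tied scores (with ties, the by-hand ranking would need extra care):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical toy data: 1 = positive, 0 = negative (no tied scores)
y_true_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90])

# Sort labels from highest to lowest score
order = np.argsort(-y_prob_toy)
hits = y_true_toy[order]

# Precision after each ranked example: positives so far / examples so far
precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)

# Average the precision values at the positions where a positive appears
ap_manual = np.sum(precision_at_k * hits) / hits.sum()

# A random classifier's expected precision equals the positive ratio
baseline_toy = y_true_toy.mean()

print(f"Manual AP:               {ap_manual:.4f}")
print(f"average_precision_score: {average_precision_score(y_true_toy, y_prob_toy):.4f}")
print(f"Random baseline:         {baseline_toy:.4f}")
```

Comparing AP against the positive ratio, rather than against 0.5, is what makes the PR view sensitive to class imbalance.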

In [None]:
# Calculate Precision-Recall curve
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_prob)

# Calculate PR-AUC (Average Precision)
pr_auc = average_precision_score(y_true, y_prob)

# Calculate baseline (random classifier)
baseline_precision = (y_true == 1).sum() / len(y_true)

print(f"PR-AUC (Average Precision): {pr_auc:.4f}")
print(f"Baseline (random classifier): {baseline_precision:.4f}")
print(f"Number of thresholds evaluated: {len(pr_thresholds)}")

In [None]:
# Plot Precision-Recall curve
plt.figure(figsize=(10, 8))

# Plot PR curve
plt.plot(recall, precision, 'b-', linewidth=2, label=f'PR curve (AP = {pr_auc:.3f})')

# Plot baseline (random classifier)
plt.plot([0, 1], [baseline_precision, baseline_precision], 'k--', linewidth=1, 
         label=f'Random classifier (AP = {baseline_precision:.3f})')

# Mark the evaluated threshold closest to 0.5
idx_05 = np.argmin(np.abs(pr_thresholds - 0.5))
plt.plot(recall[idx_05], precision[idx_05], 'ro', markersize=10,
         label=f'Threshold = {pr_thresholds[idx_05]:.3f} (closest to 0.5)')

# Find optimal threshold (maximize F1)
f1_scores = 2 * (precision[:-1] * recall[:-1]) / (precision[:-1] + recall[:-1] + 1e-10)
optimal_f1_idx = np.argmax(f1_scores)
optimal_f1_threshold = pr_thresholds[optimal_f1_idx]
plt.plot(recall[optimal_f1_idx], precision[optimal_f1_idx], 'go', markersize=10,
         label=f'Max F1 (T = {optimal_f1_threshold:.3f})')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall', fontsize=12)
plt.ylabel('Precision', fontsize=12)
plt.title('Precision-Recall Curve', fontsize=14, pad=20)
plt.legend(loc='best', fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nThreshold that maximizes F1: {optimal_f1_threshold:.4f}")
print(f"  Precision: {precision[optimal_f1_idx]:.4f}")
print(f"  Recall: {recall[optimal_f1_idx]:.4f}")
print(f"  F1-Score: {f1_scores[optimal_f1_idx]:.4f}")

## ROC vs PR Curves: Side by Side

Let's compare both curves to understand when each is most useful:

In [None]:
# Plot both curves side by side
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# ROC Curve
axes[0].plot(fpr, tpr, 'b-', linewidth=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random (AUC = 0.5)')
axes[0].plot(fpr[optimal_idx], tpr[optimal_idx], 'go', markersize=10, label='Optimal point')
axes[0].set_xlim([0.0, 1.0])
axes[0].set_ylim([0.0, 1.05])
axes[0].set_xlabel('False Positive Rate', fontsize=12)
axes[0].set_ylabel('True Positive Rate', fontsize=12)
axes[0].set_title('ROC Curve', fontsize=14)
axes[0].legend(loc='lower right')
axes[0].grid(True, alpha=0.3)

# PR Curve
axes[1].plot(recall, precision, 'b-', linewidth=2, label=f'PR curve (AP = {pr_auc:.3f})')
axes[1].plot([0, 1], [baseline_precision, baseline_precision], 'k--', linewidth=1, 
             label=f'Random (AP = {baseline_precision:.3f})')
axes[1].plot(recall[optimal_f1_idx], precision[optimal_f1_idx], 'go', markersize=10, label='Max F1')
axes[1].set_xlim([0.0, 1.0])
axes[1].set_ylim([0.0, 1.05])
axes[1].set_xlabel('Recall', fontsize=12)
axes[1].set_ylabel('Precision', fontsize=12)
axes[1].set_title('Precision-Recall Curve', fontsize=14)
axes[1].legend(loc='best')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Why PR Curves Matter for Imbalanced Data

Let's see what happens with a very imbalanced dataset:

In [None]:
# Rescaling the scores (e.g. y_prob * 0.3) would change neither ROC-AUC nor PR-AUC,
# because both depend only on how the examples are ranked.
# Instead, make the data imbalanced: keep every negative but only ~5% of the positives.
rng = np.random.default_rng(42)
pos_indices = np.where(y_true == 1)[0]
neg_indices = np.where(y_true == 0)[0]
n_keep = max(1, int(0.05 * len(pos_indices)))
kept_pos = rng.choice(pos_indices, size=n_keep, replace=False)
imb_idx = np.concatenate([kept_pos, neg_indices])

# Same model scores, evaluated on the imbalanced subset
imb_roc_auc = roc_auc_score(y_true[imb_idx], y_prob[imb_idx])
imb_pr_auc = average_precision_score(y_true[imb_idx], y_prob[imb_idx])

print("Original data:")
print(f"  ROC-AUC: {roc_auc:.4f}")
print(f"  PR-AUC:  {pr_auc:.4f}")
print()
print(f"Imbalanced subset ({n_keep} positives, {len(neg_indices)} negatives):")
print(f"  ROC-AUC: {imb_roc_auc:.4f}")
print(f"  PR-AUC:  {imb_pr_auc:.4f}")
print()
print("Key Insight:")
print("ROC-AUC stays roughly the same because TPR and FPR are each computed within one class,")
print("while PR-AUC typically drops: the same false positives now compete with far fewer")
print("true positives, so precision falls.")

## Comparing Different Models

Let's create a few hypothetical models and compare them:

In [None]:
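# Create different "models" (just transformations of the original probabilities).
# Ranking metrics only see the order of the scores, so 'Conservative' (a monotone
# rescaling) should match 'Original'; 'Aggressive' differs only where clipping creates ties.
models = {
    'Original': y_prob,
    'Conservative': y_prob * 0.7,  # Lower probabilities, same ranking
    'Aggressive': np.minimum(y_prob * 1.3, 1.0),  # Higher probabilities, clipped at 1.0
    'Random': np.random.rand(len(y_true))  # Random predictions
}

# Calculate metrics for each model
# (loop-local names keep the notebook-level roc_auc and pr_auc from being overwritten)
results = []
for name, probs in models.items():
    model_roc_auc = roc_auc_score(y_true, probs)
    model_pr_auc = average_precision_score(y_true, probs)
    results.append({'Model': name, 'ROC-AUC': model_roc_auc, 'PR-AUC': model_pr_auc})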

results_df = pd.DataFrame(results)
print("Model Comparison:")
print(results_df.to_string(index=False))

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# ROC Curves
for name, probs in models.items():
    fpr_temp, tpr_temp, _ = roc_curve(y_true, probs)
    roc_auc_temp = roc_auc_score(y_true, probs)
    axes[0].plot(fpr_temp, tpr_temp, linewidth=2, label=f'{name} (AUC={roc_auc_temp:.3f})')

axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1, alpha=0.3)
axes[0].set_xlabel('False Positive Rate', fontsize=12)
axes[0].set_ylabel('True Positive Rate', fontsize=12)
axes[0].set_title('ROC Curves Comparison', fontsize=14)
axes[0].legend(loc='lower right')
axes[0].grid(True, alpha=0.3)

# PR Curves
for name, probs in models.items():
    prec_temp, rec_temp, _ = precision_recall_curve(y_true, probs)
    pr_auc_temp = average_precision_score(y_true, probs)
    axes[1].plot(rec_temp, prec_temp, linewidth=2, label=f'{name} (AP={pr_auc_temp:.3f})')

axes[1].plot([0, 1], [baseline_precision, baseline_precision], 'k--', linewidth=1, alpha=0.3)
axes[1].set_xlabel('Recall', fontsize=12)
axes[1].set_ylabel('Precision', fontsize=12)
axes[1].set_title('PR Curves Comparison', fontsize=14)
axes[1].legend(loc='best')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Exercise 1: Find Optimal Threshold for Your Use Case

Find the threshold that achieves at least 95% recall while maximizing precision:

In [None]:
# YOUR CODE HERE
# 1. Find all thresholds where recall >= 0.95
# 2. Among those, find the one with highest precision
# 3. Report the threshold and resulting metrics



## Exercise 2: Calculate AUC Manually

Calculate the ROC-AUC using the trapezoidal rule:

In [None]:
# YOUR CODE HERE
# Use np.trapz (renamed np.trapezoid in NumPy 2.0) or manually implement the trapezoidal rule
# Compare with sklearn's roc_auc_score



## Exercise 3: Understand Model Ranking

Verify the ROC-AUC interpretation: check what percentage of (positive, negative) pairs are correctly ranked:

In [None]:
# YOUR CODE HERE
# 1. Sample many pairs of (positive, negative) examples
# 2. Count how often the positive has higher probability
# 3. Compare with ROC-AUC score



## Summary

In this notebook, you learned:

1. **ROC Curve**: Plots TPR vs FPR across all thresholds
2. **ROC-AUC**: Probability of ranking a positive higher than a negative
3. **PR Curve**: Plots Precision vs Recall across all thresholds
4. **PR-AUC (AP)**: More informative for imbalanced datasets
5. **Threshold Selection**: Choose based on business requirements

### Key Takeaways:

- **ROC curves** are good for balanced datasets and when both classes matter equally
- **PR curves** are better for imbalanced datasets and when you care about the positive class
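- ROC-AUC can look misleadingly high on imbalanced data, because a large pool of true negatives keeps FPR low even when false positives outnumber true positives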
- Both curves help you choose the optimal threshold for your use case
- AUC provides a threshold-independent measure of model quality
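
The threshold independence in the last takeaway comes from the fact that both AUCs depend only on the order of the scores. The sketch below, on hypothetical toy arrays with a hypothetical helper `recall_at`, applies two order-preserving transformations and shows that the AUCs stay the same while a fixed-threshold metric changes:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical toy data: 1 = positive, 0 = negative
y_true_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90])

# Two monotone (order-preserving) transformations of the scores
halved = y_prob_toy * 0.5
squared = y_prob_toy ** 2

for name, scores in [("original", y_prob_toy), ("halved", halved), ("squared", squared)]:
    print(f"{name:>8}: ROC-AUC={roc_auc_score(y_true_toy, scores):.4f}, "
          f"AP={average_precision_score(y_true_toy, scores):.4f}")

def recall_at(threshold, y_true, scores):
    """Recall when predicting positive for scores >= threshold."""
    predicted_pos = scores >= threshold
    return np.sum(predicted_pos & (y_true == 1)) / np.sum(y_true == 1)

# A fixed threshold of 0.5 behaves very differently once the scores are rescaled
recall_at_half_original = recall_at(0.5, y_true_toy, y_prob_toy)
recall_at_half_rescaled = recall_at(0.5, y_true_toy, halved)
print(f"Recall at 0.5: original={recall_at_half_original:.2f}, halved={recall_at_half_rescaled:.2f}")
```

This also means AUC says nothing about whether the probabilities are well calibrated, so a threshold still has to be chosen separately for deployment.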

### Decision Guide:

| Scenario | Use ROC | Use PR |
|----------|---------|--------|
| Balanced classes | ✓ | ✓ |
| Imbalanced classes | | ✓ |
| Both classes equally important | ✓ | |
| Care about positive class | | ✓ |
| False positives and negatives equal cost | ✓ | |

### Next Steps:

In the next notebook, we'll explore:
- Multi-class classification metrics
- Macro, micro, and weighted averaging
- Per-class metrics and confusion matrices