# Area Under the Curve

### What is AUC?
AUC stands for Area Under the Curve; here the curve is the ROC (Receiver Operating Characteristic) curve. It is a performance metric for binary classification models that measures how well the model's scores separate the positive class from the negative class.

### ROC Curve Fundamentals
Core Components:
- X-axis: False Positive Rate (FPR) = FP / (FP + TN)
- Y-axis: True Positive Rate (TPR) = TP / (TP + FN)
- Diagonal line: Random classifier (AUC = 0.5)
- Perfect classifier: Top-left corner (AUC = 1.0)

### Confusion Matrix Terms:
```text
TP = True Positives  (correctly predicted positive)
TN = True Negatives  (correctly predicted negative)
FP = False Positives (incorrectly predicted positive)
FN = False Negatives (incorrectly predicted negative)
```
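
To make the TPR and FPR definitions concrete, here is a minimal sketch that counts the four confusion matrix terms at a single decision threshold and derives both rates from them. The helper names `confusion_counts` and `rates_at_threshold`, the toy labels, and the 0.5 threshold are assumptions chosen for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn)  # share of actual positives that were caught
    fpr = fp / (fp + tn)  # share of actual negatives that were flagged
    return tpr, fpr


# Toy data (hypothetical): four positives and four negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

tpr, fpr = rates_at_threshold(y_true, scores, threshold=0.5)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Each threshold yields one (FPR, TPR) point; sweeping the threshold from high to low traces out the ROC curve.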

### AUC Interpretation
| AUC Value | Interpretation |
|-----------|----------------|
| 0.90-1.00 | Excellent |
| 0.80-0.90 | Good |
| 0.70-0.80 | Fair |
| 0.60-0.70 | Poor |
| 0.50-0.60 | Fail (barely better than random) |

### Mathematical Foundation
Probability Perspective:
AUC = P(score_positive > score_negative)

Here score_positive is the predicted score for a randomly chosen positive instance and score_negative is the predicted score for a randomly chosen negative instance. When the two scores are tied, the pair is conventionally counted as one half.
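
The probability perspective can be checked directly by comparing every positive score against every negative score. The sketch below does exactly that; `pairwise_auc` is a hypothetical helper name, the data is a toy example, and counting ties as 0.5 is the usual convention rather than something the definition above spells out. It is quadratic in the number of samples, so it is meant for intuition, not for large datasets.

```python
def pairwise_auc(y_true, scores):
    """Estimate P(score_positive > score_negative) over all pos/neg pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four positives and four negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

auc_value = pairwise_auc(y_true, scores)
print(f"AUC = {auc_value:.3f}")  # 14 of the 16 pairs are ordered correctly
```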

### Calculation Methods:

Trapezoidal Rule (Most Common):

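```python
import numpy as np

def calculate_auc(fpr, tpr):
    """Area under the ROC curve from (fpr, tpr) points via the trapezoidal rule."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)

    # Sort by fpr, breaking ties by tpr so vertical segments stay in order
    sorted_indices = np.lexsort((tpr, fpr))
    fpr_sorted = fpr[sorted_indices]
    tpr_sorted = tpr[sorted_indices]

    # np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever exists
    trapezoid = getattr(np, "trapezoid", None) or np.trapz
    return trapezoid(tpr_sorted, fpr_sorted)
```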
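
To show where the (FPR, TPR) points fed into the trapezoidal rule come from, the following sketch builds them by sweeping the threshold over every distinct score and then sums the trapezoids by hand. The names `roc_points` and `trapezoid_area` and the toy data are illustrative assumptions; library routines such as scikit-learn's `roc_curve` do the same job far more efficiently.

```python
def roc_points(y_true, scores):
    """Return ROC points (fpr, tpr), one per distinct threshold, from (0, 0) to (1, 1).

    Assumes labels are 0/1 and both classes are present.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Sum the trapezoids between consecutive (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): four positives and four negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

points = roc_points(y_true, scores)
area = trapezoid_area(points)
print(f"{len(points)} ROC points, AUC = {area:.3f}")
```

On this data, 14 of the 16 positive/negative pairs are ranked correctly, and the area comes out to the same fraction, which is the pairwise-probability reading of AUC.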

ROC/AUC Visualization:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve

def plot_roc_auc_analysis(y_true, y_pred_proba, model_name="Model"):
    """
    Comprehensive ROC/AUC visualization: ROC curve, threshold analysis,
    and predicted-probability distributions per class.
    """
    # Arrays are needed for the boolean masks used in panel 3
    y_true = np.asarray(y_true)
    y_pred_proba = np.asarray(y_pred_proba)

    fpr, tpr, thresholds = roc_curve(y_true, y_pred_proba)
    roc_auc = auc(fpr, tpr)
    
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # 1. ROC Curve
    axes[0].plot(fpr, tpr, color='darkorange', lw=2, 
                 label=f'ROC curve (AUC = {roc_auc:.3f})')
    axes[0].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    axes[0].set_xlim([0.0, 1.0])
    axes[0].set_ylim([0.0, 1.05])
    axes[0].set_xlabel('False Positive Rate')
    axes[0].set_ylabel('True Positive Rate')
    axes[0].set_title(f'ROC Curve - {model_name}')
    axes[0].legend(loc="lower right")
    axes[0].grid(True, alpha=0.3)
    
    # 2. Threshold Analysis
    # roc_curve prepends a sentinel threshold (np.inf in recent scikit-learn,
    # max score + 1 in older versions), so the first entry is skipped here.
    # The "optimal" threshold maximizes Youden's J statistic (TPR - FPR).
    youden_idx = np.argmax(tpr[1:] - fpr[1:]) + 1
    optimal_threshold = thresholds[youden_idx]
    
    axes[1].plot(thresholds[1:], tpr[1:], label='True Positive Rate', lw=2)
    axes[1].plot(thresholds[1:], fpr[1:], label='False Positive Rate', lw=2)
    axes[1].axvline(x=optimal_threshold, color='red', linestyle='--', 
                    label=f'Optimal Threshold: {optimal_threshold:.3f}')
    axes[1].set_xlabel('Threshold')
    axes[1].set_ylabel('Rate')
    axes[1].set_title('Threshold Analysis')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # 3. Probability Distribution
    pos_probs = y_pred_proba[y_true == 1]
    neg_probs = y_pred_proba[y_true == 0]
    
    axes[2].hist(pos_probs, bins=30, alpha=0.7, label='Positive Class', 
                 color='green', density=True)
    axes[2].hist(neg_probs, bins=30, alpha=0.7, label='Negative Class', 
                 color='red', density=True)
    axes[2].axvline(x=optimal_threshold, color='black', linestyle='--', 
                    label='Optimal Threshold')
    axes[2].set_xlabel('Predicted Probability')
    axes[2].set_ylabel('Density')
    axes[2].set_title('Probability Distribution')
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return roc_auc, optimal_threshold
```

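Multi-class AUC (One-vs-Rest):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# One-vs-Rest (OvR) AUC
def multiclass_auc_ovr(y_true, y_pred_proba, classes):
    """
    Per-class OvR AUC and their macro average.
    Assumes three or more classes (label_binarize returns a single column
    for two classes) and that the columns of y_pred_proba follow `classes`.
    """
    y_true_bin = label_binarize(y_true, classes=classes)
    y_pred_proba = np.asarray(y_pred_proba)
    
    auc_scores = {}
    for i, cls in enumerate(classes):
        # Treat class `cls` as positive and all other classes as negative
        auc_score = roc_auc_score(y_true_bin[:, i], y_pred_proba[:, i])
        auc_scores[cls] = auc_score
    
    # Macro-average AUC: unweighted mean over classes
    macro_auc = np.mean(list(auc_scores.values()))
    
    return auc_scores, macro_auc
```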

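Confidence Intervals (Bootstrap):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

def auc_with_ci(y_true, y_pred_proba, n_bootstraps=1000, confidence=0.95,
                random_state=None):
    """
    Calculate AUC with confidence intervals using bootstrapping
    """
    # Arrays make positional indexing safe (e.g. for pandas Series input)
    y_true = np.asarray(y_true)
    y_pred_proba = np.asarray(y_pred_proba)
    # One shared generator so every bootstrap draw is different but reproducible
    rng = np.random.RandomState(random_state)
    auc_scores = []
    
    for _ in range(n_bootstraps):
        # Bootstrap sample: draw indices with replacement
        indices = resample(np.arange(len(y_true)), random_state=rng)
        y_true_bs = y_true[indices]
        y_pred_bs = y_pred_proba[indices]
        
        # AUC is undefined when a sample contains only one class; skip it
        if np.unique(y_true_bs).size < 2:
            continue
        auc_scores.append(roc_auc_score(y_true_bs, y_pred_bs))
    
    # Calculate percentile confidence interval
    alpha = (1 - confidence) / 2
    lower_bound = np.percentile(auc_scores, 100 * alpha)
    upper_bound = np.percentile(auc_scores, 100 * (1 - alpha))
    mean_auc = np.mean(auc_scores)
    
    return mean_auc, (lower_bound, upper_bound), auc_scores
```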

### When to Use AUC:
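- Moderately imbalanced datasets (the ROC curve does not depend on class prevalence or on a single decision threshold)
- Comparing multiple models on the same dataset
- When both false positives and false negatives are important
- When the ranking of predicted probabilities matters more than their absolute values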

### When NOT to Use AUC:
- Extremely imbalanced datasets (consider Precision-Recall AUC)
- Cost-sensitive learning (use cost curves)
- Multi-class problems with class imbalance (use macro/micro averaging carefully)
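
The advice about extreme imbalance can be illustrated with a small synthetic case. The sketch below is a toy under stated assumptions: the 1000-to-10 class ratio, the scores, and the helper names `roc_auc` and `average_precision` are all made up for illustration, and average precision is used here as one common summary of the precision-recall curve. Ties between a positive and a negative are not handled specially in `average_precision`; the data is built so that none occur.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Synthetic data (hypothetical): 1000 negatives spread over [0, 1),
# 10 positives that all score just above 97.6% of the negatives
neg_scores = [i / 1000 for i in range(1000)]
pos_scores = [0.9755] * 10
y_true = [0] * 1000 + [1] * 10
scores = neg_scores + pos_scores

auc_value = roc_auc(y_true, scores)
ap_value = average_precision(y_true, scores)
print(f"ROC AUC = {auc_value:.3f}, average precision = {ap_value:.3f}")
```

The ROC AUC looks excellent because only 2.4% of negatives outrank the positives, yet those 24 negatives outnumber the 10 positives at the top of the ranking, so precision there is low. That gap is what the precision-recall view exposes.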

---

- AUC measures ranking ability, not the accuracy of absolute predictions
- Range: 0.0 to 1.0, where 0.5 is random and 1.0 is perfect; values below 0.5 mean the ranking is inverted
- Threshold-independent: it summarizes performance across all possible thresholds
- Use AUC-PR for highly imbalanced datasets
- Always validate with business metrics and cost functions
- Monitor AUC drift in production systems
- Combine with calibration curves for a complete assessment
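
The first and last points can be seen in a few lines: any strictly increasing transformation of the scores leaves AUC unchanged, while a calibration-sensitive measure moves. In this sketch the cubing transformation, the toy data, and the helper names are illustrative assumptions, and the Brier score (mean squared error of the predicted probabilities) stands in as a simple calibration-sensitive metric; it is not a replacement for a full calibration curve.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and label."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Toy data (hypothetical): four positives and four negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# Cubing is strictly increasing, so the ranking of the scores is preserved
distorted = [p ** 3 for p in probs]

auc_raw = pairwise_auc(y_true, probs)
auc_distorted = pairwise_auc(y_true, distorted)
brier_raw = brier_score(y_true, probs)
brier_distorted = brier_score(y_true, distorted)

print(f"AUC:   {auc_raw:.3f} -> {auc_distorted:.3f}")
print(f"Brier: {brier_raw:.3f} -> {brier_distorted:.3f}")
```

Two models with identical AUC can therefore produce very different probabilities, which is why a ranking metric alone does not show whether the predicted values themselves can be trusted.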