# Week 2: Mastering Uncertainty Quantification

## Goal: Implement four core UQ techniques in code
1. MC Dropout
2. Deep Ensembles
3. Temperature Scaling (Calibration)
4. Conformal Prediction

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

---
# Day 1-2: Implementing MC Dropout

**Tasks:**
- [ ] Read MC Dropout paper (Gal & Ghahramani, 2016)
- [ ] Run the MC Dropout code on Cora
- [ ] Experiment with n_samples = [10, 50, 100]
- [ ] Understand epistemic uncertainty from dropout
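
To build intuition for the last task before running the GCN, here is a minimal dependency-free sketch. It applies random inverted-dropout masks to a tiny hand-written linear scorer and checks how much the softmax output varies across passes. The weights, the input, and the helper names (`softmax`, `dropout_forward`) are made up for illustration and are not part of the notebook's model.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dropout_forward(x, weights, p, rng):
    """One stochastic pass: drop each input feature with probability p
    (inverted dropout, so kept features are rescaled by 1 / (1 - p))."""
    kept = [xi / (1.0 - p) if rng.random() >= p else 0.0 for xi in x]
    logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
    return softmax(logits)

rng = random.Random(0)
x = [1.0, 2.0, -1.0, 0.5]            # toy input features (hypothetical)
weights = [[0.5, -0.2, 0.1, 0.3],    # toy 2-class linear scorer (hypothetical)
           [-0.3, 0.4, 0.2, -0.1]]

# Many forward passes, each with a fresh dropout mask
samples = [dropout_forward(x, weights, p=0.5, rng=rng) for _ in range(200)]
n = len(samples)

# Mean prediction and unbiased per-class variance across passes
mean_pred = [sum(s[c] for s in samples) / n for c in range(2)]
variance = [sum((s[c] - mean_pred[c]) ** 2 for s in samples) / (n - 1)
            for c in range(2)]

print("mean prediction:", mean_pred)
print("per-class variance:", variance)
```

The spread across passes is what the notebook later summarizes as an epistemic-uncertainty proxy. With dropout turned off, every pass would return the same vector and the variance would be zero.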

In [None]:
class GCN_with_Dropout(nn.Module):
    """
    GCN for Monte Carlo Dropout.
    Key idea: keep dropout active at inference time.
    """
    def __init__(self, in_channels, hidden_channels, out_channels, dropout=0.5):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        self.dropout = dropout
    
    def forward(self, x, edge_index, training=False):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        # Key: dropout is applied whenever training=True, even if the module is in eval mode
        x = F.dropout(x, p=self.dropout, training=training or self.training)
        x = self.conv2(x, edge_index)
        return x

In [None]:
def mc_dropout_prediction(model, data, n_samples=50):
    """
    Estimate uncertainty with MC Dropout.
    
    Args:
        model: a GCN_with_Dropout model
        data: graph data
        n_samples: number of stochastic forward passes
    
    Returns:
        mean_pred: mean predicted probabilities (N, C)
        epistemic_uncertainty: epistemic uncertainty per node (N,)
        entropy: predictive entropy per node (N,)
    """
    model.eval()
    all_predictions = []
    
    # Run n_samples forward passes, each with a different dropout mask
    with torch.no_grad():
        for _ in range(n_samples):
            # Pass training=True to keep dropout active
            logits = model(data.x, data.edge_index, training=True)
            probs = F.softmax(logits, dim=1)
            all_predictions.append(probs)
    
    # (n_samples, num_nodes, num_classes) -> (num_nodes, num_classes)
    all_predictions = torch.stack(all_predictions)
    mean_pred = all_predictions.mean(dim=0)
    
    # Epistemic uncertainty proxy: variance across samples, averaged over classes
    epistemic = all_predictions.var(dim=0).mean(dim=1)
    
    # Predictive Entropy
    entropy = -(mean_pred * torch.log(mean_pred + 1e-10)).sum(dim=1)
    
    return mean_pred, epistemic, entropy

### Load Cora Dataset

In [None]:
# Load Cora dataset
dataset = Planetoid(root='./data', name='Cora')
data = dataset[0]

print(f"Dataset: {dataset}")
print(f"Number of graphs: {len(dataset)}")
print(f"Number of features: {dataset.num_features}")
print(f"Number of classes: {dataset.num_classes}")
print(f"\nGraph:")
print(f"Number of nodes: {data.num_nodes}")
print(f"Number of edges: {data.num_edges}")
print(f"Average node degree: {data.num_edges / data.num_nodes:.2f}")
print(f"Training nodes: {int(data.train_mask.sum())}")
print(f"Validation nodes: {int(data.val_mask.sum())}")
print(f"Test nodes: {int(data.test_mask.sum())}")

### Train MC Dropout Model

In [None]:
# Initialize model
mc_model = GCN_with_Dropout(
    in_channels=dataset.num_features,
    hidden_channels=16,
    out_channels=dataset.num_classes,
    dropout=0.5
)

optimizer = torch.optim.Adam(mc_model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

# Training loop
mc_model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = mc_model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 20 == 0:
        mc_model.eval()
        with torch.no_grad():
            pred = mc_model(data.x, data.edge_index).argmax(dim=1)
            train_acc = (pred[data.train_mask] == data.y[data.train_mask]).float().mean()
            val_acc = (pred[data.val_mask] == data.y[data.val_mask]).float().mean()
        mc_model.train()
        print(f'Epoch {epoch+1:3d}, Loss: {loss:.4f}, Train Acc: {train_acc:.4f}, Val Acc: {val_acc:.4f}')

print("\nTraining complete!")

### Experiment: Test Different n_samples [10, 50, 100]

In [None]:
print("\n" + "="*60)
print("MC Dropout: Testing different n_samples")
print("="*60)

for n_samples in [10, 50, 100]:
    print(f"\nn_samples = {n_samples}")
    mean_pred, epistemic, entropy = mc_dropout_prediction(mc_model, data, n_samples=n_samples)
    
    # Test accuracy
    test_pred = mean_pred[data.test_mask].argmax(dim=1)
    test_acc = (test_pred == data.y[data.test_mask]).float().mean()
    
    # Average uncertainty on test set
    avg_epistemic = epistemic[data.test_mask].mean()
    avg_entropy = entropy[data.test_mask].mean()
    
    print(f"  Test Accuracy: {test_acc:.4f}")
    print(f"  Avg Epistemic Uncertainty: {avg_epistemic:.4f}")
    print(f"  Avg Entropy: {avg_entropy:.4f}")

---
# Day 3-4: Implementing Deep Ensembles

**Tasks:**
- [ ] Read Deep Ensembles paper (Lakshminarayanan et al., 2017)
- [ ] Run the ensemble code
- [ ] Experiment with n_models = [3, 5, 10]
- [ ] Compare with MC Dropout

In [None]:
class GCNConv_Model(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
    
    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x

In [None]:
class GCN_Ensemble:
    """
    Deep Ensemble: train several models independently.
    Each model is initialized with a different random seed.
    """
    def __init__(self, in_channels, hidden_channels, out_channels, n_models=5):
        self.models = []
        self.n_models = n_models
        
        for i in range(n_models):
            # Use a different seed for each model
            torch.manual_seed(42 + i)
            model = GCNConv_Model(in_channels, hidden_channels, out_channels)
            self.models.append(model)
    
    def train_ensemble(self, data, epochs=200):
        """Train each model independently."""
        for i, model in enumerate(self.models):
            print(f"\nTraining model {i+1}/{self.n_models}")
            optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
            criterion = nn.CrossEntropyLoss()
            
            model.train()
            for epoch in range(epochs):
                optimizer.zero_grad()
                out = model(data.x, data.edge_index)
                loss = criterion(out[data.train_mask], data.y[data.train_mask])
                loss.backward()
                optimizer.step()
                
                if (epoch + 1) % 50 == 0:
                    val_acc = self.evaluate_single(model, data, data.val_mask)
                    print(f"  Epoch {epoch+1}, Loss: {loss:.4f}, Val Acc: {val_acc:.4f}")
    
    def predict(self, data):
        """
        Ensemble prediction and uncertainty estimation.
        
        Returns:
            mean_pred: mean predicted probabilities (N, C)
            epistemic: disagreement between models (N,)
            entropy: predictive entropy (N,)
        """
        all_predictions = []
        
        for model in self.models:
            model.eval()
            with torch.no_grad():
                logits = model(data.x, data.edge_index)
                probs = F.softmax(logits, dim=1)
                all_predictions.append(probs)
        
        all_predictions = torch.stack(all_predictions)  # (n_models, N, C)
        mean_pred = all_predictions.mean(dim=0)
        
        # Epistemic: disagreement between ensemble members
        epistemic = all_predictions.var(dim=0).mean(dim=1)
        
        # Entropy
        entropy = -(mean_pred * torch.log(mean_pred + 1e-10)).sum(dim=1)
        
        return mean_pred, epistemic, entropy
    
    def evaluate_single(self, model, data, mask):
        model.eval()
        with torch.no_grad():
            pred = model(data.x, data.edge_index).argmax(dim=1)
            acc = (pred[mask] == data.y[mask]).float().mean()
        return acc

### Experiment: Test Different n_models [3, 5, 10]

In [None]:
print("\n" + "="*60)
print("Deep Ensembles: Testing different n_models")
print("="*60)

ensemble_results = {}

for n_models in [3, 5, 10]:
    print(f"\n\nTraining Ensemble with n_models = {n_models}")
    print("="*60)
    
    ensemble = GCN_Ensemble(
        in_channels=dataset.num_features,
        hidden_channels=16,
        out_channels=dataset.num_classes,
        n_models=n_models
    )
    
    ensemble.train_ensemble(data, epochs=200)
    
    print(f"\nPredicting with {n_models} models...")
    mean_pred, epistemic, entropy = ensemble.predict(data)
    
    # Test accuracy
    test_pred = mean_pred[data.test_mask].argmax(dim=1)
    test_acc = (test_pred == data.y[data.test_mask]).float().mean()
    
    # Average uncertainty on test set
    avg_epistemic = epistemic[data.test_mask].mean()
    avg_entropy = entropy[data.test_mask].mean()
    
    print(f"\nResults for n_models = {n_models}:")
    print(f"  Test Accuracy: {test_acc:.4f}")
    print(f"  Avg Epistemic Uncertainty: {avg_epistemic:.4f}")
    print(f"  Avg Entropy: {avg_entropy:.4f}")
    
    ensemble_results[n_models] = {
        'accuracy': test_acc.item(),
        'epistemic': avg_epistemic.item(),
        'entropy': avg_entropy.item()
    }

---
# Day 5: Temperature Scaling (Calibration)

**Tasks:**
- [ ] Read Temperature Scaling paper (Guo et al., 2017)
- [ ] Apply calibration to your MC Dropout model
- [ ] Check if ECE improves
- [ ] Understand calibration importance
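
As a quick intuition check for the tasks above, the sketch below shows what dividing logits by a temperature does to a single prediction. The logits are hypothetical and the `softmax` helper is written out by hand, so none of this depends on the notebook's model.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]   # hypothetical raw model outputs for one node

confidence_by_T = {}
argmax_by_T = {}
for T in (0.5, 1.0, 2.0, 5.0):
    probs = softmax(logits, T)
    confidence_by_T[T] = max(probs)
    argmax_by_T[T] = probs.index(max(probs))
    print(f"T={T}: confidence={confidence_by_T[T]:.3f}, predicted class={argmax_by_T[T]}")
```

A temperature above 1 softens the distribution and lowers confidence, while a temperature below 1 sharpens it. The predicted class never changes, which is why temperature scaling can affect ECE and NLL but not accuracy.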

In [None]:
class TemperatureScaling(nn.Module):
    """
    Probability calibration with temperature scaling.
    
    Divides the trained model's logits by a learned temperature.
    """
    def __init__(self):
        super().__init__()
        self.temperature = nn.Parameter(torch.ones(1))
    
    def forward(self, logits):
        """
        logits: (N, C) - raw model outputs
        return: (N, C) - temperature-scaled probabilities
        """
        return F.softmax(logits / self.temperature, dim=1)
    
    def calibrate(self, model, data, val_mask, max_iter=50):
        """
        Find the optimal temperature on the validation set
        by learning the temperature that minimizes NLL.
        """
        # Get validation logits
        model.eval()
        with torch.no_grad():
            logits = model(data.x, data.edge_index)
            val_logits = logits[val_mask]
            val_labels = data.y[val_mask]
        
        # Optimize temperature
        optimizer = torch.optim.LBFGS([self.temperature], lr=0.01, max_iter=max_iter)
        criterion = nn.CrossEntropyLoss()
        
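        # LBFGS re-evaluates the loss through a closure;
        # named `closure` to avoid shadowing the built-in eval()
        def closure():
            optimizer.zero_grad()
            loss = criterion(val_logits / self.temperature, val_labels)
            loss.backward()
            return loss
        
        optimizer.step(closure)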
        
        print(f"Optimal temperature: {self.temperature.item():.4f}")
        return self.temperature.item()

In [None]:
def compute_ece(probs, labels, n_bins=15):
    """
    Expected Calibration Error (ECE).
    
    Measures how closely confidence matches accuracy.
    Lower ECE means better calibration.
    """
    confidences = probs.max(dim=1)[0].cpu().numpy()
    predictions = probs.argmax(dim=1).cpu().numpy()
    labels = labels.cpu().numpy()
    
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    
    for i in range(n_bins):
        # Samples falling in this bin; (lower, upper] so that confidence == 1.0 is counted
        in_bin = (confidences > bin_boundaries[i]) & (confidences <= bin_boundaries[i+1])
        
        if in_bin.sum() > 0:
            bin_accuracy = (predictions[in_bin] == labels[in_bin]).mean()
            bin_confidence = confidences[in_bin].mean()
            ece += (in_bin.sum() / len(labels)) * abs(bin_accuracy - bin_confidence)
    
    return ece

### Apply Temperature Scaling to MC Dropout Model

In [None]:
print("\n" + "="*60)
print("Temperature Scaling: Calibrating MC Dropout Model")
print("="*60)

# Get uncalibrated predictions
mc_model.eval()
with torch.no_grad():
    logits = mc_model(data.x, data.edge_index)
    probs_before = F.softmax(logits, dim=1)

# ECE before calibration
ece_before = compute_ece(probs_before[data.test_mask], data.y[data.test_mask])
print(f"\nECE before calibration: {ece_before:.4f}")

# Calibrate
temp_scaler = TemperatureScaling()
optimal_temp = temp_scaler.calibrate(mc_model, data, data.val_mask)

# Get calibrated predictions
with torch.no_grad():
    probs_after = temp_scaler(logits)

# ECE after calibration
ece_after = compute_ece(probs_after[data.test_mask], data.y[data.test_mask])
print(f"ECE after calibration: {ece_after:.4f}")

print(f"\nECE improvement: {ece_before - ece_after:.4f}")

---
# Day 6-7: Conformal Prediction

**Tasks:**
- [ ] Read Conformal Prediction tutorial
- [ ] Run conformal prediction code
- [ ] Check coverage guarantees
- [ ] Understand distribution-free uncertainty
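
Before running the class below, it may help to trace split conformal prediction by hand on a tiny example. This sketch uses made-up calibration scores and test probabilities, and it takes the exact k-th smallest score rather than an interpolated quantile, so its threshold can differ slightly from what `torch.quantile` returns. The helper names `conformal_quantile` and `prediction_set` are illustrative only.

```python
import math

def conformal_quantile(scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha)).
    If k exceeds n, no finite threshold gives the guarantee, so return inf
    (every class then enters the prediction set)."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]

def prediction_set(probs, qhat):
    """All classes whose nonconformity score 1 - P(y) is at most qhat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

# Hypothetical nonconformity scores 1 - P(y_true) from 9 calibration nodes
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]
qhat = conformal_quantile(cal_scores, alpha=0.25)   # k = ceil(10 * 0.75) = 8

# Hypothetical softmax output for one test node
test_probs = [0.50, 0.42, 0.08]
pred_set = prediction_set(test_probs, qhat)

print("threshold qhat:", qhat)
print("prediction set:", pred_set)
```

Here the 8th smallest of the 9 scores becomes the threshold, so classes 0 and 1 enter the set and class 2 is excluded. A smaller `alpha` pushes the threshold up and makes the sets larger.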

In [None]:
class ConformalPredictor:
    """
    Conformal Prediction: distribution-free uncertainty.
    
    Key ideas:
    - Compute nonconformity scores on a calibration set
    - Build prediction sets at test time (with a coverage guarantee)
    """
    def __init__(self, alpha=0.1):
        """
        alpha: significance level (1 - alpha = target coverage)
        alpha=0.1 targets 90% coverage
        """
        self.alpha = alpha
        self.quantile = None
    
    def calibrate(self, model, data, cal_mask):
        """
        Compute nonconformity scores on the calibration set.
        
        Nonconformity score: 1 - P(y_true),
        i.e. the lower the probability of the true class, the higher the score.
        """
        model.eval()
        with torch.no_grad():
            logits = model(data.x, data.edge_index)
            probs = F.softmax(logits, dim=1)
            
            cal_probs = probs[cal_mask]
            cal_labels = data.y[cal_mask]
            
            # Nonconformity scores
            scores = 1 - cal_probs[torch.arange(len(cal_labels)), cal_labels]
            
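            # Finite-sample corrected (1 - alpha) quantile level,
            # capped at 1.0 because torch.quantile requires q in [0, 1]
            n = len(scores)
            q_level = min(float(np.ceil((n + 1) * (1 - self.alpha)) / n), 1.0)
            self.quantile = torch.quantile(scores, q_level)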
            
        print(f"Conformal quantile (alpha={self.alpha}): {self.quantile:.4f}")
        return self.quantile
    
    def predict(self, model, data, test_mask):
        """
        Build prediction sets.
        
        Returns:
            prediction_sets: boolean array (num_test_nodes, C); True marks the classes in each node's set
            set_sizes: size of each prediction set
        """
        if self.quantile is None:
            raise ValueError("Call calibrate() first!")
        
        model.eval()
        with torch.no_grad():
            logits = model(data.x, data.edge_index)
            probs = F.softmax(logits, dim=1)
            test_probs = probs[test_mask]
            
            # Prediction set: {y : 1 - P(y) <= quantile},
            # i.e. every class with P(y) >= 1 - quantile
            threshold = 1 - self.quantile
            prediction_sets = (test_probs >= threshold).cpu().numpy()
            set_sizes = prediction_sets.sum(axis=1)
            
        return prediction_sets, set_sizes
    
    def evaluate_coverage(self, model, data, test_mask):
        """
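        Measure coverage: the fraction of nodes whose true label is in the prediction set.
        In theory this is at least (1 - alpha) on average, assuming calibration and test data are exchangeable.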
        """
        prediction_sets, set_sizes = self.predict(model, data, test_mask)
        test_labels = data.y[test_mask].cpu().numpy()
        
        coverage = np.mean([prediction_sets[i, test_labels[i]] 
                           for i in range(len(test_labels))])
        avg_set_size = np.mean(set_sizes)
        
        print(f"Coverage: {coverage:.4f} (target: {1-self.alpha:.4f})")
        print(f"Average prediction set size: {avg_set_size:.2f}")
        
        return coverage, avg_set_size

### Run Conformal Prediction

In [None]:
print("\n" + "="*60)
print("Conformal Prediction: Computing Prediction Sets")
print("="*60)

# Initialize conformal predictor (90% coverage)
conformal = ConformalPredictor(alpha=0.1)

# Calibrate using validation set
conformal.calibrate(mc_model, data, data.val_mask)

# Evaluate on test set
coverage, avg_set_size = conformal.evaluate_coverage(mc_model, data, data.test_mask)

# Try different alpha values
print("\n" + "="*60)
print("Testing different alpha values (coverage levels)")
print("="*60)

for alpha in [0.05, 0.1, 0.2]:
    print(f"\nalpha = {alpha} (target coverage: {1-alpha:.2f})")
    cp = ConformalPredictor(alpha=alpha)
    cp.calibrate(mc_model, data, data.val_mask)
    cov, size = cp.evaluate_coverage(mc_model, data, data.test_mask)
    print(f"  Actual coverage: {cov:.4f}")
    print(f"  Avg set size: {size:.2f}")

---
# Final: Compare All Methods

**Tasks:**
- [ ] Run the complete pipeline
- [ ] Compare accuracy, ECE, NLL, Brier Score
- [ ] Generate all visualizations
- [ ] Write summary of when to use each method
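
To make the NLL and Brier score columns easier to read, here is a small hand-checkable sketch on made-up probabilities. The helpers `nll` and `brier` mirror the formulas used in the next cell but are written in plain Python, and all numbers are hypothetical.

```python
import math

def nll(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def brier(probs, labels):
    """Mean squared distance between the probability vector and the one-hot label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pc - (1.0 if c == y else 0.0)) ** 2 for c, pc in enumerate(p))
    return total / len(labels)

# Two reasonably confident, correct predictions (hypothetical)
good_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
good_labels = [0, 1]

# One confidently wrong prediction (hypothetical)
bad_probs = [[0.01, 0.98, 0.01]]
bad_labels = [0]

print(f"good: NLL={nll(good_probs, good_labels):.4f}, Brier={brier(good_probs, good_labels):.4f}")
print(f"bad:  NLL={nll(bad_probs, bad_labels):.4f}, Brier={brier(bad_probs, bad_labels):.4f}")
```

The Brier score computed this way is bounded by 2, whereas NLL grows without bound as the probability on the true class approaches zero. That is why a handful of confidently wrong nodes can dominate the NLL column.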

In [None]:
def compute_nll(probs, labels):
    """
    Negative Log-Likelihood.
    
    Measures the quality of probabilistic predictions.
    Lower NLL means better probability estimates.
    """
    labels = labels.cpu()
    probs = probs.cpu()
    nll = -torch.log(probs[torch.arange(len(labels)), labels] + 1e-10).mean()
    return nll.item()


def compute_brier_score(probs, labels, num_classes):
    """
    Brier Score: accuracy of probabilistic predictions.
    
    Lower is better (0 is perfect).
    """
    labels_onehot = F.one_hot(labels, num_classes=num_classes).float()
    brier = ((probs - labels_onehot) ** 2).sum(dim=1).mean()
    return brier.item()

In [None]:
print("\n" + "="*70)
print("FINAL COMPARISON: All UQ Methods on Test Set")
print("="*70)

# MC Dropout predictions
mc_probs, mc_epistemic, mc_entropy = mc_dropout_prediction(mc_model, data, n_samples=50)

# Ensemble predictions (using n_models=5)
ensemble_5 = GCN_Ensemble(
    in_channels=dataset.num_features,
    hidden_channels=16,
    out_channels=dataset.num_classes,
    n_models=5
)
print("\nTraining Ensemble (5 models) for final comparison...")
ensemble_5.train_ensemble(data, epochs=200)
ens_probs, ens_epistemic, ens_entropy = ensemble_5.predict(data)

# Comparison
results = {
    'MC Dropout': {
        'probs': mc_probs,
        'epistemic': mc_epistemic,
        'entropy': mc_entropy
    },
    'Deep Ensemble': {
        'probs': ens_probs,
        'epistemic': ens_epistemic,
        'entropy': ens_entropy
    }
}

print("\n" + "="*70)
for method_name, result in results.items():
    probs = result['probs'][data.test_mask]
    labels = data.y[data.test_mask]
    
    # Accuracy
    acc = (probs.argmax(dim=1) == labels).float().mean()
    
    # UQ Metrics
    ece = compute_ece(probs, labels)
    nll = compute_nll(probs, labels)
    brier = compute_brier_score(probs, labels, dataset.num_classes)
    
    print(f"\n{method_name}:")
    print(f"  Accuracy:       {acc:.4f}")
    print(f"  ECE:            {ece:.4f}")
    print(f"  NLL:            {nll:.4f}")
    print(f"  Brier Score:    {brier:.4f}")
    print(f"  Avg Epistemic:  {result['epistemic'][data.test_mask].mean():.4f}")
    print(f"  Avg Entropy:    {result['entropy'][data.test_mask].mean():.4f}")

print("\n" + "="*70)
print("\n✅ Week 2 Complete! You've implemented all 4 UQ methods!")

---
# Visualizations

In [None]:
def plot_reliability_diagram(probs, labels, n_bins=10, title="Reliability Diagram"):
    """
    Calibration plot: confidence vs accuracy.
    The closer to the diagonal, the better calibrated.
    """
    confidences = probs.max(dim=1)[0].cpu().numpy()
    predictions = probs.argmax(dim=1).cpu().numpy()
    labels = labels.cpu().numpy()
    
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    bin_confidences = []
    bin_accuracies = []
    bin_counts = []
    
    for i in range(n_bins):
        # (lower, upper] bins so that confidence == 1.0 is counted
        in_bin = (confidences > bin_boundaries[i]) & (confidences <= bin_boundaries[i+1])
        if in_bin.sum() > 0:
            bin_confidences.append(confidences[in_bin].mean())
            bin_accuracies.append((predictions[in_bin] == labels[in_bin]).mean())
            bin_counts.append(in_bin.sum())
    
    plt.figure(figsize=(8, 8))
    plt.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
    plt.bar(bin_confidences, bin_accuracies, width=1/n_bins, alpha=0.7, 
            edgecolor='black', label='Model')
    plt.xlabel('Confidence', fontsize=14)
    plt.ylabel('Accuracy', fontsize=14)
    plt.title(title, fontsize=16)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()


def plot_uncertainty_vs_error(uncertainty, is_correct, title="Uncertainty vs Error"):
    """
    Are high-uncertainty samples more likely to be misclassified?
    """
    uncertainty = uncertainty.cpu().numpy()
    is_correct = is_correct.cpu().numpy()
    
    # Bin by uncertainty
    n_bins = 10
    bins = np.percentile(uncertainty, np.linspace(0, 100, n_bins + 1))
    bin_error_rates = []
    bin_centers = []
    
    for i in range(n_bins):
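        # Close the last bin on the right so the most uncertain sample is not dropped
        upper = (uncertainty <= bins[i+1]) if i == n_bins - 1 else (uncertainty < bins[i+1])
        in_bin = (uncertainty >= bins[i]) & upper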
        if in_bin.sum() > 0:
            error_rate = 1 - is_correct[in_bin].mean()
            bin_error_rates.append(error_rate)
            bin_centers.append((bins[i] + bins[i+1]) / 2)
    
    plt.figure(figsize=(10, 6))
    plt.plot(bin_centers, bin_error_rates, 'o-', linewidth=2, markersize=8)
    plt.xlabel('Uncertainty (binned)', fontsize=14)
    plt.ylabel('Error Rate', fontsize=14)
    plt.title(title, fontsize=16)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

In [None]:
# Plot reliability diagram for MC Dropout
plot_reliability_diagram(mc_probs[data.test_mask], data.y[data.test_mask], 
                         title="MC Dropout - Reliability Diagram")

# Plot uncertainty vs error
is_correct = (mc_probs.argmax(dim=1) == data.y)[data.test_mask]
plot_uncertainty_vs_error(mc_epistemic[data.test_mask], is_correct,
                         title="MC Dropout - Uncertainty vs Error")

---
# Week 2 Summary

## When to use each UQ method?

### 1. MC Dropout
**Pros:**
- Easy to implement (just turn on dropout at test time)
- Single model training
- Inference cost is tunable via n_samples (each sample is one extra forward pass)

**Cons:**
- Theoretical justification relies on a variational approximation
- Uncertainty quality depends on dropout rate

**Use when:** You need quick uncertainty estimates with minimal changes to existing models

### 2. Deep Ensembles
**Pros:**
- Strong empirical performance
- Diversity from different initializations
- Often best uncertainty estimates

**Cons:**
- Expensive (train multiple models)
- More memory at inference

**Use when:** You need high-quality uncertainty and have computational budget

### 3. Temperature Scaling
**Pros:**
- Simple post-processing
- Often improves calibration substantially at negligible cost
- Single parameter to tune

**Cons:**
- Doesn't capture epistemic uncertainty
- Only improves calibration, not accuracy

**Use when:** You need well-calibrated probabilities for decision-making

### 4. Conformal Prediction
**Pros:**
- Distribution-free marginal coverage guarantees (assuming exchangeable data)
- Works with any base model
- Mathematically rigorous

**Cons:**
- Produces sets, not single predictions
- Requires holdout calibration set

**Use when:** You need guaranteed coverage (e.g., safety-critical applications)

---

## Key Concepts Learned

**Epistemic vs Aleatoric Uncertainty:**
- **Epistemic**: Model uncertainty (can be reduced with more data)
- **Aleatoric**: Data uncertainty (irreducible)
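
One common way to separate the two, sketched below under stated assumptions, is the entropy decomposition: total predictive entropy equals the average entropy of the individual members (an aleatoric proxy) plus the mutual information between prediction and model (an epistemic proxy). This is an alternative to the variance-based proxy used in this notebook, and the member probabilities here are invented for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose(member_probs):
    """Return (total, aleatoric, epistemic) for a list of per-member
    class-probability vectors for one input."""
    n = len(member_probs)
    num_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n for c in range(num_classes)]
    total = entropy(mean)                                   # entropy of the mean prediction
    aleatoric = sum(entropy(m) for m in member_probs) / n   # mean of member entropies
    return total, aleatoric, total - aleatoric              # difference = mutual information

# Members agree that the input is ambiguous: mostly aleatoric
agree = [[0.5, 0.5], [0.5, 0.5]]
# Members are each confident but contradict each other: mostly epistemic
disagree = [[0.99, 0.01], [0.01, 0.99]]

for name, members in (("agree", agree), ("disagree", disagree)):
    total, aleatoric, epistemic = decompose(members)
    print(f"{name}: total={total:.3f}, aleatoric={aleatoric:.3f}, epistemic={epistemic:.3f}")
```

Both cases have nearly the same total entropy, so predictive entropy alone cannot tell them apart. Only the split shows whether the uncertainty comes from the data or from model disagreement.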

**Calibration:**
- A model is well-calibrated if predicted probabilities match true frequencies
- E.g., among all predictions with 80% confidence, 80% should be correct
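
The 80% example can be checked numerically. The following sketch computes a binned ECE in plain Python on two made-up sets of predictions, one calibrated and one overconfident. The function name `expected_calibration_error` and the choice of 5 bins are illustrative assumptions, not the notebook's `compute_ece`.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |accuracy - confidence| over (lower, upper] confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lower, upper = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lower < c <= upper]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        confidence = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(accuracy - confidence)
    return ece

# Calibrated: 10 predictions at 80% confidence, 8 of them correct
calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)

# Overconfident: 10 predictions at 95% confidence, only 6 correct
overconfident = expected_calibration_error([0.95] * 10, [1] * 6 + [0] * 4)

print(f"calibrated ECE:    {calibrated:.4f}")
print(f"overconfident ECE: {overconfident:.4f}")
```

The calibrated case gives an ECE of essentially zero, while the overconfident case gives the gap between 95% stated confidence and 60% observed accuracy.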

**Metrics:**
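- **ECE**: Weighted average gap between confidence and accuracy across confidence bins (calibration quality)
- **NLL**: Average negative log-probability assigned to the true class (quality of the predicted probabilities)
- **Brier Score**: Mean squared error between predicted probabilities and one-hot labels (overall prediction quality)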

---

✅ **Week 2 Complete!** Ready for Week 3: GNN + UQ integration