# Lab 9: Capstone — End-to-End Secure ML Pipeline

## Learning Objectives

By the end of this capstone, you will be able to:

1. **Design** an end-to-end secure ML system (threat model → defenses)
2. **Implement** multi-layer defenses across data, training, and inference
3. **Evaluate** attacks: evasion, poisoning, membership inference, resource exhaustion
4. **Measure** privacy, robustness, and utility trade-offs
5. **Operationalize** security with monitoring, rate limits, and alerts

---

## Capstone Scenario

You are the ML security lead for a healthcare imaging startup. The company deploys a
cloud-hosted classifier for medical triage. Your system must withstand:

- **Evasion attacks:** Adversarial perturbations to fool predictions
- **Data poisoning:** Malicious training samples injected into the pipeline
- **Membership inference:** Adversaries trying to detect patient data inclusion
- **Sponge attacks:** Resource exhaustion via crafted inputs

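Your job: build a secure pipeline and quantify its robustness, privacy, and utility. To keep runtimes short, the lab uses MNIST digits as a stand-in for medical images.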

---

## Table of Contents
1. [Threat Model](#threat-model)
2. [Secure Data Pipeline](#data)
3. [Robust Training (Adversarial + DP)](#training)
4. [Secure Inference (Validation + Rate Limiting)](#inference)
5. [Attack Suite](#attacks)
6. [Evaluation Dashboard](#evaluation)
7. [Exercises](#exercises)

---

## Threat Model <a id="threat-model"></a>

| Layer | Threat | Goal | Defense |
|------|--------|------|---------|
| Data | Poisoning | Corrupt model | Outlier detection, sanitization |
| Training | Membership inference | Privacy leakage | DP-SGD |
| Inference | Evasion | Misclassification | Adversarial training |
| Inference | Sponge attack | DoS/resource exhaustion | Validation + rate limiting |
| Ops | Model theft | IP exposure | Logging + anomaly detection |

---

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Subset
import torchvision.transforms as transforms
from torchvision.datasets import MNIST

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dataclasses import dataclass
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

np.random.seed(42)
torch.manual_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

# Data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = MNIST(root='./data', train=False, download=True, transform=transform)

# Use subset for faster execution
train_indices = np.random.choice(len(train_dataset), 8000, replace=False)
test_indices = np.random.choice(len(test_dataset), 2000, replace=False)

train_data = Subset(train_dataset, train_indices)
test_data = Subset(test_dataset, test_indices)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

print(f"Train: {len(train_data)}, Test: {len(test_data)}")
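The `Normalize((0.1307,), (0.3081,))` transform maps each raw pixel in [0, 1] to (x - 0.1307) / 0.3081. Any clamp, validation bound, or attack budget ε used later lives in these normalized units. The sketch below converts between the two scales. The helper names and constants are illustrative. Any box wider than the result, such as ±3, admits pixel values that no real image can produce.

```python
# Normalization constants taken from the transform above
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081

def normalized_bounds(mean=MNIST_MEAN, std=MNIST_STD):
    """Where raw pixel values 0.0 and 1.0 land after (x - mean) / std."""
    return (0.0 - mean) / std, (1.0 - mean) / std

def eps_to_pixel_space(eps_normalized, std=MNIST_STD):
    """An L-inf step of eps in normalized units equals eps * std in raw [0, 1] pixel units."""
    return eps_normalized * std

lo, hi = normalized_bounds()
print(f"valid normalized range: [{lo:.4f}, {hi:.4f}]")
print(f"eps=0.1 (normalized) = {eps_to_pixel_space(0.1):.4f} in [0, 1] pixel units")
```

The range works out to about [-0.4242, 2.8215]. An attack budget of 0.1 in normalized units is therefore only about 0.031 in raw pixel units, which is a fairly weak attack by MNIST standards.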

In [None]:
# ============================================================================
# Model Definition
# ============================================================================

class SmallCNN(nn.Module):
    def __init__(self):
        super(SmallCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

def evaluate(model, loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.size(0)
    return 100.0 * correct / total

print('Model ready.')
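`fc1` expects `32 * 7 * 7` inputs because each 3×3 convolution with `padding=1` preserves the spatial size and each 2×2 pooling halves it (28 → 14 → 7). The parameter count also matters later. DP-SGD adds noise to every coordinate, so the noise vector's norm grows roughly with the square root of the number of parameters. The hypothetical helpers below retrace that arithmetic in plain Python.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling."""
    return (size - kernel) // stride + 1

size = 28
size = pool_out(conv_out(size))   # conv1 keeps 28, pool -> 14
size = pool_out(conv_out(size))   # conv2 keeps 14, pool -> 7
flat = 32 * size * size           # 32 channels leave conv2

# weights + biases per layer, mirroring SmallCNN
params = {
    "conv1": 16 * 1 * 3 * 3 + 16,
    "conv2": 32 * 16 * 3 * 3 + 32,
    "fc1": flat * 128 + 128,
    "fc2": 128 * 10 + 10,
}
total = sum(params.values())
print(size, flat, params, total)
```

In PyTorch the same total can be checked with `sum(p.numel() for p in SmallCNN().parameters())`. Almost all of it sits in `fc1`.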

In [None]:
# ============================================================================
# Secure Data Pipeline: Simple Poisoning + Detection
# ============================================================================

print('\n' + '='*70)
print('Secure Data Pipeline: Label-Flipping + Pixel-Based Detector (Mismatch)')
print('='*70)

def poison_labels(dataset, poison_rate=0.1, target_label=0):
    """Targeted label flipping: relabel a random fraction of samples as target_label.

    Selected samples whose true label is already target_label are left unchanged.
    """
    indices = np.random.choice(len(dataset), int(poison_rate * len(dataset)), replace=False)
    index_set = set(indices.tolist())   # O(1) membership test instead of scanning the array
    poisoned = []
    for i in range(len(dataset)):
        x, y = dataset[i]
        poisoned.append((x, target_label if i in index_set else y))
    return poisoned, indices

# Create poisoned dataset
poisoned_data, poisoned_indices = poison_labels(train_data, poison_rate=0.1, target_label=0)

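# NOTE: Label flipping does not change pixel values, so a pixel-based outlier
# detector (IsolationForest) has no signal that separates poisoned samples.
X_flat = torch.stack([x.view(-1) for x, _ in poisoned_data]).numpy()
clf = IsolationForest(contamination=0.1, random_state=42)
outlier_preds = clf.fit_predict(X_flat)

flagged = outlier_preds == -1
is_poisoned = np.zeros(len(poisoned_data), dtype=bool)
is_poisoned[poisoned_indices] = True
# contamination=0.1 makes the forest flag about 10% of samples by construction;
# the useful number is how many flagged samples are actually poisoned.
precision = (flagged & is_poisoned).sum() / max(flagged.sum(), 1)
print(f'Poisoned samples (label flips): {len(poisoned_indices)}')
print(f'Pixel-based outlier rate: {100*flagged.mean():.1f}% (set by contamination=0.1)')
print(f'Fraction of flagged samples that are poisoned: {100*precision:.1f}% (expect about the 10% base rate)')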

# Keep all samples; no reliable pixel-based filtering for label-flip attacks
sanitized_data = poisoned_data
print(f'Sanitized dataset size: {len(sanitized_data)} (no pixel-based filtering)')
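Label flipping only changes labels, so a detector has to compare each label with the labels of nearby inputs. The sketch below swaps in a different technique from the one this lab uses: a k-nearest-neighbour label-consistency check. It flags a sample when most of its neighbours carry a different label. It runs on synthetic 2-D clusters rather than MNIST, and `knn_label_disagreement` is a hypothetical helper. On the real 8000-image subset, a full 8000×8000 float64 distance matrix needs about 512 MB. Chunk the computation or use `sklearn.neighbors.NearestNeighbors` instead. Raw pixels are also a weak feature space, and embeddings from a model trained on trusted data often separate classes better.

```python
import numpy as np

def knn_label_disagreement(X, y, k=10):
    """For each sample, the fraction of its k nearest neighbours (itself excluded)
    whose label differs from its own. Uses brute-force squared Euclidean distances."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                   # a sample is not its own neighbour
    idx = np.argpartition(d2, k, axis=1)[:, :k]    # indices of the k closest samples
    return (y[idx] != y[:, None]).mean(axis=1)

# Synthetic stand-in for image features: three well-separated 2-D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y_true = np.repeat(np.arange(3), 300)
X = centers[y_true] + rng.normal(size=(900, 2))

# Targeted flip toward class 0, mirroring poison_labels(target_label=0)
y_obs = y_true.copy()
y_obs[rng.choice(900, size=90, replace=False)] = 0
flipped = y_obs != y_true          # picks that were already class 0 are not real flips

scores = knn_label_disagreement(X, y_obs, k=10)
flagged = scores > 0.5             # majority of neighbours disagree
recall = (flagged & flipped).sum() / flipped.sum()
precision = (flagged & flipped).sum() / max(flagged.sum(), 1)
print(f"flipped={flipped.sum()} flagged={flagged.sum()} recall={recall:.2f} precision={precision:.2f}")
```

The 0.5 threshold and k=10 are illustrative choices. On MNIST, tune them against a held-out set with known flips, as in Exercise 2.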

In [None]:
# ============================================================================
# Robust Training: Adversarial + DP-SGD (Simplified)
# ============================================================================

print('\n' + '='*70)
print('Robust Training: Adversarial + DP-SGD')
print('='*70)

@dataclass
class DefenseConfig:
    epsilon_adv: float = 0.1
    clip_norm: float = 1.0
    noise_multiplier: float = 1.0

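# Valid range of a normalized MNIST pixel: raw 0 and 1 after Normalize((0.1307,), (0.3081,))
PIXEL_MIN = (0.0 - 0.1307) / 0.3081
PIXEL_MAX = (1.0 - 0.1307) / 0.3081

def fgsm_attack(model, data, target, epsilon=0.1):
    """One-step L-inf attack (FGSM); epsilon is measured in normalized units."""
    data = data.clone().detach().requires_grad_(True)
    output = model(data)
    loss = nn.CrossEntropyLoss()(output, target)
    model.zero_grad()
    loss.backward()
    perturbed = data + epsilon * data.grad.sign()
    # Keep pixels inside the range a real normalized image can take, and detach
    # so later training steps do not backpropagate through the attack graph.
    return torch.clamp(perturbed, PIXEL_MIN, PIXEL_MAX).detach()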

def dp_sgd_step(model, data, target, config: DefenseConfig):
    """Write a DP-SGD gradient into param.grad; the caller's optimizer then takes the step.

    Simplified: no privacy accounting is done, so the resulting (epsilon, delta) is not reported.
    """
    model.train()
    criterion = nn.CrossEntropyLoss()
    data, target = data.to(device), target.to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    batch_size = data.size(0)

    # Per-sample gradients (explicit loop: slow, but easy to follow)
    summed = [torch.zeros_like(p) for p in params]
    for i in range(batch_size):
        model.zero_grad()
        loss = criterion(model(data[i:i+1]), target[i:i+1])
        loss.backward()
        grads = [p.grad.detach() for p in params]
        # Clip the WHOLE per-sample gradient (all parameters together) to L2 norm <= clip_norm
        total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        factor = torch.clamp(config.clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * factor)

    # Gaussian noise with std = noise_multiplier * clip_norm on the SUM, then average
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * (config.noise_multiplier * config.clip_norm)
        p.grad = (s + noise) / batch_size

def train_secure(model, loader, config: DefenseConfig, epochs=3):
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for epoch in range(epochs):
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            adv = fgsm_attack(model, data, target, epsilon=config.epsilon_adv)
            optimizer.zero_grad()
            dp_sgd_step(model, adv, target, config)
            optimizer.step()
        print(f'Epoch {epoch+1}/{epochs}: Secure training complete')

secure_model = SmallCNN().to(device)
secure_loader = DataLoader(sanitized_data, batch_size=64, shuffle=True)
config = DefenseConfig()
train_secure(secure_model, secure_loader, config, epochs=3)

secure_train_acc = evaluate(secure_model, train_loader)
secure_test_acc = evaluate(secure_model, test_loader)
print(f'\nSecure Model Accuracy: Train={secure_train_acc:.2f}%, Test={secure_test_acc:.2f}%')
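The privacy argument behind DP-SGD needs every sample's influence to be bounded by one constant, `clip_norm`. A per-tensor variant clips each parameter tensor separately. That allows a sample's total gradient norm to reach `clip_norm × sqrt(number of tensors)`, which is about 2.83 × `clip_norm` for SmallCNN's 8 tensors. Noise calibrated to `clip_norm` then understates the true sensitivity. The NumPy sketch below isolates the standard aggregation step on flattened per-sample gradients. `dp_aggregate` is a hypothetical helper and the gradients are synthetic. It does not track the privacy budget. That requires a privacy accountant, such as the one shipped with Opacus.

```python
import numpy as np

def dp_aggregate(per_sample_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step on flattened per-sample gradients.

    per_sample_grads: shape (batch, d); row i is sample i's gradient over ALL parameters.
    Returns (noisy_mean_gradient, clipped_per_sample_grads).
    """
    batch, dim = per_sample_grads.shape
    norms = np.linalg.norm(per_sample_grads, axis=1)
    # Scale each row so its L2 norm is at most clip_norm (rows already inside are untouched)
    factors = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_sample_grads * factors[:, None]
    # Gaussian noise calibrated to the per-sample sensitivity clip_norm, added to the SUM
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped.sum(axis=0) + noise) / batch, clipped

rng = np.random.default_rng(0)
grads = rng.normal(scale=5.0, size=(64, 1000))        # synthetic per-sample gradients
noisy_mean, clipped = dp_aggregate(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(np.linalg.norm(clipped, axis=1).max())          # at most 1.0, up to rounding
```

Adding noise to the sum and then dividing by the batch size is equivalent to adding noise with standard deviation `noise_multiplier * clip_norm / batch` to the mean.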

In [None]:
# ============================================================================
# Secure Inference: Input Validation + Rate Limiting
# ============================================================================

print('\n' + '='*70)
print('Secure Inference: Input Validation + Rate Limiting')
print('='*70)

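@dataclass
class InputPolicy:
    expected_shape: tuple = (1, 28, 28)   # (channels, height, width) per image
    max_abs_value: float = 3.0            # normalized MNIST pixels lie in about [-0.42, 2.82]
    max_variance: float = 2.0

def validate_input(x: torch.Tensor, policy: InputPolicy) -> bool:
    # Reject wrong shapes and NaN/Inf first: comparisons with NaN are always False,
    # so a NaN input would otherwise pass both range checks below.
    if tuple(x.shape[-3:]) != policy.expected_shape:
        return False
    if not torch.isfinite(x).all():
        return False
    if x.abs().max().item() > policy.max_abs_value:
        return False
    if x.var().item() > policy.max_variance:
        return False
    return True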

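import time
from dataclasses import field

@dataclass
class RateLimiter:
    """Token bucket: refills at max_qps tokens per second and holds at most `burst` tokens."""
    max_qps: float = 100.0
    burst: int = 10
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self):
        self.tokens = float(self.burst)   # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket size
        self.tokens = min(float(self.burst), self.tokens + (now - self.last) * self.max_qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

policy = InputPolicy()
limiter = RateLimiter()

# Simulate 20 back-to-back requests: roughly the first `burst` succeed,
# because almost no time passes for the bucket to refill
allowed = 0
for _ in range(20):
    if limiter.allow():
        allowed += 1

print(f'Requests allowed back-to-back (burst={limiter.burst}): {allowed}/20')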
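Validation and rate limiting only protect the model if every request passes through both before inference. The sketch below wires them into one hypothetical serving wrapper, `GuardedPredictor`, with its own `TokenBucket`. Both names are illustrative and not part of any library. Two design assumptions: buckets are kept per client, so one noisy client cannot exhaust a shared budget, and the clock is passed in explicitly as `now`, so the behaviour is deterministic and testable. NumPy arrays and toy lambdas stand in for tensors and the CNN.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class TokenBucket:
    """Token bucket with an explicit clock argument."""
    rate: float                      # tokens refilled per second
    burst: float                     # bucket capacity
    tokens: Optional[float] = None   # None until the first request (then starts full)
    last: Optional[float] = None     # time of the previous request

    def allow(self, now: float) -> bool:
        if self.tokens is None:
            self.tokens, self.last = float(self.burst), now
        self.tokens = min(float(self.burst), self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class GuardedPredictor:
    """Hypothetical wrapper: per-client rate limit, then input validation, then the model."""

    def __init__(self, model_fn: Callable, validate_fn: Callable, rate=100.0, burst=10):
        self.model_fn = model_fn
        self.validate_fn = validate_fn
        self.rate, self.burst = rate, burst
        self.buckets: Dict[str, TokenBucket] = {}
        self.rejections = {"rate_limited": 0, "invalid_input": 0}   # counters to export as metrics

    def predict(self, client_id: str, x, now: float):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.burst)
        # Cheap check first, so a flood of requests is rejected before any validation work
        if not bucket.allow(now):
            self.rejections["rate_limited"] += 1
            return None, "rate_limited"
        if not self.validate_fn(x):
            self.rejections["invalid_input"] += 1
            return None, "invalid_input"
        return self.model_fn(x), "ok"

# Toy stand-ins for the model and the validator
gp = GuardedPredictor(
    model_fn=lambda x: int(x.sum() > 0),
    validate_fn=lambda x: bool(np.isfinite(x).all() and x.var() <= 2.0),
)
statuses = [gp.predict("client-a", np.zeros((28, 28)), now=0.0)[1] for _ in range(20)]
print(statuses.count("ok"), gp.rejections)
```

The rejection counters are the natural hook for the monitoring and alerting named in the learning objectives. A sudden rise in `invalid_input` from one client is a cheap signal worth alerting on.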

In [None]:
# ============================================================================
# Attack Suite
# ============================================================================

print('\n' + '='*70)
print('Attack Suite')
print('='*70)

def get_confidences(model, loader):
    model.eval()
    confs = []
    with torch.no_grad():
        for data, _ in loader:
            data = data.to(device)
            out = model(data)
            probs = torch.softmax(out, dim=1)
            confs.extend(probs.max(dim=1)[0].cpu().numpy())
    return np.array(confs)

# 1) Evasion (FGSM)
def eval_fgsm(model, loader, epsilon=0.1):
    model.eval()
    correct = 0
    total = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        adv = fgsm_attack(model, data, target, epsilon=epsilon)
        out = model(adv)
        pred = out.argmax(dim=1)
        correct += (pred == target).sum().item()
        total += target.size(0)
    return 100.0 * correct / total

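# 2) Membership inference (confidence-based): score = max softmax probability.
# Members = the 8000 training images, non-members = the 2000 test images.
# AUC = 0.5 means no measurable leakage; higher means members are distinguishable.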
def membership_auc(model):
    train_conf = get_confidences(model, train_loader)
    test_conf = get_confidences(model, test_loader)
    labels = np.concatenate([np.ones(len(train_conf)), np.zeros(len(test_conf))])
    scores = np.concatenate([train_conf, test_conf])
    return roc_auc_score(labels, scores)

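# 3) Sponge stand-in: a high-variance, out-of-distribution input.
# SmallCNN does a fixed amount of dense compute per image, so this input does not
# itself raise cost; it only tests that the validator rejects abnormal inputs.
def sponge_input(scale=5.0):
    return torch.randn(1, 1, 28, 28) * scale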

# Evaluate secure model
clean_acc = evaluate(secure_model, test_loader)
fgsm_acc = eval_fgsm(secure_model, test_loader, epsilon=0.1)
mia_auc = membership_auc(secure_model)

# Sponge rejection
sponge = sponge_input(scale=5.0)
sponge_allowed = validate_input(sponge, policy)

print(f'Clean accuracy: {clean_acc:.2f}%')
print(f'FGSM accuracy: {fgsm_acc:.2f}%')
print(f'Membership AUC: {mia_auc:.4f}')
print(f'Sponge input allowed? {sponge_allowed}')
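The confidence-based attack above ignores the true label. A common and often stronger baseline scores each sample by its loss on the true label, since training samples tend to have lower loss. The sketch below shows that variant. It also computes AUC directly as the Mann-Whitney probability that a member outscores a non-member, with ties counted as one half, which gives the same value as `roc_auc_score`. The helper names are hypothetical and the probabilities are toy values. To apply it to the lab, collect softmax outputs and labels from `train_loader` and `test_loader`.

```python
import numpy as np

def per_sample_loss(probs, labels):
    """Cross-entropy of the true label for each row of an (n, classes) probability array."""
    p_true = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(p_true, 1e-12, 1.0))

def auc_from_scores(member_scores, nonmember_scores):
    """ROC AUC as P(member score > non-member score), counting ties as 1/2."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

def loss_mia_auc(train_probs, train_labels, test_probs, test_labels):
    # Lower loss suggests membership, so negate it to get a "higher = member" score
    return auc_from_scores(-per_sample_loss(train_probs, train_labels),
                           -per_sample_loss(test_probs, test_labels))

# Toy probabilities: confident on members, hesitant on non-members
y = np.array([0, 1])
members = np.array([[0.99, 0.01], [0.02, 0.98]])
non_members = np.array([[0.60, 0.40], [0.45, 0.55]])
print(loss_mia_auc(members, y, non_members, y))
```

The pairwise comparison builds an m×n matrix. That is about 16 million entries for the lab's 8000 and 2000 samples, which fits in memory. For larger sets, a rank-based formula or `roc_auc_score` scales better.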

In [None]:
# ============================================================================
# Evaluation Dashboard
# ============================================================================

clean_acc_frac = clean_acc / 100.0
fgsm_acc_frac = fgsm_acc / 100.0

results = pd.DataFrame({
    'Metric': ['Clean Acc (fraction)', 'FGSM Acc (fraction)', 'MIA AUC', 'Sponge Allowed'],
    'Value': [clean_acc_frac, fgsm_acc_frac, mia_auc, 1.0 if sponge_allowed else 0.0]
})

print(results.to_string(index=False))

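fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(results['Metric'], results['Value'], color=['#2ecc71', '#e67e22', '#3498db', '#e74c3c'])
# For MIA AUC, 0.5 is the ideal value: the attacker does no better than chance
ax.axhline(0.5, color='gray', linestyle='--', linewidth=1, label='MIA chance level (AUC = 0.5)')
ax.set_ylabel('Value (0–1)')
ax.set_title('Secure ML Pipeline: Key Metrics')
ax.set_ylim([0, 1.05])
ax.tick_params(axis='x', labelrotation=15)
ax.grid(axis='y', alpha=0.3)
ax.legend(loc='upper right')
plt.tight_layout()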
plt.savefig('capstone_dashboard.png', dpi=150, bbox_inches='tight')
plt.show()

---

## Exercises

### Exercise 1: Attack Scaling (Medium)
Increase attack strength (ε) and report FGSM accuracy drop.
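For a logistic-regression model, the FGSM input gradient has a closed form, (p - y)·w. That makes it a cheap way to see the shape of an ε sweep before running `eval_fgsm` on the CNN. The sketch below uses a synthetic linear model in the [0, 1] pixel box. `fgsm_linear` and `accuracy` are hypothetical helpers, and the data and weights are random.

```python
import numpy as np

def fgsm_linear(x, y, w, b, eps, lo=0.0, hi=1.0):
    """FGSM against p = sigmoid(w.x + b) with labels y in {0, 1}.

    The cross-entropy gradient w.r.t. x is (p - y) * w, so the step is eps * sign((p - y) * w).
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad = (p - y)[:, None] * w[None, :]
    return np.clip(x + eps * np.sign(grad), lo, hi)

def accuracy(x, y, w, b):
    return float((((x @ w + b) > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
d = 20
w = rng.normal(size=d)
b = -w.sum() / 2                        # decision boundary passes through the box centre
x = rng.uniform(size=(500, d))          # synthetic inputs in the [0, 1] pixel box
y = ((x @ w + b) > 0).astype(int)       # labels from the model itself, so clean accuracy is 1.0

sweep = {eps: accuracy(fgsm_linear(x, y, w, b, eps), y, w, b) for eps in [0.0, 0.05, 0.1, 0.3]}
for eps, acc in sweep.items():
    print(f"eps={eps:.2f}  accuracy={acc:.3f}")
```

For the lab model, the same sweep is a loop over `eval_fgsm(secure_model, test_loader, epsilon=eps)`. Remember that ε there is in normalized units.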

### Exercise 2: Poisoning Budget (Hard)
Vary poisoning rate (0%, 5%, 10%, 20%) and measure accuracy + MIA AUC.

### Exercise 3: Privacy Budget (Hard)
Sweep DP noise multiplier and measure MIA AUC vs test accuracy.

### Exercise 4: Sponge Defense Tuning (Medium)
Adjust the input variance threshold (`max_variance`) and measure false positives: clean test images that the validator rejects.
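A false positive here is a clean image rejected because its variance exceeds the threshold. One way to choose the threshold is to fix an acceptable false-positive rate and read the threshold off the sorted clean variances. The sketch below does that with hypothetical helpers. Synthetic gamma-distributed values stand in for real per-image variances. For the lab, replace them with `[x.var().item() for x, _ in test_data]`.

```python
import numpy as np

def false_positive_rate(clean_variances, threshold):
    """Fraction of clean inputs whose variance exceeds the threshold (wrongly rejected)."""
    v = np.asarray(clean_variances)
    return float((v > threshold).mean())

def threshold_for_fpr(clean_variances, target_fpr):
    """Smallest observed variance t such that at most target_fpr of clean inputs exceed t."""
    s = np.sort(np.asarray(clean_variances))
    k = len(s) - 1 - int(np.floor(target_fpr * len(s)))
    return float(s[max(k, 0)])

rng = np.random.default_rng(0)
clean_vars = rng.gamma(shape=20.0, scale=0.05, size=1000)   # synthetic stand-in, mean about 1.0
for target in [0.05, 0.01, 0.001]:
    t = threshold_for_fpr(clean_vars, target)
    print(f"target FPR {target:.3f}: threshold={t:.3f}, measured FPR={false_positive_rate(clean_vars, t):.3f}")
```

A lower threshold rejects more sponge-like inputs but also more clean images. Plot both rates against the threshold to see the trade-off.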

### Exercise 5: Defense Ablation (Hard)
Disable one defense at a time (DP, adversarial training, sanitization).
Quantify which defense contributes most to security.

### Exercise 6: Deployment Plan (Hard)
Write a security checklist for production deployment, including monitoring,
logging, and incident response.