# Certified Defenses Against Adversarial Examples

In [17]:
# Setup and Imports
import torch
import numpy as np
import random
import warnings
warnings.filterwarnings('ignore')

# Detect device (supports CUDA, Apple Silicon MPS, and CPU)
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"✓ Using CUDA GPU: {torch.cuda.get_device_name(0)}")
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = torch.device('mps')
    print("✓ Using Apple Silicon GPU (MPS)")
else:
    device = torch.device('cpu')
    print("ℹ Using CPU")

print(f"Device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

✓ Using Apple Silicon GPU (MPS)
Device: mps


In [18]:
# Install required packages for this notebook
import subprocess
import sys

def install_package(package):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

# Check scipy (should be in requirements.txt)
try:
    import scipy
    from scipy.stats import norm
    print("✓ scipy already installed")
except ImportError:
    print("Installing scipy...")
    install_package('scipy')
    print("✓ scipy installed")

print("✓ All packages ready!")

✓ scipy already installed
✓ All packages ready!


## Overview


While empirical defenses can be bypassed by adaptive attacks, certified defenses provide provable robustness guarantees. This document explores techniques that mathematically guarantee a model's prediction cannot change for any perturbation within a specified bound. Within a given threat model, this is the strongest form of adversarial robustness.

## Learning Objectives


- Understand certified vs empirical robustness
- Implement randomized smoothing
- Apply interval bound propagation
- Use Lipschitz constraints
- Evaluate certified accuracy
- Design certifiably robust models

## Why "Certified"?

**Certified** = Mathematical proof/certificate that guarantees robustness

Think of it like a certificate of authenticity:
- A certificate proves something is genuine
- A **certified defense** proves the model is robust within specific bounds

### Real-World Analogy

**Empirical Defense**: "I tested this bridge with 100 trucks and it didn't collapse"
- ❌ What about the 101st truck?
- ❌ What if someone designs a heavier truck?

**Certified Defense**: "I have engineering calculations proving this bridge can hold up to 50 tons"
- ✓ This is a **certificate** of safety
- ✓ It doesn't matter what truck you design: if it's under 50 tons, the bridge is guaranteed safe

### In Machine Learning

**Empirical**: "Robust against attacks we tested"
- Tested against FGSM, PGD, C&W
- No guarantee against unknown attacks

**Certified**: "Provably robust against ALL attacks in the threat model"
- Mathematical guarantee for a given input x: "ANY perturbation with L2 norm < 0.5 cannot change the prediction"
- Holds for all possible attacks within that norm ball, not just tested ones

---

## Why Certification Matters


### Empirical vs Certified Robustness

**Empirical Defense**:

In [19]:
# Example: Empirical Defense (Conceptual)
# This shows the LIMITATION of empirical defenses

print("Empirical Defense Approach:")
print("1. Train model with adversarial examples")
print("2. Test against known attacks (FGSM, PGD)")
print("3. Report accuracy on tested attacks")
print("\n⚠️  Problem: Might fail against unknown/adaptive attacks")
print("   Only robust against attacks we tested!")

Empirical Defense Approach:
1. Train model with adversarial examples
2. Test against known attacks (FGSM, PGD)
3. Report accuracy on tested attacks

⚠️  Problem: Might fail against unknown/adaptive attacks
   Only robust against attacks we tested!


**Certified Defense**:

In [20]:
# Example: Certified Defense (Conceptual)
# This shows the ADVANTAGE of certified defenses

print("Certified Defense Approach:")
print("1. Train model with certification objective")
print("2. Compute provable robustness radius")
print("3. Guarantee: No attack within epsilon can fool model")
print("\n✓ Advantage: Holds for ALL possible attacks")
print("  Not just the ones we tested!")
print("\nExample guarantee:")
epsilon = 0.5
print(f"  'For input x, any perturbation with L2 norm < {epsilon}'")
print(f"  'will not change the model prediction'")

Certified Defense Approach:
1. Train model with certification objective
2. Compute provable robustness radius
3. Guarantee: No attack within epsilon can fool model

✓ Advantage: Holds for ALL possible attacks
  Not just the ones we tested!

Example guarantee:
  'For input x, any perturbation with L2 norm < 0.5'
  'will not change the model prediction'


### Security Guarantees

```
Empirical: "Robust against attacks we tested"
Certified: "Provably robust against ALL attacks in threat model"
```
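
Formally, for an input $x$ with true label $y$, a classifier $f$ is certified at radius $\epsilon$ under the $\ell_p$ norm when

$$
\forall \delta \text{ with } \|\delta\|_p < \epsilon: \quad \arg\max_{c} f_c(x + \delta) = y
$$

A certified defense produces a proof of this statement for each input it certifies, while an empirical evaluation only checks the specific $\delta$ that each tested attack happens to find.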

## Randomized Smoothing


### Core Concept

**Idea**: Average predictions over random noise to create smooth, robust classifier
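
In the notation of Cohen et al. (2019), with base classifier $f$ and noise level $\sigma$, the smoothed classifier is

$$
g(x) = \arg\max_{c} \; \Pr_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + \eta) = c\big]
$$

If the top class $c_A$ has probability at least $\underline{p_A}$ and every other class at most $\overline{p_B}$, then $g(x + \delta) = c_A$ for all $\|\delta\|_2 < R$, where

$$
R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right)
$$

Taking $\overline{p_B} = 1 - \underline{p_A}$ simplifies this to $R = \sigma\,\Phi^{-1}(\underline{p_A})$, which is positive only when $\underline{p_A} > 1/2$. This is the radius formula used in `certify()`.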

In [21]:
# Note: This is an example class showing the structure.
# To run example_randomized_smoothing(), you need MNIST data and a trained model;
# load_mnist_classifier() and load_test_example() are placeholders.
# See Lab 5 for a complete working implementation.

import torch
import numpy as np
from scipy.stats import beta, norm

class RandomizedSmoothing:
    """
    Certified defense via randomized smoothing
    
    Reference: "Certified Adversarial Robustness via Randomized Smoothing"
               (Cohen et al., 2019)
    """
    def __init__(self, base_classifier, sigma, num_samples=100, num_classes=10):
        """
        Args:
            base_classifier: Base model to smooth (maps a batch of inputs to logits)
            sigma: Noise standard deviation
            num_samples: Number of noise samples for prediction
            num_classes: Number of output classes (10 for MNIST)
        """
        self.base_classifier = base_classifier
        self.sigma = sigma
        self.num_samples = num_samples
        self.num_classes = num_classes
    
    @torch.no_grad()
    def _sample_counts(self, x, n):
        """
        Count the base classifier's predictions on n noisy copies of x (shape [1, ...])
        """
        counts = np.zeros(self.num_classes, dtype=int)
        for _ in range(n):
            # Add Gaussian noise and classify
            x_noisy = x + torch.randn_like(x) * self.sigma
            predicted_class = self.base_classifier(x_noisy).argmax(dim=1).item()
            counts[predicted_class] += 1
        return counts
    
    def predict(self, x):
        """
        Smoothed prediction: majority vote over noisy samples
        (simplified: Cohen et al. also abstain when the vote is not statistically significant)
        """
        return int(self._sample_counts(x, self.num_samples).argmax())
    
    def certify(self, x, n0=100, n_samples=10000, alpha=0.001):
        """
        Compute certified L2 radius for input x
        
        Returns:
            predicted_class: Predicted class
            radius: Certified L2 radius (or None if certification fails)
        """
        # Step 1: guess the top class from a small, separate set of samples.
        # Reusing the same samples for selection and estimation would bias the bound.
        top_class = int(self._sample_counts(x, n0).argmax())
        
        # Step 2: lower-bound P(f(x + noise) = top_class) with fresh samples
        counts = self._sample_counts(x, n_samples)
        p_lower = self._lower_confidence_bound(counts[top_class], n_samples, alpha)
        
        # Step 3: certified radius sigma * Phi^{-1}(p_lower), valid only if p_lower > 1/2
        if p_lower > 0.5:
            radius = self.sigma * norm.ppf(p_lower)
            return top_class, radius
        else:
            return top_class, None  # Certification failed (abstain)
    
    def _lower_confidence_bound(self, count, n, alpha):
        """
        One-sided (1 - alpha) Clopper-Pearson lower bound on a binomial proportion
        (same value as statsmodels' proportion_confint(count, n, alpha=2*alpha, method='beta'))
        """
        if count == 0:
            return 0.0
        return beta.ppf(alpha, count, n - count + 1)

# Example usage
def example_randomized_smoothing():
    """
    Example: Certify MNIST classifier
    """
    # Load base classifier (placeholder helper)
    base_model = load_mnist_classifier()
    
    # Create smoothed classifier
    smoothed = RandomizedSmoothing(
        base_classifier=base_model,
        sigma=0.25,  # Noise level
        num_samples=100,
        num_classes=10
    )
    
    # Test on example (placeholder helper; x has shape [1, 1, 28, 28])
    x, y = load_test_example()
    
    # Get certified radius
    pred_class, radius = smoothed.certify(x, n_samples=10000)
    
    if radius is not None:
        print(f"Prediction: {pred_class}")
        print(f"Certified L2 radius: {radius:.4f}")
        print(f"Guarantee (with probability >= 1 - alpha): no L2 perturbation < {radius:.4f} "
              f"can change the smoothed prediction")
    else:
        print("Certification failed")
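
The statistical step of `certify()` can be checked on its own, without a model: given how many of `n` noisy samples voted for the top class, compute the one-sided Clopper-Pearson lower bound $\underline{p_A}$ and the radius $\sigma\,\Phi^{-1}(\underline{p_A})$. A minimal sketch using only `scipy.stats`; the helper name `radius_from_counts` is hypothetical and the counts are made-up inputs:

```python
from scipy.stats import beta, norm

def radius_from_counts(count_top, n, sigma, alpha=0.001):
    """
    Certified L2 radius from Monte Carlo vote counts (Cohen et al., 2019 style).

    count_top: how many of the n noisy samples were classified as the top class
    Returns the radius, or None when the lower bound does not exceed 1/2 (abstain).
    """
    # One-sided (1 - alpha) Clopper-Pearson lower bound on p_A
    p_lower = beta.ppf(alpha, count_top, n - count_top + 1) if count_top > 0 else 0.0
    if p_lower <= 0.5:
        return None
    return sigma * norm.ppf(p_lower)

# Same sigma as the MNIST example above, 10,000 noise samples
for count in (10000, 9900, 9000, 5000):
    r = radius_from_counts(count, 10000, sigma=0.25)
    print(count, None if r is None else round(float(r), 3))
```

Even a unanimous vote gives a finite radius, because the lower bound stays below 1 for finite `n`. More samples or a larger `sigma` raise the radius, but a larger `sigma` also makes the base classifier's task on noisy inputs harder.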

### Working Example: Simple Randomized Smoothing

Let's implement a simplified version that demonstrates the concept:

In [22]:
# Working Example: Simple Randomized Smoothing
import torch
import torch.nn as nn

print("Creating simple example...")

# Create a simple (untrained, randomly initialized) classifier
class SimpleClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 3)  # 10 features, 3 classes
    
    def forward(self, x):
        return self.fc(x)

model = SimpleClassifier().to(device)

# Simple randomized smoothing
def smooth_predict(model, x, sigma=0.1, num_samples=100):
    """Predict by majority vote over Gaussian-noised copies of x"""
    counts = torch.zeros(3)  # 3 classes (kept on the CPU)
    
    for _ in range(num_samples):
        # Add Gaussian noise
        noise = torch.randn_like(x) * sigma
        x_noisy = x + noise
        
        # Get prediction; .item() moves the class index to the CPU
        with torch.no_grad():
            pred = model(x_noisy).argmax().item()
        counts[pred] += 1
    
    return counts.argmax().item()

# Test it
x = torch.randn(1, 10).to(device)

# Regular prediction
with torch.no_grad():
    regular_pred = model(x).argmax().item()

# Smoothed prediction
smooth_pred = smooth_predict(model, x, sigma=0.1, num_samples=100)

print(f"\nRegular prediction: Class {regular_pred}")
print(f"Smoothed prediction: Class {smooth_pred}")

# Compare both predictors under one random perturbation.
# A single random draw is only a sanity check, not a certificate:
# the provable radius comes from RandomizedSmoothing.certify().
print("\nTesting robustness to a random perturbation:")
epsilon = 0.05
perturbation = torch.randn_like(x) * epsilon
x_perturbed = x + perturbation

with torch.no_grad():
    regular_pred_perturbed = model(x_perturbed).argmax().item()
smooth_pred_perturbed = smooth_predict(model, x_perturbed, sigma=0.1, num_samples=100)

print(f"  Regular: {regular_pred} → {regular_pred_perturbed} (changed: {regular_pred != regular_pred_perturbed})")
print(f"  Smoothed: {smooth_pred} → {smooth_pred_perturbed} (changed: {smooth_pred != smooth_pred_perturbed})")

Creating simple example...

Regular prediction: Class 0
Smoothed prediction: Class 0

Testing robustness to a random perturbation:
  Regular: 0 → 0 (changed: False)
  Smoothed: 0 → 0 (changed: False)


### Training for Randomized Smoothing

In [23]:
def train_for_smoothing(model, train_loader, sigma, epochs=100, lr=1e-3):
    """
    Train base classifier for randomized smoothing
    
    Key: Train with Gaussian noise augmentation at the same sigma used for
    certification, so the base classifier is accurate on the noisy inputs it will see
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    
    for epoch in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            
            # Add Gaussian noise during training
            noise = torch.randn_like(x) * sigma
            x_noisy = x + noise
            
            # Standard training step on the noisy batch
            optimizer.zero_grad()
            output = model(x_noisy)
            loss = criterion(output, y)
            loss.backward()
            optimizer.step()
    
    return model

### Certified Accuracy Evaluation

In [24]:
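def evaluate_certified_accuracy(smoothed_model, test_loader, epsilon):
    """
    Evaluate certified accuracy at L2 radius epsilon: the fraction of test points
    that are classified correctly AND certified with radius >= epsilon
    """
    certified_correct = 0
    total = 0
    
    for x_batch, y_batch in test_loader:
        # certify() handles one example at a time, so loop over the batch
        for x, y in zip(x_batch, y_batch):
            pred_class, radius = smoothed_model.certify(x.unsqueeze(0).to(device))
            
            # Count as certified correct if:
            # 1. Prediction is correct
            # 2. Certification succeeded with radius >= epsilon
            if pred_class == y.item() and radius is not None and radius >= epsilon:
                certified_correct += 1
            
            total += 1
    
    certified_accuracy = certified_correct / total
    return certified_accuracy

# Illustrative shape of a certified-accuracy curve (placeholder numbers, not measured here).
# At epsilon = 0 this is the smoothed classifier's accuracy with abstentions counted as errors.
# Epsilon | Certified Accuracy
# 0.0     | 95%
# 0.25    | 82%
# 0.5     | 68%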
# 1.0     | 45%
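
Because `certify()` returns a radius, each test point needs to be certified only once; the certified accuracy at every epsilon can then be read off the stored radii instead of re-running certification per epsilon. A minimal sketch of that sweep (the function name `certified_accuracy_curve` is ours, not from a library, and the toy inputs are made up):

```python
import numpy as np

def certified_accuracy_curve(predictions, labels, radii, epsilons):
    """
    Certified accuracy at every epsilon from a single certification pass.

    predictions, labels: class indices per test point
    radii: certified radius per test point, or None where certification abstained
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    # Treat abstentions as radius -inf so they never count as certified
    radii = np.array([-np.inf if r is None else r for r in radii], dtype=float)
    correct = predictions == labels
    return [float(np.mean(correct & (radii >= eps))) for eps in epsilons]

# Toy inputs: 4 test points, the last one abstained
preds  = [0, 1, 1, 2]
labels = [0, 1, 0, 2]
radii  = [0.6, 0.3, 0.9, None]
print(certified_accuracy_curve(preds, labels, radii, [0.0, 0.5, 1.0]))
# → [0.5, 0.25, 0.0]
```

The third point has a large radius but a wrong prediction, so it never counts: a certificate only helps when the certified class is the correct one.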

## Interval Bound Propagation (IBP)


### Core Concept

**Idea**: Propagate input bounds through network to certify output
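
A self-contained sketch of the interval arithmetic on a tiny, randomly initialized MLP (the layer sizes are arbitrary assumptions). It uses the center/radius form of the linear-layer rule, which gives the same bounds as splitting W into positive and negative parts, and then checks soundness empirically: every sampled point in the L∞ ball must produce logits inside the computed bounds. Certification then asks whether the true class's lower bound exceeds every other class's upper bound.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

def ibp_bounds(net, x, epsilon):
    """Interval bounds on net(x') for all x' with ||x' - x||_inf <= epsilon"""
    lower, upper = x - epsilon, x + epsilon
    for layer in net:
        if isinstance(layer, nn.Linear):
            # Center/radius form: the center goes through W and b, the radius through |W|
            center, radius = (upper + lower) / 2, (upper - lower) / 2
            center = center @ layer.weight.T + layer.bias
            radius = radius @ layer.weight.abs().T
            lower, upper = center - radius, center + radius
        elif isinstance(layer, nn.ReLU):
            # ReLU is monotonic, so it maps each bound directly
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
        else:
            raise NotImplementedError(type(layer))
    return lower, upper

x = torch.randn(1, 4)
epsilon = 0.1
with torch.no_grad():
    lower, upper = ibp_bounds(net, x, epsilon)
    # Empirical soundness check on random points inside the L∞ ball
    samples = x + (torch.rand(1000, 4) * 2 - 1) * epsilon
    outputs = net(samples)

print("lower:", lower)
print("upper:", upper)
inside = ((outputs >= lower - 1e-5) & (outputs <= upper + 1e-5)).all()
print("all samples inside bounds:", bool(inside))
```

Random sampling can only fail to find a violation; it never proves the bounds. The proof comes from the interval arithmetic itself.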

In [25]:
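import torch
import torch.nn as nn
import torch.nn.functional as F

class IntervalBoundPropagation:
    """
    Certified defense via interval bound propagation (L∞ threat model)
    
    Reference: "On the Effectiveness of Interval Bound Propagation for Training
                Verifiably Robust Models" (Gowal et al., 2018)
    
    Assumes `model` is an nn.Sequential of Linear, Conv2d, ReLU and Flatten layers.
    """
    def __init__(self, model):
        self.model = model
    
    def compute_bounds(self, x, epsilon):
        """
        Compute output bounds for every input in [x-epsilon, x+epsilon]
        (x is a batch; the bounds have shape [batch, num_classes])
        """
        # Initialize input bounds
        lower = x - epsilon
        upper = x + epsilon
        
        # Propagate through each layer
        for layer in self.model:
            lower, upper = self.propagate_layer(layer, lower, upper)
        
        return lower, upper
    
    def propagate_layer(self, layer, lower, upper):
        """
        Propagate bounds through a single layer
        """
        if isinstance(layer, nn.Linear):
            return self.propagate_linear(layer, lower, upper)
        elif isinstance(layer, nn.ReLU):
            return self.propagate_relu(lower, upper)
        elif isinstance(layer, nn.Conv2d):
            return self.propagate_conv(layer, lower, upper)
        elif isinstance(layer, nn.Flatten):
            # Reshaping does not change the bounds
            return layer(lower), layer(upper)
        else:
            raise NotImplementedError(f"Layer {type(layer)} not supported")
    
    def propagate_linear(self, layer, lower, upper):
        """
        Propagate through linear layer: y = Wx + b
        """
        # For positive weights: lower bound uses lower input
        # For negative weights: lower bound uses upper input
        W_pos = torch.clamp(layer.weight, min=0)
        W_neg = torch.clamp(layer.weight, max=0)
        
        # F.linear handles batched inputs of shape [batch, in_features]
        new_lower = F.linear(lower, W_pos, layer.bias) + F.linear(upper, W_neg)
        new_upper = F.linear(upper, W_pos, layer.bias) + F.linear(lower, W_neg)
        
        return new_lower, new_upper
    
    def propagate_conv(self, layer, lower, upper):
        """
        Propagate through a convolution (also linear): the box center goes
        through the layer, the box radius through |W| without bias
        """
        center = (upper + lower) / 2
        radius = (upper - lower) / 2
        new_center = layer(center)
        new_radius = F.conv2d(radius, layer.weight.abs(), None, layer.stride,
                              layer.padding, layer.dilation, layer.groups)
        return new_center - new_radius, new_center + new_radius
    
    def propagate_relu(self, lower, upper):
        """
        Propagate through ReLU: y = max(0, x)
        """
        # ReLU is monotonic, so applying it to each bound gives the exact
        # output interval for every neuron
        new_lower = torch.clamp(lower, min=0)
        new_upper = torch.clamp(upper, min=0)
        
        return new_lower, new_upper
    
    def certify(self, x, y_true, epsilon):
        """
        Certify that the prediction for a single example x (shape [1, ...])
        is y_true for every input within L∞ distance epsilon
        """
        with torch.no_grad():
            lower, upper = self.compute_bounds(x, epsilon)
        lower, upper = lower[0], upper[0]
        
        # The true class's lower bound must beat the upper bound of every
        # OTHER class (its own upper bound is excluded from the comparison)
        other_upper = upper.clone()
        other_upper[y_true] = float('-inf')
        if lower[y_true] > other_upper.max():
            return True, epsilon
        else:
            return False, None

# Example usage
def example_ibp():
    """
    Example: Certify with IBP
    """
    model = load_simple_network()  # placeholder: returns an nn.Sequential
    ibp = IntervalBoundPropagation(model)
    
    x, y = load_test_example()  # placeholder
    epsilon = 0.1
    
    is_certified, radius = ibp.certify(x, y, epsilon)
    
    if is_certified:
        print(f"Certified robust within L∞ radius {radius}")
    else:
        print("Not certified at this radius")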

### Training with IBP

In [26]:
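import torch.nn.functional as F

def train_with_ibp(model, train_loader, epsilon, epochs=100, margin=1.0):
    """
    Train model to be certifiably robust using IBP
    
    Reference: "On the Effectiveness of Interval Bound Propagation for Training
                Verifiably Robust Models" (Gowal et al., 2018)
    That paper mixes the clean loss with cross-entropy on worst-case logits and
    ramps epsilon up from 0; this simplified version uses a hinge loss on the
    certified margin at a fixed epsilon.
    """
    optimizer = torch.optim.Adam(model.parameters())
    ibp = IntervalBoundPropagation(model)
    model.train()
    
    for epoch in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            
            # Compute bounds on output
            lower, upper = ibp.compute_bounds(x, epsilon)
            
            # Loss: Maximize lower bound of correct class
            #       Minimize upper bound of incorrect classes
            idx = torch.arange(len(y), device=y.device)
            correct_class_lower = lower[idx, y]
            
            # Max upper bound over incorrect classes: mask the true class to -inf
            # (multiplying by a 0/1 mask would wrongly insert 0 as a candidate)
            true_mask = F.one_hot(y, num_classes=upper.shape[1]).bool()
            incorrect_upper = upper.masked_fill(true_mask, float('-inf')).max(dim=1)[0]
            
            # Robust loss: margin between correct and incorrect
            loss = torch.relu(incorrect_upper - correct_class_lower + margin).mean()
            
            loss.backward()
            optimizer.step()
    
    return model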

## Lipschitz Constraints


### Core Concept

**Idea**: Bound how much output can change relative to input change
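
A minimal sketch of the idea for an MLP, assuming the L2 norm: the product of per-layer spectral norms upper-bounds the network's Lipschitz constant L (ReLU is 1-Lipschitz). Since the difference between two logits can change by at most √2·L·‖δ‖₂, a top-two margin m certifies radius m / (√2·L). The random network is an arbitrary assumption; the check compares the bound against output changes on random input pairs.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))

# Upper bound on the L2 Lipschitz constant: product of largest singular values
L = 1.0
for layer in net:
    if isinstance(layer, nn.Linear):
        L *= torch.linalg.matrix_norm(layer.weight, ord=2).item()

# Empirical check: ||f(x1) - f(x2)|| / ||x1 - x2|| should never exceed L
with torch.no_grad():
    x1, x2 = torch.randn(500, 8), torch.randn(500, 8)
    ratios = (net(x1) - net(x2)).norm(dim=1) / (x1 - x2).norm(dim=1)
print(f"Lipschitz bound L = {L:.3f}, largest observed ratio = {ratios.max().item():.3f}")

# Certified L2 radius for one input from its top-two logit margin
with torch.no_grad():
    logits = net(torch.randn(1, 8))[0]
top_two = torch.topk(logits, 2).values
margin = (top_two[0] - top_two[1]).item()
radius = margin / (math.sqrt(2) * L)
print(f"margin = {margin:.3f}, certified L2 radius = {radius:.4f}")
```

The observed ratio is usually far below the product bound, which is why plain Lipschitz certificates tend to be loose unless training keeps every layer's spectral norm small.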

In [27]:
import math
import torch

def lipschitz_constant(model):
    """
    Upper bound on the L2 Lipschitz constant of a feed-forward model
    
    For neural network: L <= product of layer Lipschitz constants
    For linear layer: L = largest singular value (spectral norm) of weight matrix
    For ReLU: L = 1
    Assumes `model` is an nn.Sequential of Linear and ReLU layers.
    """
    lipschitz = 1.0
    
    for layer in model:
        if isinstance(layer, torch.nn.Linear):
            # Lipschitz constant is the largest singular value
            layer_lipschitz = torch.linalg.matrix_norm(layer.weight, ord=2)
            lipschitz *= layer_lipschitz.item()
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU has Lipschitz constant 1
            lipschitz *= 1.0
        else:
            raise NotImplementedError(f"Layer {type(layer)} not supported")
    
    return lipschitz

def certified_radius_from_lipschitz(model, x):
    """
    Compute certified L2 radius using the Lipschitz bound
    
    If the logit map has Lipschitz constant L, each logit difference
    f_i - f_j changes by at most sqrt(2) * L * ||delta||, so a top-two
    margin m is certified up to radius m / (sqrt(2) * L)
    """
    L = lipschitz_constant(model)
    
    # Compute margin: difference between top two class scores (x has shape [1, ...])
    with torch.no_grad():
        output = model(x)[0]
    top_two = torch.topk(output, 2).values
    margin = (top_two[0] - top_two[1]).item()
    
    # Certified radius
    radius = margin / (math.sqrt(2) * L)
    
    return radius

### Lipschitz-Constrained Training

In [28]:
class LipschitzLinear(torch.nn.Module):
    """
    Linear layer whose spectral norm is capped at lipschitz_bound
    """
    def __init__(self, in_features, out_features, lipschitz_bound=1.0):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features)
        )
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        self.lipschitz_bound = lipschitz_bound
    
    def forward(self, x):
        # Rescale the weight when its largest singular value exceeds the bound
        # (exact spectral norm via SVD on every call: simple but costly for large layers)
        W = self.weight
        max_sv = torch.linalg.matrix_norm(W, ord=2)
        
        if max_sv > self.lipschitz_bound:
            W = W * (self.lipschitz_bound / max_sv)
        
        return torch.nn.functional.linear(x, W, self.bias)

def build_lipschitz_network(input_dim, hidden_dims, output_dim, 
                             lipschitz_bound=1.0):
    """
    Build network with Lipschitz constraint
    
    Each linear layer is capped at lipschitz_bound and ReLU is 1-Lipschitz, so the
    whole network is at most lipschitz_bound ** (number of linear layers)-Lipschitz in L2
    """
    layers = []
    
    # Input layer
    layers.append(LipschitzLinear(input_dim, hidden_dims[0], lipschitz_bound))
    layers.append(torch.nn.ReLU())
    
    # Hidden layers
    for i in range(len(hidden_dims) - 1):
        layers.append(LipschitzLinear(
            hidden_dims[i], hidden_dims[i+1], lipschitz_bound
        ))
        layers.append(torch.nn.ReLU())
    
    # Output layer
    layers.append(LipschitzLinear(hidden_dims[-1], output_dim, lipschitz_bound))
    
    return torch.nn.Sequential(*layers)

### Spectral Normalization

In [29]:
class SpectralNorm(torch.nn.Module):
    """
    Spectral normalization for Lipschitz constraint (wraps an nn.Linear)
    
    Reference: "Spectral Normalization for Generative Adversarial Networks"
               (Miyato et al., 2018)
    
    PyTorch also ships torch.nn.utils.parametrizations.spectral_norm.
    """
    def __init__(self, module, name='weight', power_iterations=1):
        super().__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        
        # Initialize u and v vectors for power iteration
        w = getattr(module, name)
        height = w.shape[0]
        width = w.view(height, -1).shape[1]
        
        self.register_buffer('u', torch.nn.functional.normalize(torch.randn(height), dim=0))
        self.register_buffer('v', torch.nn.functional.normalize(torch.randn(width), dim=0))
    
    def forward(self, x):
        w = getattr(self.module, self.name)
        height = w.shape[0]
        w_mat = w.view(height, -1)
        
        # Power iteration (no gradient through the singular-vector estimates)
        with torch.no_grad():
            u, v = self.u, self.v
            for _ in range(self.power_iterations):
                v = torch.nn.functional.normalize(w_mat.t() @ u, dim=0)
                u = torch.nn.functional.normalize(w_mat @ v, dim=0)
            # Keep the vectors as a warm start for the next call
            self.u.copy_(u)
            self.v.copy_(v)
        
        # Spectral norm estimate (differentiable with respect to the weight)
        sigma = u @ w_mat @ v
        
        # Use the normalized weight for this call only. The stored parameter is
        # left unchanged: assigning a plain tensor to a Parameter attribute fails,
        # and overwriting it would re-normalize an already normalized weight.
        w_normalized = w / sigma
        return torch.nn.functional.linear(x, w_normalized, self.module.bias)

# Apply spectral normalization
def apply_spectral_norm(model):
    """
    Wrap every nn.Linear in the model (at any depth) with SpectralNorm
    """
    for name, child in list(model.named_children()):
        if isinstance(child, torch.nn.Linear):
            setattr(model, name, SpectralNorm(child))
        else:
            apply_spectral_norm(child)
    return model
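
Power iteration only estimates the spectral norm, and with `power_iterations=1` it relies on `u` and `v` carried over from earlier calls. A quick self-contained check (matrix size and iteration count are arbitrary assumptions) compares the estimate with the exact value from `torch.linalg.matrix_norm`. Because $u^\top W v \le \sigma_{\max}$ for any unit vectors, the estimate approaches σ from below, so dividing by it can leave the layer's true norm slightly above 1; a strict Lipschitz certificate should use an exact or upper-bounding value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(20, 10)

# Power iteration for the largest singular value of W
u = F.normalize(torch.randn(20), dim=0)
for _ in range(200):
    v = F.normalize(W.t() @ u, dim=0)
    u = F.normalize(W @ v, dim=0)
sigma_est = (u @ W @ v).item()

# Exact largest singular value for comparison
sigma_exact = torch.linalg.matrix_norm(W, ord=2).item()
print(f"power iteration: {sigma_est:.5f}, exact: {sigma_exact:.5f}")
```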

## Convex Relaxations


### Linear Relaxation of ReLU
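
For a pre-activation $z$ known to lie in $[l, u]$ with $l < 0 < u$ (an *unstable* neuron), ReLU is sandwiched between two linear functions:

$$
\alpha z \;\le\; \mathrm{ReLU}(z) \;\le\; \frac{u}{u - l}\,(z - l), \qquad \alpha \in [0, 1]
$$

The upper line is the chord through $(l, 0)$ and $(u, u)$. Any lower slope $\alpha \in [0, 1]$ is valid; CROWN's default picks $\alpha = 1$ when $u \ge -l$ and $\alpha = 0$ otherwise. Neurons with $l \ge 0$ are exactly linear ($\mathrm{ReLU}(z) = z$) and those with $u \le 0$ are exactly zero, so only unstable neurons lose precision.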

In [30]:
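import torch
import torch.nn as nn

def linear_relaxation_relu(lower, upper):
    """
    Elementwise linear relaxation of ReLU for pre-activation bounds [lower, upper]:
    
        lo_slope * x + lo_int  <=  ReLU(x)  <=  up_slope * x + up_int
    
    - Active   (lower >= 0): ReLU(x) = x, both lines are y = x
    - Inactive (upper <= 0): ReLU(x) = 0, both lines are y = 0
    - Unstable (lower < 0 < upper):
        upper line: chord from (lower, 0) to (upper, upper)
        lower line: y = alpha * x, alpha = 1 if upper >= -lower else 0 (CROWN's heuristic)
    
    Returns (lo_slope, lo_int, up_slope, up_int), each shaped like `lower`.
    """
    zeros, ones = torch.zeros_like(lower), torch.ones_like(lower)
    active = lower >= 0
    unstable = (lower < 0) & (upper > 0)
    
    # Chord slope; the denominator is replaced by 1 where the neuron is not unstable
    chord = upper / torch.where(unstable, upper - lower, ones)
    up_slope = torch.where(active, ones, torch.where(unstable, chord, zeros))
    up_int = torch.where(unstable, -chord * lower, zeros)
    
    alpha = (upper >= -lower).to(lower.dtype)
    lo_slope = torch.where(active, ones, torch.where(unstable, alpha, zeros))
    lo_int = zeros
    
    return lo_slope, lo_int, up_slope, up_int

### CROWN (Linear Bound Propagation)

In [31]:
class CROWN:
    """
    CROWN: Efficient certified bounds via backward linear bound propagation
    
    Reference: "Efficient Neural Network Robustness Certification with 
                General Activation Functions" (Zhang et al., 2018)
    
    Sketch assumptions: `model` is an nn.Sequential alternating Linear and ReLU
    layers and ending in Linear; x is a single example of shape [1, in_features].
    """
    def __init__(self, model):
        self.model = model
    
    def compute_bounds(self, x, epsilon):
        """
        Bounds on every output logit for all inputs in the L∞ ball around x
        """
        layers = list(self.model)
        x_lower = (x - epsilon).reshape(-1)
        x_upper = (x + epsilon).reshape(-1)
        
        # Bound each Linear layer's output in turn; earlier bounds feed the ReLU relaxations
        pre_act_bounds = []
        for i, layer in enumerate(layers):
            if isinstance(layer, nn.Linear):
                pre_act_bounds.append(
                    self.backward_bound(layers[:i + 1], pre_act_bounds, x_lower, x_upper)
                )
            elif not isinstance(layer, nn.ReLU):
                raise NotImplementedError(f"Layer {type(layer)} not supported")
        
        return pre_act_bounds[-1]
    
    def backward_bound(self, layers, pre_act_bounds, x_lower, x_upper):
        """
        Propagate linear coefficients backward from the last layer in `layers`
        to the input, then concretize over the input box
        """
        W_last = layers[-1].weight
        A_up = torch.eye(W_last.shape[0], dtype=W_last.dtype, device=W_last.device)
        A_lo = A_up.clone()
        c_up = torch.zeros(W_last.shape[0], dtype=W_last.dtype, device=W_last.device)
        c_lo = c_up.clone()
        relu_idx = len(pre_act_bounds) - 1
        
        for layer in reversed(layers):
            if isinstance(layer, nn.Linear):
                # Substitute y = Wx + b into the current linear bounds
                if layer.bias is not None:
                    c_up = c_up + A_up @ layer.bias
                    c_lo = c_lo + A_lo @ layer.bias
                A_up = A_up @ layer.weight
                A_lo = A_lo @ layer.weight
            elif isinstance(layer, nn.ReLU):
                lo_s, lo_b, up_s, up_b = linear_relaxation_relu(*pre_act_bounds[relu_idx])
                relu_idx -= 1
                # Upper bound: positive coefficients use the upper line, negative the lower
                pos, neg = A_up.clamp(min=0), A_up.clamp(max=0)
                c_up = c_up + pos @ up_b + neg @ lo_b
                A_up = pos * up_s + neg * lo_s
                # Lower bound: the reverse
                pos, neg = A_lo.clamp(min=0), A_lo.clamp(max=0)
                c_lo = c_lo + pos @ lo_b + neg @ up_b
                A_lo = pos * lo_s + neg * up_s
        
        # Concretize: max/min of A x over the box is A @ center +/- |A| @ radius
        center = (x_upper + x_lower) / 2
        radius = (x_upper - x_lower) / 2
        upper = A_up @ center + A_up.abs() @ radius + c_up
        lower = A_lo @ center - A_lo.abs() @ radius + c_lo
        
        return lower, upper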

## Comparison of Methods


### Certified Accuracy vs Epsilon

In [32]:
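import matplotlib.pyplot as plt

def compare_certified_methods(certifiers, test_loader,
                              epsilons=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """
    Compare different certification methods on one plot
    
    certifiers: dict mapping a method name to a function
        certify_fn(x, y, epsilon) -> bool
    that returns True when example x (shape [1, ...]) is certified as class y at
    radius epsilon. The classes above expose different APIs, so each needs a small
    wrapper. Only plot methods that certify the SAME norm together: randomized
    smoothing certifies L2 radii, while IBP and CROWN here certify L∞ radii.
    """
    results = {name: [] for name in certifiers}
    
    for epsilon in epsilons:
        for name, certify_fn in certifiers.items():
            certified, total = 0, 0
            for x_batch, y_batch in test_loader:
                for x, y in zip(x_batch, y_batch):
                    certified += bool(certify_fn(x.unsqueeze(0), int(y), epsilon))
                    total += 1
            results[name].append(certified / total)
    
    # Plot results
    for name, accuracies in results.items():
        plt.plot(epsilons, accuracies, marker='o', label=name)
    plt.xlabel('Epsilon (certified radius)')
    plt.ylabel('Certified Accuracy')
    plt.legend()
    plt.show()
    return results

# Example wrapper for an L∞ method (ibp is an IntervalBoundPropagation instance):
# compare_certified_methods({'IBP': lambda x, y, eps: ibp.certify(x, y, eps)[0]}, test_loader)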

### Trade-offs

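| Method | Threat Norm | Tightness | Scalability | Training Cost | Certification Cost |
|--------|-------------|-----------|-------------|---------------|--------------------|
| Randomized Smoothing | L2 | Moderate (holds with probability ≥ 1 − α) | High | Low (Gaussian noise augmentation) | High (thousands of noisy forward passes per input) |
| IBP | L∞ | Loose | High | Moderate | Low (one interval forward pass) |
| Lipschitz | L2 (typical) | Loose | High | Low | Low (one forward pass plus the margin) |
| CROWN | L∞ or other Lp | Tight | Moderate | Moderate | Moderate |
| Exact Verification (e.g., MILP) | Any Lp | Exact | Low | N/A (verifies any trained model) | Very high |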

## Practical Considerations


### When to Use Certified Defenses

**Use When**:
- High-security applications
- Need provable guarantees
- Regulatory requirements
- Critical infrastructure
- Safety-critical systems

**Consider Alternatives When**:
- Accuracy is paramount
- Computational resources limited
- Threat model unclear
- Rapid iteration needed

### Limitations

In [33]:
# Limitation 1: Accuracy-Robustness Trade-off
# Illustrative numbers only (not measured in this notebook)
clean_accuracy = 0.95
certified_accuracy_at_epsilon_0_5 = 0.65
# Must sacrifice some accuracy for certification

# Limitation 2: Computational Cost
# Randomized smoothing runs one forward pass per noise sample, so
# certify(x, n_samples=10000) costs roughly 10,000x a single inference
n_samples = 10000
certification_cost_in_forward_passes = n_samples

# Limitation 3: Limited Threat Models
# Most methods certify L2 or L∞ perturbations
# They don't cover all possible attacks (e.g., rotations or semantic edits)

# Limitation 4: Scalability
# Harder to certify large models (e.g., GPT-4)

### Best Practices

In [None]:
def certified_defense_best_practices():
    """
    Best practices for certified defenses
    """
    practices = [
        "1. Choose appropriate threat model (L2, L∞, etc.)",
        "2. Balance accuracy and robustness",
        "3. Use appropriate certification method for scale",
        "4. Train specifically for certification",
        "5. Validate on diverse test set",
        "6. Monitor computational costs",
        "7. Combine with empirical defenses",
        "8. Regular security audits"
    ]
    return practices

## Advanced Topics


### Certified Defense for Transformers

In [None]:
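def certify_transformer(model, x, epsilon):
    """
    Certify transformer model (pseudocode)
    
    Challenge: Attention multiplies perturbed terms (Q K^T) and applies a softmax,
               which need dedicated bound rules beyond Linear/ReLU
    Solution: Use IBP (or tighter linear relaxations) with careful bound propagation
    
    propagate_embedding, propagate_attention and propagate_output are placeholders,
    not functions defined in this notebook. Because tokens are discrete, the
    perturbation is usually applied in embedding space.
    """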
    # Propagate bounds through embedding
    lower, upper = propagate_embedding(x, epsilon)
    
    # Propagate through attention layers
    for layer in model.transformer_layers:
        lower, upper = propagate_attention(layer, lower, upper)
    
    # Propagate through output
    lower, upper = propagate_output(model.output_layer, lower, upper)
    
    return lower, upper

### Certified Defense Against Semantic Attacks

In [None]:
def certify_semantic_robustness(model, x, semantic_transformations):
    """
    Certify robustness to a FINITE set of semantic transformations by enumeration
    
    Example: Paraphrases, synonyms, etc.
    The guarantee covers only the transformations listed; it says nothing
    about transformations outside the set.
    """
    # Apply every listed transformation (and keep the original input)
    transformed_inputs = [x] + [
        transform(x) for transform in semantic_transformations
    ]
    
    # Compare predicted classes, not raw logits (x_t has a batch dimension of 1)
    with torch.no_grad():
        predictions = [model(x_t).argmax(dim=-1).item() for x_t in transformed_inputs]
    
    if all(p == predictions[0] for p in predictions):
        return True, "Certified against the listed semantic transformations"
    else:
        return False, None

### Combining Certified and Empirical Defenses

In [None]:
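def hybrid_defense(robust_base_model, x, sigma=0.25):
    """
    Combine certified and empirical defenses
    
    Layer 1 (empirical): robust_base_model was trained with adversarial training
             (and Gaussian noise); adversarial training changes the weights,
             not the input
    Layer 2 (certified): randomized smoothing wraps that model and returns a
             certified L2 radius
    The certificate comes only from Layer 2; Layer 1 may improve the accuracy
    and radii that the smoothed model achieves.
    """
    smoothed_model = RandomizedSmoothing(robust_base_model, sigma=sigma)
    pred, radius = smoothed_model.certify(x)
    
    return pred, radius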

## Case Studies


### Case 1: Autonomous Vehicle Perception

**Requirement**: Certify stop sign detection

In [None]:
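# Hypothetical scenario: illustrative targets, not reported results
# Must guarantee: No perturbation < epsilon changes detection
# Method: Randomized smoothing
# Target: Certified radius of 0.3 (L2 norm)
# Impact: Provable guarantee inside that L2 ball only (large physical changes such as stickers fall outside it)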

### Case 2: Medical Diagnosis

**Requirement**: Certify disease classification

In [None]:
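# Hypothetical scenario: illustrative targets, not reported results
# Must guarantee: Robust to measurement noise
# Method: IBP with Lipschitz constraints
# Target: Certified against ±10% measurement variation (a per-feature box, which IBP handles directly)
# Impact: Evidence to support regulatory approval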

### Case 3: Financial Fraud Detection

**Requirement**: Certify fraud detection

In [None]:
# Hypothetical scenario: illustrative targets, not reported results
# Must guarantee: Robust to adversarial manipulation
# Method: CROWN certification
# Target: Certified against L∞ perturbations of 0.1
# Impact: Reduced false negatives from adversarial attacks within that radius

## Tools and Libraries


### Certification Libraries

In [None]:
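# AutoLiRPA (Automatic Linear Relaxation based Perturbation Analysis)
# model, x, epsilon, num_classes and sigma are placeholders; x is a batch of inputs
import numpy as np
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

bounded_model = BoundedModule(model, torch.zeros_like(x))  # dummy input with x's shape
ptb = PerturbationLpNorm(norm=np.inf, eps=epsilon)
bounded_input = BoundedTensor(x, ptb)
lower, upper = bounded_model.compute_bounds(x=(bounded_input,), method="backward")  # CROWN

# IBP bounds through the same auto_LiRPA interface
ibp_lower, ibp_upper = bounded_model.compute_bounds(x=(bounded_input,), method="IBP")

# Randomized Smoothing: Smooth is defined in code/core.py of the
# locuslab/smoothing repository (not a pip package); x has no batch dimension here
from core import Smooth
smooth = Smooth(model, num_classes, sigma)
prediction, radius = smooth.certify(x, n0=100, n=10000, alpha=0.001, batch_size=100)
# prediction == Smooth.ABSTAIN when certification fails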

## Summary


### Key Takeaways

1. **Provable Guarantees**: Certified defenses provide mathematical guarantees
2. **Multiple Methods**: Randomized smoothing, IBP, Lipschitz, CROWN
3. **Trade-offs**: Accuracy vs robustness, tightness vs scalability
4. **Strongest Guarantee**: Certified defenses give the strongest form of robustness guarantee, but only within their threat model (norm type and radius)
5. **Practical Limits**: Computational cost and scalability challenges

### When to Use Each Method

**Randomized Smoothing**:
- Need certificates for large models (scales to ImageNet-size classifiers)
- Can afford inference cost
- L2 threat model

**IBP**:
- Need fast certification
- Training time is critical
- Can accept looser bounds

**Lipschitz Constraints**:
- Simple implementation
- Moderate robustness sufficient
- Want interpretable guarantees

**CROWN**:
- Need tight bounds
- Moderate-sized models
- Can afford computation

## References


### Key Papers

1. "Certified Adversarial Robustness via Randomized Smoothing" (Cohen et al., 2019)
2. "Towards Fast Computation of Certified Robustness for ReLU Networks" (Weng et al., 2018)
3. "Efficient Neural Network Robustness Certification with General Activation Functions" (Zhang et al., 2018)
4. "Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope" (Wong & Kolter, 2018)
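5. "Scaling provable adversarial defenses" (Wong, Schmidt, Metzen & Kolter, 2018)
6. "On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models" (Gowal et al., 2018)
7. "Spectral Normalization for Generative Adversarial Networks" (Miyato et al., 2018)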

### Resources

- [AutoLiRPA](https://github.com/Verified-Intelligence/auto_LiRPA)
- [CROWN-IBP](https://github.com/huanzhang12/CROWN-IBP)
- [Randomized Smoothing](https://github.com/locuslab/smoothing)
- [Certified Defenses Survey](https://arxiv.org/abs/2009.04131)

## Next Steps


1. Complete [Lab 5: Certified Robustness](labs/lab5_certified_robustness.ipynb)
2. Implement randomized smoothing
3. Compare certification methods
4. Evaluate certified accuracy
5. Study latest certification research

---

**Difficulty**: ⭐⭐⭐⭐⭐ Expert Level
**Prerequisites**: Optimization, probability theory, linear algebra
**Estimated Time**: 4-5 hours