# 🔥 Tutorial: Adversarial Attacks with FGSM

![Adversarial Examples](https://docs.pytorch.org/tutorials/_images/fgsm_panda_image.png)

## Welcome to the Fascinating World of Adversarial Attacks! 🎯

In this comprehensive tutorial, you'll learn:
- 🤖 What are adversarial attacks and why they matter
- 🔬 The mathematics behind Fast Gradient Sign Method (FGSM)
- 💻 Hands-on implementation in PyTorch
- 🎨 Visualization of attack effects
- 🧪 Interactive exercises to build your skills

By the end, you'll be able to implement FGSM and several of its variants (masked, targeted, and iterative) in PyTorch!


## 📚 Table of Contents

1. [🎓 Understanding Adversarial Attacks](#1--understanding-adversarial-attacks)
2. [🧮 Mathematical Foundation](#2--mathematical-foundation)
3. [🔧 Setting Up the Environment](#3--setting-up-the-environment)
4. [🏗️ Building a Target Model](#4--building-a-target-model)
5. [🎯 Your First Attack](#5--your-first-attack)
6. [📈 Fast Gradient Sign Method (FGSM)](#6--fast-gradient-sign-method-fgsm)
7. [💼 Building the Complete Solution](#7--building-the-complete-solution)
8. [🎨 Visualizing Attack Effects](#8--visualizing-attack-effects)
9. [🎮 Interactive Exercises](#9--interactive-exercises)
10. [🚀 Next Steps and Advanced Topics](#10--next-steps-and-advanced-topics)


## 1. 🎓 Understanding Adversarial Attacks

### What are Adversarial Attacks?

Imagine you have a perfect photo of a panda 🐼. A human can clearly see it's a panda. But what if we could add tiny, nearly invisible changes to the image that make a neural network think it's a gibbon? That's an **adversarial attack**!

### Key Concepts:

- **Original Image**: The clean, unmodified input
- **Perturbation**: Small noise added to the image  
- **Adversarial Example**: Original + Perturbation
- **Attack Success**: When the model misclassifies the adversarial example

### Mathematical Definition:

$$x_{adv} = x + \delta$$

Where:
- $x$ is the original input
- $\delta$ is the perturbation (very small!)
- $x_{adv}$ is the adversarial example

### Why Should We Care?

🚗 **Autonomous vehicles**: Malicious signs could fool self-driving cars  
🏥 **Medical diagnosis**: Adversarial examples in medical imaging  
🔒 **Security systems**: Fooling facial recognition systems  
🛡️ **Model robustness**: Understanding vulnerabilities helps build better defenses  


## 2. 🧮 Mathematical Foundation

Before diving into implementation, let's understand the math behind adversarial attacks.

### The Loss Function Perspective

Neural networks learn by minimizing a loss function $L(\theta, x, y)$ where:
- $\theta$ are the model parameters
- $x$ is the input
- $y$ is the true label

### The Adversarial Objective

For attacks, we want to **maximize** the loss by modifying the input:

$$\max_{\delta} L(\theta, x + \delta, y) \text{ subject to } ||\delta||_\infty \leq \epsilon$$

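This means: "Find the perturbation $\delta$ that increases the model's loss the most, without changing any single pixel by more than $\epsilon$." The budget $\epsilon$ keeps the change small; the size of $\delta$ is bounded, not minimized.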
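In code, the constraint $\|\delta\|_\infty \leq \epsilon$ is usually enforced by clipping. The following is a minimal NumPy sketch, assuming pixels live in $[-1, 1]$ as in the rest of this tutorial; `project_linf` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def project_linf(x_adv, x, epsilon, low=-1.0, high=1.0):
    """Project x_adv into the L-infinity ball of radius epsilon around x,
    then into the valid pixel range [low, high]."""
    delta = np.clip(x_adv - x, -epsilon, epsilon)  # enforce ||delta||_inf <= epsilon
    return np.clip(x + delta, low, high)           # keep pixels in the valid range

x = np.array([0.0, 0.9, -0.5])
x_adv = np.array([0.5, 1.2, -0.6])   # candidate that violates the budget
print(project_linf(x_adv, x, epsilon=0.2))
```

If `x` is already in the valid range, the second clip only moves pixels back toward `x`, so the result still satisfies the ε budget.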

### Gradient-Based Attacks

The key insight: **gradients tell us how to change inputs to increase loss!**

$$\delta = \epsilon \cdot \text{sign}(\nabla_x L(\theta, x, y))$$

This is the heart of **Fast Gradient Sign Method (FGSM)**!
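Why the *sign* of the gradient? One way to see it, following the first-order argument of Goodfellow et al., is to linearize the loss around $x$, writing $g = \nabla_x L(\theta, x, y)$:

$$L(\theta, x + \delta, y) \approx L(\theta, x, y) + \delta^\top g$$

Under the constraint $\|\delta\|_\infty \leq \epsilon$, each coordinate of $\delta$ can be chosen independently, so

$$\max_{\|\delta\|_\infty \leq \epsilon} \delta^\top g = \sum_i \max_{|\delta_i| \leq \epsilon} \delta_i g_i = \epsilon \sum_i |g_i| = \epsilon \|g\|_1,$$

which is attained at $\delta_i = \epsilon \cdot \text{sign}(g_i)$, exactly the FGSM perturbation. FGSM is therefore optimal only for the linearized loss; for the true loss it is a one-step approximation, which is why iterative variants can find stronger attacks.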


### 💡 Key Insight

**Why does this work?**

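One might expect small changes in the input to cause only small changes in the output. Goodfellow et al. ([Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572)) argue that the real weakness is that networks behave *locally linearly*: in a high-dimensional input such as a 28×28 image, many tiny per-pixel changes that all point along the gradient add up to a large change in the output. The gradient tells us exactly which direction to push each pixel!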

Think of it like finding the steepest hill to climb - gradients point us in the right direction!
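A quick NumPy sketch of this linearity argument, assuming a purely linear score $w^\top x$ as a stand-in for a network's locally linear behavior (this is not `SimpleNet`): the perturbation $\eta = \epsilon \cdot \text{sign}(w)$ changes the score by $\epsilon \|w\|_1$, which grows roughly in proportion to the number of input dimensions, even though no pixel moves by more than $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.01

for n in [10, 784, 100_000]:          # 784 = 28 * 28, the MNIST input size
    w = rng.normal(size=n)            # weights of a hypothetical linear model
    eta = epsilon * np.sign(w)        # every pixel changes by at most epsilon
    score_change = w @ eta            # equals epsilon * ||w||_1
    assert np.isclose(score_change, epsilon * np.abs(w).sum())
    print(f"n = {n:>6}: max |eta_i| = {np.abs(eta).max():.2f}, score change = {score_change:.2f}")
```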


## 3. 🔧 Setting Up the Environment

Let's start by importing all the necessary libraries and setting up our environment. We'll be working with PyTorch for neural networks and various other libraries for data manipulation and visualization.


In [None]:
# Essential imports for our adversarial attacks tutorial
import os
import numpy as np
import matplotlib.pyplot as plt
from copy import deepcopy

# PyTorch for neural networks
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

# For image similarity metrics
from skimage.metrics import structural_similarity as ssim
from PIL import Image

# Set up device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"🚀 Using device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✅ Environment setup complete!")

In [None]:
# Define our target neural network
class SimpleNet(nn.Module):
    """
    A simple CNN for MNIST digit classification.
    This will be our target for adversarial attacks!
    """

    def __init__(self):
        super(SimpleNet, self).__init__()
        # First convolutional layer: 1 input channel (grayscale), 32 output channels
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=0)
        # Second convolutional layer: 32 input channels, 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=0)
        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 11 * 11, 128)  # 64 channels * 11x11 feature maps
        self.fc2 = nn.Linear(128, 10)  # 10 output classes (digits 0-9)

    def forward(self, x):
        # Apply first conv layer + ReLU activation
        x = F.relu(self.conv1(x))
        # Apply second conv layer + ReLU + pooling
        x = F.relu(self.pool(self.conv2(x)))
        # Flatten for fully connected layers
        x = torch.flatten(x, 1)
        # Apply first fully connected layer
        x = self.fc1(x)
        # Apply output layer with log_softmax
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


# Create our model
net = SimpleNet()
print("🧠 Neural network created!")
print(f"📊 Model has {sum(p.numel() for p in net.parameters())} parameters")

# Move to device
net = net.to(device)
print(f"🔥 Model moved to {device}")

## 4. 🏗️ Building a Target Model

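Before we can attack a neural network, we need one to attack! The `SimpleNet` CNN defined in the cell above, a small convolutional network for MNIST digit classification, is our target model. Note that it is freshly initialized and never trained in this tutorial, so its predictions on the demo data are essentially arbitrary; the attack code works the same way against a trained network, which is what matters in practice.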
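The `fc1` input size `64 * 11 * 11` follows from the standard output-size formula for convolution and pooling layers without dilation, $\lfloor (n + 2p - k)/s \rfloor + 1$. A small arithmetic check (the helper `out_size` is hypothetical):

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d / MaxPool2d layer (dilation = 1, floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

n = 28                                # MNIST images are 28x28
n = out_size(n, kernel=3)             # conv1: 28 -> 26
n = out_size(n, kernel=3)             # conv2: 26 -> 24
n = out_size(n, kernel=3, stride=2)   # pool:  24 -> 11
print(n, 64 * n * n)                  # 11 7744
```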


### 📊 Create Some Sample Data

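For this tutorial, we'll create a small synthetic dataset: random noise images with random labels, normalized the same way as the problem's data. They are not real digits; they only let us exercise the attack code. In the real problem, you'll use the provided dataset.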


In [None]:
# Create sample data for demonstration
# In practice, you'll load this from the provided dataset


def create_sample_data(n_samples=100):
    """Create sample MNIST-like data for demonstration"""
    # Create random 28x28 noise images (not yet in [-1, 1]; see normalize_samples)
    data = np.random.randn(n_samples, 28, 28) * 0.3
    # Create random labels (0-9)
    labels = np.random.randint(0, 10, n_samples)
    return data, labels


def normalize_samples(samples):
    """Normalize samples to [-1, 1] range (same as in the problem)"""
    assert len(samples.shape) == 3
    samples = samples.reshape(-1, 28 * 28)
    minimum_values = np.min(samples, axis=1)
    maximum_values = np.max(samples, axis=1)

    # Avoid division by zero
    range_values = maximum_values - minimum_values
    range_values[range_values == 0] = 1

    normalized_samples = (
        2 * (samples - minimum_values[:, np.newaxis]) / range_values[:, np.newaxis]
    ) - 1

    normalized_samples = normalized_samples.reshape(normalized_samples.shape[0], 28, 28)
    return normalized_samples


# Create sample data
X_sample, y_sample = create_sample_data(10)
X_sample = normalize_samples(X_sample)

print(f"📊 Created sample dataset:")
print(f"   Shape: {X_sample.shape}")
print(f"   Value range: [{X_sample.min():.3f}, {X_sample.max():.3f}]")
print(f"   Labels: {y_sample}")
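
For reference, `normalize_samples` applies per-image min-max scaling. For an image with pixel values $x_i$:

$$x'_i = 2\,\frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j} - 1$$

so each image's smallest pixel maps to $-1$ and its largest to $1$. For example, pixel values $(0, 5, 10)$ map to $(-1, 0, 1)$. A constant image (zero range) maps to all $-1$, because the code replaces a zero range by 1.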

## 5. 🎯 Your First Attack

Now let's implement our first adversarial attack! We'll start with a simple version to understand the core concepts.

### Understanding Gradients in Attacks

The key insight is that we need to:
1. Forward pass: Get the model's prediction
2. Compute loss: Measure how well the model fits a label (here, its own prediction); the attack will try to increase this loss
3. Backward pass: Get gradients with respect to input pixels
4. Create perturbation: Use gradients to modify the input
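
Steps 1-3 are ordinary autograd applied to the *input* instead of the weights. To make that concrete, here is a small sketch with an assumed linear classifier (not `SimpleNet`): for logits $z = Wx$ followed by `log_softmax` and `nll_loss`, the input gradient has the closed form $(\text{softmax}(z) - e_y)\,W$, and autograd should reproduce it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(10, 784)                           # hypothetical linear classifier (no bias)
x = (0.01 * torch.randn(1, 784)).requires_grad_()  # small input so the softmax is not saturated

# 1. Forward pass
log_probs = F.log_softmax(x @ W.T, dim=1)
y = log_probs.argmax(dim=1)          # use the model's own prediction as the label
# 2. Compute loss
loss = F.nll_loss(log_probs, y)
# 3. Backward pass: gradient of the loss w.r.t. the input pixels
loss.backward()

# Closed-form input gradient for comparison
with torch.no_grad():
    expected = (log_probs.exp() - F.one_hot(y, num_classes=10).float()) @ W

# 4. Create the FGSM perturbation from the gradient sign
epsilon = 0.1
perturbation = epsilon * x.grad.sign()
print(torch.allclose(x.grad, expected, atol=1e-5))
```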


In [None]:
def simple_attack_demo(model, image, epsilon=0.1):
    """
    A simple demonstration of how adversarial attacks work.
    This is a basic version to understand the concepts.
    """
    # Ensure model is in evaluation mode
    model.eval()

    # Convert numpy to tensor and add batch dimension
    image_tensor = (
        torch.tensor(image, dtype=torch.float32).unsqueeze(0).unsqueeze(0).to(device)
    )
    image_tensor.requires_grad = True  # This is crucial for computing gradients!

    # Forward pass
    output = model(image_tensor)

    # Get the predicted class
    predicted_class = output.argmax(dim=1)
    print(f"🎯 Original prediction: class {predicted_class.item()}")

    # Compute loss - we want to maximize it!
    loss = F.nll_loss(output, predicted_class)

    # Backward pass to get gradients
    model.zero_grad()
    loss.backward()

    # Get the gradients
    data_grad = image_tensor.grad.data

    # Create the perturbation using the sign of gradients
    perturbation = epsilon * data_grad.sign()

    # Apply the perturbation and clamp to the valid range [-1, 1].
    # detach() drops the autograd graph so the result can later be converted to numpy.
    perturbed_image = torch.clamp(image_tensor.detach() + perturbation, -1, 1)

    # Effective perturbation after clamping (can be smaller than epsilon at the borders)
    effective_perturbation = perturbed_image - image_tensor.detach()

    # Test the perturbed image
    with torch.no_grad():
        perturbed_output = model(perturbed_image)
        perturbed_prediction = perturbed_output.argmax(dim=1)

    print(f"💥 After attack: class {perturbed_prediction.item()}")
    success = predicted_class.item() != perturbed_prediction.item()
    print(f"🎉 Attack {'successful' if success else 'failed'}!")

    return (
        perturbed_image.squeeze().cpu().numpy(),
        effective_perturbation.squeeze().cpu().numpy(),
    )


# Let's try our first attack!
print("🚀 Attempting our first adversarial attack...")
original_img = X_sample[0]
attacked_img, perturbation = simple_attack_demo(net, original_img, epsilon=0.2)

## 6. 📈 Fast Gradient Sign Method (FGSM)

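Now let's dive deeper into FGSM and implement a variant that perturbs only the most influential pixels, similar to the approach used in the solution.

### Key Changes in This FGSM Variant:

1. **Gradient Masking (pixel selection)**: Perturb only the pixels with the largest gradient magnitudes. This is not the same as *gradient masking* in the defense literature (Section 10), where the term means hiding gradients from an attacker
2. **Quantile-based Threshold**: The cutoff is the 70th percentile of $|\nabla_x L|$, so roughly the top 30% of pixels are changed
3. **Batched Input**: The function accepts a `[batch, channels, height, width]` tensor, although one threshold is computed over the whole tensor rather than per image
4. **Bounded Perturbation**: Every selected pixel moves by exactly $\epsilon$ (before clamping), so the $\ell_\infty$ budget holds while fewer pixels change than with plain FGSM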


In [None]:
def fgsm_attack(image, epsilon, data_grad):
    """
    FGSM variant with gradient masking: only the pixels with the largest gradient magnitudes are perturbed.

    This is the core FGSM function similar to what you'll implement in the solution.

    Args:
        image: Tensor with image to perturb [batch_size, channels, height, width]
        epsilon: Attack strength - maximum perturbation value (usually 0.1-0.3)
        data_grad: Gradient of loss function with respect to image pixels

    Returns:
        perturbed_image: The adversarially perturbed image
    """
    # Take absolute value to find most influential pixels
    abs_data_grad = data_grad.abs()

    # Gradient masking (pixel selection): keep only pixels at or above the
    # 70th percentile of |gradient|, i.e. roughly the top 30% most influential.
    # The threshold is computed over the whole tensor, so in a batch all images share it.
    threshold = torch.quantile(abs_data_grad, 0.7)
    mask = (abs_data_grad >= threshold).float()
    masked_data_grad = data_grad * mask

    # Take the sign of the gradient - this is the core of FGSM!
    sign_data_grad = masked_data_grad.sign()

    # Create perturbation: epsilon controls the attack strength
    perturbed_image = image + epsilon * sign_data_grad

    # Clamp to valid pixel range [-1, 1]
    perturbed_image = torch.clamp(perturbed_image, -1, 1)

    return perturbed_image


# Test the advanced FGSM function
print("🔬 Testing advanced FGSM with gradient masking...")

# Convert sample image to tensor
test_image = (
    torch.tensor(X_sample[0], dtype=torch.float32).unsqueeze(0).unsqueeze(0).to(device)
)
test_image.requires_grad = True

# Forward pass
net.eval()
output = net(test_image)
pred_class = output.argmax(dim=1)
print(f"Original prediction: {pred_class.item()}")

# Compute loss and gradients
loss = F.nll_loss(output, pred_class)
net.zero_grad()
loss.backward()

# Apply advanced FGSM
attacked_image = fgsm_attack(test_image, epsilon=0.25, data_grad=test_image.grad.data)

# Test result
with torch.no_grad():
    attacked_output = net(attacked_image)
    attacked_pred = attacked_output.argmax(dim=1)
    print(f"After advanced attack: {attacked_pred.item()}")
    print(f"Attack success: {pred_class.item() != attacked_pred.item()}")
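
One subtlety: `fgsm_attack` computes a single `torch.quantile` threshold over the whole tensor, so with a batch of several images the top-30% selection is shared across the batch. If one image has much larger gradients than the others, nearly all selected pixels come from that image. A minimal sketch of per-image selection, using `torch.quantile(..., dim=1)`; the name `fgsm_attack_per_image` is hypothetical:

```python
import torch

def fgsm_attack_per_image(image, epsilon, data_grad, q=0.7):
    """Masked FGSM with a separate |gradient| threshold for each image.

    image, data_grad: tensors of shape [batch, channels, height, width]
    """
    abs_grad = data_grad.abs().flatten(start_dim=1)                # [batch, C*H*W]
    thresholds = torch.quantile(abs_grad, q, dim=1, keepdim=True)  # [batch, 1]
    mask = (abs_grad >= thresholds).float().view_as(data_grad)     # top (1 - q) pixels per image
    perturbed = image + epsilon * (data_grad * mask).sign()
    return torch.clamp(perturbed, -1, 1)

# Example: three blank images, the first with much larger gradients than the others
torch.manual_seed(0)
images = torch.zeros(3, 1, 28, 28)
grads = torch.randn(3, 1, 28, 28)
grads[0] *= 100
adv = fgsm_attack_per_image(images, 0.25, grads)
print((adv != 0).flatten(1).sum(dim=1))   # about 30% of the 784 pixels in each image
```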

### 🎯 Interactive Exercise 2

**Your Turn!** Try modifying the epsilon value and threshold in the FGSM function above.

1. What happens when you increase epsilon from 0.25 to 0.3?
2. What happens when you change the quantile threshold from 0.7 to 0.5?
3. Can you make the attack more subtle but still effective?

*(Experiment with the code above and write your observations below)*

**Your observations**: ...


## 7. 💼 Building the Complete Solution

Now let's build the complete `perturbe_dataset` function that processes an entire dataset, just like in the final solution. This will tie everything together!


In [None]:
def perturbe_dataset(original_dataset):
    """
    Complete function to perturb an entire dataset using FGSM attacks.

    This is the main function you'll need to implement for the problem!

    Args:
        original_dataset: Numpy array of shape (num_images, 28, 28)

    Returns:
        perturbed_dataset: Adversarially modified dataset (same shape)
    """
    global net  # Use the global network
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # INPUT VALIDATION
    assert (
        len(original_dataset.shape) == 3
    ), "Dataset must have 3 dimensions: (batch, height, width)"

    # PREPARE DATA
    # Convert numpy to PyTorch tensor and add channel dimension
    original_dataset = (
        torch.tensor(original_dataset, dtype=torch.float32).unsqueeze(1).to(device)
    )
    perturbed_dataset = deepcopy(original_dataset)

    # PREPARE MODEL
    net.eval()  # Set to evaluation mode (important!)
    net.to(device)

    # ATTACK HYPERPARAMETER
    epsilon = 0.29  # Attack strength - experiment with this!

    print(f"🔥 Starting adversarial attack with epsilon = {epsilon}")
    print(f"📊 Number of images to attack: {perturbed_dataset.size(0)}")

    # MAIN ATTACK LOOP
    for i in range(perturbed_dataset.size(0)):
        # Get single image and enable gradient computation
        data = perturbed_dataset[i].unsqueeze(0).detach()  # Add batch dimension
        data.requires_grad = True  # CRUCIAL: Enable gradient computation

        # FORWARD PASS
        output = net(data)  # Get model predictions

        # Find predicted class
        label = output.max(1, keepdim=True)[1].view(-1)

        # COMPUTE LOSS
        loss = F.nll_loss(output, label)

        # BACKWARD PASS: Compute gradients
        net.zero_grad()  # Clear gradients
        loss.backward()  # Compute gradients w.r.t. input
        data_grad = data.grad.data  # Extract gradients

        # APPLY FGSM ATTACK
        perturbed_data = fgsm_attack(data, epsilon, data_grad)

        # Save the perturbed image (detached, so no autograd graph is kept alive)
        perturbed_dataset[i] = perturbed_data.detach().squeeze(0)

        # Progress indicator
        if (i + 1) % 2 == 0:  # Every 2 images for demo
            print(f"✅ Processed {i + 1}/{perturbed_dataset.size(0)} images")

    # CONVERT BACK TO NUMPY
    perturbed_dataset = perturbed_dataset.cpu().detach().numpy()
    perturbed_dataset = perturbed_dataset.squeeze(1)  # Remove channel dimension

    # OUTPUT VALIDATION
    assert len(perturbed_dataset.shape) == 3, "Output must have 3 dimensions"
    assert perturbed_dataset.shape[1] == 28, "Height must be 28 pixels"
    assert perturbed_dataset.shape[2] == 28, "Width must be 28 pixels"

    print("🎯 Adversarial attack completed successfully!")

    return perturbed_dataset


# Test our complete solution!
print("🚀 Testing complete perturbe_dataset function...")
sample_subset = X_sample[:5]  # Use first 5 images for demo
attacked_dataset = perturbe_dataset(sample_subset)
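
`perturbe_dataset` attacks one image per forward/backward pass. Batching is a natural extension: in eval mode (and without batch-coupled layers such as batch norm in training mode), each image's loss depends only on its own pixels, so summing the per-image losses gives every image its own gradient in a single backward pass. A minimal sketch, assuming a model that maps `[N, 1, 28, 28]` inputs to log-probabilities like `SimpleNet`; `perturb_dataset_batched` and the tiny stand-in model are hypothetical, and plain (unmasked) FGSM is used for brevity:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def perturb_dataset_batched(model, dataset, epsilon=0.29, batch_size=256):
    """Untargeted FGSM on a (N, 28, 28) numpy array, one backward pass per batch."""
    device = next(model.parameters()).device
    model.eval()
    out = []
    for start in range(0, len(dataset), batch_size):
        # [B, 1, 28, 28] leaf tensor that tracks gradients
        x = torch.tensor(dataset[start:start + batch_size, None],
                         dtype=torch.float32, device=device, requires_grad=True)
        log_probs = model(x)
        labels = log_probs.argmax(dim=1)                  # attack the predicted class
        # "sum" avoids the 1/B scaling of "mean" (the sign is the same either way)
        loss = F.nll_loss(log_probs, labels, reduction="sum")
        model.zero_grad()
        loss.backward()
        x_adv = torch.clamp(x + epsilon * x.grad.sign(), -1, 1)
        out.append(x_adv.detach().squeeze(1).cpu().numpy())
    return np.concatenate(out, axis=0)

# Usage with a tiny untrained stand-in model
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
data = np.random.default_rng(0).uniform(-1, 1, size=(10, 28, 28))
adv = perturb_dataset_batched(toy_model, data, epsilon=0.29, batch_size=4)
print(adv.shape, np.abs(adv - data).max())
```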

## 8. 🎨 Visualizing Attack Effects <a id="section8"></a>

Visual learners rejoice! Let's create some visualizations to see how adversarial attacks actually affect images and understand what's happening under the hood.


In [None]:
# Create a more meaningful sample image for visualization
def create_pattern_image():
    """Create a simple pattern image that's easier to see changes in"""
    img = np.zeros((28, 28))
    # Create a simple pattern - diagonal lines and squares
    for i in range(28):
        for j in range(28):
            if i == j or i + j == 27:  # Diagonals
                img[i, j] = 1.0
            elif 8 <= i <= 19 and 8 <= j <= 19:  # Center square
                if i % 2 == j % 2:
                    img[i, j] = 0.5
    return img


# Create pattern image
pattern_img = create_pattern_image()
pattern_img_normalized = normalize_samples(pattern_img.reshape(1, 28, 28))[0]

print("📸 Created pattern image for better visualization")
print(
    f"Value range: [{pattern_img_normalized.min():.3f}, {pattern_img_normalized.max():.3f}]"
)

In [None]:
def visualize_attack_comparison(
    original, attacked, perturbation, title="Adversarial Attack Visualization"
):
    """
    Create a comprehensive visualization comparing original vs attacked images
    """
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    fig.suptitle(title, fontsize=16, fontweight="bold")

    # Original image
    axes[0, 0].imshow(original, cmap="gray", vmin=-1, vmax=1)
    axes[0, 0].set_title("🖼️ Original Image", fontsize=12, fontweight="bold")
    axes[0, 0].axis("off")

    # Attacked image
    axes[0, 1].imshow(attacked, cmap="gray", vmin=-1, vmax=1)
    axes[0, 1].set_title("💥 Attacked Image", fontsize=12, fontweight="bold")
    axes[0, 1].axis("off")

    # Perturbation (amplified for visibility)
    perturbation_amplified = perturbation * 10  # Amplify for visualization
    axes[0, 2].imshow(perturbation_amplified, cmap="RdBu", vmin=-1, vmax=1)
    axes[0, 2].set_title(
        "🔍 Perturbation (10x amplified)", fontsize=12, fontweight="bold"
    )
    axes[0, 2].axis("off")

    # Difference map
    difference = np.abs(attacked - original)
    axes[1, 0].imshow(difference, cmap="hot", vmin=0, vmax=0.5)
    axes[1, 0].set_title("📊 Absolute Difference", fontsize=12, fontweight="bold")
    axes[1, 0].axis("off")

    # Histogram of pixel values
    axes[1, 1].hist(
        original.flatten(), bins=50, alpha=0.7, label="Original", color="blue"
    )
    axes[1, 1].hist(
        attacked.flatten(), bins=50, alpha=0.7, label="Attacked", color="red"
    )
    axes[1, 1].set_title("📈 Pixel Value Distributions", fontsize=12, fontweight="bold")
    axes[1, 1].legend()
    axes[1, 1].set_xlabel("Pixel Value")
    axes[1, 1].set_ylabel("Frequency")

    # Statistics
    axes[1, 2].axis("off")
    max_pert = np.max(np.abs(perturbation))
    mean_pert = np.mean(np.abs(perturbation))
    ssim_val = ssim(original, attacked, data_range=2)
    pixels_changed = np.sum(np.abs(perturbation) > 1e-6)
    attack_strength = np.sqrt(np.mean(perturbation**2))

    stats_text = f"""📊 Attack Statistics:

• Max perturbation: {max_pert:.4f}
• Mean perturbation: {mean_pert:.4f}
• SSIM similarity: {ssim_val:.4f}
• Pixels changed: {pixels_changed}/{original.size}
• Attack strength: {attack_strength:.4f}"""

    axes[1, 2].text(
        0.1,
        0.5,
        stats_text,
        fontsize=11,
        verticalalignment="center",
        fontfamily="monospace",
        bbox=dict(boxstyle="round,pad=0.5", facecolor="lightgray"),
    )

    plt.tight_layout()
    plt.show()


print("📊 Visualization function ready!")
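
The statistics panel mixes several ways of measuring perturbation size, which correspond to standard norms: "Max perturbation" is the $\ell_\infty$ norm that FGSM's $\epsilon$ bounds, "Pixels changed" is the $\ell_0$ count, and "Attack strength" is the root-mean-square value, i.e. the $\ell_2$ norm divided by $\sqrt{n}$. A small NumPy sketch (`perturbation_norms` is a hypothetical helper):

```python
import numpy as np

def perturbation_norms(perturbation, tol=1e-6):
    """Common size measures for a perturbation array of any shape."""
    p = np.asarray(perturbation, dtype=np.float64).ravel()
    return {
        "linf": np.max(np.abs(p)),            # largest single-pixel change
        "l2": np.linalg.norm(p),              # Euclidean length
        "rms": np.sqrt(np.mean(p ** 2)),      # l2 / sqrt(n)
        "l0": int(np.sum(np.abs(p) > tol)),   # number of changed pixels
    }

# An FGSM-style perturbation: +/- 0.2 on 3 of 4 pixels
print(perturbation_norms([0.2, -0.2, 0.0, 0.2]))
```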

In [None]:
# Demonstrate the attack on our pattern image
print("🎨 Creating visual demonstration of adversarial attack...")

# Convert pattern image to tensor
demo_image = (
    torch.tensor(pattern_img_normalized, dtype=torch.float32)
    .unsqueeze(0)
    .unsqueeze(0)
    .to(device)
)
demo_image.requires_grad = True

# Apply attack
net.eval()
output = net(demo_image)
pred_class = output.argmax(dim=1)
loss = F.nll_loss(output, pred_class)

net.zero_grad()
loss.backward()

# Create attacked version with moderate epsilon
demo_attacked = fgsm_attack(demo_image, epsilon=0.2, data_grad=demo_image.grad.data)
demo_perturbation = (demo_attacked - demo_image).squeeze().cpu().detach().numpy()

# Convert back to numpy for visualization
demo_original = demo_image.squeeze().cpu().detach().numpy()
demo_attacked_np = demo_attacked.squeeze().cpu().detach().numpy()

print(f"📊 Original prediction: class {pred_class.item()}")

# Test attacked version
with torch.no_grad():
    attacked_output = net(demo_attacked)
    attacked_pred = attacked_output.argmax(dim=1)
    print(f"💥 Attacked prediction: class {attacked_pred.item()}")

# Visualize the results
visualize_attack_comparison(
    demo_original,
    demo_attacked_np,
    demo_perturbation,
    "FGSM Attack Visualization - Pattern Image",
)

In [None]:
def compare_epsilon_effects(original_img, epsilons=[0.1, 0.2, 0.3]):
    """
    Compare the effect of different epsilon values
    """
    fig, axes = plt.subplots(2, len(epsilons) + 1, figsize=(15, 8))
    fig.suptitle(
        "🔬 Effect of Different Epsilon Values", fontsize=16, fontweight="bold"
    )

    # Convert to tensor
    image_tensor = (
        torch.tensor(original_img, dtype=torch.float32)
        .unsqueeze(0)
        .unsqueeze(0)
        .to(device)
    )
    image_tensor.requires_grad = True

    # Get gradients
    net.eval()
    output = net(image_tensor)
    pred_class = output.argmax(dim=1)
    loss = F.nll_loss(output, pred_class)
    net.zero_grad()
    loss.backward()
    data_grad = image_tensor.grad.data

    # Original image
    axes[0, 0].imshow(original_img, cmap="gray", vmin=-1, vmax=1)
    axes[0, 0].set_title("Original", fontweight="bold")
    axes[0, 0].axis("off")
    axes[1, 0].axis("off")

    # Test different epsilon values
    results = []
    for i, eps in enumerate(epsilons):
        # Apply attack
        attacked = fgsm_attack(image_tensor, eps, data_grad)
        attacked_np = attacked.squeeze().cpu().detach().numpy()
        perturbation = (attacked - image_tensor).squeeze().cpu().detach().numpy()

        # Test prediction
        with torch.no_grad():
            attacked_output = net(attacked)
            attacked_pred = attacked_output.argmax(dim=1)
            confidence = torch.softmax(attacked_output, dim=1).max().item()

        results.append(
            {
                "epsilon": eps,
                "attacked": attacked_np,
                "perturbation": perturbation,
                "prediction": attacked_pred.item(),
                "confidence": confidence,
            }
        )

        # Plot attacked image
        axes[0, i + 1].imshow(attacked_np, cmap="gray", vmin=-1, vmax=1)
        axes[0, i + 1].set_title(
            f"ε = {eps}\nPred: {attacked_pred.item()}", fontweight="bold"
        )
        axes[0, i + 1].axis("off")

        # Plot perturbation
        axes[1, i + 1].imshow(perturbation * 10, cmap="RdBu", vmin=-1, vmax=1)
        axes[1, i + 1].set_title(
            f"Perturbation (10x)\nConf: {confidence:.2f}", fontsize=10
        )
        axes[1, i + 1].axis("off")

    plt.tight_layout()
    plt.show()

    return results


# Demonstrate epsilon comparison
print("🔬 Comparing different epsilon values...")
epsilon_results = compare_epsilon_effects(pattern_img_normalized, [0.1, 0.2, 0.3, 0.4])

## 9. 🎮 Interactive Exercises

Now that you understand the concepts and have seen the visualizations, let's practice with some hands-on exercises!


### 🎯 Exercise 3: Implement Your Own Attack Variant

**Challenge**: Modify the FGSM attack to use different strategies. Try implementing:

1. **Random Sign Attack**: Instead of using gradient signs, use random signs
2. **Targeted Attack**: Try to make the model predict a specific wrong class
3. **Iterative FGSM**: Apply FGSM multiple times with smaller epsilon values

Use the cell below to experiment!


In [None]:
# 🧪 Experiment playground - Try your own attack variations!


def random_sign_attack(image, epsilon):
    """
    Random sign attack - uses random directions instead of gradients
    """
    random_signs = torch.sign(torch.randn_like(image))
    perturbed = image + epsilon * random_signs
    return torch.clamp(perturbed, -1, 1)


def targeted_fgsm_attack(model, image, epsilon, target_class):
    """
    Targeted FGSM - tries to make the model predict `target_class`.
    Instead of maximizing the loss for the current prediction, we minimize
    the loss for the target class, so the gradient is computed here with
    the target label.
    """
    image = image.detach().clone().requires_grad_(True)
    target = torch.full(
        (image.size(0),), target_class, dtype=torch.long, device=image.device
    )
    loss = F.nll_loss(model(image), target)
    model.zero_grad()
    loss.backward()
    data_grad = image.grad

    # Same top-30% pixel selection as in fgsm_attack
    abs_data_grad = data_grad.abs()
    threshold = torch.quantile(abs_data_grad, 0.7)
    mask = (abs_data_grad >= threshold).float()
    masked_data_grad = data_grad * mask

    # Subtract (not add) the gradient sign: this decreases the target-class
    # loss, moving the prediction toward target_class
    perturbed_image = image - epsilon * masked_data_grad.sign()
    return torch.clamp(perturbed_image, -1, 1).detach()


def iterative_fgsm_attack(image, epsilon, model, num_steps=5):
    """
    Iterative FGSM (the Basic Iterative Method) - apply the attack in several
    small steps while staying within the epsilon budget around the original.
    """
    alpha = epsilon / num_steps  # Step size
    image = image.detach()  # Treat the original image as a constant
    perturbed = image.clone()

    # Attack the ORIGINAL prediction at every step; re-reading the prediction
    # each step would change the label being attacked once it flips
    with torch.no_grad():
        label = model(image).argmax(dim=1)

    for step in range(num_steps):
        perturbed.requires_grad = True
        output = model(perturbed)
        loss = F.nll_loss(output, label)

        model.zero_grad()
        loss.backward()

        # Take a small step in the direction of the gradient sign
        data_grad = perturbed.grad.data
        perturbed = perturbed + alpha * data_grad.sign()

        # Ensure we stay within the epsilon budget from the original
        perturbation = torch.clamp(perturbed - image, -epsilon, epsilon)
        perturbed = torch.clamp(image + perturbation, -1, 1)
        perturbed = perturbed.detach()

    return perturbed


print("🎮 Attack variants ready for experimentation!")
print(
    "Try calling: random_sign_attack(), targeted_fgsm_attack(), or iterative_fgsm_attack()"
)

### 🎯 Exercise 4: Compare Attack Methods

**Your Turn!** Use the code below to compare different attack methods on the same image:


In [None]:
# Compare different attack methods
def compare_attack_methods():
    """Compare FGSM, Random, and Iterative attacks"""
    test_img = (
        torch.tensor(pattern_img_normalized, dtype=torch.float32)
        .unsqueeze(0)
        .unsqueeze(0)
        .to(device)
    )
    test_img.requires_grad = True

    # Get original prediction and gradients
    net.eval()
    output = net(test_img)
    original_pred = output.argmax(dim=1).item()
    loss = F.nll_loss(output, output.argmax(dim=1))
    net.zero_grad()
    loss.backward()
    grad = test_img.grad.data

    epsilon = 0.2

    # Apply different attacks
    fgsm_result = fgsm_attack(test_img, epsilon, grad)
    random_result = random_sign_attack(test_img, epsilon)
    iterative_result = iterative_fgsm_attack(test_img, epsilon, net, num_steps=5)

    # Test predictions
    attacks = [
        ("Original", test_img, original_pred),
        ("FGSM", fgsm_result, None),
        ("Random", random_result, None),
        ("Iterative", iterative_result, None),
    ]

    # Get predictions for attacked images
    with torch.no_grad():
        for i in range(1, len(attacks)):
            name, img, _ = attacks[i]
            pred = net(img).argmax(dim=1).item()
            attacks[i] = (name, img, pred)

    # Visualize results
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    fig.suptitle("🔬 Comparison of Attack Methods", fontsize=16, fontweight="bold")

    for i, (name, img, pred) in enumerate(attacks):
        img_np = img.squeeze().cpu().detach().numpy()
        axes[i].imshow(img_np, cmap="gray", vmin=-1, vmax=1)
        axes[i].set_title(f"{name}\nPrediction: {pred}", fontweight="bold")
        axes[i].axis("off")

    plt.tight_layout()
    plt.show()

    # Print success rates
    print("\n📊 Attack Results:")
    for name, _, pred in attacks[1:]:  # Skip original
        success = "✅ Success" if pred != original_pred else "❌ Failed"
        print(f"  {name}: {success} (Original: {original_pred} → New: {pred})")


# Run the comparison
print("🎯 Comparing different attack methods...")
compare_attack_methods()

## 10. 🚀 Next Steps and Advanced Topics

Congratulations! 🎉 You've learned the fundamentals of adversarial attacks and seen them in action. Here's what you can explore next:


### Advanced Attack Methods:
- **PGD (Projected Gradient Descent)**: Iterative refinement of FGSM
- **C&W Attack**: Optimization-based approach with different loss functions  
- **DeepFool**: Minimal perturbation attacks
- **Adversarial Patches**: Physical-world attacks
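
PGD, as usually described (Madry et al., 2018), is iterative FGSM with a projection back onto the $\epsilon$-ball after every step, typically started from a random point inside the ball. A minimal sketch under those assumptions, for a model that returns log-probabilities and inputs in $[-1, 1]$ like the rest of this tutorial; `pgd_attack` and the toy model are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, image, label, epsilon=0.3, alpha=0.01, num_steps=40, random_start=True):
    """L-infinity PGD: signed-gradient ascent steps, each followed by projection
    onto the epsilon-ball around `image` and onto the valid range [-1, 1]."""
    image = image.detach()
    x_adv = image.clone()
    if random_start:
        # Start from a uniformly random point inside the epsilon-ball
        x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon), -1, 1)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.nll_loss(model(x_adv), label)        # always the ORIGINAL label
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                              # ascent step
            x_adv = image + torch.clamp(x_adv - image, -epsilon, epsilon)    # project onto the ball
            x_adv = torch.clamp(x_adv, -1, 1)                                # project onto valid pixels
    return x_adv.detach()

# Usage with a tiny untrained stand-in model
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1)).eval()
x = torch.rand(2, 1, 28, 28) * 2 - 1
y = toy_model(x).argmax(dim=1)
x_adv = pgd_attack(toy_model, x, y, epsilon=0.1, alpha=0.02, num_steps=10)
print((x_adv - x).abs().max().item())   # never exceeds epsilon
```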

### Defense Techniques:
- **Adversarial Training**: Train models on adversarial examples
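- **Gradient Masking**: Hide or obfuscate gradient information (unrelated to the pixel-selection "masking" in our `fgsm_attack`); often circumvented by attacks that approximate the gradients, so it can give a false sense of security  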
- **Input Preprocessing**: Denoise or transform inputs
- **Certified Defenses**: Provable robustness guarantees
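
Adversarial training, in its simplest form, trains on adversarial examples generated on the fly against the current model. A minimal sketch of one training step with FGSM examples, assuming a model that outputs log-probabilities and inputs in $[-1, 1]$; `adversarial_training_step`, the 50/50 clean/adversarial loss mix (similar to the weighting used by Goodfellow et al.), and the toy model are illustrative choices, not a prescribed recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    """One optimizer step on a 50/50 mix of clean and FGSM losses."""
    # 1. Craft FGSM examples against the current model, using the true labels y
    x_attack = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.nll_loss(model(x_attack), y), x_attack)
    x_adv = torch.clamp(x + epsilon * grad.sign(), -1, 1).detach()

    # 2. Update the model on clean + adversarial loss
    optimizer.zero_grad()
    loss = 0.5 * F.nll_loss(model(x), y) + 0.5 * F.nll_loss(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage on a random stand-in batch
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(8, 1, 28, 28) * 2 - 1
y = torch.randint(0, 10, (8,))
for step in range(3):
    print(f"step {step}: loss = {adversarial_training_step(model, optimizer, x, y):.4f}")
```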

### Useful Resources:
- 📚 [Adversarial Examples in the Physical World](https://arxiv.org/abs/1607.02533)
- 📚 [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572)
- 🛠️ [CleverHans Library](https://github.com/cleverhans-lab/cleverhans)
- 🛠️ [Adversarial Robustness Toolbox](https://github.com/Trusted-AI/adversarial-robustness-toolbox)


## 🎓 Summary and Key Takeaways

### What You've Learned:

1. **🔍 Adversarial Attacks Fundamentals**:
   - Small perturbations can fool neural networks
   - Attacks exploit the locally linear behavior of neural networks in high-dimensional input spaces
   - Balance between attack effectiveness and perturbation size

2. **🧮 FGSM (Fast Gradient Sign Method)**:
   - Uses the gradient sign to approximate, in a single step, the worst-case perturbation within an ε budget
   - Simple yet effective: `δ = ε × sign(∇_x L(θ, x, y))`
   - Can be enhanced with gradient masking and quantile thresholds

3. **💻 Implementation Skills**:
   - PyTorch gradient computation with `requires_grad=True`
   - Looping over a dataset image by image (batching is a natural extension)
   - Proper tensor manipulation and device handling

4. **🛡️ Advanced Techniques**:
   - Gradient-magnitude pixel masking for focused attacks
   - Iterative approaches for stronger attacks
   - Evaluation with SSIM, perturbation statistics, and prediction changes

### For the Problem Solution:
You now have all the knowledge to implement the `perturbe_dataset` function! The key components are:
- Use the `fgsm_attack` function with gradient masking
- Process each image individually with gradient computation
- Choose appropriate epsilon value (around 0.29)
- Ensure proper input/output format validation

**Good luck with your implementation!** 🚀
