# Robust Training with Jacobinet and Adversarial Attacks

This tutorial demonstrates the use of Jacobinet for robust training in neural networks. 
Jacobinet allows the backward pass of a neural network to be represented as a neural network with shared weights. 
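To make the idea concrete, here is a minimal NumPy sketch. It is not Jacobinet's implementation or API, and all function names are hypothetical. It uses a single dense layer followed by softmax cross-entropy. The gradient of the loss with respect to the input is computed by a second small "network" whose last linear layer reuses the forward weight matrix `W`, transposed. A finite-difference check confirms the result.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def cross_entropy(W, b, x, y):
    """Forward pass: dense layer z = W x + b, then categorical cross-entropy."""
    p = softmax(W @ x + b)
    return -np.sum(y * np.log(p))


def backward_network(W, b, x, y):
    """Gradient of the loss w.r.t. the input x, written as a second small network.

    It recomputes the logits, applies the non-trainable map z -> softmax(z) - y
    (the derivative of cross-entropy w.r.t. the logits), and ends with a linear
    layer whose weight matrix is W.T: the forward weights, shared and transposed.
    """
    delta = softmax(W @ x + b) - y  # dL/dz
    return W.T @ delta  # dL/dx


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x0 = rng.normal(size=5)  # one input vector
y0 = np.eye(3)[1]  # its one-hot label

grad = backward_network(W, b, x0, y0)

# Independent check with central finite differences
h = 1e-6
numeric = np.array(
    [
        (cross_entropy(W, b, x0 + h * e, y0) - cross_entropy(W, b, x0 - h * e, y0)) / (2 * h)
        for e in np.eye(5)
    ]
)
print(np.allclose(grad, numeric, atol=1e-6))  # True
```

Because the "backward network" only reuses `W` and `b`, it can be built and differentiated like any other model. This is what makes gradient-based attacks expressible as ordinary network layers.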

### Goals of this Tutorial 🎯

* Understand adversarial attacks (FGSM, PGD) and their impact on a standard neural network.
* Implement robust adversarial training using Jacobinet to generate adversarial examples during the training loop.
* Evaluate and compare the robustness of a baseline model and a robustly trained model using the **AutoAttack** benchmark.

We will:
1. Train a baseline model and evaluate its adversarial robustness.
2. Train a robust model with adversarial regularization using Jacobinet.
3. Compare adversarial success rates for both models.


- If you run this notebook on Colab, the next cell installs *jacobinet* and the extra libraries used here.
- If you run this notebook locally, do it inside the environment in which you [installed *jacobinet*](https://ducoffeM.github.io/jacobinet/main/install.html).

In [None]:
# On Colab: install the library
on_colab = "google.colab" in str(get_ipython())
if on_colab:
    import sys  # noqa: avoid having this import removed by pycln

    # install dev version for dev doc, or release version for release doc
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install git+https://github.com/ducoffeM/jacobinet@main#egg=jacobinet
    # install desired backend (by default torch)
    !{sys.executable} -m pip install "torch"
    !{sys.executable} -m pip install "keras"

    # extra libraries used in this notebook
    !{sys.executable} -m pip install "torchattacks"
    !{sys.executable} -m pip install "numpy"
    !{sys.executable} -m pip install "matplotlib"

In [None]:
# Set these environment variables *before* importing torch or keras, otherwise they have no effect.
# Ideally, we'd only enable the MPS fallback if torch.backends.mps.is_available() is True,
# but checking that requires importing torch first, which would make this setting too late.
# So we preemptively enable the MPS fallback just in case MPS is available.
import os

os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
# torchattacks works on torch modules, so Keras must run on the torch backend.
os.environ["KERAS_BACKEND"] = "torch"

## Load and Preprocess Data

We will use the MNIST dataset for this tutorial. Pixel values are scaled to the [0, 1] range, images are reshaped to the channels-first layout `(1, 28, 28)` expected by the convolutional model (and by torchattacks), and labels are one-hot encoded.


In [None]:
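import keras
import matplotlib.pyplot as plt
import numpy as np
import torch

# Images are stored channels-first (N, 1, 28, 28), as torchattacks expects,
# so Keras layers must use the same layout before the model is built.
keras.config.set_image_data_format("channels_first")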

# Load the MNIST data and split it into training and testing sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale the images to the [0, 1] range
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Reshape images to have an additional channel dimension (1, 28, 28)
x_train = np.expand_dims(x_train, 1)
x_test = np.expand_dims(x_test, 1)

# Convert class labels to one-hot encoded vectors
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

## Define and Train the Baseline Model

We will build a simple Convolutional Neural Network (CNN) using Keras to serve as the baseline model. 
This model will be trained on MNIST and evaluated for accuracy on clean data.


In [None]:
from keras import Sequential, layers

# Define the model architecture. The model outputs logits (no softmax),
# which is what the adversarial attacks and Jacobinet operate on.
model = Sequential(
    [
        layers.Input(shape=(1, 28, 28)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="linear"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(10),
    ]
)
model.summary()

# Training wrapper: the same layers (hence shared weights) followed by a softmax.
# Training train_model therefore also trains model.
train_model = Sequential(model.layers + [layers.Activation("softmax")])

train_model.compile(
    loss=keras.losses.CategoricalCrossentropy(from_logits=False),
    optimizer="adam",
    metrics=["accuracy"],
)

# model outputs logits, so its loss must be computed with from_logits=True.
model.compile(
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer="adam",
    metrics=["accuracy"],
)

train_model.fit(x_train, y_train, batch_size=128, epochs=2, validation_split=0.1)

## Evaluate Robustness of Baseline Model

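We use AutoAttack, a strong ensemble of parameter-free adversarial attacks, to test the baseline model's robustness.
For each attack radius `epsilon` (the maximum L∞ perturbation allowed per pixel), AutoAttack searches for adversarial examples, and we report the **attack success rate**: the percentage of attacked test images that the model misclassifies.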
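The metric itself is simple. Below is a minimal NumPy sketch with hypothetical helper names, not part of torchattacks or Jacobinet. It computes the success rate from predicted and true labels and applies the running maximum over increasing `epsilon` that the evaluation loop uses. The running maximum is justified because any perturbation allowed at a small radius is also allowed at a larger one.

```python
import numpy as np


def attack_success_rate(pred_labels, true_labels):
    """Percentage of adversarial inputs that the model misclassifies."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return float(100.0 * np.mean(pred_labels != true_labels))


def monotone_envelope(rates):
    """Running maximum of success rates ordered by increasing epsilon.

    The true success rate cannot decrease with epsilon (the L-inf ball only
    grows), so this corrects runs where the attack happened to find fewer
    adversarial examples at a larger radius.
    """
    return np.maximum.accumulate(np.asarray(rates, dtype=float)).tolist()


labels = np.array([3, 1, 4, 1, 5])
print(attack_success_rate([3, 7, 4, 1, 0], labels))  # 40.0
print(monotone_envelope([10.0, 30.0, 25.0, 60.0]))  # [10.0, 30.0, 30.0, 60.0]
```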


In [None]:
import torch
import torchattacks

# Evaluate on a small random subset of the test set (AutoAttack is expensive)
n = 20
random_index = np.random.permutation(len(x_test))[:n]
x_eval = torch.Tensor(x_test[random_index])
y_eval = y_test[random_index].argmax(-1)

adv_acc = []  # attack success rate (%) for each epsilon
eps_values = [np.round(eps_i, 2) for eps_i in np.linspace(0.01, 0.2, 10)]
for eps in eps_values:
    auto_attack = torchattacks.attacks.autoattack.AutoAttack(model, eps=eps)
    adv_data = auto_attack(x_eval, torch.tensor(y_eval))
    # Percentage of adversarial examples that are misclassified
    acc = np.mean(model.predict(adv_data, verbose=0).argmax(-1) != y_eval) * 100
    print(eps, acc)
    # A perturbation allowed at a small epsilon is also allowed at a larger one,
    # so keep the success-rate curve non-decreasing.
    if len(adv_acc):
        adv_acc.append(max(adv_acc[-1], acc))
    else:
        adv_acc.append(acc)

plt.plot(eps_values, adv_acc)
plt.title("Adversarial success rate vs. epsilon (baseline training)")
plt.xlabel("Epsilon (attack radius)")
plt.ylabel("Adversarial success rate (%)")

The plot above shows that, on the attacked test images, the attack success rate quickly climbs to 100% as the attack radius (`epsilon`) increases.
This confirms that our baseline model is highly vulnerable to adversarial attacks.

## Robust Training with Jacobinet

To improve robustness, we train the classifier on adversarial examples generated on the fly.
`Jacobinet` builds a Keras model that produces adversarial examples with Projected Gradient Descent (PGD); we place it in front of the classifier so that every training batch is attacked before the loss is computed.
Jacobinet makes this possible by representing the backward pass of a neural network, which gradient-based attacks need, as a neural network itself. 🤖
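For intuition, here is a minimal NumPy sketch of L∞ PGD against a linear softmax classifier. It reuses the hyperparameter values passed to `get_adv_model` later in this notebook (`epsilon=0.2`, `alpha=0.02`, `n_iter=20`). It is an assumption-laden illustration, not Jacobinet's internals: it uses signed gradient steps, no random start, and clipping to [0, 1], and the toy weights are random. Each iteration takes a signed ascent step on the loss, projects back onto the `epsilon`-ball around the clean input, and clips to valid pixel values.

```python
import numpy as np


def log_softmax(z):
    """Row-wise numerically stable log-softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def ce_loss(W, b, x, y):
    """Per-sample cross-entropy of the linear classifier logits = x @ W.T + b."""
    return -np.sum(y * log_softmax(x @ W.T + b), axis=-1)


def input_gradient(W, b, x, y):
    """d(loss)/dx for each row of x: (softmax(logits) - y) @ W."""
    return (np.exp(log_softmax(x @ W.T + b)) - y) @ W


def pgd_linf(W, b, x, y, epsilon=0.2, alpha=0.02, n_iter=20):
    """L-infinity PGD without random start: signed ascent steps + projection."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * np.sign(input_gradient(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project onto the epsilon-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)  # keep valid pixel values
    return x_adv


rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))  # toy stand-in for a trained classifier
b = np.zeros(10)
x_batch = rng.uniform(size=(4, 784))  # four flattened "images" in [0, 1]
y_batch = np.eye(10)[[0, 3, 5, 9]]  # one-hot labels

x_adv = pgd_linf(W, b, x_batch, y_batch)
print(np.abs(x_adv - x_batch).max() <= 0.2 + 1e-9)  # True: stays inside the ball
print(ce_loss(W, b, x_adv, y_batch).mean() > ce_loss(W, b, x_batch, y_batch).mean())  # True
```

For a deep network, `input_gradient` is the backward pass. Jacobinet's contribution is to express that backward pass as Keras layers, so the whole loop can live inside a model.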

---

### Core Concepts

* **Adversarial Attacks**: These are techniques used to create slightly perturbed inputs (adversarial examples) that are designed to cause a machine learning model to make a mistake. Common attacks include the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD).
* **Adversarial Robustness**: A model is considered robust if it can maintain high accuracy even when evaluated on adversarial examples.
* **Adversarial Training**: A method to improve a model's robustness by training it on a mix of clean and adversarial examples (or, as in this notebook, on adversarial examples only). This forces the model to learn features that are less sensitive to small input perturbations.
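Adversarial training is usually written as the min-max problem $\min_\theta \mathbb{E}_{(x,y)}\big[\max_{\|\delta\|_\infty \le \epsilon} L(f_\theta(x+\delta), y)\big]$. The inner maximization crafts an attack against the current weights, and the outer minimization updates the weights on the attacked inputs. The sketch below is a toy illustration under stated assumptions, not the procedure used by Jacobinet. It uses a linear softmax model, two synthetic Gaussian blobs, and a single **FGSM** step for the inner maximization (swapped in for PGD to keep it short). The adversarial inputs are treated as constants when differentiating with respect to the weights.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def fgsm(W, b, x, y, epsilon):
    """Inner maximization, approximated by one signed-gradient (FGSM) step."""
    grad_x = (np.exp(log_softmax(x @ W.T + b)) - y) @ W
    return x + epsilon * np.sign(grad_x)


def mean_ce(W, b, x, y):
    return -np.mean(np.sum(y * log_softmax(x @ W.T + b), axis=-1))


def adversarial_training(x, y, epsilon=0.3, lr=0.1, epochs=200):
    """Outer minimization: gradient descent on the loss of FGSM examples."""
    W = np.zeros((y.shape[1], x.shape[1]))
    b = np.zeros(y.shape[1])
    for _ in range(epochs):
        x_adv = fgsm(W, b, x, y, epsilon)  # attack the current model
        delta = (np.exp(log_softmax(x_adv @ W.T + b)) - y) / len(x)  # dLoss/dlogits
        W -= lr * delta.T @ x_adv  # x_adv is treated as a constant
        b -= lr * delta.sum(axis=0)
    return W, b


# Toy data: two well-separated Gaussian blobs, one per class
rng = np.random.default_rng(0)
x_toy = np.concatenate([rng.normal(-2.0, 0.5, size=(100, 2)), rng.normal(2.0, 0.5, size=(100, 2))])
y_toy = np.eye(2)[np.repeat([0, 1], 100)]

W, b = adversarial_training(x_toy, y_toy)
x_attack = fgsm(W, b, x_toy, y_toy, 0.3)
adv_accuracy = np.mean((x_attack @ W.T + b).argmax(-1) == y_toy.argmax(-1))
print(mean_ce(W, b, x_attack, y_toy), adv_accuracy)
```

In the notebook, `model_adv.fit(...)` plays the role of this loop: the PGD model performs the inner maximization inside the forward pass, and Keras performs the outer minimization.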



In [None]:
from jacobinet.attacks import get_adv_model

In [None]:
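# Keras model mapping (image, label) to PGD adversarial examples for `model`:
# attack radius epsilon=0.2, n_iter=20 steps of size alpha=0.02.
pgd_model = get_adv_model(
    model, loss="categorical_crossentropy", epsilon=0.2, attack="pgd", n_iter=20, alpha=0.02
)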

In [None]:
# Symbolic inputs of the adversarial training model: an image and its one-hot label
x = layers.Input(shape=(1, 28, 28))
y = layers.Input((10,))

In [None]:
from IPython.display import HTML, display

# Visualize the PGD model (keras.utils.plot_model requires pydot and graphviz)
dot_img_file = "./pgd_model.png"
keras.utils.plot_model(pgd_model, to_file=dot_img_file, show_shapes=True, show_layer_names=True)

display(
    HTML('<div style="text-align: center;"><img src="{}" width="400"/></div>'.format(dot_img_file))
)

In [None]:
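# Adversarial training model: attack each batch with PGD, then classify the
# adversarial images. `model` outputs logits, hence from_logits=True.
model_adv = keras.models.Model([x, y], model(pgd_model([x, y])))
model_adv.compile(
    "adam",
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)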

In [None]:
# Visualize the full adversarial training model
dot_img_file = "./model_adv.png"
keras.utils.plot_model(model_adv, to_file=dot_img_file, show_shapes=True, show_layer_names=True)

display(
    HTML('<div style="text-align: center;"><img src="{}" width="400"/></div>'.format(dot_img_file))
)

In [None]:
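# model_adv shares its weights with `model`, so this fine-tunes the baseline
# classifier on adversarial examples. Labels are passed as an input (needed by
# the attack) and as the training target.
model_adv.fit(
    [x_train, y_train],
    y_train,
    batch_size=64,
    epochs=4,
    validation_split=0.1,
    shuffle=True,
)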

In [None]:
# Clean test accuracy of the (now adversarially trained) classifier
model.evaluate(x_test, y_test)

In [None]:
# Test accuracy on the PGD adversarial examples (epsilon=0.2) generated by pgd_model
model_adv.evaluate([x_test, y_test], y_test)

In [None]:
import torch
import torchattacks

# Keep the baseline results, then re-run the same AutoAttack evaluation on the same
# test images. `model` now holds the adversarially trained weights, since model_adv
# shares its layers.
adv_acc_baseline = adv_acc
adv_acc = []
eps_values = [np.round(eps_i, 2) for eps_i in np.linspace(0.01, 0.2, 10)]
for eps in eps_values:
    auto_attack = torchattacks.attacks.autoattack.AutoAttack(model, eps=eps)
    # Alternative: a single PGD attack, e.g.
    # auto_attack = torchattacks.attacks.pgd.PGD(model, eps=0.15, alpha=2/255., steps=20, random_start=False)

    adv_data = auto_attack(
        torch.Tensor(x_test[random_index]), torch.tensor(y_test[random_index].argmax(-1))
    )
    # Percentage of adversarial examples that are misclassified
    acc = (
        np.mean(model.predict(adv_data, verbose=0).argmax(-1) != y_test[random_index].argmax(-1))
        * 100
    )
    print(eps, acc)
    # Keep the success-rate curve non-decreasing in epsilon
    if len(adv_acc):
        adv_acc.append(max(adv_acc[-1], acc))
    else:
        adv_acc.append(acc)

In [None]:
plt.plot(eps_values, adv_acc_baseline)
plt.plot(eps_values, adv_acc)
plt.title("Adversarial success rate vs. epsilon: baseline vs. robust training")
plt.xlabel("Epsilon (attack radius)")
plt.ylabel("Adversarial success rate (%)")
plt.legend(["baseline training", "adversarial training (Jacobinet PGD)"])

## Conclusion
In this tutorial, we demonstrated the effectiveness of adversarial training for improving model robustness. Key takeaways include:

1. Baseline models are highly vulnerable to adversarial examples, with attack success rates reaching 100% even for small perturbations.

2. Adversarial training significantly improves robustness. By training the model to correctly classify adversarial examples, we made it much more resistant to attacks.

3. Jacobinet simplifies robust training. Because it provides a Keras model that encapsulates attack generation, adversarial training reduces to a standard Keras `fit()` call, without hand-written backward passes or gradient manipulation.

Jacobinet's ability to represent the backward pass as a neural network opens up exciting possibilities for research in robustness, explainability, and the broader field of adversarial machine learning.