<a href="https://colab.research.google.com/github/jonasrauber/foolbox-native-tutorial/blob/master/foolbox-native-tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Activate GPU

1.   Runtime menu
2.   Change runtime type
3.   Hardware accelerator -> GPU

## Get Foolbox

In [1]:
!pip3 install foolbox==3.1.1
# !pip3 install git+https://github.com/bethgelab/foolbox.git

Collecting foolbox==3.1.1
  Obtaining dependency information for foolbox==3.1.1 from https://files.pythonhosted.org/packages/a0/78/f4f8d2654893b1382e6e7dc10e06c9ebaca9cf0fc22584f6ecc324140f0b/foolbox-3.1.1-py3-none-any.whl.metadata
  Downloading foolbox-3.1.1-py3-none-any.whl.metadata (6.1 kB)
Collecting eagerpy==0.29.0 (from foolbox==3.1.1)
  Obtaining dependency information for eagerpy==0.29.0 from https://files.pythonhosted.org/packages/e1/07/54994565da4fc5a4840d3a434fb9bf3835b4a4e68c931ccfcc327d568f95/eagerpy-0.29.0-py3-none-any.whl.metadata
  Downloading eagerpy-0.29.0-py3-none-any.whl.metadata (5.5 kB)
Downloading foolbox-3.1.1-py3-none-any.whl (1.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 4.8 MB/s eta 0:00:00
Downloading eagerpy-0.29.0-py3-none-any.whl (30 kB)
Installing collected packages: eagerpy, foolbox
  Attempting uninstall: eagerpy
    Found existing installation: eagerpy 0.30.0
    Uninsta

In [2]:
import foolbox as fb

## Get a model

Get a pretrained PyTorch or TensorFlow model, e.g. `torchvision.models.resnet18` or `tf.keras.applications.ResNet50`.

#### PyTorch

In [3]:
import torch
import torchvision

In [4]:
torch.__version__

'2.1.0'

In [5]:
torch.cuda.is_available()

True

In [6]:
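# `weights=` replaces the deprecated `pretrained=True` (torchvision >= 0.13)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)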

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.7M/44.7M [00:12<00:00, 3.82MB/s]


In [7]:
model = model.eval()

## Turn your PyTorch / TensorFlow model into a Foolbox model

Don't forget to specify the correct bounds and preprocessing!

#### PyTorch Solution

In [15]:
# PyTorch ResNet18
preprocessing = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], axis=-3)
bounds = (0, 1)
fmodel = fb.PyTorchModel(model, bounds=bounds, preprocessing=preprocessing)

ValueError: expected model to be a torch.nn.Module instance
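To make the `preprocessing` argument more concrete, here is a minimal plain-Python sketch of per-channel normalization on a single channels-first image. `axis=-3` identifies the channel axis, which sits third from the end in an `(N, C, H, W)` batch. The helper `normalize_channels` and the toy image are hypothetical and only illustrate the arithmetic; they are not part of Foolbox.

```python
# Hypothetical helper, not part of Foolbox: per-channel normalization
# of one channels-first image stored as image[channel][row][col].
MEAN = [0.485, 0.456, 0.406]  # channel means from the preprocessing dict
STD = [0.229, 0.224, 0.225]   # channel standard deviations from the same dict


def normalize_channels(image, mean=MEAN, std=STD):
    """Return (image - mean) / std, applied channel by channel."""
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A toy 3-channel, 1x2 image with values inside the (0, 1) bounds.
toy_image = [[[0.485, 1.0]], [[0.456, 0.0]], [[0.406, 0.5]]]
toy_normalized = normalize_channels(toy_image)
```

A pixel equal to its channel mean maps to zero, which is a quick sanity check when the mean and std values are in doubt.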

## Transform bounds

In the following, we want to work with a model that has `(0, 1)` bounds. Use `fmodel.transform_bounds`.

In [None]:
fmodel = fmodel.transform_bounds((0, 1))

In [None]:
assert fmodel.bounds == (0, 1)
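Conceptually, changing bounds amounts to an affine rescaling of the inputs. The sketch below shows that mapping in plain Python. `rescale` is a hypothetical helper for illustration only; how Foolbox implements `transform_bounds` internally is not shown here.

```python
def rescale(x, old_bounds, new_bounds):
    """Map x linearly from old_bounds to new_bounds."""
    old_min, old_max = old_bounds
    new_min, new_max = new_bounds
    return (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Toy pixel values given in (0, 255), expressed in (0, 1) instead.
toy_pixels_255 = [0.0, 127.5, 255.0]
toy_pixels_01 = [rescale(p, (0, 255), (0, 1)) for p in toy_pixels_255]
```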

## Get some test images

Get a batch of 16 images and the corresponding labels. You can use `foolbox.utils.samples` to get up to 20 images, but you can also use your own data loader.

In [None]:
images, labels = fb.utils.samples(fmodel, dataset='imagenet', batchsize=16)

## Check the accuracy of your model to make sure you specified the correct preprocessing

In [None]:
fb.utils.accuracy(fmodel, images, labels)
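A low value here usually points to wrong bounds or preprocessing. As a rough sketch of what an accuracy check computes, the plain-Python example below compares the highest-scoring class of each sample with its label. The function `toy_accuracy` and the toy logits are hypothetical stand-ins, not the Foolbox implementation.

```python
def toy_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Four toy samples with three classes each.
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 3.0, 0.5], [1.0, 0.9, 0.8]]
toy_labels = [0, 2, 1, 2]
toy_clean_accuracy = toy_accuracy(toy_logits, toy_labels)
```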

In [None]:
type(images), images.shape

In [None]:
type(labels), labels.shape

## Run LinfDeepFool

In [None]:
attack = fb.attacks.LinfDeepFoolAttack()

In [None]:
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=0.03)

In [None]:
is_adv
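The call returns the raw adversarial examples, versions clipped to the requested `epsilons`, and one boolean per sample telling whether the clipped version is adversarial. For an Linf attack, clipping can be pictured as limiting every coordinate of the perturbation to `[-epsilon, epsilon]`. The sketch below shows that idea in plain Python; `clip_linf` and the toy values are hypothetical and are not Foolbox code.

```python
def clip_linf(original, perturbed, epsilon):
    """Limit each coordinate of the perturbation to [-epsilon, epsilon]."""
    return [
        x + max(-epsilon, min(epsilon, x_adv - x))
        for x, x_adv in zip(original, perturbed)
    ]


# One flattened toy "image" and an unclipped adversarial candidate.
toy_original = [0.20, 0.50, 0.80]
toy_raw = [0.30, 0.49, 0.70]
toy_clipped = clip_linf(toy_original, toy_raw, epsilon=0.03)
```

Coordinates that already lie within the budget, such as the middle one here, are left unchanged.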

## Use EagerPy tensors and rerun the attack

In [None]:
import eagerpy as ep

In [None]:
images = ep.astensor(images)
labels = ep.astensor(labels)

In [None]:
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=0.03)

In [None]:
is_adv

In [None]:
is_adv.float32().mean().item()

## Using the Misclassification criterion explicitly

In [None]:
criterion = fb.criteria.Misclassification(labels)

In [None]:
raw, clipped, is_adv = attack(fmodel, images, criterion, epsilons=0.03)

In [None]:
is_adv

## Run the attack using many epsilons

In [None]:
import numpy as np

In [None]:
epsilons = np.linspace(0.0, 0.005, num=20)

In [None]:
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=epsilons)

In [None]:
is_adv.shape

In [None]:
is_adv.float32().mean(axis=-1)

In [None]:
robust_accuracy = 1 - is_adv.float32().mean(axis=-1)

In [None]:
robust_accuracy
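With several epsilons, `is_adv` has one row per epsilon and one column per sample, so averaging over the last axis gives one success rate per epsilon. The plain-Python sketch below reproduces that reduction on hypothetical toy data; `robust_accuracy_per_epsilon` is an illustrative name, not a Foolbox function.

```python
def robust_accuracy_per_epsilon(is_adv_rows):
    """One value per epsilon: the fraction of samples the attack did not break."""
    return [1 - sum(row) / len(row) for row in is_adv_rows]


# Toy results for three epsilons (rows) and four samples (columns).
toy_is_adv = [
    [False, False, False, False],  # smallest epsilon: no sample is adversarial
    [True, False, False, True],    # medium epsilon: half of the samples
    [True, True, True, True],      # largest epsilon: every sample
]
toy_robust_accuracy = robust_accuracy_per_epsilon(toy_is_adv)
```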

## Plot the robust accuracy as a function of epsilon

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(epsilons, robust_accuracy.numpy())

We can see that **the model is not robust** at all. Even extremely small perturbations (an Linf norm of 0.003 for pixel values between 0 and 1) are sufficient to change the classification.
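Another way to read the same results is per sample: for each image, which is the smallest epsilon on the grid at which the attack succeeded? The sketch below computes this in plain Python on hypothetical toy data; `smallest_successful_epsilon` is an illustrative helper and not part of Foolbox.

```python
def smallest_successful_epsilon(eps_grid, is_adv_rows):
    """For each sample, the smallest grid epsilon at which the attack succeeded."""
    num_samples = len(is_adv_rows[0])
    result = []
    for i in range(num_samples):
        hits = [eps for eps, row in zip(eps_grid, is_adv_rows) if row[i]]
        # None marks samples that stayed robust on the whole grid.
        result.append(min(hits) if hits else None)
    return result


toy_eps_grid = [0.0, 0.001, 0.002]
toy_grid_is_adv = [
    [False, False, False],
    [True, False, False],
    [True, True, False],
]
toy_first_eps = smallest_successful_epsilon(toy_eps_grid, toy_grid_is_adv)
```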

## Run a targeted attack

In [None]:
labels

In [None]:
target_classes = (labels + 200) % 1000

In [None]:
target_classes

In [None]:
criterion = fb.criteria.TargetedMisclassification(target_classes)
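Shifting each label by 200 modulo 1000 yields a target class that is always different from the true label and still a valid ImageNet class index. A targeted attack then only counts as successful when the prediction equals that target, which is stricter than merely being wrong. The plain-Python sketch below illustrates both points; `shift_labels`, `reached_target`, and the toy predictions are hypothetical.

```python
NUM_CLASSES = 1000  # ImageNet has 1000 classes


def shift_labels(labels, offset=200, num_classes=NUM_CLASSES):
    """Pick a target per sample that differs from its label (offset % num_classes != 0)."""
    return [(label + offset) % num_classes for label in labels]


def reached_target(predicted_classes, targets):
    """Targeted success: the prediction must equal the chosen target class."""
    return [p == t for p, t in zip(predicted_classes, targets)]


toy_true_labels = [0, 243, 800, 999]
toy_targets = shift_labels(toy_true_labels)
# Hypothetical predictions after an attack; only the first two hit their target.
toy_predictions = [200, 443, 800, 5]
toy_success = reached_target(toy_predictions, toy_targets)
```

The last toy sample is misclassified (5 instead of 999) but still does not count, because its target was 199.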

In [None]:
attack = fb.attacks.L2CarliniWagnerAttack(steps=100)
# Note: 100 steps are too few, so the perturbations found will be relatively large; 1000 steps give better results but take much longer.

In [None]:
# epsilons = np.linspace(0.0, 10.0, num=20)
epsilons = None

In [None]:
advs, _, is_adv = attack(fmodel, images, criterion, epsilons=epsilons)

In [None]:
is_adv

In [None]:
fb.distances.l2(images, advs)
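The L2 distance is computed separately for each sample, as the Euclidean norm of the difference between the clean and the adversarial input. A minimal plain-Python sketch of that computation on flattened toy inputs follows; `l2_per_sample` and the toy data are hypothetical and stand in for the tensor version.

```python
import math


def l2_per_sample(references, perturbed):
    """Euclidean norm of the difference, computed separately for each sample."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))
        for x, x_adv in zip(references, perturbed)
    ]


# Two flattened toy inputs; the second one is left unperturbed.
toy_clean = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
toy_perturbed = [[0.3, 0.4, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
toy_distances = l2_per_sample(toy_clean, toy_perturbed)
```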

In [None]:
# attack_success_rate = is_adv.float32().mean(axis=-1)

In [None]:
# plt.plot(epsilons, attack_success_rate.numpy())

## Visualizing adversarial examples and perturbations

In [None]:
fb.plot.images(images)

In [None]:
fb.plot.images(advs)

In [None]:
fb.plot.images(advs - images, n=4, bounds=(-0.1, 0.1), scale=4.)

The adversarial examples look like the original (clean) images. That shows that **the model is not robust against adversarial attacks**. Tiny perturbations mislead the model and allow the attacker to control which class is recognized.

## Continuing from here ...



*   Repeating an attack (`attack = attack.repeat(3)`)
*   Getting the per-sample worst-case over multiple attacks
    * stack attack results and take max over the attacks before taking the mean over samples
*   Gradient estimators (`fb.gradient_estimators.*`)
*   Transfer attacks using gradient substitution (see examples)
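The per-sample worst case mentioned in the list can be sketched in plain Python: a sample counts as robust only if every attack failed on it, so the results are combined across attacks first and averaged over samples afterwards. The function name and the toy results below are hypothetical.

```python
def worst_case_robust_accuracy(is_adv_per_attack):
    """A sample counts as robust only if every attack failed on it."""
    # Transpose so that each entry groups the outcomes of all attacks for one sample.
    per_sample = zip(*is_adv_per_attack)
    broken = [any(outcomes) for outcomes in per_sample]
    return 1 - sum(broken) / len(broken)


# Toy outcomes of three attacks (rows) on four samples (columns).
toy_attack_results = [
    [True, False, False, False],  # attack A
    [False, True, False, False],  # attack B
    [True, False, True, False],   # attack C
]
toy_worst_case = worst_case_robust_accuracy(toy_attack_results)
```

In this toy data each attack alone leaves a robust accuracy of 0.75, 0.75, and 0.5, while the combined worst case is lower than any of them.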

