# Carlini & Wagner (C&W) L_inf - Adversarial Attacks
### Paper link : https://arxiv.org/abs/1608.04644

## Objectives & Context

Attacks we have studied so far, such as **FGSM, PGD, Auto-PGD, DeepFool, and NewtonFool**, all craft adversarial examples while keeping the **perturbation** added to the input images **small**, either bounded by a fixed budget or minimized directly.

We also explored attacks like **Brendel & Bethge**, which allow us to further **refine perturbations** on already misclassified images (e.g., starting from a PGD attack).

This process typically involves two steps:

1. **Attack with PGD** (to misclassify the image)
2. **Minimize the perturbation** using Brendel & Bethge

**Carlini & Wagner (C&W) performs both of those steps in one single unified process!**

It is a **powerful optimization-based attack** that:
* Misleads the model (as all attacks do)
* Minimizes the perturbation using a **direct optimization objective** (usually with the **$L_2$ norm**)
* Preserves **high confidence** in the misclassification
* Can **evade detection** by defenses based on softmax confidence, entropy, or visible perturbation patterns.

---

### Aren't $L_2$ or $L_∞$ used in other attacks?

Yes! Using the **$L_2$, $L_∞$ or $L_1$** norm is not new:
* PGD often uses **$L_∞$**
* DeepFool focuses on **$L_2$**
* Many others support **multiple norms**

**BUT** Carlini & Wagner **go far beyond just picking a norm!**

C&W **reformulates the entire attack** as a **global optimization problem**.

It invents a **custom loss function** and uses an internal optimizer (e.g., Adam) to find the **minimal perturbation required** to fool the model.

---

Unlike FGSM, PGD, or BIM that require a fixed **epsilon** (maximum perturbation), **C&W automatically finds the smallest possible perturbation** needed to achieve its objective.

---

## Loss Function Used in C&W Attack

Carlini & Wagner reformulate the attack as an optimization problem:

$
\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta)
$

Where:
- $x$ = original image  
- $\delta$ = perturbation (the variable being optimized)  
- $x + \delta$ = adversarial image  
- $\|\delta\|_p$ = perturbation size ($L_2$ norm is most common)  
- $f(x + \delta)$ = confidence-based loss to **ensure the image is misclassified**  
- $c$ = regularization coefficient to **balance between perturbation size and model fooling**

---

## Inner Function: $f(x')$

The function $f(x')$ determines **how far the model is from misclassifying the image**.  
For a **targeted attack**, the typical formulation is:

$
f(x') = \max\left(\max_{i \neq t} \left\{ Z_i(x') \right\} - Z_t(x'), -\kappa \right)
$

Where:
- $Z_i(x')$ are the **logits** (pre-softmax output)  
- $t$ is the **target class**  
- $\kappa$ is a **confidence margin**: higher values force the target logit to exceed all other logits by a larger margin, which gives a more confident misclassification

---


### Intuition

- We want the **logit of the target class $Z_t$** to be higher than **all other logits**.
- If that's not the case → $f(x') > 0$ → the optimizer keeps adjusting.
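- Once the adversarial image **fools the model confidently** (the target logit leads by at least $\kappa$) → $f(x')$ reaches its floor $-\kappa$ (which is $0$ when $\kappa = 0$) → this term stops changing and the optimizer **focuses only on reducing the perturbation**.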

---

## Example

Imagine we want to force an image to be classified as **"dog"**.

The model outputs these logits for the adversarial image $x'$:

* $Z_{cat}(x')$ = 12.0
* $Z_{dog}(x')$ = 8.0 <- (target)
* $Z_{bird}(x')$ = 5.0

Then:

$
f(x') = \max(12 - 8, 5 - 8, -\kappa) = \max(4, -3, -\kappa) = 4 \quad (\text{for any } \kappa \geq 0)
$

So long as **another class logit is higher than the target**, $f(x') > 0$, and the optimizer keeps adjusting $\delta$.

Once **Z_dog** becomes the highest, $f(x') \leq 0$, and the attack focuses only on shrinking the perturbation.
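
The example above can be checked numerically. The snippet below is a minimal NumPy sketch of the targeted $f(x')$ term for a single logit vector; the function name `cw_targeted_margin` and the second logit vector are illustrative assumptions, not part of the paper or of ART.

```python
import numpy as np

def cw_targeted_margin(logits, target, kappa=0.0):
    """f(x') = max(max_{i != t} Z_i(x') - Z_t(x'), -kappa) for one logit vector."""
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, target)             # every logit except the target one
    return float(max(others.max() - logits[target], -kappa))

# Logits from the example, ordered [cat, dog, bird]; "dog" (index 1) is the target.
print(cw_targeted_margin([12.0, 8.0, 5.0], target=1))             # 4.0: "cat" still wins

# Hypothetical later iterate where "dog" leads by 3, with a margin kappa = 2:
print(cw_targeted_margin([9.0, 12.0, 5.0], target=1, kappa=2.0))  # -2.0: floor reached
```

In the second call the raw margin is $-3$, but the term is clamped at $-\kappa = -2$, so pushing the target logit even higher no longer lowers the loss.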

---


## $L_{inf}$ Version

Carlini & Wagner propose 3 attacks, one per norm:
* $L_0$
* **$L_2$**
* $L_{inf}$

In this notebook, we'll see the **$L_{inf}$** version.

---

### What is the difference between these 3 versions?

#### $L_0$ :
$L_0$ minimizes the **number of pixels that differ** between the original image and the adversarial image.

The perturbation is therefore **strong but limited to a few pixels**. This version is suited to attacks that change **very few pixels, even if the change is visible**. $L_0$ is also very useful for evaluating **masking or drop-pixel robustness**.


#### **$L_2$** :
**$L_2$** minimizes the **Euclidean distance** between the two images (the square root of the sum of the squared pixel differences).

This yields **subtle perturbations** that are **very difficult to detect visually**. This is the **most used version in the literature**!


#### $L_{inf}$ :
$L_{inf}$ minimizes the **maximum absolute value of the perturbation** ($\max_i |\delta_i|$).

The perturbation then tends to be **spread uniformly** over the entire image. This version targets adversarial-training defences, which are often built against $L_{inf}$ attacks (e.g. PGD).
*Note*: This attack uses the same norm as PGD, but it optimises the perturbation size instead of working within a fixed budget.
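
To make the three norms concrete, here is a small NumPy sketch that measures the same kind of perturbation under $L_0$, $L_2$ and $L_{inf}$. The helper `perturbation_norms` and the toy 2x2 "images" are assumptions made for illustration only.

```python
import numpy as np

def perturbation_norms(x, x_adv, tol=1e-8):
    """Return the L0, L2 and L_inf sizes of the perturbation x_adv - x."""
    delta = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).ravel()
    return {
        "L0": int(np.sum(np.abs(delta) > tol)),   # number of modified pixels
        "L2": float(np.linalg.norm(delta)),       # Euclidean distance
        "Linf": float(np.max(np.abs(delta))),     # largest single-pixel change
    }

x = np.zeros((2, 2))          # toy "image"

sparse = x.copy()
sparse[0, 0] = 0.8            # one pixel changed a lot (L0-style perturbation)
uniform = x + 0.1             # every pixel changed a little (L_inf-style perturbation)

print("sparse :", perturbation_norms(x, sparse))
print("uniform:", perturbation_norms(x, uniform))
```

The sparse perturbation touches one pixel but has a large $L_{inf}$ value, while the uniform one touches every pixel but keeps $L_{inf}$ small, which is the trade-off described above.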

# Code
### **AUTHOR** : Maxence QUINET (University Of Luxembourg)

## 1. Setup & Configuration

------------------

Please ensure all dependencies are installed using the `requirements.txt` file.

For additional environment setup details, refer to **"environment_configuration.txt"**.

-------------------

Below are the required **libraries and frameworks** for running Adversarial Attacks

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # (Reduce TensorFlow logs)
import tensorflow as tf
import torch
import numpy as np
import matplotlib.pyplot as plt

--------------------

**Machine Learning & Neural Network Libraries**

In [None]:
from tensorflow.keras import layers, models, optimizers  # Used by the model definition and training cells below
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Input, Lambda
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications import InceptionV3

------------------------
**Datasets & Image Processing**

In [None]:
from tensorflow.keras.datasets import mnist, cifar10, cifar100
from tensorflow.keras.preprocessing.image import load_img, img_to_array

-----------------
**Adversarial Robustness Toolbox (ART)**

In [None]:
from art.estimators.classification import TensorFlowV2Classifier, PyTorchClassifier
from art.attacks.evasion import CarliniL0Method, CarliniL2Method, CarliniLInfMethod

------------------------------
**Vision Models**

In [None]:
from PIL import Image

------------------------

**imagenet_stubs** 

imagenet_stubs is a small dataset available at this link : https://github.com/nottombrown/imagenet-stubs

#### Why use it ?

* Ideal for **testing adversarial attacks quickly** before applying them on larger datasets.
* Provides **two useful functions**:
  - `label_to_name(index)` --> Convert an ImageNet label (number) to its corresponding name
  - `name_to_label(name)` --> Convert an ImageNet class name back to its numerical label 

In [None]:
import imagenet_stubs
from imagenet_stubs.imagenet_2012_labels import label_to_name, name_to_label

## Checking PyTorch & TensorFlow Environment

### **CUDA & GPU Verification**
Since we need **CUDA** for accelerated deep learning computations, we ensure that **PyTorch and TensorFlow** are properly configured with CUDA.

------------------
**PyTorch**

In [None]:
 # Versions
print("PyTorch version:", torch.__version__)
print("TensorFlow version:", tf.__version__)

In [None]:
# For PyTorch
print("Number of GPU: ", torch.cuda.device_count())
if torch.cuda.is_available():  # get_device_name() raises an error when no CUDA device is present
    print("GPU Name: ", torch.cuda.get_device_name())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device: ', device)

------------------
**TensorFlow**

In [None]:
# For Tensorflow
tf_gpus = tf.config.list_physical_devices('GPU')
print(tf_gpus)
print("GPU available:", len(tf_gpus) > 0)  # tf.test.is_gpu_available() is deprecated

In [None]:
# Check CUDA & CUDNN Version
print("CUDA available:", tf.test.is_built_with_cuda())
build_info = tf.sysconfig.get_build_info()
print(build_info.get("cuda_version", "n/a (CPU-only build)"))   # Key is missing on CPU-only builds
print(build_info.get("cudnn_version", "n/a (CPU-only build)"))

In [None]:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("GPU memory growth enabled")
    except RuntimeError as e:
        print(e)


## Dataset Selection & Configuration

In this section, you can **choose the dataset** you want to use for adversarial attacks:

- **MNIST**
- **CIFAR10**
- **CIFAR100**
- **ImageNet**

#### NOTE: In the Carlini & Wagner paper, the attacks are evaluated on the MNIST, CIFAR10 and ImageNet datasets!

### **Select Your Dataset & Model Configuration**
Modify the variables below to **choose the dataset and model**.  

The **official C&W** paper tests the attack on 1 model for each dataset!

-----------

In [None]:
selected_dataset = "MNIST" # OPTIONS : "MNIST", "CIFAR10", and "ImageNet" for Carlini & Wagner

# The lines below are commented out: following the paper, there is no model choice (MNIST = 1 model, CIFAR10 = 1 model).

#selected_cifar10_model = "standard_resnet" # OPTIONS : "standard_resnet", "resnet_10x_variant", "conv_maxout"
#selected_mnist_model = "logistic" # OPTIONS : "simple_cnn", "shallow_softmax", "maxout", "logistic" (For MNIST only)

selected_attack = "CarliniWagnerLinf" # Used for report name only.

--------------------------------------------
**Define class labels for each dataset**

In [None]:
# Build ancestors_name & ancestors_label for the selected dataset.

# Note: All labels are the official ones, publicly available (often as .json files). We did not create them.
if selected_dataset == "CIFAR10":
    # Correspondence between name & label for CIFAR10
    ancestors_name = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    ancestors_label = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

elif selected_dataset == "CIFAR100":
    # Correspondence between name & label for CIFAR100
    ancestors_name = ['apple', 'bridge', 'castle', 'elephant', 'house', 'orange', 'shark', 'table', 'tractor', 'whale']
    ancestors_label = ['0', '12', '17', '31', '37', '53', '73', '84', '89', '95']

elif selected_dataset == "ImageNet":
    # Correspondence between name & label for ImageNet
    ancestors_name = ['abacus', 'acorn', 'baseball', 'broom', 'brown_bear', 'canoe', 'hippopotamus', 'llama', 'maraca', 'mountain_bike']
    ancestors_label = ['398', '988', '429', '462', '294', '472', '344', '355', '641', '671']

elif selected_dataset == "MNIST":
    # Correspondence between name & label for MNIST
    ancestors_name = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
    ancestors_label = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

else:
    # Stop here: the following cells cannot run without ancestors_name & ancestors_label
    raise ValueError(f"Dataset '{selected_dataset}' doesn't exist. Please choose one of: CIFAR10, CIFAR100, ImageNet or MNIST.")

---------------------------------------------------------------------
**Fix seed to ensure reproducibility (comment to get random results)**

In [None]:
np.random.seed(12345)

-----------------------

#### plot_prediction()

This function will be used to display the original / attacked images.

The function is designed to display the images correctly, depending on the dataset selected, with the following legend:

<font color='green'>Green bars</font> = correct classification <br>
<font color='red'>Red bars</font> = Attack target classification <br>
<font color='blue'>Blue bars</font> = other classifications

In [None]:
def label_to_name_dynamic(index, dataset):
    """Return the name of the label depending on the selected dataset."""
    if dataset == "MNIST":
        return str(index)  # For MNIST, the label name is simply the digit
    elif dataset == "ImageNet":
        return label_to_name(index)  # Use the imagenet_stubs function for ImageNet !
    elif dataset == "CIFAR10":
        return ancestors_name[index]  # Return the name from our list
    elif dataset == "CIFAR100":
        return cifar100_labels[index] if 0 <= index < 100 else "Unknown" # Return the name from our list 
    else:
        return "Unknown"

In [None]:
def plot_prediction(img, probs, correct_class=None, target_class=None):
    """
    Displays an image with predictions in the form of coloured bars :
    - Green --> Correct Class
    - Red --> Target Class
    - Blue --> Other Classes
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))

    # Display the picture
    if selected_dataset=="MNIST":
        ax1.imshow(img, cmap="gray") # Force the display in gray level for MNIST !
        ax1.axis("off")
    else:
        ax1.imshow(img)
        ax1.axis("off")

    # Keep the top 10 classes with highest probabilities
    top_ten_indexes = list(probs[0].argsort()[-10:][::-1])
    top_probs = probs[0, top_ten_indexes]
    labels = [label_to_name_dynamic(i, selected_dataset) for i in top_ten_indexes]


    # Bar plot creation with color rules defined above
    barlist = ax2.bar(range(10), top_probs, color="blue")  # Blue by default

    if target_class in top_ten_indexes:
        barlist[top_ten_indexes.index(target_class)].set_color("red")  # Red if this is the target class

    if correct_class in top_ten_indexes:
        barlist[top_ten_indexes.index(correct_class)].set_color("green")  # Green if this is the correct class

    # Plot Graph
    plt.sca(ax2)
    plt.ylim([0, 1.1])
    plt.xticks(range(10), labels, rotation="vertical")
    plt.ylabel("Probability")
    plt.title("Top 10 Predictions")
    fig.subplots_adjust(bottom=0.2)

    plt.show()

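## Step 1: Define Parameters

> NOTE: The Carlini & Wagner $L_∞$ attack crafts adversarial examples iteratively.
Instead of minimizing the $L_2$ norm, it penalizes every component of the perturbation that exceeds a threshold `tau`, which drives down the maximum absolute change ($L_∞$ norm) while maintaining misclassification.

This attack does not use binary search on `c`: the constant `c` starts at `initial_const` and is increased by `const_factor` (up to `largest_const`), while `tau`, the threshold controlling the perturbation size, is progressively reduced by `decrease_factor`.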
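
The role of `tau` can be sketched in a few lines of NumPy. This is only an illustration of the penalty and of the shrinking rule described in the paper (reduce `tau` when every component of the perturbation is already below it); it is not ART's internal code, the function names are hypothetical, and taking absolute values of the perturbation is an assumption made here so that negative changes are penalized too.

```python
import numpy as np

def linf_penalty(delta, tau):
    """Sum of the amounts by which |delta_i| exceeds the threshold tau."""
    return float(np.sum(np.maximum(np.abs(delta) - tau, 0.0)))

def update_tau(delta, tau, decrease_factor=0.9):
    """Shrink tau only if every component of delta is already below it."""
    if np.all(np.abs(delta) < tau):
        return tau * decrease_factor
    return tau

delta = np.array([0.05, -0.02, 0.08])    # toy perturbation

# With tau = 0.1 nothing exceeds the threshold: no penalty, and tau can shrink.
print(linf_penalty(delta, 0.1), update_tau(delta, 0.1))

# With tau = 0.06 one component (0.08) is too large: it is penalized and tau is kept.
print(linf_penalty(delta, 0.06), update_tau(delta, 0.06))
```

Each time `tau` shrinks, the optimizer has to find a perturbation that fits under a tighter bound, which is how the maximum change is pushed down step by step.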

---

### **Carlini & Wagner Paper Parameters**

* ```confidence (float)``` :
Controls the **minimum required confidence** for the attack to succeed.
A higher value pushes the model to misclassify with **stronger certainty**.

* ```targeted (bool)``` :
Indicates whether the attack is **targeted or untargeted**.

* ```learning_rate (float)``` :
Step size used by the **optimizer**.
Smaller = more precise, but slower.

* ```max_iter (int)``` :
Maximum number of **optimization steps**.

* ```decrease_factor (float)``` :
Controls how fast the ```tau``` threshold is reduced between steps. Must be between 0 and 1.
Closer to 1 = more precision, but slower (e.g., 0.9 means ```tau``` is decreased by 10% each time it is reduced).

* ```initial_const (float)``` :
Starting constant ```c``` in the loss. Larger values increase the importance of **classification vs. perturbation**.

* ```largest_const (float)``` :
Upper bound on ```c``` during tuning.

* ```const_factor (float)``` :
Multiplicative factor for increasing ```c```. Should be >1 (e.g. 2.0).

* ```batch_size (int)``` :
Number of images to process per batch. Usually set to 1.

* ```verbose (bool)``` :
Shows **progress and debug info**.

> You can change the values below to test different attack configurations.

In [None]:
# Carlini & Wagner L∞ Attack Parameters

confidence = 0            # Confidence margin kappa (0 = stop as soon as the target class wins)
targeted = True           # Targeted attack (as in the original paper)
learning_rate = 0.01      # Optimization step size
max_iter = 10             # Number of optimization steps
decrease_factor = 0.9     # Controls how tau is reduced (tau *= decrease_factor)
initial_const = 0.1       # Starting value of constant c
largest_const = 10        # Maximum value c can take
const_factor = 2.0        # Multiplicative factor used to increase c
batch = 1                 # Attack one image at a time (recommended)
verbose = True            # Show attack progress

print(f"Selected Attack: CarliniWagnerLinf | Dataset: {selected_dataset} | Confidence: {confidence} | Max Iter: {max_iter} | Batch Size: {batch}")

-----------------------------
**We will see what EoT (Expectation over Transformation) is later. If you don't know it yet, skip this sub-section.**

*If you want to test the EoT transformation, the parameters are below.*

In [None]:
# Parameters for EoT Transformation
angle_max = 22.5 # Rotation angle used for evaluation in degrees
eot_angle = angle_max # Maximum angle for sampling range in EoT rotation, applying range [-eot_angle, eot_angle]
eot_samples = 10 # Number of samples with random rotations in parallel per loss gradient calculation

### Dataset-Specific Parameters

In [None]:
# ImageNet has 1000 classes, CIFAR100 has 100 classes, and CIFAR10 & MNIST have 10 classes.
nb_classes = 1000 if selected_dataset == "ImageNet" else 100 if selected_dataset == "CIFAR100" else 10

# ImageNet Images Dimension : (299,299,3), CIFAR10 & CIFAR100 : (32,32,3), and MNIST : (28,28,1)
input_shape = (299, 299, 3) if selected_dataset == "ImageNet" else (32, 32, 3) if "CIFAR" in selected_dataset else (28, 28, 1)

# ImageNet models often need a specific preprocessing: ART applies (x - mean) / std, so (0.5, 0.5) maps [0, 1] to [-1, 1].
# The other datasets keep the [0, 1] range.
preprocessing = (0.5, 0.5) if selected_dataset == "ImageNet" else (0.0, 1.0)  # (mean, std) used for normalisation

# Clip values 
clip_values = (0.0, 1.0)  # Same for all datasets

# Target Class Definition (You can change, here are just some examples)
if selected_dataset == "ImageNet":
    y_target = np.array([641])  # "maraca"
elif selected_dataset == "CIFAR100":
    y_target = np.array([3])  # "bear"
elif selected_dataset == "CIFAR10":
    y_target = np.array([1])  # "automobile"
else:  # MNIST
    y_target = np.array([np.random.randint(0, 10)])  # random digit between 0 and 9

## Step 2: Load Dataset Data & Labels

In this step, we **load all dataset images and their labels into memory**.

#### **How does it work?**
1. We retrieve the dataset path (`datasets/selected_dataset/`).
2. We read all images from the dataset folders.
3. We **normalize** the images (scale pixel values between `[0, 1]`).
4. We store **both images and labels** for further processing.

 -------------------

In [None]:
# List Initializations
x_all, y_all, original_images = [], [], []

In [None]:
# Build the dataset path on this machine; its images are loaded into our lists below.
dataset_path = os.path.join("datasets", selected_dataset)
# Check: nothing happens if the folder exists. Otherwise, an AssertionError is raised.
assert os.path.isdir(dataset_path), f"Dataset folder not found: {dataset_path}"

In [None]:
# Load images from the selected dataset
for class_name, class_label in zip(ancestors_name, ancestors_label):
    class_path = os.path.join(dataset_path, class_name)
    if not os.path.exists(class_path):
        continue
    
    for img_file in sorted(os.listdir(class_path)):
        img_path = os.path.join(class_path, img_file)

        if selected_dataset == "MNIST":
            im = load_img(img_path, color_mode="grayscale", target_size=(28, 28))
            im_array = img_to_array(im)
        
        elif selected_dataset == "ImageNet":
            im = load_img(img_path, target_size=(299, 299))
            im_array = img_to_array(im)

        elif selected_dataset in ["CIFAR10", "CIFAR100"]:
            im = load_img(img_path, target_size=(32, 32))
            im_array = img_to_array(im)
        
        x = (im_array / 255.0).astype(np.float32)
        
        x_all.append(x)
        y_all.append(int(class_label))
        original_images.append(im_array)

-----------------------------------------------------
#### Display Dataset (Optional)
**You can choose to display all images or only one image per class.**

#### How to enable visualization?
- To display **ALL images** --> **Run the loop below with `display_all_images = True`**.
- To display **ONLY 1 image per class** --> **Set `display_all_images = False`**.
-----------------------------------------------------

```Python
# Set to True to display all images, False to show only 1 image per class
display_all_images = False  

# Display the pictures (showing all of them can take a while; keep display_all_images = False to show only 1 picture per class)
for class_name, class_label in zip(ancestors_name, ancestors_label):
    class_path = os.path.join(dataset_path, class_name)
    if not os.path.exists(class_path):
        print(class_path)
        print("No os Path")
        continue
    
    print(f"Class : {class_name} (Label: {class_label})")
    
     # Show only 1 image per class if display_all_images = False
    images_to_show = sorted(os.listdir(class_path))[:1] if not display_all_images else sorted(os.listdir(class_path))
    # Go through the 10 pictures of each classes
    for img_file in images_to_show:
        img_path = os.path.join(class_path, img_file)

        # Load & Normalize the picture
        im = load_img(img_path, target_size=(299, 299))
        im_array = img_to_array(im)

        # Displaying all pictures
        plt.figure(figsize=(4, 4))
        plt.imshow(im_array.astype("uint8"))
        plt.axis("off")
        plt.title(f"Class: {class_name} | {img_file}", fontsize=10, fontweight="bold")
        plt.show()

        print(f"{img_file} well displayed in : {class_name}")

print(f"All of the {len(ancestors_name)} classes & their images have been displayed!")
```

### Convert to Numpy Arrays for TensorFlow
Since TensorFlow requires NumPy arrays, we convert our lists into arrays.

-----------------

In [None]:
# Convert into a numpy array
x_all = np.array(x_all)
y_all = np.array(y_all).reshape(-1, 1)

# Check
#for img_x, img_y in zip(x_all, y_all):
#    print(f"x_all shape: {x_all.shape}")  # (N, H, W, C)
#    print(f"y_all shape: {y_all.shape}")  # (N, 1)

## Step 3 : Load Model & Loss Function

### 1. Loading Dataset for Model Training
Before creating the model, we **load and preprocess** the dataset to ensure it is correctly formatted for TensorFlow.

-------------

In [None]:
if selected_dataset == "MNIST":
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train = np.expand_dims(x_train, axis=-1)
    x_test = np.expand_dims(x_test, axis=-1)

elif selected_dataset == "CIFAR10":
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    y_train, y_test = to_categorical(y_train, nb_classes), to_categorical(y_test, nb_classes)
    
elif selected_dataset == "CIFAR100":
    (x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode="fine")

    # We reproduce the list of all classes of CIFAR100
    cifar100_labels = [
    "apple", "aquarium_fish", "baby", "bear", "beaver", "bed", "bee", "beetle", "bicycle", "bottle",
    "bowl", "boy", "bridge", "bus", "butterfly", "camel", "can", "castle", "caterpillar", "cattle",
    "chair", "chimpanzee", "clock", "cloud", "cockroach", "couch", "crab", "crocodile", "cup", "dinosaur",
    "dolphin", "elephant", "flatfish", "forest", "fox", "girl", "hamster", "house", "kangaroo", "computer_keyboard",
    "lamp", "lawn_mower", "leopard", "lion", "lizard", "lobster", "man", "maple_tree", "motorcycle", "mountain",
    "mouse", "mushroom", "oak_tree", "orange", "orchid", "otter", "palm_tree", "pear", "pickup_truck", "pine_tree",
    "plain", "plate", "poppy", "porcupine", "possum", "rabbit", "raccoon", "ray", "road", "rocket", "rose", "sea",
    "seal", "shark", "shrew", "skunk", "skyscraper", "snail", "snake", "spider", "squirrel", "streetcar", "sunflower",
    "sweet_pepper", "table", "tank", "telephone", "television", "tiger", "tractor", "train", "trout", "tulip",
    "turtle", "wardrobe", "whale", "willow_tree", "wolf", "woman", "worm"
]

    x_train, x_test = x_train / 255.0, x_test / 255.0
    y_train, y_test = to_categorical(y_train, nb_classes), to_categorical(y_test, nb_classes)

---

**ONE-HOT ENCODING**

---

In [None]:
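# One-hot encoding (skipped for ImageNet: the model is pretrained, so no training set was loaded)
if selected_dataset != "ImageNet" and (len(y_train.shape) == 1 or y_train.shape[1] == 1):
    y_train = to_categorical(y_train, nb_classes)
    y_test = to_categorical(y_test, nb_classes)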

### 2. Model Selection & Architecture

In the **Carlini & Wagner** paper, 1 model is tested on each dataset:

* MNIST --> Table 1
* CIFAR10 --> Table 2
* ImageNet --> InceptionV3

#### TODO: IMPLEMENT THE CODE TO LOAD THE MODELS FROM .H5 FILES!
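
One possible way to do this is sketched below: reuse a saved `.h5` file when it exists, and fall back to building the model otherwise. The helper name `load_or_build_model` and the file path in the usage comment are hypothetical; `tensorflow.keras.models.load_model` is the standard Keras loader and is only imported when a file is actually found.

```python
import os

def load_or_build_model(weights_path, build_fn, load_fn=None):
    """Return (model, was_loaded): load the saved model if the file exists, else build it."""
    if os.path.exists(weights_path):
        if load_fn is None:
            # Imported lazily so the helper can be used without TensorFlow when nothing is loaded
            from tensorflow.keras.models import load_model
            load_fn = load_model
        return load_fn(weights_path), True
    return build_fn(), False

# Usage in this notebook (hypothetical path, after the model builders are defined):
# model, was_loaded = load_or_build_model(os.path.join("models", "cw_mnist.h5"), build_cw_mnist_model)
# if not was_loaded:
#     ...compile and train as usual, then model.save(os.path.join("models", "cw_mnist.h5"))
```

Returning the `was_loaded` flag lets the training cell skip the 50 training epochs when a saved model was found.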

-------------

The cells below contain the code used to create the models.

---

**MNIST C&W PAPER MODEL**

---

In [1]:
# MNIST architecture following the Carlini & Wagner paper
def build_cw_mnist_model():
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1), padding='same'),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),

        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),

        layers.Flatten(),
        layers.Dense(200, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(200, activation='relu'),
        layers.Dense(10)
    ])
    return model

---

**CIFAR10 C&W PAPER MODEL**

---

In [None]:
# Architecture CIFAR10 with respect to the Carlini & Wagner Paper
def build_cw_cifar10_model():
    model = models.Sequential([
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),

        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),

        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(256, activation='relu'),
        layers.Dense(10) 
    ])
    return model

### 3. Model Compilation & Training
Once the model is selected, we **compile and train** it.

- **For ImageNet**, the model is already pretrained.
- **For other datasets**, the model is trained for 50 epochs with SGD (batch size 128), following the paper.

------------

In [None]:
# ============= IMAGENET =============
if selected_dataset == "ImageNet":
    print(f"SELECTED MODEL : InceptionV3.") 
    model = InceptionV3(include_top=True, weights='imagenet', classifier_activation=None)
    loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# ============= MNIST =============
elif selected_dataset == "MNIST":
    model = build_cw_mnist_model()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
    
    model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
    )

    if y_train.shape[-1] != 10:
        from tensorflow.keras.utils import to_categorical
        y_train = to_categorical(y_train, 10)
        y_test = to_categorical(y_test, 10)

    # Training with respect to the paper
    model.fit(
        x_train, y_train,
        batch_size=128,
        epochs=50,
        validation_data=(x_test, y_test)
    )
    
# ============= CIFAR10/CIFAR100 =============
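elif selected_dataset in ["CIFAR10", "CIFAR100"]:
    # Model creation
    model = build_cw_cifar10_model()
    if nb_classes != 10:
        # The paper architecture ends with Dense(10): replace the output layer so CIFAR100 gets 100 logits
        model = tf.keras.Sequential(model.layers[:-1] + [tf.keras.layers.Dense(nb_classes)])

    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)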
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )

    # Training with respect to the paper
    model.fit(
        x_train, y_train,
        batch_size=128,
        epochs=50,
        validation_data=(x_test, y_test)
    )

# ============= ERROR =============
else:
    raise ValueError(f"Error: Dataset '{selected_dataset}' not recognized. Please use one of these datasets: ImageNet, CIFAR10, CIFAR100 or MNIST.")

## Step 4 : Create the ART Classifier & Configure the Attack

Now that the model is **trained and ready**, we integrate it into **ART (Adversarial Robustness Toolbox)**.

#### What is happening here ?
1. We **create a classifier** for ART based on the trained model
2. We **define an adversarial attack** (Carlini & Wagner $L_{inf}$ in this case)
3. The attack can be **targeted or untargeted**, and parameters are fully configurable.

In [None]:
classifier = TensorFlowV2Classifier(model=model,
                                    nb_classes=nb_classes,
                                    loss_object=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
                                    preprocessing=preprocessing,
                                    preprocessing_defences=None,
                                    clip_values=clip_values,
                                    input_shape=input_shape)

### VERY IMPORTANT !

**CarliniLInfMethod** needs **logits** and **NOT probabilities** as model output!

Unlike AutoProjectedGradientDescent, where a `loss_type` has to be chosen between `cross_entropy` and `difference_logits_ratio`, the C&W attack builds its own loss function directly from the model output.

This is why the models above end with a layer without softmax, and why the loss is created with `from_logits=True`.

---

### WHY ?

Because Carlini & Wagner construct their **custom loss function** from *logits* in order to **be more sensitive to the "decision distance" between classes**. The probabilities (softmax output) flatten these differences, which **weakens** the attack.
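
The flattening effect can be seen on a toy example. The snippet below is a minimal NumPy sketch with made-up logit vectors (they are assumptions, not model outputs): two predictions with very different logit margins end up with almost identical softmax probabilities.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def top_margin(v):
    """Gap between the largest and the second-largest entry."""
    s = np.sort(np.asarray(v, dtype=float))
    return float(s[-1] - s[-2])

logits_a = [20.0, 5.0, 0.0]   # top class leads by 15
logits_b = [40.0, 5.0, 0.0]   # top class leads by 35

print("logit margins:", top_margin(logits_a), top_margin(logits_b))
print("prob. margins:", top_margin(softmax(logits_a)), top_margin(softmax(logits_b)))
```

Both probability margins are essentially 1, so an attack working on probabilities could hardly tell these two cases apart, whereas the logit margins (15 vs 35) still show how far each prediction is from the decision boundary.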

---

In [None]:
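# Use the parameters defined in Step 1 so that changing them there actually changes the attack
attack = CarliniLInfMethod(classifier=classifier,
                           confidence=confidence,
                           targeted=targeted,
                           learning_rate=learning_rate,
                           max_iter=max_iter,
                           decrease_factor=decrease_factor,
                           initial_const=initial_const,
                           largest_const=largest_const,
                           const_factor=const_factor,
                           batch_size=batch,
                           verbose=verbose
                          )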

print(attack.targeted)

## Step 5 : Predict Clean (Original) Images BEFORE the attack.

Before applying any attack, we **predict the clean images** with our trained model.

#### What happens here ?
1. We run the classifier on all images **before the attack**.
2. We display the **top-10 predictions** for each image.
3. You can choose to **display all images or only one per class**.

------------

In [None]:
# Predict Clean Images (returns logits)
y_pred_clean_all = classifier.predict(np.array(x_all))

# Convert logits to probabilities using softmax
y_pred_clean_all = tf.nn.softmax(y_pred_clean_all, axis=1).numpy()


# Check prediction shape
print("Shape of Clean Predictions:", y_pred_clean_all.shape)  # Expected (N, nb_classes)

# Summarize Prediction
top1_correct = np.mean(np.argmax(y_pred_clean_all, axis=1) == y_all.flatten()) * 100
print(f"Top-1 Accuracy on Clean Images: {top1_correct:.2f}%")

---------------------------

#### Display Clean Images & Predictions
You can **choose whether to display all images or just one per class**.

**How to enable visualization?**
- To display **ALL images** --> Set `display_all_images = True`
- To display **ONLY 1 image per class** --> Set `display_all_images = False`

---------------------

In [None]:
# Set to True to display all images, False to show only 1 image per class
display_all_images = False  

# Displaying Clean Images with Predictions
for class_name, class_label in zip(ancestors_name, ancestors_label):
    print(f"\nClass : {class_name} (Label: {class_label})")

    # Get all images from this class
    class_indices = np.where(y_all == int(class_label))[0]

    if len(class_indices) == 0:
        print(f"No Images found for {class_name}, skipping...")
        continue

    # Show only 1 image per class if display_all_images = False
    images_to_show = class_indices[:1] if not display_all_images else class_indices
    
    for index in images_to_show:
        plot_prediction(
            np.squeeze(x_all[index]),  # Original clean image
            y_pred_clean_all[index].reshape(1, -1),  # Reshaped prediction
            correct_class=y_all[index],  # True class
            target_class=None  # No target class for clean images
        )
        print(f"Image {index} displayed for class: {class_name}")

print(f"\n All {len(x_all)} clean images have been processed!")

## Step 6: Generate and Evaluate Adversarial Examples

Now, we **generate adversarial examples** and evaluate the effectiveness of the attack.

 **What happens here?**
1. We **generate adversarial examples** using the selected attack.
2. We **save the adversarial images** for later analysis. (optional)
3. We **evaluate the attack's success** (accuracy, confidence scores, and performance metrics).
4. We **generate a detailed report** summarizing the attack results.

---

BE CAREFUL: DO NOT FORGET TO ONE-HOT ENCODE THE TARGET VALUES!

---
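
As a minimal sketch of what one-hot encoding the targets means, the NumPy-only helper below builds the same kind of `(N, nb_classes)` array that the notebook passes to `attack.generate`. The helper name `one_hot` and the toy values (3 images, 5 classes, target class 2) are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def one_hot(labels, nb_classes):
    """Return a (N, nb_classes) float32 array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    encoded = np.zeros((labels.size, nb_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical values: 3 images, all pushed towards class 2 out of 5 classes
targets = np.full(shape=(3,), fill_value=2)
targets_onehot = one_hot(targets, nb_classes=5)

print(targets_onehot.shape)            # (3, 5)
print(targets_onehot.argmax(axis=1))   # [2 2 2]
```

Each row contains exactly one `1.0`, located at the index of the target class, so taking `argmax` along axis 1 recovers the integer labels.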

In [None]:
# Create one-hot vector repeated for each image
y_target_all = np.full(shape=(len(x_all),), fill_value=y_target)
y_target_all_onehot = to_categorical(y_target_all, nb_classes)

In [None]:
# ATTACK
print(f"Launching Carlini & Wagner L_inf targeted attack on {len(x_all)} images towards class {y_target}...")
x_adv_all = attack.generate(x=x_all, y=y_target_all_onehot)

In [None]:
# Check shape
print("x_adv_all shape:", x_adv_all.shape)

---------------------

**Do you want to save all adversarial images?**  
- **YES** → Uncomment the saving function below.
- **NO** → Comment the function to skip saving.


```Python
# Define the save path for adversarial images
adv_save_path = os.path.join("adversarials_img", selected_attack, selected_dataset)
os.makedirs(adv_save_path, exist_ok=True)  

# Iterate through all classes to save adversarial images
class_counters = {class_name: 1 for class_name in ancestors_name}  # Dictionary to track image indices per class

for adv_img, class_label in zip(x_adv_all, y_all.flatten()):  # Ensure y_all is 1D
    # Find the class name corresponding to the label
    if str(class_label) not in ancestors_label:
        print(f"Label {class_label} not found in ancestors_label, skipping image.")
        continue  

    class_index = ancestors_label.index(str(class_label))
    class_name = ancestors_name[class_index]

    # Determine the subfolder for the class
    class_folder = os.path.join(adv_save_path, class_name)
    os.makedirs(class_folder, exist_ok=True) 

    # Generate a unique filename with a zero-padded counter (e.g., abacus01_adv.jpeg, abacus02_adv.jpeg, ..., acorn01_adv.jpeg, ...)
    img_filename = f"{class_name}{class_counters[class_name]:02d}_adv.jpeg"
    img_path = os.path.join(class_folder, img_filename)

    # Convert and save the image
    img = array_to_img(adv_img)
    img.save(img_path, "JPEG")

    print(f"Image saved : {img_path}")

    # Increment the counter for this class
    class_counters[class_name] += 1

print(f"\nAll {len(x_adv_all)} adversarial images have been successfully saved!")
```

---

### Evaluate Adversarial Examples

We now evaluate the adversarial examples by:
- Measuring the **model's accuracy** on these images.
- Computing the **confidence score** of predictions.
- Generating a **visual comparison** between clean and adversarial images.
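
Since this notebook targets the $L_∞$ variant of the attack, two extra numbers can be worth checking alongside accuracy: the size of each perturbation in the $L_∞$ norm, and how often the prediction lands on the target class. The sketch below is an illustration on toy arrays; the helper names `linf_norms` and `targeted_success_rate` are hypothetical and are not defined elsewhere in the notebook.

```python
import numpy as np

def linf_norms(x_clean, x_adv):
    """Largest absolute pixel change per image (L_inf norm of the perturbation)."""
    diff = np.abs(np.asarray(x_adv, dtype=np.float64) - np.asarray(x_clean, dtype=np.float64))
    return diff.reshape(len(diff), -1).max(axis=1)

def targeted_success_rate(probabilities, target_class):
    """Percentage of images whose top-1 prediction equals the target class."""
    return float(np.mean(np.argmax(probabilities, axis=1) == target_class) * 100)

# Toy data: two 4x4 grayscale images, each with a single modified pixel
x_clean = np.zeros((2, 4, 4, 1))
x_adv = x_clean.copy()
x_adv[0, 1, 1, 0] = 0.03
x_adv[1, 2, 3, 0] = -0.05

# Toy probabilities over 2 classes; only the first image reaches class 1
probs = np.array([[0.1, 0.9],
                  [0.7, 0.3]])

print(linf_norms(x_clean, x_adv))                     # [0.03 0.05]
print(targeted_success_rate(probs, target_class=1))   # 50.0
```

With the notebook's variables, the same helpers could presumably be called as `linf_norms(x_all, x_adv_all)` and `targeted_success_rate(y_pred_adv_all, int(y_target))`, assuming `y_target` holds a single class index.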

---

**How to enable visualization?**
- To display **ALL images** --> Set `display_all_images = True`
- To display **ONLY 1 image per class** --> Set `display_all_images = False`

---

In [None]:
# Comparative Display
for i in range(0, len(x_all), 10):  # Show one image out of every 10
    fig, axs = plt.subplots(1, 3, figsize=(12, 4))

    axs[0].imshow(x_all[i].squeeze(), cmap="gray")
    axs[0].set_title("Original Image")
    axs[0].axis("off")

    axs[1].imshow(x_adv_all[i].squeeze(), cmap="gray")
    axs[1].set_title("Adversarial Image")
    axs[1].axis("off")

    axs[2].imshow(np.abs(x_adv_all[i] - x_all[i]).squeeze(), cmap="hot")
    axs[2].set_title("Perturbation (abs diff)")
    axs[2].axis("off")

    plt.suptitle(f"True: {y_all[i][0]} | Target: {y_target}")
    plt.tight_layout()
    plt.show()

---

Prediction & Visualization

---

In [None]:
# Get predictions on adversarial images
y_pred_adv_all = classifier.predict(x_adv_all)
y_pred_adv_all = tf.nn.softmax(y_pred_adv_all).numpy()

In [None]:
# Set to True to display all images, False to show only 1 image per class
display_all_images = True  

# Display Adversarial Examples (Optional)
for class_name, class_label in zip(ancestors_name, ancestors_label):
    print(f"\nClass : {class_name} (Label: {class_label})")

    class_indices = np.where(y_all == int(class_label))[0]
    
    # Show only 1 image per class if display_all_images = False
    images_to_show = class_indices[:1] if not display_all_images else class_indices
    
    if len(class_indices) == 0:
        print(f"No images found for {class_name}, skipping...")
        continue

    for index in images_to_show:
        plot_prediction(
            np.squeeze(x_adv_all[index]),
            y_pred_adv_all[index].reshape(1, -1),
            correct_class=y_all[index],
            target_class=np.argmax(y_target_all_onehot[index])
        )
        print(f"Adversarial Image {index} displayed for class: {class_name}")

### Compute Performance Metrics
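
To make the top-k accuracy used in this section concrete, here is a small worked example on hand-written probabilities. It follows the same `argsort`-based logic as the notebook's `compute_accuracy`, but under a different, hypothetical name (`top_k_accuracy`) so that it does not shadow the notebook's function; the toy probabilities and labels are invented for illustration.

```python
import numpy as np

def top_k_accuracy(probabilities, true_labels, top_k=1):
    """Percentage of rows whose true label is among the top_k highest scores."""
    top_k_preds = np.argsort(probabilities, axis=1)[:, -top_k:]
    hits = np.any(top_k_preds == np.asarray(true_labels).reshape(-1, 1), axis=1)
    return float(np.mean(hits) * 100)

probs = np.array([
    [0.6, 0.3, 0.1],   # true label 0: hit at top-1
    [0.2, 0.5, 0.3],   # true label 2: miss at top-1, hit at top-2
    [0.1, 0.2, 0.7],   # true label 0: miss at top-1 and top-2
])
labels = [0, 2, 0]

top1 = top_k_accuracy(probs, labels, top_k=1)
top2 = top_k_accuracy(probs, labels, top_k=2)
print(f"Top-1: {top1:.1f}% | Top-2: {top2:.1f}%")   # Top-1: 33.3% | Top-2: 66.7%
```

For a targeted attack, a low top-1 accuracy on adversarial images only shows that the true class was lost; it does not by itself show that the target class was reached.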

In [None]:
# Compute the average top-1 confidence (note: taken from the clean predictions)
confidence_scores = np.max(y_pred_clean_all, axis=1)
average_confidence = np.mean(confidence_scores) * 100

# Compute Top-K Accuracy
def compute_accuracy(predictions, true_labels, top_k=1):
    top_k_preds = np.argsort(predictions, axis=1)[:, -top_k:]
    match = np.any(top_k_preds == np.array(true_labels).reshape(-1, 1), axis=1)
    return np.mean(match) * 100 

In [None]:
clean_top1 = compute_accuracy(y_pred_clean_all, y_all, top_k=1)
clean_top5 = compute_accuracy(y_pred_clean_all, y_all, top_k=5)
adv_top1 = compute_accuracy(y_pred_adv_all, y_all, top_k=1)
adv_top5 = compute_accuracy(y_pred_adv_all, y_all, top_k=5)

In [None]:
# Display Performance Results
attack_name = "CarliniLinfMethod" if isinstance(attack, CarliniLInfMethod) else "fast"

In [None]:
print("\n=== Performance Summary ===")
print(f"Selected Attack: CarliniWagnerLinf | Dataset: {selected_dataset} | Confidence: {confidence} | Max Iter: {max_iter} | Batch Size: {batch}")
print(f"Clean Images : Top-1 : {clean_top1:.2f}% | Top-5 : {clean_top5:.1f}%")
print(f"Adv. Images  : Top-1 : {adv_top1:.2f}% | Top-5 : {adv_top5:.1f}%")
print("----------------------------------------------------------")
print(f"Confidence Score: {average_confidence:.2f}%")

### Generate a Report

In [None]:
# Round epsilon for readability
# eps_rounded = round(epsilon, 3)

# Define report save path
if selected_dataset == "MNIST":
    report_filename = f"{selected_attack}_with_{selected_dataset}_report.txt"
elif selected_dataset in ["CIFAR10", "CIFAR100"]:
    report_filename = f"{selected_attack}_with_{selected_dataset}_report.txt"
elif selected_dataset == "ImageNet":
    report_filename = f"{selected_attack}_with_{selected_dataset}_with_InceptionV3_report.txt"
else:
    # Stop here: without a filename, the report path below cannot be built
    raise ValueError(
        f"Dataset '{selected_dataset}' is not recognized. "
        "Expected one of: MNIST, CIFAR10, CIFAR100, ImageNet."
    )
    
report_path = os.path.join("adversarials_img", selected_attack, selected_dataset, report_filename)
os.makedirs(os.path.dirname(report_path), exist_ok=True)


with open(report_path, "w", encoding="utf-8") as f:
    # Report Title
    f.write(f"====== {selected_attack} Adversarial Attack Report ======\n\n")

    # Information generation for each image
    for i in range(len(y_pred_adv_all)):  
        # Find the class index in ancestors_label
        class_label = str(y_all[i][0])  # Convert to string to match ancestors_label
        if class_label in ancestors_label:
            class_index = ancestors_label.index(class_label)  # Get index in ancestors_name
            class_name = ancestors_name[class_index]  # Retrieve class name
        else:
            class_name = "Unknown"  # If not found, prevent error

        # Original Image file name (ensuring correct numbering)
        original_image_name = f"{class_name}{(i % 10) + 1:02d}.jpeg"

        # Predict Class for the original image (top-1)
        clean_pred_index = np.argmax(y_pred_clean_all[i])

        # Predict Class for the Adversarial image (top-1)
        adv_pred_index = np.argmax(y_pred_adv_all[i])

        # Prediction
        clean_pred_label = label_to_name_dynamic(clean_pred_index, selected_dataset)
        adv_pred_label = label_to_name_dynamic(adv_pred_index, selected_dataset)


        # Targeted or Untargeted Scenario Attack
        attack_type = "Targeted" if attack.targeted else "Untargeted"

        # If Targeted : Target Class
        target_label = label_to_name(y_target[0]) if attack.targeted else "N/A"

        # Write results in the report:
        f.write(f"------ CLASS : {class_name.upper()} ------\n")
        f.write(f"Original image name : {original_image_name}\n")
        f.write(f"Original Prediction : {clean_pred_label}\n")
        f.write(f"Targeted / Untargeted : {attack_type}\n")
        if attack.targeted:
            f.write(f"Target Class : {target_label}\n")
        f.write(f"Adversarial Prediction : {adv_pred_label}\n")
        f.write("------------------------------------------------\n\n")

    # Performance Summary at the end of the file
    f.write("============ PERFORMANCE SUMMARY ============\n")
    f.write(f"Selected Attack: CarliniWagnerLinf | Dataset: {selected_dataset} | Confidence: {confidence} | Max Iter: {max_iter} | Batch Size: {batch}\n")
    f.write(f"Clean Images : Top-1 : {clean_top1:.1f}% | Top-5 : {clean_top5:.1f}%\n")
    f.write(f"Adv. Images  : Top-1 : {adv_top1:.1f}% | Top-5 : {adv_top5:.1f}%\n")
    f.write("----------------------------------------------------------\n")
    f.write(f"Confidence Score: {average_confidence:.2f}%")

    attack_eff_top1 = 100 - adv_top1
    attack_eff_top5 = 100 - adv_top5

    f.write("\n")
    f.write(f"{selected_attack} Efficiency : Top-1 : {attack_eff_top1:.1f}% | Top-5 : {attack_eff_top5:.1f}%\n")

# Saving Confirmation
print(f"Report saved : {report_path}")

# Going further (optional) : Expectation Over Transformation (EoT) 
Adversarial examples, whether crafted with **FGSM** or with **C&W**, are often **sensitive to image transformations** such as **rotation, scaling, or noise**.

**Why does this happen?**  
- A small rotation (e.g., **5°**) can **invalidate** an adversarial example.
- This **breaks the perturbation pattern** that misleads the classifier.
  
**How does EoT (Expectation Over Transformation) help?**  
- Instead of using **a single perturbed image**, EoT **randomly transforms** the image (rotation, blur, etc.).
- The attack is then **optimized over multiple transformations**, making it **more robust**.
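
The core idea can be sketched in a few lines: instead of querying the model once, average its output over several randomly transformed copies of the input. The example below is a NumPy-only illustration under clearly stated assumptions: the "model" is a toy linear layer with softmax, and the transformation is a small random shift (`random_shift`) used as a stand-in for rotation. All names here (`eot_predict`, `random_shift`, `toy_predict`) are hypothetical, and this is not how ART implements EoT, which also applies the transformations when computing gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shift(image, max_shift=2):
    """Stand-in transformation (assumption): a small random circular shift."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))

def eot_predict(predict_fn, image, transform_fn, nb_samples=8):
    """Average predict_fn over nb_samples random transformations of image."""
    outputs = [predict_fn(transform_fn(image)) for _ in range(nb_samples)]
    return np.mean(outputs, axis=0)

# Toy "model": a fixed random linear layer followed by a softmax over 3 classes
weights = rng.normal(size=(8 * 8, 3))

def toy_predict(image):
    logits = image.reshape(-1) @ weights
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

image = rng.random((8, 8))
averaged = eot_predict(toy_predict, image, random_shift, nb_samples=16)
print("Single prediction  :", toy_predict(image))
print("Averaged prediction:", averaged)
```

An attack that optimizes against the averaged output has to fool the model across the whole family of transformations rather than at one exact pixel alignment, which is where the added robustness comes from.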

---

In [None]:
import scipy.ndimage

# Define rotation angles to test
rotation_angles = [-22.5, -10.0, -5.0, 0.0, 5.0, 10.0, 22.5]  

# Apply rotation to all adversarial examples
x_adv_rotated_all = {
    angle: np.array([
        scipy.ndimage.rotate(img, angle=angle, reshape=False, axes=(0, 1), order=1, mode='constant')
        for img in x_adv_all
    ]) for angle in rotation_angles
}

# Get predictions after rotation (logits converted to probabilities, as for the clean images)
y_pred_adv_rotated_all = {
    angle: tf.nn.softmax(classifier.predict(x_adv_rotated_all[angle]), axis=1).numpy()
    for angle in rotation_angles
}

print(f"Adversarial images rotated and evaluated for {len(rotation_angles)} angles.")


### Display Rotated Adversarial Examples
You can **choose whether to display all images or just a few.**

In [None]:
display_all_images = False  # Set to True to display all, False to show a few per angle

for angle in rotation_angles:
    print(f"\nRotation Angle: {angle}°")

    for i in range(len(x_adv_rotated_all[angle])):
        if not display_all_images and i > 1:
            break

        plot_prediction(
            np.squeeze(x_adv_rotated_all[angle][i]),  
            y_pred_adv_rotated_all[angle][i].reshape(1, -1),  
            correct_class=y_all[i],  
            target_class=y_target  
        )

### Evaluate Performance After Rotation

In [None]:
# Compute Accuracy After Rotation
for angle in rotation_angles:
    adv_top1_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=1)
    adv_top5_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=5)
    
    print(f"Rotation {angle}° → Top-1: {adv_top1_rotated:.1f}% | Top-5: {adv_top5_rotated:.1f}%")

## Step 7: Apply Expectation Over Transformation (EoT)

### **What is EoT and Why is it Useful?**
Adversarial examples often **stop fooling the model** when images undergo transformations like **rotations**.

**EoT (Expectation Over Transformation) mitigates this issue by:**
- Generating multiple **randomly transformed** versions of the adversarial image.
- Applying these transformations **during model evaluation** (predictions & gradients).
- Making the adversarial attack **robust to transformations** like **rotations, noise, and blur**.

---

### **Enable EoT in ART**
We use ART’s **`EoTImageRotationTensorFlow`** to introduce **random rotations** during classification.

---


In [None]:
# Create ART Classifier with EoT
eot_rotation = EoTImageRotationTensorFlow(nb_samples=eot_samples,  
                                          clip_values=clip_values,  
                                          angles=eot_angle)  # Random rotation range

classifier_eot = TensorFlowV2Classifier(model=model,
                                        nb_classes=nb_classes,
                                        loss_object=tf.keras.losses.CategoricalCrossentropy(),
                                        preprocessing=preprocessing,
                                        preprocessing_defences=[eot_rotation],  # EoT applied
                                        clip_values=clip_values,
                                        input_shape=input_shape)

print(f"EoT Classifier created with {eot_samples} transformation samples per evaluation.")


### Generate Adversarial Examples with EoT
We generate **adversarial examples** that remain effective even **after transformations**.

-----------------

In [None]:
from tqdm import tqdm

# Prepare target labels for targeted attacks
y_target_one_hot = np.zeros((1, nb_classes), dtype=np.float32)
y_target_one_hot[0, name_to_label("guacamole")] = 1.0  
y_target_all = np.tile(y_target_one_hot, (len(x_all), 1))  

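# Rebuild the attack on the EoT classifier so that the random rotations are part
# of the optimization (assumption: same hyperparameters as the original attack)
attack_eot = CarliniLInfMethod(classifier=classifier_eot,
                               confidence=confidence,
                               targeted=attack.targeted,
                               max_iter=max_iter,
                               batch_size=batch)

x_adv_eot_all = []

for i in tqdm(range(len(x_all)), desc="Generating EoT Examples"):
    x_i = np.expand_dims(x_all[i], axis=0)  # Add the batch axis
    y_i = np.expand_dims(y_target_all[i], axis=0)  # Matching one-hot target

    if attack_eot.targeted:
        x_adv_i = attack_eot.generate(x=x_i, y=y_i)
    else:
        x_adv_i = attack_eot.generate(x=x_i)

    x_adv_eot_all.append(x_adv_i[0])  # Drop only the batch axis, keep the channel axis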

x_adv_eot_all = np.array(x_adv_eot_all)

print(f"Shape of EoT Adversarial Examples: {x_adv_eot_all.shape}")

### Apply Rotation to Adversarial Examples
We now test the **robustness** of these adversarial examples by **rotating them** at different angles.

--------------------

In [None]:
# Define rotation angles
rotation_angles = [-22.5, -10.0, -5.0, 0.0, 5.0, 10.0, 22.5]  

# Rotate and Evaluate Adversarial Examples
x_adv_rotated_all = {
    angle: np.array([
        scipy.ndimage.rotate(img, angle=angle, reshape=False, axes=(0, 1), order=1, mode='constant')  # img is (H, W, C): rotate in the H-W plane
        for img in x_adv_eot_all
    ]) for angle in rotation_angles
}

# Convert logits to probabilities, as for the clean images
y_pred_adv_rotated_all = {
    angle: tf.nn.softmax(classifier.predict(x_adv_rotated_all[angle]), axis=1).numpy()
    for angle in rotation_angles
}

print(f"Adversarial images rotated and evaluated for {len(rotation_angles)} angles.")

### Display Rotated Adversarial Examples
You can **choose whether to display all images or just a few per rotation angle**.

----------

In [None]:
display_all_images = False  

for angle in rotation_angles:
    print(f"\nRotation Angle: {angle}°")

    for i in range(len(x_adv_rotated_all[angle])):
        if not display_all_images and i > 1:  
            break

        plot_prediction(
            np.squeeze(x_adv_rotated_all[angle][i]),  
            y_pred_adv_rotated_all[angle][i].reshape(1, -1),  
            correct_class=y_all[i],  
            target_class=y_target  
        )

### Evaluate Performance After Rotation

In [None]:
# Compute Accuracy After Rotation
for angle in rotation_angles:
    adv_top1_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=1)
    adv_top5_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=5)
    
    print(f"Rotation {angle}° → Top-1: {adv_top1_rotated:.1f}% | Top-5: {adv_top5_rotated:.1f}%")

### Generate a Report on EoT Performance

In [None]:
# Define report save path (C&W has no fixed epsilon, so the file is tagged with the confidence setting)
report_filename = f"EoT_{selected_attack}_{selected_dataset}_confidence={confidence}.txt"
report_path = os.path.join("adversarials_img", selected_attack, selected_dataset, report_filename)
os.makedirs(os.path.dirname(report_path), exist_ok=True)

# Generate Report
print("\nGenerating EoT attack report...")

with open(report_path, "w", encoding="utf-8") as f:
    f.write(f"====== EoT Adversarial Attack Report (confidence = {confidence}) ======\n\n")
    
    for angle in rotation_angles:
        adv_top1_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=1)
        adv_top5_rotated = compute_accuracy(y_pred_adv_rotated_all[angle], y_all, top_k=5)
        
        f.write(f"\n=== Rotation {angle}° ===\n")
        f.write(f"Top-1 Accuracy: {adv_top1_rotated:.1f}%\n")
        f.write(f"Top-5 Accuracy: {adv_top5_rotated:.1f}%\n")

print(f"EoT Report saved: {report_path}")