This notebook is related to the work to be performed in the section `4 Bonus` of the project description file.

# Robustness of the obtained models to Adversarial Examples using the DeepFool algorithm for multi-class problems

## Introduction

This Jupyter notebook was created by Antónia Brito, António Cardoso and Pedro Sousa for the Machine Learning II (CC3043) course at University of Porto. It serves as a practical execution of the _DeepFool_ algorithm as an evaluation strategy for the robustness of the obtained models against adversarial examples.

## Authorship

- **Author:** Antónia Brito, António Cardoso, Pedro Sousa
- **University:** Faculty of Science from University of Porto
- **Course:** Machine Learning II (CC3043)
- **Date:** 05/12/2023


## The *DeepFool* Algorithm

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi and Pascal Frossard published a paper proposing an algorithm entitled DeepFool to evaluate with a simple and accurate methodology the robustness of a classifier against adversarial examples ([link to the publication in PDF format](https://openaccess.thecvf.com/content_cvpr_2016/papers/Moosavi-Dezfooli_DeepFool_A_Simple_CVPR_2016_paper.pdf)).

For a given classifier and example, the algorithm is set to compute the minimal perturbation that is sufficient to change the estimated label. For the remainder of the section, the following notation will be used:

- $f$: a given classifier that outputs a vector with the probability distribution for the classification associated with its probability index.

- $x$: a given example.

- $X$: the domain of the examples.

- $T$: the domain of test examples available.

- $k$: a possible classification of the considered problem. Thus, the probability of the classification of an example $x$ to be $k$ is $f_k(x)$.

- $\hat{k}(x)$: the estimated classification of a given example. It is noted that $\hat{k}(x) = argmax_k( f_k(x) )$.

- $\hat{r}(x)$: the minimal perturbation for which $\hat{k}(x) \ne \hat{k}(x+\hat{r}(x))$.

The DeepFool algorithm only outputs the value of the minimal perturbation $\hat{r}(x)$ of a given example $x$. The following pseudocode represents the DeepFool algorithm:

--- algorithm ---

Finally, the proposed formal definition for the robustness to adversarial examples of a given classifier is the expected value over the domain of examples for the norm of the minimal perturbation for an example divided by the norm of that same example. For practical purposes, the aforementioned expected value is approximated to the mean value for all examples in the available test domain of the classifier:

$\rho_{adv}(f) = 𝔼_X \frac{||\hat{r}(x)||_2}{||x||_2} ≈ \frac{1}{|T|} ∑_{x\in T} \frac{||\hat{r}(x)||_2}{||x||_2}$


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Implementation of the DeepFool algorithm for multi-class problems on a given model and input example

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers, regularizers, optimizers
from tensorflow.python.client import device_lib
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import numpy as np
import pickle
import numpy as np
import pandas as pd
import os
import librosa
from matplotlib import pyplot as plt
import warnings

In [3]:
from copy import deepcopy

def get_gradient(model, example, k):
    with tf.GradientTape(persistent=True) as tape:
        inputs = [tf.cast(input_value, dtype=tf.float64) for input_value in example]
        for input_value in inputs:
            tape.watch(input_value)
        results = model(inputs)
        results_k = results[0,k]

    gradients = tape.gradient(results_k, inputs) # gradients of the model result w.r.t. the inputs
    del tape
    return [grad.numpy() for grad in gradients], results

def deepfool(model, x0, eta, max_iter):
    f_x0 = model(x0).numpy().flatten()
    label_x0 = f_x0.argsort()[::-1][0]

    loop_i = 0
    xi = deepcopy(x0)
    label_xi = label_x0
    r = []
    while label_xi == label_x0 and loop_i < max_iter:
        w_l = [np.zeros(x_input.shape) for x_input in x0]
        f_l = 0
        fk_wk_min = np.inf
        grad_f_label_x0_on_xi, f_xi = get_gradient(model, xi, label_x0)
        for k in range(10): # k = 0, ..., 9 (possible classes in the problem considered for this project)
            if (k == label_x0):
                continue
            grad_f_k_on_xi, f_xi = get_gradient(model, xi, k)
            w_k = [g_f_k - g_f_label for g_f_k, g_f_label in zip(grad_f_k_on_xi, grad_f_label_x0_on_xi)]
            w_k_norm = np.sqrt(np.sum(np.fromiter([np.linalg.norm(w_k_input)**2 for w_k_input in w_k], dtype=np.float32)))
            f_k = f_xi[0,k] - f_xi[0,label_x0]
            fk_wk = np.linalg.norm(f_k) / (w_k_norm + 1e-3)

            if fk_wk < fk_wk_min:
                w_l, f_l = w_k, f_k
        w_l_squared_norm = np.sum(np.fromiter([np.linalg.norm(w_l_input)**2 for w_l_input in w_l], dtype=np.float32))
        f_l_norm = np.linalg.norm(f_l)
        ri_const = f_l_norm / (w_l_squared_norm + 1e-3)
        ri = [ri_const * w_l_input for w_l_input in w_l]
        r.append(ri)
        xi_new = [xi_item + (1+eta)*ri_item for xi_item, ri_item in zip(xi, ri)]
        xi = xi_new
        label_xi = model(xi).numpy().flatten().argsort()[::-1][0]
        loop_i += 1

    # while loop finished
    r_sum = [np.zeros(x_input.shape) for x_input in x0]
    for i in range(len(x0)):
        for r_i in r:
            r_sum[i] += r_i[i][0]

    return r_sum, loop_i, label_xi

def example_robustness(x, r):
    r_norm = np.sqrt(np.sum(np.fromiter([np.linalg.norm(r_input)**2 for r_input in r], dtype=np.float32)))
    x_norm = np.sqrt(np.sum(np.fromiter([np.linalg.norm(x_input)**2 for x_input in x], dtype=np.float32)))
    return r_norm / x_norm

def model_robustness(example_robustness_list):
    mean = np.mean(np.array(example_robustness_list))
    std = np.std(np.array(example_robustness_list))
    return mean, std


### Saving and Loading functions

In [4]:
def save_pkl(data, path):
    with open(path, "wb") as saved_data:
        pickle.dump(data, saved_data)
    saved_data.close()

def load_pkl(path):
    to_return = None
    with open(path, "rb") as loaded_data:
        to_return = pickle.load(loaded_data)
    loaded_data.close()
    return to_return

In [None]:
FOLDS_PATH = "UrbanSound8K/audio/"
DURATION = 4 # 4 seconds for each audio file
SAMPLE_RATE = 22050
HOP_LENGTH = round(SAMPLE_RATE * 0.0125)
WIN_LENGTH = round(SAMPLE_RATE * 0.023)
N_FFT = 2**10
TIME_SIZE = 4*SAMPLE_RATE//HOP_LENGTH+1

## Methodology for the Empirical Experience using DeepFool

As stated before, the robustness for a classifier is defined, in practice, as the mean value for the norm of the minimal perturbation for an example divided by its norm.


Given that the models constructed in this project were empirically evaluated using a 10-fold cross validation, it is deemed necessary to obtain the robustness for each of the models trained in the cross validation process, both for the CNN and the RNN models.

Hence, at each iteration, the data to be used to test the robustness to adversarial examples of the current model will be the same test data used to compute the cross validation metrics in the notebooks *CNN.ipynb* and *LSTM.ipynb*.

The final results to be present for each of the model architectures will be the mean value and standard deviation of the obtained results for each of the cross validation test datasets.


## Running the DeepFool algorithm on the CNN

In [None]:
df_data = load_pkl("/content/drive/MyDrive/Colab Notebooks/urbansound8k_cnn.pkl")

In [None]:

for f in range(1, 10+1):
    robustness_values_cnn_fold = []
    fool_labels_fold = {i: [] for i in range(10)}

    X_fold = df_data[df_data['fold'] == f"fold{f}"]

    X_mel = np.asarray(X_fold["mel_spec"].to_list()).astype(np.float32)
    X_chroma = np.asarray(X_fold["chroma"].to_list()).astype(np.float32)

    centroid = np.asarray(tuple(X_fold["spectral_centroid"].to_list())).astype(np.float32)
    bandwidth = np.asarray(tuple(X_fold["spectral_bandwidth"].to_list())).astype(np.float32)
    flatness = np.asarray(tuple(X_fold["spectral_flatness"].to_list())).astype(np.float32)
    rolloff = np.asarray(tuple(X_fold["spectral_rolloff"].to_list())).astype(np.float32)
    X_1d = np.stack([centroid,bandwidth,flatness,rolloff], axis=-1)
    X_1d = X_1d.reshape(-1, TIME_SIZE, 4)

    fold_model_cnn = keras.models.load_model(f"/content/drive/MyDrive/Colab Notebooks/CNN Models/cnn_model{f}.h5", compile=False)
    fold_model_cnn.compile(
        optimizer=optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )

    print(f"\nFOLD {f} ({len(X_fold)})")
    for i in range(len(X_fold)):
        if i > 1: break
        print(i, end=" ")
        if i > 0 and i % 100 == 0:
          print()

        example_input = [np.array([X_mel[i]]), np.array([X_chroma[i]]), np.array([X_1d[i]])]
        model_label = fold_model_cnn(example_input).numpy().flatten().argmax()
        perturbation, iters, fool_label = deepfool(fold_model_cnn, example_input, 0.01, 20)
        print(perturbation)
        print(perturbation[0].shape)
        fool_labels_fold[model_label].append(fool_label)
        robustness_values_cnn_fold.append(example_robustness(example_input, perturbation))

        iters_values_cnn.append(iters)
        fool_labels_cnn[model_label].append(fool_label)
        robustness_values_cnn.append(example_robustness(example_input, perturbation))

    save_pkl(fool_labels_fold, f"/content/drive/MyDrive/Colab Notebooks/fool_labels_cnn_fold{f}.pkl")
    save_pkl(robustness_values_cnn_fold, f"/content/drive/MyDrive/Colab Notebooks/robustness_values_cnn_fold{f}.pkl")


In [None]:
for f in range(1, 10+1):
    robustness_values_cnn_fold = load_pkl(f"/content/drive/MyDrive/Colab Notebooks/robustness_values_cnn_fold{f}.pkl")
    mean_robustness_cnn, std_robustness_cnn = model_robustness(robustness_values_cnn_fold)
    print(f"Fold {f} - The CNN model has a robustness of {mean_robustness_cnn: .7f} +/- {std_robustness_cnn: .7f}.")

Fold 1 - The CNN model has a robustness of  0.0000998 +/-  0.0001057.
Fold 2 - The CNN model has a robustness of  0.0000744 +/-  0.0000929.
Fold 3 - The CNN model has a robustness of  0.0001274 +/-  0.0001670.
Fold 4 - The CNN model has a robustness of  0.0000907 +/-  0.0001367.
Fold 5 - The CNN model has a robustness of  0.0001199 +/-  0.0001449.
Fold 6 - The CNN model has a robustness of  0.0000607 +/-  0.0000818.
Fold 7 - The CNN model has a robustness of  0.0000913 +/-  0.0000947.
Fold 8 - The CNN model has a robustness of  0.0001069 +/-  0.0001206.
Fold 9 - The CNN model has a robustness of  0.0001230 +/-  0.0001500.
Fold 10 - The CNN model has a robustness of  0.0000922 +/-  0.0000921.


The results for the various folds' models indicate that they are not very robust to adversarial examples, as the norm of the minimal perturbation to alter the models' predictions is very small relatively to the corresponding input's norm (about 0.00607% to 0.01274% of the original input's norm).

## Running the DeepFool algorithm for the RNN

In [5]:
df_data = load_pkl("/content/drive/MyDrive/Colab Notebooks/urbansound8k_rnn.pkl")

In [None]:

for f in range(7, 10+1):
    robustness_values_rnn_fold = []
    fool_labels_rnn_fold = {i: [] for i in range(10)}

    X_fold = df_data[df_data['fold'] == f"fold{f}"]

    X_spec = np.asarray(X_fold["spec"].to_list()).astype(np.float32)

    fold_model_rnn = keras.models.load_model(f"/content/drive/MyDrive/Colab Notebooks/RNN Models/rnn_model{f}.h5", compile=False)
    fold_model_rnn.compile(
        optimizer=optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )

    print(f"\nFOLD {f} ({len(X_fold)})")
    for i in range(len(X_fold)):
        print(i, end=" ")
        if i > 0 and i % 100 == 0:
          print()

        example_input = [np.array([X_spec[i]])]
        model_label = fold_model_rnn(example_input).numpy().flatten().argmax()
        perturbation, iters, fool_label = deepfool(fold_model_rnn, example_input, 1e6, 15)
        fool_labels_rnn_fold[model_label].append(fool_label)
        robustness_values_rnn_fold.append(example_robustness(example_input, perturbation))

        iters_values_rnn.append(iters)
        fool_labels_rnn[model_label].append(fool_label)
        robustness_values_rnn.append(example_robustness(example_input, perturbation))

    save_pkl(fool_labels_rnn_fold, f"/content/drive/MyDrive/Colab Notebooks/fool_labels_rnn_fold{f}.pkl")
    save_pkl(robustness_values_rnn_fold, f"/content/drive/MyDrive/Colab Notebooks/robustness_values_rnn_fold{f}.pkl")


In [8]:
for f in range(1, 10+1):
    robustness_values_rnn_fold = load_pkl(f"/content/drive/MyDrive/Colab Notebooks/robustness_values_rnn_fold{f}.pkl")
    mean_robustness_rnn, std_robustness_rnn = model_robustness(robustness_values_rnn_fold)
    print(f"Fold {f} - The RNN model has a robustness of {mean_robustness_rnn: .7f} +/- {std_robustness_rnn: .7f}.")

Fold 1 - The RNN model has a robustness of  0.0009903 +/-  0.0012117.
Fold 2 - The RNN model has a robustness of  0.0010233 +/-  0.0011046.
Fold 3 - The RNN model has a robustness of  0.0015006 +/-  0.0011319.
Fold 4 - The RNN model has a robustness of  0.0012471 +/-  0.0010433.
Fold 5 - The RNN model has a robustness of  0.0015296 +/-  0.0013027.
Fold 6 - The RNN model has a robustness of  0.0007348 +/-  0.0010656.
Fold 7 - The RNN model has a robustness of  0.0011942 +/-  0.0010320.
Fold 8 - The RNN model has a robustness of  0.0013856 +/-  0.0013305.
Fold 9 - The RNN model has a robustness of  0.0009796 +/-  0.0011695.
Fold 10 - The RNN model has a robustness of  0.0008553 +/-  0.0008744.


The RNN results show some improvement compared to the CNN ones by a relative factor of approximately 10 times higher.

However, the results show that the model robustness is considerably weak, as the magnitude of the minimal perturbation that changes the model prediction is quite small, ranging from 0.099% to 0.152% of the original input's magnitude.

## Resources & References

