# Benchmark

In this notebook we walk through how to benchmark multiple attacks and defences applied to a model. We will also see how to submit the benchmark as an experiment in AML. We focus on two classes: ``AttackBenchmark`` and ``DefenceBenchmark``.

## Loading model and dataset
First, we load a model with MLflow.

In [1]:
import mlflow

model_path = "../../models/grocery_mlflow/"
mlflow_model = mlflow.pyfunc.load_model(model_path)
print(f'Model: {type(mlflow_model)}')

Model: <class 'mlflow.pyfunc.PyFuncModel'>


Most of our modules work with ART model classes, so we convert the loaded model to an ART classifier.

In [2]:
from aidefender.utils.mlflow import create_art_model

classifier = create_art_model(mlflow_model)
print(f'ART classifier: {type(classifier)}')

ART classifier: <class 'art.estimators.classification.pytorch.PyTorchClassifier'>


Next, we load the dataset for the experiment.

In [3]:
from aidefender.exp.datasets import create_dataset, split_train_val

data_path = "../../data/grocery/images/"
dataset = create_dataset('aidefender.exp.datasets.GroceryDataset', data_path)
dataset_train, dataset_val = split_train_val(dataset, test_size=0.2)  


Loading images and labels from the dir: ../../data/grocery/images/
BEANS: only 136 images
CAKE: only 161 images
CANDY: only 372 images
CEREAL: only 278 images
CHIPS: only 181 images
CHOCOLATE: only 307 images
COFFEE: only 298 images
CORN: only 97 images
FISH: only 110 images
FLOUR: only 109 images
HONEY: only 185 images
JAM: only 241 images
JUICE: only 302 images
MILK: only 162 images
NUTS: only 168 images
OIL: only 143 images
PASTA: only 172 images
RICE: only 150 images
SODA: only 177 images
SPICES: only 207 images
SUGAR: only 118 images
TEA: only 283 images
TOMATO_SAUCE: only 171 images
VINEGAR: only 157 images
WATER: only 262 images


## Attack Benchmark

Let us benchmark a handful of attacks first. We rely on the ART library, so we can use the attacks it implements. The attack names and their parameter spaces are provided via a dictionary.

In [4]:
import warnings
from aidefender.benchmark.attack import AttackBenchmark
from aidefender.exp.datasets import create_dataset, split_train_val

warnings.filterwarnings(action='ignore')

To apply multiple attacks we create an instance of ``AttackBenchmark`` and give it an attack configuration. The configuration can be loaded from a JSON file or passed directly as a *dict*. Let us look at an example config first:

```json
{
    "fgsm":{
        "eps": [0.1, 0.2],
        "batch_size": [128],
        "norm": [1, 2, "inf"]
    },
    "pgd":{
        "norm" : [1, 2, "inf"],
        "eps": [0.1, 0.2],
        "batch_size": [128]
    },
    "deepfool":{
        "max_iter": [5, 10],
        "batch_size": [128]
    }
}
```
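
As a minimal sketch of the JSON route, the standard `json` module is enough to turn such a file into a plain *dict*. The file name `attack_config.json` and the temporary directory are placeholders chosen for illustration; they are not paths used by the project.

```python
import json
import tempfile
from pathlib import Path

# Illustrative config text; in practice the file would already exist on disk.
config_text = """
{
    "fgsm": {"eps": [0.1, 0.2], "batch_size": [128], "norm": [1, 2, "inf"]},
    "deepfool": {"max_iter": [5, 10], "batch_size": [128]}
}
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    config_file = Path(tmp_dir) / "attack_config.json"  # hypothetical file name
    config_file.write_text(config_text)

    # json.load returns ordinary dicts and lists, so the result can be used
    # wherever a hand-written dict config would be used.
    with config_file.open() as handle:
        loaded_config = json.load(handle)

print(sorted(loaded_config))          # attack names found in the file
print(loaded_config["fgsm"]["norm"])  # the infinity norm stays a string
```

Standard JSON has no literal for infinity, which is presumably why the infinity norm is written as the string `"inf"` in the config.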

Notice that the keys correspond to different attacks, so it is important to know which keys are supported. Each key maps to the corresponding ART attack object. The attacks supported at the moment are: ``['fgsm', 'pgd', 'hsj', 'square', 'deepfool', 'cwl2', 'cwlinf', 'boundary']``. More attacks can be added in **aidefender.attacks**. Let us make a config object now:


In [5]:
attack_config = {
    "fgsm":{
        "eps": [0.1, 0.2],
        "batch_size": [128],
        "norm": [1, 2, "inf"]
    },
    "deepfool":{
        "max_iter": [5, 10],
        "batch_size": [128]
    }
}
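
Each value in the config is a list, so one attack entry describes a grid of parameter combinations. The sketch below shows one way such a grid could be expanded and checked against the supported keys. It is an illustration only, not necessarily how ``AttackBenchmark`` is implemented; `expand_attack_config` is a hypothetical helper, and the `name_index` labels simply mirror the run names printed later in this notebook.

```python
from itertools import product

# Supported attack keys as listed in this notebook.
SUPPORTED_ATTACKS = ["fgsm", "pgd", "hsj", "square", "deepfool", "cwl2", "cwlinf", "boundary"]

attack_config = {
    "fgsm": {"eps": [0.1, 0.2], "batch_size": [128], "norm": [1, 2, "inf"]},
    "deepfool": {"max_iter": [5, 10], "batch_size": [128]},
}


def expand_attack_config(config):
    """Return (run_name, params) pairs, one per parameter combination."""
    unknown = [name for name in config if name not in SUPPORTED_ATTACKS]
    if unknown:
        raise ValueError(f"Unsupported attack keys: {unknown}")

    runs = []
    for attack_name, grid in config.items():
        keys = list(grid)
        # Cartesian product over the value lists, in the order the keys appear.
        for index, values in enumerate(product(*(grid[key] for key in keys))):
            runs.append((f"{attack_name}_{index}", dict(zip(keys, values))))
    return runs


runs = expand_attack_config(attack_config)
for run_name, params in runs:
    print(run_name, params)
```

With this config the grid has 2 x 1 x 3 = 6 combinations for `fgsm` and 2 x 1 = 2 for `deepfool`, so eight runs in total.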

Now we have everything in place to create our attack benchmark object.

In [6]:
ab = AttackBenchmark(model=classifier, dataset=dataset, attack_configs=attack_config, num_samples=2)


We could run the attacks on the whole dataset, but that can be large. The ``num_samples`` argument passed above limits how many samples per class are used.
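
To make the per-class limit concrete, here is a minimal sketch of per-class subsampling on toy labels. The helper `sample_per_class` and its `seed` parameter are hypothetical and only illustrate the idea; the library may select samples differently.

```python
import random
from collections import defaultdict


def sample_per_class(labels, num_samples, seed=0):
    """Return sorted indices with at most `num_samples` items per class label."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    chosen = []
    for indices in by_class.values():
        # Classes with fewer items than requested contribute everything they have.
        take = min(num_samples, len(indices))
        chosen.extend(rng.sample(indices, take))
    return sorted(chosen)


# Toy labels standing in for a real dataset.
labels = ["BEANS"] * 5 + ["CAKE"] * 3 + ["TEA"] * 1
subset = sample_per_class(labels, num_samples=2)
print(len(subset), [labels[i] for i in subset])
```

The grocery dataset loaded earlier lists 25 classes, so two samples per class would give 50 samples, which is consistent with the 50-item progress bars in the defence output further down.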

In [7]:
results = ab.run()

Generating Adversarial images with fgsm_0.....
Attack params: {'eps': 0.1, 'batch_size': 128, 'norm': 1}
	 -Robustness accuracy: 1.0  (361 ms)

Generating Adversarial images with fgsm_1.....
Attack params: {'eps': 0.1, 'batch_size': 128, 'norm': 2}
	 -Robustness accuracy: 1.0  (346 ms)

Generating Adversarial images with fgsm_2.....
Attack params: {'eps': 0.1, 'batch_size': 128, 'norm': 'inf'}
	 -Robustness accuracy: 0.78  (366 ms)

Generating Adversarial images with fgsm_3.....
Attack params: {'eps': 0.2, 'batch_size': 128, 'norm': 1}
	 -Robustness accuracy: 1.0  (344 ms)

Generating Adversarial images with fgsm_4.....
Attack params: {'eps': 0.2, 'batch_size': 128, 'norm': 2}
	 -Robustness accuracy: 1.0  (349 ms)

Generating Adversarial images with fgsm_5.....
Attack params: {'eps': 0.2, 'batch_size': 128, 'norm': 'inf'}
	 -Robustness accuracy: 0.44  (358 ms)

Generating Adversarial images with deepfool_0.....
Attack params: {'max_iter': 5, 'batch_size': 128}


DeepFool:   0%|          | 0/1 [00:00<?, ?it/s]

	 -Robustness accuracy: 0.0  (18680 ms)

Generating Adversarial images with deepfool_1.....
Attack params: {'max_iter': 10, 'batch_size': 128}


DeepFool:   0%|          | 0/1 [00:00<?, ?it/s]

	 -Robustness accuracy: 0.0  (18643 ms)
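
The log reports a "Robustness accuracy" for every run, but the notebook does not define it. A common reading, assumed here, is the fraction of adversarial inputs that the model still labels correctly. The function below is a sketch under that assumption and uses toy labels rather than real model predictions.

```python
def robustness_accuracy(true_labels, adversarial_predictions):
    """Fraction of adversarial inputs whose predicted label is still correct."""
    if len(true_labels) != len(adversarial_predictions):
        raise ValueError("Both sequences must have the same length")
    if not true_labels:
        raise ValueError("At least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, adversarial_predictions))
    return correct / len(true_labels)


# Toy example: 50 samples over 25 classes, 11 of which are flipped to a wrong class.
true_labels = [i % 25 for i in range(50)]
adversarial_predictions = list(true_labels)
for i in range(11):
    adversarial_predictions[i] = (true_labels[i] + 1) % 25

score = robustness_accuracy(true_labels, adversarial_predictions)
print(round(score, 2))
```

Under this reading, a value of 1.0 means the attack changed no predictions on the sampled inputs, and 0.0 means every sampled input was misclassified after the attack.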



In [8]:
results

Unnamed: 0,name,params,robustness_accuracy,time(ms)
0,fgsm_0,"{'eps': 0.1, 'batch_size': 128, 'norm': 1}",1.0,361
1,fgsm_1,"{'eps': 0.1, 'batch_size': 128, 'norm': 2}",1.0,346
2,fgsm_2,"{'eps': 0.1, 'batch_size': 128, 'norm': 'inf'}",0.78,366
3,fgsm_3,"{'eps': 0.2, 'batch_size': 128, 'norm': 1}",1.0,344
4,fgsm_4,"{'eps': 0.2, 'batch_size': 128, 'norm': 2}",1.0,349
5,fgsm_5,"{'eps': 0.2, 'batch_size': 128, 'norm': 'inf'}",0.44,358
6,deepfool_0,"{'max_iter': 5, 'batch_size': 128}",0.0,18680
7,deepfool_1,"{'max_iter': 10, 'batch_size': 128}",0.0,18643
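
A table like this is easier to compare once it is grouped by attack. The sketch below copies the rows above into plain Python tuples and reports, per attack, the lowest robustness accuracy and the mean run time. It deliberately avoids assuming anything about the type of `results`.

```python
from collections import defaultdict

# Rows copied from the results table above: (name, robustness_accuracy, time_ms).
rows = [
    ("fgsm_0", 1.0, 361),
    ("fgsm_1", 1.0, 346),
    ("fgsm_2", 0.78, 366),
    ("fgsm_3", 1.0, 344),
    ("fgsm_4", 1.0, 349),
    ("fgsm_5", 0.44, 358),
    ("deepfool_0", 0.0, 18680),
    ("deepfool_1", 0.0, 18643),
]

summary = defaultdict(lambda: {"worst_accuracy": 1.0, "times": []})
for name, accuracy, time_ms in rows:
    attack = name.rsplit("_", 1)[0]  # "fgsm_3" -> "fgsm"
    entry = summary[attack]
    entry["worst_accuracy"] = min(entry["worst_accuracy"], accuracy)
    entry["times"].append(time_ms)

for attack, entry in summary.items():
    mean_time = sum(entry["times"]) / len(entry["times"])
    print(f"{attack}: worst accuracy {entry['worst_accuracy']}, mean time {mean_time:.1f} ms")
```

In this run only the `inf`-norm FGSM settings reduce accuracy, while DeepFool drives it to 0.0 at roughly fifty times the run time.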


## Defense Benchmark
As with attack benchmarking, we need a defence config to test.




In [11]:
from aidefender.benchmark.defence import DefenceBenchmark

defense_config = {
    "art.defences": {
        "preprocessor": {
            "SpatialSmoothing": {
                "window_size": [2, 3]
            },
            "FeatureSqueezing": {
                "bit_depth": [2, 3],
                "clip_values": [(0, 1)]
            }
        }
    },
    "aidefender.defences": {
        "bart": {
            "BaRT": {
                "apply_fit": [False],
                "apply_predict": [True]
            }
        }
    }
}

db = DefenceBenchmark(model=classifier, dataset=dataset, defence_configs=defense_config, num_samples=2)
results = db.run()

- Base Model Robustness: 0.34 

- Base Model Accuracy 0.72
--------------------- SpatialSmoothing_0 --------------------------

- Model Robustness accuracy with (SpatialSmoothing_0): 0.6
- Model accuracy: 0.7
--------------------- SpatialSmoothing_1 --------------------------

- Model Robustness accuracy with (SpatialSmoothing_1): 0.58
- Model accuracy: 0.7
--------------------- FeatureSqueezing_0 --------------------------

- Model Robustness accuracy with (FeatureSqueezing_0): 0.32
- Model accuracy: 0.72
--------------------- FeatureSqueezing_1 --------------------------

- Model Robustness accuracy with (FeatureSqueezing_1): 0.36
- Model accuracy: 0.72
--------------------- BaRT_0 --------------------------

100%|██████████| 50/50 [00:02<00:00, 17.67it/s]
100%|██████████| 32/32 [00:00<00:00, 33.28it/s]
100%|██████████| 18/18 [00:00<00:00, 27.63it/s]
100%|██████████| 50/50 [00:02<00:00, 20.73it/s]
100%|██████████| 50/50 [00:03<00:00, 16.4
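
The defence config used above is nested more deeply than the attack config. One plausible reading, inferred from the config itself rather than from the library source, is package, then submodule, then class name, then a parameter grid. The sketch below walks the structure under that assumption; `expand_defence_config` is a hypothetical helper, and nothing is imported from `art` or `aidefender`.

```python
from itertools import product

defence_config = {
    "art.defences": {
        "preprocessor": {
            "SpatialSmoothing": {"window_size": [2, 3]},
            "FeatureSqueezing": {"bit_depth": [2, 3], "clip_values": [(0, 1)]},
        }
    },
    "aidefender.defences": {
        "bart": {"BaRT": {"apply_fit": [False], "apply_predict": [True]}}
    },
}


def expand_defence_config(config):
    """Yield (run_name, dotted_class_path, params) for every combination."""
    for package, submodules in config.items():
        for submodule, classes in submodules.items():
            for class_name, grid in classes.items():
                keys = list(grid)
                combos = product(*(grid[key] for key in keys))
                for index, values in enumerate(combos):
                    path = f"{package}.{submodule}.{class_name}"
                    yield f"{class_name}_{index}", path, dict(zip(keys, values))


defence_runs = list(expand_defence_config(defence_config))
for run_name, path, params in defence_runs:
    print(run_name, path, params)
```

The five run names produced this way match the section headers in the output above, from `SpatialSmoothing_0` through `BaRT_0`.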

## Benchmark AML Pipeline

We can also run the benchmark as an AML pipeline using the ``run_benchmark.py`` script in ``aidefender/exp/``:

```bash
python run_benchmark.py --config_path=../../notebooks/configs/attack_config.json --model_path=grocery_net --type='attack'
python run_benchmark.py --config_path=../../notebooks/configs/defense_config.json --model_path=grocery_net --type='defence'
```
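
For readers who want to see how the three flags in these commands fit together, the following is a hypothetical `argparse` sketch. It is not the source of ``run_benchmark.py``; the help texts and the restriction of `--type` to two choices are assumptions based only on the commands above.

```python
import argparse


def build_parser():
    """Build a parser for the three flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical benchmark launcher")
    parser.add_argument("--config_path", required=True, help="Path to a JSON config")
    parser.add_argument("--model_path", required=True, help="Model name or path")
    parser.add_argument(
        "--type",
        choices=["attack", "defence"],  # assumed from the two example commands
        required=True,
        help="Which benchmark to run",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args([
    "--config_path=../../notebooks/configs/attack_config.json",
    "--model_path=grocery_net",
    "--type=attack",
])
print(args.type, args.model_path, args.config_path)
```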
