# Team Ares -- Task 1 Report -- Fall 2020
## Contributions:
### Cody Shearer
- Code:
  - Generated BIM AEs
  - Evaluated BIM AEs
  - Generated plots for BIM evaluation
- Report:
  - Background
  - All BIM related content
- Created/managed team repository.
- Helped set up development environments.
- Organized team meetings.

### Zhymir Thompson
- Performed experiments and gathered results for the Carlini-Wagner (CW) attack.
- Modified the script to make generating and evaluating AEs easier (it now requires only the sub-sampled data and the root paths to the output and config directories).

### Mahmudul Hasan
- Performed JSMA experiment, did evaluation and wrote JSMA report.

### Vincent Davidson
- Co-managed team meetings
- Co-managed/organized individual contributions for each team member
- Performed experiments, evaluation and analysis on PGD attacks. 

## Background
In their work on ATHENA, Meng et al. (2020) treat adversarial defense not as a single technique but as a framework: a variable number of weak adversarial defenses (an ensemble) are trained, and their collective predictions are used to respond to adversarial attacks. The number of weak defenses controls the trade-off between robustness and overhead, both of which increase as more weak defenses are added.

In the following report, we compare the robustness of ATHENA's ensemble with that of PGD-ADT and an undefended (control) model by subjecting them to several variations of four adversarial attacks: BIM, CW, JSMA, and PGD.

## BIM Attack and Evaluation
### Introduction
The basic iterative method (BIM) is a white-box adversarial attack developed by researchers at Google Brain and OpenAI. In their paper, Kurakin et al. demonstrate that adversarial examples survive the transition from a lab setting to a real-world setting. In particular, they show that adversarial examples generated by an attacker with direct access to an image classifier can still fool that same model when the images are printed and seen through a physical camera.

### Background
We begin with the non-iterative fast method of Goodfellow et al. (2014), then describe its iterative extension by Kurakin et al. (2016). The fast method is as follows, where
- $X$ is the original image (3d tensor)
- $X^{adv}$ is an adversarial image
- $\epsilon$ is a hyper-parameter that controls how much of the perturbation is applied to the original image; that is, $\epsilon$ bounds the $L_{\infty}$ norm of the perturbation $X^{adv} - X$.
- $J(X,y_{true})$ is the cross-entropy cost function of the neural network.

$$ X^{adv} = X + \epsilon \, \mathrm{sign}\left(\nabla_{X}J(X,y_{true})\right) $$

Simply put, the fast method computes, for each pixel, the direction that increases the network's loss (i.e., makes its classification of the image worse) and moves every pixel by $\epsilon$ in that direction.
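To make the update concrete, here is a minimal sketch of the fast method in plain Python. It assumes a toy logistic-regression classifier $p = \sigma(w \cdot x + b)$ in place of the CNN, because its input gradient has the closed form $(p - y_{true})\,w$; the function names are hypothetical and are not taken from ATHENA or any attack library.

```python
import math


def sign(v):
    """Elementwise sign of a list: +1, -1, or 0."""
    return [(g > 0) - (g < 0) for g in v]


def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy J(x, y) with respect to the input x
    for a toy logistic-regression model p = sigmoid(w . x + b).
    For this model the gradient has the closed form (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(x, y, w, b, eps):
    """Fast method: X_adv = X + eps * sign(grad_X J(X, y_true))."""
    g = loss_grad_wrt_input(x, y, w, b)
    return [xi + eps * si for xi, si in zip(x, sign(g))]


x_adv = fgsm([0.2, 0.8, 0.5], y=1, w=[1.0, -2.0, 0.0], b=0.0, eps=0.1)
```

Here the first two pixels each move by 0.1 (to roughly `[0.1, 0.9, 0.5]`) and the third is unchanged, since the toy model's loss does not depend on it. This sketch does not clip to the valid pixel range.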

Through an iterative version of the above method, called the basic iterative method (BIM), Kurakin et al. show how to produce effective adversarial images with far smaller perturbations, where
- $\mathrm{Clip}_{X,\epsilon}\{X'\}$ is a function that takes a source image $X$, a perturbed image $X'$, and a constraint $\epsilon$, and clips $X'$ per pixel so that it remains in the $\epsilon$-neighbourhood of $X$ under the $L_{\infty}$ norm (and within the valid pixel range).
- $\alpha$ is the step size applied at each iteration, and $N$ is the iteration index.

$$ X^{adv}_0 = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{X^{adv}_N + \alpha \, \mathrm{sign}\left(\nabla_X J(X^{adv}_N, y_{true})\right)\right\} $$

At the initial iteration, the adversarial image is just the source image. Each subsequent step applies a perturbation of size $\alpha$ per pixel, and the clipping keeps the accumulated perturbation within $\epsilon$ of the source image, so the result remains visually similar to the original.
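The iteration can be sketched the same way. This is a minimal, hypothetical implementation on a toy logistic-regression model (not the CNN or ATHENA's attack code), assuming pixel values in [0, 1]. The clip follows the per-pixel definition in Kurakin et al. (2016), $\min\{1, X + \epsilon, \max\{0, X - \epsilon, X'\}\}$, rescaled from their 0-255 range.

```python
import math


def loss_grad_wrt_input(x, y, w, b):
    """Input gradient of binary cross-entropy for a toy logistic-regression
    model p = sigmoid(w . x + b): (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def clip(x_src, x_prime, eps, lo=0.0, hi=1.0):
    """Clip_{X,eps}{X'}: per pixel, min(hi, X + eps, max(lo, X - eps, X'))."""
    return [min(hi, xs + eps, max(lo, xs - eps, xp)) for xs, xp in zip(x_src, x_prime)]


def bim(x, y, w, b, eps, alpha, n_iter):
    """Basic iterative method: repeated sign-gradient steps of size alpha,
    clipped back into the eps-neighbourhood of x after every step."""
    x_adv = list(x)  # X_0^adv = X
    for _ in range(n_iter):
        g = loss_grad_wrt_input(x_adv, y, w, b)
        stepped = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        x_adv = clip(x, stepped, eps)
    return x_adv


x_src = [0.2, 0.8, 0.5]
x_adv = bim(x_src, y=1, w=[1.0, -2.0, 0.0], b=0.0, eps=0.1, alpha=0.01, n_iter=50)
```

Once $\alpha N \geq \epsilon$, a pixel whose gradient sign stays constant saturates at the boundary of the $\epsilon$-ball, so additional iterations can no longer move it.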

### Experimental Setting
Here we consider an adversarial attack on a convolutional neural network (CNN) trained on a subset (10%) of the MNIST dataset using nine variations of the [basic iterative method](https://arxiv.org/pdf/1607.02533.pdf) (BIM). We form these variations from every combination of the epsilon values `0.1, 0.2, 0.3` and the maximum iteration values `50, 60, 70` to reveal how these parameters influence the error rates of the undefended model (UM), an ATHENA ensemble, and PGD-ADT. We expect the error rate to increase generally as either value increases.

Using the following configuration files, we generate adversarial examples (AEs) and evaluate their effectiveness against the undefended model (UM), the ensemble model, and PGD-ADT. For evaluation, we use a diverse subset of 20 of ATHENA's 73 weak defenses for MNIST, listed in the first configuration file.
- `configs/BIM/athena-mnist.json`
- `configs/BIM/attack-bim-mnist.json`
- `configs/BIM/data-bim-mnist.json`

All code for the BIM experiments is in the Jupyter notebook `scripts/cody_scripts/task3.ipynb`.

The AEs we generated can be found at `data/AE-mnist-cnn-clean-bim_eps{e}_maxiter{i}.npy`, where `e` is a value for epsilon and `i` is the maximum number of iterations.
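The nine BIM configurations and their AE file names can be enumerated as below. This is a hypothetical helper, not code from the team notebook; in particular, it assumes epsilon is written into the file name with Python's default float formatting (e.g. `0.1`), which should be checked against the files in `data/`.

```python
from itertools import product

EPSILONS = [0.1, 0.2, 0.3]
MAX_ITERS = [50, 60, 70]


def bim_ae_path(eps, max_iter):
    """Hypothetical helper that fills in the AE file-name pattern.
    Assumes eps is formatted with Python's default str() (e.g. "0.1")."""
    return f"data/AE-mnist-cnn-clean-bim_eps{eps}_maxiter{max_iter}.npy"


# Every (epsilon, max_iter) combination: a 3 x 3 grid of nine attacks.
configs = list(product(EPSILONS, MAX_ITERS))
paths = [bim_ae_path(e, i) for e, i in configs]
```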

### Initial Investigation
We first plot a sample of the AEs for each pair of epsilon and maximum iteration values to understand how these values affect the perturbations.

![](figures/example_BIM_AEs.png)

We can see that with each increase in epsilon, there is a distinct increase in the amount of visual noise in the images. However, there is very little difference for each increase in the maximum number of iterations. Ultimately, all of the above AEs fooled the undefended model (UM). 

## Evaluation

Plotting the error rates for each model, we find that the undefended model (UM) behaves as our initial investigation suggested: every BIM attack fooled the UM more than 90% of the time.

![](figures/bim_evaluation.png)

### Influence of Maximum Iteration

Over the range of maximum iterations $\{50, 60, 70\}$, the maximum number of iterations has little impact on the error rate of any model. We also see an unexpected trend for the ensemble model (EM): a maximum iteration value of 60 consistently results in a higher error rate than a value of 70. However, both the difference and our sample size are small, so further investigation over a larger range of finer-grained values is needed to reveal a meaningful trend.

### Influence of Epsilon

Over the range $\epsilon \in \{0.1, 0.2, 0.3\}$, epsilon has a large impact on all of the models. While the error rate of the undefended model converges to 0.989 for all $\epsilon > 0.1$, the error rates of the ensemble model and PGD-ADT remain below 1% until $\epsilon = 0.3$, where they rise sharply to around 45% for the ensemble and around 35% for PGD-ADT.

## Conclusion

For BIM attacks with $\epsilon \in \{0.1, 0.2, 0.3\}$ and maximum iterations in $\{50, 60, 70\}$, PGD-ADT proved more robust than the ensemble built from 20 of ATHENA's vanilla weak defenses for MNIST. However, since we did not measure the computational or memory overhead of ATHENA or PGD-ADT, it is possible that ATHENA is more efficient.

## Citations
- [Dong, Yinpeng, et al. (2020)](https://arxiv.org/pdf/2002.05999.pdf). "Adversarial Distributional Training for Robust Deep Learning." Advances in Neural Information Processing Systems 33.
- [Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy (2014)](https://arxiv.org/abs/1412.6572). "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572.
- [Kurakin, Alexey, Ian Goodfellow, and Samy Bengio (2016)](https://arxiv.org/pdf/1607.02533.pdf). "Adversarial examples in the physical world." arXiv preprint arXiv:1607.02533.
- [Meng, Ying, et al. (2020)](https://arxiv.org/abs/2001.00308). "Ensembles of many diverse weak defenses can be strong: defending deep neural networks against adversarial attacks." arXiv preprint arXiv:2001.00308.
- [LeCun, Yann, and Corinna Cortes (2010)](http://yann.lecun.com/exdb/mnist/). "MNIST handwritten digit database."

## CW Attack and Evaluation

The Carlini-Wagner (CW) attack (Carlini & Wagner, 2017) frames adversarial example generation as an optimization problem that balances two goals: fooling the target model and keeping the example valid, i.e., close to the original (realistic) image.
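For reference, Carlini & Wagner (2017) minimize $\lVert\delta\rVert_2^2 + c \cdot f(x + \delta)$, where the first term keeps the example close to the original and $f$ is positive only while the model still predicts the true class. The sketch below evaluates that objective for the untargeted case with logits $Z$, $f(x') = \max(Z(x')_{y} - \max_{i \neq y} Z(x')_i, -\kappa)$. It is illustrative only: the full attack also uses a change of variables and a search over $c$, and the attack code used in our experiments may differ.

```python
def cw_margin_loss(logits, true_label, kappa=0.0):
    """Untargeted CW loss f(x') = max(Z_y - max_{i != y} Z_i, -kappa).
    Positive while the true class still has the largest logit; once another
    class wins by at least kappa, it bottoms out at -kappa."""
    z_true = logits[true_label]
    z_best_other = max(z for i, z in enumerate(logits) if i != true_label)
    return max(z_true - z_best_other, -kappa)


def cw_l2_objective(delta, logits_adv, true_label, c=1.0, kappa=0.0):
    """||delta||_2^2 + c * f(x + delta): the first term keeps the example close
    to the original image, the second pushes it toward misclassification.
    logits_adv are assumed to be the model's logits for x + delta."""
    distortion = sum(d * d for d in delta)
    return distortion + c * cw_margin_loss(logits_adv, true_label, kappa)
```

The constant $c$ sets the balance described above: a small $c$ favors a small perturbation, a large $c$ favors misclassification.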

For the CW attack, we varied the norm (L2 or L∞), the learning rate, the maximum number of iterations, and, for L∞, epsilon, for a total of 16 variations (8 per norm), as listed in the Data table below. The files for this attack are stored in `~/src/task1/attack2`.
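The 16 variations can be reconstructed as a small parameter grid. This sketch is inferred from the AE file names and the Data table; the tuple layout `(norm, learning_rate, max_iter, epsilon)` is hypothetical and not taken from the team's configs.

```python
from itertools import product

ITERS = [10, 20, 30, 50]

# L2 runs: two learning rates x four iteration budgets (no epsilon).
l2_runs = [("l2", lr, it, None) for lr, it in product([1e-3, 1e-5], ITERS)]

# L-infinity runs: one learning rate x four iteration budgets x two epsilons.
linf_runs = [("linf", 1e-4, it, eps) for it, eps in product(ITERS, [0.1, 0.2])]

runs = l2_runs + linf_runs
```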

### Files Used

#### Configs:

* src/configs/task3/attack_2_cw_config.json
* src/configs/task3/athena-mnist.json
* src/configs/task3/attack_cw_config.json
* src/configs/task3/data_cw_config.json
* src/configs/task3/data_config.json
* src/configs/task3/sub_data_config.json
* src/configs/task3/model_config.json

#### Sub-samples:

* Task1_update/data/sublabels-1000-ratio_0.1-398037.171.npy
* Task1_update/data/subsamples-1000-ratio_0.1-398037.171.npy

#### AEs:

* Task1_update/data/cw_norm_l2_lr_1e-3_iter_10_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_20_eps_0-010_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_10_eps_0-010_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_10_eps_0-20_.npy
* Task1_update/data/cw_norm_l2_lr_1e-5_iter_20_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_30_eps_0-020_.npy
* Task1_update/data/cw_norm_l2_lr_1e-5_iter_10_.npy
* Task1_update/data/cw_norm_l2_lr_1e-5_iter_50_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_50_eps_0-010_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_20_eps_0-020_.npy
* Task1_update/data/cw_norm_l2_lr_1e-3_iter_50_.npy
* Task1_update/data/cw_norm_l2_lr_1e-5_iter_30_.npy
* Task1_update/data/cw_norm_l2_lr_1e-3_iter_20_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_30_eps_0-010_.npy
* Task1_update/data/cw_norm_l2_lr_1e-3_iter_30_.npy
* Task1_update/data/cw_norm_linf_lr_1e-4_iter_50_eps_0-20_.npy

#### Results:

* Task1_update/results/cw_norm_l2_lr_1e-3_iter_10__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_10_eps_0-010__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_50_eps_0-20__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-5_iter_50__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_50_eps_0-010__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_30_eps_0-010__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-5_iter_20__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_20_eps_0-020__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-3_iter_20__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-3_iter_30__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-5_iter_30__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_10_eps_0-20__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-5_iter_10__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_20_eps_0-010__eval.csv
* Task1_update/results/cw_norm_l2_lr_1e-3_iter_50__eval.csv
* Task1_update/results/cw_norm_linf_lr_1e-4_iter_30_eps_0-020__eval.csv


### Results

#### Summary and Analysis

Overall, the error rate grew with both the maximum number of iterations and the learning rate, but with diminishing returns, following a concave curve similar to $\sqrt{x}$: each further increase in either parameter raised the error rate by less than the previous one.
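The diminishing-returns pattern can be checked directly from the data. This sketch uses the UM error rates for the L2 attack with learning rate 0.001 from the Data table below and computes the gain in error rate per additional iteration between successive iteration budgets.

```python
# UM error rates for L2, learning rate 0.001 (from the Data table).
iters = [10, 20, 30, 50]
um_error = [0.302326, 0.3700708, 0.388271, 0.4115268]

# Gain in error rate per additional iteration between successive budgets.
gains = [
    (um_error[k + 1] - um_error[k]) / (iters[k + 1] - iters[k])
    for k in range(len(iters) - 1)
]
```

The gains come out to roughly 0.0068, 0.0018, and 0.0012 per iteration, so each extra iteration buys less than the one before.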

None of the experiments caused the undefended model (UM) to reach an error rate of 50% or higher; the highest was about 46%. The ensemble's error rate never exceeded about 1.2%, while PGD-ADT's peaked slightly higher at about 2.5%, so the ensemble was marginally more robust against these CW attacks.

Generating the L2 attacks took significantly longer than generating the L∞ attacks, while the data suggest the two were about equally effective. In practice, L∞ therefore seems preferable: as the Medium explanation of the Carlini-Wagner attack cited below suggests, in the time a few L2 attacks take to converge, one could test many different epsilon values to find a good balance between error rate and minimal changes to the image.


#### Data

|Norm|Learning Rate|Max Iter|Epsilon|UM Error|Ensemble Error|PGD-ADT Error|Image 1|Image 2|Image 3|
|----|-------|--------|-------|--------|-------|-------|-------|-------|--------|
|L2  |0.001  |10      |N/A    |0.302326|0.0121335|0.0242669|![](figures/cw_norm_l2_lr_1e-3_iter_10__image_0.png)|![](figures/cw_norm_l2_lr_1e-3_iter_10__image_1.png) |![](figures/cw_norm_l2_lr_1e-3_iter_10__image_2.png) |
|L2  |0.001  |20      |N/A    |0.3700708|0.0111224|0.0252781 |![](figures/cw_norm_l2_lr_1e-3_iter_20__image_0.png)|![](figures/cw_norm_l2_lr_1e-3_iter_20__image_1.png)|![](figures/cw_norm_l2_lr_1e-3_iter_20__image_2.png)|
|L2  |0.001  |30      |N/A    |0.388271|0.0111224|0.02426694|![](figures/cw_norm_l2_lr_1e-3_iter_30__image_0.png)|![](figures/cw_norm_l2_lr_1e-3_iter_30__image_1.png)|![](figures/cw_norm_l2_lr_1e-3_iter_30__image_2.png)|
|L2  |0.001  |50      |N/A    |0.4115268|0.01112235|0.02426694|![](figures/cw_norm_l2_lr_1e-3_iter_50__image_0.png)|![](figures/cw_norm_l2_lr_1e-3_iter_50__image_1.png)|![](figures/cw_norm_l2_lr_1e-3_iter_50__image_2.png)|
|L2  |0.00001|10      |N/A    |0.255814|0.0101112|0.02022245|![](figures/cw_norm_l2_lr_1e-5_iter_10__image_0.png)|![](figures/cw_norm_l2_lr_1e-5_iter_10__image_1.png)|![](figures/cw_norm_l2_lr_1e-5_iter_10__image_2.png)|
|L2  |0.00001|20      |N/A    |0.35187058|0.010111224|0.0232558|![](figures/cw_norm_l2_lr_1e-5_iter_20__image_0.png)|![](figures/cw_norm_l2_lr_1e-5_iter_20__image_1.png)| ![](figures/cw_norm_l2_lr_1e-5_iter_20__image_2.png)|
|L2  |0.00001|30      |N/A    |0.37815976|0.01011122|0.02426694|![](figures/cw_norm_l2_lr_1e-5_iter_30__image_0.png)|![](figures/cw_norm_l2_lr_1e-5_iter_30__image_1.png)|![](figures/cw_norm_l2_lr_1e-5_iter_30__image_2.png)|
|L2  |0.00001|50      |N/A    |0.39433772|0.010111224|0.02426694|![](figures/cw_norm_l2_lr_1e-5_iter_50__image_0.png)|![](figures/cw_norm_l2_lr_1e-5_iter_50__image_1.png)|![](figures/cw_norm_l2_lr_1e-5_iter_50__image_2.png)|
|LINF|0.0001 |10|0.1|0.1051567|0.0101112|0.01112235|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-010__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-010__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-010__image_2.png)|
|LINF|0.0001 |10|0.2|0.40242669|0.01213347|0.0222447|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-20__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-20__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_10_eps_0-20__image_2.png)|
|LINF|0.0001 |20|0.1|0.1183013|0.0101112|0.01112235|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-010__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-010__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-010__image_2.png)|
|LINF|0.0001 |20|0.2|0.4509606|0.0121335|0.0242669|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-020__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-020__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_20_eps_0-020__image_2.png)|
|LINF|0.0001 |30|0.1|0.1223458|0.0101112|0.01112235|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-010__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-010__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-010__image_2.png)|
|LINF|0.0001 |30|0.2|0.46006067|0.01213347|0.02426694|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-020__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-020__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_30_eps_0-020__image_2.png)|
|LINF|0.0001 |50|0.1|0.12335693|0.0101112|0.01112235|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-010__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-010__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-010__image_2.png)|
|LINF|0.0001 |50|0.2|0.46107179|0.01213347|0.02426694|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-20__image_0.png)|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-20__image_1.png)|![](figures/cw_norm_linf_lr_1e-4_iter_50_eps_0-20__image_2.png)|

### Citations
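- [Chan Group, Purdue University. ECE595 course notes, Chapter 3](https://engineering.purdue.edu/ChanGroup/ECE595/files/chapter3.pdf).
- [iambibek (Medium). "Explanation of the Carlini-Wagner (C&W) Attack Algorithm to Generate Adversarial Examples."](https://medium.com/@iambibek/explanation-of-the-carlini-wagner-c-w-attack-algorithm-to-generate-adversarial-examples-6c1db8669fa2)
- [Carlini, Nicholas, and David Wagner (2017)](https://arxiv.org/abs/1608.04644). "Towards Evaluating the Robustness of Neural Networks." IEEE Symposium on Security and Privacy.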

## JSMA Attack and Evaluation
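We ran the Jacobian-based Saliency Map Attack (JSMA) against a convolutional neural network (CNN), using the following values for gamma, the maximum fraction of input features (pixels) the attack may perturb: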

gamma: 0.30, 0.40, 0.50, 0.60, 0.70
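To put these gamma values in perspective, the sketch below converts each one into a pixel budget for a 28 × 28 MNIST image, assuming gamma is the maximum fraction of the 784 input features that JSMA may change (the meaning used by common JSMA implementations). The helper name is hypothetical.

```python
import math

MNIST_PIXELS = 28 * 28  # 784 input features per image


def max_perturbed_pixels(gamma, n_features=MNIST_PIXELS):
    """Upper bound on the number of pixels JSMA may modify,
    assuming gamma is the maximum fraction of features perturbed."""
    return math.floor(gamma * n_features)


budgets = {g: max_perturbed_pixels(g) for g in [0.30, 0.40, 0.50, 0.60, 0.70]}
```

This gives budgets of 235, 313, 392, 470, and 548 pixels for gamma = 0.30 through 0.70, i.e., even the smallest setting lets the attack touch nearly a third of the image.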

We generated the AEs with `notebooks/Task1_GenerateAEs_ZeroKnowledgeModel.ipynb`, using the following configuration files:
* `src/practice task1/at.json`
* `src/practice task1/md.json`
* `src/practice task1/dt.json`

The resulting AEs are saved as:
* 215272.937.npy
* 215373.062.npy
* 215373.156.npy
* 215373.25.npy
* 215373.343.npy

### Undefended Model Results
For each of the five AE files (`215272.937.npy`, `215373.062.npy`, `215373.156.npy`, `215373.25.npy`, and `215373.343.npy`), the reported error rate for the UM was reduced to zero, and the error rates for the ensemble and the JSMA entry of our results were also zero. The predicted-label array (`Y predicted`) had shape 10.



## PGD Attack and Evaluation 
- We ran five variations of the PGD attack on a subset of the MNIST dataset, varying epsilon between 0 and 1 (0.1, 0.3, 0.5, 0.7, and 0.8).
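PGD, as described by Madry et al. (2018), is essentially BIM started from a random point inside the $\epsilon$-ball, with each step projected back onto that ball. The sketch below illustrates this on a toy logistic-regression model with pixels in [0, 1]; it is a hypothetical minimal version, not the attack implementation used with ATHENA.

```python
import math
import random


def loss_grad_wrt_input(x, y, w, b):
    """Input gradient of binary cross-entropy for a toy logistic-regression
    model p = sigmoid(w . x + b): (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def project(x_src, x_prime, eps, lo=0.0, hi=1.0):
    """Project onto the L-infinity eps-ball around x_src, intersected with [lo, hi]."""
    return [min(hi, xs + eps, max(lo, xs - eps, xp)) for xs, xp in zip(x_src, x_prime)]


def pgd(x, y, w, b, eps, alpha, n_iter, seed=0):
    """L-infinity PGD: random start in the eps-ball, then projected sign-gradient steps."""
    rng = random.Random(seed)
    x_adv = project(x, [xi + rng.uniform(-eps, eps) for xi in x], eps)
    for _ in range(n_iter):
        g = loss_grad_wrt_input(x_adv, y, w, b)
        stepped = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        x_adv = project(x, stepped, eps)
    return x_adv


x_src = [0.2, 0.8, 0.5]
x_adv = pgd(x_src, y=1, w=[1.0, -2.0, 0.0], b=0.0, eps=0.1, alpha=0.02, n_iter=40)
```

With pixels in [0, 1], an L∞ budget as large as 0.7 or 0.8 allows most pixels to be pushed to either extreme.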

### Files Used
Configs:
* athena-mnist.json
* attack-zk-mnist.json
* data-mnist.json
* model-mnist.json
* ./results/sample.json

Sub-samples:
* sublabels-10-ratio_0.001-5.142948666.npy
* subsamples-10-ratio_0.001-5.142948666.npy

AEs:
* pgd-eps0.1.npy
* pgd-eps0.8.npy
* pgd-eps0.7.npy
* pgd-eps0.5.npy
* pgd-eps0.3.npy

All AEs are located in the `./results` folder.

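### Summary and Analysis

Across all epsilon values, the error rates of every model fell between 0.7 and 1.0. The error rate is the fraction of inputs that fool a model, so from the attacker's perspective, values closer to 1.0 indicate a more effective attack. Because the sub-sample contains only 10 images (`subsamples-10-ratio_0.001-5.142948666.npy`), each error rate is a multiple of 0.1, and small differences between settings should be interpreted with caution.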
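The error rates reported here are fractions of misclassified inputs. A minimal definition is sketched below; ATHENA's own evaluation code may differ in details (for example, in which samples it counts), so treat this as illustrative.

```python
def error_rate(predicted, true):
    """Fraction of inputs whose predicted label differs from the true label."""
    if not true or len(predicted) != len(true):
        raise ValueError("predicted and true must be non-empty and the same length")
    wrong = sum(p != t for p, t in zip(predicted, true))
    return wrong / len(true)
```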

### Data

#### Ensemble and PGD-ADT Results

| Epsilon | UM error rate | Ensemble error rate | PGD-ADT error rate |
|---------|---------------|---------------------|--------------------|
| 0.1     | 0.9           | 1.0                 | 1.0                |
| 0.3     | 0.8           | 0.9                 | 1.0                |
| 0.5     | 0.7           | 0.9                 | 0.8                |
| 0.7     | 0.7           | 0.9                 | 0.7                |
| 0.8     | 0.7           | 0.9                 | 0.7                |