# Generating Adversarial Examples with a Zero-Knowledge Threat Model

Team Mars: Miles Ziemer, Max Corbel, Safi Shams Muhtasimul Hoque, Shuge Lei

## Introduction

In this task, we generated adversarial examples under the zero-knowledge threat model using three attack methods: FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), and DeepFool. For each type of attack, we generated adversarial examples with four degrees of perturbation (discussed later on). We then evaluated the generated adversarial examples on the undefended model, the Athena Ensemble, and PGD-ADT. The success of each attack, measured as the error rate of each model, is reported in the Results section.


## Approach

1. Prepare subset of data
2. Generate adversarial examples against undefended model
3. Test adversarial examples on undefended model, ensemble and baseline
4. Record error rates for each variation of AE.<br><br>
**All of these steps are performed in the same Python script:**<br>
'generate_ae_zk/craft_ae_zk.py'

## Prepare a smaller dataset 

The dataset used was the set of benign samples provided with Athena. We subsampled this dataset with a ratio of 0.2, keeping the 20% selected by the subsampling algorithm, which gives 2000 benign samples and their respective labels.

Files used to prepare dataset:

* Contains benign samples files: '../configs/experiment/data-mnist.json'
* Subsampling: 'generate_ae_zk/utils/data.py'

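The subsampling step can be illustrated with a minimal sketch. This is not the code in `generate_ae_zk/utils/data.py`; the function name `subsample`, the seed, the uniform random selection, and the array shapes are all assumptions made for illustration. The only facts taken from the text above are the ratio of 0.2 and the resulting 2000 samples.

```python
import numpy as np

def subsample(data, labels, ratio=0.2, seed=0):
    """Hypothetical helper: keep a random `ratio` of the samples and their labels."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(len(data) * ratio))
    # Draw indices without replacement so no sample is picked twice.
    idx = rng.choice(len(data), size=n_keep, replace=False)
    idx.sort()  # keep the original ordering of the retained samples
    return data[idx], labels[idx]

# Placeholder arrays standing in for the benign samples (shape is an assumption).
data = np.zeros((10000, 28, 28, 1), dtype=np.float32)
labels = np.arange(10000) % 10

sub_x, sub_y = subsample(data, labels, ratio=0.2)
print(sub_x.shape, sub_y.shape)
```

Fixing the seed keeps the subset identical across runs, so every attack and every model sees the same benign inputs.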
## Generate Adversarial Examples

The manner in which the adversarial examples were generated is key to the experiment. To ensure that each AE is evaluated fairly by every model, we iterate over each attack type specified in *attack_configs* and generate adversarial examples one variant at a time. This way, all models predict on the same variant on each pass. As described before, the adversarial examples are generated under a zero-knowledge threat model, meaning we generate them against the undefended model. What follows is a detailed description of each attack and variant used in the experiment.

### Fast Gradient Sign Method

The fast gradient sign method, FGSM, uses the gradient of the loss with respect to the input image to generate a new image that increases the loss. It takes a single step in the direction of the sign of that gradient, which is the opposite of the descent direction used when training the network. The mathematical model describing this is as follows:<br><br>
$$ x_{adv} = x + \epsilon \cdot \operatorname{sign}({\nabla}_{x}J(\theta,x,y)) $$<br><br>
The value $\epsilon$ is a parameter that controls the degree of perturbation. In our experiment we used:<br><br>
$\epsilon=0.05, 0.1, 0.15, 0.2$<br>
<table> <tr>
    <td> <img src="results/fgsm/fgsm_eps0.05.png" width="200"/> </td>
    <td> <img src="results/fgsm/fgsm_eps0.1.png" width="200"/> </td>
    <td> <img src="results/fgsm/fgsm_eps0.15.png" width="200"/> </td>
    <td> <img src="results/fgsm/fgsm_eps0.2.png" width="200"/> </td>
</tr> </table>

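The FGSM update above can be sketched in a few lines of NumPy. This is an illustrative toy, not the attack implementation used in the experiment: the linear softmax classifier (`W`, `b`), the random inputs, and the helper names are assumptions chosen so that the input gradient has a closed form. Only the update rule and the four $\epsilon$ values come from the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for a linear softmax model (toy stand-in)."""
    probs = softmax(x @ W + b)          # (n, classes)
    onehot = np.eye(W.shape[1])[y]      # (n, classes)
    return (probs - onehot) @ W.T       # (n, features)

def fgsm(W, b, x, y, eps, clip_min=0.0, clip_max=1.0):
    """One signed-gradient step of size eps, then clip back to the valid pixel range."""
    x_adv = x + eps * np.sign(loss_gradient(W, b, x, y))
    return np.clip(x_adv, clip_min, clip_max)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 10))   # hypothetical model weights
b = np.zeros(10)
x = rng.uniform(size=(5, 784))              # hypothetical flattened images in [0, 1]
y = rng.integers(0, 10, size=5)

for eps in (0.05, 0.1, 0.15, 0.2):
    x_adv = fgsm(W, b, x, y, eps)
    print(eps, float(np.abs(x_adv - x).max()))
```

Because every pixel moves by at most $\epsilon$, the largest per-pixel change is bounded by the chosen $\epsilon$, which is why larger values produce more visible noise in the images above.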

### Projected Gradient Descent

Projected Gradient Descent, PGD, generates an adversarial example by searching for a perturbation that maximizes the loss along the network's gradient. It starts from a random perturbation, takes a small gradient step that increases the loss, projects the result back into the allowed perturbation range, and repeats this process several times. The mathematical model is as follows:<br><br>
$$x_{adv} = x + \mathcal{P}(\delta + \alpha{\nabla}_{\delta}\ell(h_{\theta}(x+\delta),y))$$
<br><br>
Here $\delta$ is the current perturbation, $\alpha$ is the step size, and $\mathcal{P}$ is the projection onto the set of allowed perturbations. PGD uses an upper limit on its perturbation size, $\epsilon$, which we adjusted in our experiment to the values of:<br><br>
$\epsilon=0.05, 0.1, 0.15, 0.2$<br>
<table> <tr>
    <td> <img src="results/pgd/pgd_eps0.05.png" width="200"/> </td>
    <td> <img src="results/pgd/pgd_eps0.1.png" width="200"/> </td>
    <td> <img src="results/pgd/pgd_eps0.15.png" width="200"/> </td>
    <td> <img src="results/pgd/pgd_eps0.2.png" width="200"/> </td>
</tr> </table>

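The step-and-project loop described above can be sketched as follows. As with any toy, several things here are assumptions rather than the experiment's settings: the linear softmax model, the step size, the iteration count, and the use of the gradient's sign for the step (a common choice for an $L_\infty$ bound, whereas the formula above is written with the raw gradient).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_gradient(W, b, x, y):
    """Input gradient of the cross-entropy loss for a linear softmax model (toy stand-in)."""
    return (softmax(x @ W + b) - np.eye(W.shape[1])[y]) @ W.T

def pgd(W, b, x, y, eps, step, n_iter, rng):
    # Random start inside the eps-ball around the clean input.
    delta = rng.uniform(-eps, eps, size=x.shape)
    delta = np.clip(x + delta, 0.0, 1.0) - x
    for _ in range(n_iter):
        grad = loss_gradient(W, b, x + delta, y)
        delta = delta + step * np.sign(grad)      # ascent step on the loss
        delta = np.clip(delta, -eps, eps)         # projection onto the L_inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixels in [0, 1]
    return x + delta

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 10))   # hypothetical model weights
b = np.zeros(10)
x = rng.uniform(size=(5, 784))              # hypothetical flattened images
y = rng.integers(0, 10, size=5)

for eps in (0.05, 0.1, 0.15, 0.2):
    x_adv = pgd(W, b, x, y, eps, step=eps / 4, n_iter=10, rng=rng)
    print(eps, float(np.abs(x_adv - x).max()))
```

The projection is what distinguishes PGD from simply repeating FGSM: no matter how many steps are taken, the total perturbation never exceeds $\epsilon$.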
### DeepFool

DeepFool is an algorithm that starts from the most confident predictions of a model and perturbs the image until the model misclassifies it. This allows DeepFool to find the minimum perturbation that will result in a successful adversarial example. The $\epsilon$ value determines how quickly that perturbation is found. In our experiment we used:<br><br>
$\epsilon=0.01,0.03,0.05,0.1$
<table> <tr>
    <td> <img src="results/deepfool/deepfool_eps0.01.png" width="200"/> </td>
    <td> <img src="results/deepfool/deepfool_eps0.03.png" width="200"/> </td>
    <td> <img src="results/deepfool/deepfool_eps0.05.png" width="200"/> </td>
    <td> <img src="results/deepfool/deepfool_eps0.1.png" width="200"/> </td>
</tr> </table><br>
Files used to generate adversarial examples:

* attacks: '../configs/experiment/attack-zk-mnist.json'
* generated examples: 'generate_ae_zk/examples/(fgsm,pgd,deepfool)'

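To make the DeepFool idea described earlier in this section more concrete, here is a minimal sketch for a linear multiclass classifier, where the distance to each decision boundary has a closed form. The model (`W`, `b`), the `overshoot` and `margin` parameters, and the function name are assumptions for illustration; in particular, `overshoot` is not claimed to correspond to the $\epsilon$ values listed above, and the sketch does not clip to the valid pixel range.

```python
import numpy as np

def deepfool_linear(W, b, x, overshoot=0.02, margin=1e-4, max_iter=50):
    """Find a small L2 perturbation that changes the predicted class of one sample."""
    orig = int(np.argmax(x @ W + b))
    r_total = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(max_iter):
        logits = x_adv @ W + b
        if int(np.argmax(logits)) != orig:
            break  # the model already misclassifies the sample
        w_diff = W.T - W[:, orig]            # (classes, features): w_k - w_orig
        f_diff = logits - logits[orig]       # (classes,): f_k - f_orig
        norms = np.linalg.norm(w_diff, axis=1)
        norms[orig] = 1.0                    # avoid dividing by zero for the original class
        ratios = np.abs(f_diff) / norms      # distance to each class boundary
        ratios[orig] = np.inf                # never pick the original class
        l = int(np.argmin(ratios))           # nearest boundary
        # Smallest step that just crosses boundary l.
        r = (np.abs(f_diff[l]) + margin) / norms[l] ** 2 * w_diff[l]
        r_total += r
        x_adv = x + (1.0 + overshoot) * r_total
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 10))   # hypothetical model weights
b = np.zeros(10)
x = rng.uniform(size=784)                   # one hypothetical flattened image

x_adv = deepfool_linear(W, b, x)
print(int(np.argmax(x @ W + b)), int(np.argmax(x_adv @ W + b)))
print(float(np.linalg.norm(x_adv - x)))
```

For a linear model one step is enough to cross the nearest boundary; for a deep network the same step is applied repeatedly to a local linear approximation of the classifier.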
## Evaluate the generated AE's

The adversarial examples were evaluated on the undefended model, the Athena Ensemble, and the baseline PGD-ADT model. Each model was evaluated on the same set of adversarial examples at the same time so that the comparison between models is fair. The ensemble contained **the first 20 weak defenses implemented in Athena**. One picture was rendered as an example of each type and variant of attack, and the error rates of the models were recorded per variant for each attack type.

Files used to evaluate AE's:

* model configurations: '../configs/experiment/model-mnist.json'
* ensemble configuration: '../configs/experiment/athena-mnist.json'
* plots: 'generate_ae_zk/generate_plots.py'

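The error rate recorded for each model and variant can be sketched as below. The prediction arrays are made-up placeholders, not experimental results, and the majority-vote combination is only one possible ensemble strategy, assumed here for illustration; the text above does not state which strategy the Athena Ensemble used.

```python
import numpy as np

def error_rate(pred_labels, true_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return float(np.mean(pred_labels != true_labels))

def majority_vote(member_preds):
    """Combine label predictions of shape (members, samples) by majority vote.

    Ties are resolved in favour of the lowest class index.
    """
    member_preds = np.asarray(member_preds)
    n_classes = int(member_preds.max()) + 1
    # Count the votes for every class, one column (sample) at a time.
    counts = np.apply_along_axis(np.bincount, 0, member_preds, minlength=n_classes)
    return counts.argmax(axis=0)

# Placeholder data: four samples, one undefended model, three ensemble members.
true_labels = np.array([0, 1, 2, 3])
undefended_preds = np.array([0, 2, 1, 3])
member_preds = np.array([
    [0, 1, 2, 0],
    [0, 1, 1, 3],
    [0, 2, 2, 3],
])

print(error_rate(undefended_preds, true_labels))             # 0.5
print(error_rate(majority_vote(member_preds), true_labels))  # 0.0
```

Computing the metric with the same function and the same labels for every model keeps the per-variant numbers directly comparable.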
## Results

<table> <tr>
    <td> <img src="results/fgsm/fgsm.png" width="500"/> </td>
    <td> <img src="results/pgd/pgd.png" width="500"/> </td>
    <td> <img src="results/deepfool/deepfool.png" width="500"/> </td>
</tr> </table>

## Discussion and Conclusion

As can be seen from the charts, the models generally performed worse at higher degrees of perturbation. FGSM shows the largest increase in error rate per increase in epsilon, whereas PGD starts at a higher error rate but does not increase as rapidly. This is likely due to the nature of FGSM and PGD as described above: FGSM takes a single step whose size is set by epsilon, while PGD takes several smaller steps within the same bound. DeepFool produced the same predictions for all variants, which may mean that the epsilon value has no effect on the result of DeepFool. This would make sense considering that DeepFool searches for the minimal perturbation regardless of epsilon. The choice of ensemble also likely affected Athena's performance, as a different ensemble may have produced different results. If this experiment were repeated, DeepFool should be replaced with a different attack, since varying its epsilon produced no change. Higher degrees of perturbation or a larger dataset could also be used. The data show that attack strategies with zero knowledge of the defended model can still reduce that model's effectiveness, with higher levels of perturbation having a larger effect.

## Contributions

* Experiment planning: Max Corbel, Miles Ziemer, Shuge Lei, Safi Hoque
* Code: Miles Ziemer, Max Corbel
* Data gathering and presentation: Miles Ziemer
* Interpretation of results: Miles Ziemer, Max Corbel, Shuge Lei, Safi Hoque