# Team Ares -- Task 2 Report -- Fall 2020
## Contributions:
### Cody Shearer
- Created approach for task 2 and broke it down into manageable steps.
- Created script for generating training/testing data.
- Generated training/testing data.
- Wrote report section on generating training/testing data.
- Managed team repository.
- Organized team meetings.

### Zhymir Thompson
### Mahmudul Hasan
### Vincent Davidson

___

## Informal Breakdown of Our Approach
I've broken task 2 down into three steps. The first is to create some training data from which we can learn an ensemble strategy. The second is to select a learning model, then train and test variations of that model. The third is to summarize our results and approach in the report. Throughout the entire process, write down a summary of what you did and what you learned (where necessary) for each step you complete. This will save us a lot of time when writing the report. Also, keep a high-level list of the contributions you make to the task; it is your responsibility to make sure you get credit for your work.

1. Generate training/test data from Athena as (X,Y), where:
  1. X is a set of predictions from Athena
    1. I'm thinking we use the adversarial examples (AEs) provided in the /data/ folder. This will require understanding how the AEs are stored in the .npy files.
    2. Once we understand how to use the AEs from the .npy files, we should select a subset of them, making sure we do not bias our model by using too many of any one type of AE. We may even decide to include some benign samples in the mix of training/testing data. Whatever we use, let's *be sure to use equal amounts of AEs from each attack type*.
  2. Y is a set of true labels that matches our selected AEs
    1. We should find out how to get predictions from a weak defense (WD) (see the table below for an example of what I mean).
    2. We then need to select a set of WDs (or just use all 72 of them) to create training data
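
The "equal amounts from each attack type" rule above can be sketched as follows. This is only an illustration, not the team's script: the function name `balanced_subset`, the dictionary layout, and the toy sample ids are all hypothetical, and it assumes every attack type offers at least `per_type` samples.

```python
import random


def balanced_subset(samples_by_attack, per_type, seed=0):
    """Pick the same number of sample ids from every attack type.

    samples_by_attack maps an attack name to a list of sample ids
    (a hypothetical layout chosen only for this illustration).
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    subset = {}
    for attack, ids in samples_by_attack.items():
        if len(ids) < per_type:
            raise ValueError(f"{attack} has only {len(ids)} samples")
        # sample without replacement, then sort for readability
        subset[attack] = sorted(rng.sample(ids, per_type))
    return subset


# Toy pools of different sizes; the names are placeholders, not real files.
pools = {
    "fgsm": list(range(100)),
    "pgd": list(range(80)),
    "benign": list(range(120)),
}
chosen = balanced_subset(pools, per_type=50)
```

Every attack type ends up with exactly 50 ids regardless of how large its pool was, which is the property we want before mixing the samples into one training set.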

Once we are done with this first part, we should have data that looks something like the following table:

| AE_id | WD1                   | WD2                   | ... | Y   |
|-------|-----------------------|-----------------------|-----|-----|
| 1     | [0.1,0.4,0,0,...,0.1] | [0.1,0,0.8,0,...,0.1] | ... | 3   |
| 2     | [0.6,0,0,0.3,...,0.1] | [0,0,0.3,0.1,...,0.2] | ... | 1   |
| ...   | ....                  | ...                   | ... | ... |
| n     | [0.1,0.4,0,0,...,0.1] | [0.1,0,0.8,0,...,0.1] | ... | 9   |

where:
- AE_id is just an index we associate with the training sample
- WD# is the #th weak defense we choose to use
- predictions from the weak defenses (e.g., [0.1,0.4,0,0,...,0.1]) are probability distributions over the digits 0 through 9, where the WD assigns to each digit the probability that it is the true label.
- Y is the true (correct) label for that AE or benign sample
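
One way to turn the table above into model input is to concatenate the per-WD probability vectors into a single feature row per sample. The sketch below assumes that flattened layout (we could equally keep the predictions stacked); `build_dataset` and the toy values are hypothetical and only mirror the shape of the table.

```python
def build_dataset(wd_predictions, true_labels):
    """Flatten per-defense predictions into one feature row per sample.

    wd_predictions[d][i] is the 10-value prediction of weak defense d for
    sample i; true_labels[i] is the correct digit for sample i.
    """
    n_samples = len(true_labels)
    for preds in wd_predictions:
        if len(preds) != n_samples:
            raise ValueError("every weak defense must predict every sample")
    X = []
    for i in range(n_samples):
        row = []
        for preds in wd_predictions:
            row.extend(preds[i])  # append this defense's 10 probabilities
        X.append(row)
    Y = list(true_labels)
    return X, Y


# Two made-up weak defenses, two samples, ten classes each.
wd1 = [
    [0.1, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.1],
    [0.6, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
]
wd2 = [
    [0.1, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.3, 0.1, 0.2, 0.0, 0.0, 0.0, 0.2, 0.2],
]
X, Y = build_dataset([wd1, wd2], [3, 1])
```

With 16 weak defenses, each row of `X` would hold 16 x 10 = 160 values under this layout.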

2. Learn an ensemble strategy
  1. We need to select an ML model for learning. Logistic regression is the most basic form of categorical prediction model, but we could also use some type of decision tree. Whatever we use, let's stick to that one model.
  2. We should then decide what metrics we wish to track during training and find out how to do this with Keras. At the very least, we must track model loss (and ideally accuracy) over time.
    1. Run through [these tutorials](https://www.tensorflow.org/tutorials) (no setup required; the tutorials can be run in Google Colab).
    2. Once you've finished some of the basic tutorials (it's up to you to figure out how much you will need), figure out how to track model loss (and accuracy) over each training iteration for one of the example models you create. Figure out how to save this and any other metrics you use to a csv file, where each row is a training step and each column is a metric (like model loss). In addition, figure out how to plot the data in the csv.
  3. Whatever model we select, train a few (3 to 5) variations of it by changing its hyper-parameters. Note that we train these variations one at a time, not simultaneously. Note also that we should separate the data we generated in step 1 into a training set and a testing set. How we separate this (e.g. 80% training, 20% testing) might depend on the model we choose.
    1. Select a set of hyper-parameters
    2. Train the model with that selection of hyper-parameters, tracking the model loss (and any other selected metrics) over each training iteration. You will store these metrics in a csv file, where each row is a training step and each column is a metric (like model loss).
    3. As you train the model, print the model loss after every iteration. A successful hyper-parameter selection will result in the model loss decreasing after each training step. If it doesn't decrease or stops decreasing, stop the training. Either the selection of hyper-parameters is bad and your loss won't converge, or your model has already converged and further training won't help.
    4. After your model has converged, plot the resulting metrics.
    5. Save the plots, the learned weights/training parameters, and the metrics you tracked during training. Please name these intelligently (include the name of the model and any relevant hyper-parameters in the file name). Any member of the group should be able to understand what the contents of a file are without opening it.
    6. Place the files in their respective folders in Task2/. If a file doesn't appear to fit in any folder, ask in the Discord chat where to put it.
    7. Repeat the above steps another 2 to 4 times, [adjusting the hyper-parameters you select](https://towardsdatascience.com/guide-to-choosing-hyperparameters-for-your-neural-networks-38244e87dafe). Selection of these hyper-parameters is uncertain, so you will need to use any knowledge and experience you have to adjust these, based on the results you get.
3. Write the report. This should detail your approaches, results, what you learned, conclusions, etc. Imagine telling yourself what you wish you knew before starting the task. If you kept a summary of what you did and what you learned (as I suggested from the beginning) this should be easy.
  1. Introduce the approaches that are used in the task.
  2. Experimental settings --- the values of the tunable parameters for each variant.
  3. Evaluation and necessary analysis.
  4. Contribution of individual team members.
  5. Citations to all related works.
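
The logging, stopping, and file-naming conventions from step 2 above can be sketched without any ML library. This is a stand-in rather than our training code: the loss values are made up, and `descriptive_name`, `log_until_converged`, and the `patience` parameter are hypothetical names. Keras ships `CSVLogger` and `EarlyStopping` callbacks that cover similar ground at epoch granularity; whether they fit our setup is untested here.

```python
import csv
import os
import tempfile


def descriptive_name(model, hyperparams, suffix):
    """Build a file name that states the model and its hyper-parameters."""
    parts = [f"{key}{value}" for key, value in sorted(hyperparams.items())]
    return "_".join([model] + parts) + suffix


def log_until_converged(losses, csv_path, patience=1):
    """Write one CSV row per training step; stop once the loss stalls.

    losses is an iterable of per-step loss values (a stand-in for real
    training). Returns the number of steps that were logged.
    """
    best = float("inf")
    stale = 0  # consecutive steps without improvement
    steps = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["step", "loss"])
        writer.writeheader()
        for step, loss in enumerate(losses, start=1):
            writer.writerow({"step": step, "loss": loss})
            steps = step
            if loss < best:
                best = loss
                stale = 0
            else:
                stale += 1
                if stale >= patience:
                    break  # loss stopped decreasing, so stop training
    return steps


# Made-up loss curve: improves for three steps, then stalls.
name = descriptive_name("logreg", {"lr": 0.01, "batch": 32}, "_metrics.csv")
path = os.path.join(tempfile.gettempdir(), name)
steps = log_until_converged([0.9, 0.6, 0.5, 0.5, 0.4], path)
```

Here `name` is `logreg_batch32_lr0.01_metrics.csv`, and logging stops at the fourth step because the loss did not improve on the third.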
  
## Generating Training and Testing Data
We collected raw predictions as logits from 19 sets of 10k MNIST images run through 16 weak defenses using `src/scripts/cody_scripts/generate_test_data.py` and saved them at . 
  
The original plan was to collect raw predictions as logits from 73 models (the original CNN plus the 72 weak defenses from `/src/configs/demo/athena-mnist.json`) on all 45 sets of images (the original MNIST dataset plus 44 sets of adversarial examples from `/src/configs/demo/data-mnist.json`). However, the predictions from one model for one set of 10k images come to ~35MB, so 73 models x 45 MNIST sets x 35MB = 114.975GB. This wasn't feasible, so we selected a representative sample of weak defenses and adversarial examples, giving 16 models x 19 MNIST sets x 35MB = 10.64GB. The `src/scripts/cody_scripts/generate_test_data.py` script compresses this further with numpy's `savez_compressed` function, bringing the saved predictions at `Task2/data/predictions.npz` down to ~126MB. As GitHub limits individual files to 100MB, we use 7-zip to store the predictions in 3 parts.
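
The size estimates above follow from a single multiplication, reproduced here as a quick check. The helper `total_gb` is a hypothetical name, and the figures use decimal units (1GB = 1000MB) to match the numbers quoted in the text.

```python
import math

MB_PER_MODEL_PER_SET = 35  # approximate size quoted above


def total_gb(models, image_sets, mb_each=MB_PER_MODEL_PER_SET):
    """Estimated uncompressed size in GB (decimal, 1 GB = 1000 MB)."""
    return models * image_sets * mb_each / 1000


full_plan = total_gb(73, 45)     # all models, all image sets
reduced_plan = total_gb(16, 19)  # the representative sample we used

# Fewest parts needed to keep a ~126MB archive under a 100MB-per-file limit.
min_parts = math.ceil(126 / 100)
```

The minimum works out to two parts; we split into three.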
  
### Weak Defenses
Our selection of weak defenses used to generate the predictions includes one model for each "type" of weak defense found in `/src/configs/task2/cody_configs/athena-mnist.json` (also available as a set of "active_wds"): 
  - model-mnist-cnn-clean.h5
  - model-mnist-cnn-rotate90.h5
  - model-mnist-cnn-shift_left.h5
  - model-mnist-cnn-flip_horizontal.h5
  - model-mnist-cnn-affine_vertical_compress.h5
  - model-mnist-cnn-morph_erosion.h5
  - model-mnist-cnn-augment_samplewise_std_norm.h5
  - model-mnist-cnn-cartoon_mean_type1.h5
  - model-mnist-cnn-quant_4_clusters.h5
  - model-mnist-cnn-distort_x.h5
  - model-mnist-cnn-noise_gaussian.h5
  - model-mnist-cnn-filter_sobel.h5
  - model-mnist-cnn-compress_jpeg_quality_80.h5
  - model-mnist-cnn-denoise_tv_chambolle.h5
  - model-mnist-cnn-geo_swirl.h5
  - model-mnist-cnn-seg_gradient.h5
  
### Adversarial (and clean) Examples
Our selection of samples includes the original MNIST dataset, along with two sets of samples per "type" of attack found in `/src/configs/demo/data-mnist.json`. For each attack type, we select the two sets of AEs that differ the most. You can find the following at `/src/configs/task2/cody_configs/data-mnist.json`:
  - test_BS-mnist-clean.npy
  - test_AE-mnist-cnn-clean-fgsm_eps0.1.npy
  - test_AE-mnist-cnn-clean-fgsm_eps0.3.npy
  - test_AE-mnist-cnn-clean-bim_ord2_eps0.75.npy
  - test_AE-mnist-cnn-clean-bim_ord2_eps1.2.npy
  - test_AE-mnist-cnn-clean-bim_ordinf_eps0.075.npy
  - test_AE-mnist-cnn-clean-bim_ordinf_eps0.12.npy
  - test_AE-mnist-cnn-clean-cw_l2_lr0.0098.npy
  - test_AE-mnist-cnn-clean-cw_l2_lr0.018.npy
  - test_AE-mnist-cnn-clean-deepfool_l2_overshoot3.npy
  - test_AE-mnist-cnn-clean-deepfool_l2_overshoot50.npy
  - test_AE-mnist-cnn-clean-jsma_theta0.15_gamma0.5.npy
  - test_AE-mnist-cnn-clean-jsma_theta0.25_gamma0.5.npy
  - test_AE-mnist-cnn-clean-pgd_eps0.075.npy
  - test_AE-mnist-cnn-clean-pgd_eps0.11.npy
  - test_AE-mnist-cnn-clean-mim_eps0.06.npy
  - test_AE-mnist-cnn-clean-mim_eps0.1.npy
  - test_AE-mnist-cnn-clean-onepixel_pxCount15.npy
  - test_AE-mnist-cnn-clean-onepixel_pxCount75.npy