Minerva - Task 1 Report
===

---

### Abstract

In Task 1, we generated adversarial examples (AEs) under the zero-knowledge threat model. Using the Carlini-Wagner (CW), Basic Iterative Method (BIM), and Projected Gradient Descent (PGD) attacks, we crafted 18,000 AEs by applying 18 attack configurations to 1,000 benign samples. Our experiments indicate that the PGD attack is the most effective at deceiving an undefended model (UM), even though it used minuscule epsilon values. For the CW attack with a constant epsilon, the UM error rate increases gradually as the learning rate grows, and the BIM attack becomes more effective against the UM as epsilon increases. We conclude that, while the BIM and CW attacks fooled the UM with a 100% success rate at the larger epsilon values we tried first, the PGD attack exhibited the highest rate of error.

### Approach

For this task, we used variants of increasing strength of the Carlini-Wagner (CW) attack, the Basic Iterative Method (BIM) attack, and the Projected Gradient Descent (PGD) attack. In total, 18 attack configurations were applied to a sample of 1,000 benign images, producing 18,000 adversarial examples (AEs). These AEs were then used to test the robustness of an undefended model, the Athena framework, and the PGD-ADT model.

### Experimental Settings  

#### CW Attacks
The CW attacks all used the Linf configuration. Because this was our first experience with this type of attack, we held one variable constant while gradually increasing the strength of the other.

The specific values used:
* epsilon held constant at 0.10; lr values: 0.2, 0.4, 0.6, 0.8
* lr held constant at 0.10; epsilon values: 0.2, 0.4, 0.6, 0.8

#### BIM Attacks
The BIM attacks vary only in their epsilon strength. Initially, epsilon was increased in increments of 20%, but at those strengths both the CW attacks and the BIM attacks easily fooled the undefended model with a 100% success rate. We therefore switched to lower epsilons, so that the effectiveness of the attack would rise gradually and we could observe a matching change in adversarial robustness when the AEs were fed to the Athena and PGD-ADT models.

The specific values used:
* epsilons: .01, .05, .10, .15, .20
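
To make the role of epsilon concrete, the following is a minimal sketch of the textbook Linf BIM update, not the project code that produced our AEs. The function name `bim_linf`, the `step` and `n_iter` parameters, and the toy constant gradient are all assumptions made for illustration.

```python
import numpy as np


def bim_linf(x, grad_fn, eps, step, n_iter):
    """Textbook L-inf BIM: repeated signed-gradient steps, clipped to an eps-ball."""
    x_adv = x.copy()
    for _ in range(n_iter):
        # Move each pixel a small step in the direction that increases the loss.
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        # Keep the total perturbation within eps of the original image.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # Keep pixel values in the valid [0, 1] range.
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Toy example (hypothetical): the loss is linear in x, so its gradient is constant.
w = np.array([1.0, -2.0, 0.5, -0.1])
x = np.full(4, 0.5)
x_adv = bim_linf(x, grad_fn=lambda z: w, eps=0.1, step=0.02, n_iter=10)
max_change = np.max(np.abs(x_adv - x))
```

However many iterations are run, `max_change` never exceeds `eps`, which is why a larger epsilon gives the attack more room to push a sample across the decision boundary.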

#### PGD Attacks
As with the BIM attacks, the epsilon values for the PGD attacks were chosen in small increments. These attacks seemed to fool the undefended model much more easily than the other two attacks, so the epsilons chosen were extremely small.

The specific values used:
* epsilons: .025, .05, .075, .10, .15

### Creating the AE's
To create the AEs, we wrote the JSON file "attack-zk-mnist.json" containing the variables for all 18 attacks. This file was then loaded into craft_adversarial_examples.py to create 1,000 images per attack. The AEs were saved to the output directory, /results, and each file was named after the description of its attack configuration.

model = ../configs/experiment/model-mnist.json  
data = ../configs/experiment/data-mnist.json  
labels also retrieved from the data-mnist.json  
attack_configs = ../configs/experiment/attack-zk-mnist.json  
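
As a rough illustration of how the 18 configurations add up, the sketch below builds a dictionary with the same top-level layout that `generate_ae` reads (`num_attacks`, `configs0` ... `configs17`, and a `description` per entry). The inner keys `attack`, `lr`, and `eps` are hypothetical placeholders, not necessarily the field names used in our actual JSON file, and the descriptions are modeled on the saved file names.

```python
def build_attack_configs():
    """Assemble the 18 attack settings from this report into one dictionary."""
    configs = []
    # CW: epsilon fixed at 0.1, learning rate varied ("lw" in the file names).
    for lr in (0.2, 0.4, 0.6, 0.8):
        configs.append({"attack": "cw", "lr": lr, "eps": 0.1,
                        "description": f"CW-lw{lr}-eps0.1"})
    # CW: learning rate fixed at 0.1, epsilon varied.
    for eps in (0.2, 0.4, 0.6, 0.8):
        configs.append({"attack": "cw", "lr": 0.1, "eps": eps,
                        "description": f"CW-lw0.1-eps{eps}"})
    # BIM and PGD: only epsilon varies.
    for eps in (0.01, 0.05, 0.1, 0.15, 0.2):
        configs.append({"attack": "bim", "eps": eps,
                        "description": f"BIM-eps{eps}"})
    for eps in (0.025, 0.05, 0.075, 0.1, 0.15):
        configs.append({"attack": "pgd", "eps": eps,
                        "description": f"PGD-eps{eps}"})

    attack_configs = {"num_attacks": len(configs)}
    for i, config in enumerate(configs):
        attack_configs[f"configs{i}"] = config
    return attack_configs


attack_configs = build_attack_configs()
total_aes = attack_configs["num_attacks"] * 1000  # 1000 benign samples per attack
```

With 8 CW, 5 BIM, and 5 PGD settings, `num_attacks` is 18 and `total_aes` is 18,000.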

```python
import os

import numpy as np
from matplotlib import pyplot as plt

# NOTE: `generate` and `metrics` come from the project code base and must be
# imported from the corresponding project modules before calling generate_ae.


def generate_ae(model, data, labels, attack_configs, save=True, output_dir="../../results"):
    """
    Generate adversarial examples
    :param model: WeakDefense. The targeted model.
    :param data: array. The benign samples to generate adversarial for.
    :param labels: array or list. The true labels.
    :param attack_configs: dictionary. Attacks and corresponding settings.
    :param save: boolean. True, if save the adversarial examples.
    :param output_dir: str or path. Location to save the adversarial examples.
        It cannot be None when save is True.
    :return: None
    """
    img_rows, img_cols = data.shape[1], data.shape[2]
    num_attacks = attack_configs.get("num_attacks")
    data_loader = (data, labels)

    # convert one-hot labels to class indices for the error-rate computation
    if len(labels.shape) > 1:
        labels = np.argmax(labels, axis=1)

    # generate attacks one by one
    for attack_id in range(num_attacks):
        key = "configs{}".format(attack_id)
        config = attack_configs.get(key)
        data_adv = generate(model=model,
                            data_loader=data_loader,
                            attack_args=config
                            )

        # predict the adversarial examples
        predictions = model.predict(data_adv)
        predictions = np.argmax(predictions, axis=1)

        error_rate = metrics.error_rate_single(predictions, labels)
        print(config.get('description') + ' Error Rate: ' + str(error_rate))

        # plot the first few adversarial examples
        num_plotting = min(data.shape[0], 3)
        for i in range(num_plotting):
            img = data_adv[i].reshape((img_rows, img_cols))
            plt.imshow(img, cmap='gray')
            title = '{}: {}->{}'.format(config.get("description"),
                                        labels[i],
                                        predictions[i]
                                        )
            plt.title(title)
            plt.show()
            plt.close()

        # save the adversarial examples
        if save:
            if output_dir is None:
                raise ValueError("Cannot save images to a none path.")
            # name the file after the description of the attack configuration
            file = os.path.join(output_dir, "minerva_AE-{}.npy".format(config.get('description')))
            print("Save the adversarial examples to file [{}].".format(file))
            np.save(file, data_adv)
```

### Testing the AE's against the Models
Once the AEs had been created and saved, their file names were added to data-mnist.json so that the effectiveness of the attacks could be evaluated against three distinct models.

1. Undefended Model  
    - The undefended model is a blank model that has no adversarial robustness training. It can identify the digits 0-9 effectively, provided the images have not been transformed by an adversarial attack.
2. The Athena Framework  
    - Our Athena framework consisted of the config1 - config20 weak defences integrated into an ensemble. There was no particular rationale for choosing these defences; they were picked simply to give a reasonably large selection from the available weak defences.
3. A PGD-ADT Trained Model  
    - This was the baseline defence model used to compare against the robustness of the Undefended model and the Athena ensemble.
    
The robustness results were calculated by comparing each model's predictions on the AEs with the true labels. These results were then written to a file for data collection and analysis.
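
The sketch below shows one way such an error rate can be computed. It assumes that only benign samples the undefended model already classifies correctly are counted, which is what the `correct_on_bs` argument in our evaluation code suggests and which is consistent with the raw results (for example, 0.0101... is 10/989). The function name `error_rate_on_correct` is hypothetical and is not the project's `error_rate` implementation.

```python
import numpy as np


def error_rate_on_correct(pred_adv, pred_benign, y_true):
    """Fraction of correctly classified benign samples whose AE is misclassified."""
    pred_adv = np.asarray(pred_adv)
    pred_benign = np.asarray(pred_benign)
    y_true = np.asarray(y_true)

    # Samples the model got right before the attack; only these are counted.
    correct_on_benign = pred_benign == y_true
    n_correct = int(correct_on_benign.sum())
    if n_correct == 0:
        raise ValueError("No benign sample was classified correctly.")

    # An attack succeeds when a previously correct sample is now misclassified.
    fooled = (pred_adv != y_true) & correct_on_benign
    return fooled.sum() / n_correct


y_true = np.array([0, 1, 2, 3, 4])
pred_benign = np.array([0, 1, 2, 3, 9])  # last sample is wrong even without an attack
pred_adv = np.array([0, 7, 2, 8, 4])     # two of the four valid samples are fooled
rate = error_rate_on_correct(pred_adv, pred_benign, y_true)
```

Excluding samples that were misclassified before the attack keeps the metric from crediting the attack with errors the model would have made anyway.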

```python
import os

import numpy as np

# NOTE: load_lenet, load_pool, Ensemble, ENSEMBLE_STRATEGY, get_corrections and
# error_rate come from the Athena code base and must be imported from the
# corresponding project modules before calling evaluate.


def evaluate(trans_configs, model_configs,
             data_configs, save=True, output_dir=None):
    """
    Evaluate the undefended model, the Athena ensemble, and the PGD-ADT
    baseline on the adversarial examples.
    :param trans_configs: dictionary. The collection of the parameterized transformations to test.
        in the form of
        { configsx: {
            param: value,
            }
        }
        The key of a configuration is 'configs'x, where 'x' is the id of corresponding weak defense.
    :param model_configs:  dictionary. Defines model related information.
        Such as, location, the undefended model, the file format, etc.
    :param data_configs: dictionary. Defines data related information.
        Such as, location, the file for the true labels, the file for the benign samples,
        the files for the adversarial examples, etc.
    :param save: boolean. Save the evaluation results or not.
    :param output_dir: path or str. The location to store the results file.
        It cannot be None when save is True.
    :return: None
    """
    # Load the baseline defense (PGD-ADT model)
    baseline = load_lenet(file=model_configs.get('pgd_trained'), trans_configs=None,
                          use_logits=False, wrap=False)

    # get the undefended model (UM)
    file = os.path.join(model_configs.get('dir'), model_configs.get('um_file'))
    undefended = load_lenet(file=file,
                            trans_configs=trans_configs.get('configs0'),
                            wrap=True)
    print(">>> um:", type(undefended))

    # load weak defenses into a pool
    pool, _ = load_pool(trans_configs=trans_configs,
                        model_configs=model_configs,
                        active_list=True,
                        wrap=True)
    # create an AVEP ensemble from the WD pool
    wds = list(pool.values())
    print(">>> wds:", type(wds), type(wds[0]))
    ensemble = Ensemble(classifiers=wds, strategy=ENSEMBLE_STRATEGY.AVEP.value)

    # load the benign samples
    bs_file = os.path.join(data_configs.get('dir'), data_configs.get('bs_file'))
    x_bs = np.load(bs_file)

    # load the corresponding true labels, take just the first 1000
    label_file = os.path.join(data_configs.get('dir'), data_configs.get('label_file'))
    labels = np.load(label_file)
    labels = labels[:1000]

    # get indices of benign samples that are correctly classified by the targeted model
    print(">>> Evaluating UM on [{}], it may take a while...".format(bs_file))
    pred_bs = undefended.predict(x_bs)
    corrections = get_corrections(y_pred=pred_bs, y_true=labels)

    # open the results file only when saving was requested
    out_file = None
    if save:
        if output_dir is None:
            raise ValueError("Cannot save to a none path.")
        out_file = open(os.path.join(output_dir, "minerva_AE-results.txt"), 'w')

    try:
        # Evaluate AEs.
        for ae_name in data_configs.get('ae_files'):
            results = {}
            ae_file = os.path.join(data_configs.get('dir'), ae_name)
            print(ae_name)
            print(ae_file)
            x_adv = np.load(ae_file)

            # evaluate the undefended model on the AE
            print(">>> Evaluating UM on [{}], it may take a while...".format(ae_file))
            pred_adv_um = undefended.predict(x_adv)
            results['UM'] = error_rate(y_pred=pred_adv_um, y_true=labels,
                                       correct_on_bs=corrections)

            # evaluate the ensemble on the AE
            print(">>> Evaluating ensemble on [{}], it may take a while...".format(ae_file))
            pred_adv_ens = ensemble.predict(x_adv)
            results['Ensemble'] = error_rate(y_pred=pred_adv_ens, y_true=labels,
                                             correct_on_bs=corrections)

            # evaluate the baseline on the AE
            print(">>> Evaluating baseline model on [{}], it may take a while...".format(ae_file))
            pred_adv_bl = baseline.predict(x_adv)
            results['PGD-ADT'] = error_rate(y_pred=pred_adv_bl, y_true=labels,
                                            correct_on_bs=corrections)

            if out_file is not None:
                out_file.write(">>> Evaluations on [{}]:\n{}\n".format(ae_file, results))
    finally:
        # make sure the results are flushed to disk
        if out_file is not None:
            out_file.close()
```
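
The ensemble above is built with the AVEP strategy. Assuming AVEP means averaging the predicted class probabilities of the weak defences and taking the most likely class, the idea can be sketched as follows. The function name `average_probability_vote` and the toy probabilities are hypothetical; this is not the Athena `Ensemble` class.

```python
import numpy as np


def average_probability_vote(prob_list):
    """Average per-model class probabilities and pick the most likely class."""
    # Shape after stacking: (n_models, n_samples, n_classes)
    stacked = np.stack(prob_list, axis=0)
    mean_probs = stacked.mean(axis=0)
    return mean_probs, mean_probs.argmax(axis=1)


# Three toy weak defences, two samples, two classes.
wd_a = np.array([[0.6, 0.4], [0.2, 0.8]])
wd_b = np.array([[0.3, 0.7], [0.1, 0.9]])
wd_c = np.array([[0.8, 0.2], [0.4, 0.6]])
mean_probs, ensemble_labels = average_probability_vote([wd_a, wd_b, wd_c])
```

In this toy case `wd_b` alone would label the first sample as class 1, but the averaged probabilities favour class 0, which illustrates how an ensemble can absorb the mistake of a single weak defence.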


### Data

[comment]: <> (TODO-DS: Must finish descriptions for all the images)

As previously stated, our approach for the CW attacks was to hold one variable constant while gradually increasing the value of the other. The graphs below illustrate the resulting differences.


![CW-2-Const-Eps](Img/CW-2-Const-Eps.png)
<div style="text-align: center">
    <em>Figure 1.0  CW Attack effectiveness on an Ensemble vs. PGD with a constant epsilon.</em>
</div>

<br>

![CW-2-Const-Lw](Img/CW-2-Const-Lw.png)
<div style="text-align: center">
    <em>Figure 1.1  CW Attack effectiveness on an Ensemble vs. PGD with a constant Lw.</em>
</div>

<br>

![CW-All-Const-Eps](Img/CW-All-Const-Eps.png)
<div style="text-align: center">
    <em>Figure 1.2  CW Attack effectiveness on a UM vs. Ensemble vs. PGD with a constant epsilon.</em>
</div>

<br>

![CW-All-Const-Lw](Img/CW-All-Const-Lw.png)
<div style="text-align: center">
    <em>Figure 1.3  CW Attack effectiveness on a UM vs. Ensemble vs. PGD with a constant Lw.</em>
</div>

<br>

The line graph below shows the effectiveness of the BIM attack on the Ensemble model versus the PGD-ADT model. As epsilon increases, the error rate of the Ensemble rises faster than the error rate of the PGD-ADT model.

![BIM-2](Img/BIM-2.png)
<div style="text-align: center">
    <em>Figure 1.4  BIM Attack effectiveness on an Ensemble vs. PGD.</em>
</div>

<br>

The line graph below illustrates the effectiveness of the BIM attack on the UM versus the Ensemble model versus the PGD-ADT model. As epsilon increases, the error rate of the UM rises faster than the error rates of the Ensemble and PGD-ADT models.

![BIM-All](Img/BIM-All.png)
<div style="text-align: center">
    <em>Figure 1.5  BIM Attack effectiveness on a UM vs. Ensemble vs. PGD.</em>
</div>

<br>

![PGD-2](Img/PGD-2.png)
<div style="text-align: center">
    <em>Figure 1.6  PGD Attack effectiveness on an Ensemble vs. PGD.</em>
</div>

<br>

![PGD-All](Img/PGD-All.png)
<div style="text-align: center">
    <em>Figure 1.7  PGD Attack effectiveness on a UM vs. Ensemble vs. PGD.</em>
</div>

<br>

#### The raw data:

\>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.01.npy]:  
{'UM': 0.010111223458038422, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.008088978766430738}  
\>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.05.npy]:  
{'UM': 0.22143579373104147, 'Ensemble': 0.004044489383215369, 'PGD-ADT': 0.015166835187057633}  
\>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.1.npy]:  
{'UM': 0.9120323559150657, 'Ensemble': 0.023255813953488372, 'PGD-ADT': 0.032355915065722954}  
\>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.15.npy]:  
{'UM': 0.9888776541961577, 'Ensemble': 0.06471183013144591, 'PGD-ADT': 0.05864509605662285}  
\>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.2.npy]:  
{'UM': 0.9888776541961577, 'Ensemble': 0.1506572295247725, 'PGD-ADT': 0.1102123356926188}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.1-eps0.2.npy]:  
{'UM': 0.1263902932254803, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.014155712841253791}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.1-eps0.4.npy]:  
{'UM': 0.13751263902932254, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.014155712841253791}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.1-eps0.6.npy]:  
{'UM': 0.13751263902932254, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.014155712841253791}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.1-eps0.8.npy]:  
{'UM': 0.1486349848331648, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.01314459049544995}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.2-eps0.1.npy]:  
{'UM': 0.4580384226491405, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.03538928210313448}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.4-eps0.1.npy]:  
{'UM': 0.8574317492416582, 'Ensemble': 0.004044489383215369, 'PGD-ADT': 0.07785642062689585}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.6-eps0.1.npy]:  
{'UM': 0.9595551061678463, 'Ensemble': 0.019211324570273004, 'PGD-ADT': 0.1243680485338726}  
\>>> Evaluations on [../../data/minerva/minerva_AE-CW-lw0.8-eps0.1.npy]:  
{'UM': 0.9686552072800809, 'Ensemble': 0.033367037411526794, 'PGD-ADT': 0.16481294236602628}  
\>>> Evaluations on [../../data/minerva/minerva_AE-PGD-eps0.1.npy]:  
{'UM': 0.6723963599595552, 'Ensemble': 0.017189079878665317, 'PGD-ADT': 0.028311425682507583}  
\>>> Evaluations on [../../data/minerva/minerva_AE-PGD-eps0.05.npy]:  
{'UM': 0.1486349848331648, 'Ensemble': 0.004044489383215369, 'PGD-ADT': 0.014155712841253791}  
\>>> Evaluations on [../../data/minerva/minerva_AE-PGD-eps0.15.npy]:  
{'UM': 0.9332659251769464, 'Ensemble': 0.04044489383215369, 'PGD-ADT': 0.04954499494438827}  
\>>> Evaluations on [../../data/minerva/minerva_AE-PGD-eps0.025.npy]:  
{'UM': 0.033367037411526794, 'Ensemble': 0.004044489383215369, 'PGD-ADT': 0.011122345803842264}  
\>>> Evaluations on [../../data/minerva/minerva_AE-PGD-eps0.075.npy]:  
{'UM': 0.3923154701718908, 'Ensemble': 0.007077856420626896, 'PGD-ADT': 0.022244691607684528}  
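
For analysis and plotting, the raw output above can be read back into Python. The following is a minimal sketch, assuming the results file keeps the two-line layout shown here (a `>>> Evaluations on [...]` header followed by a dictionary literal). The helper name `parse_results` is hypothetical.

```python
import ast
import re

# Header line, with an optional leading backslash as it appears in this report.
HEADER = re.compile(r"^\\?>>> Evaluations on \[(.+?)\]:\s*$")


def parse_results(text):
    """Map each AE file path to its {'UM': ..., 'Ensemble': ..., 'PGD-ADT': ...} dict."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = match.group(1)
        elif current is not None and line.startswith("{"):
            # literal_eval safely parses the printed dictionary.
            results[current] = ast.literal_eval(line)
            current = None
    return results


sample = """\
>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.01.npy]:
{'UM': 0.010111223458038422, 'Ensemble': 0.003033367037411527, 'PGD-ADT': 0.008088978766430738}
>>> Evaluations on [../../data/minerva/minerva_AE-BIM-eps0.05.npy]:
{'UM': 0.22143579373104147, 'Ensemble': 0.004044489383215369, 'PGD-ADT': 0.015166835187057633}
"""
parsed = parse_results(sample)
```

Each parsed entry can then be grouped by attack family and sorted by epsilon before plotting, which matters because the PGD entries in the raw output are not in epsilon order.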


### Conclusion

### Citation