Cody has a great [breakdown and plan](https://github.com/csce585-mlsystems/project-athena/issues/25) for task 2 option 2.

I will not provide a full tutorial for this option. Instead, I describe some APIs that may help.

## To collect raw predictions of all weak defenses in the ensemble on the input(s)
The raw predictions are a 3-D array holding the probabilities that each weak defense produces for the given input(s).

Let $D = \{x1, x2\}$ be the test dataset (sampled from ``MNIST``) and ``athena`` be an ensemble consisting of $3$ weak defenses. We collect the raw predictions on $D$ with:
```python
raw_predictions = athena.predict(D, raw=True)
```
You will get an array of shape ``(3, 2, 10)``, that is, (weak defenses, samples, classes): the probabilities over the $10$ classes that each of the $3$ weak defenses in the ensemble produces for each of the $2$ samples ($x1$ and $x2$).
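
To make the axis order concrete, here is a minimal sketch that indexes an array of the same shape. The array is a random stand-in (an assumption for illustration), not real output from ``athena.predict``.

```python
import numpy as np

# Random stand-in for athena.predict(D, raw=True):
# 3 weak defenses, 2 samples, 10 classes. Not real model output.
rng = np.random.default_rng(seed=0)
raw_predictions = rng.random((3, 2, 10))
# Normalize so that each (weak defense, sample) row sums to 1.
raw_predictions /= raw_predictions.sum(axis=-1, keepdims=True)

num_wds, num_samples, num_classes = raw_predictions.shape

# Axis 0 selects the weak defense, axis 1 selects the sample.
probs_wd1_first_sample = raw_predictions[1, 0]   # shape: (10,)

# Label predicted by every weak defense for every sample.
labels = np.argmax(raw_predictions, axis=-1)     # shape: (3, 2)
print(labels)
```

Each entry ``labels[i, j]`` is the class that the ``i``-th weak defense assigns to the ``j``-th sample.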


### Example of collecting raw predictions
* Python tutorial: ``tutorial/collect_raws.py``
* API for raw collection: ``models.athena.predict`` (**with ``raw=True``**)

----
You might need to update ``athena.py`` and/or ``keraswrapper.py`` (for ``cnn`` weak defenses) to get the outputs from hidden layers (usually convolutional layers).

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('../../src'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
import numpy as np
from matplotlib import pyplot as plt

from utils.model import load_pool, load_lenet
from utils.file import load_from_json
from utils.metrics import error_rate, get_corrections
from models.athena import Ensemble, ENSEMBLE_STRATEGY

Using TensorFlow backend.


In [5]:
# copied from tutorial/collect_raws.py
def collect_raw_prediction(trans_configs, model_configs, data_configs, use_logits=False):
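    """
    Collect the raw predictions of every weak defense in the ensemble and the
    final predictions of the ensemble, and print both.
    :param trans_configs: transformation configurations (loaded from JSON).
    :param model_configs: model configurations (loaded from JSON).
    :param data_configs: data configurations (loaded from JSON); ``dir`` and ``bs_file``
                    locate the file of benign samples.
    :param use_logits: Boolean. If True, the models return logits (the values before ``softmax``);
                    otherwise, they return probabilities.
    :return: None. The results are printed.
    """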
    # load the pool and create the ensemble
    # tiny pool: 3 weak defenses
    pool, _ = load_pool(trans_configs=trans_configs,
                        model_configs=model_configs,
                        active_list=True,
                        use_logits=use_logits,
                        wrap=True
                        )
    athena = Ensemble(classifiers=list(pool.values()),
                      strategy=ENSEMBLE_STRATEGY.MV.value)

    # load test data
    # load the benign samples
    bs_file = os.path.join(data_configs.get('dir'), data_configs.get('bs_file'))
    x_bs = np.load(bs_file)
    # test with a small subset
    x_bs = x_bs[:2]

    # collect raw predictions
    raw_preds = athena.predict(x=x_bs, raw=True)
    print(">>> Shape of raw predictions ({}): {}\n{}".format("logits" if use_logits else "probability",
                                                             raw_preds.shape,
                                                             raw_preds))
    print()
    
    # transpose
    transposed_raw = np.transpose(raw_preds, (1, 0, 2))
    print(">>> Shape of transposed raw ({}): {}\n{}".format("logits" if use_logits else "probability",
                                                             transposed_raw.shape,
                                                             transposed_raw))
    print()
    
    

    # get the final predictions
    preds = athena.predict(x=x_bs) # raw is False by default
    print(">>> Shape of predictions ({}): {}\n{}".format("logits" if use_logits else "probability",
                                                         preds.shape,
                                                         preds))


In [6]:
# load experiment configurations
trans_configs = load_from_json("../../src/configs/demo/athena-mnist.json")
model_configs = load_from_json("../../src/configs/demo/model-mnist.json")
data_configs = load_from_json("../../src/configs/demo/data-mnist.json")

output_dir = "../../results"

# collect the probabilities
collect_raw_prediction(trans_configs=trans_configs,
                       model_configs=model_configs,
                       data_configs=data_configs)

>>> Loading model [../../models/cnn/model-mnist-cnn-shift_bottom_left.h5]...
>>> Loading model [../../models/cnn/model-mnist-cnn-affine_both_stretch.h5]...
>>> Loading model [../../models/cnn/model-mnist-cnn-cartoon_mean_type3.h5]...
>>> Loaded 3 models.
>>> Shape of raw predictions (probability): (3, 2, 10)
[[[4.91153728e-03 3.37717636e-03 4.85863257e-03 3.21264984e-03
   1.87736889e-03 3.51971108e-03 3.51327541e-03 3.00724864e-01
   4.19280957e-03 3.14533897e-03]
  [1.34512801e-02 1.35943014e-02 2.52675742e-01 1.03361532e-02
   5.35096182e-03 2.74821534e-03 1.44981239e-02 3.24912835e-03
   4.89710597e-03 1.25322994e-02]]

 [[3.12041165e-03 8.26350297e-04 4.37834812e-03 2.61355238e-03
   1.68064586e-03 1.47926807e-03 1.45221502e-03 3.12530667e-01
   4.06373199e-03 1.18817599e-03]
  [8.00706167e-03 9.06579604e-04 3.10224593e-01 2.13464978e-03
   1.18756830e-03 3.38755315e-04 6.32811291e-03 1.67038688e-03
   1.04902731e-03 1.48661179e-03]]

 [[8.87085323e-38 5.42056376e-35 3.27030522e-2

### Explanation of the outputs

Let ``ensemble`` be the ensemble model (an instance of the ``Ensemble`` class), built with $3$ weak defenses (``wd_0``, ``wd_1``, and ``wd_2``). We evaluate ``ensemble`` on data **X**, which consists of $2$ samples (``x0`` and ``x1``).

The outputs have the following meanings:

#### Shape of raw predictions: (3, 2, 10)

```python
[
    # predicted probabilities from the 0-th weak defense. i.e., wd_0.predict(X)
    [[4.91153728e-03 3.37717636e-03 4.85863257e-03 3.21264984e-03
      1.87736889e-03 3.51971108e-03 3.51327541e-03 3.00724864e-01
      4.19280957e-03 3.14533897e-03]   # <----------- wd_0.predict(x0)
     [1.34512801e-02 1.35943014e-02 2.52675742e-01 1.03361532e-02
      5.35096182e-03 2.74821534e-03 1.44981239e-02 3.24912835e-03
      4.89710597e-03 1.25322994e-02]]  # <----------- wd_0.predict(x1)

    # predicted probabilities from the 1st weak defense. i.e., wd_1.predict(X)
    [[3.12041165e-03 8.26350297e-04 4.37834812e-03 2.61355238e-03
      1.68064586e-03 1.47926807e-03 1.45221502e-03 3.12530667e-01
      4.06373199e-03 1.18817599e-03]   # <----------- wd_1.predict(x0)
     [8.00706167e-03 9.06579604e-04 3.10224593e-01 2.13464978e-03
      1.18756830e-03 3.38755315e-04 6.32811291e-03 1.67038688e-03
      1.04902731e-03 1.48661179e-03]]  # <----------- wd_1.predict(x1)

    # predicted probabilities from the 2nd weak defense. i.e., wd_2.predict(X)
    [[8.87085323e-38 5.42056376e-35 3.27030522e-25 1.17760702e-22
      0.00000000e+00 0.00000000e+00 0.00000000e+00 3.33333343e-01
      3.84718753e-35 1.24960316e-29]   # <----------- wd_2.predict(x0)
     [0.00000000e+00 0.00000000e+00 3.33333343e-01 0.00000000e+00
      0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
      0.00000000e+00 0.00000000e+00]]  # <----------- wd_2.predict(x1)
]
```

#### Shape of transposed raw (probability): (2, 3, 10)
```python
[
    # raw probabilities on the 0-th input
    [[4.91153728e-03 3.37717636e-03 4.85863257e-03 3.21264984e-03
      1.87736889e-03 3.51971108e-03 3.51327541e-03 3.00724864e-01
      4.19280957e-03 3.14533897e-03]  # <---------- wd_0.predict(x0)
     [3.12041165e-03 8.26350297e-04 4.37834812e-03 2.61355238e-03
      1.68064586e-03 1.47926807e-03 1.45221502e-03 3.12530667e-01
      4.06373199e-03 1.18817599e-03]  # <---------- wd_1.predict(x0)
     [8.87085323e-38 5.42056376e-35 3.27030522e-25 1.17760702e-22
      0.00000000e+00 0.00000000e+00 0.00000000e+00 3.33333343e-01
      3.84718753e-35 1.24960316e-29]] # <---------- wd_2.predict(x0)

    # raw probabilities on the 1st input
    [[1.34512801e-02 1.35943014e-02 2.52675742e-01 1.03361532e-02
      5.35096182e-03 2.74821534e-03 1.44981239e-02 3.24912835e-03
      4.89710597e-03 1.25322994e-02]  # <---------- wd_0.predict(x1)
     [8.00706167e-03 9.06579604e-04 3.10224593e-01 2.13464978e-03
      1.18756830e-03 3.38755315e-04 6.32811291e-03 1.67038688e-03
      1.04902731e-03 1.48661179e-03]  # <---------- wd_1.predict(x1)
     [0.00000000e+00 0.00000000e+00 3.33333343e-01 0.00000000e+00
      0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
      0.00000000e+00 0.00000000e+00]] # <---------- wd_2.predict(x1)
]
```
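
The transposition only reorders the axes, so entry ``[j, i]`` of the transposed array equals entry ``[i, j]`` of the original one. A minimal sketch with a synthetic array (an assumption for illustration, not real predictions) shows this, along with one possible way to flatten the result into one row per input:

```python
import numpy as np

# Synthetic stand-in with the same shape as the raw predictions:
# (weak defenses, samples, classes) = (3, 2, 10).
raw = np.arange(3 * 2 * 10, dtype=float).reshape(3, 2, 10)

# Swap the first two axes: (samples, weak defenses, classes) = (2, 3, 10).
by_sample = np.transpose(raw, (1, 0, 2))

# by_sample[j, i] holds what weak defense i produced for sample j.
same = np.array_equal(by_sample[1, 2], raw[2, 1])

# Optional: one flat row per sample, concatenating the outputs of
# wd_0, wd_1, and wd_2 in that order. Shape: (2, 30).
features = by_sample.reshape(by_sample.shape[0], -1)
print(by_sample.shape, features.shape, same)
```

The flattened layout is just one option. Whether it suits your approach depends on what you plan to do with the raw predictions.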


#### Shape of predictions: (2, 10); with the strategy applied
The ensemble is a black box to the end user. That is, the ensemble is treated as a single model, and the end user is not even aware of querying an ensemble.

```python
# predicted probabilities from the MV ensemble (as a whole)
[[0.00267732 0.00140118 0.00307899 0.00194207 0.001186   0.00166633
  0.00165516 0.3155296  0.00275218 0.0014445 ]   # <----------- ensemble.predict(x0)
 [0.00715278 0.00483363 0.29874456 0.00415693 0.00217951 0.00102899
  0.00694208 0.00163984 0.00198204 0.00467297]]  # <----------- ensemble.predict(x1)
```
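
In the output shown here, the ensemble's prediction for ``x0`` coincides with the element-wise mean of the three weak defenses' raw probabilities for ``x0``. The sketch below checks this using the printed values. It is only an observation about this particular output, not a statement of how ``Ensemble`` implements its strategies.

```python
import numpy as np

# Raw probabilities for x0, copied from the printed output above.
raw_x0 = np.array([
    # wd_0.predict(x0)
    [4.91153728e-03, 3.37717636e-03, 4.85863257e-03, 3.21264984e-03,
     1.87736889e-03, 3.51971108e-03, 3.51327541e-03, 3.00724864e-01,
     4.19280957e-03, 3.14533897e-03],
    # wd_1.predict(x0)
    [3.12041165e-03, 8.26350297e-04, 4.37834812e-03, 2.61355238e-03,
     1.68064586e-03, 1.47926807e-03, 1.45221502e-03, 3.12530667e-01,
     4.06373199e-03, 1.18817599e-03],
    # wd_2.predict(x0)
    [8.87085323e-38, 5.42056376e-35, 3.27030522e-25, 1.17760702e-22,
     0.0, 0.0, 0.0, 3.33333343e-01, 3.84718753e-35, 1.24960316e-29],
])

# ensemble.predict(x0), copied from the printed output above.
reported = np.array([0.00267732, 0.00140118, 0.00307899, 0.00194207, 0.001186,
                     0.00166633, 0.00165516, 0.3155296, 0.00275218, 0.0014445])

# Average over the weak-defense axis and compare with the reported values.
mean_x0 = raw_x0.mean(axis=0)
matches = np.allclose(mean_x0, reported, rtol=0.0, atol=1e-6)
print(matches, int(np.argmax(mean_x0)))
```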

### Example of collecting logits
* Python tutorial: ``tutorial/collect_raws.py``
* API for using logits: ``utils.model.load_pool`` (**with ``use_logits=True``**)
* API for raw collection: ``models.athena.predict`` (**with ``raw=True``**)
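
Logits are the values a model produces before ``softmax``. As a reminder of how the two relate, here is a minimal NumPy sketch of ``softmax`` applied to one logit vector. The ``softmax`` helper is written here for illustration and is not part of the project's API. The logit values are copied from the ``wd_0`` row for the first sample in this tutorial's logits output. The resulting numbers need not match the probability output printed earlier, but the predicted class (the argmax) is preserved.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax (illustrative helper)."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Logits of wd_0 on the first sample, copied from this tutorial's output.
logits = np.array([-3.3882681e-02, -1.5873170e-01, -3.7492529e-02,
                   -1.7537966e-01, -3.5445449e-01, -1.4495197e-01,
                   -1.4556202e-01, 1.3376536e+00, -8.6621314e-02,
                   -1.8243766e-01])

probs = softmax(logits)

# softmax is monotonic, so the largest logit stays the largest probability.
print(int(np.argmax(logits)), int(np.argmax(probs)))
```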

In [7]:
# load experiment configurations
trans_configs = load_from_json("../../src/configs/demo/athena-mnist.json")
model_configs = load_from_json("../../src/configs/demo/model-mnist.json")
data_configs = load_from_json("../../src/configs/demo/data-mnist.json")

output_dir = "../../results"

collect_raw_prediction(trans_configs=trans_configs,
                       model_configs=model_configs,
                       data_configs=data_configs,
                       use_logits=True)

>>> Loading model [../../models/cnn/model-mnist-cnn-shift_bottom_left.h5]...


Keras model has no loss set. Classifier tries to use `k.sparse_categorical_crossentropy`.


>>> Loading model [../../models/cnn/model-mnist-cnn-affine_both_stretch.h5]...


Keras model has no loss set. Classifier tries to use `k.sparse_categorical_crossentropy`.


>>> Loading model [../../models/cnn/model-mnist-cnn-cartoon_mean_type3.h5]...


Keras model has no loss set. Classifier tries to use `k.sparse_categorical_crossentropy`.


>>> Loaded 3 models.
>>> Shape of raw predictions (logits): (3, 2, 10)
[[[-3.3882681e-02 -1.5873170e-01 -3.7492529e-02 -1.7537966e-01
   -3.5445449e-01 -1.4495197e-01 -1.4556202e-01  1.3376536e+00
   -8.6621314e-02 -1.8243766e-01]
  [ 7.5045615e-02  7.8571074e-02  1.0527232e+00 -1.2763297e-02
   -2.3222046e-01 -4.5432854e-01  1.0002728e-01 -3.9851689e-01
   -2.6176441e-01  5.1457264e-02]]

 [[-2.5570724e-02 -4.6847114e-01  8.7331377e-02 -8.4655598e-02
   -2.3183298e-01 -2.7437657e-01 -2.8052899e-01  1.5100086e+00
    6.2474936e-02 -3.4741923e-01]
  [ 3.4010625e-01 -3.8602722e-01  1.5590972e+00 -1.0056746e-01
   -2.9603252e-01 -7.1416086e-01  2.6166564e-01 -1.8231654e-01
   -3.3738062e-01 -2.2116846e-01]]

 [[-4.3223600e+00 -2.1839657e+00  5.3228803e+00  7.2849979e+00
   -1.4772465e+01 -7.0772219e+00 -1.8721497e+01  2.3749924e+01
   -2.2982502e+00  1.9320800e+00]
  [ 8.0103493e+00  7.8155155e+00  4.4603119e+01  4.9814862e-01
   -7.9236531e-01 -2.9296669e+01  4.0273676e+00 -1.0185078e+01