# LSTM Time Series Deep Learning and Adversarial Attacks

## 1. Background Information

This project reproduces and expands upon work published in [1] and [2] on Long Short-Term Memory (LSTM) predictive models and adversarial attacks on those models. The previous studies used LSTM time series classification models trained with data from the Medical Information Mart for Intensive Care (MIMIC-III) database to predict Intensive Care Unit (ICU) patient outcomes. Input features to the classification models consisted of 13 lab measurements and 6 vital signs. A binary variable representing in-hospital mortality was the prediction target.

In [1], an adversarial attack algorithm was used to identify small perturbations which, when applied to the input features of a real, correctly classified sample, caused a trained model to misclassify the perturbed input. L1 regularization was applied to the adversarial attack loss function to favor adversarial examples with sparse perturbations that resemble the structure of data entry errors most likely to occur in real medical data. Samples were attacked serially (one at a time), and the attack process on a sample was stopped upon finding a single adversarial perturbation to that sample's input features. After attacking a full dataset, susceptibility calculations were performed to identify input feature space regions most vulnerable to adversarial attack.
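
As a rough illustration of how an L1 penalty enters an attack objective, the sketch below scores a candidate perturbation with a misclassification margin plus an L1 term. The function name `attack_loss`, the weight `lam`, and the margin form are assumptions made for illustration; this is not the loss function from [1] or from this project's `attack` sub-package.

```python
def attack_loss(logits, target_class, perturbation, lam=0.1):
    """Hypothetical L1-regularized attack loss (lower is better for the attacker).

    logits: [logit_class_0, logit_class_1] produced by the model for the perturbed input
    target_class: the (incorrect) class the attacker wants the model to predict
    perturbation: 2D list (time steps x features) added to the original input
    lam: assumed weight of the L1 sparsity penalty
    """
    other_class = 1 - target_class
    # Margin term: reaches zero once the target logit exceeds the other logit.
    margin = max(0.0, logits[other_class] - logits[target_class])
    # L1 term: sum of absolute perturbation values.
    l1_norm = sum(abs(value) for row in perturbation for value in row)
    return margin + lam * l1_norm


# Two made-up perturbations that both flip the prediction to class 1.
flipped_logits = [0.2, 0.9]
sparse = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]  # a single entry is changed
dense = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]   # every entry is changed slightly

sparse_loss = attack_loss(flipped_logits, target_class=1, perturbation=sparse)
dense_loss = attack_loss(flipped_logits, target_class=1, perturbation=dense)
print(f"sparse: {sparse_loss:.3f}, dense: {dense_loss:.3f}")
```

Once the margin term is zero, the only way to lower the loss further is to shrink the L1 norm of the perturbation, which is why continuing an attack after the first successful perturbation can uncover lower-loss examples.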

The current study follows an approach similar to that of the previous studies. We use the same dataset, input features, and prediction targets to train an LSTM binary classification model and subsequently search for adversarial examples using an L1-regularized attack algorithm. Aspects of the current work that expand upon the previous studies include a vectorized (faster) approach to data preprocessing, extensive hyperparameter tuning (of both the predictive model and attack algorithm), improved performance of the predictive model, implementation of a GPU-compatible attack algorithm that enables attacking samples in batches, and not halting the attack process upon finding a single adversarial perturbation for a sample (so that additional, lower-loss adversarial perturbations can be discovered).

## 2. Development Environment Setup

### 2.1 Docker Container

The code and instructions in this notebook assume the development environment has been set up by completing all steps in the [How to run this project](https://github.com/duanegoodner/lstm_adversarial_attack/tree/main#3-how-to-run-this-project) section of the project [README](https://github.com/duanegoodner/lstm_adversarial_attack). If you have used the procedure described there to run this notebook inside a `lstm_aa_app` Docker container, then the output of the following code cell should be `PosixPath('/home/devspace/project/notebooks')`.

In [1]:
from pathlib import Path
Path.cwd()

PosixPath('/home/devspace/project/notebooks')

We can also check the contents of the container project root directory:

In [2]:
!ls /home/devspace/project

README.md  data  docker  docs  notebooks  src


If `/home/devspace/project` is correctly mapped to your local project root, the above output should match the list of files in the local project root.

## 3. Project File Structure
All `.py` files are in `/home/devspace/project/src/lstm_adversarial_attack`. This directory contains five sub-packages responsible for different parts of the project data pipeline (`query_db`, `preprocess`, `tune_train`, `attack`, and `attack_analysis`). The code snippets in this notebook instantiate classes and call methods defined in files under the `src` directory. Look to the code and docstrings there for implementation details.


Data files are under `/home/devspace/project/data/` in subdirectories with names that match the sub-package names.



## 4. Imports


### 4.1 Standard Library and External Packages
Most of the necessary standard library imports and external package imports are handled by code in the `src` sub-packages, but we need to import a few things here.

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pprint
import pandas as pd
import sys
import torch
from IPython.display import Markdown as md
from torch.utils.data import Dataset, random_split

### 4.2 Internal Project Sub-packages and Modules
To make it easier to understand what various internal packages and modules do, we will wait to import them until just before the notebook code cells where they are first used. For now, we import the `src` path defined in [`notebooks/src_paths.py`](./src_paths.py), and add it to `sys.path` (so we can easily import project code). We also import project config files.

In [4]:
import src_paths
sys.path.append(str(src_paths.lstm_adversarial_attack_pkg))
import lstm_adversarial_attack.config_paths as cfg_paths
import lstm_adversarial_attack.config_settings as cfg_set

### 4.3 Check for GPU

We won't need a GPU until we reach the Hyperparameter Tuning section, but it is good to find out now whether we have a GPU that PyTorch can use. If we do not have one, we likely do not want to try to run the project.

In [5]:
import torch

if torch.cuda.is_available():
    cur_device = torch.device("cuda:0")
else:
    cur_device = torch.device("cpu")

print(f"cur_device is {cur_device}")

cur_device is cuda:0


## 5. Database Queries

### 5.1 `.sql` files
To obtain the necessary raw data, we will use modified versions of files (originally intended for Google BigQuery) from https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/concepts/pivot. The paths to the `.sql` query files are stored as a list in the variable [`config_paths.DB_QUERIES`](../src/lstm_adversarial_attack/config_paths.py).

In [6]:
pprint.pprint(cfg_paths.DB_QUERIES)

[PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/icustay_detail.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_bg.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_lab.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_vital.sql')]


### 5.2 Connecting to Database and Executing Queries

To connect to the database, and execute the queries, we use the [\_\_main__](../src/lstm_adversarial_attack/query_db/__main__.py) module of the [query_db](../src/lstm_adversarial_attack/query_db/\_\_init__.py) sub-package.

In [7]:
import lstm_adversarial_attack.query_db.__main__ as query_db
query_db.main()

Query 1 of 4
Executing: /home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/icustay_detail.sql
Done. Query time = 0.48 seconds
Writing result to csv: /home/devspace/project/data/query_db/icustay_detail.csv
Done. csv write time = 0.47 seconds

Query 2 of 4
Executing: /home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_bg.sql
Done. Query time = 16.58 seconds
Writing result to csv: /home/devspace/project/data/query_db/pivoted_bg.csv
Done. csv write time = 3.37 seconds

Query 3 of 4
Executing: /home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_lab.sql
Done. Query time = 24.07 seconds
Writing result to csv: /home/devspace/project/data/query_db/pivoted_lab.csv
Done. csv write time = 5.40 seconds

Query 4 of 4
Executing: /home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_vital.sql
Done. Query time = 63.07 seconds
Writing result to csv: /home/devspace/project/data/query

[PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/icustay_detail.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_bg.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_lab.sql'),
 PosixPath('/home/devspace/project/src/lstm_adversarial_attack/query_db/mimiciii_queries/pivoted_vital.sql')]

The result of each `.sql` query is saved to a `.csv` file. The path to each of these files is shown in the terminal output above. The output path of the queries is defined by the variable `DB_OUTPUT_DIR` in [config_settings](../src/lstm_adversarial_attack/config_settings.py).

## 6. Preprocess

### 6.1 Implementation Details

We will use the [`preprocess`](../src/lstm_adversarial_attack/preprocess/__init__.py) sub-package's [`\_\_main__`](../src/lstm_adversarial_attack/preprocess/__main__.py) module to transform information from the `.csv` files output by the `.sql` queries into numpy arrays (which can then be easily converted into PyTorch tensors). Examining the code of [`preprocess.\_\_main__.main()`](../src/lstm_adversarial_attack/preprocess/__main__.py), we see that it instantiates a `Preprocessor` object. Looking at the implementation of [`Preprocessor`](../src/lstm_adversarial_attack/preprocess/preprocessor.py), we see that this class has a `.preprocess_modules` attribute assigned by the following code:

```
self.preprocess_modules = [
            prf.Prefilter(),
            imc.ICUStayMeasurementCombiner(),
            slb.FullAdmissionListBuilder(),
            fb.FeatureBuilder(),
            ff.FeatureFinalizer(),
        ]
```
Each element of the `.preprocess_modules` attribute is an instance of a subclass of [`PreprocessModule`](../src/lstm_adversarial_attack/preprocess/preprocess_module.py) and performs a portion of the preprocessing tasks.

* [`Prefilter`](../src/lstm_adversarial_attack/preprocess/prefilter.py) reads the database query outputs into Pandas Dataframes, removes all data related to patients younger than 18 years in age, ensures consistent column naming formats, and takes care of datatype details.
* [`ICUStayMeasurementCombiner`](../src/lstm_adversarial_attack/preprocess/icustay_measurement_combiner.py) performs various joins (aka "merges" in the language of Pandas) to combine lab and vital sign measurement data with ICU stay data.
* [`FullAdmissionListBuilder`](../src/lstm_adversarial_attack/preprocess/sample_list_builder.py) generates a list consisting of one FullAdmissionData object per ICU stay. The attributes of a FullAdmissionData object include ICU stay info, and a dataframe containing the measurement and timestamp data for all vital sign and lab data associated with the ICU stay.
* [`FeatureBuilder`](../src/lstm_adversarial_attack/preprocess/feature_builder.py) resamples the time series dataframe to one-hour intervals, imputes missing data, winsorizes measurement values (with cutoffs at the 5th and 95th global percentiles), and normalizes the measurement values so all data are between 0 and 1.
* [`FeatureFinalizer`](../src/lstm_adversarial_attack/preprocess/feature_finalizer.py) selects the data observation time window (default starts at hospital admission time and ends 48 hours after admission). This module outputs the entire dataset features as a list of numpy arrays, and the mortality labels as a list of integers. These data structures (saved as .pickle files) will be convenient starting points when the `tune_train` and `attack` sub-packages need to create PyTorch Datasets.
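
The winsorize-then-normalize step described for `FeatureBuilder` can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `winsorize_and_scale` and the heart-rate values are made up, and the sketch assumes the input has no missing values and that the two percentile cutoffs differ.

```python
import numpy as np


def winsorize_and_scale(values, lower_pct=5, upper_pct=95):
    """Clip values at the given percentiles, then min-max scale to [0, 1]."""
    values = np.asarray(values, dtype=float)
    # Percentile cutoffs computed over all provided values ("global" cutoffs).
    low, high = np.percentile(values, [lower_pct, upper_pct])
    clipped = np.clip(values, low, high)
    # After clipping, low and high are the min and max, so this maps to [0, 1].
    return (clipped - low) / (high - low)


# Made-up heart-rate readings containing two implausible outliers.
heart_rate = np.array([30.0, 62.0, 70.0, 75.0, 80.0, 88.0, 95.0, 210.0])
scaled = winsorize_and_scale(heart_rate)
print(scaled.round(3))
```

Clipping before scaling keeps a few extreme readings from compressing the rest of the measurements into a narrow part of the [0, 1] range.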

Files output by [`Prefilter`](../src/lstm_adversarial_attack/preprocess/prefilter.py), [`ICUStayMeasurementCombiner`](../src/lstm_adversarial_attack/preprocess/icustay_measurement_combiner.py), [`FullAdmissionListBuilder`](../src/lstm_adversarial_attack/preprocess/sample_list_builder.py), and [`FeatureBuilder`](../src/lstm_adversarial_attack/preprocess/feature_builder.py) are saved under subdirectories of `data/preprocess/checkpoints/`, and the output of [`FeatureFinalizer`](../src/lstm_adversarial_attack/preprocess/feature_finalizer.py) is saved in `data/preprocess/final_output/`.
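
The chained-module design described in this section can be pictured with the toy pipeline below. All class and function names here (`PipelineStep`, `DropMinors`, `KeepFirst48Hours`, `run_pipeline`) are hypothetical stand-ins, and the sketch passes data between steps in memory, whereas the project's modules save their outputs to files.

```python
from abc import ABC, abstractmethod


class PipelineStep(ABC):
    """Hypothetical base class: each step transforms the previous step's output."""

    @abstractmethod
    def process(self, records):
        ...


class DropMinors(PipelineStep):
    def process(self, records):
        # Keep only records for patients aged 18 or older.
        return [r for r in records if r["age"] >= 18]


class KeepFirst48Hours(PipelineStep):
    def process(self, records):
        # Keep only measurements taken within the first 48 hours.
        return [r for r in records if r["hour"] < 48]


def run_pipeline(steps, records):
    """Run each step in order, feeding its output to the next step."""
    for step in steps:
        records = step.process(records)
    return records


raw_records = [
    {"age": 12, "hour": 3},
    {"age": 40, "hour": 10},
    {"age": 40, "hour": 60},
]
result = run_pipeline([DropMinors(), KeepFirst48Hours()], raw_records)
print(result)
```

Keeping every step behind a common interface means steps can be rerun or replaced individually, which fits well with saving intermediate checkpoints.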

### 6.2 Run the Preprocess Modules

We run the preprocess code using:

In [None]:
import lstm_adversarial_attack.preprocess.__main__ as preprocess
preprocess.main()

### 6.3 Performance

On an Intel i7-10750H 2.60GHz CPU, the above preprocessing work takes approximately 9.5 minutes. The same data transformations on the same machine with the preprocessing code from [] take approximately 100 minutes. This time difference is largely due to the fact that the current project's `preprocess` sub-package avoids the use of `for` loops and relies heavily on vectorized Pandas and NumPy operations.
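
To make the loop-versus-vectorized distinction concrete, the sketch below applies the same per-column min-max scaling in two ways. The data are synthetic and the function names are illustrative; the snippet only checks that both versions give the same result and makes no claim about the size of the speedup.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Synthetic stand-in for a table of 200 time steps x 19 measurement types.
measurements = rng.uniform(50.0, 150.0, size=(200, 19))


def scale_with_loops(arr):
    """Min-max scale each column using explicit Python loops."""
    out = np.empty_like(arr)
    for col in range(arr.shape[1]):
        col_min = arr[:, col].min()
        col_max = arr[:, col].max()
        for row in range(arr.shape[0]):
            out[row, col] = (arr[row, col] - col_min) / (col_max - col_min)
    return out


def scale_vectorized(arr):
    """Min-max scale each column with whole-array (broadcast) operations."""
    col_min = arr.min(axis=0)
    col_max = arr.max(axis=0)
    return (arr - col_min) / (col_max - col_min)


looped = scale_with_loops(measurements)
vectorized = scale_vectorized(measurements)
print(np.allclose(looped, vectorized))
```

The vectorized version pushes the per-element work into compiled NumPy routines instead of the Python interpreter, which is the general mechanism behind the time difference noted above.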

Additional time reduction could be achieved by parallelizing the preprocess computations with tools such as [pandarallel](https://github.com/nalepae/pandarallel) or [pyspark](https://spark.apache.org/docs/3.3.1/api/python/index.html).

## 7. Dataset

### 7.1 Create the PyTorch Dataset object
We import module `x19_mort_general_dataset` and use it along with files saved by `preprocess.feature_finalizer` to instantiate a PyTorch Dataset.

In [None]:
import lstm_adversarial_attack.x19_mort_general_dataset as xmd
dataset = xmd.X19MGeneralDataset.from_feature_finalizer_output()

### 7.2 Examine the dataset

Next we instantiate a DatasetInspector from the `x19_mort_general_dataset` module and use its methods to display some basic information about the dataset.

In [None]:
dataset_inspector = xmd.DatasetInspector(dataset=dataset)
dataset_inspector.view_basic_info()
dataset_inspector.view_seq_length_summary()
dataset_inspector.view_label_summary()

Each item in the dataset is from a unique ICU stay. The input features for a single sample are represented by a 2D tensor where each column corresponds to a particular lab or vital sign measurement, and each row corresponds to a time in hours after hospital admission. All samples' input feature tensors have the same number of columns, but the number of rows can vary from sample to sample. In LSTM lingo, the number of time steps associated with a sample is called the *sequence length*. In the current analysis, the Preprocessor removed all measurements taken more than 48 hours post-admission, so the maximum sequence length is 48. Samples from ICU stays with fewer than 48 hours of observations have smaller sequence lengths.
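
One common way to batch samples with different sequence lengths is to zero-pad them to a shared length and keep the true lengths alongside. The sketch below shows that idea with NumPy; the helper `pad_batch` is hypothetical, and whether this project pads in exactly this way is an assumption. In PyTorch, `torch.nn.utils.rnn.pad_sequence` serves a similar purpose.

```python
import numpy as np

MAX_SEQ_LENGTH = 48  # maximum number of hourly time steps
NUM_FEATURES = 19    # 13 lab measurements + 6 vital signs


def pad_batch(samples, max_len=MAX_SEQ_LENGTH):
    """Zero-pad (seq_len, n_features) arrays into one (batch, max_len, n_features) array."""
    lengths = np.array([sample.shape[0] for sample in samples])
    batch = np.zeros((len(samples), max_len, samples[0].shape[1]))
    for i, sample in enumerate(samples):
        # Real observations fill the first rows; remaining rows stay zero.
        batch[i, : sample.shape[0], :] = sample
    return batch, lengths


# Three synthetic samples with sequence lengths 48, 20, and 35.
samples = [np.ones((n, NUM_FEATURES)) for n in (48, 20, 35)]
batch, lengths = pad_batch(samples)
print(batch.shape, lengths)
```

Keeping `lengths` makes it possible to ignore the padded rows later, for example when selecting the last real time step of each sequence.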

A class label of 1 corresponds to an in-hospital mortality event. Less than 15% of samples belong to this class. We will need to take the class imbalance into account when tuning and training predictive models with this dataset.

## 8. Model Architecture

The starting point for our predictive model is based on the model in [1] and consists of the following layers:

| Layer # | Description        | Input Shape                            | Parameters          | Output Shape           | Activation       |
| ------- | ------------------ | -------------------------------------- | ------------------- | ---------------------- | ---------------- |
| 1       | Bidirectional LSTM | (b, t<sub>max</sub> = 48, n<sub>meas</sub> = 19) | n<sub>LSTM</sub>    | (b, 2n<sub>LSTM</sub>) | a<sub>LSTM</sub> |
| 2       | Dropout            | (b, 2n<sub>LSTM</sub>)                 | P<sub>dropout</sub> | (b, 2n<sub>LSTM</sub>) | -                |
| 3       | Fully Connected    | (b, 2n<sub>LSTM</sub>)                 | n<sub>FC</sub>      | (b, n<sub>FC</sub>)    | a<sub>FC</sub>   |
| 4       | Output             | (b, n<sub>FC</sub>)                    | n<sub>out</sub> = 2 | (b, n<sub>out</sub>)   | a<sub>out</sub>  |


The parameters from the above table are defined as:

| Parameter           | Description                                             |
| ------------------- | ------------------------------------------------------- |
| b                   | Batch size                                              |
| t<sub>max</sub>     | Maximum input sequence length                           |
| n<sub>meas</sub>    | Number of patient measurement types                     |
| n<sub>LSTM</sub>    | Number of features in a LSTM hidden state               |
| a<sub>LSTM</sub>    | Activation function for the LSTM output                 |
| P<sub>dropout</sub> | Dropout probability                                     |
| n<sub>FC</sub>      | Number of nodes in the fully connected layer            |
| a<sub>FC</sub>      | Activation function for the fully connected layer output |
| n<sub>out</sub>     | Number of nodes in the output layer                     |
| a<sub>out</sub>     | Activation function for the output layer                |


Note that n<sub>meas</sub>, n<sub>out</sub>, and t<sub>max</sub> are fixed. We have chosen to always use all 19 patient measurement types, and our classification problem always has two classes. In our current data pipeline, data collected outside of a specified time window are removed during the final preprocessing phase. If we want the observation window to be tunable, it would be helpful to move the `preprocess.feature_finalizer` module into the `tune_train` sub-package.
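
The layer table above maps onto PyTorch roughly as in the sketch below. This is an illustrative reading of the table, not the project's model class: the class name `BiLSTMClassifier`, the default sizes, and the choice of Tanh activations are placeholders, and the sketch assumes every input is already padded to t<sub>max</sub> = 48 time steps (it does not pack variable-length sequences).

```python
import torch
from torch import nn


class BiLSTMClassifier(nn.Module):
    """Bidirectional LSTM -> dropout -> fully connected -> 2-class output."""

    def __init__(self, n_meas=19, n_lstm=128, n_fc=32, n_out=2, p_dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_meas,
            hidden_size=n_lstm,
            bidirectional=True,
            batch_first=True,  # inputs are (b, t_max, n_meas)
        )
        self.act_lstm = nn.Tanh()
        self.dropout = nn.Dropout(p=p_dropout)
        self.fc = nn.Linear(2 * n_lstm, n_fc)
        self.act_fc = nn.Tanh()
        self.out = nn.Linear(n_fc, n_out)

    def forward(self, x):
        # h_n has shape (2, b, n_lstm): final forward and backward hidden states.
        _, (h_n, _) = self.lstm(x)
        features = torch.cat([h_n[0], h_n[1]], dim=1)  # (b, 2 * n_lstm)
        features = self.dropout(self.act_lstm(features))
        hidden = self.act_fc(self.fc(features))        # (b, n_fc)
        # Returns raw logits; an output activation such as softmax can follow.
        return self.out(hidden)                        # (b, n_out)


model = BiLSTMClassifier()
model.eval()
with torch.no_grad():
    logits = model(torch.rand(4, 48, 19))
print(tuple(logits.shape))
```

Concatenating the two final hidden states is what produces the 2n<sub>LSTM</sub> feature width listed in the table.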

## 9. Hyperparameter Tuning

### 9.1 Architectural hyperparameters

The following table lists the ranges of architectural parameters to be explored during hyperparameter tuning.

| Parameter           | Tuning Type  | Values                            |
| ------------------- | ------------ | --------------------------------- |
| b                   | Discrete     | 2<sup>k</sup> , k = 5, 6, 7, 8    |
| n<sub>LSTM</sub>    | Discrete     | 2<sup>k</sup> , k = 5, 6, 7       |
| a<sub>LSTM</sub>    | Discrete     | ReLU, Tanh                        |
| P<sub>dropout</sub> | Continuous   | 0.0000 to 0.5000                  |
| n<sub>FC</sub>      | Discrete     | 2<sup>k</sup> , k = 4, 5, 6, 7, 8 |
| a<sub>FC</sub>      | Discrete     | ReLU, Tanh                        |


### 9.2 Trainer hyperparameters




During hyperparameter tuning, we also explore different training optimization algorithms and learning rates.

| Parameter     | Tuning Type | Values             |
| ------------- | ----------- | ------------------ |
| Optimizer     | Discrete    | SGD, RMSprop, Adam |
| Learning Rate | Continuous  | 1e-5 to 1e-1       |

When using the Adam optimizer, we always use the PyTorch default values of $\beta_1 = 0.9, \beta_2 = 0.999, \epsilon = 10^{-8}$.

### 9.3 Implementation Details
The [`HyperParameterTuner`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) class in the [`tune_train`](../src/lstm_adversarial_attack/tune_train/__init__.py) sub-package implements a cross-validation tuning scheme that utilizes the [Optuna](https://optuna.org/) framework. The boundaries of the hyperparameter space to explore during tuning are passed to the [`HyperParameterTuner`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) constructor in an [`X19MLSTMTuningRanges`](../src/lstm_adversarial_attack/tune_train/tuner_helpers.py) object. The default attribute values of an [`X19MLSTMTuningRanges`](../src/lstm_adversarial_attack/tune_train/tuner_helpers.py) object are stored in the following config variables in [`config_settings`](../src/lstm_adversarial_attack/config_settings.py):
```
    TUNING_LOG_LSTM_HIDDEN_SIZE
    TUNING_LSTM_ACT_OPTIONS
    TUNING_DROPOUT
    TUNING_LOG_FC_HIDDEN_SIZE
    TUNING_FC_ACT_OPTIONS
    TUNING_OPTIMIZER_OPTIONS
    TUNING_LEARNING_RATE
    TUNING_LOG_BATCH_SIZE
```
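
For readers unfamiliar with Optuna, the tuning ranges from Sections 9.1 and 9.2 could be expressed with the `suggest_int`, `suggest_categorical`, and `suggest_float` methods of an Optuna trial, roughly as sketched below. The function `suggest_hyperparameters` is hypothetical, sampling the learning rate on a log scale is an assumption, and `RandomTrial` is only a small stand-in so the snippet runs without Optuna installed.

```python
import math
import random


def suggest_hyperparameters(trial):
    """Sample one hyperparameter set from an Optuna-style trial object."""
    activations = ["ReLU", "Tanh"]
    return {
        "log_batch_size": trial.suggest_int("log_batch_size", 5, 8),
        "log_lstm_hidden_size": trial.suggest_int("log_lstm_hidden_size", 5, 7),
        "lstm_act_name": trial.suggest_categorical("lstm_act_name", activations),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        "log_fc_hidden_size": trial.suggest_int("log_fc_hidden_size", 4, 8),
        "fc_act_name": trial.suggest_categorical("fc_act_name", activations),
        "optimizer_name": trial.suggest_categorical(
            "optimizer_name", ["SGD", "RMSprop", "Adam"]
        ),
        # Assumption: learning rate is sampled on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True),
    }


class RandomTrial:
    """Stand-in that mimics the three trial methods used above (not Optuna itself)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_int(self, name, low, high):
        return self._rng.randint(low, high)

    def suggest_float(self, name, low, high, log=False):
        if log:
            return math.exp(self._rng.uniform(math.log(low), math.log(high)))
        return self._rng.uniform(low, high)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)


params = suggest_hyperparameters(RandomTrial())
print(params)
```

Storing the base-2 logarithm of each size (for example `log_batch_size`) keeps the search space restricted to powers of two while still using a plain integer range.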


A [`StratifiedKFold`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) generator is used to assign samples to each fold. When selecting samples for each training batch, we use a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) with a [`WeightedRandomSampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler) to oversample from the minority class (label = 1). For a given set of hyperparameters, the [`HyperParameterTuner.objective_fn`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) method returns the mean validation loss across the K folds, and this mean loss is used as a minimization target by an Optuna [`TPESampler`](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html) to select new sets of hyperparameters for additional trials. [`HyperParameterTuner`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) also uses an Optuna [`MedianPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.MedianPruner.html) to stop unpromising trials early.
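
The oversampling idea can be illustrated without PyTorch: give each sample a weight inversely proportional to its class frequency, then draw with replacement. The labels below are synthetic (85 negatives, 15 positives) and the helper `class_balanced_weights` is hypothetical; weights of this form are the kind of per-sample values that `WeightedRandomSampler` accepts, but the exact weighting used by this project is an assumption.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Weight each sample by 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Synthetic labels with a 15% minority class (label = 1).
labels = [0] * 85 + [1] * 15
weights = class_balanced_weights(labels)

# Draw with replacement, as a weighted sampler would when building batches.
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=10_000)
minority_fraction = sum(drawn) / len(drawn)
print(f"minority fraction in drawn samples: {minority_fraction:.3f}")
```

With these weights each class carries the same total probability, so roughly half of the drawn samples come from the minority class even though it makes up only 15% of the data.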


### 9.4 Starting a New Hyperparameter Tuning Study

Before starting, a few things to note:
* Depending on your GPU's compute power, running the full 30 trials could take 2 to 20 hours.
* Results will be saved to a newly created directory (with a timestamp-based name) under `data/tune_train/hyperparameter_tuning`.
* If the study is stopped early (via CTRL-C or the Jupyter Stop button), results from the trials completed up to that point will be saved.
* While the tuning trials are running, look ahead to the next Markdown cell for instructions on how to monitor progress in Tensorboard (depending on your notebook output settings, you may need to scroll down to see that cell).

We can start a new hyperparameter tuning study using the [`tune_new`](../src/lstm_adversarial_attack/tune_train/tune_new.py) module from the [`tune_train`](../src/lstm_adversarial_attack/tune_train/__init__.py) sub-package.

In [13]:
from lstm_adversarial_attack.tune_train import tune_new
my_completed_study = tune_new.main(num_trials=30)

[I 2023-07-27 17:04:21,204] A new study created in memory with name: no-name-fa445063-7173-4cf2-8208-1abf61a6f1cc


Starting hyperparameter tuning.

Data for Tensorboard will be written to:
/home/devspace/project/data/tune_train/hyperparameter_tuning/2023-07-27_17_04_20.069961/tensorboard

Optuna trial and study objects will be saved in:
/home/devspace/project/data/tune_train/hyperparameter_tuning/2023-07-27_17_04_20.069961/checkpoints_tuner


fold_0, epoch_1, Loss: 0.5673
fold_0, epoch_2, Loss: 0.5407
fold_0, epoch_3, Loss: 0.5491


[W 2023-07-27 17:04:27,678] Trial 0 failed with parameters: {'log_lstm_hidden_size': 6, 'lstm_act_name': 'Tanh', 'dropout': 0.01806377235909634, 'log_fc_hidden_size': 7, 'fc_act_name': 'Tanh', 'optimizer_name': 'Adam', 'learning_rate': 0.0013468989672929513, 'log_batch_size': 7} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/home/devspace/env/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/home/devspace/project/src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py", line 275, in objective_fn
    trainer.train_model(num_epochs=self.epochs_per_fold)
  File "/home/devspace/project/src/lstm_adversarial_attack/tune_train/standard_model_trainer.py", line 120, in train_model
    loss.backward()
  File "/home/devspace/env/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/devspace/env/li

KeyboardInterrupt: 

### 9.5 Monitor Tuning Progress with Tensorboard

While we are tuning hyperparameters, we can monitor results in Tensorboard. One (relatively straightforward) way to start Tensorboard is to first launch a `zsh` shell inside the project container:
```
$ docker exec -it lstm_aa_app_dev /bin/zsh
```
Next, at the container `zsh` prompt, run the following command to start a Tensorboard server:

```
> tensorboard --logdir=/home/devspace/project/data/tune_train/hyperparameter_tuning/continued_trials/tensorboard --host=0.0.0.0
```
Then, in your browser, go to `http://localhost:6006/`. You should see something like the screenshot below. The x-axis for all plots is epoch number. (Unfortunately, there is no good way to add axis labels in Tensorboard.) In this example we are in the middle of running trial #21. Trial #20 completed the default number of epochs per fold (100). Trial #19 only ran 20 epochs because it was pruned by the Optuna `MedianPruner`.

![tensorboard_image](images/tensorboard_hyperparameter_tuning.png)



### 9.6 Run Additional Trials on an Existing Tuning Study

If we want to run additional trials using the results saved from a tuning study that previously ran, we can use the [`tune_train.tune_resume`](../src/lstm_adversarial_attack/tune_train/tune_resume.py) module. When we resume an existing study, the Optuna framework can use what it learned from earlier trials in the study to choose conditions for the new trials. The new trial results are saved to the same directory and `optuna.Study` filepath containing results from the study's previous trials.

We use the next code cell to resume tuning with an existing study. When we do not provide an argument for the `study_dir` parameter of `tune_resume.main` (as is the case below), it defaults to the directory under `data/tune_train/hyperparameter_tuning` that contains the most recently modified `optuna_study.pickle` file.

>**Note** If we want to use a study other than the most recently modified one, our call to `tune_resume.main` would look something like this:

>`tune_resume.main(study_dir=Path("/home/devspace/project/data/tune_train/hyperparameter_tuning/2023-07-27_17_04_20.069961/checkpoints_tuner"),
> num_trials=30)`

In [18]:
from lstm_adversarial_attack.tune_train import tune_resume
tune_resume.main(num_trials=30)

Starting hyperparameter tuning.

Data for Tensorboard will be written to:
/home/devspace/project/data/tune_train/hyperparameter_tuning/continued_trials/tensorboard

Optuna trial and study objects will be saved in:
/home/devspace/project/data/tune_train/hyperparameter_tuning/continued_trials/checkpoints_tuner


fold_0, epoch_1, Loss: 0.5825
fold_0, epoch_2, Loss: 0.5598
fold_0, epoch_3, Loss: 0.5442
fold_0, epoch_4, Loss: 0.5372


[W 2023-07-27 18:41:40,538] Trial 21 failed with parameters: {'log_lstm_hidden_size': 7, 'lstm_act_name': 'Tanh', 'dropout': 0.0039149760032136105, 'log_fc_hidden_size': 4, 'fc_act_name': 'Tanh', 'optimizer_name': 'Adam', 'learning_rate': 0.0002465804125171783, 'log_batch_size': 5} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/home/devspace/env/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/home/devspace/project/src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py", line 275, in objective_fn
    trainer.train_model(num_epochs=self.epochs_per_fold)
  File "/home/devspace/project/src/lstm_adversarial_attack/tune_train/standard_model_trainer.py", line 118, in train_model
    y_hat = self.model(inputs).squeeze()
  File "/home/devspace/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_ca

KeyboardInterrupt: 
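
The default lookup described in this section (use the directory holding the most recently modified `optuna_study.pickle`) could be implemented along the lines of the sketch below. The helper `find_latest_study_dir` is hypothetical and is not the code in `tune_resume`; the demo builds throwaway directories so it can run anywhere.

```python
import os
import tempfile
from pathlib import Path


def find_latest_study_dir(tuning_root, filename="optuna_study.pickle"):
    """Return the directory containing the most recently modified study file."""
    candidates = list(Path(tuning_root).rglob(filename))
    if not candidates:
        raise FileNotFoundError(f"No {filename} found under {tuning_root}")
    latest_file = max(candidates, key=lambda path: path.stat().st_mtime)
    return latest_file.parent


# Demo with two fake study directories and explicitly set modification times.
with tempfile.TemporaryDirectory() as root:
    for name, mtime in [("older_study", 1_000_000_000), ("newer_study", 2_000_000_000)]:
        study_file = Path(root) / name / "checkpoints_tuner" / "optuna_study.pickle"
        study_file.parent.mkdir(parents=True)
        study_file.touch()
        os.utime(study_file, (mtime, mtime))

    latest_dir = find_latest_study_dir(root)
    latest_name = latest_dir.parent.name

print(latest_name)
```

Selecting by modification time rather than by directory name means a study that was created earlier but resumed recently is still treated as the current one.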

### 9.7 Select Final Hyperparameters
When we are done tuning, we can view our best set of hyperparameters by examining the `optuna.Study` object from our tuning run(s) above.


In [20]:
import lstm_adversarial_attack.resource_io as rio

study_path = Path("/home/devspace/project/data/tune_train/hyperparameter_tuning/continued_trials/checkpoints_tuner/optuna_study.pickle")

study = rio.ResourceImporter().import_pickle_to_object(
    path=study_path
)

print(f"The best trial result is from trial # {study.best_trial.number}.\n")
print("The set of hyperparameters from this trial are:")
pprint.pprint(study.best_params)


The best trial result is from trial # 20.

The set of hyperparameters from this trial are:
{'dropout': 0.029018875280141854,
 'fc_act_name': 'Tanh',
 'learning_rate': 0.0002784280532512521,
 'log_batch_size': 5,
 'log_fc_hidden_size': 4,
 'log_lstm_hidden_size': 7,
 'lstm_act_name': 'Tanh',
 'optimizer_name': 'Adam'}
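
Because the size-related hyperparameters are stored as base-2 logarithms (see the 2<sup>k</sup> ranges in Section 9.1), it can help to convert them back to actual sizes. The helper `decode_log_sizes` below is hypothetical; the input dictionary is copied from the output above.

```python
best_params = {
    "dropout": 0.029018875280141854,
    "fc_act_name": "Tanh",
    "learning_rate": 0.0002784280532512521,
    "log_batch_size": 5,
    "log_fc_hidden_size": 4,
    "log_lstm_hidden_size": 7,
    "lstm_act_name": "Tanh",
    "optimizer_name": "Adam",
}


def decode_log_sizes(params):
    """Replace each 'log_<name>' entry with '<name>' = 2 ** value."""
    decoded = {}
    for key, value in params.items():
        if key.startswith("log_"):
            decoded[key[len("log_"):]] = 2 ** value
        else:
            decoded[key] = value
    return decoded


decoded = decode_log_sizes(best_params)
print(decoded["batch_size"], decoded["lstm_hidden_size"], decoded["fc_hidden_size"])
```

For this trial the decoded values are a batch size of 32, an LSTM hidden size of 128, and a fully connected layer size of 16.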


### 9.8 Run K-Fold Cross Validation with "Best" Hyperparameters and Extended Training (More Epochs)
In the above tuning runs, we ran only 100 epochs per fold (in the interest of reducing compute requirements). Based on the validation loss and AUC curves, it appears that we could improve our predictive performance (i.e., decrease validation loss and increase AUC) by training longer. We now run another round of stratified K-fold cross-validation with our best set of hyperparameters and a larger number of epochs.

#### 9.8.1 Notes on our Method
Some caveats about our methodology:
* We are using "flat" cross-validation (as was done in previous studies on this dataset). This method is computationally less expensive than nested cross-validation. Flat cross-validation has the potential to overestimate model performance, although in many cases the magnitude of the overestimation is small. We also mitigate this effect by using a different set of (randomly generated) fold assignments than was used for hyperparameter tuning.
* By selecting our hyperparameters based on the smaller number of epochs (100), we favor models that are faster to train. It is possible that using a larger number of epochs in the tuning runs would have yielded a different (and better) set of "best" hyperparameters, but doing so would also be computationally more expensive.
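
The idea of drawing a fresh set of stratified fold assignments can be sketched in plain Python as shown below. The function `stratified_fold_assignments`, the seeds, and the synthetic labels are illustrative assumptions; in practice scikit-learn's `StratifiedKFold` with `shuffle=True` and a different `random_state` plays this role.

```python
import random
from collections import Counter


def stratified_fold_assignments(labels, num_folds, seed):
    """Assign each sample a fold index so every fold keeps the class proportions."""
    rng = random.Random(seed)
    fold_of_sample = [0] * len(labels)
    for class_label in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == class_label]
        rng.shuffle(indices)
        # Deal the shuffled samples of this class out to the folds in turn.
        for position, sample_index in enumerate(indices):
            fold_of_sample[sample_index] = position % num_folds
    return fold_of_sample


# Synthetic labels: 85 majority-class samples and 15 minority-class samples.
labels = [0] * 85 + [1] * 15
tuning_folds = stratified_fold_assignments(labels, num_folds=5, seed=1)
assessment_folds = stratified_fold_assignments(labels, num_folds=5, seed=2)

minority_per_fold = Counter(
    fold for fold, y in zip(assessment_folds, labels) if y == 1
)
print(sorted(minority_per_fold.items()))
```

Each fold receives the same number of minority-class samples, while changing the seed changes which samples are grouped together.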


#### 9.8.2 Instantiate a CrossValidatorDriver
We use a CrossValidatorDriver object to run cross-validation with a single set of hyperparameters: 

In [None]:
import lstm_adversarial_attack.tune_train.cross_validator_driver as cvd
import lstm_adversarial_attack.x19_mort_general_dataset as xmd

cv_driver = cvd.CrossValidatorDriver.from_study_path(
        device=cur_device,
        dataset=dataset,
        study_path=cfg_paths.ONGOING_TUNING_STUDY_PICKLE
    )


Let's look at the data members of `cv_driver`:

In [None]:
pprint.pprint(cv_driver.__dict__)

We will run 5-fold cross-validation using 1000 epochs per fold. We will evaluate performance and save a checkpoint once every 10 epochs. These settings are determined by the values of `CV_DRIVER_EPOCHS_PER_FOLD`, `CV_DRIVER_NUM_FOLDS`, `CV_DRIVER_EVAL_INTERVAL`, and `CV_DRIVER_EVALS_PER_CHECKPOINT` in `lstm_adversarial_attack.config_settings`. The `.from_study_path()` class method we used to construct `cv_driver` extracts the best set of hyperparameters from `study_path` and passes them to the `CrossValidatorDriver` constructor.

#### 9.8.3 Run Cross-Validation
We now call `cv_driver`'s `.run()` method to start the cross-validation runs.

In [None]:
cv_driver.run()

#### 9.8.4 Monitor Cross-Validation Progress in Tensorboard
Near the start of the terminal output from the previous code cell, look for the lines:
```
Checkpoints will be saved in:
/home/devspace/project/data/cv_assessments/<timestamped_directory_name>/tensorboard
```
Then, start a `zsh` shell inside the app container and launch a Tensorboard server:
```
$ docker exec -it lstm_aa_app /bin/zsh
$ tensorboard --logdir=/home/devspace/project/data/cv_assessments/<timestamped_directory_name>/tensorboard --host=0.0.0.0
```
The Tensorboard output can now be viewed in your browser at http://localhost:6006

This Tensorboard screenshot was taken at the end of a 5-fold, 1000 epoch per fold cross-validation run.
![tensorboard_image](images/tensorboard_5fold_cv_best_params_1000epochs.png)

#### 9.8.5 Why Do We See Continued (Slow) Increase in Predictive Performance Up To Such High (1000) Epoch Counts?

The above AUC and validation loss curves show continued (though diminishing) improvement in predictive performance during the entire 1000 epochs. The fact that we do not observe any sign of overfitting at such a large number of epochs is somewhat unusual. A likely cause of this behavior is the `WeightedRandomSampler` used in our training `DataLoaders`. Samples with our minority class label (`mortality = 1`) only represent ~15% of the total dataset. To deal with this imbalanced dataset, we oversample from the minority class and undersample from the majority class when creating batches of samples for training. In our current implementation, some samples from the majority class go unseen by the `StandardModelTrainer` for a large number of epochs. The number of unseen samples slowly dwindles (and the amount of information available for training slowly increases), even at very high epoch counts.

#### 9.8.6 Summarize Results
We can use a CrossValidationSummarizer to identify and summarize each fold's best-performing checkpoint.

In [None]:
import lstm_adversarial_attack.tune_train.cross_validation_summarizer as cvs
cv_summarizer = cvs.CrossValidationSummarizer.from_cv_checkpoints_dir()
optimal_results_df = cv_summarizer.get_optimal_results_df(
    metric=cvs.EvalMetric.VALIDATION_LOSS,
    optimize_direction=cvs.OptimizeDirection.MIN,
)

optimal_results_df


We get the mean and standard deviation of each performance metric using:

In [None]:
optimal_results_df.describe().loc[["mean", "std"], (optimal_results_df.columns != "epoch") & (optimal_results_df.columns != "fold")]

### 9.9 Comparison with Prior Work

The table below compares the predictive performance of the LSTM model in this work with other LSTM-based models using the same dataset. The current model shows the best predictive performance among all models in the table based on AUC and F1 scores. 


| # | Authors     | Model                   | Input Features                                 | AUC             | F1              | Precision       | Recall          |
|---|-------------|-------------------------|------------------------------------------------|-----------------|-----------------|-----------------|-----------------|
| 1 | Sun et al.  | LSTM-128 + FC-32 + FC-2 | [13 labs, 6 vitals] x 48 hr                    | 0.9094 (0.0053) | 0.5429 (0.0194) | 0.4100 (0.0272) | 0.8071 (0.0269) |
| 2 | Tang et al. | LSTM-256 + FC-2         | [13 labs, 6 vitals] x 48 hr + demographic data | 0.949 (0.003)   | 0.623 (0.012)   |                 |                 |
| 3 | Tang et al. | CNN + LSTM-256 + FC-2   | [13 labs, 6 vitals] x 48 hr + demographic data | 0.940 (0.0071)  | 0.633 (0.031)   |                 |                 |
| 4 | Tang et al. | CNN + LSTM-256 + FC-2   | [13 labs, 6 vitals] x 48 hr                    | 0.933 (0.006)   | 0.587 (0.025)   |                 |                 |
| 5 | Tang et al. | LSTM-256 + FC-2         | [13 labs, 6 vitals] x 48 hr                    | 0.907 (0.006)   | 0.526 (0.013)   |                 |                 |
| 6 | This work   | LSTM-128 + FC-16 + FC-2 | [13 labs, 6 vitals] x 48 hr                    | 0.9657 (0.0035) | 0.9669 (0.0038) | 0.9888 (0.0009) | 0.9459 (0.0072) |

> **Notes** LSTM-X indicates an LSTM with a hidden size of X. FC-X indicates a fully connected layer with an output size of X. All LSTMs are bidirectional. The demographic data used in models #2 and #3 was obtained from MIMIC-III.
 

## 10. Run Adversarial Attack Algorithm on the Trained Model

### 10.1 Adversarial Loss and Regularization
Our method of adversarial attack is similar to Chen et al.'s approach that uses an adversarial loss function and L1 regularization. When attacking a binary classification model with trained parameters $\theta$, we start with the input feature matrix $X$ of a sample that the model correctly predicts to be in class $t_{c}$, so  $M(X) = t_{c}$ where $M$ is the model's prediction function. We then search for a perturbation matrix $P$ that meets the condition:
$$
M(X + P) \ne t_{c}
$$
Since we are dealing with binary classification, this condition is equivalent to:
$$
M(X + P) = \neg{t_{c}}
$$
where $\neg{t_c}$ is the negation of $t_c$. Defining a perturbed feature matrix $\widetilde{X} = X + P$, an adversarial loss function can be written as:
$$
\max\{[Logit(\widetilde{X})]_{t_c} - [Logit(\widetilde{X})]_{\neg{t_c}}, - \kappa \}
$$

When running perturbed input $\widetilde{X}$ through a forward pass, $[Logit(\widetilde{X})]_{t_c}$ and $[Logit(\widetilde{X})]_{\neg{t_c}}$ are the **pre-activation** values at the nodes corresponding to $t_c$ and $\neg{t_c}$ in the 2-node final layer. A value $\ge 0$ is chosen for $\kappa$. Using a small non-zero value of $\kappa$ will prevent an attack algorithm from optimizing toward an infinitesimally small gap between $[Logit(\widetilde{X})]_{t_c}$ and $[Logit(\widetilde{X})]_{\neg{t_c}}$ while still targeting the small difference we want for an adversarial example.
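
As a worked example of the adversarial loss function, the sketch below evaluates it for a pair of logits from a 2-node output layer. It is written in plain Python for illustration and is not the project's `AdversarialLoss` class; the function name `adversarial_loss` and the example logit values are assumptions made here.

```python
def adversarial_loss(logits, true_class, kappa=0.0):
    """Return max(Logit_tc - Logit_not_tc, -kappa) for a 2-class model."""
    other_class = 1 - true_class
    return max(logits[true_class] - logits[other_class], -kappa)


# Correctly classified input: the loss equals the positive logit gap.
loss_before = adversarial_loss([2.0, -1.0], true_class=0, kappa=0.5)

# Adversarial input: the gap is negative, and the loss is floored at -kappa.
loss_after = adversarial_loss([-1.0, 2.0], true_class=0, kappa=0.5)

print(loss_before, loss_after)
```

Once the logit gap falls below $-\kappa$, the loss stops decreasing, so an optimizer gains nothing from pushing the misclassification margin any further.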

To encourage an attack algorithm to find sparse perturbations, the following L1-regularized version of the adversarial loss function is used:

$$
\max\{[Logit(\widetilde{X})]_{t_c} - [Logit(\widetilde{X})]_{\neg{t_c}}, - \kappa \} + \lambda||\widetilde{X}-X||_1
$$

where $\lambda$ is the L1 regularization constant. This regularized loss function can be minimized by subgradient descent or by an Iterative Soft-Thresholding Algorithm (ISTA). The latter approach typically converges faster.
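
For reference, a common textbook form of the ISTA update is sketched below. The symbols $f$ (the unregularized adversarial loss viewed as a function of the perturbed input), $\eta$ (a step size), $k$ (the iteration index), and $S_{\tau}$ (the elementwise soft-thresholding operator) are introduced here for illustration and do not appear in the source. Because $f$ contains a $\max$, the gradient should be read as a subgradient wherever $f$ is not differentiable.

$$
P^{(k+1)} = S_{\eta\lambda}\left(P^{(k)} - \eta \nabla_P f\left(X + P^{(k)}\right)\right), \qquad [S_{\tau}(V)]_{ij} = \operatorname{sign}(V_{ij}) \max\{|V_{ij}| - \tau, 0\}
$$

The operator $S_{\tau}$ sets any element with magnitude at most $\tau$ exactly to zero, which is what produces sparse perturbations.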


### 10.2 Attack Algorithm and Regularization

Adversarial attacks on a particular model and dataset input features are managed by an `AdversarialAttackTrainer`. In the procedure outlined below, we discover an adversarial example any time we find $[Logit(\widetilde{X})]_{\neg{t_c}} > [Logit(\widetilde{X})]_{t_c}$, even if we have not converged near a minimum of the loss function defined in Section 10.1. We attack each batch of samples for a fixed number of iterations, regardless of how many (if any) adversarial examples are found.

1. A `LogitNoDropoutModelBuiler` creates a modified version of the target model. The modified model has all dropout probabilities set to zero and has no activation function on the output layer.
2. Batches of input features are run through a `FeaturePerturber` (implemented in `attack.feature_perturber`) that generates slightly modified versions of the original features.
3. The perturbed features are run through the modified model that was built by the `LogitNoDropoutModelBuiler` to obtain values for $[Logit(\widetilde{X})]_{t_c}$ and $[Logit(\widetilde{X})]_{\neg{t_c}}$.
4. An instance of the custom PyTorch loss function `AdversarialLoss`, which implements the adversarial loss from Section 10.1, calculates a loss tensor.
5. The PyTorch `.backward()` method of the loss tensor finds the gradient of the loss with respect to the elements of the `FeaturePerturber.perturbation` tensor.
6. If the current $Logit$ values resulting from a sample's perturbed input features represent an adversarial example, and the example is either the first or the lowest-loss example for that sample, the perturbations and other details are stored in a `BatchResult` object.
7. A PyTorch optimizer uses the loss gradient to calculate and apply adjustments to the perturbations.
8. The `AdversarialAttackTrainer.apply_soft_bounded_threshold()` method performs ISTA thresholding on the perturbations.
9. The perturbations (which have been adjusted by the optimizer *and* ISTA thresholding) are used in step 2 of the next attack iteration.

Two key points from the above procedure are: (1) Unlike the method used in [], we do not stop attacking an example upon finding a single adversarial perturbation for it. (2) We use a combination of subgradient descent (in step 7) and ISTA (in step 8) to minimize (or at least reduce) the value of the regularized loss function from Section 10.1. We do not know if this approach is guaranteed to converge to a minimum of the adversarial loss function, but empirically, we find the combination of subgradient descent and ISTA more effective at finding sparse adversarial examples than either method on its own.
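
The two key points can be sketched on a toy problem. The code below is a simplified illustration in plain Python, not the project's `AdversarialAttackTrainer`: it assumes a hypothetical two-class linear model (`toy_logits`) in place of the LSTM, a single sample instead of a batch, a plain subgradient step instead of a PyTorch optimizer, and a soft-threshold equal to the learning rate times $\lambda$. All function and variable names are introduced here for illustration.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def soft_threshold(values, threshold):
    """ISTA shrinkage: move each element toward zero by threshold."""
    return [sign(v) * max(abs(v) - threshold, 0.0) for v in values]


def toy_logits(weights, features):
    """Two-class linear stand-in for the logit model (hypothetical)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def attack(weights, x, true_class, kappa=0.0, lam=0.1, lr=0.1, iterations=50):
    other_class = 1 - true_class
    perturbation = [0.0] * len(x)
    best = None  # (loss, perturbation) of the lowest-loss adversarial example
    for _ in range(iterations):
        x_tilde = [xi + pi for xi, pi in zip(x, perturbation)]
        z = toy_logits(weights, x_tilde)
        margin = z[true_class] - z[other_class]
        loss = max(margin, -kappa) + lam * sum(abs(p) for p in perturbation)

        # Record the example if it is adversarial and the lowest loss so far,
        # then keep attacking instead of stopping.
        if margin < 0 and (best is None or loss < best[0]):
            best = (loss, list(perturbation))

        # Subgradient of the regularized loss for each perturbation element.
        hinge_active = margin > -kappa
        gradient = [
            (weights[true_class][i] - weights[other_class][i] if hinge_active else 0.0)
            + lam * sign(perturbation[i])
            for i in range(len(x))
        ]
        # Subgradient descent step followed by ISTA soft-thresholding.
        perturbation = [p - lr * g for p, g in zip(perturbation, gradient)]
        perturbation = soft_threshold(perturbation, lr * lam)
    return best


# Toy setup: class 0 logit depends strongly on feature 0, weakly on feature 1.
weights = [[1.0, 0.1], [0.0, 0.0]]
x = [0.5, 0.5]
best_loss, best_perturbation = attack(weights, x, true_class=0)
print(best_loss, best_perturbation)
```

In this toy setup the thresholding step keeps the weakly influential second feature at zero, so the stored adversarial perturbation changes only the first feature.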




### 10.3 Attack Hyperparameter Tuning

Before running an attack on the entire dataset, we tune attack hyperparameters with help from `optuna`. Our approach here is not as rigorous as the one we used for predictive model tuning. We use only a small fraction of the total dataset for tuning, and no cross-validation is involved.

#### 10.3.1 Viewing / Setting the Tuning Ranges

First, let's look at the current values of the project config variables that determine how an attack hyperparameter tuning session will run.


In [None]:
print(f"Kappa: min = {cfg_set.ATTACK_TUNING_KAPPA[0]}, max = {cfg_set.ATTACK_TUNING_KAPPA[1]}")
print(f"Lambda: min = {cfg_set.ATTACK_TUNING_LAMBDA_1[0]}, max = {cfg_set.ATTACK_TUNING_LAMBDA_1[1]}")
print(f"Optimizer: {cfg_set.ATTACK_TUNING_OPTIMIZER_OPTIONS}")
print(f"Learning rate: min = {cfg_set.ATTACK_TUNING_LEARNING_RATE[0]}, max = {cfg_set.ATTACK_TUNING_LEARNING_RATE[1]}")
print(f"Batch size: min = {2 ** cfg_set.ATTACK_TUNING_LOG_BATCH_SIZE[0]}, max = {2 ** cfg_set.ATTACK_TUNING_LOG_BATCH_SIZE[1]}")
print(f"Attack iterations per batch: {cfg_set.ATTACK_TUNING_EPOCHS}")
print(f"Max number of samples: {cfg_set.ATTACK_TUNING_MAX_NUM_SAMPLES}")


These values are stored as variables in `src/lstm_adversarial_attack/config_settings.py` and can be modified as needed to customize a tuning session.

>**Note** The `ATTACK_TUNING_MAX_NUM_SAMPLES` specifies the number of samples to be considered for attack. However, samples that are misclassified by the target model are not attacked, so the actual number of samples used for tuning will be slightly lower.

#### 10.3.2 Running an AttackHyperParameterTuner

We can run a new attack hyperparameter tuning session with function `initiate_attack_tuning_study()` from the `attack.tune_attacks` module. The `target_model_assessment_type` parameter determines whether we use a model trained by cross-validation or single-fold training. The default behavior is to choose the most recent result of whichever assessment type is specified. We can influence the type of adversarial perturbation our tuned algorithm will produce through the argument passed to the `objective` parameter. We use the return value of any method of class `AttackTunerObjectivesBuilder`. Current options are:

| Objective                                                   | Maximizes                                                    |
| ----------------------------------------------------------- | ------------------------------------------------------------ |
| `sparsity()`        | Sum of the perturbation sparsities of the lowest loss adversarial example of each sample |
| `max_num_nonzero_perts()` | Number of adversarial perturbations with only one non-zero element |
| `sparse_small()`           | Sum of (sparsity / L1 norm) of the lowest loss adversarial example of each sample |
| `sparse_small_max()`       | Sum of (sparsity / largest magnitude of any perturbation element) of lowest loss adversarial example of each sample |
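
To make the `sparse_small()` and `sparse_small_max()` rows concrete, here is a minimal sketch of how such scores could be computed. It is not the project's `AttackTunerObjectivesBuilder` code. It assumes that sparsity means the fraction of perturbation elements equal to zero (the source does not define it), represents each perturbation as a nested list, and uses hypothetical function names and example values.

```python
def sparsity(perturbation):
    """Fraction of elements equal to zero (assumed definition)."""
    flat = [v for row in perturbation for v in row]
    return sum(1 for v in flat if v == 0) / len(flat)


def l1_norm(perturbation):
    """Sum of absolute values of all elements."""
    return sum(abs(v) for row in perturbation for v in row)


def max_abs(perturbation):
    """Largest magnitude of any element."""
    return max(abs(v) for row in perturbation for v in row)


def sparse_small_score(perturbations):
    """Sum of (sparsity / L1 norm) over lowest-loss examples."""
    return sum(sparsity(p) / l1_norm(p) for p in perturbations)


def sparse_small_max_score(perturbations):
    """Sum of (sparsity / largest element magnitude) over lowest-loss examples."""
    return sum(sparsity(p) / max_abs(p) for p in perturbations)


# Two hypothetical lowest-loss adversarial perturbations (2 x 2 each).
examples = [
    [[0.0, 0.5], [0.0, 0.0]],    # 3 of 4 zero, L1 = 0.5, max = 0.5
    [[0.25, 0.25], [0.0, 0.0]],  # 2 of 4 zero, L1 = 0.5, max = 0.25
]
score_small = sparse_small_score(examples)
score_small_max = sparse_small_max_score(examples)
print(score_small, score_small_max)
```

An adversarial perturbation always has at least one non-zero element, so the divisions are safe for the inputs this sketch is meant for.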



In [None]:
import lstm_adversarial_attack.attack.attack_hyperparameter_tuner as aht
import lstm_adversarial_attack.attack.tune_attacks as tua

# initial_attack_tuning_study = tua.main()
help(tua.main)

# initial_attack_tuning_study = tua.start_new_tuning(
#         num_trials=50,
#         # target_model_assessment_type=amr.ModelAssessmentType.KFOLD,
#         objective=aht.AttackTunerObjectivesBuilder.sparse_small_max(),
#     )


Results will be saved in a subdirectory of `/home/devspace/project/data/attack/attack_hyperparameter_tuning`. If we want to run additional trials for an existing study, we can use the following code. Again, the default behavior is to use the newest existing study.

In [None]:
continued_study = tua.resume_tuning(num_trials=50)

#### 10.3.3 Attacking the Full Dataset

Next, we use function `attack_with_tuned_params()` to attack all correctly classified samples using the results of our latest hyperparameter tuning session. Results will be saved in a subdirectory of `/home/devspace/project/data/attack/frozen_hyperparameter_attack`.

In [None]:
import lstm_adversarial_attack.attack.attack as atk
atk.attack_with_tuned_params()

## 11. Attack Results

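We can plot the results of the most recent attack using `plot_latest_result()` from the `attack_analysis_driver` module.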

In [None]:
import lstm_adversarial_attack.attack_analysis.attack_analysis as ata


In [None]:
from lstm_adversarial_attack.attack_analysis import attack_analysis_driver as aad
aad.plot_latest_result()