# LSTM Time Series Deep Learning and Adversarial Attacks

## 1. Background Information

This project reproduces and expands upon work published in [1] and [2] on Long Short-Term Memory (LSTM) predictive models of Intensive Care Unit (ICU) patient outcomes, and adversarial attacks on those models. Following the approach of the previous studies, we use patient data from the Medical Information Mart for Intensive Care (MIMIC-III) database, and build an LSTM model with inputs consisting of 13 lab measurements and 6 vital signs. The prediction target is a binary variable representing in-hospital mortality. An adversarial attack algorithm with L1 regularization is then used to identify small perturbations which, when applied to real, correctly classified input features, cause a trained model to misclassify the perturbed input. After attacking a full dataset, susceptibility calculations are performed to identify the regions of input feature space most vulnerable to adversarial attack.

Aspects of the current work that expand upon the previous studies include faster data preprocessing algorithms; extensive hyperparameter tuning of both the predictive model and attack algorithm; improved performance of the predictive model; implementation of a GPU-compatible attack algorithm that enables attacking samples in batches; and continuing the attack after the first adversarial perturbation is found for a sample, which allows the discovery of additional, lower-loss adversarial perturbations.

## 2. Confirm Development Environment Setup
The code and instructions in this notebook assume you have completed all steps in the [How to run this project](https://github.com/duanegoodner/lstm_adversarial_attack/tree/main#3-how-to-run-this-project) section of the project [README](https://github.com/duanegoodner/lstm_adversarial_attack), and you are running this notebook using Jupyter Lab inside Docker container `lstm_aa_app`. Run the following tests to confirm the environment is set up correctly:

### 2.1 Docker Containers

From a **local terminal** (not in any docker container), run `docker ps --format "table {{.ID}}\t{{.Ports}}\t{{.Names}}"`. The output should include the following lines:

```
CONTAINER ID   PORTS                                                                        NAMES
152b6903a45b   127.0.0.1:6006->6006/tcp, 127.0.0.1:8888->8888/tcp, 127.0.0.1:2200->22/tcp   lstm_aa_app
5520201f8420   0.0.0.0:5556->5432/tcp, :::5556->5432/tcp                                    postgres_optuna
92e83589a4c8   0.0.0.0:5555->5432/tcp, :::5555->5432/tcp                                    postgres_mimiciii
```
Python code will run in `lstm_aa_app`. An instance of PostgreSQL in `postgres_mimiciii` will hold MIMIC-III raw data for our model, and databases in `postgres_optuna` will store data from studies used to tune hyperparameters of our predictive model and adversarial attack model.

### 2.2 Python Interpreter and IPython Kernel

Run the following quick tests to confirm which Python interpreter and IPython kernel we are using:

In [1]:
!which python
# Output should be: /home/devspace/env/bin/python

/home/devspace/env/bin/python


In [2]:
from IPython.core.getipython import get_ipython
get_ipython().kernel.config["IPKernelApp"]["connection_file"]
# Output should be similar to: '/home/gen_user/.local/share/jupyter/runtime/kernel-v2-26202uI0Vk7x2nHkK.json'

'/home/gen_user/.local/share/jupyter/runtime/kernel-81090020-f042-4206-bf59-edb19b0a5814.json'

### 2.3 Test Database Connections

Run the following cell to test the MIMIC-III database:

In [3]:
!python /home/devspace/project/src/lstm_adversarial_attack/query_db/test_mimiciii_db.py

Successfully connected to MIMIC-III database.
Connection to MIMIC-III database successfully closed.


Then run test queries on the databases that will handle hyperparameter tuning data:

In [4]:
!python /home/devspace/project/src/lstm_adversarial_attack/tuning_db/test_tuning_study_dbs.py

model_tuning database successfully queried.
Found 11 tuning studies.
attack_tuning database successfully queried.
Found 21 tuning studies.


### 2.4 Check for GPU

The PyTorch code in our project will run much faster on a GPU than it will on a CPU. Let's find out if we have GPU access:

In [5]:
import torch
torch.cuda.is_available()

False
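
A common follow-up, sketched here as an illustration rather than as project code, is to choose the device once and fall back to the CPU when no GPU is reported. `pick_device` is a hypothetical helper name; the import guard is an added assumption so the check also runs where PyTorch is not installed.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    # Guard the import so the check also works where PyTorch is missing.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

With the `False` result shown above, this would select `"cpu"`, and everything in the notebook would still run, only more slowly.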

### 2.5 Change Working Directory
Many of the code cells in this notebook use relative paths and assume we are in directory `/home/devspace/project/src/lstm_adversarial_attack`, so let's change to that directory.

In [6]:
import os
os.chdir("/home/devspace/project/src/lstm_adversarial_attack")
!pwd

/home/devspace/project/src/lstm_adversarial_attack


## 3. Project Structure
Our `docker-compose.yml` maps the local project root directory to `/home/devspace/project` in the container. Run the following cell for an overview of our project layout. 

In [7]:
!tree -L 1 /home/devspace/project

/home/devspace/project
├── README.md
├── config.toml
├── data
├── docker
├── docs
├── logs
├── notebooks
└── src

6 directories, 2 files


### 3.1 `src/`
The contents of `/home/devspace/project/src/lstm_adversarial_attack` are:

In [8]:
!tree -d -L 1 /home/devspace/project/src/lstm_adversarial_attack

/home/devspace/project/src/lstm_adversarial_attack
├── __pycache__
├── attack
├── attack_analysis
├── config
├── dataset
├── model
├── preprocess
├── query_db
├── tuning_db
└── utils

10 directories


 Code in the sub-directories listed above forms our project pipeline: 
 * **query_db** runs .sql queries to extract patient lab, vital sign, and in-hospital mortality data from the MIMIC-III PostgreSQL database.
 * **preprocess** transforms .sql query output into a form that can be input to PyTorch models. 
 * **model** tunes and trains a PyTorch model for predicting in-hospital mortality based on lab and vital sign time-series data.
 * **attack** tunes and trains a PyTorch attack model that generates adversarial examples for the predictive model.
 * **attack_analysis** generates plots for visualizing characteristics of adversarial examples found by the attack model.

### 3.2 `notebook_helpers`

The `utils/notebook_helpers` module contains functions and classes to help streamline the data pipeline when running the project in a Jupyter notebook.

In [10]:
import utils.notebook_helpers as nh

### 3.2 `data/`

For each of the critical directories under `src/lstm_adversarial_attack/`, there is a corresponding directory under `data/`.

In [11]:
!tree -L 1 /home/devspace/project/data

/home/devspace/project/data
├── attack
├── attack_analyses_old
├── attack_analysis
├── example_data
├── model
├── preprocess
└── query_db

7 directories, 0 files


### 3.3 `config.toml`
Project configuration variables are set in the `config.toml` file. We can use `CONFIG_READER` and `CONFIG_MODIFIER` from the `config` sub-package to read and write to the `config.toml` file. The following code cell demonstrates how we can read and write `config.toml` values.

In [12]:
orig_kfold_random_seed = nh.get_config_value("model.tuner_driver.kfold_random_seed")
print(f"Original value: {orig_kfold_random_seed}")

nh.set_config_value("model.tuner_driver.kfold_random_seed", 2024)
modified_kfold_random_seed = nh.get_config_value("model.tuner_driver.kfold_random_seed")
print(f"Value changed to: {modified_kfold_random_seed}")

nh.set_config_value("model.tuner_driver.kfold_random_seed", orig_kfold_random_seed)

final_kfold_random_seed = nh.get_config_value("model.tuner_driver.kfold_random_seed")
print(f"Final value: {final_kfold_random_seed}")

Original value: 1234
Value changed to: 2024
Final value: 1234
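
The dotted key paths used above map naturally onto nested TOML tables. The following is a minimal sketch of that lookup on a plain dictionary; `get_by_path` and `set_by_path` are hypothetical names for illustration and are not the project's `CONFIG_READER` or `CONFIG_MODIFIER` implementations.

```python
from typing import Any


def get_by_path(config: dict, dotted_path: str) -> Any:
    """Walk a nested dict using a dotted key path such as "a.b.c"."""
    node = config
    for key in dotted_path.split("."):
        node = node[key]
    return node


def set_by_path(config: dict, dotted_path: str, value: Any) -> None:
    """Set a value at a dotted key path, creating intermediate tables as needed."""
    *parents, last = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


# In-memory stand-in for the parsed contents of config.toml.
config = {"model": {"tuner_driver": {"kfold_random_seed": 1234}}}
path = "model.tuner_driver.kfold_random_seed"

original = get_by_path(config, path)
set_by_path(config, path, 2024)
modified = get_by_path(config, path)
set_by_path(config, path, original)  # restore the original value
final = get_by_path(config, path)
print(original, modified, final)  # 1234 2024 1234
```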


### 3.4 Storing Sub-package and Module Session IDs
We will use an instance of `notebook_helpers.SessionIDs` to store the session IDs of the module and sub-package runs whose output we want to pass through our pipeline.

In [20]:
result = nh.get_config_value("model")
print(result)

{'trainer_eval_general_logging_metrics': ['accuracy', 'auc', 'f1', 'precision', 'recall', 'validation_loss'], 'trainer_eval_tensorboard_metrics': ['auc', 'f1', 'precision', 'recall', 'validation_loss'], 'trainer': {'random_seed': 12345678}, 'tuner_driver': {'num_trials': 60, 'num_folds': 5, 'num_cv_epochs': 10, 'epochs_per_fold': 5, 'kfold_random_seed': 1234, 'performance_metric': 'validation_loss', 'optimization_direction_label': 'minimize', 'tuning_output_dir': 'data/model/tuning', 'pruner_name': 'MedianPruner', 'sampler_name': 'TPESampler', 'db_env_var_name': 'MODEL_TUNING_DB_NAME', 'fold_class_name': 'StratifiedKFold', 'collate_fn_name': 'x19m_collate_fn', 'cv_mean_tensorboard_metrics': ['accuracy', 'auc', 'f1', 'precision', 'recall', 'validation_loss'], 'tuning_ranges': {'log_lstm_hidden_size': [5, 7], 'lstm_act_options': ['ReLU', 'Tanh'], 'dropout': [0.0, 0.5], 'log_fc_hidden_size': [4, 8], 'fc_act_options': ['ReLU', 'Tanh'], 'optimizer_options': ['Adam', 'RMSprop', 'SGD'], 'lear

In [22]:
import pprint
session_ids = nh.SessionIDs()
pprint.pprint(session_ids)

## 4. Database Queries

Raw ICU patient data can be extracted from the MIMIC-III database using modified versions of four `.sql` queries from the [MIT-LCP mimic-code repository](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/concepts/pivot).

### 4.1 Running the Queries

We connect to the database and execute the queries by running the [\_\_main__](../src/lstm_adversarial_attack/query_db/__main__.py) module of the [query_db](../src/lstm_adversarial_attack/query_db/\_\_init__.py) sub-package. 

In [None]:
!python -m query_db --help

In [None]:
!python -m query_db

### 4.2 Storing Query Session ID

We need to store the ID of the database query session that will provide input to the next stage of our data pipeline. To use the most recent query session, assign `None` to `custom_query_session_id`. To use a different query session, assign its ID to `custom_query_session_id`.

In [None]:
# TODO: assign either None or a specific query session ID.
# If `None`, Prefilter will use the most recently created query session.
custom_query_session_id = None

In [None]:
session_ids.set("db_queries", custom_query_session_id)

print("\nCurrent values in SessionIDs container:")
pprint.pprint(session_ids)
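
The "`None` means most recent" convention can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `resolve_session_id` is a hypothetical helper, and the sketch assumes session IDs are fixed-width timestamp strings (as the IDs shown elsewhere in this notebook appear to be), so that the lexicographically largest ID is the newest. The first ID in the example list is made up.

```python
from typing import Optional, Sequence


def resolve_session_id(custom_id: Optional[str], available_ids: Sequence[str]) -> str:
    """Return custom_id if given, otherwise the most recent available ID."""
    if custom_id is not None:
        if custom_id not in available_ids:
            raise ValueError(f"Unknown session ID: {custom_id}")
        return custom_id
    if not available_ids:
        raise ValueError("No sessions found")
    # Fixed-width timestamp strings sort chronologically, so max() is the newest.
    return max(available_ids)


# Illustrative IDs only.
example_ids = ["20240801101500000000", "20240803135302155971"]
latest = resolve_session_id(None, example_ids)
chosen = resolve_session_id("20240801101500000000", example_ids)
print(latest, chosen)
```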

## 5. Preprocess

### 5.1 Implementation Details

We will use the [`preprocess`](../src/lstm_adversarial_attack/preprocess/__init__.py) sub-package to transform information from the `.csv` files output by the `.sql` queries into numpy arrays (which can then be easily converted into PyTorch tensors). Running this sub-package instantiates a `Preprocessor` object with a `.preprocess_modules` attribute assigned by the following code in  [`preprocessor.py`](../src/lstm_adversarial_attack/preprocess/preprocessor.py):

```
self.preprocess_modules = [
            prf.Prefilter(),
            imc.ICUStayMeasurementCombiner(),
            slb.FullAdmissionListBuilder(),
            fb.FeatureBuilder(),
            ff.FeatureFinalizer(),
        ]
```
Each element of the `.preprocess_modules` attribute is an instance of a subclass of [`PreprocessModule`](../src/lstm_adversarial_attack/preprocess/preprocess_module.py).

* [`Prefilter`](../src/lstm_adversarial_attack/preprocess/prefilter.py) reads the database query outputs into Pandas Dataframes, removes all data related to patients younger than 18 years in age, ensures consistent column naming formats, and takes care of datatype details.
* [`ICUStayMeasurementCombiner`](../src/lstm_adversarial_attack/preprocess/icustay_measurement_combiner.py) performs various joins (aka "merges" in the language of Pandas) to combine lab and vital sign measurement data with ICU stay data.
* [`FullAdmissionListBuilder`](../src/lstm_adversarial_attack/preprocess/sample_list_builder.py) generates a list consisting of one FullAdmissionData object per ICU stay. The attributes of a FullAdmissionData object include ICU stay info, and a dataframe containing the measurement and timestamp data for all vital sign and lab data associated with the ICU stay.
* [`FeatureBuilder`](../src/lstm_adversarial_attack/preprocess/feature_builder.py) resamples the time series dataframe to one-hour intervals, imputes missing data, winsorizes measurement values (with cutoffs at the 5th and 95th global percentiles), and normalizes the measurement values so all data are between 0 and 1.
* [`FeatureFinalizer`](../src/lstm_adversarial_attack/preprocess/feature_finalizer.py) selects the data observation time window (default starts at hospital admission time and ends 48 hours after admission). This module outputs the entire dataset features as a list of numpy arrays, and the mortality labels as a list of integers. These data structures (saved as .pickle files) will be convenient starting points when the `tune_train` and `attack` sub-packages need to create PyTorch Datasets.
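
The winsorize-then-normalize step performed by `FeatureBuilder` can be illustrated on a single measurement column. The sketch below is dependency-free pure Python for clarity; the project itself relies on Pandas and NumPy, and the function names, the linear-interpolation percentile, and the sample values here are assumptions made for illustration.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile using linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def winsorize_and_scale(
    values: Sequence[float], low_pct: float = 5.0, high_pct: float = 95.0
) -> List[float]:
    """Clip values to the [low_pct, high_pct] percentiles, then map to [0, 1]."""
    low = percentile(values, low_pct)
    high = percentile(values, high_pct)
    clipped = [min(max(v, low), high) for v in values]
    span = high - low
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in clipped]
    return [(v - low) / span for v in clipped]


# Made-up heart rate readings with one low and one high outlier.
heart_rates = [20.0, 60.0, 70.0, 80.0, 90.0, 100.0, 250.0]
scaled = winsorize_and_scale(heart_rates)
print([round(v, 3) for v in scaled])
```

Clipping before scaling keeps a few extreme readings from compressing the rest of the column into a narrow band near 0.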

### 5.2 Set Preprocess Config Values

In [None]:
nh.set_config_value("preprocess", {'min_age': 18,
 'min_los_hospital': 1,
 'min_los_icu': 1,
 'bg_data_cols': ['potassium', 'calcium', 'ph', 'pco2', 'lactate'],
 'lab_data_cols': ['albumin',
  'bun',
  'creatinine',
  'sodium',
  'bicarbonate',
  'platelet',
  'glucose',
  'magnesium'],
 'vital_data_cols': ['heartrate',
  'sysbp',
  'diasbp',
  'tempc',
  'resprate',
  'spo2'],
 'winsorize_low': '5%',
 'winsorize_high': '95%',
 'resample_interpolation_method': 'linear',
 'resample_limit_direction': 'both',
 'min_observation_hours': 48,
 'observation_window_hours': 48,
 'observation_window_start': 'intime'})

### 5.3 Run the Preprocess Modules

We can run all preprocessing modules by executing the `preprocess` sub-package's `__main__` module.

In [None]:
!python -m preprocess --help

In [None]:
!python -m preprocess -d {session_ids.db_queries}

In [None]:
# TODO assign value if needed
custom_preprocess_session_id = None

In [None]:
session_ids.set("preprocess", custom_preprocess_session_id)

In [None]:
preprocess_session_id = get_session_id(custom_preprocess_session_id, "preprocess.output_root")
preprocess_session_id

### 5.4 Summarize Feature Finalizer Output
We can get information about the array shapes and value distributions of the preprocessed data using the `preprocess` sub-package's `inspect_feature_finalizer_output` module.

In [None]:
!python preprocess/inspect_feature_finalizer_output.py --help

In [None]:
!python preprocess/inspect_feature_finalizer_output.py -p {session_ids.preprocess}

Each sample in the FeatureFinalizer output is from a unique ICU stay, and consists of a 2D matrix of input features and a binary class label. Each column in a feature matrix corresponds to a particular lab or vital sign measurement, and each row in a feature matrix corresponds to the number of hours elapsed after a patient's hospital admission time. A class label of 1 indicates an in-hospital mortality event.

When preprocessor parameters in `config.toml` are set to default values, the FeatureFinalizer output consists of 37832 samples, the shape of every input feature array is 48 x 19, and approximately 11% of the preprocessed samples have class label = 1. Later, when we tune and train our predictive model, we will use oversampling techniques to deal with the significant class imbalance.

### 5.5 Preprocessing Time

On an Intel i7-13700K CPU, the above preprocessing work takes approximately 3.9 minutes. Achieving the same transformations on the same machine with preprocessing code from [[1](#References)] takes approximately 45 minutes. This time difference is largely due to the fact that the current project's `preprocess` sub-package avoids unnecessary loops and relies heavily on vectorized Pandas and NumPy operations.

Additional time reduction could be achieved by parallelizing the preprocess computations with tools such as [pandarallel](https://github.com/nalepae/pandarallel) or [pyspark](https://spark.apache.org/docs/3.3.1/api/python/index.html).

## 6. Model Architecture

The starting point for our predictive model is based on the model in [1] and consists of the following layers:

| Layer # | Description        | Input Shape                            | Parameters          | Output Shape           | Activation       |
| ------- | ------------------ | -------------------------------------- | ------------------- | ---------------------- | ---------------- |
| 1       | Bidirectional LSTM | (b, t<sub>max</sub> = 48, n<sub>meas</sub> = 19) | n<sub>LSTM</sub>    | (b, 2n<sub>LSTM</sub>) | a<sub>LSTM</sub> |
| 2       | Dropout            | (b, 2n<sub>LSTM</sub>)                 | P<sub>dropout</sub> | (b, 2n<sub>LSTM</sub>) | -                |
| 3       | Fully Connected    | (b, 2n<sub>LSTM</sub>)                 | n<sub>FC</sub>      | (b, n<sub>FC</sub>)    | a<sub>FC</sub>   |
| 4       | Output             | (b, n<sub>FC</sub>)                    | n<sub>out</sub> = 2 | (b, n<sub>out</sub>)   | a<sub>out</sub>  |


The parameters from the above table are defined as:

| Parameter           | Description                                             |
| ------------------- | ------------------------------------------------------- |
| b                   | Batch size                                              |
| t<sub>max</sub>     | Maximum input sequence length                           |
| n<sub>meas</sub>    | Number of patient measurement types                     |
| n<sub>LSTM</sub>    | Number of features in a LSTM hidden state               |
| a<sub>LSTM</sub>    | Activation function for the LSTM output                 |
| P<sub>dropout</sub> | Dropout probability                                     |
| n<sub>FC</sub>      | Number of nodes in the fully connected layer            |
| a<sub>FC</sub>      | Activation function for the fully connected layer output |
| n<sub>out</sub>     | Number of nodes in the output layer                     |
| a<sub>out</sub>     | Activation function for the output layer                |


Note that n<sub>meas</sub>, n<sub>out</sub>, and t<sub>max</sub> are fixed. We have chosen to always use all 19 patient measurement types, and our classification problem always has two classes. In our current data pipeline, data collected outside of a specified time window are removed during the final preprocessing phase. If we want the observation window to be tunable, it would be helpful to move the `preprocess.feature_finalizer` module into the `tune_attack` sub-package.
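
To make the layer table concrete, the sketch below traces input and output shapes and counts trainable parameters for one illustrative configuration (n<sub>LSTM</sub> = 128, n<sub>FC</sub> = 16, b = 64). It assumes a single-layer bidirectional LSTM with PyTorch's parameter layout (two bias vectors per gate); the helper names are hypothetical and the counts are not taken from the project's code.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def lstm_param_count(n_in: int, n_hidden: int, bidirectional: bool = True) -> int:
    """Trainable parameters in a single-layer LSTM (PyTorch layout)."""
    # Four gates, each with input weights, recurrent weights, and two biases.
    per_direction = 4 * n_hidden * (n_in + n_hidden + 2)
    return per_direction * (2 if bidirectional else 1)


def linear_param_count(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_summary(
    b: int, t_max: int, n_meas: int, n_lstm: int, n_fc: int, n_out: int
) -> List[Tuple[str, Shape, Shape, int]]:
    """Return (name, input shape, output shape, parameter count) per layer."""
    return [
        ("Bidirectional LSTM", (b, t_max, n_meas), (b, 2 * n_lstm),
         lstm_param_count(n_meas, n_lstm)),
        ("Dropout", (b, 2 * n_lstm), (b, 2 * n_lstm), 0),
        ("Fully Connected", (b, 2 * n_lstm), (b, n_fc),
         linear_param_count(2 * n_lstm, n_fc)),
        ("Output", (b, n_fc), (b, n_out), linear_param_count(n_fc, n_out)),
    ]


summary = layer_summary(b=64, t_max=48, n_meas=19, n_lstm=128, n_fc=16, n_out=2)
for name, in_shape, out_shape, n_params in summary:
    print(f"{name:<20} {in_shape} -> {out_shape}  params={n_params}")
total_params = sum(row[3] for row in summary)
print(f"Total parameters: {total_params}")  # 156722 for these sizes
```

Nearly all of the parameters sit in the LSTM layer, which is one reason n<sub>LSTM</sub> is a natural hyperparameter to tune.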

## 7. Model Hyperparameter Tuning

### 7.1 Architectural hyperparameters

The following table lists the ranges of architectural parameters to be explored during hyperparameter tuning.

| Parameter           | Tuning Type  | Values                            |
| ------------------- | ------------ | --------------------------------- |
| b                   | Discrete     | 2<sup>k</sup>, k = 5, 6, 7, 8     |
| n<sub>LSTM</sub>    | Discrete     | 2<sup>k</sup>, k = 5, 6, 7        |
| a<sub>LSTM</sub>    | Discrete     | ReLU, Tanh                        |
| P<sub>dropout</sub> | Continuous   | 0.0 to 0.5                        |
| n<sub>FC</sub>      | Discrete     | 2<sup>k</sup>, k = 4, 5, 6, 7, 8  |
| a<sub>FC</sub>      | Discrete     | ReLU, Tanh                        |
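
The power-of-two entries in this table correspond to the `log_*` bounds stored in `[model.tuner_driver.tuning_ranges]`. As a small illustration (the helper name `powers_of_two` is hypothetical), the bounds expand into candidate values like this:

```python
from typing import Dict, List, Sequence


def powers_of_two(log_bounds: Sequence[int]) -> List[int]:
    """Expand inclusive [low, high] exponent bounds into the values 2**k."""
    low, high = log_bounds
    return [2 ** k for k in range(low, high + 1)]


# Exponent bounds as they appear in the tuning_ranges config table.
log_ranges: Dict[str, List[int]] = {
    "log_batch_size": [5, 8],
    "log_lstm_hidden_size": [5, 7],
    "log_fc_hidden_size": [4, 8],
}

# Strip the "log_" prefix to name the expanded parameter.
candidates = {
    name[len("log_"):]: powers_of_two(bounds) for name, bounds in log_ranges.items()
}
for name, values in candidates.items():
    print(f"{name}: {values}")
# batch_size: [32, 64, 128, 256]
# lstm_hidden_size: [32, 64, 128]
# fc_hidden_size: [16, 32, 64, 128, 256]
```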


### 7.2 Trainer hyperparameters




During hyperparameter tuning, we also explore different training optimization algorithms and learning rates.

| Parameter     | Tuning Type | Values             |
| ------------- | ----------- | ------------------ |
| Optimizer     | Discrete    | SGD, RMSprop, Adam |
| Learning Rate | Continuous  | 1e-5 - 1e-1        |

When using the Adam optimizer, we always use the PyTorch default values of $\beta_1 = 0.9, \beta_2 = 0.999, \epsilon = 10^{-8}$. 

### 7.3 Implementation Details
The [`HyperParameterTuner`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) class in the [`model`](../src/lstm_adversarial_attack/model/__init__.py) sub-package implements a cross-validation tuning scheme that utilizes the [Optuna](https://optuna.org/) framework. The boundaries of hyperparameter space to explore during tuning are set in the `[model.tuner_driver.tuning_ranges]` section of the project `config.toml` file.

Other model hyperparameter tuning settings are also configured under `[model.tuner_driver]`. In the standard configuration, a scikit-learn [`StratifiedKFold`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) generator is used to assign samples to each fold. When selecting samples for each training batch, we use a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) with a [`WeightedRandomSampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler) to oversample from the minority class (label = 1). For a given set of hyperparameters, the [`HyperParameterTuner.objective_fn`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) method returns the mean validation loss across the K folds, and this mean loss is used as a minimization target by an Optuna [`TPESampler`](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html) to select new sets of hyperparameters for additional trials. [`HyperParameterTuner`](../src/lstm_adversarial_attack/tune_train/hyperparameter_tuner.py) also uses an Optuna [`MedianPruner`](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.MedianPruner.html) to stop unpromising trials early.
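
The effect of weighted sampling on class balance can be illustrated without PyTorch. The sketch below assigns each sample a weight equal to the inverse of its class frequency and draws indices with replacement, which is the behavior a weighted sampler with replacement provides. Whether the project computes its weights exactly this way is an assumption, and the 89/11 label split is a toy stand-in for the roughly 11% positive rate.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> List[float]:
    """Weight each sample by 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy dataset: 89 negative samples and 11 positive samples.
labels = [0] * 89 + [1] * 11
weights = inverse_frequency_weights(labels)

# Draw indices with replacement, proportionally to the weights.
rng = random.Random(0)
num_draws = 10_000
drawn = rng.choices(range(len(labels)), weights=weights, k=num_draws)
drawn_label_counts = Counter(labels[i] for i in drawn)
fraction_positive = drawn_label_counts[1] / num_draws
print(f"Positive fraction in raw data: {sum(labels) / len(labels):.2f}")
print(f"Positive fraction in sampled batches: {fraction_positive:.2f}")
```

Because each class receives the same total weight, sampled batches are close to balanced even though the underlying data are not.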

### 7.4 Model Tuning Configuration Settings
Set `[model.tuner_driver]` configuration values:

In [None]:
nh.set_config_value("model.tuner_driver", {'num_trials': 60,
 'num_folds': 5,
 'num_cv_epochs': 10,
 'epochs_per_fold': 5,
 'kfold_random_seed': 1234,
 'performance_metric': 'validation_loss',
 'optimization_direction_label': 'minimize',
 'tuning_output_dir': 'data/model/tuning',
 'pruner_name': 'MedianPruner',
 'sampler_name': 'TPESampler',
 'db_env_var_name': 'MODEL_TUNING_DB_NAME',
 'fold_class_name': 'StratifiedKFold',
 'collate_fn_name': 'x19m_collate_fn',
 'cv_mean_tensorboard_metrics': ['accuracy',
  'auc',
  'f1',
  'precision',
  'recall',
  'validation_loss'],
 'tuning_ranges': {'log_lstm_hidden_size': [5, 7],
  'lstm_act_options': ['ReLU', 'Tanh'],
  'dropout': [0.0, 0.5],
  'log_fc_hidden_size': [4, 8],
  'fc_act_options': ['ReLU', 'Tanh'],
  'optimizer_options': ['Adam', 'RMSprop', 'SGD'],
  'learning_rate': [1e-05, 0.1],
  'log_batch_size': [5, 8]},
 'pruner_kwargs': {'n_startup_trials': 5, 'n_warmup_steps': 3},
 'sampler_kwargs': {}})

The `[model.tuner_driver]` section of `config.toml` includes parameters that determine the number of tuning trials, cross-validation folds, and epochs. With the values set above, we run an Optuna study with 60 trials. Each trial uses 5-fold cross-validation, and we run `num_cv_epochs * epochs_per_fold = 10 * 5 = 50` total epochs on each fold. (NOTE: Consider changing the name of `epochs_per_fold` to something less confusing.)
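
The same arithmetic gives a rough upper bound on the compute budget of a study. The totals below are upper bounds only, since trials stopped by the pruner run fewer epochs; the variable names are illustrative.

```python
# Values from the [model.tuner_driver] settings above.
tuner_driver = {
    "num_trials": 60,
    "num_folds": 5,
    "num_cv_epochs": 10,
    "epochs_per_fold": 5,
}

# Total epochs run on each fold within a single trial.
epochs_on_each_fold = tuner_driver["num_cv_epochs"] * tuner_driver["epochs_per_fold"]

# Fold-epochs in one unpruned trial, and across an entire unpruned study.
fold_epochs_per_trial = tuner_driver["num_folds"] * epochs_on_each_fold
max_fold_epochs_in_study = tuner_driver["num_trials"] * fold_epochs_per_trial

print(epochs_on_each_fold)       # 50
print(fold_epochs_per_trial)     # 250
print(max_fold_epochs_in_study)  # 15000
```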

### 7.5 Start a New Hyperparameter Tuning Study
Before starting, a few things to note:
* Depending on your GPU compute power, running the full 60 trials could take 2 - 20 hours.
* Results will be saved to a newly created directory (with a timestamp-based name) under `data/model/tuning/<tuning_session_id>`. 
* If the study is stopped early (via CTRL-C or the Jupyter Stop button), results from whatever trials have completed up to that point will be saved.
* While the tuning trials are running, read ahead to the notebook section with instructions on how to monitor progress in Tensorboard.

We can start a new hyperparameter tuning session by running the `tune_new` module in the `model` sub-package.

In [None]:
!python model/tune_new.py --help

Since terminal output during tuning can be very long, we will use the  `-r` option to redirect output to a log file and keep our notebook tidy.

In [None]:
!python model/tune_new.py -p {session_ids.preprocess} -r

### 7.6 Resume an Existing Hyperparameter Tuning Study
We can run additional trials for an existing study using the `model` sub-package's `tune_resume` module.

In [None]:
!python model/tune_resume.py --help

If we want to continue a study, we can un-comment each of the next two cells, and assign a value to `model_tuning_id_for_continuation`

In [None]:
# model_tuning_id_for_continuation = 

In [None]:
# !python model/tune_resume.py -t {model_tuning_id_for_continuation} -r

### 7.7 Monitor Tuning Progress with Tensorboard

While we are tuning hyperparameters, we can monitor results in Tensorboard. Use Jupyter Lab to open a new terminal, and run:

```
tensorboard --logdir=/home/devspace/project/data/model/tuning/<tuning-session-id>/tensorboard --host=0.0.0.0
```

Then, in your browser, go to: `http://localhost:6006/`. You should see something like the screenshot below. The x-axis for all plots is epoch number. (Unfortunately, there is no good way to add axis labels in Tensorboard.) Note: `<tuning-session-id>` is included in the output when running the `tune_new` and/or `tune_resume` modules.

Here is an example screen-shot of plots displayed in Tensorboard.

![tensorboard_image](images/tensorboard_model_tuning_50_epochs.png)

### 7.8 Set ID of Tuning Session to Use for Training, and View Session's Best Hyperparameters
When we are done tuning, we set the ID of the model tuning session to use as input to model training.

In [None]:
# TODO assign value if not using most recently created tuning session
custom_model_tuning_id_for_training = None

In [None]:
session_ids.set("model_tuning", custom_model_tuning_id_for_training)

Then we view the best set of hyperparameters from this tuning session using the `view_best_model_hyperparameters` module of the `model` sub-package.

In [None]:
!python model/view_best_model_hyperparameters.py --help

In [None]:
!python model/view_best_model_hyperparameters.py -t {session_ids.model_tuning}

## 8. Model Training
For model hyperparameter tuning described in the previous section, we typically run ~50 epochs per fold (in the interest of reducing compute requirements). Based on the validation loss, AUC, and F1 curves from tuning trials, it appears that predictive performance could be improved by training for a larger number of epochs. We now run another round of Stratified K-fold cross-validation with our best set of parameters with a larger number of epochs.

### 8.1 Notes on our Method
* We are using "flat" cross-validation (as was done in previous studies on this dataset). This method is computationally less expensive than nested cross-validation, but it has the potential to overestimate model performance. In many cases the magnitude of overestimation is small. We also mitigate this effect by using a different set of (randomly generated) fold assignments than was used for hyperparameter tuning. 
* By selecting our hyperparameters based on the smaller number of epochs (50), we favor models that are faster to train. It is possible that using a larger number of epochs in the tuning runs would have yielded a different (and better) set of "best" hyperparameters, but would also be computationally more expensive.
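
The idea of using different fold assignments for tuning and for training can be sketched with a simplified stratified splitter. This is a pure-Python stand-in written for illustration, not scikit-learn's `StratifiedKFold`; the function name and toy labels are assumptions, while the two seeds are the `kfold_random_seed` values used for tuning and for training in this notebook.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_fold_assignments(
    labels: Sequence[int], num_folds: int, seed: int
) -> List[int]:
    """Assign each sample a fold index while keeping class ratios equal per fold."""
    rng = random.Random(seed)
    indices_by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    folds = [-1] * len(labels)
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Deal shuffled samples of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % num_folds
    return folds


# Toy labels: 10 positive and 90 negative samples.
labels = [1] * 10 + [0] * 90
tuning_folds = stratified_fold_assignments(labels, num_folds=5, seed=1234)
training_folds = stratified_fold_assignments(labels, num_folds=5, seed=20240807)

moved = sum(a != b for a, b in zip(tuning_folds, training_folds))
print(f"Samples assigned to a different fold: {moved} of {len(labels)}")
```

Each fold keeps the same class ratio under both seeds, but the membership of the folds differs, so the training-stage validation sets are not the ones the hyperparameters were selected on.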

### 8.2 Cross-Validation Training Settings
Settings used during model training are specified in the `[model.cv_driver_settings]` section of the `config.toml` file.

In [None]:
nh.set_config_value("model.cv_driver_settings", {'collate_fn_name': 'x19m_collate_fn',
 'epochs_per_fold': 1000,
 'eval_interval': 10,
 'fold_class_name': 'StratifiedKFold',
 'kfold_random_seed': 20240807,
 'num_folds': 5,
 'single_fold_eval_fraction': 0.2})

### 8.3 Run Cross-Validation Training
We can begin a training session by running the `train` module in the `model` sub-package.

In [None]:
!python model/train.py --help

In [None]:
!python model/train.py -t {session_ids.model_tuning} -r

### 8.4 Monitor Cross-Validation Progress in Tensorboard

To view training curves in tensorboard, use Jupyter Lab to open a new terminal, and run:
```
tensorboard --logdir /home/devspace/project/data/model/tuning/<cross-validation-training-session-id>/tensorboard --host=0.0.0.0
```

Then, go to http://localhost:6006 in your browser.

This Tensorboard screenshot was taken at the end of a 5-fold, 1000 epoch per fold cross-validation run.
![tensorboard_image](images/tensorboard_model_training_1000_epochs.png)

### 8.5 Model Training Behavior: Continued Improvement at High Epoch Counts

The above AUC and validation loss curves show continued (though diminishing) improvement in predictive performance during the entire 1000 epochs. The fact that we do not observe any sign of overfitting at such a large number of epochs is somewhat unusual. A likely cause of this behavior is the `WeightedRandomSampler` used in our training `DataLoaders`. Samples with our minority class label (`mortality = 1`) make up only a small fraction of the total dataset (see Section 5.4). To deal with this imbalanced dataset, we oversample from the minority class and undersample from the majority class when creating batches of samples for training. In our current implementation, some samples from the majority class go unseen by the `StandardModelTrainer` for a large number of epochs. The number of unseen samples slowly dwindles (and the amount of information available for training slowly increases), even at very high epoch counts.

### 8.6 Summarize Model Training Results
We can run the `model` sub-package's `view_model_training_summary` module to summarize each fold's best-performing checkpoint as well as the means and standard deviations of performance metrics across all folds.

In [None]:
# TODO assign value if not using most recently created cross-validation training session
custom_cv_training_id_for_summary = 20240803135302155971

In [None]:
cv_training_id_for_summary = get_session_id(custom_cv_training_id_for_summary, "model.cv_driver.output_dir")
cv_training_id_for_summary

In [None]:
!python model/view_model_training_summary.py --help

In [None]:
!python model/view_model_training_summary.py -t {cv_training_id_for_summary}

### 8.7 Comparing Training Results with Prior Studies' Predictive Models

The table below compares the predictive performance of the LSTM model in this work with other LSTM-based models using the same dataset. The current model shows the best predictive performance among all models in the table based on AUC and F1 scores. 


| # | Authors     | Model                   | Input Features                                 | AUC             | F1              | Precision       | Recall          |
|---|-------------|-------------------------|------------------------------------------------|-----------------|-----------------|-----------------|-----------------|
| 1 | Sun et al.  | LSTM-128 + FC-32 + FC-2 | [13 labs, 6 vitals] x 48 hr                    | 0.9094 (0.0053) | 0.5429 (0.0194) | 0.4100 (0.0272) | 0.8071 (0.0269) |
| 2 | Tang et al. | LSTM-256 + FC-2         | [13 labs, 6 vitals] x 48 hr + demographic data | 0.949 (0.003)   | 0.623 (0.012)   | -               | -               |
| 3 | Tang et al. | CNN + LSTM-256 + FC-2   | [13 labs, 6 vitals] x 48 hr + demographic data | 0.940 (0.0071)  | 0.633 (0.031)   | -               | -               |
| 4 | Tang et al. | CNN + LSTM-256 + FC-2   | [13 labs, 6 vitals] x 48 hr                    | 0.933 (0.006)   | 0.587 (0.025)   | -               | -               |
| 5 | Tang et al. | LSTM-256 + FC-2         | [13 labs, 6 vitals] x 48 hr                    | 0.907 (0.006)   | 0.526 (0.013)   | -               | -               |
| 6 | This work   | LSTM-128 + FC-16 + FC-2 | [13 labs, 6 vitals] x 48 hr                    | 0.9657 (0.0035) | 0.9669 (0.0038) | 0.9888 (0.0009) | 0.9459 (0.0072) |

> **Notes** LSTM-X indicates an LSTM with a hidden state size of X. FC-X indicates a fully connected layer with an output size of X. All LSTMs are bidirectional. The demographic data used in studies #2 and #3 were obtained from MIMIC-III.
 

## 9. Attack Hyperparameter Tuning

Before running an attack on the entire dataset, we tune attack hyperparameters with help from `optuna`. Our approach here is not as rigorous as the one we used for predictive model tuning. We use only a fraction of the total dataset for tuning, and do not perform cross-validation.

### 9.1 Set ID of Training Session to Use as Input for Attack Tuning


In [None]:
custom_cv_training_id_for_attack_tuning = None

In [None]:
if custom_cv_training_id_for_attack_tuning is None:
    cv_training_id_for_attack_tuning = cv_training_id_for_summary
else:
    cv_training_id_for_attack_tuning = custom_cv_training_id_for_attack_tuning
    
cv_training_id_for_attack_tuning

### 9.2 Attack Hyperparameter Tuning Config Settings

Settings that affect attack hyperparameter tuning are under `[attack.tuning.ranges]` and `[attack.tuner_driver_settings]` in `config.toml`. The next cell sets them to the values used in this notebook:

In [None]:
CONFIG_MODIFIER.set("attack.tuning.ranges", {'kappa': [0.0, 2.0],
 'lambda_1': [1e-07, 1.0],
 'learning_rate': [1e-05, 1.0],
 'log_batch_size': [5, 7],
 'optimizer_options': ['Adam', 'RMSprop', 'SGD']})

CONFIG_MODIFIER.set("attack.tuner_driver_settings", {'db_env_var_name': 'ATTACK_TUNING_DB_NAME',
 'num_trials': 75,
 'epochs_per_batch': 1000,
 'max_num_samples': 1028,
 'sample_selection_seed': 2023,
 'pruner_name': 'MedianPruner',
 'sampler_name': 'TPESampler',
 'objective_name': 'sparse_small',
 'max_perts': 0,
 'attack_misclassified_samples': False,
 'objective_extra_kwargs': {},
 'pruner_kwargs': {},
 'sampler_kwargs': {}})

The `max_num_samples` parameter specifies the number of samples to be considered for attack. However, samples that are misclassified by the target model are not attacked, so the actual number of samples used for tuning will be slightly lower.
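
The effect of `max_num_samples` can be illustrated with a small, self-contained sketch. It assumes that candidate samples are drawn first (reproducibly, using a seed such as `sample_selection_seed`) and that misclassified samples are dropped afterwards. The function `select_attack_candidates` and the toy labels are hypothetical and only mirror the behavior described above; they are not the project's implementation.

```python
import random


def select_attack_candidates(predicted, actual, max_num_samples, seed):
    """Draw up to max_num_samples sample indices, then drop misclassified ones."""
    rng = random.Random(seed)
    num_selected = min(max_num_samples, len(actual))
    selected = sorted(rng.sample(range(len(actual)), num_selected))
    # Only samples that the target model classifies correctly are attacked.
    return [i for i in selected if predicted[i] == actual[i]]


# Toy data: 10 samples, two of which (indices 3 and 7) are misclassified.
actual = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 0, 0, 1, 1, 1, 0]

candidates = select_attack_candidates(
    predicted, actual, max_num_samples=10, seed=2023
)
print(f"{len(candidates)} of {len(actual)} considered samples will be attacked")
```

Here all 10 samples are considered, but only the 8 correctly classified ones remain as attack candidates.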

### 9.3 Adversarial Example Quality Scores

Each trial in an attack tuning study runs `epochs_per_batch` attack iterations on each of the selected samples. Then, an adversarial example quality score is calculated for the lowest-loss adversarial example of each sample that has at least one associated adversarial example. A trial's score is the sum of these example quality scores. The `objective_name` parameter specifies the objective function used to calculate the quality scores.
 
The following table summarizes how each of the available objective functions calculates the quality score of a single adversarial perturbation matrix $P_{adv}$.

| Objective             | Example Quality Score Formula                          |
| --------------------- | ------------------------------------------------------ |
| sparsity              | $1 - f_{nonzero}$                                      |
| max_num_nonzero_perts | $1$ if $n_{nonzero} < n_{critical}$, otherwise $0$     |
| sparse_small          | $sparsity\;/\;\lVert P_{adv} \rVert_1$                 |
| sparse_small_max      | $sparsity\;/\;\max(\lvert P_{adv} \rvert)$             |

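where $n_{nonzero}$ is the number of non-zero elements in $P_{adv}$, $f_{nonzero}$ is the fraction of non-zero elements, $n_{critical}$ is a threshold on the number of non-zero elements, $sparsity$ is the score defined in the first row, $\|P_{adv}\|_1$ is the L1 norm, and $|P_{adv}|$ is the element-wise absolute value.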
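
The formulas in the table can be sketched in plain Python as follows. This is an illustration rather than the project's implementation: the function names simply mirror the objective names, the perturbation is represented as a nested list, and the L1 norm is assumed to be the entry-wise sum of absolute values. A perturbation that is entirely zero would cause a division by zero in the last two scores, but such a perturbation cannot change the model's prediction and so would not count as adversarial.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def _flatten(p_adv: Matrix) -> List[float]:
    return [value for row in p_adv for value in row]


def sparsity(p_adv: Matrix) -> float:
    """1 minus the fraction of non-zero elements."""
    values = _flatten(p_adv)
    num_nonzero = sum(1 for v in values if v != 0)
    return 1 - num_nonzero / len(values)


def max_num_nonzero_perts(p_adv: Matrix, n_critical: int) -> float:
    """1 if fewer than n_critical elements are non-zero, otherwise 0."""
    num_nonzero = sum(1 for v in _flatten(p_adv) if v != 0)
    return 1.0 if num_nonzero < n_critical else 0.0


def sparse_small(p_adv: Matrix) -> float:
    """Sparsity divided by the (entry-wise) L1 norm."""
    l1_norm = sum(abs(v) for v in _flatten(p_adv))
    return sparsity(p_adv) / l1_norm


def sparse_small_max(p_adv: Matrix) -> float:
    """Sparsity divided by the largest absolute element."""
    max_abs = max(abs(v) for v in _flatten(p_adv))
    return sparsity(p_adv) / max_abs


def trial_score(
    perturbations: Sequence[Matrix], objective: Callable[[Matrix], float]
) -> float:
    """Sum of example quality scores over the best perturbation of each sample."""
    return sum(objective(p_adv) for p_adv in perturbations)


# Toy perturbation: 8 elements, 2 of them non-zero.
p_example = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, -0.25, 0.0],
]
print(sparsity(p_example))          # 0.75
print(sparse_small(p_example))      # 1.0
print(sparse_small_max(p_example))  # 1.5
```

Both `sparse_small` and `sparse_small_max` reward perturbations with few non-zero elements. The first penalizes the total perturbation magnitude, while the second penalizes only the largest single perturbation.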

### 9.4 Tune Attack with `sparse_small` Objective

Although `attack.tuner_driver_settings.objective_name` should already be set to `sparse_small`, let's set it again to be sure:


In [None]:
CONFIG_MODIFIER.set("attack.tuner_driver_settings.objective_name", "sparse_small")

We can start a new attack hyperparameter tuning session with the `attack` sub-package's `tune_attack_new` module. If a tuning session is stopped early (via CTRL-C or the notebook Stop button), data from completed trials will be saved.

In [None]:
!python attack/tune_attack_new.py --help

In [None]:
!python attack/tune_attack_new.py -t {cv_training_id_for_attack_tuning} -r

### 9.5 Tune Attack with `sparse_small_max` Objective

We change the objective name in our `config.toml` using:

In [None]:
CONFIG_MODIFIER.set("attack.tuner_driver_settings.objective_name", "sparse_small_max")

In [None]:
!python attack/tune_attack_new.py -t {cv_training_id_for_attack_tuning} -r

### 9.6 Resuming an Existing Attack Tuning Session
The `tune_attack_resume` module can be used to run more trials as part of an existing attack tuning study. If we are resuming the most recently created attack tuning study, we can set `custom_attack_tuning_id_to_resume` in the next cell to `None`. Otherwise, set it to the ID of the session we want to resume.

In [None]:
# TODO: set this to ID of session to resume (OK to leave as None if resuming most recently created study)
custom_attack_tuning_id_to_resume = None

In [None]:
attack_tuning_id_to_resume = get_session_id(custom_attack_tuning_id_to_resume, "attack.tuner_driver.output_dir")
attack_tuning_id_to_resume

In [None]:
!python attack/tune_attack_resume.py --help

In [None]:
!python attack/tune_attack_resume.py -t {attack_tuning_id_to_resume} -r

## 10. Attacking the Full Dataset with Tuned Attack Hyperparameters

In [None]:
CONFIG_READER.get_value("attack.driver_settings")

In [None]:
CONFIG_MODIFIER.set("attack.driver_settings", {'epochs_per_batch': 1000,
 'max_num_samples': 40000,
 'sample_selection_seed': 2023,
 'attack_misclassified_samples': False})

### 10.1 Attack Using Best Hyperparameters from `sparse_small_max` Tuning

In [None]:
!python attack/attack.py -t {sparse_small_max_attack_tuning_id} -r

### 10.2 Attack Using Best Hyperparameters from `sparse_small` Tuning

In [None]:
!python attack/attack.py -t {sparse_small_attack_tuning_id} -r

## References

<a id="ref_01">1.</a> [Sun, M., Tang, F., Yi, J., Wang, F. and Zhou, J., 2018, July. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. In *Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining* (pp. 793-801).](https://dl.acm.org/doi/10.1145/3219819.3219909)

<a id="ref_02">2.</a> [Tang, F., Xiao, C., Wang, F. and Zhou, J., 2018. Predictive modeling in urgent care: a comparative study of machine learning approaches. *Jamia Open*, *1*(1), pp.87-98.](https://academic.oup.com/jamiaopen/article/1/1/87/5032901)

<a id="ref_03">3.</a> [Johnson, A., Pollard, T., and Mark, R. (2016) 'MIMIC-III Clinical Database' (version 1.4), *PhysioNet*.](https://doi.org/10.13026/C2XW26)

<a id="ref_04">4.</a> [Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.](https://www.nature.com/articles/sdata201635)
