# LSTM Time Series Deep Learning and Adversarial Attacks
## Predictive models of patient outcomes based on ICU data

This project aims to reproduce results originally published in:
Sun, M., Tang, F., Yi, J., Wang, F. and Zhou, J., 2018, July. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 793-801). (https://arxiv.org/abs/1802.04822)

The original paper trained a Long Short-Term Memory (LSTM) model on time series of Intensive Care Unit (ICU) patient vital-sign and lab measurements, with in-hospital mortality as the prediction target. An adversarial attack algorithm was then used to find small perturbations that, when applied to a real, correctly classified input sample, cause the model to misclassify it. Finally, susceptibility calculations quantified attack vulnerability as a function of time and of measurement type within the input feature space.
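To make the idea of an adversarial perturbation concrete, here is a minimal sketch under strong simplifying assumptions. A toy logistic model stands in for the LSTM, the attack takes plain gradient steps on the logit (this is not the paper's attack algorithm), and all names, weights, and input values (`predict_proba`, `gradient_attack`, `weights`, `x`) are hypothetical. The input is a small (time step x measurement) matrix, and the sketch reports which entry ends up with the largest perturbation.

```python
import math


def predict_proba(x, weights, bias):
    """Positive-class probability of a toy logistic model on a (time x measurement) matrix."""
    logit = bias + sum(
        w * v
        for w_row, x_row in zip(weights, x)
        for w, v in zip(w_row, x_row)
    )
    return 1.0 / (1.0 + math.exp(-logit))


def predict_label(x, weights, bias):
    """Predicted class (0 or 1) using a 0.5 probability threshold."""
    return int(predict_proba(x, weights, bias) >= 0.5)


def gradient_attack(x, weights, bias, step=0.05, max_iters=200):
    """Take small gradient steps on the logit until the predicted label flips.

    For this toy model the gradient of the logit with respect to x[t][f]
    is simply weights[t][f], so each step moves every entry in proportion
    to its weight, in the direction that opposes the original prediction.
    """
    orig_label = predict_label(x, weights, bias)
    direction = -1.0 if orig_label == 1 else 1.0
    x_adv = [row[:] for row in x]  # copy so the original input is untouched
    for _ in range(max_iters):
        if predict_label(x_adv, weights, bias) != orig_label:
            break  # misclassification achieved
        for t, w_row in enumerate(weights):
            for f, w in enumerate(w_row):
                x_adv[t][f] += direction * step * w
    perturbation = [
        [adv - orig for adv, orig in zip(adv_row, row)]
        for adv_row, row in zip(x_adv, x)
    ]
    return x_adv, perturbation


# Hypothetical example: 3 time steps x 2 measurements.
weights = [[0.1, -0.2], [0.5, 0.0], [1.5, -0.3]]
bias = -0.5
x = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]]

original_label = predict_label(x, weights, bias)
x_adv, perturbation = gradient_attack(x, weights, bias)
adversarial_label = predict_label(x_adv, weights, bias)

# (time step, measurement) index of the entry with the largest absolute perturbation.
most_perturbed = max(
    ((t, f) for t in range(len(x)) for f in range(len(x[0]))),
    key=lambda idx: abs(perturbation[idx[0]][idx[1]]),
)
print(original_label, adversarial_label, most_perturbed)
```

In this toy setting the largest perturbation falls on the entry with the largest weight magnitude. Locating the (time, measurement) entries that attacks rely on most is the intuition behind the susceptibility analysis, although the paper's actual metrics are computed over attacks on many samples against the trained LSTM.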

In [1]:
import numpy as np
import pprint
import sys
import src_paths
import torch
sys.path.append(str(src_paths.lstm_adversarial_attack_pkg))
import lstm_adversarial_attack.config_paths as lcp
import lstm_adversarial_attack.config_settings as lcs
import lstm_adversarial_attack.query_db.mimiciii_database as mdb

In [2]:
pprint.pprint(lcs.DB_QUERIES)

[PosixPath('/home/devspace/project/src/mimiciii_queries/icustay_detail.sql'),
 PosixPath('/home/devspace/project/src/mimiciii_queries/pivoted_bg.sql'),
 PosixPath('/home/devspace/project/src/mimiciii_queries/pivoted_lab.sql'),
 PosixPath('/home/devspace/project/src/mimiciii_queries/pivoted_vital.sql')]


In [None]:
# Connect to the MIMIC-III database, run each SQL query listed in
# lcs.DB_QUERIES, then close the connection.
db_access = mdb.MimiciiiDatabaseAccess(
    dotenv_path=lcp.DB_DOTENV_PATH, output_dir=lcp.DB_OUTPUT_DIR
)
db_access.connect()
db_query_results = db_access.run_sql_queries(
    sql_query_paths=lcs.DB_QUERIES
)
db_access.close_connection()

In [3]:
import lstm_adversarial_attack.preprocess.preprocessor as pre

In [None]:
preprocessor = pre.Preprocessor()
pprint.pprint([item.__class__ for item in preprocessor.modules])

In [None]:
preprocessed_resources = preprocessor.preprocess()

In [63]:
import lstm_adversarial_attack.x19_mort_general_dataset as xmd
dataset = xmd.X19MGeneralDataset.from_feature_finalizer_output()

In [68]:
print(f"Number of samples in dataset = {len(dataset)}")
print(f"Type returned by dataset.__getitem__ = {type(dataset[0])}")
print(
    f"Length of each tuple returned by dataset.__getitem__ = {len(dataset[0])}"
)
print(
    "\nObject type, dimensionality, and datatype of each element in a tuple"
    " returned by dataset.__getitem__:"
)
print(tuple([(type(item), item.dim(), item.dtype) for item in dataset[0]]))
print(
    "The 'input size' (# columns) of each feature matrix is "
    f"{np.unique([item.shape[1] for item in dataset[:][0]]).item()}"
)
print(
    "The various sequence lengths (# rows) among the input feature matrices are\n"
    f"{np.unique([item.shape[0] for item in dataset[:][0]])}"
)



unique_sequence_lengths, sequence_length_counts = np.unique(
    [item.shape[0] for item in dataset[:][0]], return_counts=True
)
print(
    np.concatenate(
        (
            unique_sequence_lengths.reshape(-1, 1),
            sequence_length_counts.reshape(-1, 1),
        ),
        axis=1,
    )
)


unique_labels, label_counts = np.unique([dataset[:][1]], return_counts=True)
print(
    np.concatenate(
        (unique_labels.reshape(-1, 1), label_counts.reshape(-1, 1)), axis=1
    )
)


Number of samples in dataset = 41951
Type returned by dataset.__getitem__ = <class 'tuple'>
Length of each tuple returned by dataset.__getitem__ = 2

Object type, dimensionality, and datatype of each element in a tuple returned by dataset.__getitem__:
((<class 'torch.Tensor'>, 2, torch.float32), (<class 'torch.Tensor'>, 0, torch.int64))
The 'input size' (# columns) of each feature matrix is 19
The various sequence lengths (# rows) among the input feature matrices are
[ 6 13 14 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
 37 38 39 40 41 42 43 44 45 46 47 48]
[[    6     1]
 [   13     1]
 [   14     1]
 [   16     2]
 [   17     4]
 [   18     3]
 [   19     8]
 [   20    12]
 [   21    25]
 [   22    49]
 [   23    84]
 [   24   144]
 [   25   126]
 [   26   110]
 [   27    93]
 [   28    95]
 [   29    84]
 [   30    90]
 [   31    75]
 [   32    99]
 [   33    98]
 [   34   113]
 [   35   148]
 [   36   152]
 [   37   189]
 [   38   199]
 [   39   220]
 [   40   17

In [35]:
len(dataset[0])

2

In [37]:
print(type(dataset[0][0]))

<class 'torch.Tensor'>


In [None]:
import torch
import lstm_adversarial_attack.tune_train.tuner_driver as td

if torch.cuda.is_available():
    cur_device = torch.device("cuda:0")
else:
    cur_device = torch.device("cpu")

print(f"cur_device is {cur_device}")

In [None]:
tuner_driver = td.TunerDriver(device=cur_device)
pprint.pprint(tuner_driver.tuner.tuning_ranges)

In [None]:
pprint.pprint(tuner_driver.tuner.dataset[0][0].shape)

In [None]:
my_completed_study = tuner_driver(num_trials=30)