# Table of Contents

<a name="outline"></a>

## Setup

- [A](#seca) External Imports
- [B](#secb) Internal Imports
- [D](#secd) Configurations and Paths
- [E](#sece) Patient Interface and Train/Val/Test Partitioning


## Training

- [1](#sec1) Training ICE-NODE and The Baselines on MIMIC-III
- [2](#sec2) Training ICE-NODE and The Baselines on MIMIC-IV

<a name="seca"></a>

### A External Imports [^](#outline)

In [1]:
import sys
import os
import glob
import random
from collections import defaultdict
from pathlib import Path

from IPython.display import display

import pandas as pd

from tqdm import tqdm

<a name="secb"></a>

### B Internal Imports [^](#outline)

In [2]:
%load_ext autoreload
%autoreload 2

import train as T
import common as C


In [3]:
# HOME and DATA_STORE are arbitrary, change as appropriate.
HOME = os.environ.get('HOME')
DATA_STORE = f'{HOME}/GP/ehr-data'

SOURCE_DIR = os.path.abspath("..")

<a name="secd"></a>

### D Configurations and Paths [^](#outline)

**Set** `DATA_STORE` (defined above) to the directory that holds the MIMIC-III and MIMIC-IV data. The datasets are then loaded into the `mimic3_dataset` and `mimic4_dataset` variables.

In [4]:
output_dir = 'artefacts'
Path(output_dir).mkdir(parents=True, exist_ok=True)

In [12]:
with C.modified_environ(DATA_DIR=DATA_STORE):
    mimic3_dataset = C.datasets['M3']
    mimic4_dataset = C.datasets['M4']
   


# Optimal hyperparameters for each model.
# ICE-NODE and ICE-NODE_UNIFORM share the same configuration file.
model_config = {
    'ICE-NODE': f'{SOURCE_DIR}/optimal_configs/icenode_v1/icenode_2lr.json',
    'ICE-NODE_UNIFORM': f'{SOURCE_DIR}/optimal_configs/icenode_v1/icenode_2lr.json',
    'GRU': f'{SOURCE_DIR}/optimal_configs/icenode_v1/gru.json',
    'RETAIN': f'{SOURCE_DIR}/optimal_configs/icenode_v1/retain.json'
}

model_config = {clf: C.load_config(file) for clf, file in model_config.items()}

clfs = ['ICE-NODE', 'ICE-NODE_UNIFORM', 'GRU', 'RETAIN']
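The cell above loads the datasets inside `C.modified_environ(DATA_DIR=DATA_STORE)`. The internals of that helper are not shown in this notebook; the block below is only a minimal sketch of the general pattern, assuming it temporarily sets environment variables and restores them on exit. The names `temporary_environ` and `EXAMPLE_DATA_DIR` are hypothetical and are not part of the `common` module.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_environ(**updates):
    """Temporarily set environment variables, restoring old values on exit.

    Hypothetical stand-in for C.modified_environ; the real helper may differ.
    """
    # Remember the previous value (None if unset) of every key we touch.
    previous = {key: os.environ.get(key) for key in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        # Restore the original state even if the body raised an exception.
        for key, old_value in previous.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


# Usage: the variable is visible only inside the with-block.
os.environ.pop("EXAMPLE_DATA_DIR", None)
with temporary_environ(EXAMPLE_DATA_DIR="/tmp/ehr-data"):
    inside = os.environ["EXAMPLE_DATA_DIR"]
after = os.environ.get("EXAMPLE_DATA_DIR")
```

Scoping the variable to a `with` block keeps the data location from leaking into later cells that may expect a different environment.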

In [6]:
m3_train_output_dir = {clf: f'{output_dir}/m3_train/{clf}' for clf in clfs}
m4_train_output_dir = {clf: f'{output_dir}/m4_train/{clf}' for clf in clfs}

# Create one output directory per model and dataset.
for d in (*m3_train_output_dir.values(), *m4_train_output_dir.values()):
    Path(d).mkdir(parents=True, exist_ok=True)

In [7]:
m3_reporters = T.make_reporters(clfs, m3_train_output_dir)
m4_reporters = T.make_reporters(clfs, m4_train_output_dir)

<a name="sece"></a>

### E Patient Interface and Train/Val/Test Partitioning [^](#outline)

In [8]:
code_scheme = {
    'dx': 'dx_ccs',
    'dx_outcome': 'dx_flatccs_filter_v1'
}
m3_interface = C.Subject_JAX.from_dataset(mimic3_dataset, code_scheme=code_scheme)
m4_interface = C.Subject_JAX.from_dataset(mimic4_dataset, code_scheme=code_scheme)

m3_splits = m3_interface.random_splits(split1=0.7, split2=0.85, random_seed=42)
m4_splits = m4_interface.random_splits(split1=0.7, split2=0.85, random_seed=42)
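The arguments `split1=0.7` and `split2=0.85` suggest cumulative cut points, giving a 70/15/15 train/validation/test partition. The following is a rough sketch of that idea only, assuming subjects are shuffled with a seeded generator and then sliced. `random_split_ids` is a hypothetical function, not the actual `random_splits` implementation.

```python
import random


def random_split_ids(subject_ids, split1, split2, random_seed):
    """Shuffle IDs and cut them at two cumulative fractions (assumed semantics)."""
    # Sort first so the result depends only on the seed, not on input order.
    ids = sorted(subject_ids)
    random.Random(random_seed).shuffle(ids)
    cut1 = round(split1 * len(ids))
    cut2 = round(split2 * len(ids))
    return ids[:cut1], ids[cut1:cut2], ids[cut2:]


# Usage with 100 dummy subject IDs.
train_ids, valid_ids, test_ids = random_split_ids(range(100), 0.7, 0.85, 42)
print(len(train_ids), len(valid_ids), len(test_ids))
```

Under this reading, `m3_splits[0]` used in the later cells would correspond to the training partition.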


In [9]:
m4_percentiles = m4_interface.dx_outcome_by_percentiles(20)
m3_percentiles = m3_interface.dx_outcome_by_percentiles(20)

m3_train_percentiles = m3_interface.dx_outcome_by_percentiles(20, m3_splits[0])
m4_train_percentiles = m4_interface.dx_outcome_by_percentiles(20, m4_splits[0])
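The calls to `dx_outcome_by_percentiles(20)` appear to group outcome codes into bands of 20 percentiles each, optionally restricted to a subset of subjects such as the training split. The sketch below illustrates one plausible reading, assuming codes are ranked by how often they occur and the ranking is cut into `100 // 20 = 5` bands. The function `codes_by_frequency_percentiles` and the toy data are hypothetical; the real method may define the bands differently.

```python
from collections import Counter


def codes_by_frequency_percentiles(code_counts, percentile_range):
    """Group codes into equal-width percentile bands of their frequency rank."""
    # Rank codes from least to most frequent; ties are broken by code name.
    ranked = sorted(code_counts, key=lambda code: (code_counts[code], code))
    n_groups = 100 // percentile_range
    groups = [[] for _ in range(n_groups)]
    for rank, code in enumerate(ranked):
        # Map the rank to a band index; min() guards the upper edge.
        band = min(rank * n_groups // len(ranked), n_groups - 1)
        groups[band].append(code)
    return groups


# Toy admissions, each a list of diagnosis codes.
admissions = [["A", "B"], ["A", "C"], ["A", "B", "D"], ["A", "E"], ["A", "B", "C"]]
counts = Counter(code for codes in admissions for code in codes)
groups = codes_by_frequency_percentiles(counts, 20)
print(groups)
```

Computing the bands from the training subjects only, as `m3_train_percentiles` and `m4_train_percentiles` do, avoids using validation or test frequencies when defining the groups.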

In [13]:
m3_models = T.init_models(clfs, model_config, m3_interface, m3_splits[0])

m4_models = T.init_models(clfs, model_config, m4_interface, m4_splits[0])

<a name="sec1"></a>

### 1 Training ICE-NODE and The Baselines on MIMIC-III [^](#outline)

#### ICE-NODE

In [14]:
## TODO: This may take a long time; a pretrained model already exists in (yy).
m3_trained_icenode = T.train(m3_models['ICE-NODE'], config=model_config['ICE-NODE'], 
                             splits=m3_splits, code_groups=m3_train_percentiles,
                             reporters=m3_reporters['ICE-NODE'])

#### ICE-NODE_UNIFORM

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
m3_trained_icenode_uni = T.train(m3_models['ICE-NODE_UNIFORM'], config=model_config['ICE-NODE_UNIFORM'], 
                                 splits=m3_splits, code_groups=m3_train_percentiles,
                                 reporters=m3_reporters['ICE-NODE_UNIFORM'])


#### GRU

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
m3_trained_gru = T.train(m3_models['GRU'], config=model_config['GRU'], 
                         splits=m3_splits, code_groups=m3_train_percentiles,
                         reporters=m3_reporters['GRU'])

#### RETAIN

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
m3_trained_retain = T.train(m3_models['RETAIN'], config=model_config['RETAIN'],
                            splits=m3_splits, code_groups=m3_train_percentiles,
                            reporters=m3_reporters['RETAIN'])

<a name="sec2"></a>

### 2 Training ICE-NODE and The Baselines on MIMIC-IV [^](#outline)

#### ICE-NODE

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
m4_trained_icenode = T.train(m4_models['ICE-NODE'], config=model_config['ICE-NODE'], 
                             splits=m4_splits, code_groups=m4_train_percentiles,
                             reporters=m4_reporters['ICE-NODE'])

#### ICE-NODE_UNIFORM

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
m4_trained_icenode_uni = T.train(m4_models['ICE-NODE_UNIFORM'], config=model_config['ICE-NODE_UNIFORM'],
                                 splits=m4_splits, code_groups=m4_train_percentiles,
                                 reporters=m4_reporters['ICE-NODE_UNIFORM'])


#### GRU

In [None]:
## TODO: This can take up to (xx); a trained model already exists in (yy).
# Uses the same T.train interface as the other cells; the previous call relied on
# names (train_model, m4_train_ids, m4_valid_ids) that are not defined in this notebook.
m4_trained_gru = T.train(m4_models['GRU'], config=model_config['GRU'],
                         splits=m4_splits, code_groups=m4_train_percentiles,
                         reporters=m4_reporters['GRU'])

#### RETAIN

In [None]:
## RESOURCES WARNING: With this large dataset and occasionally long patient histories, RETAIN
## requires more memory than is available in typical high-end GPUs (e.g. 12 GB in my main workstation).
## For this particular experiment, we relied on CPUs and CPU RAM (over 64 GB).
## On MIMIC-IV, ICE-NODE and ICE-NODE_UNIFORM finished training in less than 48 hours and
## GRU finished in less than 24 hours, whereas RETAIN would need more than three weeks on a CPU.
## A pretrained model is already included in this anonymous repository.
# Uses the same T.train interface as the other cells; the previous call relied on
# names (train_model, m4_train_ids, m4_valid_ids) that are not defined in this notebook.
m4_trained_retain = T.train(m4_models['RETAIN'], config=model_config['RETAIN'],
                            splits=m4_splits, code_groups=m4_train_percentiles,
                            reporters=m4_reporters['RETAIN'])
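The resources warning above mentions falling back to CPUs and CPU RAM. How the original experiments selected the device is not stated in this notebook. As a hedged sketch, and assuming the models run on JAX (suggested by `C.Subject_JAX`), recent JAX releases read the `JAX_PLATFORMS` environment variable, which has to be set before JAX is first imported in the process, so in practice before the imports at the top of the notebook.

```python
import os

# Assumption: recent JAX releases honour JAX_PLATFORMS.
# It must be set before the first `import jax` in the process to take effect.
os.environ["JAX_PLATFORMS"] = "cpu"

try:
    import jax

    # With the variable set, only CPU devices should be listed.
    print(jax.devices())
except ImportError:
    print("JAX is not installed in this environment.")
```

If JAX has already been imported (for example through `import train as T`), restart the kernel and run this before any other import.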