# Table of Contents

<a name="outline"></a>

## Setup

- [A](#seca) External Imports
- [B](#secb) Internal Imports
- [D](#secd) Configurations and Paths
- [E](#sece) Patient Interface and Train/Val/Test Partitioning


## Training

- [1](#sec1) Training ICE-NODE and The Baselines on CPRD

<a name="seca"></a>

### A External Imports [^](#outline)

In [1]:
import sys
import os
import glob
import random
from collections import defaultdict
from pathlib import Path

from IPython.display import display

import pandas as pd

from tqdm import tqdm

<a name="secb"></a>

### B Internal Imports [^](#outline)

In [2]:
%load_ext autoreload
%autoreload 2

import train as T
import common as C



In [3]:
# HOME and DATA_STORE are arbitrary, change as appropriate.
HOME = os.environ.get('HOME')
DATA_STORE = f'{HOME}/GP/ehr-data'
DATA_FILE = os.path.join(DATA_STORE, 'cprd-data/DUMMY_DATA.csv')
SOURCE_DIR = os.path.abspath("..")
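
The path setup above can fail silently if `HOME` is unset, because `os.environ.get('HOME')` then returns `None` and the f-string yields a path starting with `None/`. A minimal sketch of a more defensive variant follows; `resolve_data_file` and `require_file` are hypothetical helper names, not part of the project.

```python
from pathlib import Path


def resolve_data_file(data_store=None, relative_path="cprd-data/DUMMY_DATA.csv"):
    """Build the dataset path; fall back to ~/GP/ehr-data when no store is given."""
    # Path.home() works even when the HOME environment variable is not set.
    base = Path(data_store) if data_store is not None else Path.home() / "GP" / "ehr-data"
    return base / relative_path


def require_file(path):
    """Return the path, or raise early with a clear message if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected data file at {path}")
    return path
```

Calling `require_file(resolve_data_file())` before loading the dataset would surface a wrong `DATA_STORE` immediately rather than inside the dataset loader.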

<a name="secd"></a>

### D Configurations and Paths [^](#outline)

**Set** the output directory, load the CPRD dataset from `DATA_FILE`, and load the tuned configuration of each model.

In [4]:
output_dir = 'cprd_artefacts'
Path(output_dir).mkdir(parents=True, exist_ok=True)

In [5]:
with C.modified_environ(DATA_FILE=DATA_FILE):
    cprd_dataset = C.datasets['CPRD']
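
The implementation of `C.modified_environ` is not shown in this notebook. Judging by its use above, it presumably sets `DATA_FILE` in the process environment only for the duration of the `with` block. A minimal sketch of such a context manager, under that assumption, could look as follows; `temporary_environ` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_environ(**overrides):
    """Temporarily set environment variables and restore the old state on exit."""
    # Remember the previous values; None marks a variable that was not set.
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update({name: str(value) for name, value in overrides.items()})
    try:
        yield
    finally:
        # Restore even if the body of the with-block raised.
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value
```

With this pattern, code that reads the variable lazily (for example a dataset registry) sees the override, while the rest of the session keeps its original environment.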
   

In [6]:
# Optimal hyperparameters for each model.
# ICE-NODE_UNIFORM reuses the ICE-NODE configuration file.
model_config = {
    'ICE-NODE': f'{SOURCE_DIR}/expt_configs/cprd/icenode_2lr.json',
    'ICE-NODE_UNIFORM': f'{SOURCE_DIR}/expt_configs/cprd/icenode_2lr.json',
    'GRU': f'{SOURCE_DIR}/expt_configs/cprd/gru.json',
    'RETAIN': f'{SOURCE_DIR}/expt_configs/cprd/retain.json'
}

model_config = {clf: C.load_config(file) for clf, file in model_config.items()}

clfs = ['ICE-NODE', 'ICE-NODE_UNIFORM', 'GRU', 'RETAIN']
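
The cell above maps each model name to a JSON file and then replaces the file names with the loaded configurations. The sketch below illustrates the same pattern with only the standard library, assuming that `C.load_config` simply parses a JSON file. `load_json_config` and `load_all_configs` are hypothetical names, and the `lr` key is made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def load_json_config(path):
    """Parse one JSON file into a dict (stand-in for C.load_config)."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def load_all_configs(config_files):
    """Load every config, reporting all missing files in a single error."""
    missing = [str(p) for p in config_files.values() if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing config files: {missing}")
    return {name: load_json_config(path) for name, path in config_files.items()}


# Usage with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "gru.json"
    demo_file.write_text(json.dumps({"lr": 0.001}), encoding="utf-8")
    demo_configs = load_all_configs({"GRU": demo_file})
```

Checking for missing files up front avoids failing halfway through the dictionary comprehension on the last model.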

In [7]:
cprd_train_output_dir = {clf: f'{output_dir}/train/{clf}' for clf in clfs}

for d in cprd_train_output_dir.values():
    Path(d).mkdir(parents=True, exist_ok=True)

In [8]:
cprd_reporters = T.make_reporters(clfs, cprd_train_output_dir)

<a name="sece"></a>

### E Patient Interface and Train/Val/Test Partitioning [^](#outline)

In [9]:
code_scheme = {
    'dx': 'dx_cprd_ltc9809',
    'dx_outcome': 'dx_cprd_ltc9809'
}
cprd_interface = C.Subject_JAX.from_dataset(cprd_dataset, code_scheme=code_scheme)
cprd_splits = cprd_interface.random_splits(split1=0.7, split2=0.85, random_seed=42)
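
The arguments `split1=0.7` and `split2=0.85` most plausibly act as cumulative cut points, giving roughly 70% training, 15% validation and 15% test subjects. This is an assumption, since the implementation of `random_splits` is not shown here. A minimal standalone sketch of such a seeded three-way split:

```python
import random


def random_splits(subject_ids, split1, split2, random_seed):
    """Shuffle subject IDs with a fixed seed and cut them at two cumulative fractions."""
    if not 0.0 < split1 < split2 < 1.0:
        raise ValueError("expected 0 < split1 < split2 < 1")
    # Sort first so the result depends only on the seed, not on input order.
    ids = sorted(subject_ids)
    random.Random(random_seed).shuffle(ids)
    cut1, cut2 = int(split1 * len(ids)), int(split2 * len(ids))
    return ids[:cut1], ids[cut1:cut2], ids[cut2:]


train_ids, valid_ids, test_ids = random_splits(range(100), 0.7, 0.85, random_seed=42)
```

Splitting by subject rather than by admission keeps all visits of one patient in the same partition, which prevents leakage between training and evaluation.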


In [10]:
cprd_percentiles = cprd_interface.dx_outcome_by_percentiles(20)
cprd_train_percentiles = cprd_interface.dx_outcome_by_percentiles(20, cprd_splits[0])
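
`dx_outcome_by_percentiles(20)` presumably partitions the outcome codes into five groups by how often they occur, so that performance can be reported separately for rare and frequent codes. The exact grouping rule is not visible in this notebook. The sketch below assumes equal-sized bins over the frequency ranking, and `codes_by_frequency_percentiles` is a hypothetical name.

```python
from collections import Counter


def codes_by_frequency_percentiles(admissions, percentile_range=20):
    """Group codes into 100 // percentile_range bins, from rarest to most frequent."""
    counts = Counter(code for codes in admissions for code in codes)
    # Rank by frequency; break ties by code so the grouping is deterministic.
    ranked = sorted(counts, key=lambda code: (counts[code], code))
    n_groups = 100 // percentile_range
    groups = [set() for _ in range(n_groups)]
    for rank, code in enumerate(ranked):
        groups[rank * n_groups // len(ranked)].add(code)
    return groups


# Ten codes, where code c<i> occurs i + 1 times.
demo_admissions = [[f"c{i}"] * (i + 1) for i in range(10)]
demo_groups = codes_by_frequency_percentiles(demo_admissions, 20)
```

Computing the groups on the training split only, as the second call above does with `cprd_splits[0]`, keeps test-set frequencies out of the grouping.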

In [11]:
cprd_models = T.init_models(clfs, model_config, cprd_interface, cprd_splits[0])



<a name="sec1"></a>

### 1 Training ICE-NODE and The Baselines on CPRD [^](#outline)

#### ICE-NODE

In [12]:
# TODO: This may take a long time; a pretrained model already exists in (yy).
cprd_trained_icenode = T.train(cprd_models['ICE-NODE'], config=model_config['ICE-NODE'],
                               splits=cprd_splits, code_groups=cprd_train_percentiles,
                               reporters=cprd_reporters['ICE-NODE'])

#### ICE-NODE_UNIFORM

In [None]:
# TODO: This can take up to (xx); a trained model already exists in (yy).
cprd_trained_icenode_uni = T.train(cprd_models['ICE-NODE_UNIFORM'], config=model_config['ICE-NODE_UNIFORM'],
                                   splits=cprd_splits, code_groups=cprd_train_percentiles,
                                   reporters=cprd_reporters['ICE-NODE_UNIFORM'])


#### GRU

In [None]:
# TODO: This can take up to (xx); a trained model already exists in (yy).
cprd_trained_gru = T.train(cprd_models['GRU'], config=model_config['GRU'],
                           splits=cprd_splits, code_groups=cprd_train_percentiles,
                           reporters=cprd_reporters['GRU'])

#### RETAIN

In [None]:
# TODO: This can take up to (xx); a trained model already exists in (yy).
cprd_trained_retain = T.train(cprd_models['RETAIN'], config=model_config['RETAIN'],
                              splits=cprd_splits, code_groups=cprd_train_percentiles,
                              reporters=cprd_reporters['RETAIN'])
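
The four training cells above differ only in the model name, so they could also be driven by one loop. The sketch below is a hypothetical helper, not part of the project. It takes the training function as an argument, so it makes no assumption about `T.train` beyond the keyword arguments already used in the cells above.

```python
def train_all(train_fn, models, configs, reporters, **shared_kwargs):
    """Call train_fn once per model and collect the results by model name."""
    trained = {}
    for name, model in models.items():
        trained[name] = train_fn(model,
                                 config=configs[name],
                                 reporters=reporters[name],
                                 **shared_kwargs)
    return trained


# Intended use in this notebook (not executed here):
# cprd_trained = train_all(T.train, cprd_models, model_config, cprd_reporters,
#                          splits=cprd_splits, code_groups=cprd_train_percentiles)
```

Keeping the separate cells remains convenient when only one model needs retraining; the loop is mainly useful for unattended runs.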