This repository has been archived by the owner on Dec 29, 2022. It is now read-only.
/ CS598DL4H Public archive

Project for CS 598 Deep Learning for Healthcare





Project for CS 598 Deep Learning for Healthcare

Local Setup


git submodule init
git submodule update

Virtual Environments


pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt -r dev-requirements.txt

Use env/bin/python as the kernel for MIMIC_III.ipynb.


pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt

Run the commands above inside caml-mimic/, then use caml-mimic/env/bin/python as the kernel for caml-mimic/notebooks/dataproc_mimic_III.ipynb.


pyenv local 3.7.11
python3.7 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install -r requirements.txt

Run the commands above inside Explainable-Automated-Medical-Coding/, then use Explainable-Automated-Medical-Coding/env/bin/python as the kernel for Explainable-Automated-Medical-Coding/HLAN/demo_HLAN_viz.ipynb.

Google Colab Setup

See Setup.ipynb for the Google Colab-only setup steps. The Colab-only header sections in each notebook refer to this setup.

Repro Steps

Prerequisites / Demo



Examples detailed in Explainable-Automated-Medical-Coding/

cd Explainable-Automated-Medical-Coding/HLAN/
source ../env/bin/activate


Training currently uses Explainable-Automated-Medical-Coding/datasets/mimiciii_*_50_th0.txt.

cd Explainable-Automated-Medical-Coding/HLAN/
../env/bin/python \
    --dataset mimic3-ds-50 \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=False \
    --num_epochs 100 \
    --report_rand_pred=False \
    --running_times 1 \
    --early_stop_lr 0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir ../checkpoints/checkpoint_HAN_50_per_label_bs32_LE/ \
    --use_sent_split_padded_version=False \
    --marking_id 50-hlan \
    --gpu=True  # Colab only

See the scripts directory for the operational parameters of the HLAN, HA-GRU and HAN variants, with and without label embedding (LE):

  • scripts/
  • scripts/
  • scripts/
  • scripts/
  • scripts/
  • scripts/

MIMIC-III COVID-19 Shielding

NOTE: This section is incomplete because of reproducibility challenges with the COVID-19 shielding data, which is sourced from the UK's NHS and mapped from ICD-10 to ICD-9 using tools from the Government of New Zealand.

The data needs preprocessing to extract only the admission IDs of admissions that contain COVID-19 related ICD-9 codes. These codes are derived from the COVID-19 related ICD-10 codes in ./spl-icd10-opcs4-disease-groups-v2.0.csv and the ICD-10-to-ICD-9 mapping in ./masterb8.csv. This process needs to generate CSV files akin to caml-mimic/mimicdata/mimic3/*_50.csv, which are then converted to Explainable-Automated-Medical-Coding/datasets/mimiciii_*_full_th_50_covid_shielding.txt, the files the training expects.
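
The filtering step described above could look roughly like the following sketch. It is not the project's implementation: the function names are hypothetical, the codes in the usage example are placeholders rather than values from the two CSV files, and the layout of those files (and how they would be parsed into these structures) is an assumption. Diagnoses are modeled as (admission ID, ICD-9 code) pairs.

```python
from typing import Dict, Iterable, List, Set, Tuple


def shielding_icd9_codes(
    icd10_codes: Iterable[str],
    icd10_to_icd9: Dict[str, List[str]],
) -> Set[str]:
    """Translate shielding ICD-10 codes to the ICD-9 codes they map to.

    ICD-10 codes without an entry in the mapping are skipped.
    """
    icd9: Set[str] = set()
    for code in icd10_codes:
        icd9.update(icd10_to_icd9.get(code, []))
    return icd9


def shielding_admissions(
    diagnoses: Iterable[Tuple[int, str]],
    icd9_codes: Set[str],
) -> Set[int]:
    """Return the admission IDs that have at least one shielding ICD-9 code."""
    return {hadm_id for hadm_id, code in diagnoses if code in icd9_codes}


# Placeholder data (hypothetical codes, for illustration only).
toy_icd10 = ["X10", "X20", "X30"]
toy_mapping = {"X10": ["901"], "X20": ["902", "903"]}
toy_diagnoses = [(1, "901"), (1, "555"), (2, "555"), (3, "903")]

codes = shielding_icd9_codes(toy_icd10, toy_mapping)
admissions = shielding_admissions(toy_diagnoses, codes)
```

Only the selected admissions would then be written out in the same shape as the existing *_50.csv files.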

cd Explainable-Automated-Medical-Coding/HLAN/
../env/bin/python \
    --dataset mimic3-ds-shielding-th50 \
    --batch_size 32 \
    --per_label_attention=True \
    --per_label_sent_only=False \
    --num_epochs 100 \
    --report_rand_pred=False \
    --running_times 1 \
    --early_stop_lr 0.00002 \
    --remove_ckpts_before_train=False \
    --use_label_embedding=True \
    --ckpt_dir ../checkpoints/checkpoint_HAN_shielding_per_label_bs32_LE/ \
    --use_sent_split_padded_version=False \
    --marking_id shielding-hlan \
    --gpu=True  # Colab only

Code Changes

Refactoring of HAN Class

  • Original source: Explainable-Automated-Medical-Coding/HLAN/
  • Refactored source: HLAN/

Original Class

The original class is a single implementation responsible for HLAN (Hierarchical Label-wise Attention Network), as well as both downgraded models, HA-GRU (Hierarchical Attention - Gated Recurrent Unit) and HAN (Hierarchical Attention Network). In addition, it transparently applies Label Embedding (LE) to each by conditionally applying a pre-trained word2vec model.

The important callout in this diagram is the number of <method> and <method>_per_label pairs that exist, indicative of an imperative implementation. Concretely, where to apply label-wise attention (i.e., attention per label) is the primary difference between the model variants HAN, HA-GRU, and HLAN.

Replace Conditional with Polymorphism

The first refactoring was Replace Conditional with Polymorphism. This allowed all the <method> and <method>_per_label pairs to be modeled, instead, with an inheritance hierarchy from the simplest model (HAN, which applies no label-wise attention) to the most complex (HLAN, which applies label-wise attention at the sentence and word level).
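
As a rough illustration of this refactoring, the sketch below replaces a flag-selected `_per_label` method with an override in a subclass. The class and method names are hypothetical, not the repository's API, and attention is reduced to toy arithmetic on plain lists. It also assumes that HA-GRU applies label-wise attention at the sentence level only, which the per_label_sent_only flag suggests but this document does not state.

```python
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


class HAN:
    """Simplest variant: one shared summary at both levels."""

    def __init__(self, num_labels: int) -> None:
        self.num_labels = num_labels

    def word_attention(self, words: List[float]) -> List[float]:
        # Shared attention: a single summary, regardless of label count.
        return [mean(words)]

    def sentence_attention(self, sentences: List[float]) -> List[float]:
        return [mean(sentences)]


class HAGRU(HAN):
    """Overrides only the sentence level (assumption, see lead-in)."""

    def sentence_attention(self, sentences: List[float]) -> List[float]:
        # Toy label-wise attention: one summary per label.
        return [mean(sentences) * (label + 1) for label in range(self.num_labels)]


class HLAN(HAGRU):
    """Label-wise at both levels: inherits the sentence level, overrides words."""

    def word_attention(self, words: List[float]) -> List[float]:
        return [mean(words) * (label + 1) for label in range(self.num_labels)]


# Callers pick a class instead of passing per-label flags.
for model in (HAN(3), HAGRU(3), HLAN(3)):
    word_out = model.word_attention([1.0, 2.0, 3.0])
    sent_out = model.sentence_attention([1.0, 3.0])
    print(type(model).__name__, len(word_out), len(sent_out))
```

Each subclass states only what differs from its parent, so no method needs an `if per_label` branch.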

Form Template Method

The second refactoring applied was Form Template Method. This removed a great deal of duplication by splitting the original class into many finer-grained methods. With this change, commonality among methods defined by more than one class became apparent, and all common methods could be pushed up the inheritance hierarchy as a Template Method.
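
A minimal sketch of the pattern, using hypothetical names rather than the refactored source: the base class owns the fixed sequence of steps, and subclasses override only the steps that differ. Steps return descriptive strings here so the effect of each override is visible.

```python
from typing import List


class BaseModel:
    """Owns the step order; shared steps live here once."""

    def build(self) -> List[str]:
        # Template method: subclasses never redefine this sequence.
        return [
            self.embed(),
            self.word_attention(),
            self.sentence_attention(),
            self.classify(),
        ]

    def embed(self) -> str:
        return "embed"

    def word_attention(self) -> str:
        return "shared word attention"

    def sentence_attention(self) -> str:
        return "shared sentence attention"

    def classify(self) -> str:
        return "classify"


class LabelWiseSentenceModel(BaseModel):
    def sentence_attention(self) -> str:
        return "label-wise sentence attention"


class LabelWiseModel(LabelWiseSentenceModel):
    def word_attention(self) -> str:
        return "label-wise word attention"


steps = LabelWiseModel().build()
```

Because `build` is defined once, a step that more than one class shares is written in the base class only, which is where the reduction in duplication comes from.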

Deduplication Results

Together, the two refactorings allowed approximately a 40% reduction in lines of code and a 75% reduction in words, for a functionally equivalent implementation, as shown.

$ wc -l Explainable-Automated-Medical-Coding/HLAN/
    1193 Explainable-Automated-Medical-Coding/HLAN/
$ wc -l HLAN/
     698 HLAN/
$ wc -w Explainable-Automated-Medical-Coding/HLAN/
    6577 Explainable-Automated-Medical-Coding/HLAN/
$ wc -w HLAN/
    1678 HLAN/
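
The percentages quoted above follow directly from the `wc` counts. A short check (the helper name is illustrative):

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative size reduction, in percent."""
    return 100.0 * (before - after) / before


lines = percent_reduction(1193, 698)   # wc -l counts
words = percent_reduction(6577, 1678)  # wc -w counts

print(f"lines: {lines:.1f}%")  # lines: 41.5%
print(f"words: {words:.1f}%")  # words: 74.5%
```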

