# MIMIC-IV ICU Module Preprocessing
This Jupyter notebook preprocesses the chartevents, labevents, and outputevents data from the ICU module.

## Requirements and Installation
- Recommended setup: the Dev Containers extension for VS Code
- Open the repository in the dev container
- Move the MIMIC-IV dataset into the `data/external/mimic-iv-2.2` folder
- All requirements should then be installed automatically

The IPython `autoreload` extension reloads modules before executing user code, so changes to the source files take effect without restarting the kernel:
https://ipython.org/ipython-doc/3/config/extensions/autoreload.html

In [1]:
%load_ext autoreload
%autoreload 2

As the configuration management framework, we use Hydra (see https://hydra.cc/).

In [2]:
from hydra import compose, initialize

import hydra
from hydra.core.global_hydra import GlobalHydra

from src.data.datamodule import DataModule
from src.data import utils as utils

In [3]:
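# Clear any previous Hydra state so this cell can be re-run safely.
GlobalHydra.instance().clear()
# Look for config files relative to this notebook's directory.
initialize(version_base=None, config_path=".", job_name="train_config")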

hydra.initialize()

Have a look at the `train_config.yaml` file. All configuration variables are stored there.
You can override any of them by passing the `overrides` parameter to the `compose` function.

Important configuration parameters:
- `dataset.datamodule.props.path`: path to the dataset
- `data_type`: data format, either `tabular` or `timeseries`
- `feature_list`: list of features to use from the ICU module
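
To make the effect of `overrides` concrete, here is a minimal, dependency-free sketch of how `key.path=value` strings map onto a nested configuration. It is a toy illustration, not Hydra's implementation: `apply_overrides`, `base_config`, and the `/tmp/mimic` path are hypothetical, and only the key names are taken from the list above.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `key.path=value` overrides applied.

    Toy illustration only: Hydra's real override grammar is richer
    (typed values, `+key=value` additions, `~key` deletions).
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # raises KeyError if the path does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return result


# Hypothetical stand-in for train_config.yaml (values are made up).
base_config = {
    "data_type": "tabular",
    "feature_list": ["Temperature"],
    "dataset": {"datamodule": {"props": {"path": "data/external/mimic-iv-2.2"}}},
}

cfg = apply_overrides(
    base_config,
    ["data_type=timeseries", "dataset.datamodule.props.path=/tmp/mimic"],
)
print(cfg["data_type"])
print(cfg["dataset"]["datamodule"]["props"]["path"])
```

The original dictionary is left untouched, which mirrors how each `compose` call in this notebook produces a separate configuration object.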

In [4]:
cfg_tab = compose(config_name="train_config", overrides=["data_type=tabular"])

Instantiate the datamodule from the configuration:

In [5]:
datamodule: DataModule = hydra.utils.instantiate(cfg_tab.dataset.datamodule)

The `datamodule.pipeline` method generates a dataframe from the given `feature_list` and `target`, in either tabular or timeseries format.
In the tabular dataframe below, the events of each feature are aggregated over the `cut_off_time` with functions such as max, mean, min, and std, producing one row per `stay_id` and one column per feature and aggregation.
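
The aggregation step can be sketched without any dependencies as follows. This is not the code behind `datamodule.pipeline`: the event tuples, the `aggregate` helper, and the reading of `cut_off_time` as a number of hours since ICU admission are assumptions made for illustration. Only the `<feature>_<function>` column naming is taken from the output shown in this notebook.

```python
import statistics
from collections import defaultdict

# Hypothetical long-format events: (stay_id, feature, hours_since_admission, value).
events = [
    (1, "Arterial Blood Pressure mean", 1.0, 75.0),
    (1, "Arterial Blood Pressure mean", 2.0, 89.0),
    (1, "Arterial Blood Pressure mean", 30.0, 60.0),  # after the cutoff, ignored
    (2, "Arterial Blood Pressure mean", 3.0, 100.0),
]

CUT_OFF_HOURS = 24.0  # assumed interpretation of `cut_off_time`


def aggregate(events, cut_off_hours):
    """Build one row per stay with max/mean/min/std per feature."""
    grouped = defaultdict(list)
    for stay_id, feature, hours, value in events:
        if hours <= cut_off_hours:
            grouped[(stay_id, feature)].append(value)

    rows = defaultdict(dict)
    for (stay_id, feature), values in grouped.items():
        row = rows[stay_id]
        row[f"{feature}_max"] = max(values)
        row[f"{feature}_mean"] = statistics.mean(values)
        row[f"{feature}_min"] = min(values)
        # The sample standard deviation is undefined for a single value.
        row[f"{feature}_std"] = statistics.stdev(values) if len(values) > 1 else None
    return dict(rows)


table = aggregate(events, CUT_OFF_HOURS)
for stay_id, row in table.items():
    print(stay_id, row)
```

A stay with a single measurement has no defined sample standard deviation. pandas reports that case as `NaN`, which may be one reason for missing values in the `_std` columns.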

In [6]:
dataset = datamodule.pipeline(
    cfg_tab.data_type, cfg_tab.feature_list, cfg_tab.target
)
dataset.head()

building MIMIC-IV tabular features




Unnamed: 0,stay_id,Arterial Blood Pressure diastolic_max,Arterial Blood Pressure diastolic_mean,Arterial Blood Pressure diastolic_min,Arterial Blood Pressure diastolic_std,Arterial Blood Pressure mean_max,Arterial Blood Pressure mean_mean,Arterial Blood Pressure mean_min,Arterial Blood Pressure mean_std,Arterial Blood Pressure systolic_max,...,Vti High_min,Current Dyspnea Assessment_min,Respiratory Rate (Set)_max,Fspn High_mean,Glucose finger stick (range 70-100)_min,Current Dyspnea Assessment_max,Pain Level Response_std,Richmond-RAS Scale_max,Total PEEP Level_min,icu_los
0,31269608,58.0,53.818182,49.0,3.250175,89.0,81.909091,75.0,4.700097,147.0,...,,0.0,,,182.0,0.0,0.0,0.0,,7.702512
1,37509585,75.0,62.888889,47.0,7.653441,103.0,85.222222,63.0,11.17537,162.0,...,,0.0,,,127.0,0.0,,0.0,,5.452662
2,32554129,74.0,58.25,49.0,7.827303,101.0,80.0625,66.0,10.871791,143.0,...,,0.0,,,100.0,0.0,,0.0,,0.872685
3,31338022,,,,,,,,,,...,,0.0,,,84.0,2.0,0.0,0.0,,3.766725
4,32145159,,,,,,,,,,...,,,,,166.0,,1.407886,1.0,,1.037106


In [7]:
dataset.describe()

Unnamed: 0,stay_id,Arterial Blood Pressure diastolic_max,Arterial Blood Pressure diastolic_mean,Arterial Blood Pressure diastolic_min,Arterial Blood Pressure diastolic_std,Arterial Blood Pressure mean_max,Arterial Blood Pressure mean_mean,Arterial Blood Pressure mean_min,Arterial Blood Pressure mean_std,Arterial Blood Pressure systolic_max,...,Vti High_min,Current Dyspnea Assessment_min,Respiratory Rate (Set)_max,Fspn High_mean,Glucose finger stick (range 70-100)_min,Current Dyspnea Assessment_max,Pain Level Response_std,Richmond-RAS Scale_max,Total PEEP Level_min,icu_los
count,133.0,55.0,55.0,55.0,55.0,55.0,55.0,55.0,55.0,55.0,...,54.0,49.0,52.0,54.0,101.0,49.0,52.0,131.0,44.0,133.0
mean,35121300.0,76.363636,58.902884,46.345455,7.54783,107.618182,78.728014,58.890909,12.570543,143.963636,...,214.923889,0.183673,17.961538,24.717041,127.594059,0.77551,1.039797,0.015267,6.663636,3.033942
std,2821390.0,12.831333,7.008558,8.016691,2.973076,47.43748,14.352244,12.385285,17.033965,18.345414,...,421.546876,0.666879,4.830303,11.852365,47.11182,1.747204,0.940737,1.451708,3.141255,2.710199
min,30057450.0,53.0,43.304348,23.0,1.246423,79.0,49.26087,14.0,2.56348,112.0,...,1.0,0.0,12.0,10.0,66.0,0.0,0.0,-5.0,5.0,0.023727
25%,32554130.0,67.5,54.114103,43.0,5.633243,89.0,73.774359,55.0,7.271386,131.5,...,1.2,0.0,14.0,14.25,94.0,0.0,0.283473,0.0,5.0,1.161019
50%,35146800.0,75.0,59.037037,47.0,7.369452,99.0,77.125,60.0,9.065201,143.0,...,1.4,0.0,16.0,21.25,115.0,0.0,0.865544,0.0,5.0,2.022963
75%,37267580.0,83.5,62.98218,51.5,8.568108,105.5,80.731852,66.0,11.514194,160.0,...,1.9,0.0,20.5,34.75,153.0,1.0,1.5,0.0,6.5,4.446551
max,39880770.0,120.0,74.428571,64.0,18.167606,354.0,164.333333,83.0,126.229421,180.0,...,1500.0,4.0,35.0,55.0,305.0,8.0,4.242641,4.0,18.0,13.214676


The following cells override `data_type` with `timeseries` to generate a dataframe with timeseries data. Each sample represents a measurement at timestamp `abs_event_time` for the patient identified by `subject_id` and `stay_id`.

In [8]:
cfg_time = compose(config_name="train_config", overrides=["data_type=timeseries"])

In [9]:
datamodule_time: DataModule = hydra.utils.instantiate(cfg_time.dataset.datamodule)

In [10]:
dataset_time = datamodule_time.pipeline(
    cfg_time.data_type, cfg_time.feature_list, cfg_time.target
)
dataset_time.head()



building MIMIC-IV timeseries features


subject_id,abs_event_time,stay_id,Activity / Mobility (JH-HLM),Apnea Interval,Arterial Blood Pressure Alarm - High,Arterial Blood Pressure Alarm - Low,Arterial Blood Pressure diastolic,Arterial Blood Pressure mean,Arterial Blood Pressure systolic,Current Dyspnea Assessment,...,Temperature,Urea Nitrogen,White Blood Cells,pCO2,pH,pO2,Time in the ICU,icu_los,icu_los_hour_int,remaining_icu_los_hour
10023117,1970-01-01 01:00:00,30057454,,,,,64.8,75.0,96.8,,...,,,,,,,1.0,4.446551,106,105.0
10023117,1970-01-01 02:00:00,30057454,,,,,62.333333,72.0,93.333333,,...,,45.0,12.7,,,,2.0,4.446551,106,104.0
10023117,1970-01-01 03:00:00,30057454,,,,,64.0,76.0,99.0,,...,,45.0,12.7,45.0,7.4,76.0,3.0,4.446551,106,103.0
10023117,1970-01-01 04:00:00,30057454,,,,,67.0,78.0,102.0,,...,,45.0,12.7,45.0,7.4,76.0,4.0,4.446551,106,102.0
10023117,1970-01-01 05:00:00,30057454,,,,,67.0,79.0,105.0,,...,,45.0,12.7,45.0,7.4,76.0,5.0,4.446551,106,101.0
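
The last four columns of the rows above appear to be related as follows: `icu_los` is the length of stay in days, `icu_los_hour_int` is that value in whole hours, and `remaining_icu_los_hour` is the difference between `icu_los_hour_int` and `Time in the ICU`. The sketch below reproduces the displayed values under that reading. The relation is inferred from the output only, not from the pipeline source, and `remaining_los_hours` is a hypothetical helper.

```python
def remaining_los_hours(icu_los_days: float, hours_in_icu: float) -> tuple[int, float]:
    """Return (icu_los_hour_int, remaining_icu_los_hour) for one sample.

    Assumes the length of stay is converted to hours and truncated,
    which matches the rows shown above (4.446551 days -> 106 hours).
    """
    icu_los_hour_int = int(icu_los_days * 24)
    return icu_los_hour_int, icu_los_hour_int - hours_in_icu


for hours in (1.0, 2.0, 3.0):
    print(hours, remaining_los_hours(4.446551, hours))
```

For the first displayed row (`Time in the ICU` of 1.0) this yields 106 and 105.0, matching the `icu_los_hour_int` and `remaining_icu_los_hour` columns.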


In [11]:
dataset_time.describe()

Unnamed: 0,abs_event_time,stay_id,Activity / Mobility (JH-HLM),Apnea Interval,Arterial Blood Pressure Alarm - High,Arterial Blood Pressure Alarm - Low,Arterial Blood Pressure diastolic,Arterial Blood Pressure mean,Arterial Blood Pressure systolic,Current Dyspnea Assessment,...,Temperature,Urea Nitrogen,White Blood Cells,pCO2,pH,pO2,Time in the ICU,icu_los,icu_los_hour_int,remaining_icu_los_hour
count,2916,2916.0,1254.0,1124.0,947.0,946.0,1133.0,1149.0,1133.0,978.0,...,399.0,1617.0,1624.0,1198.0,1441.0,1198.0,2916.0,2916.0,2916.0,2916.0
mean,1970-01-01 12:18:36.049382716,35105060.0,2.199761,20.867438,141.800422,84.139535,59.068482,78.05605,115.460041,0.391616,...,36.809023,26.427953,15.295813,41.75626,7.06486,124.605593,12.310014,3.22752,76.927298,64.617284
min,1970-01-01 00:00:00,30057450.0,1.0,20.0,90.0,45.0,23.0,-23.0,58.0,0.0,...,35.3,5.0,0.4,20.0,5.0,18.0,0.0,0.023727,0.0,-2.0
25%,1970-01-01 06:45:00,32506120.0,2.0,20.0,130.0,85.0,52.857143,69.0,102.0,0.0,...,36.4,13.0,8.8,35.0,7.23,64.0,6.75,1.207407,28.0,18.0
50%,1970-01-01 12:00:00,35128240.0,2.0,20.0,140.0,90.0,59.0,75.571429,113.0,0.0,...,36.7,19.0,11.3,41.0,7.36,96.5,12.0,2.199653,52.0,41.0
75%,1970-01-01 18:00:00,37267580.0,2.0,20.0,160.0,90.0,65.0,84.0,127.0,0.0,...,37.2,34.0,17.1,45.0,7.41,154.0,18.0,4.844086,116.0,98.0
max,1970-01-02 00:00:00,39880770.0,8.0,60.0,190.0,120.0,120.0,354.0,172.0,8.0,...,40.4,108.0,116.1,112.0,7.54,492.0,24.0,13.214676,317.0,316.0
std,,2813006.0,0.870144,5.000378,22.301634,14.152517,10.48662,23.484846,19.130344,1.263215,...,0.843788,19.423974,14.802951,10.242194,0.638164,92.633184,6.779237,2.702597,64.860871,64.856456
