# 1. Data Loading

This notebook reflects the adaptivity use case in the Cardea [paper](https://arxiv.org/abs/2010.00509). It loads the MIMIC data from a specified location into an entityset representation, then stores the entityset in a pickle file.

In [1]:
import pickle

from cardea.data_loader.load_mimic import load_mimic_data

To load the MIMIC data, we use the `load_mimic_data` function, which translates the MIMIC schema into its entityset representation. The loaded data can be any subset of MIMIC and need not include all of the tables. To include only certain tables in the entityset, pass `subset=['table1', 'table2', ..]`.

Depending on the size of the tables, this can take up to several minutes.
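Before choosing a `subset`, it can help to check which tables are actually present in the data folder. The following is a minimal sketch, not part of Cardea: `list_available_tables` is a hypothetical helper, and it assumes each table is stored as a `.csv` file named after the table. The `wanted` list is likewise only an illustration.

```python
from pathlib import Path


def list_available_tables(folder_path):
    """Return the lowercase table names of all .csv files in folder_path.

    Hypothetical helper: assumes one .csv file per table, named after it.
    """
    return sorted(p.stem.lower() for p in Path(folder_path).glob('*.csv'))


# Illustrative selection: keep only the wanted tables that are present.
wanted = ['admissions', 'patients', 'labevents']
available = list_available_tables('path/to/data/')
subset = [name for name in wanted if name in available]

# The resulting list could then be passed to the loader, for example:
# entityset = load_mimic_data('path/to/data/', subset=subset)
```

Loading fewer tables should also shorten the load time, since large tables such as `chartevents` dominate the row count.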

In [2]:
folder_path = 'path/to/data/'

entityset = load_mimic_data(folder_path)
entityset

Entityset: mimic
  Entities:
    admissions [Rows: 1247, Columns: 19]
    callout [Rows: 721, Columns: 24]
    chartevents [Rows: 7401634, Columns: 15]
    cptevents [Rows: 12352, Columns: 12]
    datetimeevents [Rows: 97853, Columns: 14]
    diagnoses_icd [Rows: 13886, Columns: 5]
    drgcodes [Rows: 2652, Columns: 8]
    icustays [Rows: 1312, Columns: 12]
    inputevents_cv [Rows: 389220, Columns: 22]
    inputevents_mv [Rows: 76656, Columns: 31]
    labevents [Rows: 635846, Columns: 9]
    microbiologyevents [Rows: 13020, Columns: 16]
    outputevents [Rows: 91378, Columns: 13]
    patients [Rows: 1000, Columns: 8]
    prescriptions [Rows: 91742, Columns: 19]
    procedureevents_mv [Rows: 5204, Columns: 25]
    procedures_icd [Rows: 5362, Columns: 5]
    services [Rows: 1579, Columns: 6]
    transfers [Rows: 5764, Columns: 13]
  Relationships:
    callout.hadm_id -> admissions.hadm_id
    chartevents.hadm_id -> admissions.hadm_id
    cptevents.hadm_id -> admissions.hadm_id
    datet

An entityset is described by its "entities" and "relationships". The entities are the tables of the data, and each relationship records how one table is related to another and through which variable. For example, `labevents.hadm_id -> admissions.hadm_id` means that the column `hadm_id` in `labevents` references an admission instance through `hadm_id` in `admissions`.
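To make the idea of a relationship concrete, here is a small sketch in plain Python that does not use the entityset API. The two lists are toy stand-ins for `admissions` and `labevents`, with made-up values, and `admission_for` is a hypothetical helper that follows `labevents.hadm_id -> admissions.hadm_id`.

```python
# Toy stand-ins for two tables; all values are made up for illustration.
admissions = [
    {'hadm_id': 100, 'subject_id': 1},
    {'hadm_id': 101, 'subject_id': 2},
]
labevents = [
    {'row_id': 1, 'hadm_id': 100, 'itemid': 50912},
    {'row_id': 2, 'hadm_id': 100, 'itemid': 50971},
    {'row_id': 3, 'hadm_id': 101, 'itemid': 50912},
]

# Index the parent table (admissions) by its key column.
admissions_by_id = {row['hadm_id']: row for row in admissions}


def admission_for(lab_row):
    """Follow labevents.hadm_id -> admissions.hadm_id for one lab event."""
    return admissions_by_id[lab_row['hadm_id']]


# Lab events whose hadm_id has no matching admission would break the
# relationship; in this toy data there are none.
orphans = [row for row in labevents if row['hadm_id'] not in admissions_by_id]
```

Many lab events can point to the same admission, so the relationship is many-to-one from the child table (`labevents`) to the parent table (`admissions`).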

In [3]:
with open('./mimic_entityset.pkl', 'wb') as file:
    pickle.dump(entityset, file)
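A later notebook can restore the stored object with `pickle.load`. The sketch below shows the round trip on a stand-in dictionary written to a temporary directory, so it runs without the MIMIC data; `entityset_stub` is a placeholder for the real entityset.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object; in the notebook this would be the entityset itself.
entityset_stub = {'id': 'mimic', 'entities': ['admissions', 'patients']}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / 'mimic_entityset.pkl'

    # Write the object in binary mode, as in the cell above.
    with open(pkl_path, 'wb') as f:
        pickle.dump(entityset_stub, f)

    # Read it back, also in binary mode.
    with open(pkl_path, 'rb') as f:
        restored = pickle.load(f)
```

Unpickling a real entityset requires the libraries that define its classes to be installed in the loading environment. Pickle files can execute arbitrary code when loaded, so only load files from a trusted source.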