# EXPLORING THE RELATIONSHIP BETWEEN CHART EVENTS AND TIME TO ICU STAY
Note: this dataset is freely available, but access is restricted. You must request access via https://mimic.physionet.org/gettingstarted/access/. I downloaded the dataset onto my personal computer and ran the analyses locally. Given HIPAA and privacy considerations, I will only show summary plots from data in the database (no patient-specific information will be displayed).

This notebook uses previously analyzed data on chart events (e.g., vital signs and lab results) that occurred between hospital admission and the ICU stay, to test whether certain events or measurements may predict "imminent" (e.g., <=1 day away) ICU stays.<br>

Briefly, data from the PATIENTS, ADMISSIONS, ICUSTAYS, PRESCRIPTIONS, and CHARTEVENTS tables are merged on subject ID, hospital admission ID, and ICU stay ID. Only drugs prescribed and chart events recorded after the hospital admission time and before the ICU stay are included in the dataframe. Finally, 
# NEED TO COMPLETE 
<br>
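
The merge and time-window filter described above could look roughly like the following minimal sketch. It runs on tiny made-up tables rather than MIMIC data and covers only ADMISSIONS, ICUSTAYS, and CHARTEVENTS. `ADMITTIME` and `INTIME` are the admission and ICU-entry timestamp columns of ADMISSIONS and ICUSTAYS; the toy values, the `IMMINENT` column, and the exact way `DAYS_CHRT_TO_ICU` is computed here are assumptions for illustration, not the code from the repository.

```python
import pandas as pd

# Toy stand-ins for the MIMIC tables (all values are made up for illustration).
admissions = pd.DataFrame({
    "SUBJECT_ID": [1], "HADM_ID": [10],
    "ADMITTIME": pd.to_datetime(["2100-01-01 00:00"]),
})
icustays = pd.DataFrame({
    "SUBJECT_ID": [1], "HADM_ID": [10], "ICUSTAY_ID": [100],
    "INTIME": pd.to_datetime(["2100-01-03 00:00"]),
})
chartevents = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 1], "HADM_ID": [10, 10, 10], "ICUSTAY_ID": [100, 100, 100],
    "ITEMID": [781, 787, 791],
    "CHARTTIME": pd.to_datetime(
        ["2100-01-01 12:00", "2100-01-02 12:00", "2100-01-04 00:00"]
    ),
})

# Merge on the shared identifiers.
merged = (
    chartevents
    .merge(icustays, on=["SUBJECT_ID", "HADM_ID", "ICUSTAY_ID"])
    .merge(admissions, on=["SUBJECT_ID", "HADM_ID"])
)

# Keep only chart events recorded after admission and before the ICU stay.
in_window = (merged["CHARTTIME"] >= merged["ADMITTIME"]) & (
    merged["CHARTTIME"] < merged["INTIME"]
)
pre_icu = merged[in_window].copy()

# Days from chart event to ICU entry (assumed definition of DAYS_CHRT_TO_ICU).
pre_icu["DAYS_CHRT_TO_ICU"] = (
    pre_icu["INTIME"] - pre_icu["CHARTTIME"]
) / pd.Timedelta(days=1)

# Hypothetical label: an event is "imminent" if the ICU stay starts within 1 day.
pre_icu["IMMINENT"] = pre_icu["DAYS_CHRT_TO_ICU"] <= 1
```

In this toy example the third chart event falls after ICU entry and is dropped, and only the second event counts as imminent.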

The code to perform these analyses can be found on my GitHub page (https://github.com/adamgiffordphd/imminent_icu_stays). It includes functionality to run the analysis in parallel in order to get through all ~330M rows in CHARTEVENTS.csv. The code was run on a private server with 40 processors.
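
As a rough illustration of the chunked, parallel approach (not the repository's actual implementation), the sketch below reads a CSV in fixed-size chunks, hands each chunk to a worker, and combines the partial results. The in-memory CSV, the `count_items` helper, and the use of a thread pool are assumptions made to keep the example small and runnable. Note that `pool.map` submits every chunk up front, so a real run over a file this large would also need to limit how many chunks are in flight at once.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Tiny in-memory stand-in for CHARTEVENTS.csv (made-up rows).
csv_text = "SUBJECT_ID,ITEMID,VALUENUM\n" + "\n".join(
    f"{i % 3},{781 + i % 2},{i}" for i in range(10)
)

def count_items(chunk):
    """Hypothetical per-chunk work: count rows per ITEMID."""
    return chunk.groupby("ITEMID").size()

# Read the file in chunks of 4 rows (a real run would use a much larger chunksize).
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Process chunks concurrently; a process pool could be swapped in
# for CPU-bound work on a many-core machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_items, chunks))

# Combine the per-chunk results into one total per ITEMID.
total_counts = pd.concat(partial_counts).groupby(level=0).sum()
```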

In [1]:
import pickle
import glob

In [4]:
# there are ~3300 pickle files that contain the data that is described above
# this cell finds the pickle files in the saved data directory
# drugs.pickle is a list of all unique drugs in the dataset
pckl_files = glob.glob("pickle/20200811/*.pickle")

In [26]:
# load one file as an example; the context manager closes the file handle
with open(pckl_files[1], 'rb') as f:
    df = pickle.load(f)

In [27]:
df.sample(5)

Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,ITEMID,CHARTTIME,VALUE,VALUENUM,VALUEUOM,GENDER,DOB,...,PROD_STRENGTH,DOSE_VAL_RX,DOSE_UNIT_RX,FORM_VAL_DISP,FORM_UNIT_DISP,ROUTE,DAYS_ADM_TO_DRUG,DAYS_DRUG_BEFORE_ICU,SAMEDAY_ADM_TO_ICU,DAYS_CHRT_TO_ICU
589028,19316,198168,257555,924,2132-11-05 02:27:00,,,,M,2072-03-29,...,100mg/2mL Vial,100,mg,1,VIAL,IV,-0.066667,0.10265,1,0.000567
128895,19185,103196,281157,781,2195-12-03 03:11:00,11,11.0,,M,2118-05-13,...,1000mg/100mL Vial,1000,mg,1,VIAL,IV DRIP,2.416667,0.161563,0,0.028924
129784,19185,103196,281157,1536,2195-12-03 03:11:00,139,139.0,,M,2118-05-13,...,500mg Premix Bag,500,mg,1,BAG,IV,2.416667,0.161563,0,0.028924
589074,19316,198168,257555,926,2132-11-05 02:27:00,RCA,,,M,2072-03-29,...,1g Frozen Bag,1,gm,1,BAG,IV,-0.066667,0.10265,1,0.000567
128881,19185,103196,281157,781,2195-12-03 03:11:00,11,11.0,,M,2118-05-13,...,1000 mL Bag,1000,ml,1,BAG,IV,2.416667,0.161563,0,0.028924


In [43]:
df.reset_index(inplace=True)
imm_idx = df[df['DAYS_CHRT_TO_ICU']<=1].index
imm_idx

Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            4983, 4984, 4985, 4986, 4987, 4988, 4989, 4990, 4991, 4992],
           dtype='int64', length=4993)

In [44]:
df.shape

(4993, 48)

In [45]:
# imm_idx holds index labels, so select with .loc (same result as .iloc after reset_index)
df2 = df.loc[imm_idx]
df2.head()

Unnamed: 0,index,SUBJECT_ID,HADM_ID,ICUSTAY_ID,ITEMID,CHARTTIME,VALUE,VALUENUM,VALUEUOM,GENDER,...,PROD_STRENGTH,DOSE_VAL_RX,DOSE_UNIT_RX,FORM_VAL_DISP,FORM_UNIT_DISP,ROUTE,DAYS_ADM_TO_DRUG,DAYS_DRUG_BEFORE_ICU,SAMEDAY_ADM_TO_ICU,DAYS_CHRT_TO_ICU
0,48,19246,124035,203260,781,2128-09-04,75,75.0,,F,...,"25,000 unit Premix Bag",25000,UNIT,1,BAG,IV,-0.059722,0.097303,1,0.097303
1,49,19246,124035,203260,781,2128-09-04,75,75.0,,F,...,250mg Tab,250,mg,1,TAB,PO,-0.059722,0.097303,1,0.097303
2,50,19246,124035,203260,781,2128-09-04,75,75.0,,F,...,HEPARIN BASE,250,ml,250,ml,IV,-0.059722,0.097303,1,0.097303
3,51,19246,124035,203260,781,2128-09-04,75,75.0,,F,...,80MG TAB,80,mg,1,TAB,PO,-0.059722,0.097303,1,0.097303
4,52,19246,124035,203260,781,2128-09-04,75,75.0,,F,...,"25,000 unit Premix Bag",25000,UNIT,1,BAG,IV,-0.059722,0.097303,1,0.097303


In [46]:
df2.shape

(4993, 48)

In [28]:
from numpy import nanmean, unique
# collect every ITEMID recorded for each subject / admission / ICU stay
df_bySubjAdICU = df.groupby(['SUBJECT_ID','HADM_ID','ICUSTAY_ID']).agg({'ITEMID': [list]})
# other aggregations considered:
# 'DAYS_ADM_TO_ICU': [nanmean],'DAYS_ADM_TO_DRUG': [nanmean],'DAYS_DRUG_BEFORE_ICU': [nanmean]

In [29]:
df_bySubjAdICU.columns

MultiIndex([('ITEMID', 'list')],
           )

In [30]:
# reduce each list of ITEMIDs to its sorted unique values
df_bySubjAdICU[('ITEMID','list')] = df_bySubjAdICU[('ITEMID','list')].apply(unique)
df_bySubjAdICU

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,ITEMID
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,list
SUBJECT_ID,HADM_ID,ICUSTAY_ID,Unnamed: 3_level_2
19185,103196,281157,"[772, 781, 786, 787, 788, 791, 811, 813, 814, ..."
19246,100942,212853,"[742, 1125]"
19246,103522,239485,"[27, 31, 32, 54, 70, 71, 72, 77, 80, 82, 83, 8..."
19246,124035,203260,"[781, 787, 788, 791, 811, 813, 814, 815, 824, ..."
19246,129654,274628,"[824, 828, 829, 837, 861, 1127, 1162, 1286, 15..."
19246,150429,283770,"[781, 784, 786]"
19310,157811,250035,"[27, 31, 32, 39, 40, 50, 52, 54, 69, 80, 82, 8..."
19316,198168,257555,"[916, 917, 919, 920, 924, 925, 926, 927, 930, ..."


In [37]:
df_bySubjAdICUCHRT = df.groupby(['SUBJECT_ID','HADM_ID','ICUSTAY_ID','ITEMID']).agg({'SAMEDAY_ADM_TO_ICU': [nanmean],'DAYS_CHRT_TO_ICU': [nanmean] })
df_bySubjAdICUCHRT.head(50)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,SAMEDAY_ADM_TO_ICU,DAYS_CHRT_TO_ICU
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,nanmean,nanmean
SUBJECT_ID,HADM_ID,ICUSTAY_ID,ITEMID,Unnamed: 4_level_2,Unnamed: 5_level_2
19185,103196,281157,772,0,0.028924
19185,103196,281157,781,0,0.028924
19185,103196,281157,786,0,0.028924
19185,103196,281157,787,0,0.028924
19185,103196,281157,788,0,0.028924
19185,103196,281157,791,0,0.028924
19185,103196,281157,811,0,0.028924
19185,103196,281157,813,0,0.028924
19185,103196,281157,814,0,0.028924
19185,103196,281157,821,0,0.028924


In [6]:
# combine the data across pickle files
# note: the data have to be loaded and combined in batches because a single combined
# dataframe would be too large. batches of ~50 files are used, with running stats
# computed for visualization and assessment
import pandas as pd
from numpy import unique

batch_size = 50
for st_ix in range(0, len(pckl_files), batch_size):
    en_ix = min(st_ix + batch_size, len(pckl_files))

    # load every file in this batch, then concatenate once
    # (DataFrame.append was removed in pandas 2.0)
    frames = []
    for fname in pckl_files[st_ix:en_ix]:
        with open(fname, 'rb') as f:
            frames.append(pickle.load(f))
    df = pd.concat(frames)

    if st_ix == 0:
        df_bySubjAdICU = df.groupby(['SUBJECT_ID','HADM_ID','ICUSTAY_ID']).agg({'ITEMID': [list]})
        df_bySubjAdICU[('ITEMID','list')] = df_bySubjAdICU[('ITEMID','list')].apply(unique)
    

'pickle/20200811/2020_08_11_17_04_11_308426.pickle'
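
Because each batch replaces `df`, any statistic has to be accumulated across batches rather than computed on the full dataset. One way to do this, sketched below on made-up batches, is to keep a running sum and count per `ITEMID` and divide at the end. The `batches` list and the choice of mean `DAYS_CHRT_TO_ICU` per `ITEMID` as the statistic are assumptions for illustration only.

```python
import pandas as pd

# Made-up batches standing in for the per-batch dataframes.
batches = [
    pd.DataFrame({"ITEMID": [781, 781, 787], "DAYS_CHRT_TO_ICU": [0.5, 1.5, 2.0]}),
    pd.DataFrame({"ITEMID": [781, 791], "DAYS_CHRT_TO_ICU": [1.0, 4.0]}),
]

running = None  # accumulates per-ITEMID sum and count across batches
for batch in batches:
    # "sum" and "count" both ignore NaN, so the final ratio matches a nanmean
    stats = batch.groupby("ITEMID")["DAYS_CHRT_TO_ICU"].agg(["sum", "count"])
    # add() with fill_value=0 keeps ITEMIDs that appear in only one batch
    running = stats if running is None else running.add(stats, fill_value=0)

# Mean over all batches, computed from the accumulated sums and counts.
running["mean"] = running["sum"] / running["count"]
```

Only the small `running` table persists between batches, so memory use depends on the number of distinct `ITEMID` values rather than the number of rows.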