# ICUsICS DB tutorial

ICUsICS is an anonymized database built from the data stored into the Clinical Information System (CIS) database of 6 Intensive Care Units (ICUs) from the Catalan Institute of Health (ICS). Actually, it is a database of databases, because each ICU belongs to a different hospital and each CIS presents its own particularities.

ICUsICS is not hosted as a database 'per se', but as a directory of folders (tables) with parquet files inside (registries). Together with this tutorial there is a .png image (db_map.png) that shows the tables relation and fields names and types. This will be extremely helpful to search, fetch and merge information.

As you can see in the map, there are a total of 13 tables:

patients, d_variables and d_pharma are very small tables. patients contain some (few due to the anonymozed process) patient-level info and d_variables and d_pharma are dictionaries that contain all the metadata info needed to search the data (names, descriptions, table where data is stored, etc.).

diagnoses, pharma_orders, pharma_records, labresults_numeric, observation_numeric, observation_flagged, observation_categoric, monitored_categoric are medium-size tables, which mean they are splitted by hospital in 6 chunks (the parquet files inside contain the suffix \_h1, \_h2, \_h3, \_h4, \_h5, or \_h6). This will be useful to avoid reading registries of hospitals you don't want to include in your dataset, optimizing queries time.

monitored_numeric and derived_numeric are very large tables, which mean they are splitted by groups of patients in 600 chunks (100 chunks per hospital). The chunks have as suffix the first and last a_patientid in heach chunk (example: 101_109.parquet). This will be useful to avoid reading registries of patients you don't want to include in your dataset, optimizing queries time.

This tutorial contains demo code to build a dataset using data of hospital 3.

#### First, install fastparquet and awswrangler

In [2]:
!pip install fastparquet

Keyring is skipped due to an exception: 'keyring.backends'
Collecting fastparquet
  Using cached fastparquet-0.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
Collecting cramjam>=2.3.0
  Using cached cramjam-2.6.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: cramjam, fastparquet
Successfully installed cramjam-2.6.2 fastparquet-0.8.1
[0mNote: you may need to restart the kernel to use updated packages.


In [13]:
!pip install awswrangler

Keyring is skipped due to an exception: 'keyring.backends'
[0m

#### Then, import packages

In [17]:
import awswrangler as wr
import boto3
import pandas as pd
import numpy as np
import re
from sagemaker import get_execution_role
role = get_execution_role()
s3 = boto3.resource('s3')

## Before starting the creating cohort example, let's take a look to the patients table to get a general idea of the number of patients x hospital, mortality and lenght of stay distribution

In [18]:
%%time

icusics_db_patients = wr.s3.read_parquet(path="s3://icusics-db/patients/patients.parquet")

CPU times: user 75.5 ms, sys: 9.61 ms, total: 85.1 ms
Wall time: 235 ms


In [19]:
icusics_db_patients.head()

Unnamed: 0,hospital_coded,a_patientid,patientsex,age,height,weight,bmi,hospadmtime,distime,hospdistime,hospital_outcome
0,1,1000091,M,70,160,50,20,-549,6034,24482,ALIVE
1,1,1000109,M,50,160,80,31,-12516,10118,32875,ALIVE
2,1,1000211,M,70,160,60,23,-13013,2495,2495,EXITUS
3,1,1000999,F,30,150,60,27,-1731,8597,21402,ALIVE
4,1,1001000,F,50,160,80,31,0,31713,31830,EXITUS


#### Check if the database is correctly k5 anonymized

In [20]:
print('minimum number of patients grouped by indirect identifiers:', icusics_db_patients.groupby(
    ['hospital_coded','patientsex','age',
    'height','weight','bmi','hospital_outcome']).agg({'a_patientid':'nunique'})['a_patientid'].min())

minimum number of patients grouped by indirect identifiers: 5


#### Nº of patients in the database

In [21]:
print('number of patinets in the database:',icusics_db_patients['a_patientid'].nunique())
print('number of patients per hospital:')
icusics_db_patients.groupby('hospital_coded', as_index=False).agg({'a_patientid':'nunique'})

number of patinets in the database: 21139
number of patients per hospital:


Unnamed: 0,hospital_coded,a_patientid
0,1,4519
1,2,4653
2,3,3815
3,4,3949
4,5,2025
5,6,2178


#### Hospital mortality (note that this is not the real mortality of the hospital, but the mortality of the people who has been included in its anonymized version)

In [22]:
icusics_db_patients.groupby('hospital_coded')['hospital_outcome'].value_counts(normalize=True)

hospital_coded  hospital_outcome
1               ALIVE               0.738659
                EXITUS              0.261341
2               ALIVE               0.696970
                EXITUS              0.303030
3               ALIVE               0.747575
                EXITUS              0.252425
4               ALIVE               0.787035
                EXITUS              0.212965
5               ALIVE               0.881481
                EXITUS              0.118519
6               ALIVE               0.831956
                EXITUS              0.168044
Name: hospital_outcome, dtype: float64

#### ICU Length of stay (ICU LOS)

In [23]:
icusics_db_patients['distime_in_days'] = round(icusics_db_patients['distime']/1440,1)

In [24]:
print('SPOILER ALERT! you will see how even if the numbers are consistent enough, there are some \
impossible things such as cases with negative distimes. As happen with every database \
containing manually registered fields, this database can, in some cases, have erroneous times and values \
(the raw reality when working with real-world data)')
icusics_db_patients.groupby('hospital_coded')['distime_in_days'].describe()

SPOILER ALERT! you will see how even if the numbers are consistent enough, there are some impossible things such as cases with negative distimes. As happen with every database containing manually registered fields, this database can, in some cases, have erroneous times and values (the raw reality when working with real-world data)


Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
hospital_coded,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,4519.0,6.963023,12.010618,0.0,1.0,2.2,6.9,129.8
2,4653.0,7.049581,10.572153,-0.9,1.8,3.0,7.1,118.0
3,3815.0,6.600813,12.672646,0.0,1.6,2.7,5.8,192.8
4,3949.0,7.161104,12.900796,0.0,1.6,2.7,6.6,307.6
5,2025.0,4.127457,7.196876,0.0,1.0,1.9,3.7,70.1
6,2178.0,5.508586,8.961284,0.0,1.1,2.2,5.3,89.5


In [25]:
print('fortunately, these errors are few:')
round(icusics_db_patients[icusics_db_patients['distime']<0].shape[0]/icusics_db_patients.shape[0]*100,2)

fortunately, these errors are few:


0.01

### Example: Creating a cohort of patients in hospital 3 with:  
1- ICU_LOS > 2 days  
2- Primary diagnose of pneumonia (any type)   
3- Invasive Mechanical Ventilation (IMV)  
4- APACHE II > 20  
5- Lactate (arterial) > 2mmol/L at first ICU day  
6- Sedative Drugs (VAD)  

### 1- ICU_LOS > 2 days

In [26]:
%%time

patients = icusics_db_patients[icusics_db_patients['hospital_coded']==3]

CPU times: user 4.04 ms, sys: 0 ns, total: 4.04 ms
Wall time: 6.22 ms


In [27]:
patients.head()

Unnamed: 0,hospital_coded,a_patientid,patientsex,age,height,weight,bmi,hospadmtime,distime,hospdistime,hospital_outcome,distime_in_days
9172,3,3000004,M,40,180,120,37,-3,10309,17603,ALIVE,7.2
9173,3,3000184,F,80,160,80,31,-4505,3772,15073,ALIVE,2.6
9174,3,3000446,F,60,150,70,31,-381,9245,19099,ALIVE,6.4
9175,3,3000658,F,60,160,110,43,-1411,29873,29873,ALIVE,20.7
9176,3,3000852,F,40,160,90,35,-461,12215,23426,ALIVE,8.5


In [28]:
print(patients.shape)
print(patients['a_patientid'].nunique())

(3815, 12)
3815


In [29]:
los2d = patients[patients['distime']>2880].sort_values('a_patientid').reset_index(drop=True) # All times in the database are integers that represent the minuts from icu admission time

In [30]:
print(los2d['a_patientid'].nunique())
print(round(los2d['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

2312
60.6


### 2- Primary diagnose of pneumonia (any type)

In [31]:
diags = wr.s3.read_parquet(path="s3://icusics-db/diagnoses/diagnoses_h3.parquet")

In [32]:
diags.head()

Unnamed: 0,a_patientid,diag_type,referencecode,referencecodename
0,3000004,primary,F14.12,Abús de cocaïna amb intoxicació
1,3000004,secondary,F14.1,Abús de cocaïna
2,3000004,secondary,R45.1,Agitació i agitació psicomotora
3,3000004,secondary,D68,Altres tipus de defecte de la coagulació
4,3000004,secondary,D69.59,Altres tipus de trombocitopènia secundària


In [33]:
print(diags.shape)
print(diags['a_patientid'].nunique())

(27774, 4)
3815


In [34]:
pd_pneumo_patlist = tuple(set(diags[(diags['diag_type']=='primary') & (diags['referencecodename'].str.contains('pneum', case=False))]['a_patientid']))

In [35]:
los2d_pneumopd = los2d[los2d['a_patientid'].isin(pd_pneumo_patlist)].sort_values('a_patientid').reset_index(drop=True)

In [36]:
print(los2d_pneumopd['a_patientid'].nunique())
print(round(los2d_pneumopd['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

228
5.98


### 3- Invasive Mechanical Ventilation (IMV)  

#### First, import variables_ref table to look for the variable code

In [38]:
d_variables = wr.s3.read_parquet(path="s3://icusics-db/d_variables/d_variables.parquet")

In [39]:
d_variables.head()

Unnamed: 0,hospital_coded,table,a_variableid,choicecode,choicestringvalue,name,abbreviation,description
0,1,derived_numeric,1030000100,,,Chronic health evaluation,CHE,In APACHE II and A2
1,1,derived_numeric,1030000114,,,Daily worst APS,APS,Acute physiology score
2,1,derived_numeric,1030000140,,,Highest 24 h APACHE II,APACHE II,Automatically calculated APACHE II score
3,1,derived_numeric,1030000145,,,risk (R) of hospital death,R(APACHE II),
4,1,derived_numeric,1030000160,,,Major 24 h SAPS II,SAPS II,Càlcul automàtic SAPS II


In [40]:
# define key characters (remember that strings in this db can be in english, catalan or spanish language) to start a blind search

key_chars = 'vent|mec|inv'

imv_result_dummy = d_variables[(d_variables['hospital_coded']==3) & (
    (d_variables['name'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['description'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['choicestringvalue'].str.contains(key_chars, case=False, na=False)))]

print(imv_result_dummy.shape)

print("To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is an observed_categoric \
feature named 'Teràpia real O2' with the choicestringcode 'Vent Mecànica'")

(335, 8)
To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is an observed_categoric feature named 'Teràpia real O2' with the choicestringcode 'Vent Mecànica'


In [41]:
imv_result_dummy.head()

Unnamed: 0,hospital_coded,table,a_variableid,choicecode,choicestringvalue,name,abbreviation,description
9885,3,derived_numeric,3030001400,,,LVSW_left ventric stroke work,LVSW,"= SV x ARTmean x 0.0136 ; SV ml, ARTmean mmH..."
9886,3,derived_numeric,3030001410,,,LVSWi_left vent stroke w index,LVSWi,"= SI x ARTmean x 0.0136 ; SI ml/m², ARTmean ..."
9887,3,derived_numeric,3030001500,,,RVSW_right ventric stroke work,RVSW,"= SV x PAPmean x 0.0136 ; SV ml, PAPmean mmH..."
9888,3,derived_numeric,3030001510,,,RVSW_Right vent stroke w index,RVSWi,"= SVI x PAPmean x 0.0136 ; SVI ml/m², PAPmea..."
10461,3,monitored_categoric,3000003812,1.0,IPPV,Evita 2_4 ventilation mode,Evita24 Mode,Draeger Evita 2 dura and Evita 4 ventilation mode


In [42]:
imv_result = imv_result_dummy[(imv_result_dummy['table']=='observed_categoric') & (imv_result_dummy['name'].str.contains('Teràpia real O2', case=False, na=False)) & (
    imv_result_dummy['choicestringvalue'].str.contains('Vent Mecànica', case=False, na=False))]

print('So you finally get your result:')
imv_result

So you finally get your result:


Unnamed: 0,hospital_coded,table,a_variableid,choicecode,choicestringvalue,name,abbreviation,description
12241,3,observed_categoric,3015002262,12.0,Vent Mecànica,Teràpia real O2,O2 Teràpia,DI 21.CONTROL RESPIRATORI. Variable utilitzada...


#### Get patients with IMV registries for those who have accomplished with the inclusion criteria up to now

In [43]:
%%time

moncat = wr.s3.read_parquet(path="s3://icusics-db/observed_categoric/observed_categoric_h3.parquet")

CPU times: user 481 ms, sys: 653 ms, total: 1.13 s
Wall time: 1.64 s


In [44]:
moncat.head()

Unnamed: 0,a_patientid,a_variableid,time,choicecode,h
0,3000004,3010000100,1019,6.0,3
1,3000004,3010000100,2879,6.0,3
2,3000004,3010000100,2999,6.0,3
3,3000004,3010000100,3079,6.0,3
4,3000004,3010000100,3140,6.0,3


In [45]:
%%time

imv_patlist = tuple(set(moncat[(moncat['a_variableid']==3015002262) & (moncat['choicecode']==12)]['a_patientid']))

CPU times: user 74.1 ms, sys: 102 ms, total: 176 ms
Wall time: 289 ms


In [46]:
los2d_pneumopd_imv = los2d_pneumopd[los2d_pneumopd['a_patientid'].isin(imv_patlist)].sort_values('a_patientid').reset_index(drop=True)

In [47]:
print(los2d_pneumopd_imv['a_patientid'].nunique())
print(round(los2d_pneumopd_imv['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

131
3.43


### 4- APACHE II > 20 

In [48]:
# define key characters (remember that strings in this db can be in english, catalan or spanish language) to start a blind search

key_chars = 'apache'

apache_result_dummy = d_variables[(d_variables['hospital_coded']==3) & (
    (d_variables['name'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['description'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['choicestringvalue'].str.contains(key_chars, case=False, na=False)))]

print(apache_result_dummy.shape)

print("To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is a derived_numeric \
feature named 'APACHE 2 validado'")

(60, 8)
To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is a derived_numeric feature named 'APACHE 2 validado'


In [49]:
apache_result = apache_result_dummy[(apache_result_dummy['table']=='derived_numeric') & (apache_result_dummy['name'].str.contains('apache 2 validado', case=False, na=False))]

print('So you finally get your result:')
apache_result

So you finally get your result:


Unnamed: 0,hospital_coded,table,a_variableid,choicecode,choicestringvalue,name,abbreviation,description
9870,3,derived_numeric,3030000350,,,APACHE 2 validado,APACHE 2 man,Validated APACHE II score


#### Get patients with an APACHE II higher of 20 for those who have accomplished with the inclusion criteria up to now

In [50]:
%%time

bucket = s3.Bucket('icusics-db')
apacheII_20_patlist = tuple()

for my_bucket_object in bucket.objects.all():
    
    if all(x in my_bucket_object.key for x in ['derived_numeric_']):
        
        boundaries = re.findall('\d+', my_bucket_object.key)
        ub = boundaries.pop(1)
        lb = boundaries.pop()
        for patid in sorted(list(set(los2d_pneumopd_imv['a_patientid'].astype(str)))):
            if patid>lb and patid<ub:
                chunk = wr.s3.read_parquet(path=f"s3://icusics-db/{my_bucket_object.key}")
                apacheII_20_patlist_chunk = tuple(set(chunk[(chunk['a_variableid']==3030000350) & (chunk['value']>20) & (
                    chunk['a_patientid'].isin(tuple(set(los2d_pneumopd_imv['a_patientid']))))]['a_patientid']))
                apacheII_20_patlist = apacheII_20_patlist + apacheII_20_patlist_chunk
                
apacheII_20_patlist_unique = tuple(set(apacheII_20_patlist))

CPU times: user 20.2 s, sys: 4.99 s, total: 25.2 s
Wall time: 54.6 s


In [51]:
len(apacheII_20_patlist_unique)

60

In [52]:
los2d_pneumopd_imv_apacheII20 = los2d_pneumopd_imv[los2d_pneumopd_imv['a_patientid'].isin(apacheII_20_patlist_unique)].sort_values('a_patientid').reset_index(drop=True)

In [53]:
print(los2d_pneumopd_imv_apacheII20['a_patientid'].nunique())
print(round(los2d_pneumopd_imv_apacheII20['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

60
1.57


### 5- Lactate (arterial) > 2mmol/L at first ICU day 

In [54]:
# define key characters (remember that strings in this db can be in english, catalan or spanish language) to start a blind search

key_chars = 'lactat'

lactate_result_dummy = d_variables[(d_variables['hospital_coded']==3) & (
    (d_variables['name'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['description'].str.contains(key_chars, case=False, na=False)) | (
    d_variables['choicestringvalue'].str.contains(key_chars, case=False, na=False)))]

print(lactate_result_dummy.shape)

print("To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is a labresult_numeric \
feature that contains 'GSA' in its abbreviation")

(7, 8)
To many results, so you decide to ask to the mentors and they say that for that hospital, this feature is a labresult_numeric feature that contains 'GSA' in its abbreviation


In [55]:
lactate_result = lactate_result_dummy[(lactate_result_dummy['table']=='labresults_numeric') & (lactate_result_dummy['name'].str.contains('lactat', case=False, na=False)) & (
    lactate_result_dummy['abbreviation'].str.contains('GSA', case=False, na=False))]

print('So you finally get your result:')
lactate_result

So you finally get your result:


Unnamed: 0,hospital_coded,table,a_variableid,choicecode,choicestringvalue,name,abbreviation,description
10105,3,labresults_numeric,3024000658,,,Lactat art GSA,Lactat a GSA,
10141,3,labresults_numeric,3024000704,,,aSan-Lactat,Lactat _GSA,


#### Get patients with an arterial lactate higher than 2mmol/L during the first ICU day for those who have accomplished with the inclusion criteria

In [58]:
%%time

labres = wr.s3.read_parquet(path='s3://icusics-db/labresults_numeric/labresults_numeric_h3.parquet')

CPU times: user 129 ms, sys: 49.2 ms, total: 178 ms
Wall time: 471 ms


In [59]:
labres.head()

Unnamed: 0,a_patientid,a_variableid,time,value
0,3000446,3020000100,-13,30.2
1,3000446,3020000100,74,31.2
2,3000446,3020000100,1642,27.5
3,3000446,3020000100,3312,31.1
4,3000446,3020000100,4343,28.2


In [60]:
%%time

lac2_fd_patlist = tuple(set(labres[(labres['a_variableid'].isin([3024000658,3024000704])) & (labres['value']>2) & (labres['time']<1440) & (
    labres['a_patientid'].isin(tuple(set(los2d_pneumopd_imv_apacheII20['a_patientid']))))]['a_patientid']))

CPU times: user 26.5 ms, sys: 3.32 ms, total: 29.9 ms
Wall time: 28.8 ms


In [61]:
len(lac2_fd_patlist)

26

In [62]:
los2d_pneumopd_imv_apacheII20_lac2fd = los2d_pneumopd_imv_apacheII20[los2d_pneumopd_imv_apacheII20['a_patientid'].isin(lac2_fd_patlist)].sort_values(
    'a_patientid').reset_index(drop=True)

In [63]:
print(los2d_pneumopd_imv_apacheII20_lac2fd['a_patientid'].nunique())
print(round(los2d_pneumopd_imv_apacheII20_lac2fd['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

26
0.68


### 6- Sedative Drugs (VAD)

In [64]:
d_pharma = wr.s3.read_parquet(path='s3://icusics-db/d_pharma/d_pharma.parquet')

In [65]:
d_pharma.head()

Unnamed: 0,hospital_coded,a_pharmaid,pharmaname,pharmagroupname,pharmaformunit,pharmadoseunit,pharmadoseformratio,pharmavolumeunit,pharmavolumeformratio
0,1,1001000255,ABACAVIR 300 MG COMP,,comp,mg,300.0,ml,0.0
1,1,1001000256,AIGUA,Nutrició Enteral,ml,ml,1.0,ml,1.0
2,1,1001000258,BICARBONAT SODIC 1/6M,Serumteràpia,ml,ml,1.0,ml,1.0
3,1,1001000259,BICARBONAT SODIC 1M,Serumteràpia,ml,ml,1.0,ml,1.0
4,1,1001000275,GELATINA 3%,Serumteràpia,ml,ml,1.0,ml,1.0


In [66]:
# define key characters (remember that strings in this db can be in english, catalan or spanish language) to start a blind search

key_chars = 'sed'

sed_results_dummy = d_pharma[(d_pharma['hospital_coded']==3) & (d_pharma['pharmagroupname'].str.contains(key_chars, case=False, na=False))]

print(sed_results_dummy.shape)

print("We have 39 drugs in the group of sedatives, so we save all them in a tuple and search for them in drugs table'")

sedatives_ids = tuple(set(sed_results_dummy['a_pharmaid']))

(39, 9)
We have 39 drugs in the group of sedatives, so we save all them in a tuple and search for them in drugs table'


In [67]:
%%time

pharma_records = wr.s3.read_parquet(path='s3://icusics-db/pharma_records/pharma_records_h3.parquet')

CPU times: user 1.66 s, sys: 2.27 s, total: 3.92 s
Wall time: 5.27 s


In [68]:
pharma_records.head()

Unnamed: 0,a_patientid,ordernumber,a_pharmaid,time,givendose,routename
0,3001014,804140,3000000005,643,40.0,PERF IV
1,3001014,804140,3000002034,643,50.0,PERF IV
2,3001014,804141,3000000446,233,1.0,PERF IV
3,3001014,804141,3000000446,677,1.0,PERF IV
4,3001014,804141,3000000446,1193,1.0,PERF IV


In [69]:
sed_patlist = tuple(set(pharma_records[(pharma_records['a_patientid'].isin(tuple(set(los2d_pneumopd_imv_apacheII20_lac2fd['a_patientid'])))) & (
    pharma_records['a_pharmaid'].isin(sedatives_ids))]['a_patientid']))

In [70]:
los2d_pneumopd_imv_apacheII20_lac2fd_sedatives = los2d_pneumopd_imv_apacheII20_lac2fd[los2d_pneumopd_imv_apacheII20_lac2fd['a_patientid'].isin(sed_patlist)].sort_values(
    'a_patientid').reset_index(drop=True)

In [71]:
print(los2d_pneumopd_imv_apacheII20_lac2fd_sedatives['a_patientid'].nunique())
print(round(los2d_pneumopd_imv_apacheII20_lac2fd_sedatives['a_patientid'].nunique()/patients['a_patientid'].nunique()*100,2))

26
0.68


In [72]:
print('patients in ICUSICS DB for hospital 3',
      patients['a_patientid'].nunique())
print('from those, with ICU LOS > 2 days:',
      los2d['a_patientid'].nunique())
print('from those, with pneumonia as pd:',
      los2d_pneumopd['a_patientid'].nunique())
print('from those, with imv:',
      los2d_pneumopd_imv['a_patientid'].nunique())
print('from those, with an apache2 > 20:',
      los2d_pneumopd_imv_apacheII20['a_patientid'].nunique())
print('from those, with lactate >2mmol/L at day 1:',
      los2d_pneumopd_imv_apacheII20_lac2fd['a_patientid'].nunique())
print('from those, with sedatives:',
      los2d_pneumopd_imv_apacheII20_lac2fd_sedatives['a_patientid'].nunique())

patients in ICUSICS DB for hospital 3 3815
from those, with ICU LOS > 2 days: 2312
from those, with pneumonia as pd: 228
from those, with imv: 131
from those, with an apache2 > 20: 60
from those, with lactate >2mmol/L at day 1: 26
from those, with sedatives: 26


We see how only 26 patients fullfilled the inclusion criteria of this example. The objective was to explore all tables, not to obtain a real dataset with clinical criteria

In [73]:
los2d_pneumopd_imv_apacheII20_lac2fd_sedatives

Unnamed: 0,hospital_coded,a_patientid,patientsex,age,height,weight,bmi,hospadmtime,distime,hospdistime,hospital_outcome,distime_in_days
0,3,3006618,M,80,170,90,31,-7221,20926,20926,EXITUS,14.5
1,3,3010785,M,70,170,100,35,0,67191,67304,EXITUS,46.7
2,3,3189352,M,40,180,100,31,0,4516,14757,ALIVE,3.1
3,3,3295500,M,30,180,110,34,-1,46245,74523,ALIVE,32.1
4,3,3306000,M,60,170,120,42,-5,164080,169641,ALIVE,113.9
5,3,3319369,M,70,160,60,23,-10,8083,8093,EXITUS,5.6
6,3,3324322,F,60,160,100,39,-247,18387,27041,ALIVE,12.8
7,3,3433306,F,60,150,60,27,-12,3892,32921,ALIVE,2.7
8,3,3489340,F,80,150,70,31,-7,21346,58606,ALIVE,14.8
9,3,3509270,F,80,160,90,35,-1637,6846,47409,ALIVE,4.8


In [74]:
los2d_pneumopd_imv_apacheII20_lac2fd_sedatives['hospital_outcome'].value_counts(normalize=True)

EXITUS    0.653846
ALIVE     0.346154
Name: hospital_outcome, dtype: Float64