# ICUsICS DB tutorial

ICUsICS is an anonymized database built from the data stored in the Clinical Information System (CIS) databases of 6 Intensive Care Units (ICUs) of the Catalan Institute of Health (ICS). It is in fact a database of databases: each ICU belongs to a different hospital, and each CIS has its own particularities. However, the table architecture is identical across hospitals, which makes data search and extraction easier.
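
Because every hospital shares the same table layout, a single loop can gather the same table from several hospitals. Below is a minimal sketch. It assumes each hospital's data sits in a local folder named like `h3_db` and that its CSV files are prefixed `h3_`, as in the paths used later in this tutorial. The function name `load_table` and the added `source_hospital` column are hypothetical.

```python
import os

import pandas as pd


def load_table(root, table, hospitals):
    """Concatenate every CSV file of `table` for the given hospitals.

    Assumed (hypothetical) layout: <root>/h<N>_db/<table>/h<N>_*.csv
    """
    frames = []
    for h in hospitals:
        folder = os.path.join(root, f"h{h}_db", table)
        # Sorted for a deterministic order; keep only this hospital's CSV files
        for name in sorted(os.listdir(folder)):
            if name.startswith(f"h{h}_") and name.endswith(".csv"):
                part = pd.read_csv(os.path.join(folder, name))
                part["source_hospital"] = h  # hypothetical column tagging the origin
                frames.append(part)
    # A single concat at the end avoids re-copying data on every iteration
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For example, `load_table('.', 'patients', [3])` would return hospital 3's `patients` table with `source_hospital` set to 3 on every row.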

ICUsICS is not hosted as a database per se. Instead, it is a directory of folders (tables) that contain CSV files with the records. Inside `icuics-db` there are 6 folders, one per hospital (e.g. `h3_db` for hospital 3). Each of them contains 10 folders:

- `patients`: patient-level information (id, hospital, demographics, and admission and discharge times and wards).
- `variables_ref`: information on the variables present in the database (id, hospital, name, type). Key codes: `vartype` 1, 2, 4 and 8 give the table in which the variable's records are stored (`v_monitored`, `v_labres`, `v_observed` and `v_derived`, respectively). `datatype` 0, 1 and 2 mean numeric, categorical and checkbox, respectively.
- `v_monitored`: records of `vartype` 1 variables.
- `v_labres`: records of `vartype` 2 variables.
- `v_observed`: records of `vartype` 4 variables.
- `v_derived`: records of `vartype` 8 variables.
- `drugs_ref`: information on the drugs present in the database (id, hospital, name, form unit, unit, etc.).
- `drugs`: records of drugs.
- `diagnoses`: the diagnoses.
- `insertions`: the insertions.
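
The `vartype` and `datatype` codes can be kept as lookup tables. That way, a variable's storage table and data type are resolved in code rather than by hand. Below is a minimal sketch. The dictionary and function names are hypothetical, the codes are the ones listed above, and the `h<N>_db/<table>` folder naming is taken from the paths used in this tutorial.

```python
# Codes documented for the variables_ref table
VARTYPE_TABLE = {1: "v_monitored", 2: "v_labres", 4: "v_observed", 8: "v_derived"}
DATATYPE_LABEL = {0: "numeric", 1: "categorical", 2: "checkbox"}


def variable_folder(hospital, vartype):
    """Return the (hypothetical helper's) folder holding a variable's records."""
    try:
        table = VARTYPE_TABLE[vartype]
    except KeyError:
        raise ValueError(f"unknown vartype: {vartype}") from None
    return f"h{hospital}_db/{table}"
```

For example, `variable_folder(3, 8)` returns `'h3_db/v_derived'`, which is the folder read in the last cell of this section.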

```python
import os

import pandas as pd
from sagemaker import get_execution_role

# ARN of the IAM execution role attached to this SageMaker notebook
role = get_execution_role()
```

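```python
# Read hospital 3's drug reference table directly from S3 (pandas needs the s3fs package for s3:// URIs)
bucket = 'icusics-db-demo'
file_key = 'h3_db/drugs_ref/h3_drugs_ref.csv'
s3uri = f's3://{bucket}/{file_key}'
df = pd.read_csv(s3uri)
df.head()
```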

| Unnamed: 0 | a_pharmaid | hospital_coded | pharmaname | pharmagroupname | pharmaformunit | pharmadoseunit | pharmadoseformratio | pharmavolumeunit | pharmavolumeformratio |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3000000004 | 3 | COMPLEX DE PROTROMBINA 600 UI (F. IX) vi | .Antihemorràgics | vial | UI | 600.0 | ml | 0.0 |
| 1 | 3000000005 | 3 | OMEPRAZOL 40 mg vial | .Antiàcids | vial | mg | 40.0 | ml | 0.0 |
| 2 | 3000000021 | 3 | HEPARINA SODICA 5.000 UI/5 ml (1%) vial | ANTITROMBOTICS | ml | UI | 1000.0 | ml | 1.0 |
| 3 | 3000000025 | 3 | FOSFAT MONOPOTASSIC 10 mmol P/10 ml (1 M | .Electròlits | ml | mmol | 1.0 | ml | 1.0 |
| 4 | 3000000049 | 3 | FUROSEMIDA 250 mg/25 ml amp | DIURETICS | ml | mg | 10.0 | ml | 1.0 |
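
The `Unnamed: 0` column is most likely the row index that was saved into the CSV file with an empty header. Passing `index_col=0` to `pd.read_csv` turns that column back into the index instead of keeping it as a data column. The sketch below uses two rows of the output above held in memory, so it runs without S3 access.

```python
import io

import pandas as pd

# Two rows copied from the drugs_ref output above (first header field left empty)
csv_text = """,a_pharmaid,hospital_coded,pharmaname,pharmagroupname,pharmaformunit,pharmadoseunit,pharmadoseformratio,pharmavolumeunit,pharmavolumeformratio
0,3000000004,3,COMPLEX DE PROTROMBINA 600 UI (F. IX) vi,.Antihemorràgics,vial,UI,600.0,ml,0.0
1,3000000005,3,OMEPRAZOL 40 mg vial,.Antiàcids,vial,mg,40.0,ml,0.0
"""

# Without index_col, pandas names the empty first header "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
# With index_col=0, that first column becomes the DataFrame index
drugs_ref = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(drugs_ref.columns.tolist())
```

The same argument applies to the S3 file: `pd.read_csv(s3uri, index_col=0)`.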


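The `vartype` and `datatype` columns belong to `variables_ref`, not to the `drugs_ref` table loaded above, so `df` has to be reloaded first. The file name below assumes that `variables_ref` follows the same naming pattern as `drugs_ref`.

```python
df = pd.read_csv(f's3://{bucket}/h3_db/variables_ref/h3_variables_ref.csv')  # assumed file name
df['vartype'].value_counts()
```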

```
4    5023
2     424
1     328
8     218
Name: vartype, dtype: int64
```

```python
df['datatype'].value_counts()
```

```
1    4597
0     974
2     422
Name: datatype, dtype: int64
```
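
The numeric codes in these counts are easier to read once they are mapped to the names given in the table description. The sketch below rebuilds the two outputs above as Series and relabels their index with `Series.rename`. The dictionary names are hypothetical.

```python
import pandas as pd

VARTYPE_TABLE = {1: "v_monitored", 2: "v_labres", 4: "v_observed", 8: "v_derived"}
DATATYPE_LABEL = {0: "numeric", 1: "categorical", 2: "checkbox"}

# Counts copied from the value_counts() outputs above
vartype_counts = pd.Series({4: 5023, 2: 424, 1: 328, 8: 218}, name="vartype")
datatype_counts = pd.Series({1: 4597, 0: 974, 2: 422}, name="datatype")

# Replace the numeric codes in the index with readable labels
print(vartype_counts.rename(index=VARTYPE_TABLE))
print(datatype_counts.rename(index=DATATYPE_LABEL))
```

Both outputs add up to the same 5,993 rows. On the loaded table, `df['vartype'].map(VARTYPE_TABLE).value_counts()` gives the labeled counts directly.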

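```python
%%time

# Gather hospital 3's v_derived records for variable 3015002262 with value 12
h = '3'
f = f'h{h}_db/v_derived'  # local relative path; the earlier cells read from S3

folder = os.listdir(f)
files = [match for match in folder if f"h{h}_" in match]

vmi_chunks = []
for file in files:
    chunk = pd.read_csv(f'{f}/{file}')
    vmi_chunk = chunk[(chunk['a_variableid'] == 3015002262) & (chunk['value'] == 12)]
    vmi_chunks.append(vmi_chunk)

# One concat after the loop avoids re-copying the accumulated rows for every file
vmi = pd.concat(vmi_chunks)
```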
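
For large tables, each file can also be streamed in pieces with the `chunksize` argument of `pd.read_csv`, keeping only the matching rows, so a whole file is never held in memory at once. Below is a minimal sketch. The function name `filter_records` and its parameters are hypothetical, and the column names `a_variableid` and `value` come from the cell above. The chunk size is an arbitrary choice, and unlike the cell above, files are matched by prefix with `startswith`.

```python
import os

import pandas as pd


def filter_records(folder, prefix, variable_id, value=None, chunksize=100_000):
    """Return rows whose a_variableid equals `variable_id` (and value equals
    `value`, if given) from every CSV in `folder` whose name starts with `prefix`."""
    matches = []
    for name in sorted(os.listdir(folder)):
        if not (name.startswith(prefix) and name.endswith(".csv")):
            continue
        # With chunksize, read_csv yields DataFrames of at most `chunksize` rows
        for chunk in pd.read_csv(os.path.join(folder, name), chunksize=chunksize):
            mask = chunk["a_variableid"] == variable_id
            if value is not None:
                mask &= chunk["value"] == value
            if mask.any():
                matches.append(chunk[mask])
    return pd.concat(matches, ignore_index=True) if matches else pd.DataFrame()
```

With this helper, the cell above would become `filter_records('h3_db/v_derived', 'h3_', 3015002262, value=12)`.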