MIMIC_Sepsis
=================

# 1 Preparation

To run this document the following requirements must be satisfied:

- Build the MIMIC-IV database in **PostgreSQL** and start the server. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres). (The database should be named **mimiciv**.)
- Generate the derived concepts, which are useful abstractions of the raw MIMIC-IV data. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/concepts_postgres).



To create an Anaconda environment and install all the required libraries, uncomment the following lines and run the cell:

In [11]:
#!conda create --name mimiciv_sepsis python=3.11
#!conda activate mimiciv_sepsis
#!pip install -r requirements.txt

Run the following cell to connect to the database.

In [12]:
%load_ext autoreload
%autoreload 2

import psycopg2
from psycopg2 import sql
import csv
import pandas as pd
import numpy as np
import os
import shutil
from datetime import timedelta
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsRegressor

# fill in the host, username, password and database name
conn = psycopg2.connect(host='', user='', password='', database='mimiciv')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
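
Hard-coded credentials in a notebook are easy to leak when the file is shared. As one possible alternative, the sketch below builds the connection parameters from environment variables. The helper `connection_params` and the `MIMIC_DB_*` variable names are assumptions made for illustration and are not part of this repository.

```python
import os

def connection_params(env=None):
    """Build psycopg2 connection keyword arguments from environment variables.

    The MIMIC_DB_* variable names are hypothetical; adapt them to your setup.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MIMIC_DB_HOST", "localhost"),
        "user": env.get("MIMIC_DB_USER", ""),
        "password": env.get("MIMIC_DB_PASSWORD", ""),
        # the database name defaults to the one this notebook expects
        "dbname": env.get("MIMIC_DB_NAME", "mimiciv"),
    }

# Example with an explicit mapping instead of the real environment
params = connection_params({"MIMIC_DB_USER": "alice"})
print(params["user"], params["dbname"])
```

With the variables set in the shell, the connection could then be opened with `conn = psycopg2.connect(**connection_params())`.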


# 2 Extract selected data from the original database 

We extract the `state space` and the `action space` from the mimiciv database. The table `itemid_info/mimic4 itemid.csv` lists all the required items.

***Run the following cell the first time you run the code (uncomment it first if it is commented out)***

In [13]:
# run this cell the first time you run the code

# Read the SQL files

try:
    with open('sql/select_patients_cohort.sql', 'r') as file0:
        sql_script_select_patients_cohort = file0.read()
        
    with open('sql/state_from_chartevents.sql', 'r') as file1:
        sql_script_state = file1.read()

    with open('sql/action_from_inputevents.sql', 'r') as file2:
        sql_script_action_from_inputevents = file2.read()

    with open('sql/action_from_vasopressors_equivalent_dose.sql', 'r') as file3:
        sql_script_action_from_vasopressors_equivalent_dose = file3.read()

    # Execute the SQL script and create the tables in schema mimiciv_derived_sepsis
    cursor = conn.cursor()
    
    cursor.execute(sql.SQL(sql_script_select_patients_cohort))
    print("mimiciv_derived_sepsis.sepsis_patients_cohort is created")

    cursor.execute(sql.SQL(sql_script_state))
    print("mimiciv_derived_sepsis.sepsis_state is created")

    cursor.execute(sql.SQL(sql_script_action_from_inputevents))
    print("mimiciv_derived_sepsis.sepsis_action_inputevents is created")

    cursor.execute(sql.SQL(sql_script_action_from_vasopressors_equivalent_dose))
    print("mimiciv_derived_sepsis.sepsis_action_vasopressors_equivalent_dose is created")

    conn.commit()
    cursor.close()
    
except (Exception, psycopg2.DatabaseError) as error:
    # roll back the failed transaction so the connection stays usable
    conn.rollback()
    print("Error executing SQL statement:", error)

mimiciv_derived_sepsis.sepsis_patients_cohort is created
mimiciv_derived_sepsis.sepsis_state is created
mimiciv_derived_sepsis.sepsis_action_inputevents is created
mimiciv_derived_sepsis.sepsis_action_vasopressors_equivalent_dose is created
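
The four scripts above are meant to succeed or fail together. A minimal sketch of that pattern, written against the generic Python DB-API, is shown below: it commits only when every script ran and rolls back otherwise. The helper `run_sql_scripts` is hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

def run_sql_scripts(conn, scripts):
    """Execute (name, sql_text) pairs in one transaction.

    Commits only if every script succeeds; otherwise rolls back and re-raises.
    Works with any DB-API connection (psycopg2 in this notebook).
    """
    cursor = conn.cursor()
    done = []
    try:
        for name, sql_text in scripts:
            cursor.execute(sql_text)
            done.append(name)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return done

# Demonstration: SQLite stands in for PostgreSQL, the table is made up.
demo = sqlite3.connect(":memory:")
print(run_sql_scripts(demo, [("cohort", "CREATE TABLE cohort (stay_id INTEGER)")]))
```

Re-raising the exception instead of only printing it keeps later cells from running against tables that were never created.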


Get the number of distinct stay_ids in the cohort.

In [14]:
with conn.cursor() as cursor:
    command = "SELECT DISTINCT stay_id FROM mimiciv_derived_sepsis.sepsis_patients_cohort;"
    cursor.execute(command)
    result = cursor.fetchall()
    stay_ids = [row[0] for row in result]
    num_stay_ids = len(stay_ids)
    print('Number of stay_ids: ' + str(num_stay_ids))
    # the with-block closes the cursor on exit

Number of stay_ids: 7404


# 3 Data transfer

## 3.1 Data transfer of State Space
We transfer the State Space data from PostgreSQL to CSV files.

In [15]:
# output to /output/data/data_raw/state/{state_name}.csv
from python.data_preprocessing.data_transfer import data_transfer_state

itemid_list_state, label_state = data_transfer_state(conn, num_stay_ids, threshold = 1000)

output:Heartrate.csv                           	number of stay_id:7404
output:ABPs.csv                                	number of stay_id:2206
output:NBPs.csv                                	number of stay_id:7239
output:ABPd.csv                                	number of stay_id:2207
output:NBPd.csv                                	number of stay_id:7238
output:ABPm.csv                                	number of stay_id:2239


output:NBPm.csv                                	number of stay_id:7238
output:RespiratoryRate.csv                     	number of stay_id:7403
output:TemperatureF.csv                        	number of stay_id:7187
output:TemperatureC.csv                        	number of stay_id:726
output:PH_A.csv                                	number of stay_id:3934
output:PH_V.csv                                	number of stay_id:3118
output:ABE.csv                                 	number of stay_id:3915
output:Hematocrit_serum.csv                    	number of stay_id:7331
output:Hematocrit_wholeblood.csv               	number of stay_id:1124
output:Hemoglobin.csv                          	number of stay_id:7325
output:Platele.csv                             	number of stay_id:7324
output:WBC.csv                                 	number of stay_id:7327
output:Chloride_serum.csv                      	number of stay_id:7353
output:Chloride_wholeblood.csv                 	number of stay_id:952
output:C
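
The implementation of `data_transfer_state` is not shown here. As a rough illustration of the general idea (run a query, write the result to a CSV file), the following sketch uses only the standard library. The helper `export_query_to_csv`, the table name and the values are assumptions for demonstration, and SQLite stands in for PostgreSQL.

```python
import csv
import os
import sqlite3
import tempfile

def export_query_to_csv(conn, query, path):
    """Write the result of `query` to `path` as CSV and return the row count."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # column names come from the DB-API cursor description
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)

# Demonstration with made-up data in an in-memory SQLite database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE heartrate (stay_id INTEGER, charttime TEXT, valuenum REAL)")
demo.execute("INSERT INTO heartrate VALUES (1, '2150-01-01 00:10:00', 88.0)")
out_path = os.path.join(tempfile.mkdtemp(), "state", "Heartrate.csv")
print(export_query_to_csv(demo, "SELECT * FROM heartrate", out_path))
```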

## 3.2 Data transfer of Action Space

### 3.2.1 Data transfer of Action Space for *IV fluid bolus*

 - IV fluid bolus
   - NaCl_0.9%
   - Dextrose_5%

In [16]:
# output to /output/data/data_raw/action/IV_fluid_bolus/{IV_fluid_bolus_name}.csv
from python.data_preprocessing.data_transfer import data_transfer_action_IV_fluid_bolus

data_transfer_action_IV_fluid_bolus(conn)

output action (IV_fluid_bolus):	NaCl_0_9%.csv
output action (IV_fluid_bolus):	Dextrose_5%.csv


### 3.2.2 Data transfer of Action Space for *Vasopressors*

We obtain `vasopressors_equivalent_dose` directly from `mimiciv_derived.norepinephrine_equivalent_dose`, which is based on *"Vasopressor dose equivalence: A scoping review and suggested formula" by Goradia et al. 2020*.

In [17]:
# output to /output/data/data_raw/action/vasopressors/vasopressors_equivalent_dose.csv
from python.data_preprocessing.data_transfer import data_transfer_action_vasopressors_equivalent_dose

data_transfer_action_vasopressors_equivalent_dose(conn)

output action (vasopressors): vasopressors_equivalent_dose.csv
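
For orientation, the sketch below shows what a norepinephrine-equivalent conversion of this kind looks like. The conversion factors and units are assumptions based on how the formula from that paper is commonly quoted (catecholamines in mcg/kg/min, vasopressin in units/hour); check them against the SQL of the derived concept before relying on them. The function name is hypothetical.

```python
def norepinephrine_equivalent(norepinephrine=0.0, epinephrine=0.0,
                              phenylephrine=0.0, dopamine=0.0,
                              vasopressin_units_per_hour=0.0):
    """Norepinephrine-equivalent dose in mcg/kg/min.

    Catecholamine rates are in mcg/kg/min. Vasopressin is given in units/hour
    and divided by 60 to obtain units/min. All factors are assumptions.
    """
    return (norepinephrine
            + epinephrine
            + phenylephrine / 10
            + dopamine / 100
            + vasopressin_units_per_hour * 2.5 / 60)

# Example: norepinephrine at 0.1 mcg/kg/min plus vasopressin at 2.4 units/hour
dose = norepinephrine_equivalent(norepinephrine=0.1, vasopressin_units_per_hour=2.4)
print(round(dose, 4))
```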


# 4 Hourly Sample

## 4.1 Hourly Sample on State Space

In [18]:
# output to /output/data/data_hourly_sample/state/stay_id_{selected_id}.csv
from python.data_preprocessing.hourly_sample import hourly_sample_state
import random
# start from a clean output directory
if os.path.exists('./output/data/data_hourly_sample/state'):
    shutil.rmtree('./output/data/data_hourly_sample/state')

# to sample several random stays instead, uncomment:
# selected_ids = random.sample(stay_ids, 5)
# print(f'Selected stay_id: {selected_ids}')
# for selected_id in selected_ids:
#     hourly_sample_state(selected_id, itemid_list_state, label_state, k = 5)

selected_id = 31872514
hourly_sample_state(selected_id, itemid_list_state, label_state, k = 10)

# the following stay_ids have an ICU stay of more than 72 hours
# selected_id = 32217866
# selected_id = 32332328
# selected_id = 38362310
selected_id = 31872514
print(f'Selected stay_id: {selected_id}')
hourly_sample_state(selected_id, itemid_list_state, label_state, k = 5)

Selected stay_id: 31872514
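
The internals of `hourly_sample_state` are not shown in this notebook. As a hedged illustration of what hourly sampling of irregular chart events can look like, the sketch below averages `(timestamp, value)` pairs into hourly bins using only the standard library. The helper `hourly_mean` and the sample values are assumptions; hours without measurements are left as `None` so that a separate imputation step can fill them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_mean(events, start, n_hours):
    """Average irregular (timestamp, value) events into hourly bins.

    Returns a list of length n_hours; hours without any event are None.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # whole hours elapsed since the start of the stay
        hour = (ts - start) // timedelta(hours=1)
        if 0 <= hour < n_hours:
            buckets[hour].append(value)
    return [sum(buckets[h]) / len(buckets[h]) if h in buckets else None
            for h in range(n_hours)]

start = datetime(2150, 1, 1, 0, 0)
events = [
    (datetime(2150, 1, 1, 0, 10), 80.0),
    (datetime(2150, 1, 1, 0, 50), 90.0),
    (datetime(2150, 1, 1, 2, 5), 100.0),
]
print(hourly_mean(events, start, 3))  # [85.0, None, 100.0]
```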


## 4.2 Hourly Sample on Action Space

### 4.2.1 Hourly sample IV_fluid_bolus for both continuous and discrete action space

In [19]:
# output to /output/data/data_hourly_sample/action/IV_fluid_bolus/stay_id_{selected_id}.csv
from python.data_preprocessing.hourly_sample import hourly_sample_action_IV_fluid_bolus
if os.path.exists('./output/data/data_hourly_sample/action/IV_fluid_bolus/'):
    shutil.rmtree('./output/data/data_hourly_sample/action/IV_fluid_bolus/')


# selected_id = 31872514 # more than 72 hours ICU stay 
# print(f'Selected stay_id: {selected_id}')
# hourly_sample_action_IV_fluid_bolus(selected_id)

count = 0
for selected_id in stay_ids:
    try:
        hourly_sample_action_IV_fluid_bolus(selected_id)
    except Exception:
        # any failure is counted; see the note below on what these failures represent
        # print(f'Error with {selected_id}')
        count += 1
# 911 out of 7404 stay_ids (12.3%) did not have IV_fluid_bolus. 7404 - 911 = 6493 (87.7%) stay_ids have IV_fluid_bolus
print(f'Error count: {count}')

Error count: 911
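
A single counter cannot distinguish stays without data from other failures. One way to make that visible is to record the exception per stay_id and summarize by exception type, as in the sketch below. The helper `apply_to_all` and the stand-in `fake_sampler` are hypothetical; which exception the real sampling function raises for missing data is not known from this notebook.

```python
from collections import Counter

def apply_to_all(ids, func):
    """Call func(id) for every id; return (succeeded_ids, {id: exception})."""
    succeeded, failed = [], {}
    for item_id in ids:
        try:
            func(item_id)
            succeeded.append(item_id)
        except Exception as exc:
            failed[item_id] = exc
    return succeeded, failed

def fake_sampler(stay_id):
    # stand-in for a sampling function: odd ids pretend to have no data
    if stay_id % 2:
        raise FileNotFoundError(f"no data for {stay_id}")

ok, failed = apply_to_all([1, 2, 3, 4], fake_sampler)
# summary of failures by exception type
print(Counter(type(e).__name__ for e in failed.values()))
```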


### 4.2.2 Hourly sample vasopressors_equivalent_dose for both continuous and discrete action space

In [20]:
# output to /output/data/data_hourly_sample/action/vasopressors_equivalent_dose/stay_id_{selected_id}.csv
from python.data_preprocessing.hourly_sample import hourly_sample_action_vasopressors_equivalent_dose
if os.path.exists('./output/data/data_hourly_sample/action/vasopressors_equivalent_dose'):
    shutil.rmtree('./output/data/data_hourly_sample/action/vasopressors_equivalent_dose')


# selected_id = 31872514 # more than 72 hours ICU stay 
# print(f'Selected stay_id: {selected_id}')
# hourly_sample_action_vasopressors_equivalent_dose(selected_id)

count = 0
for selected_id in stay_ids:
    try:
        hourly_sample_action_vasopressors_equivalent_dose(selected_id)
    except Exception:
        # any failure is counted; see the note below on what these failures represent
        # print(f'Error with {selected_id}')
        count += 1
# 4452 out of 7404 stay_ids (60.1%) did not have vasopressors. 7404 - 4452 = 2952 (39.9%) stay_ids have vasopressors
print(f'Error count: {count}')

Error count: 4452
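
How the repository maps continuous doses to a discrete action space is not shown here. The sketch below illustrates one common scheme under stated assumptions: action 0 means no drug, and positive doses fall into bins defined by ascending upper edges. The function `discretize_dose` and the bin edges are made up for illustration; in practice the edges would be derived from the data, for example from quantiles of the non-zero doses.

```python
from bisect import bisect_left

def discretize_dose(dose, edges):
    """Map a continuous dose to a discrete action index.

    0 means no drug given. Positive doses map to 1..len(edges)+1, where
    `edges` are ascending, inclusive upper bin edges.
    """
    if dose <= 0:
        return 0
    return 1 + bisect_left(edges, dose)

# hypothetical bin edges in mcg/kg/min
edges = [0.08, 0.22, 0.45]
print([discretize_dose(d, edges) for d in [0.0, 0.05, 0.1, 0.3, 1.0]])
```

With three edges this yields five actions (0 to 4), and the continuous dose can be kept alongside the index for continuous-action methods.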
