MIMIC_Sepsis
=================

# 1 Preparation

To run this document the following requirements must be satisfied:

- Build the MIMIC-IV database in **PostgreSQL** and start the server. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres). (The database should be named **mimiciv**.)
- Generate the derived concept tables, which provide useful abstractions of the raw MIMIC-IV data. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/concepts_postgres).



To install all the required libraries, uncomment the following line and run it:

In [15]:
# !pip install -r requirements.txt

Run the following cell to connect to the database.

In [16]:
%load_ext autoreload
%autoreload 2

import psycopg2
from psycopg2 import sql
import csv
import pandas as pd
import numpy as np
import os
import shutil
from datetime import timedelta
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsRegressor

# fill in the host, username and password; the database name is mimiciv
conn = psycopg2.connect(host='', user='', password='', database='mimiciv')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
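
Hard-coding credentials in a notebook is easy to leak, so one option is to read them from environment variables instead. The sketch below is only an illustration: the helper `connection_params` is hypothetical (it is not part of this repository), and it assumes the standard libpq variable names `PGHOST`, `PGUSER` and `PGPASSWORD` are set in the shell that launches Jupyter.

```python
import os


def connection_params(database="mimiciv"):
    """Collect keyword arguments for psycopg2.connect from environment variables.

    Hypothetical helper: falls back to localhost and empty credentials
    when a variable is not set.
    """
    return {
        "host": os.environ.get("PGHOST", "localhost"),
        "user": os.environ.get("PGUSER", ""),
        "password": os.environ.get("PGPASSWORD", ""),
        "database": database,
    }


# Usage in the notebook (requires a running PostgreSQL server):
# conn = psycopg2.connect(**connection_params())
```

This keeps the connection cell unchanged across machines; only the environment differs.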


# 2 Extract selected data from the original database 

We extract the `state space` and `action space` respectively from the mimiciv database. The table `itemid_info/mimic4 itemid.csv` lists all the items required.

***Uncomment the following cell the first time you run the code.***

In [17]:
# # Uncomment this cell the first time you run the code

# # Read the SQL file

# try:
#     with open('sql/select_patients_cohort.sql', 'r') as file0:
#         sql_script_select_patients_cohort = file0.read()
        
#     with open('sql/state_from_chartevents.sql', 'r') as file1:
#         sql_script_state = file1.read()

#     with open('sql/action_from_inputevents.sql', 'r') as file2:
#         sql_script_action = file2.read()

#     # Execute the SQL script
#     cursor = conn.cursor()
    
#     cursor.execute(sql.SQL(sql_script_select_patients_cohort))
#     cursor.execute(sql.SQL(sql_script_state))
#     cursor.execute(sql.SQL(sql_script_action))

#     conn.commit()
#     cursor.close()
    
# except (Exception, psycopg2.DatabaseError) as error:
#     print("Error executing SQL statement:", error)

Get the number of stay_ids

In [18]:
with conn.cursor() as cursor:
    command = "SELECT DISTINCT stay_id FROM mimiciv_derived.sepsis_patients_cohort;"
    cursor.execute(command)
    result = cursor.fetchall()

# the with-block closes the cursor on exit, so no explicit close() is needed
stay_ids = [row[0] for row in result]
num_stay_ids = len(stay_ids)
print('Number of stay_ids: ' + str(num_stay_ids))

Number of stay_ids: 6669


# 3 Data transfer

## 3.1 Data transfer of State Space
We export the state-space data from PostgreSQL to CSV files.

In [19]:
from python.data_preprocessing.data_transfer import data_transfer_state

# output to /output/data/data_raw/state/{state_name}.csv
itemid_list_state, label_state = data_transfer_state(conn, num_stay_ids, threshold = 1000)

output:Heartrate.csv                           	number of stay_id:6669
output:ABPs.csv                                	number of stay_id:2129
output:NBPs.csv                                	number of stay_id:6632
output:ABPd.csv                                	number of stay_id:2131
output:NBPd.csv                                	number of stay_id:6632
output:ABPm.csv                                	number of stay_id:2168
output:NBPm.csv                                	number of stay_id:6632
output:RespiratoryRate.csv                     	number of stay_id:6669
output:TemperatureF.csv                        	number of stay_id:6576
output:TemperatureC.csv                        	number of stay_id:832
output:PH_A.csv                                	number of stay_id:3663
output:PH_V.csv                                	number of stay_id:3184
output:ABE.csv                                 	number of stay_id:3645
output:Hematocrit_serum.csv                    	number of stay_id:6596
output:

## 3.2 Data transfer of Action Space

### 3.2.1 Data transfer of Action Space for *IV fluid bolus*

 - IV fluid bolus
   - NaCl_0.9%
   - Dextrose_5%

In [20]:
from python.data_preprocessing.data_transfer import data_transfer_action_IV_fluid_bolus

# output to /output/data/data_raw/action/IV_fluid_bolus/{IV_fluid_bolus_name}.csv
data_transfer_action_IV_fluid_bolus(conn)

output action (IV_fluid_bolus):	NaCl_0_9%.csv
output action (IV_fluid_bolus):	Dextrose_5%.csv


### 3.2.2 Data transfer of Action Space for *Vasopressors*

We obtain `vasopressors_norepinephrine_equivalent_dose` directly from `mimiciv_derived.norepinephrine_equivalent_dose`, which is based on *"Vasopressor dose equivalence: A scoping review and suggested formula" by Goradia et al. 2020*.
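
To make the idea of an equivalent dose concrete, here is a rough sketch of how individual vasopressor rates can be combined into one norepinephrine-equivalent number. The weights are the ones I recall from the `norepinephrine_equivalent_dose` concept in mimic-code and should be checked against that SQL before being relied on. The function itself is hypothetical and is not used by this notebook, which reads the already-computed column from the derived table.

```python
def norepinephrine_equivalent_dose(norepinephrine=0.0, epinephrine=0.0,
                                   phenylephrine=0.0, dopamine=0.0,
                                   vasopressin=0.0):
    """Combine vasopressor rates into a norepinephrine-equivalent dose.

    Assumed units: mcg/kg/min for every drug except vasopressin,
    which is assumed to be in units/hour. Weights are an assumption
    to verify against the mimic-code concept.
    """
    dose = (
        norepinephrine
        + epinephrine
        + phenylephrine / 10
        + dopamine / 100
        + vasopressin * 2.5 / 60
    )
    return round(dose, 4)


# Example: 0.1 mcg/kg/min norepinephrine plus 1.0 mcg/kg/min phenylephrine
print(norepinephrine_equivalent_dose(norepinephrine=0.1, phenylephrine=1.0))
```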

In [21]:
from python.data_preprocessing.data_transfer import data_transfer_action_vasopressors_equivalent_dose

# output to /output/data/data_raw/action/vasopressors/vasopressors_norepinephrine_equivalent_dose.csv
data_transfer_action_vasopressors_equivalent_dose(conn)

output action (vasopressors): vasopressors_norepinephrine_equivalent_dose.csv


# 4 Hourly Sample

## 4.1 Hourly Sample on State Space
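
The internals of `hourly_sample_state` are not shown in this notebook, so the following is only a guess at the general shape of such a step: average the measurements that fall in each hour, then fill hours without any measurement from the `k` nearest observed hours. The function name `hourly_sample`, the `valuenum` column and the use of `KNeighborsRegressor` on the hour index are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor


def hourly_sample(df, time_col="chartdatetime", value_col="valuenum", k=5):
    """Hourly mean of one signal, with empty hours filled by KNN over time."""
    df = df.copy()  # work on a copy so the caller's frame is not modified
    df[time_col] = pd.to_datetime(df[time_col])
    hourly = df.set_index(time_col)[value_col].resample("1h").mean()

    # Use the position of each hour as the single feature for the regressor.
    hours = np.arange(len(hourly), dtype=float).reshape(-1, 1)
    observed = hourly.notna().to_numpy()
    if observed.any() and not observed.all():
        # n_neighbors cannot exceed the number of observed hours.
        knn = KNeighborsRegressor(n_neighbors=min(k, int(observed.sum())))
        knn.fit(hours[observed], hourly.to_numpy()[observed])
        hourly.loc[~observed] = knn.predict(hours[~observed])
    return hourly


example = pd.DataFrame({
    "chartdatetime": ["2150-01-01 00:10", "2150-01-01 00:50",
                      "2150-01-01 01:30", "2150-01-01 03:05"],
    "valuenum": [8.0, 12.0, 20.0, 40.0],
})
print(hourly_sample(example, k=2))
```

In the example, hour 02:00 has no measurement and is filled with the mean of its two nearest observed hours (01:00 and 03:00).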

In [22]:
from python.data_preprocessing.hourly_sample import hourly_sample_state
import random

# output to /output/data/data_hourly_sample/state/stay_id_{selected_id}.csv

selected_ids = random.sample(stay_ids, 5)
print(selected_ids)
# selected_id = 32950566


for selected_id in selected_ids:
    hourly_sample_state(selected_id, itemid_list_state, label_state, k = 5)

# hourly_sample_state(selected_id, itemid_list_state, label_state, k = 5)

[31307790, 33930962, 36910101, 36875488, 36528912]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered.loc[:,'chartdatetime'] = pd.to_datetime(df_filtered['chartdatetime'].copy())

## 4.2 Hourly Sample on Action Space

### 4.2.1 Hourly sample IV_fluid_bolus for both continuous and discrete action space

In [23]:
from python.data_preprocessing.hourly_sample import hourly_sample_action_IV_fluid_bolus

# output to /output/data/data_hourly_sample/action/IV_fluid_bolus/stay_id_{selected_id}.csv

selected_id = 31872514
# Use the function
hourly_sample_action_IV_fluid_bolus(selected_id)

# for selected_id in stay_ids:
#     try:
#         hourly_sample_action_IV_fluid_bolus(selected_id)
#     except Exception:
#         print(f'Error with {selected_id}')
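
As a rough picture of what "continuous and discrete action space" could mean for fluid boluses, the sketch below sums the volume given in each hour (continuous action) and then maps that total to an integer level (discrete action). It is not the repository's implementation: the function name, the `starttime` and `amount` columns, the bin edges, and the choice to attribute each bolus to its start hour are all assumptions.

```python
import numpy as np
import pandas as pd


def hourly_action(df, time_col="starttime", amount_col="amount",
                  bin_edges=(0.0, 250.0, 500.0)):
    """Sum the amount per hour and discretize it with hypothetical bin edges."""
    df = df.copy()
    df[time_col] = pd.to_datetime(df[time_col])
    # Hours without any bolus sum to 0.
    continuous = df.set_index(time_col)[amount_col].resample("1h").sum()
    # right=True: level i means bin_edges[i-1] < amount <= bin_edges[i];
    # level 0 therefore means no fluid was given in that hour.
    discrete = np.digitize(continuous.to_numpy(), bins=bin_edges, right=True)
    return pd.DataFrame({"continuous": continuous, "discrete": discrete})


boluses = pd.DataFrame({
    "starttime": ["2150-01-01 00:05", "2150-01-01 00:40", "2150-01-01 02:10"],
    "amount": [100.0, 150.0, 600.0],
})
print(hourly_action(boluses))
```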

### 4.2.2 Hourly sample vasopressors_equivalent_dose for both continuous and discrete action space

In [24]:
from python.data_preprocessing.hourly_sample import hourly_sample_action_vasopressors_equivalent_dose

# output to /output/data/data_hourly_sample/action/vasopressors_norepinephrine_equivalent_dose/stay_id_{selected_id}.csv

selected_id = 31872514
# Use the function
hourly_sample_action_vasopressors_equivalent_dose(selected_id)

# for selected_id in stay_ids:
#     try:
#         hourly_sample_action_vasopressors_equivalent_dose(selected_id)
#     except Exception:
#         print(f'Error with {selected_id}')
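
When the commented loops above are run over every stay, it can help to collect the failures rather than only print them, so the failed stays can be inspected or retried afterwards. The helper `run_for_all` below is a hypothetical convenience, not part of this repository; it works with any function that takes a single `stay_id`.

```python
def run_for_all(stay_ids, fn):
    """Apply fn to every stay_id and return a list of (stay_id, error) pairs."""
    failed = []
    for stay_id in stay_ids:
        try:
            fn(stay_id)
        except Exception as error:  # keep going, but remember what failed
            failed.append((stay_id, repr(error)))
    return failed


# Usage in the notebook, for example:
# failed = run_for_all(stay_ids, hourly_sample_action_vasopressors_equivalent_dose)
# print(f'{len(failed)} stays failed')
```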