# MIMIC_Sepsis

## 1. Preparation

To run this notebook, the following requirements must be met:

- Build the MIMIC-IV database in **PostgreSQL** and start the server. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres). (The database should be named **mimiciv**.)
- Generate the derived concepts, i.e. useful abstractions of the raw MIMIC-IV data. Instructions are available [here](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/concepts_postgres).



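To install the required libraries, run the command below (`csv` and `os` are part of the Python standard library and need no installation):
```
pip install psycopg2 pandas
```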


After the preparation is done, run the following cell to open the connection to the database.
 

In [None]:
import psycopg2
from psycopg2 import sql
import csv
import pandas as pd
import os

# Fill in the host, username, password and database name

#conn = psycopg2.connect(host='localhost', user='postgres', password='123', database='mimiciv')
conn = psycopg2.connect(host='', user='', password='', database='mimiciv')
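
As an alternative to typing credentials into the cell, the connection parameters could be read from environment variables. The following is a minimal sketch; the function `connection_params` and the variable names `MIMIC_DB_HOST`, `MIMIC_DB_USER`, `MIMIC_DB_PASSWORD` and `MIMIC_DB_NAME` are hypothetical and not part of this repository.

```python
import os


def connection_params(env=os.environ):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    The variable names below are hypothetical; rename them to suit your setup.
    """
    return {
        "host": env.get("MIMIC_DB_HOST", "localhost"),
        "user": env.get("MIMIC_DB_USER", "postgres"),
        "password": env.get("MIMIC_DB_PASSWORD", ""),
        "database": env.get("MIMIC_DB_NAME", "mimiciv"),
    }


# Demo with an explicit mapping instead of the real environment
params = connection_params({"MIMIC_DB_USER": "alice"})
print(params["user"], params["database"])

# With psycopg2 imported, the connection would then be opened with:
# conn = psycopg2.connect(**connection_params())
```

This keeps passwords out of the notebook file; the defaults mirror the values shown in the cell above.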

## 2. Extract selected data from the original database 

According to the paper, we need to extract the **state space** and the **action space** separately from the mimiciv database. The table **mimic4 itemid.csv** lists all the required items.

* The *IV fluid bolus* action is omitted from the action space because no data could be found for it.

In [None]:
# Read the SQL file

try:
    with open('sql_file/action_from_inputevents.sql', 'r') as file1:
        sql_script_action = file1.read()
        
    with open('sql_file/chartevents_dataneeded.sql', 'r') as file2:
        sql_script_state = file2.read()

    # Execute the SQL script
    cursor = conn.cursor()
    #cursor.execute(sql.SQL(sql_script_action))
    cursor.execute(sql.SQL(sql_script_state))

    conn.commit()
    cursor.close()
    
    
except (Exception, psycopg2.DatabaseError) as error:
    # Roll back so the connection is usable again after a failed statement
    conn.rollback()
    print("Error executing SQL statement:", error)

## 3. Data transformation
Here we first process the action data.

TODO: the value per minute is computed here; do we need this? (paper 4.1, p. 5)

In [None]:
# generate the dictionary of action
with open('csv/itemid_label_action.csv', newline='') as csvfile:
    # Create a CSV reader object
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Initialize an empty dictionary and list
    action_label = {}
    a_itemid_list = []
    # Iterate over the rows in the CSV file
    for row in reader:
        # Add the key-value pair to the dictionary
        action_label[row[0]] = row[1]
        # Add the itemid to the list
        a_itemid_list.append(row[0])


# Create the output directory once, before the loop
os.makedirs('./output/action', exist_ok=True)

with conn.cursor() as cursor:

    for itemid in a_itemid_list:
        # Pass itemid as a query parameter instead of formatting it into the SQL string
        command = "select stay_id, starttime, endtime, amount from mimiciv_derived.sepsis_action where itemid = %s order by starttime;"
        cursor.execute(command, (int(itemid),))

        result = cursor.fetchall()
        if not result:
            # Nothing to export for this item
            print("no rows for itemid", itemid)
            continue
        df = pd.DataFrame(result, columns=['stay_id', 'starttime', 'endtime', 'amount'])

        # Duration of each administration in minutes
        df['duration'] = (df['endtime'] - df['starttime']).dt.total_seconds() / 60
        df['value_per_minute'] = df['amount'] / df['duration']

        df.to_csv('./output/action/{}.csv'.format(action_label[itemid]), index=False)
        print("output:" + action_label[itemid] + ".csv")
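
To make the per-minute computation concrete, here is a small worked example on made-up rows (the stay IDs, times and amounts are invented for illustration). It also shows one possible way to handle a zero-length interval, which would otherwise produce `inf`; replacing it with `NaN` is an assumption of this sketch, not something the cell above does.

```python
import numpy as np
import pandas as pd

# Made-up example rows; not taken from MIMIC-IV
toy = pd.DataFrame({
    "stay_id": [1, 1, 2],
    "starttime": pd.to_datetime(
        ["2150-01-01 08:00", "2150-01-01 09:00", "2150-01-01 10:00"]),
    "endtime": pd.to_datetime(
        ["2150-01-01 08:30", "2150-01-01 09:00", "2150-01-01 12:00"]),
    "amount": [15.0, 5.0, 240.0],
})

# Duration in minutes: 30, 0 and 120
toy["duration"] = (toy["endtime"] - toy["starttime"]).dt.total_seconds() / 60

# Divide by the duration, treating a zero duration as missing (NaN) instead of inf
toy["value_per_minute"] = toy["amount"] / toy["duration"].replace(0, np.nan)

print(toy[["stay_id", "duration", "value_per_minute"]])
```

The first row yields 15 / 30 = 0.5 per minute, the third 240 / 120 = 2.0 per minute, and the zero-duration row is left as `NaN`.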


        

Then we process the **state space** in the same way.

In [None]:
# generate the dictionary of itemid-abbr
with open('csv/itemid_label_state.csv', newline='') as csvfile:
    # Create a CSV reader object
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Initialize an empty dictionary and list
    label = {}
    itemid_list = []
    # Iterate over the rows in the CSV file
    for row in reader:
        # Add the key-value pair to the dictionary
        label[row[0]] = row[1]
        # Add the itemid to the list
        itemid_list.append(row[0])



# Query each state item and write it to its own CSV file

os.makedirs('./output/state', exist_ok=True)

with conn.cursor() as cursor:

    for itemid in itemid_list:
        # Pass itemid as a query parameter instead of formatting it into the SQL string
        command = "select stay_id, charttime, valuenum from mimiciv_derived.sepsis_state where itemid = %s order by charttime;"
        cursor.execute(command, (int(itemid),))

        result = cursor.fetchall()
        df = pd.DataFrame(result, columns=['stay_id', 'charttime', 'valuenum'])
        df.to_csv('./output/state/{}.csv'.format(label[itemid]), index=False)
        print("output:" + label[itemid] + ".csv")
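
The action and state cells build their itemid-to-label mapping with the same loop. A shared helper could be sketched as follows; the function name `load_itemid_labels` is hypothetical, and the demo file contents are made up. Like the cells above, it assumes a header row followed by rows whose first two columns are the item ID and its label.

```python
import csv
import os
import tempfile


def load_itemid_labels(path):
    """Return (labels, itemids): a dict itemid -> label and the itemids in file order."""
    labels, itemids = {}, []
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed rows
            labels[row[0]] = row[1]
            itemids.append(row[0])
    return labels, itemids


# Demo with a throwaway file; the item IDs and labels are invented
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("itemid,label\n1001,HR\n1002,SBP\n")
    demo_labels, demo_itemids = load_itemid_labels(demo_path)

print(demo_labels)
print(demo_itemids)
```

In the notebook this would be called once per mapping file, for example on `csv/itemid_label_action.csv` and `csv/itemid_label_state.csv`.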



## 4. Hourly sampling

In [None]:
# Set the folder path where the state CSV files from section 3 are stored
folder_path = './output/state/'
columns=['chartdatetime']

# define a new dataframe
df_output = pd.DataFrame(columns=columns)


#FIXME flag only for test
i=0

# Loop through the file paths and read each file into a DataFrame
for itemid in itemid_list:
    
    feature = label[str(itemid)]
    path = folder_path + feature + '.csv'
    # Load the CSV file into a pandas DataFrame. The files hold three columns
    # (stay_id, charttime, valuenum); header=0 replaces the header row with these names.
    df = pd.read_csv(path, header=0, names=['stay_id', 'chartdatetime', feature])

    selected_id = 30588857
    # Copy the filtered rows so the assignment below does not modify a slice of df
    df_filtered = df[df['stay_id'] == selected_id].copy()


    # Convert the 'chartdatetime' column to datetime
    df_filtered['chartdatetime'] = pd.to_datetime(df_filtered['chartdatetime'])


    # Set the 'chartdatetime' column as the DataFrame's index
    df_filtered.set_index('chartdatetime', inplace=True)


    # Resample the DataFrame hourly and forward fill missing values
    df_hourly = df_filtered.resample('H').ffill()
    df_hourly = df_hourly.drop(['stay_id'], axis=1)
    #print(df_hourly)
    df_output = pd.merge(df_output, df_hourly, how='outer', on='chartdatetime')
    
    i+=1
    if(i==30): break

df_output.to_csv('./output/stay_id30588857.csv',index=0)


# # Reset the index and save the resampled DataFrame to a new CSV file
# df_hourly.reset_index().to_csv('./output/your_resampled_file.csv', index=False, header=None)
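
The hourly resampling and outer merge can be illustrated on two tiny made-up series. This is a sketch under stated assumptions: the feature names `HR` and `SBP` and all values are invented, and `'60min'` is used as an equivalent of the hourly alias because it is accepted across pandas versions.

```python
import pandas as pd

# Two made-up features measured at different times
hr = pd.DataFrame(
    {"HR": [80.0, 90.0]},
    index=pd.to_datetime(["2150-01-01 08:00", "2150-01-01 10:00"]),
)
sbp = pd.DataFrame(
    {"SBP": [120.0, 110.0]},
    index=pd.to_datetime(["2150-01-01 09:00", "2150-01-01 11:00"]),
)

# Put each feature on an hourly grid, carrying the last observation forward
hr_hourly = hr.resample("60min").ffill().rename_axis("chartdatetime").reset_index()
sbp_hourly = sbp.resample("60min").ffill().rename_axis("chartdatetime").reset_index()

# An outer merge keeps every hour seen in either feature
merged = (
    pd.merge(hr_hourly, sbp_hourly, how="outer", on="chartdatetime")
    .sort_values("chartdatetime")
    .reset_index(drop=True)
)
print(merged)
```

Each feature is only filled between its own first and last observation, so hours outside that range stay `NaN` after the merge: here `SBP` is missing at 08:00 and `HR` is missing at 11:00.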