Vasopressor Selection
=====================

This Jupyter notebook analyzes which vasopressors to include in the action space.

In [14]:
import psycopg2
from psycopg2 import sql
import csv
import pandas as pd
import numpy as np
import os
import shutil
from datetime import timedelta

# Fill in the host, username and password for your PostgreSQL instance
conn = psycopg2.connect(host='', user='', password='', database='mimiciv')

# 1 Data transfer of Vasopressors

In *1.1*, we export the data for the following vasopressors from PostgreSQL to CSV:

 - Vasopressors 
   - Norepinephrine
   - Phenylephrine
   - Vasopressin
   - Epinephrine
   - Dopamine
   - Dobutamine
   - Milrinone

In *1.2*, we compute usage statistics for each vasopressor.

In *1.3*, we compare these counts and explain why the two least-used agents, Dobutamine and Milrinone, can be omitted.

Consequently, this project takes `vasopressors_norepinephrine_equivalent_dose` directly, following `"Vasopressor dose equivalence: A scoping review and suggested formula" by Goradia et al. 2020`.
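
For orientation, a norepinephrine-equivalent dose can be sketched as a weighted sum of the individual infusion rates. The conversion ratios below are our reading of the formula suggested by Goradia et al. (with the metaraminol and angiotensin II terms left out) and should be checked against the paper before use. The function name, parameter names and the example doses are hypothetical, and the input units (mcg/kg/min for the catecholamines and phenylephrine, units/min for vasopressin) are assumptions.

```python
def norepinephrine_equivalent(norepinephrine=0.0, epinephrine=0.0,
                              phenylephrine=0.0, dopamine=0.0,
                              vasopressin_units_per_min=0.0):
    """Return a norepinephrine-equivalent dose in mcg/kg/min.

    Assumed inputs: catecholamines and phenylephrine in mcg/kg/min,
    vasopressin in units/min. The ratios are assumptions to verify
    against Goradia et al.
    """
    return (norepinephrine
            + epinephrine
            + phenylephrine / 10
            + dopamine / 100
            + vasopressin_units_per_min * 2.5)


# Hypothetical patient on three agents at once
nee = norepinephrine_equivalent(norepinephrine=0.1,
                                phenylephrine=1.0,
                                vasopressin_units_per_min=0.04)
print(round(nee, 3))
```

Dobutamine and Milrinone have no term in this sum, which is one reason they can be dropped from the action space.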

## 1.1 Transfer action data of vasopressors from PostgreSQL to CSV

Output files are written to `doc/output_action_vasopressors_selection/data/data_raw/action/vasopressors`

In [15]:
# generate the dictionary of action
with open('../csv/itemid_label_action.csv', newline='') as csvfile:
    # Create a CSV reader object
    reader = csv.reader(csvfile)
    # Skip the header row
    next(reader)
    # Initialize an empty dictionary and list
    action_label = {}
    a_itemid_list = []
    # Iterate over the rows in the CSV file
    for row in reader:
        # Add the key-value pair to the dictionary
        action_label[row[0]] = row[1]
        # Add the itemid to the list
        a_itemid_list.append(row[0])

# Start from an empty output folder
if os.path.exists('./output_action_vasopressors_selection/data/data_raw/action/vasopressors'):
    shutil.rmtree('./output_action_vasopressors_selection/data/data_raw/action/vasopressors')
os.makedirs('./output_action_vasopressors_selection/data/data_raw/action/vasopressors')

with conn.cursor() as cursor:

    for itemid in a_itemid_list:
        label = action_label[str(itemid)]
        # Skip the IV fluid items (Dextrose 5%, NaCl 0.9%); the remaining items are vasopressors
        if "Dextrose_5%" not in label and "NaCl_0_9%" not in label:
            # Order by starttime so that each CSV lists the administrations chronologically;
            # pass itemid as a query parameter instead of formatting it into the SQL string
            command = "select stay_id, starttime, endtime, amount from mimiciv_derived.sepsis_action where itemid = %s order by starttime;"
            cursor.execute(command, (int(itemid),))

            result = cursor.fetchall()
            # Naming the columns here also works when the query returns no rows
            df = pd.DataFrame(result, columns=['stay_id', 'starttime', 'endtime', 'amount'])

            # Duration of each administration in minutes
            df['duration'] = (df['endtime'] - df['starttime']).dt.total_seconds() / 60
            # Average amount delivered per minute
            df['value_per_minute'] = df['amount'] / df['duration']

            df.to_csv('./output_action_vasopressors_selection/data/data_raw/action/vasopressors/{}.csv'.format(label), index=False)
            print("output action (vasopressors):\t" + label + ".csv")
    # The with-block closes the cursor on exit, so no explicit cursor.close() is needed

output action (vasopressors):	Norepinephrine.csv
output action (vasopressors):	Vasopressin.csv
output action (vasopressors):	Dobutamine.csv
output action (vasopressors):	Milrinone.csv
output action (vasopressors):	Phenylephrine.csv
output action (vasopressors):	Dopamine.csv
output action (vasopressors):	Epinephrine.csv
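
The per-minute rate computed in the cell above divides by the infusion duration, so a row whose `starttime` equals `endtime` would yield an infinite value. Whether such rows occur in `mimiciv_derived.sepsis_action` is not checked in this notebook. The following sketch, which uses made-up rows rather than MIMIC-IV data, shows one possible guard: zero durations are replaced with NaN before dividing.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values, not MIMIC-IV data); the second row has zero duration
toy = pd.DataFrame({
    'starttime': pd.to_datetime(['2020-01-01 00:00', '2020-01-01 01:00']),
    'endtime': pd.to_datetime(['2020-01-01 00:30', '2020-01-01 01:00']),
    'amount': [3.0, 1.0],
})

# Duration in minutes
duration_min = (toy['endtime'] - toy['starttime']).dt.total_seconds() / 60

# Replace zero durations with NaN so the division yields NaN rather than inf
toy['value_per_minute'] = toy['amount'] / duration_min.replace(0, np.nan)
print(toy[['amount', 'value_per_minute']])
```

Rows with a NaN rate can then be dropped or inspected separately, depending on how such records should be treated.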


## 1.2 Analyze vasopressor data

### Compute vasopressor usage statistics

In [16]:
# Folder path
folder_path = './output_action_vasopressors_selection/data/data_raw/action/vasopressors'
print('Item\tVasopressors\tCount\tPercentage\tTop N Percentage')

# Get all CSV file paths in the folder
file_paths = [os.path.join(folder_path, file) for file in os.listdir(folder_path) if file.endswith('.csv')]

# Store the number of rows for each CSV file
rows_dict = {}

# Calculate the total number of rows
total_rows = 0

# Iterate through each CSV file and get the number of rows
# Note: every line of the file is counted, including the header row
for file_path in file_paths:
    with open(file_path, 'r', newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        rows = sum(1 for row in csv_reader)
        rows_dict[file_path] = rows
        total_rows += rows

# Sort the dictionary by the number of rows in descending order
sorted_rows = sorted(rows_dict.items(), key=lambda x: x[1], reverse=True)

total_rows_first_n = 0
index = 0
# Print the sorted results
for file_path, rows in sorted_rows:
    index += 1
    total_rows_first_n += rows
    # Take the drug name from the file name rather than slicing at a fixed offset
    name = os.path.splitext(os.path.basename(file_path))[0]
    print(f"{index}\t{name}\t{rows}\t{round(rows/total_rows*100,2)}%\t\t{round(total_rows_first_n/total_rows*100,2)}%")
print(f"\tTotal\t\t{total_rows}\t100%\t\t100%")

Item	Vasopressors	Count	Percentage	Top N Percentage
1	Norepinephrine	60081	66.46%		66.46%
2	Phenylephrine	19667	21.76%		88.22%
3	Vasopressin	3763	4.16%		92.38%
4	Epinephrine	2829	3.13%		95.51%
5	Dopamine	2448	2.71%		98.22%
6	Dobutamine	1132	1.25%		99.47%
7	Milrinone	481	0.53%		100.0%
	Total		90401	100%		100%


## 1.3 Conclusion of the vasopressor analysis
The top 5 vasopressors account for 98.22% of all recorded administrations, so they cover nearly all vasopressor use in this dataset. We therefore keep only these five and drop the two least-used agents, Dobutamine and Milrinone, which contribute 1.25% and 0.53% of the records respectively.
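
As a cross-check of the percentages quoted here, the cumulative share can be recomputed from the counts printed in section 1.2. This is a minimal sketch using pandas; the variable and column names are illustrative, and the counts are copied verbatim from the table above.

```python
import pandas as pd

# Row counts copied from the output table in section 1.2
counts = pd.Series({
    'Norepinephrine': 60081,
    'Phenylephrine': 19667,
    'Vasopressin': 3763,
    'Epinephrine': 2829,
    'Dopamine': 2448,
    'Dobutamine': 1132,
    'Milrinone': 481,
}).sort_values(ascending=False)

# Share of each drug and running share of the top N drugs, in percent
share = counts / counts.sum() * 100
summary = pd.DataFrame({
    'count': counts,
    'percentage': share.round(2),
    'top_n_percentage': share.cumsum().round(2),
})
print(summary)
```

The `top_n_percentage` value in the Dopamine row is the cumulative share of the five retained drugs.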

Dobutamine and Milrinone are also disregarded in this study: `"Vasopressor dose equivalence: A scoping review and suggested formula" by Goradia et al. 2020`.

The equivalent dose values for the top 5 vasopressors can be obtained directly from `mimiciv_derived.norepinephrine_equivalent_dose`.