# Medication Analysis

**Assumption:** An alarm is relevant if a medication was administered shortly before or after it.

**Goal:** Analyze medication tables in MIMIC-III to prepare for indication of relevant alarms

In MIMIC-III, three tables contain medication information: `PRESCRIPTIONS`, `INPUTEVENTS_MV` and `INPUTEVENTS_CV`. For our use case, only the first two are of interest, because we currently only consider ICU stays that were recorded with the MetaVision system. "Part A" for `PRESCRIPTIONS.csv` can be found in `/mimic_medication_analysis/prescriptions_analysis.ipynb`.

## Part B: Analyze INPUTEVENTS_MV.csv

`INPUTEVENTS_MV.csv` needs to be investigated because "inputs and outputs are extremely useful when studying intensive care unit patients. Inputs are any fluids which have been administered to the patient: such as oral or tube feedings or **intravenous solutions containing medications**." (see [MIMIC-III documentation](https://mimic.physionet.org/mimicdata/io/))

Certain ITEMIDs in `INPUTEVENTS_MV` refer to medications. These ITEMIDs need to be identified and analyzed.

### Extract Relevant ITEMIDs

In [None]:
import dask.dataframe as dd
from dask.diagnostics import ProgressBar

# Data types based on MIMIC schema specification https://mit-lcp.github.io/mimic-schema-spy/tables/inputevents_mv.html
# Problem: Complicated use of integer data types with NaNs in Pandas, see https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
# Decision: Integers are read in as 'float64' and strings as 'object'
d_items = dd.read_csv('../data/mimic-iii-clinical-database-1.4/D_ITEMS.csv', dtype={
    'ROW_ID': 'float64', # int according to specification
    'ITEMID': 'float64', # int according to specification
    'LABEL': 'object', # varchar according to specification
    'ABBREVIATION': 'object', # varchar according to specification
    'DBSOURCE': 'object', # varchar according to specification
    'LINKSTO': 'object', # varchar according to specification
    'CATEGORY': 'object', # varchar according to specification
    'UNITNAME': 'object', # varchar according to specification
    'PARAM_TYPE': 'object', # varchar according to specification
    'CONCEPTID': 'float64' # int according to specification
})

with ProgressBar():
    # Filter for ITEMIDs from INPUTEVENTS_MV
    medication_items = d_items[d_items['LINKSTO'] == 'inputevents_mv']

    # Filter by categories clearly related to medications (disregard "Fluids - Other (Not In Use)" because it also includes liquid and special nutrition)
    medication_items = medication_items[medication_items['CATEGORY'].isin(['Medications', 'Blood Products/Colloids', 'Antibiotics'])]

    # Apply the previously defined commands to the Dask DataFrame, resulting in the desired Pandas DataFrame
    medication_items = medication_items.compute()

### Read and Pre-Filter INPUTEVENTS_MV.csv

In [None]:
import pandas as pd
import dask.dataframe as dd
from dask.diagnostics import ProgressBar

# Data types based on MIMIC schema specification https://mit-lcp.github.io/mimic-schema-spy/tables/inputevents_mv.html
# Problem: Complicated use of integer data types with NaNs in Pandas, see https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
# Decision: Floats and integers are read in as 'float64', strings as 'object', and timestamps via Dask's parse_dates provided for this purpose
inputevents = dd.read_csv('../data/mimic-iii-clinical-database-1.4/INPUTEVENTS_MV.csv', parse_dates=['STARTTIME', 'ENDTIME', 'STORETIME', 'COMMENTS_DATE'], dtype={
    'ROW_ID': 'float64', # int4 according to specification
    'SUBJECT_ID': 'float64', # int4 according to specification
    'HADM_ID': 'float64', # int4 according to specification
    'ICUSTAY_ID': 'float64', # int4 according to specification

    'ITEMID': 'float64', # int4 according to specification
    'AMOUNT': 'float64', # float8 according to specification
    'AMOUNTUOM': 'object', # varchar according to specification
    'RATE': 'float64', # float8 according to specification
    'RATEUOM': 'object', # varchar according to specification

    'CGID': 'float64', # int4 according to specification
    'ORDERID': 'float64', # int4 according to specification
    'LINKORDERID': 'float64', # int4 according to specification
    'ORDERCATEGORYNAME': 'object', # varchar according to specification
    'SECONDARYORDERCATEGORYNAME': 'object', # varchar according to specification
    'ORDERCOMPONENTTYPEDESCRIPTION': 'object', # varchar according to specification
    'ORDERCATEGORYDESCRIPTION': 'object', # varchar according to specification

    'PATIENTWEIGHT': 'float64', # float8 according to specification
    'TOTALAMOUNT': 'float64', # float8 according to specification
    'TOTALAMOUNTUOM': 'object', # varchar according to specification

    'ISOPENBAG': 'float64', # int2 according to specification
    'CONTINUEINNEXTDEPT': 'float64', # int2 according to specification
    'CANCELREASON': 'float64', # int2 according to specification
    'STATUSDESCRIPTION': 'object', # varchar according to specification
    'COMMENTS_STATUS': 'object', # varchar according to specification
    'COMMENTS_TITLE': 'object', # varchar according to specification
    'ORIGINALAMOUNT': 'float64', # float8 according to specification
    'ORIGINALRATE': 'float64' # float8 according to specification
})

# Get all relevant ICU stays
unique_ICU_stays = pd.read_parquet('../data/unique_icustays_in_chartevents_subset.parquet', engine='pyarrow')

with ProgressBar():
    # Extract relevant columns
    inputevents_subset = inputevents[['ICUSTAY_ID', 'STARTTIME', 'ENDTIME', 'ITEMID', 'AMOUNT', 'AMOUNTUOM', 'RATE', 'RATEUOM', 'STORETIME', 'ORDERID', 'LINKORDERID', 'ORDERCATEGORYNAME', 'SECONDARYORDERCATEGORYNAME', 'ORDERCOMPONENTTYPEDESCRIPTION', 'ORDERCATEGORYDESCRIPTION' , 'TOTALAMOUNT', 'TOTALAMOUNTUOM', 'STATUSDESCRIPTION', 'ORIGINALAMOUNT', 'ORIGINALRATE']]

    # Filter by ICUSTAY_IDs
    inputevents_subset = inputevents_subset[inputevents_subset.ICUSTAY_ID.isin(unique_ICU_stays.ICUSTAY_ID)]

    # Drop rows without ICUSTAY_ID
    inputevents_subset = inputevents_subset.dropna(how='any', subset=['ICUSTAY_ID'])

    # Reduce ITEMIDs to the ones whose categories are clearly related to medications
    inputevents_subset = inputevents_subset[inputevents_subset['ITEMID'].isin(medication_items.ITEMID.unique())]

    # Apply the previously defined commands to the Dask DataFrame, resulting in the desired Pandas DataFrame
    inputevents_subset = inputevents_subset.compute()

### Check for NaN Entries

In [None]:
inputevents_subset.isna().any()

In [None]:
# Drop rows with TOTALAMOUNT = NaN because it should always be set (also removes all NaN values in TOTALAMOUNTUOM)
inputevents_subset = inputevents_subset[inputevents_subset.TOTALAMOUNT.notnull()]
inputevents_subset.isna().any()

### Add Time Difference Column & Check for Valid Dates

In [None]:
# Calculate difference between STARTTIME and ENDTIME in minutes
inputevents_subset['DURATION_IN_MIN'] = (pd.to_datetime(inputevents_subset['ENDTIME']) - pd.to_datetime(inputevents_subset['STARTTIME'])) / pd.Timedelta(minutes=1)

inputevents_subset.head()

In [None]:
# Remove negative durations (STARTTIME after ENDTIME)
inputevents_subset = inputevents_subset[inputevents_subset['DURATION_IN_MIN'] >= 0]

### Create Parquet File 'inputevents_based_medications'

In [None]:
# Sort rows for better overview
inputevents_subset = inputevents_subset.sort_values(by=['ICUSTAY_ID', 'STARTTIME','ENDTIME', 'ITEMID'])

# Reset index
inputevents_subset = inputevents_subset.reset_index(drop=True)

# Save as parquet file
pd.DataFrame(inputevents_subset).to_parquet('../data/inputevents_based_medications.parquet', engine='pyarrow')

### Check if Relevant ITEMIDs are also Recorded in CHARTEVENTS.csv

In [None]:
import dask.dataframe as dd
from dask.diagnostics import ProgressBar

chartevents = dd.read_csv('../data/mimic-iii-clinical-database-1.4/CHARTEVENTS.csv', parse_dates=['CHARTTIME','STORETIME'], dtype={
    'ROW_ID': 'float64', # int4 according to specification
    'SUBJECT_ID': 'float64', # int4 according to specification
    'HADM_ID': 'float64', # int4 according to specification
    'ICUSTAY_ID': 'float64', # int4 according to specification
    'ITEMID': 'float64', # int4 according to specification
    'CGID': 'float64', # int4 according to specification
    'VALUE': 'object',
    'VALUENUM': 'float64', # float8 according to specification
    'VALUEUOM': 'object',
    'WARNING': 'float64', # int4 according to specification
    'ERROR': 'float64', # int4 according to specification
    'RESULTSTATUS': 'object',
    'STOPPED': 'object'})

with ProgressBar():
    # Filter by ITEMIDs
    chartevents_medications = chartevents[chartevents.ITEMID.isin(inputevents_subset.ITEMID.unique())]

    # Drop rows without ICUSTAY_ID
    chartevents_medications = chartevents_medications.dropna(how='any', subset=['ICUSTAY_ID'])

    # Keep only the rows for which no error occurred, which is coded by a 0. (5584 rows are dropped because the boolean ERROR column equals 1, indicating an error.)
    chartevents_medications = chartevents_medications[chartevents_medications.ERROR.isin([0])]

    # Apply the previously defined commands to the Dask DataFrame, resulting in the desired Pandas DataFrame.
    chartevents_medications = chartevents_medications.compute()

# Sort the rows (not essential, but gives a better overview)
chartevents_medications = chartevents_medications.sort_values(by=['ICUSTAY_ID', 'CHARTTIME','ITEMID'])

# Reset index
chartevents_medications = chartevents_medications.reset_index(drop=True)

# Test if relevant ITEMIDs are in CHARTEVENTS
len(chartevents_medications.index) # 0

None of the relevant ITEMIDs are in CHARTEVENTS, which means that further analyses with these IDs can be based solely on the `inputevents_based_medications.parquet` file.

### Plot: Distribution of Status

In [None]:
import pandas as pd

inputevents_medication = pd.read_parquet('../data/inputevents_based_medications.parquet', engine='pyarrow')
inputevents_medication.info() # 1282499 entries

In [None]:
status_input = inputevents_medication\
    .groupby(['STATUSDESCRIPTION'])\
    .size()\
    .reset_index(name='Count')

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set variables
title = 'Distribution of Status'
plotdata = status_input
xvalue = 'STATUSDESCRIPTION'
xlabel = 'Status'
yvalue = 'Count'

# Actual plot
sns.set_style('whitegrid')
sns.barplot(
    data=plotdata,
    x=xvalue,
    y=yvalue,
    color=sns.color_palette('colorblind')[0])
plt.title(title, fontsize=18)
plt.xlabel(xlabel, fontsize=16)
plt.ylabel(yvalue, fontsize=16)
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

### Plot: Distribution of Order Categories

In [None]:
categories_input = inputevents_medication\
    .groupby(['ORDERCATEGORYNAME'])\
    .size()\
    .reset_index(name='Count')

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set variables
title = 'Distribution of Order Categories'
plotdata = categories_input
xvalue = 'ORDERCATEGORYNAME'
xlabel = 'Order Category'
yvalue = 'Count'

# Actual plot
sns.set_style('whitegrid')
sns.barplot(
    data=plotdata,
    x=xvalue,
    y=yvalue,
    color=sns.color_palette('colorblind')[0])
plt.title(title, fontsize=18)
plt.xlabel(xlabel, fontsize=16)
plt.ylabel(yvalue, fontsize=16)
plt.xticks(rotation=90)

plt.tight_layout()
plt.show()

### Plot: Medication Counts of ICU Stays

In [None]:
icustay_count_input = inputevents_medication\
    .groupby(['ICUSTAY_ID'])\
    .size()\
    .reset_index(name='Count')

icustay_count_input.Count.describe()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set variables
title = 'Medication Counts of ICU Stays'
plotdata = icustay_count_input
xvalue = 'Count'
xlabel = 'Medication Count'

# Actual plot
sns.set_style('whitegrid')
sns.histplot(
    data=plotdata,
    x=xvalue)
plt.title(title, fontsize=18)
plt.xlabel(xlabel, fontsize=16)
plt.ylabel('Number of ICU Stays', fontsize=16) # The histogram's y-axis counts ICU stays per bin
plt.xlim(0)

plt.tight_layout()
plt.show()

### Time-Series Plot: Medication at Triggered Alarms

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

icustay_ids = [208809, 266144, 216834]

for icustay in icustay_ids:
    parameter_names = ['Heart Rate', 'O2 Saturation', 'Systolic Blood Pressure']
    parameter_names_abbrev = ['HR', 'O2Sat', 'NBPs']

    # measurement, low, high
    parameter_itemids = [[220045, 220047, 220046], [220277, 223770, 223769], [220179, 223752, 223751]]

    parameter_units = ['Beats per Minute', 'Percent', 'Millimeter Mercury']
    parameter_units_abbrev = ['bpm', '%', 'mmHg']

    for i in range(len(parameter_names)):
        medication_of_selected_icustay = inputevents_medication[inputevents_medication['ICUSTAY_ID'] == icustay]

        chartevents_subset = pd.read_parquet('../data/chartevents_subset.parquet', engine='pyarrow')
        chartevents_subset = chartevents_subset[(chartevents_subset['ITEMID'].isin(parameter_itemids[i]))]

        selected_icustay = chartevents_subset[(chartevents_subset['ICUSTAY_ID'] == icustay)].copy()

        # Add new column with ITEMID_LABEL (every row matches one of the three ITEMIDs after the filter above)
        label_suffix = f'{parameter_names[i]} ({parameter_units_abbrev[i]})'
        selected_icustay['ITEMID_LABEL'] = np.select(
            [selected_icustay['ITEMID'] == itemid for itemid in parameter_itemids[i]], # measurement, low, high
            [label_suffix, f'Alarm Threshold: Low {label_suffix}', f'Alarm Threshold: High {label_suffix}'],
            default='')

        # Convert CHARTTIME to datetime
        selected_icustay['CHARTTIME'] = pd.to_datetime(selected_icustay['CHARTTIME'])

        # Create time-indexed pandas series
        value_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][0])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('VALUE')
        threshold_low_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][1])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('THRESHOLD_LOW')
        threshold_high_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][2])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('THRESHOLD_HIGH')

        # Merge series to data frame using pd.concat
        timeseries = pd.concat([value_series, threshold_high_series, threshold_low_series], axis=1).copy()

        # Fill missing threshold values with the last available value (forward fill)
        timeseries['THRESHOLD_LOW'] = timeseries['THRESHOLD_LOW'].ffill()
        timeseries['THRESHOLD_HIGH'] = timeseries['THRESHOLD_HIGH'].ffill()

        # Add columns containing the differences between the measured value and the currently valid threshold
        timeseries['DIF_VALUE_LOW'] = timeseries.VALUE - timeseries.THRESHOLD_LOW
        timeseries['DIF_VALUE_HIGH'] = timeseries.VALUE - timeseries.THRESHOLD_HIGH

        # Identify triggered alarms (a.k.a. alarm violations) for threshold of type LOW
        alarm_too_low = timeseries[(timeseries['DIF_VALUE_LOW'] <= 0)][['VALUE','THRESHOLD_LOW','DIF_VALUE_LOW']]

        # Identify triggered alarms (a.k.a. alarm violations) for threshold of type HIGH
        alarm_too_high = timeseries[(timeseries['DIF_VALUE_HIGH'] >= 0)][['VALUE','THRESHOLD_HIGH','DIF_VALUE_HIGH']]

        # Set variables for plot
        title = f'History of {parameter_names[i]} of ICU Stay {icustay}'
        xlabel = 'Time'
        ylabel = parameter_units[i]
        plotdata = selected_icustay
        xvalue = 'CHARTTIME'
        yvalue = 'VALUENUM'
        huevalue = 'ITEMID_LABEL'

        medications = list()

        # Config figure
        sns.set_style('whitegrid')
        fig, ax = plt.subplots(
            figsize=(11, 5),
            dpi=72 # e.g. 72 for screen, 300 for print
        )

        # Main plot
        sns.lineplot(
            data=plotdata,
            x=xvalue,
            y=yvalue,
            hue=huevalue,
            style=huevalue,
            drawstyle='steps-post', # Interpolate missing values by using the last available value
            markers=['^','v','p'],
            markersize=5,
            dashes=False,
            palette=[sns.color_palette('colorblind')[1], sns.color_palette('colorblind')[2], sns.color_palette('colorblind')[0]]
        )

        # Add vertical lines as HIGH alarm indicators
        #if 0 < len(alarm_too_high.index) < 11: # Only if between 1 and 10 alarms occur (otherwise the diagram gets too busy)

        for idx, item in enumerate(alarm_too_high.index):
            # Check if a medication was started at exactly the timestamp of the triggered alarm
            for drug_idx in range(len(medication_of_selected_icustay)):
                current_alarm_timestamp = alarm_too_high.index[idx]
                current_medication = medication_of_selected_icustay.iloc[drug_idx]

                if current_alarm_timestamp == current_medication.STARTTIME:
                    current_medication_label = medication_items.loc[medication_items['ITEMID'] == current_medication.ITEMID, 'LABEL'].item()
                    medications.append(current_medication_label)

                    # Add red square
                    plt.scatter(
                        x=current_alarm_timestamp,
                        y=alarm_too_high.VALUE.iloc[idx], # Positional access, because the index holds timestamps
                        color='r',
                        marker='s')

                    # Add PERIOD_IN_MIN as text
                    #plt.annotate(
                    #    text=current_medication.PERIOD_IN_MIN,
                    #    xy=(current_alarm_timestamp, 20))

                    break

            plt.axvline(
                item,
                linestyle='dotted',
                color=sns.color_palette('colorblind')[1],
                alpha=0.8,
                zorder=0)

        # Add vertical lines as LOW alarm indicators
        #if 0 < len(alarm_too_low.index) < 11: # Only if between 1 and 10 alarms occur (otherwise the diagram gets too busy)

        for idx, item in enumerate(alarm_too_low.index):
            # Check if a medication was started at exactly the timestamp of the triggered alarm
            for drug_idx in range(len(medication_of_selected_icustay)):
                current_alarm_timestamp = alarm_too_low.index[idx]
                current_medication = medication_of_selected_icustay.iloc[drug_idx]

                if current_alarm_timestamp == current_medication.STARTTIME:
                    current_medication_label = medication_items.loc[medication_items['ITEMID'] == current_medication.ITEMID, 'LABEL'].item()
                    medications.append(current_medication_label)

                    # Add red square
                    plt.scatter(
                        x=current_alarm_timestamp,
                        y=alarm_too_low.VALUE.iloc[idx], # Positional access, because the index holds timestamps
                        color='r',
                        marker='s')

                    # Add PERIOD_IN_MIN as text
                    #plt.annotate(
                    #    text=current_medication.PERIOD_IN_MIN,
                    #    xy=(current_alarm_timestamp, 20))

                    break

            plt.axvline(
                item,
                linestyle='dotted',
                color=sns.color_palette('colorblind')[2],
                alpha=0.8,
                zorder=0)

        # Configure legend
        plt.plot([], linestyle='dotted', color=sns.color_palette('colorblind')[1], alpha=0.8, zorder=0, label=f'Triggered Alarm: {parameter_names[i]} too High')
        plt.plot([], linestyle='dotted', color=sns.color_palette('colorblind')[2], alpha=0.8, zorder=0, label=f'Triggered Alarm: {parameter_names[i]} too Low')
        plt.scatter([], [], color='r', marker='s', label='Medication (from same time on)')
        plt.legend(title=None, bbox_to_anchor=(1.02, 0.3), loc='upper left', borderaxespad=0)

        # Configure title and labels
        ax.set_title(title, fontweight='bold', color='black', fontsize=14, y=1.05)
        ax.set_xlabel(xlabel, fontsize=12, labelpad=15)
        ax.set_ylabel(ylabel, fontsize=12, labelpad=15)
        plt.xticks(rotation=45)

        # Plot figure
        plt.tight_layout()
        #plt.savefig(f'../plots/inputevents/time-series/time_series_input_med_{parameter_names_abbrev[i]}_{icustay}.png', dpi=1200)
        plt.show()

        print(f'Parameter {parameter_names_abbrev[i]} with ICU stay {icustay}:')
        for med in medications:
            print(med)

With the data from `INPUTEVENTS_MV`, significantly fewer medications can be observed. However, the ones that do appear can be grouped meaningfully: some belong to the same drug category, and some cluster at certain points in time. For example, when the blood pressure falls below its lower alarm threshold, drugs that raise blood pressure are given together with drugs that fight bacteria.
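The nested loops in the plotting cell compare every alarm timestamp with every medication `STARTTIME`. The same exact-timestamp match can be sketched in vectorized form, which makes it easier to list the matched medication labels without plotting. This is only a sketch: the function name `medications_at_alarms` is hypothetical, and unlike the loop above (which stops at the first match per alarm) it returns every medication that starts at an alarm timestamp.

```python
import pandas as pd

def medications_at_alarms(alarm_times, medications, labels):
    """List medications whose STARTTIME equals one of the alarm timestamps.

    alarm_times: DatetimeIndex of triggered alarms (e.g. alarm_too_low.index)
    medications: DataFrame with columns STARTTIME and ITEMID
    labels:      DataFrame with columns ITEMID and LABEL (e.g. medication_items)
    """
    # Keep only medications that start at exactly an alarm timestamp
    hits = medications[medications['STARTTIME'].isin(alarm_times)]
    # Attach the human-readable label of each ITEMID
    hits = hits.merge(labels[['ITEMID', 'LABEL']], on='ITEMID', how='left')
    return hits[['STARTTIME', 'LABEL']]
```

With the variables of the plotting cell, a call could look like `medications_at_alarms(alarm_too_low.index, medication_of_selected_icustay, medication_items)`.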

## Conclusion

We therefore propose to introduce a flag for every alarm, based on `inputevents_based_medications.parquet`, that indicates whether a medication was given within a fixed period around the alarm (from 1 h before to 1 h after the alarm). Alarms for which this flag is set are considered relevant.
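A minimal sketch of how such a flag could be computed is given here. It rests on assumptions that are not part of this notebook: the alarms are available as a DataFrame with the columns `ICUSTAY_ID` and `ALARMTIME` (a hypothetical column name), "medication was given" is interpreted as the medication's `STARTTIME` lying inside the window, and the function name `flag_alarms_with_medication` is made up for illustration.

```python
import pandas as pd

def flag_alarms_with_medication(alarms, medications, window=pd.Timedelta(hours=1)):
    """Return a copy of `alarms` with a boolean MEDICATION_FLAG column.

    alarms:      DataFrame with columns ICUSTAY_ID and ALARMTIME (assumed name)
    medications: DataFrame with columns ICUSTAY_ID and STARTTIME
    window:      half-width of the period around the alarm (default: 1 hour)
    """
    alarms = alarms.reset_index(drop=True)

    # Pair every alarm with every medication of the same ICU stay;
    # alarms without any medication keep a single row with STARTTIME = NaT
    pairs = alarms.reset_index().merge(
        medications[['ICUSTAY_ID', 'STARTTIME']], on='ICUSTAY_ID', how='left')

    # A pair counts if the medication starts at most `window` before or after the alarm
    in_window = (pairs['STARTTIME'] - pairs['ALARMTIME']).abs() <= window

    # An alarm is flagged if at least one of its pairs lies inside the window
    flagged = in_window.groupby(pairs['index']).any()
    alarms['MEDICATION_FLAG'] = flagged.reindex(alarms.index, fill_value=False).to_numpy()
    return alarms
```

The pairwise merge is simple to read but grows with the number of alarms times the number of medications per ICU stay. For large stays, an interval-based lookup may be preferable.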