# Medication Analysis

**Assumption:** Alarms that are preceded or followed by the administration of a medication are relevant

**Goal:** Analyze the medication tables in MIMIC-III to prepare for flagging relevant alarms

In MIMIC-III, three tables contain information on medication: `PRESCRIPTIONS`, `INPUTEVENTS_MV` and `INPUTEVENTS_CV`. For our use case, only the first two are of interest, as we currently only look at ICU stays that were recorded with the MetaVision system. "Part B", the analysis of `INPUTEVENTS_MV.csv`, can be found in `/mimic_medication_analysis/inputevents_mv_analysis.ipynb`.

## Part A: Analyze PRESCRIPTIONS.csv

The `PRESCRIPTIONS` table contains medication related order entries, i.e. prescriptions (see https://mimic.physionet.org/mimictables/prescriptions/ and http://people.cs.pitt.edu/~jlee/note/intro_to_mimic_db.pdf).

### Read and Pre-Filter PRESCRIPTIONS.csv

In [None]:
import pandas as pd
import dask.dataframe as dd
from dask.diagnostics import ProgressBar

# Data types based on MIMIC schema specification https://mit-lcp.github.io/mimic-schema-spy/tables/prescriptions.html
# Problem: Complicated use of integer data types with NaNs in Pandas, see https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
# Decision: Integers are read in as 'float64', strings as 'object', and timestamps via Dask's parse_dates provided for this purpose
prescriptions = dd.read_csv('../data/mimic-iii-clinical-database-1.4/PRESCRIPTIONS.csv', parse_dates=['STARTDATE', 'ENDDATE'], dtype={
    'ROW_ID': 'float64', # int4 according to specification
    'SUBJECT_ID': 'float64', # int4 according to specification
    'HADM_ID': 'float64', # int4 according to specification
    'ICUSTAY_ID': 'float64', # int4 according to specification

    'DRUG_TYPE': 'object', # varchar according to specification
    'DRUG': 'object', # varchar according to specification
    'DRUG_NAME_POE': 'object', # varchar according to specification
    'DRUG_NAME_GENERIC': 'object', # varchar according to specification
    'FORMULARY_DRUG_CD': 'object', # varchar according to specification

    'GSN': 'object', # varchar according to specification
    'NDC': 'object', # varchar according to specification

    'PROD_STRENGTH': 'object', # varchar according to specification
    'DOSE_VAL_RX': 'object', # varchar according to specification
    'DOSE_UNIT_RX': 'object', # varchar according to specification
    'FORM_VAL_DISP': 'object', # varchar according to specification
    'FORM_UNIT_DISP': 'object', # varchar according to specification
    'ROUTE': 'object' # varchar according to specification
})

prescriptions.head()
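
The `float64` workaround for integer columns with missing values can be illustrated on a toy CSV. The sketch below also shows pandas' nullable `Int64` dtype as a possible alternative; whether that dtype works with the Dask version used in this notebook has not been checked here, and the CSV content is made up for illustration.

```python
import io
import pandas as pd

# Toy CSV with one missing ICUSTAY_ID (values are made up for illustration)
csv_text = "ROW_ID,ICUSTAY_ID\n1,200001\n2,\n3,200003\n"

# Workaround used in this notebook: read the integer column as float64 so NaN fits
as_float = pd.read_csv(io.StringIO(csv_text), dtype={'ICUSTAY_ID': 'float64'})

# Possible alternative: nullable integer dtype, which marks gaps as <NA>
as_nullable = pd.read_csv(io.StringIO(csv_text), dtype={'ICUSTAY_ID': 'Int64'})

print(as_float['ICUSTAY_ID'].dtype)
print(as_nullable['ICUSTAY_ID'].dtype)
print(as_nullable['ICUSTAY_ID'].isna().sum())
```

With the `float64` approach, the later `.astype(int)` conversion is only safe once rows with missing IDs have been filtered out, which the `isin` filter in the next cell takes care of.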

In [None]:
unique_ICU_stays = pd.read_parquet('../data/unique_icustays_in_chartevents_subset.parquet', engine='pyarrow')

with ProgressBar():
    # Extract relevant columns (ICUSTAY_ID, date period, drug and its dose)
    prescriptions_subset = prescriptions[['ICUSTAY_ID', 'STARTDATE', 'ENDDATE', 'DRUG_TYPE', 'DRUG', 'DOSE_VAL_RX', 'DOSE_UNIT_RX', 'FORM_VAL_DISP', 'FORM_UNIT_DISP']]

    # Filter by ICUSTAY_IDs
    prescriptions_subset = prescriptions_subset[prescriptions_subset.ICUSTAY_ID.isin(unique_ICU_stays.ICUSTAY_ID)]

    # Convert ICUSTAY_ID to integer (aka remove ".0")
    prescriptions_subset['ICUSTAY_ID'] = prescriptions_subset['ICUSTAY_ID'].astype(int)

    # Apply the previously defined commands to the Dask DataFrame, resulting in the desired Pandas DataFrame
    prescriptions_subset = prescriptions_subset.compute()

len(prescriptions_subset.index)  # 1,294,243

### Check for NaN Entries

In [None]:
prescriptions_subset.isna().any()

In [None]:
prescriptions_subset['ENDDATE'].isna().sum() # 783 NaT values

In [None]:
prescriptions_subset['FORM_UNIT_DISP'].isna().sum() # 634 NaN values; these rows include all NaN values of the other dose and form columns

In [None]:
# Drop rows with ENDDATE = NaT
prescriptions_subset = prescriptions_subset[prescriptions_subset.ENDDATE.notnull()]

# Drop rows with FORM_UNIT_DISP = NaN (also removes rows with NaN entries in DOSE_VAL_RX, DOSE_UNIT_RX and FORM_VAL_DISP columns)
prescriptions_subset = prescriptions_subset[prescriptions_subset.FORM_UNIT_DISP.notnull()]

### Add Date Difference Column & Check for Valid Dates

In [None]:
import numpy as np

# Calculate difference between STARTDATE and ENDDATE
prescriptions_subset['DATE_DIF'] = pd.to_datetime(prescriptions_subset['ENDDATE']) - pd.to_datetime(prescriptions_subset['STARTDATE'])

# Extract integer values (aka remove ' days')
prescriptions_subset['DATE_DIF'] = (prescriptions_subset['DATE_DIF'] / np.timedelta64(1,'D')).astype(int)

len(prescriptions_subset[prescriptions_subset['DATE_DIF'] < 0]) # 5,546 negative date differences

In [None]:
# Remove negative date differences (STARTDATE after ENDDATE)
prescriptions_subset = prescriptions_subset[prescriptions_subset['DATE_DIF'] >= 0]
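
The date difference and validity check above can be tried out on a small toy frame. This is a minimal sketch with made-up dates; it uses the `.dt.days` accessor instead of dividing by `np.timedelta64(1, 'D')`, which gives the same result here because both columns hold whole dates.

```python
import pandas as pd

# Toy prescriptions with made-up dates; the third row ends before it starts
toy = pd.DataFrame({
    'STARTDATE': pd.to_datetime(['2150-01-01', '2150-01-05', '2150-01-10']),
    'ENDDATE': pd.to_datetime(['2150-01-03', '2150-01-05', '2150-01-08']),
})

# Difference in whole days between ENDDATE and STARTDATE
toy['DATE_DIF'] = (toy['ENDDATE'] - toy['STARTDATE']).dt.days

# Keep only rows where STARTDATE is not after ENDDATE
valid = toy[toy['DATE_DIF'] >= 0]

print(toy['DATE_DIF'].tolist())
print(len(valid))
```

A `DATE_DIF` of 0 means that the prescription starts and ends on the same calendar day.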

### Create Parquet File 'prescriptions_based_medications'

In [None]:
# Sort rows for better overview
prescriptions_subset = prescriptions_subset.sort_values(by=['ICUSTAY_ID', 'STARTDATE','ENDDATE'])

# Reset index
prescriptions_subset = prescriptions_subset.reset_index(drop=True)

# Save as parquet file
pd.DataFrame(prescriptions_subset).to_parquet('../data/prescriptions_based_medications.parquet', engine='pyarrow')

### Plot: Medication Period per Medication Information

In [None]:
import pandas as pd

prescriptions_medication = pd.read_parquet('../data/prescriptions_based_medications.parquet', engine='pyarrow')
prescriptions_medication.info() # 1,287,829 entries

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set variables
title = 'Medication Period per Medication Information'
plotdata = prescriptions_medication
xvalue = 'DATE_DIF'
xlabel = 'Medication Period in Days'
ylabel = 'Count'

# Actual plot
sns.set_style('whitegrid')
sns.histplot(
    data=plotdata,
    x=xvalue,
    binwidth=2)
plt.title(title, fontsize=18)
plt.xlabel(xlabel, fontsize=16)
plt.ylabel(ylabel, fontsize=16)

plt.tight_layout()
plt.show()

### Plot: Count of Medication Information by ICUSTAY_ID

In [None]:
icustay_id_count = prescriptions_medication\
    .groupby(['ICUSTAY_ID'])\
    .size()\
    .reset_index(name='Count')

icustay_id_count.Count.describe()

Across the 23,287 relevant ICU stays with at least one prescription entry, the number of prescribed medications per stay ranges from 1 to 727.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set variables
title = 'Medication Counts of ICU Stays'
plotdata = icustay_id_count
xvalue = 'Count'
xlabel = 'Medication Count'

# Actual plot
sns.set_style('whitegrid')
sns.histplot(
    data=plotdata,
    x=xvalue)
plt.title(title, fontsize=18)
plt.xlabel(xlabel, fontsize=16)
plt.ylabel('Count', fontsize=16)  # y-axis shows the number of ICU stays per bin
plt.xlim(0)

plt.tight_layout()
plt.show()

For most of the 23,287 relevant ICU stays with at least one prescription entry, between 1 and approximately 80 medications were prescribed.

### Plot: Medication of one ICUSTAY_ID over Time

In [None]:
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

icustay_ids = [208809, 260223, 266144, 216834]

for icustay in icustay_ids:

    # Set variables
    title = f'Medication Information for ICUSTAY_ID {icustay}'
    plotdata = prescriptions_medication[prescriptions_medication['ICUSTAY_ID'] == icustay]
    xlabel = 'Time'
    ylabel = 'Index of Medication'
    colors = {
        'MAIN' : 'r',
        'BASE' : 'b',
        'ADDITIVE' : 'g' # does not exist for ICUSTAY_ID 260223
    }

    # Actual plot
    figure(figsize=(15, 20), dpi=80)
    for i in range(len(plotdata)):
        if plotdata.iloc[i].DATE_DIF == 0:
            plt.scatter(
                x=[plotdata.iloc[i].STARTDATE, plotdata.iloc[i].ENDDATE],
                y=[i, i],
                s=1,
                color=colors[plotdata.iloc[i].DRUG_TYPE])
        else:
            plt.plot(
                [plotdata.iloc[i].STARTDATE, plotdata.iloc[i].ENDDATE],
                [i, i],
                color = colors[plotdata.iloc[i].DRUG_TYPE])

    # Add title and labels
    plt.title(title, fontsize=22)
    plt.xlabel(xlabel, fontsize=16)
    plt.xticks(rotation=45)
    plt.ylabel(ylabel, fontsize=16)

    # Add legend (empty plots serve as legend handles for all three drug types)
    plt.plot([], [], color='r', label='MAIN')
    plt.plot([], [], color='b', label='BASE')
    plt.plot([], [], color='g', label='ADDITIVE')
    plt.rcParams['legend.title_fontsize'] = 16
    plt.legend(title='Drug Type', loc='upper left', fontsize=14, fancybox=True)

    plt.tight_layout()
    plt.show()

### Time-Series Plot: Medication at Triggered Alarms

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

icustay_ids = [208809, 260223, 266144, 216834]

for icustay in icustay_ids:
    parameter_names = ['Heart Rate', 'O2 Saturation', 'Systolic Blood Pressure']
    parameter_names_abbrev = ['HR', 'O2Sat', 'NBPs']

    # measurement, low, high
    parameter_itemids = [[220045, 220047, 220046], [220277, 223770, 223769], [220179, 223752, 223751]]

    parameter_units = ['Beats per Minute', 'Percent', 'Millimeter Mercury']
    parameter_units_abbrev = ['bpm', '%', 'mmHg']

    for i in range(len(parameter_names)):
        medication_of_selected_icustay = prescriptions_medication[prescriptions_medication['ICUSTAY_ID'] == icustay]

        chartevents_subset = pd.read_parquet('../data/chartevents_subset.parquet', engine='pyarrow')
        chartevents_subset = chartevents_subset[(chartevents_subset['ITEMID'].isin(parameter_itemids[i]))]

        selected_icustay = chartevents_subset[(chartevents_subset['ICUSTAY_ID'] == icustay)].copy()

        # Add new column with ITEMID_LABEL by mapping each ITEMID to a readable label
        # (the previous comparison with np.nan was always True and had no effect)
        itemid_labels = {
            parameter_itemids[i][0]: f'{parameter_names[i]} ({parameter_units_abbrev[i]})',
            parameter_itemids[i][1]: f'Alarm Threshold: Low {parameter_names[i]} ({parameter_units_abbrev[i]})',
            parameter_itemids[i][2]: f'Alarm Threshold: High {parameter_names[i]} ({parameter_units_abbrev[i]})'
        }
        selected_icustay['ITEMID_LABEL'] = selected_icustay['ITEMID'].map(itemid_labels)

        # Convert CHARTTIME to datetime
        selected_icustay['CHARTTIME'] = pd.to_datetime(selected_icustay['CHARTTIME'])

        # Create time-indexed pandas series
        value_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][0])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('VALUE')
        threshold_low_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][1])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('THRESHOLD_LOW')
        threshold_high_series = selected_icustay[(selected_icustay['ITEMID'] == parameter_itemids[i][2])][['CHARTTIME','VALUENUM']].set_index('CHARTTIME').squeeze().rename('THRESHOLD_HIGH')

        # Merge series to data frame using pd.concat
        timeseries = pd.concat([value_series, threshold_high_series, threshold_low_series], axis=1).copy()

        # Fill missing thresholds with the last available value (forward fill)
        timeseries['THRESHOLD_LOW'] = timeseries['THRESHOLD_LOW'].ffill()
        timeseries['THRESHOLD_HIGH'] = timeseries['THRESHOLD_HIGH'].ffill()

        # Add columns containing the differences between the measured value and the currently valid threshold
        timeseries['DIF_VALUE_LOW'] = timeseries.VALUE - timeseries.THRESHOLD_LOW
        timeseries['DIF_VALUE_HIGH'] = timeseries.VALUE - timeseries.THRESHOLD_HIGH

        # Identify triggered alarms (a.k.a. alarm violations) for threshold of type LOW
        alarm_too_low = timeseries[(timeseries['DIF_VALUE_LOW'] <= 0)][['VALUE','THRESHOLD_LOW','DIF_VALUE_LOW']]

        # Identify triggered alarms (a.k.a. alarm violations) for threshold of type HIGH
        alarm_too_high = timeseries[(timeseries['DIF_VALUE_HIGH'] >= 0)][['VALUE','THRESHOLD_HIGH','DIF_VALUE_HIGH']]

        # Set variables for plot
        title = f'History of {parameter_names[i]} of ICU Stay {icustay}'
        xlabel = 'Time'
        ylabel = parameter_units[i]
        plotdata = selected_icustay
        xvalue = 'CHARTTIME'
        yvalue = 'VALUENUM'
        huevalue = 'ITEMID_LABEL'

        # Config figure
        sns.set_style('whitegrid')
        fig, ax = plt.subplots(
            figsize=(11, 5),
            dpi=72 # e.g. 72 for screen, 300 for print
        )

        # Main plot
        sns.lineplot(
            data=plotdata,
            x=xvalue,
            y=yvalue,
            hue=huevalue,
            style=huevalue,
            drawstyle='steps-post', # Interpolate missing values by using the last available value
            markers=['^','v','p'],
            markersize=5,
            dashes=False,
            palette=[sns.color_palette('colorblind')[1], sns.color_palette('colorblind')[2], sns.color_palette('colorblind')[0]]
        )

        # Add vertical lines as HIGH alarm indicators
        #if 0 < len(alarm_too_high.index) < 11: # Only if between 1 and 11 alarms occur (otherwise the diagram gets too busy)

        for idx, item in enumerate(alarm_too_high.index):
            # Check if medication was given at day of alarm triggering
            for drug_idx in range(len(medication_of_selected_icustay)):
                current_alarm_timestamp = alarm_too_high.index[idx]
                current_medication = medication_of_selected_icustay.iloc[drug_idx]

                if current_alarm_timestamp.date() == current_medication.STARTDATE.date():
                    # Add red square
                    plt.scatter(
                        x=current_alarm_timestamp,
                        y=alarm_too_high.VALUE.iloc[idx],
                        color='r',
                        marker='s')

                    # Add DATE_DIF as text
                    #plt.annotate(
                    #    text=current_medication.DATE_DIF,
                    #    xy=(current_alarm_timestamp, 20))

                    break

            plt.axvline(
                item,
                linestyle='dotted',
                color=sns.color_palette('colorblind')[1],
                alpha=0.8,
                zorder=0)

        # Add vertical lines as LOW alarm indicators
        #if 0 < len(alarm_too_low.index) < 11: # Only if between 1 and 11 alarms occur (otherwise the diagram gets too busy)

        for idx, item in enumerate(alarm_too_low.index):
            # Check if medication was given at day of alarm triggering
            for drug_idx in range(len(medication_of_selected_icustay)):
                current_alarm_timestamp = alarm_too_low.index[idx]
                current_medication = medication_of_selected_icustay.iloc[drug_idx]

                if current_alarm_timestamp.date() == current_medication.STARTDATE.date():
                    # Add red square
                    plt.scatter(
                        x=current_alarm_timestamp,
                        y=alarm_too_low.VALUE.iloc[idx],
                        color='r',
                        marker='s')

                    # Add DATE_DIF as text
                    #plt.annotate(
                    #    text=current_medication.DATE_DIF,
                    #    xy=(current_alarm_timestamp, 20))

                    break

            plt.axvline(
                item,
                linestyle='dotted',
                color=sns.color_palette('colorblind')[2],
                alpha=0.8,
                zorder=0)

        # Configure legend
        plt.plot([], linestyle='dotted', color=sns.color_palette('colorblind')[1], alpha=0.8, zorder=0, label=f'Triggered Alarm: {parameter_names[i]} too High')
        plt.plot([], linestyle='dotted', color=sns.color_palette('colorblind')[2], alpha=0.8, zorder=0, label=f'Triggered Alarm: {parameter_names[i]} too Low')
        plt.scatter([], [], color='r', marker='s', label='Medication was Given (from same day on)')
        plt.legend(title=None, bbox_to_anchor=(1.02, 0.3), loc='upper left', borderaxespad=0)

        # Configure title and labels
        ax.set_title(title, fontweight='bold', color='black', fontsize=14, y=1.05)
        ax.set_xlabel(xlabel, fontsize=12, labelpad=15)
        ax.set_ylabel(ylabel, fontsize=12, labelpad=15)
        plt.xticks(rotation=45)

        # Plot figure
        plt.tight_layout()
        #plt.savefig(f'../plots/prescriptions/time-series/time_series_pres_med_{parameter_names_abbrev[i]}_{icustay}.png', dpi=1200)
        plt.show()
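
The alarm detection used in the cell above (carry the last known threshold forward, then compare each measurement against it) can be checked on a toy series. This is a sketch with made-up timestamps and values, not data from MIMIC-III.

```python
import pandas as pd

nan = float('nan')

# Toy measurements; thresholds are only recorded when they are set or changed
times = pd.to_datetime([
    '2150-01-01 08:00', '2150-01-01 09:00', '2150-01-01 10:00',
    '2150-01-01 11:00', '2150-01-01 12:00'])
timeseries = pd.DataFrame({
    'VALUE': [80.0, 125.0, 90.0, 45.0, 70.0],
    'THRESHOLD_HIGH': [120.0, nan, nan, nan, nan],
    'THRESHOLD_LOW': [50.0, nan, nan, 60.0, nan],
}, index=times)

# Carry the last recorded threshold forward until a new one is set
timeseries['THRESHOLD_HIGH'] = timeseries['THRESHOLD_HIGH'].ffill()
timeseries['THRESHOLD_LOW'] = timeseries['THRESHOLD_LOW'].ffill()

# A measurement at or beyond the currently valid threshold counts as a triggered alarm
alarm_too_high = timeseries[timeseries['VALUE'] >= timeseries['THRESHOLD_HIGH']]
alarm_too_low = timeseries[timeseries['VALUE'] <= timeseries['THRESHOLD_LOW']]

print(alarm_too_high.index.tolist())
print(alarm_too_low.index.tolist())
```

In this toy data, the value 125 violates the high threshold of 120, and the value 45 violates the low threshold that was raised to 60 at the same timestamp.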

Almost always, when an alarm occurs, a prescription started on the same day. Since `PRESCRIPTIONS.csv` contains only dates and no time of day, these prescriptions could be direct interventions after an alarm or unrelated treatments given on the same day. In the majority of these cases, the medication was stopped after one day. Whether these are mainly one-off short treatments or treatments that actually last the whole day cannot be determined from `PRESCRIPTIONS.csv`.
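
The same-day check behind this observation can also be written without nested loops. The following sketch uses made-up alarm timestamps and prescription start dates; it illustrates that the comparison only works at day granularity, so it cannot tell whether a medication was given before or after an alarm.

```python
import pandas as pd

# Made-up alarm timestamps (with time of day) and prescription start dates (date only)
alarm_times = pd.Series(pd.to_datetime([
    '2150-01-01 09:15', '2150-01-02 14:30', '2150-01-04 03:00']))
prescription_starts = pd.Series(pd.to_datetime([
    '2150-01-01', '2150-01-04', '2150-01-04']))

# Truncate alarm timestamps to midnight so they are comparable to the date-only STARTDATE
alarm_days = alarm_times.dt.normalize()

# True where at least one prescription started on the day of the alarm
same_day = alarm_days.isin(prescription_starts.dt.normalize())

print(same_day.tolist())
```

The first and third alarm are flagged because a prescription started on the same calendar day, regardless of the time at which the alarm was triggered.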

## Conclusion

For this reason, we decided not to use this table to derive the medication flag.