# Analysis of `alarm_violations.csv`

For each parameter ...
* Concerning alarm violations ...
  * Create a boxplot and a stripplot of VALUENUM
  * Clean the data if necessary
  * Create a histogram of the cleaned VALUENUM
* Concerning alarm thresholds ...
  * Create a boxplot and a stripplot of THRESHOLD_VALUE (stratified by THRESHOLD_TYPE)
  * Clean the data if necessary
  * Create a histogram of the cleaned THRESHOLD_VALUE (stratified by THRESHOLD_TYPE)
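
As a starting point for the "for each parameter" loop, a quick per-parameter overview can be produced with `groupby` before any plotting. The sketch below uses a tiny hypothetical DataFrame with the same column names as `alarm_violations.csv`; ITEMID 999999 and its unit are placeholders, not real items. In the notebook, `toy` would be replaced by `alarm_violations`.

```python
import pandas as pd

# Hypothetical toy rows with the columns used in this notebook (not real data).
toy = pd.DataFrame({
    "ITEMID": [220045, 220045, 220045, 999999, 999999],
    "VALUENUM": [130.0, 45.0, -88.0, 35.0, 8.0],
    "VALUEUOM": ["bpm", "bpm", "bpm", "unit_x", "unit_x"],
    "THRESHOLD_VALUE": [120.0, 50.0, 50.0, 30.0, 10.0],
    "THRESHOLD_TYPE": ["HIGH", "LOW", "LOW", "HIGH", "LOW"],
})

# One row of summary statistics per parameter (ITEMID) for the violating values ...
value_summary = toy.groupby("ITEMID")["VALUENUM"].describe()
# ... and per parameter and threshold type for the thresholds.
threshold_summary = toy.groupby(["ITEMID", "THRESHOLD_TYPE"])["THRESHOLD_VALUE"].describe()

print(value_summary)
print(threshold_summary)
```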


## Import Data

In [None]:
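# Overview: Import all libraries used.
import numpy as np
import pandas as pd
#import scipy
import matplotlib.pyplot as plt  # used by the plotting cells below
import seaborn as sns
#import sklearn
from IPython.display import display  # available by default in Jupyter; imported explicitly for clarity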

In [None]:
alarm_violations = pd.read_csv('./alarm_violations.csv')
alarm_violations.head()

## Parameter: Heart Rate

* `220045` **Heart Rate** (HR), metavision, in bpm (numeric)
* `220046` **Heart rate Alarm - High** (HR Alarm - High), metavision, in bpm (numeric)
* `220047` **Heart rate Alarm - Low** (HR Alarm - Low), metavision, in bpm (numeric)

### Heart Rate - Alarm Violations

In [None]:
HR_violations = alarm_violations[(alarm_violations["ITEMID"] == 220045)]
display(HR_violations)

In [None]:
HR_violations.VALUENUM.describe()

In [None]:
import seaborn as sns
sns.boxplot(data=HR_violations, x='VALUENUM')

The VALUENUM values include at least one negative heart rate (min = -88 bpm) and a maximum of 86101 bpm.
Both are physiologically implausible outliers.
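
The boxplot above uses seaborn's default `whis=1.5`: whiskers extend to the most extreme data points within 1.5 × IQR of the quartiles, and points beyond them are drawn as outliers. A minimal sketch of essentially the same rule, using a hypothetical helper `iqr_outliers` and toy values that include the two extremes mentioned above, shows which values it flags. The rule is purely data-driven; because alarm violations are extreme by construction, it may also flag clinically plausible values, which is one reason to prefer domain-based cutoffs.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Toy heart rates in bpm (hypothetical), including -88 and 86101 as reported above.
hr = pd.Series([60, 70, 80, 90, 100, -88, 86101], name="VALUENUM")
print(iqr_outliers(hr).tolist())  # → [-88, 86101]
```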

Let's check whether the unit (VALUEUOM) is "bpm" for all heart rates, as expected according to `D_ITEMS.csv`.

In [None]:
HR_violations.VALUEUOM.unique()

All heart rates are given in bpm as expected.
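
So that this check cannot pass silently when the notebook is rerun on new data, it could be turned into a guard that raises an error. Below is a minimal sketch with a hypothetical helper `assert_single_unit`. It ignores missing units (NaN), which would need a separate check.

```python
import pandas as pd


def assert_single_unit(df: pd.DataFrame, expected: str, col: str = "VALUEUOM") -> None:
    """Raise ValueError unless every non-missing value in `col` equals `expected`."""
    units = set(df[col].dropna().unique())
    if units != {expected}:
        raise ValueError(f"Expected only {expected!r} in {col}, found {sorted(units)}")


# Usage in the notebook (hypothetical): assert_single_unit(HR_violations, "bpm")
assert_single_unit(pd.DataFrame({"VALUEUOM": ["bpm", "bpm", None]}), "bpm")
```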

Let's check the literature to see which heart rates are plausible based on medical knowledge.

* General guideline: "*To estimate your maximum age-related heart rate, subtract your age from 220.*" (https://www.cdc.gov/physicalactivity/basics/measuring/heartrate.htm)
  * First idea: use >220 bpm as the upper cutoff when removing implausible outliers
* "*The fastest human ventricular conduction rate reported to date is a conducted tachyarrhythmia with ventricular rate of 480 beats per minute*" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3273956/)
  * Second idea: use >480 bpm as the upper cutoff when removing implausible outliers
* Pacemakers may explain recorded heart rates of around 1000 bpm and higher.
  * Came to my attention via https://www.quora.com/What-is-the-fastest-heartbeat-rate-ever-recorded
  * To do: investigate later and find literature on this.
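
To see how much the choice between the two candidate upper cutoffs matters, count how many values exceed each one. The sketch below uses hypothetical toy values; in the notebook, `hr` would be `HR_violations["VALUENUM"]`.

```python
import pandas as pd

# Toy heart rates in bpm (hypothetical, not taken from alarm_violations.csv).
hr = pd.Series([-88, 60, 150, 230, 500, 1000, 86101])

# Number of values strictly above each candidate upper cutoff from the notes above.
counts = {cutoff: int((hr > cutoff).sum()) for cutoff in (220, 480)}
print(counts)  # → {220: 4, 480: 3}
```

Note that the CDC formula is age-dependent (for a 60-year-old it gives 220 - 60 = 160 bpm). A fixed cutoff of 220 bpm corresponds to the formula at age 0, so it acts as an upper bound across all ages.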

Decision for now: Treat heart rates below 0 bpm and above 480 bpm as implausible and remove them before further analysis.
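
The cleaning cell below splits the data with three separate filters. A minimal sketch of the same split as one reusable function (the name `split_by_range` is hypothetical) adds a fourth "missing" bucket. Rows with a NaN VALUENUM, if any exist, fail every comparison and would otherwise silently fall out of all three subsets. The final check confirms that every row lands in exactly one partition.

```python
import numpy as np
import pandas as pd


def split_by_range(df: pd.DataFrame, col: str, lo: float, hi: float) -> dict:
    """Partition df into rows with col in [lo, hi], below lo, above hi, or missing."""
    values = df[col]
    parts = {
        "clean": df[values.between(lo, hi)],  # between() includes both ends by default
        "too_low": df[values < lo],
        "too_high": df[values > hi],
        "missing": df[values.isna()],  # NaN fails all three comparisons above
    }
    # Sanity check: the partitions together cover every row exactly once.
    assert sum(len(p) for p in parts.values()) == len(df)
    return parts


# Toy values in bpm (hypothetical); in the notebook: split_by_range(HR_violations, "VALUENUM", 0, 480)
toy = pd.DataFrame({"VALUENUM": [-88, 0, 75, 480, 481, 86101, np.nan]})
parts = split_by_range(toy, "VALUENUM", 0, 480)
print({name: len(p) for name, p in parts.items()})
```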

In [None]:
HR_violations_clean = HR_violations[(HR_violations["VALUENUM"] >= 0) & (HR_violations["VALUENUM"] <= 480)]
HR_violations_removed_too_low = HR_violations[(HR_violations["VALUENUM"] < 0)]
HR_violations_removed_too_high = HR_violations[(HR_violations["VALUENUM"] > 480)]

In [None]:
# Check rows that were removed because of too low VALUENUM
display(HR_violations_removed_too_low.sort_values(by=['VALUENUM']))
HR_violations_removed_too_low.VALUENUM.describe()
# Removing these rows seems justified: a negative heart rate is impossible.

In [None]:
display(HR_violations_removed_too_low[["VALUENUM","THRESHOLD_VALUE","THRESHOLD_TYPE"]].sort_values(by=['VALUENUM']))

In [None]:
# Check rows that were removed because of too high VALUENUM
display(HR_violations_removed_too_high.sort_values(by=['VALUENUM']))
HR_violations_removed_too_high.VALUENUM.describe()
# Removing these rows also seems justified for now; reconsider after following up on the pacemaker issue.
# Open question: should the ML model be trained only on clean data, or should outliers be kept because they occur in real data and might carry a 'hidden meaning'?
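
One way to keep both options open for the question above is to flag implausible rows instead of dropping them. A model could then be trained with or without those rows later. Below is a minimal sketch. The column name `VALUENUM_IMPLAUSIBLE` is hypothetical, and the 0 to 480 bpm range is the one chosen above.

```python
import pandas as pd

# Toy heart rates in bpm (hypothetical). With the real data, work on a copy
# (e.g. HR_violations.copy()) to avoid pandas' SettingWithCopyWarning.
toy = pd.DataFrame({"VALUENUM": [-88.0, 75.0, 130.0, 86101.0]})

# Keep every row, but record whether the 0-480 bpm rule would have removed it.
# Note: between() returns False for NaN, so missing values would also be flagged here.
toy["VALUENUM_IMPLAUSIBLE"] = ~toy["VALUENUM"].between(0, 480)

# Filtering on the flag reproduces the "clean" subset when needed.
clean_view = toy[~toy["VALUENUM_IMPLAUSIBLE"]]
print(clean_view)
```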

In [None]:
display(HR_violations_removed_too_high[["VALUENUM","THRESHOLD_VALUE","THRESHOLD_TYPE"]].sort_values(by=['VALUENUM']))

In [None]:
# Check cleaned HR_violations
display(HR_violations_clean.sort_values(by=['VALUENUM']))
HR_violations_clean.VALUENUM.describe()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")

fig, axs = plt.subplots(1, 3, figsize=(25, 5))
fig.suptitle("Heart Rate - Alarm Violations", fontsize=18)

sns.stripplot(data=HR_violations_clean, x='VALUENUM', ax=axs[0])
axs[0].set_title("HR_violations_clean scatter plot")
axs[0].set_xlabel("HR_violations_clean VALUENUM")

sns.boxplot(data=HR_violations_clean, x='VALUENUM', ax=axs[1])
axs[1].set_title("HR_violations_clean boxplot")
axs[1].set_xlabel("HR_violations_clean VALUENUM")

sns.histplot(data=HR_violations_clean, x='VALUENUM', ax=axs[2])
axs[2].set_title("HR_violations_clean histogram")
axs[2].set_xlabel("HR_violations_clean VALUENUM")

plt.show()

### Heart Rate - Alarm Thresholds

In [None]:
display(HR_violations)

In [None]:
# Check Heart rate Alarm - High threshold
HR_violations[(HR_violations["THRESHOLD_TYPE"] == "HIGH")].THRESHOLD_VALUE.describe()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("whitegrid")

fig, (fig_box, fig_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)}, figsize=(7, 5))
fig.suptitle("Heart rate alarm thresholds of type HIGH", fontsize=16)
sns.boxplot(data=HR_violations[(HR_violations["THRESHOLD_TYPE"] == "HIGH")], x="THRESHOLD_VALUE", ax=fig_box)
fig_box.set(xlabel="")
sns.histplot(data=HR_violations[(HR_violations["THRESHOLD_TYPE"] == "HIGH")], x="THRESHOLD_VALUE", kde=True, ax=fig_hist)
fig_hist.set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)
fig_hist.set_ylabel("Count", fontsize=12)

plt.show()

The minimum value of the HIGH alarm thresholds is 0, which is implausibly low for an upper alarm limit.

Let's check the suspiciously low HIGH alarm thresholds.

In [None]:
# Looking at THRESHOLD_VALUE smaller than 40 ...
HR_threshold_check_high = HR_violations[(HR_violations["THRESHOLD_TYPE"] == "HIGH") & (HR_violations["THRESHOLD_VALUE"] < 40)].sort_values(by=['THRESHOLD_VALUE'])
display(HR_threshold_check_high)
HR_threshold_check_high.THRESHOLD_VALUE.describe()

In [None]:
sns.boxplot(data=HR_threshold_check_high, x='THRESHOLD_VALUE')

In [None]:
# Now, let's refine and look only at THRESHOLD_VALUE smaller than 10 ...
HR_threshold_check_high = HR_violations[(HR_violations["THRESHOLD_TYPE"] == "HIGH") & (HR_violations["THRESHOLD_VALUE"] < 10)].sort_values(by=['THRESHOLD_VALUE'])
display(HR_threshold_check_high)
HR_threshold_check_high.THRESHOLD_VALUE.describe()

There are 111 thresholds of type HIGH with a THRESHOLD_VALUE between 0 and 1 (inclusive), which is suspicious.

A possible explanation is that ICU staff set the threshold too low by mistake (e.g., typing 0 instead of 100), which would immediately trigger an alarm.

Keep in mind that the data set includes only violated thresholds: an unusually high HIGH threshold will rarely trigger an alarm, whereas an unusually low one (such as 0 bpm) triggers an alarm immediately in any living patient. Implausibly low HIGH thresholds are therefore likely overrepresented in this data set.
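
The selection effect described above can be illustrated with a toy example. The sketch assumes that a HIGH threshold counts as violated when VALUENUM is strictly greater than THRESHOLD_VALUE. The exact rule used to build `alarm_violations.csv` is not stated here. The sketch pairs every hypothetical heart rate with every hypothetical threshold and then keeps only the violations.

```python
import pandas as pd

# Hypothetical heart rates (bpm) and HIGH thresholds: a mistyped 0, a typical 100, an implausible 1000.
heart_rates = pd.DataFrame({"VALUENUM": [60, 80, 100, 120]})
high_thresholds = pd.DataFrame({"THRESHOLD_VALUE": [0, 100, 1000]})

# Pair every heart rate with every threshold (cross join, pandas >= 1.2) ...
pairs = heart_rates.merge(high_thresholds, how="cross")
# ... and keep only the pairs that violate the HIGH threshold, as a violations-only table would.
violations = pairs[pairs["VALUENUM"] > pairs["THRESHOLD_VALUE"]]

counts = {t: int((violations["THRESHOLD_VALUE"] == t).sum()) for t in (0, 100, 1000)}
print(counts)  # → {0: 4, 100: 1, 1000: 0}
```

Each threshold is paired with the same four heart rates in the input. Even so, the threshold of 0 survives once per heart rate, while 1000 disappears completely.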

In [None]:
# Check Heart rate Alarm - Low threshold
HR_violations[(HR_violations["THRESHOLD_TYPE"] == "LOW")].THRESHOLD_VALUE.describe()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("whitegrid")

ax = sns.stripplot(data=HR_violations[(HR_violations["THRESHOLD_TYPE"] == "LOW")], x="THRESHOLD_VALUE")
ax.set_title("Scatterplot for heart rate alarm thresholds of type LOW", fontsize=14)
ax.set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)

plt.show()

The maximum value of the LOW alarm thresholds is 85160, which is surprisingly high.

Let's check the suspiciously high LOW alarm thresholds.

In [None]:
HR_threshold_check_low = HR_violations[(HR_violations["THRESHOLD_TYPE"] == "LOW") & (HR_violations["THRESHOLD_VALUE"] > 480)].sort_values(by=['THRESHOLD_VALUE'])
display(HR_threshold_check_low)
HR_threshold_check_low.THRESHOLD_VALUE.describe()

In [None]:
HR_threshold_check_low_01 = HR_violations[(HR_violations["THRESHOLD_TYPE"] == "LOW") & (HR_violations["THRESHOLD_VALUE"] > 100) & (HR_violations["THRESHOLD_VALUE"] <= 1000)].sort_values(by=['THRESHOLD_VALUE'])
HR_threshold_check_low_02 = HR_violations[(HR_violations["THRESHOLD_TYPE"] == "LOW") & (HR_violations["THRESHOLD_VALUE"] > 1000) & (HR_violations["THRESHOLD_VALUE"] <= 90000)].sort_values(by=['THRESHOLD_VALUE'])

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
fig, axs = plt.subplots(1, 2, figsize=(15, 5))
fig.suptitle("A closer look at the suspiciously high LOW thresholds", fontsize=16)

sns.stripplot(ax=axs[0], data=HR_threshold_check_low_01, x="THRESHOLD_VALUE")
axs[0].set_title("LOW thresholds > 100 and <= 1000 bpm", fontsize=14)
axs[0].set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)

sns.stripplot(ax=axs[1], data=HR_threshold_check_low_02, x="THRESHOLD_VALUE")
axs[1].set_title("LOW thresholds > 1000 and <= 90000 bpm", fontsize=14)
axs[1].set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)

plt.show()

In [None]:
HR_threshold_check_low_01.THRESHOLD_VALUE.describe()

In [None]:
HR_threshold_check_low_02.THRESHOLD_VALUE.describe()
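
The plan at the top also calls for cleaning the thresholds. That step has not been done yet, since `HR_violations_clean` is filtered on VALUENUM only. Below is a minimal sketch of per-type cleaning with a hypothetical helper `clean_thresholds`. The ranges in `THRESHOLD_RANGES` are hypothetical placeholders. The lower bound of 10 bpm for HIGH is only loosely motivated by the inspection of values below 10 above. Both ranges still need to be justified, like the 0 to 480 bpm range for VALUENUM.

```python
import pandas as pd

# Hypothetical plausibility ranges in bpm per THRESHOLD_TYPE (inclusive); placeholders, not validated values.
THRESHOLD_RANGES = {"HIGH": (10, 480), "LOW": (0, 480)}


def clean_thresholds(df: pd.DataFrame, ranges: dict = THRESHOLD_RANGES) -> pd.DataFrame:
    """Keep rows whose THRESHOLD_VALUE lies inside the range for their THRESHOLD_TYPE.

    Rows whose THRESHOLD_TYPE does not appear in `ranges` are dropped.
    """
    keep = pd.Series(False, index=df.index)
    for threshold_type, (lo, hi) in ranges.items():
        in_range = df["THRESHOLD_VALUE"].between(lo, hi)
        keep |= (df["THRESHOLD_TYPE"] == threshold_type) & in_range
    return df[keep]


# Toy rows (hypothetical) echoing the suspicious cases above: HIGH thresholds of 0 and 1, a LOW threshold of 85160.
toy = pd.DataFrame({
    "THRESHOLD_TYPE": ["HIGH", "HIGH", "HIGH", "LOW", "LOW", "LOW"],
    "THRESHOLD_VALUE": [0, 1, 120, 50, 700, 85160],
})
print(clean_thresholds(toy))
# In the notebook (hypothetical): HR_thresholds_clean = clean_thresholds(HR_violations)
```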

## Additional Visualizations

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
fig, axs = plt.subplots(1, 2, figsize=(20, 5))
fig.suptitle("Distribution of heart rate alarm violations stratified by threshold type in cleaned data set", fontsize=18)

sns.histplot(ax=axs[0], data=HR_violations_clean, x="VALUENUM", hue="THRESHOLD_TYPE", palette=["darkblue", "darkgreen"])
axs[0].set_title("Histogram", fontsize=12)
axs[0].set_xlabel("VALUENUM (Heart rate in bpm)", fontsize=12)
axs[0].set_ylabel("Count", fontsize=12)

sns.kdeplot(ax=axs[1], data=HR_violations_clean, x="VALUENUM", hue="THRESHOLD_TYPE", palette=["darkblue", "darkgreen"])
axs[1].set_title("Kernel density estimate (KDE)", fontsize=12)
axs[1].set_xlabel("VALUENUM (Heart rate in bpm)", fontsize=12)
axs[1].set_ylabel("Density", fontsize=12)

plt.show()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("whitegrid")

fig, (fig_box, fig_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)}, figsize=(10, 5))
fig.suptitle("Heart rate alarm violations in cleaned data set", fontsize=18)
sns.boxplot(data=HR_violations_clean, x="VALUENUM", ax=fig_box)
fig_box.set(xlabel="")
sns.histplot(data=HR_violations_clean, x="VALUENUM", kde=True, ax=fig_hist)
fig_hist.set_xlabel("VALUENUM (Heart rate in bpm)", fontsize=12)
fig_hist.set_ylabel("Count", fontsize=12)

plt.show()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
fig, axs = plt.subplots(1, 2, figsize=(20, 5))
fig.suptitle("Distribution of heart rate alarm thresholds stratified by threshold type (data set cleaned on VALUENUM)", fontsize=18)
# Note: HR_violations_clean was filtered on VALUENUM only, so implausible thresholds (e.g. HIGH = 0) may remain.

sns.histplot(ax=axs[0], data=HR_violations_clean, x="THRESHOLD_VALUE", hue="THRESHOLD_TYPE", palette=["darkblue", "darkgreen"])
axs[0].set_title("Histogram", fontsize=12)
axs[0].set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)
axs[0].set_ylabel("Count", fontsize=12)

sns.kdeplot(ax=axs[1], data=HR_violations_clean, x="THRESHOLD_VALUE", hue="THRESHOLD_TYPE", palette=["darkblue", "darkgreen"])
axs[1].set_title("Kernel density estimate (KDE)", fontsize=12)
axs[1].set_xlabel("THRESHOLD_VALUE (Heart rate in bpm)", fontsize=12)
axs[1].set_ylabel("Density", fontsize=12)

plt.show()