# Analysis of `alarm_violations.csv` for Respiratory Rate

For each parameter ...
* Concerning alarm violations ...
  * Create a boxplot and stripplot based on the VALUENUM
  * Clean the data if necessary
  * Create histogram for cleaned VALUENUM  
* Concerning alarm thresholds ...
  * Create a boxplot and stripplot based on the THRESHOLD_VALUE (stratified by THRESHOLD_TYPE)
  * Clean the data if necessary
  * Create histogram for cleaned THRESHOLD_VALUE (stratified by THRESHOLD_TYPE)


## Import Data

In [None]:
# Overview of libraries used
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
alarm_violations = pd.read_csv('./alarm_violations.csv')
alarm_violations.head()

## Parameter: Respiratory Rate (Number of Breaths per Minute)

* `220210` **Respiratory Rate** (RR), metavision, in insp/min (numeric)
* `224161` **Resp Alarm - High** (Resp Alarm - High), metavision, in insp/min (numeric)
* `224162` **Resp Alarm - Low** (Resp Alarm - Low), metavision, in insp/min (numeric)

### Respiratory Rate - Alarm Violations

In [None]:
RR_violations = alarm_violations[(alarm_violations['ITEMID'] == 220210)]
display(RR_violations)

In [None]:
RR_violations.VALUENUM.describe()

In [None]:
RR_violations.boxplot(column='VALUENUM')

The maximum VALUENUM is a respiratory rate of 2,355,555 insp/min.
So there is at least one implausible outlier.

Let's check whether the unit (VALUEUOM) is "insp/min" for all respiratory rates, as expected according to `D_ITEMS.csv`.

In [None]:
RR_violations.VALUEUOM.unique()

All respiratory rates are given in insp/min as expected.

Let's check the literature to see what respiratory rates can be expected based on medical knowledge.

* General range:
    * for adults: 12 to 20 insp/min
    * much higher for kids, especially for babies under 2 years (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3789232/figure/F2/)
* Abnormal values for adults: (https://onlinelibrary.wiley.com/doi/full/10.5694/j.1326-5377.2008.tb01825.x?casa_token=UjZimsSmcVIAAAAA%3A2cMU2S0v9D15Mx72WCOms4LbCztCJ0_TnZIheDI-qZ8x8a0VU7HWBRs6TTv9SGoqfHC0fSf5ctnduwA&sid=nlm%3Apubmed)
    * the cut-off regarded as abnormal varies from over 14 to over 36 insp/min
    * over 20 insp/min = probably unwell
    * over 24 insp/min = likely to be critically ill
* Maximum Breathing Capacity (MBC):
    * "...has been determined with various expiratory and respiratory resistances (singly and combined) at breathing rates of 4–196 per minute." (https://journals.physiology.org/doi/abs/10.1152/jappl.1957.11.1.79)

Decision for now: Consider respiratory rates over 196 insp/min as implausible values to be removed before further analysis. In addition, assume a threshold range from 10 to 70 and remove the one extreme outlier.
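
The first part of this decision can be sketched as a helper that separates kept rows from removed rows. The function name `split_by_upper_bound` and the toy values are illustrative assumptions; only the column name `VALUENUM` and the bound of 196 come from the text above.

```python
import pandas as pd


def split_by_upper_bound(df, column, upper):
    """Split df into (kept, removed) rows based on an inclusive upper bound.

    Rows with a missing value in `column` land in neither frame.
    """
    kept = df[df[column] <= upper]
    removed = df[df[column] > upper]
    return kept, removed


# Toy data for illustration only, not taken from alarm_violations.csv
toy = pd.DataFrame({'VALUENUM': [18.0, 25.0, 196.0, 250.0, 2355555.0, None]})
kept, removed = split_by_upper_bound(toy, 'VALUENUM', upper=196)
print(len(kept), len(removed))
```

Returning both parts makes it easy to inspect what was dropped before committing to the cut-off.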

In [None]:
RR_violations_without_outlier = RR_violations[RR_violations['VALUENUM'] < 4000]
RR_violations_without_outlier.VALUENUM.describe()

In [None]:
sns.stripplot(data=RR_violations_without_outlier, x='VALUENUM').set_title('Respiratory Rate - Original Alarm Violations (without outlier)')

In [None]:
RR_violations_above_196 = RR_violations[RR_violations['VALUENUM'] > 196]
RR_violations_above_196.VALUENUM.describe()

In [None]:
display(RR_violations_above_196.sort_values(by=['VALUENUM']))
len(RR_violations_above_196) # 41 violations were removed

In [None]:
RR_violations_above_196 = RR_violations_above_196[RR_violations_above_196['VALUENUM'] < 4000]
sns.stripplot(data=RR_violations_above_196, x='VALUENUM').set_title('Respiratory Rate - Removed Alarm Violations (without outlier)')
RR_violations_above_196.VALUENUM.describe()

Of the 41 removed values, one is the extreme outlier, 14 are over 914 and 26 lie between 196 and 400.
Possible follow-up: investigate the jump from 400 to 914 and consider keeping values up to 400.
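
To make such counts reproducible, the values can be binned with `pd.cut`. This is a sketch on made-up numbers: the band edges 196 and 400 come from the text, while the label names and the sample values are assumptions.

```python
import pandas as pd

# Made-up values for illustration; in the notebook this would be RR_violations['VALUENUM']
values = pd.Series([12, 30, 196, 210, 399, 950, 1200, 2355555], name='VALUENUM')

# Right-closed bins: (-inf, 196], (196, 400], (400, inf]
labels = ['<= 196', '196 to 400', '> 400']
bands = pd.cut(values, bins=[float('-inf'), 196, 400, float('inf')], labels=labels)

# Count values per band, keeping the band order and showing empty bands as 0
band_counts = bands.astype(str).value_counts().reindex(labels, fill_value=0)
print(band_counts)

# Smallest value above 400, i.e. where the next cluster starts
next_cluster_start = values[values > 400].min()
print(next_cluster_start)
```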

### Respiratory Rate - Cleaned Alarm Violations

In [None]:
RR_violations_cleaned = RR_violations[RR_violations['VALUENUM'] <= 196]
display(RR_violations_cleaned.sort_values(by=['VALUENUM']))
RR_violations_cleaned.VALUENUM.describe()

In [None]:
sns.set_style('whitegrid')
fig, axs = plt.subplots(1, 3, figsize=(25, 5))
fig.suptitle('Respiratory Rate - Cleaned Alarm Violations', fontsize=18)

sns.stripplot(data=RR_violations_cleaned, x='VALUENUM', ax=axs[0])
axs[0].set_title('Scatter Plot')
axs[0].set_xlabel('VALUENUM')

sns.boxplot(data=RR_violations_cleaned, x='VALUENUM', ax=axs[1])
axs[1].set_title('Boxplot')
axs[1].set_xlabel('VALUENUM')

sns.histplot(data=RR_violations_cleaned, x='VALUENUM', ax=axs[2])
axs[2].set_title('Histogram')
axs[2].set_xlabel('VALUENUM')

plt.show()  # plt.show() takes no figure argument; it shows all open figures

### Respiratory Rate - Alarm Thresholds

#### Respiratory Rate - HIGH Alarm Thresholds

In [None]:
RR_threshold_high = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'HIGH')]
RR_threshold_high.THRESHOLD_VALUE.describe()

In [None]:
sns.boxplot(data=RR_threshold_high, x='THRESHOLD_VALUE').set_title('Original HIGH Thresholds of Respiratory Rate')

The minimum HIGH alarm threshold is 0. Since the smallest plausible LOW threshold is 10, a HIGH threshold should be at least 11. Let's check the suspiciously low HIGH alarm thresholds up to 10.

In [None]:
# Use <= 10 so the check covers exactly the values dropped by the later "> 10" filter
RR_threshold_high_under_10 = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'HIGH') & (RR_violations['THRESHOLD_VALUE'] <= 10)]
sns.histplot(data=RR_threshold_high_under_10, x='THRESHOLD_VALUE').set_title('HIGH Thresholds of Respiratory Rate up to Value of 10')
# Decided to remove them

According to the literature found, the maximum HIGH alarm threshold should be 36. Let's check the HIGH alarm thresholds above this value.

In [None]:
RR_threshold_high_over_36 = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'HIGH') & (RR_violations['THRESHOLD_VALUE'] > 36)]
sns.histplot(data=RR_threshold_high_over_36, x='THRESHOLD_VALUE').set_title('HIGH Thresholds of Respiratory Rate with Values Above 36')
# Decided to keep these values as babies and especially ICU patients can have much higher thresholds

In [None]:
RR_threshold_high_cleaned = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'HIGH') & (RR_violations['THRESHOLD_VALUE'] > 10)].sort_values(by=['THRESHOLD_VALUE'])
display(RR_threshold_high_cleaned)
RR_threshold_high_cleaned.THRESHOLD_VALUE.describe() # Ranges from 11 to 55 now

#### Respiratory Rate - LOW Alarm Thresholds

In [None]:
RR_threshold_low = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW')].sort_values(by=['THRESHOLD_VALUE'])
display(RR_threshold_low)
RR_threshold_low.THRESHOLD_VALUE.describe()

The minimum LOW threshold is 1, although it should be at least 10 given the minimum found in the literature. The maximum LOW threshold is 8,409,010, which is definitely too high and also far above the maximum HIGH threshold of 55. Let's check how many values are too high.

In [None]:
RR_threshold_low_over_55 = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW') & (RR_violations['THRESHOLD_VALUE'] > 55)].sort_values(by=['THRESHOLD_VALUE'])
display(RR_threshold_low_over_55)
RR_threshold_low_over_55.THRESHOLD_VALUE.describe()

There seem to be two clusters among the definitely too high LOW thresholds - one around 835 with 103 values and one around 8,350,000 with 18 values.
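
One way to make such clusters visible without picking ranges by hand is to group the values by their power of ten. This is a sketch on made-up values, and the grouping by order of magnitude is an assumption about how to separate the clusters, not the method used in this notebook. It requires strictly positive values.

```python
import numpy as np
import pandas as pd

# Made-up LOW thresholds for illustration only
low = pd.Series([8, 12, 835, 836, 8350000, 8409010], name='THRESHOLD_VALUE')

# Order of magnitude: 0 for 1-9, 1 for 10-99, 2 for 100-999, ...
magnitude = np.floor(np.log10(low)).astype(int).rename('order_of_magnitude')

# Count, minimum and maximum per order of magnitude
cluster_summary = low.groupby(magnitude).agg(['count', 'min', 'max'])
print(cluster_summary)
```

Gaps in the `order_of_magnitude` index (here 3 to 5) indicate that the value groups are clearly separated.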

In [None]:
RR_threshold_low_8mio = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW') & (RR_violations['THRESHOLD_VALUE'] > 8300000) & (RR_violations['THRESHOLD_VALUE'] < 8500000)]
RR_threshold_low_8mio.THRESHOLD_VALUE.describe()

sns.histplot(data=RR_threshold_low_8mio, x='THRESHOLD_VALUE').set_title('Respiratory Rate - Original LOW Thresholds (Around 8.35 Mio)')

In [None]:
RR_threshold_low_800 = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW') & (RR_violations['THRESHOLD_VALUE'] > 820) & (RR_violations['THRESHOLD_VALUE'] < 850)]
RR_threshold_low_800.THRESHOLD_VALUE.describe()

sns.histplot(data=RR_threshold_low_800, x='THRESHOLD_VALUE').set_title('Respiratory Rate - Original LOW Thresholds (Around 835)')

The remaining 27,325 values range from 1 to 123.

In [None]:
RR_threshold_low_under_125 = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW') & (RR_violations['THRESHOLD_VALUE'] < 125)]
RR_threshold_low_under_125.THRESHOLD_VALUE.describe()

sns.histplot(data=RR_threshold_low_under_125, x='THRESHOLD_VALUE').set_title('Respiratory Rate - Original LOW Thresholds (up to 123)')
plt.ylim(0, 1700) # Cap the y-axis so the dominant value (8), occurring 19,210 times, does not hide the rest

As a LOW threshold always has to be lower than the HIGH threshold, LOW thresholds of 55 or above (the maximum HIGH threshold) can be removed. Additionally, as already mentioned, LOW thresholds should be at least 10.

In [None]:
RR_threshold_low_cleaned = RR_violations[(RR_violations['THRESHOLD_TYPE'] == 'LOW') & (RR_violations['THRESHOLD_VALUE'] >= 10) & (RR_violations['THRESHOLD_VALUE'] < 55)].sort_values(by=['THRESHOLD_VALUE'])
display(RR_threshold_low_cleaned)
RR_threshold_low_cleaned.THRESHOLD_VALUE.describe() # Ranges from 10 to 50 now

### Respiratory Rate - Cleaned Alarm Thresholds

After data cleaning, the lower threshold (LOW) of the respiratory rate ranges from 10 to 50 and the upper threshold (HIGH) ranges from 11 to 55.
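
Such a range summary can be computed per threshold type with a `groupby`. The sketch below uses a made-up long-format frame with a `Threshold` column, similar in shape to the one built for the combined histogram further down; the values only mirror the reported ranges and are not the real data.

```python
import pandas as pd

# Toy long-format frame; the values are made up and only mirror the reported ranges
toy = pd.DataFrame({
    'THRESHOLD_VALUE': [10, 12, 50, 11, 30, 55],
    'Threshold': ['LOW', 'LOW', 'LOW', 'HIGH', 'HIGH', 'HIGH'],
})

# Minimum, maximum and number of values per threshold type
range_summary = toy.groupby('Threshold')['THRESHOLD_VALUE'].agg(['min', 'max', 'count'])
print(range_summary)
```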

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(20, 15))
fig.suptitle('Respiratory Rate - Cleaned Thresholds', fontsize=18)

sns.boxplot(data=RR_threshold_low_cleaned, x='THRESHOLD_VALUE', ax=axs[0][0])
axs[0][0].set_title('Cleaned LOW Threshold')
axs[0][0].set_xlabel('THRESHOLD_VALUE')

sns.histplot(data=RR_threshold_low_cleaned, x='THRESHOLD_VALUE', ax=axs[0][1])
axs[0][1].set_title('Cleaned LOW Threshold')
axs[0][1].set_xlabel('THRESHOLD_VALUE')

sns.boxplot(data=RR_threshold_high_cleaned, x='THRESHOLD_VALUE', ax=axs[1][0])
axs[1][0].set_title('Cleaned HIGH Threshold')
axs[1][0].set_xlabel('THRESHOLD_VALUE')

sns.histplot(data=RR_threshold_high_cleaned, x='THRESHOLD_VALUE', ax=axs[1][1])
axs[1][1].set_title('Cleaned HIGH Threshold')
axs[1][1].set_xlabel('THRESHOLD_VALUE')

plt.show()  # plt.show() takes no figure argument; it shows all open figures

In [None]:
df = pd.concat(axis=0, ignore_index=True, objs=[
    pd.DataFrame.from_dict({'THRESHOLD_VALUE': RR_threshold_low_cleaned['THRESHOLD_VALUE'], 'Threshold': 'LOW'}),
    pd.DataFrame.from_dict({'THRESHOLD_VALUE': RR_threshold_high_cleaned['THRESHOLD_VALUE'], 'Threshold': 'HIGH'})
])

fig, ax = plt.subplots()
fig.suptitle('Respiratory Rate - Cleaned Thresholds', fontsize=12)
sns.histplot(data=df, x='THRESHOLD_VALUE', hue='Threshold', multiple='dodge', bins=range(10, 60, 5), ax=ax)
ax.set_xlabel('THRESHOLD_VALUE (bin size = 5)')

plt.ylim(0, 2500)
for p in ax.patches:
    if p.get_height() > 2200:
        ax.text(x=p.get_x(), y=2200, s=p.get_height())
    else:
        ax.text(x=p.get_x(), y=p.get_height(), s=p.get_height())