# Analysis of Measurement Values and Their Thresholds
## Comparison of Original Values and Values in 'alarm_violations.csv'

In [None]:
import pandas as pd

CHUNK_SIZE = 1000
NROWS = 1000000
DATA_PATH = './mimic-iii-clinical-database-1.4/CHARTEVENTS.csv'

# Read the CSV in chunks and load only the two columns needed for the analysis.
csv_iter = pd.read_csv(DATA_PATH, chunksize=CHUNK_SIZE, nrows=NROWS, usecols=['ITEMID', 'VALUENUM'])

# Collect whatever chunks the reader yields; this also works if the file has fewer than NROWS rows.
chunks = list(csv_iter)

df = pd.concat(chunks, axis=0)
del chunks
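
Since only a handful of ITEMIDs are analyzed in this notebook, each chunk could also be filtered while reading, which should keep memory usage lower. The following is a minimal sketch under that assumption; the helper name `load_filtered` and the in-memory `SAMPLE_CSV` are made up for illustration and only stand in for 'CHARTEVENTS.csv'.

```python
import io

import pandas as pd

# ITEMIDs analyzed in this notebook (measurements plus HIGH/LOW thresholds).
ITEMIDS_OF_INTEREST = [
    220045, 220046, 220047,  # heart rate
    220179, 223751, 223752,  # systolic blood pressure
    220277, 223769, 223770,  # O2 saturation
    220210, 224161, 224162,  # respiratory rate
]

# Hypothetical stand-in for CHARTEVENTS.csv, only for demonstration.
SAMPLE_CSV = """ITEMID,VALUENUM
220045,80
211,75
220277,97
220045,-88
"""


def load_filtered(path_or_buffer, itemids, chunksize=1000, nrows=None):
    """Read a CSV in chunks and keep only the rows with the given ITEMIDs."""
    reader = pd.read_csv(
        path_or_buffer,
        chunksize=chunksize,
        nrows=nrows,
        usecols=['ITEMID', 'VALUENUM'],
    )
    # Filter every chunk before it is stored, so unrelated rows are dropped early.
    filtered = [chunk[chunk['ITEMID'].isin(itemids)] for chunk in reader]
    return pd.concat(filtered, axis=0)


sample_df = load_filtered(io.StringIO(SAMPLE_CSV), ITEMIDS_OF_INTEREST, chunksize=2)
print(sample_df)
```

For the real data, `DATA_PATH` would be passed instead of the `io.StringIO` buffer.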

### Analysis of Heart Rate

In [None]:
hr_values = df[(df['ITEMID'] == 220045)]
hr_values.VALUENUM.describe() # measurements range from -88 to 6,632

In [None]:
hr_values_negative = hr_values[hr_values['VALUENUM'] < 0]
hr_values_negative.VALUENUM.value_counts().sort_index() # 1 outlier

In [None]:
hr_values_above_350 = hr_values[hr_values['VALUENUM'] > 350]
hr_values_above_350.VALUENUM.value_counts().sort_index() # 1 outlier

The lower limit of the heart rate measurements coincides with the one in 'alarm_violations.csv', but the upper limit in that CSV is much higher (86,101). Both extrema in the original data set are single outliers.
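
The same pattern (select one ITEMID, then count the values outside a plausible range) recurs for every parameter in this notebook, so it could be wrapped in a small helper. This is only a sketch: the function name `values_outside_range` and the toy data are made up for illustration, and the bounds 0 and 350 are the ones used in the heart rate cells above.

```python
import pandas as pd


def values_outside_range(df, itemid, lower, upper):
    """Count the VALUENUM values of one ITEMID that fall outside [lower, upper]."""
    values = df.loc[df['ITEMID'] == itemid, 'VALUENUM'].dropna()
    # between() is inclusive on both ends, so the bounds themselves count as plausible.
    outside = values[~values.between(lower, upper)]
    return outside.value_counts().sort_index()


# Made-up toy data; not taken from CHARTEVENTS.csv.
toy_df = pd.DataFrame({
    'ITEMID': [220045, 220045, 220045, 220045, 220277],
    'VALUENUM': [-5.0, 72.0, 80.0, 400.0, 98.0],
})

hr_outliers = values_outside_range(toy_df, 220045, lower=0, upper=350)
print(hr_outliers)
```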

In [None]:
hr_low = df[(df['ITEMID'] == 220047)]
hr_low.VALUENUM.describe() # LOW thresholds range from 8 to 50,120

In [None]:
hr_low_above_350 = hr_low[hr_low['VALUENUM'] > 350]
hr_low_above_350.VALUENUM.value_counts().sort_index() # 2 outliers

These LOW thresholds of the heart rate coincide with the ones (10 to 85,160) generated in 'alarm_violations.csv'. Both upper limits are definitely too high, but the values above 350 in the original data set are just two outliers.

In [None]:
hr_high = df[(df['ITEMID'] == 220046)]
hr_high.VALUENUM.describe() # HIGH thresholds range from 10 to 1,230

In [None]:
hr_high_above_350 = hr_high[hr_high['VALUENUM'] > 350]
hr_high_above_350.VALUENUM.value_counts().sort_index() # 3 outliers

These HIGH thresholds of the heart rate coincide with the ones (0 to 175) generated in 'alarm_violations.csv'. The upper limit in the original data set is definitely too high, but the values above 350 are only a few outliers.

### Analysis of Systolic Blood Pressure

In [None]:
nbps_values = df[(df['ITEMID'] == 220179)]
nbps_values.VALUENUM.describe() # measurements range from 0 to 269

These values contradict the ones from 'alarm_violations.csv' (-69 to ~140k). Apart from that, they lie within the possible range from 0 to 375.

In [None]:
nbps_low = df[(df['ITEMID'] == 223752)]
nbps_low.VALUENUM.describe() # LOW thresholds range from 20 to 920

In [None]:
nbps_low_above_375 = nbps_low[nbps_low['VALUENUM'] > 375]
nbps_low_above_375.VALUENUM.value_counts().sort_index() # 4 outliers

The highest LOW threshold in 'alarm_violations.csv' (~95k) is much higher than the one in the original data set. In addition, only four values in the original data set exceed the maximal possible value of 375.

In [None]:
nbps_high = df[(df['ITEMID'] == 223751)]
nbps_high.VALUENUM.describe() # HIGH thresholds range from 1 to 160,150

In [None]:
nbps_high_above_375 = nbps_high[nbps_high['VALUENUM'] > 375]
nbps_high_above_375.VALUENUM.value_counts().sort_index() # 7 outliers

The maximal HIGH threshold in 'alarm_violations.csv' (240) is much lower than the one in the original data set. Its minimal value of 0 is not plausible either. In the original data set, only 7 HIGH thresholds are higher than possible (> 375), so they can be regarded as outliers.
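
To judge whether such values really are "only" outliers, their count can be put in relation to the total number of values. The following is a minimal sketch; the helper name `share_above` and the toy thresholds are made up for illustration and are not taken from 'CHARTEVENTS.csv'.

```python
import pandas as pd


def share_above(values, upper):
    """Return (count, share) of the non-missing values strictly above `upper`."""
    values = values.dropna()
    count = int((values > upper).sum())
    # Guard against an empty series to avoid a division by zero.
    share = count / len(values) if len(values) else 0.0
    return count, share


# Made-up HIGH thresholds; not taken from CHARTEVENTS.csv.
toy_high = pd.Series([120.0, 160.0, 160.0, 180.0, 200.0, 160150.0, 150.0, 170.0])

count, share = share_above(toy_high, upper=375)
print(f'{count} of {len(toy_high)} values above 375 ({share:.1%})')
```

On the real data, `nbps_high.VALUENUM` would be passed instead of `toy_high`.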

### Analysis of O2 Saturation

In [None]:
o2sat_values = df[(df['ITEMID'] == 220277)]
o2sat_values.VALUENUM.describe() # measurements range from 0 to 100

In contrast to these values, the measurement values of the O2 saturation in 'alarm_violations.csv' go up to 1,000.

In [None]:
o2sat_low = df[(df['ITEMID'] == 223770)]
o2sat_low.VALUENUM.describe() # LOW thresholds range from 2 to 90,100

In [None]:
o2sat_low_above_100 = o2sat_low[o2sat_low['VALUENUM'] > 100]
o2sat_low_above_100.VALUENUM.value_counts().sort_index() # 4 outliers

These LOW thresholds of the O2 saturation coincide with the ones (50 to 90,100) generated in 'alarm_violations.csv'. Nevertheless, a LOW threshold should be at most 99, which is largely the case, since only 4 values are higher than 100.
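
One way to see the range of the thresholds without such outliers is to mask the implausible values before calling `describe()`. This is only a sketch on made-up toy values; the bounds 0 and 100 are an assumption that mirrors the cut-off used in the cell above.

```python
import pandas as pd

# Made-up LOW thresholds of the O2 saturation; not taken from CHARTEVENTS.csv.
toy_o2sat_low = pd.Series([85.0, 88.0, 90.0, 92.0, 90100.0])

# Keep only values within the assumed plausible range; everything else
# becomes NaN and is therefore ignored by describe().
plausible = toy_o2sat_low.where(toy_o2sat_low.between(0, 100))
print(plausible.describe())
```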

In [None]:
o2sat_high = df[(df['ITEMID'] == 223769)]
o2sat_high.VALUENUM.describe() # HIGH thresholds range from 10 to 1,000

In [None]:
o2sat_high_above_100 = o2sat_high[o2sat_high['VALUENUM'] > 100]
o2sat_high_above_100.VALUENUM.value_counts().sort_index() # 29 outliers

In contrast to these values, the HIGH thresholds of the O2 saturation in 'alarm_violations.csv' only go up to 100 and thus look plausible. In the original data set, the HIGH thresholds above 100 are only 29 outlier values.

### Analysis of Respiratory Rate

In [None]:
rr_values = df[(df['ITEMID'] == 220210)]
rr_values.VALUENUM.describe() # measurements range from 0 to 200

The measurement values of the respiratory rate in 'alarm_violations.csv' range from 0 to 2.35 million, which completely contradicts the original values.

In [None]:
rr_low = df[(df['ITEMID'] == 224162)]
rr_low.VALUENUM.describe() # LOW thresholds range from 0 to 93

The maximal LOW threshold of the respiratory rate is 93, which contradicts the values in the millions in 'alarm_violations.csv'.

In [None]:
rr_high = df[(df['ITEMID'] == 224161)]
rr_high.VALUENUM.describe() # HIGH thresholds range from 0 to 160

The HIGH thresholds of the respiratory rate in 'alarm_violations.csv' range from 0 to 55, which is plausible given the range found in 'CHARTEVENTS.csv'.
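
The ranges inspected one by one above could also be collected in a single overview table by grouping on ITEMID. The following is a minimal sketch; the short labels in `ITEM_LABELS` are chosen here for readability, and the toy data is made up and not taken from 'CHARTEVENTS.csv'.

```python
import pandas as pd

# ITEMIDs used in this notebook, with short labels chosen for this sketch.
ITEM_LABELS = {
    220045: 'HR', 220047: 'HR LOW', 220046: 'HR HIGH',
    220179: 'NBPs', 223752: 'NBPs LOW', 223751: 'NBPs HIGH',
    220277: 'O2 sat', 223770: 'O2 sat LOW', 223769: 'O2 sat HIGH',
    220210: 'RR', 224162: 'RR LOW', 224161: 'RR HIGH',
}

# Made-up toy data; not taken from CHARTEVENTS.csv.
toy_df = pd.DataFrame({
    'ITEMID': [220045, 220045, 220210, 220210, 220210],
    'VALUENUM': [60.0, 90.0, 12.0, 18.0, 30.0],
})

# One row per ITEMID with the number of values and their extrema.
summary = (
    toy_df[toy_df['ITEMID'].isin(list(ITEM_LABELS))]
    .groupby('ITEMID')['VALUENUM']
    .agg(['count', 'min', 'max'])
    .rename(index=ITEM_LABELS)
)
print(summary)
```

Applied to the full `df`, this would give the minimum and maximum of every measurement and threshold at a glance.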

### Analysis of Minute Volume

To be continued.