# Creation of Sampling Rates for Vital Parameters

Aim: Analyze sampling rates by ICUSTAY_ID and ITEMID to generate an overview.

The sampling rate is the average number of measurements obtained per hour for a specific vital parameter. The threshold values are not analyzed.

Structure of the new data frame (sampling_rates_for_icustay_itemid.parquet):
* ICUSTAY_ID
* ITEMID
* CHARTTIME_MIN -> minimum timestamp for that ICUSTAY_ID - ITEMID. "When was the first measurement of this parameter conducted for this ICUSTAY_ID?"
* CHARTTIME_MAX -> maximum timestamp for that ICUSTAY_ID - ITEMID. "When was the last measurement of this parameter conducted for this ICUSTAY_ID?"
* ICUSTAY_DURATION(h) -> timedelta between first and last timestamp for that ICUSTAY_ID - ITEMID, floored to whole hours. "How much time has passed between the first and the last measurement in hours?"
* VALUENUM_COUNT -> number of measurements for this ICUSTAY_ID - ITEMID. "How many measurements are available over the entire period for this parameter for this ICUSTAY_ID?"
* SAMPLING_RATE -> VALUENUM_COUNT divided by ICUSTAY_DURATION(h) for that ICUSTAY_ID - ITEMID. If ICUSTAY_DURATION(h) is zero, VALUENUM_COUNT is used instead. "How many measurements were obtained on average per hour for this ICUSTAY_ID - ITEMID?"
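As a worked example of the definition above, the sketch below recomputes the sampling rate for one ICUSTAY_ID - ITEMID combination from this notebook's output (ICUSTAY_ID 200001, ITEMID 220045, 99 measurements). The helper name `sampling_rate` is hypothetical and only used for illustration; it assumes the duration is floored to whole hours and that the count is returned unchanged when the duration is zero, as in the calculation further down.

```python
import pandas as pd


def sampling_rate(count, first, last):
    """Measurements per hour between the first and last timestamp.

    Hypothetical helper: floors the duration to whole hours and
    returns the raw count when the floored duration is zero.
    """
    duration_h = (last - first).total_seconds() // 3600
    return count if duration_h == 0 else count / duration_h


first = pd.Timestamp("2181-11-25 19:06:00")
last = pd.Timestamp("2181-11-28 20:00:00")

# 72 h 54 min are floored to 72 h, so the rate is 99 / 72
rate = sampling_rate(99, first, last)
print(rate)
```

With these inputs the floored duration is 72 hours, giving 99 / 72 = 1.375 measurements per hour.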

## Load and Prepare Data

First, chartevents is filtered to the relevant ICUSTAY_IDs and ITEMIDs.

In [1]:
import pandas as pd
import pyarrow as pa

# Read chartevents_subset from parquet file to pandas data frame
chartevents_subset = pd.read_parquet('./data/chartevents_subset.parquet', engine='pyarrow')
unique_icu_stays = pd.read_parquet('./data/unique_icustays_in_chartevents_subset.parquet', engine='pyarrow')

In [2]:
# Select relevant ICUSTAY_IDs for analysis - only the ones appearing for the analyzed ITEM IDs
icustayid_filter = unique_icu_stays.ICUSTAY_ID

# Filter by ICU_stay
sampling_rate_data = chartevents_subset[chartevents_subset.ICUSTAY_ID.isin(icustayid_filter)].copy()

In [3]:
# The sampling rate analysis is only conducted on the vital parameters, not on the thresholds
# Filter for ITEMIDs that refer to vital parameters
# Heart Rate: 220045 | NBP: 220179 | O2: 220277
itemids_for_values_filter = [220045, 220179, 220277]
sampling_rate_data = sampling_rate_data[sampling_rate_data.ITEMID.isin(itemids_for_values_filter)].copy()


## Fill Sampling Rate Data Frame

The relevant columns are calculated with groupby statements, as this turned out to be much faster than a for-loop.
One row is generated per ICUSTAY_ID - ITEMID combination.
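The three per-group values can also be obtained in a single groupby pass with named aggregation. The following is a minimal sketch on made-up toy data (the frame `toy` and its values are assumptions for illustration, not taken from chartevents); it is an alternative to the separate min, max and count steps, not the method used in this notebook.

```python
import pandas as pd

# Toy stand-in for the filtered chartevents data (values are made up)
toy = pd.DataFrame({
    "ICUSTAY_ID": [1, 1, 1, 2],
    "ITEMID": [220045, 220045, 220277, 220045],
    "CHARTTIME": pd.to_datetime([
        "2181-11-25 19:06:00",
        "2181-11-25 21:06:00",
        "2181-11-25 19:14:00",
        "2132-08-05 00:39:00",
    ]),
    "VALUENUM": [80.0, 82.0, 97.0, 75.0],
})

# One row per ICUSTAY_ID - ITEMID combination;
# "count" ignores missing VALUENUM entries, like DataFrame.count()
summary = (
    toy.groupby(["ICUSTAY_ID", "ITEMID"])
    .agg(
        CHARTTIME_MIN=("CHARTTIME", "min"),
        CHARTTIME_MAX=("CHARTTIME", "max"),
        VALUENUM_COUNT=("VALUENUM", "count"),
    )
    .reset_index()
)
print(summary)
```

This yields the same column layout as the merged data frame built below, without the intermediate merges.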

In [4]:
# Calculate CHARTTIME_MIN for each ICUSTAYID - ITEMID combination
sampling_rate_data_min = sampling_rate_data.groupby(['ICUSTAY_ID','ITEMID'])['CHARTTIME'].min()
sampling_rate_data_min_df = sampling_rate_data_min.to_frame()
sampling_rate_data_min_df.reset_index(inplace=True)
sampling_rate_data_min_df = sampling_rate_data_min_df.rename(columns = {'CHARTTIME':'CHARTTIME_MIN'})
len(sampling_rate_data_min_df)

70015

In [5]:
# Calculate CHARTTIME_MAX for each ICUSTAYID - ITEMID combination
sampling_rate_data_max = sampling_rate_data.groupby(['ICUSTAY_ID','ITEMID'])['CHARTTIME'].max()
sampling_rate_data_max_df = sampling_rate_data_max.to_frame()
sampling_rate_data_max_df.reset_index(inplace=True)
sampling_rate_data_max_df = sampling_rate_data_max_df.rename(columns = {'CHARTTIME':'CHARTTIME_MAX'})
len(sampling_rate_data_max_df)

70015

In [6]:
# Calculate VALUENUM_COUNT for each ICUSTAYID - ITEMID combination
sampling_rate_data_count = sampling_rate_data[['ICUSTAY_ID','ITEMID','VALUENUM']].copy()
sampling_rate_data_count = sampling_rate_data_count.groupby(['ICUSTAY_ID','ITEMID']).count()
sampling_rate_data_count = sampling_rate_data_count.rename(columns = {'VALUENUM':'VALUENUM_COUNT'})
sampling_rate_data_count = sampling_rate_data_count.reset_index()


In [8]:
# Merge together by ICUSTAYID, ITEMID
# Resulting data frame columns: ICUSTAY_ID, ITEMID, CHARTTIME_MIN, CHARTTIME_MAX, VALUENUM_COUNT
sampling_rates_for_icustay_itemid = pd.merge(sampling_rate_data_min_df, sampling_rate_data_max_df,  how='left', on=['ICUSTAY_ID','ITEMID'])
sampling_rates_for_icustay_itemid = pd.merge(sampling_rates_for_icustay_itemid,sampling_rate_data_count,how='left', on=['ICUSTAY_ID','ITEMID'])
len(sampling_rates_for_icustay_itemid)
sampling_rates_for_icustay_itemid.head()

| | ICUSTAY_ID | ITEMID | CHARTTIME_MIN | CHARTTIME_MAX | VALUENUM_COUNT |
|---|---|---|---|---|---|
| 0 | 200001.0 | 220045.0 | 2181-11-25 19:06:00 | 2181-11-28 20:00:00 | 99 |
| 1 | 200001.0 | 220179.0 | 2181-11-25 19:08:00 | 2181-11-28 20:00:00 | 31 |
| 2 | 200001.0 | 220277.0 | 2181-11-25 19:14:00 | 2181-11-28 20:00:00 | 96 |
| 3 | 200010.0 | 220045.0 | 2132-08-05 00:39:00 | 2132-08-05 21:00:00 | 23 |
| 4 | 200010.0 | 220179.0 | 2132-08-05 00:40:00 | 2132-08-05 21:17:00 | 21 |


In [None]:
# Calculate ICUSTAY_DURATION(h) for each ICUSTAY_ID - ITEMID combination:
# time between first and last measurement, floored to whole hours
duration = sampling_rates_for_icustay_itemid['CHARTTIME_MAX'] - sampling_rates_for_icustay_itemid['CHARTTIME_MIN']
sampling_rates_for_icustay_itemid['ICUSTAY_DURATION(h)'] = duration.dt.total_seconds() // 3600
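Because the duration is floored to whole hours, any span shorter than one hour becomes 0 and partial hours are dropped. The sketch below contrasts the floored value with fractional hours for one row of the output above (ICUSTAY_ID 200010, ITEMID 220045); the variable names are hypothetical, and the fractional variant is shown only for comparison, not as the notebook's method.

```python
import pandas as pd

first = pd.Timestamp("2132-08-05 00:39:00")
last = pd.Timestamp("2132-08-05 21:00:00")
seconds = (last - first).total_seconds()

# Whole hours only, as in the ICUSTAY_DURATION(h) column
floored_hours = seconds // 3600

# Fractional hours, keeping the remaining 21 minutes
fractional_hours = seconds / 3600

print(floored_hours, fractional_hours)
```

For short stays the difference matters: the floored duration lowers the denominator and therefore raises the resulting sampling rate.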

In [None]:
import numpy as np
# Calculate SAMPLING_RATE for each ICUSTAY_ID - ITEMID combination
# If ICUSTAY_DURATION(h) is zero, use VALUENUM_COUNT as SAMPLING_RATE to avoid dividing by 0
sampling_rates_for_icustay_itemid['SAMPLING_RATE'] = np.where(
    sampling_rates_for_icustay_itemid['ICUSTAY_DURATION(h)'] == 0,
    sampling_rates_for_icustay_itemid['VALUENUM_COUNT'],
    sampling_rates_for_icustay_itemid['VALUENUM_COUNT'] / sampling_rates_for_icustay_itemid['ICUSTAY_DURATION(h)'],
)
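The zero-duration fallback can be checked on a small example. This is a minimal sketch on a made-up frame (`toy` and its values are assumptions); instead of `np.where` it masks zero durations with NaN before dividing and then fills the gaps with the count, which should give the same result under the rule described above.

```python
import numpy as np
import pandas as pd

# Toy data: the first row has a floored duration of zero hours
toy = pd.DataFrame({
    "VALUENUM_COUNT": [3, 10, 99],
    "ICUSTAY_DURATION(h)": [0.0, 2.0, 72.0],
})

# Replace zero durations with NaN so the division yields NaN instead of inf
safe_duration = toy["ICUSTAY_DURATION(h)"].replace(0, np.nan)
rate = toy["VALUENUM_COUNT"] / safe_duration

# Fall back to the raw count where the duration was zero
toy["SAMPLING_RATE"] = rate.fillna(toy["VALUENUM_COUNT"])
print(toy)
```

Rows with a positive duration keep count divided by hours, while the zero-duration row keeps its count as the rate.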
         

## Save Data Frame to parquet File

In [None]:
# Save sampling_rates_for_icustay_itemid as parquet file
sampling_rates_for_icustay_itemid.to_parquet('./data/sampling_rates_for_icustay_itemid.parquet', engine='pyarrow')