# Medication Admin Intermittent - Outlier Handling Demo

This notebook demonstrates how to load intermittent medication administration data and apply outlier handling based on medication category and dose unit combinations.

In [1]:
import pandas as pd
from clifpy.tables.medication_admin_intermittent import MedicationAdminIntermittent
from clifpy.utils.outlier_handler import apply_outlier_handling

## Step 1: Load Intermittent Medication Data

Load only the columns needed, filtered to five sedative and opioid medications.

In [4]:
# Note: Adjust cohort_hosp_ids to match your dataset or remove the filter
# cohort_hosp_ids = [...]  # Your hospitalization IDs here

intm_sed = MedicationAdminIntermittent.from_file(
    config_path='../config/config.yaml',
    columns=[
        'hospitalization_id', 'admin_dttm', 'med_name', 'med_category', 
        'med_dose', 'med_dose_unit', 'mar_action_name', 'mar_action_category'
    ],
    filters={
        'med_category': ['hydromorphone', 'fentanyl', 'lorazepam', 'midazolam', 'propofol'],
        # 'hospitalization_id': cohort_hosp_ids  # Uncomment if you have specific IDs
    }
)

📢 Initialized medication_admin_intermittent table
📢 Data directory: /Users/wliao0504/code/clif/CLIF-MIMIC/output/rclif-dev-test
📢 File type: parquet
📢 Timezone: US/Eastern
📢 Output directory: output
📢 Loaded schema from /Users/wliao0504/code/clif/pyCLIF/clifpy/schemas/medication_admin_intermittent_schema.yaml
📢 Loaded outlier configuration
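The `filters` argument restricts which rows are loaded. Conceptually, a list-valued filter keeps only rows whose column value appears in the list. The sketch below illustrates that idea in plain pandas on a toy frame. It is an assumption about the semantics, not clifpy's implementation, and `apply_list_filters` is a hypothetical helper.

```python
import pandas as pd

def apply_list_filters(df: pd.DataFrame, filters: dict) -> pd.DataFrame:
    """Keep rows where each filtered column's value is in its allowed list (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, allowed in filters.items():
        # A row survives only if it passes every column filter.
        mask &= df[column].isin(allowed)
    return df[mask]

toy = pd.DataFrame({
    "hospitalization_id": [1, 1, 2, 3],
    "med_category": ["fentanyl", "vancomycin", "propofol", "lorazepam"],
    "med_dose": [25.0, 1000.0, 10.0, 1.0],
})

sedatives = apply_list_filters(
    toy,
    {"med_category": ["hydromorphone", "fentanyl", "lorazepam", "midazolam", "propofol"]},
)
print(sedatives["med_category"].tolist())  # ['fentanyl', 'propofol', 'lorazepam']
```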


## Step 2: Inspect the Data Before Outlier Handling

In [5]:
print(f"Total rows: {len(intm_sed.df):,}")
print(f"\nData shape: {intm_sed.df.shape}")
print(f"\nColumns: {list(intm_sed.df.columns)}")

Total rows: 537,636

Data shape: (537636, 8)

Columns: ['hospitalization_id', 'admin_dttm', 'med_name', 'med_category', 'med_dose', 'med_dose_unit', 'mar_action_name', 'mar_action_category']


In [6]:
# Check med_dose statistics before outlier handling
print("\nmed_dose statistics BEFORE outlier handling:")
print(intm_sed.df['med_dose'].describe())
print(f"\nNon-null med_dose count: {intm_sed.df['med_dose'].notna().sum():,}")


med_dose statistics BEFORE outlier handling:
count    537636.000000
mean         22.267834
std          28.487940
min           0.000000
25%           1.000000
50%          10.000000
75%          40.000000
max        1500.000000
Name: med_dose, dtype: float64

Non-null med_dose count: 537,636


In [7]:
# Check medication category and unit combinations
print("\nMedication category and unit combinations:")
med_unit_combo = intm_sed.df.groupby(['med_category', 'med_dose_unit'])['med_dose'].agg(['count', 'mean', 'min', 'max']).reset_index()
print(med_unit_combo)


Medication category and unit combinations:
    med_category med_dose_unit   count        mean         min          max
0       fentanyl           mcg  224228   46.038128    0.000000  1500.000000
1       fentanyl            mg     323    0.241360    0.000833     2.000000
2  hydromorphone          dose       1    1.000000    1.000000     1.000000
3  hydromorphone           mcg       7  178.571442  125.000008   250.000015
4  hydromorphone            mg  110210    0.797934    0.000000   165.259552
5      lorazepam            mg   42035    1.068767    0.000000    41.000000
6      midazolam            mg   64588    1.459700    0.000000    75.000000
7       propofol           mcg     297   14.351851    2.500000   150.000000
8       propofol            mg   95947   14.760398    0.000000  1440.000000


## Step 3: Apply Outlier Handling

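By default, `apply_outlier_handling` uses the CLIF standard ranges defined in `outlier_config.yaml`. For `medication_admin_intermittent`, ranges are applied per `med_category` and `med_dose_unit` combination: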

- propofol (mg): 0.0 - 400.0
- midazolam (mg): 0.0 - 20.0
- fentanyl (mcg): 0.0 - 500.0
- hydromorphone (mg): 0.0 - 4.0
- lorazepam (mg): 0.0 - 10.0
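Conceptually, this is a lookup keyed by `(med_category, med_dose_unit)`: a dose outside the range for its combination becomes NaN, and a combination with no configured range is left untouched. The following is a minimal pandas sketch under that assumption. `RANGES` restates the list above, and `nullify_outliers` is a hypothetical helper, not clifpy's `apply_outlier_handling`.

```python
import numpy as np
import pandas as pd

# Hypothetical range table restating the configured ranges listed above.
RANGES = {
    ("propofol", "mg"): (0.0, 400.0),
    ("midazolam", "mg"): (0.0, 20.0),
    ("fentanyl", "mcg"): (0.0, 500.0),
    ("hydromorphone", "mg"): (0.0, 4.0),
    ("lorazepam", "mg"): (0.0, 10.0),
}

def nullify_outliers(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Return a copy with med_dose set to NaN where it falls outside its category/unit range."""
    out = df.copy()
    for (category, unit), (low, high) in ranges.items():
        in_combo = (out["med_category"] == category) & (out["med_dose_unit"] == unit)
        # between() is inclusive on both ends by default.
        out_of_range = in_combo & ~out["med_dose"].between(low, high)
        out.loc[out_of_range, "med_dose"] = np.nan
    return out

toy = pd.DataFrame({
    "med_category": ["fentanyl", "fentanyl", "hydromorphone", "hydromorphone"],
    "med_dose_unit": ["mcg", "mcg", "mg", "mcg"],
    "med_dose": [25.0, 1500.0, 165.0, 250.0],
})
cleaned = nullify_outliers(toy, RANGES)
# hydromorphone in mcg has no configured range, so its 250.0 is kept.
print(cleaned["med_dose"].tolist())  # [25.0, nan, nan, 250.0]
```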

In [8]:
# Apply outlier handling using CLIF standard config
apply_outlier_handling(intm_sed)

# Or use custom config:
# apply_outlier_handling(intm_sed, outlier_config_path='config/outlier_config.yaml')

Using CLIF standard outlier ranges

Building outlier expressions...


Building expressions: 100%|██████████| 1/1 [00:00<00:00, 2341.88column/s]


Applying outlier filtering...


Processing: 100%|██████████| 1/1 [00:00<00:00, 44.06operation/s]


Medication Table - Category/Unit Statistics:
  fentanyl (mcg)                : 224228 values →     31 nullified (  0.0%)
  fentanyl (mg)                 :    323 values →      0 nullified (  0.0%)
  hydromorphone (dose)          :      1 values →      0 nullified (  0.0%)
  hydromorphone (mcg)           :      7 values →      0 nullified (  0.0%)
  hydromorphone (mg)            : 110210 values →    562 nullified (  0.5%)
  lorazepam (mg)                :  42035 values →     12 nullified (  0.0%)
  midazolam (mg)                :  64588 values →     16 nullified (  0.0%)
  propofol (mcg)                :    297 values →      0 nullified (  0.0%)
  propofol (mg)                 :  95947 values →     10 nullified (  0.0%)
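Per-combination counts like these can be reproduced independently from copies of the frame taken before and after handling: the number nullified is the drop in non-null `med_dose` count within each category/unit group. The sketch below assumes you kept a `before` copy (for example `intm_sed.df.copy()` taken before calling `apply_outlier_handling`). `nullification_report` is a hypothetical helper, shown here on toy data.

```python
import numpy as np
import pandas as pd

def nullification_report(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Compare non-null med_dose counts per category/unit before and after handling."""
    keys = ["med_category", "med_dose_unit"]
    # GroupBy.count() counts non-null values only, so newly introduced NaNs drop out.
    n_before = before.groupby(keys)["med_dose"].count().rename("total")
    n_after = after.groupby(keys)["med_dose"].count().rename("kept")
    report = pd.concat([n_before, n_after], axis=1).reset_index()
    report["nullified"] = report["total"] - report["kept"]
    report["pct_nullified"] = 100 * report["nullified"] / report["total"]
    return report

before = pd.DataFrame({
    "med_category": ["fentanyl"] * 4 + ["lorazepam"] * 2,
    "med_dose_unit": ["mcg"] * 4 + ["mg"] * 2,
    "med_dose": [25.0, 50.0, 1500.0, 100.0, 1.0, 2.0],
})
after = before.copy()
after.loc[2, "med_dose"] = np.nan  # simulate one nullified fentanyl dose

print(nullification_report(before, after))
```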





## Step 4: Inspect the Data After Outlier Handling

In [9]:
# Check med_dose statistics after outlier handling
print("\nmed_dose statistics AFTER outlier handling:")
print(intm_sed.df['med_dose'].describe())
print(f"\nNon-null med_dose count: {intm_sed.df['med_dose'].notna().sum():,}")


med_dose statistics AFTER outlier handling:
count    537005.000000
mean         22.226227
std          27.707930
min           0.000000
25%           1.000000
50%          10.000000
75%          40.000000
max         450.000000
Name: med_dose, dtype: float64

Non-null med_dose count: 537,005
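As a consistency check, the drop in the overall non-null count should equal the sum of the per-combination nullified counts from the Step 3 log. Using the figures printed in this notebook:

```python
# Nullified counts per category/unit, copied from the Step 3 log (zero rows omitted).
nullified = {
    ("fentanyl", "mcg"): 31,
    ("hydromorphone", "mg"): 562,
    ("lorazepam", "mg"): 12,
    ("midazolam", "mg"): 16,
    ("propofol", "mg"): 10,
}
non_null_before = 537_636
non_null_after = 537_005

total_nullified = sum(nullified.values())
print(total_nullified, non_null_before - non_null_after)  # 631 631
assert total_nullified == non_null_before - non_null_after
```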


In [10]:
# Check medication category and unit combinations after outlier handling
print("\nMedication category and unit combinations AFTER outlier handling:")
med_unit_combo_after = intm_sed.df.groupby(['med_category', 'med_dose_unit'])['med_dose'].agg(['count', 'mean', 'min', 'max']).reset_index()
print(med_unit_combo_after)


Medication category and unit combinations AFTER outlier handling:
    med_category med_dose_unit   count        mean         min         max
0       fentanyl           mcg  224197   45.939426    0.000000  450.000000
1       fentanyl            mg     323    0.241360    0.000833    2.000000
2  hydromorphone          dose       1    1.000000    1.000000    1.000000
3  hydromorphone           mcg       7  178.571442  125.000008  250.000015
4  hydromorphone            mg  109648    0.751006    0.000000    4.000000
5      lorazepam            mg   42023    1.063825    0.000000   10.000000
6      midazolam            mg   64572    1.449159    0.000000   20.000000
7       propofol           mcg     297   14.351851    2.500000  150.000000
8       propofol            mg   95937   14.696059    0.000000  400.000000


## Step 5: Verify the Medication Schema Loaded

In [11]:
# Check that medication-specific features are available
print("\nMedication category to group mapping:")
print(intm_sed.med_category_to_group_mapping)

# Note: an empty dict is expected when the schema has no category-to-group
# mapping section; this is not an error.


Medication category to group mapping:
{}


## Step 6: Sample the Data

In [12]:
# Display a sample of the data
print("\nSample of processed data:")
intm_sed.df.head(10)


Sample of processed data:


|   | hospitalization_id | admin_dttm | med_name | med_category | med_dose | med_dose_unit | mar_action_name | mar_action_category |
|---|---|---|---|---|---|---|---|---|
| 0 | 26839898 | 2167-07-24 05:31:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 1 | 26839898 | 2167-07-20 10:19:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 2 | 26839898 | 2167-07-20 11:00:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 3 | 26839898 | 2167-07-23 21:02:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 4 | 26839898 | 2167-07-24 01:23:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 5 | 26839898 | 2167-07-24 00:06:00-05:00 | Fentanyl | fentanyl | 25.0 | mcg | FinishedRunning | given |
| 6 | 26840482 | 2150-04-11 05:21:00-05:00 | Propofol | propofol | 10.0 | mg | FinishedRunning | given |
| 7 | 26840778 | 2171-11-27 18:30:00-05:00 | Lorazepam (Ativan) | lorazepam | 1.0 | mg | FinishedRunning | given |
| 8 | 26840778 | 2171-11-27 22:28:00-05:00 | Lorazepam (Ativan) | lorazepam | 0.5 | mg | FinishedRunning | given |
| 9 | 26840778 | 2171-11-27 18:20:00-05:00 | Lorazepam (Ativan) | lorazepam | 1.0 | mg | FinishedRunning | given |


In [13]:
# Max dose per med_category (all units pooled together)
print("\nChecking for potential outliers that should have been caught:")
for med_cat in ['propofol', 'midazolam', 'fentanyl', 'hydromorphone', 'lorazepam']:
    subset = intm_sed.df[intm_sed.df['med_category'] == med_cat]
    if len(subset) > 0:
        max_dose = subset['med_dose'].max()
        print(f"{med_cat}: max dose = {max_dose}")


Checking for potential outliers that should have been caught:
propofol: max dose = 400.0
midazolam: max dose = 20.0
fentanyl: max dose = 450.0
hydromorphone: max dose = 250.00001525878906
lorazepam: max dose = 10.0
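Because the loop above groups by `med_category` only, the hydromorphone maximum of about 250 comes from the seven `mcg` rows, which have no configured range, rather than from the `mg` rows (max 4.0 in the Step 4 table). Grouping by both category and unit keeps the units apart. The sketch below runs on a toy frame; the same `groupby` expression applies to `intm_sed.df`.

```python
import pandas as pd

toy = pd.DataFrame({
    "med_category": ["hydromorphone", "hydromorphone", "hydromorphone", "fentanyl"],
    "med_dose_unit": ["mg", "mg", "mcg", "mcg"],
    "med_dose": [2.0, 4.0, 250.0, 450.0],
})

# Max dose per category/unit pair, so mcg and mg doses are never compared directly.
max_by_combo = toy.groupby(["med_category", "med_dose_unit"])["med_dose"].max()
print(max_by_combo)
```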


## Expected Results

After running outlier handling, you should see:

1. Values outside the configured ranges converted to NaN
2. Statistics showing how many values were nullified for each medication/unit combination
3. Maximum values within the configured ranges for each configured combination (combinations without a range, such as hydromorphone in mcg, are left unchanged):
   - propofol (mg): ≤ 400.0
   - midazolam (mg): ≤ 20.0
   - fentanyl (mcg): ≤ 500.0
   - hydromorphone (mg): ≤ 4.0
   - lorazepam (mg): ≤ 10.0
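These expectations can also be checked programmatically. The sketch below returns every configured combination whose observed min or max falls outside its range, so an empty list means the check passes. `RANGES` restates the ranges above and `find_range_violations` is a hypothetical helper, not part of clifpy.

```python
import pandas as pd

# Hypothetical restatement of the configured ranges listed above.
RANGES = {
    ("propofol", "mg"): (0.0, 400.0),
    ("midazolam", "mg"): (0.0, 20.0),
    ("fentanyl", "mcg"): (0.0, 500.0),
    ("hydromorphone", "mg"): (0.0, 4.0),
    ("lorazepam", "mg"): (0.0, 10.0),
}

def find_range_violations(df: pd.DataFrame, ranges: dict) -> list:
    """Return (category, unit, observed_min, observed_max) for combos outside their range."""
    stats = df.groupby(["med_category", "med_dose_unit"])["med_dose"].agg(["min", "max"])
    violations = []
    for (category, unit), (low, high) in ranges.items():
        if (category, unit) not in stats.index:
            continue  # combination absent from this data
        row = stats.loc[(category, unit)]
        if row["min"] < low or row["max"] > high:
            violations.append((category, unit, row["min"], row["max"]))
    return violations

after = pd.DataFrame({
    "med_category": ["propofol", "hydromorphone", "hydromorphone"],
    "med_dose_unit": ["mg", "mg", "mcg"],
    "med_dose": [400.0, 3.5, 250.0],
})
print(find_range_violations(after, RANGES))  # []
```

Unconfigured combinations (here hydromorphone in mcg) are skipped, matching how they pass through outlier handling unchanged. Run on `intm_sed.df` after Step 3, this should return an empty list, consistent with the Step 4 table.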