# **Patient annotation of extubation success/failure**

For the purposes of this study, extubation failure is defined as any one of the following (definition based on the literature and clinical discussion with Dr Murali):

- Re-intubation within 48 hours of extubation
- Mortality within 48 hours of extubation
- Placement on ventilatory support (non-invasive ventilation) within 6 hours of extubation

The rationale is as follows. Re-intubation or death within 48 hours is a standard definition of extubation failure across the literature. The goal of placing a patient on ventilation is eventual extubation; if the patient requires further invasive ventilation or dies, the ventilatory support and subsequent extubation cannot be deemed successful.

If neither applies, the patient may still need ventilatory support such as CPAP or BiPAP, both of which are forms of non-invasive ventilation (NIV). As per Dr Murali, needing NIV within 6 hours of extubation indicates that the patient has not fully recovered, and this group would likely struggle with further intervention. These cases are therefore also classed as failure.


We will apply each criterion one at a time with the intention of fully annotating the derived patient set for model training.
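As a compact restatement of the definition, the three criteria can be sketched as a single predicate on timestamps. This is a minimal illustration only: the function name and arguments are hypothetical, and the inclusive window boundaries are an assumption (the cells below apply each criterion separately on DataFrames).

```python
from datetime import datetime, timedelta
from typing import Optional


def is_extubation_failure(
    extubation_end: datetime,
    reintubation_start: Optional[datetime] = None,
    death_time: Optional[datetime] = None,
    niv_start: Optional[datetime] = None,
) -> bool:
    """Return True if any of the three failure criteria is met."""

    def within(event: Optional[datetime], hours: int) -> bool:
        # An event counts only if it was recorded and falls inside the window
        # (boundaries inclusive here, which is an assumption of this sketch).
        if event is None:
            return False
        return extubation_end <= event <= extubation_end + timedelta(hours=hours)

    return (
        within(reintubation_start, 48)  # re-intubation within 48 hours
        or within(death_time, 48)       # mortality within 48 hours
        or within(niv_start, 6)         # NIV within 6 hours
    )


extubated = datetime(2131, 1, 12, 17, 41)
print(is_extubation_failure(extubated, niv_start=extubated + timedelta(hours=3)))  # True
print(is_extubation_failure(extubated, niv_start=extubated + timedelta(hours=7)))  # False
```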

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd

In [None]:
# Load patient set
patients_file_path = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/mimic_data_analysis/datasets/exclusion_criteria/criteria_6.parquet'
patients_df = pd.read_parquet(patients_file_path)
patients_df.shape[0]

5970

**Re-intubation within 48 hours of extubation**

Re-intubation (invasive ventilation) can be determined from the procedureevents table.

Intubation events have the itemid 244385. If a patient has an Intubation event whose starttime is within 48 hours of the extubation endtime of their first ventilation stay (already filtered), this is classed as failure.

This can be further refined/validated by looking at Invasive ventilation events as well.

Invasive ventilation is represented by the itemid 225792. If a patient has an Invasive ventilation event whose starttime is within 48 hours of extubation_endtime, they are classed as failure.
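The window rule described above can also be expressed without a row-by-row loop. The following is a sketch on toy data, assuming one row per patient and the column names used in this notebook; the helper name `has_event_in_window` is hypothetical.

```python
import pandas as pd


def has_event_in_window(patients: pd.DataFrame, events: pd.DataFrame, hours: int) -> pd.Series:
    """Boolean Series (indexed like `patients`) marking patients with an event
    that starts after extubation_endtime and no later than `hours` hours after it."""
    # Keep the original row labels in a column so they survive the merge.
    left = patients[['subject_id', 'extubation_endtime']].rename_axis('row_id').reset_index()
    merged = left.merge(events[['subject_id', 'starttime']], on='subject_id', how='inner')
    in_window = (merged['starttime'] > merged['extubation_endtime']) & (
        merged['starttime'] <= merged['extubation_endtime'] + pd.Timedelta(hours=hours)
    )
    hit_rows = merged.loc[in_window, 'row_id'].unique()
    return pd.Series(patients.index.isin(hit_rows), index=patients.index)


# Toy data for illustration only
patients = pd.DataFrame({
    'subject_id': [1, 2],
    'extubation_endtime': pd.to_datetime(['2131-01-12 17:41', '2156-04-22 17:11']),
})
events = pd.DataFrame({
    'subject_id': [1, 2],
    'starttime': pd.to_datetime(['2131-01-13 04:00', '2156-05-11 16:05']),
})
flags = has_event_in_window(patients, events, hours=48)
print(flags.tolist())  # [True, False]
```

The same helper would cover the 6-hour NIV criterion by passing `hours=6` and the NIV events instead.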

In [None]:
# Load the procedure events table
procedure_file_path = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/mimic-iv-2.2-raw-data/icu/procedureevents.csv'
procedure_events_df = pd.read_csv(procedure_file_path)
procedure_events_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,caregiver_id,starttime,endtime,storetime,itemid,value,valueuom,...,orderid,linkorderid,ordercategoryname,ordercategorydescription,patientweight,isopenbag,continueinnextdept,statusdescription,originalamount,originalrate
0,10000032,29079034,39553978,88981.0,2180-07-23 14:43:00,2180-07-23 14:44:00,2180-07-23 14:43:00,225966,1.0,,...,6416557,6416557,Procedures,Task,39.4,0,0,FinishedRunning,1.0,0
1,10000032,29079034,39553978,,2180-07-23 14:24:00,2180-07-23 23:50:00,2180-07-23 23:50:49.983,224275,566.0,min,...,6497934,6497934,Peripheral Lines,ContinuousProcess,39.4,1,0,FinishedRunning,566.0,1
2,10000032,29079034,39553978,,2180-07-23 14:24:00,2180-07-23 23:50:00,2180-07-23 23:50:49.983,224277,566.0,min,...,9643097,9643097,Peripheral Lines,ContinuousProcess,39.4,1,0,FinishedRunning,566.0,1
3,10000980,26913865,39765666,,2189-06-27 09:01:00,2189-06-27 20:38:00,2189-06-27 20:38:29.047,225794,697.0,min,...,5989583,5989583,Ventilation,ContinuousProcess,76.2,1,0,FinishedRunning,697.0,1
4,10000980,26913865,39765666,,2189-06-27 09:15:00,2189-06-27 20:38:00,2189-06-27 20:38:29.047,224277,683.0,min,...,476764,476764,Peripheral Lines,ContinuousProcess,76.2,1,0,FinishedRunning,683.0,1


In [None]:
procedure_events_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 696092 entries, 0 to 696091
Data columns (total 22 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   subject_id                696092 non-null  int64  
 1   hadm_id                   696092 non-null  int64  
 2   stay_id                   696092 non-null  int64  
 3   caregiver_id              562488 non-null  float64
 4   starttime                 696092 non-null  object 
 5   endtime                   696092 non-null  object 
 6   storetime                 696092 non-null  object 
 7   itemid                    696092 non-null  int64  
 8   value                     696092 non-null  float64
 9   valueuom                  338500 non-null  object 
 10  location                  152930 non-null  object 
 11  locationcategory          152930 non-null  object 
 12  orderid                   696092 non-null  int64  
 13  linkorderid               696092 non-null  i

In [None]:
# Only focus on intubation events
intubation_events_df = procedure_events_df[procedure_events_df['itemid'] == 244385]
intubation_events_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,caregiver_id,starttime,endtime,storetime,itemid,value,valueuom,...,orderid,linkorderid,ordercategoryname,ordercategorydescription,patientweight,isopenbag,continueinnextdept,statusdescription,originalamount,originalrate


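No rows match this itemid, so intubation events are either not recorded under it or not present in this table (the itemid could be double-checked against the d_items table). We therefore focus on invasive ventilation starttimes.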
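When a filter on an itemid returns no rows, one way to check the itemid is to search the item dictionary by label. The sketch below uses a toy stand-in for the d_items table with the `itemid`, `label` and `linksto` columns; the itemids in it are placeholders, not real MIMIC values.

```python
import pandas as pd

# Toy stand-in for icu/d_items.csv; the itemids are placeholders, not real MIMIC values.
d_items = pd.DataFrame({
    'itemid': [1, 2, 3],
    'label': ['Intubation', 'Extubation', 'Heart Rate'],
    'linksto': ['procedureevents', 'procedureevents', 'chartevents'],
})

# Case-insensitive label search, restricted to items stored in procedureevents
matches = d_items[
    d_items['label'].str.contains('intubation', case=False, na=False)
    & (d_items['linksto'] == 'procedureevents')
]
print(matches['itemid'].tolist())  # [1]
```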

In [None]:
# Only focus on invasive ventilation
invasive_vent_df = procedure_events_df[procedure_events_df['itemid'] == 225792]
invasive_vent_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,caregiver_id,starttime,endtime,storetime,itemid,value,valueuom,...,orderid,linkorderid,ordercategoryname,ordercategorydescription,patientweight,isopenbag,continueinnextdept,statusdescription,originalamount,originalrate
20,10001884,26184834,37510196,42150.0,2131-01-11 04:40:00,2131-01-12 17:40:00,2131-01-12 17:49:00,225792,2220.0,min,...,3830120,3830120,Ventilation,ContinuousProcess,65.0,1,0,FinishedRunning,2220.0,1
27,10001884,26184834,37510196,91332.0,2131-01-13 04:00:00,2131-01-19 17:45:00,2131-01-19 18:44:00,225792,9465.0,min,...,4465887,4465887,Ventilation,ContinuousProcess,65.0,1,0,FinishedRunning,9465.0,1
28,10001884,26184834,37510196,91332.0,2131-01-15 04:07:00,2131-01-19 17:43:00,2131-01-19 18:44:00,225792,6576.0,min,...,1861924,1861924,Ventilation,ContinuousProcess,65.0,1,0,FinishedRunning,6576.0,1
33,10002013,23581541,39060235,27479.0,2160-05-18 14:19:00,2160-05-18 18:01:00,2160-05-18 18:39:00,225792,222.0,min,...,4169380,4169380,Ventilation,ContinuousProcess,96.0,1,0,FinishedRunning,222.0,1
80,10002428,23473524,35479615,27479.0,2156-05-11 16:05:00,2156-05-20 10:45:00,2156-05-20 10:51:00,225792,12640.0,min,...,3976442,3976442,Ventilation,ContinuousProcess,48.4,1,0,FinishedRunning,12640.0,1


In [None]:
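# Convert dates to datetime (work on a copy of the filtered slice to avoid SettingWithCopyWarning)
invasive_vent_df = invasive_vent_df.copy()
invasive_vent_df['starttime'] = pd.to_datetime(invasive_vent_df['starttime'])
invasive_vent_df['endtime'] = pd.to_datetime(invasive_vent_df['endtime'])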

In [None]:
invasive_vent_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 30710 entries, 20 to 696079
Data columns (total 22 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   subject_id                30710 non-null  int64         
 1   hadm_id                   30710 non-null  int64         
 2   stay_id                   30710 non-null  int64         
 3   caregiver_id              27749 non-null  float64       
 4   starttime                 30710 non-null  datetime64[ns]
 5   endtime                   30710 non-null  datetime64[ns]
 6   storetime                 30710 non-null  object        
 7   itemid                    30710 non-null  int64         
 8   value                     30710 non-null  float64       
 9   valueuom                  30710 non-null  object        
 10  location                  0 non-null      object        
 11  locationcategory          0 non-null      object        
 12  orderid              

In [None]:
# Create and set extubation_failure column to 0
patients_df['extubation_failure'] = 0
patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,0
1,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
2,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0
3,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,0
4,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
patients_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5970 entries, 0 to 6047
Data columns (total 14 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   subject_id                     5970 non-null   int64         
 1   hadm_id                        5970 non-null   int64         
 2   stay_id                        5970 non-null   int64         
 3   ventilation_starttime          5970 non-null   datetime64[ns]
 4   ventilation_endtime            5970 non-null   datetime64[ns]
 5   ventilation_itemid             5970 non-null   int64         
 6   ventilation_ordercategoryname  5970 non-null   object        
 7   extubation_starttime           5970 non-null   datetime64[ns]
 8   extubation_endtime             5970 non-null   datetime64[ns]
 9   extubation_itemid              5970 non-null   int64         
 10  extubation_ordercategoryname   5970 non-null   object        
 11  ventilation_duration  

In [None]:
from datetime import timedelta

In [None]:
# Check for re-intubation within 48 hours of extubation
patients_df['extubation_failure'] = 0

for index, row in patients_df.iterrows():
  subject_id = row['subject_id']
  extubation_endtime = row['extubation_endtime']
  time_window_end = extubation_endtime + timedelta(hours=48)

  reintubations = invasive_vent_df[
      (invasive_vent_df['subject_id'] == subject_id) &
      (invasive_vent_df['starttime'] > extubation_endtime) &
      (invasive_vent_df['starttime'] <= time_window_end)
  ]

  if not reintubations.empty:
    patients_df.at[index, 'extubation_failure'] = 1

patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
1,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
2,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0
3,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,0
4,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
extubation_failure_count = patients_df['extubation_failure'].sum()

print(f"Number of extubation failures so far: {extubation_failure_count}")

Number of extubation failures so far: 595


**Mortality within 48 hours**

Mortality can be determined from the deathtime column of the admissions table.

In [None]:
# Load the admissions table
admissions = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/mimic-iv-2.2-raw-data/hosp/admissions.csv'
admissions_df = pd.read_csv(admissions)
admissions_df.head()

Unnamed: 0,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admit_provider_id,admission_location,discharge_location,insurance,language,marital_status,race,edregtime,edouttime,hospital_expire_flag
0,10000032,22595853,2180-05-06 22:23:00,2180-05-07 17:15:00,,URGENT,P874LG,TRANSFER FROM HOSPITAL,HOME,Other,ENGLISH,WIDOWED,WHITE,2180-05-06 19:17:00,2180-05-06 23:30:00,0
1,10000032,22841357,2180-06-26 18:27:00,2180-06-27 18:49:00,,EW EMER.,P09Q6Y,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-06-26 15:54:00,2180-06-26 21:31:00,0
2,10000032,25742920,2180-08-05 23:44:00,2180-08-07 17:50:00,,EW EMER.,P60CC5,EMERGENCY ROOM,HOSPICE,Medicaid,ENGLISH,WIDOWED,WHITE,2180-08-05 20:58:00,2180-08-06 01:44:00,0
3,10000032,29079034,2180-07-23 12:35:00,2180-07-25 17:55:00,,EW EMER.,P30KEH,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-07-23 05:54:00,2180-07-23 14:00:00,0
4,10000068,25022803,2160-03-03 23:16:00,2160-03-04 06:26:00,,EU OBSERVATION,P51VDL,EMERGENCY ROOM,,Other,ENGLISH,SINGLE,WHITE,2160-03-03 21:55:00,2160-03-04 06:26:00,0


In [None]:
admissions_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 431231 entries, 0 to 431230
Data columns (total 16 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   subject_id            431231 non-null  int64 
 1   hadm_id               431231 non-null  int64 
 2   admittime             431231 non-null  object
 3   dischtime             431231 non-null  object
 4   deathtime             8598 non-null    object
 5   admission_type        431231 non-null  object
 6   admit_provider_id     431227 non-null  object
 7   admission_location    431231 non-null  object
 8   discharge_location    312076 non-null  object
 9   insurance             431231 non-null  object
 10  language              431231 non-null  object
 11  marital_status        421998 non-null  object
 12  race                  431231 non-null  object
 13  edregtime             299282 non-null  object
 14  edouttime             299282 non-null  object
 15  hospital_expire_f

In [None]:
# Merge patient list with admissions to get death time
patients_merged_df = patients_df.merge(admissions_df[['subject_id', 'deathtime']], on='subject_id', how='left')
patients_merged_df['deathtime'] = pd.to_datetime(patients_merged_df['deathtime'])
patients_merged_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure,deathtime
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
1,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
2,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
3,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
4,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT


In [None]:
# Drop duplicates
patients_merged_df = patients_merged_df.drop_duplicates(subset=['subject_id'])
patients_merged_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure,deathtime
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0,NaT
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0,NaT
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,0,2186-11-17 18:30:00
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0,NaT


In [None]:
patients_merged_df.shape[0]

5970
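Because the admissions table has one row per hospital admission, merging on subject_id alone yields one row per admission, and `drop_duplicates` then keeps whichever admission happens to come first. A hedged sketch of an alternative is to merge on both subject_id and hadm_id, so that the deathtime comes from the admission that contains the ventilation stay. The toy data below is illustrative only, and whether this changes the counts in this notebook has not been checked here.

```python
import pandas as pd

# Toy data: subject 1 has two admissions and died during the second (hadm_id 11)
patients = pd.DataFrame({'subject_id': [1, 2], 'hadm_id': [11, 22]})
admissions = pd.DataFrame({
    'subject_id': [1, 1, 2],
    'hadm_id': [10, 11, 22],
    'deathtime': [None, '2186-11-17 18:30:00', None],
})

# Merge on subject_id only, then drop duplicates: keeps the first admission's deathtime
naive = patients.merge(
    admissions[['subject_id', 'deathtime']], on='subject_id', how='left'
).drop_duplicates(subset=['subject_id'])
print(naive['deathtime'].notna().tolist())   # [False, False]

# Merge on both keys: exactly one admission row per patient row
merged = patients.merge(
    admissions[['subject_id', 'hadm_id', 'deathtime']],
    on=['subject_id', 'hadm_id'],
    how='left',
    validate='one_to_one',  # raises if either side has duplicate keys
)
merged['deathtime'] = pd.to_datetime(merged['deathtime'])
print(merged['deathtime'].notna().tolist())  # [True, False]
```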

In [None]:
# Check for a valid deathtime within 48 hours of extubation_endtime
for index, row in patients_merged_df.iterrows():
  if row['extubation_failure'] == 0: # If not already classed as failure
    deathtime = row['deathtime']
    extubation_endtime = row['extubation_endtime']
    time_window_end = extubation_endtime + timedelta(hours=48)

    if pd.notna(deathtime) and extubation_endtime <= deathtime <= time_window_end:
      patients_merged_df.at[index, 'extubation_failure'] = 1

patients_merged_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure,deathtime
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1,NaT
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0,NaT
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0,NaT
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1,2186-11-17 18:30:00
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0,NaT


In [None]:
# Count the number of 1s in the extubation_failure column
extubation_failure_count = patients_merged_df['extubation_failure'].sum()
print(f"Number of extubation failures considering re-intubation and mortality within 48 hours: {extubation_failure_count}")

Number of extubation failures considering re-intubation and mortality within 48 hours: 1082


In [None]:
patients_df = patients_merged_df

**Placed on ventilatory support (Non-invasive ventilation) within 6 hours post extubation**

This criterion was agreed with Dr Murali.

NIV has itemid: 225794. Any patient that has a starttime with this itemid within 6 hours of extubation_endtime will be classed as failure.

This will then be enriched with the itemids of specific NIV events, as indicated by Dr Murali, that can be found in the chartevents table. These are listed below:

- 227287: O2 Flow (additional cannula)
- 227577: BiPap Mode
- 227578: BiPap Mask
- 227579: BiPap EPAP
- 227580: BiPap IPAP
- 227581: BiPap bpm (S/T -Back up)
- 227582: BiPap O2 Flow
- 227583: Autoset/CPAP

In [None]:
# Only focus on non-invasive ventilation events
non_invasive_vent_df = procedure_events_df[procedure_events_df['itemid'] == 225794]
non_invasive_vent_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,caregiver_id,starttime,endtime,storetime,itemid,value,valueuom,...,orderid,linkorderid,ordercategoryname,ordercategorydescription,patientweight,isopenbag,continueinnextdept,statusdescription,originalamount,originalrate
3,10000980,26913865,39765666,,2189-06-27 09:01:00,2189-06-27 20:38:00,2189-06-27 20:38:29.047,225794,697.0,min,...,5989583,5989583,Ventilation,ContinuousProcess,76.2,1,0,FinishedRunning,697.0,1
14,10001884,26184834,37510196,31763.0,2131-01-12 21:30:00,2131-01-13 04:00:00,2131-01-15 04:07:00,225794,390.0,min,...,4809276,4809276,Ventilation,ContinuousProcess,65.0,1,0,FinishedRunning,390.0,1
45,10002155,20345487,32358465,47007.0,2131-03-10 00:15:00,2131-03-10 07:59:00,2131-03-10 16:26:00,225794,464.0,min,...,1744937,1744937,Ventilation,ContinuousProcess,21.1,1,0,FinishedRunning,464.0,1
71,10002428,20321825,34807493,99293.0,2156-04-30 22:54:00,2156-05-02 05:03:00,2156-05-02 09:42:00,225794,1809.0,min,...,8673415,8673415,Ventilation,ContinuousProcess,55.0,1,0,FinishedRunning,1809.0,1
114,10002495,24982426,36753294,6579.0,2141-05-23 20:15:00,2141-05-24 01:36:00,2141-05-24 06:47:00,225794,321.0,min,...,183864,183864,Ventilation,ContinuousProcess,64.1,1,0,FinishedRunning,321.0,1


In [None]:
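# Convert dates to datetime (work on a copy of the filtered slice to avoid SettingWithCopyWarning)
non_invasive_vent_df = non_invasive_vent_df.copy()
non_invasive_vent_df['starttime'] = pd.to_datetime(non_invasive_vent_df['starttime'])
non_invasive_vent_df['endtime'] = pd.to_datetime(non_invasive_vent_df['endtime'])
non_invasive_vent_df.info()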

<class 'pandas.core.frame.DataFrame'>
Index: 3011 entries, 3 to 696076
Data columns (total 22 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   subject_id                3011 non-null   int64         
 1   hadm_id                   3011 non-null   int64         
 2   stay_id                   3011 non-null   int64         
 3   caregiver_id              2556 non-null   float64       
 4   starttime                 3011 non-null   datetime64[ns]
 5   endtime                   3011 non-null   datetime64[ns]
 6   storetime                 3011 non-null   object        
 7   itemid                    3011 non-null   int64         
 8   value                     3011 non-null   float64       
 9   valueuom                  3011 non-null   object        
 10  location                  0 non-null      object        
 11  locationcategory          0 non-null      object        
 12  orderid                



In [None]:
patients_df.drop(columns=['deathtime'], inplace=True)
patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
patients_df.shape[0]

5970

In [None]:
# Check for NIV within 6 hours of extubation endtime
for index, row in patients_df.iterrows():
  subject_id = row['subject_id']
  extubation_endtime = row['extubation_endtime']
  time_window_end = extubation_endtime + timedelta(hours=6)

  niv_events = non_invasive_vent_df[
      (non_invasive_vent_df['subject_id'] == subject_id) &
      (non_invasive_vent_df['starttime'] > extubation_endtime) &
      (non_invasive_vent_df['starttime'] <= time_window_end)
  ]

  if not niv_events.empty:
    patients_df.at[index, 'extubation_failure'] = 1

patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
extubation_failure_count = patients_df['extubation_failure'].sum()
print(f"Number of extubation failures including NIV: {extubation_failure_count}")

Number of extubation failures including NIV: 1147


In [None]:
# Save progress in parquet file
patients_df.to_parquet('/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/mimic_data_analysis/datasets/annotated_set/annotation_v01.parquet')

**Enrichment of the annotation using supplementary respiratory support**

The annotation is enriched using the following respiratory support items from the chartevents table:

- 227287: O2 Flow (additional cannula)
- 227577: BiPap Mode
- 227578: BiPap Mask
- 227579: BiPap EPAP
- 227580: BiPap IPAP
- 227581: BiPap bpm (S/T -Back up)
- 227582: BiPap O2 Flow
- 227583: Autoset/CPAP
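The chartevents file is large, which is why dask is used in the cells that follow. As an alternative sketch using plain pandas, the file can be read in chunks and filtered by itemid as it streams. The helper name `filter_chartevents` and the toy CSV are hypothetical; 220045 stands in for an unrelated item.

```python
import io

import pandas as pd

NIV_ITEMIDS = {227287, 227577, 227578, 227579, 227580, 227581, 227582, 227583}


def filter_chartevents(csv_source, itemids, chunksize=1_000_000):
    """Stream a chartevents CSV in chunks, keeping only rows with the given itemids."""
    kept = []
    reader = pd.read_csv(
        csv_source,
        chunksize=chunksize,
        dtype={'value': 'object'},   # `value` mixes numbers and text
        parse_dates=['charttime'],
    )
    for chunk in reader:
        kept.append(chunk[chunk['itemid'].isin(itemids)])
    return pd.concat(kept, ignore_index=True)


# Toy CSV for illustration only
toy_csv = io.StringIO(
    "subject_id,hadm_id,stay_id,charttime,itemid,value\n"
    "1,11,111,2131-01-12 18:00:00,227583,1\n"
    "1,11,111,2131-01-12 18:05:00,220045,80\n"
    "2,22,222,2156-04-22 19:00:00,227577,S/T\n"
)
niv_rows = filter_chartevents(toy_csv, NIV_ITEMIDS, chunksize=2)
print(niv_rows['itemid'].tolist())  # [227583, 227577]
```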

In [None]:
%pip install dask

In [None]:
import dask.dataframe as dd
import pandas as pd

In [None]:
# Read in chartevents datafile into a dask dataframe
chart_file = '/Users/akram/Documents/Final Project/data_analysis/chartevents.csv'

dtypes = {
    'itemid': 'int64',
    'caregiver_id': 'float64',
    'value': 'object',
    'valuenum': 'float64',
    'warning': 'float64',
}

chartevents_df = dd.read_csv(chart_file, dtype=dtypes)

In [None]:
# Read in patient data
patients_file_path = '/Users/akram/Documents/Final Project/data_analysis/annotation_v01.parquet'
patients_df = pd.read_parquet(patients_file_path)
patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,0
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
patients_df.shape[0]

5970

In [None]:
# Define NIV related events
niv_itemid = {
    227287,  # O2 Flow (additional cannula)
    227577,  # BiPap Mode
    227578,  # BiPap Mask
    227579,  # BiPap EPAP
    227580,  # BiPap IPAP
    227581,  # BiPap bpm (S/T -Back up)
    227582,  # BiPap O2 Flow
    227583   # Autoset/CPAP
}

In [None]:
niv_chartevents_df = chartevents_df[chartevents_df['itemid'].isin(niv_itemid)]
niv_chartevents_df['charttime'] = dd.to_datetime(niv_chartevents_df['charttime'])
niv_chartevents_df.info()

<class 'dask.dataframe.core.DataFrame'>
dtypes: datetime64[ns](1), float64(3), int64(4), string(3)

In [None]:
# Merge patient list with NIV events
patients_merged_df = niv_chartevents_df.merge(patients_df, on=['subject_id', 'hadm_id', 'stay_id'], how='left')
patients_merged_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,caregiver_id,charttime,storetime,itemid,value,valuenum,valueuom,...,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10000980,26913865,39765666,26402.0,2189-06-27 09:00:00,2189-06-27 09:02:00,227287,40,40.0,L/min,...,NaT,,,NaT,NaT,,,,,
1,10002155,23822395,33685454,8388.0,2129-08-07 08:00:00,2129-08-07 11:40:00,227287,6,6.0,L/min,...,NaT,,,NaT,NaT,,,,,
2,10002155,23822395,33685454,8388.0,2129-08-07 11:00:00,2129-08-07 11:40:00,227287,6,6.0,L/min,...,NaT,,,NaT,NaT,,,,,
3,10002155,23822395,33685454,8388.0,2129-08-07 12:00:00,2129-08-07 16:08:00,227287,6,6.0,L/min,...,NaT,,,NaT,NaT,,,,,
4,10002155,23822395,33685454,8388.0,2129-08-07 16:00:00,2129-08-07 16:08:00,227287,6,6.0,L/min,...,NaT,,,NaT,NaT,,,,,


In [None]:
# Check if charttime is within 6 hours after extubation_endtime
after_extubation = patients_merged_df['charttime'] > patients_merged_df['extubation_endtime']
before_cutoff = patients_merged_df['charttime'] <= patients_merged_df['extubation_endtime'] + pd.Timedelta(hours=6)
patients_merged_df['within_6_hours'] = after_extubation & before_cutoff


In [None]:
# Compute the result to keep only the rows that meet the condition
result = patients_merged_df[patients_merged_df['within_6_hours']].compute()

In [None]:
# Get the unique patient IDs that meet the criteria and currently have extubation_failure == 0
unique_patient_ids = result['subject_id'].unique()
patients_to_update = patients_df[(patients_df['subject_id'].isin(unique_patient_ids)) & (patients_df['extubation_failure'] == 0)]

In [None]:
# Update extubation_failure column in the patients dataframe
patients_df.loc[patients_to_update.index, 'extubation_failure'] = 1

In [None]:
patients_df.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,1
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [None]:
extubation_failure_count = patients_df['extubation_failure'].sum()
print(f"Number of extubation failures including full NIV: {extubation_failure_count}")

Number of extubation failures including full NIV: 1911
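The running totals above (595, 1082, 1147, 1911) are cumulative, so the contribution of each criterion is not directly visible in the final column. One possible design, sketched here on toy data with hypothetical column names, is to keep one boolean flag per criterion and derive the label from them.

```python
import pandas as pd

# Toy per-criterion flags for four patients (illustrative only)
flags = pd.DataFrame({
    'subject_id': [1, 2, 3, 4],
    'reintubated_48h': [True, False, False, False],
    'died_48h': [False, True, False, False],
    'niv_6h': [True, False, True, False],
})
criteria = ['reintubated_48h', 'died_48h', 'niv_6h']

# A patient is a failure if any criterion is met
flags['extubation_failure'] = flags[criteria].any(axis=1).astype(int)

per_criterion = {name: int(count) for name, count in flags[criteria].sum().items()}
print(per_criterion)                            # {'reintubated_48h': 1, 'died_48h': 1, 'niv_6h': 2}
print(int(flags['extubation_failure'].sum()))   # 3
```

Keeping the flags also makes it possible to report how many patients meet more than one criterion.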


In [None]:
patients_df.shape[0]

5970

In [None]:
# Save annotated set in parquet file
patients_df.to_parquet('/Users/akram/Documents/Final Project/data_analysis/annotation_v03.parquet')