Machine Learning Applications for Health (COMP90089_2022_SM2) | Assignment 1

Name: Arya Araban -- Student id: 1439683

# Q2 - List the criteria used by clinicians to define Hypotension:

###Absolute Blood Pressure Thresholds

•	Systolic blood pressure <90 mmHg is considered hypotensive.

•	Mean arterial pressure <65 mmHg is considered hypotensive.

•	These absolute thresholds are commonly used definitions for hypotension, though optimal blood pressure targets may depend on the individual patient.

###Relative Blood Pressure Changes
•	A drop in systolic blood pressure >40 mmHg from the patient's baseline is considered relative hypotension.

•	Relative hypotension signifies a significant change for that individual patient, even if absolute thresholds are not crossed.
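The absolute and relative thresholds above can be written as a small check. This is only an illustrative sketch: the function name `classify_hypotension` and its parameters are hypothetical, and the cut-offs are the ones listed in this section.

```python
def classify_hypotension(sbp, mean_arterial, baseline_sbp=None):
    """Flag absolute and relative hypotension for one reading (values in mmHg)."""
    # Absolute: systolic < 90 mmHg or mean arterial pressure < 65 mmHg
    absolute = sbp < 90 or mean_arterial < 65
    # Relative: systolic drop of more than 40 mmHg from the patient's baseline
    relative = baseline_sbp is not None and (baseline_sbp - sbp) > 40
    return {"absolute": absolute, "relative": relative}


print(classify_hypotension(85, 60))
print(classify_hypotension(100, 70, baseline_sbp=150))
```

The second call shows a patient who is above both absolute thresholds but has dropped 50 mmHg from baseline, so only the relative flag is set.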

###Orthostatic Changes
•	Orthostatic hypotension is defined as:

&nbsp;&nbsp;&nbsp;•	A fall in systolic blood pressure >20 mmHg after standing from a supine position.

&nbsp;&nbsp;&nbsp;•	A fall in diastolic blood pressure >10 mmHg after standing from a supine position.

•	Positional changes provide evidence of impaired autoregulation and volume status.
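As a worked example, the orthostatic criteria can be checked from a supine and a standing reading. The sketch below assumes that meeting either the systolic or the diastolic fall is sufficient; the function name and arguments are hypothetical.

```python
def is_orthostatic_hypotension(supine_sbp, supine_dbp, standing_sbp, standing_dbp):
    """Return True if the fall on standing exceeds either threshold listed above (mmHg)."""
    systolic_fall = supine_sbp - standing_sbp
    diastolic_fall = supine_dbp - standing_dbp
    # Assumption: either fall alone is enough to meet the definition
    return systolic_fall > 20 or diastolic_fall > 10


print(is_orthostatic_hypotension(130, 80, 105, 78))  # systolic falls by 25 mmHg
print(is_orthostatic_hypotension(120, 80, 112, 76))  # falls of 8 and 4 mmHg
```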

###Clinical Evidence of Hypoperfusion
•	Hypotension may be present before blood pressure reaches severely low thresholds. Clinicians rely on signs of tissue hypoperfusion:

&nbsp;&nbsp;&nbsp;•	Oliguria - urine output <0.5 mL/kg/hour

&nbsp;&nbsp;&nbsp;•	Altered mental status - confusion, delirium, obtundation, coma.

&nbsp;&nbsp;&nbsp;•	Cool, clammy, mottled skin.

&nbsp;&nbsp;&nbsp;•	Metabolic acidosis - typically a lactic acidosis in the setting of hypoperfusion.

&nbsp;&nbsp;&nbsp;•	Hyperlactatemia - serum lactate >2 mmol/L, even without acidemia.

•	The presence of these signs, even with a normal blood pressure, should prompt suspicion for occult or early shock.
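Two of the hypoperfusion signs above are numeric and can be computed directly. The following is a rough sketch with hypothetical names; it only covers urine output and lactate, since the other signs are bedside findings.

```python
def hypoperfusion_flags(urine_ml, weight_kg, hours, lactate_mmol_l):
    """Flag oliguria and elevated lactate using the cut-offs listed above."""
    urine_rate = urine_ml / (weight_kg * hours)  # mL/kg/hour
    return {
        "oliguria": urine_rate < 0.5,
        "elevated_lactate": lactate_mmol_l > 2.0,
    }


# 120 mL over 4 hours in an 80 kg patient is 0.375 mL/kg/hour
print(hypoperfusion_flags(urine_ml=120, weight_kg=80, hours=4, lactate_mmol_l=1.5))
```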

###Compensated Shock
•	Normal blood pressure may be maintained in early shock because of compensatory mechanisms like tachycardia and vasoconstriction. The presence of these findings along with risk factors for shock should heighten clinical suspicion.




# Q3 - List the ICD-10 codes used to document Hypotension:

The goal is to identify the most relevant ICD-10 codes that can be used to accurately document hypotension in a clinical setting. The World Health Organisation's code browser was used to capture these codes and their details.

###I95.0 Idiopathic hypotension

This code captures hypotension with no identified secondary cause. It can be applied when hypotension is chronic and the underlying mechanism is unknown. Using this code indicates that extensive workup did not reveal an explanation for the low blood pressure.

###I95.1 Orthostatic hypotension
•Hypotension, postural

•Excl.: neurogenic orthostatic hypotension [Shy-Drager] (G23.8)

This code specifically indicates that the hypotension occurs or worsens when the patient moves from a supine to upright position. Orthostatic hypotension is a common and important subtype, as it provides insight into impaired autoregulation. This is the most accurate code for postural hypotension.

###I95.2 Hypotension due to drugs
• Use additional external cause code (Chapter XX), if desired, to identify drug.

This code identifies drug-induced hypotension, which is a frequent iatrogenic cause. An additional external cause code can specify the culprit medication. Proper documentation of drug-induced hypotension has implications for clinical management and prevention.

###I95.8 Other hypotension
• Chronic hypotension

This code covers hypotension from other specified secondary causes not captured by the previous codes. The code description indicates it includes chronic hypotension, providing a means to document persistent low blood pressure.


###I95.9 Hypotension, unspecified

This code can be used when the clinician determines the patient has hypotension but the underlying etiology remains unclear. This leaves the cause open for further investigation.


###G97.2 Intracranial hypotension following ventricular shunting

This code pinpoints a specific scenario in which cerebrospinal fluid leakage after ventricular shunting leads to intracranial hypotension. Capturing this iatrogenic complication provides data on the outcomes of these common neurosurgical procedures.


# Q4 - Use CSIRO’s Shrimp browser to navigate SNOMED-CT and ...:




A review of SNOMED-CT concepts in the Shrimp browser shows that the ICD-10 codes for hypotension do not all have a perfect 1:1 match. However, there are related SNOMED-CT concepts that capture similar clinical meanings.

For example, the ICD-10 code "I95.2 - Hypotension due to drugs" corresponds to the SNOMED-CT concept "Drug-induced hypotension (disorder)" with code 234171009. While not a direct match, this captures the same clinical scenario of medication-induced low blood pressure.
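In code, such a mapping is often held as a lookup table that returns nothing when no mapping has been curated. The sketch below is illustrative only: the table contains just the single pair mentioned above, and the names `ICD10_TO_SNOMED` and `map_icd10_to_snomed` are hypothetical.

```python
# Curated ICD-10 -> SNOMED-CT pairs; only the example from the text is included
ICD10_TO_SNOMED = {
    "I95.2": ("234171009", "Drug-induced hypotension (disorder)"),
}


def map_icd10_to_snomed(icd10_code):
    """Return (SNOMED-CT id, term) for a code, or None if no mapping is curated."""
    return ICD10_TO_SNOMED.get(icd10_code)


print(map_icd10_to_snomed("I95.2"))
print(map_icd10_to_snomed("I95.9"))  # no curated mapping in this toy table
```

Returning `None` for unmapped codes, rather than guessing a nearby concept, keeps unmapped records visible so they can be reviewed instead of silently merged.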

In general, mapping between terminologies like ICD-10 and SNOMED-CT when combining data sources has tradeoffs:

###Pros:

•  Mapping between different terminologies can enable integrating health data from disparate sources, allowing a more comprehensive analysis when the data is combined.

• Unified coding and classification of data from various sources establishes a consistent structure and semantics. This facilitates more effective large-scale analytics on aggregated health data.

• Gaps and inconsistencies revealed during mapping can guide improvements to the terminologies and their associated coding principles.


###Cons:

• Inaccurate or imprecise mappings will lead to faulty combining of distinct data elements, undermining the validity of analysis.

• Verifying high-quality mappings often requires significant manual review by clinical subject matter experts, which demands extensive labor.

• Mapped terminologies require ongoing curation over time as codes and concepts evolve to prevent terminological drift and maintain currency.

<br>

To enable effective analytics, mappings should be created thoughtfully, validated carefully, and documented thoroughly. Context from clinical experts is key to avoiding pitfalls from terminology mismatches. Ongoing maintenance is needed as codes evolve.

# Q5

##part 1:

Find patients whose hypotension lasted > 60 minutes and was corrected after the use of intravenous vasoactive drugs.

###Setup and Import libraries

In [1]:
# Import libraries
from datetime import timedelta
import os

import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from IPython.display import display, HTML, Image
%matplotlib inline

plt.style.use('ggplot')
plt.rcParams.update({'font.size': 20})

# Access data using Google BigQuery.
from google.colab import auth
from google.cloud import bigquery

In [2]:
# authenticate
auth.authenticate_user()

In [3]:
# Set up environment variables
project_id = 't-collective-395900'
if project_id == 'CHANGE-ME':
  raise ValueError('You must change project_id to your GCP project.')
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id

# Read data from BigQuery into pandas dataframes.
def run_query(query, project_id=project_id):
  return pd.io.gbq.read_gbq(
      query,
      project_id=project_id,
      dialect='standard')

# set the dataset
# if you want to use the demo, change this to mimic_demo
dataset = 'mimiciv'


###Find item ids of relevant data for Hypotension.

We will use Absolute Blood Pressure Thresholds to identify Hypotension. In order to do that, we first need to find the associated item ids for Arterial Blood Pressure systolic and Arterial Blood Pressure mean.

In [4]:
# Find item ids of chart items whose label mentions blood pressure

itemids = run_query("""
SELECT itemid, label
FROM `physionet-data.mimiciv_icu.d_items`
WHERE LOWER(label) LIKE '%blood pressure%'
""")

itemids


| | itemid | label |
|---|---|---|
| 0 | 227539 | ART Blood Pressure Alarm Source |
| 1 | 220056 | Arterial Blood Pressure Alarm - Low |
| 2 | 220058 | Arterial Blood Pressure Alarm - High |
| 3 | 223751 | Non-Invasive Blood Pressure Alarm - High |
| 4 | 223752 | Non-Invasive Blood Pressure Alarm - Low |
| 5 | 227537 | ART Blood Pressure Alarm - High |
| 6 | 227538 | ART Blood Pressure Alarm - Low |
| 7 | 220050 | Arterial Blood Pressure systolic |
| 8 | 220051 | Arterial Blood Pressure diastolic |
| 9 | 220052 | Arterial Blood Pressure mean |


Of the item ids found, only two are needed:

• 220050: Arterial Blood Pressure systolic

• 220052: Arterial Blood Pressure mean

###Implementation & rationale:


As stated previously, the goal is to identify patients who experienced hypotensive episodes lasting greater than 60 minutes, which were subsequently corrected after administration of intravenous vasopressors.

<br>

First, two SQL queries are performed to extract the relevant data:

• Blood pressure measurements are queried from the chartevents table, filtering for systolic and mean arterial pressure itemids. This provides the timeline of blood pressure readings for each patient.

• Vasoactive medication administrations are queried from the emar table, filtering on a predefined list of drug names. This provides the scheduled administration time (scheduletime) of each matching record, which is used as the time the drug was given.

<br>

Next, helper functions are defined:

• get_episodes: Takes a single patient's BP data, checks for hypotensive readings (<90 systolic or <65 mean), tracks start/end times, and returns a list of all continuous hypotensive episodes lasting over 60 minutes.

• vasopressor_given: Takes a hypotensive episode timeframe and checks the vasopressor data for any administrations within that timeframe. If one is found, it returns the timestamp of the first administration; otherwise it returns False.

• bp_corrected: Takes the vasopressor administration time and checks whether the BP reading at the charttime immediately following the administration has recovered to normal levels (we assume that the vasopressor takes effect immediately or shortly after its administration).
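To make the episode logic concrete, here is a small sketch on toy data. It is a variant for illustration, not the code used in this notebook: it first reduces the systolic and mean readings to one hypotension flag per charttime, then looks for runs longer than 60 minutes. The function name `long_hypotensive_episodes` and the toy values are assumptions.

```python
import pandas as pd

SBP_ITEMID, MAP_ITEMID = 220050, 220052


def long_hypotensive_episodes(bp_group, min_minutes=60):
    """Return (start, end) pairs for hypotensive runs longer than min_minutes."""
    low = (
        ((bp_group["itemid"] == SBP_ITEMID) & (bp_group["value"] < 90))
        | ((bp_group["itemid"] == MAP_ITEMID) & (bp_group["value"] < 65))
    )
    # A charttime counts as hypotensive if any reading charted at that time is low
    flags = low.groupby(bp_group["charttime"]).any().sort_index()

    episodes, start = [], None
    for time, is_low in flags.items():
        if is_low and start is None:
            start = time  # run begins
        elif not is_low and start is not None:
            if time - start > pd.Timedelta(minutes=min_minutes):
                episodes.append((start, time))
            start = None  # run ends at the first normal charttime
    return episodes


toy = pd.DataFrame({
    "charttime": pd.to_datetime([
        "2150-01-01 10:00", "2150-01-01 10:00",
        "2150-01-01 10:30", "2150-01-01 10:30",
        "2150-01-01 11:15", "2150-01-01 11:15",
    ]),
    "itemid": [220050, 220052, 220050, 220052, 220050, 220052],
    "value": [85, 70, 88, 62, 95, 70],
})
episodes = long_hypotensive_episodes(toy)
print(episodes)
```

In the toy data the patient is hypotensive at 10:00 and 10:30 and normal at 11:15, giving one 75-minute episode.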

<br>

The main logic loops through each patient's BP data, gets their prolonged hypotensive episodes using get_episodes, and checks if a vasopressor was given during the episode timeframe. If so, it checks if BP was corrected afterwards. If both criteria are met, the patient ID is added to the list of those meeting the criteria.

Finally, the full BP dataset is filtered to only patients whose IDs are in the criteria list, giving the final cohort of patients with prolonged hypotension corrected after vasopressors.
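The correction check depends on finding the reading that immediately follows the vasopressor time. A minimal sketch of that lookup, assuming the readings may arrive in arbitrary order and therefore sorting them first (the helper name `first_reading_after` and the toy values are hypothetical):

```python
import pandas as pd


def first_reading_after(bp_subj, when):
    """Return the earliest row charted strictly after `when`, or None if there is none."""
    later = bp_subj[bp_subj["charttime"] > when].sort_values("charttime")
    return None if later.empty else later.iloc[0]


# Toy systolic readings, deliberately out of time order
toy = pd.DataFrame({
    "charttime": pd.to_datetime(
        ["2150-01-01 12:00", "2150-01-01 10:00", "2150-01-01 11:00"]
    ),
    "itemid": [220050, 220050, 220050],
    "value": [110, 82, 95],
})
nxt = first_reading_after(toy, pd.Timestamp("2150-01-01 10:30"))
print(nxt["charttime"], nxt["value"])
```

Sorting inside the helper means the result does not depend on the row order returned by the query.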

<br>

Note: The following code should take around 12 minutes to execute in Colab

In [5]:
# Query for systolic & mean BP

bp_query = f"""
  SELECT subject_id, charttime, safe_cast(value AS INT) as value, itemid
  FROM `physionet-data.mimiciv_icu.chartevents`
  WHERE itemid IN (220050, 220052)
"""
# Load BP data
bp = run_query(bp_query)


vaso_query = (f"""
SELECT subject_id, medication, scheduletime
FROM `physionet-data.mimiciv_hosp.emar`
WHERE LOWER(medication) LIKE '%' || ('dopamine') || '%'
    OR LOWER(medication) LIKE '%' || ('epinephrine') || '%'
    OR LOWER(medication) LIKE '%' || ('norepinephrine') || '%'
    OR LOWER(medication) LIKE '%' || ('levophed') || '%'
    OR LOWER(medication) LIKE '%' || ('dobutamine') || '%'
    OR LOWER(medication) LIKE '%' || ('milrinone') || '%'
    OR LOWER(medication) LIKE '%' || ('vasopressin') || '%'
    OR LOWER(medication) LIKE '%' || ('nitroglycerin') || '%'
    OR LOWER(medication) LIKE '%' || ('nitroprusside') || '%'
    OR LOWER(medication) LIKE '%' || ('hydralazine') || '%'
    OR LOWER(medication) LIKE '%' || ('labetalol') || '%'
    OR LOWER(medication) LIKE '%' || ('methylene blue') || '%'
    OR LOWER(medication) LIKE '%' || ('terlipressin') || '%'
    OR LOWER(medication) LIKE '%' || ('angiotensin ii') || '%'
""")

# Load vasopressor data
vaso = run_query(vaso_query)

# Get hypotension episodes
def get_episodes(bp_group):
  """Return (start, end) pairs for hypotensive runs longer than 60 minutes.

  bp_group holds one subject's readings, sorted by charttime. A run starts at
  the first reading below its threshold and ends at the next reading (of
  either item) that is at or above its threshold. A run that is still open at
  the last reading is not recorded.
  """

  episodes = []
  start_time = None

  for i, row in bp_group.iterrows():

    if not pd.isna(row['itemid']) and not pd.isna(row['value']):

      # Check if systolic or MAP meets hypotension threshold
      if ((row['itemid'] == 220050 and row['value'] < 90) or
          (row['itemid'] == 220052 and row['value'] < 65)):

        # If new hypotensive reading, start tracking episode.
        if start_time is None:
          start_time = row['charttime']

      else:
        # Normal reading: close any open episode at this charttime
        if start_time is not None:
          end_time = row['charttime']
          # Calculate duration between start and end
          duration = end_time - start_time

          # Check if duration exceeds 60 minutes
          if duration > timedelta(minutes=60):
            episodes.append((start_time, end_time))

          # Reset start time
          start_time = None

  return episodes

# Check if vasopressor given during episode, if so return the time. Else returns False
def vasopressor_given(subject_id, start_time, end_time, vaso):

  vaso_subj = vaso[vaso['subject_id'] == subject_id]

  # Note that scheduletime from the emar table is assumed to be the exact time in which patient took vasopressor.
  vaso_times = vaso_subj[(vaso_subj['scheduletime'] > start_time) &
                         (vaso_subj['scheduletime'] < end_time)]


  if len(vaso_times) > 0:
    return vaso_times.iloc[0]['scheduletime'] # Return first time

  else:
    return False

# Check if BP corrected in next charttime after vasopressor used.
def bp_corrected(subject_id, vaso_time, bp):

  # Readings for this subject charted after the vasopressor time.
  # Rows keep the order of `bp`, so `bp` should be sorted by charttime
  # for the first row to be the next reading.
  bp_subj = bp[(bp['subject_id'] == subject_id) &
                (bp['charttime'] > vaso_time)]

  bp_subj = bp_subj.reset_index(drop=True)

  if len(bp_subj) > 0:
    first = bp_subj.iloc[0]
    normal_sys = (first['itemid'] == 220050) and (first['value'] >= 90)
    normal_map = (first['itemid'] == 220052) and (first['value'] >= 65)

    return bool(normal_sys or normal_map)

  return False




### Main logic

meets_crit = []

# Analyze each subject's BP data
for subject_id, bp_group in bp.groupby('subject_id'):

  # Sort by charttime to make sure comparisons in order
  bp_group = bp_group.sort_values('charttime')

  # Get hypotensive episodes
  episodes = get_episodes(bp_group)

  # Check each episode
  for start_time, end_time in episodes:
    vaso_time = vasopressor_given(subject_id, start_time, end_time, vaso)

    # Check if a vasopressor was given and that blood pressure was corrected after the episode where vasopressor was used.
    if vaso_time and bp_corrected(subject_id, vaso_time, bp):

      meets_crit.append(subject_id) #Current subject meets criteria. Append them to list

meets_crit = list(dict.fromkeys(meets_crit)) #remove duplicates from meets_crit

final = bp[bp['subject_id'].isin(meets_crit)]

In [6]:
print(final) # Final filtered dataframe

print(f"number of unique subject_id = {len(meets_crit)}")

         subject_id           charttime  value  itemid
19         11880433 2115-03-28 00:00:00    153  220050
64         11717909 2131-05-08 04:00:00    145  220050
105        16939306 2174-04-03 11:00:00    174  220050
115        10354450 2163-10-06 02:00:00    147  220050
123        19674244 2196-11-04 19:00:00    147  220050
...             ...                 ...    ...     ...
4535708    12121645 2148-01-20 21:00:00    139  220050
4535817    13515178 2154-02-02 00:00:00    139  220050
4535845    17316181 2151-01-28 03:00:00    139  220050
4535995    13814237 2124-04-14 17:00:00    139  220050
4536004    10835377 2129-02-15 17:00:00    139  220050

[221829 rows x 4 columns]
number of unique subject_id = 440


##part 2:
For each patient that was picked up by the algorithm, include a column that states whether ANY ICD code for hypotension was added to their list of diagnoses at the time they were discharged.



In [7]:
unique_subject_ids_df =  pd.DataFrame(meets_crit, columns=['subject_id']).drop_duplicates(subset='subject_id', inplace=False)

In [8]:
unique_subject_ids_df

| | subject_id |
|---|---|
| 0 | 10005817 |
| 1 | 10023486 |
| 2 | 10039708 |
| 3 | 10055939 |
| 4 | 10064854 |
| ... | ... |
| 435 | 19895003 |
| 436 | 19928591 |
| 437 | 19962126 |
| 438 | 19963068 |


As stated, the goal is to take the DataFrame of subject IDs meeting the hypotension criteria and add a column indicating whether each subject has a hypotension diagnosis code.

To do this:

1) Query the diagnoses table for any ICD codes related to hypotension. This gives us a DataFrame with subjects and their diagnosis codes.

2) Left join the diagnoses DataFrame to our unique subjects DataFrame, matching on the subject_id column. This adds the diagnosis code data alongside each subject.

3) For any subjects without a matching diagnosis, the code column will be NaN. We replace these NaN values with False to indicate no diagnosis.

4) Create a new column indicating True/False if the diagnosis code column contains a real code rather than False. This gives a clear indicator of diagnosis presence.

5) Finally, we drop the intermediary diagnosis code column, leaving just the subject_id and the True/False diagnosis indicator column.
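The steps above can be tried on toy data before running them against BigQuery. The sketch below uses made-up subject ids and codes, and uses `notna()` to build the flag directly, which is a slightly shorter route to the same True/False column.

```python
import pandas as pd

# Hypothetical subjects picked up by the algorithm
subjects = pd.DataFrame({"subject_id": [101, 102, 103]})

# Hypothetical diagnosis rows; subject 102 has two codes, subject 104 is not in the cohort
dx = pd.DataFrame({
    "subject_id": [102, 102, 104],
    "icd_code": ["I951", "I959", "I952"],
})

# Left join keeps every cohort subject; unmatched subjects get NaN in icd_code
merged = subjects.merge(dx, how="left", on="subject_id")
merged = merged.drop_duplicates(subset="subject_id")

# True when at least one hypotension code was found for the subject
merged["hypotension_diagnosis"] = merged["icd_code"].notna()
result = merged.drop(columns=["icd_code"]).reset_index(drop=True)
print(result)
```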

In [9]:
unique_subject_ids_df =  pd.DataFrame(meets_crit, columns=['subject_id'])

# Query diagnoses_icd for hypotension codes
# (matches ICD-10 I95.x only; G97.2 and ICD-9 codes are not matched)
diag_query = """
  SELECT subject_id, icd_code
  FROM `physionet-data.mimiciv_hosp.diagnoses_icd`
  WHERE icd_code LIKE 'I95%'
"""

hypotension_dx = run_query(diag_query)

# Merge diagnoses onto unique subjects DataFrame
unique_subjects_dx = unique_subject_ids_df.merge(
  hypotension_dx, how='left', on='subject_id')

# Keep one row per subject (a subject can have several matching codes)
unique_subjects_dx = unique_subjects_dx.drop_duplicates(subset='subject_id')
# Fill na with False
unique_subjects_dx['icd_code'] = unique_subjects_dx['icd_code'].fillna(False)

# Create boolean column indicating hypotension diagnosis
unique_subjects_dx['hypotension_diagnosis'] = unique_subjects_dx['icd_code'].apply(
  lambda x: isinstance(x, str))

# Drop icd_code column
unique_subjects_dx.drop(columns=['icd_code'], inplace=True)

In [10]:
unique_subjects_dx

| | subject_id | hypotension_diagnosis |
|---|---|---|
| 0 | 10005817 | False |
| 1 | 10023486 | True |
| 2 | 10039708 | True |
| 3 | 10055939 | False |
| 4 | 10064854 | False |
| ... | ... | ... |
| 504 | 19895003 | False |
| 505 | 19928591 | False |
| 506 | 19962126 | False |
| 507 | 19963068 | True |


In [23]:
total_true = unique_subjects_dx['hypotension_diagnosis'].sum()

percentage_true = (unique_subjects_dx['hypotension_diagnosis'].sum() / len(unique_subjects_dx)) * 100





print(f"number of subjects identified = {len(unique_subjects_dx)} ")

print(f"total number of hypotension_diagnosis identified = {total_true}")

print(f"accuracy = {round(percentage_true)}%")

number of subjects identified = 440 
total number of hypotension_diagnosis identified = 153
accuracy = 35%


The results indicate that out of 440 total subjects identified by the algorithm as having prolonged hypotension corrected with vasopressors, 153 (35%) had a hypotension diagnosis code in their discharge record.
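The percentage can be reproduced from the two counts. Note that this figure is the share of algorithm-identified patients who also carry a diagnosis code; it is not an accuracy in the classification sense, because patients not flagged by the algorithm are not evaluated here. The variable names below are illustrative.

```python
identified = 440   # subjects picked up by the algorithm
with_code = 153    # of those, subjects with an I95 diagnosis code

proportion = with_code / identified
print(f"{proportion:.1%} of identified subjects have a hypotension code")
```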

<br>

The algorithm identified a cohort in which about one third of patients also carry a hypotension diagnosis code; further refinement of the thresholds may improve alignment with physician coding behavior.