Generate text based on the provided prompt, which is related to identifying causality between medical conditions. The function replaces a placeholder in the prompt with the actual test case, and the generated text is printed and returned. Finally, the function is called once with a specific test case ("A: heart rate, B: sepsis").

Identifying which medical tests, markers, and parameters commonly encountered in clinical settings are related to sepsis. 

# TODO
* get label and fluid
* merge these two together into DF
* add all admission data too?
* Urea nitrogen need to be just blood...
* match only word

Run `rph.sh` script to SSH into server.

# Identifying causality

In [1]:
import pandas as pd
import numpy as np

#datetime calculations
from datetime import datetime

In [2]:
import requests
import re

# For local streaming, the websockets are hosted without ssl - http://
HOST = 'localhost:5000'
URI = f'http://{HOST}/api/v1/generate'

# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/generate'


def run(prompt,test):
    request = {
        'prompt': prompt.replace("{test}",test),
        'max_new_tokens': 250,
        'temperature': 0.0,
        'top_p': 0.1,
        'length_penalty': 5,
        'early_stopping': True,
        'seed': 0,
    }

    response = requests.post(URI, json=request)
    print(response.json()['results'][0]['text'])

    return response.json()['results'][0]['text']


Yes, A (heart rate) can cause B (sepsis). Elevated heart rate can lead to increased cardiac output, which can increase the delivery of bacteria-carrying blood to the site of infection, leading to sepsis.


'Yes, A (heart rate) can cause B (sepsis). Elevated heart rate can lead to increased cardiac output, which can increase the delivery of bacteria-carrying blood to the site of infection, leading to sepsis.'

In [None]:
prompt_causality = '''
[INST]
You are a medical expert. You are tasked to identify causality between two conditions. 
Return yes if A causes B. If B causes A or if you are not sure, return no. 
Keep your responses short and succinct. 
Input
{test}
[/INST]
'''

Use the local to focus on cause and effect. Like what symptoms are causing sepsis.

In [12]:
# Function to get the most recent value for a given item ID list from a DataFrame
def get_most_recent_value(df, itemids):

    filtered_df = df[df['itemid'].isin(itemids)].sort_values('charttime')

    if not filtered_df.empty:
        return filtered_df.iloc[-1]['value']
    
    else:
        return None

In [5]:
# initialise a dataframe with symptoms
symptom = ['uti', 'low oxygen', 'low neutrophils', 'low BP', 'high BP', 'age>50', 'pregnancy', 'age>70']
df = pd.DataFrame(symptom, columns=['symptom'])
df

Unnamed: 0,symptom
0,uti
1,low oxygen
2,low neutrophils
3,low BP
4,high BP
5,age>50
6,pregnancy
7,age>70


In [6]:
# get relation by checking symptoms against LLM
df['relation'] = df.apply(lambda x: run(prompt_causality, f"A: {x['symptom']}, B: sepsis"), axis=1)

Yes, A (uti) causes B (sepsis).
Yes, A (low oxygen) causes B (sepsis). Low oxygen levels can lead to sepsis, a life-threatening condition that can cause organ failure and death.
Yes, A (low neutrophils) causes B (sepsis). Low neutrophil counts can impair the body's ability to fight off infections, which can lead to the development of sepsis.
No. Sepsis can cause low blood pressure (BP), but low BP does not cause sepsis.
Yes, A (high BP) can cause B (sepsis). High blood pressure can lead to damage to blood vessels, which can increase the risk of bacterial infection and sepsis.
Yes, A (age > 50) causes B (sepsis). As people age, their immune systems weaken, making them more susceptible to infections like sepsis.
No. Sepsis can be caused by various factors, including infection, but pregnancy is not a known cause of sepsis.
Yes, A (age > 70) causes B (sepsis). As people age, their immune systems weaken, making them more susceptible to infections like sepsis.


In [7]:
# create short result of relation
df['relation_short'] = df['relation'].apply(lambda x: 'yes' if 'yes' in x.lower() else 'no')

In [8]:
df

Unnamed: 0,symptom,relation,symptom_short
0,uti,"Yes, A (uti) causes B (sepsis).",yes
1,low oxygen,"Yes, A (low oxygen) causes B (sepsis). Low oxy...",yes
2,low neutrophils,"Yes, A (low neutrophils) causes B (sepsis). Lo...",yes
3,low BP,"No. Sepsis can cause low blood pressure (BP), ...",no
4,high BP,"Yes, A (high BP) can cause B (sepsis). High bl...",yes
5,age>50,"Yes, A (age > 50) causes B (sepsis). As people...",yes
6,pregnancy,"No. Sepsis can be caused by various factors, i...",no
7,age>70,"Yes, A (age > 70) causes B (sepsis). As people...",yes


# Identifying causation for input disease

In [110]:
prompt_top_indicators = '''
[INST]
You are a medical researcher with expertise in diagnosing diseases. 
Your task is to identify what biological markers, demographic data, or symptoms are most crucial for diagnosing a specific disease.

Input Disease: [Disease_Name]

Please list the top medical tests, markers, and parameters that are essential for diagnosing [Disease_Name]. Please keep only the most crucial indicators. 

Return only a list of the biological indicators as your response to my query. The list should be in the following format:

['indicator_1', 'indicator_2', 'indicator_3', ... , 'indicator_n']

Input
{test}
[/INST]
'''

In [111]:
run(prompt_top_indicators,"Input Disease: uti")

Exception: File `'(prompt_top_indicators,Input Disease: uti)'` not found.

# Load data

In [3]:
lab_path = '../data/LABEVENTS.csv'
df_lab = pd.read_csv(lab_path)

lab_desc_path = '../data/D_LABITEMS.csv'
df_lab_desc = pd.read_csv(lab_desc_path)

df_items_desc = pd.read_csv('../data/D_ITEMS.csv')

In [4]:
df_lab_desc[df_lab_desc['ITEMID']==50802]
df_items_desc[df_items_desc.ITEMID == 70067]

Unnamed: 0,ROW_ID,ITEMID,LABEL,ABBREVIATION,DBSOURCE,LINKSTO,CATEGORY,UNITNAME,PARAM_TYPE,CONCEPTID
10673,12313,70067,SWAB,,hospital,microbiologyevents,SPECIMEN,,,


# Microbiology events

In [9]:
# filter for only microbiology events
df_mb_desc = df_items_desc[df_items_desc['LINKSTO']=='microbiologyevents']

For the LABEVENTS table, we have to store both the label value and the fluid or the cateogory value. For the Microbiology events, what distinguishes the different ITEMIDs?

In [25]:
print("MB values:", len(df_mb_desc['LABEL'].values))
print("MB unique values:", len(df_mb_desc['LABEL'].unique()))

MB values: 436
MB unique values: 421


In [None]:
df_mb_desc[df_mb_desc.duplicated(subset='LABEL', keep=False)]['LABEL'].unique()

In [112]:
df_mb_desc[df_mb_desc.duplicated(subset='LABEL', keep=False)].head()

Unnamed: 0,ROW_ID,ITEMID,LABEL,ABBREVIATION,DBSOURCE,LINKSTO,CATEGORY,UNITNAME,PARAM_TYPE,CONCEPTID
10341,12340,80001,,,hospital,microbiologyevents,ORGANISM,,,
10607,12247,70001,,,hospital,microbiologyevents,SPECIMEN,,,
10638,12278,70032,Blood (EBV),,hospital,microbiologyevents,SPECIMEN,,,
10641,12281,70035,"FLUID,OTHER",,hospital,microbiologyevents,SPECIMEN,,,
10646,12286,70040,SWAB,,hospital,microbiologyevents,SPECIMEN,,,


# Lab values

In [11]:
df_lab_full = pd.merge(df_lab, df_lab_desc, on='ITEMID')


KeyboardInterrupt



In [46]:
# Convert all string types to uppercase 
df_lab_desc = df_lab_desc.applymap(lambda x: x.upper() if type(x) == str else x)


  df_lab_desc = df_lab_filtered.applymap(lambda x: x.upper() if type(x) == str else x)


In [133]:
# https://www.sciencedirect.com/science/article/pii/S2352914823000783#tbl1
# https://doi.org/10.1016/j.imu.2023.101236

data_ref001 = {
    'Predictor': [
        'ICULOS', 'HospAdmTime', 'Age', 'Gender',
        'O2Sat', 'Resp', 'HR', 'Temp', 'DBP', 'MAP', 'SBP',
        'FiO2', 'pH', 'SIRS_Score', 'Base Excess', 'Glucose', 'PCO2', 'WBC', 'OXYGEN SATURATION',
        'Potassium', 'Calcium', 'Hematocrit', 'Hemoglobin', 'Magnesium', 'urea nitrogen', 'Phosphate', 'Bicarbonate',
        'Creatinine', 'Platelet', 'Chloride'
    ],
    'Description': [
        'ICU length of stay (hours since ICU admission)', 'The time between hospital and ICU admission', 'Age (years)', 'Female (0) or male (1)',
        'Pulse oximetry (%)', 'Respiration rate (breaths per minute)', 'Heart rate (beats per minute)', 'Temperature (deg C)',
        'Diastolic BP (mm Hg)', 'Mean arterial pressure (mm Hg)', 'Systolic BP (mm Hg)',
        'Fraction of inspired oxygen (%)', 'A blood pH test is a normal part of a blood gas test or arterial blood gas (ABG) test. It measures how much oxygen and carbon dioxide are in your blood',
        'Four SIRS criteria were defined, namely tachycardia (heart rate >90 beats/min), tachypnea (respiratory rate >20 breaths/min), fever or hypothermia (temperature >38 or <36 °C), and leukocytosis, leukopenia, or bandemia (white blood cells >1200/mm3, <4000/mm3 or bandemia ≥10%',
        'Excess bicarbonate (mmol/L)', 'Serum glucose (mg/dL)', 'The partial pressure of carbon dioxide from arterial blood (mm Hg). Ranges 35 to 45 mmHg, or 4.7 to 6.0 kPa.', 'Leukocyte count (count/L)',
        'Oxygen saturation from arterial blood (%) SaO2', 'Potassiam (mmol/L)', 'Calcium (mg/dL)', 'Hematocrit (%)', 'Hemoglobin (g/dL)',
        'Magnesium (mmol/dL)', 'Blood urea nitrogen (mg/dL)', 'Phosphate (mg/dL)', 'Bicarbonate (mmol/L)', 'Creatinine (mg/dL)',
        'Platelet count (count/mL)', 'Chloride (mmol/L)'
    ],
    'Category': [
        'Demographic data', 'Demographic data', 'Demographic data', 'Demographic data',
        'Clinical time series data', 'Clinical time series data', 'Clinical time series data', 'Clinical time series data',
        'Clinical time series data', 'Clinical time series data', 'Clinical time series data', 
        'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values',
        'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values',
        'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values',
        'Laboratory values', 'Laboratory values', 'Laboratory values', 'Laboratory values'
    ],
    'Type': [
        'Numeric', 'Date', 'Numeric', 'Categorical',
        'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric',
        'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric',
        'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric',
        'Numeric', 'Numeric', 'Numeric', 'Numeric', 'Numeric'
    ],
    'Table': [
        'ICUSTAYS', 'ADMISSIONS', 'PATIENTS', 'PATIENTS',
        'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS',
        'CHARTEVENTS', 'LABEVENTS', 'Derived', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS',
        'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS',
        'LABEVENTS', 'LABEVENTS', 'LABEVENTS'
    ]
    
}

df_ref001 = pd.DataFrame(data_ref001)
df_ref001

Unnamed: 0,Predictor,Description,Category,Type,Table
0,ICULOS,ICU length of stay (hours since ICU admission),Demographic data,Numeric,ICUSTAYS
1,HospAdmTime,The time between hospital and ICU admission,Demographic data,Date,ADMISSIONS
2,Age,Age (years),Demographic data,Numeric,PATIENTS
3,Gender,Female (0) or male (1),Demographic data,Categorical,PATIENTS
4,O2Sat,Pulse oximetry (%),Clinical time series data,Numeric,CHARTEVENTS
5,Resp,Respiration rate (breaths per minute),Clinical time series data,Numeric,CHARTEVENTS
6,HR,Heart rate (beats per minute),Clinical time series data,Numeric,CHARTEVENTS
7,Temp,Temperature (deg C),Clinical time series data,Numeric,CHARTEVENTS
8,DBP,Diastolic BP (mm Hg),Clinical time series data,Numeric,CHARTEVENTS
9,MAP,Mean arterial pressure (mm Hg),Clinical time series data,Numeric,CHARTEVENTS


In [50]:
input: 
  table: the data table name (LABEVENTS, microbiology, CHARTEVENTS, ADMISSIONS)

# convert reference to uppercase
reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
# filter by name of table (input)
reference_filtered = reference[reference['Table'] == table]]

# check if predictor exists in table input data
reference_filtered['Exists'] = reference_filtered['Predictor'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')

#df_ref001_lab.sort_values('Exists')[['Predictor', 'Description', 'Exists']]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_ref001_lab['Exists'] = df_ref001_lab['Variable'].apply(lambda x: 'yes' if x.upper() in df_lab_desc['LABEL'].values else 'no')


Unnamed: 0,Variable,Description,Exists
24,BUN,Blood urea nitrogen (mg/dL),no
20,Calcium,Calcium (mg/dL),no
28,Platelet,Platelet count (count/mL),no
12,pH,A blood pH test is a normal part of a blood ga...,yes
27,Creatinine,Creatinine (mg/dL),yes
26,Bicarbonate,Bicarbonate (mmol/L),yes
25,Phosphate,Phosphate (mg/dL),yes
23,Magnesium,Magnesium (mmol/dL),yes
22,Hemoglobin,Hemoglobin (g/dL),yes
21,Hematocrit,Hematocrit (%),yes


Certain labels dont exist in the data labels. However they may be under different labels. 

Other labels exist, however there are multiple associated ITEMIDs depending on their fluid and category. We will need to varify that the label is important based on it's fluid and category too. 

In [44]:
# 4 values of pH
df_lab_desc[df_lab_desc['LABEL'] == ('PH')]

Unnamed: 0,ROW_ID,ITEMID,LABEL,FLUID,CATEGORY,LOINC_CODE
147,21,50820,pH,Blood,Blood Gas,11558-4
158,32,50831,pH,Other Body Fluid,Blood Gas,2748-2
420,294,51094,pH,Urine,Chemistry,2756-5
690,691,51491,pH,Urine,Hematology,5803-2


In [299]:
df_lab_desc[df_lab_desc['LABEL'].str.contains('Platelet', case=False)]

Unnamed: 0,ROW_ID,ITEMID,LABEL,FLUID,CATEGORY,LOINC_CODE,Exists,ref002
566,440,51240,LARGE PLATELETS,Blood,Hematology,34167-7,yes,yes
590,464,51264,PLATELET CLUMPS,Blood,Hematology,40741-1,yes,yes
591,465,51265,PLATELET COUNT,Blood,Hematology,777-3,yes,yes
592,466,51266,PLATELET SMEAR,Blood,Hematology,778-1,yes,yes


## Ref 002

Sup_data_Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis.

Certain predictors identified in this paper were manually converted to match the abbreviations of the data label. 

Table 1

In [183]:
data_ref002 = {
    'Predictor': [
        'AGE', 'DOBUTAMINE  (Y/N)', 'DOPAMINE  (Y/N)', 'EPINEPHRINE  (Y/N)', 'NOREPINEPHRINE  (Y/N)',
        'SYSTOLIC BP x AGE', 'SHOCK INDEX (HR/SYSTOLIC BP)', 'SHOCK INDEX x AGE',
        'ANION GAP', 'ALBUMIN', 'ALKALINE PHOSPHATASE', 'ASPARATE AMINOTRANSFERASE (AST)', 'BANDS', 'BILIRUBIN',
        'UREA NITROGEN', 'CALCIUM', 'CREATININE', 'ESTIMATED GFR (MDRD EQUATION)', 'GLUCOSE', 'LACTIC ACID',
        'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL', 'PLATELET', 'WBC',
        'GLASGOW COMA SCALE', 'SUPPLEMENTAL OXYGEN (Y/N)', 'VENTILATOR', 'ALTERED MENTAL STATUS',
        'ABSCESS', 'ACUTE', 'ALTERED', 'BACTEREMIA', 'CELLULITIS', 'CYSTITIS', 'DIABETES',
        'FAILURE', 'LACTIC', 'LEUKOCYTOSIS', 'PNA', 'PNEUMONIA', 'PYELONEPHRITIS', 'RESPIRATORY',
        'SEPSIS', 'SEPTIC', 'UROSEPSIS', 'UTI',
        'HEART RATE', 'MEAN ARTERIAL PRESSURE', 'RESPIRATORY RATE', 'SYSTTOLIC BLOOD PRESSURE',
        'OXYGEN SATURATION', 'TEMPERATURE', 'WEIGHT', 'HEIGHT', 'BMI'
    ],
    'Category': [
        'Demographic', 'Medication', 'Medication', 'Medication', 'Medication',
        'Engineered Feature', 'Engineered', 'Engineered',
        'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result',
        'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result',
        'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result', 'Laboratory Result',
        'Neurological Evaluation', 'Nursing Documentation', 'Nursing Documentation', 'Nursing Documentation',
        'Text- Keyword (ED chief complaint)', 'Text- Keyword (ED chief complaint)', 'Text- Keyword (ED chief complaint)', 'Text- Keyword (ED chief complaint)', 'Text- Keyword (ED chief complaint)', 'Text- Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)',
        'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)',
        'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)', 'Text-Keyword (ED chief complaint)',
        'Vital Sign', 'Vital Sign', 'Vital Sign', 'Vital Sign',
        'Vital Sign', 'Vital Sign', 'Physiological', 'Physiological', 'Physiological'
    ],
    'Table': [
        'PATIENTS', 'PRESCRIPTIONS', 'PRESCRIPTIONS', 'PRESCRIPTIONS', 'PRESCRIPTIONS',
        'Derived', 'Derived', 'Derived',
        'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS',
        'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS',
        'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS', 'LABEVENTS',
        'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS',
        'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS',
        'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS',
        'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS', 'ED_EVENTS',
        'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS', 'CHARTEVENTS',
        'CHARTEVENTS', 'CHARTEVENTS', 'ADMISSIONS', 'ADMISSIONS', 'Derived'
    ]
}

df_ref002 = pd.DataFrame(data_ref002)

#     "Derived" means that the variable is often calculated from other variables
#    "ED_EVENTS" is a placeholder for the table containing Emergency Department events

In [65]:
df_ref002_lab[df_ref002_lab['Exists']=='no']


df_lab_desc['LABEL'] = df_lab_desc['LABEL'].str.upper()
reference_filtered['Exists'] = reference_filtered['Variable'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')

Unnamed: 0,Variable,TYPE_OF_VARIABLE,Table,Exists
15,CALCIUM,Laboratory Result,LABEVENTS,no
19,LACTIC ACID,Laboratory Result,LABEVENTS,no
20,LYMPHOCYTE,Laboratory Result,LABEVENTS,no
21,MONOCYTE,Laboratory Result,LABEVENTS,no
22,NEUTROPHIL,Laboratory Result,LABEVENTS,no
23,PLATELET,Laboratory Result,LABEVENTS,no


In [209]:
df_ref002

Unnamed: 0,Predictor,Category,Table
0,AGE,Demographic,PATIENTS
1,DOBUTAMINE (Y/N),Medication,PRESCRIPTIONS
2,DOPAMINE (Y/N),Medication,PRESCRIPTIONS
3,EPINEPHRINE (Y/N),Medication,PRESCRIPTIONS
4,NOREPINEPHRINE (Y/N),Medication,PRESCRIPTIONS
5,SYSTOLIC BP x AGE,Engineered Feature,Derived
6,SHOCK INDEX (HR/SYSTOLIC BP),Engineered,Derived
7,SHOCK INDEX x AGE,Engineered,Derived
8,ANION GAP,Laboratory Result,LABEVENTS
9,ALBUMIN,Laboratory Result,LABEVENTS


In [72]:
# validate that lactate is the same as lactic acid
words_to_check = ['LYMPHOCYTE', 'MONOCYTE', 'PLATE', 'NEUTROPHIL', 'CALCIUM', 'LACTATE', 'AMINOTRANSFERASE']


other = ['ANION GAP', 'ALBUMIN', 'ALKALINE PHOSPHATASE',
       'AMINOTRANSFERASE', 'BANDS', 'BILIRUBIN',
       'UREA NITROGEN', 'CALCIUM', 'CREATININE',
       'ESTIMATED GFR (MDRD EQUATION)', 'GLUCOSE', 'LACTATE',
       'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL', 'PLATELET',
       'WHITE BLOOD CELLS', 'WBC']



def checkwords(df, words_to_check):
    filtered_dfs = []
    for word in words_to_check:
        filtered_df = df_lab_desc[df_lab_desc['LABEL'].apply(lambda x: word in x)]
        filtered_dfs.append(filtered_df)
    df_query_labels = pd.concat(filtered_dfs).drop_duplicates()
    return df_query_labels

checkwords(df_lab_desc, words_to_check)
print(checkwords(df_lab_desc, other).shape)


(106, 3)


In [122]:
df_ref002_check = pd.DataFrame(checkwords(df_lab_desc, words_to_check))
df_ref002_check.head()

Unnamed: 0,ITEMID,LABEL,FLUID,ref1,ref2
19,51365,ATYPICAL LYMPHOCYTES,JOINT FLUID,yes,yes
29,51375,LYMPHOCYTES,JOINT FLUID,yes,yes
39,51385,ATYPICAL LYMPHOCYTES,OTHER BODY FLUID,yes,yes
81,51427,LYMPHOCYTES,OTHER BODY FLUID,yes,yes
94,51440,ATYPICAL LYMPHOCYTES,PLEURAL,yes,yes


In [None]:
data_ref003 = {
'Predictor': [
    'Lactate',
    'Systolic BP', 
    'Diastolic BP',
    'Glasgow Coma Scale',
    'Heart Rate',
    'Respiratory Rate',
    'WBC',
    'WBC',
    'Procalcitonin',
    'C-Reactive Protein',
    'Blood Culture',
    'Temperature',
    'Platelet Count',
    'Bilirubin',
    'Creatinine',
    'Bicarbonate',
    'Blood Glucose',
    'Urine Culture',
    'AST', 
    'ALT',
    'Troponin',
    'Hemoglobin',
    'INR', 
    'PTT',
    'Albumin',
    'D-Dimer',
    'Ferritin',
    'LDH',
    'UREA NITROGEN',
    'PaO2', 
    'FiO2',
    'Age',
    'Comorbidities',
    'Surgery History',
    'Immunosuppression Status',
    'Alcohol Abuse History',
    'Intravenous Drug Use History',
    'Recent Infection or Antibiotic Use',
    'Mechanical Ventilation Status',
    'Central Venous Catheter Use',
    'Urinary Catheter Use',
    'Recent Hospitalization',
    'Chronic Renal Failure',
    'Chronic Liver Disease',
    'Pregnancy Status',
    'Burns or Trauma',
    'Neutropenia',
    'Nutritional Status',
    'BMI',
    'Blood Glucose on Admission',
    'Infection Source',
    'Skin Mottling',
    'Skin Color',
    'Urine Output'
],
'Description': [
    "High Lactate Levels",
    "Low Blood Pressure (Hypotension) ('Systolic BP', 'Diastolic BP')",
    "Low Blood Pressure (Hypotension) ('Systolic BP', 'Diastolic BP')",
    "Altered Mental Status",
    "High Heart Rate (Tachycardia)",
    "Rapid Breathing (Tachypnea)",
    "High WBC Count",
    "Low WBC Count",
    "Elevated Procalcitonin",
    "Elevated CRP",
    "Positive Blood Culture",
    "Fever or Hypothermia",
    "Low Platelet Count",
    "Elevated Bilirubin",
    "Elevated Creatinine",
    "Low Bicarbonate Levels",
    "High Blood Glucose",
    "Positive Urine Culture",
    "Elevated Liver Enzymes (AST, ALT)",
    "Elevated Liver Enzymes (AST, ALT)",
    "Elevated Troponin",
    "Low Hemoglobin",
    "High INR or PTT",
    "High INR or PTT",
    "Low Albumin",
    "Elevated D-Dimer",
    "Elevated Ferritin",
    "Elevated LDH",
    "High Blood Urea Nitrogen",
    "Low PaO2/FiO2 Ratio",
    "Low PaO2/FiO2 Ratio",
    "Age > 65 years",
    "Comorbidities (e.g., Diabetes, Heart Disease)",
    "Recent Surgery",
    "Immunosuppression",
    "Chronic Alcohol Abuse",
    "Intravenous Drug Use",
    "Recent Infection or Antibiotic Use",
    "Mechanical Ventilation",
    "Central Venous Catheter Use",
    "Urinary Catheter Use",
    "Recent Hospitalization",
    "Chronic Renal Failure",
    "Chronic Liver Disease",
    "Pregnancy",
    "Burns or Trauma",
    "Neutropenia",
    "Malnutrition",
    "Obesity",
    "Hyperglycemia on Admission",
    "Community-Acquired vs Hospital-Acquired Infection",
    "Skin Mottling",
    "Cyanosis (Bluish Skin)",
    "Oliguria (Low Urine Output)"
],
'Table': [
    "labevents",  # High Lactate Levels
    "chartevents",  # Low Blood Pressure (Hypotension)
    "chartevents",  # Low Blood Pressure (Hypotension)
    "chartevents",  # Altered Mental Status
    "chartevents",  # High Heart Rate (Tachycardia)
    "chartevents",  # Rapid Breathing (Tachypnea)
    "labevents",  # High WBC Count
    "labevents",  # Low WBC Count
    "labevents",  # Elevated Procalcitonin
    "labevents",  # Elevated CRP
    "microbiologyevents",  # Positive Blood Culture
    "chartevents",  # Fever or Hypothermia
    "labevents",  # Low Platelet Count
    "labevents",  # Elevated Bilirubin
    "labevents",  # Elevated Creatinine
    "labevents",  # Low Bicarbonate Levels
    "labevents",  # High Blood Glucose
    "microbiologyevents",  # Positive Urine Culture
    "labevents",  # Elevated Liver Enzymes (AST, ALT)
    "labevents",  # Elevated Liver Enzymes (AST, ALT)
    "labevents",  # Elevated Troponin
    "labevents",  # Low Hemoglobin
    "labevents",  # High INR or PTT
    "labevents",  # High INR or PTT
    "labevents",  # Low Albumin
    "labevents",  # Elevated D-Dimer
    "labevents",  # Elevated Ferritin
    "labevents",  # Elevated LDH
    "labevents",  # High BUN
    "chartevents",  # Low PaO2/FiO2 Ratio
    "labevents",  # Low PaO2/FiO2 Ratio
    "patients",  # Age > 65 years
    "admissions",  # Comorbidities (e.g., Diabetes, Heart Disease)
    "admissions",  # Recent Surgery
    "admissions",  # Immunosuppression
    "admissions",  # Chronic Alcohol Abuse
    "admissions",  # Intravenous Drug Use
    "admissions",  # Recent Infection or Antibiotic Use
    "chartevents",  # Mechanical Ventilation
    "procedures_icd",  # Central Venous Catheter Use
    "procedures_icd",  # Urinary Catheter Use
    "admissions",  # Recent Hospitalization
    "admissions",  # Chronic Renal Failure
    "admissions",  # Chronic Liver Disease
    "admissions",  # Pregnancy
    "admissions",  # Burns or Trauma
    "labevents",  # Neutropenia
    "admissions",  # Malnutrition
    "admissions",  # Obesity
    "labevents",  # Hyperglycemia on Admission
    "admissions",  # Community-Acquired vs Hospital-Acquired Infection
    "chartevents",  # Skin Mottling
    "chartevents",  # Cyanosis (Bluish Skin)
    "outputevents"  # Oliguria (Low Urine Output)
]
}

df_ref003 = pd.DataFrame(data_ref003)
df_ref003

Unnamed: 0,Predictor,Description,Table
0,Lactate,High Lactate Levels,labevents
1,Systolic BP,Low Blood Pressure (Hypotension) ('Systolic BP...,chartevents
2,Diastolic BP,Low Blood Pressure (Hypotension) ('Systolic BP...,chartevents
3,Glasgow Coma Scale,Altered Mental Status,chartevents
4,Heart Rate,High Heart Rate (Tachycardia),chartevents
5,Respiratory Rate,Rapid Breathing (Tachypnea),chartevents
6,WBC,High WBC Count,labevents
7,WBC,Low WBC Count,labevents
8,Procalcitonin,Elevated Procalcitonin,labevents
9,C-Reactive Protein,Elevated CRP,labevents


In [None]:
df_ref003[df_ref003['Table']=='labevents']

Unnamed: 0,Predictor,Description,Table
0,Lactate,High Lactate Levels,labevents
6,WBC,High WBC Count,labevents
7,WBC,Low WBC Count,labevents
8,Procalcitonin,Elevated Procalcitonin,labevents
9,C-Reactive Protein,Elevated CRP,labevents
12,Platelet Count,Low Platelet Count,labevents
13,Bilirubin,Elevated Bilirubin,labevents
14,Creatinine,Elevated Creatinine,labevents
15,Bicarbonate,Low Bicarbonate Levels,labevents
16,Blood Glucose,High Blood Glucose,labevents


In [None]:
prompt_check_lab_events = '''
[INST]
You are a medical expert. You are tasked to identify causality between two conditions. 
Return yes if A is important in identifying B. If B causes A or if you are not sure, return no. 
Keep your responses short and succinct. 
Input
{test}
[/INST]
'''
df['relation'] = df.apply(lambda x: run(prompt_causality, f"A: {x['LABEL']} with fluid type of {x['FLUID']}, B: sepsis"), axis=1)

In [80]:




#99 values...
print(len(df_lab_desc[df_lab_desc['ref002']=='yes']['LABEL'].values))
df_lab_desc[df_lab_desc['ref002']=='yes']['LABEL'].values



99


In [177]:
def get_filtered_reference(table_query, reference):
  """
    input: 
    table_query: the data table name (LABEVENTS, microbiology, CHARTEVENTS, ADMISSIONS)
    reference: a dataframe of the sepsis reference, containing predictors
    
    returns:
    
  """
  # convert reference string data to uppercase
  reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
  # filter by name of table (input)
  reference_filtered = reference[reference['Table'] == table_query]

  # check if variable exists in table input data
  reference_filtered['Exists'] = reference_filtered['Predictor'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')
  
  # debugging to check the predictors not in df
  predictors_missing = reference_filtered[reference_filtered['Exists']=='no']['Predictor'].values
  
  return reference_filtered, predictors_missing


In [184]:
reference_filtered = get_filtered_reference('LABEVENTS', df_ref001)[0]

  reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reference_filtered['Exists'] = reference_filtered['Predictor'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')


In [None]:
df_lab

In [185]:
# Function to match 'Predictor' to 'LABEL' and return 'ITEMID'
def match_predictor_to_itemid(predictor):
    matching_rows = df_lab_desc[df_lab_desc['LABEL'].str.contains(predictor, case=False, na=False)]
    if matching_rows.empty:
        return None
    else:
        return matching_rows['ITEMID'].tolist()

# Apply the function to the 'Predictor' column and create a new column called 'Matching_ITEMID'
reference_filtered['Matching_ITEMID'] = reference_filtered['Predictor'].apply(match_predictor_to_itemid)
reference_filtered



Unnamed: 0,Predictor,Description,Category,Type,Table,Exists,Matching_ITEMID
12,PH,A BLOOD PH TEST IS A NORMAL PART OF A BLOOD GA...,LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[51347, 51349, 51350, 51351, 51352, 51365, 513..."
14,BASE EXCESS,EXCESS BICARBONATE (MMOL/L),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,[50802]
15,GLUCOSE,SERUM GLUCOSE (MG/DL),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[50809, 50842, 50931, 51014, 51022, 51034, 510..."
16,PCO2,THE PARTIAL PRESSURE OF CARBON DIOXIDE FROM AR...,LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[50818, 50830]"
17,WBC,LEUKOCYTE COUNT (COUNT/L),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[51363, 51384, 51439, 51458, 51128, 51300, 515..."
18,OXYGEN SATURATION,OXYGEN SATURATION FROM ARTERIAL BLOOD (%) SAO2,LABORATORY VALUES,NUMERIC,LABEVENTS,yes,[50817]
19,POTASSIUM,POTASSIAM (MMOL/L),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[50822, 50833, 50847, 50971, 51041, 51057, 510..."
20,CALCIUM,CALCIUM (MG/DL),LABORATORY VALUES,NUMERIC,LABEVENTS,no,"[51468, 51469, 51470, 50808, 50893, 51029, 510..."
21,HEMATOCRIT,HEMATOCRIT (%),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[51348, 51369, 51422, 51445, 50810, 51115, 512..."
22,HEMOGLOBIN,HEMOGLOBIN (G/DL),LABORATORY VALUES,NUMERIC,LABEVENTS,yes,"[50805, 50811, 50814, 50852, 50855, 51212, 512..."


In [186]:
get_filtered_reference('LABEVENTS', df_ref002)[1]

  reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reference_filtered['Exists'] = reference_filtered['Predictor'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')


array(['CALCIUM', 'LACTIC ACID', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL',
       'PLATELET'], dtype=object)

In [207]:
df_ref001.head()

Unnamed: 0,Predictor,Description,Category,Type,Table
0,ICULOS,ICU length of stay (hours since ICU admission),Demographic data,Numeric,ICUSTAYS
1,HospAdmTime,The time between hospital and ICU admission,Demographic data,Date,ADMISSIONS
2,Age,Age (years),Demographic data,Numeric,PATIENTS
3,Gender,Female (0) or male (1),Demographic data,Categorical,PATIENTS
4,O2Sat,Pulse oximetry (%),Clinical time series data,Numeric,CHARTEVENTS


In [199]:
references = [df_ref001, df_ref002, df_ref003]


table_query = "LABEVENTS"
refcount = 0

# Function to check if any 'Predictor' exists in the 'LABEL'
def check_label_in_predictor(label):
  for predictor in df_reference['Predictor']:
      if predictor in label:
          # todo: match only word
          return 'yes'
  return 'no'


for reference in references: 
  # get unique reference name
  refcount+=1
  reference.name = "ref" + str(refcount)
  
  # filter reference by table
  df_reference = get_filtered_reference(table_query, reference)[0]
  
  # add reference for each feature
  #df_lab_desc[reference.name] = df_lab_desc['LABEL'].apply(lambda x: 'yes' if any(feature in x for feature in df_reference['Predictor']) else 'no')
  
  
  # Apply the function to the 'LABEL' column and create a new column called 'reference_name_exists'
  df_lab_desc[reference.name] = df_lab_desc['LABEL'].apply(check_label_in_predictor)


  reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reference_filtered['Exists'] = reference_filtered['Predictor'].apply(lambda x: 'yes' if x in df_lab_desc['LABEL'].values else 'no')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_lab_desc[reference.name] = df_lab_desc['LABEL'].apply(check_label_in_predictor)
  reference = reference.applymap(lambda x: x.upper() if type(x) == str else x)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col

In [206]:
df_lab_desc[(df_lab_desc['ref1']=='yes') | (df_lab_desc['ref2']=='yes') | (df_lab_desc['ref3']=='yes')]

Unnamed: 0,ITEMID,LABEL,FLUID,ref1,ref2,ref3
0,51346,BLASTS,CEREBROSPINAL FLUID (CSF),no,no,yes
1,51347,EOSINOPHILS,CEREBROSPINAL FLUID (CSF),yes,no,no
2,51348,"HEMATOCRIT, CSF",CEREBROSPINAL FLUID (CSF),yes,no,no
3,51349,HYPERSEGMENTED NEUTROPHILS,CEREBROSPINAL FLUID (CSF),yes,yes,no
4,51350,IMMUNOPHENOTYPING,CEREBROSPINAL FLUID (CSF),yes,no,no
...,...,...,...,...,...,...
718,51519,YEAST,URINE,no,no,yes
720,51521,"ACID PHOSPHATASE, PROSTATIC",BLOOD,yes,no,no
728,51529,ESTIMATED ACTUAL GLUCOSE,BLOOD,yes,yes,no
732,51533,WBCP,BLOOD,yes,yes,yes


In [208]:
df_lab_desc[(df_lab_desc['ref1']=='yes') & (df_lab_desc['ref2']=='yes') & (df_lab_desc['ref3']=='yes')]

Unnamed: 0,ITEMID,LABEL,FLUID,ref1,ref2,ref3
17,51363,"WBC, CSF",CEREBROSPINAL FLUID (CSF),yes,yes,yes
38,51384,"WBC, JOINT FLUID",JOINT FLUID,yes,yes,yes
93,51439,"WBC, OTHER FLUID",OTHER BODY FLUID,yes,yes,yes
112,51458,"WBC, PLEURAL",PLEURAL,yes,yes,yes
168,50841,"CREATININE, ASCITES",ASCITES,yes,yes,yes
239,50912,CREATININE,BLOOD,yes,yes,yes
347,51021,"CREATININE, JOINT FLUID",JOINT FLUID,yes,yes,yes
358,51032,"CREATININE, BODY FLUID",OTHER BODY FLUID,yes,yes,yes
378,51052,"CREATININE, PLEURAL",PLEURAL,yes,yes,yes
393,51067,24 HR CREATININE,URINE,yes,yes,yes


In [159]:
df_lab_desc[df_lab_desc['LABEL'].str.contains('PLATE', case=False)]
# There are 4 items relating to platelets and neutrophils, 5 items related to monocytes, and 13 related to lymphocyctes. The data misspells the word 'ASPARATE'. We should also check for both AST and ALT. 

df_lab_desc[df_lab_desc['LABEL'].str.contains('MONOCYTE', case=False)]
# There are 5 monocytes. 
# We would likely be interested in those returned from Blood fluids
# 51253, 51254

df_lab_desc[df_lab_desc['LABEL'].str.contains('LYMPHOCYTE', case=False)]
# thirteen lymphocytes


df_lab_desc[df_lab_desc['LABEL'].str.contains('AMINOTRANSFERASE', case=False)]
#df_lab_desc['LABEL'] = df_lab_desc['LABEL'].replace('ASPARATE AMINOTRANSFERASE (AST)', 'ASPARTATE AMINOTRANSFERASE (AST)')

df_lab_desc[df_lab_desc['LABEL'].str.contains('CALCIUM')]
# ESTIMATED GFR (MDRD EQUATION) 
# ITEMID == 50920


Unnamed: 0,ITEMID,LABEL,FLUID,ref1,ref2
122,51468,CALCIUM CARBONATE CRYSTALS,URINE,yes,yes
123,51469,CALCIUM OXALATE CRYSTALS,URINE,yes,yes
124,51470,CALCIUM PHOSPHATE CRYSTALS,URINE,yes,yes
130,50803,"CALCULATED BICARBONATE, WHOLE BLOOD",BLOOD,yes,no
131,50804,CALCULATED TOTAL CO2,BLOOD,no,no
135,50808,FREE CALCIUM,BLOOD,yes,yes
137,50810,"HEMATOCRIT, CALCULATED",BLOOD,yes,no
220,50893,"CALCIUM, TOTAL",BLOOD,yes,yes
221,50894,CALCULATED FREE TESTOSTERONE,BLOOD,no,no
222,50895,CALCULATED TBG,BLOOD,no,no


In [None]:
# Shock Index 
# assuming chartevents has 'HEART RATE' and 'SYSTOLIC BP'
chartevents['SHOCK INDEX'] = chartevents['HEART RATE'] / chartevents['SYSTOLIC BP']



def calc_gfr(scr, age, gender, ethnicity):
#    https://github.com/chapmanbe/shilpy_6018_2017_term_project
    """Calculates the estimated Glomerular Filteration Rate(eGFR)
    Based on MDRD equation.
    GFR=175×(Scr)^-1.154×(Age)^-0.203×(0.742 if female)×(1.212 if AA)

    Arguments: 
        scr: Patient's serum creatinine level in mg/dL as float
        age: Patient's age in years as integer
        gender: Patient's gender "male" or "female" as string
        ethnicity: Patient's ethicity   

    Returns: 
        gfr: Patient's eGFR in mL/min rounded to decimal places.

    """
    if scr == 0.0 or age == 0.0:
        return 0

    if gender == 'F' and ethnicity == 'African American':
        egfr = round(175*(scr)**(-1.154) * age**(-0.203) * (0.742) * 1.212, 2)
    elif gender == 'F' and ethnicity != 'African American':
        egfr = round(175*(scr)**(-1.154) * age**(-0.203) * (0.742), 2)
    elif gender == 'M' and ethnicity == 'African American':
        egfr = round(175*(scr)**(-1.154) * age**(-0.203) * 1.212, 2)
    else:
        egfr = round(175*(scr)**(-1.154) * age**(-0.203), 2)    
    return egfr

In [210]:
df_ad = pd.read_csv('../data/ADMISSIONS.csv')

In [217]:
(df_ad[df_ad['HAS_CHARTEVENTS_DATA']==1])

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1
1,22,23,152223,2153-09-03 07:15:00,2153-09-08 19:10:00,,ELECTIVE,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Medicare,,CATHOLIC,MARRIED,WHITE,,,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,0,1
2,23,23,124321,2157-10-18 19:34:00,2157-10-25 14:00:00,,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME HEALTH CARE,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,,,BRAIN MASS,0,1
3,24,24,161859,2139-06-06 16:14:00,2139-06-09 12:48:00,,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME,Private,,PROTESTANT QUAKER,SINGLE,WHITE,,,INTERIOR MYOCARDIAL INFARCTION,0,1
4,25,25,129635,2160-11-02 02:06:00,2160-11-05 14:55:00,,EMERGENCY,EMERGENCY ROOM ADMIT,HOME,Private,,UNOBTAINABLE,MARRIED,WHITE,2160-11-02 01:01:00,2160-11-02 04:27:00,ACUTE CORONARY SYNDROME,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58970,58593,98797,105447,2132-12-24 20:06:00,2132-12-25 12:00:00,2132-12-25 12:00:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Medicare,ENGL,CATHOLIC,WIDOWED,WHITE,2132-12-24 18:53:00,2132-12-24 21:41:00,ALTERED MENTAL STATUS,1,1
58971,58594,98800,191113,2131-03-30 21:13:00,2131-04-02 15:02:00,,EMERGENCY,CLINIC REFERRAL/PREMATURE,HOME,Private,ENGL,NOT SPECIFIED,SINGLE,WHITE,2131-03-30 19:44:00,2131-03-30 22:41:00,TRAUMA,0,1
58972,58595,98802,101071,2151-03-05 20:00:00,2151-03-06 09:10:00,2151-03-06 09:10:00,EMERGENCY,CLINIC REFERRAL/PREMATURE,DEAD/EXPIRED,Medicare,ENGL,CATHOLIC,WIDOWED,WHITE,2151-03-05 17:23:00,2151-03-05 21:06:00,SAH,1,1
58973,58596,98805,122631,2200-09-12 07:15:00,2200-09-20 12:08:00,,ELECTIVE,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Private,ENGL,NOT SPECIFIED,MARRIED,WHITE,,,RENAL CANCER/SDA,0,1


In [219]:
df_ref001

Unnamed: 0,Predictor,Description,Category,Type,Table
0,ICULOS,ICU length of stay (hours since ICU admission),Demographic data,Numeric,ICUSTAYS
1,HospAdmTime,The time between hospital and ICU admission,Demographic data,Date,ADMISSIONS
2,Age,Age (years),Demographic data,Numeric,PATIENTS
3,Gender,Female (0) or male (1),Demographic data,Categorical,PATIENTS
4,O2Sat,Pulse oximetry (%),Clinical time series data,Numeric,CHARTEVENTS
5,Resp,Respiration rate (breaths per minute),Clinical time series data,Numeric,CHARTEVENTS
6,HR,Heart rate (beats per minute),Clinical time series data,Numeric,CHARTEVENTS
7,Temp,Temperature (deg C),Clinical time series data,Numeric,CHARTEVENTS
8,DBP,Diastolic BP (mm Hg),Clinical time series data,Numeric,CHARTEVENTS
9,MAP,Mean arterial pressure (mm Hg),Clinical time series data,Numeric,CHARTEVENTS


In [220]:
df_ref002

Unnamed: 0,Predictor,Category,Table
0,AGE,Demographic,PATIENTS
1,DOBUTAMINE (Y/N),Medication,PRESCRIPTIONS
2,DOPAMINE (Y/N),Medication,PRESCRIPTIONS
3,EPINEPHRINE (Y/N),Medication,PRESCRIPTIONS
4,NOREPINEPHRINE (Y/N),Medication,PRESCRIPTIONS
5,SYSTOLIC BP x AGE,Engineered Feature,Derived
6,SHOCK INDEX (HR/SYSTOLIC BP),Engineered,Derived
7,SHOCK INDEX x AGE,Engineered,Derived
8,ANION GAP,Laboratory Result,LABEVENTS
9,ALBUMIN,Laboratory Result,LABEVENTS


In [221]:
df_ref003

Unnamed: 0,Predictor,Description,Table
0,Lactate,High Lactate Levels,labevents
1,Systolic BP,Low Blood Pressure (Hypotension) ('Systolic BP...,chartevents
2,Diastolic BP,Low Blood Pressure (Hypotension) ('Systolic BP...,chartevents
3,Glasgow Coma Scale,Altered Mental Status,chartevents
4,Heart Rate,High Heart Rate (Tachycardia),chartevents
5,Respiratory Rate,Rapid Breathing (Tachypnea),chartevents
6,WBC,High WBC Count,labevents
7,WBC,Low WBC Count,labevents
8,Procalcitonin,Elevated Procalcitonin,labevents
9,C-Reactive Protein,Elevated CRP,labevents
