Kunaal Agarwal (aad5ha), Daivik Siddhi (awr7mj), Shaurya Bedi (wvr4fe)

In [9]:
import pandas as pd
import numpy as np
import random

# Auto-triage Proof of Concept simulation

### Emergency Department Entry Survey

As patients enter the Emergency Room (ER), they often complete a brief health questionnaire. The questionnaire captures the relevant medical information that clinicians quickly sift through before assessing the patient themselves. We've simulated the results of this questionnaire with a broad set of initial features. The core features are the presenting symptoms, each with an associated severity score, along with patient demographics, vitals, and other key components facilitating high-quality medical care.

In [16]:
# Symptom severity labels: {0: mild, 1: moderate, 2: severe}
symptom_templates = [
    ("Chest pain and trouble breathing", 2),
    ("Sudden weakness on one side of body", 2),
    ("Severe headache with vision changes", 2),
    ("Pounding headache with neck stiffness", 2),
    ("Severe allergic reaction with hives", 2),
    ("Repeated episodes of chest tightness", 2),
    ("Confusion and slurred speech", 2),
    ("Abdominal pain and vomiting", 1),
    ("Sharp abdominal cramps", 1),
    ("High fever and body aches", 1),
    ("Moderate shortness of breath when walking", 1),
    ("Chronic cough worsening over weeks", 1),
    ("Persistent diarrhea and dehydration", 1),
    ("Back pain after lifting heavy object", 1),
    ("Nausea and lightheadedness", 1),
    ("Multiple falls, leg pain", 1),
    ("Intermittent palpitations", 1),
    ("Cut on hand with mild bleeding", 0),
    ("Twisted ankle, slight swelling", 0),
    ("Mild rash on arms", 0),
    ("Sore throat and cough", 0)
]

genders = ['Male', 'Female']
ethnicities = ['White', 'Black', 'Hispanic', 'Asian', 'Other']
insurance_options = ['Private', 'Medicare', 'Medicaid', 'Uninsured']
chronic_diseases = ['None', 'Hypertension', 'Diabetes', 'Asthma', 'COPD', 'Heart Disease', 'Chronic Kidney Disease']
medications = ['Aspirin', 'Metformin', 'Lisinopril', 'Albuterol', 'Atorvastatin', 'Insulin', 'Warfarin']
allergies = ['None', 'Penicillin', 'Peanuts', 'Latex', 'NSAIDs']
languages = ['English', 'Spanish', 'Other']
arrival_modes = ['Ambulance', 'Walk-in', 'Referral']
times_of_day = ['Morning', 'Afternoon', 'Evening', 'Night']

In [17]:
random.seed(42)
np.random.seed(42)

n_samples = 1000
records = []

for _ in range(n_samples):
    gender = random.choice(genders)
    ethnicity = random.choice(ethnicities)
    insurance = random.choice(insurance_options)
    age = np.random.randint(18, 90)
    pain_level = np.random.randint(0, 11)
    duration = round(np.random.exponential(scale=6), 1)
    desc, label = random.choice(symptom_templates)
    
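    # Label noise: with 5% probability bump severity up one level and with
    # 5% probability bump it down one level, clamped to the range [0, 2]
    noise = np.random.rand()
    if noise < 0.05:
        label = min(label + 1, 2)
    elif noise > 0.95:
        label = max(label - 1, 0)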
    
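    chronic = random.choice(chronic_diseases)
    if chronic == 'None':
        meds = ['None']
    else:
        # Pick 1 or 2 medications (randint's upper bound is exclusive);
        # they are sampled independently of the specific chronic disease
        meds = random.sample(medications, k=np.random.randint(1, 3))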
    
    heart_rate = np.random.randint(50, 140)
    systolic_bp = np.random.randint(90, 181)
    diastolic_bp = np.random.randint(60, 101)
    respiratory_rate = np.random.randint(12, 31)
    temperature_c = round(np.random.normal(loc=37, scale=1), 1)  # around normal
    oxygen_sat = np.random.randint(85, 101)

    arrival = random.choice(arrival_modes)
    time_day = random.choice(times_of_day)
    
    allergy = random.choice(allergies)
    language = random.choice(languages)
    
    height_cm = np.random.randint(150, 201)
    weight_kg = np.random.randint(50, 121)
    bmi = round(weight_kg / ((height_cm / 100) ** 2), 1)
    
    records.append({
        'age': age,
        'gender': gender,
        'ethnicity': ethnicity,
        'insurance_status': insurance,
        'pain_level': pain_level,
        'symptom_duration_hrs': duration,
        'symptom_description': desc,
        'chronic_disease': chronic,
        'current_medications': ", ".join(meds),
        'heart_rate': heart_rate,
        'systolic_bp': systolic_bp,
        'diastolic_bp': diastolic_bp,
        'respiratory_rate': respiratory_rate,
        'temperature_c': temperature_c,
        'oxygen_saturation': oxygen_sat,
        'arrival_mode': arrival,
        'time_of_day': time_day,
        'known_allergies': allergy,
        'language_proficiency': language,
        'height_cm': height_cm,
        'weight_kg': weight_kg,
        'bmi': bmi,
        'severity': label
    })

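df_enhanced = pd.DataFrame(records)
# to_csv does not create missing directories, so make sure 'data/' exists
import os
os.makedirs('data', exist_ok=True)
df_enhanced.to_csv('data/synthetic_patient_survey_data.csv', index=False)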
df_enhanced.head()

| | age | gender | ethnicity | insurance_status | pain_level | symptom_duration_hrs | symptom_description | chronic_disease | current_medications | heart_rate | ... | temperature_c | oxygen_saturation | arrival_mode | time_of_day | known_allergies | language_proficiency | height_cm | weight_kg | bmi | severity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69 | Male | White | Medicaid | 10 | 9.1 | Abdominal pain and vomiting | Hypertension | Metformin, Insulin | 132 | ... | 37.3 | 92 | Ambulance | Morning | NSAIDs | Spanish | 152 | 71 | 30.7 | 1 |
| 1 | 70 | Male | White | Private | 1 | 7.7 | Confusion and slurred speech | Hypertension | Atorvastatin, Warfarin | 113 | ... | 38.0 | 96 | Ambulance | Afternoon | NSAIDs | Spanish | 171 | 98 | 33.5 | 2 |
| 2 | 76 | Male | Asian | Medicaid | 9 | 0.3 | Chest pain and trouble breathing | Chronic Kidney Disease | Warfarin | 111 | ... | 35.8 | 91 | Ambulance | Night | Peanuts | Spanish | 167 | 53 | 19.0 | 1 |
| 3 | 77 | Male | Black | Medicaid | 1 | 2.9 | Pounding headache with neck stiffness | | | 51 | ... | 37.4 | 96 | Walk-in | Morning | Peanuts | Spanish | 156 | 93 | 38.2 | 2 |
| 4 | 25 | Female | White | Uninsured | 2 | 14.4 | Cut on hand with mild bleeding | | | 53 | ... | 36.5 | 86 | Walk-in | Morning | NSAIDs | Spanish | 175 | 93 | 30.4 | 0 |
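
One way to gauge how much label noise the generation loop injects is to work out the expected share of records whose `severity` differs from the template label. The sketch below assumes templates are drawn uniformly (as `random.choice` does) and uses the per-label template counts tallied from `symptom_templates` (4 mild, 10 moderate, 7 severe). `changed_fraction` is a hypothetical helper introduced here for illustration; it is not part of the notebook.

```python
# Template counts per severity label, tallied from symptom_templates
template_counts = {0: 4, 1: 10, 2: 7}
p_up, p_down = 0.05, 0.05  # noise < 0.05 bumps up; noise > 0.95 bumps down


def changed_fraction(label: int) -> float:
    """Probability that the noise step actually changes a given label."""
    p = 0.0
    if label < 2:  # an upward bump only has an effect below the ceiling
        p += p_up
    if label > 0:  # a downward bump only has an effect above the floor
        p += p_down
    return p


total = sum(template_counts.values())
expected_changed = sum(
    count / total * changed_fraction(label)
    for label, count in template_counts.items()
)
print(f"Expected share of noisy labels: {expected_changed:.3f}")
```

Under these assumptions roughly 7% of rows carry a flipped label. Row 2 of the preview, where "Chest pain and trouble breathing" has severity 1 rather than 2, is consistent with such a flip.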


### Conversational agent interface simulation

Following the survey, we intend to use an AI speech model to interact with each patient and ask customized follow-up questions to gain more information about the patient's condition. This would mimic the initial interaction a patient has with a nurse or physician assistant.
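
As a rough illustration of what "customized follow-up questions" could mean before any speech model is involved, the sketch below picks questions from one survey record using simple keyword matching. Everything in it is an assumption made for illustration: the `FOLLOW_UPS` mapping, the `follow_up_questions` helper, the pain threshold of 8, and the question wording are hypothetical placeholders, not clinical guidance and not the planned agent.

```python
# Hypothetical keyword-to-question mapping (illustrative only)
FOLLOW_UPS = {
    "chest": "Does the pain spread to your arm, jaw, or back?",
    "headache": "Did the headache start suddenly or build up gradually?",
    "abdominal": "Have you been able to keep fluids down?",
    "breath": "Is the shortness of breath worse when you lie down?",
}
DEFAULT_QUESTION = "How have your symptoms changed since they started?"


def follow_up_questions(record: dict) -> list[str]:
    """Return follow-up questions for one survey record.

    `record` is a dict using the same keys as a row of the simulated
    survey data (only symptom_description and pain_level are read).
    """
    description = record["symptom_description"].lower()
    questions = [q for keyword, q in FOLLOW_UPS.items() if keyword in description]
    # Assumed threshold: treat pain_level >= 8 as worth an extra question
    if record.get("pain_level", 0) >= 8:
        questions.append("Has the pain been this intense the whole time?")
    # Fall back to a generic question when nothing matched
    return questions or [DEFAULT_QUESTION]


example = {"symptom_description": "Chest pain and trouble breathing", "pain_level": 9}
for question in follow_up_questions(example):
    print(question)
```

A real agent would presumably generate questions from the full record rather than a fixed lookup; the lookup only makes the input and output of that step concrete.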