# EmergencyZIP AI

## Intro

As a Data Scientist in Healthcare with 2+ years of industry experience, one valuable lesson I learned on the job is that zipcode (postal code) is an important factor in an individual's health status: where a person lives can shape how healthy or unhealthy they are. The quality of healthcare available in an area is also an important factor in that area's standard of living. However, many people are not familiar with all the medical facilities in their area, or do not know which one is best for the illness or medical condition they are currently experiencing. I therefore created EmergencyZIP AI to help individuals find the best medical facilities for the symptoms or conditions they are experiencing right now, treating zipcode/postal code as one of the most important factors in an individual's health status.

## Use Case and Solution Approach

Individuals may want to know which illness or medical condition they may be facing at the moment, and which nearby medical facilities are most appropriate to treat it. EmergencyZIP AI uses AI capabilities to tell the individual both the likely illness or condition and the nearest appropriate medical facilities that can best treat it, treating zipcode/postal code as one of the most important factors in an individual's health status.

## Process

The AI relies on Structured Output/JSON Mode/Controlled Generation, Few-Shot Prompting, and Grounding. First, it uses JSON Mode to structure the patient's demographic information into a formalized JSON form (much like filling out a patient form to receive medical care, but simpler and more abstract). Next, it uses Few-Shot Prompting with Structured Output/Controlled Generation to turn that form into a formalized request for the AI. It then uses grounding to produce the most accurate potential diagnosis it can for the symptoms/conditions the individual is facing now. Using that diagnosis and the individual's current zipcode (which the AI extracts from the JSON form with Few-Shot Prompting), the AI uses grounding again to find the nearest and best medical facilities that can treat the diagnosed illness or condition. Finally, it uses grounding to inform the individual of the payment and insurance plan options that the suggested medical facilities accept for the treatment.
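The process above is a chain in which each step feeds the next. As a rough illustration, the sketch below wires the steps together with plain Python callables. `run_pipeline` and its parameters are hypothetical names, and the three callables are stand-ins for the notebook's `get_json_form`, `few_shot_func`, and `grounding_func`; here they are stubbed so the flow can be run offline without calling Gemini.

```python
import json
from typing import Callable, Dict

# Hypothetical orchestration sketch: the callables stand in for the notebook's
# get_json_form, few_shot_func and grounding_func (stubbed below, no API calls).
def run_pipeline(
    patient_row: Dict,
    to_json_form: Callable[[Dict], str],
    few_shot: Callable[[str, str], str],
    grounded: Callable[[str], str],
) -> Dict[str, str]:
    # Step 1: JSON Mode turns the raw patient record into a structured form.
    form = to_json_form(patient_row)
    # Step 2: few-shot prompting rewrites the form as a one-sentence request.
    request = few_shot(form, "request").strip()
    # Step 3: grounding (web search) proposes the most likely condition.
    diagnosis = grounded(
        f"{request} What disease are they most likely to experience now?"
    )
    # Step 4: few-shot prompting extracts the zipcode from the form.
    zipcode = few_shot(form, "zipcode").strip()
    # Step 5: grounding finds nearby facilities for that condition and zipcode.
    facilities = grounded(
        f"{diagnosis} Nearest best-rated medical facilities for zip code {zipcode}?"
    )
    # Step 6: grounding looks up accepted payment and insurance options.
    payments = grounded(
        f"For each of these locations: {facilities} "
        "what payments and insurance plans are accepted?"
    )
    return {"form": form, "request": request, "diagnosis": diagnosis,
            "zipcode": zipcode, "facilities": facilities, "payments": payments}

# Offline usage with stub functions standing in for the Gemini calls.
result = run_pipeline(
    {"zipcode": 90002, "symptoms": ["Dizziness"]},
    to_json_form=json.dumps,
    few_shot=lambda form, task: "90002\n" if task == "zipcode" else "A request.\n",
    grounded=lambda prompt: f"[answer to: {prompt[:20]}]",
)
print(result["zipcode"])  # → 90002
```

Passing the steps in as callables keeps the chaining logic separate from the API calls, so each step can be swapped or tested on its own.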

## Innovation/Novelty

The internet is a powerful tool for researching local medical facilities, but the sheer number of search results can make it messy and overwhelming to find the best facility for a given illness or medical condition, or even to work out which illness or condition the individual may actually have. EmergencyZIP AI tailors its responses to the individual and to the potential illness or medical condition they are facing now. Also, while doctors usually do a great job of diagnosing patients, human bias or error can enter the diagnosis process. The AI is designed to take these potential biases into account (by considering the patient's demographic information) and to give a diagnosis that is as accurate and unbiased as possible.

### Importing Relevant Libraries and Technologies 

I set up the Gemini API to implement the AI (the code retrieves the Google API key from Kaggle Secrets, which authorizes the AI's calls to Gemini) and imported pandas, `random`, and a few other libraries (NumPy and `os`) for a test run that checks whether the AI works as intended.

```python
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import random

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
```

```
/kaggle/input/patient-symptom-dataset/patient_symptom_data_with_demographics_imperial.csv
```


The cell below uninstalls `jupyterlab`, an unused package in Kaggle's base image that conflicts with the Google Generative AI package, and then installs the Google Gen AI SDK (`google-genai`, version 1.7.0).

```python
!pip uninstall -qqy jupyterlab  # Remove unused packages from Kaggle's base image that conflict
!pip install -U -q "google-genai==1.7.0"
```


Import the Google Gen AI SDK modules and the IPython display helpers used for the Generative AI capabilities.

```python
from google import genai
from google.genai import types

from IPython.display import HTML, Markdown, display
```

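```python
from google.api_core import retry

# Retry a request only when the API reports a rate limit (429)
# or that the service is temporarily unavailable (503).
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# Wrap generate_content so every call in this notebook retries automatically.
genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)
```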
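`retry.Retry` re-invokes the wrapped call with growing delays whenever the predicate returns `True` for the raised exception. The plain-Python sketch below illustrates that idea only. `FakeAPIError`, `should_retry`, and `call_with_retry` are hypothetical names, and the real `Retry` additionally randomizes and caps the delay and enforces an overall timeout.

```python
import time

class FakeAPIError(Exception):
    """Hypothetical stand-in for genai.errors.APIError that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code

def should_retry(e):
    # Same rule as the notebook: only rate limits (429) and temporary outages (503).
    return isinstance(e, FakeAPIError) and e.code in {429, 503}

def call_with_retry(func, *args, max_attempts=5, initial_delay=0.01, multiplier=2.0, **kwargs):
    """Call func, retrying retriable errors with exponentially growing delays."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            # Give up on non-retriable errors or once the attempt budget is used.
            if not should_retry(e) or attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= multiplier

# Usage: a call that hits the rate limit twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return "ok"

print(call_with_retry(flaky), calls["n"])  # → ok 3
```

Restricting retries to 429 and 503 matters: retrying a malformed request (for example a 400) would only repeat the same failure.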

Retrieve the Google API key from Kaggle Secrets so the notebook can authenticate with the Gemini API.

```python
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
```

```python
client = genai.Client(api_key=GOOGLE_API_KEY)
```

## Generative AI Capabilities Used and Reasoning 

These are the 3 Generative AI capabilities used in this AI, and why each is used:
* **Structured Output/JSON Mode/Controlled Generation**: JSON Mode loosely mimics the forms patients fill out when requesting care from a medical facility, but in a simpler and more abstract form. Structured Output and Controlled Generation keep the responses as accurate and precise as possible: this task calls for accuracy rather than creativity.
* **Few-Shot Prompting**: First, I use it to turn the patient form into a request that tells the AI who it is providing information for. The examples keep the request format consistent, so the information does not get muddled, which would make it harder for the AI to process. Second, I use it to extract the zipcode from the JSON form. Because the AI can hallucinate and return irrelevant or incorrect information, the examples show it exactly how to return the zipcode.
* **Grounding**: This is the central capability, since the AI must find all the relevant medical facilities nearest to the patient's zipcode/postal code. We are exploring many zipcodes, and the model may not know every zipcode or every medical facility within each one. We also do not have a document listing every zipcode and its medical facilities, so we rely on the internet (Google Search) for the answers.

### Generative AI Capability #1: Structured Output/JSON Mode/Controlled Generation

#### JSON Mode

I created a Python `TypedDict` class, `MedicalPatient`, that defines the JSON form as a simple and abstract patient form. I then created a `get_json_form` function that takes an entry (in this test case, a row of a pandas DataFrame) and structures it into a JSON form following the `MedicalPatient` class.

##### get_json_form function: 

**Description**: Takes an entry (in this case, a pandas DataFrame row) and structures its information into a formalized JSON form following the `MedicalPatient` class.

**Input**: ***patient_symptom_entry***: pandas DataFrame row (for now) containing all the information to be turned into the JSON form

**Output**: ***json_form***: String of formalized JSON form

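```python
import typing_extensions as typing

# Schema for the simplified patient form; JSON Mode fills in these fields.
# Note: an int zipcode drops leading zeros (e.g. 02115 becomes 2115).
class MedicalPatient(typing.TypedDict):
    zipcode: int
    age: int
    gender: str
    race: str
    ethnicity: str
    height: float  # inches
    weight: float  # pounds
    symptoms: list[str]

def get_json_form(patient_symptom_entry):
    # Render the row as plain text ("column  value" lines) for the model to read.
    entry_str = patient_symptom_entry.to_string()
    json_response = client.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(
            temperature=0.1,                        # near-deterministic output
            response_mime_type="application/json",  # JSON Mode
            response_schema=MedicalPatient          # constrain output to the schema
        ),
        contents=entry_str
    )
    json_form = json_response.text
    return json_form
```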
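JSON Mode asks the model to follow the `MedicalPatient` schema, but a cheap local check before the form is used downstream can catch a malformed or incomplete response early. The sketch below is a hypothetical helper (`validate_patient_form` and `EXPECTED_FIELDS` are not part of the notebook) that parses the returned string with `json.loads` and checks each field against the types declared in `MedicalPatient`.

```python
import json

# Expected keys and Python types, mirroring the MedicalPatient TypedDict.
# Heights and weights may come back as whole numbers, so int is accepted too.
EXPECTED_FIELDS = {
    "zipcode": int, "age": int, "gender": str, "race": str,
    "ethnicity": str, "height": (int, float), "weight": (int, float),
    "symptoms": list,
}

def validate_patient_form(json_form: str) -> dict:
    """Parse a JSON Mode response and check that it matches the MedicalPatient shape."""
    form = json.loads(json_form)
    missing = [key for key in EXPECTED_FIELDS if key not in form]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key, expected in EXPECTED_FIELDS.items():
        if not isinstance(form[key], expected):
            raise TypeError(f"{key!r} has unexpected type {type(form[key]).__name__}")
    if not all(isinstance(s, str) for s in form["symptoms"]):
        raise TypeError("every symptom should be a string")
    return form

# Usage with the form produced in the test run further down this notebook.
sample = ('{"zipcode": 90002, "age": 51, "gender": "Male", "race": "Native American", '
          '"ethnicity": "Not Hispanic or Latino", "height": 78.0, "weight": 227.1, '
          '"symptoms": ["Chest pain", "Dizziness", "Vomiting", "Breathing difficulties"]}')
form = validate_patient_form(sample)
print(form["zipcode"], len(form["symptoms"]))  # → 90002 4
```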

#### Structured Output/Controlled Generation

**Description**: I created the `get_general_model_config` function to build a Gemini generation configuration from the given temperature, top_p, and maximum-output-token values.

**Input**: 
* ***temperature***: float that controls the degree of randomness in token selection (lower values give more deterministic output)
* ***top_p***: float giving the cumulative-probability cutoff for candidate tokens: the model samples only from the most likely tokens whose probabilities add up to this value
* ***max_output_tokens***: int giving the maximum number of tokens the model may generate in its response

**Output**: a `types.GenerateContentConfig` object to pass as the `config` argument of `client.models.generate_content`

```python
def get_general_model_config(temperature, top_p, max_output_tokens):
    # Returns a configuration object (not model output); pass it as `config=`.
    config = types.GenerateContentConfig(
        temperature=temperature,
        top_p=top_p,
        max_output_tokens=max_output_tokens
    )
    return config
```
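To make the `temperature` and `top_p` parameters more concrete, the toy sketch below applies them to a made-up distribution over four tokens. It follows the commonly used definitions (temperature-scaled softmax, then nucleus filtering); it is an illustration under that assumption, not Gemini's internal implementation. `apply_temperature` and `top_p_filter` are hypothetical names, and temperature 0 is treated as greedy selection.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature: lower temperature sharpens the distribution."""
    if temperature == 0:
        # Treat temperature 0 as greedy decoding: all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving candidates sum to 1.
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(sorted(top_p_filter(apply_temperature(logits, 1.0), 0.8)))  # → [0, 1]
print(sorted(top_p_filter(apply_temperature(logits, 0.1), 0.8)))  # → [0]
```

At temperature 1.0 two tokens are needed to reach 80% of the probability mass, while at 0.1 the top token alone exceeds it. With `top_p=1`, as used in this notebook, no token is filtered out, so the low temperature does the work of keeping output consistent.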

### Generative AI Capability #2: Few-Shot Prompting

**Description**: I created the `few_shot_func` function to implement few-shot prompting: it sends a prompt containing worked examples, followed by the model input, to Gemini and returns the text response.

**Input**: 
* ***model_input***: String that the model should respond to (e.g. a JSON form)
* ***prompt***: String with the instruction for the Gemini model. In few-shot prompting, the prompt includes a few example inputs with their expected responses, so that the model knows what its output should look like and how it should be formatted
* ***config***: `types.GenerateContentConfig` object with the generation settings for the Gemini model (*see `get_general_model_config` for one way to build it*)

**Output**: ***str_response***: String of the Gemini-generated output based on the input parameters provided

```python
def few_shot_func(model_input, prompt, config):
    few_shot_response = client.models.generate_content(
        model='gemini-2.0-flash',
        config=config,
        contents=[prompt, model_input]
    )
    str_response = few_shot_response.text
    return str_response
```
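Few-shot examples are easy to get subtly wrong when typed by hand; for instance, an example missing a colon is no longer valid JSON and shows the model a malformed input. A minimal sketch, assuming the examples are kept as Python dictionaries, is to generate the prompt text with `json.dumps` so every example is valid JSON. `build_few_shot_prompt` is a hypothetical helper, not part of the notebook; its output could be passed as the `prompt` argument of `few_shot_func`.

```python
import json

def build_few_shot_prompt(instruction, examples):
    """Join an instruction and (input_dict, expected_output) pairs into one prompt string."""
    parts = [instruction, ""]
    for example_input, expected_output in examples:
        parts.append("EXAMPLE: ")
        # json.dumps always produces valid JSON; indent=0 puts each field on its own line.
        parts.append(json.dumps(example_input, indent=0))
        parts.append(expected_output)
        parts.append("")
    return "\n".join(parts)

examples = [
    ({"zipcode": 78664, "age": 30, "symptoms": ["Cough", "Fever"]}, "78664"),
    ({"zipcode": 77055, "age": 25, "symptoms": ["Joint pain"]}, "77055"),
]
zip_prompt = build_few_shot_prompt("Parse JSON to return only the Zipcode: ", examples)
print(zip_prompt)
```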

### Generative AI Capability #3: Grounding

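**Description**: I created the `grounding_func` function to implement grounding: it sends the prompt to Gemini with the Google Search tool enabled, so the response can draw on current web search results.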

**Input**: ***input_prompt***: String that is the input prompt to provide to the Gemini model to provide a response for

**Output**: ***str_response***: String that is the output provided by the Gemini model as response to the input prompt

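```python
def grounding_func(input_prompt):
    # Enable the Google Search tool so the answer is grounded in web results.
    config_with_search = types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    )
    response = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=input_prompt,
        config=config_with_search,
    )
    rc = response.candidates[0]
    # A grounded answer can be split across several parts; join every text part
    # instead of returning only the first one.
    str_response = "".join(part.text for part in rc.content.parts if part.text)
    return str_response
```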
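`grounding_func` returns only the answer text, so the web sources behind a grounded answer are discarded. The sketch below shows one way to list them, assuming the google-genai response shape in which `candidate.grounding_metadata.grounding_chunks` holds chunks whose `.web` attribute has `.title` and `.uri`. `list_grounding_sources` is a hypothetical helper; to use it, the function would need to keep `response.candidates[0]` as well. The demo uses `SimpleNamespace` objects in place of a real response, so it runs offline.

```python
from types import SimpleNamespace

def list_grounding_sources(candidate):
    """Return (title, uri) pairs for the web sources behind a grounded answer.

    Assumes the google-genai shape candidate.grounding_metadata.grounding_chunks,
    where each chunk has a .web attribute with .title and .uri.
    """
    metadata = getattr(candidate, "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        if web is not None:
            sources.append((web.title, web.uri))
    return sources

# Offline demo with a fake candidate in place of response.candidates[0].
fake_candidate = SimpleNamespace(grounding_metadata=SimpleNamespace(grounding_chunks=[
    SimpleNamespace(web=SimpleNamespace(title="example.org", uri="https://example.org/page")),
]))
print(list_grounding_sources(fake_candidate))  # → [('example.org', 'https://example.org/page')]
```

Showing these sources next to the recommendations would let an individual check the facility and insurance details for themselves.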

## EmergencyZIP AI Test Run

Interactive Python `input()` does not work well in Kaggle Notebooks, so instead we used ChatGPT to generate a dataset of patients with relevant demographic information and symptoms (the dataset below is ChatGPT-generated). The code then picks a random patient for a test run to check that the AI actually works as intended.

```python
patient_symptom_df = pd.read_csv('/kaggle/input/patient-symptom-dataset/patient_symptom_data_with_demographics_imperial.csv')
patient_symptom_df
```

| Unnamed: 0 | PatientID | ZipCode | Age | Gender | Symptoms | Race | Ethnicity | Height_in | Weight_lb |
|---|---|---|---|---|---|---|---|---|---|
| 0 | P001 | 70112 | 32 | Female | Joint pain;Runny nose;Loss of taste;Sore throat | Other | Hispanic or Latino | 73.6 | 229.3 |
| 1 | P002 | 90002 | 21 | Female | Joint pain;Muscle ache;Dry cough | White | Hispanic or Latino | 76.8 | 262.3 |
| 2 | P003 | 20001 | 46 | Male | Chest tightness;Sneezing;Fever;Breathing diffi... | Black or African American | Not Hispanic or Latino | 67.3 | 187.4 |
| 3 | P004 | 73301 | 45 | Male | Fatigue;Abdominal pain | White | Not Hispanic or Latino | 67.7 | 183.0 |
| 4 | P005 | 96813 | 23 | Non-binary | Rash;Sore throat;Loss of smell;Abdominal pain;... | Pacific Islander | Not Hispanic or Latino | 74.8 | 211.6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | P096 | 55414 | 80 | Female | Blurred vision;Dry cough | Black or African American | Hispanic or Latino | 66.5 | 264.6 |
| 96 | P097 | 10001 | 88 | Male | Joint pain;Palpitations | White | Not Hispanic or Latino | 61.8 | 152.1 |
| 97 | P098 | 55401 | 55 | Non-binary | Diarrhea;Palpitations;Low-grade fever;Joint pain | Native American | Hispanic or Latino | 68.5 | 163.1 |
| 98 | P099 | 68102 | 83 | Non-binary | Sneezing;Fatigue;Headache | Native American | Not Hispanic or Latino | 78.7 | 251.3 |
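The notebook asks Gemini's JSON Mode to map each row onto the `MedicalPatient` fields. For comparison, the same mapping can be written deterministically with pandas, which can serve as a baseline to check the model's form against. `row_to_patient_form` is a hypothetical helper; the column names come from the table above, and it assumes symptoms are always stored as one semicolon-separated string.

```python
import pandas as pd

def row_to_patient_form(row: pd.Series) -> dict:
    """Hypothetical deterministic mapping from a dataset row to the MedicalPatient fields."""
    return {
        "zipcode": int(row["ZipCode"]),
        "age": int(row["Age"]),
        "gender": str(row["Gender"]),
        "race": str(row["Race"]),
        "ethnicity": str(row["Ethnicity"]),
        "height": float(row["Height_in"]),   # inches
        "weight": float(row["Weight_lb"]),   # pounds
        # Symptoms are stored as one semicolon-separated string.
        "symptoms": [s.strip() for s in str(row["Symptoms"]).split(";") if s.strip()],
    }

# Usage with a one-row DataFrame copied from patient P004 in the table above.
sample_df = pd.DataFrame([{
    "PatientID": "P004", "ZipCode": 73301, "Age": 45, "Gender": "Male",
    "Symptoms": "Fatigue;Abdominal pain", "Race": "White",
    "Ethnicity": "Not Hispanic or Latino", "Height_in": 67.7, "Weight_lb": 183.0,
}])
print(row_to_patient_form(sample_df.iloc[0]))
```

If the dataset ever contains zipcodes with leading zeros, reading the CSV with `pd.read_csv(..., dtype={"ZipCode": str})` keeps them intact; the `int` conversion above would drop them.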


We use the `randint` function from Python's `random` module to select a random patient to process. We then call the `get_json_form` function, which uses JSON Mode to have the AI mimic filling out a simple, abstract patient form and to structure the patient's demographic information in a formalized way.

```python
# randint includes both endpoints, so the upper bound is the last valid row position.
patient_index = random.randint(0, len(patient_symptom_df) - 1)
patient_json_form = get_json_form(patient_symptom_df.iloc[patient_index])
print(patient_json_form)
```

```json
{
  "zipcode": 90002,
  "age": 51,
  "gender": "Male",
  "race": "Native American",
  "ethnicity": "Not Hispanic or Latino",
  "height": 78.0,
  "weight": 227.1,
  "symptoms": [
    "Chest pain",
    "Dizziness",
    "Vomiting",
    "Breathing difficulties"
  ]
}
```


In the first round of few-shot prompting, we turn the JSON form into a formalized request that tells the model what kind of patient it is retrieving information for. The examples keep the request format accurate and consistent: the model always knows what information to extract from the form and how to phrase it. We also use structured output/controlled generation (a low temperature) to keep the generated responses as consistent as possible.

```python
# Low temperature for consistent phrasing; top_p=1 and up to 500 output tokens.
model_config = get_general_model_config(0.1, 1, 500)

few_shot_prompt = """Parse JSON into a string request and only return output:

EXAMPLE:
{
"zipcode": 78664,
"age": 30,
"gender": "Female",
"race": "White",
"ethnicity": "Not Hispanic or Latino",
"height": 64.6,
"weight": 130.5,
"symptoms": ["Cough", "Fever", "Sore throat"]
}
"30-year old Not Hispanic or Latino White female patient with height of 64.6 inches and weight of 130.5 pounds from zipcode 78664 currently experiencing symptoms of cough, fever, and sore throat."

EXAMPLE:
{
"zipcode": 77055,
"age": 25,
"gender": "Non-binary",
"race": "Asian",
"ethnicity": "Not Hispanic or Latino",
"height": 60.5,
"weight": 100.2,
"symptoms": ["Joint pain", "Loss of smell"]
}
"25-year old Not Hispanic or Latino Asian non-binary patient with height of 60.5 inches and weight of 100.2 pounds from zipcode 77055 currently experiencing symptoms of joint pain and loss of smell."

EXAMPLE:
{
"zipcode": 78681,
"age": 15,
"gender": "Male",
"race": "Black",
"ethnicity": "Hispanic or Latino",
"height": 74.6,
"weight": 180.5,
"symptoms": ["Fatigue", "Loss of Vision", "Headache", "Joint pain"]
}
"15-year old Hispanic or Latino Black male patient with height of 74.6 inches and weight of 180.5 pounds from zipcode 78681 currently experiencing symptoms of fatigue, loss of vision, headache, and joint pain."
"""

patient_medical_request = few_shot_func(patient_json_form, few_shot_prompt, model_config)
print(patient_medical_request)
```

```
"51-year old Not Hispanic or Latino Native American male patient with height of 78.0 inches and weight of 227.1 pounds from zipcode 90002 currently experiencing symptoms of chest pain, dizziness, vomiting, and breathing difficulties."
```

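The few-shot examples teach the model a fixed sentence template. Because that template is deterministic, it can also be written directly in Python and used as a reference to check the model's request against (for example, comparing it with the model output after stripping the surrounding quotes). `format_request` is a hypothetical helper that reproduces the pattern of the examples, including their "30-year old" wording and the comma before the final symptom.

```python
def format_request(form: dict) -> str:
    """Hypothetical template reproducing the sentence pattern taught by the few-shot examples."""
    symptoms = [s.lower() for s in form["symptoms"]]
    if len(symptoms) == 1:
        symptom_text = symptoms[0]
    elif len(symptoms) == 2:
        symptom_text = f"{symptoms[0]} and {symptoms[1]}"
    else:
        # Three or more symptoms: comma-separated with ", and" before the last one.
        symptom_text = ", ".join(symptoms[:-1]) + f", and {symptoms[-1]}"
    return (
        f"{form['age']}-year old {form['ethnicity']} {form['race']} "
        f"{form['gender'].lower()} patient with height of {form['height']} inches "
        f"and weight of {form['weight']} pounds from zipcode {form['zipcode']} "
        f"currently experiencing symptoms of {symptom_text}."
    )

example = {"zipcode": 78664, "age": 30, "gender": "Female", "race": "White",
           "ethnicity": "Not Hispanic or Latino", "height": 64.6, "weight": 130.5,
           "symptoms": ["Cough", "Fever", "Sore throat"]}
print(format_request(example))
```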


In the first round of grounding, we ask for the single most likely diagnosis based on the symptoms and the other demographic information provided, especially the zipcode. With grounding, the model draws not only on the data it was trained on but also on other sources, such as Google Search results.

*Note: If you have restarted the Kaggle notebook, you may have to run the cell below twice to actually get a helpful and accurate answer.* 

```python
# Add a space so the question does not run into the end of the request sentence.
diagnosis_prompt = patient_medical_request.strip() + " What disease are they most likely to experience now? Only return the singular most likely disease, please."
diagnosis = grounding_func(diagnosis_prompt)
Markdown(diagnosis)
```

To determine the most likely disease, I need to consider the patient's characteristics (age, ethnicity, sex, height, weight), location (zip code), and symptoms. Let's use search queries to narrow down potential conditions.



In the second round of few-shot prompting, we extract the patient's zipcode from the JSON form, which is now our source of patient information. The examples keep the output accurate for the current layout of the JSON form (a simplified, abstract patient form).

```python
# Temperature 0 and at most 50 output tokens: the answer is just a zipcode.
model_config_two = get_general_model_config(0, 1, 50)

few_shot_prompt_two = """Parse JSON to return only the Zipcode:

EXAMPLE:
{
"zipcode": 78664,
"age": 30,
"gender": "Female",
"race": "White",
"ethnicity": "Not Hispanic or Latino",
"height": 64.6,
"weight": 130.5,
"symptoms": ["Cough", "Fever", "Sore throat"]
}
78664

EXAMPLE:
{
"zipcode": 77055,
"age": 25,
"gender": "Non-binary",
"race": "Asian",
"ethnicity": "Not Hispanic or Latino",
"height": 60.5,
"weight": 100.2,
"symptoms": ["Joint pain", "Loss of smell"]
}
77055

EXAMPLE:
{
"zipcode": 78681,
"age": 15,
"gender": "Male",
"race": "Black",
"ethnicity": "Hispanic or Latino",
"height": 74.6,
"weight": 180.5,
"symptoms": ["Fatigue", "Loss of Vision", "Headache", "Joint pain"]
}
78681
"""

patient_zipcode = few_shot_func(patient_json_form, few_shot_prompt_two, model_config_two)
print(patient_zipcode)
```

```
90002
```
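Because the JSON form is already valid JSON, the zipcode can also be read deterministically with `json.loads`, without a model call. This is an alternative to the few-shot approach used in the notebook, not what the notebook does. `extract_zipcode` is a hypothetical helper, and the zero-padding assumes US 5-digit ZIP codes, restoring the leading zero that the `int` field in `MedicalPatient` would drop.

```python
import json

def extract_zipcode(json_form: str) -> str:
    """Read the zipcode straight from the JSON form and return it as a 5-digit string."""
    zipcode = json.loads(json_form)["zipcode"]
    # An int schema drops leading zeros (02115 -> 2115); pad back to 5 digits.
    return f"{int(zipcode):05d}"

print(extract_zipcode('{"zipcode": 90002, "age": 51}'))  # → 90002
print(extract_zipcode('{"zipcode": 2115}'))              # → 02115
```

A deterministic lookup cannot hallucinate and costs no API quota; the few-shot version is mainly useful when the input is free text rather than well-formed JSON.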



In the second round of grounding, we ask for the nearest appropriate medical facilities where the individual can be treated, based on the diagnosis and zipcode. Again, grounding lets the model draw not only on the data it was trained on but also on other sources, such as Google Search results.

*Note: For helpful and accurate results, we recommend running the cell below only once.*

```python
# Separate the diagnosis from the question and drop the trailing newline from the zipcode.
recommendations_prompt = diagnosis.strip() + ' Can you get me the nearest best-rated medical facilities with their addresses that can treat the disease for zip code ' + str(patient_zipcode).strip() + '?'
recommendations = grounding_func(recommendations_prompt)
Markdown(recommendations)
```

Okay, I will search for the highest-rated medical facilities near the 90002 zip code. Here are some options based on my search:

*   **Watts Medical Offices - Kaiser Permanente:** Located within the 90002 zip code at 1465 E 103rd St, Los Angeles, CA 90002.
*   **Watts Healthcare Corporation:** Located at 10300 Compton Ave, Los Angeles, CA 90002.

Other highly-rated hospitals within a few miles of the 90002 zip code include:

*   **PIH Health Good Samaritan Hospital:** Located at 1225 Wilshire Blvd, Los Angeles, CA 90017.
*   **Adventist Health White Memorial:** Located at 1720 E Cesar E Chavez Ave, Los Angeles, CA 90033.
*   **California Hospital Medical Center:** Located at 1401 South Grand Avenue, Los Angeles, CA 90015.
*   **UCLA Medical Center:** Located at 757 Westwood Plaza, Los Angeles, CA 90095. Nationally ranked and the #1 hospital in Los Angeles and California.
*   **Cedars-Sinai Medical Center:** Located at 8700 Beverly Boulevard, Los Angeles, CA 90048. Also nationally ranked.
*   **Keck Medical Center of USC:** Located in Los Angeles.

It's worth noting that hospital ratings can vary based on the source (e.g., U.S. News & World Report, Centers for Medicare & Medicaid Services (CMS)).


In the third round of grounding, we look up the payment and insurance plan options accepted at each location recommended for treatment. As before, grounding lets the model draw not only on the data it was trained on but also on other sources, such as Google Search results.

*Note: For helpful and accurate results, we recommend running the cell below only once.*

```python
payments_prompt = "For each of the locations recommended to me: " + recommendations + " what types of payments and insurance plans are accepted for patients coming in for treatment?"
payments = grounding_func(payments_prompt)
Markdown(payments)
```

It's important to contact each facility directly or check their website to confirm the most up-to-date information on accepted insurance plans and payment options. Insurance plans and coverage can change, so direct confirmation is always best.

Here's a summary of the payment and insurance information for each facility, based on the search results:

*   **Watts Medical Offices - Kaiser Permanente:**
    *   Kaiser Permanente plans are accepted.
    *   They follow the same quality review processes for practitioners and facilities in Marketplace Silver-tier plans as they do for all other Kaiser Foundation Health Plan products and lines of business.
    *   Contact them directly to confirm specific plan coverage.

*   **Watts Healthcare Corporation:**
    *   Accepts most insurance plans, including Medi-Cal Managed Care options, California Health and Wellness, and Anthem Blue Cross Partnership Plans, Medicare, and private pay.
    *   Offers a sliding fee discount program for those who are uninsured, underinsured, ineligible for government programs, or otherwise unable to pay.
    *   They have certified enrollment counselors to help with applying for Medicaid, Medicare, or insurance plans through California's insurance marketplace.
    *   All insurances are accepted. Sliding scale is available for those without insurance.

*   **PIH Health Good Samaritan Hospital:**
    *   Accepts Aetna, Affiliated Health Funds, Anthem Blue Cross, Beech Street Network, Blue Shield, Centivo, Cigna, First Health Network / Coventry Health Care, and Health Net.
    *   For HMO plans, patients need to select a PIH Health Physician as their Primary Care Physician.

*   **Adventist Health White Memorial:**
    *   Accepts most major health plans, including Medi-Cal and Medicare.
    *   They can assist with enrolling patients into a plan that fits their needs.
    *   Accepts L.A. Care, Health Net, Blue Shield Promise, Anthem, and Molina for Managed Care Medi-Cal.
    *   Offers financial assistance and payment options for those who are uninsured or need additional resources.

*   **California Hospital Medical Center:**
    *   It's advisable to contact the hospital directly or visit their website to get the most accurate and up-to-date information on accepted insurance plans and payment options.

*   **UCLA Medical Center:**
    *   Accepts Medicare-assignment and private indemnity insurance.
    *   Participates in over 100 local and national managed care networks.
    *   It's best to call 1-800-UCLA-MD1 to confirm whether your specific insurance is accepted.
    *   They accept HMOs, PPOs, Medicare, Medicare Advantage and Medi-Cal.

*   **Cedars-Sinai Medical Center:**
    *   Contracted with over 100 types of insurance plans, including private insurance, HMO, PPO, POS, EPO, Medicare, and Medi-Cal.
    *   If a plan is not listed, it is recommended to contact the health insurance provider directly.

*   **Keck Medical Center of USC:**
    *   Accepts Medicare-assignment and private insurance.
    *   Works with many local and national managed care networks.
    *   It's recommended to call (800) USC-CARE to check if your insurance is accepted.
    *   They work closely with patients and their insurance company to create a seamless payment process.
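The payments prompt pastes the entire recommendations text into a single request. A hypothetical variation, not what the notebook does, is to pull each facility out of the Markdown list and ask about payments one facility at a time, which keeps each grounded search focused. `extract_facilities` and `build_payment_prompts` are illustrative names, and the regular expression assumes the `*   **Name:** details` bullet format seen in the sample output above, which the model is not guaranteed to repeat.

```python
import re

# Matches bullet lines such as "*   **Name:** Located at ..." from the sample output.
FACILITY_PATTERN = re.compile(
    r"^\s*\*\s+\*\*(?P<name>[^*]+?):\*\*\s*(?P<details>.*)$", re.MULTILINE
)

def extract_facilities(recommendations_md: str) -> list:
    """Return (name, details) pairs for each bolded facility bullet."""
    return [(m.group("name").strip(), m.group("details").strip())
            for m in FACILITY_PATTERN.finditer(recommendations_md)]

def build_payment_prompts(facilities: list) -> list:
    """One focused payments question per facility, e.g. for grounding_func."""
    return [f"What types of payments and insurance plans does {name} ({details}) "
            "accept for patients coming in for treatment?"
            for name, details in facilities]

sample = """Okay, I will search for the highest-rated medical facilities near the 90002 zip code.

*   **Watts Healthcare Corporation:** Located at 10300 Compton Ave, Los Angeles, CA 90002.
*   **Keck Medical Center of USC:** Located in Los Angeles.
"""
facilities = extract_facilities(sample)
for prompt in build_payment_prompts(facilities):
    print(prompt)
```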


## Limitations and Possible Next Steps

While the AI works as intended, this is a very basic implementation (to be honest, we ran short on time to add the more advanced features and capabilities that would make the AI run more efficiently and impressively). Possible next steps and future directions are as follows:

## Conclusion and Potential Impact 

We intend for the AI to empower individuals to make informed decisions that are best for their health and to learn more about the variety of medical options available to them, especially during illness and poor health. We hope that this AI encourages individuals to take the initiative to become healthier and more health-conscious, and that it helps improve health outcomes across as many zipcodes/postal codes as possible.