# Prescription Evaluation with GPT-4 and openFDA

This notebook explores using a large language model API, integrated with a drug database, to check a doctor's prescription for potential drug interactions and adverse effects specific to the patient.

In [None]:
from tqdm import tqdm

## GPT API

In [None]:
!pip install "openai<1.0"  # openai.ChatCompletion (used below) was removed in openai 1.0
import openai

Collecting openai
  Using cached openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0


In [None]:
# Initialize the OpenAI API with your API key
openai.api_key = 'your_key'  # Replace with your GPT key

def ask_gpt(question, model="gpt-3.5-turbo"):
    """
    Query an OpenAI chat model with a given question.

    Parameters:
    - question (str): The input question or prompt for the model.
    - model (str): Name of the chat model to query (defaults to "gpt-3.5-turbo").

    Returns:
    - str: The model's response.
    """

    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a knowledgeable medical database designed to provide concise and direct answers to medical questions."},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message['content']

In [None]:
prompt = """I am a doctor, I would like you to check my prescription:
medical history: Hypertension, Type 2 Diabetes, and Asthma.
symptoms: Persistent cough, fever, and fatigue.
My prescription: Lisinopril 10mg daily, Metformin 500mg twice daily, and Albuterol as needed for asthma attacks.
Drug contexts:
- Lisinopril: Ingredients: ACE inhibitor. Adverse effects: Dizziness, dry cough, elevated blood potassium levels.
- Metformin: Ingredients: Oral antihyperglycemic agent. Adverse effects: Stomach upset, diarrhea, low blood sugar.
- Albuterol: Ingredients: Bronchodilator. Adverse effects: Tremors, nervousness, increased heart rate.

Please answer the following questions in concise point form, taking into account the provided drug context:
- Possible interactions between prescribed drugs?
- Adverse effect of given drugs that are specifically related to patient’s pre-existing conditions and medical history?

At the end of your answer, evaluate the level of dangerousness of this treatment, based on interactions and adverse effects. Dangerousness is categorized as: LOW, MEDIUM, HIGH
Your answer should look like this:
`
* interactions:
- <interaction 1>
- <interaction 2>
- ...

* adverse effects:
- <adverse effect 1>
- <adverse effect 2>
- ...

* dangerousness: <LOW / MEDIUM / HIGH>
`

Note that you don't have to include any interactions or adverse effect, only those that are necessary.
"""

response = ask_gpt(prompt)
print(response)

## OpenFDA API

In [None]:
import requests

def trim_openfda_response(json_response):
    """Trim the openFDA JSON response to include only specific fields.

    Parameters:
    - json_response (dict): The raw JSON response from the openFDA API.

    Returns:
    - dict: A trimmed version of the JSON response.
    """

    # List of desired fields
    desired_fields = [
        "spl_product_data_elements",
        "boxed_warning",
        "contraindications",
        "drug_interactions",
        "adverse_reactions",
        "warnings"
    ]

    trimmed_response = {}

    # Check that a response was returned and that it contains at least one result
    if json_response and json_response.get('results'):
        first_result = json_response['results'][0]
        for field in desired_fields:
            if field in first_result:
                trimmed_response[field] = first_result[field]

    return trimmed_response

def search_openfda_drug(drug_name):
    """Search for a drug in the openFDA database.

    Parameters:
    - drug_name (str): The name of the drug to search for.

    Returns:
    - dict: The JSON response from the openFDA API containing drug information, or None if there's an error.
    """

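    base_url = "https://api.fda.gov/drug/label.json"
    # Quote the name so multi-word drugs (e.g. "calcium chloride") are searched as a phrase;
    # passing params lets requests handle the URL encoding
    params = {"search": f'openfda.generic_name:"{drug_name}"', "limit": 1}

    try:
        response = requests.get(base_url, params=params, timeout=30)
    except requests.RequestException as exc:
        # No response object exists here, so report the exception itself
        print(f"Error encountered searching for drug {drug_name}: {exc}")
        return None

    # Check for successful request
    if response.status_code == 200:
        return response.json()

    print(f"Search for drug {drug_name} failed with code {response.status_code}.")
    return None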

In [None]:
# Test the function
drug_info = search_openfda_drug("Lisinopril")
print(drug_info)



### Text summarization

Each field contains too many words for GPT models, so we need to trim it down by summarizing it.

**Ignore this for now: if we use GPT-3.5 Turbo with a 16K context, we don't need any of this.**
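
Whether summarization is needed at all can be checked with a rough token estimate. The sketch below assumes the common rule of thumb of about 4 characters per token; `estimate_tokens_by_field` and the sample dictionary are hypothetical and only for illustration, and an exact count would require the model's own tokenizer.

```python
def estimate_tokens_by_field(trimmed_data, chars_per_token=4):
    """Roughly estimate the token count of each field (assumes ~4 characters per token)."""
    estimates = {}
    for key, value in trimmed_data.items():
        # Label fields are usually lists of strings; join them into one string
        text = " ".join(map(str, value)) if isinstance(value, list) else str(value)
        estimates[key] = len(text) // chars_per_token
    return estimates


# Hypothetical trimmed record, only for illustration
sample = {"warnings": ["a" * 400], "contraindications": ["b" * 40, "c" * 39]}
print(estimate_tokens_by_field(sample))
```

If the estimates for all fields of all drugs sum to well under the model's context window, the summarization step can be skipped.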

In [None]:
def get_word_count_by_field(trimmed_data):
    """Calculate the word count for each field in the trimmed data.

    Parameters:
    - trimmed_data (dict): The trimmed data dictionary.

    Returns:
    - dict: Dictionary where keys are field names and values are word counts.
    """

    word_counts = {}

    for key, value in trimmed_data.items():
        if isinstance(value, list):
            # If the value is a list, convert it into a single string
            value_str = ' '.join(value)
        else:
            value_str = str(value)

        word_counts[key] = len(value_str.split())

    return word_counts


In [None]:
!pip install transformers


Collecting transformers
  Downloading transformers-4.33.2-py3-none-any.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.1-py3-none-any.whl (294 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

We choose BART (the `facebook/bart-large-cnn` checkpoint, which is fine-tuned for news summarization) for its simplicity and strong summarization performance.

In [None]:
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)



In [None]:
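def chunk_text(text, chunk_size=3072, max_chunks=5):
    """Break the text into character chunks for processing.

    The default chunk size targets BART's 1024-token input limit, assuming about
    4 characters per token (3072 characters is roughly 768 tokens, leaving headroom).
    Any text beyond max_chunks chunks is dropped.
    """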
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks[:max_chunks]

def summarize_text(text, max_output_length=100):
    """Summarize the input text using BART."""
    # Break the text into chunks
    text_chunks = chunk_text(text)

    summarized_chunks = []

    for chunk in tqdm(text_chunks):
        # BART needs no task prefix ("summarize: " is a T5 convention), so encode the chunk directly
        inputs = tokenizer.encode(chunk, return_tensors="pt", max_length=1024, truncation=True)
        summary_ids = model.generate(inputs, max_length=max_output_length, min_length=20, length_penalty=2.0, num_beams=4, early_stopping=True)
        summarized_chunks.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

    # Join the summarized chunks to form a final summary
    return ' '.join(summarized_chunks)


In [None]:
def summarize_field(info, field):
    """Summarize a single field of the trimmed drug info (empty text if the field is missing)."""
    # Label fields are lists of strings; take the first entry
    field_text = info.get(field, [""])[0]

    return summarize_text(field_text)

In [None]:
def summarize_fields(data):
    """
    Summarizes the specified fields in the input dictionary.

    Parameters:
    - data (dict): Dictionary containing fields to be summarized.

    Returns:
    - dict: Dictionary with summarized fields.
    """

    # Fields to be summarized
    fields_to_summarize = [
        "contraindications",
        "drug_interactions",
        "adverse_reactions",
        "warnings"
    ]

    summarized_data = {}

    for field in fields_to_summarize:
        if field in data:
            summarized_data[field] = summarize_text(data[field][0])

    return summarized_data

In [None]:
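# Trim the Lisinopril record fetched earlier, then summarize its long fields
trimmed_info = trim_openfda_response(drug_info)
summaries = summarize_fields(trimmed_info)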

100%|██████████| 1/1 [00:41<00:00, 41.10s/it]
100%|██████████| 3/3 [01:42<00:00, 34.33s/it]
100%|██████████| 3/3 [01:35<00:00, 31.76s/it]
100%|██████████| 4/4 [01:37<00:00, 24.41s/it]


In [None]:
# Print the approximate token count of each summary (about 4 characters per token)
for field, summary in summaries.items():
    print(field + ": " + str(len(summary) // 4))

contraindications: 70
drug_interactions: 205
adverse_reactions: 215


## Test entry

Let's test the entire pipeline by providing a sample patient history and a sample doctor's prescription.

### Inputs

We will be using MIMIC for data entries. Each entry in the dataset contains the patient history, symptoms, and prescriptions. We parse the drug names out of each entry and pass each one to openFDA for more information.

In [None]:
input_text = "admission date discharge date date birth sex service medicine allergy known allergy adverse drug reaction attending last name un chief complaint shortness breath major surgical invasive procedure esophagogastroduodenoscopy endoscopic clipping intubation extubation history present illness yo male history recent admission av block presumed lyme disease htn dm prior imi w systolic chf ef h gib unclear etiology presented acute onset dyspnea lying bed home lasting two hour patient discharged cardiology service following admission dyspnea found new heart block elevated troponin st change ekg concern new onset av block secondary lyme disease patient discharged home ceftriaxone also restarted aspirin admission tonight patient started dyspnea home rest pt denies chest pain complains nausea episode non bloody emesis looked dark brown got go bathroom felt lightheaded fell hit head lose consciousness pt endorses dark stool noticed since starting iron pt denies fever cough abdominal pain ed initial v ra exam significant pale conjunctiva guaiac positive dark stool lab notable hct previous hct day ago wbc potassium bicarb creatinine lactate inr ekg significant sinus rhythm street address elevation ii iii avf twi avl felt consistent prior two gauge iv placed patient transfused unit prbcs additional unit crossmatched ng lavage marroon return clear patient given protonix bolus gtt potassium patient received calcium chloride insulin albuterol amp sodium bicarb emergency department patient noted worsening dyspnea telemetry became bradycardiac three four epidoses lasting approximately one minute patient heart rate improved spontaneously require atropine transfer patient sinus tach sbp arrival micu patient feel comfortable endorses intermittent dyspnea chest pain ng tube place draining dark brown red fluid denies abdominal pain nausea vomiting diarrhea past medical history schf ef reported dm complicated neuropathy ckd ac htn hl ckd baseline cr chronic anemia uncertain 
etiology baseline high chronic leukocytosis chronic gi bleed uncertain etiology barretts esophagus prior sbo adhesion p loa social history life hospital wife bedbound m name stitle full time caretaker one son life home retired former pack year smoker quit year ago former beer drinker denies illicits family history mother died name ni father died cancer grandfather died mi dm physical exam admission general appearance acute distress eye conjunctiva perrl lymphatic cervical wnl cardiovascular normal normal peripheral vascular bilateral dp pulse respiratory chest crackle base bilaterally abdominal soft non tender bowel sound present skin warm pertinent result lab admission wbc rbc hgb hct mcv mch mchc rdw glucose urea n creat sodium potassium chloride total co anion gap lactate k pt ptt inrpt hematocrit blood hct blood hct blood hct pm blood hct blood hct pm blood hct lactate blood lactate k blood lactate blood lactate microbiology blood culture x ngtd urine culture growth lyme serology antibody b burgdorferi detected eia imaging tte left atrium mildly dilated left ventricular wall thickness normal posterior wall thin fibrotic akinetic left ventricular cavity size normal overall left ventricular systolic function moderately depressed lvef secondary akinesis inferior posterior wall tissue doppler imaging suggests increased left ventricular filling pressure pcwp mmhg right ventricular free wall thickness normal right ventricular chamber size normal depressed free wall contractility aortic valve leaflet mildly thickened minimally increased gradient consistent minimal aortic valve stenosis mitral valve leaflet mildly thickened mitral valve prolapse moderate mitral regurgitation seen tricuspid valve leaflet mildly thickened moderate pulmonary artery systolic hypertension trivial physiologic pericardial effusion echocardiographic sign tamponade compared finding prior study image reviewed left ventricular ejection fraction reduced secondary extensive inferior posterior wall 
dysfunction cxr chin elevated tip endotracheal tube upper margin clavicle le cm carina probably acceptable position tube could advanced mm secure seating pulmonary edema mild atelectasis left base new mild cardiomegaly stable pleural effusion pneumothorax brief hospital course mr known lastname year old male history av nodal blockade htn dm prior imi systolic chf ef h gib unclear etiology presenting dyspnea hematemesis secondary upper gib well myocardial ischemia setting gib gi bleed upper gi bleed demonstrated hematemesis ng lavage bloody fluid initially given unit prbcs l ivf improvement hemodynamics patient evidence active end organ ischemia given troponin elevation st change ekg elevated lactate patient received total unit prbc well one ffp platelet transfusion gi saw patient performed endoscopy twice first provide adequate visualization due significant bleeding second endoscopy visualized vascular lesion consistent dieulafoy lesion clipped post procedure patient remained hemodynamically stable stable hct require transfusion hematocrit remained stable floor ppi transitioned iv po repeat endoscopy day discharge showed barretts biopsy taken repeat egd week myocardial ischemia patient likely demand ischemia setting gib without chest pain patient troponin elevation prior hospitalization setting renal failure repeat tte performed compared finding prior study image reviewed left ventricular ejection fraction reduced secondary extensive inferior posterior wall dysfunction atrius cardiology evaluated patient beta blocker initially held acute gi bleed restarted stable heart rhythm stable occasional nd degree block similar previous hospitalization asa restarted need restarted discretion pcp cardiologist restarted home dos lisinopril hctz restarted mg metoprolol succinate follow atrius cardiology lyme carditis av block patient presented osh new onset high grade av block narrow complex junctional escape rhythm patient currently undergoing empiric treatment lyme disease 
given history tick exposure initial lyme serology negative repeated still negative continued ceftriaxone project day course end cardiology feel pacer indicated time given improvement treatment hyperkalemia unclear etiology improved ed following administration calcium bicarb insulin likely secondary ckd potassium normalized wnl time transfer floor remained stable chf tte ef hospitalization history ef prior tte repeat tte ckd creatinine increased baseline possibly setting poor perfusion setting hemorrhage patient cr remained elevated time transfer slowly trended back toward baseline leukocytosis baseline elevated wbc additional elevation felt secondary inflammatory state created gi bleed myocardial ischemia dm continued home dose lantus insulin sliding scale transitional issue need asa restarted need lab checked pcp follow visit need uptitration bb tolerated code status full communication son name ni telephone fax wife name ni telephone fax follow appts gi cardiology id pcp medication admission ceftriaxone g iv qh course complete simvastatin mg daily insulin glargine unit qhs omeprazole mg hospital ferrous sulfate mg hospital aspirin mg daily discharge medication atorvastatin mg tablet sig one tablet po daily daily disp tablet refill insulin glargine unit ml solution sig eighteen unit subcutaneous bedtime omeprazole mg capsule delayed releasee c sig one capsule delayed releasee c po twice day ceftriaxone dextroseiso o gram ml piggyback sig two g intravenous qh every hour day last day completed disp q vial refill sodium chloride syringe sig see ml injection qh every hour needed line flush sodium chloride flush ml iv qh prn line flush peripheral line flush ml normal saline every hour prn heparin porcine pf unit ml syringe sig see ml intravenous prn needed needed line flush heparin flush unit ml ml iv prn line flush picc heparin dependent flush ml normal saline followed heparin daily prn per lumen order filled pharmacy dosage form syringe strength unit ml metoprolol 
succinate mg tablet extended release hr sig one tablet extended release hr po day disp tablet extended release hr refill lisinopril mg tablet sig one tablet po daily daily hydrochlorothiazide mg capsule sig one capsule po daily daily discharge disposition home service facility year digit discharge diagnosis primary diagnosis upper gastrointestinal bleed dieulafoys lesion non st elevation myocardial infarction secondary demand ischemia secondary diagnosis chronic systolic congestive heart failure pulmonary hypertension chronic kidney disease barretts esophagus hypertension hyperlipidemia discharge condition mental status clear coherent level consciousness alert interactive activity status ambulatory requires assistance aid walker cane discharge instruction dear mr known lastname pleasure caring hospital admitted serious gastrointestinal bleed required endoscopic procedure intensive care unit procedure bleeding controlled blood test following intervention stable heart trouble previous hospitalization kept close eye well made following change medication continue ceftriaxone g iv daily stopped simvastatin start atorvastatin mg instead changed metoprolol mg daily continue take med prescribed weigh every morning name md md weight go lb followup instruction department infectious disease monday first name namepattern name md md telephone fax building lm hospital unit name hospital campus west best parking hospital ward name garage name last name lf first name lf location location un university college primary care address hospital university college numeric identifier phone telephone fax appt thursday also need seen cardiologist one month follow please call make appointment also need repeat endoscopy week called gi department schedule hearsd e week need call telephone fax schedule"

Next, we parse the note into three categories: pre-existing conditions, symptoms, and prescriptions. We also extract the drug names into a list. In the absence of that process, we will use a stub.
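
A stub of this kind could look like the sketch below. `parse_input_stub` is a hypothetical name, and the fixed values are taken from the hand-written example prompt earlier in this notebook, not from the MIMIC note; it returns the same `(parsed_notes, drugs)` pair that a real parser would.

```python
def parse_input_stub(input_text):
    """Hypothetical stand-in for a real parser: ignores input_text and returns fixed values."""
    parsed_notes = (
        "Patient's medical history:\n"
        "- Hypertension\n- Type 2 Diabetes\n- Asthma\n\n"
        "Patient's symptoms:\n"
        "- Persistent cough\n- Fever\n- Fatigue\n\n"
        "Prescription:\n"
        "- Lisinopril 10mg daily\n"
        "- Metformin 500mg twice daily\n"
        "- Albuterol as needed for asthma attacks\n"
    )
    drugs = ["Lisinopril", "Metformin", "Albuterol"]
    return parsed_notes, drugs
```

Swapping the stub in for the real parser makes it possible to exercise the openFDA lookup and the final prompt without spending GPT calls.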

In [None]:
def parse_input(input_text):
    parse_input_prompt = f"""
    Please parse the following medical note in point form, without losing any important information:
    `{input_text}`

    your answer should look like:
    `Patient's medical history:
    - <point 1>
    - <point 2>
    - ...

    Patient's symptoms:
    - <point 1>
    - <point 2>
    - ...

    Prescription:
    - ...

    DRUGS: <drug 1>, <drug 2>, ...
    `
    Please be reminded to give the generic names for the drugs, remember to list the drugs in a comma separated list as shown
    """

    parsed_notes = ask_gpt(parse_input_prompt)

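    # Extract the line that lists the drugs; tolerate indentation and a missing line
    drug_lines = [line.strip() for line in parsed_notes.split("\n") if line.strip().startswith("DRUGS:")]
    if not drug_lines:
        return parsed_notes, []

    # Strip the "DRUGS:" prefix, split on commas, and drop empty entries
    drugs = [name.strip() for name in drug_lines[0][len("DRUGS:"):].split(",") if name.strip()]

    return parsed_notes, drugs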

In [None]:
parsed_notes, drug_names = parse_input(input_text)

In [None]:
print(parsed_notes)

Patient's medical history:
- Av block
- Presumed Lyme disease
- HTN (hypertension)
- DM (diabetes mellitus)
- Prior IMI (ischemic myocardial infarction)
- Systemic systolic heart failure
- H. gib (hematemesis) with unclear etiology
- AFib (atrial fibrillation)
- CHF (chronic heart failure) with reduced EF (ejection fraction)
- CKD (chronic kidney disease)
- Anemia of uncertain etiology
- GI (gastrointestinal) bleed of uncertain etiology
- Barrett's esophagus
- Prior SBO (small bowel obstruction) with adhesion

Patient's symptoms:
- Shortness of breath
- Acute onset dyspnea
- Recent admission with AV block and presumed Lyme disease
- Chest pain
- Nausea and non-bloody dark brown emesis
- Dark stool
- Lightheadedness
- Fall and head injury

Prescription:
- Ceftriaxone (IV)
- Aspirin (restarted)
- Protonix (IV)
- Potassium (IV)
- Calcium chloride (IV)
- Insulin
- Albuterol
- Sodium bicarbonate

DRUGS: Ceftriaxone, Aspirin, Protonix, Potassium, Calcium chloride, Insulin, Albuterol, Sodium 

In [None]:
drug_names

['']

**We might need to trim each field down further (or summarize), in order to fit into the token limit**

Two approaches:
1. Summarize each long entry (more accurate)
2. Hard-limit the number of characters in each entry (faster but less accurate)

**Ignored for now**
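
For reference, the second approach (a hard character limit) can be sketched in a few lines. `hard_limit_fields` and the default of 2000 characters are assumptions chosen for illustration, not values used elsewhere in this notebook.

```python
def hard_limit_fields(trimmed_data, max_chars=2000):
    """Truncate each field to at most max_chars characters (hypothetical helper)."""
    limited = {}
    for key, value in trimmed_data.items():
        # Label fields are usually lists of strings; join them before truncating
        text = " ".join(map(str, value)) if isinstance(value, list) else str(value)
        limited[key] = text[:max_chars]
    return limited
```

Truncation is cheap, but it silently drops whatever falls past the limit, which is why it is the less accurate of the two options.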

The character count for 3 drugs looks fine for a 16k-token model (about 4000 tokens), so we can most likely fit 7-8 drugs comfortably.
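
The estimate can be made explicit with a small budget calculation. In the sketch below, the 16,000-token window and the 4,000 tokens for 3 drugs come from the text above; the two reserve values are assumptions chosen only to illustrate the arithmetic.

```python
CONTEXT_TOKENS = 16_000      # approximate window of the 16k model
TOKENS_FOR_3_DRUGS = 4_000   # estimate quoted above
NOTES_RESERVE = 3_000        # assumption: parsed notes plus instructions
ANSWER_RESERVE = 1_000       # assumption: room left for the model's reply

# Tokens left for drug contexts, then how many drugs fit at ~4000/3 tokens each
available = CONTEXT_TOKENS - NOTES_RESERVE - ANSWER_RESERVE
max_drugs = (available * 3) // TOKENS_FOR_3_DRUGS
print(max_drugs)
```

Under these assumptions the upper bound is 9 drugs, so 7-8 leaves some margin for longer labels.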

In [None]:
def get_drug_info_string(drug_names):
    """Build a string with one line per drug, of the form `drug: trimmed openFDA info`."""
    drug_info_string = ""
    for drug in drug_names:
        # Query openFDA once per drug and reuse the response
        info = search_openfda_drug(drug)
        drug_info_string += drug + ": " + str(trim_openfda_response(info)) + "\r\n"
    return drug_info_string


In [None]:
drug_info_string = get_drug_info_string(drug_names)

In [None]:
print(drug_info_string)




In [None]:
prompt = f"""I am a doctor, I would like you to check my prescription:
{parsed_notes}

Drug contexts:
{drug_info_string}

Please answer the following questions in concise point form, taking into account the provided drug context:
- Possible interactions between prescribed drugs?
- Adverse effects of the given drugs? Only list those that are specifically related to the patient’s pre-existing conditions and symptoms.

At the end of your answer, evaluate the level of dangerousness of this treatment, based on interactions and adverse effects that are specific to the patient. Dangerousness is categorized as: LOW, MEDIUM, HIGH
Your answer should look like this (you should include the * where specified):
`
* INTERACTIONS:
- <interaction 1>
- <interaction 2>
- ...

* ADVERSE EFFECTS:
- <adverse effect 1>
- <adverse effect 2>
- ...

* DANGEROUSNESS: <LOW / MEDIUM / HIGH>
`

Note that you don't have to include any interactions or adverse effect, only those that are necessary.
"""

In [None]:
print(prompt)

I am a doctor, I would like you to check my prescription:

    Patient's medical history:
    - Past surgical history of gastric wrap.
    - MRI revealed hydrocephalus and cystic mass around pineal gland.
    - Transcallosal resection of third ventricle tumor conducted.
    - MRI of entire spine showed no evidence of metastatic disease.
    - Pathology of mass removed is consistent with ependymoma.
    - Hypersensitive to tobramycin

    Patient's symptom:
    - Severe headache unrelieved by Advil and Tylenol.
    - Photophobia and neck stiffness.
    - Difficulty with short-term memory postoperatively.
    - Double vision on lateral gaze.

    Prescription:
    - Percocet: one to two tablets orally every four hours as needed.
    - Decadron: 0.5mg orally every 12 hours.
    - Pantoprazole: 40mg orally daily.
    

Drug contexts:


Please answer the following questions in concise point form, taking into account the provided drug context:
- Possible interactions between prescribed drugs

In [None]:
response = ask_gpt(prompt, model="gpt-3.5-turbo-16k")
print(response)

* INTERACTIONS:
- There are no known interactions between Percocet, Decadron, and Pantoprazole based on the provided drug contexts.

* ADVERSE EFFECTS:
- Percocet (oxycodone and acetaminophen): The provided drug context does not mention any specific adverse effects related to the patient's pre-existing conditions and symptoms.
- Decadron (dexamethasone): The provided drug context mentions hypersensitivity to tobramycin. Dexamethasone is a steroid and can cause hypersensitivity reactions, although specific adverse effects related to the patient's pre-existing conditions and symptoms are not mentioned.
- Pantoprazole: The provided drug context does not mention any specific adverse effects related to the patient's pre-existing conditions and symptoms.

* DANGEROUSNESS: LOW


In [None]:
def check_danger_level(output_text):
    """Check for dangerousness level in the provided text and print a warning if it's 'HIGH'."""
    keyword = "DANGEROUSNESS: "

    # Find the starting position of the keyword
    start_idx = output_text.find(keyword)

    # If keyword is found, extract the word after it
    if start_idx != -1:
        end_idx = output_text.find('\n', start_idx)  # Find the next newline after the keyword
        if end_idx == -1:
            end_idx = len(output_text)  # No newline: the level runs to the end of the text
        dangerousness_level = output_text[start_idx + len(keyword):end_idx].strip()

        # Print warning if dangerousness level is 'HIGH'
        if dangerousness_level == "HIGH":
            print("WARNING: The dangerousness level is HIGH. Take necessary precautions.")
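
For evaluation over many notes, it may be more convenient to return the level than to print a warning. The sketch below is one possible variant; `extract_danger_level` is a hypothetical helper, and the case-insensitive match is an assumption made so that both the `* dangerousness:` and `* DANGEROUSNESS:` formats requested in the prompts above are accepted.

```python
import re


def extract_danger_level(output_text):
    """Return 'LOW', 'MEDIUM', or 'HIGH' if a dangerousness line is present, else None."""
    match = re.search(
        r"DANGEROUSNESS:\s*(LOW|MEDIUM|HIGH)\b", output_text, flags=re.IGNORECASE
    )
    return match.group(1).upper() if match else None
```

A `None` result signals that the model did not follow the requested format, which is worth counting separately from a LOW rating.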