## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Install llama-cpp-python built with GPU (cuBLAS) support
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for llama-cpp-python (pyproject.toml) ... done
ERROR: pip's dependenc

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- When you run the above cell, pip may report package dependency conflicts. This message can be ignored, as the command above installs all the libraries and dependencies needed to run the code in ***this notebook***.

In [1]:
#Checking my Python version and platform to help with installing the correct libraries
import sys, platform
print(f"Python Version: {sys.version}")
print(f"Platform: {platform.platform()}")

Python Version: 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]
Platform: Linux-6.1.123+-x86_64-with-glibc2.35


In [2]:
# For installing the libraries & downloading models from HF Hub
# Version specifiers are quoted so the shell does not treat ">" as output redirection
!pip install -q \
"huggingface_hub>=0.25.0" \
"pandas>=2.2.2" \
"tiktoken==0.6.0" \
"pymupdf==1.25.1" \
"langchain==0.2.6" \
"langchain-community==0.2.6" \
"langchain-core==0.2.10" \
"langchain-text-splitters==0.2.2" \
"chromadb==0.5.5" \
"sentence-transformers==3.2.0" \
"numpy>=1.26.0"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
opencv-python-headless 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
opencv-contrib-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- When you run the above cell, pip may report package dependency conflicts. This message can be ignored, as the command above installs all the libraries and dependencies needed to run the code in ***this notebook***.

In [1]:
#Libraries for processing dataframes and text
import json
import os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from chromadb.config import Settings

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

In [2]:
# Since we'll be asking the same questions multiple times in this notebook, I'll set them as variables now for easy reuse later
user_input1 = "What is the protocol for managing sepsis in a critical care unit?"
user_input2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
user_input3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
user_input4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
user_input5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
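
Because the same questions are reused in every stage, one option is to keep them in a list and loop over them. The sketch below is only an illustration: `run_queries` and `echo_respond` are hypothetical names that do not appear in this notebook, and `echo_respond` is a stand-in for whatever function actually calls the model.

```python
from typing import Callable, Dict, List


def run_queries(
    queries: List[str],
    respond: Callable[[str], str],
    system_prompt: str = "",
) -> Dict[str, str]:
    """Send every query through `respond`, optionally prefixed by a system prompt."""
    results = {}
    for query in queries:
        # Join the system prompt and the question with a newline when a prompt is given
        prompt = f"{system_prompt}\n{query}" if system_prompt else query
        results[query] = respond(prompt)
    return results


# Stand-in responder for illustration only; it does not call any model.
def echo_respond(prompt: str) -> str:
    return f"[{len(prompt)} chars] {prompt[:30]}"


demo_queries = [
    "What is the protocol for managing sepsis in a critical care unit?",
    "What are the common symptoms for appendicitis?",
]
demo_results = run_queries(demo_queries, echo_respond, system_prompt="Answer briefly.")
for question, answer in demo_results.items():
    print(question, "->", answer)
```

Passing the responder in as an argument keeps the loop independent of how the model is loaded, so the same helper could serve the plain, prompt-engineered, and retrieval-based stages.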

## Question Answering using LLM

#### Downloading and Loading the model

In [3]:
#Downloading the quantized model from the Hugging Face Hub and storing its local path
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q6_K.gguf"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [4]:
#Checking my VRAM to optimize LLM parameters
!nvidia-smi

Mon Sep 15 00:46:58 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   50C    P0             26W /   70W |     114MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [5]:
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_gpu_layers=-1,
    n_batch=128,
    )

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
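
As a rough check on the VRAM reading above, the arithmetic below estimates whether the model file plus its key/value cache fits on the T4. It is a back-of-the-envelope sketch under stated assumptions: the layer count, number of KV heads, and head size are the published Mistral 7B values rather than anything read from this notebook, the cache is assumed to store 16-bit values, the 5.94G download size is treated as decimal gigabytes, and compute buffers are ignored. `kv_cache_mib` is a hypothetical helper name.

```python
MIB = 1024 ** 2


def kv_cache_mib(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate key/value cache size in MiB for a given context length."""
    # Keys and values each hold n_ctx * n_kv_heads * head_dim entries per layer
    total_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_value
    return total_bytes / MIB


model_file_mib = 5.94e9 / MIB   # size of the downloaded GGUF file (assumed decimal GB)
free_vram_mib = 15360 - 114     # total minus used memory from the nvidia-smi output
needed_mib = model_file_mib + kv_cache_mib(4096)

print(f"KV cache at n_ctx=4096: {kv_cache_mib(4096):.0f} MiB")
print(f"Estimated need: {needed_mib:.0f} MiB of {free_vram_mib} MiB free")
```

Under these assumptions the estimate leaves ample headroom, which is consistent with offloading every layer to the GPU.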


#### Response

In [None]:
#Creating a function to send a query to the model loaded with llama-cpp-python, then extract and clean the text of its first completion
def response(query,max_tokens=500,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    text = model_output['choices'][0]['text']
    return text.strip()

In [None]:
print(response("What treatment options are available for managing hypertension?"))

Llama.generate: prefix-match hit


Hypertension, or high blood pressure, is a common condition that can increase the risk of various health problems such as heart disease, stroke, and kidney damage. The good news is that there are several effective treatment options available to help manage hypertension and reduce the risk of complications. Here are some of the most commonly used treatments:

1. Lifestyle modifications: Making lifestyle changes is often the first line of defense against hypertension. This may include eating a healthy diet rich in fruits, vegetables, whole grains, and lean proteins; limiting sodium intake; getting regular physical activity; maintaining a healthy weight; and reducing stress.
2. Medications: If lifestyle modifications alone are not enough to control blood pressure, medications may be necessary. There are several classes of drugs used to treat hypertension, including diuretics, beta blockers, ACE inhibitors, calcium channel blockers, and ARBs (angiotensin receptor blockers). Your healthcare

**Our model seems to provide good responses to our queries, using information that could be gleaned from an internet or Google search.**

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
print(response(user_input1))

Llama.generate: prefix-match hit


Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:

1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.
2. Source control: Identify and address the source of infection as quickly as possible. This may involve surgical intervention, such as drainage of an abscess or debridement of necrotic tissue.
3. Fluid resuscitation: Administer intravenous fluids to maintain adequate blood pressure and organ perfusion. The goal is to achieve a mean arterial pressure (MAP) of at least 65 mmHg and a central venous oxygen saturation (ScvO2) of greater than 70%.
4. Vasopressor support: 

**The model's response defines the condition and provides generally accepted treatment steps, so it is valid given the input, but it is incomplete given the max tokens.**
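
One way to confirm that a response stopped because of the token limit, rather than judging by eye, is to inspect the `finish_reason` field of the completion dictionary returned by llama-cpp-python, which is `"length"` when `max_tokens` was reached and `"stop"` otherwise. The sketch below uses a hypothetical helper, `was_truncated`, and hand-written dictionaries shaped like a completion result, so it runs without the model.

```python
def was_truncated(model_output: dict) -> bool:
    """Return True if the first choice ended because it hit the token limit."""
    return model_output["choices"][0].get("finish_reason") == "length"


# Hand-written examples shaped like a completion result, for illustration only
cut_off = {"choices": [{"text": "4. Vasopressor support:", "finish_reason": "length"}]}
finished = {"choices": [{"text": "Done.", "finish_reason": "stop"}]}

print(was_truncated(cut_off))   # True
print(was_truncated(finished))  # False
```

Applying this inside the response function would require returning or logging the full completion dictionary rather than only its text.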

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
print(response(user_input2))

Llama.generate: prefix-match hit


Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:

1. Abdominal pain: The pain may start as a mild discomfort around the navel or in the lower right abdomen, which then gradually moves to the right lower quadrant and becomes more severe over time. The pain may be constant or intermittent and is often worsened by movement, coughing, or deep breathing.
2. Loss of appetite: People with appendicitis may lose their appetite due to abdominal pain and discomfort.
3. Nausea and vomiting: Vomiting is a common symptom of appendicitis, especially in the later stages of the condition.
4. Fever: A fever of 100.4°F (38°C) or higher may be present in some cases of appendicitis.
5. Constipation or diarrhea: Both constipation and diarrhea can occur with appendicitis, depending on the location and severity o

**The model's response defines the condition, lists common symptoms, answers whether it can be cured with medicine (it cannot), and describes the surgical procedures used to treat it, so the response is valid given the input. However, the response is incomplete given the max tokens.**

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
print(response(user_input3))

Llama.generate: prefix-match hit


Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, or eyelashes.

The exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.

There are several treatments that have been shown to be effective in addressing sudden patchy hair loss:

1. Corticosteroids: These are anti-inflammatory drugs that can help reduce inflammation and suppress the immune system's attack on the hair follicles. They can be applied topically or taken orally, depending on the severity of the condition.
2. Minoxidil: This is a medication that has been shown to promote hair growth in some people with alopecia areata. It works by increasing blood f

**The model's response gives a name to the condition and defines it, lists the effective treatments, and addresses the question of cause, so the response is valid given the current input. However, the answer is incomplete due to the max tokens.**

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
print(response(user_input4))

Llama.generate: prefix-match hit


A person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), may require various treatments depending on the severity and location of the injury. Here are some common treatments recommended for TBIs:

1. Emergency care: The first priority is to ensure the person's airway is clear, they are breathing, and their heart is beating normally. In severe cases, emergency surgery may be required to remove hematomas or other obstructions.
2. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as swelling, seizures, pain, or infections. For example, corticosteroids may be used to reduce brain swelling, and anticonvulsants may be given to prevent seizures.
3. Rehabilitation: Rehabilitation is an essential part of the recovery process for TBI patients. Rehabilitation may include physical therapy to help with mobility and strength, occupational therapy to help with daily living skills, speech therapy to improv

**The model's response defines the issue and gives a name to it, and provides treatment solutions for temporary or permanent impairment, so it is a valid response given the input. Unlike the previous responses, this one IS completed within the max tokens.**

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
print(response(user_input5))

Llama.generate: prefix-match hit


First and foremost, if you suspect that someone has fractured their leg while hiking, it's essential to ensure their safety and prevent further injury. Here are some necessary precautions:

1. Keep the person calm and still: Encourage them to remain as still as possible to minimize pain and prevent worsening the injury.
2. Assess the situation: Check for any signs of shock, such as pale skin, rapid heartbeat, or shallow breathing. If you notice these symptoms, seek medical help immediately.
3. Immobilize the leg: Use a splint, sling, or other available materials to immobilize the leg and prevent movement. Be sure not to apply too much pressure on the injury site.
4. Provide pain relief: Offer over-the-counter pain medication, such as acetaminophen or ibuprofen, to help manage pain.
5. Seek medical attention: If the fracture is severe or if you suspect that there may be other injuries, seek medical help as soon as possible.

Once you've ensured the person's safety and stability, conside

**The model's response addresses all parts of the query from precaution and immediate treatment to care and recovery in a logical manner, so it is a valid and complete response given the input.**

## Question Answering using LLM with Prompt Engineering

**So far, the model addresses the queries well, but its responses read like Google search results and do not fit within the allowable max tokens. To better serve the business needs, the model should answer from the point of view of a healthcare professional who needs to quickly diagnose, treat, and develop a care plan for incoming patients. We'll refine the model's responses using prompt engineering to address these needs.**

In [None]:
#Creating a system prompt to help address the business needs and guide the model's output
system_prompt = "You are a healthcare professional dealing with incoming patients. Answer the query in a list format of steps to take in sequential order to provide effective treatment for the patient. Then provide a suggested follow-up care plan. Limit your answer to 500 tokens or less"
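
The queries in this section are sent as the system prompt followed by a newline and the question. Mistral-7B-Instruct models are trained on prompts wrapped in `[INST] ... [/INST]` tags, so an alternative worth trying is to wrap the combined text in that template. The sketch below is an assumption about what might help, not something tested in this notebook; `build_instruct_prompt` is a hypothetical helper, and the beginning-of-sequence token is left out on the assumption that the tokenizer adds it.

```python
def build_instruct_prompt(system_prompt: str, user_input: str) -> str:
    """Wrap a system prompt and a question in Mistral's [INST] instruction tags."""
    # The instruct template has no separate system slot, so both parts share one block
    instruction = f"{system_prompt.strip()}\n\n{user_input.strip()}"
    return f"[INST] {instruction} [/INST]"


example = build_instruct_prompt(
    "You are a healthcare professional dealing with incoming patients.",
    "What is the protocol for managing sepsis in a critical care unit?",
)
print(example)
```

The resulting string could be passed to the model in place of the plain concatenation to compare how closely each variant follows the instructions.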

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_query1 = system_prompt + "\n" + user_input1
print(response(user_query1))

Llama.generate: prefix-match hit


1. Identify and recognize sepsis signs: fever, tachycardia, respiratory distress, altered mental status, lactate levels.
2. Obtain blood cultures before administering antibiotics.
3. Start broad-spectrum antibiotics as soon as possible.
4. Provide adequate fluid resuscitation to maintain adequate tissue perfusion.
5. Monitor and manage hemodynamic parameters: MAP, CVP, urine output.
6. Administer vasopressors if necessary to maintain MAP >65 mmHg.
7. Provide oxygen therapy to maintain SpO2 >90%.
8. Monitor and treat metabolic acidosis with bicarbonate or other alkalizing agents.
9. Monitor and manage electrolyte imbalances.
10. Provide source control if possible: drain abscesses, debride necrotic tissue.
Follow-up care plan:
1. Continue antibiotics based on culture results.
2. Close monitoring of vital signs, fluid status, and organ function.
3. Nutritional support through enteral or parenteral feeding.
4. Physical therapy to prevent complications such as deep vein thrombosis and press

**This response reads much more like it comes from a healthcare professional than what we received without prompt engineering, so the system prompt seems well suited to the business needs. However, to prioritize quick treatment over follow-up, we'll adjust the system prompt so that more tokens go to effective treatment in an emergency, while the follow-up plan takes second priority.**

In [None]:
#Updating the system prompt to use 75% of its words on treatment and 25% on the follow-up plan.
system_prompt = "You are a healthcare professional dealing with incoming patients. Answer the query in a list format of steps to take in sequential order to provide effective treatment for the patient. Then provide a suggested follow-up care plan. Limit your answer to 500 words or less, and use 75% of the words for the effective treatment and 25% for the follow-up plan."

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_query2 = system_prompt + "\n" + user_input2
print(response(user_query2))

Llama.generate: prefix-match hit


1. Recognize the Symptoms: Appendicitis is characterized by abdominal pain, usually starting around the navel area and moving to the lower right side. Other symptoms include loss of appetite, nausea, vomiting, fever, and constipation or diarrhea.
2. Perform a Physical Examination: Conduct a thorough physical examination to confirm the presence of appendicitis based on the patient's symptoms and signs such as rebound tenderness, guarding, and rigidity in the lower right quadrant of the abdomen.
3. Order Diagnostic Tests: Request imaging studies like ultrasound or CT scan to help diagnose appendicitis and rule out other conditions with similar symptoms. A white blood cell count may also be ordered to assess inflammation.
4. Admit the Patient: If a definitive diagnosis of appendicitis is made, admit the patient to the hospital for further management.
5. Provide Intravenous Fluids and Pain Relief: Begin intravenous fluids to prevent dehydration due to vomiting and maintain adequate hydrati

**Great response, but incomplete. The 75/25 guardrail seemed to work well. However, if this were generated on a tablet shared by many professionals, they may not immediately know what the problem is that we are addressing. We'll address this by updating the prompt to include a 'case title'.**

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
#Updating the system prompt to provide a case title.
system_prompt = "You are a healthcare professional dealing with incoming patients. First, provide a concise title for the case. Then answer the query in a list format of steps to take in sequential order to provide effective treatment for the patient. Then provide a suggested follow-up care plan. Limit your answer to 500 words or less, and use 75% of the words for the effective treatment and 25% for the follow-up plan."

In [None]:
user_query3 = system_prompt + "\n" + user_input3
print(response(user_query3))

Llama.generate: prefix-match hit


Title: Addressing Sudden Patchy Hair Loss: Effective Treatments and Possible Causes
1. Identify the cause: Conduct a thorough examination of the patient's medical history, lifestyle habits, and any recent stressors to determine potential causes such as alopecia areata, nutritional deficiencies, or hormonal imbalances.
2. Rule out underlying conditions: Order relevant lab tests, including thyroid function tests, iron studies, and vitamin D levels, to rule out any underlying medical conditions that may be contributing to the hair loss.
3. Consider topical treatments: Prescribe a corticosteroid cream or ointment for local application to the affected areas to reduce inflammation and promote hair regrowth in cases of alopecia areata.
4. Explore nutritional interventions: Recommend a balanced diet rich in essential vitamins and minerals, particularly iron, zinc, and biotin, which are important for healthy hair growth.
5. Consider supplements: If the patient's diet is lacking in essential nut

**The model reacted to the new system prompt well, providing a title for the case. However, it still doesn't seem to be applying the 75/25 rule in this instance. We'll try to adjust the format of the prompt further.**

In [None]:
#Reformatting the prompt to get stricter adherence to guardrails.
system_prompt = """
  You are a healthcare professional dealing with incoming patients suffering from an injury or illness.
  1. Supply a concise title for the case.
  2. Create an effective treatment plan for the patient in sequential list format.
  3. Provide a suggested follow-up care plan in list format.

  Your response must be 500 words or less.
  75% of the response must address effective treatment.
  25% of the response must address the follow-up plan.
  """


In [None]:
user_query3 = system_prompt + "\n" + user_input3
print(response(user_query3))

Llama.generate: prefix-match hit


Title: Sudden Patchy Hair Loss: Identifying Causes and Effective Treatments

1. Thoroughly examine the patient's scalp to identify any underlying conditions causing patchy hair loss. Possible causes include alopecia areata, psoriasis, seborrheic dermatitis, or fungal infections.
2. For alopecia areata, consider prescribing corticosteroids (topical or injectable) to reduce inflammation and promote hair regrowth. Alternatively, immunotherapy with minoxidil or anthralin may be recommended.
3. In cases of psoriasis, treat the underlying condition with topical or systemic medications such as corticosteroids, dithranol, or biologics like infliximab or adalimumab.
4. For seborrheic dermatitis, prescribe antifungal shampoos (such as ketoconazole or selenium sulfide) and topical corticosteroids to manage symptoms and promote hair regrowth.
5. If a fungal infection is suspected, prescribe antifungal medications (topical or oral) based on the specific type of fungus identified.
6. Encourage the p

**Much better response here. Another healthcare colleague could quickly pick up where one may have left off. Let's see if we can get a more concise answer for even speedier and more efficient directions.**

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
#Updating the system prompt for more precise answers.
system_prompt = """
  You are a healthcare professional dealing with incoming patients suffering from an injury or illness.
  1. Supply a concise title for the case.
  2. Create an effective treatment plan for the patient in sequential list format.
  3. Provide a suggested follow-up care plan in list format.

  Your response must be 300 words or less.
  75% of the response must address effective treatment.
  25% of the response must address the follow-up plan.
  """


In [None]:
user_query4 = system_prompt + "\n" + user_input4
print(response(user_query4))

Llama.generate: prefix-match hit


This condition is commonly referred to as a traumatic brain injury (TBI).

Title: Traumatic Brain Injury: Managing Acute Symptoms and Long-Term Recovery

1. Immediate Care:
   - Assess airway, breathing, and circulation.
   - Provide oxygen support if necessary.
   - Control any bleeding through pressure application or surgery.
   - Monitor vital signs closely.
   - Administer glucose solution intravenously to prevent hypoglycemia.
   - Use a cervical collar to immobilize the neck and prevent further injury.

2. Medications:
   - Prescribe analgesics for pain management.
   - Provide anti-inflammatory drugs to reduce swelling and inflammation.
   - Administer diuretics to help manage increased intracranial pressure.
   - Use sedatives or muscle relaxants to control agitation, seizures, or spasms.
   - Prescribe medications for specific symptoms such as nausea, insomnia, or depression.

3. Rehabilitation:
   - Begin physical therapy to improve motor function and mobility.
   - Initiate 

**Well-crafted response that provides a case title, immediate treatment, and a follow-up care plan, and addresses all parts of the query. However, the 75/25 rule is still not being followed. We'll further refine the prompt using stricter language.**

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
#Updating the system prompt for stricter guidelines to ensure 75/25 and format.
system_prompt = """
You are a healthcare professional dealing with incoming patients suffering from an injury or illness.

Your task is to produce a structured response of 300 words or less.
Your response MUST include:
1. A concise case title.
2. A **Treatment Plan section** (~225 words, 75% of the response) in sequential list format.
3. A **Follow-up Care section** (~75 words, 25% of the response) in sequential list format.

The two sections must be clearly labeled with headings:
- "Treatment Plan"
- "Follow-up Care"

If your output does not follow this word allocation and structure, it is invalid.
  """
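
Whether a response follows the 75/25 allocation can be measured rather than estimated by eye. The sketch below splits a response at the two required headings and reports the share of words in each section. It is a minimal illustration under stated assumptions: `section_word_share` is a hypothetical helper, only alphabetic words are counted (list numbers are ignored), and the headings are assumed to appear verbatim in the response.

```python
import re


def section_word_share(text, treatment_heading="Treatment Plan",
                       followup_heading="Follow-up Care"):
    """Count words under each heading and return the treatment section's share."""
    start = text.find(treatment_heading)
    split = text.find(followup_heading)
    if start == -1 or split == -1 or split < start:
        return None  # headings missing or out of order

    def count_words(section):
        # Alphabetic words only, so list numbers such as "1." are not counted
        return len(re.findall(r"[A-Za-z][A-Za-z'-]*", section))

    treatment = count_words(text[start + len(treatment_heading):split])
    followup = count_words(text[split + len(followup_heading):])
    total = treatment + followup
    share = treatment / total if total else None
    return {"treatment_words": treatment, "followup_words": followup,
            "treatment_share": share}


# Made-up response text for illustration only
sample = (
    "**Case Title:** X\n\n"
    "**Treatment Plan:**\n1. Immobilize the leg.\n2. Give pain relief.\n"
    "3. Monitor circulation closely.\n\n"
    "**Follow-up Care:**\n1. Review in one week.\n"
)
print(section_word_share(sample))
```

A response that is cut off before the follow-up heading returns `None`, which also flags outputs that never reached the second section.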

In [None]:
user_query5 = system_prompt + "\n" + user_input5
print(response(user_query5))

Llama.generate: prefix-match hit


**Case Title:** Fractured Leg: Necessary Precautions and Treatment Steps

**Treatment Plan:**
1. Assess the severity of the injury: Determine if it is an open or closed fracture, and assess the degree of swelling, bruising, and deformity.
2. Immobilize the leg: Apply a splint, sling, or cast to prevent movement and promote healing.
3. Pain management: Provide appropriate pain relief medication as prescribed by a healthcare professional.
4. Rest and elevate: Encourage the patient to rest and keep their leg elevated above heart level to reduce swelling and discomfort.
5. Monitor circulation: Check for signs of poor circulation, such as coolness, numbness, or discoloration, and report any concerns to a healthcare professional.
6. Maintain hydration and nutrition: Encourage the patient to drink plenty of fluids and eat nutritious foods to support their recovery.
7. Follow-up appointments: Schedule regular follow-up appointments with a healthcare professional to monitor progress, adjust tre

**This one is a little tricky. The answer still does not seem to come from a healthcare professional assessing a case, as it directs the reader to seek advice from a healthcare professional. We'll adjust the system prompt further to address this issue, and provide a golden example.**

In [None]:
#Updating the system prompt to provide the response in the form of direction for other healthcare professionals
system_prompt = """
You are a senior healthcare professional dealing with incoming patients suffering from an injury or illness.

Your task is to produce a structured response of 300 words or less that provides direction to junior healthcare professionals on treatment and follow up care for the patient.
Your response MUST include:
1. A concise case title.
2. A **Treatment Plan section** (required length: 200–240 words, ~75% of the response) in sequential list format.
3. A **Follow-up Care section** (required length: 65–85 words, ~25% of the response) in sequential list format.

The two sections MUST be clearly labeled with the following headings exactly:
- "Treatment Plan"
- "Follow-up Care"

### Golden Example
**Case Title: Severe Asthma Attack**

**Treatment Plan (~225 words):**
1. Administer oxygen therapy...
2. Provide nebulized bronchodilators...
...

**Follow-up Care (~75 words):**
1. Schedule weekly follow-ups...
2. Provide education on inhaler use...
...

If your output does not strictly follow this structure and word allocation, it is invalid.
"""

In [None]:
user_query5 = system_prompt + "\n" + user_input5
print(response(user_query5))

Llama.generate: prefix-match hit


**Case Title: Fractured Leg from Hiking Injury**

**Treatment Plan (~230 words):**
1. Immobilize the affected leg using a splint or cast to prevent further damage and promote healing.
2. Administer analgesics for pain management, such as acetaminophen or nonsteroidal anti-inflammatory drugs (NSAIDs).
3. Monitor vital signs, including heart rate, blood pressure, respiratory rate, and oxygen saturation levels.
4. Assess the severity of the fracture and determine if surgery is necessary. If so, arrange for an orthopedic consultation and potential surgical intervention.
5. Encourage early mobility and range-of-motion exercises to prevent complications such as blood clots or muscle atrophy.
6. Provide a nutritious diet to support the healing process and maintain adequate hydration.
7. Monitor for signs of infection, including redness, swelling, warmth, or increased pain, and administer antibiotics if necessary.

**Follow-up Care (~55 words):**
1. Schedule follow-up appointments with the hea

**FANTASTIC! Love this response. All guidelines adhered to in a concise manner.**

## Data Preparation for RAG

### Loading the Data

In [6]:
#Accessing Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
#Defining the file path to the data
manual_pdf_path = "/content/drive/MyDrive/Datasets/MerckManual.pdf"

In [8]:
#Loading the data
pdf_loader = PyMuPDFLoader(manual_pdf_path)
manual = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [None]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(manual[i].page_content,end="\n")

Page Number : 1
agcunning25@yahoo.com
LQGU5E9KHD
ant for personal use by agcunning25@ya
shing the contents in part or full is liable 

Page Number : 2
agcunning25@yahoo.com
LQGU5E9KHD
This file is meant for personal use by agcunning25@yahoo.com only.
Sharing or publishing the contents in part or full is liable for legal action.

Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ..........................................................................................................................................................................................

#### Checking the number of pages

In [None]:
len(manual)

4114

### Data Chunking

In [None]:
#Initializing an instance of the Text Splitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512, #Setting chunk size to 512 tokens
    chunk_overlap=50 #Setting the overlap at about 10% of the chunk size
)

In [10]:
#Chunking the document
document_chunks = pdf_loader.load_and_split(text_splitter)

In [None]:
#Checking the length of the document chunks
len(document_chunks)

8685

In [None]:
#Now we'll check the overlap by reviewing page contents
print(document_chunks[99].page_content)

Diagnosis is based on results of medical and diet histories, physical examination, body composition
analysis (see p. 58), and selected laboratory tests.
History: History should include questions about dietary intake (see
Fig. 2-1), recent changes in weight, and risk factors for undernutrition, including drug and alcohol use.
Unintentional loss of ≥ 10% of usual body weight during a 3-mo period indicates a high probability of
undernutrition. Social history should include questions about whether money is available for food and
whether the patient can shop and cook.
Review of systems should focus on symptoms of nutritional deficiencies (see Table 2-1). For example,
impaired night vision may indicate vitamin A deficiency.
Physical examination: Physical examination should include measurement of height and weight,
inspection of body fat distribution, and anthropometric measurements of lean body mass. Body mass
index (BMI = weight(kg)/height(m)2) adjusts weight for height (see
Table 6-2 on p.

In [None]:
print(document_chunks[100].page_content)

Table 2-2). This measurement may be affected by physical activity, genetic factors, and age-related
muscle loss.
Physical examination should focus on signs of specific nutritional deficiencies. Signs of PEU (eg, edema,
muscle wasting, skin changes) should be sought. Examination should also focus on signs of conditions
that could predispose to nutritional deficiencies, such as dental problems. Mental status should be
assessed, because depression and cognitive impairment can lead to weight loss.
The widely used Subjective Global Assessment (SGA) uses information from the patient history (eg,
weight loss, change in intake, GI symptoms), physical examination findings (eg, loss of muscle and
subcutaneous fat, edema, ascites), and the clinician's judgment of the patient's nutritional status. The
Mini Nutritional Assessment (MNA) has been validated and is widely used, especially for elderly patients
(see Fig. 2-1). The Simplified Nutrition Assessment Questionnaire (SNAQ), a simple, validated 

In [None]:
print(document_chunks[101].page_content)

mortality better than measurement of the other proteins. However, the correlation of albumin with
morbidity and mortality may be related to nonnutritional as well as nutritional factors. Inflammation
produces cytokines that cause albumin and other nutritional protein markers to extravasate, decreasing
serum levels. Because prealbumin, transferrin, and retinol-binding protein decrease more rapidly during
starvation than does albumin, their measurements are sometimes used to diagnose or assess the severity
of acute starvation. However, whether they are more sensitive or specific than albumin is unclear.
Total lymphocyte count, which often decreases as undernutrition progresses, may be determined.
Undernutrition causes a marked decline in CD4+ T lymphocytes, so this count may not be useful in
patients who have AIDS.
Skin tests using antigens can detect impaired cell-mediated immunity in PEU and in some other disorders
of undernutrition (see p.
1098).
Other laboratory tests, such as measur

**I seem to be missing some overlap between chunks 100 and 101, but this looks like the start of a new chapter, which may explain it. We may adjust the chunking again later if the answers are not sufficient.**
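One way to check overlap without reading chunks by eye is to measure the longest text shared between the end of one chunk and the start of the next. The sketch below uses toy strings in place of `document_chunks[i].page_content`, and `shared_overlap` is a hypothetical helper, not part of LangChain. A likely (but unverified here) reason for missing overlap is that each PDF page is loaded as its own document and split separately, so overlap would not carry across page boundaries.

```python
def shared_overlap(prev_chunk: str, next_chunk: str) -> str:
    """Return the longest suffix of prev_chunk that is also a prefix of next_chunk."""
    max_len = min(len(prev_chunk), len(next_chunk))
    # Try the longest possible overlap first and shrink until a match is found
    for size in range(max_len, 0, -1):
        if prev_chunk[-size:] == next_chunk[:size]:
            return prev_chunk[-size:]
    return ""


# Toy strings standing in for consecutive chunk contents (assumed data)
chunks = [
    "weight loss and diet history should be reviewed",
    "should be reviewed before laboratory tests",
    "A new chapter starts here",
]

# Compare each chunk with the one that follows it
overlaps = [shared_overlap(a, b) for a, b in zip(chunks, chunks[1:])]
print(overlaps)
```

An empty string in the result marks a boundary where no text is shared, which is the situation noted above. Real chunks may differ by stripped whitespace, so an exact match like this is only a rough check.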

### Embedding

In [11]:
#Defining the embedding model
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')


In [None]:
#Checking the length of the embedding vectors
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  1024


True
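Matching dimensions matter because the vector store compares embeddings with a similarity measure such as cosine similarity. The sketch below shows that calculation on small toy vectors; `cosine_similarity` is written out by hand for illustration and is an assumption about the kind of metric involved, not a statement of what Chroma is configured to use in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; the real embeddings have 1024 dimensions
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # identical direction
print(cosine_similarity(vec_a, vec_c))  # orthogonal vectors
```

The same function could be applied to `embedding_1` and `embedding_2` to see how similar the first two chunks are.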

In [None]:
#Viewing the first two embedding vectors
embedding_1, embedding_2

([-0.0236783716827631,
  -0.009335610084235668,
  0.016989680007100105,
  0.014054998755455017,
  -0.011940686032176018,
  -0.025579089298844337,
  0.008072292432188988,
  0.04362092167139053,
  0.02449682168662548,
  0.02480185590684414,
  0.03686145320534706,
  -0.015069477260112762,
  0.0195174477994442,
  -0.0280099269002676,
  -0.0019658387172967196,
  -0.012372362427413464,
  -0.01570005714893341,
  -0.044060081243515015,
  -0.006519334856420755,
  0.0035241544246673584,
  0.01404760591685772,
  -0.0036845537833869457,
  -0.07074537873268127,
  -0.02528299391269684,
  -0.017416715621948242,
  0.019922688603401184,
  0.02256469614803791,
  0.012531399726867676,
  0.06827457994222641,
  0.04991303011775017,
  -0.03104369156062603,
  0.0015946936327964067,
  0.02100214548408985,
  -0.06830018758773804,
  -0.01795371249318123,
  -0.007349485065788031,
  0.04539479687809944,
  -0.022198064252734184,
  -0.02847151644527912,
  -0.04356801509857178,
  0.006608023773878813,
  -0.001175968

### Vector Database

In [12]:
#Creating the medical_db directory if it doesn't already exist
out_dir = 'medical_db'

os.makedirs(out_dir, exist_ok=True)

In [13]:
#Creating the Vector Database from our chunked documents using the embedding model and saving to the medical_db
vectorstore = Chroma.from_documents(document_chunks, embedding_model, persist_directory=out_dir)


In [14]:
#Loading the vector database
vectorstore = Chroma(persist_directory=out_dir, embedding_function=embedding_model)


In [None]:
#Checking the embedding model details
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [None]:
#Performing a similarity search to test functionality of our vector database
#Storing the similarity search return into a results object
results = vectorstore.similarity_search("Causes Treatments Hypertension",k=3)

#Looping through the top 3 results and printing in an easily readable format
for i, doc in enumerate(results):
    print(f"Document {i+1}:")
    print(doc.page_content)
    print("\n")


Document 1:
particularly because lifelong treatment is required, can interfere with adequate BP control. Education, with
empathy and support, is essential for success.
Drugs for Hypertension
Diuretics: Main classes (see
Table 208-5) are thiazide-type diuretics, loop diuretics, and K-sparing diuretics. Loop diuretics are used to
treat hypertension only in patients who have lost > 50% of kidney function; these diuretics are given twice
daily. Diuretics modestly reduce plasma volume and reduce vascular resistance, possibly via shifts in Na
from intracellular to extracellular loci. These drugs are the least expensive initial therapy, and the dose
needed is small, especially for the elderly (eg, for most people > 60 hydrochlorothiazide 12.5 mg is
sufficient). Thiazide-type diuretics are most commonly used. In addition to other antihypertensive effects,
they cause vasodilation as long as intravascular volume is normal. All thiazides are equally effective in
equivalent doses.
All diuretics ex

### Retriever

In [None]:
#Creating a retriever object from our vector store
retriever = vectorstore.as_retriever(search_type='similarity', search_kwargs={'k':3})

In [None]:
#Testing the functionality of our retriever
rel_docs = retriever.get_relevant_documents("What treatment options are available for managing hypertension?")

#Looping through the top 3 results and printing in an easily readable format
for i, doc in enumerate(rel_docs):
    print(f"Document {i+1}:")
    print(doc.page_content)
    print("\n")

Document 1:
particularly because lifelong treatment is required, can interfere with adequate BP control. Education, with
empathy and support, is essential for success.
Drugs for Hypertension
Diuretics: Main classes (see
Table 208-5) are thiazide-type diuretics, loop diuretics, and K-sparing diuretics. Loop diuretics are used to
treat hypertension only in patients who have lost > 50% of kidney function; these diuretics are given twice
daily. Diuretics modestly reduce plasma volume and reduce vascular resistance, possibly via shifts in Na
from intracellular to extracellular loci. These drugs are the least expensive initial therapy, and the dose
needed is small, especially for the elderly (eg, for most people > 60 hydrochlorothiazide 12.5 mg is
sufficient). Thiazide-type diuretics are most commonly used. In addition to other antihypertensive effects,
they cause vasodilation as long as intravascular volume is normal. All thiazides are equally effective in
equivalent doses.
All diuretics ex

**We received a slightly different answer from our retriever than from our vector store similarity search by removing "Causes" from the query. This looks promising.**

In [None]:
#Storing the model's response to the same query while NOT using the vector store
model_output = llm(
      "What treatment options are available for managing hypertension?",
      max_tokens=500,
      temperature=0,
      )

Llama.generate: prefix-match hit


In [None]:
#printing the model output
print(model_output['choices'][0]['text'])



Hypertension, or high blood pressure, is a common condition that can increase the risk of various health problems such as heart disease, stroke, and kidney damage. The good news is that there are several effective treatment options available to help manage hypertension and reduce the risk of complications. Here are some of the most commonly used treatments:

1. Lifestyle modifications: Making lifestyle changes is often the first line of defense against hypertension. This may include eating a healthy diet rich in fruits, vegetables, whole grains, and lean proteins; limiting sodium intake; getting regular physical activity; maintaining a healthy weight; and reducing stress through techniques such as meditation or deep breathing exercises.
2. Medications: If lifestyle modifications alone are not enough to control hypertension, medications may be necessary. There are several classes of drugs used to treat hypertension, including diuretics, beta blockers, ACE inhibitors, calcium channel b

**As expected, the model's output is generic and could be gleaned from a Google or other internet search. It does not draw on the medical manual in our vector store.**

### System and User Prompt Template

In [None]:
#Defining the system message
qna_system_message = """
You are a senior healthcare professional dealing with incoming patients suffering from an injury or illness.

Your task is to produce a structured response of 300 words or less that provides direction to junior healthcare professionals on treatment and follow up care for the patient.
Your response MUST include:
1. A concise title for the case.
2. A **Treatment Plan section** (required length: 200–240 words, ~75% of the response) in sequential list format.
3. A **Follow-up Care section** (required length: 65–85 words, ~25% of the response) in sequential list format.

The two sections MUST be clearly labeled with the following headings exactly:
- "Treatment Plan"
- "Follow-up Care"

### Golden Example
**Case Title: Severe Asthma Attack**

**Treatment Plan (~225 words):**
1. Administer oxygen therapy...
2. Provide nebulized bronchodilators...
...

**Follow-up Care (~75 words):**
1. Schedule weekly follow-ups...
2. Provide education on inhaler use...
...

If your output does not strictly follow this structure and word allocation, it is invalid.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""

In [16]:
#Defining the user message template
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=500,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the model's response text
        response = (response['choices'][0]['text'].strip())
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
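The prompt-assembly step inside the function can be looked at on its own. The sketch below reproduces it with stand-in chunks in place of retriever output; `build_prompt`, `SYSTEM_MESSAGE`, and `USER_TEMPLATE` are illustrative names and placeholder text, not the notebook's objects.

```python
# Placeholder system message and template (assumed, shortened for illustration)
SYSTEM_MESSAGE = "Answer only from the context provided."
USER_TEMPLATE = """###Context
{context}

###Question
{question}
"""


def build_prompt(system_message, template, chunks, question):
    """Join retrieved chunks into one context block and fill the template."""
    # A blank line between chunks keeps their boundaries visible to the model
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    user_message = template.replace("{context}", context)
    user_message = user_message.replace("{question}", question)
    return system_message + "\n" + user_message


# Stand-in chunks, as if returned by the retriever
example_chunks = ["Diuretics are first-line therapy. ", " Thiazides are most commonly used."]
prompt = build_prompt(SYSTEM_MESSAGE, USER_TEMPLATE, example_chunks, "How is hypertension treated?")
print(prompt)
```

Using `str.replace` rather than `str.format` means stray braces in the retrieved text do not raise formatting errors. Joining with a blank line is one option; the function above joins with `". "`, which also works but makes chunk boundaries harder to see.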

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
print(generate_rag_response(user_input1))

Llama.generate: prefix-match hit


## Treatment Plan
1. Administer oxygen therapy as needed through mask or nasal prongs.
2. Monitor systemic pressure, CVP, PAOP, pulse oximetry, ABGs, blood glucose, lactate, and electrolyte levels, renal function, sublingual PCO2, urine output, temperature, BP, pulse, and respiration rate frequently.
3. Provide replacement-dose corticosteroids.
4. Initiate fluid resuscitation with 0.9% saline until CVP reaches 8 mm Hg (10 cm H2O) or PAOP reaches 12 to 15 mm Hg.
5. If the patient remains hypotensive after CVP or PAOP has been raised to target levels, administer dopamine to increase mean BP to at least 60 mm Hg. If dopamine dose exceeds 20 μg/kg/min, add another vasopressor such as norepinephrine.
6. Administer parenteral antibiotics based on suspected source and sensitivity patterns after taking specimens for Gram stain and culture.
7. Normalize blood glucose levels between 80 to 110 mg/dL (4.4 to 6.1 mmol/L) using a continuous IV insulin infusion.
8. Drain abscesses and excise necrotic

**The response generated didn't quite follow the protocol I laid out in the system message because it didn't create a title, but overall, the answer is definitely drawn from the documents, including exact medicines and amounts. We'll adjust the system prompt one more time to see if we can get a title, then move on.**

In [17]:
#Defining the system message
qna_system_message = """
You are a senior healthcare professional dealing with incoming patients suffering from an injury or illness.

Your task is to produce a structured response of 300 words or less that provides direction to junior healthcare professionals on treatment and follow up care for the patient.
Your response MUST include:
1. A **Concise Clinical Title** for the case.
2. A **Treatment Plan section** (required length: 200–240 words, ~75% of the response) in sequential list format.
3. A **Follow-up Care section** (required length: 65–85 words, ~25% of the response) in sequential list format.

The two sections MUST be clearly labeled with the following headings exactly:
- "Treatment Plan"
- "Follow-up Care"

### Golden Example
**Case Title: Severe Asthma Attack (MAX 7 words)**

**Treatment Plan (~220 words):**
1. Administer oxygen therapy...
2. Provide nebulized bronchodilators...
...

**Follow-up Care (~73 words):**
1. Schedule weekly follow-ups...
2. Provide education on inhaler use...
...

If your output does not strictly follow this structure and word allocation, it is invalid.

User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""

In [None]:
print(generate_rag_response(user_input1))

Llama.generate: prefix-match hit


**Answer:**

**Case Title: Sepsis Management in Critical Care Unit**

**Treatment Plan:**
1. Administer oxygen therapy as needed.
2. Provide replacement-dose corticosteroids.
3. Monitor systemic pressure, CVP or PAOP, pulse oximetry, ABGs, blood glucose, lactate, and electrolyte levels, renal function, and possibly sublingual PCO2 frequently.
4. Measure urine output using an indwelling catheter.
5. Initiate fluid resuscitation with 0.9% saline until CVP reaches 8 mm Hg (10 cm H2O) or PAOP reaches 12 to 15 mm Hg.
6. If hypotension persists after target levels are reached, administer dopamine to increase mean BP to at least 60 mm Hg. If dopamine dose exceeds 20 μg/kg/min, add another vasopressor such as norepinephrine.
7. Provide parenteral antibiotics after specimens have been taken for Gram stain and culture.
8. Initiate empiric therapy immediately upon suspecting sepsis.
9. Administer normalization of blood glucose levels using a continuous IV insulin infusion to maintain glucose betw

**AHA! Got it! We'll now move on to experimenting with other parameters outside of the system prompt.**

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
print(generate_rag_response(user_input2, temperature=0.15))

Llama.generate: prefix-match hit


## Answer:
**Case Title: Appendicitis Treatment** 

**Treatment Plan:**
1. Administer IV antibiotics (third-generation cephalosporins).
2. Monitor temperature and WBC count.
3. If appendix is perforated, continue antibiotics until normalization or for a fixed course.
4. If surgery is impossible, provide antibiotics to improve survival rate.
5. For large inflammatory masses involving the appendix, terminal ileum, and cecum, perform resection of entire mass and ileocolostomy.
6. In late cases with pericolic abscesses, drain by ultrasound-guided percutaneous catheter or open operation (appendectomy to follow at a later date).
7. Remove Meckel's diverticulum concomitantly with appendectomy unless extensive inflammation around the appendix prevents the procedure.
8. Perform appendectomy for non-perforated appendicitis.

**Follow-up Care:**
1. Schedule follow-up appointments to monitor recovery and check for complications.
2. Provide education on signs of infection or complications, such as 

**Let's test by returning the temperature to 0.**

In [None]:
print(generate_rag_response(user_input2, temperature=0))

Llama.generate: prefix-match hit


## Answer:
**Case Title: Appendicitis Treatment** 

**Treatment Plan:**
1. Administer IV antibiotics (third-generation cephalosporins).
2. Monitor temperature and WBC count.
3. If perforated, continue antibiotics until normalization or for a fixed course.
4. If surgery is impossible, provide antibiotics to improve survival rate.
5. For large inflammatory masses involving appendix, terminal ileum, and cecum, perform resection of entire mass and ileocolostomy.
6. In late cases with pericolic abscess, drain abscess via ultrasound-guided catheter or open operation.
7. Remove Meckel's diverticulum concomitantly with appendectomy if no extensive inflammation around the appendix.
8. Perform appendectomy for nonperforated appendicitis.

**Follow-up Care:**
1. Schedule follow-up appointments to monitor recovery and check for complications.
2. Provide education on signs of infection or complications, such as fever, abdominal pain, or wound issues.


**Returning the temperature to 0 only made the response more concise; the structure and quality did not change. The higher temperature does seem to provide slightly more detail that junior healthcare professionals may find useful, so when moving to the final parameters, I think we'll keep a temperature of 0.15.**
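To see why a small temperature change has only a mild effect, it helps to look at how temperature rescales token probabilities. The sketch below is a generic softmax-with-temperature on made-up logits; it illustrates the idea and is not a view into how `llm` samples internally. A temperature of exactly 0 is commonly treated as greedy decoding (always pick the top token), so the function rejects it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually handled as greedy decoding")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum for numerical stability before exponentiating
    highest = max(scaled)
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for three candidate tokens
toy_logits = [2.0, 1.0, 0.5]
probs_low = softmax_with_temperature(toy_logits, 0.15)
probs_high = softmax_with_temperature(toy_logits, 1.0)
print(probs_low)
print(probs_high)
```

At 0.15 nearly all of the probability mass sits on the top token, so the output stays close to what temperature 0 produces, with only occasional different word choices.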

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
#We'll experiment with top_p on this question specifically because it can have so many different causes. We'll have to keep temperature up to see if it has any effect. We'll start with default top_p of 0.95.
print(generate_rag_response(user_input3, temperature=0.15))

Llama.generate: prefix-match hit


**Answer:**

**Case Title: Sudden Patchy Hair Loss (MAX 10 words)**

**Treatment Plan:**
1. Diagnose underlying cause: Alopecia Areata or other conditions.
2. For Alopecia Areata, consider topical treatments like Minoxidil or Corticosteroids.
3. In severe cases, systemic treatments such as Oral Antimalarials, Retinoids, or Immunosuppressants may be necessary.
4. Monitor hair loss with daily counts and scalp biopsy if diagnosis is unclear.
5. For non-alopecia causes, treat underlying conditions.

**Follow-up Care:**
1. Regularly assess treatment progress.
2. Educate patients on proper use of treatments.
3. Encourage patience as results may take several months to appear.
4. Monitor for side effects and adjust treatments accordingly.


**Strangely, the case title includes the length hint I gave it in the system prompt. The model also does not seem to be using the full word count allowed by the system prompt.**

In [None]:
#Now we'll see what the answer looks like with a top_p of 0.3.
print(generate_rag_response(user_input3, temperature=0.15, top_p=0.3))

Llama.generate: prefix-match hit


**Answer:**

**Case Title: Sudden Patchy Hair Loss (MAX 10 words)**

**Treatment Plan:**
1. Diagnose underlying cause through microscopic hair examination or scalp biopsy.
2. For alopecia areata, consider topical corticosteroids, minoxidil, or immunotherapy.
3. For androgenetic alopecia, use minoxidil or finasteride.
4. Eliminate physical traction for traction alopecia.
5. Treat underlying disorders for hair loss due to other causes.
6. Consider behavior modification, clomipramine, or an SSRI for trichotillomania.
7. For scarring alopecia, use long-acting oral tetracyclines and potent topical corticosteroids.

**Follow-up Care:**
1. Monitor hair loss progression and response to treatment.
2. Schedule regular follow-ups for ongoing conditions or treatments.
3. Provide education on proper usage of prescribed medications.
4. Encourage patients to maintain a healthy lifestyle, including stress management and balanced diet.


**Again, with my limited knowledge of medical practices, higher temperature and lower top_p seem to be providing more detail while still following the guidelines.**
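For reference, `top_p` (nucleus sampling) keeps only the most likely tokens whose combined probability reaches the threshold, then renormalizes. The sketch below shows this on a made-up distribution; `nucleus_filter` and the token probabilities are hypothetical and only illustrate the mechanism.

```python
def nucleus_filter(token_probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Made-up next-token probabilities
toy_probs = {"minoxidil": 0.5, "corticosteroids": 0.25, "finasteride": 0.15, "biotin": 0.1}

print(nucleus_filter(toy_probs, 0.95))  # keeps all four tokens
print(nucleus_filter(toy_probs, 0.3))   # keeps only the top token
```

With a low temperature the distribution is already sharply peaked, so differences between `top_p` settings may come from only a few token choices early in the response.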

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
#Using our favored parameters on query 4 as a baseline
print(generate_rag_response(user_input4, temperature=0.15, top_p=0.3))

Llama.generate: prefix-match hit


## Treatment Plan
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Monitor intracranial pressure if severe injury is present.
3. Decompress the brain if intracranial pressure is increased.
4. Remove intracranial hematomas if necessary.
5. Provide oxygen therapy as needed.
6. Administer nebulized bronchodilators for respiratory complications.
7. Surgery may be required to place monitors or decompress the brain.
8. Maintain adequate brain perfusion and oxygenation in the first few days after injury.
9. Prevent complications of altered sensorium.
10. Provide rehabilitation services for functional recovery.

## Follow-up Care
1. Schedule weekly follow-ups to evaluate progress and prevent complications.
2. Provide education on proper use of inhalers or other prescribed medications.
3. Monitor for signs of infection, such as fever or increased confusion.
4. Encourage participation in cognitive therapy sessions.
5. Coordinate care with rehabilit

**Interesting to note the Case Title has once again disappeared.**

In [15]:
#Experimenting with the retriever parameters, specifically switching to MMR to see if we can pull in different chunks. This will be interesting since, to my limited knowledge, treatment for brain injuries should be fairly standardized.
retriever = vectorstore.as_retriever(search_type='mmr', search_kwargs={'k':3, 'fetch_k':15, 'lambda_mult': 0.5})

In [None]:
#Generating a response to query 4 with the new mmr search type
print(generate_rag_response(user_input4, temperature=0.15, top_p=0.3))

Llama.generate: prefix-match hit


**Answer:**

**Case Title: Traumatic Brain Injury (MAX 10 words)**

**Treatment Plan:**
1. Ensure reliable airway and maintain adequate ventilation, oxygenation, and blood pressure.
2. Monitor intracranial pressure and treat if increased.
3. Decompress brain if necessary.
4. Administer oxygen therapy.
5. Provide nebulized bronchodilators if needed for respiratory distress.
6. Manage hypotension with fluids and vasopressors.
7. Monitor for arrhythmias, myocardial depression, and impaired glutamate uptake/release.
8. Consider decompressive craniotomy for large cerebral infarcts with impending herniation (< 50 years old).
9. Provide meticulous long-term care: avoid stimulants, opioids; start enteral feeding, prevent pressure ulcers, desiccation of eyes, contractures, UTIs, and deep venous thrombosis.
10. Implement multidisciplinary team approach for physical therapy, occupational therapy, speech and language therapy.

**Follow-up Care:**
1. Schedule weekly follow-ups.
2. Provide education

**WOW! Vastly different answers (and a Case Title showing up???). Unfortunately, with my ignorance of medical techniques, I can only go with my gut in preferring the answer where the guardrails were followed better.**

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

**Finally, we'll go all the way back to experimenting with chunk_overlap to see if there will be any effect on the answers, as I noted there was some missing overlap earlier.**

In [None]:
#Generating a response to query 5 with the mmr search type and preferred parameters
print(generate_rag_response(user_input5, temperature=0.15, top_p=0.3))

Llama.generate: prefix-match hit


## Treatment Plan
1. Immobilize the affected leg using a splint or cast to prevent further damage.
2. Administer analgesics for pain relief.
3. Provide tetanus prophylaxis if the wound is open.
4. Begin antibiotic therapy if there's a risk of infection.
5. Encourage the patient to use crutches or a cane for mobility support.
6. Schedule follow-up appointments with a healthcare professional for assessment and potential surgery.

## Follow-up Care
1. Monitor the healing progress of the fracture.
2. Provide education on proper use of assistive devices (crutches, canes).
3. Encourage regular exercise to maintain overall health and prevent complications.
4. Schedule follow-up appointments for further assessment and potential surgery if necessary.


**Good grief! This model's inconsistency in following the instructions in the system prompt is incredible. Once again, there is no Case Title.**

In [9]:
#Instantiating the Text Splitter with a larger overlap
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512, #Setting chunk size to 512
    chunk_overlap= 96 #Setting the overlap at about 20%
)
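
As a rough illustration of what `chunk_overlap` does, the sketch below slides a fixed-size window over a token list. This is a simplification under stated assumptions: the real splitter also respects separators such as paragraph and sentence breaks, and `chunk_with_overlap` is a hypothetical helper, not part of any library.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

# Toy example: 10 "tokens", windows of 4 with 1 token shared between neighbors
print(chunk_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1))
# -> [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]

# With the settings used above, each new chunk advances 512 - 96 = 416 tokens,
# so 96 / 512 = 18.75% of every chunk repeats the end of the previous one.
print(512 - 96, 96 / 512)
```

A larger overlap means more chunks for the same document and more repeated text across neighboring chunks, in exchange for a lower chance that a sentence is cut off at a chunk boundary.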

In [9]:
#Chunking the document
document_chunks = pdf_loader.load_and_split(text_splitter)

In [None]:
#Checking the length of the document chunks
len(document_chunks)

9061

In [None]:
#Creating the Vector Database from our chunked documents using the embedding model and saving to the medical_db
vectorstore = Chroma.from_documents(document_chunks, embedding_model, persist_directory=out_dir)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
#Loading the vector database
vectorstore = Chroma(persist_directory=out_dir, embedding_function=embedding_model)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [None]:
#Finally, comparing the response to our last query 5 response using the new chunks
print(generate_rag_response(user_input5, temperature=0.15, top_p=0.3))

Llama.generate: prefix-match hit


## Answer:
**Case Title: Fractured Leg (MAX 7 words)**

**Treatment Plan:**
1. Immobilize the affected leg using a splint or cast.
2. Administer oxygen therapy if breathing difficulties are present.
3. Provide analgesics for pain management.
4. Transport the patient to the nearest medical facility as soon as possible.
5. In the field, move the cane to the lower step before descending with the bad leg.

**Follow-up Care:**
1. Schedule regular follow-ups with a healthcare professional.
2. Provide education on proper use of crutches or other assistive devices.
3. Encourage gentle range-of-motion exercises as recommended by a healthcare professional.


**What do you know? A Case Title!?!?! There doesn't seem to be much difference between the original answer in this section and the one with the larger overlap. However, I can't imagine that a larger overlap would hurt performance, other than adding some redundancy among the k retrieved chunks, and I believe k is large enough to suit the business needs.**
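
Since the Case Title keeps appearing and disappearing, a small check like the one below could flag responses that skip a required section instead of relying on eyeballing the output. This is a sketch under assumptions: the section names are inferred from the outputs shown in this notebook rather than from the system prompt itself, and `check_response_format` is a hypothetical helper.

```python
import re

# Assumed section names, taken from the responses printed above
REQUIRED_SECTIONS = ("Case Title", "Treatment Plan", "Follow-up Care")

def check_response_format(response_text, required=REQUIRED_SECTIONS):
    """Return a dict mapping each required section name to whether it appears."""
    return {
        name: re.search(re.escape(name), response_text, flags=re.IGNORECASE) is not None
        for name in required
    }

# Example with a response shaped like the first Query 5 answer (no Case Title)
sample = "## Treatment Plan\n1. Immobilize the affected leg.\n\n## Follow-up Care\n1. Monitor healing."
print(check_response_format(sample))
```

A result with any `False` value could trigger a retry or simply be logged, which would turn the formatting inconsistency into something measurable across queries.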

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters: retrieval and generation. We illustrate this evaluation using the answers generated for the questions from the previous section.

- We are using the same Mistral model for evaluation, so here the LLM is essentially rating itself on how well it has performed the task.

In [18]:
groundedness_rater_system_message  = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""

In [19]:
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the context as per the metric.
2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the context using the evaluation criteria and assign a score.
"""

In [20]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [21]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=500,temperature=0.15,top_p=0.3,top_k=50):
    # The Q&A prompt strings are module-level variables defined earlier in the notebook
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks (use the k argument instead of a hard-coded 3)
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Build the groundedness judge prompt from the rater system message and the question, context, and answer
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Build the relevance judge prompt from the rater system message and the question, context, and answer
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
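
The judge returns free text, so comparing scores across queries requires pulling the number out of it. Below is a heuristic sketch, not part of the original notebook: `extract_rating` is a hypothetical helper that looks for a digit from 1 to 5 shortly after the word "rate" or "rating", and returns `None` when the judge output was cut off before a score was given.

```python
import re

def extract_rating(judge_text):
    """Return the 1-5 score mentioned after 'rate'/'rating', or None if absent."""
    match = re.search(
        r"\b(?:rating|rate)\b[^0-9]{0,80}?\b([1-5])\b",
        judge_text,
        flags=re.IGNORECASE,
    )
    return int(match.group(1)) if match else None

# Examples modeled on the judge phrasing seen in this notebook
print(extract_rating("Rating:\nBased on the evaluation criteria, I would rate this answer as a 5 since it follows the metric."))
print(extract_rating("Rating: 5 (The metric is followed completely)"))
print(extract_rating("Steps to evaluate the context as per the relevance metric:\n1. Identify the main aspects"))
```

Because it is pattern-based, this will miss ratings phrased in other ways. Asking the judge to end with a fixed line such as `Score: N` would make parsing more reliable, at the cost of changing the rater prompts.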

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
ground,rel = generate_ground_relevance_response(user_input1,max_tokens=500)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to managing sepsis in a critical care unit.
2. Compare each point in the AI generated answer with the corresponding information in the context.
3. Determine if the answer is derived solely from the context.

Explanation:
The AI generated answer adheres to the metric as it includes all the steps for managing sepsis in a critical care unit that are outlined in the context. The treatment plan, monitoring requirements, and follow-up care mentioned in the answer are directly derived from the information provided in the context. Therefore, the answer is completely based on the context.

Rating:
Based on the evaluation criteria, I would rate this answer as a 5 since it follows the metric completely by deriving all the information from the context.

 Steps to evaluate the context as per the relevance metric:
1. Identify the main aspects of the question which are "What is the protocol for managing sepsis in a crit

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [26]:
ground,rel = generate_ground_relevance_response(user_input2,max_tokens=500)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the main question parts: common symptoms for appendicitis and if it can be cured via medicine.
2. Extract relevant information from the context regarding appendicitis treatment, focusing on symptoms and medical interventions.
3. Check if the AI-generated answer is derived solely from the context provided.

The AI-generated answer adheres to the metric as it only includes information directly related to the question from the context. The answer mentions IV antibiotics for appendicitis treatment, which is explicitly stated in the context. It also discusses surgical procedures and follow-up care based on the context's information.

Rating: 5 (The metric is followed completely)

 Steps to evaluate context as per the relevance metric:
1. Identify the main aspects of the question: common symptoms for appendicitis and whether it can be cured via medicine or if surgery is required.
2. Determine if the context discusses these main aspects: The context 

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [23]:
ground,rel = generate_ground_relevance_response(user_input3,max_tokens=500)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify if the treatment plan mentioned in the AI generated answer is derived only from the information presented in the context.
2. Check if the possible causes and treatments for sudden patchy hair loss are consistent with the context provided.
3. Verify that no additional or irrelevant information is included in the answer.

Explanation:
The AI generated answer adheres to the metric as it suggests treatments based on the conditions mentioned in the context, such as topical corticosteroids for inflammation reduction and long-acting oral tetracycline in combination with a potent topical corticosteroid for autoimmune disorders like alopecia areata. The answer also mentions follow-up care, which is consistent with the context's information on monitoring treatment progress and addressing concerns.

Rating:
Based on the given evaluation criteria, I would rate this answer as a 5 because it follows the metric completely by deriving the answer solely from t

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [24]:
ground,rel = generate_ground_relevance_response(user_input4,max_tokens=500)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to treatments for traumatic brain injury (TBI).
2. Determine if the AI generated answer includes only the treatments mentioned in the context.
3. Evaluate if the follow-up care mentioned in the answer is also derived from the context.

Explanation:
The context provides information about the initial treatment for TBI, which includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for patients with more severe injury to place monitors, decompress the brain, or remove intracranial hematomas. In the first few days after the injury, maintaining adequate brain perfusion and oxygenation and preventing complications of altered sensorium are important. Subsequently, many patients require rehabilitation. The context also mentions speech therapy and augmentative communication devices as part of the treatment for language impairment.

The AI gene

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [25]:
ground,rel = generate_ground_relevance_response(user_input5,max_tokens=500)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to the question.
2. Determine if the AI generated answer includes only the information from the context.
3. Check if the answer provides a treatment plan and follow-up care for a person with a fractured leg.

Explanation:
The AI generated answer adheres to the metric as it is derived solely from the information presented in the context. The context discusses the necessary precautions and treatment steps for a person who has fractured their leg, and the AI generated answer includes those exact steps. Therefore, the answer follows the metric completely.

Rating:
Based on the evaluation criteria, I would rate the answer as 5 - The metric is followed completely.

 Steps to evaluate the context as per the relevance metric:
1. Identify the main aspects of the question: necessary precautions, treatment steps, care, and recovery for a person with a fractured leg during a hiking trip.
2. Determine if the context c

## Actionable Insights and Business Recommendations

* Based on the Problem Statement, this model was built specifically for an urgent care setting rather than a research setting. The relevant passage is: "Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care." However, with a few tweaks to the system prompt, it could be repurposed for research settings as well.

* Throughout the process of building this model, it became highly evident that we now "**understand** issues like information overload." The medical manual was vast and time-consuming to load, and the model struggled with out-of-memory issues. However, this model's performance and evaluation show that we can indeed streamline access to vast quantities of data and provide suggestions to support quick decision-making, especially in emergency settings. A model of this type could also be updated frequently as more research and new publications become available without skipping a beat.

* While this prototype shows proof of concept that a similar model could be deployed in the field to help standardize care, more conversations need to be had about the cost involved and the range of freedom that is appropriate for the business needs. The model evaluation, while producing near-perfect scores, reveals a discrepancy between the evaluation metrics and the system prompt: tricky scenarios call for more freedom, while standardized emergency care calls for rigidity. The model also struggled to format its answers consistently. A paid model, or one built internally, might resolve both the formatting inconsistency and this tension between freedom and rigidity.
