## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [None]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Restart runtime after installation
import os
os.kill(os.getpid(), 9)


**Note**:
- After running the above cell, restart the runtime (Google Colab) or the notebook kernel (Jupyter Notebook), then run all cells sequentially starting from the next cell.
- Running the above cell may print an error about package dependency conflicts. It can be ignored, because the versions pinned above are the ones needed to run the code in ***this notebook***.

In [1]:
# For installing the libraries & downloading models from HF Hub
!pip install -q \
"huggingface_hub>=0.25.0" \
"pandas>=2.2.2" \
tiktoken==0.6.0 \
pymupdf==1.25.1 \
langchain==0.3.0 \
langchain-community==0.3.0 \
langchain-text-splitters==0.3.6 \
chromadb==0.5.5 \
sentence-transformers==3.2.0 \
numpy==1.26.4

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
opencv-python-headless 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
opencv-contrib-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.

**Note**:
- After running the above cell, restart the runtime (Google Colab) or the notebook kernel (Jupyter Notebook), then run all cells sequentially starting from the next cell.
- Running the above cell may print an error about package dependency conflicts. It can be ignored, because the versions pinned above are the ones needed to run the code in ***this notebook***.

In [1]:
# Libraries for processing dataframes and text
import json
import os
import tiktoken
import pandas as pd

# Libraries for loading data, chunking, embedding, and vector databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# Libraries for downloading and loading the LLM
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [2]:
model_name_or_path = "TheBloke/Llama-2-7B-chat-GGUF"
model_basename = "llama-2-7b-chat.Q5_K_M.gguf" # the model is in gguf format



*   I am using the 7B-parameter model rather than the 13B model.

    * A lighter model reduces GPU usage; however, the quality of the model responses may decrease.



In [3]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [4]:
lcpp_llm = Llama(
    model_path=model_path,
    n_ctx=4096,  # Context window
    n_batch=1024,  # Batch size
    n_gpu_layers=43,  # Number of GPU layers
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


#### Response

In [5]:
# Creating the response function
def response(query,max_tokens=512,temperature=0,top_p=0.95,top_k=50):
    model_output = lcpp_llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

Default Parameters:

* max_tokens=512
* temperature=0
* top_p=0.95
* top_k=50

In [6]:
# Creating a dictionary of queries
query_dict = {1:"What is the protocol for managing sepsis in a critical care unit?",
              2:"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",
              3:"What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",
              4:"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",
              5:"What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"}

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [41]:
# Response for query 1
print(response(query_dict[1]))

Llama.generate: prefix-match hit



Sepsis is a life-threatening condition that requires prompt recognition and management in a critical care unit (CCU). Here are some general protocols for managing sepsis in a CCU:

1. Early recognition and activation of sepsis team: The first step in managing sepsis is to recognize the signs and symptoms early and activate the sepsis team, which typically includes intensivists, critical care nurses, and other healthcare professionals with expertise in managing sepsis.
2. Fluid resuscitation: Sepsis-induced hypotension is a common complication, and prompt fluid resuscitation is essential to restore blood pressure and perfusion of vital organs. The goal of fluid resuscitation is to maintain mean arterial pressure ≥65 mmHg.
3. Vasopressor therapy: In addition to fluid resuscitation, vasopressors may be used to further raise blood pressure and improve organ perfusion. The choice of vasopressor depends on the severity of hypotension and the presence of cardiovascular dysfunction.
4. Antibi

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [42]:
# Response for query 2
print(response(query_dict[2]))

Llama.generate: prefix-match hit












*   Note that the model did not generate a response to this query.
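
One possible reason for an empty completion is that the raw question was sent without the Llama 2 chat instruction markers, which this notebook only uses later in the RAG prompt. That is an assumption, not a confirmed diagnosis. A minimal sketch of a hypothetical helper, `to_llama2_chat_prompt`, that wraps a query in the `[INST]` format:

```python
from typing import Optional


def to_llama2_chat_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap a message in the Llama 2 chat markers (hypothetical helper, not part of the notebook)."""
    user_message = user_message.strip()
    if system_message:
        # Optional system block, placed inside the first instruction.
        return f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n{user_message} [/INST]"
    return f"[INST] {user_message} [/INST]"


wrapped = to_llama2_chat_prompt("What are the common symptoms for appendicitis?")
print(wrapped)
```

The wrapped string could then be passed to `response(...)` in place of the raw query to check whether the model produces an answer.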




### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [43]:
# Response for query 3
print(response(query_dict[3]))

Llama.generate: prefix-match hit




Sudden patchy hair loss, also known as alopecia areata, can be a distressing condition that affects both men and women. It is characterized by round or oval-shaped patches of baldness on the scalp, eyebrows, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be an autoimmune disorder, where the immune system mistakenly attacks healthy hair follicles.
There are several effective treatments for sudden patchy hair loss, including:
1. Corticosteroid injections: These can help reduce inflammation and promote hair growth by suppressing the immune system's attack on the hair follicles.
2. Topical corticosteroids: Applying a corticosteroid cream or lotion to the affected area can also help reduce inflammation and promote hair growth.
3. Minoxidil (Rogaine): This over-the-counter solution can help stimulate hair growth by increasing blood flow to the scalp and widening the hair follicles.
4. Anthralin: This medication can help reduce infl

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [44]:
# Response for query 4
print(response(query_dict[4]))

Llama.generate: prefix-match hit



The treatment options for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, depend on the severity and location of the injury. Here are some common treatments:
1. Medications: Depending on the type and severity of the injury, medications may be prescribed to manage symptoms such as pain, inflammation, and swelling. For example, nonsteroidal anti-inflammatory drugs (NSAIDs) or corticosteroids may be used to reduce inflammation and swelling in the brain.
2. Rehabilitation therapy: Physical, occupational, and speech therapy may be recommended to help the person regain lost functions and skills. For example, physical therapy may help improve mobility and balance, while speech therapy may help improve communication skills.
3. Surgery: In some cases, surgery may be necessary to relieve pressure on the brain or repair damaged blood vessels. For example, a craniectomy (a procedure in which a portion of the skull is 

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [45]:
# Response for query 5
print(response(query_dict[5]))

Llama.generate: prefix-match hit



A person who has fractured their leg during a hiking trip requires prompt medical attention to prevent further damage and promote proper healing. Here are some necessary precautions and treatment steps:
1. Assess the injury: Check for any other injuries or complications, such as bleeding, nerve damage, or infection.
2. Immobilize the leg: Use a splint or cast to immobilize the affected leg and prevent further movement. This will help reduce pain and promote healing.
3. Apply ice: Apply ice packs to the affected area for 15-20 minutes every hour to reduce swelling and pain.
4. Elevate the leg: Elevate the affected leg above the level of the heart to reduce swelling and promote blood flow.
5. Administer pain medication: Give the person over-the-counter pain medication, such as ibuprofen or acetaminophen, as directed to help manage pain.
6. Transportation: If the injury is severe, transport the person to a medical facility for further evaluation and treatment.
7. Follow-up care: After th

### Observations of LLM responses on original queries:

*   The answers appear to be generic.
*   Some of the responses are truncated because they exceeded the max token limit.
*   Some of the LLM responses contain information that the query did not ask about. This can lead to information overload.
*   While the responses appear to be factually correct, the LLM was not necessarily trained on the most up-to-date medical information and procedures, so the responses may not be optimal.
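
Truncation can also be detected programmatically rather than by eye. A minimal sketch, assuming the completion dictionary returned by `lcpp_llm(...)` follows the OpenAI-style layout with a `finish_reason` field per choice (`"length"` when `max_tokens` was reached, `"stop"` otherwise). The helper name `was_truncated` and the example dictionary are hypothetical:

```python
def was_truncated(model_output: dict) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    choice = model_output["choices"][0]
    # "length" signals that max_tokens was reached; "stop" signals a natural end.
    return choice.get("finish_reason") == "length"


# Hand-written stand-in for a completion dictionary (not real model output).
example_output = {"choices": [{"text": "4. Antibi", "finish_reason": "length"}]}
print(was_truncated(example_output))
```

To use this with the notebook's `response` function, the function would need to return the full `model_output` dictionary instead of only the text.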




## Question Answering using LLM with Prompt Engineering and Generation Parameter Tuning

### Query 1: What is the protocol for managing sepsis in a critical care unit?



In [46]:
q1="""
Summarize the key steps in the standard protocol for managing sepsis in a critical care unit.
Focus only on initial recognition, immediate interventions, and ongoing monitoring.
Keep the response concise and structured as bullet points
"""
print(response(q1,top_p=0.9))

Llama.generate: prefix-match hit



* Initial recognition:
	+ Rapid assessment of vital signs (tachycardia, tachypnea, hypertension, hypotension)
	+ Assessment of mental status (confusion, agitation, lethargy)
	+ Identification of source of infection (urinary tract, respiratory, surgical site, etc.)
* Immediate interventions:
	+ Administration of broad-spectrum antibiotics
	+ Intravenous fluid resuscitation (fluids and vasopressors as needed)
	+ Oxygen therapy (non-invasive or invasive if necessary)
	+ Early mobilization (e.g., turning, repositioning) to prevent immobility-related complications
* Ongoing monitoring:
	+ Continuous vital sign monitoring (temperature, blood pressure, heart rate, respiratory rate, oxygen saturation)
	+ Regular assessment of mental status and level of consciousness
	+ Close observation for signs of organ dysfunction or failure (e.g., kidney, liver, cardiac)
	+ Frequent blood cultures and other diagnostic tests as needed to guide management


Observations:

* Prompt engineering techniques
  * Instruction refinement and scoping
    * This yields a summarized, structured, and concise response, preventing information overload.

* Generation parameter tuning
  * top_p is reduced from 0.95 to 0.9.
    * This lowers the cumulative probability cutoff for token selection, which generally leads to a less diverse response. With temperature=0 the most likely token is always selected, so this change is unlikely to affect the output here.
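
The effect of the `top_p` cutoff can be illustrated on a toy distribution. This is a minimal sketch of nucleus (top-p) filtering in plain Python, not the implementation used inside llama.cpp; the function name `top_p_filter` and the token probabilities are made up for illustration:

```python
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (illustrative values only).
toy = {"antibiotics": 0.5, "fluids": 0.3, "oxygen": 0.12, "rest": 0.08}
print(sorted(top_p_filter(toy, 0.95)))  # tokens kept with the default cutoff
print(sorted(top_p_filter(toy, 0.9)))   # tokens kept with the lower cutoff
```

Lowering the cutoff removes the least likely candidates, but the most likely token is always kept, which is why the change matters only when sampling with a non-zero temperature.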





### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [47]:
q2="""
List the common symptoms of appendicitis.
Then, briefly explain whether it can be treated with medicine alone,
and if not, name the standard surgical procedure used.
Keep the answer short and clear, using bullet points.
"""
print(response(q2,temperature=0.1))

Llama.generate: prefix-match hit



Symptoms of Appendicitis:
• Severe pain in the lower right abdomen that starts suddenly and worsens quickly
• Nausea and vomiting
• Loss of appetite
• Fever
• Abdominal tenderness and swelling
• Abdominal guarding (tightening of muscles to guard the area from the pain)

Can Appendicitis be treated with medicine alone?
No, appendicitis cannot be treated with medicine alone.

Standard Surgical Procedure for Appendicitis:
• Appendectomy (removal of the inflamed appendix)
• Open appendectomy (incision in the abdomen to access the appendix) or laparoscopic appendectomy (minimally invasive surgery using a camera and specialized instruments)


Observations:

* Prompt engineering techniques
  * Decomposition: The original prompt combined three questions into one. I broke it down into a clear sequence: (1) symptoms → (2) medicine vs surgery → (3) surgical procedure. This ensures clarity and enhances the model response.

  * Output Structuring: I requested bullet points to ensure the answer is easy to scan and avoids long paragraphs, reducing information overload.
  

* Generation parameter tuning
    * Temperature is increased from 0 to 0.1.
      * This increases the randomness of the response and leads to more 'creativity'. In a medical context this is not recommended, since the responses should be as deterministic as possible.
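
How temperature reshapes the next-token distribution can be shown with a temperature-scaled softmax. This is a minimal sketch with made-up logits; the function name `softmax_with_temperature` is hypothetical, and temperature 0 is excluded because it corresponds to always picking the most likely token rather than to a division by zero:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 means greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative values only
for t in (0.1, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At a low temperature almost all probability mass sits on the top token, so a value such as 0.1 stays close to deterministic while still allowing occasional variation.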



### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [48]:
q3="""
Briefly explain the possible causes of sudden patchy hair loss (localized bald spots on the scalp).
Then, list effective treatment options. Present the answer in two sections: Causes and Treatments,
with bullet points under each.
"""
print(response(q3,top_k=30))

Llama.generate: prefix-match hit



Causes of Sudden Patchy Hair Loss:
• Genetics: Inheritance plays a significant role in many cases of patchy hair loss.
• Autoimmune Disorders: Conditions like alopecia areata, lupus, and rheumatoid arthritis can cause sudden patchy hair loss due to an immune system imbalance.
• Hormonal Imbalances: Changes in hormone levels, such as those experienced during pregnancy or menopause, can lead to patchy hair loss.
• Medications: Certain medications like chemotherapy drugs and blood thinners can cause sudden patchy hair loss as a side effect.
• Infections: Fungal infections of the scalp (such as ringworm) or bacterial infections can lead to patchy hair loss.
• Traction Alopecia: Excessive styling and pulling on the hair, such as tight braids or ponytails, can cause localized bald spots due to traction alopecia.
Treatment Options for Sudden Patchy Hair Loss:
• Medications: Minoxidil (Rogaine) and finasteride (Propecia) are two medications FDA-approved for treating patchy hair loss. Minoxidi

Observations:

* Prompt engineering techniques
  * Instruction Refinement: I specified "briefly explain" to prevent overly detailed medical explanations.

  * Output Structuring: I asked for a two-section format (Causes and Treatments) with bullet points, ensuring clarity and avoiding information overload.

* Generation parameter tuning
    * top_k is reduced from 50 to 30.
      * This lowers the maximum number of most-likely tokens considered at each step, which generally makes the response more focused.
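
The `top_k` limit can be sketched in the same spirit: only the `top_k` most likely candidates survive, and their probabilities are renormalized. The function name `top_k_filter` and the toy distribution are hypothetical, and this is an illustration rather than the sampler used by llama.cpp:

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], top_k: int) -> Dict[str, float]:
    """Keep only the top_k most likely tokens and renormalize their probabilities."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


# Toy next-token distribution (illustrative values only).
toy_probs = {"minoxidil": 0.4, "corticosteroids": 0.35, "anthralin": 0.15, "rest": 0.1}
print(top_k_filter(toy_probs, 2))
```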



### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [49]:
q4="""
Summarize the recommended treatments for brain injuries that impair brain function.
Organize the answer into three parts: (1) Immediate medical care, (2) Rehabilitation therapies,
(3) Long-term management/support. Keep the response concise and avoid excessive detail.
"""
print(response(q4,max_tokens=1024))

Llama.generate: prefix-match hit



Part 1: Immediate Medical Care
The primary goal of immediate medical care for brain injuries is to stabilize the patient's condition and prevent further damage. This may involve:
* Monitoring vital signs, such as blood pressure and oxygen levels.
* Managing swelling by keeping the patient's head immobile or using a special collar.
* Controlling bleeding through surgery or medication.
* Providing oxygen therapy or mechanical ventilation if needed.

Part 2: Rehabilitation Therapies
Rehabilitation therapies aim to help the patient regain lost functions and skills, such as:
* Physical therapy to improve mobility and balance.
* Occupational therapy to help with daily activities like dressing and feeding.
* Speech therapy to improve communication and language skills.
* Cognitive therapy to enhance memory, attention, and problem-solving abilities.

Part 3: Long-term Management/Support
Long-term management and support for brain injuries may involve:
* Monitoring the patient's condition and ad

Observations:

* Prompt engineering techniques
  * Scoping: Narrowed the broad medical question to focus only on treatment categories instead of full pathophysiology.

  * Output Structuring: Requested a three-part framework, which helps present information in a logical, digestible way and prevents information overload.

* Generation parameter tuning
    * max_tokens is increased from 512 to 1024.
      * This increases the maximum length of the response. While it could lead to information overload, it gives the model more room to respond without truncation.



### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [50]:
q5="""
Explain the precautions and treatment steps for a leg fracture during a hiking trip.
Divide the answer into three parts: (1) Immediate first aid on-site,
(2) Medical treatment after reaching care, (3) Key considerations for recovery and rehabilitation.
Keep the response clear and concise, using bullet points.
"""
print(response(q5,max_tokens=256,top_p=0.8,top_k=20))

Llama.generate: prefix-match hit



Part 1 - Immediate First Aid On-Site:
• Assess the situation and determine the severity of the fracture.
• Apply direct pressure to the affected area to control bleeding.
• Elevate the injured leg above heart level to reduce swelling.
• Use a splint or immobilize the leg with a walking stick to prevent further movement.
• Keep the patient warm and comfortable, using extra clothing or a space blanket if available.
• Administer pain medication as needed (e.g., ibuprofen or acetaminophen).

Part 2 - Medical Treatment After Reaching Care:
• Call for emergency medical services (EMS) and provide location information.
• Transport the patient to a hospital or clinic for further evaluation and treatment.
• A healthcare professional will perform a thorough examination, X-rays or other imaging tests may be ordered to confirm the fracture and determine its severity.
• Treatment may include casting or surgery, depending on the severity of the fracture.
• Monitor the patient for signs of complicati

Observations:

* Prompt engineering techniques
  * Decomposition: Split the question into clear stages of care (on-site → medical → recovery) to avoid a long, mixed explanation.

  * Output Structuring: Requested sections with bullet points to ensure clarity and prevent information overload.
  

* Generation parameter tuning
  * max_tokens is decreased from 512 to 256.
    * This led to truncation and prevented the model from covering part 3.
  * top_p is decreased from 0.95 to 0.8.
  * top_k is decreased from 50 to 20.
    * These parameter changes generally lead to a more concise, focused, and deterministic response, which is desirable in the medical field.



## Data Preparation for RAG

### Loading the Data

In [17]:
# Mounting google drive
from google.colab import drive
drive.mount('/content/drive')

# Loading the medical manual
pdf_file='/content/drive/MyDrive/AIML/M5_NLP/Project/medical_diagnosis_manual.pdf'
pdf_loader = PyMuPDFLoader(pdf_file)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Data Overview

#### Checking the first 5 pages

In [18]:
# Load the document into pages
pages = pdf_loader.load()

# Print the first 5 pages
for i, page in enumerate(pages[:5]):
    print(f"\n--- Page {i+1} ---\n")
    print(page.page_content[:1000])  # print first 1000 characters for readability


--- Page 1 ---

gadi.shepherd@gmail.com
S9V2MR8AOK
ant for personal use by gadi.shepherd@g
shing the contents in part or full is liable 


--- Page 2 ---

gadi.shepherd@gmail.com
S9V2MR8AOK
This file is meant for personal use by gadi.shepherd@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.


--- Page 3 ---

Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ................................................................................................................................................................................

#### Checking the number of pages

In [19]:
# Check number of pages
print(f"Total pages: {len(pages)}")

Total pages: 4114


### Data Chunking

In [20]:
# Breaking text into chunks with overlap
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=256,
    chunk_overlap=16
)

In [21]:
# Loads the PDF and splits it into a list of text chunks (Document objects)
document_chunks = pdf_loader.load_and_split(text_splitter)
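
The interaction between `chunk_size=256` and `chunk_overlap=16` can be illustrated with a plain sliding window. This is a simplified sketch: the real `RecursiveCharacterTextSplitter` first splits on separators and only then merges pieces up to the size limit, so its chunk boundaries will differ. The function name `chunk_with_overlap` is hypothetical, and a list of integers stands in for tokens:

```python
from typing import List


def chunk_with_overlap(tokens: List[int], chunk_size: int, chunk_overlap: int) -> List[List[int]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


tokens = list(range(600))  # stand-in for 600 token ids
chunks = chunk_with_overlap(tokens, chunk_size=256, chunk_overlap=16)
print([(c[0], c[-1]) for c in chunks])
```

The overlap means the last 16 tokens of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.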

### Embedding

In [22]:
# Loading an embedding model to embed the chunks
embedding_model = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2')



### Vector Database

In [23]:
# Creating a Chroma vector database
report = 'Journal_QnA'
vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    collection_name=report
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


### Retriever

In [24]:
# Setting up a retriever to be utilized in the RAG model
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6} # The retriever will fetch the 6 most similar context chunks to the user query
)
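
Conceptually, a similarity retriever ranks stored chunk embeddings against the query embedding and returns the top `k`. A minimal sketch using cosine similarity on toy 2-D vectors; the actual distance metric depends on how the Chroma collection is configured, and the names `cosine_similarity`, `top_k_chunks`, and the vectors are made up for illustration:

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy embeddings (illustrative values only; real embeddings have hundreds of dimensions).
chunk_vecs = {"sepsis": [0.9, 0.1], "fracture": [0.0, 1.0], "shock": [0.7, 0.7]}
print(top_k_chunks([1.0, 0.0], chunk_vecs, k=2))
```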

### System and User Prompt Template

In [25]:
# System message template
qna_system_message="""
You are an assistant whose work is to give answers to questions with respect to a context.
User input will have the context required by you to answer user questions.

This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Strictly use only the information provided in the ###Context.
Do not mention anything about the information in ###Context or the question in ###Question in your final answer.

If the answer to ###Question cannot be derived from the ###Context, just respond by saying "I don't know".

Remember that the answer to ###Question might not always be directly present in the information provided in the ###Context.
the answer can be indirectly derived from the information in ###Context.

"""

In [26]:
# User prompt template
qna_user_message_template="""
Consider the following ###Context and ###Question
###Context
{context}

###Question
{question}
"""

### Response Function

In [72]:
# Defining a function to generate a RAG response
def generate_RAG_response(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


Default Parameters:

* k=3
    * The number of most similar chunks to be added as context to the query
* max_tokens=256
* temperature=0
* top_p=0.95
* top_k=50
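
These defaults interact with the model's context window (`n_ctx=4096`): the retrieved chunks, the prompt template, and the generated tokens all have to fit inside it. A rough budget check, as a sketch: the function name `fits_context` is hypothetical and the 300-token template size is an assumed round number, not a measurement. Chunk sizes were measured with the `cl100k_base` encoding, so the Llama tokenizer may count the same text differently.

```python
def fits_context(n_chunks: int, chunk_tokens: int, template_tokens: int,
                 max_new_tokens: int, n_ctx: int) -> bool:
    """Check whether prompt tokens plus generated tokens stay within the context window."""
    prompt_tokens = n_chunks * chunk_tokens + template_tokens
    return prompt_tokens + max_new_tokens <= n_ctx


# k=3 chunks of up to 256 tokens, an assumed 300-token template, 256 generated tokens.
print(fits_context(n_chunks=3, chunk_tokens=256, template_tokens=300,
                   max_new_tokens=256, n_ctx=4096))
```

With these numbers there is ample headroom, which suggests `k` or `max_tokens` could be raised if answers are truncated or lack context.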

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [76]:
print(generate_RAG_response(query_dict[1],lcpp_llm))

Llama.generate: prefix-match hit


Based on the provided ###Context, the answer to ###Question is:
The protocol for managing sepsis in a critical care unit includes provision of adequate nutrition, prevention of infection, stress ulcers and gastritis, and pulmonary embolism, as well as monitoring and testing. Additionally, emerging therapies such as cooling for hyperthermia and early treatment of renal failure may be employed. However, it is important to note that the protocol may vary depending on the severity of the sepsis and the individual patient's needs.


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [75]:
print(generate_RAG_response(query_dict[2],lcpp_llm))

Llama.generate: prefix-match hit


Based on the provided ###Context, the answer to ###Question is:
The common symptoms of appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia; after a few hours, the pain shifts to the right lower quadrant. Other signs are right lower quadrant direct and rebound tenderness located at McBurney's point (junction of the middle and outer thirds of the line joining the umbilicus to the anterior superior spine).
Acute appendicitis is typically treated with surgical removal of the inflamed appendix, either through an open or laparoscopic appendectomy. Antibiotics may be given before or after surgery to treat any underlying infection. While antibiotics can help manage symptoms and reduce the risk of complications, they are not a substitute for surgical intervention, as untreated appendicitis can lead to serious complications such as perforation, gangrene, and death.
In some cases, appendicitis may be difficult to diagnose, and the appendix may be

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [77]:
print(generate_RAG_response(query_dict[3],lcpp_llm))

Llama.generate: prefix-match hit


Based on the provided ###Context, the answer to ###Question is:
The effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, include:
1. Alopecia areata: Treatment options for alopecia areata include topical, intralesional, or systemic corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).
2. Lichen planopilaris: Treatment options for lichen planopilaris include oral antimalarials, corticosteroids, retinoids, or immunosuppressants.
3. Chronic cutaneous lupus lesions: Treatment options for chronic cutaneous lupus lesions include oral antimalarials, corticosteroids, retinoids, or immunosuppressants.
4. Telogen effluvium: Treatment options for telogen effluvium include addressing the underlying cause


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [78]:
print(generate_RAG_response(query_dict[4],lcpp_llm))

Llama.generate: prefix-match hit


Based on the provided ###Context, the answer to ###Question is:
"Rehabilitation services should be planned early."
This answer can be indirectly derived from the information provided in the ###Context. The context highlights the importance of early intervention and rehabilitation for maximal functional recovery in patients with head injury or spinal cord injury. It also mentions that supportive care, including prevention of systemic complications, good nutrition, and physical therapy to prevent limb contractures, is crucial for these patients. Therefore, the recommended treatment for a person who has sustained a physical injury to brain tissue resulting in temporary or permanent impairment of brain function is early rehabilitation services.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [79]:
print(generate_RAG_response(query_dict[5],lcpp_llm))

Llama.generate: prefix-match hit


Based on the provided ###Context, the answer to ###Question is:
The necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip are as follows:
1. Immediate splinting: The affected leg should be immobilized with a non-rigid or non-circumferential splint to prevent further injury to soft tissues and to decrease pain.
2. Pain management: Pain should be managed with analgesics, and the patient should be kept comfortable.
3. Evaluation for hemorrhagic shock: If the patient is experiencing signs of hemorrhagic shock (such as absent pulses, marked pale skin, or severe pain), they should be evaluated immediately.
4. Transportation to a medical facility: The patient should be transported to a medical facility as soon as possible for further evaluation and treatment.
5. Reduction and internal fixation: If the fracture is displaced, it may need to be reduced (moved back into place) and internally fixed with plates, screws, or rods to prevent further di

Observations on RAG responses:

*   The responses appear to be more focused, accurate and detailed.
     * This is because the RAG pipeline retrieves context from the medical journal, which grounds the response in the source text and reduces the likelihood of hallucination.
*   The system prompt instructs the model to answer the user input using only the retrieved context. If the context contains no information relating to the user input, the model should respond with 'I don't know'.
     * In these five queries, however, the retriever seems to have successfully retrieved relevant context each time.
*   Some of the responses are truncated by the max token limit of 256; for example, the responses to Queries 2 and 5 end mid-sentence.
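
Truncation can also be checked programmatically rather than inferred from a cut-off sentence. The sketch below assumes the completion dictionary returned by the LLM call follows the OpenAI-style layout, with a `finish_reason` field next to the generated text (`"length"` when the token limit ended generation). `was_truncated` is a hypothetical helper that is not part of this notebook, and the two dictionaries are stand-in data.

```python
def was_truncated(response: dict) -> bool:
    """Return True if generation ended because max_tokens was reached."""
    choice = response["choices"][0]
    # "length" means the token limit ended generation; "stop" means a stop
    # sequence or end-of-sequence token was produced.
    return choice.get("finish_reason") == "length"


# Stand-in responses shaped like a completion dictionary (illustrative only).
cut_off = {"choices": [{"text": "the appendix may be", "finish_reason": "length"}]}
complete = {"choices": [{"text": "Rehabilitation services should be planned early.",
                         "finish_reason": "stop"}]}

print(was_truncated(cut_off))
print(was_truncated(complete))
```

A check like this could be used to retry a query with a larger `max_tokens` only when it is actually needed.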



### Fine-tuning

#### Combination 1

In [97]:
text_splitter_256_32 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=256,
    chunk_overlap=32
)

document_chunks_256_32 = pdf_loader.load_and_split(text_splitter_256_32)

report_256_32 = 'Journal_QnA_256_32'
vectorstore_256_32 = Chroma.from_documents(
    document_chunks_256_32,
    embedding_model,
    collection_name=report_256_32
)

retriever_256_32 = vectorstore_256_32.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6}
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [98]:
def generate_RAG_response_256_32(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever_256_32.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


In [99]:
generate_RAG_response_256_32(query_dict[1],lcpp_llm,temperature=0.1)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Llama.generate: prefix-match hit


'Based on the provided ###Context, the answer to ###Question is:\nThe protocol for managing sepsis in a critical care unit involves starting antibiotics as soon as possible, preferably within 8 hours of presentation, and providing supportive care including fluids, antipyretics, analgesics, and oxygen therapy for patients with hypoxemia. The choice of antibiotics is based on the likely pathogens and the severity of illness, and guidelines should be adapted to local susceptibility patterns, drug formularies, and individual patient circumstances.'

Fine-tuning and observations:

* Increase in chunk overlap from 16 to 32
  * This increases the continuity between chunks, which helps preserve context at chunk boundaries.
    * A proportionally larger overlap is beneficial for smaller chunk sizes, since the relevant context may be spread across many small chunks.
  * The response provides a clearer protocol for managing sepsis in a CCU than the original RAG response, which used the lower chunk overlap.
* Increase in temperature from 0 to 0.1
  * This increases the randomness of token sampling and allows more 'creative' responses. In the medical field this is not recommended, as we want the responses to be as deterministic as possible.
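
To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax over a handful of made-up token scores. It is an illustration of the general technique, not the sampler used by the model backend. The function name and the logits are hypothetical, and a temperature of exactly 0 is assumed to mean greedy decoding (always picking the top token), so it is rejected here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is treated as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low_t = softmax_with_temperature(logits, 0.1)
high_t = softmax_with_temperature(logits, 1.0)

print([round(p, 4) for p in low_t])
print([round(p, 4) for p in high_t])
```

At a low temperature nearly all probability mass sits on the top-scoring token, so sampling behaves almost deterministically. Higher temperatures flatten the distribution and make other tokens more likely to be chosen.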

#### Combination 2

In [100]:
text_splitter_512_16 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap=16
)

document_chunks_512_16 = pdf_loader.load_and_split(text_splitter_512_16)

report_512_16 = 'Journal_QnA_512_16'
vectorstore_512_16 = Chroma.from_documents(
    document_chunks_512_16,
    embedding_model,
    collection_name=report_512_16
)

retriever_512_16 = vectorstore_512_16.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6}
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [101]:
def generate_RAG_response_512_16(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever_512_16.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


In [102]:
generate_RAG_response_512_16(query_dict[2],lcpp_llm,top_p=0.9)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Llama.generate: prefix-match hit


"Based on the provided ###Context, the answer to ###Question is:\nThe common symptoms of appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia; after a few hours, the pain shifts to the right lower quadrant. Pain increases with cough and motion. Classic signs are right lower quadrant direct and rebound tenderness located at McBurney's point (junction of the middle and outer thirds of the line joining the umbilicus to the anterior superior spine). Additional signs are pain felt in the right lower quadrant with palpation of the left lower quadrant, an increase in pain from passive extension of the right hip joint that stretches the iliopsoas muscle (psoas sign), or pain caused by passive internal rotation of the flexed thigh (obturator sign). Low-grade fever (rectal temperature 37.7 to 38.3° C [100 to 101° F]) is common.\nUnfortunately, these classic findings appear in < 50% of patients. Many variations of"

Fine-tuning and observations:

* Increase in chunk size from 256 to 512
  * This gives more context per chunk and can prevent important information from being split apart.
  * However, potential drawbacks include less precise retrieval and the inclusion of irrelevant information.
    * This is seen in the response: it spends its token budget on additional detail about the symptoms and was therefore unable to answer the second part of the query (treatments for appendicitis) within the token limit.
* Decrease in top_p from 0.95 to 0.9
  * This lowers the cumulative probability cutoff for token selection, which narrows the set of candidate tokens and leads to a less diverse response. With temperature left at its default of 0, decoding is typically greedy, so this change on its own is unlikely to account for the difference in the response.
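
The cumulative probability cutoff can be sketched as follows. This is a simplified illustration of nucleus (top-p) filtering over a made-up distribution, not the backend's implementation; `top_p_filter` and the probabilities are hypothetical.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


probs = [0.5, 0.3, 0.12, 0.08]  # made-up next-token probabilities
wide = top_p_filter(probs, 0.95)
narrow = top_p_filter(probs, 0.9)

print(sorted(wide))
print(sorted(narrow))
```

With this distribution, lowering `top_p` from 0.95 to 0.9 drops the least likely token from the candidate set. Note that the most likely token always survives the filter.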

#### Combination 3

In [103]:
text_splitter_512_32 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap=32
)

document_chunks_512_32 = pdf_loader.load_and_split(text_splitter_512_32)

report_512_32 = 'Journal_QnA_512_32'
vectorstore_512_32 = Chroma.from_documents(
    document_chunks_512_32,
    embedding_model,
    collection_name=report_512_32
)

retriever_512_32 = vectorstore_512_32.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6}
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [104]:
def generate_RAG_response_512_32(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever_512_32.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


In [105]:
generate_RAG_response_512_32(query_dict[3],lcpp_llm,top_k=20)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Llama.generate: prefix-match hit


'Based on the provided context, there are several effective treatments or solutions for addressing sudden patchy hair loss, including:\n1. Minoxidil (2% for women, 2% or 5% for men): Prolongs the anagen growth phase and gradually enlarges miniaturized follicles (vellus hairs) into mature terminal hairs.\n2. Finasteride: Inhibits the 5α-reductase enzyme, blocking conversion of testosterone to dihydrotestosterone, and is useful for male-pattern hair loss.\n3. Alopecia areata treatment: Steroid injections, topical corticosteroids, or oral medications such as prednisone may be used to treat patchy hair loss.\n4. Hair transplantation: This involves moving healthy hair follicles from the back and sides of the head to the bald areas.\n5. Wigs or hairpieces: Can be used to cover up bald areas until hair grows back.\nPossible causes behind sudden patchy hair loss include:\n1. Androgenetic alopecia (male-pattern or female-pattern'

Fine-tuning and observations:
* Increase in chunk size (256 to 512) and chunk overlap (16 to 32)
  * Increasing chunk size increases the context per chunk, and increasing chunk overlap helps preserve continuity of context across chunks.
    * This can be observed in the response, which provides considerable additional detail on the mechanisms of the treatments for localized bald spots. However, some of this information may not be necessary and can lead to information overload.
    * Once again, these changes prevented the response from answering the whole query within the token limit.
* Decrease in top_k from 50 to 20
  * This lowers the maximum number of most-likely tokens considered at each step, which makes the response more focused.
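
As a rough illustration of what `top_k` does at each generation step, the sketch below keeps only the k most likely tokens from a made-up distribution and renormalizes them. `top_k_filter` and the probabilities are hypothetical; the real sampler operates on the model's full vocabulary.

```python
def top_k_filter(probs, top_k):
    """Keep only the top_k most likely tokens and renormalize their probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    return {index: p / total for index, p in ranked}


probs = [0.4, 0.25, 0.2, 0.1, 0.05]  # made-up next-token probabilities
filtered = top_k_filter(probs, 2)

print(sorted(filtered))
print(round(sum(filtered.values()), 6))
```

Because the single most likely token is always retained, `top_k` only changes the output when sampling is actually random, that is, when the temperature is above 0.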

#### Combination 4

In [106]:
text_splitter_512_64 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap=64
)

document_chunks_512_64 = pdf_loader.load_and_split(text_splitter_512_64)

report_512_64 = 'Journal_QnA_512_64'
vectorstore_512_64 = Chroma.from_documents(
    document_chunks_512_64,
    embedding_model,
    collection_name=report_512_64
)

retriever_512_64 = vectorstore_512_64.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6}
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [107]:
def generate_RAG_response_512_64(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever_512_64.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


In [108]:
generate_RAG_response_512_64(query_dict[4],lcpp_llm,max_tokens=512,k=2)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Llama.generate: prefix-match hit


"Based on the provided ###Context, the answer to ###Question is:\nRehabilitation therapy is recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function. This type of therapy should be provided through a team approach that combines physical, occupational, and speech therapy, skill-building activities, and counseling to meet the patient's social and emotional needs. Early planning and implementation of rehabilitation services are important to maximize functional recovery."

Fine-tuning and observations:
* Increase in chunk size (256 to 512) and chunk overlap (16 to 64)
  * Increasing chunk size increases the context per chunk, and increasing chunk overlap helps preserve continuity of context across chunks.
    * This can be seen in the response, which provides slightly more detail about treating a patient who has sustained a physical injury to brain tissue.
* Increase in max tokens from 256 to 512 and decrease in k from 3 to 2
  * The increase in max tokens allows the model to respond fully without truncation.
  * The decrease in k (the number of most similar chunks retrieved and used as context) may provide less contextual information, but it also prevents context overload (especially with a small context window and large chunk sizes).
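
The trade-off between k, chunk size and max tokens can be checked with simple arithmetic. The sketch below assumes a 4096-token context window and a hypothetical allowance of 200 tokens for the system message and question; `context_budget` is an illustrative helper, not part of the notebook. The chunks were sized with the `cl100k_base` encoding, which is not the model's own tokenizer, so the numbers are estimates only.

```python
def context_budget(k, chunk_size, max_tokens, n_ctx=4096, prompt_overhead=200):
    """Estimate how many tokens of the context window remain for one RAG call.

    prompt_overhead is an assumed allowance for the system message and question.
    """
    used = k * chunk_size + max_tokens + prompt_overhead
    return n_ctx - used


# Settings used in this combination: k=2, chunk_size=512, max_tokens=512.
print(context_budget(k=2, chunk_size=512, max_tokens=512))
# Default k=3 with the same chunk size and the default max_tokens=256.
print(context_budget(k=3, chunk_size=512, max_tokens=256))
```

A negative result would indicate that the retrieved context, the prompt and the requested output together do not fit in the window.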

#### Combination 5

In [109]:
text_splitter_256_64 = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=256,
    chunk_overlap=64
)

document_chunks_256_64 = pdf_loader.load_and_split(text_splitter_256_64)

report_256_64 = 'Journal_QnA_256_64'
vectorstore_256_64 = Chroma.from_documents(
    document_chunks_256_64,
    embedding_model,
    collection_name=report_256_64
)

retriever_256_64 = vectorstore_256_64.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 6}
)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [110]:
def generate_RAG_response_256_64(user_input, llm, k=3, max_tokens=256, temperature=0, top_p=0.95, top_k=50):
    relevant_document_chunks = retriever_256_64.get_relevant_documents(user_input, k=k)
    context_list = [d.page_content.replace("\t", " ") for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    prompt = f"""[INST] <<SYS>>
{qna_system_message}
<</SYS>>

{qna_user_message_template.format(context=context_for_query, question=user_input)} [/INST]"""

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=["[/INST]"],
            echo=False
        )
        prediction = response["choices"][0]["text"].strip()
    except Exception as e:
        prediction = f"Sorry, I encountered the following error:\n{e}"

    return prediction


In [111]:
generate_RAG_response_256_64(query_dict[5],lcpp_llm,k=6,top_p=0.8,top_k=20)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Llama.generate: prefix-match hit


"Based on the provided context, the following are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip:\nPrecautions:\n1. Immobilization: The affected leg should be immobilized using a splint or cast to prevent further injury and promote healing.\n2. Pain management: Pain medication should be administered as needed to manage discomfort and reduce the risk of infection.\n3. Wound care: Any open wounds on the affected leg should be cleaned and dressed regularly to prevent infection.\n4. Elevation: The affected leg should be elevated above the level of the heart to reduce swelling and promote blood flow.\n5. Rest: The patient should rest as much as possible to avoid putting unnecessary strain on the injured leg.\n6. Monitoring: The patient's vital signs, particularly their pulse and breathing rate, should be monitored regularly to ensure they are stable and not experiencing any complications.\n7. Transportation: If the injury is sever

Fine-tuning and observations:
* Increase in chunk overlap from 16 to 64
  * This helps preserve continuity of context across chunks.
    * However, an overlap that is large relative to the chunk size could lead the retriever to return redundant chunks, because adjacent chunks duplicate part of each other's content.
      * This duplication is not seen in the response.
    * The response is comprehensive, with detailed treatment steps.
* Increase in k from 3 to 6
  * This increases the number of most similar chunks retrieved for the query.
  * This can increase context diversity and improve the model response.
  * However, this could lead to context overflow, especially if the chunk size is large. Since the chunk size is 256 and the context window is 4096 tokens, this should not be an issue.
* Decrease in top_p from 0.95 to 0.8
* Decrease in top_k from 50 to 20
  * These parameter changes narrow the set of candidate tokens, which favors a more concise and focused response, as needed in the medical field. Since temperature is still 0, the response should already be close to deterministic regardless of these two settings.
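
The redundancy introduced by a larger overlap can be quantified with a simplified fixed-window splitter. This is only a sketch of the overlap idea: the splitter used in this notebook also breaks text on separators, so real chunk boundaries will differ. `chunk_with_overlap` and the integer "tokens" are hypothetical stand-ins.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into fixed-size windows that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = list(range(1000))  # stand-in for a tokenized document
for overlap in (16, 64):
    chunks = chunk_with_overlap(tokens, 256, overlap)
    shared = len(set(chunks[0]) & set(chunks[1]))
    print(overlap, len(chunks), shared / 256)
```

With a chunk size of 256, an overlap of 64 means a quarter of each chunk repeats content from its neighbor, compared with about 6% for an overlap of 16.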


     

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters: retrieval and generation. We illustrate this evaluation using the answers generated for the questions from the previous section.

- We are using the same Mistral model for evaluation, so here the LLM is essentially rating how well it has itself performed on the task.

In [80]:
groundedness_rater_system_message  = """
You will be presented a ###Question, ###Context used by the AI system and AI generated ###Answer.

Your task is to judge the extent to which the ###Answer is derived from ###Context.

Rate it 1 - if The ###Answer is not derived from the ###Context at all
Rate it 2 - if The ###Answer is derived from the ###Context only to a limited extent
Rate it 3 - if The ###Answer is derived from ###Context to a good extent
Rate it 4 - if The ###Answer is derived from ###Context mostly
Rate it 5 - if The ###Answer is derived from ###Context completely

Please note: Make sure you give a single overall rating in the range of 1 to 5 along with an overall explanation.
"""

In [81]:
relevance_rater_system_message = """
You will be presented with a ###Question, the ###Context used by the AI system to generate a response, and the AI-generated ###Answer.

Your task is to judge the extent to which the ###Answer is relevant to the ###Question, considering whether it directly addresses the key aspects of the ###Question based on the provided ###Context.

Rate the relevance as follows:
- Rate 1 – The ###Answer is not relevant to the ###Question at all.
- Rate 2 – The ###Answer is only slightly relevant to the **###Question**, missing key aspects.
- Rate 3 – The ###Answer is moderately relevant, addressing some parts of the **###Question** but leaving out important details.
- Rate 4 – The ###Answer is mostly relevant, covering key aspects but with minor gaps.
- Rate 5 – The ###Answer is fully relevant, directly answering all important aspects of the **###Question** with appropriate details from the **###Context**.

Note: Provide a single overall rating in the range of 1 to 5, along with a brief explanation of why you assigned that score.
"""

In [82]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [89]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input, k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = lcpp_llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = lcpp_llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    response_2 = lcpp_llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [90]:
generate_ground_relevance_response(query_dict[1])

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


('  Based on the provided context, I would rate the extent to which the answer is derived from the context as follows:\n* Provision of adequate nutrition: The context mentions that supportive care for ICU patients includes provision of adequate nutrition (see p. 21). Therefore, this answer is derived from the context to a good extent. Rating: 3.\n* Prevention of infection: The context highlights the importance of preventing infection, stress ulcers and gastritis, and pulmonary embolism in ICU patients (see p. ',
 '  Based on the provided context and question, I would rate the relevance of the answer as follows:\nRating: 3 - Moderately relevant\nThe answer provides some information related to the management of sepsis in a critical care unit, including provision of adequate nutrition, prevention of infection, stress ulcers and gastritis, and pulmonary embolism. However, the answer does not directly address key aspects of the question, such as the specific protocol for managing sepsis in 

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [93]:
generate_ground_relevance_response(query_dict[2])

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


('  Based on the provided context and question, I would rate the extent to which the AI-generated answer is derived from the context as follows:\n* Relevance to the topic: 5/5 - The answer directly addresses the symptoms and treatment of appendicitis, which is the main focus of the context.\n* Accuracy: 4/5 - The answer provides a comprehensive list of common symptoms of appendicitis, including epigastric or periumbilical pain, nausea, vomiting, anorexia, and right lower quadrant tenderness. However',
 '  Based on the provided context and question, I would rate the relevance of the AI-generated answer as follows:\nRating: 3 - Moderately relevant\nThe answer provides some information related to the symptoms and signs of appendicitis, which is the main topic of the question. However, it misses some key aspects and details that are important for a fully relevant response. For example, the answer does not mention the specific age groups most commonly affected by appendicitis, nor does it p

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [94]:
generate_ground_relevance_response(query_dict[3])

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


('  Based on the provided context, I would rate the extent to which the answer is derived from the context as follows:\n* For Alopecia areata, the answer mentions topical corticosteroids, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA) as effective treatment options. These treatments are mentioned in the context as possible treatment options for alopecia areata. Rating: 3 (good extent)\n* For Androgenetic alopecia, the answer mentions a',
 '  Based on the provided context and answer, I would rate the relevance of the answer as follows:\nRating: 4 (Mostly Relevant)\nExplanation: The answer provides relevant information related to the question, addressing key aspects such as the most common causes of hair loss, treatment options for alopecia areata and androgenetic alopecia, and the effectiveness of topical corticosteroids, immunotherapy, and PUVA in treating these conditions. However, there are some minor gaps in the answer, such

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [95]:
generate_ground_relevance_response(query_dict[4])

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


('  Based on the provided context and question, I would rate the extent of derivation of the ###Answer from the ###Context as follows:\nRating: 4 (Mostly Derived)\nThe ###Answer is mostly derived from the ###Context in several ways:\n1. Rehabilitation services: The context highlights the importance of early intervention by rehabilitation specialists, which is consistent with the ###Answer that recommends planning rehabilitation services early.\n2. Cognitive therapy: The context mentions cognitive dysfunction as a common abnormality after brain injury,',
 '  Based on the provided question and context, I would rate the relevance of the AI-generated answer as follows:\nRating: 3 (Moderately relevant)\nThe AI-generated answer provides some relevant information related to the treatment of brain injury, including the importance of early intervention by rehabilitation specialists, supportive care, and cognitive therapy. However, there are some gaps in the answer that prevent it from being ful

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [96]:
generate_ground_relevance_response(query_dict[5])

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


('  Based on the provided context, I would rate the extent to which the AI-generated answer is derived from the context as follows:\n* Precautions: The answer provides accurate information about immobilization and pain management, which are essential precautions for fracture care. It also mentions elevation, which is a common practice in fracture management. Rating: 3 (good extent)\n* Treatment steps: The answer provides a comprehensive list of treatment steps for fracture management, including reduction, immobilization, and surgical repair. It also mentions the use of prophy',
 '  Based on the provided context, I would rate the relevance of the AI-generated answer as follows:\nRating: 4 (Mostly Relevant)\nThe AI-generated answer provides a comprehensive overview of the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip. It covers key aspects such as immobilization, pain management, elevation, and evaluation for potential complicatio

**Observations on evaluations**
* The LLM-as-a-judge appears to evaluate the responses on groundedness and relevance successfully.
  * It provides ratings for these evaluation metrics based on the provided template.
  * It also explains its reasoning for each rating.
* The low max token limit (128) may truncate the model's response before it gives a comprehensive answer, which can lead the evaluator to assign a relatively low rating for some queries.
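
To make these observations easier to check across queries, the judge outputs could be post-processed with a small helper. The following is a minimal sketch, not part of the original notebook: the function names `extract_ratings` and `looks_truncated` are hypothetical, the regular expression assumes the judge writes ratings in the `Rating: N` form seen in the outputs above, and the truncation check is only a heuristic.

```python
import re
from typing import List

# Assumption: the judge reports scores as "Rating: N" with N from 1 to 5,
# as in the outputs shown above.
RATING_PATTERN = re.compile(r"Rating:\s*([1-5])\b")


def extract_ratings(judge_output: str) -> List[int]:
    """Return every 1-5 rating found in a judge response, in order of appearance."""
    return [int(match) for match in RATING_PATTERN.findall(judge_output)]


def looks_truncated(judge_output: str) -> bool:
    """Heuristic: text that does not end in closing punctuation was likely cut off."""
    stripped = judge_output.rstrip()
    return bool(stripped) and stripped[-1] not in ".!?)\"'"


# Illustrative strings modeled on the judge outputs above (shortened by hand).
cut_off = "Rating: 4 (Mostly Derived)\nThe context mentions cognitive dysfunction,"
complete = "* Precautions: Rating: 3 (good extent)\n* Treatment steps: Rating: 3 (good extent)"

print(extract_ratings(cut_off), looks_truncated(cut_off))
print(extract_ratings(complete), looks_truncated(complete))
```

A response flagged by `looks_truncated` suggests that the max token limit, rather than answer quality alone, may be affecting the rating for that query.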

## Actionable Insights and Business Recommendations

* A Retrieval-Augmented Generation (RAG) model built on trusted medical manuals and research can streamline access to knowledge, standardize care practices, and support time-sensitive decision-making.
  * It can address challenges related to information overload when diagnosing and treating patients.
  * It allows for rapid, reliable access to authoritative medical knowledge to improve accuracy, efficiency, and patient outcomes.
  * It has more impactful business applications than a standard LLM because its responses are grounded in relevant medical context, whereas a standard LLM provides more generalized responses that may not be relevant or accurate.
* Refine prompt design and temperature settings to control response length and creativity.
  * Less creative, more deterministic responses (low temperature) are essential in the medical field.
* Continuously adjust RAG parameters based on specific use cases for optimal performance.
  * Modify parameters such as chunk size, chunk overlap, max tokens, and the number of chunks retrieved as context (k).
* Prioritize groundedness and relevance in evaluations to ensure reliable and contextually accurate outputs.
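
One way to approach the parameter tuning recommended above is to enumerate candidate configurations and evaluate each one on groundedness and relevance. The following is a minimal sketch under stated assumptions: `build_rag_grid` is a hypothetical helper, and every value in the example grid is illustrative except the max token limit of 128 noted in the observations. It only builds the configurations; running the pipeline for each one is left to the notebook's own functions.

```python
from itertools import product
from typing import Dict, Iterable, List


def build_rag_grid(
    chunk_sizes: Iterable[int],
    chunk_overlaps: Iterable[int],
    max_tokens_options: Iterable[int],
    k_values: Iterable[int],
) -> List[Dict[str, int]]:
    """Build every valid combination of RAG parameters.

    Combinations where the overlap is not smaller than the chunk size are
    skipped, since such a splitter setting would not make sense.
    """
    grid = []
    for chunk_size, overlap, max_tokens, k in product(
        chunk_sizes, chunk_overlaps, max_tokens_options, k_values
    ):
        if overlap >= chunk_size:
            continue
        grid.append(
            {
                "chunk_size": chunk_size,
                "chunk_overlap": overlap,
                "max_tokens": max_tokens,
                "k": k,
            }
        )
    return grid


# Illustrative values only (assumptions), apart from max_tokens=128.
grid = build_rag_grid(
    chunk_sizes=[256, 512],
    chunk_overlaps=[0, 20, 256],
    max_tokens_options=[128, 512],
    k_values=[2, 3],
)
print(len(grid), "configurations to evaluate")
```

Each configuration can then be scored with the same LLM-as-a-judge prompts so that comparisons across settings use a consistent rubric.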

<font size=6 color='blue'>Power Ahead</font>
___