## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.8 MB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m54.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m128.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m112.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.9/16.9 MB[0m [31m169.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.8/43.8 kB[0m [31m285.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for llama-cpp-python (pyproject.toml) ... [?25l[?25hdone
[31mERROR: pip's dependency

In [2]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 1.5.3 which is incompatible.
diffusers 0.33.1 requires huggingface-hub>=0.27.0, but you have huggingface-hub 0.23.2 which is incompatible.
tensorflow 2.18.0 requires numpy<2.1.0,>=1.26.0, but you have numpy 1.25.2 which is incompatible.
dask-cudf-cu12 25.2.2 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
xarray 2025.3.1 requires pandas>=2.1, but you have pandas 1.5.3 which is incompatible.
peft 0.15.2 requires huggingface_hub>=0.25.0, but you have huggingface-hub 0.23.2 which is incompatible.
dask-expr 1.1.21 requires pandas>=2, but you have pandas 1.5.3 which is incompatible.
cudf-cu12 25.2.1 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
blosc2 3.4.0 requires numpy>=1.26, but 

In [3]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

In [4]:
from langchain_community.document_loaders import PyMuPDFLoader

# Replace with your actual PDF path
pdf_path = "medical_diagnosis_manual.pdf"

loader = PyMuPDFLoader(pdf_path)
documents = loader.load()

# Preview
print(f"Loaded {len(documents)} pages")
print(documents[0].page_content[:500])  # See first 500 characters


Loaded 4114 pages
zahra.sepasi@utdallas.edu
42JOEDPKY3
ant for personal use by zahra.sepasi@utd
shing the contents in part or full is liable 



In [17]:
for i in range(3):
    print(f"Page Number : {i+1}",end="\n")
    print(documents[i].page_content,end="\n")

Page Number : 1
zahra.sepasi@utdallas.edu
42JOEDPKY3
ant for personal use by zahra.sepasi@utd
shing the contents in part or full is liable 

Page Number : 2
zahra.sepasi@utdallas.edu
42JOEDPKY3
This file is meant for personal use by zahra.sepasi@utdallas.edu only.
Sharing or publishing the contents in part or full is liable for legal action.

Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ............................................................................................................................................................................

In [18]:
documents[8].page_content

'3172\nChapter 299. Chromosomal Anomalies    ...........................................................................................................................\n3182\nChapter 300. Inherited Muscular Disorders    ...................................................................................................................\n3187\nChapter 301. Inherited Disorders of Metabolism    .........................................................................................................\n3209\nChapter 302. Hereditary Periodic Fever Syndromes    .................................................................................................\n3214\nChapter 303. Behavioral Concerns & Problems in Children    ..................................................................................\n3222\nChapter 304. Learning & Developmental Disorders    ..................................................................................................\n3237\nChapter 305. Mental Disorders 

# **Data Chunking**

In [19]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 20
)

In [21]:
document_chunks = loader.load_and_split(text_splitter)

In [22]:
len(document_chunks)

8514

In [23]:
document_chunks[0].page_content

'zahra.sepasi@utdallas.edu\n42JOEDPKY3\nant for personal use by zahra.sepasi@utd\nshing the contents in part or full is liable'

In [24]:
document_chunks[-2].page_content

'Y\nYaws 1266-1267\nforest 1379\nY chromosome 3373 (see also Genetic)\nabnormalities of 3005\nYeast infection (see also Fungal infection)\nvaginal 2542, 2544, 2545\nYellow fever 1400, 1429, 1437\nhepatic inflammation in 248\nvaccine against 1172, 1437, 3441\nYellow nail syndrome 732, 1995\npleural effusion in 1997\nYellow skin (see Jaundice)\nYersinia infection 1167, 1256-1257\nY. enterocolitica infection 147\nY. pestis infection 1924\nYew poisoning 3338\nYips 1762\nYo, antibodies to 1056\nYolk sac tumor 2476\nThe Merck Manual of Diagnosis & Therapy, 19th Edition\nY\n4103\nzahra.sepasi@utdallas.edu\n42JOEDPKY3\nThis file is meant for personal use by zahra.sepasi@utdallas.edu only.\nSharing or publishing the contents in part or full is liable for legal action.'

In [25]:
document_chunks[-1].page_content

"Z\nZafirlukast 1879\nZalcitabine 1451\nin children 2854\nZaleplon 1709\nZanamivir 1407\nin influenza 1407, 1929\nZAP-70 (zeta-associated protein 70) deficiency 1092, 1108\nZavanelli maneuver 2680\nZellweger syndrome 2383, 3023\nZenker's diverticulum 125\nZidovudine 1451, 1453\nin children 2854\nZileuton 1881\nin asthma 1880\nZinc 49, 55, 3431-3432\nin common cold 1405\ndeficiency of 11, 49, 55\nin dermatophytoses 705\npoisoning with 3328, 3353\nrecommended dietary allowances for 50\nreference values for 3499\ntoxicity of 49, 55\ncopper deficiency and 49\nin Wilson's disease 52\nZinc oxide 2233\ngelatin formulation of 646, 672\nZinc pyrithione 647\nZinc shakes 55\nZipper injury 3239, 3240\nZiprasidone\nin agitation 1492\nin bipolar disorder 3059\npoisoning with 3347\nin schizophrenia 1566\nZoledronate 359, 361, 848\nZollinger-Ellison syndrome 95, 199, 200-201, 910\nmastocytosis vs 1125\nMenetrier's disease vs 132\npeptic ulcer disease vs 134\nZolmitriptan 1721\nZolpidem 1709, 3103\nZon

# **Embedding**

In [26]:
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')

modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/67.9k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]

In [27]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [28]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  1024


True

# **Vector Database**

In [29]:
out_dir = 'documents_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [30]:
vectorstore = Chroma.from_documents(
    document_chunks,
    embedding_model,
    persist_directory=out_dir
)

In [31]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

In [32]:
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False)

# **Retriever**

In [33]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)

#### Downloading and Loading the model

In [34]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [35]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [36]:
#uncomment the below snippet of code if the runtime is connected to GPU.
llm = Llama(
    model_path=model_path,
    n_ctx=2300,
    n_gpu_layers=38,
    n_batch=512
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [37]:
llm("What is the protocol for managing sepsis in a critical care unit?")['choices'][0]['text']

'\n\nSepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following steps outline the general approach to managing sepsis in a critical care setting:\n\n1. Early recognition: Suspect sepsis in any patient with suspected or confirmed infection who is exhibiting signs of organ dysfunction, such as altered mental status, decreased urine output, respiratory distress, or hypotension. Use the Sequential Organ Failure Assessment (SOFA) score to assess organ dysfunction and monitor for progression'

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [38]:
llm("What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?")['choices'][0]['text']

Llama.generate: prefix-match hit


'\n\nAppendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The most common symptoms of appendicitis include:\n\n1. Abdominal pain: This is usually the first symptom and starts as a vague discomfort that eventually localizes to the lower right side of the abdomen. The pain may be constant or intermittent, and it can worsen with movement or pressure on the area.\n2. Loss of appetite: As the inflammation progresses, you may lose your'

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [39]:
llm("What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?")['choices'][0]['text']

Llama.generate: prefix-match hit


' This question is quite common among individuals experiencing such a condition. In this article, we will discuss some potential causes and effective treatments for sudden patchy hair loss.\n\n## Causes of Sudden Patchy Hair Loss\n\nThere are several possible causes of sudden patchy hair loss that can lead to localized bald spots on the scalp. Some common causes include:\n\n### Alopecia Areata\n\nAlopecia areata is an autoimmune disorder that affects the hair follicles, causing hair loss in small patches. It occurs when the immune system attacks the hair follicles, preventing'

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [40]:
llm("What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?")['choices'][0]['text']

Llama.generate: prefix-match hit


'\n\nA person who has sustained a physical injury to the brain tissue may require various treatments based on the severity and nature of the injury. Here are some common treatments that may be recommended:\n\n1. Medical intervention: The primary goal is to ensure the safety and stability of the patient, and address any life-threatening conditions such as bleeding in the brain (hematoma), swelling (edema), or pressure on the brain (intracranial pressure). This may involve surgery, intensive care unit (ICU) monitoring, and medication.\n2. Rehabilitation therapy: Depending on the extent'

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [41]:
llm("What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?")['choices'][0]['text']

Llama.generate: prefix-match hit


'\n\nA fractured leg is a serious injury that requires prompt medical attention. Here are some necessary precautions and treatment steps for a person who has sustained a leg fracture while hiking:\n\n1. Assess the situation: If you or someone in your group has sustained a leg fracture during a hike, first assess the situation to determine the severity of the injury. Check for signs of shock such as pale skin, rapid heartbeat, and low blood pressure. Try to keep the person calm and still to prevent further injury.\n2. Call for help: If possible, call for emergency medical assistance or'

## Question Answering using LLM with Prompt Engineering

In [43]:
qna_system_message = """
You are an assistant whose work is to review the report and provide the appropriate answers from the context.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""

In [44]:
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [47]:
question= "What is the protocol for managing sepsis in a critical care unit?"
qna_user_message = qna_user_message_template.format(context=context, question=question)

# Final prompt to send to LLM
final_prompt = qna_system_message.strip() + "\n\n" + qna_user_message.strip()

# Call the LLM
response = llm(final_prompt, max_tokens=300)

# Get the text
answer = response['choices'][0]['text'].strip()

# Print the final answer
print(answer)

Llama.generate: prefix-match hit


Answer:
The context does not provide specific information about the protocol for managing sepsis in a critical care unit. The context mentions the prevention of infection as part of supportive care for ICU patients, but it does not give details about sepsis management. Therefore, the answer is "I don't know".


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [48]:
question= "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
qna_user_message = qna_user_message_template.format(context=context, question=question)

# Final prompt to send to LLM
final_prompt = qna_system_message.strip() + "\n\n" + qna_user_message.strip()

# Call the LLM
response = llm(final_prompt, max_tokens=300)

# Get the text
answer = response['choices'][0]['text'].strip()

# Print the final answer
print(answer)

Llama.generate: prefix-match hit


I don't know. The context does not provide information on the symptoms or treatment of appendicitis.


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [49]:
question= "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
qna_user_message = qna_user_message_template.format(context=context, question=question)

# Final prompt to send to LLM
final_prompt = qna_system_message.strip() + "\n\n" + qna_user_message.strip()

# Call the LLM
response = llm(final_prompt, max_tokens=300)

# Get the text
answer = response['choices'][0]['text'].strip()

# Print the final answer
print(answer)

Llama.generate: prefix-match hit


I don't know. The context provided does not contain any information regarding sudden patchy hair loss, its causes, or effective treatments.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [50]:
question= "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
qna_user_message = qna_user_message_template.format(context=context, question=question)

# Final prompt to send to LLM
final_prompt = qna_system_message.strip() + "\n\n" + qna_user_message.strip()

# Call the LLM
response = llm(final_prompt, max_tokens=300)

# Get the text
answer = response['choices'][0]['text'].strip()

# Print the final answer
print(answer)

Llama.generate: prefix-match hit


I don't know. The context discusses the supportive care for critically ill patients in an ICU setting, which includes nutrition and prevention of infection, stress ulcers, and pulmonary embolism. It does not mention any specific treatments for brain injuries or their impairment of brain function.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [51]:
question= "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
qna_user_message = qna_user_message_template.format(context=context, question=question)

# Final prompt to send to LLM
final_prompt = qna_system_message.strip() + "\n\n" + qna_user_message.strip()

# Call the LLM
response = llm(final_prompt, max_tokens=300)

# Get the text
answer = response['choices'][0]['text'].strip()

# Print the final answer
print(answer)

Llama.generate: prefix-match hit


I don't know. The context provided does not mention anything about fractures or their treatment and care.


## Data Preparation for RAG

### Loading the Data

In [52]:
pdf_path = "medical_diagnosis_manual.pdf"

# Load the PDF pages
loader = PyMuPDFLoader(pdf_path)
documents = loader.load()

print(f"Loaded {len(documents)} pages.")
print(documents[0].page_content[:500])

Loaded 4114 pages.
zahra.sepasi@utdallas.edu
42JOEDPKY3
ant for personal use by zahra.sepasi@utd
shing the contents in part or full is liable 



### Data Overview

#### Checking the first 5 pages

In [76]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(documents[i].page_content,end="\n")

Page Number : 1
zahra.sepasi@utdallas.edu
42JOEDPKY3
ant for personal use by zahra.sepasi@utd
shing the contents in part or full is liable 

Page Number : 2
zahra.sepasi@utdallas.edu
42JOEDPKY3
This file is meant for personal use by zahra.sepasi@utdallas.edu only.
Sharing or publishing the contents in part or full is liable for legal action.

Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ............................................................................................................................................................................

#### Checking the number of pages

In [77]:
len(documents)

4114

### Data Chunking

In [53]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,       # Number of characters per chunk
    chunk_overlap=20     # Overlap between chunks to preserve context
)

# Split loaded documents
chunks = text_splitter.split_documents(documents)


### Embedding

In [54]:
from langchain_community.embeddings import SentenceTransformerEmbeddings

# Load embedding model
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

In [55]:
from langchain_community.vectorstores import Chroma

# Create Chroma vector store from chunks
db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_function,
    persist_directory="medical_chroma"  # Local folder to save the DB
)


### Vector Database

In [56]:
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_function,
    persist_directory="medical_chroma"
)

db.persist()

### Retriever

In [57]:
retriever = db.as_retriever(search_kwargs={"k": 3})  # retrieves top 3 most relevant chunks

### System and User Prompt Template

In [58]:
qna_system_message = """
You are an assistant whose work is to review the report and provide the appropriate answers from the context.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""

In [59]:
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

### Response Function

In [60]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [62]:
query_1 = "What is the protocol for managing sepsis in a critical care unit?"
answer_1 = generate_rag_response(query_1)
print( answer_1)

Llama.generate: prefix-match hit


Answer: The protocol involves providing supplemental oxygen, checking and securing the airway with mechanical ventilation if necessary, controlling hemorrhage, inserting two large IV catheters into separate peripheral veins, and initiating treatment simultaneously with evaluation.


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [64]:
query_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
answer_2 = generate_rag_response(query_2)
print( answer_2)

Llama.generate: prefix-match hit


Based on the context provided, the common symptoms for appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia. After a few hours, the pain shifts to the right lower quadrant and increases with cough and motion. The classic signs are right lower quadrant direct and rebound tenderness located at McBurney's point.

Appendicitis cannot be cured via medicine alone as it results from obstruction of the appendiceal lumen, which leads to distention, bacterial overgrowth, ischemia,


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [66]:
query_3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
answer_3 = generate_rag_response(query_3)
print( answer_3)

Llama.generate: prefix-match hit


Answer:
Multiple treatment options exist for addressing sudden patchy hair loss, which is a common symptom of alopecia areata. These include topical treatments such as minoxidil and anthralin, intralesional treatments like corticosteroids, systemic corticosteroids, immunotherapies using diphencyprone or squaric acid dibutylester, and PUVA therapy. Hormonal modulators like oral contraceptives or spironolactone may also be useful for female-pattern hair loss associated with hyperandrogenemia.


### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [67]:
query_4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
answer_4 = generate_rag_response(query_4)
print( answer_4)

Llama.generate: prefix-match hit


Answer: The context suggests that there is no specific treatment for brain damage. However, supportive care is recommended which includes managing pulmonary infections, UTIs, and multiple organ failures to ensure the patient's survival.


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [68]:
query_5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
answer_5 = generate_rag_response(query_5)
print( answer_5)

Llama.generate: prefix-match hit


Based on the context provided, the necessary precautions for a person who has fractured their leg during a hiking trip include immediate splinting to prevent further damage and loss of blood. The treatment steps involve definitive treatment such as reduction for certain injuries, rest, ice, compression, and elevation (RICE), and usually immobilization. In the case of life- or limb-threatening injuries, emergency care should be sought for hemorrhagic shock treatment, arterial repair surgery if necessary, and nerve injury repair. It is important to note that the context does not provide specific information


### Fine-tuning

In [69]:
{
  "instruction": "What is the protocol for managing sepsis in a critical care unit?",
  "input": "###Context\nSepsis management involves prompt recognition, fluid resuscitation, administration of broad-spectrum antibiotics within the first hour, and hemodynamic monitoring. Follow-up includes source control, lactate level tracking, and escalation based on organ function scores.",
  "output": "Administer fluids, start broad-spectrum antibiotics within 1 hour, monitor hemodynamics, and follow-up with source control and lactate tracking."
}

{'instruction': 'What is the protocol for managing sepsis in a critical care unit?',
 'input': '###Context\nSepsis management involves prompt recognition, fluid resuscitation, administration of broad-spectrum antibiotics within the first hour, and hemodynamic monitoring. Follow-up includes source control, lactate level tracking, and escalation based on organ function scores.',
 'output': 'Administer fluids, start broad-spectrum antibiotics within 1 hour, monitor hemodynamics, and follow-up with source control and lactate tracking.'}

In [70]:
{
  "instruction": "What are the first-line options and alternatives for managing rheumatoid arthritis?",
  "input": "###Context\nFirst-line treatment of RA typically involves disease-modifying antirheumatic drugs (DMARDs), such as methotrexate. Alternatives include leflunomide, sulfasalazine, and biologics for non-responders.",
  "output": "Start with methotrexate. Use leflunomide or sulfasalazine as alternatives. Biologics are reserved for refractory cases."
}

{'instruction': 'What are the first-line options and alternatives for managing rheumatoid arthritis?',
 'input': '###Context\nFirst-line treatment of RA typically involves disease-modifying antirheumatic drugs (DMARDs), such as methotrexate. Alternatives include leflunomide, sulfasalazine, and biologics for non-responders.',
 'output': 'Start with methotrexate. Use leflunomide or sulfasalazine as alternatives. Biologics are reserved for refractory cases.'}

In [71]:
{
  "instruction": "What are the common symptoms and treatments for pulmonary embolism?",
  "input": "###Context\nPulmonary embolism symptoms include shortness of breath, chest pain, and rapid heart rate. Treatment options include anticoagulants like heparin or warfarin, thrombolytics in severe cases, and supportive oxygen therapy.",
  "output": "Symptoms: shortness of breath, chest pain, tachycardia. Treatments: anticoagulants (heparin, warfarin), thrombolytics, and oxygen therapy."
}

{'instruction': 'What are the common symptoms and treatments for pulmonary embolism?',
 'input': '###Context\nPulmonary embolism symptoms include shortness of breath, chest pain, and rapid heart rate. Treatment options include anticoagulants like heparin or warfarin, thrombolytics in severe cases, and supportive oxygen therapy.',
 'output': 'Symptoms: shortness of breath, chest pain, tachycardia. Treatments: anticoagulants (heparin, warfarin), thrombolytics, and oxygen therapy.'}

## Output Evaluation

In [80]:
groundedness_rater_system_message  = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluaton criteria and assign a score.
"""

In [81]:
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the context as per the metric.
2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the context using the evaluaton criteria and assign a score.
"""

In [82]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [83]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=3)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

In [84]:
user_input = "Who are the authors of this article and who published this article ?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify if the answer is derived only from the information presented in the context.
2. Check if the context contains any mention of the authors or the publisher of the article.
3. Compare the answer with the context to ensure consistency.

Explanation:
The AI generated answer "I don't know. The context does not provide information about the authors or the publisher of the article" is consistent with the context as it does not contain any mention of the authors or the publisher of the article. Therefore, the metric is followed completely.

Rating:
Based on the evaluation criteria, I would rate this answer a 5 as the metric was followed completely.

 Steps to evaluate the context as per the relevance metric:
1. Identify the main aspects of the question: In this case, the main aspects are the names of the authors and the name of the publisher.
2. Determine if the context contains information about these main aspects: The context does not contain any exp

In [85]:
user_input = "List common symptoms of pulmonary embolism?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the common symptoms mentioned in the context.
2. Compare the common symptoms listed in the context with those listed in the AI generated answer.
3. Determine if all the common symptoms from the context are included in the AI generated answer.

Explanation:
The context mentions that the symptoms of pulmonary embolism include dyspnea, pleuritic chest pain, cough, syncope or cardiorespiratory arrest. The AI generated answer lists dyspnea, pleuritic chest pain, cough, syncope as common symptoms. Therefore, all the common symptoms from the context are included in the AI generated answer.

Evaluation:
The metric is followed completely.

Rating:
Based on the evaluation criteria, I would rate the answer a 5. The metric was followed completely as the answer was derived only from the information presented in the context.

 Steps to evaluate the context as per the relevance metric:
1. Identify the main aspects of the question: In this case, the question 

In [86]:
user_input = "What medications are typically used for treating hypertension?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the key information in the context related to medications used for treating hypertension.
2. Determine if the AI generated answer includes only the information from the context.
3. Compare the answer with the context to ensure there is no additional or extraneous information.

The AI generated answer "An ACE inhibitor or an angiotensin II receptor blocker, as well as diuretics (thiazide-type, loop, and K-sparing) are commonly used medications for treating hypertension. In some cases, severe or refractory hypertension may require the use of three or four drugs." adheres to the metric as it is derived directly from the context. The answer includes all the medications mentioned in the context (ACE inhibitor or angiotensin II receptor blocker and diuretics) and also mentions that severe or refractory hypertension may require the use of three or four drugs, which is also stated in the context.

Rating:
5 - The metric is followed completely.

 Steps

In [87]:
user_input = "How is alopecia areata typically diagnosed and treated?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the information in the context related to the diagnosis and treatment of alopecia areata.
2. Determine if the AI generated answer is derived only from the information presented in the context.
3. Explain how the answer adheres to the metric considering the question and context as the input.

Explanation:
The AI generated answer states that alopecia areata is typically diagnosed through observation of the characteristic patchy hair loss on the scalp, beard, or any other hairy area. It also mentions various treatment options including topical, intralesional, or systemic corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA).

The context provides information about the diagnosis and treatment of alopecia areata. It states that hair loss due to alopecia areata is typically diagnosed through observation of patchy hair loss on the scalp, beard, 

In [88]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
ground,rel = generate_ground_relevance_response(user_input,max_tokens=350)

print(ground,end="\n\n")
print(rel)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


 Steps to evaluate the answer:
1. Identify the key information in the context related to managing sepsis in a critical care unit.
2. Compare the information identified in step 1 with the AI generated answer to determine if the answer is derived only from the context.

Explanation:
The answer includes the following steps for managing sepsis in a critical care unit: keeping the patient warm, controlling hemorrhage, checking and assisting with airway and ventilation, providing supplemental oxygen through a face mask, and inserting two large IV catheters into separate peripheral veins. If shock is severe or if ventilation is inadequate, airway intubation with mechanical ventilation may be necessary.

All of these steps are directly taken from the context. The context mentions keeping the patient warm, controlling hemorrhage, checking and assisting with airway and ventilation, providing supplemental oxygen through a face mask, and inserting IV catheters. The context also implies that mechan

## Actionable Insights and Business Recommendations

Drive Evidence-Based Practice Through Embedded Evaluation: Use score trends to alert admins when LLMs begin hallucinating or missing context, and retrain or adjust prompts accordingly.

Use Evaluation Scores as Trust Signals in Front-End Applications:

Optimize Knowledge Retrieval Using Score Analytics: Implement a retriever feedback loop that logs question+score pairs to inform re-embedding or re-chunking strategies.


Enable Real-Time Quality Monitoring in Clinical QA Pipelines:Integrate evaluation results with monitoring dashboards

<font size=6 color='blue'>Power Ahead</font>
___