## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation of llama-cpp-python with GPU (cuBLAS) support
# Run the following line when a GPU is available; comment it out otherwise
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

In [2]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 1.5.3 which is incompatible.
dask-cudf-cu12 25.2.2 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
peft 0.15.2 requires huggingface_hub>=0.25.0, but you have huggingface-hub 0.23.2 which is incompatible.
diffusers 0.34.0 requires huggingface-hub>=0.27.0, but you have huggingface-hub 0.23.2 which is incompatible.
xarray 2025.3.1 requires pandas>=2.1, but you have pandas 1.5.3 which is incompatible.
mizani 0.13.5 requires pandas>=2.2.0, but you have pandas 1.5.3 which is incompatible.
gradio 5.31.0 requires huggingface-hub>=0.28.1, but you have huggingface-hub 0.23.2 which is incompatible.
cudf-cu12 25.2.1 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
blosc2 3.5.0 requires numpy>=

In [3]:
# Libraries for processing dataframes and text
import json
import os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [4]:
# Hugging Face repository that hosts the GGUF builds of Mistral-7B-Instruct-v0.2
model_name_or_path = 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'

# File name of the quantized model in GGUF format (Q6_K is a 6-bit quantization)
model_basename = 'mistral-7b-instruct-v0.2.Q6_K.gguf'

In [5]:
# The GGUF format is used because it offers memory-efficient storage and fast inference
# while remaining compatible across different hardware platforms.

# Download the model file from the Hugging Face Hub and store its local path
model_path = hf_hub_download(
    repo_id=model_name_or_path,  # Hugging Face repository that contains the model
    filename=model_basename  # Model file to download (GGUF format)
)




In [6]:
# Load the Mistral model through llama-cpp-python with the context size, GPU layers, and batch size below
llm = Llama(
    model_path=model_path,  # Path to the GGUF model file downloaded above
    n_ctx=2300,  # Context window of 2300 tokens (how much text the model can "see" at once)
    n_gpu_layers=38,  # Offloads up to 38 model layers to the GPU for faster inference
    n_batch=512  # Maximum number of prompt tokens processed per batch
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
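The context window set by `n_ctx` is shared between the prompt and the generated tokens, so a long prompt leaves less room for the answer. The sketch below is a minimal budget check under that assumption. `fits_context` and `max_completion_tokens` are hypothetical helper names, not part of llama-cpp-python.

```python
# Hypothetical helpers (not part of llama-cpp-python): check whether a prompt
# plus the requested completion length fits inside the context window.
N_CTX = 2300  # same value as the n_ctx passed to Llama(...) above


def fits_context(prompt_tokens: int, max_tokens: int, n_ctx: int = N_CTX) -> bool:
    """Return True if prompt_tokens + max_tokens does not exceed n_ctx."""
    return prompt_tokens + max_tokens <= n_ctx


def max_completion_tokens(prompt_tokens: int, n_ctx: int = N_CTX) -> int:
    """Return the largest completion length that still fits in the window."""
    return max(0, n_ctx - prompt_tokens)


# Example with made-up token counts: a 40-token prompt and 512 new tokens fit,
# while a 900-token prompt and 1500 new tokens would not.
print(fits_context(40, 512))
print(fits_context(900, 1500))
print(max_completion_tokens(300))
```

With the loaded model, the prompt length could be measured as `len(llm.tokenize(query.encode("utf-8")))` before calling these helpers.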


#### Response

In [7]:
# Define a function that sends the query prompt to the LLM with the given sampling parameters
def response(query, max_tokens=512, temperature=0, top_p=0.95, top_k=50):

    model_output = llm(
        prompt=query,  # The question or prompt sent to the LLM as input text
        max_tokens=max_tokens,  # Maximum number of tokens to generate
        temperature=temperature,  # Controls the randomness of the generated response
        top_p=top_p,  # Nucleus sampling: keeps the smallest set of top tokens whose cumulative probability reaches top_p
        top_k=top_k  # Considers only the top_k most likely tokens at each generation step
    )
    # Extract and return only the text of the first choice (index 0) in the response
    return model_output['choices'][0]['text'].strip()

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [8]:
# Store the first query in a variable and invoke a response from the base model, without any RAG context yet
user_input = "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

'Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are general steps for managing sepsis in a critical care unit:\n\n1. Early recognition: Recognize the signs and symptoms of sepsis early and initiate treatment as soon as possible. Sepsis can present with various clinical features, including fever or hypothermia, tachycardia or bradycardia, altered mental status, respiratory distress, and lactic acidosis.\n2. Resuscitation: Provide adequate fluid resuscitation to maintain adequate tissue perfusion. The goal is to achieve a mean arterial pressure (MAP) of at least 65 mmHg and a central venous oxygen saturation (ScvO2) of greater than 70%.\n3. Antibiotics: Administer broad-spectrum antibiotics as soon as possible based on the suspected source of infection and local microbiology data.\n4. Source control: Identify and address the source of infection, if possible. This 

### Observation
The response provided by the language model is comprehensive and clinically sound, aligning well with best practices for managing sepsis in a critical care setting.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [9]:
# Store the second query in a variable and invoke a response from the base model, without any RAG context yet
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(user_input)

Llama.generate: prefix-match hit


"Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:\n\n1. Abdominal pain: The pain may start as a mild discomfort around the navel or in the lower right abdomen, which then gradually moves to the right lower quadrant and becomes more severe over time. The pain may be constant or intermittent and is often worsened by movement, coughing, or deep breathing.\n2. Loss of appetite: People with appendicitis may lose their appetite due to abdominal pain and discomfort.\n3. Nausea and vomiting: Vomiting is a common symptom of appendicitis, especially in the later stages of the condition.\n4. Fever: A fever of 100.4°F (38°C) or higher may be present in some cases of appendicitis.\n5. Constipation or diarrhea: Both constipation and diarrhea can occur with appendicitis, depending on the location and sev

### Observation
Like the response to the initial query, this answer is medically accurate, indicating that the model is functioning effectively.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [10]:
# Store the third query in a variable and invoke a response from the base model, without any RAG context yet
user_input = " What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response(user_input)

Llama.generate: prefix-match hit


"Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles. It can result in round or oval bald patches on the scalp, but it can also occur on other parts of the body such as the beard area, eyebrows, or eyelashes.\n\nThe exact cause of alopecia areata is not known, but it's believed to be related to a problem with the immune system. Some possible triggers for this condition include stress, genetics, viral infections, and certain medications.\n\nThere are several treatments that have been shown to be effective in addressing sudden patchy hair loss:\n\n1. Corticosteroids: These are anti-inflammatory drugs that can help reduce inflammation and suppress the immune system's attack on the hair follicles. They can be applied topically or taken orally, depending on the severity of the condition.\n2. Minoxidil: This is a medication that has been shown to promote hair growth in some people with alopecia areata. It works by increasing

### Observation
The model correctly identifies the condition as alopecia areata, an autoimmune condition causing sudden patchy hair loss that is often treated with corticosteroids, JAK inhibitors, and immune modulators, though no definitive cure exists.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [11]:
# Store the fourth query in a variable and invoke a response from the base model, without any RAG context yet
user_input = " What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(user_input)

Llama.generate: prefix-match hit


"There is no one-size-fits-all answer to this question as the specific treatment recommendations for a person with a brain injury depend on various factors such as the severity and location of the injury, the age and overall health of the individual, and the extent of the resulting impairments. However, I can provide an overview of some common treatments and interventions that may be recommended for individuals with brain injuries.\n\n1. Acute care: In the immediate aftermath of a brain injury, the focus is on providing acute care to stabilize the patient's condition and prevent further damage. This may include measures such as controlling bleeding, managing seizures, maintaining adequate oxygenation and ventilation, and addressing any other medical complications that arise.\n2. Rehabilitation: Once the acute phase has passed, rehabilitation becomes a key component of treatment for brain injury patients. Rehabilitation may involve a range of interventions aimed at helping the individua

### Observation
The answer is medically appropriate and provides a comprehensive overview of individualized brain injury treatments—including acute care, rehabilitation, medications, and surgery—aligned with the query's intent.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [12]:
# Store the fifth query in a variable and invoke a response from the base model, without any RAG context yet
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(user_input)

Llama.generate: prefix-match hit


"First and foremost, if you suspect that someone has fractured their leg while hiking, it's essential to ensure their safety and prevent further injury. Here are some necessary precautions:\n\n1. Keep the person calm and still: Encourage them to remain as still as possible to minimize pain and prevent worsening the injury.\n2. Assess the situation: Check for any signs of shock, such as pale skin, rapid heartbeat, or shallow breathing. If you notice these symptoms, seek medical help immediately.\n3. Immobilize the leg: Use a splint, sling, or other available materials to immobilize the leg and prevent movement. Be sure not to apply too much pressure on the injury site.\n4. Provide pain relief: Offer over-the-counter pain medication, such as acetaminophen or ibuprofen, to help manage pain.\n5. Seek medical attention: If the fracture is severe or if you suspect that there may be other injuries, seek medical help as soon as possible.\n\nOnce you've ensured the person's safety and stability

### Observation
This answer is comprehensive and medically appropriate, clearly outlining the essential precautions and treatment steps—such as calming the person, immobilizing the leg, managing pain, seeking prompt medical help, and ensuring proper follow-up care—for someone who fractures their leg while hiking.

### Rationale for using Prompt Engineering
### Strengths
- **Comprehensive Responses**: Delivers medically appropriate and detailed answers across a wide range of topics.
- **Structured Format**: Uses clear, numbered lists that enhance readability and organization.
- **Contextual Relevance**: Maintains alignment with the query, addressing key aspects effectively.

### Limitations
- **Incomplete Outputs**: Responses are frequently truncated mid-sentence, likely due to token limits, leading to incomplete information.
- **Generic Content**: Some answers lack specificity, offering broad guidelines rather than tailored advice.
- **Limited Depth**: Tends to provide high-level overviews without diving into nuanced, scenario-specific details.

To overcome these limitations, prompt engineering will be employed to refine queries, guide the model toward deeper contextual understanding, and ensure complete, high-quality responses.

## Question Answering using LLM with Prompt Engineering

In [13]:
# Variable name to hold the system prompt
system_prompt = "Answer the following question correctly and concisely"
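The cells in this section prepend the system prompt to the question with a plain newline. The Mistral-7B-Instruct model card describes wrapping instructions in `[INST]` and `[/INST]` tags, so one variation worth trying is sketched below. `build_instruct_prompt` is a hypothetical helper that this notebook does not use. All responses shown here were produced with plain concatenation, so output from a wrapped prompt may differ.

```python
# Hypothetical helper (not used elsewhere in this notebook): wrap the system
# instruction and the question in the [INST] ... [/INST] tags described on the
# Mistral-7B-Instruct model card.
def build_instruct_prompt(system_prompt: str, question: str) -> str:
    """Return a single prompt string in the instruct format."""
    return f"[INST] {system_prompt.strip()}\n{question.strip()} [/INST]"


example = build_instruct_prompt(
    "Answer the following question correctly and concisely",
    "What is the protocol for managing sepsis in a critical care unit?",
)
print(example)
```

The resulting string can be passed as `query` to the existing response functions in place of the concatenated prompt.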

In [14]:
# Define a new function, response_mod, that sends the query to the LLM with sampling parameters supplied explicitly by the caller (no defaults)
def response_mod(query, max_tokens, temperature, top_p, top_k):

    model_output = llm(
        prompt=query,  # The question or prompt sent to the LLM as input text
        max_tokens=max_tokens,  # Maximum number of tokens to generate
        temperature=temperature,  # Controls the randomness of the generated response
        top_p=top_p,  # Nucleus sampling: keeps the smallest set of top tokens whose cumulative probability reaches top_p
        top_k=top_k  # Considers only the top_k most likely tokens at each generation step
    )
    # Extract and return only the text of the first choice (index 0) in the response
    return model_output['choices'][0]['text'].strip()
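The comparisons that follow change one parameter at a time against a fixed baseline. A small harness can make that pattern explicit. The sketch below assumes a generation function with the same keyword parameters as `response_mod`. `run_variants` and `BASELINE` are hypothetical names introduced only for illustration, and the stand-in generator exists so the example runs without a model.

```python
from typing import Callable, Dict, List

# Baseline sampling parameters, matching the defaults of response(...)
BASELINE = {"max_tokens": 512, "temperature": 0, "top_p": 0.95, "top_k": 50}


def run_variants(generate: Callable[..., str], query: str,
                 overrides: List[Dict[str, float]]) -> Dict[str, str]:
    """Run the baseline plus one generation per override dict.

    Each override replaces only the listed parameters; every other
    parameter keeps its baseline value.
    """
    results = {"baseline": generate(query, **BASELINE)}
    for override in overrides:
        params = {**BASELINE, **override}
        label = ", ".join(f"{k}={v}" for k, v in override.items())
        results[label] = generate(query, **params)
    return results


# Stand-in generator that echoes its parameters instead of calling a model
def fake_generate(query, max_tokens, temperature, top_p, top_k):
    return f"{max_tokens}/{temperature}/{top_p}/{top_k}"


demo = run_variants(fake_generate, "q", [{"temperature": 0.95}, {"top_k": 250}])
for label, text in demo.items():
    print(label, "->", text)
```

In the notebook, `response_mod` could be passed as `generate` in place of the stand-in.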

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [15]:
# Answer 1: generated with the basic function, using the default LLM parameters
user_input = system_prompt+"\n"+ "What is the protocol for managing sepsis in a critical care unit?"
response(user_input)

Llama.generate: prefix-match hit


'The management of sepsis in a critical care unit involves early recognition, prompt initiation of antibiotics, fluid resuscitation, and supportive measures. The Surviving Sepsis Campaign guidelines recommend administering broad-spectrum antibiotics within 1 hour of recognition and achieving a mean arterial pressure (MAP) of 65 mmHg or greater and a central venous oxygen saturation (ScvO2) of 70% or higher. Adjustments to fluid resuscitation, vasopressors, and inotropes should be made based on hemodynamic response. Corticosteroids may be considered for patients with persistent hypotension despite adequate fluid resuscitation and vasopressor use. Non-invasive and invasive mechanical ventilation may be required for respiratory support. Close monitoring of organ function, electrolytes, and coagulation status is essential, along with source control if the infection is identified. Early goal-directed therapy (EGDT) has been shown to improve outcomes in sepsis patients, but current guideline

In [16]:
# Answer 2: generated with the modified function, with temperature set to 0.95 while the other parameters stay the same
response_mod(user_input,512,0.95,0.95,50)

Llama.generate: prefix-match hit


'The management of sepsis in a critical care unit involves early recognition, prompt initiation of antibiotics, fluid resuscitation, and supportive measures. The Sequential [Sepsis-related] Organ Failure Assessment (SOFA) score can be used to identify patients at risk for developing sepsis or worsening of sepsis. Fluid resuscitation is typically initiated with isotonic crystalloids to maintain mean arterial pressure ≥65 mmHg and urine output ≥0.5 mL/kg/hour. Antibiotics should be administered based on culture results if available, or empirically if not. Vasopressors may be necessary to maintain blood pressure if fluid resuscitation is insufficient. Corticosteroids and other adjunctive therapies may be considered in select patients based on guidelines. Close monitoring of organ function with regular reassessment of the SOFA score is essential, along with ongoing evaluation and adjustment of treatment plans as needed.'

### Observation on Query 1:

- Answer 1, generated with temperature = 0, is more deterministic and reads as more comprehensive and nuanced. It is protocol-driven, aligning with the Surviving Sepsis Campaign and emphasizing early intervention, hemodynamic targets, and organ support.
- Answer 2, generated with temperature = 0.95, is more assessment-focused, using the SOFA score for dynamic evaluation and tailoring treatment accordingly. The high temperature (close to 1) introduces randomness, and the response is somewhat vaguer.

Overall, Answer 1 offers more comprehensive and informative clinical guidance. Answer 2 is useful as a framework for ongoing assessment but is less detailed.
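To illustrate why temperature changes the output, the sketch below applies temperature scaling to a made-up set of next-token scores. It is a toy model of the idea, not the sampling code inside llama-cpp-python, and `softmax_with_temperature` is a hypothetical helper name. A temperature of 0 is treated as the greedy limit, where all probability goes to the top-scoring token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        # Greedy limit: all probability mass on the highest-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
greedy = softmax_with_temperature(toy_logits, 0)
warm = softmax_with_temperature(toy_logits, 0.95)
hot = softmax_with_temperature(toy_logits, 2.0)
print(greedy)
print([round(p, 3) for p in warm])
print([round(p, 3) for p in hot])
```

As the temperature rises, the distribution flattens and lower-ranked tokens are sampled more often, which is consistent with the more varied wording seen in Answer 2.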

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [17]:
# Answer 1: generated with the basic function, using the default LLM parameters
user_input = system_prompt+"\n"+ "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
response(user_input)

Llama.generate: prefix-match hit


'Appendicitis is characterized by the following symptoms:\n1. Sudden pain in the lower right abdomen that may begin as mild and gradually worsens over hours.\n2. Loss of appetite and feeling sick to your stomach (nausea).\n3. Fever, which may start after the abdominal pain begins.\n4. Vomiting.\n5. Constipation or diarrhea.\n6. Abdominal swelling and rigidity.\n7. Inability to pass gas or have a bowel movement.\n\nAppendicitis cannot be cured via medicine alone, as the appendix must be removed to prevent rupture and potential complications such as peritonitis. The standard surgical procedure for treating appendicitis is an appendectomy, which involves removing the inflamed appendix through a small incision in the abdomen or using laparoscopic surgery with several small incisions.'

In [26]:
# Answer 2: generated with the modified function, with max_tokens set to 50 instead of the original 512 while the other parameters stay the same
response_mod(user_input,50,0,0.95,50)

Llama.generate: prefix-match hit


'Appendicitis is characterized by the following symptoms:\n1. Sudden pain in the lower right abdomen that may begin as mild and gradually worsens over hours.\n2. Loss of appetite and feeling sick to your stomach'

### Observation on Query 2:

- Answer 1, generated with max_tokens = 512, is more detailed. At 173 words, the output fits within the limit and is shown in full.
- Answer 2, generated with max_tokens = 50, is cut off because max_tokens caps the number of tokens the model may generate, so the response is incomplete.

Overall, Answer 1 is more informative and detailed on the symptoms, whereas Answer 2 is truncated by the smaller max_tokens value.
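Truncation like this can also be detected programmatically rather than by reading the text. The sketch below assumes the dictionary returned by `llm(...)` follows the OpenAI-style completion layout, in which each choice carries a `finish_reason` of `"length"` when generation stopped at the `max_tokens` limit. `completion_text_and_status` is a hypothetical helper, and the sample dictionaries are hand-written stand-ins rather than real model output.

```python
def completion_text_and_status(model_output: dict):
    """Return (text, truncated) for the first choice of a completion dict.

    truncated is True when the model stopped because it hit the token limit.
    """
    choice = model_output["choices"][0]
    text = choice["text"].strip()
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


# Hand-written stand-ins for completion dictionaries
cut_off = {"choices": [{"text": " Appendicitis is characterized by", "finish_reason": "length"}]}
complete = {"choices": [{"text": " An appendectomy.", "finish_reason": "stop"}]}

print(completion_text_and_status(cut_off))
print(completion_text_and_status(complete))
```

A caller could use the flag to retry with a larger `max_tokens` or to mark the answer as incomplete.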


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [27]:
# Answer 1: generated with the basic function, using the default LLM parameters
user_input = system_prompt+"\n"+ " What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
response(user_input)

Llama.generate: prefix-match hit


"Sudden patchy hair loss, also known as alopecia areata, is an autoimmune condition that causes hair loss in small, round patches on the scalp. Effective treatments for addressing this condition include:\n\n1. Corticosteroids: Topical or injected corticosteroids can help reduce inflammation and promote hair regrowth.\n2. Immunotherapy: Injections of immunotherapies like diphencyprone or squaric acid dibutyl ester can stimulate an immune response that helps the body to stop attacking its own hair follicles.\n3. Minoxidil: Topical minoxidil, a medication commonly used for androgenetic alopecia, may help promote hair regrowth in some cases of alopecia areata.\n4. Hair transplantation: In severe cases where there is extensive hair loss, hair transplantation may be an option to restore hair.\n\nPossible causes behind sudden patchy hair loss include genetics, stress, viral infections, and autoimmune disorders. In some cases, the exact cause may not be known. It's important to consult a healt

In [28]:
# Answer 2: generated with the modified function, with top_k set to 250 instead of the original 50 while the other parameters stay the same
response_mod(user_input,512,0,0.95,250)

Llama.generate: prefix-match hit


"Sudden patchy hair loss, also known as alopecia areata, is an autoimmune condition that causes hair loss in small, round patches on the scalp. Effective treatments for addressing this condition include:\n\n1. Corticosteroids: Topical or injected corticosteroids can help reduce inflammation and promote hair regrowth.\n2. Immunotherapy: Injections of immunotherapies like diphencyprone or squaric acid dibutyl ester can stimulate an immune response that helps the body to stop attacking its own hair follicles.\n3. Minoxidil: Topical minoxidil, a medication commonly used for androgenetic alopecia, may help promote hair regrowth in some cases of alopecia areata.\n4. Hair transplantation: In severe cases where there is extensive hair loss, hair transplantation may be an option to restore hair.\n\nPossible causes behind sudden patchy hair loss include:\n\n1. Autoimmune disorders: Alopecia areata is an autoimmune condition that causes the body's immune system to attack its own hair follicles.\n

### Observation on Query 3:

- Answer 1, generated with top_k = 50 (the parameter that limits how many of the most likely next tokens are considered at each generation step), gives a concise, generalized list of causes.
- Answer 2, generated with top_k = 250, has a larger candidate pool available, and its response adds a detailed point-by-point list of probable causes.

Overall, Answer 2 provides more context on the causes alongside the treatment options, while Answer 1 gives limited context on the causes of sudden patchy hair loss.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [31]:
system_prompt2 = "Answer the following question in detail"
# Answer 1: generated with the basic function, using the default LLM parameters
user_input = system_prompt2+"\n"+ "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
response(user_input)

Llama.generate: prefix-match hit


'A person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), can experience a wide range of symptoms depending on the severity and location of the injury. Treatment for TBIs is aimed at addressing the specific symptoms and promoting recovery. Here are some common treatments recommended for individuals with TBIs:\n\n1. Emergency care: The first priority in treating a TBI is to ensure the person receives appropriate emergency care. This may include measures such as controlling bleeding, preventing further injury, and monitoring vital signs. In severe cases, surgery may be necessary to remove hematomas or repair skull fractures.\n2. Medications: Depending on the symptoms present, various medications may be prescribed to manage TBI-related conditions. For example, diuretics may be used to reduce swelling in the brain, while anticonvulsants may be given to prevent seizures. Pain relievers and sedatives may also be used to help manage pain and 

In [32]:
# Answer 2: generated with the modified function, with top_p set to 0.05 instead of the original 0.95 while the other parameters stay the same
response_mod(user_input,512,0,0.05,50)

Llama.generate: prefix-match hit


"A person who has sustained a physical injury to brain tissue, also known as a traumatic brain injury (TBI), can experience a wide range of symptoms depending on the severity and location of the injury. Treatment for TBIs aims to address both the acute symptoms and the long-term consequences of the injury. Here are some common treatments recommended for individuals with TBIs:\n\n1. Emergency care: The first priority in treating a TBI is to ensure the person's airway is clear, they are breathing properly, and their circulation is stable. This may involve administering oxygen, providing fluids intravenously, or performing surgery to relieve pressure on the brain.\n2. Medications: Depending on the symptoms of the TBI, various medications may be prescribed to manage conditions such as swelling, seizures, or infections. For example, corticosteroids may be used to reduce inflammation, while anticonvulsants may be given to prevent seizures.\n3. Rehabilitation: Rehabilitation is a crucial comp

### Observation on Query 4:

Answer 1, generated with top_p = 0.95, lets the model consider a wider range of possible next tokens, leading to slightly more diverse and expansive content. The answer is informative but a bit general in structure.

Answer 2, generated with a narrower top_p = 0.05, restricts the model to the most probable tokens and yields a more focused response (top_p controls the diversity of the generated response by setting a cumulative probability threshold for token selection).

Overall, Answer 2 is better because the lower top_p value helps the model generate more focused and relevant content, minimizing unnecessary variation.
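The effect of top_k and top_p on the candidate pool can be sketched on a toy distribution. The code below is an illustration of the two filters under simplified assumptions, not the implementation used by llama-cpp-python. `top_k_top_p_filter` is a hypothetical helper and the probabilities are made up.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return {token_index: renormalized probability} after both filters.

    top_k keeps only the k most probable tokens; top_p then keeps the
    smallest prefix of those whose cumulative probability reaches top_p.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
wide = top_k_top_p_filter(toy_probs, top_k=50, top_p=0.9)
narrow = top_k_top_p_filter(toy_probs, top_k=50, top_p=0.05)
two_only = top_k_top_p_filter(toy_probs, top_k=2, top_p=1.0)
print(sorted(wide))      # candidate indices kept with a wide nucleus
print(sorted(narrow))    # candidate indices kept with a narrow nucleus
print(sorted(two_only))  # candidate indices kept with top_k = 2
```

In this toy model the most probable token survives every setting, so a purely greedy picker (the temperature = 0 limit) would choose the same token regardless of top_k or top_p. The filters matter most when the temperature is above 0 and tokens are actually sampled from the remaining pool.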

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [33]:
system_prompt2 = "Answer the following question in detail"
# Answer 1: generated with the basic function, without modifying any of the LLM's parameters
user_input = system_prompt2+"\n"+ "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
response(user_input)

Llama.generate: prefix-match hit


"A fractured leg is a serious injury that requires immediate attention. Here are some necessary precautions and treatment steps for a person who has sustained a leg fracture while hiking:\n1. Assess the situation: If you or someone in your group has sustained a leg fracture, the first step is to assess the situation. Check if there are any other injuries, such as head trauma or bleeding. If the fracture is open (compound), try to prevent contamination by covering it with a clean cloth.\n2. Immobilize the leg: It's essential to immobilize the leg to prevent further damage and promote healing. Use a splint, a makeshift sling, or any available materials to keep the leg stable and in place. Be careful not to apply too much pressure on the fracture site.\n3. Call for help: If possible, call for emergency medical assistance. If you're in a remote area without cell phone reception, try to signal for help using a mirror, whistle, or other means.\n4. Provide first aid: While waiting for medical

In [34]:
# Answer 2 generated, with the modified function defined with all parameters being adjusted for better response
response_mod(user_input,1500,0,0.05,250)

Llama.generate: prefix-match hit


"A fractured leg is a serious injury that requires immediate attention. Here are some necessary precautions and treatment steps for a person who has sustained a leg fracture while hiking:\n1. Assess the situation: If you suspect someone has a leg fracture, try to ensure the safety of both the injured person and yourself. Check if the area is safe from further injury or danger, such as unstable terrain or inclement weather.\n2. Call for help: If possible, call for emergency medical assistance. If there is no cell phone reception, use a satellite phone or hike out to find reception.\n3. Immobilize the leg: Use a splint, a makeshift splint, or a tourniquet (if necessary) to immobilize the fractured leg. Be careful not to apply too much pressure on the tourniquet, as it can cause damage to the limb.\n4. Control bleeding: Apply direct pressure to any visible wounds with a clean cloth to control bleeding.\n5. Provide warmth and shelter: Keep the injured person warm and sheltered to prevent h

### Observation on Query 5:

Answer 1, generated with default parameters, offers a detailed overview of immediate actions for managing a fractured leg during a hiking trip, emphasizing severity assessment, infection prevention, and the importance of seeking medical attention for complications.

Answer 2 was produced with adjusted parameters: max_tokens increased from 512 to 1500, temperature left at 0, top_p lowered from 0.95 to 0.05, and top_k raised from the default of 50 to 250. It is more structured and comprehensive, covering both immediate care and long-term recovery, including nutrition, rest, and physical therapy. It also provides practical guidance for laypersons, such as wound cleaning and steps for pain relief.

Conclusion:
Answer 2 is more holistic, addressing both acute management and follow-up care, making it a more complete and user-friendly response.

## Data Preparation for RAG

### Loading the Data

In [35]:
# Variable to hold the context providing pdf file for domain knowledge
manual_pdf_path = "/content/medical_diagnosis_manual.pdf"

In [39]:
# Initialize the PDF loader with the path to the medical manual PDF file
pdf_loader = PyMuPDFLoader(manual_pdf_path)
# Load the PDF into the variable manual: a list of document objects, one per page
manual = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [40]:
# View the first 5 pages of the pdf file
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(manual[i].page_content,end="\n")

Page Number : 1
reachmaheswaran@gmail.com
KTCVI3GYNA
t for personal use by reachmaheswaran@
shing the contents in part or full is liable 

Page Number : 2
reachmaheswaran@gmail.com
KTCVI3GYNA
This file is meant for personal use by reachmaheswaran@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.

Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ..............................................................................................................................................................................

#### Checking the number of pages

In [41]:
# len() gives the number of pages in the PDF document (one document object per page)
len(manual)

4114

### Data Chunking

In [42]:
# Initializing a RecursiveCharacterTextSplitter to split the text into manageable chunks for embedding and retrieval in the RAG process
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',# specifies the tokenizer encoding
    chunk_size=512, # chunk size is defined to 512 tokens long
    chunk_overlap=20 # with an overlap of 20 tokens between the chunks
)

In [43]:
# Use the text splitter to divide the loaded PDF content into smaller, overlapping chunks
# This step is essential for preparing the text for embedding and retrieval in RAG workflows
# Each chunk maintains contextual continuity due to the defined overlap
document_chunks = pdf_loader.load_and_split(text_splitter)

In [44]:
#Checking the number of text chunks the pdf has been split into
len(document_chunks)

8469
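
As a rough illustration of how chunk_size and chunk_overlap interact, the following is a simplified sliding-window sketch over a list of tokens. It is an assumption-laden toy: `chunk_tokens` is a hypothetical helper, and the actual RecursiveCharacterTextSplitter additionally prefers to split on separators such as paragraphs and newlines, so its chunk boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

# Toy "tokens" standing in for the tokenized PDF text
toy_tokens = list(range(10))
for chunk in chunk_tokens(toy_tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last token of one chunk reappears as the first token of the next.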

In [45]:
# Inspecting individual chunks to confirm that consecutive chunks overlap
document_chunks[0].page_content

'reachmaheswaran@gmail.com\nKTCVI3GYNA\nt for personal use by reachmaheswaran@\nshing the contents in part or full is liable'

In [47]:
document_chunks[120].page_content

'the tube with a syringe or infused by gravity from an elevated bag. After feedings, the tube is flushed with\nwater to prevent clogging.\nNasogastric or nasoduodenal tube feeding often causes diarrhea initially; thus, feedings are usually\nstarted with small amounts of dilute preparations and increased as tolerated. Most formulas contain 0.5,\n1, or 2 kcal/mL. Formulas with higher caloric concentration (less water per calorie) may cause decreased\ngastric emptying and thus higher gastric residuals than when more dilute formulas with the same number\nof calories are used. Initially, a 1-kcal/mL commercially prepared solution may be given undiluted at 50\nmL/h or, if patients have not been fed for a while, at 25 mL/h. Usually, these solutions do not supply\nenough water, particularly if vomiting, diarrhea, sweating, or fever has increased water loss. Extra water\nis supplied as boluses via the feeding tube or IV. After a few days, the rate or concentration can be\nincreased as needed to

In [48]:
document_chunks[121].page_content

"more expensive.\nIndications: TPN may be the only feasible option for patients who do not have a functioning GI tract or\nwho have disorders requiring complete bowel rest, such as the following:\n• Some stages of Crohn's disease or ulcerative colitis\n• Bowel obstruction\n• Certain pediatric GI disorders (eg, congenital GI anomalies, prolonged diarrhea regardless of its cause)\n• Short bowel syndrome due to surgery\nNutritional content: TPN requires water (30 to 40 mL/kg/day), energy (30 to 60 kcal/kg/day, depending\non energy expenditure), amino acids (1 to 2.0 g/kg/day, depending on the degree of catabolism), essential\nfatty acids, vitamins, and minerals (see\nTable 3-3). Children who need TPN may have different fluid requirements and need more energy (up to\n120 kcal/kg/day) and amino acids (up to 2.5 or 3.5 g/kg/day).\nBasic TPN solutions are prepared using sterile techniques, usually in liter batches according to standard\nformulas. Normally, 2 L/day of the standard solution is 

### Observation

The chunks at index 120 and index 121 share the overlapping text 'more expensive', which confirms that the configured chunk overlap is being applied.
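
Because the displayed chunk text is cut off, the overlap can also be checked programmatically. The sketch below uses a hypothetical helper, `shared_overlap`, and toy strings; in the notebook it could be called with `document_chunks[120].page_content` and `document_chunks[121].page_content`, assuming the splitter leaves the overlapping text byte-for-byte identical in both chunks.

```python
def shared_overlap(previous_chunk, next_chunk):
    """Return the longest suffix of previous_chunk that is also a prefix of next_chunk."""
    max_len = min(len(previous_chunk), len(next_chunk))
    # Try the longest candidate first and shrink until a match is found
    for size in range(max_len, 0, -1):
        if previous_chunk.endswith(next_chunk[:size]):
            return next_chunk[:size]
    return ""

# Toy strings standing in for two consecutive chunks
chunk_a = "alpha beta gamma delta"
chunk_b = "gamma delta epsilon zeta"
print(repr(shared_overlap(chunk_a, chunk_b)))
```

An empty result would indicate that the two chunks do not share any boundary text, for example when they come from different pages.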

### Embedding

In [49]:
# This model is chosen because its maximum input length (512 tokens) matches the chunk size used above; it produces 1024-dimensional sentence embeddings
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')


In [50]:
# Embed the chunks at index 0 and 1 and store the vectors in two variables,
# as a quick sanity check of the embedding model loaded above
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [51]:
# Checking the embedding size and if both the chunk 0 and chunk 1 are of the same size
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  1024


True

In [52]:
# displaying the sentence embeddings
embedding_1,embedding_2

([-0.0232709813863039,
  -0.002451223786920309,
  0.00830541830509901,
  -0.004883055575191975,
  0.0027736497577279806,
  -0.015151701867580414,
  0.0024263218510895967,
  0.03761095181107521,
  0.015574483200907707,
  0.015287090092897415,
  0.027759935706853867,
  0.008457034826278687,
  0.002858733292669058,
  -0.04373984411358833,
  -0.017529083415865898,
  0.000842845649458468,
  -0.006905822083353996,
  -0.02549283765256405,
  0.000899837352335453,
  0.00197278568521142,
  0.022625520825386047,
  0.01091026235371828,
  -0.08687840402126312,
  -0.015162646770477295,
  -0.002073376439511776,
  0.004172846209257841,
  0.014404158107936382,
  -0.005979826208204031,
  0.04991646856069565,
  0.0465327613055706,
  -0.016613474115729332,
  -0.005854798946529627,
  0.058072444051504135,
  -0.028446951881051064,
  -0.015030501410365105,
  -0.006539573892951012,
  0.04419444128870964,
  -0.020803246647119522,
  -0.02144070342183113,
  -0.03461378812789917,
  0.015670940279960632,
  0.00911

* The embedding model produces a fixed-length vector (1024 dimensions here) for every chunk, regardless of the chunk's length.  
* This is necessary because vectors must have the same dimension to be compared for similarity.
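
To illustrate why equal dimensions matter, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. The toy vectors are made up for illustration; in the notebook, `embedding_1` and `embedding_2` could be passed instead.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two vectors of the same dimension."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real ones have 1024 dimensions)
toy_1 = [1.0, 0.0, 1.0]
toy_2 = [1.0, 0.0, 0.0]
toy_3 = [0.0, 1.0, 0.0]
print(cosine_similarity(toy_1, toy_2))  # partially aligned vectors
print(cosine_similarity(toy_2, toy_3))  # orthogonal vectors
```

Values close to 1 indicate vectors pointing in nearly the same direction, while values near 0 indicate unrelated directions.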

### Vector Database

In [55]:
# Directory in which the vector database is persisted; the stored chunk embeddings are later compared against the user query
out_dir = 'medical_db'

# Create the directory if it does not already exist
if not os.path.exists(out_dir):
    os.makedirs(out_dir)

In [56]:
vectorstore = Chroma.from_documents( # Building a Chroma vector store from the document chunks
    document_chunks, # list of document chunks to be converted into embeddings
    embedding_model, # model that embeds each chunk into a 1024-dimensional vector
    persist_directory=out_dir # directory on disk where the Chroma database is persisted
)



In [57]:
# Alternatively, once the store has been persisted as above, it can be loaded from disk without re-embedding the documents
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)



In [58]:
#Accessing the embedding function used in the Chroma vector store as shown below
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False)

In [60]:
# Performing a similarity search in the vector store to find the 3 chunks most similar to an input text, here "Clinical Decision Making"
vectorstore.similarity_search("Clinical Decision Making",k=3)

[Document(page_content='Chapter 342. Clinical Decision Making\nIntroduction\nClinicians must integrate a huge variety of clinical data while facing conflicting pressures to decrease\ndiagnostic uncertainty, risks to patients, and costs. Deciding what information to gather, which tests to\norder, how to interpret and integrate this information to draw diagnostic conclusions, and which\ntreatments to give is known as clinical decision making.\nWhen presented with a patient, clinicians usually must answer the following questions:\n• What disease does this patient have?\n• Should this patient be treated?\n• Should testing be done?\nIn straightforward or common situations, clinicians often make such decisions informally; diagnoses are\nmade by recognizing disease patterns, and testing and treatment are initiated based on customary\npractice. For example, during a flu epidemic, a healthy adult who has had fever, aches, and harsh cough\nfor 2 days is likely to be recognized as another case of

- The retrieved chunks contain the key phrase 'Clinical Decision Making', which confirms that the similarity search returns content relevant to the input text.

### Retriever

In [61]:
retriever = vectorstore.as_retriever( # Wrapping the Chroma vector store in a retriever object for querying
    search_type='similarity', # Plain similarity search (as opposed to, e.g., 'mmr'); the distance metric itself is the one the vector store is configured with
    search_kwargs={'k': 3} # Retrieve the top 3 most similar chunks for a given query; these become the context for RAG
)

In [64]:
# Creating a new variable as user query, which will be used for retrieval
user_query = 'What are the symptoms of Altitude sickness?'
rel_docs = retriever.get_relevant_documents(user_query)
rel_docs

 Document(page_content="Chapter 334. Altitude Sickness\nAltitude sickness (AS) includes several related syndromes caused by decreased O2 availability\nat high altitudes. Acute mountain sickness (AMS), the mildest form, is headache plus one or\nmore systemic manifestations. High-altitude cerebral edema (HACE) is encephalopathy in\npeople with AMS. High-altitude pulmonary edema (HAPE) is a form of noncardiogenic\npulmonary edema causing severe dyspnea and hypoxemia. AMS may occur in recreational\nhikers and skiers in mountains. Diagnosis is clinical. Treatment of mild AMS is with analgesics\nand acetazolamide. Severe syndromes require descent and supplemental O2 if available. In\naddition, dexamethasone may be useful for HACE, and nifedipine may be useful for HAPE.\nAs altitude increases, atmospheric pressure decreases while the percentage of O2 in air remains\nconstant; thus, the partial pressure of O2 decreases with altitude and, at 5800 m (19,000 ft), is about one\nhalf that at sea le

- We can observe that the three retrieved chunks contain the answer to the query.  
- If we increase the **`k`** value, more chunks are retrieved, so the answer may be supported by additional context, at the cost of a longer prompt.  
- `k` is therefore a hyperparameter that we need to tune to get the best context.
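
The role of `k` can be sketched with a toy top-k ranking. This is an illustrative simplification under stated assumptions: `top_k_chunks` is a hypothetical helper, the vectors are made-up unit vectors, and the real retriever delegates scoring and ranking to the Chroma index.

```python
def top_k_chunks(query_vec, chunk_vecs, k):
    """Rank chunks by dot product with the query (equal to cosine
    similarity for unit-length vectors) and return the best k indices."""
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy unit-length "embeddings" for four chunks and one query
toy_chunks = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
toy_query = [1.0, 0.0]
for k in (1, 3):
    print(k, top_k_chunks(toy_query, toy_chunks, k))
```

A larger `k` keeps lower-ranked chunks as well, which is why tuning it trades extra context against prompt length and noise.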

In [65]:
# Querying the LLM directly with the same user_query, without any retrieved context from the PDF, to serve as a baseline
model_output = llm(
    user_query, # same input query as above
    max_tokens=512, # maximum number of tokens to generate
    temperature=0.5, # sampling temperature; higher values make the output more random
)

Llama.generate: prefix-match hit


In [None]:
# Extracting the text of the first choice in model_output; this answer was generated without any retrieved context
model_output['choices'][0]['text']

### System and User Prompt Template

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
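    # Retrieve the k most similar document chunks. The vector store is queried directly because,
    # depending on the LangChain version, a k passed to get_relevant_documents may be ignored
    # in favor of the retriever's search_kwargs.
    relevant_document_chunks = vectorstore.similarity_search(user_input, k=k)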
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the generated text from the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
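
The function above relies on `qna_system_message` and `qna_user_message_template`, which this notebook leaves undefined. The sketch below shows how the prompt is assembled, using placeholder template strings that are purely hypothetical; only the `{context}` and `{question}` placeholders are taken from the function itself.

```python
# Hypothetical templates: placeholders for illustration only,
# not the templates intended by the notebook.
qna_system_message = "Answer the question using only the provided context."
qna_user_message_template = "###Context\n{context}\n\n###Question\n{question}"

def build_prompt(context_list, question):
    """Assemble the prompt the same way generate_rag_response does."""
    # Join the retrieved chunks into one context string
    context_for_query = ". ".join(context_list)
    # Fill in the two placeholders of the user message template
    user_message = qna_user_message_template.replace("{context}", context_for_query)
    user_message = user_message.replace("{question}", question)
    return qna_system_message + "\n" + user_message

prompt = build_prompt(["chunk one", "chunk two"], "What is X?")
print(prompt)
```

Printing the assembled prompt before sending it to the LLM is a cheap way to confirm that the retrieved context and the question land in the expected places.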

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

### Fine-tuning

## Output Evaluation

In [None]:
groundedness_rater_system_message  = ""

In [None]:
relevance_rater_system_message = ""

In [None]:
user_message_template = ""

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve the k most similar document chunks, honoring the k argument
    relevant_document_chunks = vectorstore.similarity_search(user_input, k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine the system message and the user message (context + question) into the answer prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Build the groundedness-rating prompt from the rater system message, the context, the question, and the generated answer
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Build the relevance-rating prompt from the rater system message, the context, the question, and the generated answer
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

## Actionable Insights and Business Recommendations
