<a href="https://colab.research.google.com/github/RashmiJK/PGP-AIML-MedicalAssistant-NLP/blob/main/medical_assistant_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## 1 - Installing and Importing Necessary Libraries and Dependencies

**Set Google Colab to use the T4 GPU**

Install `llama-cpp-python` with GPU acceleration. The wheel build must succeed; other errors in the output can be ignored. Restart the runtime afterwards.

- `llama-cpp-python` is a Python wrapper for llama.cpp, a universal LLM inference library that runs models efficiently using the GGUF file format.

- GGUF (GGML Universal File) is a binary format that stores model weights and metadata in a single file. GGUF models are typically distributed with quantized weights; quantization lowers numeric precision, which reduces memory usage and speeds up inference.

- Model Compatibility: Supports any GGUF-converted model including Llama, Mistral, CodeLlama, Gemma, and Qwen.

- `Llama()` class: Main interface for loading and running models

- `hf_hub_download()`: A function from the Hugging Face Hub library to download specific files from Hugging Face repositories with automatic caching
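To make the effect of quantization concrete, the sketch below estimates file size from a parameter count and an average number of bits per weight. The function name `estimate_model_size_gb` and both example figures (roughly 7.24 billion parameters, about 6.56 bits per weight for a 6-bit quantization) are illustrative assumptions, not values read from a model file. Real GGUF files mix tensor types and carry metadata, so treat the result as a rough estimate only.

```python
def estimate_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size in GB (10^9 bytes) for weights stored at a given average precision."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed values, for illustration only.
n_params = 7.24e9
fp16_gb = estimate_model_size_gb(n_params, 16)      # half-precision baseline
q6_gb = estimate_model_size_gb(n_params, 6.5625)    # approximate 6-bit quantization

print(f"FP16: {fp16_gb:.2f} GB, 6-bit: {q6_gb:.2f} GB")
```

Under these assumptions the quantized weights take well under half the space of the half-precision ones, which is what makes a 7B model practical on a single T4 GPU.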

In [None]:
# Installation for GPU llama-cpp-python: Downloads and compiles the library with GPU acceleration enabled.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for llama-cpp-python (pyproject.toml) ... done
ERROR: pip's dependency

In [1]:
# Install the libraries for downloading models from the HF Hub, parsing the PDF, chunking, embedding, and vector storage
!pip install huggingface_hub pandas tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.3.25 langchain-community==0.3.25 chromadb sentence-transformers numpy transformers -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
# Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## 2 - Query LLM with default parameters

### 2.1 - Download and load the Mistral model
| Model | Repository | File/Name | Model card |
|-------|------------|-----------|---------|
| Mistral-7B-Instruct-v0.2 | `TheBloke/Mistral-7B-Instruct-v0.2-GGUF` | `mistral-7b-instruct-v0.2.Q6_K.gguf` | https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF |

In [None]:
# Define the model repository and filename for the Mistral-7B-Instruct-v0.2 GGUF model.
model_repo = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_file = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [None]:
# Download the model (cached locally after the first download)
model_path = hf_hub_download(
    repo_id=model_repo,
    filename=model_file
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [None]:
# Initialize the Llama model with the downloaded GGUF file.
# model_path: path to the GGUF model file.
# n_ctx: context window size (determines how much text the model can process at once).
# n_gpu_layers: number of layers to offload to the GPU for acceleration.
# n_batch: batch size for processing.
llm = Llama(
    model_path=model_path,
    n_ctx=2300,
    n_gpu_layers=38,
    n_batch=512
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


### 2.2 - Utility function `generate_response`

In [None]:
def generate_response(query, max_tokens=128, temperature=0, top_p=0.95, top_k=50, repeat_penalty=1.0):
    """
    Generates a response from the language model.

    Args:
        query (str): The input prompt for the model.
        max_tokens (int, optional): The maximum number of tokens to generate. Defaults to 128.
        temperature (float, optional): Controls the randomness of the output. Defaults to 0.
        top_p (float, optional): Nucleus sampling parameter. Defaults to 0.95.
        top_k (int, optional): Top-k sampling parameter. Defaults to 50.
        repeat_penalty (float, optional): Penalizes repeated tokens. Defaults to 1.0.

    Returns:
        tuple[str, dict]: The generated text and the full completion dictionary
            returned by the model (including token usage).
    """
    model_output = llm(
        prompt=query,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repeat_penalty=repeat_penalty
    )

    return model_output['choices'][0]['text'], model_output

### 2.3 - Querying the LLM

#### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
query_1 = "What is the protocol for managing sepsis in a critical care unit?"
ans_1, moutput_1 = generate_response(query_1)
print(ans_1)
print("completion_tokens = ", moutput_1['usage']['completion_tokens'])



Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are the general steps for managing sepsis in a critical care unit:

1. Early recognition and suspicion: Septic patients may present with non-specific symptoms such as fever, chills, tachycardia, tachypnea, altered mental status, and lactic acidosis. It is essential to have a high index of suspicion for sepsis, especially in patients with known infections or risk factors.
2.
completion_tokens =  128


#### Query 2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
query_2 = "What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
ans_2, moutput_2 = generate_response(query_2)
print(ans_2)
print("completion_tokens = ", moutput_2['usage']['completion_tokens'])

Llama.generate: prefix-match hit




Appendicitis is a medical condition characterized by inflammation of the appendix, a small tube-shaped organ located in the lower right side of the abdomen. The symptoms of appendicitis can vary from person to person, but some common signs include:

1. Abdominal pain: The pain is typically located in the lower right side of the abdomen and may start as a mild discomfort that gradually worsens. The pain may be constant or come and go, and it may be accompanied by cramping or bloating.
2. Loss of appetite: People with appendic
completion_tokens =  128


#### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
query_3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
ans_3, moutput_3 = generate_response(query_3)
print(ans_3)
print("completion_tokens = ", moutput_3['usage']['completion_tokens'])

Llama.generate: prefix-match hit




Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles, leading to hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be related to a problem with the immune system.

There are several treatments that have been shown to be effective in addressing sudden patchy hair loss:

1. Corticosteroids: Corticosteroids are anti-inflammatory
completion_tokens =  128


#### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
query_4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
ans_4, moutput_4 = generate_response(query_4)
print(ans_4)
print("completion_tokens = ", moutput_4['usage']['completion_tokens'])

Llama.generate: prefix-match hit




A person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, is typically diagnosed with a traumatic brain injury (TBI). The treatment for a TBI depends on the severity and location of the injury, as well as the individual's overall health and age.

Immediate treatment for a TBI may include:

1. Emergency medical care: This may include surgery to remove hematomas or other obstructions, as well as treatment for other injuries that may have occurred at the same time as the TBI.
2. Med
completion_tokens =  128


#### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
query_5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
ans_5, moutput_5 = generate_response(query_5)
print(ans_5)
print("completion_tokens = ", moutput_5['usage']['completion_tokens'])

Llama.generate: prefix-match hit




First and foremost, if a person has fractured their leg during a hiking trip, it is essential to ensure their safety and prevent further injury. Here are some necessary precautions and treatment steps:

1. Assess the situation: Check the extent of the injury and assess the person's condition. If the fracture is open or the person is in severe pain, immobilize the leg with a splint or a makeshift sling to prevent any movement.
2. Call for help: If possible, call for emergency medical assistance. If there is no cell phone reception, try to
completion_tokens =  128


<span style="color: blue;"> **Observation**</span>
- The responses to the questions are generic.
- The output is truncated due to the default `max_tokens` limit of 128.
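One way to confirm truncation programmatically is to inspect the `finish_reason` field of the completion dictionary. The sketch below uses a hand-written dictionary that mimics the OpenAI-style shape of the object returned by `llm(...)`; the helper name `was_truncated` and the sample values are assumptions for illustration, not real model output.

```python
def was_truncated(model_output: dict) -> bool:
    """Return True if generation stopped because the token limit was reached."""
    return model_output["choices"][0].get("finish_reason") == "length"


# Hand-written sample in the same shape as a completion response (not real model output).
sample_output = {
    "choices": [{"text": "1. Early recognition ...", "index": 0, "finish_reason": "length"}],
    "usage": {"prompt_tokens": 17, "completion_tokens": 128, "total_tokens": 145},
}

print(was_truncated(sample_output))
```

A check like this could be applied to the second value returned by `generate_response` to decide whether a query should be rerun with a larger token limit.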

## 3 - Query LLM with Prompt Engineering and Parameter Tuning

Prompt template for Mistral from the model card: `<s>[INST] {prompt} [/INST]`

To leverage instruction fine-tuning, the prompt is wrapped in `[INST]` and `[/INST]` tokens.


In [None]:
# Define a simple utility function to prepare the model prompt.
# The system and user text are both placed inside a single [INST] ... [/INST] block.
def prepare_model_prompt(system_prompt, user_prompt):
    return f"""<s>[INST]system: {system_prompt}
                user: {user_prompt}
                [/INST]"""

### Query 1: What is the protocol for managing sepsis in a critical care unit?

Combination 1 - System prompt (general audience, harmless) and modified `max_tokens`

In [None]:
system_prompt = """You are a helpful, respectful and honest medical assistant.
                  Always explain in simple terms for a general audience.
                  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
                  Please ensure that your responses are socially unbiased and positive in nature."""
user_input = "What is the protocol for managing sepsis in a critical care unit?"


ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0,
    temperature=0,
    top_p=0.95,
    top_k=50,
    repeat_penalty=1.0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 Sepsis is a serious condition that occurs when the body has an overwhelming response to an infection. In a critical care unit, managing sepsis involves several steps to ensure the best possible outcome for the patient. Here's a simplified explanation of the protocol:

1. Recognition: Healthcare professionals must identify sepsis early and assess its severity using scoring systems like the Sequential Organ Failure Assessment (SOFA) score or the Quick Sequential Organ Failure Assessment (qSOFA) score.

2. Fluid resuscitation: The first step in managing sepsis is to restore intravascular volume by administering fluids intravenously. This helps maintain adequate blood pressure and organ perfusion.

3. Antibiotics: Administering antibiotics as soon as possible is crucial for treating the underlying infection. The choice of antibiotics depends on the suspected infection source and the patient's allergies.

4. Vasopressors: If the patient's blood pressure remains low despite fluid resuscitat

<span style="color: blue;"> **Observation**</span>
- The explanation is detailed and suitable for a general audience.
- The number of completion tokens has increased compared to the previous query.
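The interaction between `max_tokens` and the context window can be sketched as a simple budget calculation. This assumes that a non-positive `max_tokens` means "no fixed cap", so that generation is bounded only by the context window; the helper name `completion_budget` and the prompt length of 120 tokens are hypothetical, while `n_ctx=2300` matches the model setup in this notebook.

```python
def completion_budget(n_ctx: int, prompt_tokens: int, max_tokens: int) -> int:
    """Upper bound on generated tokens for one call.

    Assumes a non-positive max_tokens means "no fixed cap", so the context
    window is the only limit.
    """
    remaining = max(n_ctx - prompt_tokens, 0)
    if max_tokens <= 0:
        return remaining
    return min(max_tokens, remaining)


# 120 prompt tokens is a made-up figure for illustration.
print(completion_budget(2300, 120, 128))
print(completion_budget(2300, 120, 0))
```

Longer system prompts or retrieved context eat into the same window, so the room left for the answer shrinks as the prompt grows.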

### Query 2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

Combination 2 - System prompt (brevity and Shakespearean language) and modified `temperature` and `max_tokens`

In [None]:
# temperature set to 1; max_tokens set to 0
system_prompt = """Respond briefly and clearly in Shakespearean language."""

user_input = "What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0,
    temperature=1,
    top_p=0.95,
    top_k=50,
    repeat_penalty=1.0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 Thy queried mind doth ask of appendicitis, its manifestations and cure, perchance I shall provide thee insight, good sir.

Appendicitis, a malady most wretched, manifests itself through sharp pains in the right side, near the navel, whence they may travel downward, towards the lower regions. A feeling of unease and discomfort doth pervade the belly, swelling may ensue, and loss of appetite is common. Feverish heat within the body, and a general sense of malaise, complete the picture of this affliction.

As for a remedy by medicine alone, alas, it is but a fleeting hope. For appendicitis, a surgical intervention called an appendectomy is the customary course. In this procedure, the afflicted appendage is removed, granting relief from the torment it inflicts.

So, in summary, Appendicitis doth present with pain near the navel, fever, loss of appetite, and swelling; its cure lieth not in medicine but in surgery known as appendectomy.
completion_tokens =  244


In [None]:
# temperature set to 1; max_tokens set to 0
# Repeat the same question to observe the effect of temperature
system_prompt = """Respond briefly and clearly in Shakespearean language."""

user_input = "What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0,
    temperature=1,
    top_p=0.95,
    top_k=50,
    repeat_penalty=1.0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 Thou askest of the appendix, a malady's signs and healing's way,
With sorrow in the belly, swelling doth betray.
Low fever, cravings harsh, with pain increased,
Appetite and rest, both cruelly misced.
The right side, near the navel, holds the mournful part,
Whereas, alas! No potion or herb art,
Could cure this ailment, oh, so cruel, unkind,
A surgeon's hand must slice the suffering blind.
An appendectomy, thou shalt the term call,
To free the soul from this affliction's thrall.
completion_tokens =  150


<span style="color: blue;"> **Observation**</span>
- The explanation is poetic in nature.
- Repeating the same question produces a different response because `temperature=1` makes token sampling random.
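The effect of temperature can be illustrated with a small, self-contained softmax. This is a conceptual sketch rather than the sampler used by `llama-cpp-python`; the function name `softmax_with_temperature` and the toy logits are assumptions. A temperature of exactly 0 is not handled here, since it corresponds to always picking the highest-scoring token instead of dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]                  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution, which is why repeated runs at `temperature=1` diverge.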

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

Combination 3 - System prompt (empty) and modified `top_k`

`top_k` controls the maximum number of most-likely next tokens to consider when generating the response at each step.
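As a conceptual sketch of top-k filtering (not the library's internal implementation), the snippet below keeps the `k` highest-scoring candidates and renormalizes their probabilities. The function name `top_k_probabilities` and the toy logits are assumptions.

```python
import math


def top_k_probabilities(logits, k):
    """Keep the k highest logits, then renormalize with a softmax over the survivors."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in keep)
    exps = {i: math.exp(logits[i] - peak) for i in keep}
    total = sum(exps.values())
    return {i: exps[i] / total for i in keep}


toy_logits = [3.0, 2.5, 1.0, 0.2, -1.0]       # made-up scores for five candidate tokens
for k in (2, 5):
    probs = top_k_probabilities(toy_logits, k)
    best = max(probs, key=probs.get)
    print(k, best, {i: round(p, 3) for i, p in probs.items()})
```

The highest-scoring token survives for every `k`, so a larger `top_k` changes the output only when tokens are actually sampled rather than chosen greedily.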

In [None]:
# top_k set to 5
system_prompt = ""

user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0,
    temperature=0,
    top_p=0.95,
    top_k=5,
    repeat_penalty=1.0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 There are several possible causes for sudden, patchy hair loss, also known as alopecia areata. Here are some effective treatments and possible causes:

Causes:
1. Alopecia Areata: An autoimmune disorder that causes the body's immune system to attack hair follicles, leading to hair loss.
2. Stress: Physical or emotional stress can cause hair loss.
3. Nutritional Deficiencies: Lack of certain nutrients, such as iron, zinc, or biotin, can lead to hair loss.
4. Hormonal Imbalance: Hormonal changes, such as those caused by pregnancy, menopause, or thyroid problems, can cause hair loss.
5. Medications: Certain medications, such as chemotherapy drugs, can cause hair loss.

Treatments:
1. Minoxidil: A topical medication that can help stimulate hair growth and slow down hair loss.
2. Corticosteroids: Prescription medications that can help reduce inflammation and suppress the immune system to promote hair growth.
3. Immunotherapy: Injections of certain proteins that can help stimulate hair grow

In [None]:
# top_k set to 70
system_prompt = ""

user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0,
    temperature=0,
    top_p=0.95,
    top_k=70,
    repeat_penalty=1.0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 There are several possible causes for sudden, patchy hair loss, also known as alopecia areata. Here are some effective treatments and possible causes:

Causes:
1. Alopecia Areata: An autoimmune disorder that causes the body's immune system to attack hair follicles, leading to hair loss.
2. Stress: Physical or emotional stress can cause hair loss.
3. Nutritional Deficiencies: Lack of certain nutrients, such as iron, zinc, or biotin, can lead to hair loss.
4. Hormonal Imbalance: Hormonal changes, such as those caused by pregnancy, menopause, or thyroid problems, can cause hair loss.
5. Medications: Certain medications, such as chemotherapy drugs, can cause hair loss.

Treatments:
1. Minoxidil: A topical medication that can help stimulate hair growth and slow down hair loss.
2. Corticosteroids: Prescription medications that can help reduce inflammation and suppress the immune system to promote hair growth.
3. Immunotherapy: Injections of certain proteins that can help stimulate hair grow

<span style="color: blue;"> **Observation**</span>
- While the "Causes" sections are identical, the "Treatments" sections show a clear difference based on the `top_k` setting.
- The `top_k=70` response provides a longer list of treatments, more specific wording, and a higher token count.
- `top_k=5` restricts the model to the 5 most probable tokens at each step, leading to a more predictable and generic response. In contrast, `top_k=70` gives the model a wider pool of 70 tokens to choose from at each step, allowing for more specific terminology and a more comprehensive list. Note that both runs use `temperature=0`, where decoding is effectively greedy and `top_k` is expected to have little influence; the effect of `top_k` is easier to isolate at a non-zero temperature.

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

Combination 4 - Few-shot prompting

In [None]:
system_prompt = """
You are a medical assistant providing information on treatments for brain injuries.

User:
Question: What are the common symptoms and treatments for pulmonary embolism?
Answer: Common symptoms of pulmonary embolism include sudden shortness of breath, chest pain that worsens with breathing or coughing, rapid heart rate, rapid breathing, anxiety, coughing (sometimes with blood), sweating, and fainting. Treatment typically involves anticoagulant medications to prevent further clots, and sometimes thrombolytics to dissolve existing clots. In severe cases, surgical embolectomy or catheter-directed treatments may be necessary.

User:
Question: Can you provide the trade names of medications used for treating hypertension?
Answer: Some common trade names for medications used to treat hypertension include Prinivil, Zestril (Lisinopril), Norvasc (Amlodipine), Cozaar (Losartan), Diovan (Valsartan), Toprol XL, Lopressor (Metoprolol), and Tenormin (Atenolol).

User:
Question: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?
Answer:
"""

user_input = ""

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 Treatment for a brain injury can depend on the severity and location of the injury. For mild to moderate brain injuries, rest, medication for pain and swelling, and rehabilitation therapies such as physical, occupational, and speech therapy may be recommended. For more severe injuries, treatments may include surgery to remove hematomas or repair skull fractures, and intensive care to manage symptoms such as seizures, infections, or breathing problems. Rehabilitation is also an important part of treatment for brain injuries, regardless of severity. It can help individuals regain skills and improve function. Additionally, medications may be prescribed to manage symptoms such as seizures, depression, or difficulty with attention or memory. It's important to note that every brain injury is unique, and treatment plans will vary depending on the individual's specific needs.
completion_tokens =  174


<span style="color: blue;"> **Observation**</span>
- The structure and content of the response align well with the provided few-shot examples, demonstrating that the model understood the desired format and level of detail.
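Few-shot prompts like the one above can also be assembled from a list of question/answer pairs, which makes it easier to swap examples in and out. The helper name `build_few_shot_prompt` is hypothetical, and the example answers are shortened placeholders rather than complete reference answers.

```python
def build_few_shot_prompt(instruction, examples, question):
    """Assemble an instruction, worked Q/A pairs, and a final open question."""
    parts = [instruction.strip(), ""]
    for q, a in examples:
        parts += [f"Question: {q}", f"Answer: {a}", ""]
    parts += [f"Question: {question}", "Answer:"]
    return "\n".join(parts)


# Placeholder examples; real prompts would carry complete answers.
examples = [
    ("What are the common symptoms and treatments for pulmonary embolism?", "..."),
    ("Can you provide the trade names of medications used for treating hypertension?", "..."),
]
prompt = build_few_shot_prompt(
    "You are a medical assistant providing information on treatments for brain injuries.",
    examples,
    "What treatments are recommended for a person who has sustained a physical injury to brain tissue?",
)
print(prompt)
```

Ending the prompt with an open `Answer:` line nudges the model to continue in the same format as the preceding examples.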

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

Combination 5 - Chain-of-Thought (CoT) prompting

In [None]:
system_prompt = """Think step-by-step to determine the necessary precautions, treatment steps, and considerations for care and recovery for a person who has fractured their leg during a hiking trip. Consider the immediate actions to take at the injury site, the subsequent medical treatment, and the long-term recovery process.
"""

user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"

ans, moutput = generate_response(
    prepare_model_prompt(system_prompt, user_input),
    max_tokens=0
  )
print(ans)
print("completion_tokens = ", moutput['usage']['completion_tokens'])

Llama.generate: prefix-match hit


 I. Immediate Actions at the Injury Site:
1. Assess the situation: Check if the person is in a safe location and if there are any other injuries.
2. Provide first aid: Apply a sterile dressing to the wound, if present, to prevent infection. Do not attempt to realign the bone or apply excessive pressure to the area.
3. Immobilize the leg: Use a splint, a makeshift sling, or a hiking pole to immobilize the leg to prevent further damage and provide comfort.
4. Monitor vital signs: Check for signs of shock, such as rapid heartbeat, shallow breathing, or pale skin.
5. Provide hydration and nutrition: Offer water or other fluids to help maintain hydration and provide energy-rich snacks.

II. Subsequent Medical Treatment:
1. Seek professional help: Arrange for transportation to the nearest medical facility as soon as possible.
2. Diagnostic tests: X-rays will be used to confirm the fracture and determine the extent of the injury.
3. Pain management: The healthcare provider may prescribe pain 

<span style="color: blue;"> **Observation**</span>
- The response is detailed and includes step-by-step thinking and reasoning.

## 4 - Download Embedding model

Download the General Text Embeddings (GTE) model to generate embeddings for the PDF data from the Merck Manual.

*   GTE models rank well on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for 'Retrieval' tasks, indicating their effectiveness in creating meaningful representations of text for search and retrieval purposes.
*   This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.

| Model | Repository | How to Load | Model Card | Embedding Dimension |
|-------|------------|-------------|------------|---------------------|
| GTE-Large | `thenlper/gte-large` | `SentenceTransformer` | https://huggingface.co/thenlper/gte-large | 1024 |

In [1]:
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings

In [2]:
embedding_model = SentenceTransformerEmbeddings(model_name="thenlper/gte-large")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
print("Model information:")
print(embedding_model.client)

print("\nTokenizer:")
print(embedding_model.client.tokenizer)

Model information:
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Tokenizer:
BertTokenizerFast(name_or_path='thenlper/gte-large', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True, added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False

- The gte-large embedding model tokenizes its input with `BertTokenizerFast` before computing embeddings.
- This notebook uses the same tokenizer to count tokens when splitting the document into chunks with `RecursiveCharacterTextSplitter`.
- Counting with the model's own tokenizer keeps each chunk within the embedding model's maximum sequence length of 512 tokens.

The methods `.embed_documents()` and `.embed_query()` generate embeddings for a list of texts and for a single query string, respectively.

## 5 - Data Preparation and Vector Database Setup for RAG

To prepare the medical manual data for Retrieval Augmented Generation (RAG), we will perform the following steps:

1.  **Chunking**: Divide the PDF document into smaller, manageable text segments (chunks). We will create two sets of chunks with different sizes (490 and 245 tokens) to explore the impact of chunk size on retrieval performance.
2.  **Vectorization**: Convert these text chunks into numerical representations called embeddings using the pre-trained GTE-Large embedding model.
3.  **Vector Database Setup**: Store the vectorized chunks in two separate Chroma vector databases, one for each chunk size. This allows for efficient similarity search during the retrieval phase of RAG.

By creating two databases with different chunk sizes, we can compare their effectiveness in retrieving relevant information for answering medical queries.
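To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size token windows with overlap. It assumes a plain whitespace tokenizer as a stand-in for `BertTokenizerFast`, and the helper name `chunk_tokens` is hypothetical. `RecursiveCharacterTextSplitter` itself first splits on separators and then merges the pieces, so its chunk boundaries will differ from these simple windows.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous
    one, so neighbouring windows share chunk_overlap tokens.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the token list
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace tokens are an assumption; the notebook counts BERT word pieces
tokens = "sepsis is a life threatening response to infection that needs rapid treatment".split()
chunks = chunk_tokens(tokens, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Smaller windows produce more chunks for the same text, which is why the 245-token configuration is expected to yield roughly twice as many database entries as the 490-token one.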

### 5.1 - Import libraries required for chunking

In [3]:
# Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

# Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma
from uuid import uuid4
from time import sleep

### 5.2 - Loading and Previewing the Medical Manual

In [4]:
# Connect to Google Drive to load the PDF
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
manual_pdf_path = "/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf"

In [6]:
pdf_loader = PyMuPDFLoader(manual_pdf_path)

In [7]:
manual = pdf_loader.load()

In [8]:
print("total documents loaded from the PDF = ", len(manual))

total documents loaded from the PDF =  4114


In [9]:
# Inspect page_content and its length for a few sample pages (20 to 23) to understand the data structure
for i in range(20,24):
  print("page = ",manual[i].metadata['page'],end="\n")
  print("page_content = ", manual[i].page_content[:200],end="\n")
  print("page_content length = ", len(manual[i].page_content),end="\n")
  print("---"*10)

page =  20
page_content =  PO2
oxygen partial pressure (or tension)
PPD
purified protein derivative (tubercullin)
ppm
parts per million
prn
as needed
PT
prothrombin time
PTT
partial thromboplastin time
q
every (only in dosages)
page_content length =  1487
------------------------------
page =  21
page_content =  Medical College
Senior Assistant Editor
JUSTIN L. KAPLAN, MD
Merck & Co., Inc, and Clinical Associate Professor, Department of Emergency Medicine, Jefferson
Medical College
Editorial Board
RICHARD K. 
page_content length =  1843
------------------------------
page =  22
page_content =  MICHAEL JACEWICZ, MD
Professor of Neurology, University of
Tennessee Health Science Center; Assistant
Chief of Neurology, VA Medical Center,
Memphis
MATTHEW E. LEVISON, MD
Adjunct Professor of Medicin
page_content length =  1858
------------------------------
page =  23
page_content =  Attending Physician, Lenox Hill Hospital
and New York Presbyterian Hospital
Genitourinary (Urologic) Disorders
I

### 5.3 - Data Chunking (chunk_size=490)

In [10]:
# Import the BertTokenizerFast from the transformers library
from transformers import BertTokenizerFast
# Load the tokenizer for the 'thenlper/gte-large' model
tokenizer = BertTokenizerFast.from_pretrained("thenlper/gte-large")

In [None]:
# Initialize the RecursiveCharacterTextSplitter using the loaded tokenizer.
# from_huggingface_tokenizer is used to ensure compatibility with the model's tokenizer.
# chunk_size: The maximum number of tokens in each chunk
# chunk_overlap: The number of tokens to overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer=tokenizer,
    chunk_size=490,
    chunk_overlap=20
)

In [None]:
# Load the PDF document and split it into chunks using the configured text_splitter
document_chunks = pdf_loader.load_and_split(text_splitter)

In [None]:
# Verify token counts for each chunk
max_tokens_allowed = 512
all_chunks_within_limit = True

for i, chunk in enumerate(document_chunks):
  token_count = len(tokenizer.encode(chunk.page_content))
  if token_count > max_tokens_allowed:
    print(f"Chunk {i} exceeds the token limit with {token_count} tokens.")
    all_chunks_within_limit = False

if all_chunks_within_limit:
  print(f"All document chunks are within the {max_tokens_allowed}-token limit.")

All document chunks are within the 512-token limit.


In [None]:
# Print the total number of document chunks created
print(f"""
type(document_chunks) = {type(document_chunks)}
type(document_chunks[0]) = {type(document_chunks[0])}
len(document_chunks) = {len(document_chunks)}""")


type(document_chunks) = <class 'list'>
type(document_chunks[0]) = <class 'langchain_core.documents.base.Document'> 
len(document_chunks) = 8678


In [None]:
# Print the content of a specific document chunk to understand its structure and metadata
document_chunks[2000].model_dump()

{'id': None,
 'metadata': {'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)',
  'creator': 'Atop CHM to PDF Converter',
  'creationdate': '2012-06-15T05:44:40+00:00',
  'source': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf',
  'file_path': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf',
  'total_pages': 4114,
  'format': 'PDF 1.7',
  'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition',
  'author': '',
  'subject': '',
  'keywords': '',
  'moddate': '2025-11-02T22:11:05+00:00',
  'trapped': '',
  'modDate': 'D:20251102221105Z',
  'creationDate': 'D:20120615054440Z',
  'page': 916},
 'page_content': "Toxic solitary or multinodular goiter (Plummer's disease) sometimes results from TSH receptor\ngene mutations producing continuous thyroid stimulation. Patients with toxic nodular goiter have none of\nthe autoimmune manifestations or circulating antibodies observed in patients with Graves' disease. Also

In [None]:
# Print the content of a specific document chunk to understand its structure and metadata
document_chunks[2001].model_dump()

{'id': None,
 'metadata': {'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)',
  'creator': 'Atop CHM to PDF Converter',
  'creationdate': '2012-06-15T05:44:40+00:00',
  'source': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf',
  'file_path': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf',
  'total_pages': 4114,
  'format': 'PDF 1.7',
  'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition',
  'author': '',
  'subject': '',
  'keywords': '',
  'moddate': '2025-11-02T22:11:05+00:00',
  'trapped': '',
  'modDate': 'D:20251102221105Z',
  'creationDate': 'D:20120615054440Z',
  'page': 916},
 'page_content': "suppressed.\nPathophysiology\nIn hyperthyroidism, serum T3 usually increases more than does T4, probably because of increased\nsecretion of T3 as well as conversion of T4 to T3 in peripheral tissues. In some patients, only T3 is\nelevated (T3 toxicosis). T3 toxicosis may occur in any of the usual disord

<span style="color: blue;"> **Observation**</span>
As expected, consecutive chunks can share some text at their boundary, because the splitter was configured with `chunk_overlap=20`.
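One way to check the overlap programmatically, rather than by eye, is to look for the longest suffix of one chunk that is also a prefix of the next. The sketch below uses toy strings in place of `document_chunks[i].page_content`, and `shared_overlap` is a hypothetical helper name. Because the splitter may strip whitespace at chunk boundaries, an exact character match is an assumption that may not hold for every pair of real chunks.

```python
def shared_overlap(previous_text, next_text):
    """Return the longest suffix of previous_text that is also a prefix of next_text."""
    max_len = min(len(previous_text), len(next_text))
    # Try the longest candidate first and shrink until a match is found
    for size in range(max_len, 0, -1):
        if previous_text.endswith(next_text[:size]):
            return next_text[:size]
    return ""


# Toy strings standing in for two consecutive chunks
chunk_a = "alpha beta gamma delta"
chunk_b = "gamma delta epsilon zeta"
overlap = shared_overlap(chunk_a, chunk_b)
print(repr(overlap))
```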

In [None]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [None]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  1024


True

### 5.4 - Populate Vector Database (medical_db_490)

In [18]:
# Define a utility function to create and populate database
def get_vectordb_handler(collection_name, persist_dir, document_chunks):
    """
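    Create or load a persistent Chroma collection and populate it if it is empty.

    Uses the global `embedding_model` as the embedding function.

    Args:
        collection_name (str): Name of the Chroma collection.
        persist_dir (str): Directory where the database is persisted.
        document_chunks (list): Document chunks to embed and store.

    Returns:
        Chroma: An instance of the Chroma vector database.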
    """
    if os.path.exists(persist_dir):
      print(f'"{persist_dir}" already exists!')
    else:
      print(f'Creating vector database directory in "{persist_dir}"')
      os.makedirs(persist_dir)

    # Instantiate Chroma with persistence
    vectorstore = Chroma(
        persist_directory=persist_dir,
        embedding_function=embedding_model,
        collection_name=collection_name
      )

    # Get the collection
    content = vectorstore.get()
    print("Collection content after initialization => ", content)

    if not len(content['ids']):
      print(f'Populating vector database...')

      uuids = [str(uuid4()) for _ in range(len(document_chunks))]
      i = 0
      while i < len(document_chunks) - 1000:
        added_list = vectorstore.add_documents(document_chunks[i : i + 1000], ids=uuids[i : i + 1000])
        print(f'Vector database populated with {len(added_list)} entries')
        i += 1000
        sleep(10)

      if i < len(document_chunks):
          added_list = vectorstore.add_documents(document_chunks[i :], ids=uuids[i :])
          print(f'Vector database populated with {len(added_list)} entries')

    else:
      print(f'Vector database already populated.')

    return vectorstore
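The `while` loop and the trailing `if` in `get_vectordb_handler` together implement batching in groups of 1000. The same slicing can be sketched as a single generator; `iter_batches` is a hypothetical helper name that does not appear in the notebook, and the list of integers stands in for the real document chunks.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 8678 placeholder items, matching the chunk count reported above
placeholder_chunks = list(range(8678))
batch_sizes = [len(batch) for batch in iter_batches(placeholder_chunks, 1000)]
print(batch_sizes)
```

In the handler, each yielded batch could be passed to `vectorstore.add_documents` together with the matching slice of `uuids`, which removes the need for a separate branch for the final partial batch.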

In [None]:
# Define the directory where the vector database will be stored
persist_dir = '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_db_490'

In [None]:
vectorstore = get_vectordb_handler("MerckManual", persist_dir, document_chunks)

Creating vector database directory in "/content/drive/MyDrive/Colab Notebooks/Project-5/medical_db_490"




Collection content after initialization =>  {'ids': [], 'embeddings': None, 'documents': [], 'uris': None, 'included': ['metadatas', 'documents'], 'data': None, 'metadatas': []}
Populating vector database...
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 1000 entries
Vector database populated with 678 entries


In [None]:
# Total entries in the vector db
len(vectorstore.get()['ids'])

8678

In [None]:
# Access the query embedding object
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='thenlper/gte-large', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [None]:
# Test similarity search for vitamin A toxicity
vectorstore.similarity_search("What are the side effects if vitamin A overdose?",k=3)

[Document(metadata={'author': '', 'format': 'PDF 1.7', 'source': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf', 'file_path': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf', 'total_pages': 4114, 'trapped': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'page': 93, 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'subject': '', 'moddate': '2025-11-02T22:11:05+00:00', 'keywords': '', 'modDate': 'D:20251102221105Z', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationDate': 'D:20120615054440Z'}, page_content='defects occur in children of women receiving isotretinoin (which is related to vitamin A) for acne treatment\nduring pregnancy.\nAlthough carotene is converted to vitamin A in the body, excessive ingestion of carotene causes\ncarotenemia, not vitamin A toxicity. Carotenemia is usually asymptomatic but may lead to carotenodermia,\nin which the s

<span style="color: blue;"> **Observation**</span>
- The Merck Manual has been vectorized and stored in a Chroma vector database.
- The database holds 8678 entries, one per document chunk.
- The test similarity search for the vitamin A overdose query retrieved relevant chunks from the database.
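Conceptually, a similarity search ranks stored vectors by how close they are to the query vector and returns the top `k`. The following brute-force sketch illustrates the idea with cosine similarity on toy 3-dimensional vectors. The function names and vectors are made up for illustration; Chroma uses an approximate nearest-neighbour index and its own distance setting, so this is not its implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy vectors; real gte-large embeddings have 1024 dimensions
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.8, 0.2, 0.0]
result = top_k(query_vec, doc_vecs, k=2)
print(result)
```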

### 5.5 - Data Chunking (chunk_size=245)

To study the effect of chunk size, we create a second database that stores smaller chunks (245 tokens instead of 490).

In [12]:
# Initialize the RecursiveCharacterTextSplitter for chunk_size 245 (smaller than the previous one)
text_splitter_245 = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer=tokenizer,
    chunk_size=245,
    chunk_overlap=20
)

In [13]:
# Load the PDF document and split it into chunks using the configured text_splitter_245
document_chunks_245 = pdf_loader.load_and_split(text_splitter_245)

In [14]:
# Print the total number of document chunks created using text_splitter_245 and their types
print(f"""
type(document_chunks_245) = {type(document_chunks_245)}
type(document_chunks_245[0]) = {type(document_chunks_245[0])}
len(document_chunks_245) = {len(document_chunks_245)}""")


type(document_chunks_245) = <class 'list'>
type(document_chunks_245[0]) = <class 'langchain_core.documents.base.Document'>
len(document_chunks_245) = 16160


In [15]:
# Print the dimension of the embedding vector generated by the model.
print("Dimension of the embedding vector ",len(embedding_model.embed_query(document_chunks_245[0].page_content)))

Dimension of the embedding vector  1024


<span style="color: blue;"> **Observation**</span>
- The embedding dimension remains 1024, which is the model's fixed output size.
- This holds regardless of chunk length: the model pools the token representations of each chunk into a single 1024-dimensional vector, so the dimension alone says nothing about how well a smaller chunk is represented.
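The model summary printed earlier shows mean pooling followed by `Normalize()`, which is why the output size does not depend on the number of tokens in a chunk. A toy sketch of that pooling step follows, using made-up 3-dimensional token vectors instead of the model's 1024 dimensions. The helper name is hypothetical, and the real pooling layer also accounts for padding via the attention mask, which is omitted here.

```python
import math


def mean_pool_and_normalize(token_vectors):
    """Average per-token vectors, then scale the result to unit length."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    mean = [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


# Two "chunks" with different token counts but the same per-token dimension
short_chunk = [[1.0, 2.0, 2.0], [3.0, 0.0, 0.0]]            # 2 tokens
long_chunk = [[1.0, 0.0, 0.0]] * 5 + [[0.0, 4.0, 3.0]] * 5  # 10 tokens

short_vec = mean_pool_and_normalize(short_chunk)
long_vec = mean_pool_and_normalize(long_chunk)
print(len(short_vec), len(long_vec))
```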

### 5.6 - Populate Vector Database (medical_db_245)

In [19]:
# Define the directory where the vector database will be stored
persist_dir_245 = '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_db_245'

In [None]:
vectorstore_245 = get_vectordb_handler("MerckManual245", persist_dir_245, document_chunks_245)

"/content/drive/MyDrive/Colab Notebooks/Project-5/medical_db_245" already exists!
Collection content after initialization =>  {'ids': [], 'embeddings': None, 'documents': [], 'uris': None, 'included': ['metadatas', 'documents'], 'data': None, 'metadatas': []}
Populating vector database...


In [None]:
# Total entries in the vector db
len(vectorstore_245.get()['ids'])

In [None]:
# Test similarity search for vitamin A toxicity
vectorstore_245.similarity_search("What are the side effects if vitamin A overdose?",k=3)

### 5.7 - Transform the vector store into a retriever

For easier use with LangChain chains, we can transform the vector store into a retriever.

In [None]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 3}
)

In [None]:
rel_docs = retriever.get_relevant_documents("What are the side effects if vitamin A overdose?")
rel_docs



[Document(metadata={'moddate': '2025-11-02T22:11:05+00:00', 'format': 'PDF 1.7', 'author': '', 'creator': 'Atop CHM to PDF Converter', 'page': 93, 'source': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf', 'trapped': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'total_pages': 4114, 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251102221105Z', 'subject': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'keywords': '', 'file_path': '/content/drive/MyDrive/Colab Notebooks/Project-5/medical_diagnosis_manual.pdf'}, page_content='defects occur in children of women receiving isotretinoin (which is related to vitamin A) for acne treatment\nduring pregnancy.\nAlthough carotene is converted to vitamin A in the body, excessive ingestion of carotene causes\ncarotenemia, not vitamin A toxicity. Carotenemia is usually asymptomatic but may lead to carotenodermia,\nin which the s

<span style="color: blue;"> **Observation**</span>
- The retriever returns the same documents for this query as the direct `similarity_search` call above.


In [None]:
model_output = llm(
      "_____", #Complete the code to pass the query
      max_tokens=_____, #Complete the code to pass the maximum number of tokens
      temperature=_____, #Complete the code to pass the temperature
    )

In [None]:
model_output['choices'][0]['text']

The above response is somewhat generic and is solely based on the data the model was trained on, rather than the medical manual.  

Let's now provide our own context.

### System and User Prompt Template

Prompts guide the model to generate accurate responses. Here, we define two parts:

1. The system message describing the assistant's role.
2. A user message template including context and the question.

In [None]:
qna_system_message = "_____"  #Complete the code to define the system message

In [None]:
qna_user_message_template = "_____" #Complete the code to define the user message
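As an illustration of what the two strings might look like, here is a minimal sketch. The wording of `example_system_message` and `example_user_template` is entirely hypothetical and is not the notebook's own prompt; the only assumption taken from the notebook is that the user template carries `{context}` and `{question}` placeholders, which the response function fills with `str.replace`.

```python
# Hypothetical wording; the notebook leaves both strings for the reader to define
example_system_message = (
    "You are a medical assistant. Answer using only the provided context. "
    "If the context does not contain the answer, say you don't know."
)

example_user_template = """###Context
{context}

###Question
{question}
"""


def fill_template(template, context, question):
    """Substitute the two placeholders with plain string replacement."""
    return template.replace("{context}", context).replace("{question}", question)


user_message = fill_template(
    example_user_template,
    context="(retrieved chunk text goes here)",
    question="What is the protocol for managing sepsis in a critical care unit?",
)
prompt = example_system_message + "\n" + user_message
print(prompt)
```

Using `str.replace` rather than `str.format` means that curly braces inside the template itself would not raise an error.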

### Response Function

In [None]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the generated text from the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## 5 - Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,top_k=20)

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
user_input_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
generate_rag_response(user_input_2)

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
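user_input_3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
generate_rag_response(user_input_3)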

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
user_input_4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
generate_rag_response(user_input_4)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
user_input_5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
generate_rag_response(user_input_5)

### Fine-tuning

#### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
generate_rag_response(user_input,temperature=0.5)

#### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
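user_input_2 = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
generate_rag_response(user_input_2, temperature=0.5)  # Assumption: same temperature as Query 1 above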

#### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
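user_input_3 = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
generate_rag_response(user_input_3, temperature=0.5)  # Assumption: same temperature as Query 1 above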

#### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
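user_input_4 = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
generate_rag_response(user_input_4, temperature=0.5)  # Assumption: same temperature as Query 1 above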

#### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
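user_input_5 = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
generate_rag_response(user_input_5, temperature=0.5)  # Assumption: same temperature as Query 1 above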

## 6 - Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system's retrieval and generation by rating each answer for groundedness and relevance.

- We use the same Mistral model for evaluation, so the LLM is effectively rating its own output.

In [None]:
groundedness_rater_system_message = "_____" #Complete the code to define the prompt to evaluate groundedness

In [None]:
relevance_rater_system_message = "_____" #Complete the code to define the prompt to evaluate relevance

In [None]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [None]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    answer =  response["choices"][0]["text"]

    # Build the groundedness-rating prompt from the rater system message, context, question, and answer
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Build the relevance-rating prompt from the rater system message, context, question, and answer
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            echo=False
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']
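The function above builds three prompts with the same `[INST] ... [/INST]` wrapping. A small helper could remove that repetition; the sketch below is one possible shape, not the notebook's code. The name `build_inst_prompt`, the one-line rater instruction, and the parenthesised placeholder values are all hypothetical, and the template is a copy of `user_message_template` so that the block runs on its own.

```python
def build_inst_prompt(system_message, user_message):
    """Wrap a system message and a user message in [INST] ... [/INST] tags."""
    return f"[INST]{system_message}\n\nuser: {user_message}\n[/INST]"


# Copy of the notebook's evaluation template
eval_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

rater_prompt = build_inst_prompt(
    "Rate how well the answer is supported by the context.",  # hypothetical rater instruction
    eval_template.format(
        question="(question text)",
        context="(retrieved context)",
        answer="(model answer)",
    ),
)
print(rater_prompt)
```

With such a helper, the groundedness and relevance prompts would differ only in the system message passed in.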

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [None]:
ground,rel = generate_ground_relevance_response(user_input="What is the protocol for managing sepsis in a critical care unit?",max_tokens=370)

print(ground,end="\n\n")
print(rel)

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [None]:
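ground,rel = generate_ground_relevance_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",max_tokens=370) # Assumption: same max_tokens as Query 1 above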

print(ground,end="\n\n")
print(rel)

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [None]:
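ground,rel = generate_ground_relevance_response(user_input="What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",max_tokens=370) # Assumption: same max_tokens as Query 1 above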

print(ground,end="\n\n")
print(rel)

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [None]:
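ground,rel = generate_ground_relevance_response(user_input="What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",max_tokens=370) # Assumption: same max_tokens as Query 1 above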

print(ground,end="\n\n")
print(rel)

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [None]:
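ground,rel = generate_ground_relevance_response(user_input="What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?",max_tokens=370) # Assumption: same max_tokens as Query 1 above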

print(ground,end="\n\n")
print(rel)

## Actionable Insights and Business Recommendations


*   
*  
*



<font size=6 color='blue'>Power Ahead</font>
___