## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. They cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Check GPU and install libraries. I had Gemini help me write this; it wrote an iterative loop to install all of my libraries.
import os, sys, subprocess
print("GPU:", os.popen("nvidia-smi -L").read() or "No GPU detected")
print("Python:", sys.version)

reqs = [
    "accelerate>=0.33.0", "transformers>=4.43.0", "bitsandbytes>=0.43.0",
    "sentence-transformers>=3.0.1", "faiss-cpu>=1.8.0",
    "langchain>=0.2.6", "langchain-community>=0.2.6", "langchain-text-splitters>=0.2.2",
    "pypdf>=4.2.0", "unstructured>=0.15.0",
    "pandas>=2.2.2", "numpy>=1.26.4", "scikit-learn>=1.5.1", "tqdm>=4.66.4", "ipywidgets>=8.1.0",
    "llama-cpp-python>=0.2.78"
]
for r in reqs:
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", r])
        print("Installed:", r)
    except Exception as e:
        print("Install warning:", r, "->", e)

GPU: GPU 0: Tesla T4 (UUID: GPU-04b0beb7-8f7a-3f0d-b69a-468f6444887b)

Python: 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]
Installed: accelerate>=0.33.0
Installed: transformers>=4.43.0
Installed: bitsandbytes>=0.43.0
Installed: sentence-transformers>=3.0.1
Installed: faiss-cpu>=1.8.0
Installed: langchain>=0.2.6
Installed: langchain-community>=0.2.6
Installed: langchain-text-splitters>=0.2.2
Installed: pypdf>=4.2.0
Installed: unstructured>=0.15.0
Installed: pandas>=2.2.2
Installed: numpy>=1.26.4
Installed: scikit-learn>=1.5.1
Installed: tqdm>=4.66.4
Installed: ipywidgets>=8.1.0
Installed: llama-cpp-python>=0.2.78


**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [2]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Installing build dependencies ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.


**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.
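
The two install cells request different versions of some packages (for example, the first asks for `langchain>=0.2.6` and the second pins `langchain==0.1.1`), and the second cell's output ends in a build error. Below is a minimal sketch, using only the standard library, of one way to check which versions are present after the restart. The helper name `installed_versions` and the list `to_check` are illustrative and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the install cells above.
to_check = ["langchain", "langchain-community", "sentence-transformers",
            "transformers", "numpy", "pandas"]
for name, version in installed_versions(to_check).items():
    print(f"{name}: {version or 'not installed'}")
```

If a package shows an unexpected version, the later import cells may resolve to a different API than the one the pinned install intended.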

In [3]:
#Libraries for processing dataframes and text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [4]:
# Load Mistral 7B and define helper functions
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", load_in_4bit=True, torch_dtype=torch.float16
)
model.generation_config = GenerationConfig(temperature=0.3, top_p=0.9, max_new_tokens=512)

def format_chat(system, user):
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n{user} [/INST]"

def llm_answer(question, temperature=0.3, top_p=0.9, max_new_tokens=512):
    system = ("You are a careful clinical assistant using general knowledge only. "
              "Educational, not medical advice. If unsure, say so.")
    prompt = format_chat(system, question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        out = model.generate(**inputs, temperature=temperature, top_p=top_p,
                             max_new_tokens=max_new_tokens, do_sample=True)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    if '[/INST]' in text: text = text.split('[/INST]')[-1].strip()
    return text

# Problem statement questions
questions = [
    "What is the protocol for managing sepsis in a critical care unit?",
    "What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",
    "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",
    "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",
    "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
]

baseline_answers = []  # filled per query in the cells below
rag_answers = []       # filled later, in the RAG section


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`torch_dtype` is deprecated! Use `dtype` instead!
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
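
`format_chat` above wraps the system text in `<<SYS>>` markers, which is the Llama-2 chat convention; the Mistral instruct model card describes plain `[INST] ... [/INST]` turns. Below is a minimal sketch of an alternative, assuming the system text is simply prepended to the user message. `format_mistral_prompt` and `extract_answer` are hypothetical helper names, and the leading `<s>` is left out on the assumption that the tokenizer adds the beginning-of-sequence token itself.

```python
def format_mistral_prompt(system, user):
    """Build a single-turn prompt with the system text folded into the user turn."""
    return f"[INST] {system}\n\n{user} [/INST]"


def extract_answer(decoded_text):
    """Return the text after the last [/INST] marker, or the whole text if absent."""
    marker = "[/INST]"
    if marker in decoded_text:
        return decoded_text.split(marker)[-1].strip()
    return decoded_text.strip()


prompt = format_mistral_prompt(
    "You are a careful clinical assistant. Educational, not medical advice.",
    "What is the protocol for managing sepsis in a critical care unit?",
)
print(prompt)
# Simulate a decoded generation: the prompt followed by the model's reply.
print(extract_answer(prompt + " Example answer text."))
```

Whether this changes answer quality would need to be checked against the baseline answers; the sketch only shows the string handling.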


In [5]:
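def response(query, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    # Helper for a llama-cpp-python model: expects a global `llm` (a llama_cpp.Llama
    # instance), which is not created in the cells shown above. The queries below
    # call llm_answer() instead.
    model_output = llm(
        prompt=query,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k
    )

    # llama-cpp-python returns an OpenAI-style completion dict.
    return model_output['choices'][0]['text']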

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [6]:
# Query 1
ans = llm_answer(questions[0], temperature=0.5, top_p=0.9, max_new_tokens=512)
baseline_answers.append(ans)
print(ans[:1200])

Sepsis is a life-threatening condition that occurs when an infection in the body triggers a severe inflammatory response. Management in a critical care unit typically involves a multidisciplinary approach, focusing on early recognition, prompt treatment, and supportive care. Here are some general steps for managing sepsis in a critical care unit:

1. Recognition: Early recognition is crucial. Look for signs of infection, such as fever, chills, or infection site, along with symptoms of sepsis, like rapid heart rate, rapid breathing, confusion, or decreased urine output.

2. Resuscitation: Begin resuscitation efforts if sepsis is suspected. This may include administering intravenous fluids to maintain adequate blood pressure and organ perfusion, providing supplemental oxygen, and initiating vasopressors if necessary to maintain blood pressure.

3. Antibiotics: Administer broad-spectrum antibiotics as soon as possible, ideally within the first hour. The choice of antibiotics depends on th

### Query 2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [7]:
# Query 2
ans = llm_answer(questions[1], temperature=0.5, top_p=0.9, max_new_tokens=512)
baseline_answers.append(ans)
print(ans[:1200])

Appendicitis is a medical condition characterized by inflammation of the appendix, a small, tube-shaped organ attached to the first part of the large intestine. The following are common symptoms of appendicitis:

1. Abdominal pain: The pain is typically located in the lower right side of the abdomen, but it can also be felt in other areas. The pain may begin as a mild discomfort, but it can quickly become severe and constant.
2. Loss of appetite
3. Nausea and vomiting
4. Fever
5. Abdominal swelling
6. Inability to pass gas or have a bowel movement
7. Diarrhea or constipation
8. Feeling sick or weak

If left untreated, appendicitis can lead to a ruptured appendix, which can result in peritonitis, a serious inflammation of the abdominal cavity.

Appendicitis cannot be cured with medicine alone. Surgery is the standard treatment for appendicitis. The surgical procedure to remove the appendix is called an appendectomy. It can be performed as an open appendectomy or as a laparoscopic append

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [8]:
# Query 3
ans = llm_answer(questions[2], temperature=0.5, top_p=0.9, max_new_tokens=512)
baseline_answers.append(ans)
print(ans[:1200])

Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that causes hair loss in small, round patches on the scalp, but it can also affect other areas of the body. The exact cause of alopecia areata is unknown, but it's believed to be an autoimmune disease where the body's immune system attacks the hair follicles.

There are several treatments that can help address sudden patchy hair loss:

1. Topical treatments: Over-the-counter and prescription topical treatments, such as minoxidil, can help stimulate hair growth and slow down hair loss. Minoxidil is available as a liquid or foam, and it's applied to the scalp once or twice a day.

2. Corticosteroids: Corticosteroids, which are anti-inflammatory drugs, can help reduce inflammation and suppress the immune system's attack on the hair follicles. They can be applied topically, injected directly into the bald spots, or taken orally.

3. Immunomodulators: Immunomodulators, such as biotin, vitamin D, and zinc

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [9]:
# Query 4
ans = llm_answer(questions[3], temperature=0.5, top_p=0.9, max_new_tokens=512)
baseline_answers.append(ans)
print(ans[:1200])

I cannot provide specific treatment recommendations without knowing the severity and type of brain injury. However, I can suggest some common treatments and therapies that are often used to help individuals recover from brain injuries. Keep in mind that each person's recovery journey is unique, and the most effective treatments will depend on the individual's specific condition and needs.

1. Medical Care: Brain injuries often require hospitalization for initial care. Doctors may prescribe medications to manage symptoms such as seizures, infections, or pain.

2. Rehabilitation: Rehabilitation therapies, including physical, occupational, and speech therapy, can help individuals regain lost functions and improve overall quality of life.

3. Cognitive Rehabilitation: This therapy focuses on addressing cognitive impairments, such as memory, attention, and problem-solving abilities.

4. Behavioral Therapies: Behavioral therapies, such as cognitive-behavioral therapy and applied behavior ana

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [10]:
# Query 5
ans = llm_answer(questions[4], temperature=0.5, top_p=0.9, max_new_tokens=512)
baseline_answers.append(ans)
print(ans[:1200])

If someone has fractured their leg during a hiking trip, here are some general steps they should follow, but keep in mind that this advice is for informational purposes only and should not replace medical advice from a healthcare professional:

1. Assess the severity of the injury: If the fracture is open (compound), if there is significant swelling, or if the person is unable to bear weight on the leg, they should seek medical attention as soon as possible.

2. Control bleeding: Apply direct pressure to the wound with a clean cloth to control any bleeding.

3. Immobilize the leg: Use a makeshift splint, such as a stick or branch, to immobilize the leg to prevent further injury and reduce pain.

4. Elevate the leg: Elevate the leg above heart level to reduce swelling and pain.

5. Apply ice: Apply ice packs to the injured area for 15-20 minutes at a time, several times a day, to help reduce swelling and pain.

6. Monitor for signs of infection: Watch for signs of infection, such as red

## Question Answering using LLM with Prompt Engineering

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [16]:
import itertools, pandas as pd, torch

model.eval()
pad_id = tokenizer.eos_token_id
temps, topp, max_tokens = [0.2], [0.9], [192]
styles = ["Numbered steps.", "Checklist."]

# Pre-tokenize each (question, style) prompt once so the sweep loop can reuse the tensors.
prompt_inputs = {}
for q in questions:
    for st in styles:
        prompt = format_chat(
            "You are a clinical reference assistant. Educational only; not medical advice.",
            q + " " + st
        )
        prompt_inputs[(q, st)] = tokenizer(prompt, return_tensors="pt", padding=False).to(model.device)

rows = []
with torch.no_grad():
    for (t, p, m) in itertools.product(temps, topp, max_tokens):
        for q in questions:
            for st in styles:
                inputs = prompt_inputs[(q, st)]
                # do_sample=False selects greedy decoding, so temperature and
                # top_p have no effect on this call.
                out = model.generate(
                    **inputs,
                    temperature=t, top_p=p, max_new_tokens=m,
                    do_sample=False,
                    pad_token_id=pad_id,
                    use_cache=True
                )
                txt = tokenizer.decode(out[0], skip_special_tokens=True)
                if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
                rows.append({"question": q[:60]+"...", "temp": t, "top_p": p, "max_new_tokens": m,
                             "style": st, "preview": txt[:192]})

sweep_df = pd.DataFrame(rows)
sweep_df.head(10)


| | question | temp | top_p | max_new_tokens | style | preview |
|---|---|---|---|---|---|---|
| 0 | What is the protocol for managing sepsis in a ... | 0.2 | 0.9 | 192 | Numbered steps. | 1. Recognition and early suspicion: Suspect se... |
| 1 | What is the protocol for managing sepsis in a ... | 0.2 | 0.9 | 192 | Checklist. | Sepsis is a life-threatening condition that re... |
| 2 | What are the common symptoms of appendicitis, ... | 0.2 | 0.9 | 192 | Numbered steps. | 1. Appendicitis is a medical condition charact... |
| 3 | What are the common symptoms of appendicitis, ... | 0.2 | 0.9 | 192 | Checklist. | Appendicitis is a medical condition characteri... |
| 4 | What are the effective treatments or solutions... | 0.2 | 0.9 | 192 | Numbered steps. | 1. Identify the Cause: The first step in addre... |
| 5 | What are the effective treatments or solutions... | 0.2 | 0.9 | 192 | Checklist. | I. Causes of Sudden Patchy Hair Loss:\n\n1. Al... |
| 6 | What treatments are recommended for a person w... | 0.2 | 0.9 | 192 | Numbered steps. | 1. Medical Evaluation and Stabilization: The f... |
| 7 | What treatments are recommended for a person w... | 0.2 | 0.9 | 192 | Checklist. | I. Immediate Care:\n1. Seek medical attention ... |
| 8 | What are the necessary precautions and treatme... | 0.2 | 0.9 | 192 | Numbered steps. | 1. Assess the severity of the fracture: If the... |
| 9 | What are the necessary precautions and treatme... | 0.2 | 0.9 | 192 | Checklist. | 1. Assess the severity of the fracture:\n - ... |
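
The sweep is parameterized by `temperature` and `top_p`. Below is a toy sketch of what these two settings do to a next-token distribution when sampling is enabled. It is a plain-Python illustration on made-up logits, not the `transformers` implementation, and the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized over kept tokens


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for temperature in (0.2, 0.4, 1.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    kept = top_p_filter(probs, top_p=0.9)
    print(temperature, [round(p, 3) for p in probs], sorted(kept))
```

Lower temperatures concentrate probability on the top token, and a smaller `top_p` shrinks the set of tokens that can be sampled at all.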


### Query 2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [18]:
# Prompt engineering & parameter sweep (fewer than 5 parameter combinations)
import itertools, pandas as pd
temps, topp, max_tokens = [0.2, 0.4], [0.85, 0.95], [192]  # I reduced the token limit for efficiency and CPU performance; the queries that follow do the same.
styles = ["Use numbered steps and highlight cautions.",
          "Summarize first, then give steps.",
          "Return a compact checklist."]

rows = []
# zip() pairs each (temperature, top_p, max_new_tokens) setting with one style, cycling through the list.
for (t, p, m), style in zip(itertools.product(temps, topp, max_tokens), styles*5):
    # questions[:2] runs questions 1 and 2 (sepsis and appendicitis).
    for qi, q in enumerate(questions[:2], start=1):
        prompt = format_chat("You are a clinical reference assistant. Educational only; not medical advice.", q + " " + style)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, temperature=t, top_p=p, max_new_tokens=m, do_sample=True)
        txt = tokenizer.decode(out[0], skip_special_tokens=True)
        if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
        rows.append({"question_idx": qi, "temperature": t, "top_p": p, "max_new_tokens": m, "style": style, "preview": txt[:380]})
sweep_df = pd.DataFrame(rows)
sweep_df.head(10)


| | question_idx | temperature | top_p | max_new_tokens | style | preview |
|---|---|---|---|---|---|---|
| 0 | 1 | 0.2 | 0.85 | 192 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis Detectio... |
| 1 | 2 | 0.2 | 0.85 | 192 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 2 | 1 | 0.2 | 0.95 | 192 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 3 | 2 | 0.2 | 0.95 | 192 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |
| 4 | 1 | 0.4 | 0.85 | 192 | Return a compact checklist. | 1. Recognition and Early Detection:\n * Moni... |
| 5 | 2 | 0.4 | 0.85 | 192 | Return a compact checklist. | Common Symptoms of Appendicitis:\n1. Abdominal... |
| 6 | 1 | 0.4 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis:** Suspe... |
| 7 | 2 | 0.4 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
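
The sweep cell above pairs each parameter setting with a single style via `zip`, so not every style is tried with every setting. Below is a short sketch contrasting that pairing with a full grid built by `itertools.product`. The values are copied from the cell above; the names `settings`, `paired`, and `full_grid` are illustrative.

```python
import itertools

temps, topp, max_tokens = [0.2, 0.4], [0.85, 0.95], [192]
styles = ["Use numbered steps and highlight cautions.",
          "Summarize first, then give steps.",
          "Return a compact checklist."]

settings = list(itertools.product(temps, topp, max_tokens))

# Pairing used in the cell above: zip stops at the shorter input,
# so each setting gets exactly one style, cycling through the list.
paired = list(zip(settings, styles * 5))

# Full grid: every setting combined with every style.
full_grid = list(itertools.product(settings, styles))

print(len(settings), len(paired), len(full_grid))
for (t, p, m), style in paired:
    print(t, p, m, "->", style)
```

The pairing keeps the run short, at the cost of confounding style with parameter values; the full grid separates the two but triples the number of generations here.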


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [19]:
# Prompt engineering & parameter sweep (8 parameter settings, one style each)
import itertools, pandas as pd
temps, topp, max_tokens = [0.2, 0.4], [0.85, 0.95], [100, 192]  # see the comment above on the max_new_tokens limit
styles = ["Use numbered steps and highlight cautions.",
          "Summarize first, then give steps.",
          "Return a compact checklist."]

rows = []
for (t, p, m), style in zip(itertools.product(temps, topp, max_tokens), styles*5):
    # Note: questions[:2] selects questions 1 and 2 (sepsis and appendicitis), as the
    # output below shows; questions[2:3] would target Query 3 instead.
    for qi, q in enumerate(questions[:2], start=1):
        prompt = format_chat("You are a clinical reference assistant. Educational only; not medical advice.", q + " " + style)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, temperature=t, top_p=p, max_new_tokens=m, do_sample=True)
        txt = tokenizer.decode(out[0], skip_special_tokens=True)
        if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
        rows.append({"question_idx": qi, "temperature": t, "top_p": p, "max_new_tokens": m, "style": style, "preview": txt[:192]})
sweep_df = pd.DataFrame(rows)
sweep_df.head(10)

| | question_idx | temperature | top_p | max_new_tokens | style | preview |
|---|---|---|---|---|---|---|
| 0 | 1 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis Detectio... |
| 1 | 2 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 2 | 1 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 3 | 2 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |
| 4 | 1 | 0.2 | 0.95 | 100 | Return a compact checklist. | I. Initial Assessment and Recognition:\n1. Rec... |
| 5 | 2 | 0.2 | 0.95 | 100 | Return a compact checklist. | Common Symptoms of Appendicitis:\n1. Abdominal... |
| 6 | 1 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis:** Suspe... |
| 7 | 2 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 8 | 1 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 9 | 2 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |


### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [20]:
# Prompt engineering & parameter sweep (8 parameter settings, one style each)
import itertools, pandas as pd
temps, topp, max_tokens = [0.2, 0.4], [0.85, 0.95], [100, 192]
styles = ["Use numbered steps and highlight cautions.",
          "Summarize first, then give steps.",
          "Return a compact checklist."]

rows = []
for (t, p, m), style in zip(itertools.product(temps, topp, max_tokens), styles*5):
    # Note: questions[:2] selects questions 1 and 2 (sepsis and appendicitis), as the
    # output below shows; questions[3:4] would target Query 4 instead.
    for qi, q in enumerate(questions[:2], start=1):
        prompt = format_chat("You are a clinical reference assistant. Educational only; not medical advice.", q + " " + style)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, temperature=t, top_p=p, max_new_tokens=m, do_sample=True)
        txt = tokenizer.decode(out[0], skip_special_tokens=True)
        if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
        rows.append({"question_idx": qi, "temperature": t, "top_p": p, "max_new_tokens": m, "style": style, "preview": txt[:192]})
sweep_df = pd.DataFrame(rows)
sweep_df.head(10)

| | question_idx | temperature | top_p | max_new_tokens | style | preview |
|---|---|---|---|---|---|---|
| 0 | 1 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis Detectio... |
| 1 | 2 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 2 | 1 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 3 | 2 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |
| 4 | 1 | 0.2 | 0.95 | 100 | Return a compact checklist. | 1. Recognition and early suspicion: Suspect se... |
| 5 | 2 | 0.2 | 0.95 | 100 | Return a compact checklist. | Common Symptoms of Appendicitis:\n1. Abdominal... |
| 6 | 1 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis Detectio... |
| 7 | 2 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 8 | 1 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 9 | 2 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [22]:
# Prompt engineering & parameter sweep (8 parameter settings, one style each)
import itertools, pandas as pd
temps, topp, max_tokens = [0.2, 0.4], [0.85, 0.95], [100, 192]
styles = ["Use numbered steps and highlight cautions.",
          "Summarize first, then give steps.",
          "Return a compact checklist."]

rows = []
for (t, p, m), style in zip(itertools.product(temps, topp, max_tokens), styles*5):
    # Note: questions[:2] selects questions 1 and 2 (sepsis and appendicitis), as the
    # output below shows; questions[4:5] would target Query 5 instead.
    for qi, q in enumerate(questions[:2], start=1):
        prompt = format_chat("You are a clinical reference assistant. Educational only; not medical advice.", q + " " + style)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, temperature=t, top_p=p, max_new_tokens=m, do_sample=True)
        txt = tokenizer.decode(out[0], skip_special_tokens=True)
        if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
        rows.append({"question_idx": qi, "temperature": t, "top_p": p, "max_new_tokens": m, "style": style, "preview": txt[:192]})
sweep_df = pd.DataFrame(rows)
sweep_df.head(10)

| | question_idx | temperature | top_p | max_new_tokens | style | preview |
|---|---|---|---|---|---|---|
| 0 | 1 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis:** Suspe... |
| 1 | 2 | 0.2 | 0.85 | 100 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 2 | 1 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 3 | 2 | 0.2 | 0.85 | 192 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |
| 4 | 1 | 0.2 | 0.95 | 100 | Return a compact checklist. | 1. Recognition and early identification: Suspe... |
| 5 | 2 | 0.2 | 0.95 | 100 | Return a compact checklist. | Common Symptoms of Appendicitis:\n1. Abdominal... |
| 6 | 1 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. **Recognition and Suspected Sepsis Detectio... |
| 7 | 2 | 0.2 | 0.95 | 192 | Use numbered steps and highlight cautions. | 1. Appendicitis is a medical condition charact... |
| 8 | 1 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Sepsis is a life-threatening condition caused ... |
| 9 | 2 | 0.4 | 0.85 | 100 | Summarize first, then give steps. | Appendicitis is a medical condition characteri... |


## Data Preparation for RAG

### Loading the Data

In [48]:
# I uploaded merck_manual.pdf through the Files tab in Colab.
PDF_PATH = "/content/merck_manual.pdf"

from pathlib import Path
assert Path(PDF_PATH).exists(), f"PDF not found: {PDF_PATH}. Place it in the working directory and re-run."

# This is where the PDF is loaded and split into overlapping chunks.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader(PDF_PATH)
raw_docs = loader.load()  # one doc per page
chunk_size, chunk_overlap = 1200, 150  # balanced for clinical text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap,
    separators=["\n\n","\n",". "," ",""]
)
docs = splitter.split_documents(raw_docs)
print(f"Pages: {len(raw_docs)} -> Chunks: {len(docs)}")
print(docs[0].page_content[:300], "...")

Pages: 4114 -> Chunks: 14575
lelandhenry6@gmail.com
DHW2IZ4O8J
This file is meant for personal use by lelandhenry6@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action. ...


### Data Overview

#### Checking the first 5 pages

In [51]:

PDF_PATH = "/content/merck_manual.pdf"

from pathlib import Path
assert Path(PDF_PATH).exists(), f"PDF not found: {PDF_PATH}. Place it in the working directory and re-run."


from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader(PDF_PATH)
raw_docs = loader.load()  # one doc per page
chunk_size, chunk_overlap = 1200, 150 # balanced for clinical text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap,
    separators=["\n\n","\n",". "," ",""]
)
docs = splitter.split_documents(raw_docs)
print(f"Pages: {len(raw_docs)} -> Chunks: {len(docs)}")
print(docs[0].page_content[:300], "...")

Pages: 4114 -> Chunks: 14575
lelandhenry6@gmail.com
DHW2IZ4O8J
This file is meant for personal use by lelandhenry6@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action. ...
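
The cell above prints only the first chunk. Below is a minimal sketch of how the first five pages could be previewed, assuming each page object exposes `page_content` and a `metadata` dict with a `page` key, as the per-page documents returned by `loader.load()` are expected to. `FakePage` is a stand-in so the example runs without the PDF, and `preview_pages` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a loaded page: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview_pages(pages, n=5, width=300):
    """Return (page_number, preview) pairs for the first n pages."""
    previews = []
    for position, page in enumerate(pages[:n]):
        number = page.metadata.get("page", position)
        text = " ".join(page.page_content.split())  # collapse whitespace
        previews.append((number, text[:width]))
    return previews


# Made-up pages; with the real data this would be preview_pages(raw_docs).
sample = [FakePage(f"Text of page {i}\nwith a second line.", {"page": i})
          for i in range(8)]
for number, text in preview_pages(sample):
    print(f"--- page {number} ---\n{text}")
```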


#### Checking the number of pages

In [52]:

# PyPDFLoader returns one Document per page, so the page count is the length of raw_docs (loaded above).
print(f"Pages: {len(raw_docs)}")

Pages: 4114


### Data Chunking

In [53]:

PDF_PATH = "/content/merck_manual.pdf"

from pathlib import Path
assert Path(PDF_PATH).exists(), f"PDF not found: {PDF_PATH}. Place it in the working directory and re-run."

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader(PDF_PATH)
raw_docs = loader.load()
chunk_size, chunk_overlap = 1200, 150
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap,
    separators=["\n\n","\n",". "," ",""]
)
docs = splitter.split_documents(raw_docs)
print(f"Pages: {len(raw_docs)} -> Chunks: {len(docs)}")
print(docs[0].page_content[:300], "...")

Pages: 4114 -> Chunks: 14575
lelandhenry6@gmail.com
DHW2IZ4O8J
This file is meant for personal use by lelandhenry6@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action. ...
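
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch: a fixed-width sliding window over characters. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on the listed separators), and `sliding_chunks` is a hypothetical helper written only for this illustration.

```python
def sliding_chunks(text, chunk_size=1200, chunk_overlap=150):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Toy text of 3000 characters standing in for a page of the manual.
sample = "".join(str(i % 10) for i in range(3000))
chunks = sliding_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

A larger overlap repeats more text between neighbouring chunks, which can help when an answer straddles a chunk boundary, at the cost of a bigger index.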


### Embedding

In [54]:
# Sentence-embedding model used to turn each chunk into a vector
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


### Vector Database

In [55]:
# Embeddings + FAISS
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
print(retriever.get_relevant_documents("ICU sepsis protocol")[0].page_content[:250], "...")

16 - Critical Care Medicine
Chapter 222. Approach to the Critically Ill Patient
Introduction
Critical care medicine specializes in caring for the most seriously ill patients. These patients are best
treated in an ICU staffed by experienced personnel. ...



### Retriever

In [56]:
# Reuse the FAISS index built in the previous cell instead of re-embedding every chunk
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
print(retriever.get_relevant_documents("ICU sepsis protocol")[0].page_content[:250], "...")

16 - Critical Care Medicine
Chapter 222. Approach to the Critically Ill Patient
Introduction
Critical care medicine specializes in caring for the most seriously ill patients. These patients are best
treated in an ICU staffed by experienced personnel. ...
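
Conceptually, a similarity retriever with `k=5` embeds the query and returns the five stored chunks whose vectors are closest to it. The sketch below shows that idea with brute-force cosine similarity over toy 3-dimensional vectors. The vectors, the chunk labels, and the `top_k` helper are invented for illustration; FAISS itself uses optimised index structures and may use a different distance metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=5):
    """Return the k (label, score) pairs most similar to query_vec."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": one made-up vector per chunk label.
toy_index = {
    "sepsis": [0.9, 0.1, 0.0],
    "appendicitis": [0.1, 0.9, 0.0],
    "alopecia": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_index, k=2))
```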


### System and User Prompt Template
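
The response function in the next section expects two strings, `qna_system_message` and `qna_user_message_template`, with `{context}` and `{question}` placeholders. A minimal sketch of what they might look like follows. The wording is an assumption modelled on the system prompt used in `rag_answer`, not the author's actual template, and `build_prompt` is a hypothetical helper that mirrors how the response function fills the placeholders.

```python
# Hypothetical templates: the wording is an assumption; only the variable
# names and the {context}/{question} placeholders come from the notebook.
qna_system_message = (
    "You are an assistant answering medical questions strictly from the provided context. "
    "If the answer is not in the context, say 'Not found in the provided context.'"
)

qna_user_message_template = (
    "###Context\n{context}\n\n"
    "###Question\n{question}"
)


def build_prompt(context, question):
    """Fill the user template and prepend the system message."""
    user_message = qna_user_message_template.replace("{context}", context)
    user_message = user_message.replace("{question}", question)
    return qna_system_message + "\n" + user_message


print(build_prompt("Example retrieved chunk text.", "How is sepsis managed?"))
```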


### Response Function

In [63]:
def generate_rag_response(user_input, k=3, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    # Relies on the globals `retriever`, `llm`, `qna_system_message` and
    # `qna_user_message_template` being defined before this function is called.

    # Retrieve the relevant chunks; the retriever returns its configured number
    # of results, so keep only the first k
    relevant_document_chunks = retriever.get_relevant_documents(user_input)[:k]
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine the chunks into a single context string
    context_for_query = ". ".join(context_list)

    # Fill the placeholders in the user message template
    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
        )

        # Extract the generated text from the first choice
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [64]:
# Query 1 — RAG
ans, ctxs = rag_answer(questions[0], k=5, temperature=0.2, top_p=0.9, max_new_tokens=512)
rag_answers.append((ans, ctxs))
print(ans[:600])

According to the context provided, the protocol for managing sepsis in a critical care unit includes the following steps:

1. Suspected sepsis or septic shock should be diagnosed based on signs such as fever, tachycardia, tachypnea, and altered mental status, as well as laboratory findings like leukocytosis or leukopenia.
2. Cultures should be obtained from appropriate specimens for bacterial identification.
3. Empiric antibiotics should be administered to patients with suspected bacteremia.
4. Antibiotic therapy should be adjusted based on culture and susceptibility results, and any abscesses


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [66]:
# Query 2 — RAG
ans, ctxs = rag_answer(questions[1], k=5, temperature=0.2, top_p=0.9, max_new_tokens=512)
rag_answers.append((ans, ctxs))
print(ans[:600])

The common symptoms of appendicitis include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia, which is then followed by pain shifting to the right lower quadrant. The pain increases with cough and motion. Classic signs include right lower quadrant direct and rebound tenderness located at McBurney's point. Other signs include pain felt in the right lower quadrant with palpation of the left lower quadrant (Rovsing sign), an increase in pain from passive extension of the right hip joint, or pain caused by passive internal rotation of the flexed thigh.

Appendiciti


### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [67]:
# Query 3 — RAG
ans, ctxs = rag_answer(questions[2], k=5, temperature=0.2, top_p=0.9, max_new_tokens=512)
rag_answers.append((ans, ctxs))
print(ans[:600])

The context suggests that alopecia areata is a common cause of sudden patchy hair loss. The treatment options for alopecia areata include topical or intralesional corticosteroids, topical minoxidil, topical anthralin, topical immunotherapy (diphencyprone or squaric acid dibutylester), or psoralen plus ultraviolet A (PUVA). Scalp biopsy may be necessary for definitive diagnosis, and daily hair counts can be done to quantify hair loss. The cause of alopecia areata is believed to be an autoimmune disorder affecting genetically susceptible people exposed to unclear environmental triggers. 

Theref


### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [68]:
# Query 4 — RAG
ans, ctxs = rag_answer(questions[3], k=5, temperature=0.2, top_p=0.9, max_new_tokens=512)
rag_answers.append((ans, ctxs))
print(ans[:600])

For a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, the Merck Manual recommends the following treatments:

1. For mild injuries: Discharge and observation.
2. For moderate and severe injuries: Optimization of ventilation, oxygenation, and brain perfusion; treatment of complications such as increased intracranial pressure, seizures, and hematomas; and rehabilitation.
3. Supportive care: Preventing systemic complications due to immobilization, providing good nutrition, and preventing pressure ulcers.

Early intervent


### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [69]:
# Query 5 — RAG
ans, ctxs = rag_answer(questions[4], k=5, temperature=0.2, top_p=0.9, max_new_tokens=512)
rag_answers.append((ans, ctxs))
print(ans[:600])

In the case of a fractured leg during a hiking trip, the person should first be evaluated in the emergency department for any signs of hemorrhagic shock or ischemia due to potential blood loss or nerve damage. If the fracture is stable, initial treatment may include RICE (Rest, Ice, Compression, and Elevation), immobilization with a non-rigid or non-circumferential splint, and pain management with opioids. Definitive treatment for the fracture may involve reduction, which is usually a surgical procedure. Rehabilitation is started as soon as possible after surgery to increase strength and preve


### Fine-tuning

In [70]:
# RAG helper; the query cells above call rag_answer, so this cell must be run before them
def rag_answer(question, k=5, temperature=0.2, top_p=0.9, max_new_tokens=512):
    ctx_docs = retriever.get_relevant_documents(question)[:k]
    ctx = "\n\n".join([d.page_content for d in ctx_docs])
    system = ("Answer strictly from the provided Merck Manual context. "
              "If not in context, say 'Not found in the provided context.' Educational only; not medical advice.")
    user = f"Context:\n{ctx}\n\nQuestion: {question}\n\nAnswer using only the context above."
    prompt = format_chat(system, user)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        out = model.generate(**inputs, temperature=temperature, top_p=top_p, max_new_tokens=max_new_tokens, do_sample=True)
    txt = tokenizer.decode(out[0], skip_special_tokens=True)
    if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
    return txt, ctx_docs

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters, retrieval and generation. The judge scores each answer for groundedness and relevance. We illustrate this evaluation using the answers generated for the questions in the previous section.

- We are using the same Mistral model for evaluation, so the LLM is effectively rating its own performance on the task.

In [71]:
groundedness_rater_system_message  = ""

In [72]:
relevance_rater_system_message = ""

In [73]:
user_message_template = ""

In [74]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(user_input)[:k]  # honour the k argument instead of a hardcoded 3
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [75]:
# Output evaluation — groundedness & relevance
judge_system = ("You are an impartial evaluator. Score using ONLY the provided Context. "
                "Return JSON: groundedness (1-5), relevance (1-5), rationale.")
def judge_answer(question, ctx_docs, candidate_answer, temperature=0.0):
    ctx = "\n\n".join([d.page_content for d in ctx_docs[:5]])
    judge_user = ("Context:\n" + ctx + "\n\nQuestion:\n" + question + "\n\nCandidate Answer:\n" + candidate_answer +
                  "\n\nInstructions:\n1) groundedness: 1-5\n2) relevance: 1-5\n3) one-sentence rationale. Return JSON.")
    prompt = format_chat(judge_system, judge_user)
    inp = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inp, temperature=temperature, top_p=0.9, max_new_tokens=256)
    txt = tokenizer.decode(out[0], skip_special_tokens=True)
    if '[/INST]' in txt: txt = txt.split('[/INST]')[-1].strip()
    return txt

import pandas as pd
eval_rows = []
for (q, (ans, ctxs)) in zip(questions, rag_answers):
    eval_rows.append({"question": q, "judge_result": judge_answer(q, ctxs, ans)})
eval_df = pd.DataFrame(eval_rows)
eval_df

| | question | judge_result |
|---|---|---|
| 0 | What is the protocol for managing sepsis in a ... | `{\n"groundedness": 5,\n"relevance": 5,\n"ratio...` |
| 1 | What are the common symptoms of appendicitis, ... | `{\n"groundedness": 5,\n"relevance": 5,\n"ratio...` |
| 2 | What are the effective treatments or solutions... | `{\n"groundedness": 5,\n"relevance": 5,\n"ratio...` |
| 3 | What treatments are recommended for a person w... | `{\n"groundedness": 5,\n"relevance": 5,\n"ratio...` |
| 4 | What are the necessary precautions and treatme... | `{\n"groundedness": 5,\n"relevance": 5,\n"ratio...` |
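
The `judge_result` column holds the judge's raw text, which is hard to compare across questions. Below is a minimal sketch of turning it into numeric scores, assuming the model returns a JSON object with `groundedness`, `relevance`, and `rationale` keys as the prompt requests. `parse_judge_result` is a hypothetical helper and the sample reply is made up.

```python
import json
import re


def parse_judge_result(text):
    """Parse the JSON object spanning the first '{' to the last '}' in the reply.

    Returns a dict with groundedness, relevance and rationale; all values are
    None when the reply cannot be parsed.
    """
    empty = {"groundedness": None, "relevance": None, "rationale": None}
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if not match:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    return {key: data.get(key) for key in empty}


# Made-up reply in the shape the judge prompt asks for.
sample_reply = (
    'Here is my rating:\n{\n"groundedness": 5,\n"relevance": 4,\n'
    '"rationale": "Supported by the context."\n}'
)
print(parse_judge_result(sample_reply))
```

With the DataFrame above, one option would be `eval_df["judge_result"].apply(parse_judge_result).apply(pd.Series)` to get one column per score.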



## Actionable Insights and Business Recommendations

From working on this project, I realized just how powerful a well-built RAG system can be for simplifying access to medical knowledge. Doctors and healthcare staff deal with a ton of information every day, and this setup basically acts like a smart reference assistant that can pull accurate details straight from trusted sources like The Merck Manual in seconds. Instead of digging through thousands of pages, they get context-specific answers instantly — and that can make a real difference in critical situations.

Another thing that stood out to me is how much prompt engineering and retriever tuning shape the quality of the output. A few adjustments to temperature, top-p, and chunk size completely changed how relevant and clear the answers were. That means if this were scaled into an actual product, tuning these settings would play a huge role in how reliable and “human” it feels.
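
To show what temperature and top-p actually do to the next-token distribution, here is a small sketch using the textbook formulation: divide the logits by the temperature before the softmax, then keep the smallest set of tokens whose cumulative probability reaches `top_p`. The logits are toy values and both helpers are hypothetical; a given library may implement the details differently.

```python
import math


def apply_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalise the surviving tokens; all others get probability 0.
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.2, 1.0):
    probs = apply_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in top_p_filter(probs, 0.9)])
```

At the lower temperature almost all probability sits on the top token, so the filter keeps only that one; at temperature 1.0 more candidates survive, which is where variation between runs comes from.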

Performance-wise, I also found that you don’t need a massive setup to make this useful. Even running Mistral-7B in 4-bit mode on a T4 GPU, the model stayed responsive and consistent. That shows real potential for clinics, hospitals, or research teams that might not have huge compute resources.
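
A rough back-of-the-envelope check of why 4-bit quantisation fits comfortably on a T4: weight memory is approximately the parameter count times the bits per weight. The sketch assumes a nominal 7 billion parameters and a 16 GB card, and it ignores activations, the KV cache, and quantisation overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9      # nominal parameter count (assumption)
GPU_MEMORY_GB = 16  # nominal T4 memory (assumption)

for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB, fits on GPU: {gb < GPU_MEMORY_GB}")
```

Under these assumptions, 16-bit weights alone take about 14 GB and leave little headroom, whereas 4-bit weights take about 3.5 GB and leave most of the card free for the context.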

If I were to recommend next steps for the business, I’d focus on a few areas:

- Pilot it in a real clinical setting: start small with one department and see how it impacts decision speed and confidence.
- Expand the knowledge base: bring in more verified medical sources, drug databases, or even local policy documents to make it more versatile.
- Set up a validation process: have medical reviewers or specialists check the AI’s responses regularly to make sure the system stays accurate and compliant.
- Make it user-friendly: a simple chat or dashboard interface would make it easy for doctors or nurses to get quick answers without leaving their workflow.
- Keep iterating: as feedback comes in, retrain or fine-tune the retriever and prompts to make responses sharper and more consistent.