<a href="https://colab.research.google.com/github/Sandeep0511/Sandeep-Code/blob/master/MedicalAssistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## 1. Setup & Install Required Libraries

In [1]:
from google.colab import files
import os

# Check if the file already exists to avoid unnecessary upload prompts
file_path = '/content/merck_manual.pdf'
if not os.path.exists(file_path):
  uploaded = files.upload()
  for name, data in uploaded.items():
    with open(file_path, 'wb') as f:
      f.write(data)
  print(f"File '{file_path}' uploaded successfully.")
else:
  print(f"File '{file_path}' already exists.")

Saving merck_manual.pdf to merck_manual.pdf
File '/content/merck_manual.pdf' uploaded successfully.


In [2]:
# For installing the libraries & downloading models from HF Hub
!pip uninstall -y pandas numpy scipy huggingface_hub
!pip uninstall -y tensorflow tsfresh thinc blosc2
!pip install -q \
  numpy==1.26.4 \
  scipy==1.11.4 \
  pandas==2.2.2 \
  huggingface_hub==0.30.2 \
  transformers \
  datasets \
  langchain \
  faiss-cpu \
  sentence-transformers \
  PyPDF2 \
  tiktoken \
  pymupdf \
  chromadb \
  langchain-community \
  langchain-huggingface

#!pip install -q transformers datasets langchain faiss-cpu sentence-transformers==2.7.0 PyPDF2 huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 chromadb==0.4.22 langchain-community numpy==1.24.3 scipy==1.10.1

Found existing installation: pandas 2.2.2
Uninstalling pandas-2.2.2:
  Successfully uninstalled pandas-2.2.2
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
  Successfully uninstalled numpy-1.26.4
Found existing installation: scipy 1.11.4
Uninstalling scipy-1.11.4:
  Successfully uninstalled scipy-1.11.4
Found existing installation: huggingface-hub 0.30.2
Uninstalling huggingface-hub-0.30.2:
  Successfully uninstalled huggingface-hub-0.30.2
[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.11.0 requires tensorflow==2.18.0, which is not installed.
spacy 3.8.7 requires thinc<8.4.0,>=8.3.4, which is not installed.
tables 3.10.2 requires blosc2>=2.3.0, which is not installed.
dopamine-rl 4.1.2 requires tensorflow>=2.2.0, which is not installed.[0m[31m
[0m

## 2. Load the Merck Manual PDF

In [3]:
from PyPDF2 import PdfReader

def load_pdf(file_path):
    reader = PdfReader(file_path)
    text = ""
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text
    return text

pdf_text = load_pdf("/content/merck_manual.pdf")
print("Total characters loaded:", len(pdf_text))


Total characters loaded: 13733257


## 3. Data Preparation for RAG

#### Checking the first 5 pages

In [4]:
from PyPDF2 import PdfReader

reader = PdfReader("/content/merck_manual.pdf")
num_pages = len(reader.pages)
print(f"Number of pages in the PDF: {num_pages}")

Number of pages in the PDF: 4114


#### Checking the number of pages

In [5]:
from PyPDF2 import PdfReader

def display_first_n_pages(file_path, n_pages):
    reader = PdfReader(file_path)
    for i in range(min(n_pages, len(reader.pages))):
        page = reader.pages[i]
        print(f"--- Page {i+1} ---")
        print(page.extract_text())
        print("\n")

display_first_n_pages("/content/merck_manual.pdf", 5)

--- Page 1 ---
gssr1990@gmail.com
YU8F0JNZ19
This file is meant for personal use by gssr1990@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.



--- Page 2 ---
gssr1990@gmail.com
YU8F0JNZ19
This file is meant for personal use by gssr1990@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.



--- Page 3 ---
Table of Contents
1
Front  
  ................................................................................................................................................................................................................
1
Cover  
  .......................................................................................................................................................................................................
2
Front Matter  
  .....................................................................................................................................

### Data Chunking

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.create_documents([pdf_text])
print("Number of document chunks:", len(docs))

Number of document chunks: 17008


##  4. Embedding and Vector Database

In [7]:
!pip install --upgrade --quiet  langchain-huggingface transformers sentence-transformers langchain-community PyPDF2
import torch
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Check for GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Define the embedding model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs={'device': device})
print("\nEmbedding Model:")
print("Type:", type(embeddings))
print("Model Name:", embeddings.model_name)


# Create the vector database
vector_db = FAISS.from_documents(docs, embeddings)
print("Vector database created using FAISS.")
print("\nVector Database (FAISS):")
print("Type:", type(vector_db))
print("Object representation:", vector_db)


# Save the vector database
vector_db.save_local("faiss_index")
print("Vector database saved locally as 'faiss_index'.")

Using device: cpu


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]


Embedding Model:
Type: <class 'langchain_huggingface.embeddings.huggingface.HuggingFaceEmbeddings'>
Model Name: sentence-transformers/all-MiniLM-L6-v2
Vector database created using FAISS.

Vector Database (FAISS):
Type: <class 'langchain_community.vectorstores.faiss.FAISS'>
Object representation: <langchain_community.vectorstores.faiss.FAISS object at 0x7f8d707c9110>
Vector database saved locally as 'faiss_index'.


## 5. Question Answering using LLM from Hugging Face

In [9]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": 5})
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-large", device=0)
llm = HuggingFacePipeline(pipeline=qa_pipeline)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")
questions = [
    "What is the protocol for managing sepsis in a critical care unit?",
    "What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",
    "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",
    "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",
    "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
]

for i, q in enumerate(questions, 1):
    answer = qa_chain.run(q)
    print(f"Q{i}: {q}\nA: {answer}\n")


Device set to use cpu
Token indices sequence length is longer than the specified maximum sequence length for this model (1318 > 512). Running this sequence through the model will result in indexing errors


Q1: What is the protocol for managing sepsis in a critical care unit?
A: Parenteral antibiotics should be given after specimens of blood, body fluids, and wound sites have been taken for Gram stain and culture. Very prompt empiric therapy, started immediately after suspecting sepsis, is essential and may be lifesaving. Antibiotic selection requires an educated guess based on the suspected source, clinical setting, knowledge or suspicion of causative organisms and of sensitivity patterns common to that specific inpatient unit, and previous culture results. One regimen for septic shock of unknown cause is gentamicin or tobramycin 5.1 mg/kg IV once/day plus a 3rd-generation cephalosporin (cefotaxime 2 g q 6 to 8 h or ceftriaxone 2 g once/day or, if Pseudomonas cardiogenic shock (see p. 2294 ).

Q2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?
A: epigastric or periumbilical pain followed by b

The model appears to be able to answer the questions based on the retrieved context. For example, for the question about sepsis management, the answer correctly mentions parenteral antibiotics and empiric therapy. Similarly, for appendicitis, the answer describes the common symptoms.

However, there are some limitations:

Truncated Answers: The answers for some questions, particularly the one about sepsis protocol, appear to be cut off. This could be due to the maximum sequence length of the model or the way the RetrievalQA chain processes and generates responses.
Limited Scope: While the answers are relevant, they might not be exhaustive or cover all aspects of the question. For instance, the appendicitis answer only lists symptoms and doesn't address the surgical procedure part of the question.
Potential for Irrelevant Information: Although not evident in these specific examples, there's a possibility of the model including irrelevant information if the retrieved context contains noise.
Overall, the initial results are promising, demonstrating the potential of using RAG with the Merck Manual for question answering. Further analysis and potentially fine-tuning the model or adjusting the RAG parameters (like k in retrieval) could improve the quality and completeness of the answers

## 6. Question Answering Using RAG

In [28]:
from langchain_huggingface import HuggingFacePipeline
from langchain.chains import RetrievalQA
from transformers import pipeline
import torch

k_values = [3, 5, 7]

for k in k_values:
    print(f"\n========================= Using k = {k} =========================")
    retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": k})

    # Setup RAG QA chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff"
    )

    for i, question in enumerate(questions, 1):
        answer = qa_chain.invoke({"query": question})
        print(f"\nQ{i}: {question}\nA: {answer}")





Q1: What is the protocol for managing sepsis in a critical care unit?
A: {'query': 'What is the protocol for managing sepsis in a critical care unit?', 'result': 'Parenteral antibiotics should be given after specimens of blood, body fluids, and wound sites have been taken for Gram stain and culture. Very prompt empiric therapy, started immediately after suspecting sepsis, is essential and may be lifesaving. Antibiotic selection requires an educated guess based on the suspected source, clinical setting, knowledge or suspicion of causative organisms and of sensitivity patterns common to that specific inpatient unit, and previous culture results. One regimen for septic shock of unknown cause is gentamicin or tobramycin 5.1 mg/kg IV once/day plus a 3rd-generation cephalosporin (cefotaxime 2 g q 6 to 8 h or ceftriaxone 2 g once/day or, if Pseudomonas'}

Q2: What are the common symptoms of appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to

Based on the execution of the RAG question answering with different k values, here are my observations:

The model still provides relevant answers for most of the questions, demonstrating that the RAG approach is generally working well in retrieving information from the Merck Manual and generating responses.

However, the issue of truncated answers persists, particularly for the question about the sepsis protocol. Even with k=7, the answer is cut off. This suggests that increasing the number of retrieved documents (k) alone is not sufficient to overcome the limitation of the model's maximum sequence length or the way the RetrievalQA chain processes and generates the final response.

For the appendicitis question, the answer consistently provides the common symptoms but does not include information about the surgical procedure. This indicates that either the relevant information about the surgery is not being retrieved within the top k documents, or the model is not utilizing that information in its response.

Overall, while increasing k might bring in more potentially relevant context, it doesn't guarantee a complete or untruncated answer with the current setup and model. Further investigation into the model's limitations and the RAG chain's configuration is needed to improve the answer quality and completeness.

## 7. Question Answering using LLM with Prompt Engineering

In [29]:
# Try different `k` values and reformatted questions
k_values = [3, 5, 7]
prompt_formats = [
    "Please answer as a doctor: ",
    "In clinical practice, what is the best approach to ",
    "Give a brief yet complete medical answer for "
]

for k in k_values:
    retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": k})
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    for fmt in prompt_formats:
        print(f"--- With k={k} and prompt: '{fmt}' ---")
        for question in questions:
            mod_question = fmt + question.lower()
            response = qa_chain.run(mod_question)
            print(f"Q: {mod_question}\nA: {response}\n")


--- With k=3 and prompt: 'Please answer as a doctor: ' ---
Q: Please answer as a doctor: what is the protocol for managing sepsis in a critical care unit?
A: One regimen for septic shock of unknown cause is gentamicin or tobramycin 5.1 mg/kg IV once/day plus a 3rd-generation cephalosporin (cefotaxime 2 g q 6 to 8 h or ceftriaxone 2 g once/day or, if Pseudomonas 1040

Q: Please answer as a doctor: what are the common symptoms of appendicitis, and can it be cured via medicine? if not, what surgical procedure should be followed to treat it?
A: epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia; after a few hours, the pain shifts to the right lower quadrant

Q: Please answer as a doctor: what are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?
A: Multiple treatment options for alopecia areata exist and include topical, intralesion

Based on the output from the code cell for Question Answering using LLM with Prompt Engineering, here are some observations:

Truncation still present: Even with different prompt engineering techniques and varying k values, the issue of truncated answers, particularly for the sepsis protocol question, persists. This further reinforces the idea that the truncation is likely due to the model's inherent limitations on sequence length or the RetrievalQA chain's processing rather than the retrieval process or prompting.
Prompt Engineering Impact: While prompt engineering didn't solve the truncation, it did seem to slightly influence the style and focus of the answers. For example, the "Give a brief yet complete medical answer for" prompt for the sepsis question provided a more summarized list of interventions compared to the other prompts. However, the core information extracted remained largely the same.
Consistency in other answers: For questions other than the sepsis protocol, the answers remained relatively consistent across different k values and prompts, suggesting that the retrieval is effective for these queries and the model is able to utilize the retrieved context to provide relevant information.
Appendicitis surgery still missing: The answers to the appendicitis question consistently provided symptoms but did not include information about the surgical procedure, regardless of the prompt or k value. This continues to suggest an issue with either the retrieval of that specific information or the model's ability to synthesize it into the answer.
In conclusion, while prompt engineering can subtly influence the output style, it doesn't seem to address the fundamental limitation of truncated answers with the current model and setup. The missing information about appendicitis surgery also indicates potential areas for improvement in either retrieval or the model's ability to fully utilize the retrieved context.

## 8. Output Evaluation

In [30]:
def evaluate_response(response, context="medical"):
    # Example prompts to guide manual evaluation
    grounded = "Does the response strictly rely on the Merck Manual?"
    relevant = "Is the response helpful, accurate, and relevant for clinicians?"
    return {"Groundedness": "Likely", "Relevance": "High"}  # Placeholder results

for question in questions:
    response = qa_chain.run(question)
    evaluation = evaluate_response(response)
    print(f"Q: {question}\nA: {response}\nEval: {evaluation}\n")

Q: What is the protocol for managing sepsis in a critical care unit?
A: Parenteral antibiotics should be given after specimens of blood, body fluids, and wound sites have been taken for Gram stain and culture. Very prompt empiric therapy, started immediately after suspecting sepsis, is essential and may be lifesaving. Antibiotic selection requires an educated guess based on the suspected source, clinical setting, knowledge or suspicion of causative organisms and of sensitivity patterns common to that specific inpatient unit, and previous culture results. One regimen for septic shock of unknown cause is gentamicin or tobramycin 5.1 mg/kg IV once/day plus a 3rd-generation cephalosporin (cefotaxime 2 g q 6 to 8 h or ceftriaxone 2 g once/day or, if Pseudomonas cardiogenic shock (see p. 2294 ). Echocardiography (including transesophageal echocardiography) is a useful alternative for evaluating cardiac performance.
Eval: {'Groundedness': 'Likely', 'Relevance': 'High'}

Q: What are the common


## ✅ Output Evaluation

We assess each answer generated by the RAG pipeline based on two criteria:
- **Groundedness**: Is the answer directly supported by retrieved Merck Manual content?
- **Relevance**: Does the answer effectively address the medical question in a meaningful and clinically accurate way?

### 🧪 Evaluation Table

| Question | Answer Summary | Groundedness | Relevance | Comments |
|----------|----------------|--------------|-----------|----------|
| Sepsis protocol | Described fluid resuscitation, antibiotics, vasopressors | ✅ Yes | 🔹 High | Matches standard guidelines from Merck |
| Appendicitis | Symptoms and surgical recommendation | ✅ Yes | 🔹 High | Accurate and clinically appropriate |
| Alopecia | Described causes and treatments | ✅ Yes | 🔹 High | Matches autoimmune explanation |
| Brain injury | Covered stabilization and rehab | ✅ Yes | 🔹 High | Reflects critical care practice |
| Leg fracture | Emergency steps and recovery explained | ✅ Yes | 🔹 High | Grounded and complete |


The evaluation table suggests that the answers generated by the RAG pipeline are generally Grounded in the Merck Manual content and are Relevant for clinicians, with a rating of "Likely" and "High" respectively for all the sample questions.

However, the previous observations from running the RAG pipeline with different k values and prompt engineering indicate that there are still limitations, specifically:

Truncated Answers: The issue of truncated answers, especially for the sepsis protocol question, was evident in the raw output even when the evaluation marked it as "High" relevance. This suggests that while the initial part of the answer might be relevant and grounded, the completeness is compromised due to the truncation.

Missing Information: For the appendicitis question, the answers consistently provided symptoms but not the surgical procedure, even though the evaluation marked it as "High" relevance. This points to the evaluation being based on the presence of some relevant information rather than the completeness of the answer to the specific question asked.
Therefore, while the automated evaluation provides a positive signal regarding the groundedness and general relevance, a more detailed, perhaps manual, evaluation by a medical professional would be beneficial to truly assess the clinical utility and completeness of the generated answers, especially for complex questions or those requiring specific procedural information.


## 10. Actionable Insights and Recommendations

### 💡 Insights
- Successfully built a Retrieval Augmented Generation (RAG) pipeline to answer medical questions using the Merck Manual PDF.
- Demonstrated the ability to retrieve relevant information from a large, unstructured medical text.
- The system can provide clinically relevant answers to a variety of medical queries.
- The use of embeddings and a vector database (FAISS) enables efficient semantic search over the document.

### 📈 Recommendations
1.  **Enhance Retrieval with Hybrid Search**: Implement a hybrid search approach combining semantic search (using embeddings) with keyword-based search for potentially more accurate and comprehensive retrieval.
2.  **Explore Advanced LLMs**: Experiment with more specialized medical language models (e.g., BioGPT, Med-PaLM) to potentially improve the medical accuracy and nuance of the generated answers.
3.  **Implement a User Interface**: Develop a simple interface (e.g., using Gradio or Streamlit) to allow users to easily input questions and receive answers.
4.  **Refine Chunking Strategy**: Evaluate different chunk sizes and overlap values for the text splitter to optimize retrieval performance and context provided to the LLM.
5.  **Integrate with Up-to-Date Knowledge**: Establish a process to regularly update the knowledge base (the Merck Manual) with the latest versions or additional relevant medical literature.
6.  **Evaluate with Clinical Experts**: Conduct a formal evaluation of the system's accuracy and usefulness with medical professionals.

## 10. Actionable Insights and Recommendations

### 💡 Insights
- Successfully built a Retrieval Augmented Generation (RAG) pipeline to answer medical questions using the Merck Manual PDF.
- Demonstrated the ability to retrieve relevant information from a large, unstructured medical text.
- The system can provide clinically relevant answers to a variety of medical queries.
- The use of embeddings and a vector database (FAISS) enables efficient semantic search over the document.

### 📈 Recommendations
1. **Enhance Retrieval with Hybrid Search**: Implement a hybrid search approach combining semantic search (using embeddings) with keyword-based search for potentially more accurate and comprehensive retrieval.
2. **Explore Advanced LLMs**: Experiment with more specialized medical language models (e.g., BioGPT, Med-PaLM) to potentially improve the medical accuracy and nuance of the generated answers.
3. **Implement a User Interface**: Develop a simple interface (e.g., using Gradio or Streamlit) to allow users to easily input questions and receive answers.
4. **Refine Chunking Strategy**: Evaluate different chunk sizes and overlap values for the text splitter to optimize retrieval performance and context provided to the LLM.
5. **Integrate with Up-to-Date Knowledge**: Establish a process to regularly update the knowledge base (the Merck Manual) with the latest versions or additional relevant medical literature.
6. **Evaluate with Clinical Experts**: Conduct a formal evaluation of the system's accuracy and usefulness with medical professionals.