## Medical RAG Chatbot with PDF Document Retrieval and Safety Guardrails


User Query → Input Guardrails → Document (Full or Section) → LLM Generation → Output Guardrails → User Answer

This implementation loads domain‑specific PDFs, splits them into manageable chunks, and selects the most relevant sections by keyword matching rather than embeddings. It then builds a context‑grounded prompt for a medical RAG chatbot, sends it to an LLM, and passes both the query and the response through guardrail functions before returning the answer.

### Importing the Packages

In [1]:
import os
import glob
import re
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
import gradio as gr

### Loading the LLM Model

In [2]:
MODEL = "gpt-4o-mini"
load_dotenv(override=True)

True

### Loading the Medical PDF Documents

Loads all PDFs from the knowledge-base/ folder. `PyPDFLoader` returns one Document object per page (text + metadata), so each page is tagged with its source filename and all pages are stored in a single list called `documents`.

In [4]:
pdf_files = glob.glob("knowledge-base/*.pdf")
documents = []
for pdf in pdf_files:
    loader = PyPDFLoader(pdf)
    docs = loader.load()
    for d in docs:
        d.metadata["source"] = os.path.basename(pdf)
    documents.extend(docs)

In [5]:
print("Document Names:")
for pdf in pdf_files:
    print(os.path.basename(pdf))

Document Names:
Clinical Case Report.pdf
Drug Information Sheet.pdf
Medical Guideline.pdf
Patient Education Leaflet.pdf
Research Article Summary.pdf


### Splitting Documents into Manageable Chunks for LLM Processing

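This code checks the length of each loaded page and splits it into smaller chunks (3000 characters with 200 overlap) only if it exceeds 3000 characters. Shorter pages are kept as-is. Note that `CharacterTextSplitter` splits on a separator (a blank line by default), so a chunk can still exceed the limit when a long page contains no separator. Finally, the code prints the total number of resulting document sections.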

In [6]:
MAX_CHARS = 3000  # adjust based on LLM token window
splitter = CharacterTextSplitter(chunk_size=MAX_CHARS, chunk_overlap=200)
chunks = []
for doc in documents:
    if len(doc.page_content) > MAX_CHARS:
        chunks.extend(splitter.split_documents([doc]))
    else:
        chunks.append(doc)

print(f"Loaded {len(chunks)} document sections.")

Loaded 10 document sections.
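
To make the chunk size and overlap parameters concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is an illustration only, not how `CharacterTextSplitter` works internally (that class splits on a separator and then merges the pieces); `sliding_chunks` is a hypothetical helper that does not appear in the notebook.

```python
def sliding_chunks(text, chunk_size=3000, overlap=200):
    """Split text into fixed-size windows; consecutive windows share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample = "abcdefghij" * 700  # 7000 characters of dummy text
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])  # → [3000, 3000, 1400]
```

The overlap means the last 200 characters of one chunk are repeated at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.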


### Keyword-Based Document Retrieval Function

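This function selects the most relevant document sections for a given query. It extracts the distinct lowercase words from the query, scores each section by how many of those words occur in its text, and keeps only the sections with a positive score. Finally, it sorts them by score (highest first) and returns the top n results (default 3). Two limitations follow from the implementation: the `kw in text` test matches substrings, so "is" matches "discharged", and common words such as "the" or "of" are not filtered out, so they count toward the score.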

In [7]:
def select_relevant_sections(query, docs, top_n=3):
    query_keywords = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in docs:
        text = doc.page_content.lower()
        score = sum(1 for kw in query_keywords if kw in text)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]
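
Because the function above tests `kw in text`, it matches substrings and counts common words. One possible refinement, sketched below under stated assumptions, is to compare whole-word sets and drop a small stopword list first. The names `STOPWORDS`, `Section`, `tokenize`, and `select_relevant_sections_strict` are hypothetical and not part of the notebook; `Section` only stands in for a LangChain `Document` so the sketch runs on its own.

```python
import re
from dataclasses import dataclass, field

# Illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = {"a", "an", "the", "of", "in", "with", "and", "or", "to", "is", "for", "on"}


@dataclass
class Section:
    """Stand-in for a LangChain Document with the same two attributes."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def select_relevant_sections_strict(query, docs, top_n=3):
    """Score sections by whole-word overlap with the query, ignoring stopwords."""
    keywords = tokenize(query) - STOPWORDS
    scored = []
    for doc in docs:
        score = len(keywords & tokenize(doc.page_content))
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


docs = [
    Section("Metformin is a first-line drug for type 2 diabetes."),
    Section("The patient was discharged in stable condition."),
]
hits = select_relevant_sections_strict("What is the dose of metformin?", docs)
print([d.page_content for d in hits])
```

With this query, only the metformin section is returned. The substring version would also return the second section, because "the" and "is" both occur inside its text.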

### Guardrails

This code defines two simple safety functions.

`is_safe_query` blocks a query only if it contains one of three literal phrases: "hate speech", "violence", or "illegal activity". It does not detect unsafe content expressed in other words.

`filter_llm_response` is intended to compare the LLM’s response against the provided context to detect possible hallucinations (answers not grounded in the context).

Currently, it is a placeholder: it loops over the response sentence by sentence, takes no action when a sentence is missing from the context, and always returns the response unchanged. It could be extended with stricter semantic checks.

In [8]:
def is_safe_query(query):
    unsafe_patterns = ["hate speech", "violence", "illegal activity"]
    return not any(p in query.lower() for p in unsafe_patterns)

def filter_llm_response(response, context):
    # Placeholder grounding check: finds sentences that do not appear verbatim
    # in the context but takes no action yet, so the response is always
    # returned unchanged.
    context_text = " ".join(c.page_content for c in context).lower()
    for sentence in response.split("."):
        if sentence.strip() and sentence.strip().lower() not in context_text:
            pass  # Could add stricter semantic check here
    return response
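
As one way the placeholder above could be tightened, the sketch below flags a sentence when too few of its words occur in the context, and replaces the whole answer with "Not Found" if any sentence is flagged. This is a lexical heuristic under stated assumptions, not a semantic check and not the notebook's method; the function names and the `min_overlap` threshold of 0.6 are hypothetical choices for illustration.

```python
import re


def _tokens(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def ungrounded_sentences(response, context_text, min_overlap=0.6):
    """Return the response sentences whose words are mostly absent from the context."""
    context_tokens = _tokens(context_text)
    flagged = []
    # Naive split: sentence-ending punctuation followed by whitespace,
    # so decimals such as "9.1%" are not broken apart.
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        overlap = len(words & context_tokens) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged


def filter_llm_response_strict(response, context_text, min_overlap=0.6):
    """Block the whole answer if any sentence looks ungrounded."""
    if ungrounded_sentences(response, context_text, min_overlap):
        return "Not Found"
    return response


context = "The patient received aspirin 325 mg and ticagrelor 180 mg."
print(filter_llm_response_strict("The patient received aspirin 325 mg.", context))
print(filter_llm_response_strict("Warfarin was prescribed for six months.", context))
```

The first call returns the response unchanged and the second returns "Not Found". A word-overlap threshold will reject faithful paraphrases and accept wrong answers that reuse context words, so it is only a coarse first filter.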

### Prompt Construction

This function builds a prompt for the LLM by combining the user’s query with selected document chunks. It first formats each document as [source] content and joins them with line breaks. Then it returns a structured instruction telling the model to answer only using the provided context, or say "Not Found" if the answer isn’t available.

In [9]:
def build_prompt(query, context_docs):
    context_text = "\n\n".join(
        [f"[{doc.metadata['source']}] {doc.page_content}" for doc in context_docs]
    )
    return f"""You are an assistant. Use the following document sections to answer the query truthfully. 
If the answer is not in the document, say "Not Found".

[User Query]
{query}

[Document Context]
{context_text}
"""

In [10]:
llm = ChatOpenAI(temperature=0, model_name=MODEL)

### Query Handling and Safe LLM Response Generation

This code defines an `answer_query` function that ties the stages together. It first checks whether the query is safe, then retrieves the most relevant document sections; if none match, it returns "Not Found" without calling the LLM. Otherwise, it builds a prompt from these sections, sends it to the LLM, passes the response through `filter_llm_response`, and returns the result with a feedback prompt appended.

In [11]:
def answer_query(query):
    if not is_safe_query(query):
        return "Query blocked due to unsafe content."

    relevant_sections = select_relevant_sections(query, chunks)
    if not relevant_sections:
        return "Not Found"

    prompt = build_prompt(query, relevant_sections)
    response = llm.invoke(prompt).content
    safe_response = filter_llm_response(response, relevant_sections)
    return safe_response + "\n\n(Feedback: Was this helpful? [Yes/No])"
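
The control flow above can be exercised without an API key by passing each stage in as a callable. The sketch below is a hypothetical variant for testing, not the notebook's code: `answer_query_with`, `fake_sections`, and the lambdas in `stages` are all illustrative stand-ins.

```python
def answer_query_with(query, is_safe, retrieve, generate, check):
    """Same control flow as answer_query, with every stage supplied by the caller."""
    if not is_safe(query):
        return "Query blocked due to unsafe content."
    sections = retrieve(query)
    if not sections:
        return "Not Found"  # no LLM call when nothing was retrieved
    draft = generate(query, sections)
    return check(draft, sections)


# Hypothetical stand-ins for the real retrieval, LLM, and guardrail stages.
fake_sections = {"aspirin": ["Aspirin 325 mg loading dose."]}
stages = dict(
    is_safe=lambda q: "violence" not in q.lower(),
    retrieve=lambda q: fake_sections.get(q.lower(), []),
    generate=lambda q, secs: f"Based on {len(secs)} section(s): {secs[0]}",
    check=lambda draft, secs: draft,
)

result = answer_query_with("aspirin", **stages)
print(result)  # → Based on 1 section(s): Aspirin 325 mg loading dose.
```

Swapping in the real functions for the stand-ins should reproduce the behavior of `answer_query`, minus the appended feedback line.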

### Test Query

In [12]:
query = "Describe the case of Acute Myocardial Infarction in a Patient with Longstanding Type 2 Diabetes Mellitus"
print(answer_query(query))

The case of Acute Myocardial Infarction in a Patient with Longstanding Type 2 Diabetes Mellitus involves a 54-year-old male with a 15-year history of type 2 diabetes and hypertension. He presented with acute-onset central chest pain radiating to the left arm, accompanied by diaphoresis, which began 2 hours prior to his arrival at the hospital. 

Upon examination, his blood pressure was recorded at 146/90 mmHg, and his heart rate was 102 beats per minute. An ECG showed ST-segment elevation in leads V2–V6, and his Troponin-I levels were elevated at 14 ng/mL (reference <0.04). His HbA1c was 9.1%, indicating poor glycemic control.

The diagnosis was confirmed as an acute ST-elevation myocardial infarction (STEMI) affecting the anterior wall. The management included administering a loading dose of aspirin (325 mg) and ticagrelor (180 mg), along with an intravenous heparin infusion. The report emphasizes the importance of early recognition and the use of dual antiplatelet therapy in the mana

### Medical Chatbot

In [13]:
def chat(message, history):
    # Gradio passes the chat history, but each message is answered independently
    response = answer_query(message)  # input guardrail + keyword retrieval + LLM call + output filter
    return response

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
