# RAG for Diabetes Question Answering



Retrieval-augmented generation (RAG) suits the medical domain, where precision is non-negotiable, because each answer can be traced back to a source document. Diabetes is one of the most active areas of clinical research. This system combines established search techniques with a large language model. Designed specifically for diabetes research, it organises retrieved information into clear summaries while maintaining links to the original sources.

# 1. Initialisation and Data collection

Data is collected from ClinicalTrials.gov through its public API. Collection is capped at 10,000 studies to limit computational cost. From each study, the NCT ID, official title, and brief summary are extracted. The NCT ID is kept as metadata so that each summary can be traced back to its trial record.

In [1]:
import requests
import pandas as pd

# API Configuration
base_url = "https://clinicaltrials.gov/api/v2/studies"
params = {
    "query.titles": "Diabetes",
    "pageSize": 100,
    "fields": "protocolSection.identificationModule.nctId," +
              "protocolSection.identificationModule.officialTitle," +
              "protocolSection.descriptionModule.briefSummary"
}

# Data Collection
studies = []
max_studies = 10000  # Cap to limit computational cost. Raising it gives the RAG pipeline more studies to retrieve from.

while len(studies) < max_studies:
    response = requests.get(base_url, params=params)
    if response.status_code != 200:
        print(f"Error: HTTP {response.status_code}")
        break

    data = response.json()
    new_studies = data.get('studies', [])

    if not new_studies:
        break

    studies.extend(new_studies)

    if len(studies) >= max_studies:
        studies = studies[:max_studies]  # Trim in case we exceeded
        break

    if not data.get('nextPageToken'):
        break

    params['pageToken'] = data['nextPageToken']

# Structured Extraction
cleaned_data = []
for study in studies:
    protocol = study.get('protocolSection', {})
    ident = protocol.get('identificationModule', {})
    desc = protocol.get('descriptionModule', {})

    cleaned_data.append({
        "NCT ID": ident.get('nctId'), # Getting the NCT ID
        "Official Title": ident.get('officialTitle'), # Getting the Title
        "Brief Summary": desc.get('briefSummary') # Getting the summary
    })

# Create DataFrame
df = pd.DataFrame(cleaned_data)

# Save to CSV
df.to_csv("diabetes_trials_summaries.csv", index=False)

print(f"Retrieved {len(studies)} studies")
print(df.head())

Retrieved 10000 studies
        NCT ID                                     Official Title  \
0  NCT01916694  A Randomised Pilot Trial to Compare Remote Blo...   
1  NCT05933460  Investigation of the Impact of the Administrat...   
2  NCT01885260  A Double Blind, Placebo-Controlled, Phase 2 St...   
3  NCT06911060  The Effect of Flaxseed Consumption on Biochemi...   
4  NCT03363360  Comprehensive Awareness and Control in Diabete...   

                                       Brief Summary  
0  Diabetes in pregnancy (gestational diabetes) i...  
1  Administration of 4 strain of probiotics, preb...  
2  The purpose of this study is to evaluate the e...  
3  The main goal of this clinical trial is to det...  
4  The study is a multicenter, sub-center contras...  


# 2. Retrieval

This code turns the raw clinical trial records into structured documents: the page content holds the trial's title and summary, and the NCT ID is stored as metadata. Each retrieved document therefore stays linked to its trial ID, which is crucial for medical question answering.

In [2]:
# Example LangChain Document creation
from langchain_core.documents import Document

docs = [
    Document(
        page_content=f"Title: {item['Official Title']}\nSummary: {item['Brief Summary']}",
        metadata={"source": item['NCT ID']}
    )
    for item in cleaned_data
]
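
Because the title and summary are read with `.get()`, a study that lacks either field yields `None`, and the f-string above would then embed the literal text "None" in `page_content`. The following is a minimal sketch of one way to filter such records before building documents. `drop_incomplete` is a hypothetical helper and the sample records use made-up IDs; neither is part of the original notebook.

```python
def drop_incomplete(records, required=("NCT ID", "Official Title", "Brief Summary")):
    """Keep only records whose required fields are present and non-empty strings."""
    kept = []
    for record in records:
        values = [record.get(field) for field in required]
        if all(isinstance(v, str) and v.strip() for v in values):
            kept.append(record)
    return kept


# Made-up sample records for illustration only.
sample = [
    {"NCT ID": "NCT00000001", "Official Title": "Trial A", "Brief Summary": "Summary A"},
    {"NCT ID": "NCT00000002", "Official Title": None, "Brief Summary": "Summary B"},
    {"NCT ID": "NCT00000003", "Official Title": "Trial C", "Brief Summary": "   "},
]
complete = drop_incomplete(sample)
print(len(complete))
```

In the notebook this could be applied as `cleaned_data = drop_incomplete(cleaned_data)` before the list comprehension, assuming that discarding incomplete trials is acceptable for the use case.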

## 2.1 Enhanced Retrieval

This system implements a multi-stage retrieval pipeline to maximize precision and recall when searching diabetes clinical trials. By combining keyword-based retrieval (BM25) with semantic search (neural embeddings) and cross-encoder re-ranking, it addresses key challenges in medical information retrieval.

### 2.1.1 Keyword-based Retrieval

The rank_bm25 library provides exact term matching, prioritizing documents that explicitly contain the query terms. This stage favours precision for protocol-specific queries.

In [3]:
!pip install -q rank_bm25
!pip install -q nltk

In [4]:
import nltk
nltk.download('punkt_tab')  # Needed for BM25 tokenization
from nltk.tokenize import word_tokenize
from rank_bm25 import BM25Okapi

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


In [5]:
# Build BM25 index
bm25_corpus = [word_tokenize(doc.page_content.lower()) for doc in docs]
bm25 = BM25Okapi(bm25_corpus)
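
To make the keyword stage more concrete, here is a small pure-Python sketch of BM25-style scoring on a toy corpus. It is an illustration under assumptions, not the code inside rank_bm25: the function name `bm25_scores`, the toy documents, and the non-negative IDF variant are chosen for this example, and the library's own IDF handling may differ in detail.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenised document against the query with a basic BM25 formula.

    Assumption: uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy corpus for illustration only.
toy_corpus = [
    "metformin in type 2 diabetes".split(),
    "insulin pump therapy in children".split(),
    "flaxseed and blood lipids".split(),
]
toy_scores = bm25_scores("metformin diabetes".split(), toy_corpus)
best = max(range(len(toy_scores)), key=toy_scores.__getitem__)
print(best)
```

Only the first toy document shares terms with the query, so it is the only one with a non-zero score. This is the exact-match behaviour that the dense retriever later complements.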

In [6]:
!pip install -q sentence-transformers

### 2.1.2 Cross Encoder Reranker

ms-marco-MiniLM-L-6-v2 is used to refine retrieved documents. It scores relevance based on full query-document interaction.

In [7]:
from sentence_transformers import CrossEncoder

# Load a cross-encoder model (good default for re-ranking)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


### 2.1.3 Hybrid Retrieval

Hybrid retrieval merges the dense (embedding-based) results with the BM25 results and re-ranks the combined set with the cross-encoder. This lets the pipeline retrieve documents that match the query's exact wording as well as documents that express the same meaning in different words.

In [8]:
def hybrid_retrieval(query, k=5, rerank_top_n=5):
    # Dense results (Chroma vector store, created in Section 3)
    retriever_dense = vectorstore.as_retriever(search_kwargs={"k": k})
    dense_results = retriever_dense.invoke(query)

    # Sparse results (BM25)
    tokenized_query = word_tokenize(query.lower())
    sparse_scores = bm25.get_scores(tokenized_query)
    top_sparse = sorted(enumerate(sparse_scores), key=lambda x: x[1], reverse=True)[:k]
    sparse_results = [docs[i] for i, _ in top_sparse]

    # Combine and deduplicate by source
    combined_docs = {doc.metadata["source"]: doc for doc in (dense_results + sparse_results)}
    combined_list = list(combined_docs.values())

    # Re-rank using cross-encoder
    pairs = [(query, doc.page_content) for doc in combined_list]
    scores = reranker.predict(pairs)

    # Sort by relevance
    ranked = sorted(zip(combined_list, scores), key=lambda x: x[1], reverse=True)
    top_docs = [doc for doc, _ in ranked[:rerank_top_n]]
    return top_docs
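
The merge, deduplicate, and re-rank logic can be tried without any model by swapping in a toy scoring function. The sketch below assumes hits are simple `(source, text)` tuples and uses a word-overlap count as a stand-in for the cross-encoder. `merge_and_rerank`, `overlap_score`, and the sample IDs are hypothetical names for illustration only.

```python
def merge_and_rerank(query, dense_hits, sparse_hits, score_fn, top_n=5):
    """Merge two hit lists, drop duplicate sources, and sort by score_fn(query, text)."""
    merged = {}
    for source, text in dense_hits + sparse_hits:
        merged.setdefault(source, text)  # keep one entry per source ID
    ranked = sorted(merged.items(), key=lambda item: score_fn(query, item[1]), reverse=True)
    return [source for source, _ in ranked[:top_n]]


def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Made-up hits for illustration only.
dense = [("NCT-A", "insulin resistance in adults"), ("NCT-B", "gestational diabetes screening")]
sparse = [("NCT-B", "gestational diabetes screening"), ("NCT-C", "diabetes and insulin resistance")]
top = merge_and_rerank("insulin resistance diabetes", dense, sparse, overlap_score, top_n=2)
print(top)
```

The duplicate `NCT-B` entry is scored once, and the document found only by the sparse list can still rank first after re-scoring, which is the point of combining both retrievers before re-ranking.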

### 2.1.4 Step-back Question




Step-back prompting reformulates the user's question as a higher-level, more general question before retrieval, mirroring the way people step back to first principles when reasoning.

In [9]:
def generate_step_back_question(question: str, client) -> str:
    """Converts specific medical questions to conceptual ones"""
    step_back_prompt = """Analyze this medical question and extract its core physiological or clinical concept:

    Original Question: {question}

    Guidelines:
    1. Identify the overarching biological system
    2. Remove specific drug names or trial references
    3. Focus on mechanisms or principles

    Step-Back Question:"""

    response = client.models.generate_content(
        model=MODEL,
        contents=step_back_prompt.format(question=question)
    )
    return response.text.strip()
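
In the runs shown later, the model's reply to this prompt contains an analysis section before the step-back question itself, and the whole reply is passed to retrieval. One possible refinement, sketched below under the assumption that the reply labels the question with the words "Step-Back Question", is to keep only the text after that label. `extract_step_back_question` is a hypothetical helper, not part of the original notebook.

```python
import re


def extract_step_back_question(reply: str) -> str:
    """Return the text after the last 'Step-Back Question' label, or the whole reply."""
    # \W* also swallows the colon, Markdown asterisks, and newlines after the label.
    parts = re.split(r"step-back question\W*", reply, flags=re.IGNORECASE)
    candidate = parts[-1] if len(parts) > 1 else reply
    return candidate.strip()


sample_reply = (
    "**Analysis:**\n\n"
    "*   **Overarching Biological System:** Endocrine System\n\n"
    "**Step-Back Question:**\n\n"
    "What are the underlying physiological mechanisms that cause "
    "dysregulation of blood glucose homeostasis?"
)
print(extract_step_back_question(sample_reply))
```

If the label is missing, the helper falls back to the full reply, so the retrieval step still receives a usable query.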


## 2.2 Verification

The verification step checks the factual consistency of the response against the retrieved evidence: the model is asked whether every factual claim in the answer is supported by the supplied clinical trial context.

In [10]:
def verify_medical_answer(answer: str, context: str, client) -> tuple[bool, str]:
    """Checks answer against context"""
    verification_prompt = """Verify this medical answer:

    Answer: {answer}
    Context: {context}

    Rules:
    1. Return "TRUE" only if ALL claims are supported
    2. Return "FALSE: [reason]" otherwise

    Judgment:"""

    response = client.models.generate_content(
        model=MODEL,
        contents=verification_prompt.format(answer=answer, context=context)
    )
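    verdict = response.text.strip()
    # Accept only verdicts that say TRUE and never say FALSE; a bare substring
    # test would also accept replies such as "FALSE: the claim is not TRUE".
    is_supported = "TRUE" in verdict.upper() and "FALSE" not in verdict.upper()
    return is_supported, verdict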

In [11]:
!pip install langchain-community


# 3. Vector Store

The embedded representations are indexed and stored in the Chroma vector database. The vector store is populated by embedding the full set of collected documents and is persisted to disk for future reuse.

In [12]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize embeddings (local model)
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")



In [13]:
!pip install chromadb


In [14]:
import os
from langchain_community.vectorstores import Chroma

persist_directory = "db"

# Load existing Chroma vector store if it exists
if os.path.exists(persist_directory) and os.listdir(persist_directory):
    print("Loading existing vector store...")
    vectorstore = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings
    )
else:
    print("Creating new vector store...")
    vectorstore = Chroma.from_documents(
        documents=docs,
        embedding=embeddings,
        persist_directory=persist_directory
    )
    vectorstore.persist()
    print(f"Vector store persisted to '{persist_directory}'")


Creating new vector store...
Vector store persisted to 'db'
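
As a rough picture of what the dense stage does with these embeddings, the sketch below ranks toy vectors by cosine similarity to a query vector. It is an illustration only: the three-dimensional vectors are made up, `top_k_by_cosine` is a hypothetical helper, and the distance metric Chroma actually applies depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    sims = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:k]


# Made-up 3-dimensional "embeddings" for illustration only.
toy_vectors = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
nearest = top_k_by_cosine([0.9, 0.1, 0.0], toy_vectors, k=2)
print(nearest)
```

Real embeddings from `all-MiniLM-L6-v2` have far more dimensions, but the ranking principle is the same: documents whose vectors point in a similar direction to the query are returned first.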




# 4. Generation

The generation component uses Gemini 2.0 Flash and is driven by a controlled prompt. A domain-specific prompt constrains the model's behaviour by defining its role and the expected output. It instructs the model to rely strictly on the retrieved clinical trial data, to cite NCT identifiers when referencing studies, and to organise the output into three sections: summary, key findings, and limitations. The step is combined with step-back query reformulation, cross-encoder re-ranking, and answer verification.

In [15]:

!pip install -q -U google-genai  # Install or update google-genai
!pip install -q -U google-generativeai  # Install or update google-generativeai

from google.colab import userdata
from google import genai

# Set your Google API key (ensure it's stored securely)
GOOGLE_API_KEY = userdata.get('Google_API')
client = genai.Client(api_key=GOOGLE_API_KEY)
MODEL = "gemini-2.0-flash"

In [16]:
def answer_with_gemini(query, client):
    """
    Uses hybrid retrieval (BM25 + vector + reranking) and Gemini to answer the query.
    Additionally, performs step-back question analysis and medical answer verification.
    """
    # Step 1: Generate the "Step-Back" version of the query
    step_back_query = generate_step_back_question(query, client)
    print(f"Step-back query: {step_back_query}")

    # Step 2: Retrieve documents based on the step-back query using hybrid retrieval
    retrieved_docs = hybrid_retrieval(step_back_query, k=8, rerank_top_n=5)

    # Construct context from retrieved documents
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    sources = [doc.metadata["source"] for doc in retrieved_docs]

    # Step 3: Generate an answer with Gemini based on the context
    system_instructions = """
    You are a medical AI assistant. Follow these rules:
    1. Base answers ONLY on the provided clinical trial data.
    2. Cite NCT IDs (e.g., NCT01234567) when referencing trials.
    3. If unsure, say "This requires medical expertise."
    4. Structure responses:
       - Summary of relevant trials
       - Key findings
       - Limitations
    """

    prompt = f"""{system_instructions}

    Context:
    {context}

    Question:
    {query}
    """

    # Step 4: Generate the answer with Gemini
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt
    )

    # Step 5: Verify the generated medical answer against the context
    answer = response.text
    is_verified, verification_message = verify_medical_answer(answer, context, client)

    if is_verified:
        print("Answer Verified: TRUE")
    else:
        print(f"Answer Verified: FALSE. Reason: {verification_message}")

    # Return the generated answer and sources
    return answer, sources
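
Note that the context string above is built from `page_content` alone, which holds only the title and summary, so the NCT IDs stored in metadata are not visible to the model even though the prompt asks it to cite them. The sketch below shows one possible way to prefix each document with its ID. `format_context_with_ids` and the `SimpleDoc` stand-in are hypothetical names used so the example runs on its own; in the notebook the LangChain `Document` objects would be passed instead.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document, so the sketch is self-contained."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context_with_ids(documents):
    """Prefix each document's text with the NCT ID taken from metadata['source']."""
    blocks = []
    for doc in documents:
        nct_id = doc.metadata.get("source", "unknown")
        blocks.append(f"NCT ID: {nct_id}\n{doc.page_content}")
    return "\n\n".join(blocks)


# Made-up documents for illustration only.
example_docs = [
    SimpleDoc("Title: Trial A\nSummary: ...", {"source": "NCT00000001"}),
    SimpleDoc("Title: Trial B\nSummary: ...", {"source": "NCT00000002"}),
]
context_with_ids = format_context_with_ids(example_docs)
print(context_with_ids)
```

Whether this changes the citation behaviour would need to be checked empirically; the sketch only makes the identifiers available in the prompt.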


# 5. Evaluation

This internal evaluation uses the Gemini 2.0 LLM as a judge of the RAG pipeline's answer. The model scores each answer on faithfulness, factuality, completeness, fluency, and citation quality.

In [17]:
def evaluate_generation(query, answer, context, client):
    """
    Evaluates the quality of a generated medical answer using Gemini.
    Returns the raw evaluation output as text.
    """
    evaluation_prompt = f"""
You are a medical evaluation assistant.

Evaluate the quality of the answer generated for the following clinical question based only on the retrieved context.

Question: {query}

Context (retrieved trials):
{context}

Generated Answer:
{answer}

Score each from 0 to 2:
1. Faithfulness: Does the answer stay within the information found in the context?
2. Factuality: Are all claims medically accurate?
3. Completeness: Does the answer fully address the question?
4. Fluency: Is the answer well-written and easy to understand?
5. Proper Citation: Are clinical trial references (e.g., NCT IDs) correctly cited?

Respond ONLY in JSON format:
{{
  "faithfulness": <0|1|2>,
  "factuality": <0|1|2>,
  "completeness": <0|1|2>,
  "fluency": <0|1|2>,
  "citations": <0|1|2>
}}
"""

    eval_response = client.models.generate_content(
        model=MODEL,
        contents=evaluation_prompt
    )

    eval_text = eval_response.text
    print("Evaluation output:\n", eval_text)
    return eval_text
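
`evaluate_generation` returns raw text, and in the runs below the model wraps its JSON in a Markdown code fence. A minimal sketch for turning that text into a dictionary of integer scores follows. `parse_eval_scores` is a hypothetical helper, and it assumes the reply contains exactly one JSON object with the five keys requested in the prompt.

```python
import json
import re

EXPECTED_KEYS = ("faithfulness", "factuality", "completeness", "fluency", "citations")


def parse_eval_scores(eval_text: str) -> dict:
    """Extract the JSON object from the reply and return its scores as integers."""
    # Take everything from the first '{' to the last '}', ignoring any fence markers.
    match = re.search(r"\{.*\}", eval_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in evaluation output")
    scores = json.loads(match.group(0))
    missing = [key for key in EXPECTED_KEYS if key not in scores]
    if missing:
        raise ValueError(f"Missing score keys: {missing}")
    return {key: int(scores[key]) for key in EXPECTED_KEYS}


raw_reply = (
    'json\n{"faithfulness": 2, "factuality": 2, "completeness": 2, '
    '"fluency": 2, "citations": 1}\n'
)
parsed = parse_eval_scores(raw_reply)
print(parsed)
```

With the scores in a dictionary, results from several queries can be collected into a table or averaged, rather than compared by reading printed text.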


# 6. Querying

## 6.1. Query 1: What is diabetes?

Query 1 asks the RAG pipeline a fundamental question: what is diabetes?

In [20]:
user_question = "what is diabetes?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: **Analysis:**

*   **Overarching Biological System:** Endocrine System (specifically relating to glucose regulation)
*   **Focus on Mechanisms/Principles:** The body's inability to properly regulate blood glucose levels. This can stem from either insufficient insulin production (Type 1) or resistance to insulin's effects (Type 2), or other more rare underlying causes.

**Step-Back Question:**

What are the underlying physiological mechanisms that cause dysregulation of blood glucose homeostasis?
Answer Verified: TRUE
Answer: Diabetes is a disease defined by abnormally high blood sugar (glucose) levels (per the "Natural History of Autoimmune Diabetes and Its Complications" study). This occurs because the body either doesn't produce enough insulin or can't effectively use the insulin it produces. Insulin is a hormone produced by the pancreas that allows glucose to enter cells for energy.

**Summary of Relevant Trials**

*   **Natural History of Autoimmune Diabetes and It

In [21]:
def get_context_from_sources(sources, docs):
    # docs is your original docs list from retrieval
    # This function joins content of docs matching the sources list
    selected_docs = [doc for doc in docs if doc.metadata["source"] in sources]
    return "\n\n".join([doc.page_content for doc in selected_docs])

# Example usage:
context_for_eval = get_context_from_sources(sources, docs)

eval_result = evaluate_generation(user_question, answer, context_for_eval, client)

Evaluation output:
 ```json
{
  "faithfulness": 2,
  "factuality": 2,
  "completeness": 2,
  "fluency": 2,
  "citations": 1
}
```


In [22]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "what is diabetes?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: Diabetes is a chronic metabolic disease characterized by elevated levels of blood glucose, which leads over time to serious damage to the heart, blood vessels, eyes, kidneys, and nerves. It occurs when the body either doesn't produce enough insulin or cannot effectively use the insulin it produces.

In simpler terms:

*   **Insulin** is a hormone that acts like a key to let blood sugar into cells for use as energy.
*   **Diabetes** means either your body doesn't make enough insulin (or any at all), or your body can't use the insulin it does make very well.
*   This causes **blood sugar** to build up in your bloodstream.
*   High blood sugar over a long time can cause **health problems.**

There are different types of diabetes, with the most common being type 1, type 2, and gestational diabetes.



## 6.2. Query 2: What is GLP-1 and why are GLP-1 agonists effective?

Query 2 asks the RAG pipeline a more in-depth question about a specific hormone and a class of diabetes medication.

In [23]:
user_question = "what is GLP1 and why are GLP1 agonists effective?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: Here's an analysis following the guidelines:

**1. Overarching Biological System:**

*   Glucose regulation and incretin system (specifically related to glucose-dependent insulin secretion)

**2. Drug Name Removal:**

*   The question asks about the substance itself (GLP1) and *the class of drugs that act on it*, so removing a specific drug name would still leave the core concept intact.

**3. Core Physiological/Clinical Concept (Mechanism/Principle Focus):**

*   **GLP-1 is a hormone that enhances insulin secretion in a glucose-dependent manner, suppresses glucagon secretion, slows gastric emptying, and promotes satiety, thereby contributing to blood glucose control.**

**Step-Back Question:**

*   How does the body naturally regulate blood glucose levels in response to food intake, and what hormonal pathways are involved in this process?
Answer Verified: TRUE
Answer: GLP-1 (glucagon-like peptide-1) is a gastrointestinal hormone that has insulinotrophic and glucagonos

In [24]:

context_for_eval = get_context_from_sources(sources, docs)

eval_result = evaluate_generation(user_question, answer, context_for_eval, client)

Evaluation output:
 ```json
{
  "faithfulness": 2,
  "factuality": 2,
  "completeness": 2,
  "fluency": 2,
  "citations": 1
}
```


In [25]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "what is GLP1 and why are GLP1 agonists effective?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: ## GLP-1 and Why GLP-1 Agonists are Effective

**What is GLP-1?**

GLP-1 stands for **Glucagon-like Peptide-1**. It is a naturally occurring incretin hormone produced in the small intestine in response to food intake.  Incretins are hormones that stimulate insulin release after eating.  GLP-1 plays a crucial role in regulating blood glucose levels.

**Key actions of GLP-1:**

*   **Stimulates Insulin Release:**  GLP-1 enhances insulin secretion from the pancreas in a glucose-dependent manner. This means it only stimulates insulin release when blood sugar levels are elevated.  This reduces the risk of hypoglycemia (low blood sugar).

*   **Suppresses Glucagon Secretion:** GLP-1 inhibits the release of glucagon, a hormone that raises blood sugar levels by stimulating the liver to release stored glucose.

*   **Slows Gastric Emptying:**  GLP-1 slows down the rate at which food empties from the stomach into the small intestine. This promotes a feeling of fullness (s

## 6.3. Query 3: How is RAG different from a standard LLM?

Query 3 deliberately asks a question that concerns neither clinical trials nor diabetes, to see how the pipeline reacts to an out-of-scope query.

In [26]:
user_question = "how is RAG different to standard LLM?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: **Step-Back Question:** How does the physiological process of incorporating new information into existing knowledge differ between a biological immune system and an artificial neural network?
Answer Verified: FALSE. Reason: It is impossible to determine what question the answer refers to. Therefore, I cannot verify the medical answer.

FALSE: The question is missing.

Answer: This question cannot be answered from the provided clinical trial data.

['NCT02801942', 'NCT00896610', 'NCT06985862', 'NCT01907399', 'NCT06280729']


In [27]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "how is RAG different to standard LLM?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: RAG (Retrieval-Augmented Generation) differs significantly from standard LLMs (Large Language Models) in how they generate responses. Here's a breakdown:

**Standard LLM (Without RAG):**

*   **Relies on Pre-trained Knowledge:** Standard LLMs are trained on massive datasets and store knowledge within their parameters. When you ask a question, they use this pre-existing knowledge to generate an answer.
*   **Limited to Training Data:** The LLM's knowledge is limited to what it learned during its training phase. It can't access or incorporate new information after training is complete.
*   **Potential for "Hallucination":** Because LLMs are predicting the next word in a sequence, they can sometimes generate incorrect or nonsensical information (hallucinations), especially when asked about topics outside their training data or for very specific, niche knowledge.
*   **Difficult to Update Knowledge:** Updating the knowledge of a standard LLM requires re-training it 