# RAG for Diabetes Question Answering



Retrieval-augmented generation (RAG) suits the medical domain, where precision is non-negotiable and every answer should be traceable to evidence. Diabetes is one of the most active areas of clinical research. This system combines established search techniques with a modern large language model. Designed for diabetes research questions, it organises retrieved information into clear summaries while maintaining links to the original sources.

# 1. Initialisation and Data collection

Data is collected from ClinicalTrials.gov through its public API (v2). Collection is capped at 1,000 studies to minimise computational cost. For each study, the NCT ID, official title, and brief summary are extracted. The NCT ID is kept as metadata so that every summary can be traced back to its trial.

In [None]:
import requests
import pandas as pd

# API Configuration
base_url = "https://clinicaltrials.gov/api/v2/studies"
params = {
    "query.titles": "Diabetes",
    "pageSize": 100,
    "fields": "protocolSection.identificationModule.nctId," +
              "protocolSection.identificationModule.officialTitle," +
              "protocolSection.descriptionModule.briefSummary"
}

# Data Collection
studies = []
max_studies = 1000  # Cap on collected studies to limit computational cost; raising it gives the retriever a larger corpus to search.

while len(studies) < max_studies:
    response = requests.get(base_url, params=params, timeout=30)  # fail instead of hanging on a stalled connection
    if response.status_code != 200:
        print(f"Error: HTTP {response.status_code}")
        break

    data = response.json()
    new_studies = data.get('studies', [])

    if not new_studies:
        break

    studies.extend(new_studies)

    if len(studies) >= max_studies:
        studies = studies[:max_studies]  # Trim in case we exceeded
        break

    if not data.get('nextPageToken'):
        break

    params['pageToken'] = data['nextPageToken']

# Structured Extraction
cleaned_data = []
for study in studies:
    protocol = study.get('protocolSection', {})
    ident = protocol.get('identificationModule', {})
    desc = protocol.get('descriptionModule', {})

    cleaned_data.append({
        "NCT ID": ident.get('nctId'), # Getting the NCT ID
        "Official Title": ident.get('officialTitle'), # Getting the Title
        "Brief Summary": desc.get('briefSummary') # Getting the summary
    })

# Create DataFrame
df = pd.DataFrame(cleaned_data)

# Save to CSV
df.to_csv("diabetes_trials_summaries.csv", index=False)

print(f"Retrieved {len(studies)} studies")
print(df.head())

Retrieved 1000 studies
        NCT ID                                     Official Title  \
0  NCT05484427           Kids Diabetes Telemedicine Study (KITES)   
1  NCT05208827  A Multicenter Randomized Controlled Study of V...   
2  NCT04616027  A PHASE 1, OPEN-LABEL, SINGLE-DOSE, PARALLEL G...   
3  NCT03977727  An Exploratory, Single-center, Randomized, Ope...   
4  NCT04073927  The Butyful Study. Effect of Butyrate on Infla...   

                                       Brief Summary  
0  Randomised prospective single-center clinical ...  
1  This study was a double-blind multicenter rand...  
2  This study will characterize the effect of var...  
3  This is an exploratory, single-center, randomi...  
4  The objective is to assess the impact of 12 we...  


# 2. Retrieval

This step wraps the raw clinical trial records in structured documents: the page content holds the trial's title and summary, and the NCT ID is stored as metadata. Every retrieved document therefore stays linked to its trial ID, which is crucial for question answering in medical research.

In [None]:
# Example LangChain Document creation
from langchain_core.documents import Document

docs = [
    Document(
        page_content=f"Title: {item['Official Title']}\nSummary: {item['Brief Summary']}",
        metadata={"source": item['NCT ID']}
    )
    for item in cleaned_data
]
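
Because the extraction step uses `.get`, a study without an official title or brief summary yields `None`, and the f-string above would then embed the literal text "None" in the page content. Whether such records occur in this dataset is not shown, so the following is only a minimal sketch of one way to screen them out first. The records and the helper `has_text` are hypothetical and not part of the original notebook.

```python
# Toy records shaped like the entries of `cleaned_data`; all values are made up.
records = [
    {"NCT ID": "NCT00000001", "Official Title": "Example trial A", "Brief Summary": "Summary A."},
    {"NCT ID": "NCT00000002", "Official Title": None, "Brief Summary": "Summary B."},
    {"NCT ID": "NCT00000003", "Official Title": "Example trial C", "Brief Summary": None},
]


def has_text(record, fields=("Official Title", "Brief Summary")):
    """Return True when every listed field is a non-empty string."""
    return all(isinstance(record.get(f), str) and record[f].strip() for f in fields)


# Keep only records that can produce a meaningful page_content.
usable = [r for r in records if has_text(r)]
print(len(usable), "of", len(records), "records have both a title and a summary")
```

Dropping incomplete records is one option; substituting an empty string for the missing field would keep the trial searchable by whichever field is present.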

## 2.1 Enhanced Retrieval

This system implements a multi-stage retrieval pipeline to improve precision and recall when searching diabetes clinical trials. It combines keyword-based retrieval (BM25) with semantic search (neural embeddings) and cross-encoder re-ranking, so that both exact terminology and paraphrased descriptions of the same concept can be matched.

### 2.1.1 Keyword-based Retrieval

The `rank_bm25` library is used for exact term matching: it prioritises documents that explicitly contain the query terms. This favours precision for protocol-specific queries.

In [None]:
!pip install -q rank_bm25
!pip install -q nltk

In [None]:
import nltk
nltk.download('punkt_tab')  # Needed for BM25 tokenization
from nltk.tokenize import word_tokenize
from rank_bm25 import BM25Okapi

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


In [None]:
# Build BM25 index
bm25_corpus = [word_tokenize(doc.page_content.lower()) for doc in docs]
bm25 = BM25Okapi(bm25_corpus)
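
To make the scoring behind the index more concrete, the sketch below computes Okapi BM25 scores in plain Python over a toy corpus. It is an illustration under stated assumptions, not the `rank_bm25` implementation: the function `bm25_scores` is hypothetical, `k1=1.5` and `b=0.75` are commonly used values, and the IDF uses the non-negative `log(... + 1)` variant, so the numbers need not match what `BM25Okapi` returns.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalisation: long documents need more matches to score as high.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Made-up corpus, already lowercased and whitespace-tokenized.
corpus = [
    "metformin in type 2 diabetes".split(),
    "insulin pump therapy for type 1 diabetes".split(),
    "exercise and blood pressure in adults".split(),
]
scores = bm25_scores("metformin diabetes".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

The rare term ("metformin") contributes more than the common one ("diabetes"), which is why exact drug or protocol names tend to dominate BM25 rankings.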

In [None]:
!pip install -q sentence-transformers

### 2.1.2 Cross Encoder Reranker

`cross-encoder/ms-marco-MiniLM-L-6-v2` is used to re-rank the retrieved documents. It scores each query-document pair jointly, so relevance reflects the full interaction between the query and the document text.

In [None]:
from sentence_transformers import CrossEncoder

# Load a cross-encoder model (good default for re-ranking)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


### 2.1.3 Hybrid Retrieval

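Hybrid retrieval merges the dense (embedding-based) results with the BM25 results and re-ranks the combined set with the cross-encoder. This lets the pipeline retrieve documents that match the query's exact wording as well as documents that express the same meaning in different words. The function relies on the `vectorstore` object created in Section 3, which must exist before the function is called.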

In [None]:
def hybrid_retrieval(query, k=5, rerank_top_n=5):
    # Dense results from the vector store (Chroma, built in Section 3)
    retriever_dense = vectorstore.as_retriever(search_kwargs={"k": k})
    dense_results = retriever_dense.invoke(query)

    # Sparse results (BM25)
    tokenized_query = word_tokenize(query.lower())
    sparse_scores = bm25.get_scores(tokenized_query)
    top_sparse = sorted(enumerate(sparse_scores), key=lambda x: x[1], reverse=True)[:k]
    sparse_results = [docs[i] for i, _ in top_sparse]

    # Combine and deduplicate by source
    combined_docs = {doc.metadata["source"]: doc for doc in (dense_results + sparse_results)}
    combined_list = list(combined_docs.values())

    # Re-rank using cross-encoder
    pairs = [(query, doc.page_content) for doc in combined_list]
    scores = reranker.predict(pairs)

    # Sort by relevance
    ranked = sorted(zip(combined_list, scores), key=lambda x: x[1], reverse=True)
    top_docs = [doc for doc, _ in ranked[:rerank_top_n]]
    return top_docs
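
The merge-and-deduplicate step can be seen in isolation with toy data. In the sketch below, `Doc` is a hypothetical stand-in for a LangChain `Document`, the NCT IDs are made up, and `overlap_score` is a deliberately crude stand-in scorer (shared-token count), not a cross-encoder. Only the shape of the pipeline is illustrated: key by NCT ID, score each pair, sort.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


dense_results = [
    Doc("trial A", {"source": "NCT00000001"}),
    Doc("trial B", {"source": "NCT00000002"}),
]
sparse_results = [
    Doc("trial B", {"source": "NCT00000002"}),
    Doc("trial C", {"source": "NCT00000003"}),
]

# The NCT ID is the dict key, so a trial returned by both retrievers appears once.
combined = {doc.metadata["source"]: doc for doc in dense_results + sparse_results}
candidate_ids = list(combined)


def overlap_score(query, text):
    """Stand-in scorer: number of shared lowercase tokens. NOT a cross-encoder."""
    return len(set(query.lower().split()) & set(text.lower().split()))


query = "trial C"
ranked = sorted(
    combined.values(),
    key=lambda d: overlap_score(query, d.page_content),
    reverse=True,
)
print(candidate_ids, [d.metadata["source"] for d in ranked])
```

With `k` results from each retriever, the candidate set passed to the re-ranker holds between `k` and `2k` documents, depending on how much the two result lists overlap.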

### 2.1.4 Step-back Question




Step-back questioning reformulates the user's question into a higher-level conceptual question, emulating how humans reason from general principles before addressing specifics.

In [None]:
def generate_step_back_question(question: str, client) -> str:
    """Converts specific medical questions to conceptual ones"""
    step_back_prompt = """Analyze this medical question and extract its core physiological or clinical concept:

    Original Question: {question}

    Guidelines:
    1. Identify the overarching biological system
    2. Remove specific drug names or trial references
    3. Focus on mechanisms or principles

    Step-Back Question:"""

    response = client.models.generate_content(
        model=MODEL,
        contents=step_back_prompt.format(question=question)
    )
    return response.text.strip()
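
In the recorded runs in Section 6, the model echoes the "Step-Back Question:" label at the start of its reply, sometimes in Markdown bold, and that label then becomes part of the retrieval query. A minimal sketch of a clean-up step follows; `clean_step_back` is a hypothetical helper, not part of the original notebook, and it only handles the label and bold markers. It does not shorten a reply in which the model writes out a full analysis.

```python
import re


def clean_step_back(text: str) -> str:
    """Strip Markdown bold markers and a leading 'Step-Back Question:' label."""
    text = text.replace("**", "").strip()
    text = re.sub(r"^step-back question:\s*", "", text, flags=re.IGNORECASE)
    return text.strip()


raw = "**Step-Back Question:**\n\nHow do incretin hormones influence glucose regulation?"
print(clean_step_back(raw))
```

Such a helper could be applied to `response.text` before the string is returned, so that BM25 does not spend query terms on the label itself.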


## 2.2 Verification

The verification step checks the factual consistency of the response against the retrieved evidence: it asks the model to determine whether every factual claim in the answer is supported by the retrieved clinical trials.

In [None]:
def verify_medical_answer(answer: str, context: str, client) -> tuple[bool, str]:
    """Checks answer against context"""
    verification_prompt = """Verify this medical answer:

    Answer: {answer}
    Context: {context}

    Rules:
    1. Return "TRUE" only if ALL claims are supported
    2. Return "FALSE: [reason]" otherwise

    Judgment:"""

    response = client.models.generate_content(
        model=MODEL,
        contents=verification_prompt.format(answer=answer, context=context)
    )
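    # Accept only a verdict that starts with TRUE; a substring test would also
    # match replies such as "FALSE: the claim is not TRUE ...".
    verdict = response.text.strip().lstrip("*`").upper()
    return verdict.startswith("TRUE"), response.text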
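
The reply format requested by the prompt ("TRUE" or "FALSE: [reason]") lends itself to a small parser that separates the verdict from the reason. The sketch below is one possible way to do that; `parse_verdict` is a hypothetical helper, not part of the original notebook, and treating any unrecognised reply as unverified is an assumption made here to err on the side of caution.

```python
def parse_verdict(text: str) -> tuple[bool, str]:
    """Return (is_supported, reason) from a 'TRUE' / 'FALSE: reason' style reply."""
    # Drop surrounding whitespace and stray Markdown markers such as ** or `.
    cleaned = text.strip().strip("*`").strip()
    upper = cleaned.upper()
    if upper.startswith("TRUE"):
        return True, ""
    if upper.startswith("FALSE"):
        reason = cleaned[len("FALSE"):].lstrip(" :").strip()
        return False, reason
    # Unrecognised format: treat as unverified and keep the raw text as the reason.
    return False, cleaned


print(parse_verdict("FALSE: the claim is not TRUE in context"))
```

Returning the reason separately makes it easier to log why an answer failed verification without printing the full model reply.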

In [None]:
!pip install langchain-community


# 3. Vector Store

The embedded representations are indexed and stored in the Chroma vector database. The vector store is populated by embedding the full set of collected documents and is persisted to disk for reuse in later runs.

In [None]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize embeddings (local model)
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")



In [None]:
!pip install chromadb


In [None]:
import os
from langchain_community.vectorstores import Chroma

persist_directory = "db"

# Load existing Chroma vector store if it exists
if os.path.exists(persist_directory) and os.listdir(persist_directory):
    print("Loading existing vector store...")
    vectorstore = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings
    )
else:
    print("Creating new vector store...")
    vectorstore = Chroma.from_documents(
        documents=docs,
        embedding=embeddings,
        persist_directory=persist_directory
    )
    vectorstore.persist()
    print(f"Vector store persisted to '{persist_directory}'")


Creating new vector store...
Vector store persisted to 'db'
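
Dense retrieval from the vector store comes down to comparing a query vector with the stored document vectors. The sketch below shows that idea with cosine similarity over made-up 3-dimensional vectors (embeddings from `all-MiniLM-L6-v2` have 384 dimensions). The IDs, vectors, and the helpers `cosine` and `top_k` are hypothetical, and the distance metric Chroma actually applies depends on its configuration and may differ from cosine.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy index: NCT ID -> made-up embedding vector.
index = {
    "NCT00000001": [0.9, 0.1, 0.0],
    "NCT00000002": [0.1, 0.9, 0.0],
    "NCT00000003": [0.7, 0.3, 0.1],
}


def top_k(query_vec, index, k=2):
    """Return the IDs of the k vectors most similar to the query vector."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


nearest = top_k([1.0, 0.0, 0.0], index)
print(nearest)
```

A real vector store uses an approximate nearest-neighbour index rather than scoring every vector, but the ranking principle is the same.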




# 4. Generation

The generation component uses Gemini 2.0 Flash with a controlled prompt. A domain-specific prompt constrains the model's behaviour by defining its role and output expectations: the model must rely strictly on the retrieved clinical trial data, cite NCT identifiers when referencing studies, and organise the output into three sections (summary, key findings, and limitations). The generation function also ties the pipeline together by applying step-back query reformulation, cross-encoder re-ranking, and answer verification.

In [None]:

!pip install -q -U google-genai  # Install or update google-genai
!pip install -q -U google-generativeai  # Install or update google-generativeai

from google.colab import userdata
from google import genai

# Set your Google API key (ensure it's stored securely)
GOOGLE_API_KEY = userdata.get('Google_API')
client = genai.Client(api_key=GOOGLE_API_KEY)
MODEL = "gemini-2.0-flash"

In [None]:
def answer_with_gemini(query, client):
    """
    Uses hybrid retrieval (BM25 + vector + reranking) and Gemini to answer the query.
    Additionally, performs step-back question analysis and medical answer verification.
    """
    # Step 1: Generate the "Step-Back" version of the query
    step_back_query = generate_step_back_question(query, client)
    print(f"Step-back query: {step_back_query}")

    # Step 2: Retrieve documents based on the step-back query using hybrid retrieval
    retrieved_docs = hybrid_retrieval(step_back_query, k=8, rerank_top_n=5)

    # Construct context from retrieved documents
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    sources = [doc.metadata["source"] for doc in retrieved_docs]

    # Step 3: Generate an answer with Gemini based on the context
    system_instructions = """
    You are a medical AI assistant. Follow these rules:
    1. Base answers ONLY on the provided clinical trial data.
    2. Cite NCT IDs (e.g., NCT01234567) when referencing trials.
    3. If unsure, say "This requires medical expertise."
    4. Structure responses:
       - Summary of relevant trials
       - Key findings
       - Limitations
    """

    prompt = f"""{system_instructions}

    Context:
    {context}

    Question:
    {query}
    """

    # Step 4: Generate the answer with Gemini
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt
    )

    # Step 5: Verify the generated medical answer against the context
    answer = response.text
    is_verified, verification_message = verify_medical_answer(answer, context, client)

    if is_verified:
        print("Answer Verified: TRUE")
    else:
        print(f"Answer Verified: FALSE. Reason: {verification_message}")

    # Return the generated answer and sources
    return answer, sources
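
In `answer_with_gemini`, the context is joined from `page_content` only, which holds the title and summary, while the NCT ID lives in `metadata["source"]`. The prompt asks for NCT citations, yet the recorded answer to Query 1 cites trials by title. One possible adjustment, sketched below under that observation, is to prefix each context entry with its ID. `Doc` is a hypothetical stand-in for a LangChain `Document`, `build_context_with_ids` is not part of the original notebook, and the trial texts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context_with_ids(docs):
    """Prefix each entry with its NCT ID so the ID is visible in the prompt."""
    return "\n\n".join(
        f"[{doc.metadata['source']}]\n{doc.page_content}" for doc in docs
    )


retrieved = [
    Doc("Title: Example trial A\nSummary: Made-up summary.", {"source": "NCT00000001"}),
    Doc("Title: Example trial B\nSummary: Another made-up summary.", {"source": "NCT00000002"}),
]
context = build_context_with_ids(retrieved)
print(context)
```

Whether this improves the citation score would need to be checked by re-running the evaluation in Section 5.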


# 5. Evaluation

This evaluation uses the same Gemini 2.0 Flash model as an LLM judge of the RAG answer. The judge scores the answer on faithfulness, factuality, completeness, fluency, and citation quality, each on a scale from 0 to 2.

In [None]:
def evaluate_generation(query, answer, context, client):
    """
    Evaluates the quality of a generated medical answer using Gemini.
    Returns the raw evaluation output as text.
    """
    evaluation_prompt = f"""
You are a medical evaluation assistant.

Evaluate the quality of the answer generated for the following clinical question based only on the retrieved context.

Question: {query}

Context (retrieved trials):
{context}

Generated Answer:
{answer}

Score each from 0 to 2:
1. Faithfulness: Does the answer stay within the information found in the context?
2. Factuality: Are all claims medically accurate?
3. Completeness: Does the answer fully address the question?
4. Fluency: Is the answer well-written and easy to understand?
5. Proper Citation: Are clinical trial references (e.g., NCT IDs) correctly cited?

Respond ONLY in JSON format:
{{
  "faithfulness": <0|1|2>,
  "factuality": <0|1|2>,
  "completeness": <0|1|2>,
  "fluency": <0|1|2>,
  "citations": <0|1|2>
}}
"""

    eval_response = client.models.generate_content(
        model=MODEL,
        contents=evaluation_prompt
    )

    eval_text = eval_response.text
    print("Evaluation output:\n", eval_text)
    return eval_text
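
`evaluate_generation` returns the judge's reply as raw text, and in the recorded runs that reply is wrapped in a Markdown code fence. To work with the scores numerically, the JSON object has to be extracted first. A minimal sketch follows; `parse_scores` is a hypothetical helper, not part of the original notebook, and it assumes the reply contains exactly one JSON object.

```python
import json
import re


def parse_scores(eval_text: str) -> dict:
    """Extract the JSON object from a reply that may be wrapped in a Markdown fence."""
    match = re.search(r"\{.*\}", eval_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in evaluation output")
    return json.loads(match.group(0))


fence = "`" * 3  # built programmatically to keep this example readable inside Markdown
reply = (
    fence + "json\n"
    '{"faithfulness": 2, "factuality": 2, "completeness": 1, "fluency": 2, "citations": 1}\n'
    + fence
)
scores = parse_scores(reply)
total = sum(scores.values())
print(scores, total)
```

With the scores as a dictionary, results from several queries can be collected into a table and compared across pipeline variants.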


# 6. Querying

## 6.1. Query 1: What is diabetes?

Query 1 asks the RAG pipeline a fundamental question: what is diabetes?

In [None]:
user_question = "what is diabetes?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: Step-Back Question: How does the body regulate blood glucose, and what happens when this regulation fails?
Answer Verified: TRUE
Answer: Based on the provided clinical trial data:

*   **Type 1 Diabetes (T1D):** This is characterized by the loss of pancreatic beta cells, leading to a lack of endogenous insulin production (Effects of SGLT-2 Inhibitor Dapagliflozin on Hormonal Glucose Regulation and Ketogenesis in Patients With Type 1 Diabetes - a Randomised, Placebo-controlled, Open-label, Cross-over Intervention Study). Individuals with T1D often have a defect in glucagon secretion, increasing the risk of hypoglycemia (Can Maximising Time in Range Using Automated Insulin Delivery and a Low Carbohydrate Diet Restore the Glucagon Response to Hypoglycaemic in Type 1 Diabetes?).
*   **Type 2 Diabetes:** This is associated with hyperglycemia, which is a risk factor for cardiovascular issues (Blood Glucose Homeostasis in Type 2 Diabetes: the Effects of Saccharose).
*   **Ges

In [None]:
def get_context_from_sources(sources, docs):
    # docs is your original docs list from retrieval
    # This function joins content of docs matching the sources list
    selected_docs = [doc for doc in docs if doc.metadata["source"] in sources]
    return "\n\n".join([doc.page_content for doc in selected_docs])

# Example usage:
context_for_eval = get_context_from_sources(sources, docs)

eval_result = evaluate_generation(user_question, answer, context_for_eval, client)

Evaluation output:
 ```json
{
  "faithfulness": 2,
  "factuality": 2,
  "completeness": 1,
  "fluency": 2,
  "citations": 1
}
```


In [None]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "what is diabetes?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: Diabetes is a chronic metabolic disorder characterized by elevated levels of blood glucose (or blood sugar), which leads over time to serious damage to the heart, blood vessels, eyes, kidneys, and nerves. This high blood sugar occurs because either:

*   **The pancreas does not produce enough insulin:** Insulin is a hormone that regulates blood sugar.
*   **The body cannot effectively use the insulin it produces:** This is known as insulin resistance.

There are several types of diabetes, including:

*   **Type 1 diabetes:** An autoimmune reaction where the body attacks and destroys insulin-producing cells in the pancreas. People with type 1 diabetes need to take insulin daily to survive.
*   **Type 2 diabetes:** The most common type, where the body becomes resistant to insulin or doesn't produce enough insulin. Often linked to lifestyle factors like being overweight and inactive.
*   **Gestational diabetes:** Develops during pregnancy and usually disappears aft

## 6.2. Query 2: What is GLP-1 and why are GLP-1 agonists effective?

Query 2 asks the RAG pipeline a more in-depth question about a specific hormone and a related class of diabetes medication.

In [None]:
user_question = "what is GLP1 and why are GLP1 agonists effective?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: **Step-Back Question:**

How do naturally occurring incretin hormones influence glucose regulation, and what mechanisms underlie the therapeutic benefit of enhancing their effects?
Answer Verified: TRUE
Answer: Based on the provided clinical trial data:

*   **What is GLP-1?**

    GLP-1 (Glucagon-like peptide-1) is an incretin hormone (NCT05826353, Title: Effects of SGLT-2 Inhibitor Dapagliflozin on Hormonal Glucose Regulation and Ketogenesis in Patients With Type 1 Diabetes - a Randomised, Placebo-controlled, Open-label, Cross-over Intervention Study). Incretins are gut peptides secreted in response to meals that enhance insulin secretion (Title: Mechanisms of Diabetes Control After Weight Loss Surgery).
*   **Why are GLP-1 agonists effective?**

    The data suggests that GLP-1 agonists are effective because they can suppress glucagon (NCT05826353, Title: Effects of SGLT-2 Inhibitor Dapagliflozin on Hormonal Glucose Regulation and Ketogenesis in Patients With Type 1

In [None]:

context_for_eval = get_context_from_sources(sources, docs)

eval_result = evaluate_generation(user_question, answer, context_for_eval, client)

Evaluation output:
 ```json
{
  "faithfulness": 2,
  "factuality": 2,
  "completeness": 1,
  "fluency": 2,
  "citations": 1
}
```


In [None]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "what is GLP1 and why are GLP1 agonists effective?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: ## GLP-1 and GLP-1 Agonists Explained

**What is GLP-1?**

GLP-1 stands for **Glucagon-Like Peptide-1**. It is an **incretin hormone** naturally produced in the small intestine in response to food intake, especially after meals containing carbohydrates and fat.  Think of it as a signal your gut sends to your pancreas and brain after you eat, helping to regulate blood sugar and appetite.

Here's a breakdown of GLP-1's key functions:

*   **Stimulates Insulin Release:**  It prompts the pancreas to release insulin in a glucose-dependent manner.  This means insulin is released *only* when blood sugar levels are elevated. This reduces the risk of hypoglycemia (low blood sugar).
*   **Suppresses Glucagon Secretion:**  Glucagon is a hormone that raises blood sugar. GLP-1 inhibits glucagon release, further helping to lower blood glucose levels.
*   **Slows Gastric Emptying:**  It slows down the rate at which food empties from the stomach into the small intestine. This h

## 6.3. Query 3: How is RAG different to a standard LLM?

Query 3 deliberately asks a question that concerns neither clinical trials nor diabetes, to see how the pipeline reacts to an out-of-domain query.

In [None]:
user_question = "how is RAG different to standard LLM?" #@param {type:"string"}
answer, sources = answer_with_gemini(user_question, client)
print(f"Answer: {answer}")
print(sources)

Step-back query: Okay, here's an analysis of the medical question, following the provided guidelines:

**Original Question: how is RAG different to standard LLM?**

**1. Identify the overarching biological system:**

While the question *mentions* "RAG" and "LLM," these are **computational concepts** (Retrieval-Augmented Generation and Large Language Model, respectively).  In the context of a medical *application* they become tools, or methods to do something, rather than being inherently part of a biological system. The question implicitly refers to **information processing** in a *clinical* context, and how the models being inquired about go about the task.

**2. Remove specific drug names or trial references:**

The original question already doesn't contain any specific drug names or trial references.

**3. Focus on mechanisms or principles:**

Instead of the *names* of the technologies, focus on the core *processes* and *principles* they represent.  Instead of asking about the speci

In [None]:
# prompt: query with gemini LLM without using the RAG pipeline

user_question = "how is RAG different to standard LLM?"

prompt_text = f"""
Answer the following question.
Question: {user_question}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text
)

print(f"Gemini's direct answer: {response.text}")

Gemini's direct answer: RAG (Retrieval-Augmented Generation) differs from a standard LLM (Large Language Model) in a crucial way: **RAG incorporates external knowledge retrieval into the generation process, whereas a standard LLM relies solely on the knowledge it learned during its training phase.**

Here's a breakdown of the key differences:

*   **Knowledge Source:**
    *   **Standard LLM:**  Its knowledge is entirely based on the massive dataset it was trained on. It can only answer questions or generate text based on what it has seen and learned from this training data.  If the information is not in its training data, it might hallucinate, make assumptions, or simply say it doesn't know.
    *   **RAG:** It leverages an external knowledge base (e.g., a database, a collection of documents, a website index).  When a question is asked, RAG first *retrieves* relevant information from this external source, *augments* the prompt with this retrieved context, and *generates* its response.

# Repo upload

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Replace path below with your actual notebook path inside Drive
notebook_path_in_drive = "/content/drive/My Drive/RAG.ipynb"

from getpass import getpass  # needed for the token prompt below

# Repository details and credentials
github_key = getpass('Enter your GitHub Personal Access Token: ')
username = "bangbrecho"
repo = "MSBAcourseworks"
branch = "main"
email = "baspradrach@gmail.com"

!git config --global user.email "{email}"
!git config --global user.name "{username}"

!git clone https://{github_key}@github.com/{username}/{repo}.git

!cp "{notebook_path_in_drive}" "{repo}/"

import os
os.chdir(repo)
!git add "RAG.ipynb"
!git commit -m "Upload RAG.ipynb from Colab"
!git push origin {branch}

print("✅ Notebook successfully pushed to your private repo!")


Mounted at /content/drive
Enter your GitHub Personal Access Token: ··········
Cloning into 'MSBAcourseworks'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (7/7), 35.30 KiB | 3.92 MiB/s, done.
cp: cannot stat '/content/drive/My Drive/RAG.ipynb': No such file or directory
fatal: pathspec 'RAG.ipynb' did not match any files
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
Everything up-to-date
✅ Notebook successfully pushed to your private repo!
