# PubMed RAG Prototype

## Notebook outline

| Cell | Purpose |
|------|---------|
| 1 | Load env (`BEDROCK_KB_ID`, `BEDROCK_MODEL_ARN`), validate `BEDROCK_KB_ID`, define the retrieval helpers |
| 2 | Retrieval only: call `retrieve`, print top-k chunks and metadata |
| 3 | Retrieval + generation: call `retrieve_and_generate`, show model answer |
| 4 | Reload env and KB_ID (e.g. after changing .env) |

---

**Prerequisites.** The Bedrock knowledge base (KB) must already be populated, e.g. by running the processing notebook and then an ingestion job.

This notebook tests retrieval only (`retrieve`) and retrieval + generation (`retrieve_and_generate`).

**Env.** `BEDROCK_KB_ID` (required), `BEDROCK_MODEL_ARN` (optional; defaults to the Claude 3.5 Sonnet foundation-model ARN in `us-east-1` defined in Cell 1). Set both in `.env` or in your shell before running.
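
A quick sanity check on the two variables can catch copy-paste mistakes before any call reaches Bedrock. The sketch below is illustrative only: `check_env_values` and `KB_ID_PATTERN` are hypothetical names that do not appear in the notebook, and the pattern (up to 10 alphanumeric characters) is an assumption about how knowledge base IDs look, so treat a failure as a hint, not a verdict.

```python
import re

# Assumption: knowledge base IDs are short alphanumeric strings (at most 10 characters).
KB_ID_PATTERN = re.compile(r"^[0-9a-zA-Z]{1,10}$")


def check_env_values(kb_id: str, model_arn: str = "") -> list[str]:
    """Return a list of human-readable problems; an empty list means the values look plausible."""
    problems: list[str] = []
    if not kb_id:
        problems.append("BEDROCK_KB_ID is not set")
    elif not KB_ID_PATTERN.match(kb_id):
        problems.append(f"BEDROCK_KB_ID looks malformed: {kb_id!r}")
    # Only check the ARN when one was supplied; the notebook falls back to a default otherwise.
    if model_arn and not (model_arn.startswith("arn:") and ":bedrock:" in model_arn):
        problems.append(f"BEDROCK_MODEL_ARN does not look like a Bedrock ARN: {model_arn!r}")
    return problems
```

Returning a list instead of raising keeps the check usable in a notebook cell, where seeing every problem at once is usually more convenient than fixing them one exception at a time.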

In [10]:
# Cell 1: Load env (BEDROCK_KB_ID, BEDROCK_MODEL_ARN), validate KB_ID.
import os
from typing import Any

import boto3

DEFAULT_MODEL_ARN: str = (
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
)

try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # python-dotenv is optional; fall back to variables already set in the shell.
    pass


def load_env(reload: bool = False) -> tuple[str, str]:
    """Load env vars and validate BEDROCK_KB_ID."""
    if reload:
        try:
            from dotenv import load_dotenv

            load_dotenv(override=True)
        except ImportError:
            # python-dotenv is optional; keep the variables already in the environment.
            pass

    kb_id = os.getenv("BEDROCK_KB_ID", "").strip()
    # Use the default when the variable is unset or set to an empty string.
    model_arn = os.getenv("BEDROCK_MODEL_ARN") or DEFAULT_MODEL_ARN
    if not kb_id:
        raise ValueError("Set BEDROCK_KB_ID in your environment or .env")
    return kb_id, model_arn


def get_runtime_client() -> Any:
    """Return a Bedrock Agent Runtime client."""
    return boto3.client("bedrock-agent-runtime")


def retrieve_chunks(query: str, kb_id: str, top_k: int = 5) -> list[dict[str, Any]]:
    """Retrieve top-k chunks from the knowledge base."""
    client = get_runtime_client()
    resp = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    return resp.get("retrievalResults", [])


def retrieve_and_generate_answer(
    query: str, kb_id: str, model_arn: str, top_k: int = 5
) -> str:
    """Retrieve context and generate an answer using the specified model."""
    client = get_runtime_client()
    resp = client.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": top_k}
                },
            },
        },
    )
    return resp.get("output", {}).get("text", "")


KB_ID, MODEL_ARN = load_env()
KB_ID

'O4UMSSM70N'

In [11]:
# Cell 2: Retrieval only — call retrieve, print top-k chunks and metadata.
query: str = "What are common caregiver challenges in dementia care?"
results = retrieve_chunks(query, KB_ID, top_k=5)

for item in results:
    print(item.get("content", {}).get("text"))
    print(item.get("metadata"))
    print("-")

This study used a thematic analysis approach. RESULTS: Nine major themes were identified: (1) stepping into the caregiver role and responsibilities; (2) supporting activities of daily living; (3) managing behavioural problems; (4) complex care needs; (5) caregiver-related challenges; 6) healthcare-related challenges; (7) knowledge, perception, and awareness of caregivers; (8) consequences of caregiving; and (9) perceived needs. CONCLUSION: Thematic analyses revealed that the caregiving challenges of dementia caregivers were related to behavioral disturbance, functional care needs, complex care needs, caregiver-related challenges, healthcare-related challenges, understanding of dementia, and stepping into the caregiver role and responsibilities. The consequences of caregiving affect caregivers' physical, psychological, and financial health.
{'x-amz-bedrock-kb-source-uri': 's3://pubmed-rag-data/processed/archive/pubmed_records_20260130.jsonl', 'x-amz-bedrock-kb-source-file-modality': 'TE
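
Printing the full chunk and metadata dict, as Cell 2 does, gets noisy quickly. A minimal sketch of a more compact view follows. `summarize_result` is a hypothetical helper, not part of the notebook, and it assumes each item carries the `content`, `score`, `location`, and `metadata` fields of a `retrieve` result; every missing field is tolerated. The sample dict is hand-written for illustration and is not real output.

```python
from typing import Any


def summarize_result(item: dict[str, Any], max_chars: int = 200) -> dict[str, Any]:
    """Reduce one retrieval result to a score, a source URI, and a short text preview."""
    text = (item.get("content") or {}).get("text") or ""
    metadata = item.get("metadata") or {}
    s3_location = (item.get("location") or {}).get("s3Location") or {}
    # Prefer the structured location; fall back to the source-URI metadata key.
    uri = s3_location.get("uri") or metadata.get("x-amz-bedrock-kb-source-uri")
    preview = text if len(text) <= max_chars else text[:max_chars].rstrip() + "..."
    return {"score": item.get("score"), "source": uri, "preview": preview}


# Hand-written example shaped like one item of a `retrieve` response.
sample = {
    "content": {"text": "Nine major themes were identified."},
    "score": 0.71,
    "metadata": {"x-amz-bedrock-kb-source-uri": "s3://example-bucket/records.jsonl"},
}
print(summarize_result(sample))
```

In the notebook this could be applied as `[summarize_result(r) for r in results]` to get one short row per chunk.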

In [12]:
# Cell 3: Retrieval + generation — call retrieve_and_generate, show model answer.
query: str = "Summarize caregiver preparedness gaps in dementia care."
answer = retrieve_and_generate_answer(query, KB_ID, MODEL_ARN, top_k=5)

answer

"Caregiver preparedness gaps in dementia care encompass several key areas. The most common preparedness gaps identified are in care coordination, emotional and social support, and advance planning. Caregivers also express needs in nursing and health monitoring, personal care, mobility assistance, and household tasks.\n\nSpecific tasks that caregivers feel less prepared for include managing emotional and behavioral symptoms of dementia patients, recognizing and responding to significant changes in the patient's condition, seeking relevant medical information, handling financial and legal matters, and advocating for services.\n\nIt's important to note that these preparedness gaps are not unique to caregivers of dementia patients. Similar proportions of caregivers for veterans with and without dementia reported preparedness needs across various domains and tasks, suggesting that these challenges are common in caregiving for older adults in general. Additionally, research has shown that la

In [13]:
# Cell 4: Reload env and KB_ID (e.g. after changing .env).
KB_ID, MODEL_ARN = load_env(reload=True)
KB_ID

'O4UMSSM70N'
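
`retrieve_and_generate_answer` in Cell 1 returns only the answer text, so the Cell 3 output gives no indication of which chunks backed the summary. A rough sketch of recovering the sources is shown below. `extract_sources` is a hypothetical helper, and it assumes access to the raw `retrieve_and_generate` response dict (which would mean returning `resp` from the helper instead of just the text) with its `citations` list of `retrievedReferences`. The sample response is hand-written for illustration.

```python
from typing import Any


def extract_sources(resp: dict[str, Any]) -> list[str]:
    """Collect the unique S3 URIs cited in a retrieve_and_generate response, in order."""
    seen: list[str] = []
    for citation in resp.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ((ref.get("location") or {}).get("s3Location") or {}).get("uri")
            # Skip references without an S3 location and avoid listing a file twice.
            if uri and uri not in seen:
                seen.append(uri)
    return seen


# Hand-written example shaped like a `retrieve_and_generate` response.
sample_resp = {
    "output": {"text": "Example answer."},
    "citations": [
        {
            "retrievedReferences": [
                {"location": {"s3Location": {"uri": "s3://example-bucket/a.jsonl"}}},
                {"location": {"s3Location": {"uri": "s3://example-bucket/a.jsonl"}}},
            ]
        },
        {"retrievedReferences": [{"location": {"s3Location": {"uri": "s3://example-bucket/b.jsonl"}}}]},
    ],
}
print(extract_sources(sample_resp))
```

Listing sources next to the generated answer makes it easier to spot answers that lean on a single record or on none at all.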