# Healthcare RAG System Lab
## Overview

In this lab, you'll take on the role of a junior data scientist at a healthcare technology company that specializes in creating educational resources for patients. Your team has been tasked with developing a system that can automatically generate informative responses to common patient questions about medical conditions, treatments, and wellness practices.

The challenge is to ensure these responses are both accurate and grounded in authoritative medical information. Your specific assignment is to implement a Retrieval-Augmented Generation (RAG) system that can:
1. Understand patient questions about various health topics
2. Retrieve relevant information from a trusted knowledge base
3. Generate helpful, accurate responses based on that information
4. Avoid "hallucinated" content that could potentially misinform patients

This lab follows the generative AI implementation process we've studied, with particular focus on:
- Data Strategy and Knowledge Foundation
- Model Selection and Generation Control
- Evaluation Framework Development

## Setup

First, let's import the necessary libraries:

In [1]:
import torch
import pandas as pd
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Check if CUDA is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cpu


## Part 1: Knowledge Base Setup

Let's create a sample medical knowledge base with information about common health conditions, treatments, and wellness practices:

In [2]:
# Create a sample medical knowledge base
knowledge_base = pd.DataFrame({
    'content': [
        "Diabetes is a chronic condition that affects how your body turns food into energy. There are three main types: Type 1, Type 2, and gestational diabetes. Type 2 diabetes is the most common form, accounting for about 90-95% of diabetes cases.",
        "Type 1 diabetes is an autoimmune reaction that stops your body from making insulin. Symptoms include increased thirst, frequent urination, hunger, fatigue, and blurred vision. It's usually diagnosed in children, teens, and young adults.",
        "Type 2 diabetes occurs when your body becomes resistant to insulin or doesn't make enough insulin. Risk factors include being overweight, being 45 years or older, having a parent or sibling with type 2 diabetes, and being physically active less than 3 times a week.",
        "Managing diabetes involves monitoring blood sugar levels, taking medications as prescribed, eating a healthy diet, maintaining a healthy weight, and getting regular physical activity. It's important to work with healthcare providers to develop a management plan.",
        "Hypertension, or high blood pressure, is when the force of blood pushing against the walls of your arteries is consistently too high. It's often called the 'silent killer' because it typically has no symptoms but significantly increases the risk of heart disease and stroke.",
        "Blood pressure is measured using two numbers: systolic (top number) and diastolic (bottom number). Normal blood pressure is less than 120/80 mm Hg. Hypertension is diagnosed when readings are consistently 130/80 mm Hg or higher.",
        "Lifestyle changes to manage hypertension include reducing sodium in your diet, getting regular physical activity, maintaining a healthy weight, limiting alcohol, quitting smoking, and managing stress. Medications may also be prescribed if lifestyle changes aren't enough.",
        "Regular physical activity offers numerous health benefits, including weight management, reduced risk of heart disease, strengthened bones and muscles, improved mental health, and enhanced ability to perform daily activities. Adults should aim for at least 150 minutes of moderate-intensity activity per week.",
        "A balanced diet should include a variety of fruits, vegetables, whole grains, lean proteins, and healthy fats. It's recommended to limit intake of added sugars, sodium, saturated fats, and processed foods. Proper nutrition helps prevent chronic diseases and supports overall health.",
        "Vaccination is one of the most effective ways to prevent infectious diseases. Vaccines work by helping the body recognize and fight specific pathogens. Common adult vaccines include influenza (flu), Tdap (tetanus, diphtheria, pertussis), shingles, and pneumococcal vaccines."
    ],
    'metadata': [
        {'topic': 'diabetes', 'subtopic': 'overview', 'source': 'medical_guidelines', 'last_updated': '2023-06-10'},
        {'topic': 'diabetes', 'subtopic': 'type1', 'source': 'medical_guidelines', 'last_updated': '2023-06-10'},
        {'topic': 'diabetes', 'subtopic': 'type2', 'source': 'medical_guidelines', 'last_updated': '2023-06-10'},
        {'topic': 'diabetes', 'subtopic': 'management', 'source': 'medical_guidelines', 'last_updated': '2023-06-10'},
        {'topic': 'hypertension', 'subtopic': 'overview', 'source': 'medical_guidelines', 'last_updated': '2023-07-22'},
        {'topic': 'hypertension', 'subtopic': 'diagnosis', 'source': 'medical_guidelines', 'last_updated': '2023-07-22'},
        {'topic': 'hypertension', 'subtopic': 'management', 'source': 'medical_guidelines', 'last_updated': '2023-07-22'},
        {'topic': 'wellness', 'subtopic': 'physical_activity', 'source': 'health_promotion', 'last_updated': '2023-05-15'},
        {'topic': 'wellness', 'subtopic': 'nutrition', 'source': 'health_promotion', 'last_updated': '2023-05-15'},
        {'topic': 'prevention', 'subtopic': 'vaccination', 'source': 'medical_guidelines', 'last_updated': '2023-08-05'}
    ]
})

print(f"Knowledge base loaded with {len(knowledge_base)} entries")
knowledge_base.head(2)

Knowledge base loaded with 10 entries


| | content | metadata |
|---|---|---|
| 0 | Diabetes is a chronic condition that affects h... | {'topic': 'diabetes', 'subtopic': 'overview', ... |
| 1 | Type 1 diabetes is an autoimmune reaction that... | {'topic': 'diabetes', 'subtopic': 'type1', 'so... |


### Task 1: Create Document Embeddings

Complete the function below to create embeddings for each document in the knowledge base. These embeddings will be used to find relevant documents based on patient queries.

In [4]:
def create_document_embeddings(documents):
    """
    Create embeddings for a list of documents.

    Args:
        documents: List of text documents to embed

    Returns:
        Numpy array of document embeddings
    """
    # Initialize a sentence transformer model
    # (all-mpnet-base-v2 produces 768-dimensional embeddings)
    embedding_model = 'sentence-transformers/all-mpnet-base-v2'
    model = SentenceTransformer(embedding_model)

    # Generate embeddings for all documents; encode() returns a numpy array
    # of shape (num_documents, embedding_dim)
    document_embeddings = model.encode(documents, show_progress_bar=True)

    return document_embeddings

# Extract document content
documents = knowledge_base['content'].tolist()

# Create document embeddings
document_embeddings = create_document_embeddings(documents)

# Verify the shape of embeddings
if document_embeddings is not None:
    print(f"Generated embeddings with shape: {document_embeddings.shape}")
else:
    print("Embeddings not created yet.")

Generated embeddings with shape: (10, 768)


## Part 2: Implementing the Retrieval Component

Now, let's implement the function to retrieve relevant documents based on a patient query.

In [6]:
def retrieve_documents(query, embeddings, contents, metadata, top_k=3, threshold=0.3):
    """
    Retrieve the most relevant documents for a given query.

    Args:
        query: The patient's question (str)
        embeddings: Precomputed document embeddings (numpy array)
        contents: List of document texts
        metadata: List of document metadata (dicts)
        top_k: Maximum number of documents to retrieve
        threshold: Minimum similarity score to include a document

    Returns:
        List of (content, metadata, similarity_score) tuples
    """
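    # Initialize the embedding model. It must match the model used for the
    # documents so query and document vectors share the same space.
    # Note: this reloads the model on every call; cache it for repeated queries.
    model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')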

    # Embed the query
    query_embedding = model.encode([query])  # shape (1, embedding_dim)

    # Compute cosine similarity between query and all document embeddings
    similarities = cosine_similarity(query_embedding, embeddings)[0]  # shape (num_docs,)

    # Combine contents, metadata, and similarities
    results = [
        (contents[i], metadata[i], float(similarities[i]))
        for i in range(len(contents))
        if similarities[i] >= threshold
    ]

    # Sort by similarity descending and take top_k
    results = sorted(results, key=lambda x: x[2], reverse=True)[:top_k]

    return results
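
The interaction between `threshold` and `top_k` is easier to see on toy data. The sketch below mirrors the filter-then-rank logic above using hand-written 2-D vectors and a pure-Python cosine similarity, so it runs without downloading a model. The vectors, the labels, and the `rank_by_similarity` helper are illustrative assumptions, not part of the lab code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs, labels, top_k=3, threshold=0.3):
    """Keep documents scoring at least `threshold`, then return the best `top_k`."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in zip(labels, doc_vecs)]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical 2-D "embeddings" chosen by hand for illustration
toy_labels = ["diabetes overview", "hypertension overview", "vaccination"]
toy_vectors = [[1.0, 0.1], [0.6, 0.8], [-0.2, 1.0]]
toy_query = [1.0, 0.0]

for label, score in rank_by_similarity(toy_query, toy_vectors, toy_labels, top_k=2):
    print(f"{score:.3f}  {label}")
```

Because filtering happens before ranking, raising `threshold` can return fewer than `top_k` results, and an empty list is possible when nothing in the knowledge base is close to the query.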

## Part 3: Building the Generation Component

Now, let's implement the generation component that will use the retrieved documents to create informative responses.

In [7]:
# Initialize the generative model
def initialize_generator(model_name="gpt2"):
    """
    Initialize the generative model and tokenizer.

    Args:
        model_name: Name of the pretrained model to use

    Returns:
        Tokenizer and model objects
    """
    # Load the tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Set padding token if needed
    # Check if pad_token exists, if not set it to eos_token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer, model

# Initialize the generator
tokenizer, model = initialize_generator()
if tokenizer and model:
    print(f"Initialized {model.config._name_or_path} with {model.num_parameters()} parameters")

Initialized gpt2 with 124439808 parameters


In [10]:
def generate_rag_response(query, contents, metadata, document_embeddings, tokenizer, model, max_length=150, top_k=5, threshold=0.2):
    """
    Generate a response using Retrieval-Augmented Generation (RAG).

    Args:
        query: The patient's question
        contents: List of document contents
        metadata: List of document metadata
        document_embeddings: Precomputed embeddings for the documents
        tokenizer: Tokenizer for the language model
        model: The language model for generation
        max_length: Maximum response length
        top_k: Number of top documents to retrieve
        threshold: Minimum similarity for documents to include

    Returns:
        dict: Generated response and retrieved documents
    """
    # --- 1. Retrieve relevant documents ---
    retrieved_docs = retrieve_documents(
        query=query,
        embeddings=document_embeddings,
        contents=contents,
        metadata=metadata,
        top_k=top_k,
        threshold=threshold
    )

    # --- 2. Format the prompt ---
    if retrieved_docs:
        context_texts = [f"Document {i+1}: {doc[0]}" for i, doc in enumerate(retrieved_docs)]
        context_str = "\n".join(context_texts)
        prompt = f"Answer the following question based on the context below:\n{context_str}\n\nQuestion: {query}\nAnswer:"
    else:
        prompt = f"Answer the following question:\n{query}\nAnswer:"

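    # --- 3. Tokenize the prompt ---
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # --- 4. Generate the response ---
    # max_new_tokens limits only the generated answer. Passing max_length here
    # would count the prompt tokens too, and a prompt with retrieved context
    # can easily exceed 150 tokens on its own.
    with torch.no_grad():
        output_sequences = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_length,
            temperature=0.7,
            top_k=50,
            top_p=0.9,
            do_sample=True,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

    # --- 5. Decode only the newly generated tokens ---
    # Slicing off the prompt tokens is more reliable than string replacement,
    # because the decoded prompt may not match the original text exactly.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        output_sequences[0][prompt_length:], skip_special_tokens=True
    ).strip()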

    return {
        "query": query,
        "response": response,
        "retrieved_documents": retrieved_docs
    }
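
The prompt-formatting step can also be pulled out into a small helper, which makes it testable without loading a model. The sketch below is one possible variant, not the lab's required solution: the `build_prompt` name and the instruction wording are assumptions, and it tags each passage with the `topic` and `subtopic` fields from the knowledge base metadata. A small model such as GPT-2 may not follow the instruction reliably.

```python
def build_prompt(query, retrieved_docs):
    """Format a RAG prompt from (content, metadata, score) tuples."""
    if not retrieved_docs:
        # No passage cleared the similarity threshold: fall back to the bare question
        return f"Answer the following question:\n{query}\nAnswer:"

    context_lines = []
    for i, (content, meta, _score) in enumerate(retrieved_docs, start=1):
        tag = f"{meta.get('topic', 'unknown')}/{meta.get('subtopic', 'unknown')}"
        context_lines.append(f"Document {i} [{tag}]: {content}")
    context_str = "\n".join(context_lines)

    return (
        "Answer the following question using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"{context_str}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical retrieval result in the same shape retrieve_documents() returns
example_docs = [
    ("Normal blood pressure is less than 120/80 mm Hg.",
     {"topic": "hypertension", "subtopic": "diagnosis"}, 0.71),
]
print(build_prompt("What is a normal blood pressure reading?", example_docs))
```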

## Part 4: Evaluation and Analysis

Let's implement a basic evaluation function to assess the quality of our generated responses.

In [12]:
import re

def evaluate_response(response_data):
    """
    Evaluate the quality of a generated response based on various criteria.

    Args:
        response_data: Dictionary containing the query, response, and retrieved docs

    Returns:
        dict: Evaluation metrics
    """
    response_text = response_data.get("response", "").lower()
    retrieved_docs = response_data.get("retrieved_documents", [])

    # --- Metric 1: Content Relevance ---
    # Fraction of unique document words that also appear as whole words in the response
    doc_terms = set()
    for doc, _, _ in retrieved_docs:
        words = re.findall(r'\b\w+\b', doc.lower())
        doc_terms.update(words)

    if doc_terms:
        # Compare word sets; a substring test would count "in" as found inside "insulin"
        response_terms = set(re.findall(r'\b\w+\b', response_text))
        matched_terms = doc_terms & response_terms
        content_relevance = len(matched_terms) / len(doc_terms)
    else:
        content_relevance = 0.0

    # --- Metric 2: Medical Terminology Usage ---
    medical_terms = [
        "diabetes", "insulin", "glucose", "hypertension", "blood pressure",
        "systolic", "diastolic", "cardiovascular", "cholesterol", "nutrition",
        "obesity", "physical activity", "vaccination", "immune", "prevention"
    ]
    medical_matches = [term for term in medical_terms if term in response_text]
    medical_term_score = len(medical_matches) / len(medical_terms)

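    # --- Metric 3 (optional): Response Length Appropriateness ---
    # Simple heuristic: full score for 50-300 characters, a larger penalty
    # for very short responses, and a smaller one for overly long responses
    response_length = len(response_text)
    if response_length < 50:
        length_score = 0.5
    elif response_length <= 300:
        length_score = 1.0
    else:
        length_score = 0.8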

    # Combine metrics
    metrics = {
        "content_relevance": round(content_relevance, 3),
        "medical_term_usage": round(medical_term_score, 3),
        "length_score": round(length_score, 3),
        "num_medical_terms_mentioned": len(medical_matches)
    }

    return metrics
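
One caveat with term-overlap metrics is how the matching is done. A plain substring test (`term in text`) counts short words such as "a" or "in" as present in almost any response, which inflates the score. The sketch below compares substring matching with whole-word matching on a made-up document and an unrelated response. The `overlap_scores` helper is hypothetical and assumes a non-empty document.

```python
import re

def overlap_scores(doc_text, response_text):
    """Return (substring_score, word_score) for document terms found in a response."""
    doc_terms = set(re.findall(r"\b\w+\b", doc_text.lower()))
    response_lower = response_text.lower()
    response_words = set(re.findall(r"\b\w+\b", response_lower))

    # Substring test: a term counts if it occurs anywhere, even inside another word
    substring_hits = [t for t in doc_terms if t in response_lower]
    # Word-level test: a term counts only if it appears as a whole word
    word_hits = doc_terms & response_words
    return len(substring_hits) / len(doc_terms), len(word_hits) / len(doc_terms)

doc = "Insulin is a hormone in the body"
response = "Vaccination prevents many diseases"
substring_score, word_score = overlap_scores(doc, response)
print(f"substring: {substring_score:.2f}, word-level: {word_score:.2f}")
```

The response shares no words with the document, yet the substring score is well above zero because "is", "a", and "in" occur inside longer words.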

## Reflection Questions

Answer the following questions about your RAG implementation and its potential applications in healthcare:

### How does the RAG approach improve factual accuracy compared to regular generation?

RAG improves factual accuracy by grounding the model's responses in documents from a curated knowledge base. Instead of relying solely on the language model's learned patterns, RAG first retrieves relevant documents and then generates answers conditioned on that content. This reduces the risk of misinformation and increases reliability, especially for healthcare topics where factual correctness is critical.

### What are potential challenges or limitations of your current implementation?

- If embeddings or the similarity threshold are suboptimal, irrelevant documents may be retrieved, reducing answer accuracy.

- The current implementation compares the query against all embeddings linearly, which can become slow for thousands of documents.

- GPT-style models have a maximum token length, so long contexts or multiple retrieved documents may exceed the limit.
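
The token-limit concern can be handled by packing retrieved passages into a fixed budget before building the prompt. The sketch below is a rough illustration only: it approximates token counts with whitespace-separated words (a real system would count tokens with the model's tokenizer), and the `fit_to_budget` helper and the budget values are assumptions.

```python
def fit_to_budget(retrieved_docs, max_words=60):
    """Keep the highest-scoring passages whose combined word count fits the budget.

    retrieved_docs: list of (content, metadata, score) tuples, assumed sorted
    by score in descending order. Word count is a crude stand-in for tokens.
    """
    selected = []
    used = 0
    for content, meta, score in retrieved_docs:
        n_words = len(content.split())
        if used + n_words > max_words:
            continue  # skip passages that would overflow; shorter ones may still fit
        selected.append((content, meta, score))
        used += n_words
    return selected

# Hypothetical passages of 40, 30, and 15 words
docs = [
    (" ".join(["alpha"] * 40), {"topic": "a"}, 0.9),
    (" ".join(["beta"] * 30), {"topic": "b"}, 0.8),
    (" ".join(["gamma"] * 15), {"topic": "c"}, 0.7),
]
kept = fit_to_budget(docs, max_words=60)
print([meta["topic"] for _, meta, _ in kept])
```

With a 60-word budget, the second passage is skipped because it would overflow, while the shorter third passage still fits.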

### How might you enhance this system for a production healthcare environment?

- Vector database integration: Use FAISS, Milvus, or Pinecone to efficiently scale retrieval to thousands of documents.

- Domain-specific models: Fine-tune embeddings and the generative model on medical corpora (e.g., PubMed, clinical guidelines).

- Context filtering and summarization: Summarize retrieved documents before feeding them to the generator to stay within token limits.

- Human-in-the-loop validation: Include clinician review or fact-checking to verify responses.

- Logging and auditing: Track query-response pairs for traceability and continuous model improvement.

- Security & privacy: Ensure patient data is handled in compliance with HIPAA or GDPR.
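
As a minimal sketch of the logging and auditing idea, the snippet below writes one JSON Lines record per interaction, including which passages were retrieved and their similarity scores. The `log_interaction` helper, the record fields, and the example data are all assumptions. An in-memory `io.StringIO` stands in for an append-mode log file.

```python
import io
import json
from datetime import datetime, timezone

def log_interaction(stream, response_data):
    """Append one query-response record to a JSON Lines stream."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": response_data["query"],
        "response": response_data["response"],
        # Store which passages were used and how strongly they matched
        "sources": [
            {"topic": meta.get("topic"), "subtopic": meta.get("subtopic"),
             "score": round(score, 3)}
            for _content, meta, score in response_data["retrieved_documents"]
        ],
    }
    stream.write(json.dumps(record) + "\n")
    return record

# Hypothetical response in the shape returned by generate_rag_response()
example = {
    "query": "What is hypertension?",
    "response": "Hypertension is consistently high blood pressure.",
    "retrieved_documents": [
        ("Example passage about high blood pressure.",
         {"topic": "hypertension", "subtopic": "overview"}, 0.8123),
    ],
}
log_stream = io.StringIO()  # stand-in for an append-mode file handle
log_interaction(log_stream, example)
print(log_stream.getvalue())
```

Patient queries may themselves contain personal health information, so a log like this would need the same privacy controls as any other patient data.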

### What ethical considerations are particularly important for healthcare content generation?

- Accuracy and reliability: Responses must be factually correct; misinformation could harm patients.

- Patient privacy: Never expose personally identifiable information or medical records.

- Bias mitigation: Ensure the model does not reinforce health disparities or provide unsafe recommendations.

- Transparency: Clearly indicate that responses are AI-generated and may require professional verification.

- Responsibility: Healthcare AI should supplement, not replace, professional medical advice.