# RAG Demo - Instructor Version

## Providing Domain-Specific Context with Multiple LLMs

This notebook demonstrates Retrieval-Augmented Generation (RAG) using both OpenAI and Anthropic models.

### Prerequisites
- Set `OPENAI_API_KEY` environment variable
- Set `ANTHROPIC_API_KEY` environment variable
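
A quick check before running the setup cell can save a confusing authentication error later. The sketch below is one way to do it. `missing_api_keys` and `REQUIRED_KEYS` are hypothetical names, not part of the notebook, and the check only inspects the environment; it does not validate the keys with either provider.

```python
import os
from typing import List, Mapping

# Names taken from the prerequisites list above
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required keys that are unset or empty in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Both API keys are set")
```

If the keys live in a `.env` file, call `load_dotenv()` (as the setup cell does) before running this check.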

## Setup and Configuration

In [None]:
import os
import json
from typing import List, Dict, Any
import numpy as np
from IPython.display import display, Markdown, HTML

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Import AI libraries
from openai import OpenAI
from anthropic import Anthropic

# Initialize clients
openai_client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
anthropic_client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))

print("✅ OpenAI client initialized")
print("✅ Anthropic client initialized")

## Load Product Catalog

In [None]:
# Mock product catalog
PRODUCTS = [
    {
        "name": "LicketySplit Pro",
        "description": "Lightning-fast in-memory caching solution with sub-millisecond latency",
        "features": ["10GB capacity", "LRU eviction", "distributed mode", "Redis compatible"],
        "keywords": ["speed", "fast", "performance", "cache", "memory", "quick", "turbo"],
        "price": "$99/month"
    },
    {
        "name": "Vault-Tec Enterprise",
        "description": "Military-grade encryption for sensitive data protection",
        "features": ["AES-256 encryption", "biometric auth", "SOC2 compliant", "key rotation"],
        "keywords": ["security", "encryption", "protection", "vault", "safe", "secure", "privacy"],
        "price": "$199/month"
    },
    {
        "name": "DreamCloud Manager",
        "description": "Real-time data synchronization across cloud platforms",
        "features": ["Multi-cloud support", "version control", "1TB storage", "automatic backups"],
        "keywords": ["sync", "cloud", "backup", "storage", "synchronization", "replication"],
        "price": "$149/month"
    },
    {
        "name": "DiggityDog Analytics",
        "description": "Stream processing and real-time analytics platform",
        "features": ["Apache Kafka integration", "ML pipelines", "custom dashboards", "alerting"],
        "keywords": ["analytics", "data", "streaming", "metrics", "insights", "dashboard"],
        "price": "$299/month"
    }
]

print(f"Loaded {len(PRODUCTS)} products:")
for p in PRODUCTS:
    print(f"  - {p['name']}: {p['description'][:50]}...")

## Part 1: The Knowledge Gap Problem

Let's demonstrate what happens when LLMs don't have access to your specific product information.

In [None]:
def ask_llm_without_context(question: str, model: str = "gpt-4o") -> str:
    """Ask an LLM with no added context; it has no knowledge of our product catalog."""
    messages = [{"role": "user", "content": question}]

    # Map the short names used in this notebook to provider model IDs
    openai_models = {"gpt-3.5": "gpt-3.5-turbo", "gpt-4o": "gpt-4o"}
    anthropic_models = {
        "claude-haiku": "claude-3-haiku-20240307",
        "claude-sonnet": "claude-sonnet-4-20250514",
    }

    if model in openai_models:
        response = openai_client.chat.completions.create(
            model=openai_models[model],
            messages=messages,
            temperature=0.7
        )
        return response.choices[0].message.content

    if model in anthropic_models:
        response = anthropic_client.messages.create(
            model=anthropic_models[model],
            max_tokens=500,
            messages=messages,
            temperature=0.7
        )
        return response.content[0].text

    # Fail loudly instead of silently returning None for an unrecognized name
    raise ValueError(f"Unknown model: {model!r}")

# Test with multiple models
question = "What features does LicketySplit have? What's the pricing?"

display(Markdown("## Testing Knowledge Gap Across Different Models"))

# Test older models
display(Markdown("### 🤖 GPT-3.5 Turbo (Older Model):"))
gpt35_response = ask_llm_without_context(question, "gpt-3.5")
display(Markdown(f"\n{gpt35_response}\n"))

display(Markdown("### 🤖 GPT-4o (Newer OpenAI Model):"))
gpt4o_response = ask_llm_without_context(question, "gpt-4o")
display(Markdown(f"\n{gpt4o_response}\n"))

display(Markdown("### 🤖 Claude Haiku 3 (Older Model):"))
haiku_response = ask_llm_without_context(question, "claude-haiku")
display(Markdown(f"\n{haiku_response}\n"))

display(Markdown("### 🤖 Claude Sonnet 4 (Newer Anthropic Model):"))
sonnet_response = ask_llm_without_context(question, "claude-sonnet")
display(Markdown(f"\n{sonnet_response}\n"))

display(Markdown("""
⚠️ **Key Observation:** Models lack knowledge of your internal products and proprietary information. RAG provides this domain-specific context.
"""))

## Part 2: Keyword-Based RAG

Now let's implement simple keyword search to retrieve real product information.

### What the keyword search results show

This section demonstrates **Retrieval-Augmented Generation (RAG)** using a simple keyword search for retrieval:

1. **Retrieval:** Find relevant products by counting the tokens the query shares with each product's name, description, features, and keywords.
2. **Context Construction:** Build a context block from the top results.
3. **Generation:** Pass the context to the LLM to answer the question, grounded in real data.

- Results are ranked by the number of overlapping tokens (more overlap = more relevant); `search_products_keyword` breaks ties alphabetically by product name.
- This is the simplest form of RAG: retrieval is deterministic, needs no API call, and is easy to explain.
- Part 3 shows how semantic search can improve retrieval for less obvious matches.
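
To make the overlap score concrete, here is a small standalone sketch of the scoring step. The `tokenize` and `overlap_score` names and the sample text are illustrative assumptions; the notebook's own helpers are defined in the following cell.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its unique alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, document: str) -> int:
    """Count the distinct tokens that appear in both the query and the document."""
    return len(tokenize(query) & tokenize(document))


# Illustrative text loosely modeled on one catalog entry (an assumption)
doc = "Lightning-fast in-memory caching solution: speed fast performance cache"

print(overlap_score("fast cache performance", doc))  # all three query tokens match
print(overlap_score("fast caching", doc))            # both query tokens match
print(overlap_score("quick cached lookups", doc))    # no exact token matches
```

The last query scores zero even though it is clearly about caching: exact-token matching has no notion of synonyms or word forms, which is the gap semantic search is meant to close.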

In [None]:
# Keyword search utilities used in this section
import re
from typing import List, Dict


def _tokenize(text: str) -> set:
    """Lowercase alphanumeric tokenization returning a set of unique tokens."""
    if not text:
        return set()
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_products_keyword(query: str, products: List[Dict], top_k: int = 3) -> List[Dict]:
    """Simple full-text overlap across name, description, features, and keywords."""
    q_tokens = _tokenize(query)
    scored = []

    for prod in products:
        name = prod.get("name", "")
        description = prod.get("description", "")
        features = ", ".join(prod.get("features", []))
        keywords = " ".join(prod.get("keywords", []))
        combined = f"{name} {description} {features} {keywords}"
        p_tokens = _tokenize(combined)
        overlap = len(q_tokens & p_tokens)
        if overlap > 0:
            scored.append((prod, overlap))

    scored.sort(key=lambda x: (-x[1], x[0].get("name", "")))
    return [p for p, _ in scored[:top_k]]


def create_context(products: List[Dict]) -> str:
    """Format selected products into a readable context block, one field per line."""
    lines: List[str] = []
    for p in products:
        lines.append(f"Name: {p.get('name', '')}")
        lines.append(f"Description: {p.get('description', '')}")
        feats = ", ".join(p.get("features", []))
        if feats:
            lines.append(f"Features: {feats}")
        price = p.get("price")
        if price:
            lines.append(f"Price: {price}")
        lines.append("")  # Blank line between products
    return "\n".join(lines).strip()


In [None]:
# Test Keyword Search
for q in ["fast cache performance", "security encryption", "cloud backup"]:
    q_tokens = _tokenize(q)
    scored = []
    for prod in PRODUCTS:
        name = prod.get('name', '')
        description = prod.get('description', '')
        features = ", ".join(prod.get('features', []))
        keywords = " ".join(prod.get('keywords', []))
        combined = f"{name} {description} {features} {keywords}"
        p_tokens = _tokenize(combined)
        overlap = len(q_tokens & p_tokens)
        if overlap > 0:
            scored.append((prod['name'], overlap))
    # Same ordering as search_products_keyword: highest overlap first, ties by name
    scored.sort(key=lambda x: (-x[1], x[0]))

    display(Markdown(f"**Query:** `{q}`"))
    if scored:
        lines = [f"- {name} (overlap: {ov})" for name, ov in scored]
        display(Markdown("Results (highest overlap first):\n" + "\n".join(lines)))
    else:
        display(Markdown("No matches"))
    display(HTML("<hr style='margin: 10px 0;'>"))

In [None]:
def generate_augmented_prompt(question: str, products: List[Dict]):
    """Create an augmented prompt from keyword search results and return it with results."""
    search_results = search_products_keyword(question, products)
    context = create_context(search_results) if search_results else "No product information available."

    prompt = f"""Based on the following product information:

{context}

Question: {question}

Answer based only on the provided information. If the information doesn't answer the question, say so."""
    return prompt, search_results


def ask_llm(prompt: str, model: str = "openai"):
    """Call the selected LLM with the already-constructed prompt."""
    if model == "openai":
        response = openai_client.chat.completions.create(
            model="gpt-5-mini-2025-08-07",
            messages=[
                {"role": "system", "content": "You are a helpful assistant. Answer based only on the provided context."},
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content

    elif model == "anthropic":
        response = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
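        return response.content[0].text

    # Fail loudly instead of silently returning None for an unrecognized name
    raise ValueError(f"Unknown model: {model!r}")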

# Demo: generate the augmented prompt first, print it, then ask the LLMs
question = "What features does LicketySplit have? What's the pricing?"

display(Markdown("## With RAG - Domain-Specific Responses"))

augmented_prompt, _results = generate_augmented_prompt(question, PRODUCTS)

# Print the augmented prompt as a Markdown code block for perfect formatting
prompt_md = f"""### Augmented Prompt\n\n```text\n{augmented_prompt}\n```"""
display(Markdown(prompt_md))

display(Markdown("### ✅ GPT-5 mini with RAG:"))
gpt_rag = ask_llm(augmented_prompt, "openai")
display(Markdown(f"\n{gpt_rag}\n"))

display(Markdown("### ✅ Claude with RAG:"))
claude_rag = ask_llm(augmented_prompt, "anthropic")
display(Markdown(f"\n{claude_rag}\n"))

display(Markdown("✅ **These responses are grounded in your actual product data, not in the model's guesses.**"))

## Part 3: Semantic Search (Advanced)

This section demonstrates **RAG with semantic search**:

1. **Retrieval:** Find relevant products using vector similarity (embeddings) to capture meaning, not just keywords.
2. **Context Construction:** Build a context block from the most semantically similar products.
3. **Generation:** Pass the context to the LLM for a grounded answer.

- Semantic search can find related concepts even without exact keyword matches.
- This approach is more robust for natural language queries, but it requires an embedding model (here, an OpenAI API call per query) and costs more than keyword matching.
- Compare the results to the keyword search to see the difference in retrieval quality.
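
Cosine similarity is the piece that turns embeddings into a ranking. The sketch below uses tiny hand-made 3-dimensional vectors in place of real embeddings, so it runs offline. This is an assumption for illustration only: actual embedding vectors come from the API and have far more dimensions, and the names `cosine`, `query`, and `docs` are hypothetical.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy "embeddings"; the axes have no real meaning
query = np.array([1.0, 0.0, 0.0])
docs = {
    "same direction": np.array([2.0, 0.0, 0.0]),
    "partly related": np.array([1.0, 1.0, 0.0]),
    "unrelated": np.array([0.0, 0.0, 3.0]),
}

# Rank documents from most to least similar to the query
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
for name in ranked:
    print(f"{name}: {cosine(query, docs[name]):.3f}")
```

Because the score depends only on direction, scaling a vector does not change it, which is why `[2, 0, 0]` matches the query `[1, 0, 0]` perfectly.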

In [None]:
def get_embedding(text: str, model: str = "text-embedding-3-small"):
    """Get embedding from OpenAI"""
    response = openai_client.embeddings.create(
        input=text,
        model=model
    )
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Generate embeddings for all products
print("Generating embeddings for products...")
for product in PRODUCTS:
    text = f"{product['name']} {product['description']}"
    product['embedding'] = get_embedding(text)
print("✅ Embeddings generated")

def semantic_search(query: str, products: List[Dict], top_k: int = 2) -> List[Dict]:
    """Search using semantic similarity"""
    query_embedding = get_embedding(query)
    
    similarities = []
    for product in products:
        sim = cosine_similarity(query_embedding, product['embedding'])
        similarities.append((product, sim))
    
    similarities.sort(key=lambda x: x[1], reverse=True)
    return [s[0] for s in similarities[:top_k]]

In [None]:
# Compare keyword vs semantic search
semantic_queries = [
    ("high-speed data access", "LicketySplit Pro"),       # Fast cache
    ("protect sensitive information", "Vault-Tec Enterprise"),  # Security
    ("cloud backup", "DreamCloud Manager"),               # Backup/cloud
    ("real-time insights", "DiggityDog Analytics")        # Analytics
]

display(Markdown("## Keyword vs Semantic Search Comparison"))

for query, expected in semantic_queries:
    display(Markdown(f"### Query: `{query}`"))
    display(Markdown(f"**Expected:** {expected}"))
    
    # Keyword search (simple full-text overlap)
    keyword_results = search_products_keyword(query, PRODUCTS, top_k=3)
    keyword_names = [r['name'] for r in keyword_results] if keyword_results else ["No matches"]
    
    # Semantic search (requires embeddings/API)
    semantic_results = semantic_search(query, PRODUCTS, top_k=3)
    semantic_names = [s['name'] for s in semantic_results] if semantic_results else ["No matches"]
    
    # Display results as a Markdown table (no code block, no leading spaces)
    table_md = (
        "| Method | Top 3 Results | Status |\n"
        "|--------|----------------|--------|\n"
        f"| Keyword Search | {', '.join(keyword_names)} | {'✅' if expected in keyword_names else '❌'} |\n"
        f"| Semantic Search | {', '.join(semantic_names)} | {('✅ Found via meaning!' if expected not in keyword_names else '✅') if expected in semantic_names else '❌'} |"
    )
    display(Markdown(table_md))
    
    display(HTML("<hr style='margin: 20px 0;'>"))


## Part 4: Comparing Model Performance with RAG

In [None]:
def compare_models_with_semantic_rag(question: str, products: List[Dict]):
    """Compare different models using semantic RAG"""
    
    # Use semantic search for better results
    search_results = semantic_search(question, products, top_k=2)
    context = create_context(search_results)
    
    prompt = f"""Based on the following product information:

{context}

Question: {question}

Provide a detailed answer based only on the provided information."""
    
    results = {}
    
    # Test GPT-3.5
    try:
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer based only on the provided context."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3
        )
        results['GPT-3.5 Turbo'] = response.choices[0].message.content
    except Exception as e:
        results['GPT-3.5 Turbo'] = f"Error: {e}"
    
    # Test GPT-4o
    try:
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer based only on the provided context."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3
        )
        results['GPT-4o'] = response.choices[0].message.content
    except Exception as e:
        results['GPT-4o'] = f"Error: {e}"
    
    # Test Claude Haiku
    try:
        response = anthropic_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        results['Claude Haiku 3'] = response.content[0].text
    except Exception as e:
        results['Claude Haiku 3'] = f"Error: {e}"
    
    # Test Claude Sonnet
    try:
        response = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )
        results['Claude Sonnet 4'] = response.content[0].text
    except Exception as e:
        results['Claude Sonnet 4'] = f"Error: {e}"
    
    return results

# Compare models
question = "What's the best solution for high-performance data caching with Redis compatibility?"

display(Markdown(f"## Model Comparison with Semantic RAG\n\n**Question:** {question}"))
display(Markdown("*With RAG, every model answers from the same retrieved product data*\n"))

model_responses = compare_models_with_semantic_rag(question, PRODUCTS)

for model, response in model_responses.items():
    display(Markdown(f"### {model}:"))
    display(Markdown(f"\n{response}\n"))
    display(HTML("<hr style='margin: 20px 0;'>"))