# Demo: Simple RAG System

**NLP Final Lecture - Live Demo**

This notebook demonstrates a basic Retrieval-Augmented Generation (RAG) pipeline:
1. Load documents
2. Create embeddings and store them in a simple in-memory index (a Python list standing in for a vector database)
3. Retrieve relevant chunks for a query
4. Generate grounded response

In [None]:
# Install required packages (run once)
# !pip install openai numpy

In [None]:
import os
from openai import OpenAI

# Set your API key here, or export OPENAI_API_KEY in your shell; OpenAI() reads it from the environment
# os.environ["OPENAI_API_KEY"] = "your-key-here"

client = OpenAI()
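
If you want to exercise the retrieval steps without an API key, a deterministic stand-in for `get_embedding` can be built from hashed word counts. This is a hypothetical helper (`toy_embedding`, with an assumed size of 256 dimensions), not part of the OpenAI API. It only captures word overlap, not meaning, so it is useful for testing the plumbing rather than for judging retrieval quality.

```python
import hashlib
import math
import re

def toy_embedding(text, dim=256):
    """Hypothetical offline stand-in for get_embedding(): a hashed bag-of-words vector.

    Each lowercase word is hashed into one of `dim` buckets, and the count vector is
    L2-normalized so it has unit length (OpenAI's embeddings are also unit length).
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a hash that is stable across runs; Python's built-in hash() is salted per process
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty text has no words, so return the all-zero vector instead of dividing by zero
    return [v / norm for v in vec] if norm else vec
```

To run Steps 2 and 3 offline, you could replace the body of `get_embedding` with `return toy_embedding(text)` and skip the `client = OpenAI()` line, which expects a key. Similarity scores will differ from the OpenAI model's, and Step 4 still needs the API for generation.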

## Step 1: Sample Documents

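For this demo, we'll use a few recent AI news items, mostly from after the training cutoff of the generation model (Step 4 uses `gpt-4o-mini`), so correct answers have to come from the retrieved text rather than from the model's memory.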

In [None]:
# Sample documents about recent events (post-training cutoff)
documents = [
    {
        "title": "DeepSeek-R1 Release",
        "content": """In January 2025, DeepSeek released R1, an open-source reasoning model 
        that matches OpenAI o1's performance on many reasoning benchmarks. The model was trained
        with reinforcement learning using GRPO (Group Relative Policy Optimization), which
        estimates each answer's advantage relative to the average reward of a group of answers
        sampled for the same prompt, so it doesn't require a separate critic (value) model.
        During RL training, the precursor model DeepSeek-R1-Zero improved from 15.6% to 71%
        pass@1 on AIME 2024 math problems. The company released distilled versions ranging
        from 1.5B to 70B parameters."""
    },
    {
        "title": "OpenAI o1 Capabilities",
        "content": """OpenAI's o1 model, released in September 2024, introduced a new paradigm 
        of test-time compute scaling. Unlike previous models that generate answers directly, 
        o1 'thinks' before responding, using hidden reasoning tokens. This allows the model 
        to solve complex problems that require multi-step reasoning. The o1-pro version
        spends more compute on reasoning to improve answers on the hardest problems."""
    },
    {
        "title": "Anthropic Claude 3.5",
        "content": """Anthropic released Claude 3.5 Sonnet in 2024, which became known for 
        its strong coding abilities and its long (200K-token) context window. Claude uses Constitutional AI
        for alignment, where the model critiques its own outputs against a set of principles 
        rather than relying solely on human feedback. This approach allows for more scalable 
        alignment training."""
    },
    {
        "title": "AI Agent Frameworks 2025",
        "content": """By early 2025, AI agent frameworks have matured significantly. 
        LangChain and LlamaIndex remain popular choices for building RAG applications. 
        Microsoft's AutoGen enables multi-agent collaboration where specialized agents 
        work together on complex tasks. CrewAI focuses on role-based agent orchestration. 
        The main challenges remain reliability and cost management for production deployments."""
    }
]

print(f"Loaded {len(documents)} documents")
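
The sample documents are short enough to embed whole, so each one acts as a single chunk. Longer sources are usually split into overlapping chunks before embedding, so that a retrieved passage fits in the prompt and stays on one topic. A minimal sketch, assuming word-based windows; the helper name `chunk_words` and the `chunk_size=100` / `overlap=20` defaults are illustrative, and production systems often count tokens rather than words.

```python
def chunk_words(text, chunk_size=100, overlap=20):
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so the tail is not repeated in a tiny extra chunk
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk could then be stored as its own entry in `documents`, keeping the parent title so that citations still point back to the original source.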

## Step 2: Create Embeddings

In [None]:
def get_embedding(text, model="text-embedding-3-small"):
    """Get embedding for a text using OpenAI API."""
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

# Create embeddings for all documents
for doc in documents:
    doc["embedding"] = get_embedding(doc["content"])
    print(f"Embedded: {doc['title']} ({len(doc['embedding'])} dimensions)")
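
The loop above makes one API request per document. The embeddings endpoint also accepts a list of strings as `input` and returns one embedding per item, each tagged with the position of its input in `index`. A sketch of a batched helper, assuming the same `client` object is passed in; the name `get_embeddings_batch` is hypothetical.

```python
def get_embeddings_batch(client, texts, model="text-embedding-3-small"):
    """Embed several texts in a single request; returns vectors in the same order as `texts`."""
    response = client.embeddings.create(input=list(texts), model=model)
    # Each item records the index of the input it belongs to; sorting makes the order explicit
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]
```

In this notebook it could be called as `get_embeddings_batch(client, [doc["content"] for doc in documents])`, with each returned vector assigned back to its document via `zip`. The endpoint limits how many inputs and tokens a single request may contain, so large corpora still need to be sent in several batches.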

## Step 3: Simple Vector Search
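
Step 3 ranks documents by cosine similarity, the cosine of the angle between the query embedding $\mathbf{q}$ and a document embedding $\mathbf{d}$ (symbols introduced here for illustration). This is what `cosine_similarity` below computes:

$$\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2} \, \sqrt{\sum_i d_i^2}}$$

Scores range from $-1$ to $1$, and higher means the vectors point in more similar directions. For example, with $\mathbf{q} = (1, 0)$ and $\mathbf{d} = (0.6, 0.8)$, both vectors have length 1, so the score is just the dot product: $1 \cdot 0.6 + 0 \cdot 0.8 = 0.6$.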

In [None]:
import numpy as np

def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(query, documents, top_k=2):
    """Retrieve most similar documents for a query."""
    query_embedding = get_embedding(query)
    
    # Calculate similarities
    similarities = []
    for doc in documents:
        sim = cosine_similarity(query_embedding, doc["embedding"])
        similarities.append((doc, sim))
    
    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return similarities[:top_k]
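
OpenAI's embedding vectors are normalized to length 1, so for them the denominator in `cosine_similarity` equals 1 and a plain dot product gives the same scores and ranking. A dependency-free sketch of the ranking step in `retrieve`, assuming precomputed unit-length vectors; `top_k_by_dot` is a hypothetical name, and `heapq.nlargest` avoids sorting the full list when only `k` results are needed.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k_by_dot(query_vec, docs, k=2):
    """Return the k (doc, score) pairs with the highest dot product against query_vec.

    This matches cosine ranking only when every vector is already unit length.
    """
    scored = ((doc, dot(query_vec, doc["embedding"])) for doc in docs)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For vectors that are not normalized (for example, from a different embedding model), keep the full cosine formula or normalize the vectors once when they are stored.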

In [None]:
# Test retrieval
query = "What is GRPO and how does it work?"

results = retrieve(query, documents, top_k=2)

print(f"Query: {query}\n")
print("Retrieved documents:")
for doc, score in results:
    print(f"  [{score:.3f}] {doc['title']}")

## Step 4: Generate Grounded Response

In [None]:
def generate_rag_response(query, documents, top_k=2):
    """Generate a response using RAG."""
    # Retrieve relevant documents
    retrieved = retrieve(query, documents, top_k=top_k)
    
    # Build context
    context = "\n\n".join([
        f"[Source: {doc['title']}]\n{doc['content']}" 
        for doc, _ in retrieved
    ])
    
    # Create prompt
    prompt = f"""Use the following context to answer the question. 
If the answer is not in the context, say "I don't have information about that."

Context:
{context}

Question: {query}

Answer (cite your sources):"""
    
    # Generate response
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    
    return response.choices[0].message.content, retrieved
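
`generate_rag_response` packs the instructions, context, and question into a single user message. A common variant puts the grounding rules in a `system` message and numbers the sources so that citations are short and unambiguous. A sketch, assuming the `(doc, score)` pairs returned by `retrieve`; `build_messages` is a hypothetical helper, and its result could be passed as `messages=` to `client.chat.completions.create`.

```python
def build_messages(query, retrieved):
    """Build a chat message list: grounding rules in the system message, numbered sources in the user message."""
    context = "\n\n".join(
        f"[{i}] {doc['title']}\n{doc['content']}"
        for i, (doc, _score) in enumerate(retrieved, start=1)
    )
    system = (
        "Answer using only the numbered sources provided. "
        "Cite sources by number, e.g. [1]. "
        "If the sources do not contain the answer, say \"I don't have information about that.\""
    )
    user = f"Sources:\n{context}\n\nQuestion: {query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```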

In [None]:
# Demo: Question about recent events
query = "What is GRPO and which model uses it?"

answer, sources = generate_rag_response(query, documents)

print(f"Query: {query}\n")
print(f"Answer:\n{answer}\n")
print("Sources used:")
for doc, score in sources:
    print(f"  - {doc['title']} (similarity: {score:.3f})")
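
The prompt asks the model to cite its sources, and the context labels each passage `[Source: <title>]`. A rough check of which retrieved titles the answer actually mentions can flag uncited answers. This is a hypothetical heuristic (`cited_titles`) based on plain substring matching, so it will miss citations the model paraphrases.

```python
def cited_titles(answer, retrieved):
    """Return titles of retrieved documents that appear verbatim (case-insensitive) in the answer."""
    lowered = answer.lower()
    return [doc["title"] for doc, _score in retrieved if doc["title"].lower() in lowered]
```

Calling `cited_titles(answer, sources)` after the cell above would list the sources the model named; an empty list suggests the answer was not explicitly grounded.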

In [None]:
# Demo: Question NOT in the documents
query = "What is the capital of France?"

answer, sources = generate_rag_response(query, documents)

print(f"Query: {query}\n")
print(f"Answer:\n{answer}")
print("\n(Expected: the model should say the answer is not in the context; check the output above)")
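
`retrieve` always returns `top_k` documents, so even the France question sends two unrelated passages to the model and relies on the prompt instruction to refuse. A minimum-similarity cutoff can drop weak matches before generation. A sketch, assuming the `(doc, score)` pairs from `retrieve`; the `filter_by_score` name and the `min_score=0.3` default are illustrative guesses, since useful thresholds depend on the embedding model and should be tuned on real queries.

```python
def filter_by_score(retrieved, min_score=0.3):
    """Keep only (doc, score) pairs whose similarity is at least min_score.

    The 0.3 default is an illustrative placeholder, not a recommended value.
    """
    return [(doc, score) for doc, score in retrieved if score >= min_score]
```

If the filtered list comes back empty, the pipeline can return the fallback message ("I don't have information about that.") without calling the LLM at all, which also saves a request.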

In [None]:
# Interactive demo
print("Try your own questions about:")
print("- DeepSeek-R1")
print("- OpenAI o1")
print("- Constitutional AI")
print("- AI Agent frameworks")

# Uncomment to use:
# your_query = input("Your question: ")
# answer, sources = generate_rag_response(your_query, documents)
# print(f"\nAnswer: {answer}")

## Key Takeaways

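1. **RAG separates knowledge from reasoning**: The LLM reasons over the prompt, while the document store supplies the facts
2. **Embeddings enable semantic search**: Texts with similar meaning map to vectors with high cosine similarity
3. **Grounding can reduce hallucination**: The prompt asks the model to cite sources and to say when the context lacks the answer, though the model may not always comply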
4. **Simple implementation**: Core RAG is just embed -> search -> generate