# Retrieval Augmented Generation (RAG) with Claude

This notebook demonstrates how to build a simple RAG (Retrieval Augmented Generation) system using:
- **FAISS** for fast vector similarity search
- **TF-IDF** for document vectorization
- **Claude** for generating answers based on retrieved context

## What is RAG?

RAG combines retrieval and generation:
1. **Retrieve** relevant documents from a knowledge base
2. **Augment** the prompt with retrieved context
3. **Generate** an answer using an LLM

This lets the model answer questions using information that was not in its training data, such as private or recently updated documents.
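
As a rough illustration of how the three steps fit together, here is a minimal sketch in plain Python. The names `run_rag`, `toy_retrieve`, and `echo_generate` are hypothetical stand-ins invented for this example; the real retriever and the Claude call are built later in the notebook.

```python
def run_rag(query, retrieve_fn, generate_fn):
    """Retrieve context, build an augmented prompt, and pass it to a generator."""
    context_docs = retrieve_fn(query)  # 1. Retrieve
    prompt = f"Context: {' '.join(context_docs)}\n\nQuestion: {query}"  # 2. Augment
    return generate_fn(prompt)  # 3. Generate


# Hypothetical stand-ins so the sketch runs without a model or an index.
def toy_retrieve(query):
    return ["The capital of France is Paris."]


def echo_generate(prompt):
    return f"(a model would answer from) {prompt}"


print(run_rag("What is the capital of France?", toy_retrieve, echo_generate))
```

Passing the retriever and generator in as functions keeps the three stages separate, so either one can be swapped out without touching the other.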

## Step 1: Install Dependencies

In [None]:
%pip install anthropic python-dotenv faiss-cpu scikit-learn

## Step 2: Set Up the Knowledge Base and Retriever

We'll create a simple knowledge base with sample documents and use TF-IDF + FAISS for retrieval.
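
To make the retrieval step less of a black box, the sketch below reimplements the idea in plain Python: build TF-IDF vectors, then pick the document with the smallest squared L2 distance to the query. It is an approximation written for illustration, assuming scikit-learn's default `TfidfVectorizer` settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalization) and an exhaustive search like the one a flat L2 index performs. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Term count times IDF, scaled to unit length; unknown terms are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def squared_l2(a, b):
    # Squared Euclidean distance between two sparse vectors stored as dicts.
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def brute_force_search(query, docs, top_n=1):
    # Compare the query against every document and keep the closest ones.
    idf = fit_idf(docs)
    doc_vectors = [tfidf_vector(d, idf) for d in docs]
    query_vector = tfidf_vector(query, idf)
    ranked = sorted(range(len(docs)), key=lambda i: squared_l2(query_vector, doc_vectors[i]))
    return [docs[i] for i in ranked[:top_n]]


toy_docs = [
    "Python is a programming language.",
    "The capital of France is Paris.",
    "Machine learning learns from data.",
]
print(brute_force_search("What is the capital of France?", toy_docs))
```

Because the vectors are unit length, ranking by squared L2 distance gives the same order as ranking by cosine similarity. Note that the match depends entirely on shared words; a query with no words in common with any document has nothing to rank on.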

In [None]:
import faiss
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample documents (small knowledge base)
documents = [
    "Python is a programming language known for its simplicity and readability.",
    "The capital of France is Paris, known for the Eiffel Tower.",
    "Claude is an AI assistant created by Anthropic, focused on being helpful and harmless.",
    "The Great Wall of China is a historic landmark spanning over 13,000 miles.",
    "Machine learning is a subset of artificial intelligence that learns from data.",
    "DocuSign is a company that provides electronic signature technology.",
    "RAG stands for Retrieval Augmented Generation, a technique to enhance LLM responses."
]

# Convert documents into TF-IDF vectors
vectorizer = TfidfVectorizer()
document_vectors = vectorizer.fit_transform(documents).toarray()

# Index the document vectors with FAISS (IndexFlatL2 runs an exact, brute-force L2 search)
index = faiss.IndexFlatL2(document_vectors.shape[1])
index.add(document_vectors.astype(np.float32))  # FAISS expects float32 input

print(f"Indexed {len(documents)} documents")
print(f"Vector dimension: {document_vectors.shape[1]}")

In [None]:
def retrieve(query, top_n=1):
    """
    Retrieve the most relevant documents for a query.
    
    Args:
        query: The user's question
        top_n: Number of documents to retrieve
    
    Returns:
        List of relevant documents
    """
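    query_vector = vectorizer.transform([query]).toarray().astype(np.float32)
    # FAISS pads missing results with -1 when top_n exceeds the index size,
    # which would silently return documents[-1], so cap top_n first
    top_n = min(top_n, len(documents))
    _, indices = index.search(query_vector, top_n)
    return [documents[i] for i in indices[0]]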

# Test retrieval
test_query = "What is the capital of France?"
retrieved = retrieve(test_query)
print(f"Query: {test_query}")
print(f"Retrieved: {retrieved}")

## Step 3: Set Up Claude Client

In [None]:
import os
from dotenv import load_dotenv
import anthropic

# Load environment variables from .env file
load_dotenv()

# Get API key from environment variable
api_key = os.getenv("ANTHROPIC_API_KEY")

# Initialize the Anthropic client
client = anthropic.Anthropic(api_key=api_key)

# Choose your Claude model
model = "claude-3-7-sonnet-20250219"

print(f"Using model: {model}")
print(f"API key configured: {'Yes' if api_key else 'No - please set ANTHROPIC_API_KEY'}")

## Step 4: Create the RAG Pipeline

The RAG pipeline:
1. Takes a user query
2. Retrieves relevant documents
3. Sends the query + context to Claude
4. Returns the generated answer

In [None]:
def rag_pipeline(query, top_n=1):
    """
    Run the full RAG pipeline: retrieve context and generate answer.
    
    Args:
        query: The user's question
        top_n: Number of documents to retrieve for context
    
    Returns:
        The generated answer
    """
    # Step 1: Retrieve relevant documents
    retrieved_docs = retrieve(query, top_n=top_n)
    context = " ".join(retrieved_docs)
    print(f"Retrieved Document(s): {context}")
    print("-" * 50)
    
    # Step 2: Create the prompt with context
    user_prompt = f"""Based on the following context, answer the question.

Context: {context}

Question: {query}

Answer:"""
    
    # Step 3: Generate answer using Claude
    response = client.messages.create(
        model=model,
        max_tokens=200,
        temperature=0.3,  # Lower temperature for factual answers
        system="You are an AI assistant that answers questions based only on the provided context. If the context doesn't contain enough information to answer, say so.",
        messages=[
            {"role": "user", "content": user_prompt}
        ]
    )
    
    return response.content[0].text

## Step 5: Test the RAG System

In [None]:
# Test 1: Question with relevant context in knowledge base
query1 = "What is the capital of France?"
print(f"Question: {query1}\n")
response1 = rag_pipeline(query1)
print(f"\nAnswer: {response1}")

In [None]:
# Test 2: Question about technology
query2 = "What is Python?"
print(f"Question: {query2}\n")
response2 = rag_pipeline(query2)
print(f"\nAnswer: {response2}")

In [None]:
# Test 3: Question about Claude
query3 = "Who created Claude?"
print(f"Question: {query3}\n")
response3 = rag_pipeline(query3)
print(f"\nAnswer: {response3}")

In [None]:
# Test 4: Question NOT in knowledge base
query4 = "What is the population of Tokyo?"
print(f"Question: {query4}\n")
response4 = rag_pipeline(query4)
print(f"\nAnswer: {response4}")
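
Test 4 asks about something outside the knowledge base, yet a nearest-neighbor search still returns its closest documents, however far away they are. The search call also returns distances, which `retrieve` does not use. One possible refinement, sketched below, is to drop results beyond a distance cutoff. The helper `filter_by_distance`, the `max_distance` value, and the sample distances are all hypothetical; a real cutoff would need tuning on your own data.

```python
def filter_by_distance(docs, distances, max_distance=1.0):
    """Keep only documents whose distance is within max_distance."""
    return [doc for doc, dist in zip(docs, distances) if dist <= max_distance]


# Made-up distances for illustration only.
candidate_docs = [
    "The capital of France is Paris, known for the Eiffel Tower.",
    "Python is a programming language known for its simplicity and readability.",
]
candidate_distances = [0.35, 1.62]

print(filter_by_distance(candidate_docs, candidate_distances))
```

If the filtered list comes back empty, the pipeline could skip the context entirely and let the system prompt's "say so" instruction handle the answer.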

## Step 6: RAG with Multiple Documents

Broad questions often need more than one document. Let's retrieve the top 2 documents to give Claude more context.
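
When several documents are retrieved, `rag_pipeline` joins them with a space, so the boundaries between documents are lost in the prompt. A small optional variation is to number each document on its own line. The helper name `format_context` is hypothetical and is not used elsewhere in this notebook.

```python
def format_context(docs):
    """Render retrieved documents as a numbered list, one per line."""
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))


example_docs = [
    "Machine learning is a subset of artificial intelligence that learns from data.",
    "Claude is an AI assistant created by Anthropic, focused on being helpful and harmless.",
]
print(format_context(example_docs))
```

To try it, you could replace `" ".join(retrieved_docs)` in `rag_pipeline` with `format_context(retrieved_docs)`.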

In [None]:
# Retrieve top 2 documents for more context
query5 = "Tell me about AI and machine learning"
print(f"Question: {query5}\n")
response5 = rag_pipeline(query5, top_n=2)
print(f"\nAnswer: {response5}")

## Step 7: Compare RAG vs Direct Query

Let's compare asking Claude directly with asking through the RAG pipeline and our knowledge base.

In [None]:
def direct_query(query):
    """Ask Claude directly without RAG context."""
    response = client.messages.create(
        model=model,
        max_tokens=200,
        temperature=0.3,
        messages=[
            {"role": "user", "content": query}
        ]
    )
    return response.content[0].text

# Compare responses
test_question = "What does DocuSign do?"

print("=" * 50)
print("DIRECT QUERY (no RAG)")
print("=" * 50)
print(direct_query(test_question))

print("\n" + "=" * 50)
print("RAG QUERY (with retrieved context)")
print("=" * 50)
print(rag_pipeline(test_question))

## Summary

In this notebook, we built a simple RAG system that:

1. **Indexes documents** using TF-IDF vectors and FAISS
2. **Retrieves relevant documents** based on query similarity
3. **Generates answers** using Claude with retrieved context

### Key Concepts:

| Component | Purpose |
|-----------|----------|
| **TF-IDF** | Converts text to numerical vectors |
| **FAISS** | Fast similarity search for retrieval |
| **Claude** | Generates natural language answers |
| **System Prompt** | Constrains answers to provided context |

### When to Use RAG:

- When you have domain-specific knowledge that is not in the LLM's training data
- When you need up-to-date information
- When you need answers grounded in specific documents
- When you want to reduce hallucinations

### Improvements for Production:

- Use dense **embeddings** from an embedding model instead of TF-IDF, which only matches on shared words, for better semantic search
- Use a **vector database** (e.g., Pinecone, Weaviate) for larger document collections
- Implement **chunking** for long documents
- Add **reranking** to improve retrieval quality
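
As one example of these improvements, here is a minimal chunking sketch. It splits on whitespace and overlaps consecutive chunks so that a sentence cut at a boundary still appears whole in one of them. The function name `chunk_text` and the default sizes are assumptions made for illustration; production systems often chunk by tokens, sentences, or document structure instead.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample_text = " ".join(f"word{i}" for i in range(12))
for chunk in chunk_text(sample_text, chunk_size=5, overlap=2):
    print(chunk)
```

Each chunk would then be indexed as its own entry in `documents`, so retrieval returns the relevant passage instead of a whole long document.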