# How to Build a Claude-Powered RAG from Scratch

**RAG (Retrieval-Augmented Generation)** is a technique for improving the output of a Large Language Model (LLM): before generating a response, the model is given relevant passages retrieved from an authoritative knowledge base outside its training data.

In this notebook, we will build a small end-to-end RAG system using:
1.  **Python** for logic.
2.  **ChromaDB** as our Vector Store.
3.  **Sentence-Transformers** for creating Embeddings.
4.  **Claude (via OpenRouter)** as our Generative AI.

---

## Step 1: Install Dependencies

We need `openai`, `chromadb`, `sentence-transformers`, `numpy`, `pypdf`, `requests`, and `langchain-text-splitters`.

In [None]:
!pip install openai chromadb sentence-transformers numpy pypdf requests langchain-text-splitters

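## Step 2: Setup Claude via OpenRouter

OpenRouter exposes an OpenAI-compatible API, so we use the `openai` client and point its `base_url` at OpenRouter.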

In [None]:
import os
from openai import OpenAI
import getpass

# Enter your OpenRouter API Key
OPENROUTER_API_KEY = getpass.getpass("Enter your OpenRouter API Key: ")

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

def chat_with_claude(prompt, system_prompt="You are a helpful assistant."):
    completion = client.chat.completions.create(
        model="anthropic/claude-3-haiku",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
    )
    return completion.choices[0].message.content
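
If you rerun the notebook often, typing the key each time gets tedious. The following is a minimal sketch, not part of the original setup, that reads the key from an environment variable and only prompts when it is missing. The helper name `load_api_key` and the environment variable name `OPENROUTER_API_KEY` are assumptions chosen for illustration.

```python
import os
import getpass


def load_api_key(env_var="OPENROUTER_API_KEY", prompt_fn=getpass.getpass):
    """Return the key from the environment, prompting only if it is absent.

    `env_var` and this helper are hypothetical names used for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt with hidden input.
        key = prompt_fn(f"Enter your OpenRouter API Key ({env_var} is not set): ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key
```

The result can be passed to `OpenAI(api_key=...)` in place of the value read directly from `getpass.getpass`.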

## Step 3: Create a Knowledge Base (Ingestion & Chunking)

We will load a PDF and split it into chunks.

In [None]:
import requests
from pypdf import PdfReader

# Download example PDF
pdf_url = "https://arxiv.org/pdf/1706.03762.pdf" # The "Attention Is All You Need" paper
pdf_path = "document.pdf"

if not os.path.exists(pdf_path):
    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()  # Fail early instead of saving an error page as a PDF
    with open(pdf_path, "wb") as f:
        f.write(response.content)
    print("PDF downloaded")
else:
    print("PDF already exists")

# Read PDF
reader = PdfReader(pdf_path)
text_corpus = []
for page in reader.pages:
    text = page.extract_text()
    if text:
        text_corpus.append(text)

print(f"Knowledge Base loaded with {len(text_corpus)} pages.")

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
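    chunk_size=1000,   # Maximum characters per chunk; larger chunks carry more context
    chunk_overlap=100  # Characters shared by neighboring chunks so ideas are not cut at a boundary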
)

chunks = splitter.split_text("\n".join(text_corpus))
print(f"Split into {len(chunks)} chunks.")
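
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is only an illustration of the sliding-window idea: `RecursiveCharacterTextSplitter` tries to split on natural separators such as paragraph and line breaks first, so its chunks will not match this sketch. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reaches the end of the text
    return windows


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in demo_chunks:
    print(piece)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each window starts 7 characters after the previous one, so the last 3 characters of one chunk reappear at the start of the next.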

## Step 4: Embeddings & Vector Store (ChromaDB)

We will use `SentenceTransformer` to create embeddings and `ChromaDB` to store them.

In [None]:
import chromadb
from sentence_transformers import SentenceTransformer
import uuid

# 1. Initialize Embedding Model
print("Loading embedding model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# 2. Initialize ChromaDB (in-memory; use chromadb.PersistentClient(path=...) to persist to disk)
chroma_client = chromadb.Client()

# Delete the collection if it exists so reruns start from a clean state
try:
    chroma_client.delete_collection(name="rag_experiment")
except Exception:
    # The collection does not exist yet on a first run
    pass

collection = chroma_client.create_collection(name="rag_experiment")

# 3. Add chunks to ChromaDB
print("Embedding and adding documents to ChromaDB...")

# Chroma expects lists
ids = [str(uuid.uuid4()) for _ in chunks]
embeddings = embedding_model.encode(chunks).tolist()

collection.add(
    documents=chunks,
    embeddings=embeddings,
    ids=ids
)

print("Vector Store ready!")
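
Conceptually, a vector store ranks stored embeddings by how close they are to a query embedding. The sketch below shows that idea with brute-force cosine similarity over tiny hand-made 3-dimensional vectors. The vectors, ids, and function names are invented for illustration; ChromaDB itself uses an approximate nearest-neighbor index and its own configured distance metric, so its scores will differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Rank stored vectors (dict of id -> vector) by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones from all-MiniLM-L6-v2 have far more dimensions.
toy_store = {
    "attention": [0.9, 0.1, 0.0],
    "optimizer": [0.0, 0.2, 0.9],
    "encoder": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]
ranking = top_k(toy_query, toy_store, k=2)
print([doc_id for doc_id, _score in ranking])  # ['attention', 'encoder']
```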

## Step 5: Retrieval Function

In [None]:
def retrieve_relevant_docs(query, n_results=3):
    # 1. Embed Query
    query_embedding = embedding_model.encode([query]).tolist()
    
    # 2. Query Chroma
    results = collection.query(
        query_embeddings=query_embedding,
        n_results=n_results
    )
    
    # 3. Extract Documents
    return results['documents'][0] # List of documents

# Test retrieval
query = "What is self-attention?"
docs = retrieve_relevant_docs(query)
print(f"Retrieved {len(docs)} documents for query: '{query}'")
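
`collection.query` returns one inner list per query embedding, which is why the function above indexes `['documents'][0]`. The sketch below pairs each retrieved chunk with its distance so you can judge how good a match is. It assumes the result also contains a `distances` field with the same nesting (we believe Chroma includes it by default, but treat that as an assumption). The `example_results` dictionary is a hand-written stand-in, not real output, and `pair_docs_with_distances` is a hypothetical helper.

```python
def pair_docs_with_distances(results):
    """Flatten the first query's hits into (document, distance) tuples."""
    documents = results["documents"][0]
    distances = results["distances"][0]
    return list(zip(documents, distances))


# Hand-written stand-in that mimics the nested shape of a query result.
example_results = {
    "ids": [["id-1", "id-2"]],
    "documents": [["chunk about attention", "chunk about encoders"]],
    "distances": [[0.42, 0.87]],
}
for doc, dist in pair_docs_with_distances(example_results):
    print(f"{dist:.2f}  {doc}")
```

Smaller distances mean closer matches, so a large gap between the first and last hit suggests the later chunks may add noise rather than context.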

## Step 6: Full RAG Pipeline with Markdown Output

We will now combine retrieval with Claude and render the answer as Markdown.

In [None]:
from IPython.display import display, Markdown

def rag_chatbot(user_query):
    # 1. Retrieve
    retrieved_docs = retrieve_relevant_docs(user_query)
    context = "\n\n".join(retrieved_docs)
    
    # 2. Augment
    augmented_prompt = f"""
    You are a helpful assistant. Use the following context to answer the user's question.
    
    Context:
    {context}
    
    Question: {user_query}
    
    Answer clearly and concisely in Markdown format.
    """
    
    # 3. Generate
    print("Thinking...")
    response = chat_with_claude(augmented_prompt)
    
    # 4. Display Result
    display(Markdown(f"### Answer\n{response}"))
    return response

# Run
answer = rag_chatbot("Why is self-attention useful?")  # Assign the result so the notebook does not also echo the raw string
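
The augmentation step can also be factored into a small helper, which makes the prompt easy to inspect without calling the model. This is a sketch under a few assumptions: the name `build_augmented_prompt`, the numbered `[1]`, `[2]` labels, and the extra instruction to admit when the context lacks the answer are illustrative additions, not part of the pipeline above.

```python
import textwrap


def build_augmented_prompt(user_query, retrieved_docs):
    """Number each chunk and assemble a prompt without stray indentation."""
    context = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    # Dedent the template first, then fill it in, so multi-line chunks
    # cannot interfere with the indentation removal.
    template = textwrap.dedent("""\
        Use the following context to answer the user's question.
        If the context does not contain the answer, say so.

        Context:
        {context}

        Question: {question}

        Answer clearly and concisely in Markdown format.
        """)
    return template.format(context=context, question=user_query)


print(build_augmented_prompt("Why is self-attention useful?", ["First chunk.", "Second chunk."]))
```

Numbering the chunks gives the model something to refer to and makes it easier to trace an answer back to the passage it came from.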