# Session 3.3: BakeryAI - RAG Pipeline & Best Practices

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Xe5g768967-1eW2GIYnHTMsnyic1muQt?usp=sharing)

## 🎯 Today's Goal

Build **production-ready RAG pipelines** with best practices!

### What is a RAG Pipeline?

```
User Question
     ↓
[Retrieve] ← Vector Store
     ↓
[Rerank/Filter]
     ↓
[Generate] ← LLM + Context
     ↓
Answer with Citations
```

### Why RAG vs Alternatives?

| Approach | Pros | Cons | Use Case |
|----------|------|------|----------|
| **RAG** | Up-to-date, cites sources, cost-effective | Retrieval quality matters | Knowledge bases, Q&A |
| **Fine-tuning** | Specialized behavior | Expensive, static knowledge | Style/format learning |
| **Long Context** | No retrieval needed | Expensive, slower | Small knowledge bases |
| **Prompt Engineering** | Simple, fast | Limited knowledge | Simple tasks |

### RAG Best Practices:

1. **Chunk Size**: 500-1500 chars (we use 800)
2. **Overlap**: 10-20% (we use 100 chars)
3. **Top K**: 3-5 documents
4. **Reranking**: Rerank retrieved results whenever a reranker is available
5. **Citations**: Track sources for every fact
6. **Evaluation**: Measure retrieval and generation quality
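
To make the chunk-size and overlap numbers concrete, here is a minimal sketch of fixed-size character chunking with the values quoted above (800 and 100). The helper `chunk_text` is hypothetical and is not the splitter used to build the BakeryAI index; it only illustrates how the overlap makes consecutive chunks share text.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 2000
print([len(c) for c in chunk_text(sample)])  # [800, 800, 600]
```

With these defaults the window advances 700 characters at a time, so the last 100 characters of one chunk reappear at the start of the next (12.5% overlap).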

Let's build it! 🚀

In [1]:
!pip install -q langchain langchain-openai langchain-community
!pip install -q faiss-cpu chromadb ragas
!pip install -q python-dotenv

In [2]:
# Clone the course repo only if it is not already present (safe to re-run)
![ -d mdx-langchain-conclave ] || git clone https://github.com/IvanReznikov/mdx-langchain-conclave


In [3]:
import pickle

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.documents import Document

import os
from google.colab import userdata

# Read the OpenAI API key from Colab secrets, falling back to a default value
def set_openai_api_key(default_key: str = "YOUR_API_KEY") -> None:
    """Set OPENAI_API_KEY from the Colab secret MDX_OPENAI_API_KEY, else use default_key."""
    try:
        os.environ["OPENAI_API_KEY"] = userdata.get("MDX_OPENAI_API_KEY")
    except Exception:
        # Secret is missing or notebook access to it was not granted
        os.environ["OPENAI_API_KEY"] = default_key

set_openai_api_key()
# set_openai_api_key("sk-...")  # or pass a key explicitly

llm = ChatOpenAI(model="gpt-5-nano")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

print("✅ Environment ready!")
# Report the model actually configured; it must match the one used to build the index
print(f"Embedding model: {embeddings.model}")

✅ Environment ready!
Embedding model: text-embedding-3-large


## 1. Load Vector Store from Session 3.2

In [4]:
# Load FAISS vector store
try:
    vectorstore = FAISS.load_local(
        "/content/mdx-langchain-conclave/rag_artifacts/bakery_faiss_index",
        embeddings,
        allow_dangerous_deserialization=True
    )
    print("✅ Loaded FAISS vector store from Session 3.2")
except Exception as exc:
    print(f"⚠️  Could not load the saved index ({type(exc).__name__}); creating sample vector store...")
    from langchain_core.documents import Document

    sample_docs = [
        Document(
            page_content="Our refund policy: Full refunds within 24 hours of order. After that, store credit is offered.",
            metadata={"source": "Customer_Service_Policy.txt", "category": "policy"}
        ),
        Document(
            page_content="Chocolate Truffle Cake: Rich chocolate with Belgian truffle filling. Price $45. Allergens: dairy, eggs, gluten.",
            metadata={"source": "cakes.pdf", "category": "product"}
        ),
        Document(
            page_content="Red Velvet Cake: Velvety red sponge with cream cheese frosting. Price $50. Allergens: dairy, eggs, gluten.",
            metadata={"source": "cakes.pdf", "category": "product"}
        ),
        Document(
            page_content="Food handlers must wash hands with soap for at least 20 seconds before handling food items.",
            metadata={"source": "SOP_Hygiene_Food_Safety.txt", "category": "safety"}
        ),
        Document(
            page_content="Response time standard: All customer inquiries must be responded to within 2 hours during business hours.",
            metadata={"source": "Customer_Service_Policy.txt", "category": "policy"}
        )
    ]

    vectorstore = FAISS.from_documents(sample_docs, embeddings)
    print("✅ Created sample vector store")

# Create retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 10}
)

print(f"✅ Retriever configured: MMR search, top-k=3")

✅ Loaded FAISS vector store from Session 3.2
✅ Retriever configured: MMR search, top-k=3
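
The retriever above uses MMR (maximal marginal relevance): it fetches `fetch_k` candidates, then picks `k` of them by trading relevance to the query against similarity to what has already been picked. The sketch below is a plain-Python illustration of that idea, not LangChain's implementation; `mmr_select` and the toy 2-D vectors are assumptions. LangChain exposes the same trade-off through the `lambda_mult` key in `search_kwargs`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize candidates that resemble an already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.2]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2))                   # [1, 2]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [1, 0]
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates; at 0.5 the second pick is the more distinct document.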


## 2. Basic RAG Chain (Method 1: Simple)

The simplest way to build RAG: pipe the retriever's output into a ready-made prompt from the LangChain Hub, then into the LLM.

In [5]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Pull the prompt
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

qa_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

# Test it: the chain takes the question string directly and returns a string
query = "What is the refund policy?"
result = qa_chain.invoke(query)

print("🔍 SIMPLE RAG CHAIN TEST")
print("=" * 70)
print(f"\nQuestion: {query}")
print(f"\nAnswer: {result}")

🔍 SIMPLE RAG CHAIN TEST

Question: What is the refund policy?

Answer: - Eligible for a full refund for product quality issues (stale, spoiled, incorrect), our error in order fulfillment, or delivery failure due to our fault, within 24 hours of purchase with the receipt. 
- Manager approval is required for refunds over AED 300. 
- Not eligible for a refund includes change of mind after custom order production begins.


## 3. Advanced RAG Chain (Method 2: LCEL)

Build custom RAG with LangChain Expression Language.

In [6]:
from langchain_core.prompts import ChatPromptTemplate

# Custom RAG prompt
rag_prompt = ChatPromptTemplate.from_template("""
You are BakeryAI's knowledge assistant. Answer questions accurately based on the provided context.

Context from our knowledge base:
{context}

Question: {question}

Instructions:
1. Answer based ONLY on the provided context
2. If the answer isn't in the context, say "I don't have that information in my knowledge base"
3. Be specific and cite details from the context
4. Keep answers concise but complete

Answer:
""")

# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(f"Document {i+1} (from {doc.metadata.get('source', 'unknown')}): {doc.page_content}"
                      for i, doc in enumerate(docs))

# Build RAG chain with LCEL
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

print("✅ Advanced RAG Chain built with LCEL")

✅ Advanced RAG Chain built with LCEL


In [7]:
# Test advanced chain
questions = [
    "What is the refund policy?",
    "Tell me about chocolate cakes and their allergens",
    "What are the hygiene requirements for food handlers?"
]

print("🧪 TESTING ADVANCED RAG CHAIN")
print("=" * 70)

for q in questions:
    print(f"\n❓ Question: {q}")
    answer = rag_chain.invoke(q)
    print(f"💡 Answer: {answer}")
    print("-" * 70)

🧪 TESTING ADVANCED RAG CHAIN

❓ Question: What is the refund policy?
💡 Answer: - Eligible for a full refund: if there are product quality issues (stale, spoiled, incorrect), if it was our error in order fulfillment, or if delivery failed due to our fault; eligible within 24 hours of purchase with receipt.

- Manager approval: refunds over AED 300 require manager approval.

- Not eligible for refund: change of mind after production has begun on a custom order. (Note: the document also mentions allergic reactions, but the text is cut off in the provided context.)
----------------------------------------------------------------------

❓ Question: Tell me about chocolate cakes and their allergens
💡 Answer: Here are the chocolate-related cakes and their allergens from the provided context:

- Red Velvet Cake (R): Has a cocoa base giving a chocolate hint. Allergens: dairy, gluten (wheat), and eggs. (Doc 1: “Allergens: dairy, gluten, eggs.” and “Contains dairy, wheat, and eggs.”)

- Black For

## 4. RAG with Source Citations

Track and display sources for every answer.

In [8]:
# Enhanced RAG with citations
citation_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the provided documents. Include inline citations.

Documents:
{context}

Question: {question}

Provide your answer with citations in the format [Doc X] after each fact.
At the end, list all sources used.

Answer:
""")

# Chain with source tracking
def rag_with_sources(question):
    # Retrieve documents
    docs = retriever.invoke(question)

    # Format with doc numbers
    context = "\n\n".join(
        f"[Doc {i+1}] (Source: {doc.metadata.get('source', 'unknown')})\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )

    # Generate answer
    chain = citation_prompt | llm | StrOutputParser()
    answer = chain.invoke({"context": context, "question": question})

    return {
        "answer": answer,
        "sources": [doc.metadata.get('source', 'unknown') for doc in docs],
        "documents": docs
    }

# Test
query = "What are the allergens in our cakes?"
result = rag_with_sources(query)

print("📚 RAG WITH CITATIONS")
print("=" * 70)
print(f"\nQuestion: {query}")
print(f"\nAnswer:\n{result['answer']}")
print(f"\n📑 Sources:")
for i, source in enumerate(set(result['sources']), 1):
    print(f"   {i}. {source}")

📚 RAG WITH CITATIONS

Question: What are the allergens in our cakes?

Answer:
- Dairy. [Doc 1][Doc 3]

- Gluten. [Doc 1][Doc 3]

- Eggs. [Doc 1]

- Nuts. [Doc 2]

- Milk. [Doc 2]

- Wheat. [Doc 2][Doc 3]

Sources: [Doc 1], [Doc 2], [Doc 3]

📑 Sources:
   1. /content/mdx-langchain-conclave/data/cakes.docx
   2. /content/mdx-langchain-conclave/data/cakes.pdf
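
In the output above the answer cites `[Doc 1]` to `[Doc 3]`, while the de-duplicated source list has only two entries, so a doc number cannot be matched to a file by position. One possible fix, sketched here with a hypothetical helper `cited_sources` and made-up sample data, is to parse the citation markers and look each one up in the retrieval-ordered source list.

```python
import re


def cited_sources(answer: str, sources: list[str]) -> dict[int, str]:
    """Map each [Doc N] cited in `answer` to the N-th retrieved source."""
    cited = sorted({int(n) for n in re.findall(r"\[Doc (\d+)\]", answer)})
    # Ignore citations that point outside the retrieved documents
    return {n: sources[n - 1] for n in cited if 1 <= n <= len(sources)}


# Hypothetical answer text and sources, listed in retrieval order
example_answer = "Dairy. [Doc 1][Doc 3]\nNuts. [Doc 2]\nSoy. [Doc 7]"
example_sources = ["cakes.docx", "cakes.pdf", "cakes.docx"]
print(cited_sources(example_answer, example_sources))
# {1: 'cakes.docx', 2: 'cakes.pdf', 3: 'cakes.docx'}
```

With `rag_with_sources`, the same call would be `cited_sources(result["answer"], result["sources"])`. Out-of-range citations such as `[Doc 7]` are dropped, which also gives a cheap check for invented citations.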


## 5. Conversational RAG with Memory

RAG that remembers conversation history.

In [9]:
# Legacy APIs: both classes are deprecated in recent LangChain releases and may emit warnings
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Create memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

# Conversational RAG chain
conversational_rag = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True
)

print("✅ Conversational RAG with memory created")

✅ Conversational RAG with memory created
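
Chains of this kind typically handle follow-ups by first rewriting the new question, together with the chat history, into a standalone question, and only then retrieving. The sketch below shows that flow with stub functions in place of the LLM and retriever; every name in it (`conversational_turn`, `fake_condense`, and so on) is hypothetical and none of it is LangChain's internal code.

```python
from typing import Callable

History = list[tuple[str, str]]  # (question, answer) pairs from earlier turns


def conversational_turn(
    question: str,
    history: History,
    condense: Callable[[str, History], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Run one turn: rewrite the follow-up, retrieve, answer, then record the turn."""
    # The first turn has no history, so the question is already standalone
    standalone = condense(question, history) if history else question
    docs = retrieve(standalone)
    answer = generate(standalone, docs)
    history.append((question, answer))
    return answer


# Stubs standing in for the LLM and the retriever
def fake_condense(question, history):
    return f"{question} (context: {history[-1][0]})"


def fake_retrieve(query):
    return [f"doc matching: {query}"]


def fake_generate(query, docs):
    return f"answer to '{query}' using {len(docs)} doc(s)"


history: History = []
conversational_turn("What cakes do we offer?", history, fake_condense, fake_retrieve, fake_generate)
second = conversational_turn("What are the allergens in them?", history, fake_condense, fake_retrieve, fake_generate)
print(second)
```

The point of the rewrite step is that "them" in the second question means nothing to a vector search on its own; the retriever only finds cake documents once the earlier turn is folded into the query.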




In [10]:
# Test conversation
print("💬 CONVERSATIONAL RAG TEST")
print("=" * 70)

# Turn 1
q1 = "What cakes do we offer?"
r1 = conversational_rag.invoke({"question": q1})
print(f"\n👤 Customer: {q1}")
print(f"🍰 BakeryAI: {r1['answer']}")

# Turn 2 (references previous)
q2 = "What are the allergens in them?"
r2 = conversational_rag.invoke({"question": q2})
print(f"\n👤 Customer: {q2}")
print(f"🍰 BakeryAI: {r2['answer']}")

# Turn 3
q3 = "What if I want a refund?"
r3 = conversational_rag.invoke({"question": q3})
print(f"\n👤 Customer: {q3}")
print(f"🍰 BakeryAI: {r3['answer']}")

print("\n✅ Memory allows context from previous turns!")

💬 CONVERSATIONAL RAG TEST

👤 Customer: What cakes do we offer?
🍰 BakeryAI: Here's the cake lineup described in your material:

- Carrot Cake: Hearty, slightly denser-than-sponge; great for brunch or celebrations. Contains nuts, dairy, and gluten.
- Opera Cake (R): French classic with almond sponge soaked in coffee syrup, layered with ganache and coffee buttercream, topped with chocolate glaze. Contains dairy, eggs, nuts, and gluten.
- Cheesecake: Rich cream cheese base on a buttery graham cracker crust, topped with seasonal fruit compote; no-bake. Contains dairy, eggs, and wheat.
- Rainbow Cake (R): Layered vanilla sponge in different colors with buttercream frosting; festive and crowd-pleasing.
- Seasonal Mango Coconut Cake: A warm-weather favorite, currently temporarily off the menu. Contains dairy, eggs, and coconut.
- Black Forest Cake (R): Chocolate sponge with whipped cream and cherries, kirsch-soaked layers; elegant and balanced. Contains dairy, eggs, and gluten.

If you’d like,

## 6. Alternative to RAG: Long Context Windows

Compare RAG vs putting everything in context.
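
A single wall-clock measurement of an LLM call is noisy, so latency comparisons like the one in the next cell are more trustworthy when each variant is timed several times. The helper below is a small sketch of that; `time_call` is a hypothetical name, and each repeat costs one extra API call.

```python
import statistics
import time


def time_call(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times; return (last_result, median_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Stand-in workload; in the notebook this could be
# time_call(rag_chain.invoke, "What is the refund policy?")
result, seconds = time_call(lambda x: x * 2, 21, repeats=5)
print(result, f"{seconds:.6f}s")
```

The median is used rather than the mean so that one slow outlier (a cold start or a network hiccup) does not dominate the figure.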

In [11]:
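# Approximate "all documents": a similarity search on an empty query returns up to k chunks
all_docs = vectorstore.similarity_search("", k=100)
# Keep only the first 10 chunks so the prompt stays small (this is not the full knowledge base)
all_content = "\n\n".join(doc.page_content for doc in all_docs[:10])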

# Method 1: Long Context (no retrieval)
long_context_prompt = ChatPromptTemplate.from_template("""
Here is our complete knowledge base:

{all_knowledge}

Question: {question}

Answer based on the knowledge base above:
""")

long_context_chain = long_context_prompt | llm | StrOutputParser()

# Method 2: RAG (with retrieval)
# (already built above)

# Compare
import time

test_question = "What is the refund policy?"

print("⚖️  COMPARING: Long Context vs RAG")
print("=" * 70)

# Long context
print("\n1️⃣ LONG CONTEXT (all docs in prompt):")
start = time.time()
lc_answer = long_context_chain.invoke({
    "all_knowledge": all_content,
    "question": test_question
})
lc_time = time.time() - start
print(f"   Answer: {lc_answer[:100]}...")
print(f"   Time: {lc_time:.2f}s")
# Whitespace word count is a rough proxy for tokens (the RAG figure below uses the same proxy)
print(f"   Tokens used: ~{len(all_content.split())} (context)")

# RAG
print("\n2️⃣ RAG (retrieve then generate):")
start = time.time()
rag_answer = rag_chain.invoke(test_question)
rag_time = time.time() - start
retrieved_docs = retriever.invoke(test_question)
retrieved_content = " ".join([doc.page_content for doc in retrieved_docs])
print(f"   Answer: {rag_answer[:100]}...")
print(f"   Time: {rag_time:.2f}s")
print(f"   Tokens used: ~{len(retrieved_content.split())} (context)")

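print("\n📊 Comparison:")
# A ratio below 1 means RAG was slower than long context in this run
print(f"   Speed ratio (long-context time / RAG time): {lc_time/rag_time:.1f}x")
print(f"   Cost: RAG uses ~{len(all_content.split())/len(retrieved_content.split()):.1f}x fewer tokens")
print("   Scalability: RAG scales to millions of documents; long context is capped by the model's context window")

⚖️  COMPARING: Long Context vs RAG

1️⃣ LONG CONTEXT (all docs in prompt):
   Answer: Here is the refund policy as listed:

Eligible for refund
- Change of mind after custom order produc...
   Time: 7.47s
   Tokens used: ~547 (context)

2️⃣ RAG (retrieve then generate):
   Answer: - Eligible for full refund:
  - Product quality issues (stale, spoiled, incorrect)
  - Our error in ...
   Time: 16.81s
   Tokens used: ~296 (context)

📊 Comparison:
   Speed ratio (long-context time / RAG time): 0.4x
   Cost: RAG uses ~1.8x fewer tokens
   Scalability: RAG scales to millions of documents; long context is capped by the model's context window

**Note**: In this run RAG was slower (16.81s vs 7.47s), not faster. With only 10 chunks in the long-context prompt the two contexts are close in size (~547 vs ~296 words), so a single timing likely reflects LLM latency variance more than prompt length. RAG's cost and latency advantage shows up as the knowledge base grows.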


## 7. Advanced RAG: Query Decomposition

Break complex questions into sub-questions.
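
The model's list of sub-questions can come back numbered, bulleted, or with a header line, and each non-empty line is sent to the RAG chain as a query. A small parser can normalize the reply first. The sketch below is one way to do that; `parse_sub_questions` is a hypothetical helper that assumes each real sub-question ends with a question mark.

```python
import re


def parse_sub_questions(raw: str, max_questions: int = 3) -> list[str]:
    """Turn an LLM's line-per-question reply into a clean list of questions."""
    questions = []
    for line in raw.splitlines():
        # Strip leading bullets or numbering such as "1.", "2)", "-", "*"
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned.endswith("?"):  # skip headers and blank lines
            questions.append(cleaned)
    return questions[:max_questions]


raw_reply = """Sub-questions:
1. What cakes do we offer?
2) What allergens are present in each cake?
- What is our policy for customers with allergies?
"""
print(parse_sub_questions(raw_reply))
```

Capping the list at `max_questions` also bounds the number of RAG calls, and therefore the cost, per complex question.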

In [12]:
decomposition_prompt = ChatPromptTemplate.from_template("""
Break down this complex question into 2-3 simpler sub-questions.

Complex Question: {question}

List the sub-questions (one per line):
""")

def multi_query_rag(complex_question):
    """Answer complex questions by decomposing into sub-queries"""

    # Decompose question
    decompose_chain = decomposition_prompt | llm | StrOutputParser()
    sub_questions = decompose_chain.invoke({"question": complex_question})

    print(f"🔍 Decomposed into sub-questions:")
    print(sub_questions)
    print()

    # Answer each sub-question
    sub_answers = []
    for sq in sub_questions.split('\n'):
        if sq.strip():
            answer = rag_chain.invoke(sq.strip())
            sub_answers.append(f"Q: {sq}\nA: {answer}")

    # Synthesize final answer
    synthesis_prompt = ChatPromptTemplate.from_template("""
    Original question: {question}

    Sub-answers:
    {sub_answers}

    Synthesize a comprehensive answer to the original question:
    """)

    final_chain = synthesis_prompt | llm | StrOutputParser()
    final_answer = final_chain.invoke({
        "question": complex_question,
        "sub_answers": "\n\n".join(sub_answers)
    })

    return final_answer

# Test
complex_q = "What cakes do we offer and what are their allergens, and if someone has allergies what is our policy?"

print("🎯 QUERY DECOMPOSITION RAG")
print("=" * 70)
print(f"\nComplex Question: {complex_q}\n")

answer = multi_query_rag(complex_q)
print("\n📝 Final Synthesized Answer:")
print(answer)

🎯 QUERY DECOMPOSITION RAG

Complex Question: What cakes do we offer and what are their allergens, and if someone has allergies what is our policy?

🔍 Decomposed into sub-questions:
What cakes do we offer?
What allergens are present in each cake?
What is our policy for customers with allergies?


📝 Final Synthesized Answer:
Here’s a consolidated answer addressing both what we offer and our allergy policy.

Cakes we offer (with available allergen details)
- Carrot Cake — described as a brunch-style cake; allergen information is not available in our current knowledge base. Please check with staff for exact allergens.
- Opera Cake (R) — French classic with almond sponge, coffee components, and ganache; contains dairy, eggs, nuts, and gluten.
- Cheesecake — rich cream cheese base on a buttery crust with seasonal fruit topping; contains dairy, eggs, and wheat (gluten).
- Rainbow Cake (R) — vanilla-flavored layered cake with buttercream; contains eggs, gluten, and dairy.
- Black Forest Cake (

## 8. Advanced RAG: HyDE (Hypothetical Document Embeddings)

Generate hypothetical answers, embed them, then retrieve.
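
The intuition is that an answer-shaped text often shares more vocabulary with the stored documents than the question does. The toy sketch below illustrates this with a bag-of-words "embedding" and a stub in place of the LLM; `embed`, `hyde_retrieve`, `fake_llm`, and the two sample documents are all assumptions for illustration, not the embedding model or data used in this notebook.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_retrieve(question, docs, write_hypothetical, k=1):
    """Embed a hypothetical answer instead of the question, then rank docs by similarity."""
    query_vec = embed(write_hypothetical(question))
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "Each cake is baked in small batches with Belgian chocolate and fresh cream.",
    "Food handlers must wash hands with soap for at least 20 seconds.",
]


def fake_llm(question):
    # Stub standing in for the LLM that writes the hypothetical answer
    return "Our cakes are special because each cake is baked in small batches with fresh ingredients."


question = "What makes our cakes special?"
print(hyde_retrieve(question, docs, fake_llm))
print(cosine(embed(question), embed(docs[0])))  # 0.0: the raw question shares no words with the doc
```

Real embedding models are far less literal than word overlap, so the gap is usually smaller in practice, but the mechanism is the same.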

In [13]:
hyde_prompt = ChatPromptTemplate.from_template("""
Generate a hypothetical ideal answer to this question (even if you don't know the real answer):

Question: {question}

Hypothetical answer (2-3 sentences):
""")

def hyde_rag(question):
    """RAG with hypothetical document embeddings"""

    # Generate hypothetical answer
    hyde_chain = hyde_prompt | llm | StrOutputParser()
    hypothetical_answer = hyde_chain.invoke({"question": question})

    print(f"💭 Hypothetical answer generated:")
    print(f"   {hypothetical_answer}\n")

    # Retrieve using hypothetical answer (not original question)
    docs = vectorstore.similarity_search(hypothetical_answer, k=3)

    print(f"📚 Retrieved docs using hypothetical answer:")
    for i, doc in enumerate(docs, 1):
        print(f"   {i}. {doc.page_content[:60]}...")

    # Generate final answer
    context = "\n\n".join([doc.page_content for doc in docs])
    final_prompt = ChatPromptTemplate.from_template("""
    Context: {context}

    Question: {question}

    Answer:
    """)

    final_chain = final_prompt | llm | StrOutputParser()
    return final_chain.invoke({"context": context, "question": question})

# Test
print("🔮 HyDE RAG TEST")
print("=" * 70)
print()

test_q = "What makes our cakes special?"
print(f"Question: {test_q}\n")

hyde_answer = hyde_rag(test_q)
print(f"\n✨ Final Answer:\n{hyde_answer}")

print("\n💡 HyDE often retrieves better docs by matching answer patterns!")

🔮 HyDE RAG TEST

Question: What makes our cakes special?

💭 Hypothetical answer generated:
   Each cake is handmade in small batches from real, locally sourced ingredients, delivering a moist crumb and rich, layered flavors. We marry classic baking techniques with playful, custom decorations to reflect your celebration and taste. Best of all, we tailor every cake to your preferences and dietary needs, making it truly yours.

📚 Retrieved docs using hypothetical answer:
   1. . Its hearty, satisfying, and just the right level of indulg...
   2. From New York-style to Japanese jiggly versions, cheesecake ...
   3. . Finished with spun sugar and caramel crowns, it truly look...

✨ Final Answer:
What makes our cakes special:

- Versatility for every occasion: perfect for brunch, birthdays, family reunions, and office celebrations.
- Rich, multi-layered flavors with distinct textures: from the dense, wholesome carrot cake and the refined Opera Cake to the no-bake cheesecake and the molten Ma

## 9. RAG Evaluation

Measure RAG quality with a simple keyword-overlap metric: the fraction of expected keywords that appear in each answer.

In [14]:
# Simple RAG evaluation
test_qa_pairs = [
    {
        "question": "What is the refund policy?",
        "expected_keywords": ["24 hours", "refund", "store credit"]
    },
    {
        "question": "What allergens are in chocolate cake?",
        "expected_keywords": ["dairy", "eggs", "gluten"]
    },
    {
        "question": "What are hygiene requirements?",
        "expected_keywords": ["wash", "hands", "20 seconds"]
    }
]

def evaluate_rag(qa_pairs):
    """Simple keyword-based evaluation"""
    results = []

    for qa in qa_pairs:
        answer = rag_chain.invoke(qa["question"])
        answer_lower = answer.lower()

        # Check if expected keywords appear
        keywords_found = sum(1 for kw in qa["expected_keywords"]
                            if kw.lower() in answer_lower)
        score = keywords_found / len(qa["expected_keywords"])

        results.append({
            "question": qa["question"],
            "answer": answer,
            "score": score,
            "keywords_found": keywords_found,
            "keywords_total": len(qa["expected_keywords"])
        })

    return results

# Run evaluation
print("📊 RAG EVALUATION")
print("=" * 70)

eval_results = evaluate_rag(test_qa_pairs)

for i, result in enumerate(eval_results, 1):
    print(f"\nTest {i}:")
    print(f"Question: {result['question']}")
    print(f"Answer: {result['answer'][:80]}...")
    print(f"Score: {result['score']:.1%} ({result['keywords_found']}/{result['keywords_total']} keywords)")

avg_score = sum(r['score'] for r in eval_results) / len(eval_results)
print(f"\n📈 Overall Score: {avg_score:.1%}")

📊 RAG EVALUATION

Test 1:
Question: What is the refund policy?
Answer: - Eligible for a full refund:
  - Product quality issues (stale, spoiled, incorr...
Score: 66.7% (2/3 keywords)

Test 2:
Question: What allergens are in chocolate cake?
Answer: Dairy, gluten, and soy. (Chocolate Truffle Cake contains dairy, gluten, and soy....
Score: 66.7% (2/3 keywords)

Test 3:
Question: What are hygiene requirements?
Answer: Here are the hygiene requirements from the SOPs:

- Personal hygiene
  - Handwas...
Score: 100.0% (3/3 keywords)

📈 Overall Score: 77.8%


## 🎯 Exercise 5: Implement Contextual Compression

**Task**: Filter retrieved documents to keep only relevant parts:
1. Retrieve more documents initially (k=10)
2. Use LLM to extract only relevant sentences
3. Pass compressed context to generation
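
Before wiring in an LLM-based extractor, it can help to see the compression step in isolation. The sketch below swaps the LLM for a plain keyword-overlap filter, a deliberately simpler technique, so it runs without API calls; `compress`, `keywords`, and the stopword list are hypothetical and are not part of LangChain.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "what", "of", "in", "for", "to", "and", "our"}


def keywords(text):
    """Lowercased content words, with a few common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress(question, documents, min_overlap=1):
    """Keep only sentences sharing at least `min_overlap` keywords with the question."""
    wanted = keywords(question)
    kept = []
    for doc in documents:
        # Split on whitespace that follows sentence-ending punctuation
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if len(wanted & keywords(sentence)) >= min_overlap:
                kept.append(sentence.strip())
    return kept


documents = [
    "Our refund policy: Full refunds within 24 hours of order. After that, store credit is offered.",
    "Red Velvet Cake: Velvety red sponge with cream cheese frosting. Price $50.",
]
print(compress("What is the refund policy?", documents))
```

Note that the second sentence of the policy document is dropped even though it is relevant, because it repeats none of the question's words. That kind of miss is the reason the exercise uses an LLM to judge relevance.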

In [15]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# TODO: Build compression retriever
# compressor = LLMChainExtractor.from_llm(llm)
# compression_retriever = ContextualCompressionRetriever(
#     base_compressor=compressor,
#     base_retriever=retriever
# )

## 🎯 Exercise 6: Build RAG Evaluation Framework

**Task**: Create comprehensive evaluation:
1. Retrieval quality (precision, recall)
2. Answer quality (relevance, faithfulness)
3. Latency and cost metrics
4. Generate evaluation report
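
As a starting point for the retrieval-quality step, precision@k and recall@k can be computed from the retrieved chunk IDs and a hand-labeled set of relevant IDs. The function below is a minimal sketch; `precision_recall_at_k` and the sample IDs are hypothetical, and it assumes each chunk has a unique identifier.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Return (precision@k, recall@k) for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0   # share of retrieved chunks that are relevant
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant chunks that were retrieved
    return precision, recall


# Hypothetical chunk IDs in ranked order, plus hand-labeled relevant chunks
retrieved = ["cakes-03", "policy-01", "cakes-07"]
relevant = {"cakes-03", "cakes-07", "cakes-11"}
precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
print(f"precision@3={precision:.2f}, recall@3={recall:.2f}")  # precision@3=0.67, recall@3=0.67
```

Averaging both values over the whole test set gives the numbers `evaluate_retrieval` would report.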

In [16]:
class RAGEvaluator:
    def __init__(self, rag_chain, retriever, test_set):
        self.rag_chain = rag_chain
        self.retriever = retriever
        self.test_set = test_set

    def evaluate_retrieval(self):
        """Measure retrieval quality"""
        # TODO: Calculate precision@k, recall@k
        pass

    def evaluate_generation(self):
        """Measure answer quality"""
        # TODO: Use LLM-as-judge or RAGAS metrics
        pass

    def evaluate_performance(self):
        """Measure latency and cost"""
        # TODO: Track time and token usage
        pass

    def generate_report(self):
        """Comprehensive evaluation report"""
        # TODO: Combine all metrics
        pass

# Test your evaluator

## Summary: What We Built

### ✅ Session 3.3 Achievements:

1. **Basic RAG**: Simple question-answering
2. **Advanced RAG**: Custom prompts with LCEL
3. **Citations**: Source tracking and attribution
4. **Conversational RAG**: Memory-enabled Q&A
5. **Query Decomposition**: Complex question handling
6. **HyDE**: Hypothetical document embeddings
7. **Evaluation**: RAG quality metrics
8. **Alternatives**: Compared RAG with long context in code, and with fine-tuning and prompt engineering in the overview table

### 🎨 BakeryAI RAG Capabilities:

✨ **Accurate Answers**: Based on real knowledge base  
✨ **Source Citations**: Every fact is traceable  
✨ **Conversational**: Remembers context  
✨ **Complex Queries**: Handles multi-part questions  
✨ **Cost-Effective**: Only retrieves what's needed  

### 📊 RAG Best Practices Summary:

**Chunk Size**: 500-1500 chars (we use 800)  
**Overlap**: 10-20% (we use 100 chars)  
**Top K**: 3-5 documents  
**Search Type**: MMR for diversity  
**Prompt**: Be specific about using only context  
**Evaluation**: Always measure and iterate  

### 🚀 Next: Notebook 3.4

We'll integrate RAG with **agents from Session 2**:
- Agents that use RAG tools
- Multi-step reasoning with knowledge retrieval
- Query routing to appropriate knowledge bases
- Complete BakeryAI with agents + RAG!