# Complete RAG Demonstration: From Naive to Advanced

## What is RAG?

**RAG (Retrieval-Augmented Generation)** is a technique that enhances LLM responses by retrieving relevant documents from a knowledge base before generating an answer. Instead of relying solely on the LLM's training data, RAG allows the model to access current, domain-specific information.

**Why RAG?**
- LLMs have knowledge cutoff dates and don't know about recent information
- LLMs don't know about your private/company data
- RAG grounds responses in actual documents, reducing hallucinations

This notebook demonstrates RAG using **LangChain v1+ with LCEL** (LangChain Expression Language).

**Key Technologies:**
- **LangChain**: Framework for building LLM applications
- **LCEL**: Modern way to compose LangChain components as chains
- **sentence-transformers**: Converts text to vector embeddings
- **Qdrant**: Vector database for similarity search
- **LangGraph**: Orchestrates complex multi-step workflows
- **Ollama**: Runs LLMs locally (we'll use Gemma2 2B)

## Table of Contents
1. Setup
2. LLM Without RAG
3. Naive RAG
4. Where Naive RAG Fails
5. Advanced RAG with LangGraph
6. Summary

## 1. Setup

### Prerequisites
```bash
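# Install uv (Homebrew shown here)
brew install uv

# Install Ollama, start the server, and pull the model
brew install ollama
ollama serve            # runs in the foreground; run the next command in a second terminal
ollama pull gemma2:2b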
```

In [1]:
# Install packages
!uv pip install langchain-core langchain-community langgraph langchain-ollama
!uv pip install qdrant-client langchain-qdrant sentence-transformers langchain-text-splitters

Audited 4 packages in 29ms
Audited 4 packages in 21ms


### Start Qdrant Vector Database

**Qdrant** is a vector database optimized for similarity search. Unlike traditional databases that store rows of data, vector databases store embeddings (numerical representations of text) and can quickly find similar items.
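
To make "finding similar items" concrete, here is a minimal sketch of a brute-force similarity search in plain Python. The three-dimensional vectors and the labels in `toy_store` are made up for illustration; a real vector database such as Qdrant uses much higher-dimensional embeddings and index structures rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings" (real ones in this notebook have 384 dimensions)
toy_store = [
    ("opening date", [0.9, 0.1, 0.0]),
    ("pet policy", [0.0, 0.8, 0.6]),
    ("coffee price", [0.1, 0.0, 0.9]),
]

results = top_k([0.0, 0.7, 0.7], toy_store, k=2)
print([text for text, _ in results])  # ['pet policy', 'coffee price']
```

The `k` parameter plays the same role as the retriever's `k` later in the notebook: anything ranked below the cutoff never reaches the LLM.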

We'll run Qdrant in a Docker container for easy setup and teardown.

In [2]:
import time

# Start Qdrant container
!docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    --name qdrant_rag qdrant/qdrant:latest

time.sleep(1)
print("[OK] Qdrant assumed running at http://localhost:6333/dashboard")

e3f08c8da6eee5d271c3e05aaa5245207935fee31223aadc9d1e4abb05ecf816
[OK] Qdrant assumed running at http://localhost:6333/dashboard
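
The cell above sleeps for one second and assumes the container is up. A more defensive option is to poll until the service answers. The sketch below is one way to do that with the standard library; it assumes that a GET request to Qdrant's REST port returns HTTP 200 once the server is ready, and the helper names `http_ok` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.request

def http_ok(url, timeout=2.0):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError
        return False

def wait_until_ready(check, attempts=30, delay=1.0):
    """Call check() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# Usage in the notebook (assumes the port mapping from the docker run command above):
# ready = wait_until_ready(lambda: http_ok("http://localhost:6333"))
# print("[OK] Qdrant is up" if ready else "[FAIL] Qdrant did not respond")
```

Passing the check as a callable keeps the retry loop independent of HTTP, so the same helper could wait for Ollama or any other local service.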


### Initialize LLM, Embeddings, and Database Client

We need three key components:

1. **LLM (Large Language Model)**: Gemma2 2B via Ollama - generates human-readable answers
2. **Embedding Model**: sentence-transformers - converts text to vectors (384-dimensional) for similarity search
3. **Qdrant Client**: Connects to our vector database

**Important distinction**: The embedding model (22M parameters) only converts text to vectors, while the LLM (2B parameters) generates text. They serve different purposes!

In [3]:
# Import models and clients
from langchain_ollama import OllamaLLM
from langchain_community.embeddings import HuggingFaceEmbeddings
from qdrant_client import QdrantClient

import warnings
warnings.filterwarnings('ignore')

# Initialize LLM (for generating text)
llm = OllamaLLM(model="gemma2:2b", temperature=0.1)
print("[OK] LLM: Gemma2 2B (Ollama)")

# Initialize embedding model (for converting text to vectors)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)
print("[OK] Embeddings: all-MiniLM-L6-v2 (384d)")

# Initialize Qdrant client
qdrant_client = QdrantClient(url="http://localhost:6333", prefer_grpc=False)
print(f"[OK] Qdrant: {qdrant_client.get_collections()}")

[OK] LLM: Gemma2 2B (Ollama)
[OK] Embeddings: all-MiniLM-L6-v2 (384d)
[OK] Qdrant: collections=[]


### Sample Data

We'll create a small knowledge base about a fictional restaurant (The Bangalore Bistro). This simulates having internal restaurant documents (menu, policy, hours) that aren't in the LLM's training data.

In [4]:
documents = [
    """The Whitefield Vision (Doc 1): 
    We are proud to announce the grand opening of 'Bangalore Bistro: Whitefield' on February 15, 2026. 
    This branch marks the beginning of our 'Robo-Server' era, featuring sleek metallic decor and a 
    fully automated dining experience designed specifically for the tech-forward community of Whitefield.""",
    
    """The Signature Avocado Toast (Doc 2): 
    Our 'Green Gold' Avocado Toast (₹450) is a fan favorite. It features slow-fermented sourdough 
    bread from our Indiranagar bakery, topped with hand-picked Hass avocados, pomegranate seeds, 
    and a drizzle of our signature spicy local chili oil.""",
    
    """The Heritage Filter Coffee (Doc 3): 
    Experience our Filter Coffee (₹60), crafted from a 50-year-old family recipe using Peaberry 
    beans from Chikmagalur. Brewed in traditional brass filters, it delivers the perfect, frothy 
    decoction that Bangaloreans have loved for decades.""",
    
    """Senior Citizen Inclusivity (Doc 4): 
    We honor our elders with a flat 20% discount on all food items for guests aged 65 and above. 
    Simply present a valid government ID to our staff (or scan it at the Robo-Kiosk) to avail of 
    this benefit. Respecting our roots is a core value at the Bistro.""",
    
    """The Early Bird Rewards (Doc 5): 
    Start your productivity early at any of our branches. All orders placed before 9:00 AM are 
    eligible for a 15% 'Early Bird' discount. It's our way of rewarding the early risers and 
    ensuring a peaceful, budget-friendly start to your day.""",
    
    """The Robo-Tech Surcharge (Doc 6): 
    To maintain our innovative fleet of automated servers at the Whitefield branch, a 10% 
    'Robo-Tech service fee' is added to every bill. This fee ensures peak performance, 
    regular maintenance, and a seamless futuristic dining experience for all guests.""",
    
    """Membership Evolution (Doc 7): 
    With the launch of our app, the traditional 'Gold Leaf Membership Card' is now retired. 
    Previous cardholders can verify their details on the new 'Bistro App' to receive an 
    immediate ₹200 joining credit, which can be applied toward any future bill.""",
    
    """Whitefield Safety Protocols (Doc 8): 
    Due to the high-voltage automated equipment and moving parts of our robotic servers, the 
    Whitefield branch is strictly NOT pet-friendly. We prioritize the safety of both our 
    robotic staff and your beloved furry companions.""",
    
    """The Indiranagar Heritage (Doc 9): 
    Our Indiranagar branch continues to be a pet-friendly oasis, offering lush garden seating 
    and special 'Puppy-Patties' for our four-legged guests. It remains the perfect weekend 
    retreat for families and their pets in the heart of the city.""",
    
    """Senior Launch Special (Doc 10): 
    For the grand opening of the Whitefield branch on Feb 15, 2026, all eligible senior 
    citizens (65+) will receive a complimentary Filter Coffee. This gesture symbolizes our 
    dedication to bridging traditional hospitality with the future of dining."""
]

test_questions = [
    "When does the Bangalore Bistro's Whitefield branch open?",
    "Is the Indiranagar branch pet-friendly?"
]

print(f"[OK] {len(documents)} detailed documents, {len(test_questions)} test questions")

[OK] 10 detailed documents, 2 test questions


## 2. LLM Without RAG

First, let's see what happens when we ask the LLM directly without providing any context. The LLM can only draw on its training data, which doesn't include our Bangalore Bistro documents.

In [5]:
print("=" * 80)
print("LLM WITHOUT RAG")
print("=" * 80)

for q in test_questions:
    print(f"\nQ: {q}")
    print(f"A: {llm.invoke(q)}")
    print("-" * 80)

print("\n[Analysis] LLM cannot provide accurate Bangalore Bistro-specific information")

LLM WITHOUT RAG

Q: When does the Bangalore Bistro's Whitefield branch open?
A: I do not have access to real-time information, including business hours. 

To find out when the Bangalore Bistro's Whitefield branch opens, I recommend:

* **Checking their website:** Most businesses list their operating hours on their official website.
* **Calling them directly:** You can call the restaurant and ask about their opening times.
* **Using a food delivery app:** Apps like Zomato or Swiggy often display the business hours of restaurants in their listings. 


I hope this helps! 

--------------------------------------------------------------------------------

Q: Is the Indiranagar branch pet-friendly?
A: I do not have access to real-time information, including details about specific businesses like whether they are pet-friendly. 

**To find out if the Indiranagar branch of a particular business is pet-friendly, I recommend:**

* **Checking their website:** Many businesses list their policies on

## 3. Naive RAG

Now let's implement a basic RAG system:
1. **Split** documents into chunks
2. **Embed** chunks into vectors and **store** in Qdrant
3. **Retrieve** relevant chunks based on question similarity
4. **Generate** answer using LLM with retrieved context

### Step 1: Prepare and Store Documents

In [7]:
# Import document processing tools
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_qdrant import QdrantVectorStore

# Split documents into chunks (smaller pieces for better retrieval)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
# Common chunking strategies:
#   1. Fixed-length chunks
#   2. Paragraph-based splitting
#   3. Semantic ("intelligent") splitting
doc_objects = [Document(page_content=doc) for doc in documents]
splits = text_splitter.split_documents(doc_objects)
print(f"[OK] {len(splits)} chunks")

# Create vector store in Qdrant
collection_name = "bangalore_bistro_2026"
try:
    qdrant_client.delete_collection(collection_name)  # Clean up previous runs
except Exception:
    pass  # Best-effort cleanup; ignore failures

# Store document embeddings in Qdrant
vectorstore = QdrantVectorStore.from_documents(
    documents=splits,
    embedding=embeddings,
    url="http://localhost:6333",
    prefer_grpc=False,
    collection_name=collection_name,
)
print(f"[OK] Qdrant collection: {collection_name}")

# Create retriever (will fetch top k=2 most similar chunks)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
print("[OK] Retriever created (k=2)")

[OK] 10 chunks
[OK] Qdrant collection: bangalore_bistro_2026
[OK] Retriever created (k=2)
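
The output reports 10 chunks for 10 documents, which suggests that every document fits inside `chunk_size=500` and was stored whole. To show what `chunk_size` and `chunk_overlap` do on longer text, here is a simplified fixed-window chunker. It is a sketch only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, whereas this hypothetical `chunk_text` helper cuts at exact character offsets.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if len(text) <= chunk_size:
        return [text]  # Short documents stay in one piece
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reached the end of the text
    return chunks

short_doc = "x" * 400    # Shorter than chunk_size: one chunk
long_doc = "x" * 1200    # Longer: several overlapping chunks
print(len(chunk_text(short_doc)), len(chunk_text(long_doc)))  # 1 3
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, at the cost of storing some text twice.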


### Step 2: Build RAG Chain with LCEL

**LCEL (LangChain Expression Language)** is the modern v1+ way to compose chains. Think of it as building a pipeline where:
- The `|` operator chains components together
- Data flows from left to right
- Each component transforms the data

Our chain: `question → retrieve docs → format → prompt → LLM → parse output`
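
To build intuition for the `|` operator, the sketch below implements pipe-style composition in plain Python with `__or__`. This is a toy illustration and not LangChain's implementation; the `Step` class, the stub functions, and the fake model are all hypothetical.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self | other -> a new Step that runs self first, then other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stub components standing in for the retriever, prompt, and LLM
retrieve = Step(lambda question: {"context": "Doc 9: pet-friendly", "question": question})
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}\nAnswer:")
fake_llm = Step(lambda prompt_text: f"(model output for {len(prompt_text)} chars of prompt)")

chain = retrieve | fill_prompt | fake_llm
print(chain.invoke("Is the Indiranagar branch pet-friendly?"))
```

Each step only needs to accept the previous step's output, which is why components can be swapped without touching the rest of the chain.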

In [8]:
# Import LCEL components
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Define prompt template
template = """Answer based on context:

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build chain using LCEL syntax
naive_rag_chain = (
    {
        "context": retriever | format_docs,      # Retrieve docs and format them
        "question": RunnablePassthrough()      # Pass question through
    }
    | prompt                                   # Fill prompt template
    | llm                                      # Generate answer
    | StrOutputParser()                       # Parse to string
)

print("[OK] Naive RAG chain created")

[OK] Naive RAG chain created


In [9]:
print("=" * 80)
print("NAIVE RAG (k=2)")
print("=" * 80)

for q in test_questions:
    print(f"\nQ: {q}")
    answer = naive_rag_chain.invoke(q)
    print(f"A: {answer}")
    print("-" * 80)

print("\n[Success] Naive RAG provides accurate answers!")

NAIVE RAG (k=2)

Q: When does the Bangalore Bistro's Whitefield branch open?
A: The Bangalore Bistro's Whitefield branch opens on **February 15, 2026**. 

--------------------------------------------------------------------------------

Q: Is the Indiranagar branch pet-friendly?
A: Yes, the Indiranagar branch is pet-friendly.  The text states it offers "lush garden seating" and "Puppy-Patties" for pets. 

--------------------------------------------------------------------------------

[Success] Naive RAG provides accurate answers!


## 4. Where Naive RAG Fails

Naive RAG struggles with:
- **Multi-hop questions**: Requiring information from multiple documents (e.g., combining menu + offers)
- **Vague questions**: Questions that need context from different sections
- **Limited retrieval**: Only fetching k=2 documents might miss relevant information

Let's test with slightly more complex questions:

In [10]:
challenging_questions = [
    # The 9-Hop Extreme Mega-Query
    # Needs Docs 1, 2, 3, 4, 5, 6, 7, 8, 10 for a perfect answer.
    "I’m 70 years old, have a Gold Leaf Card and a pet dog. I want to visit the new branch on its opening day at 8:30 AM for Avocado Toast and Filter Coffee. Can I bring my dog, and what is my final cost using my credit?",
    
    # Ambiguity check: Needs Doc 8 vs Doc 9
    "Which branch should I go to if I want to bring my pet for breakfast?",
    
    # Implicit link: Needs Doc 7 and Doc 6/2 for logic
    "How much out-of-pocket will I pay for an Avocado Toast at Whitefield if I use my retired membership credit?"
]

print("=" * 80)
print("EXTREME RAG DIFFERENTIATION")
print("=" * 80)

for q in challenging_questions:
    print(f"\nQ: {q}")
    answer = naive_rag_chain.invoke(q)
    print(f"A: {answer}")
    
    docs = vectorstore.similarity_search(q, k=2)
    print(f"\nRetrieved Chunks (k=2):")
    for i, doc in enumerate(docs, 1):
        print(f"  {i}. {doc.page_content[:150]}...")
    print("-" * 80)

EXTREME RAG DIFFERENTIATION

Q: I’m 70 years old, have a Gold Leaf Card and a pet dog. I want to visit the new branch on its opening day at 8:30 AM for Avocado Toast and Filter Coffee. Can I bring my dog, and what is my final cost using my credit?
A: Based on the provided context, here's how we can answer your question:

* **Bringing Your Dog:**  The text states that you can "start your productivity early at any of our branches." This suggests that pets are generally allowed. However, it's always best to call the branch directly to confirm their pet policy. 
* **Final Cost:** You will receive a ₹200 joining credit when you verify your Gold Leaf Membership Card on the Bistro App.  You can use this credit towards your Avocado Toast and Filter Coffee order.

**To get the exact final cost, follow these steps:**

1. **Call the branch:** Confirm if they have any restrictions on bringing dogs to the opening day.
2. **Check menu prices:** Find out the price of the Avocado Toast and Filter 

## 5. Advanced RAG with LangGraph

**LangGraph** allows us to build stateful, multi-step workflows. Our advanced RAG adds:

1. **Query Enhancement**: Rephrases questions with synonyms to improve retrieval
2. **More Retrieval**: Fetches k=5 documents instead of k=2
3. **Self-Correction**: Detects uncertain answers
4. **Iterative Refinement**: Refines query and retries if answer is uncertain

### Define State and Create Advanced Retriever

In [11]:
# Import LangGraph types
from typing import TypedDict, List
from langchain_core.documents import Document

# Define state that will be passed between workflow nodes
class RAGState(TypedDict):
    question: str                    # Original user question
    enhanced_query: str              # Rephrased version with synonyms
    retrieved_docs: List[Document]   # Retrieved documents
    answer: str                      # Generated answer
    needs_refinement: bool           # Whether answer is uncertain
    iteration: int                   # Iteration counter

# Create advanced retriever that fetches more documents (k=5 instead of k=2)
advanced_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

### Define Workflow Nodes

Each node is a function that takes the current state and returns updated state. The workflow will execute nodes in sequence, with conditional logic to decide whether to refine and retry.

In [12]:
# Define workflow node functions
def enhance_query(state: RAGState) -> RAGState:
    q = state["question"]
    prompt = f"""Rephrase this query 2-3 different ways with synonyms:
    
    Query: {q}
    
    Alternatives:"""
    
    enhanced = llm.invoke(prompt)
    combined = f"{q}\n{enhanced}"
    
    print(f"\n[Enhance] {q[:50]}...")
    return {**state, "enhanced_query": combined}

def retrieve_documents(state: RAGState) -> RAGState:
    docs = vectorstore.similarity_search(state["enhanced_query"], k=5)
    print(f"[Retrieve] {len(docs)} docs")
    return {**state, "retrieved_docs": docs}

def generate_answer(state: RAGState) -> RAGState:
    q = state["question"]
    docs = state["retrieved_docs"]
    iteration = state.get("iteration", 0)
    
    context = "\n\n".join([d.page_content for d in docs])
    prompt = f"""Context: {context}
    
    Question: {q}
    
    Answer (say if context insufficient):"""
    
    answer = llm.invoke(prompt)
    
    # Uncertainty markers to trigger refinement
    uncertainty = [
        "not sure", "don't know", "not enough", "unclear", 
        "insufficient", "does not mention", "no information"
    ]
    needs_refinement = any(p in answer.lower() for p in uncertainty) and iteration < 2
    
    print(f"[Generate] Iter {iteration + 1}, refine={needs_refinement}")
    return {**state, "answer": answer, "needs_refinement": needs_refinement, "iteration": iteration + 1}

def refine_query(state: RAGState) -> RAGState:
    q = state["question"]
    prev = state["answer"][:200]
    
    prompt = f"""Question: {q}
    Previous incomplete: {prev}
    
    Generate more specific query for missing info:"""
    
    refined = llm.invoke(prompt)
    print(f"[Refine] {refined[:50]}...")
    return {**state, "enhanced_query": refined}

def should_refine(state: RAGState) -> str:
    return "refine" if state["needs_refinement"] else "end"

### Build and Compile Workflow

Now we connect the nodes into a graph:
- **Entry**: enhance query
- **Flow**: enhance → retrieve → generate → (if uncertain: refine → retrieve → generate again)
- **Conditional edge**: Decides whether to end or refine based on answer confidence
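
The control flow described above can be sketched as an ordinary Python loop, which may help when reading the graph definition. This is an illustration of the same ordering, not how LangGraph executes graphs internally; `run_workflow` and the stub nodes are hypothetical and use no LLM or database.

```python
def run_workflow(state, enhance, retrieve, generate, refine):
    """Run enhance -> retrieve -> generate, looping through refine while needed."""
    state = enhance(state)
    while True:
        state = retrieve(state)
        state = generate(state)
        if not state["needs_refinement"]:
            return state  # Equivalent to following the "end" branch
        state = refine(state)  # Equivalent to the "refine" branch, then back to retrieve

# Stub nodes that only track control flow
def stub_enhance(state):
    return {**state, "enhanced_query": state["question"]}

def stub_retrieve(state):
    return {**state, "retrieved_docs": [state["enhanced_query"]]}

def stub_generate(state):
    iteration = state["iteration"] + 1
    # Pretend the first attempt is uncertain and the second one succeeds
    return {**state, "iteration": iteration, "needs_refinement": iteration < 2}

def stub_refine(state):
    return {**state, "enhanced_query": state["enhanced_query"] + " (refined)"}

final = run_workflow(
    {"question": "pet policy?", "iteration": 0, "needs_refinement": False},
    stub_enhance, stub_retrieve, stub_generate, stub_refine,
)
print(final["iteration"], final["enhanced_query"])  # 2 pet policy? (refined)
```

The loop terminates only because the generate step eventually reports `needs_refinement=False`; in the notebook, the `iteration < 2` cap in `generate_answer` provides that guarantee.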

In [13]:
# Import LangGraph components
from langgraph.graph import StateGraph, END

# Build workflow graph
workflow = StateGraph(RAGState)
workflow.add_node("enhance", enhance_query)
workflow.add_node("retrieve", retrieve_documents)
workflow.add_node("generate", generate_answer)
workflow.add_node("refine", refine_query)

workflow.set_entry_point("enhance")
workflow.add_edge("enhance", "retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_conditional_edges("generate", should_refine, {"refine": "refine", "end": END})
workflow.add_edge("refine", "retrieve")

advanced_rag = workflow.compile()
print("[OK] Advanced RAG workflow ready")

[OK] Advanced RAG workflow ready


In [18]:
print("=" * 80)
print("ADVANCED RAG (k=5, Query Enhancement, Refinement)")
print("=" * 80)

for q in challenging_questions:
    print(f"\n{'='*80}")
    print(f"Q: {q}")
    print(f"{'='*80}")
    
    result = advanced_rag.invoke({
        "question": q,
        "enhanced_query": "",
        "retrieved_docs": [],
        "answer": "",
        "needs_refinement": False,
        "iteration": 0
    })
    
    print(f"\n[Answer] {result['answer']}")
    print(f"Iterations: {result['iteration']}")
    print("-" * 80)

ADVANCED RAG (k=5, Query Enhancement, Refinement)

Q: I’m 70 years old, have a Gold Leaf Card and a pet dog. I want to visit the new branch on its opening day at 8:30 AM for Avocado Toast and Filter Coffee. Can I bring my dog, and what is my final cost using my credit?

[Enhance] I’m 70 years old, have a Gold Leaf Card and a pet ...
[Retrieve] 5 docs
[Generate] Iter 1, refine=False

[Answer] Let's break down your question!  Here's how we can figure out the answer:

* **You are 70 years old:** This means you qualify as a senior citizen.
* **You have a Gold Leaf Card:** You're eligible for the joining credit and benefits associated with it.
* **Visiting on opening day at 8:30 AM:**  This is great! The Early Bird Rewards apply to orders placed before 9:00 AM.
* **Avocado Toast and Filter Coffee:** These are menu items you want to order.

**Here's what we need to know to get the final cost:**

1. **Is your dog allowed at the new branch?**  This information is not provided in the contex
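
The log above shows `refine=False` even though the answer says some information is not provided. One possible explanation is that the substring check in `generate_answer` only fires on its listed phrases. The sketch below mirrors that check in isolation; the `hedged_answer` sentence is modeled on the truncated output rather than copied from it, and adding `"not provided"` is a hypothetical extension, not part of the original workflow.

```python
UNCERTAINTY_MARKERS = [
    "not sure", "don't know", "not enough", "unclear",
    "insufficient", "does not mention", "no information",
]

def needs_refinement(answer, iteration, markers=UNCERTAINTY_MARKERS, max_iterations=2):
    """Mirror the substring check used in generate_answer."""
    uncertain = any(marker in answer.lower() for marker in markers)
    return uncertain and iteration < max_iterations

# Illustrative sentence modeled on the truncated output above
hedged_answer = "Is your dog allowed? This information is not provided in the context."
print(needs_refinement(hedged_answer, iteration=0))  # False: no listed marker matches

# Hypothetical extension: also treat "not provided" as a sign of uncertainty
extended = UNCERTAINTY_MARKERS + ["not provided"]
print(needs_refinement(hedged_answer, iteration=0, markers=extended))  # True
```

Keyword lists like this are brittle, since the model can phrase uncertainty in many ways; asking the LLM to grade its own answer is a common alternative, at the cost of an extra call.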

## 6. Summary

### What We Have:

1. **Modern LCEL Chains**: Using LangChain Expression Language (v1+ standard)
2. **sentence-transformers**: Proper embedding model (384d, 22M params)
3. **Qdrant**: Production vector database
4. **LangGraph**: Stateful workflow orchestration

### Key Differences:

| Feature | Naive RAG | Advanced RAG |
|---------|-----------|-------------|
| Retrieval | k=2 | k=5 |
| Query | Direct | Enhanced |
| Refinement | None | Iterative |
| Complex queries | Poor | Better, though still imperfect in this demo |

### Further Improvements:
- Hybrid search
- Intelligent chunking (hybrid or semantic)
- Re-ranking
- Query decomposition
- Conversation memory

In [38]:
# Cleanup
# !docker stop qdrant_rag && docker rm qdrant_rag
print("To stop Qdrant: docker stop qdrant_rag && docker rm qdrant_rag")

To stop Qdrant: docker stop qdrant_rag && docker rm qdrant_rag
