# Building a Chatbot That Knows Your Documents (RAG)

In the previous notebook, you built a chatbot that could chat and use tools. But what if you want your chatbot to answer questions about YOUR specific documents?

This is where **RAG** (Retrieval-Augmented Generation) comes in!

## What You'll Learn in This Notebook

1. **What is RAG?** - The technique behind chatbots that answer questions from your own documents
2. **Embeddings** - How to turn text into numbers that capture meaning
3. **Vector Stores** - Databases designed for similarity search
4. **Basic RAG Chain** - Combining search with generation using LCEL
5. **History-Aware RAG** - Handling follow-up questions correctly
6. **Advanced RAG with LangGraph** - Building sophisticated RAG pipelines

## The Problem RAG Solves

```
WITHOUT RAG:
------------------------------------------------------------
User: "What's in the LLaMA-2 research paper?"
AI: "I don't have access to specific papers...
     Generally, LLaMA-2 is a language model..."

     [X] Generic answer - NOT helpful!
------------------------------------------------------------

WITH RAG:
------------------------------------------------------------
User: "What's in the LLaMA-2 research paper?"
AI: [Searches your documents -> Finds relevant chunks]
    "According to the LLaMA-2 paper, the model was trained
     on 2 trillion tokens with a context length of 4096..."

     [OK] Specific answer from YOUR documents!
------------------------------------------------------------
```

## The RAG Architecture

```
+---------------------------------------------------------------------+
|                         RAG PIPELINE                                 |
+---------------------------------------------------------------------+
|                                                                      |
|   OFFLINE (One-time setup):                                          |
|   +----------+    +----------+    +----------+    +----------+       |
|   |Documents | -> |  Split   | -> |  Embed   | -> | Store in |       |
|   |(PDF,etc.)|    | (Chunks) |    |(Vectors) |    |Vector DB |       |
|   +----------+    +----------+    +----------+    +----------+       |
|                                                                      |
|   ONLINE (Every query):                                              |
|   +----------+    +----------+    +----------+    +----------+       |
|   | Question | -> |  Embed   | -> |  Search  | -> | Retrieved|       |
|   |          |    | Question |    |  Similar |    |  Chunks  |       |
|   +----------+    +----------+    +----------+    +----------+       |
|                                           |                          |
|                                           v                          |
|                                   +--------------+                   |
|                                   |     LLM      |                   |
|                                   |   Combines   | -> Final Answer   |
|                                   |   Context +  |                   |
|                                   |   Question   |                   |
|                                   +--------------+                   |
|                                                                      |
+---------------------------------------------------------------------+
```
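
To make the diagram concrete, here is a minimal sketch of both phases in plain Python. It is only an illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `store` is a plain list instead of a vector database, and the sample chunks are made up. The LLM call is left out, so the sketch stops at the prompt that would be sent to the model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# OFFLINE: documents -> chunks -> vectors -> store (sample chunks are invented)
chunks = [
    "llama 2 was trained on 2 trillion tokens",
    "the context length of llama 2 is 4096 tokens",
    "pizza dough needs flour water and yeast",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# ONLINE: question -> vector -> similarity search -> retrieved chunks
def retrieve(question: str, k: int = 2) -> list[str]:
    query_vector = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    """Combine retrieved context and the question, as the LLM step would receive them."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("how many tokens was llama 2 trained on"))
```

The rest of this notebook replaces each toy piece with a real one: OpenAI embeddings, a Pinecone index, and a chat model.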

---

## Setup: Installing and Importing Libraries

We'll use:
- `langchain-openai`: OpenAI chat models and embeddings
- `langchain-pinecone`: Pinecone vector store integration
- `langchain-core`: Core LCEL components
- `pinecone`: Vector database client
- `datasets`: HuggingFace datasets for loading sample data

> **Security**: API keys are loaded from environment variables - never hardcode them!

In [None]:
# Standard library imports
import os
from typing import List, Optional
from time import sleep

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# LangChain imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, BaseMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

# History management
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Pinecone
from pinecone import Pinecone, ServerlessSpec

# Data handling
from datasets import load_dataset
from tqdm.auto import tqdm

print("All imports successful!")

In [None]:
# Verify API keys are loaded
openai_key = os.getenv('OPENAI_API_KEY')
pinecone_key = os.getenv('PINECONE_API_KEY')

if openai_key:
    print("OpenAI API key loaded!")
else:
    print("WARNING: No OpenAI API key found. Set OPENAI_API_KEY in .env file")

if pinecone_key:
    print("Pinecone API key loaded!")
else:
    print("WARNING: No Pinecone API key found. Set PINECONE_API_KEY in .env file")

---

## Part 1: Embeddings - Turning Words into Numbers

### What You'll Learn
- What embeddings are and why they're magical
- How similarity works in vector space
- Using OpenAI's embedding models

### Key Concept: Embeddings

> **Definition**: Embeddings are numerical representations (vectors) of text that capture semantic meaning. Similar texts have similar embeddings.
>
> **Analogy**: Imagine a map where concepts are placed by similarity:
> - Paris and Rome are close (both European capitals)
> - Paris and "Eiffel Tower" are close (related concepts)
> - Paris and "Banana" are far apart (unrelated)

### How Embeddings Capture Meaning

```
Text Embedding Example:
================================================================

"King"  -> [0.2, 0.8, 0.1, -0.3, ...]  (1536 dimensions)
"Queen" -> [0.3, 0.7, 0.2, -0.2, ...]  (similar to King!)
"Apple" -> [-0.5, 0.1, 0.9, 0.4, ...]  (very different)

Famous analogy (popularized by Word2Vec):
King - Man + Woman ~ Queen
```
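
The analogy above can be checked with a few hand-made vectors. This is a toy sketch, not real model output: the 2-D vectors below are invented so that the first axis loosely means "royalty" and the second loosely means "gender". Real embeddings have hundreds or thousands of dimensions with no such clean interpretation.

```python
import math

# Hypothetical 2-D vectors (toy values chosen for illustration only)
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman, computed component by component
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# Look for the closest remaining word, excluding the three inputs
candidates = {word: vec for word, vec in vectors.items() if word not in {"king", "man", "woman"}}
nearest = max(candidates, key=lambda word: cosine(target, candidates[word]))
print(nearest)
```

With these toy values the nearest remaining word is `queen`, because subtracting `man` and adding `woman` flips only the second axis.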

### OpenAI Embedding Models

| Model | Dimensions | Best For | Cost |
|-------|------------|----------|------|
| `text-embedding-3-small` | 1536 | General use, cost-effective | $ |
| `text-embedding-3-large` | 3072 | Higher accuracy | $$ |

In [None]:
# Create an embedding model
embedding_model = OpenAIEmbeddings(model='text-embedding-3-small')

print("Embedding model created!")
print("Model: text-embedding-3-small")
print("Dimensions: 1536")

In [None]:
# Let's see embeddings in action!

# Embed some sample texts
texts = [
    "The king sat on his throne",
    "The queen wore a golden crown",
    "I love eating pizza"
]

embeddings = embedding_model.embed_documents(texts)

print("=== Embedding Results ===")
for i, text in enumerate(texts):
    print(f"\nText: '{text}'")
    print(f"Embedding (first 5 values): {embeddings[i][:5]}")
    print(f"Total dimensions: {len(embeddings[i])}")

In [None]:
# Let's calculate similarity between embeddings
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

print("=== Similarity Scores ===")
print(f"King sentence vs Queen sentence: {cosine_similarity(embeddings[0], embeddings[1]):.4f}")
print(f"King sentence vs Pizza sentence: {cosine_similarity(embeddings[0], embeddings[2]):.4f}")
print(f"Queen sentence vs Pizza sentence: {cosine_similarity(embeddings[1], embeddings[2]):.4f}")
print()
print("(Higher = more similar, Range: -1 to 1)")

---

## Part 2: Vector Stores - Databases for AI

### What You'll Learn
- What vector databases are and why we need them
- Setting up Pinecone (cloud vector database)
- Storing and retrieving document embeddings

### Why Not Use Regular Databases?

```
REGULAR DATABASE (SQL):
---------------------------------------------------------
SELECT * FROM documents WHERE content LIKE '%refund%'

Problem: Only finds rows that contain the literal text "refund"!
---------------------------------------------------------

VECTOR DATABASE:
---------------------------------------------------------
Find documents SIMILAR to "refund" embedding

Result: Finds semantically related documents, even without the word "refund"!
- "return policy" -> FOUND (similar meaning)
- "money back guarantee" -> FOUND (similar meaning)
---------------------------------------------------------
```
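
The contrast can be sketched with Python's built-in `sqlite3` module and a handful of toy vectors. The documents and their 2-D vectors are invented for illustration (a real system would get vectors from an embedding model), and `query_vector` is a hypothetical embedding of the word "refund".

```python
import math
import sqlite3

# Invented documents with hand-made 2-D "embeddings" (toy values, not model output)
docs = {
    "Refund requests are processed within 5 days.": [0.9, 0.1],
    "Our return policy allows returns within 30 days.": [0.8, 0.3],
    "We offer a money back guarantee.": [0.85, 0.2],
    "The office is closed on public holidays.": [0.1, 0.9],
}

# Keyword search: plain SQL LIKE on an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (content TEXT)")
conn.executemany("INSERT INTO documents VALUES (?)", [(text,) for text in docs])
keyword_hits = [
    row[0]
    for row in conn.execute("SELECT content FROM documents WHERE content LIKE '%refund%'")
]

# Similarity search: rank every document by cosine similarity to the query vector
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query_vector = [0.9, 0.15]  # hypothetical embedding of "refund"
ranked = sorted(docs, key=lambda text: cosine(query_vector, docs[text]), reverse=True)
semantic_hits = ranked[:3]

print("Keyword hits: ", keyword_hits)
print("Semantic hits:", semantic_hits)
```

The keyword query returns only the one document that contains the word, while the similarity ranking also surfaces the return-policy and money-back documents and leaves the unrelated one last.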

In [None]:
# Initialize Pinecone client
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])

print("Pinecone client created!")

In [None]:
# Create or connect to a Pinecone index
index_name = "llama2-arxiv-papers-chunked"

# Check if index already exists
existing_indexes = [idx['name'] for idx in pc.list_indexes()]

if index_name not in existing_indexes:
    print(f"Creating new index: {index_name}")
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

    while not pc.describe_index(index_name).status['ready']:
        print("Waiting for index to be ready...")
        sleep(2)
else:
    print(f"Index '{index_name}' already exists!")

index = pc.Index(index_name)
print(f"Connected to index: {index_name}")

In [None]:
# Check index statistics
index_stats = index.describe_index_stats()

print("=== Index Statistics ===")
print(f"Dimension: {index_stats['dimension']}")
print(f"Total vectors: {index_stats['total_vector_count']}")

index_initialized = index_stats['total_vector_count'] > 0

if index_initialized:
    print(f"\nIndex is ready with {index_stats['total_vector_count']} vectors!")
else:
    print("\nIndex is empty. We'll populate it in the next section.")

---

## Part 3: Loading and Storing Documents

### Why Chunk Documents?

```
Problem: A 100-page PDF is too large to embed as one vector

Solution: Split into smaller "chunks" (e.g., 500-1000 characters each)

           Original Document              Chunked Document
         +------------------+           +--------+
         |   100 pages      |   -->     | Chunk 1| -> Vector 1
         |   of text        |           +--------+
         +------------------+           | Chunk 2| -> Vector 2
                                        +--------+
```
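
The dataset used below is already chunked, but a simple character-based splitter might look like the following sketch. The function name `chunk_text` and the `overlap` parameter are assumptions added for illustration: overlapping neighbouring chunks is a common way to avoid cutting a sentence's context in half, not something this notebook's dataset is known to use.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Stand-in for a long document: 1200 characters of repeating digits
document = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(document)

for number, chunk in enumerate(chunks, 1):
    print(f"Chunk {number}: {len(chunk)} characters")
```

Production splitters usually try to break on paragraph or sentence boundaries rather than at a fixed character count, but the idea is the same: each chunk becomes one vector.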

In [None]:
# Load a sample dataset (already chunked!)
print("Loading dataset...")
llama2_dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

print(f"\n=== Dataset Info ===")
print(f"Number of chunks: {len(llama2_dataset)}")
print(f"Features: {list(llama2_dataset.features.keys())}")

In [None]:
# Let's look at one example
example = llama2_dataset[0]

print("=== Example Chunk ===")
print(f"Title: {example['title']}")
print(f"DOI: {example['doi']}")
print(f"Chunk ID: {example['chunk-id']}")
print(f"\nChunk content (first 300 chars):")
print(example['chunk'][:300] + "...")

In [None]:
# Upload vectors to Pinecone (only if not already done)

if not index_initialized:
    print("Uploading vectors to Pinecone...")
    print("This may take a few minutes...\n")

    df = llama2_dataset.to_pandas()
    batch_size = 128

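    for i in tqdm(range(0, len(df), batch_size)):
        batch = df.iloc[i:i + batch_size]
        # One ID per chunk: "<doi>-<chunk-id>"
        ids = [f"{doi}-{chunk_id}" for doi, chunk_id in zip(batch['doi'], batch['chunk-id'])]
        # Batch-local names, so `texts` and `embeddings` from Part 1 are not overwritten
        batch_texts = batch['chunk'].tolist()
        batch_embeddings = embedding_model.embed_documents(batch_texts)
        # Metadata stored with each vector; 'text' is what the retriever returns later
        metadata = [
            {'text': text, 'source': source, 'title': title}
            for text, source, title in zip(batch['chunk'], batch['source'], batch['title'])
        ]
        index.upsert(vectors=list(zip(ids, batch_embeddings, metadata)))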

    print("\nUpload complete!")
else:
    print("Index already populated. Skipping upload.")

---

## Part 4: Building a Basic RAG Chain with LCEL

### What You'll Learn
- Creating a retriever from a vector store
- Building a RAG chain using pure LCEL
- Understanding the retrieval + generation pattern

In [None]:
# Create a LangChain vector store wrapper
vector_store = PineconeVectorStore(
    index=index,
    embedding=embedding_model,
    text_key='text'
)

print("Vector store created!")

In [None]:
# Create a retriever
retriever = vector_store.as_retriever(
    search_kwargs={"k": 3}
)

print("Retriever created!")
print("Configuration: Returns top 3 most similar documents")

In [None]:
# Test the retriever
query = "What is LLaMA-2?"

print(f"=== Testing Retriever ===")
print(f"Query: {query}\n")

docs = retriever.invoke(query)

for i, doc in enumerate(docs, 1):
    print(f"--- Document {i} ---")
    print(f"Content (first 200 chars): {doc.page_content[:200]}...")
    print(f"Source: {doc.metadata.get('source', 'N/A')}")
    print()

In [None]:
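# Create the chat model
# Note: some reasoning-oriented models accept only their default temperature;
# if the API rejects this argument, remove it.
llm = ChatOpenAI(
    model="gpt-5-mini",
    temperature=0.1
)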

print("Chat model created!")
print(f"Model: gpt-5-mini")
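
With a retriever and a chat model in hand, the retrieval + generation pattern is a pipeline: retrieve, format the documents, fill a prompt, call the model. The sketch below shows how `|` composition chains those steps, using plain Python only. The `Step` class is a hypothetical, much-simplified stand-in for an LCEL runnable (it is not how LangChain implements it), and `fake_retriever` and `fake_llm` are stubs so the example runs without API keys.

```python
class Step:
    """Tiny stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b  ->  a new Step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

# Stubs (hypothetical) standing in for the real retriever and chat model
fake_retriever = Step(lambda question: ["LLaMA-2 was trained on 2 trillion tokens."])
join_docs = Step(lambda docs: "\n\n".join(docs))
fake_llm = Step(lambda prompt_text: f"(stub answer to a {len(prompt_text)}-character prompt)")

# Keep the question and add the retrieved context next to it
add_context = Step(lambda question: {
    "question": question,
    "context": (fake_retriever | join_docs).invoke(question),
})
prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")

# Retrieval + generation as one composed chain
basic_rag = add_context | prompt | fake_llm
print(basic_rag.invoke("What is LLaMA-2?"))
```

In LangChain the same shape is built from the real `retriever`, a `ChatPromptTemplate`, `llm`, and `StrOutputParser()`, as the history-aware chain in Part 5 does.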

---

## Part 5: History-Aware RAG - Handling Follow-Up Questions

### The Problem: Follow-Up Questions Fail

```
BASIC RAG PROBLEM:
================================================================

Q1: "What is LLaMA-2?"
    -> Embeds: "What is LLaMA-2?" [OK]
    -> Retrieves: LLaMA-2 docs [OK]

Q2: "What are its main features?"
    -> Embeds: "What are its main features?" [X]
    -> "its" = ??? (no context!)
    -> Retrieves: Random "features" docs [X]

================================================================
```

### The Solution: Query Rewriting + RunnableWithMessageHistory

> **Key Idea**: LangChain provides `RunnableWithMessageHistory` to automatically
> manage conversation history. Combined with a query rewriter, we get pure LCEL history-aware RAG.

```
HISTORY-AWARE RAG (Pure LCEL Pattern):
================================================================

Q2: "What are its main features?"

    Step 1: RunnableWithMessageHistory provides chat_history automatically
    +----------------------------------------------------------+
    | Chat History (managed by RunnableWithMessageHistory):    |
    |   User: "What is LLaMA-2?"                               |
    |   AI: "LLaMA-2 is Meta's open-source LLM..."             |
    +----------------------------------------------------------+

    Step 2: Query Rewriter (LCEL chain)
    +----------------------------------------------------------+
    | Input: "What are its main features?"                     |
    | Output: "What are the main features of LLaMA-2?"         |
    +----------------------------------------------------------+

    Step 3: Search with rewritten question
    -> Embeds: "What are the main features of LLaMA-2?" [OK]
    -> Retrieves: LLaMA-2 feature docs [OK]

================================================================
```

### Architecture: Pure LCEL with RunnableWithMessageHistory

```
+---------------------------------------------------------------------+
|              PURE LCEL HISTORY-AWARE RAG                            |
+---------------------------------------------------------------------+
|                                                                     |
|   rag_chain = (                                                     |
|       RunnablePassthrough.assign(                                   |
|           standalone=rewrite_prompt | llm | parser   # Query rewrite|
|       )                                                             |
|       | RunnablePassthrough.assign(                                 |
|           context=lambda x: retriever.invoke(x["standalone"])       |
|       )                                                             |
|       | qa_prompt | llm | parser                     # Generate    |
|   )                                                                 |
|                                                                     |
|   rag_with_history = RunnableWithMessageHistory(                    |
|       rag_chain,                                                    |
|       get_session_history,          # Session-based history store   |
|       input_messages_key="input",                                   |
|       history_messages_key="chat_history"                           |
|   )                                                                 |
|                                                                     |
|   Usage:                                                            |
|   response = rag_with_history.invoke(                               |
|       {"input": "What are its features?"},                          |
|       config={"configurable": {"session_id": "user1"}}              |
|   )                                                                 |
|                                                                     |
+---------------------------------------------------------------------+
```
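
Conceptually, the history wrapper does three things on every call: look up the session's history, pass it to the chain together with the new input, then append the new question and answer. The sketch below imitates that behaviour in plain Python. It is not LangChain's implementation, and `toy_sessions`, `with_history`, and `echo_chain` are hypothetical names used only here.

```python
from typing import Callable

# One history list per session id (in-memory, like the notebook's session store)
toy_sessions: dict[str, list[tuple[str, str]]] = {}

def with_history(chain: Callable[[dict], str]) -> Callable[[str, str], str]:
    """Wrap `chain` so each call sees, and then extends, its session's history."""
    def invoke(user_input: str, session_id: str) -> str:
        history = toy_sessions.setdefault(session_id, [])
        # The chain receives a copy of the history under the "chat_history" key
        answer = chain({"input": user_input, "chat_history": list(history)})
        history.append(("human", user_input))
        history.append(("ai", answer))
        return answer
    return invoke

def echo_chain(inputs: dict) -> str:
    """Stub chain: reports how much history it was given."""
    return f"{len(inputs['chat_history'])} prior messages; you asked: {inputs['input']}"

chat = with_history(echo_chain)
first = chat("What is LLaMA-2?", "user1")
second = chat("What are its features?", "user1")   # same session: sees 2 prior messages
other = chat("Hello", "user2")                      # new session: starts empty
print(first, second, other, sep="\n")
```

Because history is keyed by session id, two users never see each other's conversation, which is why every call needs a `session_id` in its config.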

### Why Pure LCEL?

| Approach | Pros | Cons |
|----------|------|------|
| **Class Wrapper** | Explicit, easy to debug | More verbose, less composable |
| **Pure LCEL** | Composable, idiomatic, concise | Requires understanding LCEL |

**Pure LCEL is the approach this notebook recommends and uses below!**

In [None]:
# =============================================================================
# STEP 1: Create the Query Rewriter (LCEL Chain)
# =============================================================================
# This chain rewrites follow-up questions to be standalone

rewrite_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", rewrite_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# LCEL chain: prompt -> llm -> parser
query_rewriter = rewrite_prompt | llm | StrOutputParser()

print("Query rewriter chain created (LCEL)!")
print("\nThis will rewrite questions like:")
print('  "What are its features?" -> "What are the features of LLaMA-2?"')

In [None]:
# Test the query rewriter
test_history = [
    HumanMessage(content="What is LLaMA-2?"),
    AIMessage(content="LLaMA-2 is Meta's open-source large language model.")
]

rewritten = query_rewriter.invoke({
    "chat_history": test_history,
    "input": "What are its main features?"
})

print("=== Query Rewriter Test ===")
print(f"Original: 'What are its main features?'")
print(f"Rewritten: '{rewritten}'")

In [None]:
# =============================================================================
# STEP 2: Create the QA Chain (LCEL)
# =============================================================================
# This chain generates answers from retrieved context

qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.

Context:
{context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Helper function to format documents
def format_docs(docs):
    """Convert Document objects to a formatted string."""
    return "\n\n".join([doc.page_content for doc in docs])

print("QA prompt created!")

In [None]:
# =============================================================================
# STEP 3: Build Pure LCEL History-Aware RAG Chain
# =============================================================================
# This is the recommended pattern using RunnableWithMessageHistory

# Pure LCEL chain: rewrite -> retrieve -> generate
rag_chain = (
    # Step 1: Rewrite the question to be standalone
    RunnablePassthrough.assign(
        standalone=rewrite_prompt | llm | StrOutputParser()
    )
    # Step 2: Retrieve documents using the rewritten question
    | RunnablePassthrough.assign(
        context=lambda x: format_docs(retriever.invoke(x["standalone"]))
    )
    # Step 3: Generate answer
    | qa_prompt
    | llm
    | StrOutputParser()
)

print("Pure LCEL RAG chain created!")
print("\nChain flow:")
print("  1. Rewrite question into a standalone form (using chat history)")
print("  2. Retrieve documents with rewritten question")
print("  3. Generate answer from context")

In [None]:
# =============================================================================
# STEP 4: Wrap with RunnableWithMessageHistory
# =============================================================================
# This automatically manages chat history per session

# Session history store (in-memory, could be Redis/DB in production)
session_store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    """Get or create chat history for a session."""
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

# Wrap the RAG chain with history management
rag_with_history = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

print("History-aware RAG created (pure LCEL)!")
print("\nUsage:")
print('  response = rag_with_history.invoke(')
print('      {"input": "Your question"},')
print('      config={"configurable": {"session_id": "user1"}}')
print('  )')

In [None]:
# =============================================================================
# TEST: History-Aware RAG in Action!
# =============================================================================

# Configuration for this conversation session
config = {"configurable": {"session_id": "demo-session"}}

print("=" * 70)
print("TESTING PURE LCEL HISTORY-AWARE RAG")
print("=" * 70)

# Question 1 - First question
question1 = "What is LLaMA-2?"
print(f"\nUser: {question1}")

answer1 = rag_with_history.invoke(
    {"input": question1},
    config=config
)

print(f"\nAI: {answer1}")

In [None]:
# Question 2 - Follow-up using "its" (should work now!)
question2 = "What are its main features?"
print(f"\nUser: {question2}")
print("       ^ Note: 'its' refers to LLaMA-2 from previous question")

answer2 = rag_with_history.invoke(
    {"input": question2},
    config=config  # Same session_id - history is preserved!
)

print(f"\nAI: {answer2}")

print("\n" + "=" * 70)
print("SUCCESS! The follow-up question was correctly understood!")
print("RunnableWithMessageHistory automatically managed the chat history.")
print("=" * 70)

In [None]:
# Question 3 - Another follow-up
question3 = "How was it trained?"
print(f"\nUser: {question3}")

answer3 = rag_with_history.invoke(
    {"input": question3},
    config=config
)

print(f"\nAI: {answer3}")

# View the complete chat history
print("\n" + "=" * 70)
print("COMPLETE CONVERSATION HISTORY")
print("=" * 70)

history = session_store["demo-session"].messages
for i, msg in enumerate(history):
    role = "User" if isinstance(msg, HumanMessage) else "AI"
    content = msg.content[:150] + '...' if len(msg.content) > 150 else msg.content
    print(f"\n{role}: {content}")

---

## Part 6: Advanced RAG with LangGraph

### What You'll Learn
- Why LangGraph gives more control over RAG
- Building a stateful RAG pipeline with explicit nodes
- Using checkpointing for persistent conversations

### Why LangGraph for RAG?

The pure LCEL approach above is clean, but LangGraph offers:
- **Visual workflow**: Clear node-based architecture
- **Built-in persistence**: Automatic state checkpointing
- **Conditional routing**: Add quality checks, retry logic
- **Multi-step workflows**: Complex RAG patterns
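
To illustrate what conditional routing with retry logic means, here is a minimal plain-Python sketch of a graph runner. It is not LangGraph code: the `run` loop, the `routers` table, and the stub nodes are hypothetical. In LangGraph itself, this kind of branching is expressed with conditional edges (`add_conditional_edges`) on the `StateGraph`.

```python
def retrieve(state: dict) -> dict:
    """Stub retriever: finds nothing on the first attempt, succeeds on the retry."""
    attempts = state["attempts"] + 1
    documents = [] if attempts == 1 else ["LLaMA-2 context chunk"]
    return {"attempts": attempts, "documents": documents}

def generate(state: dict) -> dict:
    return {"answer": f"Answer based on {len(state['documents'])} document(s)"}

def route_after_retrieve(state: dict) -> str:
    """Quality check: retry retrieval until documents are found (max 3 attempts)."""
    if state["documents"] or state["attempts"] >= 3:
        return "generate"
    return "retrieve"

nodes = {"retrieve": retrieve, "generate": generate}
routers = {"retrieve": route_after_retrieve, "generate": lambda state: "END"}

def run(state: dict, start: str = "retrieve") -> dict:
    """Run nodes until a router returns END; each node returns a partial state update."""
    current = start
    while current != "END":
        state = {**state, **nodes[current](state)}
        current = routers[current](state)
    return state

final = run({"attempts": 0, "documents": [], "answer": None})
print(final)
```

Each node returns only the keys it changes, and the router decides where to go next from the merged state. The linear graph built later in this part is the special case where every router always returns the same next node.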

In [None]:
# Import LangGraph components
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated, Optional

print("LangGraph components imported!")

In [None]:
# Define the state schema for our RAG workflow

class RAGState(TypedDict):
    """State that flows through our RAG graph."""
    question: str
    standalone_question: Optional[str]
    documents: Optional[str]
    messages: Annotated[list, add_messages]
    answer: Optional[str]

print("RAG state schema defined!")

In [None]:
# Define the nodes (functions that process state)

def rewrite_node(state: RAGState) -> dict:
    """Rewrite the query to be standalone based on chat history."""
    print("  [Rewrite Node] Processing...")

    question = state["question"]
    messages = state.get("messages", [])

    if not messages:
        print("  [Rewrite Node] No history, using original question")
        return {"standalone_question": question}

    rewritten = query_rewriter.invoke({
        "chat_history": messages,
        "input": question
    })

    print(f"  [Rewrite Node] '{question}' -> '{rewritten}'")
    return {"standalone_question": rewritten}


def retrieve_node(state: RAGState) -> dict:
    """Retrieve relevant documents from the vector store."""
    print("  [Retrieve Node] Searching...")

    query = state.get("standalone_question") or state["question"]
    docs = retriever.invoke(query)
    context = format_docs(docs)

    print(f"  [Retrieve Node] Found {len(docs)} documents")
    return {"documents": context}


def generate_node(state: RAGState) -> dict:
    """Generate an answer using the LLM and retrieved context."""
    print("  [Generate Node] Generating answer...")

    question = state["question"]
    # `documents` may be present but None or empty, so fall back explicitly
    context = state.get("documents") or "No context available."
    messages = state.get("messages", [])

    chain = qa_prompt | llm | StrOutputParser()
    answer = chain.invoke({
        "context": context,
        "chat_history": messages,
        "input": question
    })

    print("  [Generate Node] Done!")
    return {
        "answer": answer,
        "messages": [
            HumanMessage(content=question),
            AIMessage(content=answer)
        ]
    }

print("RAG nodes defined!")

In [None]:
# Build the RAG graph

rag_workflow = StateGraph(RAGState)

# Add nodes
rag_workflow.add_node("rewrite", rewrite_node)
rag_workflow.add_node("retrieve", retrieve_node)
rag_workflow.add_node("generate", generate_node)

# Add edges
rag_workflow.add_edge(START, "rewrite")
rag_workflow.add_edge("rewrite", "retrieve")
rag_workflow.add_edge("retrieve", "generate")
rag_workflow.add_edge("generate", END)

# Compile with memory
memory_saver = MemorySaver()
rag_graph = rag_workflow.compile(checkpointer=memory_saver)

print("LangGraph RAG compiled!")
print("\nGraph: START -> rewrite -> retrieve -> generate -> END")

In [None]:
# Test the LangGraph RAG
config = {"configurable": {"thread_id": "user-1"}}

print("=" * 70)
print("TESTING LANGGRAPH HISTORY-AWARE RAG")
print("=" * 70)

question = "What is the training data size for LLaMA-2?"
print(f"\nUser: {question}")

result = rag_graph.invoke(
    {
        "question": question,
        "standalone_question": None,
        "documents": None,
        "messages": [],
        "answer": None
    },
    config=config
)

print(f"\nAI: {result['answer']}")

In [None]:
# Follow-up question (LangGraph remembers via checkpointer)
question2 = "How does it compare to the original LLaMA?"
print(f"\nUser: {question2}")
print("       ^ Note: 'it' refers to LLaMA-2")

result2 = rag_graph.invoke(
    {
        "question": question2,
        "standalone_question": None,
        "documents": None,
        "messages": [],
        "answer": None
    },
    config=config  # Same thread_id
)

print(f"\nAI: {result2['answer']}")
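
The follow-up call passes `"messages": []`, yet the earlier turns are still available to the rewrite node. That works because `messages` is declared with the `add_messages` reducer, so new values are merged into the checkpointed list instead of replacing it, while plain fields such as `answer` are simply overwritten. The sketch below models only that append behaviour in plain Python. The names `append_reducer` and `apply_update` are hypothetical, and the real reducer does more (for example, it works with message objects rather than strings).

```python
def append_reducer(existing: list, update: list) -> list:
    """Merge rule for `messages`: keep what is there and add the new items."""
    return existing + update

def apply_update(state: dict, update: dict) -> dict:
    """Apply an update to saved state: `messages` is appended, other keys are replaced."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":
            merged[key] = append_reducer(state.get(key, []), value)
        else:
            merged[key] = value
    return merged

# State saved by the checkpointer after the first question (toy values)
checkpoint = {"question": "Q1", "messages": ["human: Q1", "ai: A1"], "answer": "A1"}

# Input for the follow-up call, shaped like the notebook's second invoke
next_input = {"question": "Q2", "messages": [], "answer": None}

state = apply_update(checkpoint, next_input)
print(state)
```

An empty list appends nothing, so the history from the first turn survives, while `question` and `answer` are reset for the new turn.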

---

## Summary: What You've Learned

Congratulations! You've learned how to build a RAG-powered chatbot that handles follow-up questions correctly using **pure LCEL**.

| Concept | What It Does | LangChain Approach |
|---------|--------------|-------------------|
| **Embeddings** | Turn text into vectors | `OpenAIEmbeddings` |
| **Vector Stores** | Store and search vectors | `PineconeVectorStore` |
| **Retrievers** | Find relevant documents | `vector_store.as_retriever()` |
| **Query Rewriting** | Handle follow-up questions | LCEL: `prompt \| llm \| parser` |
| **History Management** | Track conversation | `RunnableWithMessageHistory` |
| **RAG Pipeline** | Combine retrieval + generation | Pure LCEL chain composition |
| **Advanced RAG** | Stateful workflows | LangGraph `StateGraph` |

### Key Takeaways

1. **Pure LCEL is the way**: Use `RunnablePassthrough.assign()` to build composable chains
2. **RunnableWithMessageHistory**: Automatic session-based history management
3. **Query rewriting is essential**: Follow-up questions need context from history
4. **LangGraph adds control**: Build complex workflows with explicit state management

### The Pure LCEL Pattern

```python
# Build the chain
rag_chain = (
    RunnablePassthrough.assign(standalone=rewrite_chain)
    | RunnablePassthrough.assign(context=retrieve_fn)
    | qa_prompt | llm | parser
)

# Wrap with history
rag_with_history = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Use it
response = rag_with_history.invoke(
    {"input": "What are its features?"},
    config={"configurable": {"session_id": "user1"}}
)
```

### Resources

- [LCEL Documentation](https://python.langchain.com/docs/concepts/lcel/)
- [RunnableWithMessageHistory](https://python.langchain.com/docs/how_to/message_history/)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Pinecone Documentation](https://docs.pinecone.io/)