# Build a RAG Agent with LangChain

This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) application with LangChain.

**What is RAG?**
RAG is a technique that enables LLMs to answer questions about specific source information by:
1. **Indexing**: Creating a searchable index from your data sources
2. **Retrieval and Generation**: Retrieving relevant context and using it to generate answers

We'll cover:
- **Indexing Pipeline**: Loading, splitting, and storing documents
- **RAG Agents**: Using agents with retrieval tools for flexible querying
- **RAG Chains**: Two-step chains for fast, simple queries

---


## Setup

First, install the required packages (uncomment the `%pip` line below if they are not installed yet) and set the OpenAI API key.


In [2]:
# %pip install -qU langchain langchain-text-splitters langchain-community bs4 langchain-openai langchain-chroma


In [3]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")


---

# Part 1: Indexing

The indexing pipeline prepares your data for retrieval. This typically happens in a separate process before querying.

## 1.1 Loading Documents

We'll use two posts from Anthropic's engineering blog as our data source: one on writing tools for agents and one on building effective agents.


In [36]:
from langchain_community.document_loaders import WebBaseLoader

# Load the blog post content without relying on bs4 SoupStrainer that may not match the site's structure.
# Instead, use the default loader behavior and rely on the loader to extract all the text from the page.
urls = ["https://www.anthropic.com/engineering/writing-tools-for-agents",
        "https://www.anthropic.com/engineering/building-effective-agents"]

loader = WebBaseLoader(urls)
docs = loader.load()

if not docs:
    raise ValueError("No documents were loaded. The page may be protected by anti-bot measures, or the loader could not parse the content.")

print(f"Loaded {len(docs)} document(s)")
print(f"First document length: {len(docs[0].page_content)} characters")
print(f"\nFirst 500 characters:\n{docs[0].page_content[:500]}...")

Loaded 2 document(s)
First document length: 22843 characters

First 500 characters:
Skip to main contentSkip to footerResearchEconomic FuturesCommitmentsLearnNewsTry ClaudeEngineering at AnthropicWriting effective tools for agents — with agentsPublished Sep 11, 2025Agents are only as effective as the tools we give them. We share how to write high-quality tools and evaluations, and how you can boost performance by using Claude to optimize its tools for itself.The Model Context Protocol (MCP) can empower LLM agents with potentially hundreds of tools to solve real-world tasks. But...
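
The loaded text above starts with navigation residue ("Skip to main content...") because the default loader keeps all page text. The following is a minimal, standard-library sketch of filtering text down to one container element, similar in spirit to the bs4 `SoupStrainer` approach mentioned in the loader comment. The `<article>` container and the sample HTML are assumptions for illustration; the real pages may be structured differently.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect text that appears inside a chosen container tag.

    The default container ("article") is an assumption, not taken from the real page.
    """

    def __init__(self, container: str = "article"):
        super().__init__()
        self.container = container
        self.depth = 0  # how many container tags we are currently inside
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.container and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.parts.append(text)


def extract_main_text(html: str, container: str = "article") -> str:
    """Return only the text found inside the container element."""
    parser = MainTextExtractor(container)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page: navigation and footer surround the real content.
sample_html = (
    "<html><body><nav>Skip to main content</nav>"
    "<article><h1>Writing tools</h1><p>Agents need good tools.</p></article>"
    "<footer>Footer links</footer></body></html>"
)
print(extract_main_text(sample_html))
```

Filtering before splitting keeps navigation text out of the first chunk, at the cost of depending on the site's markup staying stable.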


In [37]:
len(docs)

2

## 1.2 Splitting Documents

We need to split the documents into smaller chunks that can be efficiently retrieved. The `RecursiveCharacterTextSplitter` is a good default choice.


In [38]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
all_splits = text_splitter.split_documents(docs)

print(f"Split into {len(all_splits)} chunks")
print(f"\nFirst chunk (length: {len(all_splits[0].page_content)}):")
print(all_splits[0].page_content[:300] + "...")


Split into 54 chunks

First chunk (length: 989):
Skip to main contentSkip to footerResearchEconomic FuturesCommitmentsLearnNewsTry ClaudeEngineering at AnthropicWriting effective tools for agents — with agentsPublished Sep 11, 2025Agents are only as effective as the tools we give them. We share how to write high-quality tools and evaluations, and ...
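
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally tries to break on natural separators such as paragraphs and spaces, which this hypothetical `split_with_overlap` helper does not.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-window splitter: each chunk starts (chunk_size - chunk_overlap)
    characters after the previous one, so neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The last four characters of each chunk reappear at the start of the next one, which is what lets a sentence that straddles a boundary still be retrieved in one piece.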


## 1.3 Storing Documents in a Vector Store

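We'll use Chroma as our vector store. Each chunk is embedded with `OpenAIEmbeddings` and stored in the `rag-tutorial` collection under `./chroma-db`, so it can be found later by semantic (similarity) search. Because the collection is persisted to disk, re-running this cell may add the same chunks again.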


In [43]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
    collection_name="rag-tutorial",
    persist_directory="./chroma-db"
)

print(f"Vector store created with {len(all_splits)} documents")

Vector store created with 54 documents
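
Conceptually, semantic search ranks stored chunks by how close their vectors are to the query vector. The sketch below illustrates that ranking with cosine similarity. The word-count `embed` function is a toy stand-in for a real embedding model, and none of these helpers are part of Chroma or LangChain.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    "agents call tools to act",
    "tools need clear descriptions",
    "the weather is sunny today",
]
top = similarity_search("how do agents call tools", chunks, k=2)
print(top)
```

A real embedding model also places paraphrases close together, which word counts cannot do; the ranking step is the same idea.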


---

# Part 2: Retrieval and Generation

Now that we've indexed our data, we can build RAG applications. We'll demonstrate two approaches:

1. **RAG Agents**: Flexible agents that decide when to search
2. **RAG Chains**: Fast two-step chains that always search

## 2.1 RAG Agents

RAG agents use tools to retrieve context. The agent decides when to call the retrieval tool, allowing for:
- Multiple searches in support of a single query
- Contextual search queries
- Skipping searches for simple queries

Let's create a retrieval tool and build an agent.


In [40]:
from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain.chat_models import init_chat_model

# Initialize the model
model = init_chat_model("openai:gpt-4o-mini")

# Create a retrieval tool
@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

tools = [retrieve_context]

# Create the agent with custom instructions
prompt = (
    "You have access to a tool that retrieves context from blog posts. "
    "Use the tool to help answer user queries."
)
agent = create_agent(model, tools, system_prompt=prompt)

### Testing the RAG Agent

Let's test with a simple query:


In [41]:
query = "What is the challenge of writing tools for agents?"

for step in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


What is the challenge of writing tools for agents?
Tool Calls:
  retrieve_context (call_Ckk5yhsjtV4ltgJL3a7fY6fN)
 Call ID: call_Ckk5yhsjtV4ltgJL3a7fY6fN
  Args:
    query: challenge of writing tools for agents
Name: retrieve_context

Source: {'source': 'https://www.anthropic.com/engineering/writing-tools-for-agents', 'language': 'en', 'title': 'Writing effective tools for AI agents—using AI agents \\ Anthropic', 'description': 'Writing effective tools for AI agents—using AI agents'}
Content: performance improvements even beyond what we achieved with "expert" tool implementations—whether those tools were manually written by our researchers or generated by Claude itself.In the next section, we’ll share some of what we learned from this process.Principles for writing effective toolsIn this section, we distill our learnings into a few guiding principles for writing effective tools.Choosing the right tools for agentsMore tools don’t always lead to better outcomes. A common error we’ve obs
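
With `response_format="content_and_artifact"`, the tool returns a two-element tuple: the first element is the text the model reads (shown in the output above), and the second is kept as an artifact on the tool message. The sketch below reproduces that split outside LangChain. The `Doc` dataclass is a hypothetical stand-in for a LangChain `Document`, and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize_for_model(docs: list[Doc]) -> tuple[str, list[Doc]]:
    """Return (text shown to the model, raw documents kept as the artifact)."""
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )
    return serialized, docs


# Invented sample data for illustration only.
docs = [
    Doc("More tools don't always lead to better outcomes.", {"source": "post-a"}),
    Doc("Start with simple prompts.", {"source": "post-b"}),
]
content, artifact = serialize_for_model(docs)
print(content)
```

Keeping the raw documents as an artifact means application code can still read metadata, for example to render citations, without parsing the serialized string.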

### Multi-Step Retrieval

The agent can issue more than one search when a question calls for it. Whether it does depends on the model and the query:


In [42]:
query = (
    "How to build effective agents?"
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()



How to build effective agents?
Tool Calls:
  retrieve_context (call_EplnVMtI49JZUTB7k9rCpVNW)
 Call ID: call_EplnVMtI49JZUTB7k9rCpVNW
  Args:
    query: build effective agents
Name: retrieve_context

Source: {'title': 'Building Effective AI Agents \\ Anthropic', 'language': 'en', 'description': 'Discover how Anthropic approaches the development of reliable AI agents. Learn about our research on agent capabilities, safety considerations, and technical framework for building trustworthy AI.', 'source': 'https://www.anthropic.com/engineering/building-effective-agents'}
Content: can help you get started quickly, but don't hesitate to reduce abstraction layers and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful but also reliable, maintainable, and trusted by their users.AcknowledgementsWritten by Erik Schluntz and Barry Zhang. This work draws upon our experiences building agents at Anthropic and the valu

**Benefits of RAG Agents:**
- ✅ Search only when needed
- ✅ Contextual search queries
- ✅ Multiple searches allowed

**Trade-offs:**
- ⚠️ At least two inference calls when a search is performed (one to generate the tool call, one to produce the response)
- ⚠️ Reduced control (LLM may skip or add unnecessary searches)
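
The call-count trade-off can be illustrated with a framework-free agent loop. This is a rough sketch, not how `create_agent` is implemented: `stub_model`, `fake_search`, and the simplified message dictionaries are all hypothetical, and no real LLM is called.

```python
from typing import Callable

# Simplified message shape: {"role", "content"} or {"role", "tool_call"}.
Message = dict


def stub_model(messages: list[Message]) -> Message:
    """Hypothetical stand-in for an LLM; no real model is called."""
    user_query = messages[0]["content"]
    if user_query.lower().startswith("hello"):
        # Simple query: answer directly and skip the search.
        return {"role": "assistant", "content": "Hi there!"}
    if not any(m["role"] == "tool" for m in messages):
        # No retrieved context yet: request a search.
        return {"role": "assistant", "tool_call": {"query": user_query}}
    return {"role": "assistant", "content": f"Answer based on: {messages[-1]['content']}"}


def fake_search(query: str) -> str:
    """Hypothetical retrieval tool returning a placeholder string."""
    return f"[context for '{query}']"


def run_agent(query: str, model: Callable, search: Callable, max_steps: int = 5) -> tuple[str, int]:
    """Loop until the model replies without a tool call.

    Returns (final answer, number of model calls).
    """
    messages: list[Message] = [{"role": "user", "content": query}]
    for model_calls in range(1, max_steps + 1):
        reply = model(messages)
        messages.append(reply)
        if "tool_call" not in reply:
            return reply["content"], model_calls
        messages.append({"role": "tool", "content": search(reply["tool_call"]["query"])})
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("What makes a good tool?", stub_model, fake_search))
print(run_agent("hello", stub_model, fake_search))
```

In this sketch the first query takes two model calls (tool request, then answer), while the greeting takes one because the stub skips the search.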

---

## 2.2 RAG Chains

RAG chains use a two-step approach:
1. Always run a search (using the user query)
2. Incorporate results as context in a single LLM call

This results in **one inference call per query**, trading flexibility for speed.
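
Stripped of any framework, the two steps look roughly like the sketch below. `fake_search` and `fake_model` are hypothetical stand-ins that only count how often they are called; the prompt wording mirrors the middleware example in this section.

```python
# Call counters so the single-inference property is visible.
calls = {"search": 0, "model": 0}


def fake_search(query: str) -> list[str]:
    """Hypothetical retriever returning canned chunks."""
    calls["search"] += 1
    return ["Chunk about tool design.", "Chunk about evaluation."]


def fake_model(system_prompt: str, query: str) -> str:
    """Hypothetical LLM stand-in; it only reports what it received."""
    calls["model"] += 1
    return f"answer to {query!r} with {system_prompt.count('Chunk')} context chunk(s)"


def build_system_prompt(chunks: list[str]) -> str:
    """Place the retrieved chunks after a fixed instruction."""
    return (
        "You are a helpful assistant. Use the following context in your response:"
        "\n\n" + "\n\n".join(chunks)
    )


def run_chain(query: str) -> str:
    chunks = fake_search(query)                  # step 1: always retrieve
    system_prompt = build_system_prompt(chunks)  # inject context into the prompt
    return fake_model(system_prompt, query)      # step 2: a single model call


answer = run_chain("How should I design tools?")
print(answer)
print(calls)
```

The search always runs on the raw user query, so there is no opportunity to rephrase it or to skip retrieval for trivial questions.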

### Using Dynamic Prompts Middleware

We can implement a RAG chain using middleware to inject context:


In [20]:
from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
def prompt_with_context(request: ModelRequest) -> str:
    """Build a system prompt that includes context retrieved for the latest message."""
    last_query = request.state["messages"][-1].text
    retrieved_docs = vector_store.similarity_search(last_query)

    docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

    system_message = (
        "You are a helpful assistant. Use the following context in your response:"
        f"\n\n{docs_content}"
    )

    return system_message

# Create agent with middleware (no tools needed)
rag_chain = create_agent(model, tools=[], middleware=[prompt_with_context])


In [21]:
query = "What is task decomposition?"

for step in rag_chain.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()



What is task decomposition?

Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks or components. This approach allows for easier handling, understanding, and execution of the overall task. By dividing a task into simpler parts, each subtask can be tackled individually, making it less overwhelming and more organized. 

In the context of agents or artificial intelligence, task decomposition can help improve the accuracy and efficiency of executing tasks by allowing agents to focus on one subtask at a time, potentially reducing errors and improving outcomes. Each subtask can also be handled by different tools or processes, enhancing flexibility and operability. The idea is to create a structured approach that leads to clearer paths for problem-solving and decision-making in complex scenarios.


### Returning Source Documents

If you need access to the source documents (e.g., for citations), you can use middleware to store them in the state:


In [22]:
from typing import Any
from langchain_core.documents import Document
from langchain.agents.middleware import AgentMiddleware, AgentState


class State(AgentState):
    context: list[Document]


class RetrieveDocumentsMiddleware(AgentMiddleware[State]):
    state_schema = State

    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        last_message = state["messages"][-1]
        retrieved_docs = vector_store.similarity_search(last_message.text)

        docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

        augmented_message_content = (
            f"{last_message.text}\n\n"
            "Use the following context to answer the query:\n"
            f"{docs_content}"
        )
        return {
            "messages": [last_message.model_copy(update={"content": augmented_message_content})],
            "context": retrieved_docs,
        }


rag_chain_with_sources = create_agent(
    model,
    tools=[],
    middleware=[RetrieveDocumentsMiddleware()],
)


In [23]:
query = "What is task decomposition?"

result = rag_chain_with_sources.invoke(
    {"messages": [{"role": "user", "content": query}]}
)

# Access the final response
print("Response:")
print(result["messages"][-1].content)

# Access the source documents
print(f"\n\nRetrieved {len(result.get('context', []))} source document(s)")
if result.get("context"):
    print(f"\nFirst source metadata: {result['context'][0].metadata}")


Response:
Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This approach helps streamline workflows by assigning specific, well-defined tasks to agents or tools, reducing the likelihood of errors and improving efficiency. In the context provided, task decomposition serves several purposes:

1. **Reducing Complexity**: By segmenting a larger task into smaller components, agents can focus on individual parts, which helps mitigate the risk of making mistakes. This is particularly important because agents may miscall tools or parameters when faced with an extensive, intricate task.

2. **Improving Tool Utilization**: When tasks are well-defined and correspond to specific tools that match these subdivisions, agents can better select and utilize the appropriate tools, thereby enhancing the effectiveness of tool calls. This means that agents don’t have to manage an overwhelming number of tool descriptions and can instead leverage releva
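
Once the retrieved documents are available under `context`, turning them into citations is mostly a matter of reading their metadata. The sketch below builds a numbered, de-duplicated source list. `Doc` is a hypothetical stand-in for a LangChain `Document`, `format_citations` is an illustrative helper, and the `source` key is assumed to be present as in the metadata printed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_citations(context: list[Doc]) -> str:
    """Build a numbered list of unique sources, in first-seen order."""
    seen: list[str] = []
    for doc in context:
        source = doc.metadata.get("source", "unknown source")
        if source not in seen:
            seen.append(source)
    return "\n".join(f"[{i}] {source}" for i, source in enumerate(seen, start=1))


# Three chunks from two pages: the duplicate source is listed once.
context = [
    Doc("chunk one", {"source": "https://www.anthropic.com/engineering/writing-tools-for-agents"}),
    Doc("chunk two", {"source": "https://www.anthropic.com/engineering/building-effective-agents"}),
    Doc("chunk three", {"source": "https://www.anthropic.com/engineering/writing-tools-for-agents"}),
]
citations = format_citations(context)
print(citations)
```

Several chunks often come from the same page, so de-duplicating by source keeps the citation list short.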

---

## Summary

**RAG Agents** are best for:
- Complex queries requiring multiple searches
- When you want the LLM to decide when to search
- General-purpose applications

**RAG Chains** are best for:
- Simple queries with predictable search needs
- When speed is critical (single LLM call)
- Constrained settings where you always want to search

**Next Steps:**
- Add conversational memory for multi-turn interactions
- Stream tokens for responsive user experiences
- Add structured responses
- Deploy with LangSmith Deployment
- Explore advanced RAG patterns with LangGraph
