# Tutorial 12: Agentic RAG

Build a RAG system in which an **agent** controls retrieval: it decomposes queries, performs multiple retrievals, and iteratively refines its answer.

**What you'll learn:**
- **Agent-Controlled Retrieval**: LLM decides when/what to retrieve
- **Query Decomposition**: Break complex questions into sub-queries
- **Multi-Step Retrieval**: Multiple retrieval rounds
- **Tool-Based RAG**: Retrieval as a tool the agent can call

## Why Agentic RAG?

The patterns in earlier tutorials follow a fixed retrieval flow. **Agentic RAG** lets the LLM control:
- Whether to retrieve at all
- What queries to use
- When to stop retrieving

```
Agent Loop:
  Think → Decide (retrieve/answer) → Act → Observe → Repeat
```
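
The loop above can be sketched in plain Python before any framework is involved. This is only an illustration: `run_agent_loop`, `toy_decide`, and `toy_retrieve` are hypothetical names, and the two toy functions stand in for the LLM and the retriever used later in this tutorial.

```python
from typing import Callable


def run_agent_loop(
    question: str,
    decide: Callable[[str, list[str]], tuple[str, str]],
    retrieve: Callable[[str], str],
    max_steps: int = 5,
) -> tuple[str, list[str]]:
    """Run a think/decide/act/observe loop until the policy answers."""
    observations: list[str] = []
    for _ in range(max_steps):
        # Think + decide: the policy returns ("retrieve", query) or ("answer", text)
        action, payload = decide(question, observations)
        if action == "answer":
            return payload, observations
        # Act + observe: run the retrieval and record what came back
        observations.append(retrieve(payload))
    # Step budget exhausted: return a fallback instead of looping forever
    return "No answer within step budget.", observations


# Hypothetical stand-ins for the LLM and the retriever
def toy_decide(question: str, observations: list[str]) -> tuple[str, str]:
    if len(observations) < 2:
        return "retrieve", f"{question} (aspect {len(observations) + 1})"
    return "answer", f"Answer based on {len(observations)} retrievals"


def toy_retrieve(query: str) -> str:
    return f"results for: {query}"


answer, seen = run_agent_loop("What is CRAG?", toy_decide, toy_retrieve)
print(answer)
print(seen)
```

The rest of the tutorial replaces `toy_decide` with an LLM that emits tool calls and expresses the loop as a graph with a conditional edge.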

In [None]:
from langgraph_ollama_local import LocalAgentConfig
from langchain_ollama import ChatOllama

config = LocalAgentConfig()
llm = ChatOllama(
    model=config.ollama.model,
    base_url=config.ollama.base_url,
    temperature=0,
)
print(f"Using model: {config.ollama.model}")

In [None]:
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_core.documents import Document
from langchain_core.tools import tool
from langgraph.graph.message import add_messages

class AgenticRAGState(TypedDict):
    """State for Agentic RAG."""
    messages: Annotated[list, add_messages]  # Conversation history
    documents: List[Document]                 # Retrieved documents (declared here; no node in this tutorial writes to it)
    retrieval_count: int                      # Number of retrievals done

print("State defined!")
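
The `Annotated[list, add_messages]` annotation attaches a reducer to `messages`: what a node returns for that key is merged into the existing list instead of overwriting it. Keys without a reducer, such as `retrieval_count`, are simply replaced by the latest value. The sketch below is a simplified stand-in for that merge behaviour using plain dicts. `merge_messages` is a hypothetical helper, not the library function, and the real reducer works on message objects and does more than this.

```python
def merge_messages(existing: list[dict], update: list[dict]) -> list[dict]:
    """Simplified stand-in for a message reducer: append, or replace by id."""
    merged = list(existing)
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in update:
        if msg["id"] in index_by_id:
            # Same id: treat the update as a replacement of the old message
            merged[index_by_id[msg["id"]]] = msg
        else:
            # New id: append to the end of the history
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)
    return merged


history = [{"id": "1", "role": "human", "content": "Compare Self-RAG and CRAG."}]
history = merge_messages(history, [{"id": "2", "role": "ai", "content": "Searching..."}])
history = merge_messages(history, [{"id": "2", "role": "ai", "content": "Searching for CRAG."}])
print([m["content"] for m in history])
```

This is why the nodes later in the tutorial return only the new messages: the reducer takes care of appending them to the conversation.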

In [None]:
from langgraph_ollama_local.rag import LocalRetriever

retriever = LocalRetriever()

# Create retrieval tool
@tool
def search_documents(query: str) -> str:
    """Search the document database for information.
    
    Args:
        query: The search query to find relevant documents.
    
    Returns:
        Retrieved document contents.
    """
    docs = retriever.retrieve_documents(query, k=3)
    if not docs:
        return "No relevant documents found."
    
    results = []
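    for i, doc in enumerate(docs, 1):
        source = doc.metadata.get('filename', 'Unknown')
        text = doc.page_content
        # Truncate long documents; add an ellipsis only when text was actually cut
        snippet = (text[:500] + "...") if len(text) > 500 else text
        results.append(f"[Doc {i} - {source}]:\n{snippet}")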
    
    return "\n\n".join(results)

tools = [search_documents]
print(f"Tools: {[t.name for t in tools]}")

In [None]:
from langchain_core.messages import SystemMessage

SYSTEM_PROMPT = """You are a research assistant with access to a document database.

Your goal is to answer questions thoroughly using the search_documents tool.

Strategy:
1. For complex questions, break them into sub-questions
2. Search for each aspect separately
3. Synthesize information from multiple searches
4. Provide a comprehensive answer with sources

You can search multiple times if needed. When you have enough information, provide your final answer."""

# Bind tools to LLM
llm_with_tools = llm.bind_tools(tools)

print("LLM with tools configured!")

In [None]:
from langchain_core.messages import ToolMessage

def agent(state: AgenticRAGState) -> dict:
    """Agent node - decides whether to search or answer."""
    print("--- AGENT THINKING ---")
    
    messages = state["messages"]
    
    # Prepend the system prompt for this call; it is not written back to state
    if not any(isinstance(m, SystemMessage) for m in messages):
        messages = [SystemMessage(content=SYSTEM_PROMPT)] + list(messages)
    
    response = llm_with_tools.invoke(messages)
    
    return {"messages": [response]}

def execute_tools(state: AgenticRAGState) -> dict:
    """Execute any tool calls."""
    print("--- EXECUTING TOOLS ---")
    
    messages = state["messages"]
    last_message = messages[-1]
    
    tool_results = []
    retrieval_count = state.get("retrieval_count", 0)
    
    for tool_call in last_message.tool_calls:
        tool_name = tool_call["name"]
        tool_args = tool_call["args"]
        
        print(f"  Calling {tool_name} with: {tool_args}")
        
        if tool_name == "search_documents":
            result = search_documents.invoke(tool_args)
            retrieval_count += 1
        else:
            result = f"Unknown tool: {tool_name}"
        
        tool_results.append(
            ToolMessage(content=result, tool_call_id=tool_call["id"])
        )
    
    return {
        "messages": tool_results,
        "retrieval_count": retrieval_count,
    }

print("Agent nodes defined!")
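
Each tool call handled by `execute_tools` is a dict with `name`, `args`, and `id` keys. With more than one tool, the `if/else` chain can be replaced by a name-to-function lookup. The sketch below shows that dispatch pattern with plain dicts and a stub tool. `TOOL_REGISTRY`, `run_tool_calls`, and `search_documents_stub` are hypothetical names, and the error handling is one possible choice, not something the tutorial code does.

```python
from typing import Callable


def search_documents_stub(query: str) -> str:
    """Hypothetical stand-in for the real search tool."""
    return f"results for: {query}"


# Map tool names to callables so new tools need only a new entry
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "search_documents": search_documents_stub,
}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each tool call and pair the result with its call id."""
    results = []
    for call in tool_calls:
        fn = TOOL_REGISTRY.get(call["name"])
        if fn is None:
            content = f"Unknown tool: {call['name']}"
        else:
            try:
                content = fn(**call["args"])
            except Exception as exc:
                # Report the failure back to the agent instead of crashing
                content = f"Tool error: {exc}"
        results.append({"tool_call_id": call["id"], "content": content})
    return results


calls = [
    {"name": "search_documents", "args": {"query": "Self-RAG"}, "id": "call_1"},
    {"name": "web_search", "args": {"query": "CRAG"}, "id": "call_2"},
]
outputs = run_tool_calls(calls)
for out in outputs:
    print(out)
```

Every call gets a result keyed by its `id`, including unknown tools, so the model can see which request failed and adjust.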

In [None]:
def should_continue(state: AgenticRAGState) -> str:
    """Decide whether to continue or end."""
    messages = state["messages"]
    last_message = messages[-1]
    
    # If there are tool calls, execute them
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    
    # Otherwise, we're done
    return "end"

print("Routing defined!")
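
As written, the router stops only when the model stops requesting tools, so a model that keeps searching would run until the graph's recursion limit. One possible safeguard is to use `retrieval_count` as a budget. The sketch below assumes a hypothetical `MAX_RETRIEVALS` constant and uses `SimpleNamespace` objects as stand-ins for real messages.

```python
from types import SimpleNamespace

MAX_RETRIEVALS = 3  # hypothetical budget; tune for your workload


def should_continue_capped(state: dict) -> str:
    """Route to tools only while the retrieval budget is not exhausted."""
    last_message = state["messages"][-1]
    wants_tools = bool(getattr(last_message, "tool_calls", None))
    if wants_tools and state.get("retrieval_count", 0) < MAX_RETRIEVALS:
        return "tools"
    return "end"


# Stand-ins for an AI message with and without pending tool calls
pending = SimpleNamespace(
    tool_calls=[{"name": "search_documents", "args": {"query": "CRAG"}, "id": "call_1"}]
)
finished = SimpleNamespace(tool_calls=[])

print(should_continue_capped({"messages": [pending], "retrieval_count": 0}))
print(should_continue_capped({"messages": [pending], "retrieval_count": 3}))
print(should_continue_capped({"messages": [finished], "retrieval_count": 1}))
```

Note that when the cap triggers, the last message still contains unanswered tool calls and may have no text, so a fuller version would likely route to a node that asks the model for a final answer without tools.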

In [None]:
from langgraph.graph import StateGraph, START, END

# Build graph
graph = StateGraph(AgenticRAGState)

graph.add_node("agent", agent)
graph.add_node("tools", execute_tools)

graph.add_edge(START, "agent")
graph.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "agent")

agentic_rag = graph.compile()
print("Agentic RAG compiled!")

In [None]:
# Visualize
from IPython.display import Image, display
try:
    display(Image(agentic_rag.get_graph().draw_mermaid_png()))
except Exception:  # PNG rendering can fail; fall back to ASCII output
    print(agentic_rag.get_graph().draw_ascii())

In [None]:
# Test with a complex question
from langchain_core.messages import HumanMessage

question = "Compare Self-RAG and CRAG. What are the key differences and when should I use each?"

print(f"Question: {question}\n")
print("=" * 60)

result = agentic_rag.invoke({
    "messages": [HumanMessage(content=question)],
    "retrieval_count": 0,
})

print("=" * 60)
print(f"\nRetrievals performed: {result['retrieval_count']}")
print("\nFinal Answer:")
print(result["messages"][-1].content)
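
To check whether the agent actually decomposed the question, it can help to list the queries it issued. The sketch below walks a message list and collects the `query` argument of every `search_documents` call. `extract_search_queries` is a hypothetical helper, and the `trace` list uses stand-in objects with invented queries purely for illustration. With a real run you would pass `result["messages"]` instead.

```python
from types import SimpleNamespace


def extract_search_queries(messages: list) -> list[str]:
    """Collect the query of every search_documents call, in order."""
    queries = []
    for msg in messages:
        # Only AI messages carry tool calls; others are skipped
        for call in getattr(msg, "tool_calls", None) or []:
            if call["name"] == "search_documents":
                queries.append(call["args"].get("query", ""))
    return queries


# Stand-in trace: human question, AI tool request, tool result, final answer
trace = [
    SimpleNamespace(content="Compare Self-RAG and CRAG."),
    SimpleNamespace(
        content="",
        tool_calls=[
            {"name": "search_documents", "args": {"query": "Self-RAG"}, "id": "call_1"},
            {"name": "search_documents", "args": {"query": "CRAG"}, "id": "call_2"},
        ],
    ),
    SimpleNamespace(content="[Doc 1 - ...]", tool_call_id="call_1"),
    SimpleNamespace(content="Final answer text", tool_calls=[]),
]

print(extract_search_queries(trace))
```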

## Key Concepts

| Component | Purpose |
|-----------|--------|
| **Tool-based Retrieval** | Agent decides when to search |
| **ReAct Loop** | Think → Act → Observe cycle |
| **Multi-step** | Multiple retrievals for complex queries |
| **Query Decomposition** | Agent breaks down questions |

## What's Next?

In [Tutorial 13: Perplexity Clone](13_perplexity_clone.ipynb), you'll build a full research assistant with:
- In-text citations
- Source metadata
- Streaming responses
- Follow-up suggestions