
# 🛰️ 08 - Agentic RAG (Agents + Retrieval-Augmented Generation)

This notebook focuses **specifically** on combining:

- **Agents** (from this Agents Universe)  
- **RAG** (from your RAG Universe)  

into **Agentic RAG** patterns.

The goal is to:

- Go beyond *“retrieve then answer”*  
- Let an **agent plan, call retrieval tools, reflect, and iterate**  
- Make retrieval **adaptive and intelligent**, not static  



## 1. What Is Agentic RAG?

Classic RAG (from your RAG Universe):

1. Embed documents  
2. Retrieve top-k chunks for a query  
3. Stuff chunks into context  
4. Ask the LLM to answer  

Agentic RAG:

1. An **agent** reasons about the user’s goal  
2. It **plans** what to retrieve (and where from)  
3. It uses **retrieval tools** (one or more times)  
4. It **reflects** on whether the retrieved context is enough  
5. It may:
   - refine the query  
   - call other tools  
   - do multi-hop retrieval  
6. It synthesizes a final answer with **citations and explanations**  
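To make the contrast concrete, here is a minimal, self-contained sketch. It assumes a toy three-document corpus and a word-overlap retriever (both hypothetical, standing in for a real vector search). `classic_rag` retrieves once and stops. `agentic_rag` checks which question terms the retrieved documents still fail to cover and, if any remain, re-queries for exactly those terms, up to a fixed number of rounds. The term-coverage check is a deliberately crude stand-in for the LLM's own "is this context enough?" judgment.

```python
import re
from typing import Dict, List

# Hypothetical toy corpus standing in for an indexed knowledge base.
CORPUS: Dict[str, str] = {
    "doc1": "Retrieval-augmented generation grounds answers in retrieved text.",
    "doc2": "An agent can plan, call tools, and reflect on intermediate results.",
    "doc3": "Agentic RAG lets an agent decide when and how to retrieve.",
}
STOPWORDS = {"a", "an", "the", "how", "does", "on", "is", "what", "to", "and", "in", "of", "when", "can"}


def terms(text: str) -> List[str]:
    """Lowercase content words, in order, without stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def retrieve(query: str, top_k: int = 1) -> List[str]:
    """Rank documents by word overlap (stand-in for vector search); ties go to the lower id."""
    q = set(terms(query))
    scored = [(len(q & set(terms(text))), doc_id) for doc_id, text in CORPUS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def missing_terms(question: str, doc_ids: List[str]) -> List[str]:
    """Question terms that none of the retrieved documents mention."""
    covered = set()
    for doc_id in doc_ids:
        covered |= set(terms(CORPUS[doc_id]))
    return [t for t in terms(question) if t not in covered]


def classic_rag(question: str, top_k: int = 1) -> List[str]:
    """Retrieve once; whatever comes back is the context."""
    return retrieve(question, top_k)


def agentic_rag(question: str, top_k: int = 1, max_rounds: int = 3) -> List[str]:
    """Retrieve, check coverage, refine the query toward the gaps, repeat."""
    query, collected = question, []
    for _ in range(max_rounds):
        for doc_id in retrieve(query, top_k):
            if doc_id not in collected:
                collected.append(doc_id)
        gaps = missing_terms(question, collected)
        if not gaps:  # context judged sufficient
            break
        query = " ".join(gaps)  # refined query targets only what is still missing
    return collected
```

With the question `"How does an agent reflect on retrieved text?"`, `classic_rag` stops at `["doc1"]`, while `agentic_rag` notices that *agent* and *reflect* are still uncovered and pulls in `doc2` on a second round, returning `["doc1", "doc2"]`.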



## 2. Roles in an Agentic RAG System

You can define several **agent roles**:

- **User-Facing Answer Agent**
  - Talks to the user  
  - Orchestrates everything  
  - Produces final answer  

- **Retrieval Planner Agent**
  - Decides:
    - which index to query
    - which filters to apply
    - how many docs to retrieve  

- **Query Rewriter Agent**
  - Takes user query
  - Generates better retrieval queries
  - Can propose:
    - expanded queries
    - synonyms, related phrases  

- **Context Selector Agent**
  - Given many candidate chunks, picks the best ones  
  - Optionally:
    - diversifies sources
    - removes duplicates  

These can be:

- separate agents (multi-agent)  
- or behaviors inside a **single** configurable agent  
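As an illustration of the "single configurable agent" option, each role can start life as a plain function that a larger agent calls. The sketch below is rule-based and hypothetical throughout: the `SYNONYMS` table, the index names `code_docs` / `general_docs`, the year filter, and the chunk fields are assumptions, and in practice the rewriter and planner would often be LLM calls rather than rules.

```python
import re
from typing import Any, Dict, List

# Hypothetical synonym table used by the Query Rewriter.
SYNONYMS: Dict[str, List[str]] = {
    "rag": ["retrieval-augmented generation"],
    "llm": ["large language model"],
}


def rewrite_query(query: str) -> List[str]:
    """Query Rewriter role: the original query plus synonym-expanded variants."""
    variants = [query]
    lowered = query.lower()
    for term, alternatives in SYNONYMS.items():
        pattern = rf"\b{re.escape(term)}\b"  # whole-word match only
        if re.search(pattern, lowered):
            variants.extend(re.sub(pattern, alt, lowered) for alt in alternatives)
    return variants


def plan_retrieval(query: str) -> Dict[str, Any]:
    """Retrieval Planner role: choose index, filters and top_k (illustrative rules)."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    index = "code_docs" if words & {"api", "function", "class"} else "general_docs"
    top_k = 8 if words & {"compare", "versus", "vs"} else 4  # comparisons need more evidence
    filters: Dict[str, Any] = {}
    year = re.search(r"\b(?:19|20)\d{2}\b", query)
    if year:
        filters["year"] = int(year.group(0))
    return {"index": index, "filters": filters, "top_k": top_k}


def select_context(chunks: List[Dict[str, Any]], max_chunks: int = 3,
                   max_per_doc: int = 1) -> List[Dict[str, Any]]:
    """Context Selector role: best scores first, drop duplicate snippets,
    and cap chunks per source document to diversify sources."""
    chosen: List[Dict[str, Any]] = []
    seen_snippets, per_doc = set(), {}
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        key = chunk["snippet"].strip().lower()
        if key in seen_snippets or per_doc.get(chunk["doc_id"], 0) >= max_per_doc:
            continue
        chosen.append(chunk)
        seen_snippets.add(key)
        per_doc[chunk["doc_id"]] = per_doc.get(chunk["doc_id"], 0) + 1
        if len(chosen) == max_chunks:
            break
    return chosen
```

Keeping the roles as separate functions makes it easy to later promote any one of them into its own agent without changing the others.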



## 3. RAG Tools Expected from the RAG Universe

From your **RAG Universe**, we assume tools like:

- `search_knowledge_base(query, top_k, filters)`  
- `get_document_by_id(id)`  
- `get_chunks_by_ids(ids)`  
- Optional:
  - `search_by_semantic(query, top_k)`
  - `search_by_keyword(query, top_k)`
  - `hybrid_search(query, top_k)`

In this notebook we will create **placeholder functions** and show how an agent would use them.

You can later wire these to your **real RAG pipelines**.
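The optional `hybrid_search(query, top_k)` can be built from the other two searches. One common way to merge a dense and a sparse ranking is reciprocal rank fusion (RRF), where each list adds `1 / (k + rank)` to an item's score. The sketch below assumes RRF with the frequently used `k = 60`; your RAG Universe may use a different fusion rule (for example weighted score blending). `fake_semantic` and `fake_keyword` are hypothetical stand-ins for `search_by_semantic` / `search_by_keyword` that return chunk ids, best first.

```python
from typing import Callable, Dict, List

Searcher = Callable[[str, int], List[str]]  # (query, top_k) -> chunk ids, best first


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists: each list contributes 1 / (k + rank) to an id's score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_search(query: str, top_k: int, semantic: Searcher, keyword: Searcher) -> List[str]:
    """Over-fetch from both searchers, fuse the rankings, keep the top_k."""
    fused = reciprocal_rank_fusion([semantic(query, top_k * 2), keyword(query, top_k * 2)])
    return fused[:top_k]


# Hypothetical stand-ins for search_by_semantic / search_by_keyword.
def fake_semantic(query: str, top_k: int) -> List[str]:
    return ["c1", "c2", "c3"][:top_k]


def fake_keyword(query: str, top_k: int) -> List[str]:
    return ["c3", "c1", "c4"][:top_k]
```

`hybrid_search("agentic rag", 3, fake_semantic, fake_keyword)` returns `["c1", "c3", "c2"]`: `c1` and `c3` appear in both lists, so they outrank items found by only one searcher.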



## 4. Skeleton: RAG Tool Adapters in Python

Below we sketch Python functions that wrap RAG retrieval as **tools** an agent can call.


```python
from typing import Dict, Any, List

# These are STUBS / PLACEHOLDERS.
# Replace them with real calls to your RAG Universe (vector DB, DB, etc.)

def rag_search_tool(input_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Search the knowledge base for relevant chunks.

    Expected input:
      {
        "query": "string",
        "top_k": 5,
        "filters": { ...optional... }
      }

    Output (example shape):
      {
        "results": [
          {"id": "doc1#chunk3", "doc_id": "doc1", "score": 0.87, "snippet": "..."},
          ...
        ]
      }
    """
    query = input_dict.get("query", "")
    top_k = int(input_dict.get("top_k", 5))
    filters = input_dict.get("filters", {})  # accepted but ignored by this stub

    # TODO: connect to your real vector DB / RAG pipeline.
    # For now, we just return a mock.
    return {
        "results": [
            {
                "id": "doc1#chunk1",
                "doc_id": "doc1",
                "score": 0.9,
                "snippet": f"Mock snippet for query='{query}' (top_k={top_k})",
            }
        ]
    }


def rag_get_document_tool(input_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Fetch a full document by ID (stub).

    Input:
      {"doc_id": "doc1"}

    Output:
      {"doc_id": "doc1", "title": "...", "content": "..."}
    """
    doc_id = input_dict.get("doc_id", "unknown")
    # TODO: real DB / file lookup.
    return {
        "doc_id": doc_id,
        "title": f"Mock document {doc_id}",
        "content": "Full document content would go here in a real system."
    }
```


> 🔁 In your **real implementation**, you’d import your RAG library / vector DB client and implement the body of these functions using:
>
> - pgvector / Postgres  
> - Chroma / FAISS / Pinecone  
> - LangChain retrievers  
> - LlamaIndex query engines  
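Before wiring a real client, it can help to see a stub body replaced by something that actually ranks chunks while keeping the same input/output contract. The sketch below is an assumption-laden stand-in: `CHUNKS` is hypothetical in-memory data, and bag-of-words cosine similarity replaces dense embeddings. Filters are treated as exact metadata matches, which is roughly how many vector stores interpret simple equality filters, but check your store's own filter syntax.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List

# Hypothetical in-memory chunk store; a real system would keep embeddings in a vector DB.
CHUNKS: List[Dict[str, Any]] = [
    {"id": "doc1#chunk1", "doc_id": "doc1", "meta": {"lang": "en"},
     "text": "Agentic RAG lets an agent plan and repeat retrieval."},
    {"id": "doc1#chunk2", "doc_id": "doc1", "meta": {"lang": "en"},
     "text": "The agent reflects on whether the context is enough."},
    {"id": "doc2#chunk1", "doc_id": "doc2", "meta": {"lang": "en"},
     "text": "Classic RAG retrieves top-k chunks once and answers."},
]


def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rag_search_tool(input_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Same contract as the stub: {"query", "top_k", "filters"} -> {"results": [...]}."""
    query = str(input_dict.get("query", ""))
    top_k = int(input_dict.get("top_k", 5))
    filters = input_dict.get("filters") or {}
    q_vec = _vector(query)
    hits = []
    for chunk in CHUNKS:
        # Metadata filter: every filter key must match exactly.
        if any(chunk["meta"].get(key) != value for key, value in filters.items()):
            continue
        score = _cosine(q_vec, _vector(chunk["text"]))
        if score > 0:
            hits.append({"id": chunk["id"], "doc_id": chunk["doc_id"],
                         "score": round(score, 3), "snippet": chunk["text"][:200]})
    hits.sort(key=lambda h: h["score"], reverse=True)
    return {"results": hits[:top_k]}
```

Because the function name and output shape match the stub, the tool registration below does not need to change when you swap this in.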



## 5. Integrating RAG Tools with the Agent Tool System

We now assume you are re-using the **Tool** / `ToolRegistry` abstractions from the Python hands-on notebook (04).

Here is how you would register RAG tools as agent tools:


```python
# If running in same environment as 04, Tool & ToolRegistry are already defined.
# Otherwise, you would re-import or redefine them.

try:
    registry  # type: ignore[name-defined]
except NameError:
    from dataclasses import dataclass
    from typing import Callable, Dict, Any, List

    @dataclass
    class Tool:
        name: str
        description: str
        func: Callable[[Dict[str, Any]], Dict[str, Any]]

        def __call__(self, tool_input: Dict[str, Any]) -> Dict[str, Any]:
            return self.func(tool_input)

    class ToolRegistry:
        def __init__(self):
            self._tools: Dict[str, Tool] = {}

        def register(self, tool: Tool):
            self._tools[tool.name] = tool

        def get(self, name: str):
            return self._tools.get(name)

        def list_tools(self) -> List[Tool]:
            return list(self._tools.values())

    registry = ToolRegistry()

# Register RAG tools alongside other tools
registry.register(Tool(
    name="rag_search",
    description="Search the knowledge base (RAG) for relevant text chunks.",
    func=rag_search_tool,
))

registry.register(Tool(
    name="rag_get_document",
    description="Get a full document by its ID.",
    func=rag_get_document_tool,
))
```
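The system prompt in section 6.1 lists the tools by hand, which can drift out of sync with what is actually registered. A small helper can render that part of the prompt from the registry instead. `render_tool_section` and `ToolSpec` are hypothetical names; `ToolSpec` only mimics the `name` / `description` fields of `Tool` so the sketch runs on its own, and in the notebook you would pass `registry.list_tools()`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToolSpec:
    """Hypothetical stand-in exposing the same name/description fields as Tool."""
    name: str
    description: str


def render_tool_section(tools: Iterable) -> str:
    """Build the 'Tools:' block of a system prompt from registered tools (sorted by name)."""
    lines = ["Tools:"]
    for tool in sorted(tools, key=lambda t: t.name):
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


print(render_tool_section([
    ToolSpec("rag_search", "Search the knowledge base (RAG) for relevant text chunks."),
    ToolSpec("rag_get_document", "Get a full document by its ID."),
]))
```

Sorting by name keeps the prompt text stable across runs, which makes prompt diffs and caching easier.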



## 6. Agentic RAG Loop: High-Level Flow

We now design a **ReAct-style Agentic RAG loop**:

1. User asks a question  
2. Agent:
   - decides if it needs retrieval  
   - possibly rewrites the query for retrieval  
   - calls `rag_search` one or more times  
   - inspects results  
   - decides if more retrieval is needed  
3. Agent synthesizes final answer, with:
   - references (doc IDs, titles, etc.)  
   - explanation of reasoning (optional)  
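For step 3, the references can be assembled deterministically from the observations the agent collected, rather than trusting the model to remember them. The sketch assumes each `rag_search` observation has the `{"results": [{"doc_id": ...}, ...]}` shape used by the stub above; `collect_sources`, `format_answer`, and the `titles` lookup are hypothetical helpers.

```python
from typing import Any, Dict, List


def collect_sources(observations: List[Dict[str, Any]]) -> List[str]:
    """Unique doc_ids, in first-seen order, across all rag_search observations."""
    seen: List[str] = []
    for obs in observations:
        for hit in obs.get("results", []):
            if hit["doc_id"] not in seen:
                seen.append(hit["doc_id"])
    return seen


def format_answer(answer: str, observations: List[Dict[str, Any]],
                  titles: Dict[str, str]) -> str:
    """Append a numbered reference list built from what was actually retrieved."""
    sources = collect_sources(observations)
    if not sources:
        return answer + "\n\nSources: none retrieved"
    refs = [f"[{i}] {doc_id}: {titles.get(doc_id, 'untitled')}"
            for i, doc_id in enumerate(sources, start=1)]
    return answer + "\n\nSources:\n" + "\n".join(refs)
```

Listing only documents that appear in observations means every citation is traceable to an actual tool call.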



### 6.1 Prompting the Agent for Retrieval Decisions

We extend the **system prompt**:

- Tell the agent:
  - when to use `rag_search`
  - how many times
  - how to decide if context is enough  
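A prompt can only ask the model to limit its searches; a wrapper can enforce it. The sketch below wraps any tool function with a per-request call budget and returns an error observation once it is exhausted, so the model is nudged to answer with what it has. `with_call_budget` is a hypothetical helper; create a fresh wrapper per user request so the counter resets.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def with_call_budget(func: ToolFn, max_calls: int = 3) -> ToolFn:
    """Allow at most max_calls invocations; later calls get an error observation."""
    calls = 0

    def wrapped(tool_input: Dict[str, Any]) -> Dict[str, Any]:
        nonlocal calls
        if calls >= max_calls:
            return {"error": f"retrieval budget of {max_calls} calls exhausted; "
                             "answer with the context you already have"}
        calls += 1
        return func(tool_input)

    return wrapped
```

In the notebook this could wrap `rag_search_tool` before registering it, for example `func=with_call_budget(rag_search_tool, max_calls=3)`.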


```python
AGENTIC_RAG_SYSTEM_PROMPT = """You are a careful, retrieval-aware assistant.

You can use tools to search a knowledge base and fetch documents.

Tools:
- rag_search(query, top_k, filters?) -> returns candidate chunks
- rag_get_document(doc_id) -> returns full document content

Guidelines:
- If the question requires factual or document-based knowledge, you MUST use rag_search at least once.
- If initial results seem insufficient, you may:
  - refine the query
  - search again
- Try not to call rag_search more than 3 times.
- When you have enough information, stop using tools and provide a final answer with brief citations.
- To finish, write ACTION: NONE and ACTION_INPUT: {}, and put the final answer in THOUGHT.

Use this format at each step:
THOUGHT: <your reasoning>
ACTION: <tool_name or NONE>
ACTION_INPUT: <JSON object>
"""
```


### 6.2 Using the Existing SimpleAgent with Agentic RAG Prompt

We now reuse the `SimpleAgent` from the Python hands-on notebook, but with:

- updated system prompt  
- RAG tools in the registry  


```python
# We assume SimpleAgent, parse_react_response, etc. are defined as in 04.
# If not in the current environment, you'd import them.

try:
    SimpleAgent  # type: ignore[name-defined]
except NameError:
    # Minimal fallback so this cell runs standalone; in real usage, import from 04.
    from dataclasses import dataclass, field
    from typing import List, Dict, Any, Optional, Tuple
    import json, re

    def llm_complete(system_prompt: str, messages: List[Dict[str, str]]) -> str:
        # Placeholder simulated ReAct output: search once, then finish as soon as
        # an OBSERVATION is in the history (so the demo does not hit max_steps).
        if any(m["content"].startswith("OBSERVATION:") for m in messages):
            return (
                "THOUGHT: (mock answer) The retrieved context is enough to answer.\n"
                "ACTION: NONE\n"
                "ACTION_INPUT: {}"
            )
        return (
            "THOUGHT: I will query the knowledge base.\n"
            "ACTION: rag_search\n"
            'ACTION_INPUT: {"query": "mock", "top_k": 1}'
        )

    def parse_react_response(text: str) -> Tuple[str, str, Dict[str, Any]]:
        thought_match = re.search(r"THOUGHT:(.*)", text)
        action_match = re.search(r"ACTION:(.*)", text)
        input_match = re.search(r"ACTION_INPUT:(.*)", text, re.DOTALL)
        thought = thought_match.group(1).strip() if thought_match else ""
        action = action_match.group(1).strip() if action_match else "NONE"
        raw_input = input_match.group(1).strip() if input_match else "{}"
        try:
            action_input = json.loads(raw_input)
        except json.JSONDecodeError:
            action_input = {"raw": raw_input, "parse_error": True}
        return thought, action, action_input

    @dataclass
    class AgentStep:
        user: Optional[str] = None
        thought: Optional[str] = None
        action: Optional[str] = None
        action_input: Optional[Dict[str, Any]] = None
        observation: Optional[Dict[str, Any]] = None

    @dataclass
    class SimpleAgent:
        tools: Any
        max_steps: int = 5
        system_prompt: str = AGENTIC_RAG_SYSTEM_PROMPT
        memory: List[AgentStep] = field(default_factory=list)

        def _messages(self) -> List[Dict[str, str]]:
            """Replay memory as chat messages, including each step's tool input."""
            messages = []
            for s in self.memory:
                if s.user:
                    messages.append({"role": "user", "content": s.user})
                if s.action:
                    messages.append({"role": "assistant", "content": (
                        f"THOUGHT: {s.thought}\nACTION: {s.action}\n"
                        f"ACTION_INPUT: {json.dumps(s.action_input or {})}"
                    )})
                if s.observation is not None:
                    messages.append({"role": "user",
                                     "content": f"OBSERVATION: {json.dumps(s.observation, default=str)}"})
            return messages

        def run(self, user_query: str) -> str:
            self.memory.clear()
            self.memory.append(AgentStep(user=user_query))
            for _ in range(self.max_steps):
                raw = llm_complete(self.system_prompt, self._messages())
                thought, action, action_input = parse_react_response(raw)
                step = AgentStep(thought=thought, action=action, action_input=action_input)
                if action.upper() == "NONE":
                    # Final step: the THOUGHT carries the answer.
                    self.memory.append(step)
                    return thought or raw
                tool = self.tools.get(action)
                if not tool:
                    step.observation = {"error": f"Unknown tool '{action}'"}
                    self.memory.append(step)
                    return f"Unknown tool '{action}'"
                try:
                    step.observation = tool(action_input or {})
                except Exception as exc:  # report tool failures to the model instead of crashing
                    step.observation = {"error": str(exc)}
                self.memory.append(step)
            return "Max steps reached"
```



Now we can create an **Agentic RAG Agent** instance:


```python
agentic_rag_agent = SimpleAgent(tools=registry, max_steps=3)

response = agentic_rag_agent.run("Explain what Agentic RAG is, based on the knowledge base.")
response
```



> 📝 In a real environment, with a real `llm_complete` and RAG backend wired up, you would expect the agent to:
>
> - call `rag_search` at least once  
> - call `rag_get_document` if it needs a full document  
> - stop with `ACTION: NONE`, leaving a final **THOUGHT** that forms your answer  
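To exercise that trajectory without a real model, a scripted test double can replay canned ReAct responses in order. `ScriptedLLM` and `SCRIPT` are hypothetical; the callable has the same `(system_prompt, messages)` signature as the placeholder `llm_complete`. If the fallback `SimpleAgent` above is the one in use, assigning `llm_complete = ScriptedLLM(SCRIPT)` before calling `run` should replay search, fetch, and finish within `max_steps=3`; an agent imported from notebook 04 may look up its own `llm_complete` instead.

```python
from typing import Dict, List


class ScriptedLLM:
    """Replays canned ReAct responses in order (a test double for llm_complete)."""

    def __init__(self, responses: List[str]):
        self.responses = list(responses)
        self.calls: List[List[Dict[str, str]]] = []  # messages seen on each call

    def __call__(self, system_prompt: str, messages: List[Dict[str, str]]) -> str:
        self.calls.append(list(messages))
        if len(self.calls) > len(self.responses):
            # Script exhausted: finish instead of raising.
            return "THOUGHT: Script exhausted.\nACTION: NONE\nACTION_INPUT: {}"
        return self.responses[len(self.calls) - 1]


SCRIPT = [
    'THOUGHT: I need sources on Agentic RAG.\nACTION: rag_search\n'
    'ACTION_INPUT: {"query": "agentic RAG definition", "top_k": 3}',
    'THOUGHT: doc1 looks central; read it in full.\nACTION: rag_get_document\n'
    'ACTION_INPUT: {"doc_id": "doc1"}',
    "THOUGHT: Agentic RAG lets an agent plan, retrieve iteratively and reflect "
    "before answering [doc1].\nACTION: NONE\nACTION_INPUT: {}",
]
```

Inspecting `llm.calls` afterwards shows exactly which observations the model saw at each step, which is useful when debugging prompt or parsing changes.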



## 7. Multi-Step Agentic RAG with Reflection (Conceptual)

You can enhance the loop with **explicit reflection**:

1. First round:
   - agent retrieves & drafts an answer  
2. Reflection step:
   - agent (or another agent) critiques:
     - “Is the answer well-supported by sources?”
3. If not satisfied:
   - refine queries  
   - retrieve more  
   - update answer  

This can be implemented with:

- additional roles (Critic Agent)  
- or additional steps in the same agent loop  
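The retrieve, draft, critique, refine cycle can be sketched as a single loop. Here the critic is a crude word-level grounding check (answer words of four or more letters that never appear in the context), standing in for an LLM or Critic Agent; `answer_with_reflection`, `unsupported_terms`, `fake_retrieve`, and `fake_draft` are all hypothetical.

```python
import re
from typing import Any, Callable, Dict, List, Set


def content_words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def unsupported_terms(answer: str, context: str) -> List[str]:
    """Critic step: answer words (4+ letters) that never appear in the context."""
    return sorted(content_words(answer) - content_words(context))


def answer_with_reflection(question: str,
                           retrieve: Callable[[str], str],
                           draft: Callable[[str, str], str],
                           max_rounds: int = 2) -> Dict[str, Any]:
    query, context_parts, answer, gaps = question, [], "", []
    for round_no in range(max_rounds + 1):
        context_parts.append(retrieve(query))      # retrieve (or retrieve more)
        context = "\n".join(context_parts)
        answer = draft(question, context)          # draft or update the answer
        gaps = unsupported_terms(answer, context)  # reflect: is it well-supported?
        if not gaps:
            return {"answer": answer, "rounds": round_no, "supported": True}
        query = f"{question} {' '.join(gaps)}"     # refine the query toward the gaps
    return {"answer": answer, "rounds": max_rounds, "supported": False, "unsupported": gaps}


# Hypothetical stand-ins for the retrieval tool and the LLM drafting step.
def fake_retrieve(query: str) -> str:
    if "reflection" in query.lower():
        return "Reflection lets the agent critique its draft."
    return "Agentic RAG combines agents and retrieval."


def fake_draft(question: str, context: str) -> str:
    return "Agentic RAG combines agents, retrieval and reflection."
```

With these stubs, the first draft mentions "reflection" without support, so the loop re-queries with that term and is satisfied on round 1. When the budget runs out, `supported: False` lets the caller fall back to a low-confidence answer.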



### 7.1 Example Reflection Prompt (Concept Only)

> “Given the question, the retrieved context, and your draft answer, analyze:
> 
> - Did you use the most relevant parts of the context?
> - Did you hallucinate anything not supported?
> - Are there any obvious missing angles?
> 
> If issues exist, propose:
> 
> - a refined retrieval query, or
> - a corrected answer, or both.”
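To use this prompt programmatically, it helps to ask for a structured verdict. The sketch below fills the prompt with `str.format` and parses a JSON reply. The JSON schema (`supported`, `refined_query`, `corrected_answer`) is an assumption added for illustration, not part of the prompt above; unparseable replies are treated as "not supported" so the loop errs toward another retrieval round.

```python
import json
from typing import Any, Dict

# Doubled braces {{ }} are literal braces under str.format.
REFLECTION_TEMPLATE = """Given the question, the retrieved context, and your draft answer, analyze:
- Did you use the most relevant parts of the context?
- Did you hallucinate anything not supported?
- Are there any obvious missing angles?

Reply with JSON only:
{{"supported": true|false, "refined_query": string|null, "corrected_answer": string|null}}

QUESTION: {question}
CONTEXT:
{context}
DRAFT ANSWER: {draft}
"""

_NOT_SUPPORTED: Dict[str, Any] = {"supported": False, "refined_query": None, "corrected_answer": None}


def build_reflection_prompt(question: str, context: str, draft: str) -> str:
    return REFLECTION_TEMPLATE.format(question=question, context=context, draft=draft)


def parse_verdict(raw: str) -> Dict[str, Any]:
    """Parse the critic's JSON; fall back conservatively to 'not supported'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(_NOT_SUPPORTED)
    if not isinstance(data, dict):
        return dict(_NOT_SUPPORTED)
    return {
        "supported": bool(data.get("supported", False)),
        "refined_query": data.get("refined_query"),
        "corrected_answer": data.get("corrected_answer"),
    }
```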



## 8. Agentic RAG Design Checklist

When designing Agentic RAG in your real system:

- [ ] Does the agent know **when** to use retrieval?  
- [ ] Does it know which **retrieval tool** to use? (dense/sparse/hybrid)  
- [ ] Is there a **limit** on retrieval calls per request?  
- [ ] Are retrieved chunks:
  - logged?
  - traceable to sources?  
- [ ] Does the agent ever **self-check** its answer?  
- [ ] Are there **fallback behaviors**:
  - low-confidence answer
  - “I don’t know”
  - ask user for clarification  
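Two of these checklist items, traceable retrieval logs and fallback behaviors, can be sketched in a few lines. The thresholds (`high=0.75`, `low=0.4`) are illustrative assumptions that depend entirely on your retriever's score scale, and `log_retrieval` / `answer_or_fallback` are hypothetical helpers built on the standard `logging` module.

```python
import json
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("agentic_rag")


def log_retrieval(request_id: str, query: str, results: List[Dict[str, Any]]) -> None:
    """Log one JSON line per retrieval so every chunk is traceable to its source."""
    logger.info(json.dumps({
        "request_id": request_id,
        "query": query,
        "chunks": [{"id": r.get("id"), "doc_id": r.get("doc_id"), "score": r.get("score")}
                   for r in results],
    }))


def answer_or_fallback(draft: str, results: List[Dict[str, Any]],
                       high: float = 0.75, low: float = 0.4,
                       clarifying_question: Optional[str] = None) -> str:
    """Pick a response based on the best retrieval score (thresholds are illustrative)."""
    best = max((r["score"] for r in results), default=0.0)
    if best >= high:
        return draft
    if best >= low:
        return f"(Low confidence) {draft}"
    if clarifying_question:
        return clarifying_question
    return "I don't know. The knowledge base returned nothing relevant enough to answer."
```

Ordering the fallbacks as low-confidence answer, then clarification, then "I don't know" mirrors the checklist; swap the order if your users prefer being asked before being given a hedged answer.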

Combined with your **RAG Universe** and **Agents Universe**, this notebook:

- completes the conceptual & practical bridge for **Agentic RAG**,  
- and prepares you for the next notebooks:

  - `09_MCP_with_Agents.ipynb`
  - `10_Full_RAG_Agents_MCP.ipynb`
