# 3.d RAG as subagent

In this notebook you will see:
- How to expose the RAG pipeline as an agent that is called by another agent, so the main agent does not have to be told how to interpret the sources.

As before, many improvements are possible here.

This notebook reuses what we built in the previous sections.
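
The agent-as-tool pattern can be sketched without the toolkit. The block below is a toy illustration, not the `conversational_toolkit` API: `ToyAgent`, `retrieve`, and `rag_as_tool` are hypothetical stand-ins, and a keyword match replaces the LLM's decision to call a tool. It uses `asyncio.run` so that it runs as a plain script; in a notebook cell you would `await` the coroutine directly.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical stand-ins for illustration only (not conversational_toolkit classes).
ToolFn = Callable[[str], Awaitable[str]]


@dataclass
class ToyAgent:
    """Toy agent: calls its tool when the query mentions `keyword`."""

    name: str
    tool: Optional[ToolFn] = None
    keyword: str = ""

    async def answer(self, query: str) -> str:
        # A real agent lets the LLM decide; a keyword match stands in for that here.
        if self.tool is not None and self.keyword in query.lower():
            return f"{self.name}: {await self.tool(query)}"
        return f"{self.name}: answered without tools"


async def retrieve(query: str) -> str:
    # Stand-in for the vector-store lookup; the figures are those quoted in the paper.
    return "[chunk 67] top-1 error 37.5%, top-5 error 17.0%"


rag_subagent = ToyAgent(name="rag", tool=retrieve, keyword="alexnet")


async def rag_as_tool(query: str) -> str:
    # The main agent only receives the subagent's final text, never the raw chunks.
    return await rag_subagent.answer(query)


main = ToyAgent(name="main", tool=rag_as_tool, keyword="alexnet")

direct = asyncio.run(main.answer("What is a CNN?"))
delegated = asyncio.run(main.answer("What error rates did AlexNet reach?"))
print(direct)
print(delegated)
```

The point of the design is visible in `rag_as_tool`: everything about chunk formatting and citation stays inside the subagent, so the main agent's prompt can stay generic.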

# Setup

In [6]:
import os
from typing import Any
import re
import shutil

from docling.document_converter import DocumentConverter

from conversational_toolkit.vectorstores.chromadb import ChromaDBVectorStore
from conversational_toolkit.llms.base import LLMMessage, Roles
from conversational_toolkit.tools.base import Tool
from conversational_toolkit.agents.tool_agent import ToolAgent
from conversational_toolkit.agents.base import QueryWithContext
from conversational_toolkit.embeddings.openai import OpenAIEmbeddings
from conversational_toolkit.llms.openai import OpenAILLM
from conversational_toolkit.chunking.base import Chunk

from utils.specific_chunker import SpecificCharChunker

In [7]:
path_to_docs = "data/docs"
path_to_document = os.path.join(path_to_docs, "alexnet_paper.pdf")

path_to_db = "data/db"
path_to_vectorstore = os.path.join(path_to_db, "example.db")

In [8]:
doc_converter = DocumentConverter()

conv_res = doc_converter.convert(path_to_document)
md = conv_res.document.export_to_markdown()

# Replace single newlines with spaces (they are usually just line wraps); blank lines are kept as paragraph breaks
md = re.sub(r"(?<!\n)\n(?!\n)", " ", md)

doc_title_to_document = {"alexnet_paper.pdf": md}

chunker = SpecificCharChunker()
chunks = chunker.make_chunks(
    split_characters=["\n\n\n", "\n\n", "\n"],
    document_to_text=doc_title_to_document,
    max_number_of_characters=1024,
)

if os.path.exists(path_to_vectorstore):
    shutil.rmtree(path_to_vectorstore)
embedding_model = OpenAIEmbeddings(model_name="text-embedding-3-small")
embeddings = await embedding_model.get_embeddings([c.content for c in chunks])
vector_store = ChromaDBVectorStore(path_to_vectorstore)

await vector_store.insert_chunks(chunks=chunks, embedding=embeddings)

[INFO] 2026-02-26 15:22:33,054 [RapidOCR] base.py:22: Using engine_name: onnxruntime
[INFO] 2026-02-26 15:22:33,068 [RapidOCR] download_file.py:60: File exists and is valid: C:\Users\sieverin\SDSC\Code\sme-kt-zh-collaboration-rag\rag_venv\Lib\site-packages\rapidocr\models\ch_PP-OCRv4_det_infer.onnx
[INFO] 2026-02-26 15:22:33,068 [RapidOCR] main.py:53: Using C:\Users\sieverin\SDSC\Code\sme-kt-zh-collaboration-rag\rag_venv\Lib\site-packages\rapidocr\models\ch_PP-OCRv4_det_infer.onnx
[INFO] 2026-02-26 15:22:33,146 [RapidOCR] base.py:22: Using engine_name: onnxruntime
[INFO] 2026-02-26 15:22:33,148 [RapidOCR] download_file.py:60: File exists and is valid: C:\Users\sieverin\SDSC\Code\sme-kt-zh-collaboration-rag\rag_venv\Lib\site-packages\rapidocr\models\ch_ppocr_mobile_v2.0_cls_infer.onnx
[INFO] 2026-02-26 15:22:33,149 [RapidOCR] main.py:53: Using C:\Users\sieverin\SDSC\Code\sme-kt-zh-collaboration-rag\rag_venv\Lib\site-packages\rapidocr\mod
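
The newline-flattening regex used in the cell above can be checked in isolation. This small sketch uses a made-up `sample` string (an assumption, not text from the paper) to show that single line wraps become spaces while blank-line paragraph breaks survive.

```python
import re

# Made-up sample: two wrapped paragraphs separated by a blank line.
sample = "Deep convolutional\nneural networks\n\nSecond paragraph\nwraps here."

# A newline is replaced only if it is neither preceded nor followed by another newline.
flattened = re.sub(r"(?<!\n)\n(?!\n)", " ", sample)

print(repr(flattened))
```

Keeping the double newlines matters because the chunker splits on `"\n\n"` before falling back to `"\n"`.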

In [9]:
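def chunks_to_text(chunks: list[Chunk]) -> str:
    """Render chunks as titled, fenced sections separated by a dashed rule."""
    separator = "\n" + "-" * 30 + "\n\n"

    # Joining avoids a dangling separator after the last chunk
    # (the previous `text[:-4]` left 28 dashes behind).
    return separator.join(
        f"## Chunk {chunk.title}:\n```\n{chunk.content}\n```" for chunk in chunks
    )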


class RetrieveRelevantChunks(Tool):
    def __init__(
        self,
        name: str,
        description: str,
        parameters: dict[str, Any],
    ):
        self.name = name
        self.description = description
        self.parameters = parameters

    async def call(self, args: dict[str, Any]) -> dict[str, Any]:
        query = args.get("query")
        top_k = args.get("top_k", 5)

        if top_k > 10:
            raise ValueError("top_k cannot be greater than 10.")

        query_embedding = await embedding_model.get_embeddings([query])
        retrieved_chunks = await vector_store.get_chunks_by_embedding(
            embedding=query_embedding, top_k=top_k
        )

        retrieved_chunks_as_text = chunks_to_text(retrieved_chunks)

        return {"result": retrieved_chunks_as_text}


alexnet_retriever_tool = RetrieveRelevantChunks(
    name="retrieve_relevant_chunks",
    description="Retrieves the most relevant chunks from the AlexNet paper based on a query.",
    # What parameters it expects
    parameters={
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The query to retrieve relevant chunks for.",
            },
            "top_k": {
                "type": "integer",
                "description": "The number of top relevant chunks to retrieve, maximum is 10.",
            },
        },
        "required": ["query"],
        "additionalProperties": False,
    },
)
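
Because the arguments are produced by an LLM, it can help to validate them before hitting the vector store. The helper below is a hypothetical sketch (`parse_retrieval_args` is not part of the toolkit): it keeps the default of 5 and the cap of 10 used by the tool above, and it assumes that a JSON number may arrive as a float such as `3.0`.

```python
from typing import Any

MAX_TOP_K = 10  # same cap as in the tool above
DEFAULT_TOP_K = 5  # same default as in the tool above


def parse_retrieval_args(args: dict[str, Any]) -> tuple[str, int]:
    """Hypothetical helper: validate tool arguments and return (query, top_k)."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string.")

    raw_top_k = args.get("top_k", DEFAULT_TOP_K)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw_top_k, bool) or not isinstance(raw_top_k, (int, float)):
        raise ValueError("top_k must be a number.")
    # Accept 3 and 3.0, reject 3.5, NaN and infinity.
    if not float(raw_top_k).is_integer():
        raise ValueError("top_k must be a whole number.")

    top_k = int(raw_top_k)
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}.")
    return query, top_k


print(parse_retrieval_args({"query": "ILSVRC-2010 error rates", "top_k": 3.0}))
```

Inside `call`, the first two lines could then become `query, top_k = parse_retrieval_args(args)`.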

# Setup RAG subagent

In [10]:
class RAGAgent(ToolAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


llm = OpenAILLM(tool_choice="auto", tools=[alexnet_retriever_tool])

# Define the prompt
prompt = "You are a helpful assistant, answer shortly. Use the tools only when they are relevant, but if you do so, trust the results from the tools and use them in your answer, cite them precisely if you use them."
prompt_as_message = LLMMessage(content=prompt, role=Roles.SYSTEM)

2026-02-26 15:22:45.215 | DEBUG    | conversational_toolkit.llms.openai:__init__:63 - OpenAI LLM loaded: gpt-4o-mini; temperature: 0.5; seed: 42; tools: [<__main__.RetrieveRelevantChunks object at 0x00000248592582F0>]; tool_choice: auto; response_format: {'type': 'text'}


In [11]:
rag_agent = RAGAgent(system_prompt=prompt, llm=llm, max_steps=5)
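
One plausible reading of `max_steps` is a cap on how many decide-then-call-tool rounds the agent may run before it must stop. The sketch below illustrates that idea only; `run_tool_loop`, `scripted_decide`, and `fake_retriever` are hypothetical, and the actual loop inside `ToolAgent` may differ.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical: a step returns (tool_query, final_answer); exactly one is set.
Step = tuple[Optional[str], Optional[str]]


async def run_tool_loop(
    decide: Callable[[list[str]], Step],
    tool: Callable[[str], Awaitable[str]],
    max_steps: int,
) -> str:
    """Alternate between deciding and calling the tool, for at most max_steps rounds."""
    observations: list[str] = []
    for _ in range(max_steps):
        tool_query, final_answer = decide(observations)
        if final_answer is not None:
            return final_answer
        observations.append(await tool(tool_query or ""))
    return "Stopped: step budget exhausted."


def scripted_decide(observations: list[str]) -> Step:
    # Stand-in for the LLM: ask the tool once, then answer from what came back.
    if not observations:
        return ("ILSVRC-2010 error rates", None)
    return (None, f"Based on the tool: {observations[-1]}")


def always_call_tool(observations: list[str]) -> Step:
    # Stand-in for a model that never settles on an answer.
    return ("again", None)


async def fake_retriever(query: str) -> str:
    return f"chunk for '{query}'"


answered = asyncio.run(run_tool_loop(scripted_decide, fake_retriever, max_steps=5))
exhausted = asyncio.run(run_tool_loop(always_call_tool, fake_retriever, max_steps=2))
print(answered)
print(exhausted)
```

A step budget of this kind bounds cost and latency when the subagent keeps requesting retrievals without converging.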

# Define the RAG agent as a tool

In [12]:
class RAGAgentAsTool(Tool):
    def __init__(
        self,
        name: str,
        description: str,
        parameters: dict[str, Any],
    ):
        self.name = name
        self.description = description
        self.parameters = parameters

    async def call(self, args: dict[str, Any]) -> dict[str, Any]:
        query = args.get("query")

        answer = await rag_agent.answer(
            query_with_context=QueryWithContext(query=query, history=[])
        )
        answer_as_text = str(answer.content)

        return {"result": answer_as_text}


rag_agent_as_tool_tool = RAGAgentAsTool(
    name="rag_agent_as_tool",
    description="Uses the RAG agent to answer queries based on the AlexNet paper.",
    parameters={
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The question to ask the RAG agent about the AlexNet paper.",
            },
        },
        "required": ["query"],
        "additionalProperties": False,
    },
)

# Define the Main Agent


In [13]:
class MainAgent(ToolAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


llm_main_agent = OpenAILLM(tool_choice="auto", tools=[rag_agent_as_tool_tool])

# Define the prompt; the main agent needs no instructions on interpreting sources, since the RAG subagent handles that
prompt_main_agent = "You are a helpful assistant, answer shortly."
prompt_as_message_main_agent = LLMMessage(content=prompt_main_agent, role=Roles.SYSTEM)

main_agent = MainAgent(system_prompt=prompt_main_agent, llm=llm_main_agent, max_steps=5)

2026-02-26 15:22:45.517 | DEBUG    | conversational_toolkit.llms.openai:__init__:63 - OpenAI LLM loaded: gpt-4o-mini; temperature: 0.5; seed: 42; tools: [<__main__.RAGAgentAsTool object at 0x0000024B4CA50200>]; tool_choice: auto; response_format: {'type': 'text'}


# Test the agent

## First Simple Question

In [14]:
conversation = [prompt_as_message_main_agent]

In [15]:
query = "What is a CNN?"
query_as_message = LLMMessage(content=query, role=Roles.USER)
query_with_context = QueryWithContext(query=query, history=[])

answer = await main_agent.answer(query_with_context)

2026-02-26 15:22:47.652 | DEBUG    | conversational_toolkit.agents.tool_agent:answer_stream:106 - [{'content': 'A CNN, or Convolutional Neural Network, is a type of deep learning model specifically designed for processing structured grid data, such as images. It utilizes convolutional layers to automatically detect and learn features from input data, making it particularly effective for tasks like image recognition, classification, and object detection. CNNs can capture spatial hierarchies in data through their layered architecture, which typically includes convolutional layers, pooling layers, and fully connected layers.', 'tool_calls': [], 'role': <Roles.ASSISTANT: 'assistant'>, 'function_name': 'llm'}]


In [16]:
print(answer.content)

A CNN, or Convolutional Neural Network, is a type of deep learning model specifically designed for processing structured grid data, such as images. It utilizes convolutional layers to automatically detect and learn features from input data, making it particularly effective for tasks like image recognition, classification, and object detection. CNNs can capture spatial hierarchies in data through their layered architecture, which typically includes convolutional layers, pooling layers, and fully connected layers.


In [17]:
conversation += [query_as_message, answer]

## Follow Up about AlexNet

This also tests that the agent remembers the conversation.
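
The history handling used in these cells can be sketched on its own: the history passed with a query holds only the earlier turns, and both sides of the new turn are appended after the answer comes back. `Message` and `toy_answer` below are hypothetical stand-ins for `LLMMessage` and the agent, not toolkit code.

```python
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical stand-in for LLMMessage
    role: str
    content: str


def toy_answer(query: str, history: list[Message]) -> Message:
    """Stand-in agent: echoes the first earlier user question it can see."""
    earlier_questions = [m.content for m in history if m.role == "user"]
    if earlier_questions:
        text = f"Your first question was: {earlier_questions[0]}"
    else:
        text = "This is the first question."
    return Message(role="assistant", content=text)


conversation: list[Message] = []

for query in ["What is a CNN?", "How does AlexNet relate to my first question?"]:
    # The history holds only the turns *before* the current query.
    reply = toy_answer(query, history=conversation)
    # Append both sides of the turn once the answer is produced.
    conversation += [Message(role="user", content=query), reply]

print(conversation[1].content)
print(conversation[-1].content)
```

Note that the subagent tool in this notebook is always called with `history=[]`, so the query the main agent sends to it has to be self-contained.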

In [18]:
query = "What are the top-1 and top-5 scores obtained on 'ILSVRC-2010' by AlexNet? How does AlexNet relate to my first question? Answer to both in few sentences."
query_as_message = LLMMessage(content=query, role=Roles.USER)
query_with_context = QueryWithContext(query=query, history=conversation)

answer = await main_agent.answer(query_with_context)

2026-02-26 15:22:51.446 | INFO     | conversational_toolkit.embeddings.openai:get_embeddings:38 - OpenAI embeddings shape: (1, 1024)
2026-02-26 15:22:53.346 | DEBUG    | conversational_toolkit.agents.tool_agent:answer_stream:106 - [{'content': '', 'tool_calls': [ToolCall(id='call_BlRYBhLq1KFfm0sfqhMKnESL', function=Function(name='retrieve_relevant_chunks', arguments='{"query":"top-1 and top-5 scores ILSVRC-2010 AlexNet"}'), type='function')], 'role': <Roles.ASSISTANT: 'assistant'>, 'function_name': 'llm'}, {'result': "## Chunk 67:\n```\nOur results on ILSVRC-2010 are summarized in Table 1. Our network achieves top-1 and top-5 test set error rates of 37.5% and 17.0% 5 . The best performance achieved during the ILSVRC2010 competition was 47.1% and 28.2% with an approach that averages the predictions produced from six sparse-coding models trained on different features [2], and since then the best published results are 45.7% and 25.7% with an approach that averages the predictions of two c

In [19]:
print(answer.content)

AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, on the ILSVRC-2010 test set, significantly outperforming previous models at that time.

AlexNet is a specific implementation of a Convolutional Neural Network (CNN) designed for image classification. It consists of multiple convolutional and fully connected layers, showcasing the effectiveness of deep CNNs in handling complex visual recognition tasks. Its architecture and innovative techniques advanced the field of deep learning significantly.


----------------