# Agentic RAG with LlamaIndex


In this notebook we will experiment with the agent worker and agent runner concepts.

- Define a reader to load the sample `pdf` file, the [AraGPT2](./data/aragpt2.pdf) paper.
- Define a `splitter` to split the document text into nodes.
- Set the embedding and generation model IDs.
- Create the engines from the Indexes and define a tool wrapper around them.
- Define the agent worker and agent runner that utilize memory.
- Debug the results by manually executing the tasks step by step.


## Setup


In [1]:
from rich import print
from dotenv import load_dotenv

In [2]:
# load env variables
_ = load_dotenv()
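
The OpenAI LLM and embedding clients read the API key from the `OPENAI_API_KEY` environment variable, so it can help to fail early when the `.env` file did not provide it. A minimal sketch, assuming that variable name; `require_env` is a hypothetical helper, not part of `python-dotenv` or LlamaIndex:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is missing or empty.")
    return value


# Example usage (assumes the key is stored as OPENAI_API_KEY in the .env file):
# api_key = require_env("OPENAI_API_KEY")
```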

In [3]:
# define some constants
GENERATION_MODEL_ID = "gpt-4o-mini"
EMBEDDING_MODEL_ID = "text-embedding-3-small"

## Load Documents


In [4]:
from llama_index.core import SimpleDirectoryReader

documents_reader = SimpleDirectoryReader(input_files=["./data/aragpt2.pdf"])
documents = documents_reader.load_data()

In [5]:
print(documents[0])

In [6]:
from llama_index.core.node_parser import SentenceSplitter

sentence_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=32)
nodes = sentence_splitter.get_nodes_from_documents(documents)

In [7]:
print(len(nodes))
print(nodes[2])
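
To build intuition for how `chunk_size` and `chunk_overlap` interact, the sketch below slides a fixed-size window over a list of tokens. It is a deliberately simplified illustration, not the `SentenceSplitter` algorithm, which counts tokens with a tokenizer and tries to keep sentences intact; `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` sharing `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the input
    return chunks


tokens = list(range(10))
chunks = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the last token of the previous one, which is the role the 32-token overlap plays between neighboring 512-token nodes above.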

## Define Backend Settings


In [8]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model=GENERATION_MODEL_ID)
Settings.embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL_ID)

## Define Vector and Summary Indices


In [9]:
from llama_index.core import VectorStoreIndex, SummaryIndex

vector_index = VectorStoreIndex(nodes=nodes)
summary_index = SummaryIndex(nodes=nodes)

## Convert Indices to Tools


In [10]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [11]:
vector_engine = vector_index.as_query_engine()
vector_tool = QueryEngineTool(
    query_engine=vector_engine,
    metadata=ToolMetadata(
        name="vector_tool",
        description="Useful for retrieving specific context from the aragpt2 paper.",
    ),
)

In [12]:
summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")
summary_tool = QueryEngineTool(
    query_engine=summary_engine,
    metadata=ToolMetadata(
        name="summary_tool",
        description="Useful for summarization questions related to the aragpt2 paper.",
    ),
)

## Define Agent Worker and Runner


In [13]:
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tools=[vector_tool, summary_tool], verbose=True
)
agent_runner = AgentRunner(agent_worker=agent_worker)

In [14]:
query = "What datasets were used to train the AraGPT2 model?"
response = agent_runner.chat(query)

Added user message to memory: What datasets were used to train the AraGPT2 model?
=== Calling Function ===
Calling function: vector_tool with args: {"input": "datasets used to train the AraGPT2 model"}
=== Function Output ===
The training dataset for the AraGPT2 model includes the following publicly available Arabic corpora:

- The unshuffled OSCAR corpus
- The Arabic Wikipedia dump from September 2020
- The 1.5B words Arabic Corpus
- The OSIAN corpus
- News articles provided by As-safir newspaper

Additionally, the dataset underwent preprocessing, which involved filtering out short documents, removing repeated sentences, and replacing URLs, emails, and user mentions with special tokens, among other modifications.
=== LLM Response ===
The AraGPT2 model was trained on several publicly available Arabic corpora, including:

- The unshuffled OSCAR corpus
- The Arabic Wikipedia dump from September 2020
- The 1.5B words Arabic Corpus
- The OSIAN corpus
- News articles from the As-safir newsp

In [15]:
query = "What was the combined size in GBs of those data?"
response = agent_runner.chat(query)

Added user message to memory: What was the combined size in GBs of those data?
=== Calling Function ===
Calling function: vector_tool with args: {"input": "combined size in GBs of datasets used to train AraGPT2 model"}
=== Function Output ===
The combined size in GBs of the datasets used to train the AraGPT2 model is not specified in the provided information.
=== LLM Response ===
The combined size in GBs of the datasets used to train the AraGPT2 model is not specified in the available information.


In [16]:
print(response.source_nodes[0].get_content(metadata_mode="all"))
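
Because the agent answered that the size is not specified, it is worth looking at every retrieved chunk rather than only the first one, to see whether the relevant passage was retrieved at all. A small sketch, assuming each source node exposes a `score` attribute and a `get_content()` method as the retrieved nodes above do; `summarize_sources` is a hypothetical helper:

```python
def summarize_sources(source_nodes, max_chars=80):
    """Return (score, snippet) pairs for a list of retrieved nodes."""
    rows = []
    for item in source_nodes:
        # Flatten newlines so each snippet fits on a single line.
        text = item.get_content().replace("\n", " ")
        rows.append((item.score, text[:max_chars]))
    return rows


# Example usage with the response from the previous cell:
# for score, snippet in summarize_sources(response.source_nodes):
#     print(score, snippet)
```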

## Agent Debugging


In [17]:
# recreate the worker and runner so the debugging run starts with an empty memory
agent_worker = FunctionCallingAgentWorker.from_tools(
    tools=[vector_tool, summary_tool], verbose=True
)
agent_runner = AgentRunner(agent_worker=agent_worker)

In [18]:
query = "What datasets were used to train the AraGPT2 models and what evaluation methods were used?"

In [19]:
task = agent_runner.create_task(input=query)

In [20]:
print(task)

In [21]:
step_output = agent_runner.run_step(task_id=task.task_id)

Added user message to memory: What datasets were used to train the AraGPT2 models and what evaluation methods were used?
=== Calling Function ===
Calling function: vector_tool with args: {"input": "datasets used to train AraGPT2 models"}
=== Function Output ===
The training dataset for the AraGPT2 models includes the following publicly available Arabic corpora:

- The unshuffled OSCAR corpus
- The Arabic Wikipedia dump from September 2020
- The 1.5B words Arabic Corpus
- The OSIAN corpus
- News articles provided by As-safir newspaper

Additionally, the dataset was preprocessed by filtering out short documents, removing repeated sentences, and replacing URLs, emails, and user mentions with special tokens, among other modifications.
=== Calling Function ===
Calling function: vector_tool with args: {"input": "evaluation methods used for AraGPT2 models"}
=== Function Output ===
The evaluation methods used for AraGPT2 models include an automatic evaluation based on the perplexity measure, a

In [22]:
completed_steps = agent_runner.get_completed_steps(task_id=task.task_id)

In [23]:
print(len(completed_steps))

In [24]:
upcoming_steps = agent_runner.get_upcoming_steps(task_id=task.task_id)

In [25]:
print(upcoming_steps)

In [26]:
step_output = agent_runner.run_step(task_id=task.task_id)

=== LLM Response ===
The AraGPT2 models were trained using several publicly available Arabic corpora, including:

- The unshuffled OSCAR corpus
- The Arabic Wikipedia dump from September 2020
- The 1.5B words Arabic Corpus
- The OSIAN corpus
- News articles provided by As-safir newspaper

The dataset was preprocessed by filtering out short documents, removing repeated sentences, and replacing URLs, emails, and user mentions with special tokens, among other modifications.

For evaluation, the AraGPT2 models employed two methods:

1. An automatic evaluation based on the perplexity measure.
2. A human-based evaluation that assesses the model's ability to deceive human evaluators.


In [27]:
print(step_output)

In [28]:
agent_runner.get_upcoming_steps(task_id=task.task_id)  # no upcoming steps

[]

In [29]:
response = agent_runner.finalize_response(task_id=task.task_id)
print(response.response)
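
The manual sequence above (run a step, check what remains, finalize) can be wrapped in a small loop. This is a sketch under the assumption that the object returned by `run_step` carries an `is_last` flag, as the step outputs in this notebook do; `run_to_completion` and its `max_steps` guard are hypothetical and not part of LlamaIndex.

```python
def run_to_completion(runner, task_id, max_steps=10):
    """Run agent steps one at a time until a step reports it is the last one.

    Returns the finalized response and the number of steps that were executed.
    """
    for step_number in range(1, max_steps + 1):
        step_output = runner.run_step(task_id=task_id)
        if step_output.is_last:
            return runner.finalize_response(task_id=task_id), step_number
    raise RuntimeError(f"Task {task_id} did not finish within {max_steps} steps")


# Example usage with a freshly created task:
# task = agent_runner.create_task(input=query)
# response, steps = run_to_completion(agent_runner, task.task_id)
# print(steps, response.response)
```

The `max_steps` guard keeps a misbehaving agent from looping indefinitely while debugging.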