# Agentic RAG with LlamaIndex


In this notebook we will experiment with the tool-calling capability of our LLM.

- Define a reader to load the sample `pdf` file, the [AraGPT2](./data/aragpt2.pdf) paper.
- Define a `splitter` to split the document text into chunks.
- Set the embedding and generation model IDs.
- Create the query engines from the indexes and define tool wrappers around them.
- Call the LLM with the defined tools and inspect the results.
- Ensure that the LLM picks the right tool.


## Setup


In [1]:
from rich import print
from dotenv import load_dotenv

In [2]:
# load env variables
_ = load_dotenv()

In [3]:
# define some constants
GENERATION_MODEL_ID = "gpt-4o-mini"
EMBEDDING_MODEL_ID = "text-embedding-3-small"

## Load Documents


In [4]:
from llama_index.core import SimpleDirectoryReader

documents_reader = SimpleDirectoryReader(input_files=["./data/aragpt2.pdf"])
documents = documents_reader.load_data()

In [5]:
print(documents[0])

In [6]:
from llama_index.core.node_parser import SentenceSplitter

sentence_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=32)
nodes = sentence_splitter.get_nodes_from_documents(documents)

In [7]:
print(len(nodes))
print(nodes[2])
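
To build some intuition for what `chunk_size` and `chunk_overlap` do, here is a minimal sketch of a sliding window with overlap. It is not how `SentenceSplitter` is implemented: the real splitter counts tokens and tries to respect sentence boundaries, while this hypothetical `chunk_words` helper simply counts whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words that share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# toy input: ten "words" named w0..w9
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, so content that straddles a boundary still appears whole in at least one chunk.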

## Define Vector Index


In [8]:
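from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# embed with the model chosen above rather than the library default
Settings.embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL_ID)
vector_index = VectorStoreIndex(nodes=nodes)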

## Define Vector Tool with Filters


In [9]:
from llama_index.core.vector_stores import MetadataFilters

# define a filter to get answers from specific pages
vector_query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts([{"key": "page_label", "value": "1"}])
)

In [10]:
query = "What are the AraGPT2 model sizes?"
response = vector_query_engine.query(query)

In [11]:
print(response.response)

In [12]:
for n in response.source_nodes:
    print(n.metadata)

In [13]:
from llama_index.core.vector_stores import FilterCondition

# define a vector query function that limits the query to certain pages
# type hints and a docstring are a must when defining LLM tools


def query_vector_engine(query: str, page_numbers: list[str]):
    """Perform vector search over an index.
    
    inputs:
        query (str): the string query to be embedded.
        page_numbers (list[str]): filter by set of pages. leave BLANK if we want to \
perform a vector search over all pages. otherwise, filter by the set of specified pages.
    """
    metadata_dicts = [
        {"key": "page_label", "value": page_number} for page_number in page_numbers
    ]
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts, condition=FilterCondition.OR
        ),
    )
    response = query_engine.query(query)
    return response
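
The OR condition used above can be pictured with plain dictionaries. The sketch below is only an illustration of the intended semantics, not the vector store's actual filtering code; `matches_any_page` and `toy_nodes` are made-up names. Treating an empty list as "no restriction" mirrors the docstring's intent and is an assumption about how the store behaves. Note that the page labels are compared as strings, which is why the notebook passes `"1"` rather than `1`.

```python
def matches_any_page(metadata: dict, page_numbers: list[str]) -> bool:
    """Return True if the node's page_label equals any requested page (OR semantics)."""
    if not page_numbers:
        return True  # assumption: an empty list means "search all pages"
    return metadata.get("page_label") in page_numbers


# toy metadata in the shape printed by the notebook (values are illustrative)
toy_nodes = [
    {"page_label": "1", "file_name": "aragpt2.pdf"},
    {"page_label": "2", "file_name": "aragpt2.pdf"},
    {"page_label": "3", "file_name": "aragpt2.pdf"},
]

selected = [n["page_label"] for n in toy_nodes if matches_any_page(n, ["1", "3"])]
print(selected)
```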

In [14]:
from llama_index.core.tools import FunctionTool

query_vector_engine_tool = FunctionTool.from_defaults(
    fn=query_vector_engine, name="query_vector_engine_tool"
)
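
Why are type hints and a docstring a must? A tool wrapper has to tell the LLM what the function is called, what it does, and which arguments it takes, and the function's signature and docstring are the natural source for that. The sketch below shows the kind of information that can be recovered with the standard library alone. It does not reproduce what `FunctionTool.from_defaults` does internally; `describe_tool` and `toy_search` are hypothetical names used only for illustration.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Collect the name, docstring, and parameter types a tool wrapper could expose."""
    hints = get_type_hints(fn)
    parameters = {
        name: hints.get(name)  # None when a parameter has no type hint
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


def toy_search(query: str, page_numbers: list[str]):
    """Search a toy index, optionally restricted to some pages."""
    return []


spec = describe_tool(toy_search)
print(spec)
```

Without hints and a docstring, the description is empty and the parameter types are unknown, which leaves the LLM guessing how to fill in the arguments.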

In [15]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model=GENERATION_MODEL_ID)

In [16]:
query = "What are the AraGPT2 model sizes described in page number 1?"
response = llm.predict_and_call(
    tools=[query_vector_engine_tool], verbose=True, user_msg=query
)

=== Calling Function ===
Calling function: query_vector_engine_tool with args: {"query": "AraGPT2 model sizes", "page_numbers": ["1"]}
=== Function Output ===
The AraGPT2 model comes in 4 size variants: base (135M), medium (370M), large (792M), and mega (1.46B).


In [17]:
print(response.response)

In [18]:
for n in response.source_nodes:
    print(n.metadata)

## Define Summary Tool


In [19]:
from llama_index.core import SummaryIndex

summary_index = SummaryIndex(nodes=nodes)

In [20]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata


summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize")
summary_tool = QueryEngineTool(
    query_engine=summary_query_engine,
    metadata=ToolMetadata(
        name="summary_tool",
        description="Useful for summarization questions related to the aragpt2 paper."
    ),
)

In [21]:
query = "What are the AraGPT2 model sizes described in page number 1?"
response = llm.predict_and_call(
    tools=[query_vector_engine_tool, summary_tool], verbose=True, user_msg=query
)

=== Calling Function ===
Calling function: query_vector_engine_tool with args: {"query": "AraGPT2 model sizes", "page_numbers": ["1"]}
=== Function Output ===
The AraGPT2 model comes in 4 size variants: base (135M), medium (370M), large (792M), and mega (1.46B).


In [22]:
print(response.response)

In [23]:
query = "Summarize the AraGPT2 paper for me please."
response = llm.predict_and_call(
    tools=[query_vector_engine_tool, summary_tool], verbose=True, user_msg=query
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "AraGPT2: A Pre-trained Arabic Language Model\n\nAbstract: AraGPT2 is a pre-trained language model specifically designed for the Arabic language, based on the GPT-2 architecture. This paper presents the model's architecture, training methodology, and evaluation on various Arabic NLP tasks. The authors highlight the importance of pre-trained models for low-resource languages like Arabic and demonstrate AraGPT2's effectiveness in generating coherent and contextually relevant text.\n\n1. Introduction: The introduction discusses the challenges faced in Arabic NLP, including dialectal variations and the lack of large annotated datasets. The authors emphasize the need for a robust pre-trained model to improve performance on downstream tasks.\n\n2. Model Architecture: AraGPT2 is built on the transformer architecture, similar to GPT-2, with modifications to better suit the Arabic language. The model incorporates a lar

In [24]:
print(response.response)
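
The verbose logs show which tool was called, but we can also check the choice programmatically instead of by eye. The sketch below assumes that the object returned by `predict_and_call` exposes a `sources` list whose items carry a `tool_name` attribute; this should be verified against the installed LlamaIndex version. `called_tool_names` is a hypothetical helper, and the stand-in response only exists so the snippet runs without an API call.

```python
from types import SimpleNamespace


def called_tool_names(response) -> list[str]:
    """List the tool names recorded on a response (assumes `.sources[i].tool_name`)."""
    return [source.tool_name for source in getattr(response, "sources", [])]


# stand-in for a real response, so the helper can be tried offline
fake_response = SimpleNamespace(sources=[SimpleNamespace(tool_name="summary_tool")])
names = called_tool_names(fake_response)
print(names)

# in the notebook, the same check would be applied to the real response:
# assert called_tool_names(response) == ["summary_tool"]
```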