# Agentic RAG with LlamaIndex


In this notebook we experiment with RAG using a multi-document agent.

- Define a reader to load the sample `pdf` file, the [AraGPT2](./data/aragpt2.pdf) paper.
- Define a `splitter` to split the document text into chunks.
- Set the embedding and generation model IDs.
- Create query engines from the indexes and wrap each one in a tool.
- Create an index over the tool objects.
- Define the agent worker and the agent runner, which uses memory.
- Execute the multi-document agent.
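
To make the `splitter` step concrete, here is a toy sketch of fixed-size chunking with overlap. It is only an illustration: `chunk_words` is a hypothetical helper that splits on whitespace, whereas LlamaIndex's `SentenceSplitter` measures chunks in tokens and tries to keep sentences intact.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Toy sliding-window chunker (hypothetical helper, not a LlamaIndex API)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    # each new chunk starts (chunk_size - chunk_overlap) words after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "one two three four five six seven eight nine ten"
toy_chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
```

Each chunk repeats the last word of the previous one, which is the role `chunk_overlap` plays: context at a chunk boundary is not lost to retrieval.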


## Setup


In [1]:
from rich import print
from dotenv import load_dotenv

In [2]:
# load env variables
_ = load_dotenv()
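
Since later cells call hosted models, it can help to confirm early that the expected credentials were actually loaded. The sketch below assumes the `.env` file defines `OPENAI_API_KEY`; that variable name is an assumption based on the OpenAI model IDs used in this notebook, so adjust it to your provider. `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# assumption: the notebook needs an OpenAI key; change this list as needed
REQUIRED_VARS = ["OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```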

In [3]:
# define some constants
GENERATION_MODEL_ID = "gpt-4o-mini"
EMBEDDING_MODEL_ID = "text-embedding-3-small"

## Load Documents


In [16]:
import os

from llama_index.core.schema import TextNode
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter


def get_nodes(file_paths: list[str]) -> dict[str, list[TextNode]]:
    """Extract text nodes from documents, grouped by source file.

    inputs:
        file_paths (list[str]): paths to pdf files. file names must be unique.
    returns:
        nodes_dict (dict[str, list[TextNode]]): mapping of file names
            (without extension) to the nodes extracted from each file.
    """
    # one empty list per file, keyed by the file name without its extension
    nodes_dict = {os.path.basename(file_path).split(".")[0]: [] for file_path in file_paths}
    documents_reader = SimpleDirectoryReader(input_files=file_paths)
    documents = documents_reader.load_data()
    # split the documents into chunks of up to 1024 tokens with a 64-token overlap
    sentence_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=64)
    nodes = sentence_splitter.get_nodes_from_documents(documents)
    # assign each node to its source file using the reader's "file_name" metadata
    for node in nodes:
        nodes_dict[node.metadata["file_name"].split(".")[0]].append(node)
    return nodes_dict
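
The grouping step in `get_nodes` can be tried without any PDFs or API calls. In this sketch, `FakeNode` is a hypothetical stand-in for `TextNode` that carries only the `file_name` metadata the function relies on, `group_by_file_stem` is a hypothetical helper, and `other_paper.pdf` is a made-up file name.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for TextNode: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def group_by_file_stem(nodes):
    """Group nodes by their source file name without the extension."""
    grouped = defaultdict(list)
    for node in nodes:
        stem = node.metadata["file_name"].split(".")[0]
        grouped[stem].append(node)
    return dict(grouped)


fake_nodes = [
    FakeNode("chunk 1", {"file_name": "aragpt2.pdf"}),
    FakeNode("chunk 2", {"file_name": "aragpt2.pdf"}),
    FakeNode("chunk 1", {"file_name": "other_paper.pdf"}),  # made-up file
]
grouped = group_by_file_stem(fake_nodes)
```

One difference from `get_nodes`: this version creates a key only when a file yields at least one node, while `get_nodes` pre-populates an empty list for every input path.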

In [17]:
import glob

file_paths = glob.glob("data/*")
print(file_paths)
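
Because `get_nodes` keys its result by file name up to the first dot, two inputs that share that prefix would end up under the same key. A small check like the following can flag this before indexing. `find_stem_collisions` is a hypothetical helper, and the example paths are made up for illustration.

```python
import os
from collections import Counter


def find_stem_collisions(paths):
    """Return the file-name prefixes (text before the first dot) shared by several paths."""
    stems = [os.path.basename(p).split(".")[0] for p in paths]
    counts = Counter(stems)
    return sorted(stem for stem, n in counts.items() if n > 1)


# made-up paths: both "notes" files reduce to the same key
example_paths = ["data/aragpt2.pdf", "data/notes.v1.pdf", "data/notes.v2.pdf"]
collisions = find_stem_collisions(example_paths)
```

An empty result means every input file maps to its own key.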

In [18]:
nodes = get_nodes(file_paths=file_paths)

In [20]:
print(nodes.keys())