# Lesson 4: Building a Multi-Document Agent

## Setup

In [12]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [13]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

We create a vector search tool and a summary tool for each of the 3 papers, giving 6 tools in total. All of these tools are then made available to the agent worker.
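As a quick sanity check on the tool count, the sketch below derives the tool names we would expect for each paper. The `summary_tool_<stem>` pattern matches the names that appear in the agent logs in this lesson; the `vector_tool_<stem>` pattern and the helper `expected_tool_names` are assumptions made for illustration, not part of `utils.get_doc_tools`.

```python
from pathlib import Path

papers = [
    "docs/metagpt.pdf",
    "docs/longlora.pdf",
    "docs/selfrag.pdf",
]


def expected_tool_names(paper_paths):
    """Return two tool names per paper: one vector tool and one summary tool."""
    names = []
    for path in paper_paths:
        stem = Path(path).stem  # "docs/metagpt.pdf" -> "metagpt"
        # Assumed naming pattern for the vector tool (hypothetical).
        names.append(f"vector_tool_{stem}")
        # Pattern seen in the agent logs, e.g. "summary_tool_longlora".
        names.append(f"summary_tool_{stem}")
    return names


tool_names = expected_tool_names(papers)
print(len(tool_names))  # 3 papers x 2 tools each
print(tool_names)
```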

In [14]:
papers = [
    "docs/metagpt.pdf",
    "docs/longlora.pdf",
    "docs/selfrag.pdf",
]

In [4]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: docs/metagpt.pdf
Getting tools for paper: docs/longlora.pdf
Getting tools for paper: docs/selfrag.pdf


In [5]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [6]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [7]:
len(initial_tools)

6

In [15]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [16]:
# The agent can then choose the appropriate tool for each step
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the provided context includes the RedPajama dataset, PG19 validation split, book corpus dataset PG19, cleaned Arxiv Math proof-pile dataset, LongBench, LEval, and LongAlpaca-12k.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation results across various experiments and comparisons demonstrate the effectiveness of different methods in extending context windows, achieving comparable performance to full fine-tuning while reducing computational costs, improving efficiency, maintaining performance in long context adaptation, and showcasing state-of-the-art performance in handling long-context tasks. The resul

In [17]:
# we can chat/query about the LLM's reasoning process
response = agent.chat(
    "Tell me about the tools you leverage to answer the above question."
)

Added user message to memory: Tell me about the tools you leverage to answer the above question.
=== LLM Response ===
To provide information on the evaluation dataset and evaluation results in LongLoRA, I utilized the following tools:

1. **LongLoRA Summary Tool**: This tool was used to extract a summary of the evaluation dataset used in LongLoRA, including details about the datasets involved in the evaluation process.

2. **LongLoRA Summary Tool**: I also used this tool to generate a summary of the evaluation results in LongLoRA, which includes insights into the performance of different methods, comparisons, and configurations in various experiments.

These tools helped me efficiently gather and present the relevant information about the evaluation dataset and evaluation results in LongLoRA.


## 2. Set up an agent over more papers

When we want to use more papers as our knowledge base, we could still create vector search and summary tools for each paper, but every tool definition is sent with each query, so more tools mean more tokens per query (hence higher cost and latency). In addition, the LLM can get confused and fail to pick the right tool when there are too many choices.
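To make the cost argument concrete, here is a back-of-the-envelope sketch. The per-tool token figure is a made-up placeholder, not a measurement; the only point is that the overhead grows linearly with the number of papers when every tool is passed on every call.

```python
TOOLS_PER_PAPER = 2           # one vector tool + one summary tool, as in this lesson
ASSUMED_TOKENS_PER_TOOL = 60  # hypothetical size of one tool's name + description + schema


def tool_prompt_overhead(num_papers,
                         tools_per_paper=TOOLS_PER_PAPER,
                         tokens_per_tool=ASSUMED_TOKENS_PER_TOOL):
    """Estimate the tokens spent on tool definitions in a single LLM call."""
    return num_papers * tools_per_paper * tokens_per_tool


for n in (3, 11, 100):
    print(f"{n:>3} papers -> ~{tool_prompt_overhead(n)} tokens of tool definitions per call")
```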

Therefore, let's apply *retrieval augmentation* over these tools. That is, we create a retriever that fetches the tools most relevant to the query; then, instead of passing all the tools to the agent as we did in Lesson 3 and in Section 1 above, we pass just the retriever to the agent.

![RAG over tools](image/RAG-over-tools.PNG)
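The idea of retrieving tools can be sketched without any external library. The toy retriever below scores each tool description against the query with bag-of-words cosine similarity and keeps the top `top_k`. This is only a stand-in for the embedding-based retrieval used in this lesson; the tool descriptions and the helper names (`TOOLS`, `retrieve_tools`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical (name, description) pairs standing in for real tool metadata.
TOOLS = [
    ("vector_tool_metagpt", "Answer specific questions over the MetaGPT paper"),
    ("summary_tool_metagpt", "Give a holistic summary of the MetaGPT paper"),
    ("vector_tool_longlora", "Answer specific questions over the LongLoRA paper"),
    ("summary_tool_longlora", "Give a holistic summary of the LongLoRA paper"),
    ("vector_tool_selfrag", "Answer specific questions over the Self-RAG paper"),
    ("summary_tool_selfrag", "Give a holistic summary of the Self-RAG paper"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query, tools, top_k=3):
    """Return the names of the top_k tools whose descriptions best match the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(description))), name)
        for name, description in tools
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


selected = retrieve_tools("Give me a holistic summary of LongLoRA", TOOLS, top_k=2)
print(selected)
```

Only the selected tools would then be shown to the LLM, so the prompt size depends on `top_k` rather than on the total number of tools.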

In [18]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: docs/metagpt.pdf
Getting tools for paper: docs/longlora.pdf
Getting tools for paper: docs/selfrag.pdf


### Extend the Agent with Tool Retrieval

In [19]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [21]:
# define an "object" index over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [22]:
# define the "retriever" over the index with specified retrieval method
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [31]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and longLora"
)

In [35]:
# inspect the metadata of the top-ranked tool among the 3 retrieved
tools[0].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_metagpt', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
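The cell above shows only the first retrieved tool. To see all of them at once, a small helper can collect the names. This sketch assumes each retrieved tool exposes `.metadata.name`, as in the output above; `tool_names` and the stand-in objects are hypothetical and exist only so the snippet runs without building an index.

```python
from types import SimpleNamespace


def tool_names(tools):
    """Collect the name of every tool in ranked order."""
    return [tool.metadata.name for tool in tools]


# Hypothetical stand-ins that mimic the `.metadata.name` attribute of a tool.
fake_tools = [
    SimpleNamespace(metadata=SimpleNamespace(name=name))
    for name in ("summary_tool_metagpt", "summary_tool_longlora", "vector_tool_metagpt")
]

print(tool_names(fake_tools))
```

In the notebook, the same helper could be called as `tool_names(tools)` on the list returned by `obj_retriever.retrieve(...)`.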

In [36]:
# Define the agent worker and agent runner, passing the tool retriever instead of a fixed tool list
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [39]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against longLora"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against longLora
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT are HumanEval, MBPP, and SoftwareDev.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset used in longLora"}
=== Function Output ===
The evaluation dataset used in LongLoRA is the PG19 validation set from Rae et al., 2020.
=== LLM Response ===
The evaluation datasets used in MetaGPT are HumanEval, MBPP, and SoftwareDev. On the other hand, the evaluation dataset used in longLora is the PG19 validation set from Rae et al., 2020.
assistant: The evaluation datasets used in MetaGPT are HumanEval, MBPP, and SoftwareDev. On the other hand, the evaluation dataset used in longLora is the PG19 validation set from Rae et al., 2020.
