# Lesson 4: Building a Multi-Document Agent

## Setup

In [2]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [3]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.

In [4]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "pdf/metagpt.pdf",
    "pdf/longlora.pdf",
    "pdf/selfrag.pdf",
]

In [5]:
from utils import get_doc_tools2
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools2(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: pdf/metagpt.pdf
Getting tools for paper: pdf/longlora.pdf
Getting tools for paper: pdf/selfrag.pdf


In [11]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [9]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [12]:
len(initial_tools)

6
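The count of 6 follows from the flattening comprehension: each of the 3 papers contributes one vector tool and one summary tool. A minimal sketch of that bookkeeping, using plain strings as stand-ins for the real tool objects (the `flatten_tools` helper is hypothetical and not part of the lesson's `utils`):

```python
from pathlib import Path


def flatten_tools(papers, paper_to_tools):
    """Flatten a {paper: [tools]} mapping into one list, preserving paper order."""
    return [tool for paper in papers for tool in paper_to_tools[paper]]


papers = ["pdf/metagpt.pdf", "pdf/longlora.pdf", "pdf/selfrag.pdf"]

# Stand-in strings instead of real tool objects (an assumption for illustration).
paper_to_tools = {
    paper: [f"vector_tool_{Path(paper).stem}", f"summary_tool_{Path(paper).stem}"]
    for paper in papers
}

flat = flatten_tools(papers, paper_to_tools)
print(len(flat))
```

With two tools per paper, the list grows linearly with the number of papers, which is why the 11-paper section switches to retrieving tools rather than passing all of them to the agent.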

In [13]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [14]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the experiments described in the context is the PG19 test split.
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation results"}
=== Function Output ===
The evaluation results show that the models achieve better perplexity with longer context sizes. Increasing the context window size leads to improved perplexity values. Additionally, the models are fine-tuned on different context lengths, such as 100k, 65536, and 32768, and achieve promising results on these large settings. However, there is some perplexity degradation observed on small context sizes for the extended models, which is a known limitation of Position Interpolation.
=== LLM Response ===
The ev

In [15]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that improves the quality and factuality of large language models by incorporating retrieval on demand and self-reflection. It trains a single arbitrary LM to adaptively retrieve passages, generate text informed by these passages, and critique its own output using special tokens called reflection tokens. This framework outperforms existing models on various tasks, demonstrating enhanced performance in terms of factuality, correctness, and citation accuracy. Additionally, Self-RAG evaluates text generation by assessing factual relevance, supportiveness, and overall utility of the generated content, ensuring that the output aligns with given instructions and evidence.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== 

## 2. Set up an agent over 11 papers

### Download 11 ICLR papers

In [16]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "pdf/metagpt.pdf",
    "pdf/longlora.pdf",
    "pdf/loftq.pdf",
    "pdf/swebench.pdf",
    "pdf/selfrag.pdf",
    "pdf/zipformer.pdf",
    "pdf/values.pdf",
    "pdf/finetune_fair_diffusion.pdf",
    "pdf/knowledge_card.pdf",
    "pdf/metra.pdf",
    "pdf/vr_mcl.pdf"
]

To download these papers yourself, uncomment and run the following code:

    # for url, paper in zip(urls, papers):
    #     !wget "{url}" -O "{paper}"

**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.
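Outside a notebook, where the `!wget` shell escape is not available, the same download loop can be sketched with the standard library. This is only an illustration: `download_papers` and its `fetch` and `overwrite` parameters are hypothetical names, and the sketch assumes the `urls` and `papers` lists are aligned by position as in the cell above.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_papers(urls, papers, fetch=urlretrieve, overwrite=False):
    """Save each URL to its matching local path and return the paths written.

    Files that already exist are skipped unless overwrite=True.
    `fetch` is any callable taking (url, destination_path).
    """
    if len(urls) != len(papers):
        raise ValueError("urls and papers must have the same length")

    downloaded = []
    for url, paper in zip(urls, papers):
        target = Path(paper)
        if target.exists() and not overwrite:
            continue  # the lesson already ships these PDFs
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(target))
        downloaded.append(str(target))
    return downloaded
```

Calling `download_papers(urls, papers)` would then fetch only the files that are missing from the `pdf/` folder. Passing a different `fetch` callable makes it possible to exercise the loop without network access.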

In [17]:
from utils import get_doc_tools2
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools2(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: pdf/metagpt.pdf
Getting tools for paper: pdf/longlora.pdf
Getting tools for paper: pdf/loftq.pdf
Getting tools for paper: pdf/swebench.pdf
Getting tools for paper: pdf/selfrag.pdf
Getting tools for paper: pdf/zipformer.pdf
Getting tools for paper: pdf/values.pdf
Getting tools for paper: pdf/finetune_fair_diffusion.pdf
Getting tools for paper: pdf/knowledge_card.pdf
Getting tools for paper: pdf/metra.pdf
Getting tools for paper: pdf/vr_mcl.pdf


### Extend the Agent with Tool Retrieval

In [18]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [19]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [20]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [21]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [22]:
tools[2].metadata

ToolMetadata(description='Useful for summarization questions related to swebench', name='summary_tool_swebench', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [23]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [24]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the study includes two public benchmarks, HumanEval and MBPP, along with a self-generated, more challenging software development benchmark named SoftwareDev. The HumanEval benchmark consists of 164 handwritten programming tasks, while the MBPP benchmark comprises 427 Python tasks. The SoftwareDev dataset contains 70 representative examples of software development tasks covering diverse scopes such as mini-games, image processing algorithms, and data visualization. These datasets serve as a robust testbed for evaluating the performance of MetaGPT in software development tasks.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset"}
=== Function Output ===
The eva

In [25]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "Analyzing the approach in the LongLoRA paper."}
=== Function Output ===
The approach introduced in the LongLoRA paper focuses on enhancing the context length of large language models by incorporating shifted sparse attention (S2-Attn) during training to approximate standard self-attention patterns. This method enables extending the context window of models like Llama2 7B and 13B to significantly larger lengths, such as 100k and 32k respectively, on a single 8× A100 machine. The approach maintains the original attention architecture during inference, ensuring compatibility with existing optimization and infrastructure. Furthermore, the paper introduces a supervised fine-tuning solution using the LongAlpaca dataset to enhance chat ability in large language models. Additionall