# Experiment 4: Building a Multi-Document Agent

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

In [3]:
urls = [
    "https://openreview.net/attachment?id=Yh0a6Xpey6&name=pdf",
    "https://openreview.net/attachment?id=MWCuvhSFPI&name=pdf",
    "https://openreview.net/attachment?id=vsaEOFOUyY&name=pdf",
]

papers = [
    "RISeg.pdf",
    "Online_3D_Edge_Reconstructi.pdf",
    "KnotDLO_Toward_Interpretabl.pdf"
]
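The notebook assumes the three PDFs are already on disk under the names in `papers`. The `urls` list is defined but not used afterward, and the two lists line up by position. One way to fetch the files is the minimal sketch below. It assumes the OpenReview attachment links are publicly downloadable and uses the standard-library `urllib.request.urlretrieve`. `download_papers` is a hypothetical helper, not part of `helper` or `utils`.

```python
import urllib.request
from pathlib import Path


def download_papers(urls, papers, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Hypothetical helper: save urls[i] to dest_dir/papers[i], skipping existing files."""
    if len(urls) != len(papers):
        raise ValueError("urls and papers must have the same length")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for url, name in zip(urls, papers):
        target = dest / name
        if target.exists():
            continue  # already fetched on an earlier run
        fetch(url, str(target))  # urlretrieve(url, filename) writes the response body to disk
        downloaded.append(name)
    return downloaded
```

Calling `download_papers(urls, papers)` before In [4] fetches any missing file and skips files that are already present, so rerunning the cell is cheap. Because `fetch` is a parameter, the function can be exercised without network access.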

In [4]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: RISeg.pdf
Getting tools for paper: Online_3D_Edge_Reconstructi.pdf
Getting tools for paper: KnotDLO_Toward_Interpretabl.pdf


In [5]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [6]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [7]:
len(initial_tools)

6
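`get_doc_tools` comes from a `utils` module that is not shown here. The function names in the agent logs (`vector_tool_RISeg`, `summary_tool_RISeg`, ...) suggest that it returns one vector tool and one summary tool per paper, each named after `Path(paper).stem`. The nested comprehension in In [5] flattens those per-paper pairs into a single list, so `len(initial_tools)` is 6 for three papers. The sketch below reproduces only that bookkeeping with plain strings. `tool_names_for` is a hypothetical stand-in, not the real helper.

```python
from pathlib import Path

papers = [
    "RISeg.pdf",
    "Online_3D_Edge_Reconstructi.pdf",
    "KnotDLO_Toward_Interpretabl.pdf",
]


def tool_names_for(paper):
    """Hypothetical stand-in for get_doc_tools: return the two tool names seen in the logs."""
    stem = Path(paper).stem  # "RISeg.pdf" -> "RISeg"
    return [f"vector_tool_{stem}", f"summary_tool_{stem}"]


paper_to_tools_dict = {paper: tool_names_for(paper) for paper in papers}

# Same nested comprehension as In [5]: outer loop over papers, inner loop over that paper's tools
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

print(len(initial_tools))  # 6: two tools per paper
print(initial_tools[:2])   # ['vector_tool_RISeg', 'summary_tool_RISeg']
```

Every tool in this list is offered to the LLM on each call, so the list grows by two entries per paper.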

In [8]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [9]:
response = agent.query(
    "What is the core problem addressed by this paper, "
    "and what existing limitations in the current state-of-the-art motivated this work?"
)

Added user message to memory: What is the core problem addressed by this paper, and what existing limitations in the current state-of-the-art motivated this work?
=== Calling Function ===
Calling function: summary_tool_RISeg with args: {"input": "core problem addressed by the paper"}
=== Function Output ===
The core problem addressed by the paper is improving the accuracy of object instance segmentation in cluttered scenes through the introduction of a novel approach that corrects segmentation inaccuracies, such as under-segmentation, using robot interactions and a designed body frame-invariant feature (BFIF).
=== Calling Function ===
Calling function: summary_tool_RISeg with args: {"input": "existing limitations in the current state-of-the-art that motivated this work"}
=== Function Output ===
The existing limitations in the current state-of-the-art that motivated this work include challenges related to under and over segmentation in cluttered scenes when performing unseen object inst

In [10]:
response = agent.query("How is fairness, robustness, or interpretability addressed within the context of the paper's main contribution?")
print(str(response))

Added user message to memory: How is fairness, robustness, or interpretability addressed within the context of the paper's main contribution?
=== Calling Function ===
Calling function: vector_tool_RISeg with args: {"query": "fairness, robustness, interpretability"}
=== Function Output ===
Fairness, robustness, and interpretability are important aspects to consider in the development of interactive perception frameworks like RISeg.
=== Calling Function ===
Calling function: vector_tool_Online_3D_Edge_Reconstructi with args: {"query": "fairness, robustness, interpretability"}
=== Function Output ===
The proposed online 3D edge reconstruction framework focuses on efficiency and completeness in reconstructing wiry objects for robotic manipulation tasks. It emphasizes accuracy in estimating configurations and target slots, showcasing its robustness in handling such structures. The framework's utilization of a Bayesian approach for updating beliefs ensures interpretability in generating spar

## 2. Set up an agent over 5 papers

In [11]:
urls = [
    "https://openreview.net/attachment?id=Yh0a6Xpey6&name=pdf",
    "https://openreview.net/attachment?id=MWCuvhSFPI&name=pdf",
    "https://openreview.net/attachment?id=vsaEOFOUyY&name=pdf",
    "https://openreview.net/attachment?id=s86mu1ovz4&name=pdf",
    "https://openreview.net/attachment?id=1mwJlHsS19&name=pdf"
]

papers = [
    "RISeg.pdf",
    "Online_3D_Edge_Reconstructi.pdf",
    "KnotDLO_Toward_Interpretabl.pdf",
    "Incorporating_Foundation_Model.pdf",
    "Distilling_Semantic_Feature.pdf",
]

In [12]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: RISeg.pdf
Getting tools for paper: Online_3D_Edge_Reconstructi.pdf
Getting tools for paper: KnotDLO_Toward_Interpretabl.pdf
Getting tools for paper: Incorporating_Foundation_Model.pdf
Getting tools for paper: Distilling_Semantic_Feature.pdf


### Extend the Agent with Tool Retrieval

In [13]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [14]:
# Build an "object" index over the tools; the retriever is created from it in the next cell
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [15]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [16]:
tools = obj_retriever.retrieve(
    "Why is the learning rate a critical hyperparameter in gradient descent optimization?"
)

In [17]:
tools[2].metadata

ToolMetadata(description='Useful for summarization questions related to Online_3D_Edge_Reconstructi', name='summary_tool_Online_3D_Edge_Reconstructi', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
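`ObjectIndex.from_objects(..., index_cls=VectorStoreIndex)` indexes each tool, roughly by its name and description, with the configured embedding model. `as_retriever(similarity_top_k=3)` then returns the three tools whose indexed text is most similar to the query. The toy below mimics that ranking. It uses bag-of-words cosine similarity in place of real embeddings, so its scores will not match the notebook's. The summary-tool description follows the metadata printed in In [17]. The vector-tool description is a hypothetical placeholder. The toy also shows why the learning-rate query in In [16] still returned three tools: top-k retrieval always returns k results, however weak the match.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query, tool_descriptions, top_k=3):
    """Return the names of the top_k tools whose description is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(q, bag_of_words(tool_descriptions[name])),
        reverse=True,  # sorted() stays stable with reverse=True, so ties keep insertion order
    )
    return ranked[:top_k]


stems = ["RISeg", "Online_3D_Edge_Reconstructi", "KnotDLO_Toward_Interpretabl"]
descriptions = {}
for stem in stems:
    descriptions[f"vector_tool_{stem}"] = f"Useful for specific questions over {stem}"  # hypothetical text
    descriptions[f"summary_tool_{stem}"] = f"Useful for summarization questions related to {stem}"

print(retrieve_tools("summarization questions related to RISeg", descriptions))

# A query unrelated to every description still yields top_k tools (all scores are 0.0 here)
unrelated = "Why is the learning rate a critical hyperparameter in gradient descent optimization?"
print(len(retrieve_tools(unrelated, descriptions)))  # 3
```

A minimum-similarity cutoff would be one way to return fewer than k tools for off-topic queries. The toy omits it to match the plain top-k behavior used above.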

In [18]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)
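When the worker receives `tool_retriever=obj_retriever` instead of a fixed tool list, it looks up candidate tools for each task rather than sending all ten tool schemas to the LLM. The pure-Python sketch below illustrates that control flow only. It is not `FunctionCallingAgentWorker`'s actual implementation. `retrieve` stands in for the object retriever and `choose` for the LLM's function-calling decision. Both, along with the fake tools, are hypothetical.

```python
from typing import Callable, Dict, List


def run_step(
    query: str,
    tools: Dict[str, Callable[[str], str]],
    retrieve: Callable[[str], List[str]],
    choose: Callable[[str, List[str]], str],
) -> str:
    """Illustrative agent step: shortlist tools by retrieval, let the model pick one, call it."""
    # 1. Only the retrieved tools are offered to the model for this query
    shortlist = [name for name in retrieve(query) if name in tools]
    if not shortlist:
        raise LookupError("retriever returned no known tools")
    # 2. The model may only call a tool from the shortlist
    picked = choose(query, shortlist)
    if picked not in shortlist:
        raise ValueError(f"{picked!r} was not offered to the model")
    # 3. Run the chosen tool and return its output
    return tools[picked](query)


# Hypothetical fakes for a dry run; default args pin kind/stem for each lambda
fake_tools = {
    f"{kind}_tool_{stem}": (lambda q, k=kind, s=stem: f"{k} answer from {s}")
    for stem in ["RISeg", "KnotDLO_Toward_Interpretabl"]
    for kind in ["vector", "summary"]
}


def fake_retrieve(query):
    return ["summary_tool_RISeg", "vector_tool_RISeg"]


def fake_choose(query, shortlist):
    return shortlist[0]


print(run_step("Summarize RISeg", fake_tools, fake_retrieve, fake_choose))  # summary answer from RISeg
```

This design trades prompt size for recall. With ten tools and `similarity_top_k=3`, the model sees three tool schemas per request instead of ten, but it cannot call the right tool if retrieval ranks that tool below the cutoff.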

In [19]:
response = agent.query(
    "How does the choice of loss function impact the training objective of a model? "
    "Why is cross-validation superior to a single train-test split for generalization estimation?"
)
print(str(response))

Added user message to memory: How does the choice of loss function impact the training objective of a model?Why is cross-validation superior to a single train-test split for generalization estimation?
=== Calling Function ===
Calling function: summary_tool_Incorporating_Foundation_Model with args: {"input": "The choice of loss function impacts the training objective of a model by defining the measure of dissimilarity between the predicted output and the actual target. Different loss functions lead to different optimization objectives during training, influencing how the model learns and generalizes. For example, using mean squared error loss prioritizes minimizing the squared differences between predictions and targets, while cross-entropy loss is commonly used for classification tasks to optimize the model's probability distribution predictions."}
=== Function Output ===
The selection of a specific loss function significantly influences the training objective of a model by specifying 

In [20]:
response = agent.query(
    "Compare and contrast the IEEE papers. "
    "Analyze the approach in each paper first."
)

Added user message to memory: Compare and contrast the IEEE papersAnalyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_Distilling_Semantic_Feature with args: {"input": "Analyze the approach in the paper."}
=== Function Output ===
The paper delves into utilizing vision foundation models to extract semantic information from RGB images to enhance 3D representations of cloth-like deformable objects. It evaluates the effectiveness of models like Grounded SAM and DINOv2 in tasks such as semantic segmentation and dense feature extraction for cloth manipulation. The study highlights the challenges faced in accurately tracking keypoints and representing deformable cloth objects due to their nature. It suggests potential directions for future research to improve dense descriptors for keypoint tracking on deformable objects. Additionally, the paper discusses the limitations of current models when dealing with deformable objects and proposes the use of