
In [2]:
## REF - https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/2/router-query-engine

In [3]:
# Download the file dynamically from GitHub
!wget -O agentic_rag_llamaindex_requirements.txt https://raw.githubusercontent.com/CalvHobbes/ai-agents/main/agentic_rag_llamaindex_requirements.txt




--2025-01-23 08:46:59--  https://raw.githubusercontent.com/CalvHobbes/ai-agents/main/agentic_rag_llamaindex_requirements.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 221 [text/plain]
Saving to: ‘agentic_rag_llamaindex_requirements.txt’


2025-01-23 08:47:00 (2.09 MB/s) - ‘agentic_rag_llamaindex_requirements.txt’ saved [221/221]



In [None]:
# Install the requirements
!pip install -r agentic_rag_llamaindex_requirements.txt

In [5]:
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

--2025-01-23 08:47:10--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘metagpt.pdf’


2025-01-23 08:47:11 (42.2 MB/s) - ‘metagpt.pdf’ saved [16911937/16911937]



In [6]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
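
The `SentenceSplitter` call above packs whole sentences into chunks of bounded size. A minimal sketch of that idea follows, assuming a word budget in place of the token budget the real splitter uses and ignoring chunk overlap. `chunk_sentences` is a hypothetical helper for illustration, not part of LlamaIndex.

```python
import re
from typing import List


def chunk_sentences(text: str, max_words: int = 50) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Assumption: words approximate tokens. A single sentence longer than
    max_words still becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip empty fragments (e.g. empty input)
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


toy_chunks = chunk_sentences("One two three. Four five six. Seven eight nine.", max_words=6)
```

Keeping sentences intact means a chunk never ends mid-thought, which tends to help both embedding quality and summarization.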


In [7]:
import nest_asyncio

nest_asyncio.apply()

In [8]:
from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAPI_KEY')

In [115]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

In [10]:
model = Settings.llm

In [11]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

In [12]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [13]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

In [14]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
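
The router above sends each query to exactly one tool, chosen from the tool descriptions. A toy sketch of that control flow follows, assuming word overlap as a crude stand-in for the LLM's judgment. `ToyTool`, `toy_select`, and `toy_route` are hypothetical names, not LlamaIndex APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    """Hypothetical stand-in for a query engine tool: a description plus a callable."""
    description: str
    run: Callable[[str], str]


def toy_select(query: str, tools: List[ToyTool]) -> int:
    """Return the index of the tool whose description shares the most words with the query.

    Assumption: word overlap replaces the LLM call a real selector would make.
    """
    query_words = set(query.lower().split())
    overlaps = [len(query_words & set(t.description.lower().split())) for t in tools]
    return overlaps.index(max(overlaps))  # ties resolve to the first tool


def toy_route(query: str, tools: List[ToyTool]) -> str:
    """Pick exactly one tool, then forward the query to it."""
    return tools[toy_select(query, tools)].run(query)


toy_tools = [
    ToyTool("summarization questions about the whole paper", lambda q: "summary engine"),
    ToyTool("retrieving specific context from the paper", lambda q: "vector engine"),
]
```

The sketch makes one point visible: routing quality depends entirely on how well the descriptions separate the tools, which is why the two descriptions in the cell above are worded so differently.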

In [15]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: This choice indicates that the document is useful for summarization questions related to MetaGPT..
The document introduces MetaGPT, a meta-programming framework that enhances multi-agent systems based on Large Language Models (LLMs) through role specialization, workflow management, and efficient communication mechanisms. It includes agents like Product Managers, Architects, Engineers, and QA Engineers, each with specific roles. MetaGPT utilizes an executable feedback mechanism to improve code quality iteratively during runtime and outperforms existing approaches in code generation tasks. The document also provides insights into the development process of a software application, specifically a "Drawing App," outlining requirements, UI design, implementation approach, required Python third-party packages, logic analysis, tasks breakdown, shared knowledge, and the involvement of various agents in the development cycle. It discusses the performa

In [16]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it specifically mentions retrieving specific context, which is necessary for understanding how agents share information with other agents..
Agents share information with other agents by utilizing a shared message pool where they can publish structured messages and subscribe to relevant messages based on their profiles. This shared message pool allows all agents to exchange messages directly, access messages from other entities transparently, and retrieve required information without the need to inquire about other agents individually.


In [17]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: Ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results show that MetaGPT effectively addresses challenges related to information overload and reduces hallucinations in software generation. By utilizing a global message pool and a subscription mechanism, MetaGPT efficiently manages information flow and filters out irrelevant contexts, enhancing the relevance and utility of the information. This design is crucial in optimizing communication and ensuring that the generated software programs are accurate and free from hallucination issues.


In [18]:
response = query_engine.query("summarise ablation study")
print(str(response))

Selecting query engine 0: Ablation study is often used for summarization questions related to the performance of different components in a model like MetaGPT..
The ablation study examined the effects of different roles within the MetaGPT framework on software development outcomes. By systematically excluding roles, the study showed that role specialization improved software development processes by reducing revision costs and enhancing executability. Adding more roles led to better performance metrics. Additionally, the study evaluated MetaGPT's performance under various conditions, including initial input levels, detailed prompts, and different language models. Clear requirements and detailed prompts were found to enhance the quality of generated software projects, with specific language models like GPT-4 contributing to superior performance.


In [19]:
response = query_engine.query("who is gabriel cirulli")  # name only mentioned in an image in the MetaGPT PDF
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context, which would likely include information about Gabriel Cirulli if he is mentioned in the MetaGPT paper..
Gabriel Cirulli is not mentioned in the provided context information.


In [20]:
## Function Calling ##

In [21]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

add_tool = FunctionTool.from_defaults(add)
mystery_tool = FunctionTool.from_defaults(mystery)
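
Function calling works because the model sees each tool's name, parameters, and docstring, and answers with a tool name plus arguments. A rough sketch of how such a description could be derived from a plain Python function is shown here. `describe_function` is a hypothetical helper built on the standard `inspect` module and is not how LlamaIndex necessarily implements `FunctionTool`.

```python
import inspect
from typing import Any, Callable, Dict


def describe_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a minimal tool description from a function's signature and docstring."""
    signature = inspect.signature(fn)
    parameters = {
        # Use the annotation's class name when available (e.g. "int").
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


schema = describe_function(mystery)

# A model reply such as {"x": 9, "y": 8} is then dispatched as keyword arguments.
result = mystery(**{"x": 9, "y": 8})
```

Because the docstring is the only hint about what `mystery` does, vague docstrings lead to unreliable tool selection; type hints likewise tell the model which argument types to produce.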

In [22]:
Settings.llm.predict_and_call([add_tool, mystery_tool],
    "mystify 9 and 8",
    verbose=True)

=== Calling Function ===
Calling function: mystery with args: {"x": 9, "y": 8}
=== Function Output ===
289


AgentChatResponse(response='289', sources=[ToolOutput(content='289', tool_name='mystery', raw_input={'args': (), 'kwargs': {'x': 9, 'y': 8}}, raw_output=289, is_error=False)], source_nodes=[], is_dummy_stream=False, metadata=None)

In [None]:
print(nodes[33].get_content(metadata_mode="all"))

In [25]:
from llama_index.core import VectorStoreIndex
vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=4)


In [26]:
response = query_engine.query("What are some high-level results of MetaGPT?")
print(str(response))

MetaGPT outperforms all preceding approaches in both HumanEval and MBPP benchmarks. When MetaGPT collaborates with GPT-4, it significantly improves the Pass @k in the HumanEval benchmark compared to GPT-4. It achieves 85.9% and 87.7% pass rates on the MBPP and HumanEval with a single attempt. Additionally, MetaGPT achieves an average score of 3.9, surpassing ChatDev's score of 2.1, based on the Chat chain.


In [27]:
from llama_index.core.vector_stores import MetadataFilters
query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [{"key": "page_label", "value": "3"}]
    ),
)


In [28]:
response = query_engine.query(
    "What are some high-level results of MetaGPT?",
)
print(str(response))

MetaGPT demonstrates state-of-the-art performance on HumanEval and MBPP, showcasing its effectiveness as a meta-programming framework for developing LLM-based multi-agent systems. Additionally, MetaGPT integrates human-like Standard Operating Procedures (SOPs) to enhance robustness and reduce unproductive collaboration among LLM-based agents. The framework also introduces an executive feedback mechanism that improves code generation quality significantly during runtime.


In [29]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '3', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-01-23', 'last_modified_date': '2025-01-23'}
{'page_label': '3', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-01-23', 'last_modified_date': '2025-01-23'}


In [30]:
from typing import List
from llama_index.core.vector_stores import FilterCondition

def total_page_count() -> int:
  """
  Returns the total number of pages in the MetaGPT document. Only used to fetch the page count of the document.
  """
  return 29

def vector_query(query:str, page_numbers: List[str]) -> str:
  """Perform a vector search over an index.
  query (str): the string query to be embedded.
  page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
  over all pages. Otherwise, filter by the set of specified pages.
  """
  filters = [
      {"key": "page_label", "value": page} for page in page_numbers
  ]
  query_engine = vector_index.as_query_engine(
      similarity_top_k=2,
      filters=MetadataFilters.from_dicts(filters, condition=FilterCondition.OR),
  )
  response = query_engine.query(query)
  return response

vector_query_tool = FunctionTool.from_defaults(name="vector_query", fn=vector_query)
page_count_tool = FunctionTool.from_defaults(name="page_count_tool", fn=total_page_count)
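
The `vector_query` function combines one filter per page with `FilterCondition.OR`. The semantics can be sketched in plain Python, assuming metadata shaped like the `page_label` entries printed earlier. `toy_nodes` and `filter_by_pages` are hypothetical and only illustrate the matching rule, not the vector store's implementation.

```python
from typing import Dict, List

# Hypothetical node metadata; only page_label matters for the filter.
toy_nodes: List[Dict[str, str]] = [
    {"page_label": "1", "text": "abstract"},
    {"page_label": "2", "text": "introduction"},
    {"page_label": "3", "text": "related work"},
]


def filter_by_pages(nodes: List[Dict[str, str]], page_numbers: List[str]) -> List[Dict[str, str]]:
    """Keep nodes whose page_label equals ANY requested page (OR semantics).

    Assumption, following the docstring's intent: an empty list means no filtering.
    """
    if not page_numbers:
        return list(nodes)
    wanted = set(page_numbers)
    return [n for n in nodes if n["page_label"] in wanted]


pages_2_and_3 = [n["text"] for n in filter_by_pages(toy_nodes, ["2", "3"])]
```

Two details are worth noting. With AND instead of OR, a request for pages 2 and 3 could never match, since a node carries a single `page_label`. Page labels are also strings, so the integer `2` does not match `"2"`, which is why `page_numbers` is typed as `List[str]`.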

In [None]:
model.predict_and_call([vector_query_tool], "What are the high-level results of MetaGPT as described on the second page?", verbose=True)

In [32]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

# summary_index = SummaryIndex(nodes)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)


In [33]:
response = model.predict_and_call(
    [vector_query_tool, summary_tool],
    "What are the MetaGPT comparisons with ChatDev described on second page?",
    verbose=True
)

=== Calling Function ===
Calling function: vector_query with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["2"]}
=== Function Output ===
MetaGPT outperforms ChatDev in handling higher levels of software complexity and offering extensive functionality. In experimental evaluations, MetaGPT achieves a 100% task completion rate, showcasing the robustness and efficiency of its design in terms of time and token costs.


In [34]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-01-23', 'last_modified_date': '2025-01-23'}


In [35]:
response = model.predict_and_call(
    [vector_query_tool, summary_tool],
    "what is the summary of the paper?",
    verbose=True
)

=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "summary of the paper"}
=== Function Output ===
The paper introduces MetaGPT, a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). It incorporates role specialization, workflow management, and efficient communication mechanisms to improve problem-solving capabilities. MetaGPT employs an executable feedback mechanism to enhance code generation quality during runtime, outperforming previous approaches in various benchmarks. The paper discusses the development process of a software application called the "Drawing App" using MetaGPT, outlining tasks like designing a user-friendly GUI, implementing color selection functionality, displaying RGB values in real-time, and testing for accuracy and performance. It also addresses the performance of MetaGPT in generating executable code, the impact of different instruc

In [37]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [38]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool, page_count_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

In [39]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for generating the Product Requirement Document (PRD) and competitive analysis. The Architect designs the technical specifications, system architecture, and interface definitions. The Project Manager breaks down the project into tasks and assigns them to Engineers. The Engineer develops the code based on the specifications provided. The QA Engineer generates unit tests and reviews the code for bugs to ensure high-quality software. Each role plays a crucial part in the software development process within the MetaGPT framework.
=== Calling Function ===
Calling function: query_engine_tool with args:

In [40]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and SoftwareDev. HumanEval consisted of 164 handwritten programming tasks, MBPP included 427 Python tasks covering core concepts and standard library features, and SoftwareDev comprised 70 representative examples of software development tasks with diverse scopes such as mini-games, image processing algorithms, and data visualization.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and SoftwareDev. 

1. HumanEval: This dataset consisted of 164 handwritten programming tasks.
2. MBPP: This dataset included 427 Python tasks covering core concepts and standard library features.
3. SoftwareDev: This dataset comprised 70 representative examples of software development tasks wi

In [41]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Results over HumanEval dataset in MetaGPT"}
=== Function Output ===
MetaGPT achieved state-of-the-art performance on the HumanEval dataset, surpassing previous approaches in solving programming tasks. GPT-4 demonstrated higher sensitivity to prompts, code parsing, and post-processing compared to GPT-3.5-Turbo. Across various settings, GPT-4 consistently outperformed GPT-3.5-Turbo in terms of completion accuracy and reliability.
=== LLM Response ===
MetaGPT achieved state-of-the-art performance on the HumanEval dataset, surpassing previous approaches in solving programming tasks. GPT-4 demonstrated higher sensitivity to prompts, code parsing, and post-processing compared to GPT-3.5-Turbo. Across various settings, GPT-4 consistently outperformed GPT-3.5-Turbo in terms of completion accuracy and reliability.


In [42]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [43]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Agent roles in MetaGPT"}
=== Function Output ===
The roles in MetaGPT include the Product Manager responsible for creating the Product Requirement Document and analyzing user stories, the Architect who designs system architecture and technical specifications, the Project Manager who breaks down tasks and assigns them to Engineers, the Engineers who develop the code based on specifications, and the QA Engineer who generates unit tests and ensures software quality. Each agent has a specific role in the software development process within the MetaGPT framework.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Communication between agent roles in MetaGPT"}
=== Function Output ===
Communication between agent roles in MetaGPT is structured and efficient, with 

In [44]:
for key in dict(step_output).keys():
    print(key)

output
task_step
next_steps
is_last


In [45]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task 2c0a9e4c-2152-4315-a6c8-ee49feef75f0: 1
The roles in MetaGPT include the Product Manager responsible for creating the Product Requirement Document and analyzing user stories, the Architect who designs system architecture and technical specifications, the Project Manager who breaks down tasks and assigns them to Engineers, the Engineers who develop the code based on specifications, and the QA Engineer who generates unit tests and ensures software quality. Each agent has a specific role in the software development process within the MetaGPT framework.


In [46]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task 2c0a9e4c-2152-4315-a6c8-ee49feef75f0: 1


TaskStep(task_id='2c0a9e4c-2152-4315-a6c8-ee49feef75f0', step_id='101cc444-fa57-4dd5-ae3a-c710b5701e12', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

In [47]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "How agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol involving publishing structured messages in a shared message pool and subscribing to relevant messages based on their profiles. This method allows for efficient exchange of information without direct one-to-one communication, enhancing communication efficiency. The subscription mechanism enables agents to select and follow information based on their role profiles, ensuring they receive only task-relevant information and avoid distractions from irrelevant details.


In [48]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
Agents in MetaGPT share information through a structured communication protocol. They publish structured messages in a shared message pool and subscribe to relevant messages based on their profiles. This method allows for efficient exchange of information without direct one-to-one communication, enhancing communication efficiency. The subscription mechanism enables agents to select and follow information based on their role profiles, ensuring they receive only task-relevant information and avoid distractions from irrelevant details.
True


In [49]:
response = agent.finalize_response(task.task_id)
print(str(response))

Agents in MetaGPT share information through a structured communication protocol. They publish structured messages in a shared message pool and subscribe to relevant messages based on their profiles. This method allows for efficient exchange of information without direct one-to-one communication, enhancing communication efficiency. The subscription mechanism enables agents to select and follow information based on their role profiles, ensuring they receive only task-relevant information and avoid distractions from irrelevant details.
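
The manual sequence above (`create_task`, repeated `run_step`, then `finalize_response`) can be wrapped in a loop. The sketch below relies only on the three methods and the `is_last` attribute used in the preceding cells. `run_task_stepwise` and `StubAgent` are hypothetical; the stub exists so the loop can run without an LLM.

```python
from types import SimpleNamespace


def run_task_stepwise(agent, prompt: str, max_steps: int = 10):
    """Drive a task one step at a time until the agent reports the last step."""
    task = agent.create_task(prompt)
    for _ in range(max_steps):
        step_output = agent.run_step(task.task_id)
        if step_output.is_last:
            return agent.finalize_response(task.task_id)
    # Guard against an agent that keeps calling tools without finishing.
    raise RuntimeError(f"Task {task.task_id} did not finish within {max_steps} steps")


class StubAgent:
    """Hypothetical stand-in exposing only the three methods used above."""

    def __init__(self, steps_needed: int):
        self.steps_needed = steps_needed
        self.steps_run = 0

    def create_task(self, prompt: str):
        return SimpleNamespace(task_id="stub-task", input=prompt)

    def run_step(self, task_id: str):
        self.steps_run += 1
        return SimpleNamespace(is_last=self.steps_run >= self.steps_needed)

    def finalize_response(self, task_id: str):
        return f"finished after {self.steps_run} steps"


stub_result = run_task_stepwise(StubAgent(steps_needed=3), "example prompt")
```

Stepping manually, as the notebook does, is still useful when you want to inspect intermediate tool outputs or inject extra user input between steps; the loop is the hands-off variant.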


## Multi Document RAG

In [50]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [51]:
# import requests
# for url, filename in zip(urls, papers):
#     response = requests.get(url)
#     response.raise_for_status()  # Raise an exception for bad responses

#     # Save the PDF to the 'pdfs' directory
#     filepath = os.path.join("", filename)
#     with open(filepath, "wb") as f:
#         f.write(response.content)

#     print(f"Downloaded {filename} from {url}")

# print("All PDFs downloaded successfully!")
for url, paper in zip(urls, papers):
    !wget "{url}" -O "{paper}"

Downloaded metagpt.pdf from https://openreview.net/pdf?id=VtmBAGCN7o
Downloaded longlora.pdf from https://openreview.net/pdf?id=6PmJoRfdaK
Downloaded selfrag.pdf from https://openreview.net/pdf?id=hSyW5go0v8
All PDFs downloaded successfully!


In [52]:
from typing import List, Optional, Tuple
from llama_index.core.vector_stores import FilterCondition

def get_doc_tools(
    file_path: str,
    name: str,
) -> Tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document."""

    # load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)

    def vector_query(
        query: str,
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over a given paper.

        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.

        """

        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        return response


    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )

    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            f"Useful for summarization questions related to {name}"
        ),
    )

    return vector_query_tool, summary_tool

In [53]:
from pathlib import Path
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: selfrag.pdf


In [55]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [116]:
multi_doc_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=model,
    verbose=True
)
multi_doc_agent = AgentRunner(multi_doc_worker)

In [117]:
response = multi_doc_agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation dataset"}
=== Function Output ===
PG19 test split
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation results"}
=== Function Output ===
The evaluation results show that the models achieve better perplexity with longer context sizes. Increasing the context window size leads to improved perplexity scores. Additionally, the models are fine-tuned on different context lengths, such as 100k, 65536, and 32768, and achieve promising results on these large settings. However, there is some perplexity degradation observed on small context sizes for the extended models, which is a known limitation of Position Interpolation.
=== LLM Response ===
The evaluation dataset used in LongLoRA is the PG19 test split. 

Regarding the evalua

In [118]:
response = multi_doc_agent.query("Give me a summary of both Self-RAG and LongLoRA")
# print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of large language models by utilizing retrieval on demand and self-reflection. It involves training a single LM to retrieve, generate, and critique text passages using reflection tokens. This framework allows for customization of LM behaviors at test time, leading to improved performance, factuality, and citation accuracy compared to other models. Additionally, Self-RAG evaluates text generation outputs based on fine-grained aspects like factual relevance, supportiveness, and overall usefulness, aiming to ensure that generated text is factually accurate, well-supported by evidence, and provides informative answers to queries.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== 

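## Multi Doc Solution
Passing every tool directly to the LLM does not scale to a large number of documents: each tool's name and description is sent with the prompt, so the tool list eventually runs into the model's context length limit. Instead, apply RAG to the tools themselves: index them with LlamaIndex's `ObjectIndex` and retrieve only the most relevant tools for each query.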
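
Retrieving tools works like retrieving text chunks: score each tool description against the query and keep the top k. A toy sketch follows, assuming bag-of-words cosine similarity in place of real embeddings. `bag_of_words`, `cosine`, `retrieve_tools`, and `toy_descriptions` are hypothetical names, with descriptions modelled on the ones `get_doc_tools` generates.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lowercase, whitespace-split word counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, descriptions: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = bag_of_words(query)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(q, bag_of_words(descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


toy_descriptions = {
    "summary_tool_metagpt": "Useful for summarization questions related to metagpt",
    "summary_tool_longlora": "Useful for summarization questions related to longlora",
    "summary_tool_loftq": "Useful for summarization questions related to loftq",
    "summary_tool_swebench": "Useful for summarization questions related to swebench",
}

best_tool = retrieve_tools("Tell me about longlora", toy_descriptions, top_k=1)
```

Only the retrieved tools are handed to the agent for a given query, so the prompt size depends on `top_k` rather than on the total number of documents.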

In [119]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf"
]

In [None]:
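# zip() stops at the shorter list: only the first 9 URLs are downloaded,
# and the last two URLs (which have no filename in `papers`) are skipped.
for url, paper in zip(urls, papers):
    !wget "{url}" -O "{paper}"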

In [133]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: loftq.pdf
Getting tools for paper: swebench.pdf
Getting tools for paper: selfrag.pdf
Getting tools for paper: zipformer.pdf
Getting tools for paper: values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf


In [138]:
for vector_tool, summary_tool in paper_to_tools_dict.values():
    print(vector_tool.metadata)
    print(summary_tool.metadata)

ToolMetadata(description='vector_tool_metagpt(query: str, page_numbers: Optional[List[str]] = None) -> str\nUse to answer questions over a given paper.\n    \n        Useful if you have specific questions over the paper.\n        Always leave page_numbers as None UNLESS there is a specific page you want to search for.\n    \n        Args:\n            query (str): the string query to be embedded.\n            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE \n                if we want to perform a vector search\n                over all pages. Otherwise, filter by the set of specified pages.\n        \n        ', name='vector_tool_metagpt', fn_schema=<class 'llama_index.core.tools.utils.vector_tool_metagpt'>, return_direct=False)
ToolMetadata(description='Useful for summarization questions related to metagpt', name='summary_tool_metagpt', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
ToolMetadata(description='vector

In [139]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [None]:
for t in all_tools:
    print(t.metadata)

In [142]:
from llama_index.core.objects import ObjectIndex

tools_index = ObjectIndex.from_objects(all_tools,
                                       index_cls=VectorStoreIndex)

obj_retriever = tools_index.as_retriever(similarity_top_k=3)


In [None]:
# # TO DEBUG Index contents
# # Retrieve all node IDs from the docstore
# doc_store = tools_index.index.docstore
# node_ids = list(doc_store.docs.keys())

# # Fetch the nodes using the retrieved node IDs
# nodes = doc_store.get_nodes(node_ids)

# # Print the text content of each node
# for node in nodes:
#     print(node.get_text())



In [144]:

# specific_tools = obj_retriever.retrieve(
#     "Tell me about longlora"
# )


In [None]:
# [print(t.metadata) for t in specific_tools]

In [146]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [147]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "Analyzing the approach in the LongLoRA paper."}
=== Function Output ===
The approach in the LongLoRA paper introduces an efficient fine-tuning method that extends the context length of large language models by combining the LoRA method with shifted sparse attention during training. This approach enables models to be fine-tuned with minimal accuracy compromise while emphasizing the importance of trainable normalization and embedding layers for successful long context adaptation. Additionally, the paper introduces the Action Units Relation Learning framework, which includes the ART encoder for capturing intra-face relations for forgery detection and the TAP process for generating challenging pseudo-samples to enhance model generalization. The paper achieves state-of-the-art p

In [148]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT includes the HumanEval benchmark, the MBPP benchmark, and the SoftwareDev dataset. The HumanEval benchmark consists of 164 handwritten programming tasks, the MBPP benchmark comprises 427 Python tasks, and the SoftwareDev dataset contains 70 representative examples of software development tasks with diverse scopes.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-Bench"}
=== Function Output ===
The evaluation dataset used in SWE-Bench is constructed by scraping pull requests from the top 100 most downloaded PyPI libraries. Task instances are created from merged pull requests that resolve issues and introduce new tests. The