<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/agent/multi_document_agents-v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Document Agents (V1)

In this guide, you learn how to set up a multi-document agent over the LlamaIndex documentation.

This is an extension of V0 multi-document agents with the following additional features:
- Reranking during document (tool) retrieval
- Query planning tool that the agent can use to plan


We do this with the following architecture:

- Set up a "document agent" over each Document: each doc agent can do QA/summarization within its doc.
- Set up a top-level agent over this set of document agents. It does tool retrieval and then chain-of-thought (CoT) reasoning over the retrieved tools to answer a question.
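
As a rough mental model of this two-level design, the sketch below uses plain Python stand-ins and makes no LlamaIndex calls. `DocAgent` and `TopAgent` are hypothetical names, and word overlap is only an assumed substitute for the embedding-based tool retrieval and LLM reasoning used in the notebook itself.

```python
from dataclasses import dataclass


@dataclass
class DocAgent:
    """Hypothetical stand-in for a per-document agent."""

    name: str
    text: str

    def answer(self, question: str) -> str:
        # A real document agent would choose between semantic search and
        # summarization; here we simply return the document text.
        return f"[{self.name}] {self.text}"


class TopAgent:
    """Hypothetical top-level agent: retrieve a few doc agents, then ask them."""

    def __init__(self, agents, top_k=2):
        self.agents = agents
        self.top_k = top_k

    def _score(self, question, agent):
        # Toy relevance: number of words shared by question and document.
        q_words = set(question.lower().split())
        return len(q_words & set(agent.text.lower().split()))

    def query(self, question):
        ranked = sorted(
            self.agents, key=lambda a: self._score(question, a), reverse=True
        )
        return [a.answer(question) for a in ranked[: self.top_k]]


agents = [
    DocAgent("connectors", "data connectors ingest data from apis and pdfs"),
    DocAgent("indexes", "indexes structure data for llms"),
    DocAgent("install", "install the package with pip"),
]
top_agent_sketch = TopAgent(agents, top_k=1)
answers = top_agent_sketch.query("what do data connectors ingest")
print(answers)
```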

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [80]:
!pip install llama_index llama-hub unstructured



In [81]:
!pip install cohere



In [82]:
!pip install llama_hub



In [83]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Setup and Download Data

In this section, we'll load in the LlamaIndex documentation.

In [84]:
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}

Streaming output truncated to the last 5000 lines.
Length: unspecified [text/html]
Saving to: ‘docs.llamaindex.ai/en/latest/module_guides/supporting_modules/supporting_modules.html’

docs.llamaindex.ai/     [ <=>                ] 216.28K  --.-KB/s    in 0.007s  

2024-01-28 16:47:50 (31.9 MB/s) - ‘docs.llamaindex.ai/en/latest/module_guides/supporting_modules/supporting_modules.html’ saved [221467]

--2024-01-28 16:47:50--  https://docs.llamaindex.ai/en/latest/module_guides/supporting_modules/service_context.html
Reusing existing connection to docs.llamaindex.ai:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘docs.llamaindex.ai/en/latest/module_guides/supporting_modules/service_context.html’

docs.llamaindex.ai/     [ <=>                ] 226.15K  --.-KB/s    in 0.005s  

2024-01-28 16:47:50 (40.9 MB/s) - ‘docs.llamaindex.ai/en/latest/module_guides/supporting_modules/service_context.html’ saved [231577]

--2024-01-28 16:47:50

In [85]:
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
from llama_index.llms import OpenAI
from llama_index import ServiceContext
from llama_index import download_loader, SimpleDirectoryReader

In [86]:
import os
import openai
import cohere

os.environ["OPENAI_API_KEY"] = "Your API Key goes here"
os.environ["COHERE_API_KEY"] = "Your API Key goes here"

In [87]:
reader = UnstructuredReader()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [88]:
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]

In [89]:
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]

In [90]:
len(all_html_files)

703
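
The discovery step can be tried in isolation. This is a minimal sketch that builds a throwaway directory tree (the file names are made up) and applies the same `rglob` plus suffix filter as the cells above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "en" / "latest").mkdir(parents=True)
    (root / "en" / "latest" / "index.html").write_text("<html></html>")
    (root / "en" / "latest" / "GUIDE.HTML").write_text("<html></html>")
    (root / "en" / "latest" / "style.css").write_text("body {}")

    # rglob("*") yields directories as well as files; the suffix filter keeps
    # only paths ending in .html, compared case-insensitively.
    found = sorted(p.name for p in root.rglob("*") if p.suffix.lower() == ".html")

print(found)  # ['GUIDE.HTML', 'index.html']
```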

In [91]:
from llama_index import Document

reader = UnstructuredReader()

# TODO: set to higher value if you want more docs
doc_limit = 100

docs = []
for idx, f in enumerate(all_html_files):
    if idx > doc_limit:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded index: everything before this is the table of contents shared by all pages
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


Idx 0/703
/content/docs.llamaindex.ai/en/latest/search.html
Idx 1/703
/content/docs.llamaindex.ai/en/latest/genindex.html
Idx 2/703
/content/docs.llamaindex.ai/en/latest/index.html
Idx 3/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.vector_stores.TairVectorStore.html
Idx 4/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.vector_stores.SimpleVectorStore.html
Idx 5/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.vector_stores.MyScaleVectorStore.html
Idx 6/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.readers.MetalReader.html
Idx 7/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.vector_stores.SingleStoreVectorStore.html
Idx 8/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.readers.PineconeReader.html
Idx 9/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.readers.SimpleMongoReader.html
Idx 10/703
/content/docs.llamaindex.ai/en/latest/api/llama_index.readers.PathwayReader.html
Idx 11/703
/content/docs.llamaindex.ai/en/l
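
One detail of the loading loop above is worth noting: because it breaks only when `idx > doc_limit`, index `doc_limit` itself is still processed, so the loop collects `doc_limit + 1` documents. That is consistent with the `0/101` progress bar shown when the agents are built. The sketch below reproduces the counting with placeholder file names (hypothetical) and shows an `itertools.islice` variant for readers who want exactly `doc_limit` documents.

```python
from itertools import islice

files = [f"page_{i}.html" for i in range(703)]  # placeholder names
doc_limit = 100

# Same loop shape as in the notebook cell.
loaded = []
for idx, f in enumerate(files):
    if idx > doc_limit:
        break
    loaded.append(f)

# Alternative that stops after exactly doc_limit items.
exact = list(islice(files, doc_limit))

print(len(loaded), len(exact))  # 101 100
```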

Define LLM + Service Context

In [92]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

## Building Multi-Document Agents

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

In [93]:
from llama_index import VectorStoreIndex, SummaryIndex

In [94]:
import nest_asyncio

nest_asyncio.apply()

### Build Document Agent for each Document

In this section we define "document agents" for each document.

We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each documentation page.
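
To make the tool-selection idea concrete, here is a toy sketch in plain Python. `Tool` and `pick_tool` are hypothetical names, and the keyword rule is only a stand-in: in the notebook, the choice is made by the LLM through function calling, based on each tool's name and description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def pick_tool(question: str, tools: list) -> Tool:
    # Stand-in for the LLM's decision: route "summar..." questions to the
    # summary tool and everything else to the vector (fact lookup) tool.
    wants_summary = "summar" in question.lower()
    for tool in tools:
        if tool.name.startswith("summary_tool") == wants_summary:
            return tool
    raise ValueError("no suitable tool found")


tools = [
    Tool(
        "vector_tool_demo",
        "Useful for questions related to specific facts",
        lambda q: f"fact lookup for: {q}",
    ),
    Tool(
        "summary_tool_demo",
        "Useful for summarization questions",
        lambda q: f"summary for: {q}",
    ),
]

print(pick_tool("Summarize this page", tools).name)  # summary_tool_demo
print(pick_tool("What is the default chunk size?", tools).name)  # vector_tool_demo
```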

In [95]:
from llama_index.agent import OpenAIAgent
from llama_index import load_index_from_storage, StorageContext
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes, service_context=service_context)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
            service_context=service_context,
        )

    # build summary index
    summary_index = SummaryIndex(nodes, service_context=service_context)

    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize"
    )

    # extract a summary
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        with open(summary_out_path, "wb") as f:
            pickle.dump(summary, f)
    else:
        with open(summary_out_path, "rb") as f:
            summary = pickle.load(f)

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description=f"Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description=f"Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):
    node_parser = SentenceSplitter()

    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # ID will be base + parent
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
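
The build-or-load pattern used for the summaries can be isolated as follows. This is a minimal sketch under stated assumptions: `cached_summary` and `compute` are hypothetical names, a temporary directory replaces `./data/llamaindex_docs/`, and a plain function replaces the LLM call.

```python
import pickle
import tempfile
from pathlib import Path


def cached_summary(path: Path, compute) -> str:
    """Return the pickled summary at `path`, computing and saving it if absent."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    path.parent.mkdir(parents=True, exist_ok=True)
    summary = compute()
    with path.open("wb") as f:
        pickle.dump(summary, f)
    return summary


calls = []


def compute():
    # Stand-in for the expensive summary query.
    calls.append(1)
    return "A short summary."


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "llamaindex_docs" / "latest_index_summary.pkl"
    first = cached_summary(out_path, compute)
    second = cached_summary(out_path, compute)

print(first == second, len(calls))  # True 1
```

The second call reads the pickle instead of recomputing, which is why rerunning the notebook does not repeat the summary queries for documents that were already processed.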

In [96]:
agents_dict, extra_info_dict = await build_agents(docs)

  0%|          | 0/101 [00:00<?, ?it/s]

latest_search
latest_genindex
latest_index
api_llama_index.vector_stores.TairVectorStore
api_llama_index.vector_stores.SimpleVectorStore
api_llama_index.vector_stores.MyScaleVectorStore
api_llama_index.readers.MetalReader
api_llama_index.vector_stores.SingleStoreVectorStore
api_llama_index.readers.PineconeReader
api_llama_index.readers.SimpleMongoReader
api_llama_index.readers.PathwayReader
api_llama_index.readers.ChromaReader
api_llama_index.vector_stores.CassandraVectorStore
api_llama_index.vector_stores.ExactMatchFilter
api_llama_index.readers.TxtaiReader
api_llama_index.vector_stores.ChromaVectorStore
api_llama_index.node_parser.LangchainNodeParser
api_llama_index.node_parser.SentenceSplitter
api_llama_index.schema.QueryBundle
api_llama_index.vector_stores.PineconeVectorStore
api_llama_index.vector_stores.RedisVectorStore
api_llama_index.node_parser.HierarchicalNodeParser
api_llama_index.readers.RssReader
api_llama_index.vector_stores.TimescaleVectorStore
api_llama_index.vector_sto
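
The names printed above follow from how `file_base` is derived: the parent directory name, an underscore, then the file name without its final suffix. A small sketch using `PurePosixPath` (chosen here so the result does not depend on the host OS; `make_file_base` is a hypothetical helper name):

```python
from pathlib import PurePosixPath


def make_file_base(path_str: str) -> str:
    path = PurePosixPath(path_str)
    # Parent directory name + "_" + file name without its final suffix.
    return f"{path.parent.stem}_{path.stem}"


print(make_file_base("/content/docs.llamaindex.ai/en/latest/search.html"))
# latest_search
print(
    make_file_base(
        "/content/docs.llamaindex.ai/en/latest/api/llama_index.readers.MetalReader.html"
    )
)
# api_llama_index.readers.MetalReader
```

Because only the immediate parent directory is used, two pages with the same file name under identically named parent directories would map to the same key.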

### Build Retriever-Enabled OpenAI Agent

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This agent (a `ReActAgent` configured with a tool retriever in the code below) performs tool retrieval before tool use (unlike a default agent that tries to put all tools in the prompt).

**Improvements from V0**: We make the following improvements compared to the "base" version in V0.

- Adding in reranking: we use Cohere reranker to better filter the candidate set of documents.
- Adding in a query planning tool: we add an explicit query planning tool that's dynamically created based on the set of retrieved tools.
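
The retrieval flow for the top-level agent can be sketched in plain Python: a cheap first-stage score selects 10 candidates, a second scorer reranks them down to 5, and one extra comparison tool is appended. Both scoring functions below are toy stand-ins (assumptions) for embedding similarity and the Cohere reranker, and all tool names and descriptions are made up.

```python
def first_stage_score(query: str, description: str) -> int:
    # Toy stand-in for embedding similarity: count shared words.
    return len(set(query.lower().split()) & set(description.lower().split()))


def rerank_score(query: str, description: str) -> float:
    # Toy stand-in for a reranker: shared words normalized by description length.
    words = description.lower().split()
    return first_stage_score(query, description) / max(len(words), 1)


def retrieve_tools(query, tools, top_k=10, top_n=5):
    # Stage 1: keep the top_k candidates by the cheap score.
    candidates = sorted(
        tools, key=lambda t: first_stage_score(query, t[1]), reverse=True
    )[:top_k]
    # Stage 2: rerank the candidates and keep the top_n.
    reranked = sorted(
        candidates, key=lambda t: rerank_score(query, t[1]), reverse=True
    )[:top_n]
    # Always append one tool meant for multi-document comparison queries.
    compare_tool = ("compare_tool", "Useful for comparing multiple documents")
    return reranked + [compare_tool]


tools = [(f"tool_{i}", f"page {i} about vector stores and readers") for i in range(20)]
tools.append(("tool_rerank", "cohere rerank postprocessor"))

selected = retrieve_tools("how does cohere rerank work", tools)
print(len(selected), selected[0][0], selected[-1][0])  # 6 tool_rerank compare_tool
```

With `top_n=5` plus the comparison tool, six tools come back, which matches the count printed by the retriever check later in this section.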


In [97]:
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)

In [98]:
print(all_tools[0].metadata)

ToolMetadata(description='This document provides information on how to search for specific content within the Llama Index AI documentation.', name='tool_latest_search', fn_schema=<class 'llama_index.tools.types.DefaultToolFnSchema'>)


In [99]:
# define an "object" index and retriever over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import (
    ObjectIndex,
    SimpleToolNodeMapping,
    ObjectRetriever,
)
from llama_index.retrievers import BaseRetriever
from llama_index.postprocessor import CohereRerank
from llama_index.tools import QueryPlanTool
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4-0613")

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=10)


# define a custom retriever with reranking
class CustomRetriever(BaseRetriever):
    def __init__(self, vector_retriever, postprocessor=None):
        self._vector_retriever = vector_retriever
        self._postprocessor = postprocessor or CohereRerank(top_n=5)
        super().__init__()

    def _retrieve(self, query_bundle):
        retrieved_nodes = self._vector_retriever.retrieve(query_bundle)
        filtered_nodes = self._postprocessor.postprocess_nodes(
            retrieved_nodes, query_bundle=query_bundle
        )

        return filtered_nodes


# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(self, retriever, object_node_mapping, all_tools, llm=None):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")

    def retrieve(self, query_bundle):
        nodes = self._retriever.retrieve(query_bundle)
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_sc = ServiceContext.from_defaults(llm=self._llm)
        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, service_context=sub_question_sc
        )
        sub_question_description = f"""\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]

In [100]:
custom_node_retriever = CustomRetriever(vector_node_retriever)

# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
    custom_node_retriever, tool_mapping, all_tools, llm=llm
)

In [101]:
tmps = custom_obj_retriever.retrieve("hello")
print(len(tmps))

6


In [102]:
from llama_index.agent import ReActAgent

top_agent = ReActAgent.from_tools(
    tool_retriever=custom_obj_retriever,
    system_prompt="""\
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
    llm=llm,
    verbose=True,
)



### Define Baseline Vector Store Index

As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.

We set `similarity_top_k=4`.

In [103]:
all_nodes = [
    n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]

In [104]:
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)
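
Conceptually, the baseline's retrieval step scores every node against the query and keeps the best four. The sketch below illustrates that idea with hand-written 2-D vectors and cosine similarity. It is an assumption-laden illustration, not how LlamaIndex implements it: in the notebook, embeddings come from a model and the vector store performs the search.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, node_vecs, k=4):
    # Score every node against the query and keep the k best node ids.
    scored = sorted(
        node_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True
    )
    return [node_id for node_id, _ in scored[:k]]


# Made-up node embeddings.
node_vecs = {
    "n0": [1.0, 0.0],
    "n1": [0.9, 0.1],
    "n2": [0.0, 1.0],
    "n3": [0.7, 0.7],
    "n4": [-1.0, 0.0],
    "n5": [0.5, 0.9],
}
print(top_k([1.0, 0.0], node_vecs, k=4))  # ['n0', 'n1', 'n3', 'n5']
```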

## Running Example Queries

Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.

In [106]:
response = top_agent.query(
    "Tell me about LlamaIndex connectors"
)

#response = top_agent.as_query_component("Tell me about the different types of evaluation in LlamaIndex")

Thought: I need to use a tool to help me answer the question.
Action: tool_latest_search
Action Input: {'input': 'LlamaIndex connectors'}
Added user message to memory: LlamaIndex connectors
=== Calling Function ===
Calling function: vector_tool_latest_search with args: {
  "input": "LlamaIndex connectors"
}
Got output: The LlamaIndex connectors can be found on the /content/docs.llamaindex.ai/en/latest/search.html page.

Observation: The LlamaIndex connectors are not specifically mentioned in the `latest_search.html` part of the LlamaIndex docs. You may need to check other sections of the documentation for detailed information about LlamaIndex connectors.
Thought: I need to search for information about LlamaIndex connectors in a different tool.
Action: tool_latest_index
Action Input: {'input': 'LlamaIndex connectors'}
Added user message to memory: LlamaIndex connectors
=== Calling Function ===
Calling function: vector_tool_latest_index 

In [107]:
print(response)

LlamaIndex connectors are used to import existing data from various sources and formats into the LlamaIndex ecosystem. These connectors are compatible with APIs, PDFs, SQL, and more, allowing seamless integration of data for natural language access and retrieval.


In [108]:
# baseline
response = base_query_engine.query(
    "Tell me about LlamaIndex connectors"
)
print(str(response))

LlamaIndex provides data connectors that allow you to ingest your existing data from various sources such as APIs, PDFs, SQL, and more. These connectors retrieve data from their native sources and format it in intermediate representations that are easy for LlamaIndex's language models to consume. This enables natural language access to your data and allows you to perform powerful retrieval and query operations.


In [115]:
response = top_agent.query(
    "From the documentation what is the best way to get started with LlamaIndex?"
)

Thought: I can use the LlamaIndex documentation to find the best way to get started.
Action: tool_latest_search
Action Input: {'input': 'getting started with LlamaIndex'}
Added user message to memory: getting started with LlamaIndex
Observation: I'm sorry, but as an AI designed to answer queries about the `latest_search.html` part of the LlamaIndex docs, I don't have the ability to provide a guide on getting started with LlamaIndex. However, you can use the `vector_tool_latest_search` or `summary_tool_latest_search` to ask specific questions about the `latest_search.html` part of the LlamaIndex docs.
Thought: It seems that the tool I used is specific to searching within the `latest_search.html` part of the LlamaIndex documentation. I should try a different tool to find information on getting started with LlamaIndex.
Action: tool_latest_index
Action Input: {'input': 'getting started with LlamaIndex'}
Added user message to memory: gettin

In [116]:
print(response)

To get started with LlamaIndex, you need to install the library by running the command `pip install llama-index`. Once installed, you can refer to the documentation which provides guidance for both beginner and advanced users. LlamaIndex is available on GitHub and PyPi for downloading or contributing to the project. There's also a TypeScript/Javascript version available on NPM. If you need help or have any feature suggestions, you can join the LlamaIndex community on Twitter or Discord.

The library provides tools for beginners, advanced users, and everyone in between. It offers data connectors to ingest and query your existing data, data indexes to structure your data for LLMs, and engines for natural language access to your data. LlamaIndex can be used for various purposes such as knowledge-augmented output, conversational interfaces, and LLM-powered knowledge workers.

LlamaIndex aims to make LLMs more relevant and useful to users by overcoming the weaknesses of the fine-tuning appr

In [119]:
response = top_agent.query(
    "What is pinecone"
)

Thought: (Implicit) I can answer without any more tools!
Answer: Pinecone is a vector database that allows you to store, index, and search high-dimensional vectors. It provides a scalable and efficient solution for similarity search and nearest neighbor retrieval. With Pinecone, you can easily build applications that require fast and accurate similarity matching, such as recommendation systems, image search, and natural language processing.

In [120]:
print(str(response))

Pinecone is a vector database that allows you to store, index, and search high-dimensional vectors. It provides a scalable and efficient solution for similarity search and nearest neighbor retrieval. With Pinecone, you can easily build applications that require fast and accurate similarity matching, such as recommendation systems, image search, and natural language processing.
