# Multi-Document Agents

This guide shows how to build multi-document agents by combining recursive retrieval with "document agents".

Two motivating factors lead to this approach:
- Decoupling retrieval embeddings from chunk-based synthesis. Fetching documents by their summaries often returns more relevant context for a query than fetching raw chunks directly. Recursive retrieval supports this pattern directly.
- Within a document, users may need to dynamically perform tasks beyond fact-based question answering. We introduce the concept of "document agents": agents that have access to both vector search and summary tools for a given document.
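The first point can be illustrated without any LLM calls. Below is a minimal, library-free sketch, assuming a toy word-overlap score in place of real embeddings: the query is matched against short per-document summaries, and the winning summary is used only as a pointer to that document's raw chunks, which are what synthesis would then consume. `SUMMARIES`, `CHUNKS`, `score`, and `retrieve_chunks` are hypothetical names, not part of LlamaIndex.

```python
import re

# Hypothetical per-document summaries, used only for retrieval
SUMMARIES = {
    "Delhi": "Capital of India; history of the Delhi Sultanate and Mughal Empire.",
    "Chennai": "Coastal city in Tamil Nadu known for education, healthcare and Marina Beach.",
}

# Hypothetical raw chunks, used for synthesis (normally produced by a text splitter)
CHUNKS = {
    "Delhi": [
        "Qutb-ud-din Aibak founded the Delhi Sultanate.",
        "Babur founded the Mughal Empire in 1526.",
    ],
    "Chennai": [
        "IIT Madras is located in Chennai.",
        "Marina Beach is a popular attraction.",
    ],
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, text: str) -> float:
    """Jaccard overlap between the query's and the text's word sets."""
    q, t = tokenize(query), tokenize(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def retrieve_chunks(query: str) -> tuple[str, list[str]]:
    """Stage 1: match the query against summaries. Stage 2: return that document's chunks."""
    best_doc = max(SUMMARIES, key=lambda doc: score(query, SUMMARIES[doc]))
    return best_doc, CHUNKS[best_doc]


doc, chunks = retrieve_chunks("Who founded the Mughal Empire?")
print(doc)
print(chunks)
```

The same two-stage idea appears later in this guide, where summary `IndexNode`s point to per-city agents instead of raw chunks.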

## Setup and Download Data

In this section, we define imports and then download the Wikipedia articles for five Indian cities. Each article is saved to its own text file under `data/`.

pip install llama-index llama-index-agent-openai python-dotenv requests

In [1]:
import os 
from dotenv import load_dotenv

In [2]:
# Load OPENAI_API_KEY from a .env file; pass an explicit path, e.g. load_dotenv("path/to/.env"), if it is not found automatically
load_dotenv()

True

In [3]:
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

In [7]:
from llama_index.core import VectorStoreIndex, SummaryIndex, SimpleKeywordTableIndex, SimpleDirectoryReader
from llama_index.core.schema import IndexNode
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI

In [8]:
wiki_titles = ["Delhi", "Mumbai", "Bengaluru", "Hyderabad", "Chennai"]

In [9]:
from pathlib import Path

import requests

# Create the output directory once, before the loop
data_path = Path("data")
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Fetch the full article as plain text via the MediaWiki API
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    # Save each article to its own file, e.g. data/Delhi.txt
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
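The API nests results under `query.pages`, keyed by page ID, which is why the loop uses `next(iter(...))` to grab the single page. A title that does not exist typically comes back under a negative page ID with a `missing` flag and no `extract` field, so `page["extract"]` would fail with a bare `KeyError`. A minimal sketch of a more defensive helper, exercised here against hand-written sample payloads rather than live requests (the helper name `extract_page_text` and both payloads are illustrative assumptions):

```python
def extract_page_text(response: dict) -> str:
    """Return the plain-text extract of the single page in a MediaWiki query response."""
    pages = response["query"]["pages"]  # dict keyed by page ID (as strings)
    page = next(iter(pages.values()))
    if "missing" in page or "extract" not in page:
        raise ValueError(f"No article found for title {page.get('title')!r}")
    return page["extract"]


# Hand-written payloads shaped like the API's JSON (not real responses)
found = {"query": {"pages": {"123": {"pageid": 123, "title": "Delhi", "extract": "Delhi is ..."}}}}
missing = {"query": {"pages": {"-1": {"title": "Nowhereville", "missing": ""}}}}

print(extract_page_text(found))
try:
    extract_page_text(missing)
except ValueError as err:
    print(err)
```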

In [10]:
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()

## Define LLM

In [11]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

## Build Document Agent for each Document

In this section, we define a "document agent" for each document.

First, we define both a vector index (for semantic search) and a summary index (for summarization) for each document. The two query engines are then wrapped as tools and passed to an OpenAI function-calling agent.
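The two indexes differ mainly in what they retrieve. Roughly, a vector index returns only the top-k chunks most similar to the query, while a summary index passes every node of the document to the synthesizer, which suits "summarize X" questions. The sketch below mimics that difference in plain Python, assuming word-overlap counts in place of embeddings; `vector_retrieve` and `summary_retrieve` are hypothetical helpers, not LlamaIndex functions.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set used as a toy similarity signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def vector_retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Vector-index-like: return the top_k chunks sharing the most words with the query."""
    q = words(query)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_k]


def summary_retrieve(query: str, chunks: list[str]) -> list[str]:
    """Summary-index-like: return every chunk in document order, ignoring the query."""
    return list(chunks)


chunks = [
    "Indraprastha is mentioned in the Mahabharata.",
    "Anang Pal built Lal Kot in 1052 CE.",
    "Babur founded the Mughal Empire in 1526.",
]
print(vector_retrieve("When was Lal Kot built?", chunks, top_k=1))
print(len(summary_retrieve("Summarize the history", chunks)))
```

A specific factual question only needs the one matching chunk, whereas a summary needs all of them, which is why each document agent gets both tools.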

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each city.
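In the notebook, the choice between the two tools is made by the LLM through function calling, based on each tool's `name` and `description`. To make the idea concrete without an API key, the sketch below swaps in a crude keyword rule for the LLM. The `Tool` dataclass, `SUMMARY_HINTS`, and `route` are hypothetical stand-ins, not how `OpenAIAgent` actually decides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical stand-in for a QueryEngineTool: a name, a description and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


# Words taken to signal a whole-document summarization request (an assumption)
SUMMARY_HINTS = ("summarize", "summary", "overview")


def route(query: str) -> str:
    """Crude keyword rule standing in for the LLM's tool choice."""
    lowered = query.lower()
    if any(hint in lowered for hint in SUMMARY_HINTS):
        return "summary_tool"
    return "vector_tool"


tools = {
    "vector_tool": Tool(
        "vector_tool",
        "Useful for retrieving specific context from Delhi",
        lambda q: f"[top-k chunks for: {q}]",
    ),
    "summary_tool": Tool(
        "summary_tool",
        "Useful for summarization questions related to Delhi",
        lambda q: f"[answer built from all chunks for: {q}]",
    ),
}

for q in ["Summarize history of Delhi", "When was Lal Kot built?"]:
    chosen = route(q)
    print(chosen, "->", tools[chosen].run(q))
```

On the example queries at the end of this guide, this rule happens to match the tool choices shown in the logs, but a real agent reads the tool descriptions, so vague or mismatched descriptions can steer it to the wrong tool.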

In [18]:
from llama_index.agent.openai import OpenAIAgent

In [20]:
# Build agents dictionary
agents = {}

for wiki_title in wiki_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        city_docs[wiki_title], llm=llm
    )
    # build summary index
    summary_index = SummaryIndex.from_documents(
        city_docs[wiki_title], llm=llm
    )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    list_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                # vector search: best for specific facts
                description=(
                    f"Useful for retrieving specific context from {wiki_title}"
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=list_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                # summary index: reads the whole document, best for summaries
                description=(
                    "Useful for summarization questions related to"
                    f" {wiki_title}"
                ),
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-3.5-turbo")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
    )

    agents[wiki_title] = agent

## Build Recursive Retriever over these Agents

Next, we define one summary node per city. Each node's `index_id` matches a key in the `agents` dictionary, so the node links to that city's document agent. We then define a `RecursiveRetriever` on top of these nodes: it retrieves the best-matching node and routes the query to the corresponding document agent.

Finally, we wrap the `RecursiveRetriever` in a `RetrieverQueryEngine` to get a full query engine.
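Conceptually, the recursive step is a dispatch table: the top-level retriever returns an `IndexNode`, and its `index_id` is used as a key into `query_engine_dict` to find the object that answers the query. The sketch below reproduces only that control flow in plain Python. `IndexNodeStub`, `AgentStub`, `top_level_retrieve`, and `recursive_query` are hypothetical names, and the top-level "retrieval" is a substring match on the node id rather than vector search. In the real pipeline, the agent's reply is passed on as retrieved text to the response synthesizer, which is likely why the final printed answers below read as rephrasings of the agent output.

```python
from dataclasses import dataclass


@dataclass
class IndexNodeStub:
    """Hypothetical stand-in for IndexNode: summary text plus a link id."""
    text: str      # summary text (not used by the toy matcher below)
    index_id: str  # key into the agents dict


class AgentStub:
    """Hypothetical stand-in for a document agent; anything with .query() would do."""

    def __init__(self, city: str) -> None:
        self.city = city

    def query(self, question: str) -> str:
        return f"{self.city} agent answered: {question}"


def top_level_retrieve(question: str, nodes: list[IndexNodeStub]) -> IndexNodeStub:
    """Toy top-1 retrieval: the first node whose index_id appears in the question."""
    for node in nodes:
        if node.index_id.lower() in question.lower():
            return node
    raise LookupError(f"no node matches {question!r}")


def recursive_query(question: str, nodes: list[IndexNodeStub], agents: dict[str, AgentStub]) -> str:
    """Retrieve a node, then follow its index_id to the agent that answers."""
    node = top_level_retrieve(question, nodes)
    return agents[node.index_id].query(question)


cities = ["Delhi", "Mumbai", "Hyderabad"]
nodes = [IndexNodeStub(f"Wikipedia article about {c}", c) for c in cities]
agents = {c: AgentStub(c) for c in cities}
print(recursive_query("Who is current Mayor of Hyderabad city?", nodes, agents))
```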

In [21]:
# define top-level nodes
nodes = []
for wiki_title in wiki_titles:
    # define index node that links to these agents
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        " this index if you need to lookup specific facts about"
        f" {wiki_title}.\nDo not use this index if you want to analyze"
        " multiple cities."
    )
    node = IndexNode(text=wiki_summary, index_id=wiki_title)
    nodes.append(node)

In [22]:
# define top-level retriever
vector_index = VectorStoreIndex(nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
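With `similarity_top_k=1`, each query is sent to exactly one city node, so a question comparing several cities is answered from only one of them. The ranking itself is a nearest-neighbor search over embeddings, typically by cosine similarity. A minimal dependency-free sketch with made-up 3-dimensional vectors (real embeddings have far more dimensions) shows how top-k selection works; the vectors and the `top_k` helper are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k nodes most similar to the query vector."""
    ranked = sorted(node_vecs, key=lambda nid: cosine(query_vec, node_vecs[nid]), reverse=True)
    return ranked[:k]


# Made-up embeddings for three city summary nodes (illustrative only)
node_vecs = {
    "Delhi": [0.9, 0.1, 0.0],
    "Mumbai": [0.1, 0.9, 0.1],
    "Chennai": [0.0, 0.2, 0.9],
}
query_vec = [0.2, 0.8, 0.2]  # pretend embedding of a question about Mumbai

print(top_k(query_vec, node_vecs, k=1))  # ['Mumbai']
print(top_k(query_vec, node_vecs, k=2))  # ['Mumbai', 'Chennai']
```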

In [23]:
# define recursive retriever
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer

In [24]:
# note: can pass `agents` dict as `query_engine_dict` since every agent can be used as a query engine
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=agents,
    verbose=True,
)

### Define Full Query Engine

This query engine combines the recursive retriever with a response synthesis module, which turns the retrieved agent output into the final answer.

In [26]:
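# pass the LLM here: when a synthesizer is supplied, from_args does not apply its own llm to it
response_synthesizer = get_response_synthesizer(
    response_mode="compact",
    llm=llm,
)
query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever,
    response_synthesizer=response_synthesizer,
    llm=llm,
)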
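`response_mode="compact"` reduces the number of LLM calls by packing as many retrieved text chunks as fit into each prompt before asking for (and, if needed, refining) an answer. Here each query retrieves a single node (the agent's answer), so compaction has little to do, but the packing step is easy to sketch. This is a simplification that assumes a character budget instead of a token budget and does not split oversized chunks; `pack_chunks` is a hypothetical helper.

```python
def pack_chunks(chunks: list[str], max_chars: int, sep: str = "\n\n") -> list[str]:
    """Greedily merge consecutive chunks so each packed prompt stays within max_chars.

    A chunk longer than max_chars is kept on its own (not split) in this sketch.
    """
    packed: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + sep + chunk
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep growing this prompt
        else:
            if current:
                packed.append(current)  # flush the full prompt
            current = chunk  # start a new prompt with this chunk
    if current:
        packed.append(current)
    return packed


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 90]
packed = pack_chunks(chunks, max_chars=100)
print([len(p) for p in packed])  # [82, 40, 90]
```

Four chunks become three prompts, so three LLM calls instead of four; with a real context window the savings are usually larger.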

## Running Example Queries

In [27]:
# should use Delhi agent -> summary tool
response = query_engine.query("Summarize history of Delhi")

Retrieving with query id None: Summarize history of Delhi
Retrieved node with id, entering: Delhi
Retrieving with query id Delhi: Summarize history of Delhi
Added user message to memory: Summarize history of Delhi
=== Calling Function ===
Calling function: summary_tool with args: {"input":"history of Delhi"}
Got output: Delhi has a rich history that dates back to ancient and early medieval periods. Traditionally associated with seven cities, the earliest being Indraprastha, mentioned in the Sanskrit epic Mahabharata. The Maurya period saw architectural relics in Delhi, with the Lal Kot built by Tomara Rajput King Anang Pal in 1052 CE. The medieval period saw the Delhi Sultanate established by Qutb-ud-din Aibak, followed by the Mughal Empire founded by Babur in 1526 after defeating the Lodhi sultan. The Mughal dynasty ruled Delhi for over three centuries, with notable constructions like Shahjahanabad. The colonial period saw Delhi come under 

In [28]:
print(response)

Delhi has a long and diverse history that includes ancient and medieval periods. It was associated with seven cities, starting with Indraprastha from the Mahabharata era. The Maurya period contributed architectural relics like Lal Kot. The medieval era saw the establishment of the Delhi Sultanate by Qutb-ud-din Aibak and later the Mughal Empire by Babur. The Mughal dynasty ruled Delhi for centuries, leaving behind significant constructions like Shahjahanabad. During the colonial period, Delhi came under British control after the Indian Rebellion of 1857 and became the capital of British-held territories in 1911. Post-independence, Delhi underwent demographic changes during the partition of India and was declared the capital of the Union of India in 1947.


In [None]:
# should use Hyderabad agent -> vector tool
response = query_engine.query("Who is current Mayor of Hyderabad city?")

Retrieving with query id None: Who is current Mayor of Hyderabad city?
Retrieved node with id, entering: Hyderabad
Retrieving with query id Hyderabad: Who is current Mayor of Hyderabad city?
Added user message to memory: Who is current Mayor of Hyderabad city?
=== Calling Function ===
Calling function: vector_tool with args: {
  "input": "Who is the current Mayor of Hyderabad city?"
}
Got output: Gadwal Vijayalakshmi of Telangana Rashtra Samithi (TRS) is serving as the current Mayor of Hyderabad city.

Got response: The current Mayor of Hyderabad city is Gadwal Vijayalakshmi from the Telangana Rashtra Samithi (TRS) party.

In [None]:
print(response)

The current Mayor of Hyderabad city is Gadwal Vijayalakshmi from the Telangana Rashtra Samithi (TRS) party.


In [None]:
# should use Chennai agent -> summary tool
response = query_engine.query(
    "Give me a summary on all the positive aspects of Chennai"
)

Retrieving with query id None: Give me a summary on all the positive aspects of Chennai
Retrieved node with id, entering: Chennai
Retrieving with query id Chennai: Give me a summary on all the positive aspects of Chennai
Added user message to memory: Give me a summary on all the positive aspects of Chennai
=== Calling Function ===
Calling function: summary_tool with args: {
  "input": "positive aspects of Chennai"
}
Got output: Chennai is a vibrant and dynamic city with numerous positive aspects. It is home to prestigious educational institutions such as the Indian Institute of Technology Madras and the College of Engineering, Guindy, Anna University. The city also has a strong healthcare sector, with several government-run medical colleges and private colleges affiliated with the Tamil Nadu Dr. M.G.R. Medical University. Chennai is known for its rich literary culture, with libraries like the Connemara Public Library and the Anna Centenary L

In [None]:
print(response)

Chennai is a city that offers a wide range of positive aspects. It is known for its prestigious educational institutions, including the Indian Institute of Technology Madras and the College of Engineering, Guindy, Anna University. The city also has a strong healthcare sector, with several government-run medical colleges and private colleges affiliated with the Tamil Nadu Dr. M.G.R. Medical University. Chennai is rich in literary culture, with libraries like the Connemara Public Library and the Anna Centenary Library, which is the largest library in Asia. It is also a hub for research, housing institutions like the Central Leather Research Institute and the Structural Engineering Research Centre. Chennai's tourism industry is thriving, with attractions like the UNESCO Heritage Site of Mahabalipuram, the Marina Beach, and the Madras Crocodile Bank drawing domestic and foreign tourists. The city also boasts a well-maintained network of public parks, including the expansive Tholkappia Poon

In [None]:
# should use Mumbai agent -> summary tool
response = query_engine.query("Summarize about British rule in Mumbai history")

Retrieving with query id None: Summarize about British rule in Mumbai history
Retrieved node with id, entering: Mumbai
Retrieving with query id Mumbai: Summarize about British rule in Mumbai history
Added user message to memory: Summarize about British rule in Mumbai history
=== Calling Function ===
Calling function: summary_tool with args: {
  "input": "British rule in Mumbai history"
}
Got output: Mumbai came under British rule in the 17th century when the English East India Company acquired the islands of Bombay from the Portuguese. Over time, the British expanded their control over nearby territories and made Mumbai the administrative center of the Bombay Presidency. This period saw Mumbai's transformation into a thriving trading hub, with significant growth and development. The British also undertook ambitious civil engineering projects, such as the Hornby Vellard, to merge the seven islands of Bombay into a single landmass. In 1853, Mu

In [None]:
print(response)

During British rule in Mumbai, the city experienced significant growth and development. The British undertook various civil engineering projects, improving connectivity and transforming Mumbai into a major seaport on the Arabian Sea. The city's economy thrived, particularly during the American Civil War, when it became the world's primary cotton-trading market. British rule in Mumbai lasted until India gained independence in 1947.
