# Multi-Document Agents

In this notebook, we build a RAG pipeline over a large number of documents using the `DocumentAgents` concept: each document gets its own `ReAct Agent`, and a top-level retriever decides which agent should answer a given query.

### Installation

In [2]:
%pip install llama-index
%pip install llama-index-llms-bedrock
%pip install llama-index-embeddings-bedrock


### Set Logging

In [4]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from IPython.display import display, HTML
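
The nested event-loop restriction described in the comments above can be reproduced with the standard library alone. The following is a minimal sketch meant to run in a plain Python process (not inside the notebook, and without `nest_asyncio` applied); `fetch_value` and `call_run_inside_loop` are hypothetical names used only for illustration.

```python
import asyncio


async def fetch_value() -> int:
    """Stand-in for an async query; simply returns a constant."""
    return 42


async def call_run_inside_loop() -> str:
    """Try to start a second event loop while one is already running."""
    coro = fetch_value()
    try:
        asyncio.run(coro)  # not allowed: a loop is already running here
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


# The outer asyncio.run() starts the first (and only permitted) event loop.
message = asyncio.run(call_run_inside_loop())
print(message)
```

Jupyter is in the same position as `call_run_inside_loop`: a loop is already running, so libraries that start their own loop fail unless `nest_asyncio.apply()` has patched asyncio to permit nesting.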

### Setup and imports

In [5]:
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.settings import Settings
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding, Models

NumExpr defaulting to 2 threads.


In [8]:
llm = Bedrock(model="anthropic.claude-v2")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")

In [9]:
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
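
`Settings.chunk_size = 512` limits how much text goes into each indexed node. The sketch below gives a rough feel for fixed-size chunking using whitespace-separated words. It is an illustration only, not how LlamaIndex splits text: the library's own splitter works on tokens and, by default, also tries to respect sentence boundaries and chunk overlap. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start : start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


# Synthetic 1200-word document (made-up data, for illustration only).
sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(sample, 512)
print([len(chunk.split()) for chunk in chunks])
```

Smaller chunks give the retriever more precise matches at the cost of more nodes to embed; larger chunks carry more context per node.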

### Download Documents

We will use the Wikipedia pages for `Toronto`, `Seattle`, `Chicago`, `Boston`, and `Houston` to build the RAG pipeline.

In [10]:
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]

from pathlib import Path

import requests

for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

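    data_path = Path("data")
    # Create the output directory on first use; no-op if it already exists
    data_path.mkdir(exist_ok=True)

    # Write as UTF-8 so non-ASCII characters in the article are preserved
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)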
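
The download loop above assumes every title resolves to a page with an `extract` field. A slightly more defensive way to read the response might look like the sketch below. `extract_page_text` is a hypothetical helper, and the two payloads are hand-written samples in the shape the `action=query` / `prop=extracts` JSON response takes, not real API output.

```python
def extract_page_text(payload: dict) -> str:
    """Return the plain-text extract of the first page in an API response."""
    pages = payload.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    page = next(iter(pages.values()))
    # Pages that do not exist are reported with a "missing" key.
    if "missing" in page:
        raise ValueError(f"page not found: {page.get('title')!r}")
    text = page.get("extract", "")
    if not text:
        raise ValueError(f"page has no extract: {page.get('title')!r}")
    return text


# Hand-written sample payloads (assumed shapes, for illustration only).
ok_payload = {
    "query": {"pages": {"1": {"title": "Toronto", "extract": "Toronto is a city."}}}
}
missing_payload = {
    "query": {"pages": {"-1": {"title": "Torontoo", "missing": ""}}}
}

print(extract_page_text(ok_payload))
try:
    extract_page_text(missing_payload)
except ValueError as exc:
    print(exc)
```

Failing early here avoids writing an empty file that would later produce an empty index for that city.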

### Load Documents

In [11]:
# Load all wiki documents

from llama_index.core import SimpleDirectoryReader

city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()

#### Build ReAct Agent for each city 

In [12]:
from llama_index.core.agent import ReActAgent
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Build agents dictionary
agents = {}

for wiki_title in wiki_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        city_docs[wiki_title],
    )
    # build summary index
    summary_index = SummaryIndex.from_documents(
        city_docs[wiki_title],
    )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=(
                    f"Useful for retrieving specific context from {wiki_title}"
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=(
                    "Useful for summarization questions related to"
                    f" {wiki_title}"
                ),
            ),
        ),
    ]

    # build agent
    agent = ReActAgent.from_tools(
        query_engine_tools,
        llm=llm,
        verbose=True,
    )

    agents[wiki_title] = agent
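
Each agent built above follows the ReAct pattern: the LLM writes a thought, names one of the two tools, and supplies an input for it. With `verbose=True` these steps are printed as `Thought:`, `Action:`, and `Action Input:` lines. The sketch below shows how one such step could be parsed into structured data. `parse_react_step` is a hypothetical helper written for illustration; it is not LlamaIndex's own output parser.

```python
import ast
import re

# One ReAct step: free-text thought, a tool name, and a dict-like tool input.
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>\S+)\s*"
    r"Action Input:\s*(?P<action_input>\{.*\})",
    re.DOTALL,
)


def parse_react_step(text: str) -> dict:
    """Parse a single Thought / Action / Action Input step."""
    match = STEP_PATTERN.search(text)
    if match is None:
        raise ValueError("text does not contain a complete ReAct step")
    return {
        "thought": match["thought"],
        "action": match["action"],
        # The input is printed as a Python-style dict literal.
        "action_input": ast.literal_eval(match["action_input"]),
    }


step_text = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: vector_tool\n"
    "Action Input: {'input': 'What is the population of Toronto?'}"
)
step = parse_react_step(step_text)
print(step["action"], step["action_input"])
```

The tool name in `Action` is matched against the `name` given in each `ToolMetadata`, which is why the tool descriptions above matter: they are the only information the LLM has when choosing between `vector_tool` and `summary_tool`.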

#### Define IndexNode for each of these Agents

In [13]:
from llama_index.core.schema import IndexNode

# define top-level nodes
objects = []
for wiki_title in wiki_titles:
    # define index node that links to these agents
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        " this index if you need to lookup specific facts about"
        f" {wiki_title}.\nDo not use this index if you want to analyze"
        " multiple cities."
    )
    node = IndexNode(
        text=wiki_summary, index_id=wiki_title, obj=agents[wiki_title]
    )
    objects.append(node)

#### Define Top-Level Retriever to choose an Agent

In [14]:
vector_index = VectorStoreIndex(
    objects=objects,
)
query_engine = vector_index.as_query_engine(similarity_top_k=1, verbose=True)
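
Because the top-level engine uses `similarity_top_k=1`, exactly one `IndexNode`, and therefore one agent, is selected per query. The toy sketch below mimics that routing step. It assumes bag-of-words cosine similarity in place of the Titan embeddings, and `bag_of_words`, `cosine`, and `route` are hypothetical helpers rather than LlamaIndex APIs.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(query: str, summaries: dict[str, str]) -> str:
    """Return the key whose summary is most similar to the query (top-1)."""
    query_vec = bag_of_words(query)
    return max(summaries, key=lambda k: cosine(query_vec, bag_of_words(summaries[k])))


cities = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
summaries = {
    city: (
        f"This content contains Wikipedia articles about {city}. Use this "
        f"index if you need to lookup specific facts about {city}."
    )
    for city in cities
}

chosen = route("What is the population of Toronto?", summaries)
print(chosen)
```

In the real pipeline the selected node's `obj` (the city's `ReActAgent`) then receives the query, which is the hand-off reported as "Retrieval entering ..." in the verbose output.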

#### Test Queries

Each query should be routed to a single city's agent, which then picks either its vector tool or its summary tool depending on the question.

In [15]:
# should use Toronto agent -> vector tool
response = query_engine.query("What is the population of Toronto?")

Retrieval entering Toronto: ReActAgent
Retrieving from object ReActAgent with query What is the population of Toronto?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: vector_tool
Action Input: {'input': 'What is the population of Toronto?'}
Observation: Based on the provided context, the 2021 census reported that Toronto had a population of 2,794,356. The context states that "In the 2021 Census of Population conducted by Statistics Canada, Toronto had a population of 2,794,356 living in 1,160,892 of its 1,253,238 total private dwellings, a change of 2.3 percent from its 2016 population of 2,731,571." Therefore, the population of Toronto is 2,794,356 as of the 2021 census.
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The population of Toronto is 2,794,356 according to t

In [16]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
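
The `display` call above interpolates the response text directly into HTML, so characters such as `<` or `&` in an answer would be interpreted as markup. A minimal sketch of an escaped variant follows; `to_html_paragraph` is a hypothetical helper, and its output string can be passed to `HTML(...)` in the same way.

```python
import html


def to_html_paragraph(text: str, font_size_px: int = 20) -> str:
    """Wrap plain text in a styled <p>, escaping HTML-special characters."""
    safe = html.escape(text).replace("\n", "<br>")
    return f'<p style="font-size:{font_size_px}px">{safe}</p>'


# Made-up sample text containing characters that need escaping.
snippet = to_html_paragraph("Population < 3 million & growing\nSource: 2021 census")
print(snippet)
```

In the notebook this would be used as `display(HTML(to_html_paragraph(response.response)))`.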

In [17]:
# should use Houston agent -> vector tool
response = query_engine.query("Who and when was Houston founded?")

Retrieval entering Houston: ReActAgent
Retrieving from object ReActAgent with query Who and when was Houston founded?
Thought: The user asked a question about when Houston was founded and by whom. To answer this, I will use the summary_tool to get key details about Houston's history.
Action: summary_tool
Action Input: {'input': 'Provide a brief summary of when and by whom Houston, Texas was founded.'}
Observation: Based on the context provided, here is a brief summary of when and by whom Houston, Texas was founded:

Houston was founded in 1836 by brothers Augustus Chapman Allen and John Kirby Allen. On August 26, 1836, the Allen brothers bought land at the confluence of Buffalo Bayou and White Oak Bayou from Elizabeth Parrott, the widow of John Austin. Just four days later, on August 30, 1836, the Allen brothers ran an advertisement to promote the new town, naming it after Sam Houston who was elected as pre

In [18]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [19]:
# should use Boston agent -> summary tool
response = query_engine.query("Summarize about the sports teams in Boston")

Retrieval entering Boston: ReActAgent
Retrieving from object ReActAgent with query Summarize about the sports teams in Boston
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: summary_tool
Action Input: {'input': 'Summarize about the sports teams in Boston'}
Observation: Here is a summary of the sports teams in Boston:

- Boston has teams in the four major North American professional sports leagues - MLB, NFL, NBA, and NHL - and has won championships in all four.

- The Boston Red Sox play in MLB at Fenway Park. They have won 9 World Series titles, most recently in 2018.

- The New England Patriots play in the NFL, though they are based in Foxborough. They have won 6 Super Bowl titles, most recently in 2018.

- The Boston Celtics play in the NBA and have won 17 championships, tied for most in the league. Their most recent title was in 2008.


In [20]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [21]:
# should use Chicago agent -> summary tool
response = query_engine.query(
    "Give me a summary on all the positive aspects of Chicago"
)

Retrieval entering Chicago: ReActAgent
Retrieving from object ReActAgent with query Give me a summary on all the positive aspects of Chicago
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: summary_tool
Action Input: {'input': 'Give me a summary on all the positive aspects of Chicago'}
Observation: Here is a summary of some of the key positive aspects of Chicago based on the context provided:

- Chicago is a major global city and an important transportation, business, and cultural hub. It has the third largest metropolitan economy in the U.S.

- The city is renowned for its architecture, including iconic skyscrapers like the Willis Tower and diverse architectural styles from the Chicago School to modern masterpieces. It has a thriving arts scene from theater and comedy to music and museums.

- Chicago has an extensive public transportation 

In [22]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))