# Multi-Document Agents (V1)

In this guide, you learn how to set up a multi-document agent over the LlamaIndex documentation.

This guide extends the V0 multi-document agents with two additional features:
- Reranking during document (tool) retrieval
- A query planning tool that the agent can use to plan


We do this with the following architecture:

- Set up a "document agent" over each Document: each doc agent can do QA/summarization within its doc.
- Set up a top-level agent over this set of document agents. It performs tool retrieval and then chain-of-thought (CoT) reasoning over the retrieved tools to answer a question.
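
As a rough, library-free illustration of this two-level design, the sketch below models each document agent as a plain callable with a summary, and the top-level agent as a function that first retrieves the most relevant agents and then queries only those. The `DocAgent` class, the word-overlap scoring, and the sample summaries are all hypothetical stand-ins, not LlamaIndex APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DocAgent:
    """Hypothetical stand-in for a per-document agent."""

    name: str
    summary: str
    answer: Callable[[str], str]


def retrieve_agents(query: str, agents: List[DocAgent], top_k: int = 2) -> List[DocAgent]:
    """Rank agents by word overlap between the query and each agent's summary."""
    query_words = set(query.lower().split())

    def overlap(agent: DocAgent) -> int:
        return len(query_words & set(agent.summary.lower().split()))

    return sorted(agents, key=overlap, reverse=True)[:top_k]


def top_level_answer(query: str, agents: List[DocAgent], top_k: int = 2) -> Dict[str, str]:
    """Tool retrieval first, then delegate the query to the selected agents only."""
    selected = retrieve_agents(query, agents, top_k)
    return {agent.name: agent.answer(query) for agent in selected}


agents = [
    DocAgent("evaluation", "evaluation modules and metrics", lambda q: "evaluation answer"),
    DocAgent("callbacks", "callback handlers and events", lambda q: "callbacks answer"),
    DocAgent("node", "node classes and relationships", lambda q: "node answer"),
]
result = top_level_answer("which evaluation metrics exist", agents, top_k=1)
print(result)
```

In the notebook itself, an LLM makes both decisions (which tools to retrieve and how to combine their answers); the overlap score here only shows where those decisions sit in the control flow.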

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import phoenix as px
px.launch_app()

🌍 To view the Phoenix app in your browser, visit http://127.0.0.1:6060/
📺 To view the Phoenix app in a notebook, run `px.active_session().view()`
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


<phoenix.session.session.ThreadSession at 0x105cd82b0>

In [3]:
import llama_index
llama_index.set_global_handler("arize_phoenix")

## Setup and Download Data

In this section, we'll load in the LlamaIndex documentation.

In [None]:
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}

In [4]:
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
from llama_index.llms import OpenAI
from llama_index import ServiceContext

In [5]:
reader = UnstructuredReader()

[nltk_data] Downloading package punkt to /Users/jerryliu/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/jerryliu/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [6]:
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]

In [7]:
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]

In [8]:
len(all_html_files)

418

In [9]:
from llama_index import Document

docs = []
for idx, f in enumerate(all_html_files):
    if idx > 10:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded index: everything before this is the table of contents
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.get_content(metadata_mode="all")[:200])
    docs.append(loaded_doc)

Idx 0/418
path: /Users/jerryliu/Programming/gpt_index/docs/examples/agent/docs.llamaindex.ai/en/latest/index.html

Welcome to LlamaIndex 🦙 !

LlamaIndex (formerly GPT Index) is a data framework for LLM applica
Idx 1/418
path: /Users/jerryliu/Programming/gpt_index/docs/examples/agent/docs.llamaindex.ai/en/latest/deprecated_terms.html

Deprecated Terms

As LlamaIndex continues to evolve, many class names and APIs have
Idx 2/418
path: /Users/jerryliu/Programming/gpt_index/docs/examples/agent/docs.llamaindex.ai/en/latest/genindex.html

A |

B |

C |

D |

E |

F |

G |

H |

I |

J |

K |

L |

M |

N |

O |

P |

Q |

R |

S 
Idx 3/418
path: /Users/jerryliu/Programming/gpt_index/docs/examples/agent/docs.llamaindex.ai/en/latest/search.html

Please activate JavaScript to enable the search functionality.

Copyright © 2022, Jerry Liu Ma
Idx 4/418
path: /Users/jerryliu/Programming/gpt_index/docs/examples/agent/docs.llamaindex.ai/en/latest/development/documentation.html

Documentation Guide


### Define LLM + Service Context

In [10]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

## Building Multi-Document Agents

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

In [11]:
from llama_index import VectorStoreIndex, SummaryIndex

In [12]:
import nest_asyncio

nest_asyncio.apply()

### Build Document Agent for each Document

In this section we define "document agents" for each document.

We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each documentation page.
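
To make the "two tools per document" idea concrete, here is a minimal, library-free sketch. The `make_doc_tools` and `choose_tool` helpers are hypothetical: the toy word-overlap lookup stands in for the vector query engine, the first-sentence stitcher stands in for the summary query engine, and the keyword rule stands in for the choice that the function-calling LLM makes in the real agent.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def make_doc_tools(file_base: str, chunks: List[str]) -> Dict[str, Tool]:
    """Build two toy tools for one document: fact lookup and summarization."""

    def vector_tool(query: str) -> str:
        # Toy "semantic search": return the chunk sharing the most words with the query.
        query_words = set(query.lower().split())
        return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))

    def summary_tool(query: str) -> str:
        # Toy "summarization": stitch together the first sentence of every chunk.
        return " ".join(c.split(". ")[0].rstrip(".") + "." for c in chunks)

    return {
        f"vector_tool_{file_base}": vector_tool,
        f"summary_tool_{file_base}": summary_tool,
    }


def choose_tool(query: str, tools: Dict[str, Tool]) -> str:
    """Keyword rule standing in for the LLM's function-calling decision."""
    wants_summary = any(w in query.lower() for w in ("summarize", "summary", "overview"))
    prefix = "summary_tool" if wants_summary else "vector_tool"
    return next(name for name in tools if name.startswith(prefix))


chunks = [
    "RetrieverEvaluator scores retrieval quality. It reports hit rate.",
    "FaithfulnessEvaluator checks answers against sources.",
]
tools = make_doc_tools("evaluation", chunks)

fact_tool = choose_tool("what does RetrieverEvaluator report", tools)
overview_tool = choose_tool("Summarize this page", tools)
fact_answer = tools[fact_tool]("what does RetrieverEvaluator report")
overview_answer = tools[overview_tool]("Summarize this page")
print(fact_tool, "->", fact_answer)
print(overview_tool, "->", overview_answer)
```

The tool names follow the same `vector_tool_{file_base}` / `summary_tool_{file_base}` pattern used in the cell below, so each agent's tools stay distinguishable once many documents are involved.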

In [13]:
from llama_index.agent import OpenAIAgent
from llama_index import load_index_from_storage, StorageContext
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.node_parser import SimpleNodeParser
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes, service_context=service_context)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
            service_context=service_context,
        )

    # build summary index
    summary_index = SummaryIndex(nodes, service_context=service_context)
    
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize")

    # extract a summary (cached on disk so it is only generated once per document)
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(await summary_query_engine.aquery("Extract a concise 1-2 line summary of this document"))
        with open(summary_out_path, "wb") as f:
            pickle.dump(summary, f)
    else:
        with open(summary_out_path, "rb") as f:
            summary = pickle.load(f)

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description="Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description="Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):

    node_parser = SimpleNodeParser.from_defaults()
    
    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}
    
    # # this is for the baseline
    # all_nodes = []
    
    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        file_base = Path(doc.metadata["path"]).stem
        agent, summary = await build_agent_per_doc(nodes, file_base)
    
        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {
            "summary": summary,
            "nodes": nodes
        }
        
    return agents_dict, extra_info_dict

In [14]:
agents_dict, extra_info_dict = await build_agents(docs)

  0%|          | 0/11 [00:00<?, ?it/s]

index
deprecated_terms
genindex
search
documentation
changelog
contributing
privacy
evaluation
node
callbacks


### Build Retriever-Enabled OpenAI Agent

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This agent takes in all document agents as tools. Specifically, `FnRetrieverOpenAIAgent` performs tool retrieval before tool use (unlike a default agent, which tries to put all tools in the prompt).

Here we use a top-k retriever, but we encourage you to customize the tool retriever method!


In [15]:
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)

In [74]:
# define an "object" index and retriever over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping, ObjectRetriever
from llama_index.retrievers import BaseRetriever
from llama_index.indices.postprocessor import CohereRerank
from llama_index.tools import QueryPlanTool

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)
# vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=10)
vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=2)


# define a custom retriever with reranking
class CustomRetriever(BaseRetriever):
    def __init__(self, vector_retriever, postprocessor=None):
        self._vector_retriever = vector_retriever
        self._postprocessor = postprocessor or CohereRerank(top_n=5)
        super().__init__()
        
    def _retrieve(self, query_bundle):
        retrieved_nodes = self._vector_retriever.retrieve(query_bundle)
        filtered_nodes = self._postprocessor.postprocess_nodes(
            retrieved_nodes, query_bundle=query_bundle
        )
        
        return filtered_nodes

# # define a custom object retriever that adds in a query planning tool 
# class CustomObjectRetriever(ObjectRetriever):
#     def __init__(self, retriever, object_node_mapping, all_tools):
#         self._retriever = retriever
#         self._object_node_mapping = object_node_mapping

#         self._query_plan_tool = QueryPlanTool.from_defaults(
#             query_engine_tools=all_tools,
#             # include_tool_description=False,
#             include_tool_description=True,
#             name="tool_queryplan",
#             # description_prefix="ONLY use this tool for compare/contrast queries (NEVER use any other tool for that purpose)"
#             description_prefix="""\
# This is a query plan tool. It takes in a query plan DAG \
# in the form of QueryNode objects (each containing tool_name, query_str, and dependencies).
# It will then execute this query plan to answer the user question.

# You should ALWAYS use this tool to answer any question over multiple documents (e.g. document comparisons).\
# """
#         )
        
#     def retrieve(self, query_bundle):
#         nodes = self._retriever.retrieve(query_bundle)
#         tools = [self._object_node_mapping.from_node(n.node) for n in nodes]
#         return tools + [self._query_plan_tool]
#         # return [self._query_plan_tool]


# define a custom object retriever that adds in a query planning tool 
class CustomObjectRetriever(ObjectRetriever):
    def __init__(self, retriever, object_node_mapping, all_tools):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping

#         self._query_plan_tool = QueryPlanTool.from_defaults(
#             query_engine_tools=all_tools,
#             # include_tool_description=False,
#             include_tool_description=True,
#             name="tool_queryplan",
#             # description_prefix="ONLY use this tool for compare/contrast queries (NEVER use any other tool for that purpose)"
#             description_prefix="""\
# This is a query plan tool. It takes in a query plan DAG \
# in the form of QueryNode objects (each containing tool_name, query_str, and dependencies).
# It will then execute this query plan to answer the user question.

# You should ALWAYS use this tool to answer any question over multiple documents (e.g. document comparisons).\
# """
#         )
        
    def retrieve(self, query_bundle):
        nodes = self._retriever.retrieve(query_bundle)
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        query_plan_tool = QueryPlanTool.from_defaults(
            query_engine_tools=tools,
            # include_tool_description=False,
            include_tool_description=True,
            name="tool_queryplan",
            # description_prefix="ONLY use this tool for compare/contrast queries (NEVER use any other tool for that purpose)"
#             description_prefix="""\
# This is a query plan tool. It takes in a query plan DAG \
# in the form of QueryNode objects (each containing tool_name, query_str, and dependencies).
# It will then execute this query plan to answer the user question.

# You should ALWAYS use this tool to answer any question over multiple documents (e.g. document comparisons).\
# """
        )
        
        
        # return tools + [self._query_plan_tool]
        return [query_plan_tool]
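
The retrieve-then-rerank pattern used by `CustomRetriever` can be sketched without any external service. In this minimal version, a cheap first-stage scorer selects `top_k` candidates and a second scorer reorders them and keeps `top_n`. Both scoring functions (`overlap_score`, `jaccard_score`) and the sample descriptions are hypothetical stand-ins for the vector retriever and the Cohere reranker.

```python
from typing import Callable, List, Set

Scorer = Callable[[str, str], float]


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def overlap_score(query: str, doc: str) -> float:
    """First stage: raw count of shared words (cheap, favors long documents)."""
    return float(len(tokens(query) & tokens(doc)))


def jaccard_score(query: str, doc: str) -> float:
    """Second stage: shared words normalized by the size of the union."""
    q, d = tokens(query), tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    top_k: int,
    top_n: int,
) -> List[str]:
    # Stage 1: keep the top_k candidates according to the cheap scorer.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:top_k]
    # Stage 2: reorder only those candidates and keep the best top_n.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:top_n]


descriptions = [
    "evaluation modules and metrics for retrievers and responses and more",
    "evaluation metrics",
    "callback handlers and event tracing",
]
ranked = retrieve_then_rerank(
    "evaluation metrics", descriptions, overlap_score, jaccard_score, top_k=2, top_n=1
)
print(ranked)
```

Because the reranker only sees what the first stage returns, it can never yield more than `top_k` results; a `top_n` larger than `top_k` has no additional effect.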

In [75]:
custom_node_retriever = CustomRetriever(vector_node_retriever)

# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(custom_node_retriever, tool_mapping, all_tools)

In [82]:
tmps = custom_obj_retriever.retrieve("hello")
display(tmps[0].metadata.to_openai_function())

{'name': 'tool_queryplan',
 'description': '            This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n            Tool Name: tool_index\nTool Description: LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data by providing tools such as data connectors, data indexes, engines, data agents, and application integrations. It is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. \n\nTool Name: tool_search\nTool Description: This document is a web page that provides information about enabling JavaScript for search functionality and credits the creators of Sphi

In [76]:
from llama_index.agent import FnRetrieverOpenAIAgent
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4-0613")

# top-level agent: retrieves tools via the custom object retriever before each call
top_agent = FnRetrieverOpenAIAgent.from_retriever(
    custom_obj_retriever,
    system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    llm=llm,
    verbose=True,
)

### Define Baseline Vector Store Index

As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.

We set `similarity_top_k=4`.

In [67]:
all_nodes = [n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]]

In [68]:
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

## Running Example Queries

Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.

In [77]:
# should use the evaluation document agent -> vector tool
response = top_agent.query("Tell me about the different types of evaluation in LlamaIndex")

[{'name': 'tool_queryplan', 'description': '            This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n            Tool Name: tool_evaluation\nTool Description: This document provides information about evaluation modules and classes, dataset generation, evaluators and metrics, and the evaluation process for a retriever in the llama_index library. It also includes details about model construction, validation, and duplication, as well as methods for parsing, validating, and updating objects. \n\nTool Name: tool_index\nTool Description: LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data by providing tools such as data connectors, dat

InvalidRequestError: '            This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n            Tool Name: tool_evaluation\nTool Description: This document provides information about evaluation modules and classes, dataset generation, evaluators and metrics, and the evaluation process for a retriever in the llama_index library. It also includes details about model construction, validation, and duplication, as well as methods for parsing, validating, and updating objects. \n\nTool Name: tool_index\nTool Description: LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data by providing tools such as data connectors, data indexes, engines, data agents, and application integrations. It is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. \n            ' is too long - 'functions.0.description'
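
The `InvalidRequestError` above reports that the function description is too long: the query plan tool concatenates every retrieved tool's summary into a single description. One possible mitigation, sketched below, is to trim each summary so the combined text fits a character budget. The `fit_descriptions` helper is hypothetical, and the 1024-character budget is an assumption for illustration; check the limit documented for the API you are calling.

```python
from typing import Dict

# Assumed budget for illustration only; not taken from this notebook.
MAX_DESCRIPTION_CHARS = 1024


def fit_descriptions(prefix: str, tools: Dict[str, str], max_chars: int) -> str:
    """Join tool name/description pairs, trimming each description to an equal share."""
    template = "Tool Name: {name}\nTool Description: {desc}\n\n"
    # Characters consumed by the prefix and by the fixed text around each description.
    overhead = len(prefix) + sum(len(template.format(name=n, desc="")) for n in tools)
    if overhead > max_chars:
        raise ValueError("prefix and tool names alone exceed the budget")
    per_tool = (max_chars - overhead) // max(len(tools), 1)
    parts = [template.format(name=n, desc=d[:per_tool]) for n, d in tools.items()]
    return prefix + "".join(parts)


tool_summaries = {
    "tool_index": "x" * 2000,
    "tool_evaluation": "y" * 2000,
}
description = fit_descriptions("Query plan tool.\n\n", tool_summaries, MAX_DESCRIPTION_CHARS)
print(len(description))
```

Retrieving fewer tools per query or generating shorter per-document summaries would reduce the description length as well; trimming is only the most mechanical option.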

In [29]:
print(response)

There are two types of evaluation in LlamaIndex:

1. LLM-based evaluation: This type of evaluation focuses on evaluating the performance of the LLM (Language Learning Model) in understanding and generating responses. It involves measuring metrics such as perplexity, BLEU score, and F1 score to assess the quality of the generated responses.

2. Retrieval-based evaluation: This type of evaluation focuses on evaluating the performance of the retrieval system in retrieving relevant documents or responses. It involves measuring metrics such as precision, recall, and mean average precision (MAP) to assess the effectiveness of the retrieval system.

Both types of evaluation are important in assessing the performance and effectiveness of the LlamaIndex system.


In [30]:
# baseline
response = base_query_engine.query("Tell me about the different types of evaluation in LlamaIndex")
print(str(response))

LlamaIndex utilizes various types of evaluation methods to assess its performance and effectiveness. These evaluation methods include RelevancyEvaluator, RetrieverEvaluator, SemanticSimilarityEvaluator, PairwiseComparisonEvaluator, CorrectnessEvaluator, FaithfulnessEvaluator, and GuidelineEvaluator. Each of these evaluators serves a specific purpose in evaluating different aspects of the LlamaIndex system.


In [60]:
response = top_agent.query(
    "Compare the contributions page vs. the home/index page"
)

[{'name': 'tool_queryplan', 'description': '            This is a query plan tool. It takes in a query plan DAG in the form of QueryNode objects (each containing tool_name, query_str, and dependencies).\nIt will then execute this query plan to answer the user question.\n\nYou should ALWAYS use this tool to answer any question over multiple documents (e.g. document comparisons).\n\n\n            Tool Name: tool_index\nTool Description: LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data by providing tools such as data connectors, data indexes, engines, data agents, and application integrations. It is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. \n\nTool Name: tool_deprecated_terms\nTool Description: This document provides a list of deprecated terms in LlamaIndex, including class names and APIs that have been adjusted, improved, o

InvalidRequestError: '            This is a query plan tool. It takes in a query plan DAG in the form of QueryNode objects (each containing tool_name, query_str, and dependencies).\nIt will then execute this query plan to answer the user question.\n\nYou should ALWAYS use this tool to answer any question over multiple documents (e.g. document comparisons).\n\n\n            Tool Name: tool_index\nTool Description: LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data by providing tools such as data connectors, data indexes, engines, data agents, and application integrations. It is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. \n\nTool Name: tool_deprecated_terms\nTool Description: This document provides a list of deprecated terms in LlamaIndex, including class names and APIs that have been adjusted, improved, or replaced. It also provides links to their replacements and additional details on usage. \n\nTool Name: tool_genindex\nTool Description: This document provides a list of methods, classes, attributes, and properties related to the Llama Index system, including various modules and components such as vector stores, document readers, chat engines, and more. \n\nTool Name: tool_search\nTool Description: This document is a web page that provides information about enabling JavaScript for search functionality and credits the creators of Sphinx and Furo. \n\nTool Name: tool_documentation\nTool Description: This document provides a guide for contributors interested in running and making changes to the LlamaIndex documentation locally. It includes instructions on cloning the GitHub repo, installing dependencies, building the Sphinx docs, and using sphinx-autobuild for live-reloading during development. 
\n\nTool Name: tool_changelog\nTool Description: This document provides a summary of the new features, bug fixes, and other changes in different versions of a software package or project. \n\nTool Name: tool_contributing\nTool Description: This document provides guidelines for contributing to LlamaIndex, including extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. It also includes information on the environment setup, validating changes, formatting and linting, testing, creating example notebooks, and creating a pull request. \n\nTool Name: tool_privacy\nTool Description: This document provides information about the privacy and security features of LLamaIndex, including the option to configure data handling preferences, use custom embedding models, and connect with other vector stores. \n\nTool Name: tool_evaluation\nTool Description: This document provides information about evaluation modules and classes, dataset generation, evaluators and metrics, and the evaluation process for a retriever in the llama_index library. It also includes details about model construction, validation, and duplication, as well as methods for parsing, validating, and updating objects. \n\nTool Name: tool_node\nTool Description: This document provides information about various classes, methods, and attributes related to node objects, including base components, base nodes, documents, image documents, metadata modes, node relationships, and node with scores. \n\nTool Name: tool_callbacks\nTool Description: This document provides information about the callback handlers and event handling in the LlamaIndex library. \n            ' is too long - 'functions.0.description'

In [21]:
print(response)

Houston has a diverse population with a demographic makeup that includes non-Hispanic whites (23.3%), Hispanics and Latino Americans (45.8%), Blacks or African Americans (22.4%), and Asian Americans (6.5%). The largest Hispanic or Latino American ethnic group in Houston is Mexican Americans. Houston is also home to the largest African American community west of the Mississippi River and has a growing Muslim population.

On the other hand, Chicago is also known for its diverse demographics. The city has a significant non-Hispanic White population, along with a substantial Black population and Hispanic population. Chicago is celebrated for its cultural diversity and has a significant LGBT population.

Both Houston and Chicago have diverse populations, with a mix of different racial and ethnic groups contributing to their vibrant communities.


In [22]:
# baseline
response = base_query_engine.query(
    "Tell the demographics of Houston, and then compare that with the demographics of Chicago"
)
print(str(response))

Houston is the most populous city in Texas and the fourth-most populous city in the United States. It has a population of 2,304,580 as of the 2020 U.S. census. The city is known for its diversity, with a significant proportion of minorities. In 2019, non-Hispanic whites made up 23.3% of the population, Hispanics and Latino Americans 45.8%, Blacks or African Americans 22.4%, and Asian Americans 6.5%. The largest Hispanic or Latino American ethnic group in Houston is Mexican Americans, comprising 31.6% of the population.

In comparison, Chicago is the third-most populous city in the United States. According to the 2020 U.S. census, Chicago has a population of 2,746,388. The demographics of Chicago are different from Houston, with non-Hispanic whites making up 32.7% of the population, Hispanics and Latino Americans 29.9%, Blacks or African Americans 29.8%, and Asian Americans 7.6%. The largest Hispanic or Latino American ethnic group in Chicago is Mexican Americans, comprising 21.6% of th

In [None]:
# baseline: the response tells you nothing about Chicago...
response.source_nodes[3].get_content()

In [99]:
response = top_agent.query(
    "Tell me the differences between Shanghai and Beijing in terms of history and current economy"
)

=== Calling Function ===
Calling function: tool_Shanghai with args: {
  "input": "history"
}
=== Calling Function ===
Calling function: vector_tool with args: {
  "input": "history"
}
Got output: Shanghai has a rich history that dates back to ancient times. However, in the context provided, the history of Shanghai is mainly discussed in relation to its modern development. After the war, Shanghai's economy experienced significant growth, with increased agricultural and industrial output. The city's administrative divisions were rearranged, and it became a center for radical leftism during the 1950s and 1960s. The Cultural Revolution had a severe impact on Shanghai's society, but the city maintained economic production with a positive growth rate. Shanghai also played a significant role in China's Third Front campaign and has been a major contributor of tax revenue to the central government. Economic reforms were initiated in Shanghai in 1990, leading to the development of the Pudong dis

In [100]:
print(str(response))

In terms of history, both Shanghai and Beijing have rich and complex pasts. Shanghai's history dates back to ancient times, but its modern development is particularly noteworthy. It experienced significant economic growth after the war and played a major role in China's economic reforms. Beijing, on the other hand, has a history that spans several dynasties and served as the capital during the Ming and Qing dynasties. It has preserved its historical heritage while evolving into a modern metropolis.

In terms of current economy, Shanghai is a global center for finance and innovation. It has a diverse economy and has experienced rapid development, with a high GDP and significant foreign investment. It is a major player in the global financial industry and is home to the Shanghai Stock Exchange. Beijing's economy is primarily driven by the tertiary sector, with a focus on services such as professional services, information technology, and commercial real estate. It has identified high-end

In [34]:
# baseline
response = base_query_engine.query(
    "Tell me the differences between Shanghai and Beijing in terms of history and current economy"
)
print(str(response))

Shanghai and Beijing have distinct differences in terms of history and current economy. Historically, Shanghai was the largest and most prosperous city in East Asia during the 1930s, while Beijing served as the capital of the Republic of China and later the People's Republic of China. Shanghai experienced significant growth and redevelopment in the 1990s, while Beijing expanded its urban area and underwent rapid development in the last two decades.

In terms of the current economy, Shanghai is considered the "showpiece" of China's booming economy. It is a global center for finance and innovation, with a strong focus on industries such as retail, finance, IT, real estate, machine manufacturing, and automotive manufacturing. Shanghai is also home to the world's busiest container port, the Port of Shanghai. The city has a high GDP and is classified as an Alpha+ city by the Globalization and World Cities Research Network.

On the other hand, Beijing is a global financial center and ranks t