## Hierarchical Agents with LlamaIndex

The goal of this notebook is to demonstrate and compare the use of hierarchical agents with LlamaIndex. We will set up a few tools and compare how capable the agent is when its retrieval resources are exposed to it as sub-agents versus as plain tools.

### Set Up the OpenAI Agent
To start we will import the OpenAI agent we will be using across examples, and authenticate with our key:

In [1]:
# Set up OpenAI
import openai
from llama_index.agent import OpenAIAgent
openai.api_key = 'sk-your-key'
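Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the key as the `OPENAI_API_KEY` environment variable (the name the `openai` package itself also checks by default), is to read it at runtime. The helper name `load_openai_key` is hypothetical:

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly so the later agent calls don't error with a vaguer message
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key


# In the cell above, this would replace the hard-coded string:
# openai.api_key = load_openai_key()
```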

### Setting up a Database Query Engine

The first data source we will set up is a tool that converts natural language prompts into SQL queries against a database. The tool connects to a SQL database, reads the metadata of its tables, and uses an LLM to write the queries.

For more information, see the [LlamaIndex SQL Guide](https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLAutoVectorQueryEngine.html).
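The table metadata is what lets the LLM write valid SQL: the prompt includes a description of each table's columns alongside the user's question. The sketch below shows the idea with the standard-library `sqlite3` module. It is not LlamaIndex's internal prompt format, the helper `describe_tables` is hypothetical, and the schema is an assumption built only from the column names that appear in the agent logs later in this notebook (the real `countries.db` may have more columns).

```python
import sqlite3


def describe_tables(conn: sqlite3.Connection, tables: list[str]) -> str:
    """Build a plain-text schema summary, similar in spirit to the table
    context an NL-to-SQL prompt includes next to the user's question."""
    lines = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        # Table names are interpolated directly, so only pass trusted names.
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{name} ({ctype})" for _, name, ctype, *_ in columns)
        lines.append(f"Table '{table}' has columns: {col_desc}")
    return "\n".join(lines)


# Hypothetical schema reconstructed from the SQL in the agent logs
schema_conn = sqlite3.connect(":memory:")
schema_conn.executescript("""
    CREATE TABLE airports (AirportCountry TEXT, TotalTravelers INTEGER);
    CREATE TABLE gdp (CountryName TEXT, MillionsGDP REAL);
    CREATE TABLE population (Country TEXT, ThousandsPopulation REAL);
""")
print(describe_tables(schema_conn, ["airports", "gdp", "population"]))
```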

In [2]:
from sqlalchemy import create_engine
from llama_index import SQLDatabase
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine
from llama_index.tools.query_engine import QueryEngineTool

sql_database = SQLDatabase(create_engine('sqlite:///countries.db'), include_tables=["airports", "gdp", "population"])

sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["airports", "gdp", "population"],
)

print(sql_query_engine.query("list all of the countries in the database in quotes separated by a comma"))

"United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Russian Federation", "Canada", "Italy", "Brazil", "Australia", "Korea, Rep.", "Mexico", "Spain", "Indonesia", "Saudi Arabia", "Netherlands", "Türkiye", "Switzerland", "Poland", "Argentina", "Sweden", "Norway", "Belgium", "Cuba", "Ireland", "Israel", "United Arab Emirates", "Thailand", "Nigeria", "Egypt, Arab Rep.", "Austria", "Singapore", "Bangladesh", "Vietnam", "Malaysia", "South Africa", "Philippines", "Denmark", "Iran, Islamic Rep.", "Pakistan", "Hong Kong SAR, China", "Colombia", "Romania", "Chile", "Czech Republic", "Finland", "Iraq", "Portugal", "New Zealand", "Peru", "Qatar", "Kazakhstan", "Greece", "Algeria", "Kuwait", "Hungary", "Ukraine", "Morocco", "Ethiopia", "Slovak Republic", "Ecuador", "Oman", "Dominican Republic", "Puerto Rico", "Kenya", "Angola", "Guatemala", "Bulgaria", "Luxembourg", "Uzbekistan", "Azerbaijan", "Panama", "Tanzania", "Sri Lanka", "Ghana", "Belarus", "Uruguay", "Croatia

For the purposes of this demo, we are using a database of airports, GDP and population from the [World Bank](https://datacatalog.worldbank.org/home). The airports table contains international airports, the country each is located in, and the number of travelers that passed through it in 2019. The population and GDP tables contain the population and GDP of each country.

With the SQL query engine set up, we retrieve the list of countries in the database to use in the next step, where we load each country's Wikipedia page into a vector index.
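The query engine returns its answer as text, so the list has to be parsed before it can drive a loop. A naive `split(",")` breaks on names such as `"Korea, Rep."`, which contain commas inside the quotes. One option, sketched below with the standard-library `csv` module, is to treat the answer as a single CSV row. The helper `parse_quoted_list` is hypothetical; the notebook itself hard-codes `wiki_titles` in the next cell.

```python
import csv


def parse_quoted_list(text: str) -> list[str]:
    """Split a '"A", "B, C", ...' string into names, keeping commas that sit inside quotes."""
    # skipinitialspace treats the space after each comma as padding,
    # so the following quote is still recognized as the start of a quoted field.
    rows = list(csv.reader([text.strip()], skipinitialspace=True))
    return [name for name in rows[0] if name] if rows else []


sample = '"United States", "China", "Korea, Rep.", "Egypt, Arab Rep."'
print(parse_quoted_list(sample))
```

With a real response you would pass `str(...)` of the query result instead of `sample`.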



### Setting up a Vector Index Query Engine

We take the list of countries that the SQL query engine formatted for us and use it to pull the Wikipedia page for each country. Then we put all of the pages into a vector index that we can later use to retrieve content.
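At query time, a vector index embeds the question, scores every stored chunk by similarity to it, and hands the best-scoring chunks to the LLM. The toy sketch below shows only that ranking step. It swaps the learned embedding model the real index uses for a bag-of-words count vector, and the chunks are hypothetical stand-ins for pieces of the saved Wikipedia pages.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query and keep the k best."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical chunks, not taken from the actual index
chunks = [
    "Germany culture: poets, thinkers, Oktoberfest and Christmas customs.",
    "Norway has extensive reserves of petroleum and natural gas.",
    "Japan culture blends Shinto, Buddhism and Confucianism.",
]
print(top_k("what is the culture of germany", chunks, k=1))
```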


In [3]:
from pathlib import Path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)

import requests
wiki_titles = ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra", "Angola", "Anguilla", "Antigua and Barbuda", "Argentina", "Armenia", "Aruba", "Australia", "Austria", "Azerbaijan", "Bahamas", "Bahamas, The", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium", "Belize", "Benin", "Bermuda", "Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil", "British Virgin Islands", "Brunei Darussalam", "Bulgaria", "Burkina Faso", "Burundi", "Cabo Verde", "Cambodia", "Cameroon", "Canada", "Cape Verde", "Cayman Islands", "Central African Republic", "Chad", "Channel Islands", "Chile", "China", "Christmas Island", "Cocos (Keeling) Islands", "Colombia", "Comoros", "Congo", "Congo, Dem. Rep.", "Congo, Rep.", "Cook Islands", "Costa Rica", "Croatia", "Cuba", "Curacao", "Curaçao", "Cyprus", "Czech Republic", "Côte d'Ivoire", "Democratic Republic of the Congo", "Denmark", "Djibouti", "Dominica", "Dominican Republic", "East Timor", "Ecuador", "Egypt", "Egypt, Arab Rep.", "El Salvador", "Equatorial Guinea", "Eritrea", "Estonia", "Eswatini", "Ethiopia", "Falkland Islands", "Faroe Islands", "Fiji", "Finland", "France", "French Guiana", "French Polynesia", "Gabon", "Gambia", "Gambia, The", "Georgia", "Germany", "Ghana", "Gibraltar", "Greece", "Greenland", "Grenada", "Guadeloupe", "Guam", "Guatemala", "Guinea Bissau", "Guinea", "Guinea-Bissau", "Guyana", "Haiti", "Honduras", "Hungary", "Iceland", "India", "Indonesia", "Iran", "Iran, Islamic Rep.", "Iraq", "Ireland", "Isle of Man", "Israel", "Italy", "Ivory Coast (Cote d'Ivoire)", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya", "Kiribati", "Korea, Rep.", "Kosovo", "Kuwait", "Kyrgyz Republic", "Kyrgyzstan", "Lao PDR", "Laos", "Latvia", "Lebanon", "Lesotho", "Liberia", "Libya", "Liechtenstein", "Lithuania", "Luxembourg", "Macao SAR, China", "Macedonia", "Madagascar", "Malawi", "Malaysia", "Maldives", "Mali", "Malta", "Marshall Islands", "Martinique", "Mauritania", "Mauritius", "Mayotte", "Mexico", 
"Micronesia", "Micronesia, Fed. Sts.", "Moldova", "Monaco", "Mongolia", "Montenegro", "Montserrat", "Morocco", "Mozambique", "Myanmar", "Namibia", "Nauru", "Nepal", "Netherlands", "New Caledonia", "New Zealand", "Nicaragua", "Niger", "Nigeria", "Niue", "Norfolk Island", "North Korea", "North Macedonia", "Northern Mariana Islands", "Norway", "Oman", "Pakistan", "Palau", "Panama", "Papua New Guinea", "Paraguay", "Peru", "Philippines", "Poland", "Portugal", "Puerto Rico", "Qatar", "Reunion", "Romania", "Russian Federation", "Rwanda", "Saint Barthelemy", "Saint Helena", "Saint Kitts and Nevis", "Saint Lucia", "Saint Martin", "Saint Pierre and Miquelon", "Saint Vincent and Grenadines", "Samoa", "San Marino", "Sao Tome and Principe", "Saudi Arabia", "Senegal", "Serbia", "Seychelles", "Sierra Leone", "Singapore", "Sint Maarten (Dutch part)", "Sint Maarten", "Slovak Republic", "Slovakia", "Slovenia", "Solomon Islands", "Somalia", "South Africa", "South Korea", "South Sudan", "Spain", "Sri Lanka", "St. Kitts and Nevis", "St. Lucia", "St. Martin (French part)", "St. Vincent and the Grenadines", "Sudan", "Suriname", "Swaziland", "Sweden", "Switzerland", "Syria", "Syrian Arab Republic", "São Tomé and Principe", "Taiwan", "Tajikistan", "Tanzania", "Thailand", "Timor-Leste", "Togo", "Tonga", "Trinidad and Tobago", "Tunisia", "Turkey", "Turkmenistan", "Turks and Caicos Islands", "Tuvalu", "Türkiye", "Uganda", "Ukraine", "United Arab Emirates", "United Kingdom", "United States", "Uruguay", "Uzbekistan", "Vanuatu", "Venezuela", "Venezuela, RB", "Vietnam", "Virgin Islands (U.S.)", "Wallis and Futuna Islands", "West Bank and Gaza", "Western Sahara", "Western Samoa", "Yemen", "Yemen, Rep.", "Zambia", "Zimbabwe"]
docs = []

# Create the output directory once, before the loop
data_path = Path("data")
data_path.mkdir(exist_ok=True)

for title in wiki_titles:
    # Fetch the plain-text extract of the Wikipedia page via the MediaWiki API
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    # Skip titles that don't return an extract (e.g. pages that don't exist)
    if "extract" not in page:
        continue
    wiki_text = page["extract"]

    # Save the content to a file (UTF-8, since many titles and pages contain non-ASCII text)
    file_path = data_path / f"{title}.txt"
    with open(file_path, "w", encoding="utf-8") as fp:
        fp.write(wiki_text)

    # Load the file back as a document and append it to our docs list
    docs.append(SimpleDirectoryReader(
        input_files=[str(file_path)]
    ).load_data()[0])

# Create a vector store index from ALL of the documents
vector_query_engine = VectorStoreIndex.from_documents(docs).as_query_engine()
vector_query_engine.query('what is the culture of germany')


Response(response='Germany has a rich and diverse culture that has been shaped by major intellectual and popular currents in Europe, both religious and secular. It is known as "the land of poets and thinkers" due to the significant contributions of German scientists, writers, and philosophers to Western thought. Germany is also famous for its folk festival traditions, such as the Oktoberfest, and its Christmas customs. The country has a strong literary tradition, with renowned authors like Johann Wolfgang von Goethe and the Brothers Grimm, who popularized German folklore. German philosophy has also had a significant impact on the world, with influential thinkers such as Immanuel Kant, Friedrich Nietzsche, and Karl Marx. Germany\'s cultural heritage is further reflected in its UNESCO World Heritage sites and its vibrant film industry, which has made major contributions to cinema.', source_nodes=[NodeWithScore(node=TextNode(id_='f86fff2b-82a0-4f2d-a0f3-1d6de09aa358', embedding=None, meta

### Flat vs Hierarchical Agents

Currently we have two useful query engines that can answer specific questions about countries. To use both query engines at once, we can wrap each one in the `QueryEngineTool` abstraction, which allows it to be passed to an agent.

We can also take this one step further: give each tool to its own agent, and then wrap that agent as a tool again. We will call the agent that has direct access to the `QueryEngineTool`s the "Flat Agent", and the agent that uses the two sub-agents as tools the "Hierarchical Agent".
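The key idea is that an agent and a tool can share one interface, so an agent can be handed to another agent as if it were a tool. The plain-Python sketch below illustrates that composition. Everything in it is hypothetical: `Tool` and `ToyAgent` are not LlamaIndex classes, and keyword routing stands in for the LLM's function calling. The `toy_` prefixes avoid overwriting the notebook's real variables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical minimal tool: a name, a description the LLM would read, and a function."""
    name: str
    description: str
    fn: Callable[[str], str]

    def __call__(self, query: str) -> str:
        return self.fn(query)


class ToyAgent:
    """Stand-in for an LLM agent: picks a tool by keyword instead of function calling."""

    def __init__(self, tools: list[Tool], routes: dict[str, str]):
        self.tools = {t.name: t for t in tools}
        self.routes = routes  # keyword -> tool name; "" matches every query

    def chat(self, query: str) -> str:
        for keyword, tool_name in self.routes.items():
            if keyword in query.lower():
                return self.tools[tool_name](query)
        return "No suitable tool."

    def as_tool(self, name: str, description: str) -> Tool:
        # Exposing the agent through the tool interface is what makes nesting possible
        return Tool(name, description, self.chat)


toy_sql_tool = Tool("sql_tool", "Countries DB", lambda q: f"SQL answer to: {q}")
toy_country_tool = Tool("country_tool", "Wikipedia pages", lambda q: f"Wiki answer to: {q}")

# Flat: the top-level agent calls the tools directly
toy_flat = ToyAgent([toy_sql_tool, toy_country_tool], {"gdp": "sql_tool", "culture": "country_tool"})

# Hierarchical: each tool gets its own agent, and those agents become the top-level tools
toy_sql_agent = ToyAgent([toy_sql_tool], {"": "sql_tool"})
toy_country_agent = ToyAgent([toy_country_tool], {"": "country_tool"})
toy_hierarchical = ToyAgent(
    [toy_sql_agent.as_tool("country_db_agent", "SQL agent"),
     toy_country_agent.as_tool("country_agent", "Wikipedia agent")],
    {"gdp": "country_db_agent", "culture": "country_agent"},
)

print(toy_flat.chat("What is the GDP of Norway?"))
print(toy_hierarchical.chat("What is the GDP of Norway?"))
```

Both calls reach the same underlying tool; the hierarchical version simply adds an agent layer in between, which is where the real sub-agents get a chance to rephrase or split the query.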



In [10]:
# Shared tool descriptions: the agents read these to decide which tool to call
sql_description = f"""Useful for translating a natural language query into a SQL query over a database
    of country-level airport, GDP and population data.
    This tool has the following tables: {sql_query_engine._get_table_context('')}
"""

vector_description = """
    Useful for retrieving specific information about countries, based on the content of their Wikipedia pages.
    Pass a natural language query to this tool to retrieve information.
"""

# Make the SQL query engine a tool
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    name="sql_tool",
    description=sql_description,
)

# Create the SQL agent, with the SQL tool as its only tool
sql_agent = OpenAIAgent.from_tools(
    [sql_tool],
    system_prompt="""
        You are a specialized agent designed to use an SQL database to
        query and retrieve information for the user.

        If you cannot answer a specific question using the database, tell the user
        that the information is not contained in the database.
    """,
    verbose=True
)

# Convert the SQL agent to a tool to use in another agent
sql_agent_tool = QueryEngineTool.from_defaults(
    query_engine=sql_agent,
    name="country_db_agent",
    description=sql_description,
)

# Make the vector query engine a tool
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    name="country_tool",
    description=vector_description,
)

# Create the vector agent, with the vector tool as its only tool
vector_agent = OpenAIAgent.from_tools(
    [vector_tool],
    system_prompt="""
        You are a specialized agent designed to answer questions about countries.
        You should use the country_tool to retrieve information on specific countries.
        If you have to answer questions about multiple countries, consider asking the country_tool
        multiple questions to collect information and synthesize it into a proper response.
    """,
    verbose=True
)

# Convert the vector agent to a tool to use in another agent
vector_agent_tool = QueryEngineTool.from_defaults(
    query_engine=vector_agent,
    name="country_agent",
    description=vector_description,
)

# Both top-level agents get the same system prompt; only their tools differ
router_prompt = """
    You are a specialized agent designed to have access to retrieve specialized information about countries
    using the tools that are provided to you. Assist the user in answering any country related questions with these tools.
"""

# Flat agent: calls the query engine tools directly
flat_agent = OpenAIAgent.from_tools(
    [vector_tool, sql_tool],
    system_prompt=router_prompt,
    verbose=True
)

# Hierarchical agent: calls the two sub-agents, which in turn call the query engine tools
hierarchical_agent = OpenAIAgent.from_tools(
    [vector_agent_tool, sql_agent_tool],
    system_prompt=router_prompt,
    verbose=True
)


### Hierarchical vs Flat Agent

Now that we have our agents set up, we can run some simple and some more complicated tasks to test them out. Simple tasks can be completed with a single tool or sub-agent, while more complicated tasks require combining information from multiple sub-agents and sub-queries. As a reminder, the structure of our agents is:

Flat Agent:
* SQL Query Engine Tool (Countries DB)
* Vector Index Query Engine Tool (Countries Wikipedia Pages)

Hierarchical Agent:
* SQL Agent
    * SQL Query Engine Tool (Countries DB)
* Vector Index Agent
    * Vector Index Query Engine Tool (Countries Wikipedia Pages)
    
Let's start with some simple questions that only require a single tool to answer correctly:

In [11]:
print(flat_agent.chat('what are some similarities between norway and sweden'))

=== Calling Function ===
Calling function: country_tool with args: {
  "input": "similarities between Norway and Sweden"
}
Got output: Norway and Sweden share several similarities. Both countries are located in Northern Europe and are part of the Scandinavian Peninsula. They have a long history of political and cultural connections, including a period of union between 1814 and 1905. Both countries have a constitutional monarchy and share a similar political system. Additionally, Norway and Sweden have a similar standard of living and are known for their high levels of social welfare and quality of life. They also have a strong tradition of environmental conservation and are known for their beautiful natural landscapes, including fjords and mountains.
Some similarities between Norway and Sweden include:

1. Geographical Location: Both Norway and Sweden are located in Northern Europe and share the Scandinavian Peninsula. They have a similar climate and are surrounded by the Baltic Sea, N

In [12]:
print(hierarchical_agent.chat('what are some similarities between norway and sweden'))

=== Calling Function ===
Calling function: country_agent with args: {
  "input": "similarities between Norway and Sweden"
}
=== Calling Function ===
Calling function: country_tool with args: {
  "input": "Norway"
}
Got output: Norway is a Nordic country located in Northern Europe. It is situated on the western and northernmost part of the Scandinavian Peninsula. The country has a total area of 385,207 square kilometers and a population of 5,488,984. Norway shares borders with Sweden, Finland, and Russia, and has a long coastline facing the North Atlantic Ocean and the Barents Sea. The capital and largest city of Norway is Oslo. Norway is known for its extensive reserves of petroleum, natural gas, minerals, lumber, seafood, and fresh water. It has a strong welfare model with universal healthcare and a comprehensive social security system. Norway is a constitutional monarchy with a unitary state structure and a parliamentary system. It is a member of various international organizations, 

### Interpretation

Both agents used the Wikipedia tool to answer a generic question about two countries, as expected. The main difference is that the flat agent delegated the entire question ("similarities between Norway and Sweden") to the query engine, whereas the sub-agent in the hierarchical agent broke the question into two queries, "Norway" and "Sweden", and then synthesized the similarities from the results.

Which retrieval method is more effective will likely depend on your data and its format. Note that both methods are susceptible to hallucinating or adding extra content: the retrieved chunks in both examples don't mention the outdoor activities that appear in the answers.
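The hierarchical sub-agent's strategy amounts to fan-out and synthesis: issue one narrow query per entity, then combine the results. The sketch below shows that shape with a deliberately simple synthesis step, a set intersection of facts. The helper `compare_by_fan_out` and the fact sets are hypothetical; in the notebook, retrieval goes through the vector query engine and the LLM does the synthesis.

```python
from typing import Callable


def compare_by_fan_out(entities: list[str], retrieve: Callable[[str], set[str]]) -> list[str]:
    """Issue one sub-query per entity, then keep only the facts they all share."""
    facts = {entity: retrieve(entity) for entity in entities}  # one narrow query each
    shared = set.intersection(*facts.values())                 # toy "synthesis" step
    return sorted(shared)


# Hypothetical fact sets, not retrieved from the index
toy_facts = {
    "Norway": {"Nordic country", "constitutional monarchy", "Scandinavian Peninsula", "large petroleum reserves"},
    "Sweden": {"Nordic country", "constitutional monarchy", "Scandinavian Peninsula", "EU member"},
}
print(compare_by_fan_out(["Norway", "Sweden"], toy_facts.__getitem__))
```

Querying each country separately means each retrieval only has to match one page, which may explain why the hierarchical agent's sub-agent preferred it; whether that beats a single combined query still depends on the data, as noted above.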

In [13]:
print(flat_agent.chat('what is the name and gdp of the country with the most airports and a population under 10m'))

=== Calling Function ===
Calling function: sql_tool with args: {
  "input": "SELECT airports.AirportCountry AS Country, gdp.MillionsGDP AS GDP FROM airports JOIN gdp ON airports.AirportCountry = gdp.CountryName JOIN population ON airports.AirportCountry = population.Country WHERE population.ThousandsPopulation < 10000 ORDER BY airports.TotalTravelers DESC LIMIT 1"
}
Got output: The country with the highest number of total travelers and a population of less than 10,000 is the United Arab Emirates, with a GDP of 507,535 million.
The country with the most airports and a population under 10 million is the United Arab Emirates. The GDP of the United Arab Emirates is 507,535 million.


In [16]:
print(hierarchical_agent.chat('what is the name and gdp of the country with the most airports and a population under 10m'))

=== Calling Function ===
Calling function: country_db_agent with args: {
  "input": "SELECT airports.AirportCountry AS CountryName, gdp.MillionsGDP FROM airports JOIN population ON airports.AirportCountry = population.Country JOIN gdp ON airports.AirportCountry = gdp.CountryName WHERE population.ThousandsPopulation < 10000 GROUP BY airports.AirportCountry ORDER BY COUNT(*) DESC LIMIT 1;"
}
=== Calling Function ===
Calling function: sql_tool with args: {
  "input": "SELECT airports.AirportCountry AS CountryName, gdp.MillionsGDP FROM airports JOIN population ON airports.AirportCountry = population.Country JOIN gdp ON airports.AirportCountry = gdp.CountryName WHERE population.ThousandsPopulation < 10000 GROUP BY airports.AirportCountry ORDER BY COUNT(*) DESC LIMIT 1;"
}
Got output: The country with the highest number of airports and a population of less than 10,000 is Finland. Finland has a GDP of 280,826 million.
Got output: The country with the highest number of airports and a populatio

### Interpretation

Interestingly, the two agents ran different queries. The flat agent's query selects the country whose single airport had the most travelers (among countries with a population under 10 million), while the hierarchical agent's query selects the country with the most airports in the `airports` table. Here, the hierarchical agent answers the question correctly; the flat agent's final answer still describes its result as the country with "the most airports", even though its query ranked airports by traffic.
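To see that the two queries really answer different questions, the sketch below runs both against a tiny in-memory SQLite database. The SQL is adapted from the two agent logs above (aliases dropped, joins reordered); the schema is inferred from those queries, and the rows, including the countries "Alpha" and "Beta", are hypothetical.

```python
import sqlite3

# Hypothetical toy data: "Alpha" has one very busy airport, "Beta" has three quieter ones
toy_db = sqlite3.connect(":memory:")
toy_db.executescript("""
    CREATE TABLE airports (AirportCountry TEXT, TotalTravelers INTEGER);
    CREATE TABLE gdp (CountryName TEXT, MillionsGDP REAL);
    CREATE TABLE population (Country TEXT, ThousandsPopulation REAL);
    INSERT INTO airports VALUES ('Alpha', 80000000), ('Beta', 10000000), ('Beta', 9000000), ('Beta', 8000000);
    INSERT INTO gdp VALUES ('Alpha', 500000), ('Beta', 280000);
    INSERT INTO population VALUES ('Alpha', 9000), ('Beta', 5500);
""")

# Shared FROM/JOIN/WHERE clause: population is in thousands, so 10000 means 10 million
JOINS = """
    FROM airports
    JOIN gdp ON airports.AirportCountry = gdp.CountryName
    JOIN population ON airports.AirportCountry = population.Country
    WHERE population.ThousandsPopulation < 10000
"""

# Flat agent's logic: rank individual airport rows by traffic
FLAT_SQL = (
    "SELECT airports.AirportCountry, gdp.MillionsGDP" + JOINS
    + "ORDER BY airports.TotalTravelers DESC LIMIT 1"
)

# Hierarchical agent's logic: count airport rows per country
HIERARCHICAL_SQL = (
    "SELECT airports.AirportCountry, gdp.MillionsGDP" + JOINS
    + "GROUP BY airports.AirportCountry ORDER BY COUNT(*) DESC LIMIT 1"
)

print("flat:", toy_db.execute(FLAT_SQL).fetchone())
print("hierarchical:", toy_db.execute(HIERARCHICAL_SQL).fetchone())
```

On this data the first query returns Alpha (the busiest single airport) and the second returns Beta (the most airports), mirroring the United Arab Emirates vs Finland split in the logs.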

### Joint SQL and Text searching

Let's ask some questions that require the agents to use both SQL and vector searches:
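A question like "what is the culture of the country with the 10th highest population" has two dependent steps: first resolve which country is meant (a SQL lookup), then ask the text index about that country. The sketch below makes the dependency explicit, using the same `ORDER BY ... DESC LIMIT 1 OFFSET 9` pattern the SQL agent generates in the log that follows (OFFSET skips the first nine rows). The helpers, the population rows, and `ask_wiki` (a stand-in for something like `vector_query_engine.query`) are all hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def nth_most_populous(conn: sqlite3.Connection, n: int) -> Optional[str]:
    """Return the n-th most populous country (1-based); OFFSET n-1 skips the first n-1 rows."""
    row = conn.execute(
        "SELECT Country FROM population ORDER BY ThousandsPopulation DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    return row[0] if row else None


def culture_question(conn: sqlite3.Connection, n: int, ask_wiki: Callable[[str], str]) -> str:
    """Step 1: resolve the country with SQL. Step 2: ask the text index about that country."""
    country = nth_most_populous(conn, n)
    if country is None:
        return "Not enough rows in the population table."
    return ask_wiki(f"What is the culture of {country}?")


# Hypothetical population rows in thousands (Country1 is the most populous); not World Bank figures
toy_pop = sqlite3.connect(":memory:")
toy_pop.execute("CREATE TABLE population (Country TEXT, ThousandsPopulation REAL)")
toy_pop.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [(f"Country{i}", 1000.0 * (20 - i)) for i in range(1, 13)],
)
print(culture_question(toy_pop, 10, ask_wiki=lambda q: f"[wiki] {q}"))
```

Sending the whole question to the text index first, as both agents do below, skips step 1, so the index has to guess the country.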


In [17]:
print(flat_agent.chat('what is the culture like of the country with the 10th highest population'))

=== Calling Function ===
Calling function: country_tool with args: {
  "input": "culture of the country with the 10th highest population"
}
Got output: Based on the context information, the country with the 10th highest population is Japan. The culture of Japan is characterized by its rich history, traditions, and religious diversity. Japanese cultural history spans over thousands of years, with influences from various schools of thought such as Shinto, Buddhism, and Confucianism. The Japanese language, with its unique writing system, is the primary language spoken in the country. Traditional customs and practices, such as participating in religious ceremonies and festivals, hold significant importance in Japanese culture. Additionally, Western customs and influences, including Christianity, have also become popular in Japan. The country is known for its homogeneous society, with the majority of the population being of Japanese ethnicity.
The country with the 10th highest population is

In [18]:
print(hierarchical_agent.chat('what is the culture like of the country with the 10th highest population'))

=== Calling Function ===
Calling function: country_agent with args: {
  "input": "culture of the country with the 10th highest population"
}
=== Calling Function ===
Calling function: country_tool with args: {
  "input": "culture of the country with the 10th highest population"
}
Got output: Based on the context information, we cannot determine the country with the 10th highest population. Therefore, we cannot provide information about the culture of that specific country.
Got output: I'm sorry, but I couldn't determine the country with the 10th highest population based on the information provided. Could you please specify the country you are referring to?
=== Calling Function ===
Calling function: country_db_agent with args: {
  "input": "SELECT Country FROM population ORDER BY ThousandsPopulation DESC LIMIT 1 OFFSET 9;"
}
=== Calling Function ===
Calling function: sql_tool with args: {
  "input": "SELECT Country FROM population ORDER BY ThousandsPopulation DESC LIMIT 1 OFFSET 9;"
}
G

### Interpretation

In this case, the country with the 10th highest population according to the dataset is Mexico. Both agents first tried to use the Wikipedia tool or agent, but the hierarchical agent was told it didn't have access to the information, while the flat agent simply received an answer based on Japan, which