<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/agent/openai_agent_query_plan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Agent Query Planning
In this demo, we explore adding a `QueryPlanTool` to an `OpenAIAgent`. This effectively enables the agent
to do advanced query planning, all through a single tool!

The `QueryPlanTool` is designed to work well with the OpenAI Function API. The tool takes in a set of other tools as input.
The tool's function signature consists of a `QueryPlan` Pydantic object, which in turn contains a DAG of `QueryNode` objects defining a compute graph.
The agent is responsible for defining this graph through the function signature when calling the tool. The tool itself executes the DAG over any corresponding tools.
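
To make the idea concrete, here is a minimal sketch of how a DAG of query nodes could be executed. It is not LlamaIndex's implementation: `PlanNode`, `find_root`, `execute_plan`, and the `combine` callback are hypothetical names. The sketch assumes that leaf nodes are answered by a tool, that nodes with dependencies are answered by combining their children's answers, and that the graph is acyclic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a query node: one question plus its dependencies."""

    id: int
    query_str: str
    tool_name: str
    dependencies: List[int] = field(default_factory=list)


def find_root(nodes: List[PlanNode]) -> PlanNode:
    """Return the single node that no other node depends on."""
    depended_on = {dep for node in nodes for dep in node.dependencies}
    roots = [node for node in nodes if node.id not in depended_on]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def execute_plan(
    nodes: List[PlanNode],
    tools: Dict[str, Callable[[str], str]],
    combine: Callable[[str, List[str]], str],
) -> str:
    """Execute the plan recursively, starting from the root node."""
    by_id = {node.id: node for node in nodes}
    cache: Dict[int, str] = {}  # avoid re-running a node shared by several parents

    def run(node: PlanNode) -> str:
        if node.id in cache:
            return cache[node.id]
        if node.dependencies:
            # Assumption: a node with dependencies combines its children's answers.
            child_answers = [run(by_id[dep]) for dep in node.dependencies]
            answer = combine(node.query_str, child_answers)
        else:
            # Leaf node: route the question to the named tool.
            answer = tools[node.tool_name](node.query_str)
        cache[node.id] = answer
        return answer

    return run(find_root(nodes))


# Toy tools and plan, purely illustrative.
tools = {
    "q1": lambda q: "Q1 revenue: 10",
    "q2": lambda q: "Q2 revenue: 12",
}
plan = [
    PlanNode(1, "Revenue in Q1?", "q1"),
    PlanNode(2, "Revenue in Q2?", "q2"),
    PlanNode(3, "Compare Q1 and Q2", "unused", [1, 2]),
]
result = execute_plan(plan, tools, lambda q, answers: f"{q}: " + " | ".join(answers))
print(result)
```

Starting from the root and recursing into dependencies means each leaf query runs only when something actually needs its answer.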

In this setting we use a familiar example: Uber 10Q filings in March, June, and September of 2022.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
!pip install llama-index

Collecting llama-index
  Downloading llama_index-0.9.41-py3-none-any.whl (15.9 MB)
Collecting dataclasses-json (from llama-index)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting deprecated>=1.2.9.3 (from llama-index)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index)
  Downloading dirtyjson-1.0.8-py3-none-any.whl (25 kB)
Collecting httpx (from llama-index)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting openai>=1.1.0 (from llama-index)
  Downloading openai-1.10.0-py3-none-any.whl (225 kB)

In [None]:
# # uncomment to turn on logging
# import logging
# import sys

# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [8]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.0.1-py3-none-any.whl (283 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.0.1


In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import os
import openai

os.environ["OPENAI_API_KEY"] = "Your api key goes here"

In [4]:
from llama_index import (
    SimpleDirectoryReader,
    ServiceContext,
    GPTVectorStoreIndex,
)
from llama_index.response.pprint_utils import pprint_response
from llama_index.llms import OpenAI

In [5]:
llm = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)

## Download Data

In [6]:
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'

--2024-02-02 06:54:02--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1260185 (1.2M) [application/octet-stream]
Saving to: ‘data/10q/uber_10q_march_2022.pdf’


2024-02-02 06:54:02 (133 MB/s) - ‘data/10q/uber_10q_march_2022.pdf’ saved [1260185/1260185]

--2024-02-02 06:54:02--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response...

## Load data

In [9]:
march_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()

## Build indices

We build a vector index / query engine over each of the documents (March, June, September).

In [10]:
march_index = GPTVectorStoreIndex.from_documents(march_2022)
june_index = GPTVectorStoreIndex.from_documents(june_2022)
sept_index = GPTVectorStoreIndex.from_documents(sept_2022)

In [11]:
march_engine = march_index.as_query_engine(
    similarity_top_k=3, service_context=service_context
)
june_engine = june_index.as_query_engine(
    similarity_top_k=3, service_context=service_context
)
sept_engine = sept_index.as_query_engine(
    similarity_top_k=3, service_context=service_context
)
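
Setting `similarity_top_k=3` asks each query engine to retrieve the three chunks most similar to the query. The sketch below illustrates that idea with plain cosine similarity over toy vectors. The helper names (`cosine`, `top_k`) and the vectors are illustrative assumptions; the real index works on embeddings produced by an embedding model, and this is not how LlamaIndex implements retrieval internally.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings", purely illustrative.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [2.0, 0.1]]
hits = top_k([1.0, 0.0], chunks, k=3)
for index, score in hits:
    print(index, round(score, 3))
```

A larger `k` gives the LLM more context per sub-query at the cost of more tokens per call.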

## OpenAI Function Agent with a Query Plan Tool

We use an `OpenAIAgent`, which is built on top of the OpenAI tool use interface.

We pass it our `QueryPlanTool`, a tool that takes in other tools, and ask the agent to generate a query plan DAG over these tools.

In [12]:
from llama_index.tools import QueryEngineTool


query_tool_sept = QueryEngineTool.from_defaults(
    query_engine=sept_engine,
    name="sept_2022",
    description=(
        "Provides information about Uber quarterly financials ending"
        " September 2022"
    ),
)
query_tool_june = QueryEngineTool.from_defaults(
    query_engine=june_engine,
    name="june_2022",
    description=(
        "Provides information about Uber quarterly financials ending June"
        " 2022"
    ),
)
query_tool_march = QueryEngineTool.from_defaults(
    query_engine=march_engine,
    name="march_2022",
    description=(
        "Provides information about Uber quarterly financials ending March"
        " 2022"
    ),
)

In [13]:
# define query plan tool
from llama_index.tools import QueryPlanTool
from llama_index import get_response_synthesizer

response_synthesizer = get_response_synthesizer(
    service_context=service_context
)
query_plan_tool = QueryPlanTool.from_defaults(
    query_engine_tools=[query_tool_sept, query_tool_june, query_tool_march],
    response_synthesizer=response_synthesizer,
)

In [14]:
query_plan_tool.metadata.to_openai_tool()  # to_openai_function() deprecated

{'type': 'function',
 'function': {'name': 'query_plan_tool',
  'description': '        This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions are as follows:\n\n\n\n        Tool Name: sept_2022\nTool Description: Provides information about Uber quarterly financials ending September 2022 \n\nTool Name: june_2022\nTool Description: Provides information about Uber quarterly financials ending June 2022 \n\nTool Name: march_2022\nTool Description: Provides information about Uber quarterly financials ending March 2022 \n        ',
  'parameters': {'type': 'object',
   'properties': {'nodes': {'title': 'Nodes',
     'description': 'The original question we are asking.',
     'type': 'array',
     'items': {'$ref': '#/definitions/QueryNode'

In [15]:
from llama_index.agent import OpenAIAgent
from llama_index.llms import OpenAI


agent = OpenAIAgent.from_tools(
    [query_plan_tool],
    max_function_calls=10,
    llm=OpenAI(temperature=0, model="gpt-4-0613"),
    verbose=True,
)

In [16]:
response = agent.query("What were the risk factors in sept 2022?")

Added user message to memory: What were the risk factors in sept 2022?
=== Calling Function ===
Calling function: query_plan_tool with args: {
  "nodes": [
    {
      "id": 1,
      "query_str": "What were the risk factors in sept 2022?",
      "tool_name": "sept_2022",
      "dependencies": []
    }
  ]
}
Executing node {"id": 1, "query_str": "What were the risk factors in sept 2022?", "tool_name": "sept_2022", "dependencies": []}
Selected Tool: ToolMetadata(description='Provides information about Uber quarterly financials ending September 2022', name='sept_2022', fn_schema=<class 'llama_index.tools.types.DefaultToolFnSchema'>)
Executed query, got response.
Query: What were the risk factors in sept 2022?
Response: The risk factors in September 2022 included failure to meet regulatory requirements related to climate change or to meet stated climate change commitments, which could impact costs, operations, brand, and reputation. Outbreaks of con

In [17]:
from llama_index.tools.query_plan import QueryPlan, QueryNode

query_plan = QueryPlan(
    nodes=[
        QueryNode(
            id=1,
            query_str="risk factors",
            tool_name="sept_2022",
            dependencies=[],
        )
    ]
)

In [18]:
QueryPlan.schema()

{'title': 'QueryPlan',
 'description': "Query plan.\n\nContains a list of QueryNode objects (which is a recursive object).\nOut of the list of QueryNode objects, one of them must be the root node.\nThe root node is the one that isn't a dependency of any other node.",
 'type': 'object',
 'properties': {'nodes': {'title': 'Nodes',
   'description': 'The original question we are asking.',
   'type': 'array',
   'items': {'$ref': '#/definitions/QueryNode'}}},
 'required': ['nodes'],
 'definitions': {'QueryNode': {'title': 'QueryNode',
   'description': 'Query node.\n\nA query node represents a query (query_str) that must be answered.\nIt can either be answered by a tool (tool_name), or by a list of child nodes\n(child_nodes).\nThe tool_name and child_nodes fields are mutually exclusive.',
   'type': 'object',
   'properties': {'id': {'title': 'Id',
     'description': 'ID of the query node.',
     'type': 'integer'},
    'query_str': {'title': 'Query Str',
     'description': 'Question we 
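
A JSON schema like this constrains the shape of a plan, but it cannot express that every dependency id refers to an existing node or that the graph is acyclic. The following is a hedged sketch of such a check using Kahn's algorithm on the raw JSON arguments; `topological_order` is a hypothetical helper and not part of LlamaIndex.

```python
import json
from collections import deque
from typing import Dict, List


def topological_order(plan_json: str) -> List[int]:
    """Return node ids in dependency-first order, or raise ValueError if the plan is malformed."""
    nodes = json.loads(plan_json)["nodes"]
    deps: Dict[int, List[int]] = {n["id"]: list(n["dependencies"]) for n in nodes}
    if len(deps) != len(nodes):
        raise ValueError("duplicate node ids")
    for node_id, node_deps in deps.items():
        for dep in node_deps:
            if dep not in deps:
                raise ValueError(f"node {node_id} depends on unknown node {dep}")

    # Kahn's algorithm: repeatedly emit nodes whose dependencies are all resolved.
    remaining = {node_id: len(node_deps) for node_id, node_deps in deps.items()}
    dependents: Dict[int, List[int]] = {node_id: [] for node_id in deps}
    for node_id, node_deps in deps.items():
        for dep in node_deps:
            dependents[dep].append(node_id)

    ready = deque(sorted(i for i, count in remaining.items() if count == 0))
    order: List[int] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in dependents[current]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected in query plan")
    return order


# A plan with three independent leaf queries and one node that depends on all of them.
plan_json = json.dumps(
    {
        "nodes": [
            {"id": 1, "query_str": "Revenue for March?", "tool_name": "march_2022", "dependencies": []},
            {"id": 2, "query_str": "Revenue for June?", "tool_name": "june_2022", "dependencies": []},
            {"id": 3, "query_str": "Revenue for September?", "tool_name": "sept_2022", "dependencies": []},
            {"id": 4, "query_str": "Analyze growth", "tool_name": "sept_2022", "dependencies": [1, 2, 3]},
        ]
    }
)
order = topological_order(plan_json)
print(order)
```

Running a check like this before execution turns a malformed plan into an immediate, readable error rather than a failure partway through the tool calls.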

In [19]:
response = agent.query(
    "Analyze Uber revenue growth in March, June, and September"
)

Added user message to memory: Analyze Uber revenue growth in March, June, and September
=== Calling Function ===
Calling function: query_plan_tool with args: {
  "nodes": [
    {
      "id": 1,
      "query_str": "What is Uber's revenue for March 2022?",
      "tool_name": "march_2022",
      "dependencies": []
    },
    {
      "id": 2,
      "query_str": "What is Uber's revenue for June 2022?",
      "tool_name": "june_2022",
      "dependencies": []
    },
    {
      "id": 3,
      "query_str": "What is Uber's revenue for September 2022?",
      "tool_name": "sept_2022",
      "dependencies": []
    },
    {
      "id": 4,
      "query_str": "Analyze Uber revenue growth in March, June, and September",
      "tool_name": "revenue_growth_analyzer",
      "dependencies": [1, 2, 3]
    }
  ]
}
Executing node {"id": 4, "query_str": "Analyze Uber revenue growth in March, June, and September", "tool_name": "revenue_growth_analyzer", "dependencies": [1, 2, 3]}

In [20]:
print(str(response))

Uber's revenue has shown a growth trend from March to June 2022. In March, the revenue was $6.854 billion, which increased to $8.073 billion in June. The exact revenue for September is not specified, but the total revenue for the third quarter, which includes July, August, and September, was $8.343 billion.
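
The response reports revenue figures but no growth rates. As a worked example, the quarter-over-quarter growth can be computed directly from the numbers quoted in the answer. The figures below are copied from the agent's response as printed and have not been checked against the filings; `growth_rates` is a hypothetical helper.

```python
from typing import Dict

# Quarterly revenue in billions of USD, as quoted in the agent's response above.
revenue_billions = {"March": 6.854, "June": 8.073, "September": 8.343}


def growth_rates(series: Dict[str, float]) -> Dict[str, float]:
    """Percentage change between each consecutive pair of periods."""
    periods = list(series)
    return {
        f"{prev} -> {curr}": (series[curr] - series[prev]) / series[prev] * 100
        for prev, curr in zip(periods, periods[1:])
    }


rates = growth_rates(revenue_billions)
for period, pct in rates.items():
    print(f"{period}: {pct:.1f}%")
```

On these figures, growth is roughly 17.8% from the March quarter to the June quarter and roughly 3.3% from June to September.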


In [21]:
response = agent.query(
    "Analyze changes in risk factors in march, june, and september for Uber"
)

Added user message to memory: Analyze changes in risk factors in march, june, and september for Uber
=== Calling Function ===
Calling function: query_plan_tool with args: {
  "nodes": [
    {
      "id": 1,
      "query_str": "What were the risk factors for Uber in March 2022?",
      "tool_name": "march_2022",
      "dependencies": []
    },
    {
      "id": 2,
      "query_str": "What were the risk factors for Uber in June 2022?",
      "tool_name": "june_2022",
      "dependencies": []
    },
    {
      "id": 3,
      "query_str": "What were the risk factors for Uber in September 2022?",
      "tool_name": "sept_2022",
      "dependencies": []
    },
    {
      "id": 4,
      "query_str": "Analyze changes in risk factors in March, June, and September for Uber",
      "tool_name": "risk_analysis_tool",
      "dependencies": [1, 2, 3]
    }
  ]
}
Executing node {"id": 4, "query_str": "Analyze changes in risk factors in March, June, and September for Uber", "tool_name": "ri

In [22]:
print(str(response))

In March 2022, Uber's risk factors were largely centered around the impact of COVID-19, the reliance on large metropolitan areas for a significant portion of their Gross Bookings, the potential failure of autonomous vehicle technologies, and the need for additional capital to support business growth. They also faced risks related to data privacy and security breaches, the impact of climate change, and the potential inability to protect their intellectual property.

By June 2022, the risk factors had evolved. The inherent danger of operating motor vehicles became more prominent due to the growth of their Delivery offering. The risk associated with their substantial investments in new offerings and technologies also became more significant. Their dependence on operations outside the United States, particularly in markets where they had limited experience, was another risk factor that emerged. The COVID-19 pandemic continued to pose a significant risk.

In September 2022, the risk factors

In [None]:
# response = agent.query("Analyze both Uber revenue growth and risk factors over march, june, and september")

In [24]:
response = agent.query(
    "First look at Uber's revenue growth and risk factors in March, "
    "then revenue growth and risk factors in September, and then compare and"
    " contrast the two documents?"
)

Added user message to memory: First look at Uber's revenue growth and risk factors in March, then revenue growth and risk factors in September, and then compare and contrast the two documents?
=== Calling Function ===
Calling function: query_plan_tool with args: {
  "nodes": [
    {
      "id": 1,
      "query_str": "What is Uber's revenue growth and risk factors in March 2022?",
      "tool_name": "march_2022",
      "dependencies": []
    },
    {
      "id": 2,
      "query_str": "What is Uber's revenue growth and risk factors in September 2022?",
      "tool_name": "sept_2022",
      "dependencies": []
    },
    {
      "id": 3,
      "query_str": "Compare and contrast the revenue growth and risk factors of Uber in March 2022 and September 2022.",
      "tool_name": "comparison_tool",
      "dependencies": [1, 2]
    }
  ]
}
Executing node {"id": 3, "query_str": "Compare and contrast the revenue growth and risk factors of Uber in March 2022 and September 2022.", "tool_nam

In [25]:
response

Response(response="In March 2022, Uber's revenue was $6.9 billion, marking a 136% increase year-over-year. This growth was primarily due to a $1.5 billion increase in the Freight business, a favorable impact to revenue of $200 million due to a change in the accounting model for the UK Mobility business, and a favorable comparative impact to the same period in 2021. The risk factors identified during this period were the COVID-19 pandemic, seasonal fluctuations in financial results, and the possibility of slower-than-expected growth. The pandemic reduced global demand for Mobility offerings, and the company's operating results fluctuated due to factors such as seasonal demand changes. If Uber's growth slowed more significantly than expected, it could impact its profitability.\n\nIn contrast, in September 2022, Uber's revenue increased by 72% for the three months ended, and by 99% for the nine months ended, compared to the same periods in 2021. This growth was mainly due to an increase i