# Context-Augmented OpenAI Agent

In this tutorial, we show how to use our `ContextRetrieverOpenAIAgent` implementation
to build an agent on top of OpenAI's function calling API and store/index an arbitrary number of tools. Our indexing/retrieval modules help when there are too many functions to fit in the prompt.

## Initial Setup 

Here we set up a `ContextRetrieverOpenAIAgent`. This agent performs retrieval before calling any tools, which can help ground its tool selection and answers in the retrieved context.
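
As a rough mental model of "retrieve first, then call tools", the sketch below fetches context for a query and wraps it in a prompt before any tool is chosen. The template mirrors the one printed in the agent's verbose output later in this tutorial; `toy_retrieve` and `build_augmented_prompt` are hypothetical names used only for illustration and are not part of llama_index.

```python
from typing import Callable, List

# Template copied from the agent's verbose output shown later in this tutorial.
CONTEXT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "either pick the corresponding tool or answer the function: {query}\n"
)


def toy_retrieve(query: str, texts: List[str]) -> List[str]:
    """Hypothetical stand-in for a retriever: keep texts sharing a word with the query."""
    query_words = set(query.replace("?", "").split())
    return [t for t in texts if query_words & set(t.split())]


def build_augmented_prompt(query: str, retrieve: Callable[[str], List[str]]) -> str:
    """Retrieve context first, then embed it in the message sent to the model."""
    context = "\n\n".join(retrieve(query))
    return CONTEXT_TEMPLATE.format(context=context, query=query)


texts = ["Abbreviation: YZ = Risk Factors", "Abbreviation: X = Revenue"]
prompt = build_augmented_prompt(
    "What is the YZ of March 2022?", lambda q: toy_retrieve(q, texts)
)
print(prompt)
```

Only the abbreviation that matches the query ends up in the prompt, so the model can translate "YZ" before deciding which tool to call.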

In [5]:
import json
from typing import Sequence

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.tools import QueryEngineTool, ToolMetadata

In [6]:
try:
    storage_context = StorageContext.from_defaults(persist_dir="./storage/march")
    march_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(persist_dir="./storage/june")
    june_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(persist_dir="./storage/sept")
    sept_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # fall back to building the indexes if any of them cannot be loaded
    index_loaded = False

In [7]:
# build indexes across the three data sources

if not index_loaded:
    # load data
    march_docs = SimpleDirectoryReader(
        input_files=["../data/10q/uber_10q_march_2022.pdf"]
    ).load_data()
    june_docs = SimpleDirectoryReader(
        input_files=["../data/10q/uber_10q_june_2022.pdf"]
    ).load_data()
    sept_docs = SimpleDirectoryReader(
        input_files=["../data/10q/uber_10q_sept_2022.pdf"]
    ).load_data()

    # build index
    march_index = VectorStoreIndex.from_documents(march_docs)
    june_index = VectorStoreIndex.from_documents(june_docs)
    sept_index = VectorStoreIndex.from_documents(sept_docs)

    # persist index
    march_index.storage_context.persist(persist_dir="./storage/march")
    june_index.storage_context.persist(persist_dir="./storage/june")
    sept_index.storage_context.persist(persist_dir="./storage/sept")
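
The two cells above follow a load-or-build caching pattern: reuse a persisted index if one exists, otherwise build it and persist it for next time. Below is a minimal, library-free sketch of the same idea, using a JSON file as a stand-in for the index store. `load_or_build`, `expensive_build`, and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_build(path: Path, build: Callable[[], dict]) -> dict:
    """Return the cached object at `path`, building and persisting it if absent."""
    if path.exists():
        return json.loads(path.read_text())
    obj = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(obj))
    return obj


build_calls = []


def expensive_build() -> dict:
    """Stand-in for reading documents and building an index."""
    build_calls.append(1)
    return {"docs": ["march", "june", "sept"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "storage" / "index.json"
    first = load_or_build(cache_path, expensive_build)   # builds and persists
    second = load_or_build(cache_path, expensive_build)  # loads from disk

print(len(build_calls))
```

Checking for the persisted file explicitly, rather than catching every exception, keeps unrelated failures visible.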

In [8]:
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)
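
`similarity_top_k=3` asks each query engine to retrieve the three highest-scoring chunks for a query. The sketch below illustrates top-k selection by cosine similarity on hand-written toy vectors. It is a conceptual illustration, not llama_index's implementation, and `cosine` and `top_k_by_cosine` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and keep the k best."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-d "embeddings"; real embeddings have far more dimensions.
chunks = {
    "revenue": [1.0, 0.0],
    "costs": [0.0, 1.0],
    "revenue_and_costs": [1.0, 1.0],
}
top = top_k_by_cosine([1.0, 0.1], chunks, k=2)
print([name for name, _ in top])
```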

In [9]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(
            name="uber_march_10q",
            description="Provides information about Uber 10Q filings for March 2022. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(
            name="uber_june_10q",
            description="Provides information about Uber financials for June 2022. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(
            name="uber_sept_10q",
            description="Provides information about Uber financials for Sept 2022. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
]

### Try Context-Augmented Agent

Here we augment our agent with context in different settings:
- toy context: we define some abbreviations that map to financial terms (e.g. X = Revenue) and supply them as context to the agent.

In [10]:
from llama_index.schema import Document
from llama_index.agent import ContextRetrieverOpenAIAgent

In [28]:
# toy index - stores a list of abbreviations
texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)

In [29]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools, context_index.as_retriever(similarity_top_k=1), verbose=True
)

In [30]:
response = context_agent.chat("What is the YZ of March 2022?")

Context information is below.
---------------------
Abbreviation: YZ = Risk Factors
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: What is the YZ of March 2022?

=== Calling Function ===
Calling function: uber_march_10q with args: {
  "input": "Risk Factors"
}
Got output: 
•The COVID-19 pandemic and the impact of actions to mitigate the pandemic have adversely affected and may continue to adversely affect parts of our business.
•Our business would be adversely affected if Drivers were classified as employees, workers or quasi-employees instead of independent contractors.
•The mobility, delivery, and logistics industries are highly competitive, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.
•To remain competitive in certain marke
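
The log above shows the model choosing a function name and a JSON argument string, after which the agent runs the matching tool. Here is a minimal sketch of that dispatch step, with stub callables standing in for the three query engines. `dispatch_function_call` and the stub outputs are hypothetical.

```python
import json
from typing import Callable, Dict


def dispatch_function_call(
    name: str, arguments: str, tools: Dict[str, Callable[..., str]]
) -> str:
    """Parse the model's JSON arguments and call the tool registered under `name`."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(arguments)
    return tools[name](**kwargs)


# Stub tools standing in for the three query engines.
tools = {
    "uber_march_10q": lambda input: f"[march] {input}",
    "uber_june_10q": lambda input: f"[june] {input}",
    "uber_sept_10q": lambda input: f"[sept] {input}",
}

output = dispatch_function_call("uber_march_10q", '{"input": "Risk Factors"}', tools)
print(output)
```

Note that the argument is already "Risk Factors" rather than "YZ": the retrieved abbreviation let the model expand the term before calling the tool.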

In [31]:
print(str(response))

The risk factors for Uber in March 2022 include:

1. The adverse impact of the COVID-19 pandemic and actions taken to mitigate it on Uber's business.
2. The potential adverse effect on Uber's business if drivers are classified as employees instead of independent contractors.
3. Intense competition in the mobility, delivery, and logistics industries, with low-cost alternatives and well-capitalized competitors.
4. The need to lower fares, offer driver incentives, and provide consumer discounts and promotions to remain competitive in certain markets.
5. Uber's history of significant losses and the expectation of increased operating expenses in the future, which may affect profitability.
6. The importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers to keep the platform appealing.
7. The significance of maintaining and enhancing Uber's brand and reputation, as negative publicity could harm the business.
8. The potential impact of ec

In [None]:
context_agent.chat("What is the X and Z in September 2022?")

### Use Uber 10-Q as context, use Calculator as Tool

In [11]:
from llama_index.tools import BaseTool, FunctionTool


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


magic_tool = FunctionTool.from_defaults(fn=magic_formula, name="magic_formula")
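
For the model to call `magic_formula`, the tool needs a name, a description, and a parameter schema, which `FunctionTool.from_defaults` derives from the function itself. The sketch below shows one way such a schema could be built from type hints and the docstring using `inspect`. It is an illustration only, not the library's actual code; `function_to_schema` and the `JSON_TYPES` mapping are assumptions.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a function-calling style schema from a function's signature and docstring."""
    signature = inspect.signature(fn)
    properties = {
        name: {"type": JSON_TYPES.get(param.annotation, "string")}
        for name, param in signature.parameters.items()
    }
    # Parameters without a default value are treated as required.
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


schema = function_to_schema(magic_formula)
print(schema["parameters"]["properties"])
```

Because the docstring doubles as the description the model sees, it is worth stating clearly what the function computes.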

In [12]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    [magic_tool], sept_index.as_retriever(similarity_top_k=3), verbose=True
)

In [13]:
response = context_agent.chat("Can you run MAGIC_FORMULA on Uber's revenue and cost?")

Context information is below.
---------------------
Three Months Ended September 30, Nine Months Ended September 30,
2021 2022 2021 2022
Revenue 100 % 100 % 100 % 100 %
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 50 % 62 % 53 % 62 %
Operations and support 10 % 7 % 11 % 8 %
Sales and marketing 24 % 14 % 30 % 16 %
Research and development 10 % 9 % 13 % 9 %
General and administrative 13 % 11 % 15 % 10 %
Depreciation and amortization 4 % 3 % 6 % 3 %
Total costs and expenses 112 % 106 % 128 % 107 %
Loss from operations (12)% (6)% (28)% (7)%
Interest expense (3)% (2)% (3)% (2)%
Other income (expense), net (38)% (6)% 16 % (34)%
Loss before income taxes and income (loss) from equity method
investments (52)% (14)% (16)% (43)%
Provision for (benefit from) income taxes (2)% 1 % (3)% — %
Income (loss) from equity method investments — % — % — % — %
Net loss including non-controlling interests (50)% (14)% (12)% (42)%
Less: net in

In [14]:
print(response)

The result of running MAGIC_FORMULA on Uber's revenue and cost is -1690.
