This notebook was inspired by this LlamaIndex notebook:

https://colab.research.google.com/drive/1XYNaGvEdyKVbs4g_Maffyq08DUArcW8H?usp=sharing#scrollTo=fQW2ccGlLrg7

I made some changes to it, solely to try out ideas and learn.

Note that I assume the relevant API keys (e.g. `OPENAI_API_KEY` for the OpenAI LLM) are set as environment variables, as are `DATA_DIR` and `PERSIST_DIR`, which the cells below read.
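A small check can report missing variables up front instead of failing later with a bare `KeyError`. This is a minimal sketch: it assumes the OpenAI key is stored in `OPENAI_API_KEY` (the variable the OpenAI client reads by default), and `missing_env_vars` / `REQUIRED_ENV_VARS` are illustrative names, not part of the original notebook.

```python
import os

# Variables this notebook expects (assumption: OPENAI_API_KEY holds the key;
# DATA_DIR and PERSIST_DIR are read by the cells below).
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "DATA_DIR", "PERSIST_DIR"]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing `environ` explicitly makes the helper easy to test with a plain dictionary.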

In [2]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool, ToolMetadata
import os
import subprocess

## Defining global variables

In [3]:
DATA_DIRS = {}
DATA_DIRS["lyft"] = os.path.join(os.environ["DATA_DIR"], "lyft")
DATA_DIRS["uber"] = os.path.join(os.environ["DATA_DIR"], "uber")
PERSIST_DIRS = {}
PERSIST_DIRS["lyft"] = os.path.join(os.environ["PERSIST_DIR"], "lyft")
PERSIST_DIRS["uber"] = os.path.join(os.environ["PERSIST_DIR"], "uber")
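Because both dictionaries follow the same per-company pattern, they could also be built with a dict comprehension. A minimal sketch; `COMPANIES` and `company_dirs` are hypothetical names introduced here for illustration.

```python
import os

# Hypothetical list of companies handled by the notebook.
COMPANIES = ["lyft", "uber"]


def company_dirs(base_dir, companies=COMPANIES):
    """Map each company name to its own subdirectory of base_dir."""
    return {c: os.path.join(base_dir, c) for c in companies}


# Equivalent to the cell above (hypothetical usage):
# DATA_DIRS = company_dirs(os.environ["DATA_DIR"])
# PERSIST_DIRS = company_dirs(os.environ["PERSIST_DIR"])
```

Adding a third company then only means extending `COMPANIES`.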

## Download data

In [4]:
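for c in ["uber", "lyft"]:
    if not os.path.exists(DATA_DIRS[c]):
        # makedirs also creates DATA_DIR itself if it does not exist yet
        os.makedirs(DATA_DIRS[c])
        url = f"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/{c}_2021.pdf"
        # pass the arguments as a list (no shell quoting) and raise if wget fails
        subprocess.run(["wget", url, "-O", os.path.join(DATA_DIRS[c], f"{c}_2021.pdf")], check=True)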
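If `wget` is not installed, the same download can be done from the Python standard library with `urllib.request.urlretrieve`. The sketch below is an alternative, not what the original notebook does; `ensure_filing` and `BASE_URL` are hypothetical names, and the URL pattern is copied from the `wget` command above.

```python
import os
import urllib.request

# Same URL pattern as the wget command above.
BASE_URL = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k"


def ensure_filing(company, data_dir, fetch=urllib.request.urlretrieve):
    """Download {company}_2021.pdf into data_dir unless it is already there.

    `fetch(url, path)` defaults to urllib's urlretrieve and is injectable so
    the logic can be exercised without network access. Returns the local path.
    """
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f"{company}_2021.pdf")
    if not os.path.exists(path):
        fetch(f"{BASE_URL}/{company}_2021.pdf", path)
    return path


# Hypothetical usage in the notebook:
# for c in ["uber", "lyft"]:
#     ensure_filing(c, DATA_DIRS[c])
```

One design difference: this checks for the PDF itself rather than its directory, so a failed earlier download is retried instead of being silently skipped.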

## Create an index and query engine for each company

In [5]:
query_engine_dict = {}
for c in ["uber", "lyft"]:
    if not os.path.exists(PERSIST_DIRS[c]):
        print("Creating Index")
        # load the documents and create the index
        documents = SimpleDirectoryReader(DATA_DIRS[c]).load_data()
        index = VectorStoreIndex.from_documents(documents)
        # store it for later
        index.storage_context.persist(persist_dir=PERSIST_DIRS[c])
    else:
        print("Loading Index")
        # load the existing index
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIRS[c])
        index = load_index_from_storage(storage_context)
    
    query_engine_dict[c] = index.as_query_engine(similarity_top_k=3)
    

Loading Index
Loading Index
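The cell above builds each index on the first run, persists it, and loads it from `PERSIST_DIRS[c]` on later runs. That load-or-build pattern can be factored into a helper. A minimal sketch: `load_or_build` is a hypothetical name, and it takes the build/load/persist steps as callables so it does not depend on LlamaIndex itself.

```python
import os


def load_or_build(persist_dir, build, load, persist):
    """Return an object restored from persist_dir, or build and persist it.

    build()                   -> new object
    load(persist_dir)         -> object restored from disk
    persist(obj, persist_dir) -> writes obj to disk
    """
    if os.path.exists(persist_dir):
        print("Loading Index")
        return load(persist_dir)
    print("Creating Index")
    obj = build()
    persist(obj, persist_dir)
    return obj


# Hypothetical usage inside the notebook loop, reusing the calls from the cell above:
# index = load_or_build(
#     PERSIST_DIRS[c],
#     build=lambda: VectorStoreIndex.from_documents(
#         SimpleDirectoryReader(DATA_DIRS[c]).load_data()),
#     load=lambda d: load_index_from_storage(StorageContext.from_defaults(persist_dir=d)),
#     persist=lambda idx, d: idx.storage_context.persist(persist_dir=d),
# )
```

Like the original cell, this only checks that the directory exists, so a partially written index directory would still be treated as valid.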


## Define tools

In [6]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine_dict["lyft"],
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=query_engine_dict["uber"],
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
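The two tool definitions differ only in the company name, so the metadata could be generated instead of duplicated. A minimal sketch; `tool_metadata_kwargs` is a hypothetical helper whose output reproduces the names and descriptions above and is meant to be unpacked into `ToolMetadata(name=..., description=...)`.

```python
def tool_metadata_kwargs(company, year=2021):
    """Build the name/description pair used for each company's 10-K tool."""
    return {
        "name": f"{company}_10k",
        "description": (
            f"Provides information about {company.capitalize()} financials for year {year}. "
            "Use a detailed plain text question as input to the tool."
        ),
    }


# Hypothetical notebook usage:
# query_engine_tools = [
#     QueryEngineTool(
#         query_engine=query_engine_dict[c],
#         metadata=ToolMetadata(**tool_metadata_kwargs(c)),
#     )
#     for c in ["lyft", "uber"]
# ]
```

Keeping the descriptions in one place also keeps the wording the LLM sees identical across tools.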

## Agent

In [7]:
llm_gpt4 = OpenAI(model="gpt-4")
gpt4_agent_worker = FunctionCallingAgentWorker.from_tools(
    query_engine_tools,
    llm=llm_gpt4,
    verbose=True,
    allow_parallel_tool_calls=False,  # call one tool at a time
)
gpt4_agent = AgentRunner(gpt4_agent_worker)
response = gpt4_agent.chat("Compare the revenue growth of Uber and Lyft in 2021.")

Added user message to memory: Compare the revenue growth of Uber and Lyft in 2021.


=== Calling Function ===
Calling function: uber_10k with args: {"input": "What was Uber's revenue growth in 2021?"}
=== Function Output ===
Uber's revenue growth in 2021 was 57%.
=== Calling Function ===
Calling function: lyft_10k with args: {"input": "What was Lyft's revenue growth in 2021?"}
=== Function Output ===
Lyft's revenue increased by 36% in 2021 compared to the prior year.
=== LLM Response ===
In 2021, Uber's revenue growth was 57%, while Lyft's revenue increased by 36%. Therefore, Uber had a higher revenue growth compared to Lyft in the same year.
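The log shows the agent issuing one function call at a time (parallel tool calls are disabled), each with a JSON argument of the form `{"input": ...}`. The toy sketch below mimics only that dispatch step, with hard-coded calls and stub tools standing in for the LLM and the query engines. It is an illustration under those assumptions, not how `FunctionCallingAgentWorker` is implemented; `run_tool_calls` and `stub_tools` are hypothetical names.

```python
import json


def run_tool_calls(tools, calls):
    """Execute (tool_name, json_args) pairs sequentially, in the given order.

    tools: dict mapping a tool name to a callable taking the question string.
    calls: sequence of (name, json_string) pairs, as an LLM might emit them.
    Returns a list of (name, output) pairs in call order.
    """
    results = []
    for name, raw_args in calls:
        print(f"Calling function: {name} with args: {raw_args}")
        args = json.loads(raw_args)  # e.g. {"input": "What was ...?"}
        output = tools[name](args["input"])
        print(f"Function output: {output}")
        results.append((name, output))
    return results


# Stub tools standing in for the real query engines (no real data).
stub_tools = {
    "uber_10k": lambda q: f"[uber stub] {q}",
    "lyft_10k": lambda q: f"[lyft stub] {q}",
}
run_tool_calls(stub_tools, [
    ("uber_10k", json.dumps({"input": "What was Uber's revenue growth in 2021?"})),
    ("lyft_10k", json.dumps({"input": "What was Lyft's revenue growth in 2021?"})),
])
```

In the real agent, the LLM decides which calls to make and then writes the final comparison from the collected outputs.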
