This notebook was inspired by [this LlamaIndex notebook](https://colab.research.google.com/drive/1XYNaGvEdyKVbs4g_Maffyq08DUArcW8H?usp=sharing#scrollTo=fQW2ccGlLrg7).

I made some changes to it purely to try out ideas and learn.

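Note that I am assuming you have the relevant API keys set as environment variables. The `DATA_DIR` and `PERSIST_DIR` environment variables must also be set, since the global variables defined below read them.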
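
A quick way to catch a missing variable before any cell fails is sketched below. `DATA_DIR` and `PERSIST_DIR` are the names the notebook itself reads; `OPENAI_API_KEY` is an assumption about which API key is needed (it is the variable the OpenAI client reads by default), and `missing_env_vars` is a hypothetical helper, not part of `bubls` or LlamaIndex.

```python
import os

# DATA_DIR and PERSIST_DIR are read by the METADATA cell of this notebook.
# OPENAI_API_KEY is an assumption about which API key the notebook needs.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "DATA_DIR", "PERSIST_DIR")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.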

In [2]:
from bubls.utils.data.download import download_file_from_url
from bubls.utils.indexing import create_index_from_path
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import QueryEngineTool, ToolMetadata
import os
# from llama_parse import LlamaParse
# import nest_asyncio
# nest_asyncio.apply()

## Defining global variables

In [3]:
METADATA = {
    "lyft_10k": {
        "source_url": "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf",
        "file_name": "lyft_10k_2021.pdf",
        "save_data_to": os.path.join(os.environ["DATA_DIR"], "lyft_10k"),
        "persist_index_to": os.path.join(os.environ["PERSIST_DIR"], "lyft_10k"),
        },
    "uber_10k": {
        "source_url": "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf",
        "file_name": "uber_10k_2021.pdf",
        "save_data_to": os.path.join(os.environ["DATA_DIR"], "uber_10k"),
        "persist_index_to": os.path.join(os.environ["PERSIST_DIR"], "uber_10k"),
        },
}

## Ingest Data
- Download Information
- Create & Persist or Load Index
- Create Query Engine


In [4]:
# parser = LlamaParse(result_type="markdown")

In [5]:
query_engine_dict = {}
for k, md in METADATA.items():
    download_file_from_url(md["source_url"], md["file_name"], md["save_data_to"])
    index = create_index_from_path(
        md["persist_index_to"],
        md["save_data_to"],
        # {".pdf": parser}
    )
    query_engine_dict[k] = index.as_query_engine(similarity_top_k=3)

Loading Index
Loading Index


## Define tools

In [6]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine_dict[k],
        metadata=ToolMetadata(
            name=k,
            description=(
                f"Provides information about {k} financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    )
    for k in METADATA.keys()
]

## Agent

In [7]:
llm_gpt4 = OpenAI(model="gpt-4")
gpt4_agent = ReActAgent.from_tools(query_engine_tools, llm=llm_gpt4, verbose=True)
response = gpt4_agent.chat("Compare the revenue growth of Uber and Lyft in 2021.")

Thought: To compare the revenue growth of Uber and Lyft in 2021, I need to use the uber_10k and lyft_10k tools to get the financial data for each company.
Action: uber_10k
Action Input: {'input': "What was Uber's revenue growth in 2021?"}
Observation: Uber's revenue grew by 57% in 2021.
Thought: I have the revenue growth for Uber. Now I need to get the revenue growth for Lyft in 2021.
Action: lyft_10k
Action Input: {'input': "What was Lyft's revenue growth in 2021?"}
Observation: Lyft's revenue increased by 36% in 2021 compared to the prior year.
Thought: I can answer without using any more tools. I have the revenue growth for both Uber and Lyft in 2021.
Answer: In 2021, Uber's revenue grew by 57%, while Lyft's revenue increased by 36%. Therefore, Uber had a higher revenue growth compared to Lyft in 2021.
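
To check that the agent really consulted both tools, the verbose trace can be scanned for its `Action:` lines. This is a rough sketch that assumes the trace has been captured as a string and follows the Thought / Action / Action Input / Observation layout shown above; `tools_called` is a hypothetical helper, and the sample trace is shortened for illustration.

```python
import re

# The raw verbose output may carry ANSI color codes such as "\x1b[1;3;34m"; strip them first.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# Matches "Action: <tool>" lines but not "Action Input: ..." lines.
ACTION_RE = re.compile(r"^Action:\s*(\S+)\s*$", re.MULTILINE)


def tools_called(trace):
    """Return tool names from 'Action: <name>' lines, in call order."""
    return ACTION_RE.findall(ANSI_RE.sub("", trace))


trace = (
    "\x1b[1;3;38;5;200mThought: I need Uber's figures.\n"
    "Action: uber_10k\n"
    "Action Input: {'input': \"What was Uber's revenue growth in 2021?\"}\n"
    "\x1b[0m\x1b[1;3;34mObservation: Uber's revenue grew by 57% in 2021.\n"
    "\x1b[0m"
)
print(tools_called(trace))
```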