
Set up an agent powered by the ReAct loop for financial analysis, following the instructions in the LlamaIndex example:
https://docs.llamaindex.ai/en/stable/examples/agent/react_agent_with_query_engine/

In [1]:
!pip install llama-index-readers-file
!pip install llama-index-llms-openai
!pip install llama-index-embeddings-openai

Collecting llama-index-readers-file
  Downloading llama_index_readers_file-0.2.0-py3-none-any.whl.metadata (5.4 kB)
Collecting llama-index-core<0.12.0,>=0.11.0 (from llama-index-readers-file)
  Downloading llama_index_core-0.11.1-py3-none-any.whl.metadata (2.4 kB)
Collecting pypdf<5.0.0,>=4.0.1 (from llama-index-readers-file)
  Downloading pypdf-4.3.1-py3-none-any.whl.metadata (7.4 kB)
Collecting striprtf<0.0.27,>=0.0.26 (from llama-index-readers-file)
  Downloading striprtf-0.0.26-py3-none-any.whl.metadata (2.1 kB)
Collecting dataclasses-json (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index-core<0.12.0,>=0.11.0->llama-index-readers-file)
  Downloading dirtyjson-1.0.8-py3-none-any.w

In [16]:
!pip install textwrap3

Collecting textwrap3
  Downloading textwrap3-0.9.2-py2.py3-none-any.whl.metadata (4.6 kB)
Downloading textwrap3-0.9.2-py2.py3-none-any.whl (12 kB)
Installing collected packages: textwrap3
Successfully installed textwrap3-0.9.2


In [2]:
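# Imports
import os

from google.colab import userdata
# Aliased because llama_index's OpenAI class is imported under the same name later
from openai import OpenAI as OpenAIClient

# API key, read from Colab's secret store
open_ai_key = userdata.get('open_ai_key')
client = OpenAIClient(api_key=open_ai_key)  # direct OpenAI client; not used by the agent below

# The LlamaIndex OpenAI integrations pick the key up from this environment variable
os.environ["OPENAI_API_KEY"] = open_ai_key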

In [17]:
# Helper
import textwrap3
def wrap_print(long_text):
  """Print long_text wrapped to textwrap3's default line width."""
  print('\n'.join(textwrap3.wrap(long_text)))

In [3]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [4]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:  # e.g. nothing has been persisted yet on the first run
    index_loaded = False

In [5]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

--2024-08-26 20:01:59--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/uber_2021.pdf’


2024-08-26 20:02:00 (7.55 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]

--2024-08-26 20:02:00--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [appl

In [6]:
if not index_loaded:
    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")
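
The two cells above follow a load-or-build pattern: try to read a persisted index, and only build (and persist) one when that fails. The sketch below shows the same idea with the standard library only. `load_or_build` and the JSON cache file are hypothetical stand-ins for illustration, not part of LlamaIndex.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return (data, cache_hit): cached data if present, else build and persist it."""
    path = Path(cache_path)
    if path.exists():
        # Cache hit: reuse what an earlier run wrote to disk.
        return json.loads(path.read_text()), True
    # Cache miss: build once, then persist for later runs.
    data = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data, False


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "storage" / "demo.json"
    # First call builds and persists; second call loads and ignores its builder.
    first, first_hit = load_or_build(cache_file, lambda: {"chunks": ["a", "b"]})
    second, second_hit = load_or_build(cache_file, lambda: {"chunks": []})
    print(first_hit, second_hit)
```

Checking for the cache explicitly, rather than catching a broad exception, makes it easier to tell a missing cache apart from a genuinely failed load.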

In [7]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
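
`similarity_top_k=3` asks each query engine to retrieve the three chunks most similar to the question before answering. The sketch below illustrates top-k retrieval by cosine similarity on hand-made toy vectors. It is not LlamaIndex's implementation: the vectors, `cosine`, and `top_k` are assumptions made for illustration, and real embeddings would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """Return the texts of the k chunks whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy (text, vector) pairs; the numbers are invented for the example.
toy_chunks = [
    ("revenue discussion", [0.9, 0.1, 0.0]),
    ("risk factors", [0.1, 0.9, 0.0]),
    ("legal proceedings", [0.0, 0.2, 0.9]),
    ("revenue by segment", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]
results = top_k(toy_query, toy_chunks, k=3)
print(results)
```

A larger k gives the LLM more context per question at the cost of longer prompts.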

In [8]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
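
Each tool's `name` and `description` are what the agent's LLM has to go on when deciding which tool to call, so they are worth writing carefully. The sketch below shows one way such metadata could be rendered into prompt text and then used to dispatch a call by name. `ToolSpec`, `render_tool_list`, and `find_tool` are hypothetical, and the actual prompt format LlamaIndex uses is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a tool: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def render_tool_list(tools: List[ToolSpec]) -> str:
    """Format names and descriptions as text that a prompt could embed."""
    return "\n".join(f"- {tool.name}: {tool.description}" for tool in tools)


def find_tool(tools: List[ToolSpec], name: str) -> ToolSpec:
    """Look a tool up by the name the model emitted."""
    by_name = {tool.name: tool for tool in tools}
    if name not in by_name:
        raise KeyError(f"Unknown tool {name!r}; known tools: {sorted(by_name)}")
    return by_name[name]


# Stub tools that echo the question instead of querying an index.
demo_tools = [
    ToolSpec("lyft_10k", "Provides information about Lyft financials for year 2021.",
             lambda q: f"[lyft stub] {q}"),
    ToolSpec("uber_10k", "Provides information about Uber financials for year 2021.",
             lambda q: f"[uber stub] {q}"),
]
tool_list_text = render_tool_list(demo_tools)
print(tool_list_text)
print(find_tool(demo_tools, "uber_10k").run("What were the main risk factors?"))
```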

# Set up the ReAct Agent

In [9]:
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

In [12]:
context = (
    "You are a stock market sorcerer who is an expert on the companies Lyft and Uber. "
    "You will answer questions about Uber and Lyft in the persona of a sorcerer "
    "and veteran stock market investor."
)

llm = OpenAI(model="gpt-3.5-turbo")

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    # context=context  # uncomment to give the agent the persona defined above
)
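
With `verbose=True`, the agent prints each Thought, Action, Action Input, and Observation as it works. The sketch below illustrates that loop with the standard library only: a scripted stand-in for the LLM emits the text, the loop parses it, calls the named tool, and feeds the observation back. Everything here (`react_loop`, `scripted_llm`, `fake_lyft_tool`, the regular expressions) is a hypothetical illustration of the ReAct pattern and not LlamaIndex's implementation.

```python
import ast
import re

ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\w+)\s*\nAction Input:\s*(?P<args>\{.*\})", re.DOTALL
)
ANSWER_RE = re.compile(r"Answer:\s*(?P<answer>.*)", re.DOTALL)


def react_loop(question, llm_step, tools, max_steps=5):
    """Alternate model steps and tool calls until the model emits an Answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        output = llm_step("\n".join(transcript))
        transcript.append(output)
        answer = ANSWER_RE.search(output)
        if answer:
            return answer.group("answer").strip()
        action = ACTION_RE.search(output)
        if action is None:
            raise ValueError(f"Could not parse model output: {output!r}")
        # Action Input is assumed to be a Python-style dict literal with an 'input' key.
        args = ast.literal_eval(action.group("args"))
        observation = tools[action.group("tool")](args["input"])
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("No answer within max_steps")


def scripted_llm(prompt):
    """Stand-in for an LLM: asks for a tool first, answers once it has an observation."""
    if "Observation:" not in prompt:
        return (
            "Thought: I need to use a tool to help me answer the question.\n"
            "Action: lyft_10k\n"
            "Action Input: {'input': \"What was Lyft's revenue growth in 2021?\"}"
        )
    return (
        "Thought: I can answer without using any more tools.\n"
        "Answer: Lyft's revenue grew by 36% in 2021 compared to the previous year."
    )


def fake_lyft_tool(query):
    """Stub tool returning a canned observation instead of querying an index."""
    return "Lyft's revenue increased by 36% in 2021 compared to the prior year."


final_answer = react_loop(
    "What was Lyft's revenue growth in 2021?",
    scripted_llm,
    {"lyft_10k": fake_lyft_tool},
)
print(final_answer)
```

The `max_steps` cap is a design choice in this sketch: it stops a model that never produces an answer from looping indefinitely.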

In [20]:
response = agent.chat("What was Lyft's revenue growth in 2021?")
wrap_print(str(response))

> Running step 3ab4da3c-b434-44b3-87ec-3084f9195039. Step input: What was Lyft's revenue growth in 2021?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: lyft_10k
Action Input: {'input': "What was Lyft's revenue growth in 2021?"}
Observation: Lyft's revenue increased by 36% in 2021 compared to the prior year.
> Running step b9063441-237b-40d5-b16f-9e9968434edb. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: Lyft's revenue grew by 36% in 2021 compared to the previous year.
Lyft's revenue grew by 36% in 2021 compared to the previous year.


In [19]:
response = agent.chat(
    "Compare and contrast the revenue growth of Uber and Lyft in 2021, then"
    " give an analysis"
)
wrap_print(str(response))

> Running step 81f6d5de-0d18-43d0-8790-c7fbc0318c9f. Step input: Compare and contrast the revenue growth of Uber and Lyft in 2021, then give an analysis
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: In 2021, Uber's revenue growth was higher at 57% compared to Lyft's revenue growth of 36%. This indicates that Uber experienced a stronger growth in revenue during that year. The higher revenue growth for Uber could be attributed to various factors such as market expansion, diversification of services, and potentially more aggressive marketing strategies. On the other hand, Lyft's slightly lower revenue growth may be influenced by factors like market competition, regulatory challenges, and specific operational decisions made by the company.
In 2021, Uber's revenue growth was higher at 57% compared to Lyft's
revenue growth of 36%. This indicates that Uber experienced a stronger
growth in revenue during that year. The hig
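
The percentages in the answer above are year-over-year growth rates, that is, the change in revenue divided by the prior year's revenue. The sketch below shows that arithmetic. The input figures are toy numbers chosen only so the results land on 36% and 57%; they are not the companies' reported revenues, and `yoy_growth_pct` is a hypothetical helper.

```python
def yoy_growth_pct(prior, current):
    """Year-over-year growth in percent: (current - prior) / prior * 100."""
    if prior == 0:
        raise ValueError("prior-year value must be non-zero")
    return (current - prior) * 100.0 / prior


# Toy inputs (NOT real filings data), picked to reproduce the two percentages.
growth_a = yoy_growth_pct(prior=100.0, current=136.0)
growth_b = yoy_growth_pct(prior=200.0, current=314.0)
print(f"{growth_a:.0f}% vs {growth_b:.0f}%")
```

Recomputing a growth rate like this from the figures in the filing is a quick way to sanity-check a number the agent reports.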

In [18]:
response = agent.chat(
    "Can you tell me about the risk factors of the company with the higher"
    " revenue?"
)
wrap_print(str(response))

> Running step 7640231e-0b56-4b81-97bd-4a9b2e845200. Step input: Can you tell me about the risk factors of the company with the higher revenue?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: uber_10k
Action Input: {'input': 'Please provide information on the risk factors of the company with higher revenue in 2021.'}
Observation: The risk factors of the company with higher revenue in 2021 include the potential inability to accurately forecast operating results due to challenging expense level estimates, fixed expenses that may not be adjusted quickly if revenue falls short, the risk of sustained losses impacting investor value, the possibility of not achieving profitability if growth slows significantly, and the expectation of continued slowing in Gross Bookings and revenue growth rates. Additionally, factors such as the duration and severity of the COVID-19 pandemic, the need to grow supply