<a href="https://colab.research.google.com/github/SARA3SAEED/LLM-2/blob/main/40_Building_Advanced_LLM_Applications_Module_5_Agentic_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Dive Agentic Retrieval Augmented Generation
**Credits: Sai Panyam**

An AI agent is needed when reasoning must determine which actions to take and in what order. In practice, we use agents rather than calling an LLM directly for tasks that require planning, multi-step reasoning, tool use, or learning over time. Agents give us agency.

Agency: the ability to take action or to choose what action to take.

In the context of RAG, agents can be plugged in at three points: before retrieval, to reason about which RAG pipeline to select; within a RAG pipeline, for retrieval or reranking; and at the end, to synthesise the response before it is sent out. This improves RAG considerably by automating the complex workflows and decisions that a non-trivial RAG use case requires.

This notebook brings together implementations of several Agentic RAG techniques using LlamaIndex v0.10.5.


In [None]:
!pip install llama-index -q
!pip install llama-index-tools-wolfram-alpha -q
!pip install langchain -q
!pip install langchain_experimental -q

## Setup LLM
For now, let us use OpenAI.

In [None]:
import os
import nest_asyncio

# Jupyter already runs an event loop; nest_asyncio allows the nested async calls made later in this notebook
nest_asyncio.apply()

# Set the OpenAI API key as an environment variable
os.environ["OPENAI_API_KEY"] = "sk-YOUR-OPENAI-API-KEY"

# Verify that the key is set (optional)
print(os.getenv("OPENAI_API_KEY"))

sk-YOUR-OPENAI-API-KEY


In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
# Setup OpenAI Model and Embeddings used for indexing the documents
Settings.llm = OpenAI(model='gpt-4-0125-preview', temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')
Settings.chunk_size = 1024

## Load Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Download Data
data_dir = '/content/drive/MyDrive' # Input a data dir path from your mounted Google Drive

os.makedirs(f'{data_dir}/RAG/data/paul_graham/', exist_ok=True)
os.makedirs(f'{data_dir}/RAG/data/10k/', exist_ok=True)

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '{data_dir}/RAG/data/paul_graham/paul_graham_essay.txt'

--2024-08-10 14:08:14--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘/content/drive/MyDrive/RAG/data/paul_graham/paul_graham_essay.txt’


2024-08-10 14:08:14 (6.01 MB/s) - ‘/content/drive/MyDrive/RAG/data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



## Data Source
Expedia Group 10k 2023
https://d18rn0p25nwr6d.cloudfront.net/CIK-0001324424/1d1b7ef4-fd87-4efa-a8fd-f728746142d1.pdf

Booking.com 10k 2023
https://d18rn0p25nwr6d.cloudfront.net/CIK-0001075531/31c876e5-f44f-4645-8757-e2b828c23357.pdf

Uber 10k 2023
https://d18rn0p25nwr6d.cloudfront.net/CIK-0001543151/6fabd79a-baa9-4b08-84fe-deab4ef8415f.pdf

Lyft 10k 2023
https://d18rn0p25nwr6d.cloudfront.net/CIK-0001759509/d576a7f4-780c-4f39-86a6-aa54b03fa2ec.pdf


**The above documents are saved to the 10k folder under these names: expedia_10k_2023.pdf, booking_10k_2023.pdf, uber_10k_2023.pdf, and lyft_10k_2023.pdf.**

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import StorageContext, load_index_from_storage

# To avoid rebuilding (and re-embedding) an index on every run, persist it to disk and reload it when present
PERSIST_INDEX_DIR = f"{data_dir}/RAG/storage/"
def get_index(index_name, doc_file_path):
  index = None
  persist_dir = f"{PERSIST_INDEX_DIR}{index_name}/"
  if not os.path.exists(persist_dir):
    # Load the documents and build the index
    documents = SimpleDirectoryReader(input_files=[doc_file_path]).load_data()
    index = VectorStoreIndex.from_documents(documents)
    # Store the index to disk
    index.storage_context.persist(persist_dir)
  else: # Load index from disk
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    index = load_index_from_storage(storage_context)

  return index


In [None]:
# Load Paul Graham Documents
docs_paul_graham = SimpleDirectoryReader(f"{data_dir}/RAG/data/paul_graham/").load_data()

## Indexing Data

In [None]:
from llama_index.core import VectorStoreIndex, SummaryIndex
# For paul graham we initialise a storage context and use that for both Vector Index and Summary Index
pg_nodes = Settings.node_parser.get_nodes_from_documents(docs_paul_graham)
pg_storage_context = StorageContext.from_defaults()
pg_storage_context.docstore.add_documents(pg_nodes)

# Setup Paul Graham Vector and Summary Index from Storage Context
pg_summary_index = SummaryIndex(pg_nodes, storage_context=pg_storage_context)
pg_vector_index = VectorStoreIndex(pg_nodes, storage_context=pg_storage_context)

# Setup Uber, Lyft, Expedia and Booking Vector Indices
uber_index = get_index("uber_10k",f"{data_dir}/RAG/data/10k/uber_10k_2023.pdf")
lyft_index = get_index("lyft_10k",f"{data_dir}/RAG/data/10k/lyft_10k_2023.pdf")
expedia_index = get_index("expedia_10k", f"{data_dir}/RAG/data/10k/expedia_10k_2023.pdf")
booking_index = get_index("booking_10k", f"{data_dir}/RAG/data/10k/booking_10k_2023.pdf")

## Usage Patterns
Usage patterns for agents in a RAG context include the following:

-- Use an existing RAG pipeline as a tool for an agent

-- Use an agent itself as a RAG tool

-- Use an agent to retrieve tools from a RAG (vector index) at query time using a provided context

-- Use an agent to do query planning over a set of existing tools

-- Use an agent to select a tool from candidate tools that have been retrieved from a pool of tools using RAG (this is especially useful when there is a large set of tools to select from)

One can also mix and match the above usage patterns to realise a complex RAG application.
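The pattern of retrieving candidate tools from a larger pool can be illustrated without any library. The following is only a minimal sketch: it ranks tools by bag-of-words cosine similarity between the query and each tool description, whereas a real system would use embeddings and a vector index. The `ToolCard` class and `retrieve_tools` function are hypothetical names introduced for this illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolCard:
    """Hypothetical stand-in for a tool's name and description."""
    name: str
    description: str


def _bag_of_words(text: str) -> Counter:
    # Lowercased word counts; a crude substitute for an embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, pool: list[ToolCard], top_k: int = 2) -> list[ToolCard]:
    """Return the top_k tools whose descriptions best match the query."""
    q = _bag_of_words(query)
    ranked = sorted(pool, key=lambda t: _cosine(q, _bag_of_words(t.description)), reverse=True)
    return ranked[:top_k]


pool = [
    ToolCard("expedia_10k", "Provides information about Expedia group's 10k filing for year 2023"),
    ToolCard("uber_10k", "Provides information about Uber's 10k filing for year 2023"),
    ToolCard("paul_graham_qa", "Retrieves answers for questions from paul graham essay"),
]
candidates = retrieve_tools("What did paul graham write in his essay?", pool, top_k=1)
print([t.name for t in candidates])
```

Only the retrieved candidates are then handed to the agent, which keeps the prompt small when the pool of tools is large.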

RAG Agents can be further classified based on function. They can be used for routing, one-shot query planning, tool use, reason + act (ReAct) and dynamic planning and execution. These range from simple, low cost and low latency to complex, high cost and high latency.



## Routers
A routing agent uses an LLM to choose which downstream RAG pipeline should handle the input query. Because the LLM reasons about that choice, this counts as agentic reasoning, and it is the simplest form of it.


Another class of routing is to select between Summarization and Question Answering RAG pipelines. Based on the input query the agent reasons about routing to the Summary query engine or the Vector Query Engine that are configured as tools.
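The routing idea can be sketched in plain Python before looking at the LlamaIndex classes. This is an illustration under stated assumptions, not how the library is implemented: `Route`, `keyword_selector`, and `route_query` are hypothetical names, and a word-overlap score stands in for the LLM that would normally make the selection.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    """Hypothetical pairing of a description with the pipeline it guards."""
    name: str
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, routes: list[Route]) -> int:
    """Stand-in for an LLM selector: pick the route sharing the most words with the query."""
    words = set(query.lower().split())
    scores = [len(words & set(r.description.lower().split())) for r in routes]
    return scores.index(max(scores))


def route_query(query: str, routes: list[Route]) -> tuple[str, str]:
    """Select exactly one route, run it, and report which one was chosen."""
    chosen = routes[keyword_selector(query, routes)]
    return chosen.name, chosen.run(query)


routes = [
    Route("summary", "summarize the whole essay", lambda q: "summary pipeline answered"),
    Route("qa", "answer a specific question about facts in the essay", lambda q: "vector pipeline answered"),
]
print(route_query("Please summarize the essay", routes))
print(route_query("Answer a question about Y Combinator", routes))
```

The key design point is that the selector sees only the descriptions, so the quality of the routing depends on how clearly each tool is described.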

### Use Case: Query Engine Routing
Use a selector to select between different query engines (data sources).

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# Create the query engines
expedia_engine = expedia_index.as_query_engine(similarity_top_k=3)
booking_engine = booking_index.as_query_engine(similarity_top_k=3)

uber_engine = uber_index.as_query_engine(similarity_top_k=3)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)

expedia_query_tool = QueryEngineTool(
                      query_engine=expedia_engine,
                      metadata=ToolMetadata(
                          name="expedia_10k",
                          description="Provides information about Expedia group's 10k filing for year 2023"
                      )
                    )

booking_query_tool = QueryEngineTool(
                      query_engine=booking_engine,
                      metadata=ToolMetadata(
                          name="booking_10k",
                          description="Provides information about Booking 10k filing for year 2023"
                      )
                    )
uber_query_tool = QueryEngineTool(
                      query_engine=uber_engine,
                      metadata=ToolMetadata(
                          name="uber_10k",
                          description="Provides information about Uber's 10k filing for year 2023"
                      )
                    )
lyft_query_tool = QueryEngineTool(
                      query_engine=lyft_engine,
                      metadata=ToolMetadata(
                          name="lyft_10k",
                          description="Provides information about Lyft's 10k filing for year 2023"
                      )
                    )

tools = [expedia_query_tool, booking_query_tool, uber_query_tool, lyft_query_tool]

filing_10k_engine = RouterQueryEngine(
                      selector= LLMSingleSelector.from_defaults(),
                      query_engine_tools=tools
                      )

In [None]:
expedia_query = "What is the income statement of Expedia for year 2023?"
booking_query = "What is the income statement of booking.com for year 2023?"
uber_query = "What is the income statement of Uber for year 2023?"
lyft_query = "What is the income statement of Lyft for year 2023?"

In [None]:
# Test individual query engines for output
# Expedia and Booking.com
response = expedia_engine.query(expedia_query)
print (response)
print("\n.------------------.\n")
response = booking_engine.query(booking_query)
print (response)
print("\n.------------------.\n")


The income statement of Expedia for the year 2023 is as follows:

- Revenue: $12,839 million
- Costs and expenses:
  - Cost of revenue (exclusive of depreciation and amortization shown separately below): $1,573 million
  - Selling and marketing - direct: $6,107 million
  - Selling and marketing - indirect: $756 million
  - Technology and content: $1,358 million
  - General and administrative: $771 million
  - Depreciation and amortization: $807 million
  - Impairment of goodwill: $297 million
  - Intangible and other long-term asset impairment: $129 million
  - Legal reserves, occupancy tax, and other: $8 million
- Operating income: $1,033 million
- Other income (expense):
  - Interest income: $207 million
  - Interest expense: $(245) million
  - Gain on sale of business, net: $25 million
  - Other, net: $(2) million
- Total other expense, net: $(15) million
- Income before income taxes: $1,018 million
- Provision for income taxes: $(330) million
- Net income: $688 million
- Net income

In [None]:
# Uber and Lyft
response = uber_engine.query(uber_query)
print (response)
print("\n.------------------.\n")
response = lyft_engine.query(lyft_query)
print (response)

The income statement of Uber for the year 2023 is as follows:

- Revenue: $37,281 million
- Costs and expenses:
  - Cost of revenue, exclusive of depreciation and amortization: $22,457 million
  - Operations and support: $2,689 million
  - Sales and marketing: $4,356 million
  - Research and development: $3,164 million
  - General and administrative: $2,682 million
  - Depreciation and amortization: $823 million
  - Total costs and expenses: $36,171 million
- Income from operations: $1,110 million
- Interest expense: $(633) million
- Other income (expense), net: $1,844 million
- Income before income taxes and income from equity method investments: $2,321 million
- Provision for (benefit from) income taxes: $213 million
- Income from equity method investments: $48 million
- Net income including non-controlling interests: $2,156 million
- Less: net income attributable to non-controlling interests, net of tax: $269 million
- Net income attributable to Uber Technologies, Inc.: $1,887 milli

In [None]:
# Now use the Router Query engine to route to specific query engines
response = filing_10k_engine.query(expedia_query)
print (response)
print("\n.------------------.\n")
response = filing_10k_engine.query(booking_query)
print (response)
print("\n.------------------.\n")


The income statement of Expedia for the year ended December 31, 2023, is as follows:

- Revenue: $12,839 million
- Costs and expenses:
  - Cost of revenue (exclusive of depreciation and amortization shown separately below): $1,573 million
  - Selling and marketing - direct: $6,107 million
  - Selling and marketing - indirect: $756 million
  - Technology and content: $1,358 million
  - General and administrative: $771 million
  - Depreciation and amortization: $807 million
  - Impairment of goodwill: $297 million
  - Intangible and other long-term asset impairment: $129 million
  - Legal reserves, occupancy tax, and other: $8 million
- Operating income: $1,033 million
- Other income (expense):
  - Interest income: $207 million
  - Interest expense: $(245) million
  - Gain on sale of business, net: $25 million
  - Other, net: $(2) million
  - Total other expense, net: $(15) million
- Income before income taxes: $1,018 million
- Provision for income taxes: $(330) million
- Net income: $68

In [None]:
response = filing_10k_engine.query(uber_query)
print (response)
print("\n.------------------.\n")
response = filing_10k_engine.query(lyft_query)
print (response)

The income statement of Uber for the year 2023 is as follows:

- Revenue: $37,281 million
- Costs and expenses:
  - Cost of revenue, exclusive of depreciation and amortization: $22,457 million
  - Operations and support: $2,689 million
  - Sales and marketing: $4,356 million
  - Research and development: $3,164 million
  - General and administrative: $2,682 million
  - Depreciation and amortization: $823 million
  - Total costs and expenses: $36,171 million
- Income from operations: $1,110 million
- Interest expense: ($633 million)
- Other income (expense), net: $1,844 million
- Income before income taxes and income from equity method investments: $2,321 million
- Provision for income taxes: $213 million
- Income from equity method investments: $48 million
- Net income including non-controlling interests: $2,156 million
- Less: net income attributable to non-controlling interests, net of tax: $269 million
- Net income attributable to Uber Technologies, Inc.: $1,887 million

.----------

### Use Case : Joint QA & Summarization Router

Here the router decides whether to summarize using the summary index query engine or to run a semantic search using the vector index query engine.

In [None]:
# Define the Summary and Vector query engines for the Paul Graham text corpus

summary_query_engine = pg_summary_index.as_query_engine(response_mode= "tree_summarize")
vector_query_engine = pg_vector_index.as_query_engine()

# Now Create the query engine tools from the above query engines
summary_tool = QueryEngineTool(
                query_engine=summary_query_engine,
                metadata=ToolMetadata(
                    name="Paul_Graham_Summary",
                    description="Summarizes the paul graham essay"
                    )
                )
vector_tool = QueryEngineTool(
                query_engine=vector_query_engine,
                metadata = ToolMetadata(
                    name="Paul_Graham_QA",
                    description="Retrieves answers for questions from paul graham essay"
                    )
                )
pg_tools = [summary_tool, vector_tool]
# Now define the Router Query Engine
paul_graham_engine = RouterQueryEngine(
                      selector= LLMSingleSelector.from_defaults(),
                      query_engine_tools=pg_tools
                      )

In [None]:
# Summarization routing
response = paul_graham_engine.query("Summarize the information contained in the paul graham essay")
print (response)

The essay by Paul Graham provides a comprehensive and personal account of his journey through various phases of his career, highlighting his experiences, insights, and the evolution of his work and interests. It begins with his early interests in writing and programming, leading to his exploration of Lisp and the creation of Viaweb, which was later acquired by Yahoo. Graham discusses the founding of Y Combinator (YC), a pioneering startup accelerator, and its impact on the startup ecosystem, including the introduction of the batch model and the focus on founder-friendly practices. He also touches on the development of Hacker News as a community platform and his return to programming with the creation of a new Lisp dialect called Bel. Throughout the essay, Graham reflects on the themes of discovery, the importance of working on what genuinely interests you, and the value of pursuing less prestigious or conventional paths to find real innovation and fulfillment. The narrative is interspe

In [None]:
# Confirm that the Summary Engine was used
print (response.metadata["selector_result"])

selections=[SingleSelection(index=0, reason='This option directly requests a summary of the content in the Paul Graham essay, which aligns with the task of summarizing the information contained within it.')]


In [None]:
# Now route to a Question Answer engine
response = paul_graham_engine.query("What did paul graham do at Y Combinator from the paul graham essay")
print (response)

Paul Graham co-founded Y Combinator (YC), initially funding it with his own money. He was instrumental in conceptualizing and implementing the unique batch model of funding startups, where a group of startups would be funded all at once, twice a year, and then intensely mentored over three months. This model was a departure from traditional venture capital practices and was designed to provide startups with the early-stage support and resources they needed, much like the support Graham himself had received from Julian. Graham also played a key role in the operational aspects of YC, using his own building in Cambridge as the headquarters and organizing weekly dinners where experts on startups were brought in to give talks. Additionally, he was responsible for the creation of the Summer Founders Program, aimed at encouraging undergraduates to start their own companies over the summer, which led to YC receiving a significant number of applications and ultimately broadening its scope beyon

In [None]:
# Confirm that Vector Engine was used.
print (response.metadata["selector_result"])

selections=[SingleSelection(index=1, reason='This choice directly involves retrieving specific information or answers from a Paul Graham essay, which aligns with the request for details about what Paul Graham did at Y Combinator.')]


### Agentic Router RAG
-- We will use LlamaIndex's built-in OpenAI agent, which uses the query engines as tools

-- We will also use the new OpenAIAssistantAgent to do the same thing

In [None]:
from llama_index.agent.openai import OpenAIAgent
agent = OpenAIAgent.from_tools(tools=tools, verbose=True)
# Uncomment and use the below call for interactive session
# agent.chat_repl()
response = agent.chat("What is the revenue growth of expedia in 2023?")
print (response)

Added user message to memory: What is the revenue growth of expedia in 2023?
=== Calling Function ===
Calling function: expedia_10k with args: {"input":"revenue growth"}
Got output: The revenue growth for the year ended December 31, 2023, compared to 2022, was 10%.

The revenue growth of Expedia in 2023 compared to 2022 was 10%.


In [None]:
from llama_index.agent.openai import OpenAIAssistantAgent
agent = OpenAIAssistantAgent.from_new(
          name = "10K Filing QA Assistant",
          instructions= "You are an assistant that provides answers to questions from 10k Filings",
          tools=tools,
          verbose=True,
          run_retrieve_sleep_time=1.0
        )
response = agent.chat("What is the revenue growth of booking in 2023?")
print (response)

=== Calling Function ===
Calling function: booking_10k with args: {"input":"revenue growth"}
Got output: The company expects year-over-year growth in revenues to be between 11% and 13% for the first quarter of 2024. For the full year, the anticipated growth in revenues is projected to be similar to the growth in gross bookings, which is expected to be slightly higher than 7%.
The revenue growth of Booking for the year 2023 is expected to be similar to the growth in gross bookings, which is anticipated to be slightly higher than 7%. Additionally, for the first quarter of 2024, the year-over-year growth in revenues is expected to be between 11% and 13%.


In [None]:
# Paul Graham Joint QA and Summarization Agent
agent = OpenAIAssistantAgent.from_new(
          name = "Paul Graham Assistant",
          instructions= "You are an assistant that provides answers to questions and also summary of Paul Graham essay",
          tools=pg_tools,
          verbose=True,
          run_retrieve_sleep_time=1.0
        )
response = agent.chat("Summarize the information contained in the paul graham What I worked on essay.")
print (response)
print("\n.------------------.\n")
response = agent.chat("What are some personal anecdotes from paul graham essay")
print (response)

=== Calling Function ===
Calling function: Paul_Graham_Summary with args: {"input":"What I worked on"}
Got output: Before college, the focus was primarily on writing and programming, with an initial attempt at writing short stories and programming on an IBM 1401 using punch cards. This early exposure to programming, despite the limitations of technology at the time, laid the groundwork for a future in technology. The transition to microcomputers, specifically a TRS-80, marked a significant shift, allowing for more direct interaction with computing processes and the development of simple games, a model rocket prediction program, and a word processor.

College years brought a pivot towards philosophy, but a growing interest in AI, influenced by literature and documentaries, eventually led to a switch in focus. Self-teaching in Lisp and an undergraduate thesis on reverse-engineering SHRDLU were pivotal moments, highlighting a deepening engagement with programming and AI. Post-college year

In [None]:
# Alternatively we can also define a Query Engine tool that uses the filing_10k_engine,
# which will internally route to specific query engines based on the query
filing_10k_tool = QueryEngineTool(
                    query_engine=filing_10k_engine,
                    metadata=ToolMetadata(
                        name="Filing_10k",
                        description="Selects the appropriate company filing query engine based on the query and executes it"
                    )
                  )
agent = OpenAIAssistantAgent.from_new(
          name = "10K Filing Assistant",
          instructions= "You are an assistant that provides answers to questions from 10k Filings",
          tools=[filing_10k_tool],
          verbose=True,
          run_retrieve_sleep_time=1.0
        )
response = agent.chat("What is the revenue growth of expedia in 2023?")
print (response)
print ("\n---------------------------\n")
response = agent.chat("What is the revenue growth of booking.com in 2023?")
print (response)

=== Calling Function ===
Calling function: Filing_10k with args: {"input":"expedia revenue growth 2023"}
Got output: Expedia's total revenue grew by 10% in 2023 compared to 2022.
Expedia's revenue growth in 2023 was 10% compared to 2022.

---------------------------

=== Calling Function ===
Calling function: Filing_10k with args: {"input":"booking.com revenue growth 2023"}
Got output: Booking.com's revenue growth in 2023 was 25.0%, with total revenues increasing to $21,365 million from $17,090 million in 2022.
Booking.com experienced a revenue growth of 25.0% in 2023, with total revenues increasing to $21,365 million from $17,090 million in 2022.


## One shot Query Planning
A query planning agent breaks down a complex query into parallelizable sub queries.
Each sub query can then be executed against a set of RAG pipelines based on different data sources.
The resulting responses from each RAG pipeline are then synthesised into the final response.
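The decompose, execute in parallel, then synthesise flow can be sketched with `asyncio` alone. This is a minimal illustration, assuming a fixed planner in place of an LLM and a fake pipeline in place of a real query engine; `plan`, `answer_sub_question`, and `run_plan` are hypothetical names.

```python
import asyncio


async def answer_sub_question(source: str, question: str) -> str:
    """Hypothetical stand-in for querying one RAG pipeline."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{source}] answer to: {question}"


def plan(query: str, sources: list[str]) -> list[tuple[str, str]]:
    """Stand-in for the LLM planner: one revenue and one expenses question per source."""
    return [
        (s, f"What is the total {metric} of {s} for 2023?")
        for s in sources
        for metric in ("revenue", "expenses")
    ]


async def run_plan(query: str, sources: list[str]) -> str:
    sub_questions = plan(query, sources)
    # Sub questions are independent of each other, so they can run concurrently.
    answers = await asyncio.gather(*(answer_sub_question(s, q) for s, q in sub_questions))
    # Stand-in for the synthesis step: join the intermediate answers.
    return "\n".join(answers)


result = asyncio.run(run_plan("Compare revenue and expenses", ["expedia", "booking"]))
print(result)
```

Because the plan is produced once, up front, a sub question cannot depend on the answer to another one; that is the trade-off that makes this approach cheap and parallel.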

In [None]:
from llama_index.core.query_engine import SubQuestionQueryEngine
sub_question_query = "Compare and contrast the revenue and expenses of expedia and booking.com for 2023 and provide an in depth analysis"
query_planning_engine = SubQuestionQueryEngine.from_defaults(
                          query_engine_tools=tools,
                          use_async=True
                        )
response = query_planning_engine.query(sub_question_query)
print (response)

Generated 8 sub questions.
[expedia_10k] Q: What is the total revenue of Expedia for 2023?
[booking_10k] Q: What is the total revenue of Booking.com for 2023?
[expedia_10k] Q: What are the major expenses for Expedia in 2023?
[booking_10k] Q: What are the major expenses for Booking.com in 2023?
[expedia_10k] Q: How does Expedia's revenue growth in 2023 compare to previous years?
[booking_10k] Q: How does Booking.com's revenue growth in 2023 compare to previous years?
[expedia_10k] Q: What is the net income of Expedia for 2023?
[booking_10k] Q: What is the net income of Booking.com for 2023?
[booking_10k] A: The total revenue of Booking.com for 2023 is $21,365 million.
[expedia_10k] A: The net income of Expedia for 2023 is $688 millio

### Agentic Query Planning RAG

In [None]:
filing_10k_query_planning_tool = QueryEngineTool(
                                  query_engine=query_planning_engine,
                                  metadata=ToolMetadata(
                                      name="Filing_10k_query_planner",
                                      description="""It first breaks down the complex query into sub questions for each relevant data source,
                                                  then gathers all the intermediate responses and synthesizes a final response"""
                                  )

)
# Alternative instructions for the agent:
# "You are an assistant that tackles the problem of answering a complex query using multiple 10k filings data sources."
agent = OpenAIAssistantAgent.from_new(
          name = "10K Filing Query Planner",
          instructions= """You are a veteran stock market investor who is an expert at analysing companies' annual 10k filings.
          You will answer questions in the persona of a veteran stock market investor.""",
          tools=[filing_10k_query_planning_tool],
          verbose=True,
          run_retrieve_sleep_time=1.0
        )
response = agent.chat(sub_question_query)
print (response)

=== Calling Function ===
Calling function: Filing_10k_query_planner with args: {"input": "Expedia 2023 revenue and expenses"}
Generated 2 sub questions.
[expedia_10k] Q: What is the total revenue of Expedia in 2023?
[expedia_10k] Q: What are the total expenses of Expedia in 2023?
[expedia_10k] A: The total revenue of Expedia in 2023 is $12,839 million.
[expedia_10k] A: The total expenses of Expedia in 2023 were $11,806 million. This figure is derived by adding the costs and expenses listed in the consolidated statements of operations for the year ended December 31, 2023, which include cost of revenue ($1,573 million), selling and marketing - direct ($6,107 million), selling and marketing - indirect ($756 million), technology and content ($1,358 million), general and administrative ($771 million), depreciation and amortization ($807 million), impairment of goodwill ($297 million), intangi

## Tool Use
In normal RAG, a query is simply passed in to retrieve the top-k documents that semantically match it. However, there are times when we need data from an external API, a SQL database, or an application that exposes an API, which can then be used as additional context for the input query before it is sent to the LLM. In such scenarios the agent can use a tool spec.
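The mechanics of tool use (look up a named function, call it with arguments, and add the result to the query as context) can be sketched as follows. This is an illustration only: the `TOOLS` registry, `call_tool`, and `build_prompt` are hypothetical, and local Python functions stand in for an external API.

```python
import math
from typing import Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., str]] = {
    "round_pi": lambda places: f"{round(math.pi, int(places)):.{int(places)}f}",
    "word_count": lambda text: str(len(text.split())),
}


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a call once the tool name and arguments have been chosen (by an LLM, in a real agent)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


def build_prompt(query: str, tool_name: str, **tool_args) -> str:
    """Add the tool output to the query as extra context before it goes to the LLM."""
    observation = call_tool(tool_name, **tool_args)
    return f"Context from {tool_name}: {observation}\nQuestion: {query}"


print(build_prompt("What is the value of pi to 6 decimal places", "round_pi", places=6))
```

In a real agent the LLM emits the tool name and arguments as structured output; here they are passed in by hand to keep the sketch self-contained.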

In [None]:
from llama_index.tools.wolfram_alpha  import WolframAlphaToolSpec
from llama_index.core.tools.tool_spec.load_and_search import LoadAndSearchToolSpec

# Use your own Wolfram Alpha App ID here; avoid committing real credentials to a notebook
wolfram_alpha_spec = WolframAlphaToolSpec(app_id="YOUR-WOLFRAM-ALPHA-APP-ID")
tools = wolfram_alpha_spec.to_tool_list()
# Create the Agent with our tools
agent = OpenAIAgent.from_tools(tools, verbose=True)

response = agent.query("What is the value of pi to 6 decimal places")
print (response)

Added user message to memory: What is the value of pi to 6 decimal places
=== Calling Function ===
Calling function: wolfram_alpha_query with args: {"query":"value of pi to 6 decimal places"}
Got output: 3.141593

The value of pi to 6 decimal places is 3.141593.


## ReAct : Reason and Act
The next step up is to add reasoning and actions that are executed in a loop over a complex query. Essentially, it is a superset of routing, query planning and tool use rolled into one. A ReAct agent can handle sequential multi-part queries and keep state (in memory).
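The thought, action, observation loop can be sketched in a few lines. This is a simplified illustration, not the LlamaIndex implementation: `scripted_policy` is a hypothetical stand-in for the LLM, `REACT_TOOLS` is a hypothetical tool table, and the revenue figures are the ones quoted in the outputs earlier in this notebook.

```python
from typing import Callable

# Hypothetical tools; each takes a string input and returns an observation.
REACT_TOOLS: dict[str, Callable[[str], str]] = {
    "expedia_10k": lambda q: "Expedia 2023 revenue: $12,839 million",
    "booking_10k": lambda q: "Booking 2023 revenue: $21,365 million",
}


def scripted_policy(query: str, scratchpad: list[str]) -> tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input).

    It calls each tool it has not used yet, then finishes.
    """
    used = {line.split(":", 1)[1].strip() for line in scratchpad if line.startswith("Action:")}
    for name in REACT_TOOLS:
        if name not in used:
            return f"I still need data from {name}.", name, query
    return "I have all the observations I need.", "finish", ""


def react(query: str, max_steps: int = 5) -> list[str]:
    """Run the loop, keeping state in a scratchpad that the policy reads on every step."""
    scratchpad: list[str] = []
    for _ in range(max_steps):
        thought, action, action_input = scripted_policy(query, scratchpad)
        scratchpad.append(f"Thought: {thought}")
        if action == "finish":
            break
        scratchpad.append(f"Action: {action}")
        scratchpad.append(f"Observation: {REACT_TOOLS[action](action_input)}")
    return scratchpad


trace = react("Compare 2023 revenue")
print("\n".join(trace))
```

Unlike one-shot query planning, each decision here is made after seeing the previous observations, which is what allows sequential, dependent steps at the cost of more LLM calls.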


In [None]:
from llama_index.core.agent import ReActAgent
filing_10k_query_engine_tools = [expedia_query_tool, booking_query_tool, uber_query_tool, lyft_query_tool]
agent = ReActAgent.from_tools(
            tools= filing_10k_query_engine_tools,
            verbose=True,
            context="""You are a veteran stock market investor who is an expert at analysing companies' annual 10k filings.
          You will answer questions in the persona of a veteran stock market investor."""
          )
response = agent.query("Compare and contrast the revenue and expenses of expedia and booking holdings for 2023 and provide an in depth analysis")
print (response)



> Running step c85e7720-d3bb-4b7e-94f5-3644eba6a68a. Step input: Compare and contrast the revenue and expenses of expedia and booking holdings for 2023 and provide an in depth analysis
Thought: The current language of the user is English. I need to use tools to gather the revenue and expenses information for Expedia and Booking Holdings for the year 2023 to perform an in-depth analysis.
Action: expedia_10k
Action Input: {'input': 'revenue and expenses'}
Observation: Revenue and expenses are recognized and managed in various ways across different segments and models within the company. Revenue is generated through multiple streams, including transaction-based services, subscription-based services, and advertising. For transaction-based services, revenue is recognized upon the transfer of control of the promised services, reflecting the consideration expected in exchange for those services. The company operates under both the merchant and agency models for boo

## Dynamic Planning & Execution
ReAct is by far the most popular agent pattern so far. However, some user intents are too complex for it, and more and more agents are deployed in production settings that require higher reliability, observability, parallelization, control and separation of concerns. Essentially, we need long-term planning, insight into execution, and better efficiency, including lower latency.

Two recently published papers address this:

Plan and Solve

LLMCompiler

At a high level, both attempt to separate higher-level planning from short-term execution. The logic for such agents is:

Given an input query, plan the steps required to complete it (essentially the whole computational graph, or DAG).

For each step in the plan, determine which tools to use, if any, and execute the step with the required inputs.

So we need a planner and an executor. The planner will most likely use an LLM to turn the user query into a step-by-step plan. The executor then takes each step and works out which tools, if any, are required to complete the task defined in that step. This continues until the whole plan has been executed and the final response is returned.
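
The planner/executor split can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the LangChain or LlamaIndex implementation: `toy_planner`, the two tools and the data they return are hypothetical, and a real planner would ask an LLM to produce the step list.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, tool input)

def toy_planner(query: str) -> List[Step]:
    """Stand-in for an LLM planner: returns the whole ordered plan up front."""
    return [
        ("search", "company A revenue"),
        ("search", "company B revenue"),
        ("summarize", ""),
    ]

def search(text: str, previous: List[str]) -> str:
    corpus = {"company A revenue": "A: 10", "company B revenue": "B: 7"}  # made-up data
    return corpus.get(text, "not found")

def summarize(text: str, previous: List[str]) -> str:
    # Combines the results of all earlier steps into one response
    return "; ".join(previous)

TOOLS: Dict[str, Callable[[str, List[str]], str]] = {
    "search": search,
    "summarize": summarize,
}

def plan_and_execute(query: str,
                     planner: Callable[[str], List[Step]],
                     tools: Dict[str, Callable[[str, List[str]], str]]) -> str:
    plan = planner(query)                # 1. plan every step before acting
    results: List[str] = []
    for tool_name, tool_input in plan:   # 2. execute the plan one step at a time
        results.append(tools[tool_name](tool_input, results))
    return results[-1] if results else ""

print(plan_and_execute("Compare the revenue of company A and company B", toy_planner, TOOLS))
```

Unlike a ReAct loop, the plan here is fixed before any tool runs. That makes the run easier to inspect, but the agent cannot change course mid-plan unless a re-planning step is added.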

LangChain has a Plan and Execute agent (still experimental). LlamaIndex has a Llama Pack for LLMCompiler.

#### Plan And Execute
This uses LangChain's PlanAndExecute agent with LlamaIndex query engines as tools; the engines are converted to LangChain tools with the LlamaIndex LangChain helpers.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain_experimental.plan_and_execute import PlanAndExecute, load_chat_planner, load_agent_executor
# Use the LlamaIndex helper that converts a query engine to a LangChain tool
from llama_index.core.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

expedia_tool_config = IndexToolConfig(
                        query_engine=expedia_engine,
                        name="expedia_10k",
                        description="Provides information about Expedia group's 10k filing for year 2023",
                        tool_kwargs={"return_direct": True}
                      )
booking_tool_config = IndexToolConfig(
                        query_engine=booking_engine,
                        name="booking_10k",
                        description="Provides information about Booking 10k filing for year 2023",
                        tool_kwargs={"return_direct": True}
                      )
# Do similarly for Uber and Lyft ....

# lc_expedia_tool = LlamaIndexTool.from_tool_config(expedia_tool_config)

# lc_booking_tool = LlamaIndexTool.from_tool_config(booking_tool_config)

# Use the LlamaIndex Router Query Engine instead as the individual query engines are not showing correct output

filing_10k_tool_config = IndexToolConfig(
                      query_engine=filing_10k_engine,
                      name="Filing_10k",
                      description="Selects the appropriate company filing query engine based on the query and executes it",
                      tool_kwargs={"return_direct": True}
                     )
lc_filing_10k_tool = LlamaIndexTool.from_tool_config(filing_10k_tool_config)

lc_agent_tools = [lc_filing_10k_tool]

# With the LlamaIndex query engine wrapped as a LangChain tool, define the PlanAndExecute agent
model = ChatOpenAI(model='gpt-4-0125-preview', temperature=0)
# Build the planner: an LLM chain that turns the user query into an ordered list of steps
planner = load_chat_planner(model)

# Build the executor: an agent that works through each planned step,
# calling the supplied tools where a step needs them
executor = load_agent_executor(model, lc_agent_tools, verbose=True)

agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
# Now run the agent
response = agent.run("Compare and contrast the revenue and expenses of Expedia and booking holdings for 2023 and provide an in depth analysis")

print(response)

> Entering new PlanAndExecute chain...
steps=[Step(value='Research the latest financial reports or earnings releases from Expedia and Booking Holdings for the fiscal year 2023 to gather data on their revenue and expenses.'), Step(value='Compare the revenue figures for both companies, analyzing any growth, decline, or trends observed within the 2023 fiscal year.'), Step(value='Contrast the expense structures of both companies, identifying key areas of spending, differences in cost management strategies, and any notable changes from previous years.'), Step(value='Analyze the impact of the revenue and expense figures on the overall financial health and performance of both companies, considering factors such as profit margins, earnings growth, and market share.'), Step(value='Given the above steps taken, provide an in-depth analysis comparing and contrasting the revenue and expenses of Expedia and Booking Holdings for 2023.\n\n')]

> Entering new AgentExecutor chain...

In [None]:
# Do similarly for Uber and Lyft ....
# Now run the agent
response = agent.run("Compare and contrast the revenue and expenses of Uber and Lyft for 2023 and provide an in depth analysis")

print (response)

#### LLMCompiler
Use the LlamaIndex LLMCompiler Llama Pack.
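
The core idea behind LLMCompiler is to plan a graph of tool calls up front, run the calls that do not depend on each other in parallel, and then join the results. The sketch below illustrates that idea with the standard library only. It is not the Llama Pack's code: the `PLAN` structure, the `fetch` helper and the placeholder strings are hypothetical stand-ins for the tool calls an LLM would plan.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Task = Tuple[Callable[[List[str]], str], List[int]]  # (function, ids it depends on)

def fetch(label: str) -> Callable[[List[str]], str]:
    """Builds a stand-in tool call that ignores its inputs and returns a placeholder."""
    def run(_deps: List[str]) -> str:
        return f"<{label}>"
    return run

def join(deps: List[str]) -> str:
    # Final step: combine the outputs of every task it depends on
    return " | ".join(deps)

# Hypothetical plan: four independent lookups followed by a join
PLAN: Dict[int, Task] = {
    1: (fetch("A revenue"), []),
    2: (fetch("A expenses"), []),
    3: (fetch("B revenue"), []),
    4: (fetch("B expenses"), []),
    5: (join, [1, 2, 3, 4]),
}

def run_plan(plan: Dict[int, Task]) -> Dict[int, str]:
    results: Dict[int, str] = {}
    pending = dict(plan)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once all of its dependencies have results
            ready = [tid for tid, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("plan has a cycle or a missing dependency")
            # Submit every ready task at once so they run concurrently
            futures = {
                tid: pool.submit(pending[tid][0], [results[d] for d in pending[tid][1]])
                for tid in ready
            }
            for tid, future in futures.items():
                results[tid] = future.result()
                del pending[tid]
    return results

print(run_plan(PLAN)[5])
```

Tasks 1 to 4 are submitted in the same round because none depends on another. Task 5 runs only after all four have finished.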

In [None]:
!pip install llama_index-packs-agents-llm-compiler -q

In [None]:
from llama_index.packs.agents_llm_compiler.step import LLMCompilerAgentWorker
from llama_index.core.agent import AgentRunner

# Instantiate the agent worker
# filing_10k_tools = [expedia_query_tool, booking_query_tool, uber_query_tool, lyft_query_tool]
# Let us try the RouterQueryEngine tool instead of the individual query tools
filing_10k_tools = [filing_10k_tool]
filing_10k_agent_worker = LLMCompilerAgentWorker.from_tools(
    tools=filing_10k_tools,
    verbose=True,
)
filing_10k_agent = AgentRunner(agent_worker=filing_10k_agent_worker)

response = filing_10k_agent.chat("Compare and contrast the revenue and expenses of expedia and booking holdings for 2023 and provide an in depth analysis")

print(response)


> Running step d60d65e8-c0f8-4d2e-bfeb-0ba7c08b708b for task c24a9728-1adb-4326-a91a-198d11976e62.
> Step count: 0
> Plan: 1. Filing_10k({"input": "expedia 2023 revenue"})
2. Filing_10k({"input": "expedia 2023 expenses"})
3. Filing_10k({"input": "booking holdings 2023 revenue"})
4. Filing_10k({"input": "booking holdings 2023 expenses"})
5. join()<END_OF_PLAN>
Ran task: Filing_10k. Observation: Booking Holdings Inc. reported revenues of $0 for the year ended December 31, 2023.
Ran task: Filing_10k. Observation: Expedia Group, Inc. reported a revenue of $12,839 million for the year ended December 31, 2023.
Ran task: Filing_10k. Observation: Booking Holdings Inc. reported operating expenses of $299 million for the year ended December 31, 2023.
Ran task: Filing_10k. Observation: Expedia's expenses for the year ended December 31, 2023, are detailed as follows (in millions):

- Cost of revenue (exclusive of depreciation and a