# RAG Retrieval Optimization - Query Rewriting using Amazon Bedrock and LlamaIndex

In this tutorial, we show how to use a sub-question query engine to answer a complex query.
The engine first breaks the complex query into sub-questions, one for each relevant data source, then gathers the intermediate responses and synthesizes a final response.

- Vector Database (Faiss / local)
- LLM (Amazon Bedrock - Claude 3 Sonnet)
- Embeddings Model (Amazon Bedrock - Titan Text Embeddings v2.0)
- Datasets (Amazon's 10-K SEC filings for 2022 and 2023)
- LlamaIndex (this example builds on the reference LlamaIndex documentation available at https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/)
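The decompose, answer, and synthesize flow described above can be sketched in plain Python. This is only an illustration of the control flow, not LlamaIndex's implementation: `SubQuestion`, `answer_complex_query`, and the `toy_*` functions are hypothetical names, and the stubs stand in for the LLM calls and query engines used later in this notebook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str  # name of the data source that should answer the question
    question: str   # the narrower question sent to that data source


def answer_complex_query(
    query: str,
    decompose: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    """Split a query into sub-questions, answer each one, then combine the answers."""
    sub_questions = decompose(query)
    intermediate = []
    for sq in sub_questions:
        # Raises KeyError if the decomposer names a tool that was not registered
        answer = tools[sq.tool_name](sq.question)
        intermediate.append(f"Q: {sq.question}\nA: {answer}")
    return synthesize(query, intermediate)


# Hypothetical stubs: in the notebook these roles are played by the LLM and
# by the query engine built over the 10-K filings.
def toy_decompose(query: str) -> List[SubQuestion]:
    return [SubQuestion("Amazon-10k", f"{query} [year {y}]") for y in (2022, 2023)]


def toy_tool(question: str) -> str:
    return f"stub answer to: {question}"


def toy_synthesize(query: str, parts: List[str]) -> str:
    return f"Query: {query}\n\n" + "\n\n".join(parts)


print(answer_complex_query(
    "What were the key challenges?",
    toy_decompose,
    {"Amazon-10k": toy_tool},
    toy_synthesize,
))
```

Keeping the three roles as separate callables mirrors the structure of the engine: the decomposer and synthesizer are LLM-backed, while each tool wraps a retrieval-backed query engine.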


In [1]:
%pip install llama-index
%pip install llama-index-llms-bedrock
%pip install llama-index-embeddings-bedrock
%pip uninstall -y pydantic
%pip install pydantic
%pip install sqlalchemy==2.0.21 --force-reinstall --quiet
%pip install llama-index-embeddings-instructor

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Found existing installation: pydantic 2.8.2
Uninstalling pydantic-2.8.2:
  Successfully uninstalled pydantic-2.8.2
Collecting pydantic
  Using cached pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
Using cached pydantic-2.8.2-py3-none-any.whl (423 kB)
Installing collected packages: pydantic
Successfully installed pydantic-2.8.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 2.14.1 requires packaging>=22.0, but you have packaging 21.3 which is incompatible.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
from llama_index.embeddings.bedrock import BedrockEmbedding

In [3]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

In [4]:
from llama_index.core import Settings
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding

# LLM and embedding model served by Amazon Bedrock; access to both model IDs
# must be enabled for your AWS account in the region you are using.
llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")

# Register global defaults used by the index and query engines below
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 256  # target chunk size, in tokens, when splitting documents

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
import nest_asyncio
nest_asyncio.apply()
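
To make the effect of `Settings.chunk_size = 256` more concrete, here is a rough sketch of fixed-size chunking with overlap. It is a simplified stand-in, not LlamaIndex's splitter: it counts whitespace-separated words rather than tokens and ignores sentence boundaries, and `chunk_words` and its `overlap` default are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 256, overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks give more precise retrieval hits but carry less surrounding context per hit, so the value is usually tuned per dataset.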

In [5]:
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

In [6]:
!mkdir -p 'data/amazon/'
!wget 'https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf' -O 'data/amazon/amazon_2023.pdf'
!wget 'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf' -O 'data/amazon/amazon_2022.pdf'

--2024-07-25 19:51:20--  https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.3, 68.70.205.2, 68.70.205.1, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 800598 (782K) [application/pdf]
Saving to: ‘data/amazon/amazon_2023.pdf’


2024-07-25 19:51:20 (16.3 MB/s) - ‘data/amazon/amazon_2023.pdf’ saved [800598/800598]

--2024-07-25 19:51:21--  https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.4, 68.70.205.1, 68.70.205.2, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 712683 (696K) [application/pdf]
Saving to: ‘data/amazon/amazon_2022.pdf’


2024-07-25 19:51:21 (1.27 MB/s) - ‘data/amazon/amazon_2022.pdf’ saved [712683/712683]



In [7]:
# load data
amazon_secfiles = SimpleDirectoryReader(input_dir="./data/amazon/").load_data()

# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
    amazon_secfiles,
    use_async=True,
).as_query_engine()

**********
Trace: index_construction
    |_node_parsing -> 0.676243 seconds
      |_chunking -> 0.003311 seconds
      |_chunking -> 0.001372 seconds
      |_chunking -> 0.001886 seconds
      |_chunking -> 0.003729 seconds
      |_chunking -> 0.001981 seconds
      |_chunking -> 0.002247 seconds
      |_chunking -> 0.002679 seconds
      |_chunking -> 0.002503 seconds
      |_chunking -> 0.002069 seconds
      |_chunking -> 0.002624 seconds
      |_chunking -> 0.002383 seconds
      |_chunking -> 0.002423 seconds
      |_chunking -> 0.002264 seconds
      |_chunking -> 0.001867 seconds
      |_chunking -> 0.002581 seconds
      |_chunking -> 0.001178 seconds
      |_chunking -> 0.000939 seconds
      |_chunking -> 0.000132 seconds
      |_chunking -> 0.002669 seconds
      |_chunking -> 0.002524 seconds
      |_chunking -> 0.002208 seconds
      |_chunking -> 0.002904 seconds
      |_chunking -> 0.001467 seconds
      |_chunking -> 0.001275 seconds
      |_chunking -> 0.002221 seconds
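
At query time, a vector query engine embeds the question and retrieves the chunks whose embeddings are most similar to it. The sketch below shows that ranking step with cosine similarity over tiny hand-written vectors. The names `cosine` and `top_k`, the value of `k`, and the vectors are all illustrative assumptions; real embeddings come from the Titan model configured above and have many more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" for three chunks
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], chunk_vectors, k=2))
```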

In [8]:
# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="Amazon-10k",
            description="Amazon SEC 10-k filings for years 2022 and 2023",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
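
The cell above registers a single tool that covers both filings. One possible variation, sketched below, is to register one tool per filing year so that each sub-question can be routed to the matching document. This is a sketch under assumptions rather than part of the original notebook: `year_from_file_name`, `group_by_year`, and `build_tools_per_year` are hypothetical helpers, and the grouping assumes each loaded document carries a `file_name` entry in its `metadata` that contains the year, as in `amazon_2022.pdf`.

```python
import re
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, List


def year_from_file_name(file_name: str) -> str:
    """Extract a four-digit year such as 2022 from a file name, or 'unknown'."""
    match = re.search(r"(20\d{2})", file_name)
    return match.group(1) if match else "unknown"


def group_by_year(documents: List[Any]) -> Dict[str, List[Any]]:
    """Group documents by the year found in metadata['file_name'] (assumed to exist)."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for doc in documents:
        groups[year_from_file_name(doc.metadata.get("file_name", ""))].append(doc)
    return dict(groups)


def build_tools_per_year(documents: List[Any]) -> List[Any]:
    """Build one QueryEngineTool per year, using the same calls as the cells above."""
    # Imported lazily so the grouping helpers can be tried without LlamaIndex installed
    from llama_index.core import VectorStoreIndex
    from llama_index.core.tools import QueryEngineTool, ToolMetadata

    tools = []
    for year, docs in sorted(group_by_year(documents).items()):
        engine = VectorStoreIndex.from_documents(docs).as_query_engine()
        tools.append(
            QueryEngineTool(
                query_engine=engine,
                metadata=ToolMetadata(
                    name=f"Amazon-10k-{year}",
                    description=f"Amazon SEC 10-K filing saved as amazon_{year}.pdf",
                ),
            )
        )
    return tools


# Stand-in objects that only mimic the `metadata` attribute of loaded documents
demo_docs = [
    SimpleNamespace(metadata={"file_name": "amazon_2022.pdf"}),
    SimpleNamespace(metadata={"file_name": "amazon_2023.pdf"}),
    SimpleNamespace(metadata={"file_name": "amazon_2023.pdf"}),
]
print({year: len(docs) for year, docs in group_by_year(demo_docs).items()})
```

In the notebook, `build_tools_per_year(amazon_secfiles)` could then be passed as `query_engine_tools` to `SubQuestionQueryEngine.from_defaults`.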

In [9]:
response = query_engine.query(
    "What were key challenges faced by Amazon in year 2022 and 2023?"
)

Generated 2 sub questions.
[Amazon-10k] Q: What were the key challenges faced by Amazon in 2022?
[Amazon-10k] A: Based on the context information provided, some of the key challenges faced by Amazon in 2022 included:

1. Maintaining their unique culture of innovation, customer obsession, and long-term thinking, which has been critical to their growth and success.

2. Potential disruptions from natural or human-caused disasters, extreme weather events (including those related to climate change), geopolitical events, security issues, labor or trade disputes, and similar events.

3. Negative impacts of climate change, such as increased operating costs due to extreme weather events, increased investment requirements for transitioning to a low-carbon economy, decreased demand for products and services due to changes in customer behavior, and increased compliance costs due to more extensive and global regulations.

4. Variations in their level 

In [10]:
print(response)

Based on the provided information, some of the key challenges faced by Amazon in 2022 and 2023 included:

1. Maintaining their unique culture focused on innovation, customer obsession, and long-term thinking.

2. Potential disruptions from natural disasters, extreme weather events related to climate change, geopolitical events, security issues, labor disputes, and similar events.

3. Negative impacts of climate change, such as increased operating costs, investment requirements for transitioning to a low-carbon economy, decreased demand due to changes in customer behavior, and increased compliance costs due to regulations.

4. Fluctuations in merchandise and vendor returns.

5. Factors affecting their reputation or brand image, particularly related to the development and use of artificial intelligence and machine learning technologies.

6. Availability and rising prices of transportation, resources (land, water, energy), commodities (paper, packing supplies, hardware products), and tech

In [11]:
# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")

Sub Question 0: What were the key challenges faced by Amazon in 2022?
Answer: Based on the context information provided, some of the key challenges faced by Amazon in 2022 included:

1. Maintaining their unique culture of innovation, customer obsession, and long-term thinking, which has been critical to their growth and success.

2. Potential disruptions from natural or human-caused disasters, extreme weather events (including those related to climate change), geopolitical events, security issues, labor or trade disputes, and similar events.

3. Negative impacts of climate change, such as increased operating costs due to extreme weather events, increased investment requirements for transitioning to a low-carbon economy, decreased demand for products and services due to changes in customer behavior, and increased compliance costs due to more extensive and global regulations.

4. Variations in their level of merchandise and vendor returns.

5. Factors affecting their reputation or brand 

In [12]:
response = query_engine.query(
    "What are amazons key priorities before, during and after covid?"
)

Generated 3 sub questions.
[Amazon-10k] Q: What were Amazon's key priorities before COVID-19 (e.g., in 2019 or earlier)?
[Amazon-10k] A: Based on the context provided, one of Amazon's key priorities before COVID-19 appears to have been maintaining their unique culture of innovation, customer obsession, and long-term thinking. The passage mentions that this culture "has been critical to our growth and success." While no specific years are mentioned, this suggests it was an important focus for Amazon leading up to and preceding the COVID-19 pandemic.
[Amazon-10k] Q: How did Amazon's priorities shift during the COVID-19 pandemic (e.g., in 2020 and 2021)?
[Amazon-10k] A: The context does not provide any specific information about how Amazon's priorities shifted during the COVID-19 pandemic in 2020 and 2021. The context discusses potential risks and disruptions to Amazon's business from event

In [13]:
print(response)

Before COVID-19, one of Amazon's key priorities appears to have been maintaining their unique culture of innovation, customer obsession, and long-term thinking, which has been critical to their growth and success.

The context does not provide specific information about how Amazon's priorities shifted during the COVID-19 pandemic in 2020 and 2021. However, it suggests they likely focused on mitigating disruptions from events like public health crises that could impact their operations, customers, sellers, and suppliers.

After the COVID-19 pandemic, some of Amazon's key priorities seem to be:

1. Continuing to maintain their innovative culture and long-term thinking approach.
2. Addressing potential negative impacts of climate change on their business.
3. Mitigating disruptions from various natural or human-caused disasters, extreme weather events, geopolitical issues, and other events that could affect their operations.
4. Optimizing and effectively operating their fulfillment network