# RAG Retrieval Optimization - Query Transformation (HyDE) using Amazon Bedrock and LlamaIndex

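In this tutorial, we show how to improve retrieval with HyDE (Hypothetical Document Embeddings), a query transformation technique.
Instead of embedding the user's query directly, HyDE first asks the LLM to write a hypothetical document that answers the query. It then embeds that document (here, together with the original query) and uses the result to retrieve the most relevant chunks. We compare the answer from a baseline query engine with the answer from a HyDE-transformed query engine on the same question.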

- Vector Database (Faiss / local)
- LLM (Amazon Bedrock - Claude 3 Sonnet)
- Embeddings Model (Amazon Bedrock - Titan Text Embeddings V2)
- Datasets (Amazon and Google SEC 10-K statements)

This example is based on the reference LlamaIndex documentation available at https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb

HyDE transforms the initial query into a form that can be more easily embedded: a hypothetical answer document.
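To make the idea concrete, here is a minimal, self-contained sketch of HyDE retrieval in plain Python. It is not the LlamaIndex implementation: `embed` is a toy bag-of-words stand-in for the Titan embedding model, and `fake_llm` is a hypothetical stand-in for Claude that returns a canned hypothetical answer. It illustrates how a generated answer can share more vocabulary with the relevant passage than the short question does.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Dict[str, int]:
    """Toy bag-of-words 'embedding': lowercase word counts (stand-in for a real model)."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def bow_cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(text: str, corpus: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k corpus entries most similar to `text`."""
    text_vec = embed(text)
    ranked = sorted(corpus, key=lambda doc: bow_cosine(text_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def hyde_retrieve(query: str, corpus: List[str], generate: Callable[[str], str], top_k: int = 1) -> List[str]:
    """HyDE: retrieve with a generated hypothetical answer instead of the raw query."""
    hypothetical_doc = generate(query)
    return retrieve(hypothetical_doc, corpus, top_k)


def fake_llm(query: str) -> str:
    # Hypothetical stand-in for the LLM call; a real setup would prompt Claude here.
    return "The company faced inflation and supply chain disruptions that increased costs."


CORPUS = [
    "Inflation and supply chain disruptions increased operating costs.",
    "The company held its annual meeting.",
]
QUERY = "What challenges did the company face?"

print(retrieve(QUERY, CORPUS))                 # ['The company held its annual meeting.']
print(hyde_retrieve(QUERY, CORPUS, fake_llm))  # ['Inflation and supply chain disruptions increased operating costs.']
```

With the raw question, the toy retriever picks the off-topic sentence because both mention "the company"; the hypothetical answer shares most of its words with the relevant passage, so HyDE retrieves that one instead. Real embedding models compare meaning rather than exact words, but the intuition is the same.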


In [1]:
%pip install llama-index
%pip install llama-index-llms-bedrock
%pip install llama-index-embeddings-bedrock
%pip uninstall -y pydantic
%pip install pydantic
%pip install sqlalchemy==2.0.21 --force-reinstall --quiet
%pip install llama-index-embeddings-instructor

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Found existing installation: pydantic 2.8.2
Uninstalling pydantic-2.8.2:
  Successfully uninstalled pydantic-2.8.2
Collecting pydantic
  Using cached pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
Using cached pydantic-2.8.2-py3-none-any.whl (423 kB)
Installing collected packages: pydantic
Successfully installed pydantic-2.8.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 2.14.1 requires packaging>=22.0, but you have packaging 21.3 which is incompatible.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
from llama_index.embeddings.bedrock import BedrockEmbedding

In [3]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine
from IPython.display import Markdown, display

In [4]:
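import json
from typing import Sequence, List
from llama_index.core.settings import Settings
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding, Models

# LLM and embedding model served by Amazon Bedrock
# (AWS credentials and region are resolved from your environment / AWS config)
llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")

# Register both as the global defaults used for indexing and querying
Settings.llm = llm
Settings.embed_model = embed_model
# Split documents into small chunks (chunk size is measured in tokens)
Settings.chunk_size = 128

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
import nest_asyncio

# Allow nested event loops so async calls work inside Jupyter
nest_asyncio.apply()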
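`Settings.chunk_size = 128` makes LlamaIndex split each document into small chunks before embedding, so every retrieved chunk is short and focused. The sketch below shows the general idea of fixed-size windows with overlap. It is a simplification: it counts whitespace-separated words instead of tokens and ignores sentence boundaries, unlike LlamaIndex's sentence-aware splitter. `chunk_words` and its `chunk_overlap` default are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 128, chunk_overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_size` words, each overlapping the previous by `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample_text = " ".join(f"w{i}" for i in range(300))  # 300 dummy "words"
sample_chunks = chunk_words(sample_text)
print(len(sample_chunks))           # 3
print(sample_chunks[1].split()[0])  # w108 (overlaps the first chunk by 20 words)
```

Smaller chunks make each embedding more specific, at the cost of giving the LLM less surrounding context per retrieved chunk.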

In [5]:
!mkdir -p 'data/amazon/'
!wget 'https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf' -O 'data/amazon/amazon_2023.pdf'
!wget 'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf' -O 'data/amazon/amazon_2022.pdf'

--2024-07-25 20:19:03--  https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.1, 68.70.205.3, 68.70.205.4, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 800598 (782K) [application/pdf]
Saving to: ‘data/amazon/amazon_2023.pdf’


2024-07-25 20:19:03 (8.39 MB/s) - ‘data/amazon/amazon_2023.pdf’ saved [800598/800598]

--2024-07-25 20:19:03--  https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.2, 68.70.205.4, 68.70.205.3, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 712683 (696K) [application/pdf]
Saving to: ‘data/amazon/amazon_2022.pdf’


2024-07-25 20:19:04 (1.48 MB/s) - ‘data/amazon/amazon_2022.pdf’ saved [712683/712683]



In [6]:
# load the downloaded 10-K PDFs from data/amazon/
amazon_secfiles = SimpleDirectoryReader(input_dir="./data/amazon/").load_data()

# build a vector index over the chunked documents (embedded with Bedrock Titan)
index = VectorStoreIndex.from_documents(amazon_secfiles)
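`VectorStoreIndex.from_documents` chunks the documents, embeds every chunk with the configured embedding model, and stores the vectors; at query time the query is embedded and the most similar chunks are returned. The toy index below sketches that store-and-rank step with hand-picked 3-dimensional vectors and cosine similarity. `ToyVectorIndex`, `dense_cosine`, and the vectors are illustrative assumptions, not LlamaIndex classes, and `top_k=2` mirrors what we understand to be the query engine's default number of retrieved chunks.

```python
import math
from typing import List, Sequence, Tuple


def dense_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """In-memory list of (text, vector) rows ranked by cosine similarity."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self._rows.append((text, list(vector)))

    def query(self, vector: Sequence[float], top_k: int = 2) -> List[str]:
        scored = [(dense_cosine(vector, row_vec), text) for text, row_vec in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


toy_index = ToyVectorIndex()
toy_index.add("inflation raised costs", [0.9, 0.1, 0.0])
toy_index.add("supply chain disruptions", [0.7, 0.3, 0.0])
toy_index.add("annual meeting date", [0.0, 0.1, 0.9])
print(toy_index.query([1.0, 0.2, 0.0]))  # ['inflation raised costs', 'supply chain disruptions']
```

The toy uses a brute-force scan over every row; vector stores such as Faiss exist to make this nearest-neighbor search fast at scale.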

In [7]:
query_str = "What were key challenges faced by Amazon in year 2022 and 2023?"

In [8]:
# baseline: query the index directly, without any query transformation
query_engine = index.as_query_engine()
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

<b>Based on the context information provided, some of the key challenges faced by Amazon in 2022 and 2023 included:

- Fluctuations in foreign exchange rates
- Changes in global economic conditions and customer demand/spending patterns
- Inflationary pressures and interest rate changes  
- Regional labor market constraints and global supply chain disruptions
- Geopolitical events and world developments impacting business operations
- Managing growth and investments in new business opportunities
- Variability in product/service mix and associated revenue streams
- Tax obligations and regulatory compliance
- Intense competition in the markets they operate in
- Potential risks from litigation, government investigations, and other legal proceedings
- Optimizing fulfillment, sortation, delivery, and data center operations
- Inventory management and demand variability challenges
- Risks associated with commercial agreements, acquisitions, and strategic transactions
- Payments-related risks and productivity/throughput challenges in fulfillment operations

The context highlights that unforeseen global economic and geopolitical conditions could further amplify many of these risks and challenges for Amazon during this period.</b>
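The baseline query engine answers by retrieving the most similar chunks and passing them to the LLM as context, which is why the answer opens with "Based on the context information provided". A minimal sketch of that prompt-assembly step follows. `QA_TEMPLATE` and `build_qa_prompt` are hypothetical, loosely modeled on LlamaIndex's default question-answering prompt; the library's real template and response synthesis differ in detail.

```python
from typing import List

# Hypothetical template, loosely modeled on LlamaIndex's default QA prompt
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)


def build_qa_prompt(query: str, chunks: List[str]) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context=context, query=query)


retrieved_chunks = ["Inflation raised costs.", "Supply chains were disrupted."]
prompt = build_qa_prompt("What were key challenges faced by Amazon?", retrieved_chunks)
print(prompt)
```

Because the answer is grounded only in the retrieved chunks, improving which chunks are retrieved (the goal of HyDE) directly changes what the LLM can say.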

In [10]:
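# HyDE: have the LLM write a hypothetical answer document and embed it;
# include_original=True also keeps the original query for embedding
hyde = HyDEQueryTransform(include_original=True)
# wrap the baseline query engine so every query is transformed first
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))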

<b>Based on the context information provided, some of the key challenges faced by Amazon in 2022 and 2023 included:

1. Fluctuations in foreign exchange rates
2. Changes in global economic conditions and customer demand/spending patterns
3. Inflationary pressures and interest rate changes
4. Regional labor market constraints and global supply chain disruptions
5. Geopolitical events and their impacts
6. Managing growth and investments in new business opportunities
7. Variability in product/service mix and associated revenue streams
8. Tax obligations and regulatory scrutiny
9. Intense competition in various markets
10. Operational challenges related to fulfillment, sortation, delivery, and data centers
11. Inventory management and demand variability
12. Risks associated with commercial agreements, acquisitions, and strategic transactions
13. Potential legal claims, litigation, and government investigations

The context highlights that unforeseen global economic and geopolitical conditions could amplify or give rise to many of these challenges faced by Amazon during these years.</b>
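With `include_original=True`, `HyDEQueryTransform` keeps the original query alongside the generated document in the query bundle's `embedding_strs`, with the hypothetical document first. As we understand LlamaIndex's vector retriever, when several embedding strings are present it embeds each one and averages them into a single query vector. The sketch below illustrates only that averaging step, using made-up 3-dimensional vectors and a hypothetical `mean_embedding` helper.

```python
from typing import List, Sequence


def mean_embedding(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized embedding vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Made-up embeddings for the two strings HyDE produces with include_original=True
hyde_doc_vec = [0.8, 0.2, 0.0]        # embedding of the hypothetical document
original_query_vec = [0.2, 0.6, 0.2]  # embedding of the original question

combined = mean_embedding([hyde_doc_vec, original_query_vec])
print([round(x, 3) for x in combined])  # [0.5, 0.4, 0.1]
```

Averaging pulls the query vector toward the answer-like hypothetical document while keeping some of the original question's signal, which can help when the generated document drifts from what was actually asked.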

In [12]:
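# run the transform on its own to inspect the generated hypothetical document
query_bundle = hyde(query_str)
# embedding_strs[0] is the hypothetical document (the original query follows it)
hyde_doc = query_bundle.embedding_strs[0]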

In [13]:
hyde_doc

"Amazon faced several key challenges in 2022 and 2023, including:\n\n1. Economic downturn and inflation: The global economic slowdown and high inflation rates impacted consumer spending, leading to a decline in demand for some of Amazon's products and services.\n\n2. Supply chain disruptions: Ongoing supply chain issues, such as labor shortages, port congestion, and transportation challenges, affected Amazon's ability to efficiently deliver products to customers.\n\n3. Increased competition: Amazon faced intense competition from other e-commerce giants, such as Walmart and Target, as well as smaller niche players in various product categories.\n\n4. Regulatory scrutiny: Amazon faced heightened regulatory scrutiny from governments around the world, particularly regarding antitrust concerns, labor practices, and data privacy issues.\n\n5. Labor challenges: Amazon grappled with labor shortages, high turnover rates, and unionization efforts among its warehouse and delivery workers, leading