# RAG Retrieval Optimization - Query Reformulation Supported By Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases now supports query reformulation for complex user queries. This feature takes a complex input query and breaks it into multiple sub-queries. Each sub-query then goes through its own retrieval step to find relevant chunks. Because each sub-query is semantically simpler than the original, it may retrieve more targeted chunks. The retrieved chunks are then pooled and ranked together before being passed to the foundation model (FM) to generate a response.

Because query reformulation is natively supported, users can take advantage of this feature to improve the robustness and accuracy of their RAG solution without having to manage extra components such as third-party libraries.
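The flow described above (decompose, retrieve per sub-query, pool, rank) can be sketched in plain Python. This is only a conceptual illustration and not how Bedrock implements the feature internally: the `CORPUS` dictionary, the `toy_retrieve` function with its word-overlap score, and the hard-coded sub-queries are all hypothetical stand-ins for the managed vector search and the model-driven decomposition.

```python
from typing import Callable

# Hypothetical mini corpus standing in for the indexed chunks.
CORPUS = {
    "c1": "Net sales were 513,983 million in 2022",
    "c2": "Net sales were 574,785 million in 2023",
    "c3": "Foreign exchange rates reduced net sales in 2022",
    "c4": "Employees receive restricted stock units",
}

def toy_retrieve(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Score chunks by word overlap with the query (stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = []
    for chunk_id, text in CORPUS.items():
        overlap = len(q_words & set(text.lower().split()))
        if overlap:
            scored.append((chunk_id, overlap / len(q_words)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def retrieve_with_decomposition(
    sub_queries: list[str],
    retrieve: Callable[[str], list[tuple[str, float]]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Retrieve per sub-query, pool the hits, keep the best score per chunk, then rank."""
    pooled: dict[str, float] = {}
    for sub_query in sub_queries:
        for chunk_id, score in retrieve(sub_query):
            pooled[chunk_id] = max(score, pooled.get(chunk_id, 0.0))
    ranked = sorted(pooled.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hand-written sub-queries; in the managed feature the decomposition is done for you.
sub_queries = [
    "net sales in 2022",
    "net sales in 2023",
    "foreign exchange rates challenges",
]
results = retrieve_with_decomposition(sub_queries, toy_retrieve)
print(results)
```

Keeping the best score per chunk is just one simple pooling choice for this sketch; the ranking that the service applies may differ.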

## Prerequisites
You must run the [`workshop_setup.ipynb`](../lab00-setup/workshop_setup.ipynb) notebook in `lab00-setup` before starting this lab.

In [1]:
import warnings
warnings.warn("Warning: if you did not run lab00-setup, please go back and run the lab00 notebook") 



## Load the parameters

In [2]:
print("Lab parameters....\n")
%store -r amzn10k_prefix
%store -r amzn10k_s3_path
%store -r bucket
print(amzn10k_prefix)
print(amzn10k_s3_path)
print(bucket)

print("\nload the vector db parameters....\n")
# vector parameters stored from Initial setup lab02
%store -r vector_store_name
%store -r vector_collection_arn
%store -r vector_collection_id
%store -r vector_host
%store -r bedrock_kb_execution_role_arn
# check that all 5 values are printed and do not fail
print(vector_store_name)
print(vector_collection_arn)
print(vector_collection_id)
print(vector_host)
print(bedrock_kb_execution_role_arn)

Lab parameters....

amazon_10k
s3://sagemaker-us-west-2-878687028155/amazon_10k/
sagemaker-us-west-2-878687028155

load the vector db parameters....

swagger-api-vector-332
arn:aws:aoss:us-west-2:878687028155:collection/o1cfjeuk8rk7o1k40qe4
o1cfjeuk8rk7o1k40qe4
o1cfjeuk8rk7o1k40qe4.us-west-2.aoss.amazonaws.com
arn:aws:iam::878687028155:role/AmazonBedrockExecutionRoleForKnowledgeBase_332


In [3]:
import random
import time
import boto3
import sys

sys.path.append('../lab00-setup')
from knowledge_base import BedrockKnowledgeBase

# auth for opensearch
boto3_session = boto3.Session()
region_name = boto3_session.region_name
# try out KB using RetrieveAndGenerate API
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"  # Claude 3 Sonnet; swap in another model ID (for example "anthropic.claude-v2") to compare results
model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id}'

In [4]:
suffix = random.randrange(200, 900)
kb_name = f"bedrock-query-reformation-{suffix}"
index_name = f"bedrock-query-reformulation-{suffix}"
description = "This knowledge base contain Amazon 10K financial document from 2022 and 2023"

knowledge_base = BedrockKnowledgeBase(
    kb_name=kb_name,
    kb_description=description,
    data_bucket_name=bucket,
    data_prefix=[amzn10k_prefix],
    vector_collection_arn=vector_collection_arn,
    vector_collection_id=vector_collection_id,
    vector_host=vector_host,
    bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn,
    index_name=index_name,
    suffix=suffix,
)

Step 1 - Creating OSS Vector Index

Creating index:
{ 'acknowledged': True,
  'index': 'bedrock-query-reformulation-635',
  'shards_acknowledged': True}
Step 2 - Creating Knowledge Base
Creating KB with chunking strategy - FIXED_SIZE
bedrock-query-reformation-635
2ALV4ISRQK
{'bucketArn': 'arn:aws:s3:::sagemaker-us-west-2-878687028155', 'inclusionPrefixes': ['amazon_10k']}
{ 'createdAt': datetime.datetime(2024, 10, 31, 4, 25, 40, 714528, tzinfo=tzlocal()),
  'dataDeletionPolicy': 'RETAIN',
  'dataSourceConfiguration': { 's3Configuration': { 'bucketArn': 'arn:aws:s3:::sagemaker-us-west-2-878687028155',
                                                    'inclusionPrefixes': [ 'amazon_10k']},
                               'type': 'S3'},
  'dataSourceId': 'KMTZM2DCIC',
  'description': 'This knowledge base contain Amazon 10K financial document '
                 'from 2022 and 2023',
  'knowledgeBaseId': '2ALV4ISRQK',
  'name': 'bedrock-query-reformation-635',
  'status': 'AVAILABLE',
  '

### Start ingestion job
Once the KB and data source are created, we can start the ingestion job.
During the ingestion job, the KB fetches the documents from the data source, pre-processes them to extract text, splits the text into chunks based on the configured chunk size, creates an embedding for each chunk, and writes the embeddings to the vector database, in this case OpenSearch Serverless (OSS).
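To make the chunking step more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration under simplifying assumptions rather than the service's implementation: it counts whitespace-separated words, whereas the managed `FIXED_SIZE` strategy counts tokens, and the function name `fixed_size_chunks` and its default sizes are hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int = 300, overlap: int = 60) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny example: 10 "words", windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = fixed_size_chunks(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.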

In [5]:
# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base.start_ingestion_job()

{ 'dataSourceId': 'KMTZM2DCIC',
  'ingestionJobId': 'QZJNYSY8SO',
  'knowledgeBaseId': '2ALV4ISRQK',
  'startedAt': datetime.datetime(2024, 10, 31, 4, 26, 11, 101342, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 0,
                  'numberOfMetadataDocumentsModified': 0,
                  'numberOfMetadataDocumentsScanned': 0,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 0},
  'status': 'STARTING',
  'updatedAt': datetime.datetime(2024, 10, 31, 4, 26, 11, 101342, tzinfo=tzlocal())}
{ 'dataSourceId': 'KMTZM2DCIC',
  'ingestionJobId': 'QZJNYSY8SO',
  'knowledgeBaseId': '2ALV4ISRQK',
  'startedAt': datetime.datetime(2024, 10, 31, 4, 26, 11, 101342, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 2,
  

In [6]:
kb_id = knowledge_base.get_knowledge_base_id()

'2ALV4ISRQK'


### Query Reformulation
To demonstrate the functionality, we are going to use a more complex query about Amazon's 10K financial documents. This query contains several asks that are not semantically related. When the whole query is embedded as a single vector during the retrieval step, some aspects of it may become diluted, so the returned chunks may not address every component of the query.

To query our Knowledge Base and generate a response, we will use the `retrieve_and_generate` API call. To use the query reformulation feature, we add the following to the knowledge base configuration:

```
'orchestrationConfiguration': {
        'queryTransformationConfiguration': {
            'type': 'QUERY_DECOMPOSITION'
        }
    }

```
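Since the two calls in this lab differ only by that extra block, one option is to build the configuration with a small helper and toggle decomposition with a flag. The sketch below assumes nothing beyond the dictionary shape used in this notebook; the helper name `build_rag_config` and the placeholder ID and ARN are hypothetical.

```python
def build_rag_config(kb_id: str, model_arn: str,
                     number_of_results: int = 3,
                     decompose: bool = False) -> dict:
    """Build the retrieveAndGenerateConfiguration dict, optionally enabling query decomposition."""
    kb_config = {
        "knowledgeBaseId": kb_id,
        "modelArn": model_arn,
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    }
    if decompose:
        # Same block as shown above: ask the service to split the query into sub-queries.
        kb_config["orchestrationConfiguration"] = {
            "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
        }
    return {"type": "KNOWLEDGE_BASE", "knowledgeBaseConfiguration": kb_config}

# Placeholder values for illustration only.
example = build_rag_config(
    "EXAMPLEKBID",
    "arn:aws:bedrock:us-west-2::foundation-model/example-model",
    decompose=True,
)
print(example)
```

The returned dictionary can be passed as `retrieveAndGenerateConfiguration=` to `retrieve_and_generate`, with `decompose=False` for the baseline call and `decompose=True` for the reformulated one.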

### > Without Query Reformulation
Let's see what the generated result looks like for the following query without using query reformulation:

"What were Amazons net sales in year 2022 and 2023? Can you compare key challenges faced by Amazon for both years?"

In [82]:
query = """
What were Amazons net sales in year 2022 and 2023? Can you compare key challenges faced by Amazon for both years?
"""

In [83]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region_name, 
                                                                         model_id),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":3
                } 
            }
        }
    }
)


# generated text output
raw_response = response_ret['output']['text']
print(raw_response,end='\n'*2)

According to the search results, Amazon's net sales in 2022 were $513,983 million. The net sales for 2023 are not provided in the search results, but guidance for the first quarter of 2023 indicates that net sales are expected to be between $121.0 billion and $126.0 billion. The key challenges faced by Amazon in 2022 and 2023 based on the search results appear to be:

2022: Changes in foreign currency exchange rates reduced net sales by $15.5 billion.

2023: The guidance for Q1 2023 anticipates an unfavorable impact of approximately 210 basis points from foreign exchange rates. Additionally, the guidance mentions factors like uncertainty from the COVID-19 pandemic, fluctuations in foreign exchange rates, changes in global economic and geopolitical conditions, customer demand and spending (including recessionary fears), inflation, interest rates, regional labor market constraints, and global supply chain constraints as potential challenges.



In [84]:
# references attached to the first citation only
response_without_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_without_qr))

def citations_rag_print(references):
    # `references` is a list of retrieved references; each item has content, location, and metadata
    for num, chunk in enumerate(references, 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

citations_rag_print(response_without_qr)

# of citations or chunks used to generate the response:  2
Chunk 1:  Service sales primarily represent third-party seller fees, which includes commissions and any related fulfillment and shipping fees, AWS sales, advertising services, Amazon Prime membership fees, and certain digital content subscriptions. Net sales information is as follows (in millions): Year Ended December 31,      2021 2022     Net Sales: North America $ 279,833 $ 315,880 International 127,787 118,007 AWS 62,202 80,096     Consolidated $ 469,822 $ 513,983 Year-over-year Percentage Growth (Decline):     North America 18 % 13 % International 22 (8) AWS 37 29     Consolidated 22 9 Year-over-year Percentage Growth, excluding the effect of foreign exchange rates:     North America 18 % 13 % International 20 4 AWS 37 29     Consolidated 21 13 Net sales mix:     North America 60 % 61 % International 27 23 AWS 13 16     Consolidated 100 % 100 %     Sales increased 9% in 2022, compared to the prior year. Changes in foreign 

### > With Query Reformulation
Now let's see how query reformulation helps retrieve context that is better aligned with each part of the question, which in turn improves the accuracy of the generated response.

In [85]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region_name, 
                                                                         model_id),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":3
                } 
            },
            'orchestrationConfiguration': {
                'queryTransformationConfiguration': {
                    'type': 'QUERY_DECOMPOSITION'
                }
            }
        }
    }
)


# generated text output
transformed_response = response_ret['output']['text']
print(transformed_response,end='\n'*2)

According to the search results, Amazon's net sales in 2022 were $513,983 million, with North America sales of $315,880 million, International sales of $118,007 million, and AWS sales of $80,096 million. In 2023, Amazon's net sales increased to $574,785 million, with North America sales of $352,828 million, International sales of $131,200 million, and AWS sales of $90,757 million. The search results indicate that some key challenges faced by Amazon in 2022 and 2023 included the impact of changes in foreign exchange rates, which reduced net sales by $15.5 billion in 2022 and $71 million in 2023. Additionally, the guidance for Q1 2023 mentions uncertainty regarding the impacts of the COVID-19 pandemic, fluctuations in foreign exchange rates, changes in global economic and geopolitical conditions, customer demand and spending, inflation, interest rates, regional labor market constraints, and global supply chain constraints as potential challenges.



In [86]:
response_with_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_with_qr))

citations_rag_print(response_with_qr)

# of citations or chunks used to generate the response:  2
Chunk 1:  Service sales primarily represent third-party seller fees, which includes commissions and any related fulfillment and shipping fees, AWS sales, advertising services, Amazon Prime membership fees, and certain digital content subscriptions. Net sales information is as follows (in millions): Year Ended December 31,      2021 2022     Net Sales: North America $ 279,833 $ 315,880 International 127,787 118,007 AWS 62,202 80,096     Consolidated $ 469,822 $ 513,983 Year-over-year Percentage Growth (Decline):     North America 18 % 13 % International 22 (8) AWS 37 29     Consolidated 22 9 Year-over-year Percentage Growth, excluding the effect of foreign exchange rates:     North America 18 % 13 % International 20 4 AWS 37 29     Consolidated 21 13 Net sales mix:     North America 60 % 61 % International 27 23 AWS 13 16     Consolidated 100 % 100 %     Sales increased 9% in 2022, compared to the prior year. Changes in foreign 

### > Display the results side-by-side 

Notice that splitting the question into sub-questions increases the chances of matching the right and complete information, which leads to a more comprehensive final answer.
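Beyond comparing the generated text, it can help to compare which chunks each call actually cited. The sketch below gathers the references from every citation in a response (the cells above only look at `citations[0]`) and lists the chunks that appear only in the second response. The helper names and the two `fake_*` dictionaries are hypothetical; the dictionaries mimic only the `citations` / `retrievedReferences` / `content.text` fields used earlier in this notebook.

```python
def collect_reference_texts(response: dict) -> list[str]:
    """Return the unique chunk texts referenced by any citation, in first-seen order."""
    seen: list[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            text = ref["content"]["text"]
            if text not in seen:
                seen.append(text)
    return seen

def new_chunks(baseline: dict, candidate: dict) -> list[str]:
    """Chunks cited in `candidate` that were not cited in `baseline`."""
    base = set(collect_reference_texts(baseline))
    return [text for text in collect_reference_texts(candidate) if text not in base]

# Hypothetical responses with the same shape as the retrieve_and_generate output.
fake_without = {"citations": [
    {"retrievedReferences": [{"content": {"text": "2022 net sales table"}}]},
]}
fake_with = {"citations": [
    {"retrievedReferences": [{"content": {"text": "2022 net sales table"}}]},
    {"retrievedReferences": [
        {"content": {"text": "2023 net sales table"}},
        {"content": {"text": "2022 net sales table"}},
    ]},
]}
print(new_chunks(fake_without, fake_with))
```

To apply this to the real responses, each call's result would need to be kept in its own variable, since the cells above reuse `response_ret` for both calls.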

In [87]:
import pandas as pd
from IPython.display import display, HTML

# Create the first table
df = pd.DataFrame({
    'Naive RAG': [query, raw_response],
    'RAG w/ Query Rewrite': [query, transformed_response]
})

output=""
output += df.style.hide().set_table_attributes("style='display:inline'")._repr_html_()
output += "&nbsp;"

display(HTML(output))

Naive RAG,RAG w/ Query Rewrite
What were Amazons net sales in year 2022 and 2023? Can you compare key challenges faced by Amazon for both years?,What were Amazons net sales in year 2022 and 2023? Can you compare key challenges faced by Amazon for both years?
"According to the search results, Amazon's net sales in 2022 were $513,983 million. The net sales for 2023 are not provided in the search results, but guidance for the first quarter of 2023 indicates that net sales are expected to be between $121.0 billion and $126.0 billion. The key challenges faced by Amazon in 2022 and 2023 based on the search results appear to be: 2022: Changes in foreign currency exchange rates reduced net sales by $15.5 billion. 2023: The guidance for Q1 2023 anticipates an unfavorable impact of approximately 210 basis points from foreign exchange rates. Additionally, the guidance mentions factors like uncertainty from the COVID-19 pandemic, fluctuations in foreign exchange rates, changes in global economic and geopolitical conditions, customer demand and spending (including recessionary fears), inflation, interest rates, regional labor market constraints, and global supply chain constraints as potential challenges.","According to the search results, Amazon's net sales in 2022 were $513,983 million, with North America sales of $315,880 million, International sales of $118,007 million, and AWS sales of $80,096 million. In 2023, Amazon's net sales increased to $574,785 million, with North America sales of $352,828 million, International sales of $131,200 million, and AWS sales of $90,757 million. The search results indicate that some key challenges faced by Amazon in 2022 and 2023 included the impact of changes in foreign exchange rates, which reduced net sales by $15.5 billion in 2022 and $71 million in 2023. Additionally, the guidance for Q1 2023 mentions uncertainty regarding the impacts of the COVID-19 pandemic, fluctuations in foreign exchange rates, changes in global economic and geopolitical conditions, customer demand and spending, inflation, interest rates, regional labor market constraints, and global supply chain constraints as potential challenges."


### Clean Up

In [69]:
# Uncomment to delete the knowledge base created in this lab once you are done
# knowledge_base.delete_kb()