# RAG Retrieval Optimization - Query Reformulation Supported By Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases now support query reformulation for complex user queries. This feature takes a complex input query and breaks it into multiple sub-queries. Each sub-query then goes through its own retrieval step to find relevant chunks. Because the sub-queries are semantically simpler than the original query, they may retrieve more targeted chunks. The retrieved chunks are then pooled and ranked together before being passed to the foundation model (FM) to generate a response.

Because query reformulation is natively supported, users can take advantage of this feature to improve the robustness and accuracy of their RAG solution without having to manage extra components such as third-party libraries.
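The decompose, retrieve, pool, and rank flow described above can be pictured with a toy sketch. This is only a conceptual illustration and not how Bedrock implements the feature: `toy_retrieve`, `pool_and_rank`, the word-overlap score, and the sample corpus are all hypothetical, and the sub-queries are written by hand rather than generated by a model.

```python
def toy_retrieve(query, corpus):
    """Hypothetical retriever: score each chunk by word overlap with the query."""
    query_words = set(query.lower().split())
    hits = []
    for chunk in corpus:
        overlap = len(query_words & set(chunk.lower().split()))
        if overlap:
            hits.append((chunk, overlap / len(query_words)))
    return hits


def pool_and_rank(sub_queries, corpus, top_k=3):
    """Retrieve per sub-query, pool the chunks, and rank them by best score."""
    best_score = {}
    for sub_query in sub_queries:
        for chunk, score in toy_retrieve(sub_query, corpus):
            # Keep the highest score a chunk received from any sub-query.
            best_score[chunk] = max(score, best_score.get(chunk, 0.0))
    ranked = sorted(best_score.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


corpus = [
    "net sales increased in 2023",
    "safety improvements for employees",
    "aws revenue growth drivers",
]
# Hand-written sub-queries standing in for the decomposed user query.
sub_queries = ["net sales 2023", "revenue growth drivers"]
pooled = pool_and_rank(sub_queries, corpus)
for chunk, score in pooled:
    print(f"{score:.2f}  {chunk}")
```

Each sub-query matches a different chunk here, which is the intuition behind decomposition: a single embedding of the combined question would have to represent both topics at once.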

## Pre-req
You must run the [workshop_setup.ipynb](../lab00-setup/workshop_setup.ipynb) notebook in `lab00-setup` before starting this lab.

In [1]:
import warnings
warnings.warn("Warning: if you did not run lab00-setup, please go back and run the lab00 notebook") 



## Load the parameters

In [2]:
print("Lab parameters....\n")
%store -r amzn10k_prefix
%store -r amzn10k_s3_path
%store -r bucket
print(amzn10k_prefix)
print(amzn10k_s3_path)
print(bucket)

print("\nload the vector db parameters....\n")
# vector parameters stored from Initial setup lab02
%store -r vector_store_name
%store -r vector_collection_arn
%store -r vector_collection_id
%store -r vector_host
%store -r bedrock_kb_execution_role_arn
# check that all 5 values are printed and do not fail
print(vector_store_name)
print(vector_collection_arn)
print(vector_collection_id)
print(vector_host)
print(bedrock_kb_execution_role_arn)

Lab parameters....

amazon_10k
s3://sagemaker-us-west-2-376678947624/amazon_10k/
sagemaker-us-west-2-376678947624

load the vector db parameters....

swagger-api-vector-541
arn:aws:aoss:us-west-2:376678947624:collection/8o3snghnd44k5fz4itf8
8o3snghnd44k5fz4itf8
8o3snghnd44k5fz4itf8.us-west-2.aoss.amazonaws.com
arn:aws:iam::376678947624:role/AmazonBedrockExecutionRoleForKnowledgeBase_541


In [None]:
from knowledge_base import BedrockKnowledgeBase
import random
import time
import boto3
# auth for opensearch
boto3_session = boto3.Session()
region_name = boto3_session.region_name
# try out KB using RetrieveAndGenerate API
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"  # Claude 3 Sonnet; to try Claude v2 instead, use "anthropic.claude-v2"
model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id}'

In [3]:
suffix = random.randrange(200, 900)
kb_name = f"bedrock-query-reformation-{suffix}"
index_name = f"bedrock-query-reformulation-{suffix}"
description = "This knowledge base contain Amazon 10K financial document from 2022 and 2023"

knowledge_base = BedrockKnowledgeBase(
    kb_name=kb_name,
    kb_description=description,
    data_bucket_name=bucket,
    data_prefix=[amzn10k_prefix],
    vector_collection_arn=vector_collection_arn,
    vector_collection_id=vector_collection_id,
    vector_host=vector_host,
    bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn,
    index_name=index_name,
    suffix=suffix,
)

Step 1 - Creating OSS Vector Index

Creating index:
{ 'acknowledged': True,
  'index': 'bedrock-query-reformulation-index-688',
  'shards_acknowledged': True}
Step 2 - Creating Knowledge Base
Creating KB with chunking strategy - FIXED_SIZE
 {'chunkingConfiguration': {'chunkingStrategy': 'FIXED_SIZE', 'fixedSizeChunkingConfiguration': {'maxTokens': 300, 'overlapPercentage': 20}}}
{ 'createdAt': datetime.datetime(2024, 10, 23, 13, 52, 59, 600547, tzinfo=tzlocal()),
  'description': 'This knowledge base contain Amazon 10K financial document '
                 'from 2022 and 2023',
  'knowledgeBaseArn': 'arn:aws:bedrock:us-west-2:376678947624:knowledge-base/IJJYGJVXUS',
  'knowledgeBaseConfiguration': { 'type': 'VECTOR',
                                  'vectorKnowledgeBaseConfiguration': { 'embeddingModelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0'}},
  'knowledgeBaseId': 'IJJYGJVXUS',
  'name': 'bedrock-query-reformation-knowledge-base-688',
  'roleAr

### Start ingestion job
Once the KB and data source are created, we can start the ingestion job.
During the ingestion job, the KB fetches the documents from the data source, pre-processes them to extract text, splits the text into chunks based on the configured chunk size, creates an embedding for each chunk, and writes the embeddings to the vector database, in this case Amazon OpenSearch Serverless (OSS).
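To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. The defaults mirror the `FIXED_SIZE` settings printed when the KB was created (`maxTokens` 300, `overlapPercentage` 20). It assumes whitespace-separated words stand in for tokens and that the overlap is a percentage of the chunk size; `fixed_size_chunks` is a hypothetical helper, and the tokenization and boundary handling actually used by the Knowledge Base may differ.

```python
def fixed_size_chunks(tokens, max_tokens=300, overlap_percentage=20):
    """Split a token list into windows of max_tokens that share an overlap."""
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Assumption: whitespace-separated words approximate tokens.
words = ("word " * 700).split()
chunks = fixed_size_chunks(words)
print(len(chunks), [len(c) for c in chunks])
```

With these settings each chunk repeats the last 60 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.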

In [6]:

# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base.start_ingestion_job()

{ 'dataSourceId': '7E76XOLCH5',
  'ingestionJobId': 'LPLEIEBI4X',
  'knowledgeBaseId': 'IJJYGJVXUS',
  'startedAt': datetime.datetime(2024, 10, 23, 14, 29, 49, 398748, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 0,
                  'numberOfMetadataDocumentsModified': 0,
                  'numberOfMetadataDocumentsScanned': 0,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 0},
  'status': 'STARTING',
  'updatedAt': datetime.datetime(2024, 10, 23, 14, 29, 49, 398748, tzinfo=tzlocal())}
{ 'dataSourceId': '7E76XOLCH5',
  'ingestionJobId': 'LPLEIEBI4X',
  'knowledgeBaseId': 'IJJYGJVXUS',
  'startedAt': datetime.datetime(2024, 10, 23, 14, 29, 49, 398748, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 3,

In [9]:
kb_id = knowledge_base.get_knowledge_base_id()

'IJJYGJVXUS'


### Query Reformulation
To demonstrate the functionality, we are going to use a more complex query about Amazon's 10K financial documents. This query contains several asks that are not semantically related. When such a query is embedded as a whole during the retrieval step, some aspects of it may become diluted, so the relevant chunks returned may not address every component of the query.

To query our Knowledge Base and generate a response, we will use the `retrieve_and_generate` API call. To use the query reformulation feature, we add the following block to our knowledge base configuration:

```
'orchestrationConfiguration': {
    'queryTransformationConfiguration': {
        'type': 'QUERY_DECOMPOSITION'
    }
}
```
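Since the two calls in this lab differ only by that block, one option is to build the configuration with a small helper. The sketch below is an illustration, not part of the lab code: `build_rag_config` is a hypothetical function and the knowledge base ID is a placeholder, while the dictionary keys are the ones used in this notebook.

```python
def build_rag_config(kb_id, model_arn, number_of_results=3, decompose_query=False):
    """Build a retrieveAndGenerateConfiguration dict, optionally with query decomposition."""
    kb_config = {
        "knowledgeBaseId": kb_id,
        "modelArn": model_arn,
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    }
    if decompose_query:
        # The only difference between the two calls compared in this lab.
        kb_config["orchestrationConfiguration"] = {
            "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
        }
    return {"type": "KNOWLEDGE_BASE", "knowledgeBaseConfiguration": kb_config}


# Placeholder ID; in the notebook the values would come from kb_id and model_arn.
config = build_rag_config(
    "EXAMPLEKBID",
    "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
    decompose_query=True,
)
print(config)
```

The returned dictionary could then be passed as `retrieveAndGenerateConfiguration=config` to `retrieve_and_generate`.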

### > Without Query Reformulation
Let's see what the generated result looks like for the following query without using query reformulation:

"What are amazons key priorities before, during and after covid?"

In [22]:
query = """
What are amazons key priorities before, during and after covid?
"""

In [23]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region_name, 
                                                                         model_id),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":3
                } 
            }
        }
    }
)


# generated text output
raw_response = response_ret['output']['text']
print(raw_response,end='\n'*2)

According to the search results, Amazon's key priorities include investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent. Some of the initiatives mentioned to support these priorities are:

- Competitive pay and benefits
- Flexible work arrangements
- Skills training and educational programs like Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training)
- Refining hiring, development, evaluation, and retention mechanisms to promote equity
- Investing in safety improvements like capital improvements, new safety technology, vehicle safety controls, and ergonomic solutions
- Providing mentorship, support resources, and programs to advance employee engagement, communication, and feedback The search results do not explicitly mention Amazon's priorities specifically before, during, or after COVID-19. However, the initiatives listed above seem to be ongoing pr

In [24]:
# retrieved references (chunks) attached to the first citation
response_without_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_without_qr))

def citations_rag_print(retrieved_references):
    # each retrieved reference is a dict with 'content', 'location' and 'metadata'
    for num, chunk in enumerate(retrieved_references, 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

citations_rag_print(response_without_qr)

# of citations or chunks used to generate the response:  3
Chunk 1:  As we strive to be Earth’s best employer, we focus on investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent. We rely on numerous and evolving initiatives to implement these objectives and invent mechanisms for talent development, including competitive pay and benefits, flexible work arrangements, and skills training and educational programs such as Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training). Over 175,000 Amazon employees around the world have participated in Career Choice. We also continue to inspect and refine the mechanisms we use to hire, develop, evaluate, and retain our employees to promote equity for all candidates and employees. In addition, safety is integral to everything we do at Amazon and we continue to invest in safety improvements such as capital improvem
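The cell above reads only the first entry of `citations`. A response can carry several citations, each with its own `retrievedReferences`, so a fuller count may need to walk all of them. The following is a minimal sketch under that assumption: `collect_references` is a hypothetical helper, and `sample_response` is a hand-made fragment for illustration, not real output from the Knowledge Base.

```python
def collect_references(response):
    """Gather unique retrieved chunks from every citation in the response."""
    seen_texts = set()
    references = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            text = reference["content"]["text"]
            if text not in seen_texts:  # the same chunk can back several citations
                seen_texts.add(text)
                references.append(reference)
    return references


# Hand-made fragment shaped like the fields used in this notebook.
sample_response = {
    "citations": [
        {"retrievedReferences": [{"content": {"text": "chunk A"}}]},
        {"retrievedReferences": [
            {"content": {"text": "chunk A"}},
            {"content": {"text": "chunk B"}},
        ]},
    ]
}
unique_references = collect_references(sample_response)
print(len(unique_references))
```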

### > With Query Reformulation
Now let's see how query reformulation can lead to better-aligned context retrieval, which in turn can enhance the accuracy of response generation.

In [25]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region_name, 
                                                                         model_id),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":3
                } 
            },
            'orchestrationConfiguration': {
                'queryTransformationConfiguration': {
                    'type': 'QUERY_DECOMPOSITION'
                }
            }
        }
    }
)


# generated text output
transformed_response = response_ret['output']['text']
print(transformed_response,end='\n'*2)

According to the search results, Amazon's key priorities include:

- Being Earth's best employer by focusing on investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent.
- Providing competitive pay and benefits, flexible work arrangements, and skills training programs like Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training).
- Promoting equity for all candidates and employees in hiring, development, evaluation, and retention processes.
- Investing in safety improvements like capital improvements, new safety technology, vehicle safety controls, and ergonomic solutions to establish industry best practices.
- Providing mentorship, support resources, and programs to advance employee engagement, communication, and feedback. The search results do not explicitly mention Amazon's priorities specifically before, during, or after the COVID-19 pandemic. Howev

In [26]:
response_with_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_with_qr))

citations_rag_print(response_with_qr)

# of citations or chunks used to generate the response:  2
Chunk 1:  As we strive to be Earth’s best employer, we focus on investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent. We rely on numerous and evolving initiatives to implement these objectives and invent mechanisms for talent development, including competitive pay and benefits, flexible work arrangements, and skills training and educational programs such as Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training). Over 175,000 Amazon employees around the world have participated in Career Choice. We also continue to inspect and refine the mechanisms we use to hire, develop, evaluate, and retain our employees to promote equity for all candidates and employees. In addition, safety is integral to everything we do at Amazon and we continue to invest in safety improvements such as capital improvem

### > Display the results side-by-side 

Notice that splitting the question into sub-questions can increase your chances of matching the right and complete information, which helps generate a more comprehensive final answer.

In [27]:
import pandas as pd
from IPython.display import display, HTML

# Create the first table
df = pd.DataFrame({
    'Naive RAG': [query, raw_response],
    'RAG w/ Query Rewrite': [query, transformed_response]
})

output=""
output += df.style.hide().set_table_attributes("style='display:inline'")._repr_html_()
output += "&nbsp;"

display(HTML(output))

| Naive RAG | RAG w/ Query Rewrite |
| --- | --- |
| What are amazons key priorities before, during and after covid? | What are amazons key priorities before, during and after covid? |
| According to the search results, Amazon's key priorities include investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent. Some of the initiatives mentioned to support these priorities are: - Competitive pay and benefits - Flexible work arrangements - Skills training and educational programs like Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training) - Refining hiring, development, evaluation, and retention mechanisms to promote equity - Investing in safety improvements like capital improvements, new safety technology, vehicle safety controls, and ergonomic solutions - Providing mentorship, support resources, and programs to advance employee engagement, communication, and feedback The search results do not explicitly mention Amazon's priorities specifically before, during, or after COVID-19. However, the initiatives listed above seem to be ongoing priorities for Amazon to attract and retain talent, promote diversity and inclusion, ensure workplace safety, and keep employees engaged - which would likely be important focuses both during the pandemic and in regular times. | According to the search results, Amazon's key priorities include: - Being Earth's best employer by focusing on investment and innovation, inclusion and diversity, safety, and engagement to hire and develop the best talent. - Providing competitive pay and benefits, flexible work arrangements, and skills training programs like Amazon Career Choice (education funding for eligible employees) and the Amazon Technical Academy (software development engineer training). - Promoting equity for all candidates and employees in hiring, development, evaluation, and retention processes. - Investing in safety improvements like capital improvements, new safety technology, vehicle safety controls, and ergonomic solutions to establish industry best practices. - Providing mentorship, support resources, and programs to advance employee engagement, communication, and feedback. The search results do not explicitly mention Amazon's priorities specifically before, during, or after the COVID-19 pandemic. However, the priorities listed above seem to be ongoing efforts by Amazon to be a leading employer and prioritize employee development, diversity, safety, and engagement. |
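Beyond reading the two answers side by side, it can help to compare which chunks each run actually retrieved. The sketch below assumes two lists of retrieved references shaped like `response_without_qr` and `response_with_qr`; `compare_chunks` is a hypothetical helper and the sample lists are made up for illustration.

```python
def compare_chunks(references_a, references_b):
    """Report which chunk texts are shared or unique between two retrieval runs."""
    texts_a = {ref["content"]["text"] for ref in references_a}
    texts_b = {ref["content"]["text"] for ref in references_b}
    return {
        "shared": texts_a & texts_b,
        "only_a": texts_a - texts_b,
        "only_b": texts_b - texts_a,
    }


# Made-up references; in the notebook these would be response_without_qr
# and response_with_qr.
without_qr = [{"content": {"text": "chunk A"}}, {"content": {"text": "chunk B"}}]
with_qr = [{"content": {"text": "chunk A"}}, {"content": {"text": "chunk C"}}]
diff = compare_chunks(without_qr, with_qr)
for label, texts in diff.items():
    print(label, sorted(texts))
```

If the `only_b` set is empty for a given query, decomposition did not surface any new context for it, which is worth knowing before attributing differences in the answers to the feature.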
