# Retrieval and Generation with Bedrock Foundation Models

### Overview  
This notebook demonstrates how to perform retrieval-augmented generation (RAG) using Amazon Bedrock foundation models. It covers retrieving relevant documents from a knowledge base and generating responses based on the retrieved context.

### Build your own Retrieval Augmented Generation (RAG) system
When you build your own retrieval augmented generation (RAG) system, you combine a retriever and a generator. The retriever can use an embedding model to identify the relevant chunks in the vector database based on similarity scores. The generator can be a Large Language Model (LLM) that answers questions based on the retrieved results (also known as chunks). The following sections provide additional tips on how to optimize the prompts for your RAG system.
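
To make the retriever step concrete, here is a minimal sketch of similarity-based retrieval. It is not how Bedrock Knowledge Bases work internally; the three-dimensional "embeddings", the chunk texts, and the helper names `cosine_similarity` and `retrieve_top_k` are all hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
toy_chunks = [
    ("Net income by year", [0.9, 0.1, 0.0]),
    ("Text-to-SQL data generation stages", [0.1, 0.9, 0.2]),
    ("Shipping policy", [0.0, 0.2, 0.9]),
]
toy_query = [0.2, 0.8, 0.1]  # pretend embedding of a text-to-SQL question

print(retrieve_top_k(toy_query, toy_chunks, k=2))
```

In a real system the vectors come from an embedding model and the search runs inside the vector database; the generator then receives the top-ranked chunks as context.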

In [1]:
import json
with open("variables.json", "r") as f:
    variables = json.load(f)

variables

{'accountNumber': '307297743176',
 'regionName': 'us-west-2',
 'collectionArn': 'arn:aws:aoss:us-west-2:307297743176:collection/h7cmj732p9d3v91spkhd',
 'collectionId': 'h7cmj732p9d3v91spkhd',
 'vectorIndexName': 'ws-index-',
 'bedrockExecutionRoleArn': 'arn:aws:iam::307297743176:role/advanced-rag-workshop-bedrock_execution_role-us-west-2',
 's3Bucket': '307297743176-us-west-2-advanced-rag-workshop',
 'kbFixedChunk': '4P6PBDDEGL',
 'kbSemanticChunk': 'IC3ZCBORXT',
 'kbCustomChunk': 'Q2T9CZ5VFA',
 'kbHierarchicalChunk': '1YIFVW0Z5E'}

## RAG with a simple question

##### We will ask the question "In text-to-sql, what are the stages in data generation process?" <br/>
##### We expect a response drawn from the PDF that includes the three stages shown in the picture below.
![Image](./image01.png)

### Configuration

In [2]:
# Knowledge Base ID - Choose from different chunking strategies (Fixed, Hierarchical, or Semantic)
kb_id = variables["kbFixedChunk"] 

# Bedrock Model ARN - Using Amazon Nova Lite for inference
model_id = f"arn:aws:bedrock:us-west-2:{variables['accountNumber']}:inference-profile/us.amazon.nova-lite-v1:0"

# Number of relevant documents to retrieve for RAG
number_of_results = 5

# Configuration for text generation - Controls output length, randomness, and diversity
generation_configuration = {
    'inferenceConfig': {
        'textInferenceConfig': {
            'maxTokens': 4096,  # Maximum number of tokens in the generated response
            'stopSequences': [],  # List of sequences that indicate stopping points
            'temperature': 0.2,  # Controls randomness (lower values = more deterministic output)
            'topP': 0.5  # Controls diversity of output by considering top P probability mass
        }
    },
}
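
The `topP` comment above refers to nucleus (top-p) sampling. The sketch below shows the general idea on a toy next-token distribution; the token names and probabilities are made up, `top_p_filter` is a hypothetical helper, and how Amazon Nova applies the parameter internally is not described in this notebook.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# Hypothetical next-token probabilities
toy_probs = {"three": 0.45, "two": 0.30, "four": 0.15, "many": 0.10}

# With top_p=0.5 only the two most likely tokens remain candidates
print(top_p_filter(toy_probs, top_p=0.5))
```

A lower `topP` narrows the candidate set, which, together with a low `temperature`, tends to make answers more repeatable.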

### Retrieve and Generate with a simple query

In [3]:
import boto3

# Initialize the Bedrock Agent Runtime client
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=variables["regionName"])

# Define the query to search relevant knowledge base documents and generate an answer
query = "In text-to-sql, what are the stages in data generation process?"

# Perform retrieval-augmented generation (RAG) using the knowledge base
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": query  # User query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,  # ID of the knowledge base used for retrieval
            "modelArn": model_id,  # Bedrock model ARN for text generation
            "generationConfiguration": generation_configuration,  # Model configuration parameters
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": number_of_results  # Number of relevant documents to fetch
                } 
            }
        }
    }
)

# Display the generated response
print('----------------- Answer ---------------------')
print(response['output']['text'], end='\n' * 2)

# Display the full response including citations for retrieved documents
print('----------------- Citations ------------------')
print(json.dumps(response, indent=2))


----------------- Answer ---------------------
Answer: The data generation process in text-to-SQL consists of three main stages:

1. **SQL parsing & Database modification**: The first step involves extracting the columns and cell values by parsing the SQL queries using a custom parser on top of SQLGLOT. Then, the database schemas are modified using an LLM to create ambiguous or unanswerable questions. For example, for Ambiguous SELECT Column questions, the LLM generates two alternative columns to replace the original column mentioned in the question, making it ambiguous.

2. **SQL modification and clarification response generation**: Based on the user question, the modified database, and the original SQL, the text-to-SQL assistant's initial response to the ambiguous/unanswerable question, the following user clarification response, and the assistant's SQL response to the clarified question are generated. The assistant's response to the initial user question is generated using either a t
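
Dumping the full response as JSON is verbose. As a rough sketch, the helper below pulls only the source location and a short snippet from each retrieved reference. It assumes the response follows the `citations` / `retrievedReferences` layout returned by `retrieve_and_generate` with S3-backed sources; `summarize_citations` and the sample response (including its bucket name) are hypothetical.

```python
def summarize_citations(response, max_chars=80):
    """Return (source_uri, snippet) pairs for each retrieved reference in a response."""
    rows = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri", "unknown")
            snippet = ref.get("content", {}).get("text", "")[:max_chars]
            rows.append((uri, snippet))
    return rows

# Hypothetical response fragment, shaped like a retrieve_and_generate result
sample_response = {
    "output": {"text": "The process has three stages."},
    "citations": [{
        "retrievedReferences": [{
            "content": {"text": "SQL parsing & Database modification ..."},
            "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/paper.pdf"}},
        }]
    }],
}

for uri, snippet in summarize_citations(sample_response):
    print(f"{uri}: {snippet}")
```

In the notebook you would call `summarize_citations(response)` on the real response instead of the sample.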

### Comparison between chunking strategies: Fixed vs Semantic

##### Now, let's ask a more nuanced question that requires extracting information from a table in the PDF, and ask the model to do some analysis. <br/>
##### We will also compare the response quality of fixed-size chunking and semantic chunking.
![image02](image02.png)
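
Because the comparison sends the same kind of request to different knowledge bases, it can help to build the request in one place. The following is a sketch only: `build_rag_request` is a hypothetical helper, and the knowledge base IDs and model ARN are placeholders rather than the values from `variables.json`.

```python
def build_rag_request(query, kb_id, model_arn, generation_configuration, number_of_results=5):
    """Build keyword arguments for bedrock_agent_runtime.retrieve_and_generate."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "generationConfiguration": generation_configuration,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": number_of_results}
                },
            },
        },
    }

# Placeholder IDs for illustration; in the notebook these come from variables.json
example_kbs = {"fixed": "KB_FIXED_ID", "semantic": "KB_SEMANTIC_ID"}
example_config = {
    "inferenceConfig": {
        "textInferenceConfig": {"maxTokens": 4096, "temperature": 0.2, "topP": 0.5}
    }
}

# One request per chunking strategy, identical except for the knowledge base ID
rag_requests = {
    name: build_rag_request("How did net income change?", kb, "MODEL_ARN_PLACEHOLDER", example_config)
    for name, kb in example_kbs.items()
}
kb_config = rag_requests["semantic"]["retrieveAndGenerateConfiguration"]["knowledgeBaseConfiguration"]
print(kb_config["knowledgeBaseId"])
```

With real IDs, each request could be sent with `bedrock_agent_runtime.retrieve_and_generate(**rag_requests["fixed"])`, which keeps the query and generation settings identical across strategies.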

#### A nuanced query with a Fixed-sized chunking strategy

##### We will ask a question about how net income changed from 2022 to 2023 to 2024.
![image03](image03.png)

In [4]:
# Knowledge Base ID - Fixed Chunk.
kb_id = variables["kbFixedChunk"] 

# Bedrock Model ARN - Using Amazon Nova Lite for inference
model_id = f"arn:aws:bedrock:us-west-2:{variables['accountNumber']}:inference-profile/us.amazon.nova-lite-v1:0"

# Number of relevant documents to retrieve for RAG
number_of_results = 5

# Configuration for text generation - Controls output length, randomness, and diversity
generation_configuration = {
    'inferenceConfig': {
        'textInferenceConfig': {
            'maxTokens': 4096,  # Maximum number of tokens in the generated response
            'stopSequences': [],  # List of sequences that indicate stopping points
            'temperature': 0.2,  # Controls randomness (lower values = more deterministic output)
            'topP': 0.5  # Controls diversity of output by considering top P probability mass
        }
    }
}

In [5]:
import boto3

# Initialize the Bedrock Agent Runtime client
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=variables["regionName"])

# Define the query to search relevant knowledge base documents and generate an answer
query = "In CONSOLIDATED STATEMENTS OF CASH FLOWS, How much did net income change in years 2022, 2023, 2024?"

# Perform retrieval-augmented generation (RAG) using the knowledge base
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": query  # User query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,  # ID of the knowledge base used for retrieval
            "modelArn": model_id,  # Bedrock model ARN for text generation
            "generationConfiguration": generation_configuration,  # Model configuration parameters
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": number_of_results  # Number of relevant documents to fetch
                } 
            }
        }
    }
)

# Display the generated response
print('----------------- Answer ---------------------')
print(response['output']['text'], end='\n' * 2)

----------------- Answer ---------------------
Answer: The net income for the years 2022, 2023, and 2024 was $(2,722) million, $30,425 million, and $30,425 million, respectively. The net income increased by $33,147 million from 2022 to 2023 and remained the same from 2023 to 2024.



#### The response above is not accurate: it reports no change from 2023 to 2024. The accurate response should be:

> Year 2022 to Year 2023: \\$33,147 increase<br/>
Year 2023 to Year 2024: \\$28,823 increase 

#### Now let's run the same question, with an added request to show the math, against the KB that uses semantic chunking.

In [6]:
# Knowledge Base ID - Semantic Chunk.
kb_id = variables["kbSemanticChunk"] 


# Bedrock Model ARN - Using Amazon Nova Lite for inference
model_id = f"arn:aws:bedrock:us-west-2:{variables['accountNumber']}:inference-profile/us.amazon.nova-lite-v1:0"

# Number of relevant documents to retrieve for RAG
number_of_results = 5

# Configuration for text generation - Controls output length, randomness, and diversity
generation_configuration = {
    'inferenceConfig': {
        'textInferenceConfig': {
            'maxTokens': 4096,  # Maximum number of tokens in the generated response
            'stopSequences': [],  # List of sequences that indicate stopping points
            'temperature': 0.2,  # Controls randomness (lower values = more deterministic output)
            'topP': 0.5  # Controls diversity of output by considering top P probability mass
        }
    },
}

In [10]:
import boto3

# Initialize the Bedrock Agent Runtime client
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=variables["regionName"])

# Define the query to search relevant knowledge base documents and generate an answer
query = "In CONSOLIDATED STATEMENTS OF CASH FLOWS, How much did net income change in years 2022, 2023, 2024? Show me how you did the math."

# Perform retrieval-augmented generation (RAG) using the knowledge base
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": query  # User query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,  # ID of the knowledge base used for retrieval
            "modelArn": model_id,  # Bedrock model ARN for text generation
            "generationConfiguration": generation_configuration,  # Model configuration parameters
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": number_of_results  # Number of relevant documents to fetch
                } 
            }
        }
    }
)

# Display the generated response
print('----------------- Answer ---------------------')
print(response['output']['text'], end='\n' * 2)

----------------- Answer ---------------------
Answer: Here is the change in net income for the years 2022, 2023, and 2024:

- 2022: Net income was $33,364 million in 2021 and $-2,722 million in 2022. So the change in net income was $-2,722 - $33,364 = -$36,086 million.
- 2023: Net income was $-2,722 million in 2022 and $30,425 million in 2023. So the change in net income was $30,425 - $-2,722 = $33,147 million.
- 2024: Net income was $30,425 million in 2023 and $59,248 million in 2024. So the change in net income was $59,248 - $30,425 = $28,823 million.

So in summary, the change in net income was -$36,086 million in 2022, $33,147 million in 2023, and $28,823 million in 2024.



Compare the results above with the accurate response:
> Year 2022 to Year 2023: \\$33,147 increase <br/>
> Year 2023 to Year 2024: \\$28,823 increase

As you can see, in this run semantic chunking delivered an accurate response, whereas fixed-size chunking did not.
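
The expected figures can be checked with a few lines of arithmetic. This sketch uses the net income values (in millions of USD) quoted in the semantic-chunking answer above; `year_over_year_changes` is a hypothetical helper.

```python
# Net income in millions of USD, as quoted in the semantic-chunking answer above
net_income = {2022: -2722, 2023: 30425, 2024: 59248}

def year_over_year_changes(values_by_year):
    """Return {(prev_year, year): change} for consecutive years in ascending order."""
    years = sorted(values_by_year)
    return {
        (prev, curr): values_by_year[curr] - values_by_year[prev]
        for prev, curr in zip(years, years[1:])
    }

for (prev, curr), change in year_over_year_changes(net_income).items():
    print(f"{prev} -> {curr}: {change:+,} million")
```

This prints an increase of 33,147 million from 2022 to 2023 and 28,823 million from 2023 to 2024, matching the accurate response.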

## Improve RAG quality with Enhanced Prompts

### Importance of Prompt Engineering
Prompt engineering refers to the practice of optimizing textual input to a large language model (LLM) to improve output and receive the responses you want. Prompting helps an LLM perform a wide variety of tasks, including classification, question answering, code generation, creative writing, and more. The quality of the prompts that you provide to an LLM can impact the quality of the model's responses.
 

### Useful techniques to improve prompts for Amazon Nova models
Refer to the [Amazon Nova prompting guide](https://docs.aws.amazon.com/nova/latest/userguide/prompting.html) for prompt engineering best practices with Amazon Nova models. The following are a few highlights:
* Create precise prompts. Provide contextual information, specify the output format and style, and provide clear prompt sections.
* Use system prompts to define how the model will respond.
* Give Amazon Nova time to think. For example, add ```"Think step-by-step."``` at the end of your query.
* Provide examples.

### Tips for using prompts in RAG
* Provide a prompt template: As with other functionalities, enhancing the system prompt can be beneficial. You can describe the RAG system in the system prompt, outlining the desired persona and behavior for the model.
* Use model instructions: You can include a dedicated ```"Model Instructions:"``` section within the system prompt that lists specific guidelines for the model to follow. For instance: ```In this example session, the model has access to search results and a user's question, its job is to answer the user's question using only information from the search results.```
* Avoid hallucination by restricting the instructions: Add a model instruction such as "DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!" so that the answers are grounded in the provided context.
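
The tips above can be combined into a single template. The sketch below is one possible wording, not an official template: the persona sentence and the variable name `rag_prompt_template` are assumptions, while the `$search_results$` placeholder is the same one used by the prompt template in this notebook.

```python
# Sketch of a prompt template with a dedicated "Model Instructions:" section.
# The wording is illustrative; adapt the persona and instructions to your use case.
rag_prompt_template = """
You are a question answering assistant for a RAG system.

Model Instructions:
- You have access to search results and a user's question.
- Answer the user's question using only information from the search results.
- DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!

Search results: $search_results$
"""

# The placeholder must survive any edits so the retrieved chunks can be injected
print("$search_results$" in rag_prompt_template)
```

A template like this would be passed through `promptTemplate` / `textPromptTemplate` in the generation configuration, in the same way as the notebook's own `prompt_template`.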


In [11]:
# A prompt template that defines the persona and response style.
# $search_results$ is replaced with the retrieved chunks at request time.
prompt_template = """
You are a professional financial analyst. 
Based on the retrieved content from Amazon's 10-K filings, provide clear, concise, and insightful answers to user questions. 
When summarizing financial results, respond in bullet points highlighting key metrics, trends, and takeaways. 
Ensure your answers are accurate, data-driven, and easy to understand.

$Query$
Resource: $search_results$
"""


#### Without a Prompt Template

In [12]:
query = "Show me the amazon financial results for 2023"

# Perform RAG without the prompt template
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": query  # User query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,  # ID of the knowledge base used for retrieval
            "modelArn": model_id,  # Bedrock model ARN for text generation
            "generationConfiguration": generation_configuration,  # Model configuration parameters (no prompt template)
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": number_of_results  # Number of relevant documents to fetch
                } 
            }
        }
    }
)
# Display the generated response
print('----------------- Answer ---------------------')
print(response['output']['text'], end='\n' * 2)


----------------- Answer ---------------------
Amazon's financial results for 2023 show a net sales of $574,785 million, with an operating income of $36,852 million.



#### Using a Prompt Template

In [13]:
# Perform RAG with the prompt template
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": query  # User query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,  # ID of the knowledge base used for retrieval
            "modelArn": model_id,  # Bedrock model ARN for text generation
            "generationConfiguration": {
                **generation_configuration,  # Model configuration parameters
                "promptTemplate": {"textPromptTemplate": prompt_template},  # Comment out to test without the prompt template
            },
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": number_of_results  # Number of relevant documents to fetch
                } 
            }
        }
    }
)
# Display the generated response
print('----------------- Answer ---------------------')
print(response['output']['text'], end='\n' * 2)


----------------- Answer ---------------------
Based on the provided content from Amazon's 10-K filings, here are the financial results for 2023:

### Amazon Financial Results for 2023

#### Key Metrics:
- **Net Sales:**
  - As Reported: $574,785 million
  - Exchange Rate Effect: $71 million
  - At Prior Year Rates: $574,856 million

- **Operating Expenses:**
  - As Reported: $537,933 million
  - Exchange Rate Effect: $531 million
  - At Prior Year Rates: $538,464 million

- **Operating Income:**
  - As Reported: $36,852 million
  - Exchange Rate Effect: $(460) million
  - At Prior Year Rates: $36,392 million

#### Trends and Takeaways:
- **Net Sales:**
  - Amazon's net sales increased from the prior year, showing a positive trend in revenue generation.
  - The exchange rate had a minimal positive effect on reported net sales.

- **Operating Expenses:**
  - Operating expenses also increased, reflecting higher costs associated with running the business.
  - The exchange rate had a sligh