# Retrieval Augmented Generation with Amazon Bedrock - Why RAG is a Necessary Concept

> *PLEASE NOTE: This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

---

Question Answering (QA) is an important task that involves extracting answers to factual queries posed in natural language. Typically, a QA system processes a query against a knowledge base containing structured or unstructured data and generates a response with accurate information. Ensuring high accuracy is key to developing a useful, reliable, and trustworthy question answering system, especially for enterprise use cases. In this notebook, however, we highlight a well-documented issue with LLMs: they are unable to answer questions about information outside of their training data.

---
## Set up the `boto3` client connection to Amazon Bedrock

Similar to notebook 00, we will create a client side connection to Amazon Bedrock with the `boto3` library.

In [4]:
import logging
import os

import boto3
from botocore.exceptions import ClientError
from IPython.display import Markdown, display

# Configure logging once for the notebook.
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

region = os.environ.get("AWS_REGION")
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)
claude3 = 'claude3'
llama2 = 'llama2'
llama3='llama3'
mistral='mistral'
titan='titan'
models_dict = {
    claude3 : 'anthropic.claude-3-sonnet-20240229-v1:0',
    llama2: 'meta.llama2-13b-chat-v1',
    llama3: 'meta.llama3-8b-instruct-v1:0',
    mistral: 'mistral.mistral-7b-instruct-v0:2',
    titan : 'amazon.titan-text-premier-v1:0'
}
max_tokens_val = 100
temperature_val = 0.1
dict_add_params = {
    llama3: {"max_gen_len":max_tokens_val, "temperature":temperature_val} , 
    claude3: {"top_k": 200,  "temperature": temperature_val, "max_tokens": max_tokens_val},
    mistral: {"max_tokens":max_tokens_val, "temperature": temperature_val} , 
    titan:  {"topK": 200,  "maxTokenCount": max_tokens_val}
}


def generate_conversation(bedrock_client,model_id,system_text,input_text):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_text (str): The system prompt.
        input_text (str): The input message.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Message to send.
    message = {
        "role": "user",
        "content": [{"text": input_text}]
    }
    messages = [message]
    system_prompts = [{"text" : system_text}]

    # Base inference parameters to use.
    temperature = 0.5
    inference_config = {"temperature": temperature}

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=get_additional_model_fields(model_id)
    )

    return response

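def get_additional_model_fields(modelId):
    """Look up model-specific inference parameters in dict_add_params.

    Note: dict_add_params is keyed by the short names (for example 'llama3'),
    so a full model ID has no entry and this lookup returns None.
    """
    return dict_add_params.get(modelId)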
    
def get_converse_output(response_obj):
    """Return the text blocks and the role from a Converse API response."""
    ret_messages=[]
    output_message = response_obj['output']['message']
    role_out = output_message['role']

    for content in output_message['content']:
        ret_messages.append(content['text'])
        
    return ret_messages, role_out
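
For reference, `get_converse_output` expects the generated message under `output` and `message`, with one or more content blocks that each carry a `text` field. The sketch below parses a hand-written dictionary in that shape. The sample text and the helper name `extract_text_blocks` are illustrative assumptions, not actual Bedrock output or part of the notebook's code.

```python
# Hand-written stand-in for a Converse API response (illustrative values only).
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "First block."}, {"text": "Second block."}],
        }
    }
}


def extract_text_blocks(response_obj):
    """Return (list of text blocks, role) from a Converse-style response dict."""
    message = response_obj["output"]["message"]
    # Skip content blocks that carry no text (an assumption made for robustness).
    texts = [block["text"] for block in message["content"] if "text" in block]
    return texts, message["role"]


texts, role = extract_text_blocks(sample_response)
print(role, texts)
```

Indexing the returned list with `[0]`, as the cells in this notebook do, selects the first text block.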





#### Test an invocation

In [5]:

modelId = models_dict.get(llama3)
system_text = "You are an economist with access to lots of data."
input_text = "Write an article about impact of high inflation to GDP of a country."
response = generate_conversation(bedrock_runtime, modelId, system_text, input_text)
output_message = response['output']['message']


display(Markdown(get_converse_output(response)[0][0]))

INFO:__main__:Generating message with model meta.llama3-8b-instruct-v1:0




**The Devastating Impact of High Inflation on a Country's GDP**

Inflation, the rate at which prices for goods and services are rising, is a crucial economic indicator that can have far-reaching consequences on a country's economy. When inflation rises above a certain threshold, it can lead to a decline in the purchasing power of consumers, reduce savings, and increase uncertainty, ultimately affecting a country's Gross Domestic Product (GDP). In this article, we will explore the impact of high inflation on a country's GDP and examine the data to support our claims.

**The Relationship Between Inflation and GDP**

Economists have long recognized the inverse relationship between inflation and GDP. When inflation is high, it can lead to a decline in real GDP, as the increased prices reduce the purchasing power of consumers and businesses. This is because high inflation can:

1. **Reduce Consumer Spending**: As prices rise, consumers may delay purchases or switch to cheaper alternatives, leading to a decline in aggregate demand and, subsequently, a reduction in GDP.
2. **Increase Production Costs**: Higher production costs due to inflation can lead to reduced competitiveness, lower profit margins, and, ultimately, a decline in economic activity.
3. **Reduce Savings**: High inflation can erode the value of savings, leading to reduced consumer spending and investment, which can negatively impact GDP.
4. **Increase Uncertainty**: High inflation can create uncertainty, leading to reduced investment, lower economic growth, and a decline in GDP.

**Data Analysis**

To illustrate the impact of high inflation on a country's GDP, let's examine the data from several countries with high inflation rates in recent years.

* **Argentina**: In 2020, Argentina experienced an inflation rate of 53.8%, one of the highest in the world. The country's GDP growth rate declined to 2.3% in 2020, down from 3.2% in 2019.
* **Venezuela**: In 2020, Venezuela's inflation rate reached 10,000%, one of the highest in the world. The country's GDP contracted by 35.7% in 2020, its worst performance in decades.
* **Turkey**: In 2020, Turkey's inflation rate reached 15.3%, prompting the Central Bank to raise interest rates to combat the inflationary pressures. The country's GDP growth rate slowed to 0.6% in 2020, down from 2.6% in 2019.

**Conclusion**

The data clearly suggests that high inflation can have a devastating impact on a country's GDP. As prices rise, consumers and businesses become less confident, leading to reduced spending, investment, and economic activity. To combat high inflation, central banks and governments must implement policies that reduce inflationary pressures, such as monetary tightening, fiscal discipline, and structural reforms.

In conclusion, high inflation is a significant threat to a country's economic stability and growth. It is essential for policymakers to prioritize inflation control and implement policies that promote economic stability and growth. By doing so, countries can mitigate the negative impact of high inflation on their GDP and ensure a more sustainable economic future.

---
## Highlighting the Contextual Issue

We now model a situation where we ask the model for information about Amazon's Advertising business. We first ask the model to answer from its training data alone, without any supporting context in the prompt. This technique is called `Zero Shot`. Let's take a look at Claude's response to a quick question: "How did Amazon's Advertising business do in 2023?"

In [8]:
import json

modelId = models_dict.get(claude3)
system_text = "You are an economist with access to lots of data."
input_text = "How did Amazon's Advertising business do in 2023?"
response = generate_conversation(bedrock_runtime, modelId, system_text, input_text)
output_message = response['output']['message']


display(Markdown(get_converse_output(response)[0][0]))

INFO:__main__:Generating message with model anthropic.claude-3-sonnet-20240229-v1:0


Unfortunately, I don't have access to Amazon's actual financial results for 2023 since that year is still in the future. As an AI assistant without direct connections to Amazon's internal data, I can only provide information based on what has been publicly reported so far.

Amazon does have a large and growing advertising business as part of its overall operations, but specifics on its 2023 performance are not yet known. Major tech companies typically release their quarterly and annual financial reports a few weeks after the end of each period.

If you're interested, I can share details on Amazon's advertising revenues and growth rates for past years based on the company's published reports and analyst estimates. However, for 2023 specifically, we'll have to wait until that data is released by Amazon next year. Let me know if you'd like me to provide historical advertising metrics for Amazon instead.

The answer provided by Claude is incorrect according to Andy Jassy's 2023 letter to shareholders, and Llama 3 is subject to the same limitation. This is not surprising: the letter was fairly new at the time of writing, so the information it contains is not included in the model's training data.

This implies that we need to augment the prompt with additional data relevant to the question, so that the model can return a factually accurate answer. We will see how this improves the response in the next section.

---
## Manually Providing Correct Context

In order to have Claude correctly answer the question, we need to provide the model with context that is relevant to the question. Below is an extract from the letter to shareholders.

```
Question:

How did Amazon's Advertising business do in 2023?

Answer:

Alongside our Stores business, Amazon’s Advertising progress remains strong, growing 24% YoY from
$38B in 2022 to $47B in 2023, primarily driven by our sponsored ads. We’ve added Sponsored TV to this
offering, a self-service solution for brands to create campaigns that can appear on up to 30+ streaming
TV services, including Amazon Freevee and Twitch, and have no minimum spend. Recently, we’ve expanded
our streaming TV advertising by introducing ads into Prime Video shows and movies, where brands can
reach over 200 million monthly viewers in our most popular entertainment offerings, across hit movies and
shows, award-winning Amazon MGM Originals, and live sports like Thursday Night Football. Streaming
TV advertising is growing quickly and off to a strong start.
```

We can inject this context into the prompt as shown below and ask the LLM to answer our question based on the context provided.

In [9]:
PROMPT = '''Here is some important context which can help inform the questions the Human asks.

<context> Amazon's Advertising business in 2023
Alongside our Stores business, Amazon’s Advertising progress remains strong, growing 24% YoY from
$38B in 2022 to $47B in 2023, primarily driven by our sponsored ads. We’ve added Sponsored TV to this
offering, a self-service solution for brands to create campaigns that can appear on up to 30+ streaming
TV services, including Amazon Freevee and Twitch, and have no minimum spend. Recently, we’ve expanded
our streaming TV advertising by introducing ads into Prime Video shows and movies, where brands can
reach over 200 million monthly viewers in our most popular entertainment offerings, across hit movies and
shows, award-winning Amazon MGM Originals, and live sports like Thursday Night Football. Streaming
TV advertising is growing quickly and off to a strong start.
</context>

Human: How did Amazon's Advertising business do in 2023?

Assistant:
'''

import json

modelId = models_dict.get(claude3)
system_text = "You are an economist with access to lots of data."
response = generate_conversation(bedrock_runtime, modelId, system_text, PROMPT)
output_message = response['output']['message']


display(Markdown(get_converse_output(response)[0][0]))

INFO:__main__:Generating message with model anthropic.claude-3-sonnet-20240229-v1:0


According to the context provided, Amazon's Advertising business performed strongly in 2023, growing 24% year-over-year from $38 billion in 2022 to $47 billion in 2023. This growth was primarily driven by Amazon's sponsored ads offerings.

Some key points about Amazon's Advertising business in 2023:

1. Revenue grew from $38 billion in 2022 to $47 billion in 2023, a 24% year-over-year increase.

2. The growth was mainly fueled by Amazon's sponsored ads products.

3. Amazon introduced a new offering called Sponsored TV, a self-service solution for brands to create ad campaigns across over 30 streaming TV services, including Amazon Freevee and Twitch.

4. Amazon expanded streaming TV advertising by introducing ads into Prime Video shows and movies, allowing brands to reach over 200 million monthly viewers on Amazon's popular entertainment offerings.

5. The streaming TV advertising segment is described as growing quickly and off to a strong start.

Overall, the context suggests that Amazon's Advertising business, particularly its sponsored ads and streaming TV advertising offerings, experienced significant growth and momentum in 2023.

Now you can see that the model answers the question accurately based on the factual context. However, this context had to be added manually to the prompt. In a production setting, we need a way to automate the retrieval of this information.
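
To make the idea of automated retrieval concrete, here is a minimal sketch that picks the most relevant passage for a question and wraps it in the same prompt layout used above. The word-overlap scoring, the two-passage knowledge base, and the function names `tokenize`, `retrieve_best_passage`, and `build_prompt` are all assumptions chosen for illustration. A real system would typically use a more capable retrieval method.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_best_passage(question, passages):
    """Return the passage sharing the most words with the question (naive scoring)."""
    question_words = tokenize(question)
    return max(passages, key=lambda p: len(question_words & tokenize(p)))


def build_prompt(context, question):
    """Wrap the retrieved context and the question in the layout used above."""
    return (
        "Here is some important context which can help inform the questions the Human asks.\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Human: {question}\n\nAssistant:\n"
    )


# Hypothetical mini knowledge base; the first passage restates figures from the extract above.
passages = [
    "Amazon's Advertising business grew 24% YoY from $38B in 2022 to $47B in 2023.",
    "The Great Gatsby is a novel by F. Scott Fitzgerald.",
]
question = "How did Amazon's Advertising business do in 2023?"
best = retrieve_best_passage(question, passages)
prompt = build_prompt(best, question)
print(prompt)
```

The resulting `prompt` string could then be passed to `generate_conversation` in place of the hand-written `PROMPT`.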

---
## Quick Note: Long Context Windows

One known limitation of RAG-based solutions is the need to include a lot of text in the prompt sent to an LLM. Fortunately, Claude helps with this issue by offering a large context window. Earlier Claude models introduced an input limit of 100k tokens, which [corresponds to around 75k words](https://www.anthropic.com/index/100k-context-windows), an astounding amount of text, and the Claude 3 Sonnet model used in this notebook accepts up to 200k tokens.

Let's take a look at an example of Claude handling this large context size...

In [12]:
book = ''
with open('../data/book/book.txt', 'r') as f:
    book = f.read()
print('Context:', book[0:53], '...')
print('The context contains', len(book.split(' ')), 'words')

Context: Great Gatsby By F. Scott Fitzgerald The Great Gatsby  ...
The context contains 52854 words
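
As a rough check on how much of the context window the book uses, the ratio of about 75k words per 100k tokens quoted above can be turned into a small estimate. This is a rule-of-thumb sketch, not a tokenizer. The function name `estimate_tokens` is an assumption, and actual token counts depend on the model's tokenizer.

```python
import math

# Ratio quoted above: roughly 75k words per 100k tokens (a rule of thumb only).
WORDS_PER_100K_TOKENS = 75_000


def estimate_tokens(word_count):
    """Estimate a token count from a word count using the rough 75k:100k ratio."""
    return math.ceil(word_count * 100_000 / WORDS_PER_100K_TOKENS)


book_words = 52854  # word count printed by the cell above
print(estimate_tokens(book_words))
```

By this estimate the book comes to roughly 70k tokens, so it should fit in a single prompt while still being a very large input.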


In [14]:
PROMPT =f'''Human: Summarize the plot of this book.

<book>
{book}
</book>

Assistant:'''

import json

modelId = models_dict.get(claude3)
system_text = "You are a Literary scholar"
response = generate_conversation(bedrock_runtime, modelId, system_text, PROMPT)
output_message = response['output']['message']


display(Markdown(get_converse_output(response)[0][0]))

INFO:__main__:Generating message with model anthropic.claude-3-sonnet-20240229-v1:0


Here is a summary of the plot of The Great Gatsby by F. Scott Fitzgerald:

The story is narrated by Nick Carraway, who moves to New York to become a bond trader. He rents a house next door to a mysterious millionaire named Jay Gatsby. Nick is drawn into the wealthy Long Island social circle of his cousin Daisy and her husband Tom Buchanan. 

Nick learns that Gatsby is in love with Daisy, with whom he had a romantic relationship years earlier before going off to war. Gatsby is incredibly wealthy from bootlegging and other shady business dealings. He throws lavish parties every weekend in hopes that Daisy will attend.

Nick arranges for Gatsby to reunite with Daisy, and they begin an affair. Tom grows suspicious of his wife's relationship with Gatsby. In a pivotal confrontation, Tom reveals that Gatsby's wealth comes from criminal activities. Daisy ends up leaving Gatsby after a dramatic incident.

In the climax, Gatsby's dreams are crushed when Daisy chooses to remain with her husband Tom. Later that night, Gatsby's former love interest Myrtle Wilson is struck and killed by Gatsby's car, which Daisy was driving. George Wilson mistakenly believes Gatsby was the driver and shoots him dead.

Nick is deeply disillusioned by the carelessness and lack of compassion shown by the wealthy characters. Only Nick and a few others attend Gatsby's funeral. In the end, Nick returns to the Midwest, leaving the materialistic East behind.

#### Latency

However, you can see that it took close to a minute or more to generate the summary. One technique to speed up the response is to split the context into sections, invoke the model on each section in parallel, and then condense the partial responses into a summary of summaries. This is usually known as prompt decomposition.
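
A minimal sketch of this split, parallel summarize, and combine pattern is shown below. The `summarize` argument is a hypothetical callable that takes a string and returns a string. In this notebook it could wrap `generate_conversation` and `get_converse_output`, but here a stand-in function is used so the example runs without calling Bedrock. The chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, words_per_chunk):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def summarize_decomposed(text, summarize, words_per_chunk=5000, max_workers=4):
    """Summarize chunks in parallel, then summarize the combined partial summaries.

    `summarize` is a hypothetical callable (str -> str) supplied by the caller.
    """
    chunks = split_into_chunks(text, words_per_chunk)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so partial summaries stay in document order.
        partial_summaries = list(pool.map(summarize, chunks))
    return summarize("\n\n".join(partial_summaries))


# Stand-in summarizer for demonstration: keeps the first three words of its input.
def fake_summarize(text):
    return " ".join(text.split()[:3])


demo_text = " ".join(f"w{i}" for i in range(20))
result = summarize_decomposed(demo_text, fake_summarize, words_per_chunk=10)
print(result)
```

Threads suit this pattern because each model call spends most of its time waiting on the network. Splitting on word counts is the simplest choice, though splitting on chapter or paragraph boundaries would likely give more coherent partial summaries.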

---
## Next steps

Now that you have seen a concrete example where an LLM's answer improves when the correct context is injected into the prompt, let's move on to notebook 02 to see how we can automate this process.