# Retrieval Augmented Generation with Amazon Bedrock - Why RAG is a Necessary Concept

> *PLEASE NOTE: This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

---

Question Answering (QA) is an important task that involves extracting answers to factual queries posed in natural language. Typically, a QA system processes a query against a knowledge base containing structured or unstructured data and generates a response with accurate information. Ensuring high accuracy is key to developing a useful, reliable, and trustworthy question answering system, especially for enterprise use cases. In this notebook, we highlight a well-documented limitation of LLMs: they cannot reliably answer questions about information that is outside of their training data.

---
## Set up the `boto3` client connection to Amazon Bedrock

Similar to notebook 00, we will create a client-side connection to Amazon Bedrock with the `boto3` library.

In [None]:
import boto3
import os
from IPython.display import Markdown, display

region = os.environ.get("AWS_REGION")
boto3_bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

---
## Highlighting the Contextual Issue

We are modeling a situation where we ask the model for information about Amazon SageMaker JumpStart foundation models. We will first ask the model about the pricing of this technology, relying only on what it learned from its training data. This technique is called `Zero Shot` prompting. Let's take a look at Claude's response to a quick question: "How are SageMaker JumpStart foundation models priced?"

In [None]:
import json
# Claude's text-completion format expects a blank line before "Human:" and "Assistant:".
PROMPT = '''

H: How are SageMaker JumpStart foundation models priced?

A:
'''
body = json.dumps({"prompt": PROMPT, "max_tokens_to_sample": 500})
modelId = "anthropic.claude-instant-v1"
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

display(Markdown(f'{response_body.get("completion").strip()}'))

The answer provided by Claude is actually incorrect according to SageMaker's documentation. This is not surprising: SageMaker JumpStart foundation models are quite new at the time of writing, so the correct answer is likely to depend on recent changes that are not included in Claude's training data.

This implies that we need to augment the prompt with additional data about the technology in question so that the model can return a factually accurate answer. We will see how this improves the response in the next section.

---
## Manually Providing Correct Context

In order for Claude to answer the question correctly, we need to provide the model with context that is relevant to the question. Below is a frequently asked question (FAQ) from the public SageMaker documentation.

```
Question:

How are SageMaker JumpStart foundation models priced?

Answer:

For proprietary models, you are charged for software pricing determined by the model provider and SageMaker infrastructure charges based on the instance used. For publicly available models, you are charged SageMaker infrastructure charges based on the instance used. For more information, see Amazon SageMaker Pricing and the AWS Marketplace.
```

We can inject this context into the prompt as shown below and ask the LLM to answer our question based on the context provided.

In [None]:
PROMPT = '''Here is some important context which can help inform the questions the Human asks.

<context>
Question: How are SageMaker JumpStart foundation models priced?
Answer: For proprietary models, you are charged for software pricing determined by the model provider and SageMaker infrastructure charges based on the instance used. For publicly available models, you are charged SageMaker infrastructure charges based on the instance used. For more information, see Amazon SageMaker Pricing and the AWS Marketplace.
</context>

Human: How are SageMaker JumpStart foundation models priced?

Assistant:
'''

body = json.dumps({"prompt": PROMPT, "max_tokens_to_sample": 500})
response = boto3_bedrock.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())
display(Markdown(f'{response_body.get("completion").strip()}'))

Now you can see that the model answers the question accurately based on the factual context. However, this context had to be added manually to the prompt. In a production setting, we need a way to automate the retrieval of this information.
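As a rough illustration of what automating this step could look like, the sketch below picks the most relevant FAQ entry for a question with a naive word-overlap score and then wraps it in the same `<context>` prompt layout used above. The `FAQS` list, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names introduced only for this example, and the second FAQ entry is a placeholder. Word overlap is a stand-in here; real systems typically rely on embedding-based search.

```python
import re

# Hypothetical in-memory knowledge base. The first entry is the FAQ quoted
# above; the second is a placeholder so the retriever has something to reject.
FAQS = [
    {
        "question": "How are SageMaker JumpStart foundation models priced?",
        "answer": (
            "For proprietary models, you are charged for software pricing "
            "determined by the model provider and SageMaker infrastructure "
            "charges based on the instance used. For publicly available "
            "models, you are charged SageMaker infrastructure charges based "
            "on the instance used."
        ),
    },
    {
        "question": "What is a context window?",
        "answer": "Placeholder answer used only for this illustration.",
    },
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, faqs):
    """Return the FAQ entry whose question shares the most words with the query."""
    query_words = tokenize(question)
    return max(faqs, key=lambda faq: len(query_words & tokenize(faq["question"])))


def build_prompt(question, faq):
    """Wrap the retrieved FAQ entry in <context> tags ahead of the Human turn."""
    return (
        "Here is some important context which can help inform the questions "
        "the Human asks.\n\n"
        "<context>\n"
        f"Question: {faq['question']}\n"
        f"Answer: {faq['answer']}\n"
        "</context>\n\n"
        f"Human: {question}\n\n"
        "Assistant:\n"
    )


question = "How are JumpStart foundation models priced?"
best_match = retrieve(question, FAQS)
augmented_prompt = build_prompt(question, best_match)
print(augmented_prompt)
```

The resulting `augmented_prompt` string could be passed to `invoke_model` in the same way as the manually written `PROMPT` above.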

---
## Quick Note: Long Context Windows

One known limitation of RAG-based solutions is the need to include a lot of text in the prompt sent to an LLM. Fortunately, Claude helps with this issue by providing an input token limit of 100k tokens. This limit [corresponds to around 75k words](https://www.anthropic.com/index/100k-context-windows), which is an astounding amount of text.
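To get a feel for these numbers, the sketch below turns the two figures quoted above (100k tokens, around 75k words) into a rough words-per-token ratio and uses it to estimate whether a piece of text is likely to fit. This is only a back-of-the-envelope heuristic: `estimate_tokens`, `fits_in_context`, and the `reserved_for_output` headroom are hypothetical names and assumptions made for illustration, and the real token count depends on the model's tokenizer.

```python
# Figures taken from the text above: a 100k-token limit is roughly 75k words.
TOKEN_LIMIT = 100_000
WORDS_PER_TOKEN = 75_000 / TOKEN_LIMIT  # about 0.75 words per token


def estimate_tokens(text):
    """Roughly estimate the token count from a whitespace-based word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text, reserved_for_output=1_000):
    """Check the estimate against the limit, keeping some headroom.

    Reserving room for the response is a conservative assumption made for
    this sketch, not a documented rule.
    """
    return estimate_tokens(text) + reserved_for_output <= TOKEN_LIMIT


sample = "word " * 60_000  # synthetic 60,000-word text
print(estimate_tokens(sample), fits_in_context(sample))
```

Running the same check on a loaded document before sending it gives an early warning when the text is probably too long for a single prompt.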

Let's take a look at an example of Claude handling this large context size...

In [None]:
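# Load the full text of the book that will be passed to Claude as context.
with open('../data/book/book.txt', 'r') as f:
    book = f.read()
print('Context:', book[0:53], '...')
# split() with no arguments splits on any whitespace, including newlines.
print('The context contains', len(book.split()), 'words')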

In [None]:
prompt = f'''

H: Summarize the plot of this book.

<book>
{book}
</book>

Assistant:'''

body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 1000,})
response = boto3_bedrock.invoke_model(
    body=body, modelId='anthropic.claude-instant-v1', accept='application/json', contentType='application/json'
)
response_body = json.loads(response.get('body').read())
display(Markdown(f'{response_body.get("completion").strip()}'))

---
## Next steps

Now that you have seen a concrete example of how LLM responses can be improved by injecting correct context into a prompt, let's move on to notebook 02 to see how we can automate this process.