# Retrieval Augmented Generation with Amazon Bedrock - Solving Contextual Limitations with RAG

> *PLEASE NOTE: This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

---

## Background

Previously we saw that Amazon Bedrock could provide an answer to a technical question, but we had to manually supply the relevant data and context ourselves. While that approach works with short documents or singleton applications, it fails to scale to enterprise-level question answering, where large enterprise documents cannot all fit into the prompt sent to the model.

We can improve upon this process by implementing an architecture called Retrieval Augmented Generation (RAG). RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context.

In this notebook we explain how to approach the pattern of Question Answering to find and leverage the documents to provide answers to the user questions.

## Solution
To address the above challenges, this notebook uses the following strategy.

### Prepare documents for search
![](./images/embeddings_lang.png)

Before we can answer questions, the documents must be processed and stored in a document store index:
- Load the documents
- Process and split them into smaller chunks
- Create a numerical vector representation of each chunk using the Amazon Titan Embeddings model on Amazon Bedrock
- Create an index using the chunks and the corresponding embeddings

### Respond to user question
![Question](./images/chatbot_lang.png)

Once the document index is prepared, you are ready to ask questions, and relevant documents will be fetched based on the question being asked. The following steps are executed:
- Create an embedding of the input question
- Compare the question embedding with the embeddings in the index
- Fetch the (top N) relevant document chunks
- Add those chunks as part of the context in the prompt
- Send the prompt to the model on Amazon Bedrock
- Get the contextual answer based on the documents retrieved
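
The two phases above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical word-count stand-in for the Titan Embeddings model, the chunks are invented sample strings, and the final prompt layout is made up for the example. The notebook itself uses Amazon Bedrock, LangChain, and FAISS for these steps.

```python
import math
from collections import Counter


def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: unit-length word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Phase 1: prepare documents. Chunk, embed each chunk, store (chunk, vector) pairs.
chunks = [
    "endpoints are billed per instance hour",
    "notebooks support python and r kernels",
    "training jobs are billed per second of use",
]
vocab = sorted({word for chunk in chunks for word in chunk.lower().split()})
index = [(chunk, toy_embed(chunk, vocab)) for chunk in chunks]


# Phase 2: respond. Embed the question, rank the chunks, build the prompt.
def retrieve(question, top_n=2):
    question_vector = toy_embed(question, vocab)
    ranked = sorted(index, key=lambda pair: dot(question_vector, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]


question = "how are training jobs billed"
top_chunks = retrieve(question)
prompt = "Context:\n" + "\n".join(top_chunks) + f"\n\nQuestion: {question}"
print(prompt)
```

The prompt printed at the end is what would be sent to the model; the sections below replace each toy piece with the real component.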

---
## Set up the `boto3` client connection to Amazon Bedrock

Just like previous notebooks, we will create a client side connection to Amazon Bedrock with the `boto3` library.

In [None]:
import boto3
import os
from IPython.display import Markdown, display

region = os.environ.get("AWS_REGION")
boto3_bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

---
## Semantic Similarity with Amazon Titan Embeddings

Semantic search refers to searching for information based on the meaning and concepts of words and phrases, rather than just matching keywords. Embedding models like Amazon Titan Embeddings allow semantic search by representing words and sentences as dense vectors that encode their semantic meaning.

Semantic matching is extremely helpful for RAG because it returns results that are conceptually related to the user's query, even if they don't contain the exact keywords. This leads to more relevant and useful search results which can be injected into our LLM's prompts.

First, let's take a look below to illustrate the capabilities of semantic search with Amazon Titan.

The `embed_text_input` function below is an example function that returns an embedding for a given text input.

In [None]:
import json
import numpy as np

def embed_text_input(bedrock_client, prompt_data, modelId="amazon.titan-embed-text-v1"):
    accept = "application/json"
    contentType = "application/json"
    body = json.dumps({"inputText": prompt_data})
    response = bedrock_client.invoke_model(
        body=body, modelId=modelId, accept=accept, contentType=contentType
    )
    response_body = json.loads(response.get("body").read())
    embedding = response_body.get("embedding")
    return np.array(embedding)

To give an example of how this works, let's match a user input against two "documents". We use a dot product calculation to rank the similarity between the input and each document, but there are many ways to do this in practice.

In [None]:
user_input = 'Things to do on vacation'
document_1 = 'swimming, site seeing, sky diving'
document_2 = 'cleaning, note taking, studying'

user_input_vector = embed_text_input(boto3_bedrock, user_input)
document_1_vector = embed_text_input(boto3_bedrock, document_1)
document_2_vector = embed_text_input(boto3_bedrock, document_2)

doc_1_match_score = np.dot(user_input_vector, document_1_vector)
doc_2_match_score = np.dot(user_input_vector, document_2_vector)

print(f'"{user_input}" matches "{document_1}" with a score of {doc_1_match_score:.1f}')
print(f'"{user_input}" matches "{document_2}" with a score of {doc_2_match_score:.1f}')

In [None]:
user_input = 'Things to do that are productive'
document_1 = 'swimming, site seeing, sky diving'
document_2 = 'cleaning, note taking, studying'

user_input_vector = embed_text_input(boto3_bedrock, user_input)
document_1_vector = embed_text_input(boto3_bedrock, document_1)
document_2_vector = embed_text_input(boto3_bedrock, document_2)

doc_1_match_score = np.dot(user_input_vector, document_1_vector)
doc_2_match_score = np.dot(user_input_vector, document_2_vector)

print(f'"{user_input}" matches "{document_1}" with a score of {doc_1_match_score:.1f}')
print(f'"{user_input}" matches "{document_2}" with a score of {doc_2_match_score:.1f}')

The example above shows how the semantic meaning behind the user input and provided documents can be effectively ranked by Amazon Titan.
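
The dot product used above grows with vector length as well as with alignment. Cosine similarity is one common alternative that divides out the magnitudes, so only direction matters. The sketch below uses small hand-written vectors rather than real Titan embeddings, and plain Python so it runs without any dependencies; `cosine_similarity` is a helper defined here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 2.0, 0.0]
doc_same_direction = [10.0, 20.0, 0.0]  # same direction, ten times longer
doc_orthogonal = [0.0, 0.0, 5.0]        # shares no direction with the query

same_score = cosine_similarity(query, doc_same_direction)
orthogonal_score = cosine_similarity(query, doc_orthogonal)
print(round(same_score, 3), round(orthogonal_score, 3))
```

A vector pointing the same way as the query scores close to 1 regardless of its length, while an orthogonal vector scores 0.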

---
## Simplifying Search with LangChain and FAISS

Two tools that help set up these semantic similarity vector search engines are LangChain and FAISS. We will use LangChain to prepare text documents and to provide an easy-to-use abstraction over the Amazon Bedrock embedding model. We will use FAISS to create a searchable data structure for documents in vector format.

First, let's import the required LangChain libraries for the system. Notice that LangChain has a FAISS wrapper class which we will be using as well.

In [None]:
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings import BedrockEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

### Prepare Text with LangChain

In order to load our document into FAISS, we first need to split the document into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt. The embedding model also limits the number of input tokens, so for this use case we create chunks of roughly 1000 characters.
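
To make the chunking idea concrete, here is a simplified sketch of greedy, separator-based splitting. It is not LangChain's implementation (which also supports overlap between chunks); `split_into_chunks` is a hypothetical helper, and the sample text and the tiny `chunk_size` are chosen only to keep the example readable.

```python
def split_into_chunks(text, separator="\n", chunk_size=1000):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty lines
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it is a single oversized piece that is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Q1,A1\nQ2,A2\nQ3,A3"
small_chunks = split_into_chunks(sample, separator="\n", chunk_size=12)
print(small_chunks)
```

Each chunk ends on a line boundary, so a question and its answer on the same line are never split apart.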

In [None]:
# load the sagemaker FAQ list
with open('../data/sagemaker/sagemaker_faqs.csv') as f:
    doc = f.read()

# wrap the raw text in a LangChain Document
docs = [Document(page_content=doc)]

# split documents into chunks
text_splitter = CharacterTextSplitter(
    separator='\n',
    chunk_size=1000,
    chunk_overlap=0,
)
split_docs = text_splitter.split_documents(docs)

Below is an example of one of the document chunks. Notice how the semantic text could easily be searched to answer a given question.

In [None]:
split_docs[0]

### Create an Embedding Store with FAISS

Once the documents are prepared, LangChain's `BedrockEmbeddings` and `FAISS` classes make it very easy to create an in-memory vector store, as shown below.

```python
# instantiate the embedding model
embedding_model = BedrockEmbeddings(
    client=boto3_bedrock,
    model_id="amazon.titan-embed-text-v1"
)

# create vector store
vs = FAISS.from_documents(split_docs, embedding_model)
```

To save time in this lab, we have already run the code above and provided the FAISS index as a persistent file in the `faiss-index/langchain` directory. We load the vector store (along with a connection to the Titan embedding model) into memory with the cell below.

In [None]:
embedding_model = BedrockEmbeddings(
    client=boto3_bedrock,
    model_id="amazon.titan-embed-text-v1"
)
vs = FAISS.load_local('../faiss-index/langchain/', embedding_model)

### Search the FAISS Vector Store

We can now use the `similarity_search` function to match a question to the best 3 chunks of text from our document which was loaded into FAISS. Notice how the search result is correctly matched to the input question :)
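
Conceptually, a top-`k` search compares the query vector against the stored vectors and keeps the `k` closest. The brute-force sketch below assumes ranking by Euclidean (L2) distance, which is what an exhaustive flat index does; the two-dimensional vectors and chunk labels are invented for illustration, and `top_k` is a hypothetical helper rather than part of FAISS or LangChain.

```python
import heapq
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vector, stored, k=3):
    """Return the k (text, distance) pairs closest to query_vector, nearest first."""
    scored = ((text, l2_distance(query_vector, vec)) for text, vec in stored)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


stored = [
    ("chunk about pricing", [0.9, 0.1]),
    ("chunk about notebooks", [0.1, 0.9]),
    ("chunk about billing", [0.8, 0.3]),
    ("chunk about kernels", [0.0, 1.0]),
]
results = top_k([1.0, 0.0], stored, k=2)
for text, distance in results:
    print(f"{distance:.3f}  {text}")
```

With L2 distance, a smaller value means a closer match, so the results are sorted in ascending order.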

In [None]:
search_results = vs.similarity_search(
    'How are SageMaker JumpStart foundation models priced?', k=3
)

In [None]:
search_results[0]

---
## Combine Search Results with Text Generation

In the final section of this notebook, we can now combine our vector search capability with our LLM in order to dynamically provide context to answer questions effectively with RAG. 

First, we will start by using a utility from LangChain called prompt templates. The `PromptTemplate` class allows us to easily inject context and a human input into the Claude prompt template.

In [None]:
from langchain import PromptTemplate

RAG_PROMPT_TEMPLATE = '''Here is some important context which can help inform the questions the Human asks.
Make sure to not make anything up to answer the question if it is not provided in the context.

<context>
{context}
</context>

Human: {human_input}

Assistant:
'''
PROMPT = PromptTemplate.from_template(RAG_PROMPT_TEMPLATE)

Just like before, we will again use the `similarity_search` function to provide relevant context from our documentation.

In [None]:
human_input = 'How are SageMaker JumpStart foundation models priced?'
search_results = vs.similarity_search(human_input, k=3)
context_string = '\n\n'.join([f'Document {ind+1}: ' + i.page_content for ind, i in enumerate(search_results)])

Now we will augment the LangChain prompt template with the human input and the context from the documents.

In [None]:
prompt_data = PROMPT.format(human_input=human_input, context=context_string)

Finally, we will use the LangChain `Bedrock` class to call the Claude model with our augmented prompt.

In [None]:
from langchain.llms import Bedrock

llm = Bedrock(
    client=boto3_bedrock,
    model_id="anthropic.claude-instant-v1",
    model_kwargs={
        "max_tokens_to_sample": 500,
        "temperature": 0.9,
    },
)
output = llm(prompt_data).strip()

In [None]:
display(Markdown(f'{output}'))

---

## Scaling Vector Databases

In this lab, we have only used a local, in-memory vector database with FAISS, because this is a workshop and not a production setting. If you are looking for a way to easily scale this FAISS solution on AWS, check out [this example](https://github.com/aws-samples/sagemaker-vector-store-microservice), which uses Amazon SageMaker to deploy a vector search microservice with FAISS.

However, once you get to production and have billions (or more) of vectors to use in a RAG architecture, you will need a larger-scale solution that is purpose-built and tuned for distributed vector search. AWS offers multiple ways to accomplish this. Here are a few of the notable options available today.

### Amazon OpenSearch

The vector engine for Amazon OpenSearch Serverless introduces a simple, scalable, and high-performing vector storage and search capability that helps developers build machine learning (ML)–augmented search experiences and generative artificial intelligence (AI) applications without having to manage the vector database infrastructure. Get contextually relevant responses across billions of vectors in milliseconds by querying vector embeddings, which can be combined with text-based keywords in a single hybrid request.

Check out these links for more information...
* [Vector Engine for Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/serverless-vector-engine/)
* [Amazon OpenSearch Service’s vector database capabilities explained](https://aws.amazon.com/blogs/big-data/amazon-opensearch-services-vector-database-capabilities-explained/)

### Amazon Aurora with `pgvector`

Amazon Aurora PostgreSQL-Compatible Edition now supports the pgvector extension to store embeddings from machine learning (ML) models in your database and to perform efficient similarity searches. pgvector can store and search embeddings from Amazon Bedrock, which helps power vector search for RAG. pgvector on Aurora PostgreSQL is a great vector database option for teams who want the power of semantic search in combination with the tried and trusted Amazon Relational Database Service (RDS).

Check out these links for more information...
* [Feature announcement](https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-aurora-postgresql-pgvector-vector-storage-similarity-search/)
* [Leverage pgvector and Amazon Aurora PostgreSQL for Natural Language Processing, Chatbots and Sentiment Analysis](https://aws.amazon.com/blogs/database/leverage-pgvector-and-amazon-aurora-postgresql-for-natural-language-processing-chatbots-and-sentiment-analysis/)

---
## Next steps

You have now enhanced your Amazon Bedrock LLM with RAG in order to better answer user questions with up-to-date context. In the next section, we will learn how to combine this solution with a chat-based paradigm in order to create a more interactive application which utilizes RAG.