# Retrieval Augmented Generation (RAG) using Bedrock in SageMaker

In this notebook we demonstrate how to use Retrieval Augmented Generation (RAG) to build a question-and-answer chatbot to converse with the **SEC Schedule 14A document** using Bedrock Models in SageMaker.

Retrieval Augmented Generation (RAG) retrieves data from outside a foundation model and augments your prompts by adding the relevant retrieved data to the context. For more information about RAG model architectures, see [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401).

With RAG, the external data used to augment your prompts can come from multiple data sources, such as document repositories, databases, or APIs. The first step is to convert your documents and any user queries into a compatible format so that a relevancy search can be performed. To do this, the document collection (or knowledge library) and the user-submitted queries are converted to numerical representations using embedding language models. Embedding is the process by which text is given a numerical representation in a vector space. RAG model architectures compare the embedding of a user query against the embeddings in the knowledge library, and the original user prompt is then augmented with relevant context from the most similar documents. This augmented prompt is then sent to the foundation model. You can update knowledge libraries and their embeddings asynchronously.
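To make the relevancy comparison concrete, here is a minimal sketch of ranking a knowledge library by cosine similarity to a query embedding. The three-dimensional vectors and the `top_k` helper are hypothetical toy stand-ins: real embedding models such as Titan produce much longer vectors, and in this notebook the vector store performs the search for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], library: dict[str, list[float]], k: int = 2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for three document chunks.
library = {
    "board of directors": [0.9, 0.1, 0.0],
    "executive compensation": [0.2, 0.8, 0.1],
    "auditor fees": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.0]  # hypothetical embedding of "Who are the directors?"

for text, score in top_k(query_vec, library):
    print(f"{score:.3f}  {text}")
```

The chunks with the highest scores are the ones appended to the prompt as context.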

In the previous sections of this workshop, you enabled Bedrock models and tested them on various Natural Language Processing (NLP) tasks such as text summarization, common sense reasoning, translation, and question answering. In this section, we use Bedrock models to create vector embeddings that are stored in Amazon OpenSearch, and then use these embeddings in a RAG model for a question-and-answer chatbot. The diagram below depicts this architecture.

We also use **LangChain**, an open-source framework for developing applications powered by language models.

![Rag Architecture](../images/10-architecture.png)

## Prerequisites

The following are the prerequisites for this notebook:
1. Enable access to the Amazon Bedrock models. This notebook uses Anthropic Claude V2 for text generation and Amazon Titan for embeddings.


## Install Required Python Libraries

_*IMPORTANT*_: Ensure you are running Python 3.9+

### 1. Set Up Kernel and Required Dependencies

First, check that the correct kernel is chosen.

<img src="img/CheckDataScience30_python3.png" width="300"/>

You can click on that to see and check the details of the image, kernel, and instance type.

<img src="img/SetupDataScience30_python3.png" width="600"/>

# NOTE: YOU CANNOT CONTINUE UNTIL THE KERNEL IS STARTED
### PLEASE WAIT UNTIL THE KERNEL IS STARTED BEFORE CONTINUING!

# Use `Shift+Enter` to Run Each Cell

Use `Shift+Enter` on the cell below to see the output.

# Click `Kernel` => `Restart Kernel and Run All Cells` to Run All Cells
![](img/restart-kernel-and-run-all-cells.png)

In [7]:
import sys

# Get the Python version as a comparable named tuple.
python_version = sys.version_info

# Comparing tuples checks major and minor versions together
# (checking .minor alone would wrongly reject a future 4.0).
if python_version < (3, 9):
    # Raise an error if the Python version is older than 3.9.
    raise Exception("Python version must be 3.9 or later.")

# Print a success message if the Python version is 3.9 or later.
print("Python version is 3.9 or later.")


Python version is 3.9 or later.


Begin by installing the required Python libraries.

## _==> Please ignore all WARNINGs and ERRORs from the `pip install`'s below. <==_

In [36]:
!pip install -q -r ../requirements.txt

[notice] A new release of pip is available: 23.3.2 -> 24.0
[notice] To update, run: pip install --upgrade pip


## Setup Environment

In [11]:
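# Set up the SageMaker session, execution role, and region
import boto3
import json
import sagemaker
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()  # IAM role ARN used by this notebook
aws_region = boto3.Session().region_name
sess = sagemaker_session  # reuse the same session instead of creating a second one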

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


In [12]:
BWB_REGION_NAME = boto3.Session().region_name   
BWB_ENDPOINT_URL = 'https://bedrock-runtime.'+BWB_REGION_NAME+'.amazonaws.com'
BWB_PROFILE_NAME = 'bedrock-runtime'
# let's use 'store' so you can use these variables in other notebooks
%store BWB_ENDPOINT_URL
%store BWB_PROFILE_NAME
%store BWB_REGION_NAME

print(BWB_ENDPOINT_URL)
print(BWB_PROFILE_NAME)
print(BWB_REGION_NAME)

Stored 'BWB_ENDPOINT_URL' (str)
Stored 'BWB_PROFILE_NAME' (str)
Stored 'BWB_REGION_NAME' (str)
https://bedrock-runtime.us-east-1.amazonaws.com
bedrock-runtime
us-east-1


## Let's test Claude V2 with a simple question: What is the largest city in New Hampshire?

In [18]:
import json
import os
import sys
import boto3

module_path = ".."
session = boto3.Session()
sys.path.append(os.path.abspath(module_path))
##from utils import bedrock, print_ww

# Fall back to the session's region if AWS_REGION is not set in the environment.
region = os.environ.get("AWS_REGION", session.region_name)
boto3_bedrock = session.client('bedrock-runtime', region, endpoint_url='https://bedrock-runtime.'+region+'.amazonaws.com')
os.environ["AWS_DEFAULT_REGION"] = region


# Identify the model to use, the prompt, and the inference parameters for that model.
bedrock_model_id = "anthropic.claude-v2"
# Claude v2 expects the prompt to start with "\n\nHuman:" and end with "\n\nAssistant:".
prompt = "\n\nHuman: What is the largest city in New Hampshire?\n\nAssistant:"
# Serialize the request payload for the model.
body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 512, "stop_sequences": [], "temperature": 0, "top_p": 0.9})

# Use Bedrock's invoke_model function to send the payload.
response = boto3_bedrock.invoke_model(body=body, modelId=bedrock_model_id, accept='application/json', contentType='application/json')

# Read and print the JSON response returned by the model.
response_body = json.loads(response.get('body').read())
print(response_body)


{'completion': ' The largest city in New Hampshire is Manchester.', 'stop_reason': 'stop_sequence', 'stop': '\n\nHuman:'}
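The Claude v2 text-completion API on Bedrock expects the prompt to alternate `\n\nHuman:` and `\n\nAssistant:` turns, and it returns the generated text in a `completion` field, as the output above shows. A small pair of helpers can keep that formatting in one place. The names `build_claude_v2_body` and `extract_completion` are hypothetical, and the parameter values simply mirror the cell above.

```python
import json


def build_claude_v2_body(question: str, max_tokens: int = 512) -> str:
    """Wrap a question in Claude v2's Human/Assistant format and serialize it."""
    prompt = f"\n\nHuman: {question}\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": 0,
        "top_p": 0.9,
    })


def extract_completion(response_body: dict) -> str:
    """Pull the generated text out of a parsed invoke_model response."""
    return response_body["completion"].strip()


# Usage with the client created above (requires Bedrock access):
# response = boto3_bedrock.invoke_model(
#     body=build_claude_v2_body("What is the largest city in New Hampshire?"),
#     modelId="anthropic.claude-v2", accept="application/json", contentType="application/json")
# print(extract_completion(json.loads(response["body"].read())))
```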


## Let's incorporate LangChain and run some tests with it.

In [19]:
# Import langchain 
from langchain.document_loaders import UnstructuredHTMLLoader,BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter,CharacterTextSplitter
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
from langchain.vectorstores import OpenSearchVectorSearch

### In the next cell, we look up the endpoint of the Amazon OpenSearch Serverless collection used in this lab from the CloudFormation stack outputs

In [20]:
# Set variables for Amazon OpenSearch

import boto3
cfn_client = boto3.client('cloudformation')
stack_name = "genai-rag-workshop-studio"

response = cfn_client.describe_stacks(
    StackName = stack_name,
)

outputs = response['Stacks'][0]['Outputs'] 

opensearch_host_id= next(output['OutputValue'] for output in outputs
        if output['OutputKey'] == 'ColectionURL')

#Please confirm the name of the OpenSearch Index
_aos_host = opensearch_host_id
_aos_host = _aos_host.replace("https://", "")
_aos_index = 'fsi-demo-knn'

print(_aos_host)

c5rhjwdc66y1i1n3r9r7.us-east-1.aoss.amazonaws.com


## Chunk your Data and Load into Amazon OpenSearch

In this section we split the data into smaller documents. Chunking is a technique for splitting large texts into smaller pieces. It is an important step because small, focused chunks make the context retrieved for a search query more relevant to our RAG model, which in turn improves the quality of the chatbot's answers.
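The idea behind chunking with overlap can be sketched as a fixed-size sliding window over the characters. This `chunk_text` helper is a hypothetical simplification: LangChain's `RecursiveCharacterTextSplitter`, used below, additionally tries to break on paragraph, line, and word boundaries, so its chunks vary in length.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of storing some text twice.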

In [21]:
loader = BSHTMLLoader("../data/14A/0000003153-20-000004.html")
data = loader.load()

In [22]:
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')

You have 1 document(s) in your data
There are 153880 characters in your document



### Next, we select the chunk size and overlap.

In [23]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1600, chunk_overlap=200)
docs = text_splitter.split_documents(data)

print (f'Now you have {len(docs)} documents')

Now you have 110 documents
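As a rough sanity check, a fixed-size window advances by `chunk_size - chunk_overlap` characters per chunk, so the expected count is about the document length divided by that step. This is an approximation that assumes chunks close to full size; the recursive splitter's boundary-aware splits can change the exact number. For this document it gives 110, in line with the output above.

```python
import math


def estimate_chunks(num_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Approximate chunk count for a fixed-size window with overlap."""
    step = chunk_size - chunk_overlap
    # The first chunk covers chunk_size characters; each later one adds `step` new ones.
    return 1 + math.ceil(max(0, num_chars - chunk_size) / step)


print(estimate_chunks(153880, chunk_size=1600, chunk_overlap=200))
# 110
```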


In [24]:
# Helper function to normalize whitespace in a chunk of text

import re

def postproc(s):
    s = s.replace('\xa0', ' ')  # replace no-break spaces with regular spaces
    s = s.replace('\n', ' ')  # replace newlines with spaces
    s = re.sub(r'\s+', ' ', s)  # collapse runs of whitespace into a single space
    return s

In [25]:
for doc in docs:
    doc.page_content = postproc(doc.page_content)

In [26]:
# Review the first document for correctness
docs[0]

Document(page_content='UNITED STATESSECURITIES AND EXCHANGE COMMISSIONWASHINGTON, D.C. 20549SCHEDULE 14A INFORMATIONProxy Statement Pursuant To Section 14(a)of the Securities Exchange Act of 1934xFiled by the RegistrantoFiled by a party other than the RegistrantCheck the appropriate box:oPreliminary proxy statementoConfidential, for use of the Commission only (as permitted by Rule 14a-6(e)(2))xDefinitive proxy statementoDefinitive additional materialsoSoliciting material under Rule 14a-12ALABAMA POWER COMPANY(Name of Registrant as Specified in Its Charter)(Name of Person(s) Filing Proxy Statement, if Other Than the Registrant)Payment of Filing Fee (Check the appropriate box):xNo fee required. oFee computed on table below per Exchange Act Rules 14a-6(i)(1) and 0-11. (1)Title of each class of securities to which transaction applies: (2)Aggregate number of securities to which transaction applies: (3)Per unit price or other underlying value of transaction computed pursuant to Exchange Act 

In [27]:
# Limit the number of total chunks to 1000
MAX_DOCS = 1000
if len(docs) > MAX_DOCS:
    docs = docs[:MAX_DOCS]

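### Before populating the vector store, create the embeddings object and check that it initializes without exceptions.

We create a `BedrockEmbeddings` object. As the printed output shows, it defaults to the Amazon Titan text embedding model (`amazon.titan-embed-text-v1`). The document embeddings themselves are computed in the next step, when the vector store is populated.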

In [29]:
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.load.dump import dumps
import boto3

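session = boto3.Session()
# Fall back to the session's region if AWS_REGION is not set in the environment.
region = os.environ.get("AWS_REGION", session.region_name)
boto3_bedrock = session.client('bedrock-runtime', region, endpoint_url='https://bedrock-runtime.'+region+'.amazonaws.com')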
embeddings = BedrockEmbeddings(client=boto3_bedrock)
print(embeddings)

client=<botocore.client.BedrockRuntime object at 0x7f521c8e8100> region_name=None credentials_profile_name=None model_id='amazon.titan-embed-text-v1' model_kwargs=None endpoint_url=None
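Under the hood, `BedrockEmbeddings` calls `invoke_model` on the Titan text embedding model for each text. A hand-rolled equivalent might look like the sketch below. The request field (`inputText`) and response field (`embedding`) follow the Titan Embeddings G1 - Text API, while the `embed_text` name is hypothetical; the client is passed in so the function can be tried without AWS.

```python
import json
from typing import Any


def embed_text(client: Any, text: str, model_id: str = "amazon.titan-embed-text-v1") -> list[float]:
    """Embed one string with a Titan text embedding model via invoke_model."""
    response = client.invoke_model(
        body=json.dumps({"inputText": text}),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list.
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Usage with the Bedrock runtime client created above:
# vector = embed_text(boto3_bedrock, "Who are the directors?")
# amazon.titan-embed-text-v1 returns 1536-dimensional vectors.
```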


### Create embeddings of your documents to get ready for semantic search
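
`OpenSearchVectorSearch.from_texts` computes an embedding for each chunk with the `embeddings` object and indexes the vectors, together with the chunk text and metadata, in the `_aos_index` index. Requests to the OpenSearch Serverless collection are signed with AWS Signature Version 4 through `AWS4Auth`, using the `aoss` service name.
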
In [30]:
from tqdm import trange
from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth


service = "aoss"
credentials = boto3.Session().get_credentials()

# Create AWS4Auth object
awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    service,
    session_token=credentials.token,
)


docsearch = OpenSearchVectorSearch.from_texts(
    texts=[d.page_content for d in docs],
    embedding=embeddings,
    metadatas=[d.metadata for d in docs],
    opensearch_url=[{'host': _aos_host, 'port': 443}],
    index_name=_aos_index,
    http_auth=awsauth,
    use_ssl=True,
    pre_delete_index=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=100000,
)

## Question answering over Documents

So far, we have split a large document into smaller chunks, created vector embeddings, and stored them in an Amazon OpenSearch vector index. Now we can answer questions over this document data.

Because the data is indexed, we can run a semantic search over the chunks, so that only the documents most relevant to the question are passed in the prompt to the Large Language Model (LLM). Passing only these chunks instead of the whole document saves both time and money, and helps keep the prompt within the model's context window.

We use LangChain's **question_answering** `stuff` document chain in this example. Further details on document chains can be found in the LangChain [documentation, here](https://python.langchain.com/docs/modules/chains/document/).
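The `stuff` chain type is the simplest document chain: it concatenates ("stuffs") every retrieved document into a single prompt and makes one LLM call. The sketch below shows the idea in plain Python; the template wording and the `stuff_prompt` name are hypothetical and are not LangChain's exact default prompt. The sample chunks are taken from the answers shown later in this notebook.

```python
def stuff_prompt(docs: list[str], question: str) -> str:
    """Build one Claude-style prompt containing all retrieved chunks plus the question."""
    context = "\n\n".join(docs)  # every chunk goes into the same prompt
    return (
        "\n\nHuman: Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n\nAssistant:"
    )


prompt = stuff_prompt(
    ["Mr. Webb, 62, is President of Webb Concrete and Building Materials.",
     "Mr. Crosswhite, 57, is Chairman, President, and Chief Executive Officer of the Company."],
    "How old is Phillip M. Webb?",
)
print(prompt)
```

Because everything goes into one prompt, `stuff` works only while the retrieved chunks fit in the model's context window; other chain types such as `map_reduce` and `refine` trade extra LLM calls for the ability to handle more text.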

In [31]:
from typing import Dict

from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains.question_answering import load_qa_chain
import json


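# The Bedrock-hosted Claude v2 model used as the LLM for question answering.
llm = Bedrock(
    model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={"max_tokens_to_sample": 200}
)


# Note: this content handler is only needed when calling a SageMaker endpoint
# through SagemakerEndpoint; the chain below uses the Bedrock LLM and does not use it.
class SageMakerLLMContentHandler(LLMContentHandler):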
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        # input_str = json.dumps({prompt: prompt, **model_kwargs})
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        # return response_json[0]["generated_text"]
        return response_json['generated_texts'][0]


sagemaker_llm_content_handler= SageMakerLLMContentHandler()

chain = load_qa_chain(
    llm=llm,
    chain_type="stuff"
)


In [32]:
query = "Who are the directors?"
sim_docs = docsearch.similarity_search(query, include_metadata=True)
chain.run(input_documents=sim_docs, question=query)

' Based on the provided context, the directors mentioned are:\n\n- Anthony A. Joseph\n- Robert D. Powers'

In [33]:
query = "Who are the nominees?"
sim_docs = docsearch.similarity_search(query, include_metadata=True)
chain.run(input_documents=sim_docs, question=query)

' Based on the context provided, the nominees for election as directors are:\n\n- James F. Shackleford III\n- Phillip M. Webb  \n- And 7 other unnamed directors that are part of the current board.\n\nThe information provides background and descriptions for James F. Shackleford III and Phillip M. Webb, but does not name the other 7 nominees.'

In [34]:
for person in ['Mark A. Crosswhite', 'Phillip M. Webb']:
    for query_template in [
                    "How old is {PERSON}?",
                    "What is {PERSON} current position and what is the name of the organization he/she currently works for?"
                 ]:
    
        query = query_template.format(PERSON=person)
        print('Q:', query)

        sim_docs = docsearch.similarity_search(query, include_metadata=True)
        answer = chain.run(input_documents=sim_docs, question=query)    
        print('A:', answer)
        print('\n---\n')

Q: How old is Mark A. Crosswhite?
A:  Based on the context provided, Mark A. Crosswhite is 57 years old. The text states "Mr. Crosswhite, 57, is Chairman, President, and Chief Executive Officer of the Company." This indicates that Mark A. Crosswhite is 57 years old.

---

Q: What is Mark A. Crosswhite current position and what is the name of the organization he/she currently works for?
A:  Based on the context provided, Mark A. Crosswhite currently serves as the Chairman, President, and Chief Executive Officer of the Company. The name of the organization he works for is not explicitly stated in the passages.

---

Q: How old is Phillip M. Webb?
A:  Based on the context provided, Phillip M. Webb is 62 years old. The passage states "Mr. Webb, 62, is President of Webb Concrete and Building Materials, a position he has held since 1982." This indicates that Phillip M. Webb is currently 62 years old.

---

Q: What is Phillip M. Webb current position and what is the name of the organization h

## Cleanup

## If you are running these notebooks as part of an AWS workshop, the host of your event will take care of the cleanup.

## If you are running the notebooks within your own account, delete the CloudFormation stack to remove all the deployed components. Feel free to download a copy of these notebooks for your reference.
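If you deployed the stack in your own account, one way to remove it from the notebook is the CloudFormation `delete_stack` API followed by the `stack_delete_complete` waiter. This is a minimal sketch that assumes the stack name used earlier (`genai-rag-workshop-studio`) and IAM permissions to delete it; `delete_workshop_stack` is a hypothetical helper name.

```python
def delete_workshop_stack(cfn_client, stack_name: str = "genai-rag-workshop-studio") -> None:
    """Request stack deletion and block until CloudFormation reports it is deleted."""
    cfn_client.delete_stack(StackName=stack_name)
    waiter = cfn_client.get_waiter("stack_delete_complete")
    waiter.wait(StackName=stack_name)


# Usage (irreversible: deletes every resource the stack created):
# import boto3
# delete_workshop_stack(boto3.client("cloudformation"))
```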