# RAG with Amazon Bedrock Knowledge Base

In this notebook we use the information ingested in the Bedrock knowledge base to answer user queries.

## Import packages and utility functions
Import packages, set up utility functions, and interface with Amazon OpenSearch Service Serverless (AOSS).

In [34]:
import os
import sys
import json
import boto3
from typing import Dict
from urllib.request import urlretrieve
from langchain.llms.bedrock import Bedrock
from IPython.display import Markdown, display
from langchain.embeddings import BedrockEmbeddings
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

In [2]:
# global constants
SERVICE = 'aoss'

# Do not change the name of the CFN stack: we assume that the blog post
# creates a stack by this name, and we read output values from that stack.
CFN_STACK_NAME = "rag-w-bedrock-kb"

In [3]:
# Anthropic models need the Human/Assistant terminology in prompts,
# and they work better with XML-style tags.
PROMPT_TEMPLATE = """Human: Answer the question based only on the information provided in few sentences.
<context>
{}
</context>
Include your answer in the <answer></answer> tags. Do not include any preamble in your answer.
<question>
{}
</question>
Assistant:"""

In [36]:
# utility functions

def get_cfn_outputs(stackname: str) -> Dict[str, str]:
    """Return the CloudFormation stack outputs as an OutputKey -> OutputValue mapping."""
    cfn = boto3.client('cloudformation')
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

def printmd(string: str):
    display(Markdown(string))

In [5]:
# Functions to talk to OpenSearch

# Define queries for OpenSearch
def query_docs(query: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, index: str, k: int = 3) -> Dict:
    """
    Convert the query into an embedding and then find similar documents in AOSS.
    """

    # embedding
    query_embedding = embeddings.embed_query(query)

    # Query for the OpenSearch kNN vector lookup. Metadata-based filtering
    # can be added here as part of this query.
    query_qna = {
        "size": k,
        "query": {
            "knn": {
                "vector": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        }
    }

    # OpenSearch API call
    relevant_documents = aoss_client.search(
        body = query_qna,
        index = index
    )
    return relevant_documents
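
The comment in `query_docs` notes that metadata-based filtering can be added to the kNN query. One possible way to do that is to wrap the kNN clause in a `bool` query with a `filter` clause. The sketch below is only an illustration: `build_knn_query` is a hypothetical helper, and `doc_type` is an assumed keyword field that may not exist in this index. With this form the filter is applied to the kNN results, so fewer than `k` hits may come back.

```python
from typing import Any, Dict, List, Optional


def build_knn_query(
    query_embedding: List[float],
    k: int = 3,
    metadata_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a kNN search body, optionally restricted by a filter clause."""
    # "vector" is the vector field name used by query_docs in this notebook.
    knn_clause = {"knn": {"vector": {"vector": query_embedding, "k": k}}}
    if metadata_filter is None:
        return {"size": k, "query": knn_clause}
    # Wrap the kNN clause in a bool query so that the filter narrows its results.
    return {
        "size": k,
        "query": {"bool": {"must": [knn_clause], "filter": [metadata_filter]}},
    }


# Hypothetical filter: "doc_type" is an assumed field, not one taken from this index.
body = build_knn_query(
    [0.1, 0.2, 0.3], k=2, metadata_filter={"term": {"doc_type": "guide"}}
)
```

The returned dictionary could be passed as `body` to `aoss_client.search`, in the same way as `query_qna`.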

In [29]:
def create_context_for_query(q: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, vector_index: str) -> str:
    """
    Create a context out of the similar docs retrieved from the vector database
    by concatenating the text from the similar documents.
    """
    print(f"query -> {q}")
    aoss_response = query_docs(q, embeddings, aoss_client, vector_index)
    context = ""
    for r in aoss_response['hits']['hits']:
        s = r['_source']
        print(f"{s['metadata']}\n{s['text']}")
        context += f"{s['text']}\n"
        print("----------------")
    return context
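
`create_context_for_query` prints each hit's metadata but returns only the concatenated text. If the sources are needed programmatically (for example, to show citations next to the answer), a variant could return them alongside the context. This is a minimal sketch: `build_context_with_sources` is a hypothetical helper, and it assumes that `metadata` is either a dictionary or a JSON string containing a `source` key, which is what the printed output in this notebook suggests.

```python
import json
from typing import Any, Dict, List, Tuple


def build_context_with_sources(aoss_response: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return the concatenated hit text and the de-duplicated list of sources."""
    context_parts: List[str] = []
    sources: List[str] = []
    for hit in aoss_response["hits"]["hits"]:
        doc = hit["_source"]
        context_parts.append(doc["text"])
        metadata = doc.get("metadata", {})
        # Assumption: metadata may be stored as a JSON string; decode it if so.
        if isinstance(metadata, str):
            metadata = json.loads(metadata)
        source = metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return "\n".join(context_parts), sources


# Stand-in response with a made-up bucket name, for illustration only.
fake_response = {
    "hits": {
        "hits": [
            {"_source": {"text": "first", "metadata": '{"source": "s3://example-bucket/a.html"}'}},
            {"_source": {"text": "second", "metadata": {"source": "s3://example-bucket/a.html"}}},
        ]
    }
}
context, sources = build_context_with_sources(fake_response)
```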

## Retrieve parameters needed from the AWS CloudFormation stack

In [27]:

outputs = get_cfn_outputs(CFN_STACK_NAME)

region = outputs["Region"]
aoss_collection_arn = outputs['CollectionARN']
aoss_host = f"{os.path.basename(aoss_collection_arn)}.{region}.aoss.amazonaws.com"
aoss_vector_index = outputs['AOSSVectorIndexName']
print(f"aoss_collection_arn={aoss_collection_arn}\naoss_host={aoss_host}\naoss_vector_index={aoss_vector_index}\naws_region={region}")

aoss_collection_arn=arn:aws:aoss:us-east-1:015469603702:collection/mi4dut5xaxie0wkxe8yj
aoss_host=mi4dut5xaxie0wkxe8yj.us-east-1.aoss.amazonaws.com
aoss_vector_index=sagemaker-readthedocs-io
aws_region=us-east-1
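
The host name above is built from the last path component of the collection ARN plus the Region. As an illustration of that derivation, the sketch below parses both pieces from the ARN itself. `aoss_host_from_arn` is a hypothetical helper and not part of the notebook, which reads the Region from the stack outputs instead.

```python
def aoss_host_from_arn(collection_arn: str) -> str:
    """Derive the AOSS endpoint host from a collection ARN."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = collection_arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("collection/"):
        raise ValueError(f"not a collection ARN: {collection_arn}")
    region = parts[3]
    collection_id = parts[5].split("/", 1)[1]
    return f"{collection_id}.{region}.aoss.amazonaws.com"


host = aoss_host_from_arn(
    "arn:aws:aoss:us-east-1:015469603702:collection/mi4dut5xaxie0wkxe8yj"
)
```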


## Setup Embeddings and Text Generation model

We can use LangChain to set up the embeddings and text generation models provided via Amazon Bedrock.

In [31]:
# we will use Anthropic Claude for text generation
claude_llm = Bedrock(model_id= "anthropic.claude-v2")
claude_llm.model_kwargs = dict(temperature=0.5, max_tokens_to_sample=300, top_k=250, top_p=1, stop_sequences=[])

# we will be using the Titan Embeddings Model to generate our Embeddings.
embeddings = BedrockEmbeddings(model_id = "amazon.titan-embed-g1-text-02")

## Interface with Amazon OpenSearch Service Serverless
We use the open-source [opensearch-py](https://pypi.org/project/opensearch-py/) package to talk to AOSS.

In [23]:
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, SERVICE)

client = OpenSearch(
    hosts = [{'host': aoss_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)

## Use Retrieval Augmented Generation (RAG) for answering queries

Now that we have set up the LLMs through Bedrock and the vector database through AOSS, we are ready to answer queries using RAG. The workflow is as follows:

1. Convert the user query into embeddings.

1. Use the embeddings to find similar documents from the vector database.

1. Create a prompt using the user query and the similar documents retrieved from the vector database.

1. Provide the prompt to the LLM to create an answer to the user query.
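
The four steps above can be sketched as a single function. This is an illustration under stated assumptions rather than the notebook's implementation: `answer_with_rag` is a hypothetical helper, and the embedding, search, and generation steps are passed in as plain callables so that the flow can be followed (and tried with stand-ins) without AWS access. The template is a copy of `PROMPT_TEMPLATE` defined earlier.

```python
from typing import Callable, Dict, List

# Copy of the notebook's PROMPT_TEMPLATE so that this sketch is self-contained.
PROMPT_TEMPLATE = """Human: Answer the question based only on the information provided in few sentences.
<context>
{}
</context>
Include your answer in the <answer></answer> tags. Do not include any preamble in your answer.
<question>
{}
</question>
Assistant:"""


def answer_with_rag(
    query: str,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float]], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run the four RAG steps with caller-supplied components."""
    query_embedding = embed(query)                    # 1. query -> embedding
    documents = search(query_embedding)               # 2. embedding -> similar documents
    context = "\n".join(documents)
    prompt = PROMPT_TEMPLATE.format(context, query)   # 3. query + documents -> prompt
    return generate(prompt)                           # 4. prompt -> answer


# Stand-in components for illustration only; none of them call Bedrock or AOSS.
captured: Dict[str, str] = {}


def fake_generate(prompt: str) -> str:
    captured["prompt"] = prompt
    return "<answer>stub</answer>"


result = answer_with_rag(
    "What is X?",
    embed=lambda text: [float(len(text))],
    search=lambda vector: ["X is a placeholder."],
    generate=fake_generate,
)
```

In the notebook itself, `embeddings.embed_query`, `query_docs`, and `claude_llm` play the roles of the three callables.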

## Query 1

Let us first ask our question to the model without providing any context and look at the result. We then ask the same question with context built from documents retrieved from AOSS and see whether the answer improves.

In [38]:
# 1. Start with the query
q = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Now create a prompt by combining the query and the context (which is empty at this time)
context = ""
prompt = PROMPT_TEMPLATE.format(context, q)

# 3. Provide the prompt to the LLM to generate an answer to the query without any additional context provided
response = claude_llm(prompt)
printmd(f"<span style='color:red'><b>question={q.strip()}<br>answer={response.strip()}</b></span>\n")

<span style='color:red'><b>question=What versions of XGBoost are supported by Amazon SageMaker?<br>answer=<answer>
Amazon SageMaker supports XGBoost versions 0.90-1, 0.90-2, and 1.0-1.
</answer></b></span>


**The answer provided above is incorrect**, as can be seen from the [SageMaker XGBoost Algorithm page](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). The supported version numbers are "1.0, 1.2, 1.3, 1.5, and 1.7".

Now, let us see if we can improve upon this answer by using additional information that is available to us in the vector database. **Also notice in the response below that the source of the documents used as context is called out (the name of the file in the S3 bucket); this helps build confidence in the response generated by the LLM.**

In [40]:
# 1. Start with the query
q = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(q, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = PROMPT_TEMPLATE.format(context, q)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm(prompt)

printmd(f"<span style='color:red'><b>question={q.strip()}<br>answer={response.strip()}</b></span>\n")

query -> What versions of XGBoost are supported by Amazon SageMaker?
{"source":"s3://sagemaker-kb-015469603702/sagemaker.readthedocs.io_en_stable_frameworks_xgboost_using_xgboost.html"}
see Extending our PyTorch containers. Use XGBoost as a Built-in Algortihm¶ Amazon SageMaker provides XGBoost as a built-in algorithm that you can use like other built-in algorithms. Using the built-in algorithm version of XGBoost is simpler than using the open source version, because you don’t have to write a training script. If you don’t need the features and flexibility of open source XGBoost, consider using the built-in version. For information about using the Amazon SageMaker XGBoost built-in algorithm, see XGBoost Algorithm in the Amazon SageMaker Developer Guide. Use the Open Source XGBoost Algorithm¶ If you want the flexibility and additional features that it provides, use the SageMaker open source XGBoost algorithm. For which XGBoost versions are supported, see the AWS documentation. We recommen

<span style='color:red'><b>question=What versions of XGBoost are supported by Amazon SageMaker?<br>answer=<answer>
The XGBoost open source algorithm supports the latest XGBoost version. The XGBoost built-in algorithm is based on XGBoost versions 1.0, 1.2, 1.3, and 1.5.
</answer></b></span>
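
Because the prompt template asks the model to wrap its reply in `<answer></answer>` tags, the raw responses shown here include those tags. If only the answer text is wanted, it can be pulled out with a regular expression. A minimal sketch, where `extract_answer` is a hypothetical helper that is not part of the notebook:

```python
import re

# Non-greedy match so that only the first <answer>...</answer> pair is captured.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(response: str) -> str:
    """Return the text inside <answer> tags, or the stripped response if no tags are found."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).strip() if match else response.strip()


raw = "<answer>\nSageMaker supports two main types of distributed training.\n</answer>"
clean = extract_answer(raw)
```

Falling back to the stripped response keeps the helper usable when the model omits the tags.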


## Query 2

For the subsequent queries we use RAG directly.

In [41]:
# 1. Start with the query
q = "What are the different types of distributed training supported by SageMaker. Give a short summary of each."

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(q, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = PROMPT_TEMPLATE.format(context, q)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm(prompt)
printmd(f"<span style='color:red'><b>question={q.strip()}<br>answer={response.strip()}</b></span>\n")

query -> What are the different types of distributed training supported by SageMaker. Give a short summary of each.
{"source":"s3://sagemaker-kb-015469603702/sagemaker.readthedocs.io_en_stable_api_training_distributed.html"}
Archive Launch a Distributed Training Job Using the SageMaker Python SDK Release Notes SageMaker Distributed Data Parallel 1.8.0 Release Notes Release History The SageMaker Distributed Model Parallel Library¶ The SageMaker Distributed Model Parallel Library Overview Use the Library’s API to Adapt Training Scripts Version 1.11.0, 1.13.0, 1.14.0, 1.15.0 (Latest) Documentation Archive Run a Distributed Training Job Using the SageMaker Python SDK Configuration Parameters for distribution Ranking Basics without Tensor Parallelism Placement Strategy with Tensor Parallelism Prescaled Batch Release Notes SageMaker Distributed Model Parallel 1.15.0 Release Notes Release History Next Previous © Copyright 2023, Amazon Revision af4d7949. Built with Sphinx using a theme provide

<span style='color:red'><b>question=What are the different types of distributed training supported by SageMaker. Give a short summary of each.<br>answer=<answer>
SageMaker supports two main types of distributed training:

1. SageMaker Distributed Data Parallel: This allows scaling training by splitting data across multiple instances. It uses data parallelism to train models faster.

2. SageMaker Distributed Model Parallel: This allows scaling training by splitting models across multiple instances. It uses model parallelism to train very large models that cannot fit on a single instance.
</answer></b></span>


## Query 3

In [42]:
# 1. Start with the query
q = "What advantages does SageMaker debugger provide?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(q, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = PROMPT_TEMPLATE.format(context, q)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm(prompt)

printmd(f"<span style='color:red'><b>question={q.strip()}<br>answer={response.strip()}</b></span>\n")

query -> What advantages does SageMaker debugger provide?
{"source":"s3://sagemaker-kb-015469603702/sagemaker.readthedocs.io_en_stable_amazon_sagemaker_debugger.html"}
having the TensorBoard data emitted from the hook in addition to the tensors will incur a cost to the training and may slow it down. Interactive analysis using SageMaker Debugger SDK and visualizations¶ Amazon SageMaker Debugger SDK also allows you to do interactive analyses on the debugging data produced from a training job run and to render visualizations of it. After calling fit() on the estimator, you can use the SDK to load the saved data in a SageMaker Debugger trial and do an analysis on the data: from smdebug.trials import create_trial s3_output_path = estimator.latest_job_debugger_artifacts_path() trial = create_trial(s3_output_path) To learn more about the programming model for analysis using the SageMaker Debugger SDK, see SageMaker Debugger Analysis. For a tutorial on what you can do after creating the trial 

<span style='color:red'><b>question=What advantages does SageMaker debugger provide?<br>answer=<answer>
SageMaker Debugger provides the following advantages:

- Allows you to hook into the training process and emit debug artifacts (tensors) that represent the training state. This provides visibility into the training process.

- Stores the debug data in real time and allows you to analyze it using built-in rules or custom rules to detect anomalies. This enables debugging and monitoring of training. 

- Provides pre-defined debugger configurations for built-in rules, making it easier to debug common issues.

- Captures real-time TensorBoard data for interactive analysis and visualization using the Debugger SDK.

- Integrates with SageMaker training workflows with minimal code changes needed.

</answer></b></span>
