<center><img src="images/2024_reInvent_Logo_wDate_Black_V3.png" alt="drawing" width="400" style="background-color:white; padding:1em;" /></center> <br/>

# <a name="0">re:Invent 2024 | Lab 1: Build your RAG powered chatbot  </a>
## <a name="0">Build a chatbot with Knowledge Bases and Guardrails to detect and remediate hallucinations </a>

## Lab Overview
In this lab, you will:
1. Take a deeper look at which LLM parameters influence or control for model hallucinations
2. Set up Retrieval Augmented Generation and understand how it can control for hallucinations
3. Apply contextual grounding in Amazon Bedrock Guardrails to intervene when a model hallucinates
4. Use RAGAS evaluation and understand which metrics help us measure hallucinations

## Dataset
For this workshop, we will use the [Bedrock User Guide](https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf) available as a PDF file.

## Use-Case Overview
In this lab, we want to develop a chatbot that can answer questions about Amazon Bedrock as factually as possible. We will set up Retrieval Augmented Generation using [Amazon Bedrock Knowledge Bases](https://aws.amazon.com/bedrock/knowledge-bases/) and apply [Amazon Bedrock Guardrails](https://aws.amazon.com/bedrock/guardrails/) to intervene when hallucinations are detected.


#### Lab Sections

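This lab notebook has the following sections:

1. Chat with Anthropic Claude 3 Sonnet through Bedrock
2. Retrieval Augmented Generation
3. Contextual Grounding with Amazon Bedrock Guardrails
4. Evaluating RAG with RAGAS

Please work through this notebook from top to bottom and don't skip sections, as later cells depend on code from earlier ones.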


----

# Star the GitHub repository for future reference

In [1]:
%%html

<a class="github-button" href="https://github.com/aws-samples/responsible_ai_aim325_reduce_hallucinations_for_genai_apps" data-color-scheme="no-preference: light; light: light; dark: dark;" data-icon="octicon-star" data-size="large" data-show-count="true" aria-label="Star Reduce Hallucinations workshop on GitHub">Star</a>
<script async defer src="https://buttons.github.io/buttons.js"></script>

# Environment Setup

In [2]:
#%pip install --upgrade --quiet pip sagemaker boto3 ragas==0.1.7 pydantic==2.6.1 langchain-core==0.1.40 langchain langchain-aws

In [3]:
%%capture
!pip3 install -r requirements.txt --quiet

In [4]:
# restart kernel
#from IPython.core.display import HTML
#HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [5]:
import time
import os
import json
import boto3
from time import gmtime, strftime, sleep
import pprint
import random
import zipfile
#from retrying import retry
from rag_setup.create_kb_utils import *
import warnings
warnings.filterwarnings('ignore')

import numpy as np  
import pandas as pd 
import sagemaker
from botocore.exceptions import ClientError

(sagemaker.__version__,boto3.__version__)





sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


('2.227.0', '1.35.15')

## Set constants

In [6]:
# Get some variables you need to interact with SageMaker service
boto_session = boto3.Session()
region = boto_session.region_name
bucket_name = sagemaker.Session().default_bucket()
bucket_prefix = "reduce-hallucinations-in-genai-apps"  
sm_session = sagemaker.Session()
sm_client = boto_session.client("sagemaker")
sm_role = sagemaker.get_execution_role()

initialized = True

print(sm_role)
print(bucket_name)

arn:aws:iam::040238304754:role/cfn-SageMakerExecutionRole-OXlXFaWIRw9w
sagemaker-us-west-2-040238304754


In [7]:
embedding_model_id="amazon.titan-embed-text-v2:0"
llm_model_id="anthropic.claude-3-sonnet-20240229-v1:0"

In [8]:
# Store some variables to keep the value between the notebooks
%store bucket_name
%store bucket_prefix
%store sm_role
%store region
%store initialized

Stored 'bucket_name' (str)
Stored 'bucket_prefix' (str)
Stored 'sm_role' (str)
Stored 'region' (str)
Stored 'initialized' (bool)


In [9]:
#test if bedrock model access has been enabled 
input_prompt = "Who was the first person to land on the sun?"
test_llm_call(input_prompt)

  response = llm(messages)


"No one has ever landed on the sun. The sun is a star with extremely hot temperatures and harsh conditions that make landing on its surface impossible with current technology.\n\nSome key facts:\n\n- The sun's surface temperature is around 5,500°C (10,000°F). Most materials would vaporize in such extreme heat.\n\n- The sun does not have a solid surface to land on. It is a ball of hot plasma and gases.\n\n- The gravitational forces and radiation levels on the sun are enormously high and would destroy any spacecraft trying to land.\n\n- The distance from Earth to the sun is about 93 million miles (150 million km), an immense distance that spacecraft would have difficulty traveling.\n\nWhile future advanced technologies may someday allow exploration of the sun from a safe distance, actually landing on the scorching hot solar surface is physically impossible, at least with anything resembling modern technology and materials. All current sun observations and studies are done from a far dist

# 1. Chat with Anthropic Claude 3 Sonnet through Bedrock

In [10]:
bedrock_runtime = boto3.client(service_name='bedrock-runtime')


def generate_message_claude(
    query, system_prompt="", max_tokens=1000, 
    model_id='anthropic.claude-3-sonnet-20240229-v1:0',
    temperature=0.9, top_p=0.99, top_k=100
):
    # Prompt with user turn only.
    user_message = {"role": "user", "content": query}
    messages = [user_message]
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages,
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k
        }
    )

    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
    return response_body

In [11]:
query = 'How do Amazon Bedrock Guardrails work?'

response = generate_message_claude(query)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_01EXSdTCUQLSSCRDUyBqfuW7",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock Guardrails is a service that allows you to define and enforce organizational policies across your Amazon Web Services (AWS) accounts and resources. It is part of the AWS Bedrock initiative, which aims to help organizations establish a secure and compliant foundation for their cloud environments.\n\nHere's a high-level overview of how Amazon Bedrock Guardrails work:\n\n1. Policy Definition: You define policies using the AWS Bedrock Guardrails policy language, which is based on the AWS Control Tower Lifecycle Event Handshake protocol. These policies specify the desired configuration for your AWS resources, such as restricting the creation of certain resource types, enforcing tagging requirements, or limiting access to specific services.\n\n2. Poli

## 1.1 Apply System Prompt

In [12]:
query = 'Is it possible to purchase provisioned throughput for Anthropic Claude models on Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_019xLzBQ3Dh8bH5PQjdoofM5",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Unfortunately, I do not have any specific information about purchasing provisioned throughput for Anthropic Claude models on Amazon Bedrock. Amazon Bedrock appears to be a new managed service offering from Amazon Web Services, but details are limited in my training data. As an AI assistant created by Anthropic, I do not have inside knowledge about Anthropic's commercial offerings or integrations with cloud providers. My role is to provide helpful information to users, but in this case, I do not have enough factual details to definitively answer your query. You may need to check the official documentation or contact Anthropic or Amazon for the latest updates on any integration between Claude and Amazon Bedrock."
        }
    ],
    "stop_reason": "end_turn",
 

In [13]:
query = 'How do Amazon Bedrock Guardrails work?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_01R3HyWCKds8XMfFu8SqKnat",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock Guardrails are a service provided by AWS that helps organizations govern their use of AWS resources through automated guardrails and preventive controls. Here's a brief overview of how Bedrock Guardrails work:\n\n1. Guardrails Definition: Organizations can define guardrails as code using the Bedrock Guardrails Domain Specific Language (DSL). Guardrails are essentially rules that enforce best practices, organizational policies, and regulatory requirements.\n\n2. Deployment: The defined guardrails are deployed as AWS CloudFormation stacks in the customer's AWS accounts and AWS Organizations. This allows the guardrails to span multiple accounts and organizational units.\n\n3. Continuous Evaluation: Bedrock Guardrails continuously evaluates the depl

## 1.2 Understanding LLM generation parameters
### 1. Temperature: The amount of randomness injected into the response.
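
To build intuition for what temperature does, here is a minimal sketch of temperature-scaled softmax over a toy set of next-token scores. The scores and the function name `softmax_with_temperature` are made up for illustration and say nothing about how the model is implemented behind the Bedrock API. This sketch also rejects a temperature of 0, which the API accepts.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(toy_logits, 0.2)
flat = softmax_with_temperature(toy_logits, 1.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

A lower temperature concentrates probability on the highest-scoring token, which makes responses more repeatable. It does not by itself make them correct, as the runs in this section show.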

In [14]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=1)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_014VeKKVs1mM3iAG99PxN375",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock is a set of internal tools, services, and foundational libraries used by Amazon's software engineers to build and deploy applications and services.\n\nSome key points about Amazon Bedrock:\n\n- It is an internal Amazon platform, not a customer-facing product or service.\n\n- It provides a common set of infrastructure components and frameworks that Amazon's engineering teams can leverage when building new applications.\n\n- It aims to increase development velocity and operational efficiency by providing reusable building blocks instead of having teams reinvent core functionality.\n\n- Components of Bedrock include services for compute, storage, database, networking, analytics, machine learning, and other cloud capabilities.\n\n- It allows Amazon 

In [15]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=0)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_01DZAeAozMFoLUk8S1MWKKZ8",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock is a real-time operating system developed by Amazon for running applications on resource-constrained devices like microcontrollers and sensors.\n\nSome key points about Amazon Bedrock:\n\n- It is designed to be a secure, real-time operating system for internet of things (IoT) devices and embedded applications.\n\n- It provides a lightweight environment with real-time performance for running multiple software components concurrently.\n\n- It supports C and C++ programming languages.\n\n- It includes built-in security features like memory protection, encrypted communication, secure boot, and code signing.\n\n- It aims to simplify development and deployment of IoT applications across different hardware platforms.\n\n- Bedrock is open source and ava

### 2. top_p: Use nucleus sampling.

In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
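
The cutoff described above can be sketched in a few lines of plain Python. The token probabilities and the helper name `nucleus_filter` are hypothetical; the sketch only illustrates the idea of keeping the smallest high-probability set whose cumulative mass reaches `top_p`.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    # Renormalize so the surviving tokens form a valid distribution.
    return {token: p / total for token, p in kept.items()}


toy_probs = {"service": 0.5, "platform": 0.3, "rock": 0.15, "mineral": 0.05}
narrow = nucleus_filter(toy_probs, 0.75)
wide = nucleus_filter(toy_probs, 1.0)
print(narrow)
print(wide)
```

With a smaller `top_p`, the unlikely tail tokens are never candidates for sampling; with `top_p=1`, nothing is cut off.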

In [16]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=1, top_p=1)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_0181MNg8JQzMHxZ46yTkYKsu",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock is not a term or product that I'm familiar with in relation to Amazon. Amazon has many different services and products, but I don't have any specific information about something called \"Amazon Bedrock.\"\n\nIt's possible this could be referring to some internal codename or project at Amazon, but without more context, I can't provide any definitive details. Amazon does offer cloud computing services through Amazon Web Services (AWS), which provides infrastructure and platforms like servers, databases, networking, etc. But I haven't seen any public references to \"Bedrock\" in relation to AWS or other Amazon offerings.\n\nUnless you can provide some more specifics about what context \"Amazon Bedrock\" is being used in, I don't want to speculate t

### 3. top_k: Only sample from the top K options for each subsequent token.

Use top_k to remove long-tail, low-probability responses.
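
As a rough illustration of the long-tail cutoff, the sketch below keeps only the `k` most likely tokens from a toy distribution and renormalizes them. The probabilities and the helper name `top_k_filter` are assumptions made for this example.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


toy_probs = {"service": 0.5, "platform": 0.3, "rock": 0.15, "mineral": 0.05}
top_two = top_k_filter(toy_probs, 2)
print(top_two)
```

Unlike `top_p`, which adapts the number of candidates to the shape of the distribution, `top_k` always keeps a fixed number of candidates.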

In [17]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=0, top_p=1, top_k=100)
print("User turn only.")
print(json.dumps(response, indent=4))

User turn only.
{
    "id": "msg_bdrk_01U5pLGkaUX9cjzRXwqfh7rA",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "Amazon Bedrock is a real-time operating system developed by Amazon for running applications on resource-constrained devices like microcontrollers and sensors.\n\nSome key points about Amazon Bedrock:\n\n- It is designed to be a secure, real-time operating system for internet of things (IoT) devices and embedded applications.\n\n- It provides a lightweight environment with real-time performance for running multiple software components concurrently.\n\n- It supports C and C++ programming languages.\n\n- It includes built-in security features like memory protection, encrypted communication, secure boot, and code signing.\n\n- It aims to simplify development and deployment of IoT applications across different hardware platforms.\n\n- Bedrock is open source and ava

# Retrieval Augmented Generation
We are using the Retrieval Augmented Generation (RAG) technique with Amazon Bedrock. A RAG implementation consists of two parts:

1. A data pipeline that ingests data from documents (typically stored in Amazon S3) into a Knowledge Base, i.e., a vector database such as Amazon OpenSearch Service Serverless (AOSS), so that it is available for lookup when a question is received.

The data pipeline represents undifferentiated heavy lifting and can be implemented using Amazon Bedrock Knowledge Bases. We can connect an S3 bucket to a vector database such as AOSS and have Bedrock Knowledge Bases read the objects (HTML, PDF, text, etc.), chunk them, convert the chunks into embeddings using the Amazon Titan Embeddings model, and store these embeddings in AOSS, all without having to build, deploy, and manage the data pipeline.

<center><img src="images/fully_managed_ingestion.png" alt="This image shows how Amazon Bedrock Knowledge Bases ingests objects in an S3 bucket into the Knowledge Base for use in a RAG setup. The objects are chunked, embedded, and then stored in a vector index." height="700" width="700" style="background-color:white; padding:1em;" /></center> <br/>
    

2. An application that receives a question from the user, searches the knowledge base for relevant pieces of information (context), creates a prompt that includes the question and the context, and provides it to an LLM for generating a response.
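
The application side can be pictured with a toy example: rank stored chunks by cosine similarity to the question's embedding, then paste the best matches into the prompt. The three-dimensional vectors, the chunk texts, and the helper names below are all invented for illustration; in this lab, Bedrock Knowledge Bases and the Titan embeddings model do this work.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, indexed_chunks, top_k=2):
    """Return the text of the top_k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:top_k]]


def build_prompt(question, contexts):
    """Combine the retrieved context and the question into a single prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )


# Toy index: hand-written vectors stand in for real embeddings.
toy_index = [
    {"text": "Amazon Bedrock is a fully managed service for foundation models.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Guardrails can intervene when a response is not grounded.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Amazon S3 stores objects in buckets.", "vector": [0.0, 0.1, 0.9]},
]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of the question
contexts = retrieve(query_vector, toy_index, top_k=2)
prompt = build_prompt("What is Amazon Bedrock?", contexts)
print(prompt)
```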






Once the data is available in the Bedrock knowledge base, then user questions can be answered using the following system design:

<center><img src="images/retrieveAndGenerate.png" alt="This image shows the retrieval augmented generation (RAG) system design setup with knowledge bases, S3, and AOSS. The knowledge corpus is ingested into a vector database using Amazon Bedrock Knowledge Bases, and the RAG approach is then used for question answering. The question is converted into embeddings, followed by a semantic similarity search to get similar documents. With the user prompt augmented with the RAG search response, the LLM is invoked to get the final raw response for the user." height="700" width="700" style="background-color:white; padding:1em;" /></center> <br/>


# Data
Let's use the publicly available [Bedrock user guide](https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf) to ground the model's answers.

In [18]:
!wget -P data/ -N https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf --no-check-certificate

--2024-11-10 17:48:40--  https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf
Resolving docs.aws.amazon.com (docs.aws.amazon.com)... 3.163.24.36, 3.163.24.65, 3.163.24.45, ...
Connecting to docs.aws.amazon.com (docs.aws.amazon.com)|3.163.24.36|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 13669967 (13M) [application/pdf]
Saving to: ‘data/bedrock-ug.pdf’


2024-11-10 17:48:41 (28.5 MB/s) - ‘data/bedrock-ug.pdf’ saved [13669967/13669967]



In [19]:
# Upload data to S3
dataset_file_local_path = 'data/bedrock-ug.pdf'
input_s3_url = sagemaker.Session().upload_data(
    path=dataset_file_local_path,
    bucket=bucket_name
)
print(f"Upload the dataset to {input_s3_url}")

%store input_s3_url

Upload the dataset to s3://sagemaker-us-west-2-040238304754/data/bedrock-ug.pdf
Stored 'input_s3_url' (str)


# Steps

1. Create an Amazon Bedrock Knowledge Base execution role with the policies needed to read data from S3 and write embeddings into AOSS.
2. Create an empty OpenSearch Serverless index.
3. Create an Amazon Bedrock knowledge base.
4. Create a data source within the knowledge base that connects to Amazon S3.
5. Start an ingestion job using the KB APIs, which reads data from S3, chunks it, converts the chunks into embeddings using the Amazon Titan Embeddings model, and stores these embeddings in AOSS.
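
To make the chunking in step 5 concrete, here is a minimal sketch of fixed-size chunking with overlap, counted in words. This is not the chunking strategy Bedrock Knowledge Bases uses; the function name `chunk_words` and the sizes are assumptions chosen only to show why neighboring chunks share some text.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"word{i}" for i in range(120))  # synthetic 120-word document
chunks = chunk_words(sample, chunk_size=50, overlap=10)
print(len(chunks))
print(chunks[1].split()[0])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.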

In [20]:
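# `!export` runs in a separate subshell and does not change the running kernel's import path,
# so add the lab directory to sys.path directly.
import sys
sys.path.insert(0, './lab1/')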

In [21]:
kb_db_file_uri='data'

# If a knowledge base already exists, reuse it; otherwise the setup code creates one from the Bedrock user guide.
use_existing_kb = False
existing_kb_id = None

In [22]:
%load_ext autoreload
%autoreload 2
from rag_setup.create_kb_utils import *

In [23]:
%%time

# For a new KB, this setup takes about 6 minutes or longer on a t2.medium instance (the run recorded below took about 11 minutes).
infra_response = setup_knowledge_base(bucket_name, kb_db_file_uri, use_existing_kb, existing_kb_id)
infra_response

agent_bedrock_policy :: None
agent_s3_schema_policy :: None
kb_aws_bedrock_policy :: None
kb_db_s3_policy :: None
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...
Creating collection...

Collection successfully created:

Creating index:
Knowledge base status -> is it READY ? :: ACTIVE
knowledge_base_db_id :: TLPZHUIWOK
CPU times: user 281 ms, sys: 50.3 ms, total: 331 ms
Wall time: 10min 52s


{'prefix_infra': 'l2678edb',
 'bucket_name': 'sagemaker-us-west-2-040238304754',
 'knowledge_base_db_id': 'TLPZHUIWOK',
 'agent_bedrock_policy': None,
 'agent_s3_schema_policy': None,
 'kb_db_collection_name': 'l2ef9d-kbdb-040238304754',
 'agent_kb_schema_policy': None,
 'kb_db_aoss_policy': None,
 'kb_db_s3_policy': None,
 'kb_db_role_name': 'AmazonBedrockExecutionRoleForAgentsAIAssistant05',
 'kb_db_opensearch_collection_response': {'createCollectionDetail': {'arn': 'arn:aws:aoss:us-west-2:040238304754:collection/iwqtdgih6qesm7wg8huk',
   'createdDate': 1731260924042,
   'description': 'OpenSearch collection for Amazon Bedrock Latest User guide Knowledge Base',
   'id': 'iwqtdgih6qesm7wg8huk',
   'kmsKeyArn': 'auto',
   'lastModifiedDate': 1731260924042,
   'name': 'l2ef9d-kbdb-040238304754',
   'standbyReplicas': 'DISABLED',
   'status': 'CREATING',
   'type': 'VECTORSEARCH'},
  'ResponseMetadata': {'RequestId': '9c9b87fb-7b14-4f39-ab4e-1aa325b527b2',
   'HTTPStatusCode': 200,
   'H

In [24]:
kb_id = infra_response['knowledge_base_db_id']
random_id = infra_response['prefix_infra']
# keep the kb_id for invocation later in the invoke request
%store kb_id
%store bucket_name

Stored 'kb_id' (str)
Stored 'bucket_name' (str)


In [25]:
kb_id

'TLPZHUIWOK'

In [26]:
# allow time for KB to be ready
time.sleep(180)

# Chat with the model using the knowledge base by providing the generated KB_ID
### Using RetrieveAndGenerate API
Behind the scenes, the RetrieveAndGenerate API converts queries into embeddings, searches the knowledge base, augments the foundation model (FM) prompt with the search results as context, and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manages short-term memory of the conversation to provide more contextual results. The output of the RetrieveAndGenerate API includes the generated response, source attribution, and the retrieved text chunks.
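
For the multi-turn case, a follow-up question can be tied to the earlier turn by passing back the `sessionId` returned by the first call. The sketch below assumes that behavior of `retrieve_and_generate`; the helper name `ask_with_session`, the placeholder IDs, and the `_OfflineStubClient` class are hypothetical, and the stub exists only so the example runs without AWS access.

```python
def ask_with_session(client, query, kb_id, model_arn, session_id=None):
    """Call RetrieveAndGenerate, reusing session_id (if given) to continue a conversation."""
    request = {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id is not None:
        request["sessionId"] = session_id
    response = client.retrieve_and_generate(**request)
    return response["output"]["text"], response.get("sessionId")


class _OfflineStubClient:
    """Stand-in for the bedrock-agent-runtime client (illustration only)."""

    def retrieve_and_generate(self, **kwargs):
        return {
            "output": {"text": f"(stub answer to: {kwargs['input']['text']})"},
            "sessionId": kwargs.get("sessionId", "stub-session-1"),
        }


stub = _OfflineStubClient()
answer, session_id = ask_with_session(
    stub, "What is Amazon Bedrock?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER"
)
follow_up, same_session = ask_with_session(
    stub, "Which models does it offer?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER",
    session_id=session_id,
)
print(session_id == same_session)
```

In the notebook, you would pass `bedrock_agent_runtime_client`, `kb_id`, and `llm_model_id` instead of the stub and placeholders.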

In [27]:
pp = pprint.PrettyPrinter(indent=2)

In [28]:
kb_id

'TLPZHUIWOK'

In [29]:
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region)


def ask_bedrock_llm_with_knowledge_base(query,
                                        kb_id=kb_id,
                                        model_arn=llm_model_id,
                                        ) -> dict:
    # Returns the full RetrieveAndGenerate response (output text plus citations), not just a string.
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn
            }
        },
    )

    return response

In [30]:
query = "What is Amazon Bedrock?"

response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
generated_text = response['output']['text']
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])
print(f"---------- Generated using Anthropic Claude 3 Sonnet:")
pp.pprint(generated_text )
print(f'---------- The citations for the response:')
pp.pprint(contexts)
print(kb_id)

---------- Generated using Anthropic Claude 3 Sonnet:
('Amazon Bedrock is a fully managed service that provides access to '
 'high-performing foundation models (FMs) from leading AI companies and Amazon '
 'through a unified API. It allows you to experiment with and evaluate '
 'different foundation models, customize them with your own data using '
 'techniques like fine-tuning and Retrieval Augmented Generation (RAG), and '
 'build agents that can execute tasks using your systems and data sources. '
 "With Amazon Bedrock's serverless experience, you can get started quickly, "
 'customize foundation models with your data, and easily integrate and deploy '
 'them into your applications using AWS tools without managing any '
 'infrastructure.')
---------- The citations for the response:
[ '............ 2009     xvAmazon Bedrock User Guide     What is Amazon '
  'Bedrock?     Amazon Bedrock is a fully managed service that makes '
  'high-performing foundation models (FMs) from leading AI 

In [31]:
query = "Is it possible to purchase provisioned throughput for Anthropic Claude Sonnet on Amazon Bedrock?"

response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
generated_text = response['output']['text']
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])
print(f"---------- Generated using Anthropic Claude 3 Sonnet:")
pp.pprint(generated_text )
print(f'---------- The citations for the response:')
pp.pprint(contexts)
print()

---------- Generated using Anthropic Claude 3 Sonnet:
('Yes, it is possible to purchase provisioned throughput for Anthropic Claude '
 'Sonnet models on Amazon Bedrock. Specifically, you can purchase provisioned '
 'throughput for the following Anthropic Claude Sonnet models:\n'
 '\n'
 '- Anthropic Claude 3 Sonnet 28K (model ID: '
 'anthropic.claude-3-sonnet-20240229-v1:0:28k)\n'
 '- Anthropic Claude 3 Sonnet 200K (model ID: '
 'anthropic.claude-3-sonnet-20240229-v1:0:200k)\n'
 '- Anthropic Claude 3.5 Sonnet 18K (model ID: '
 'anthropic.claude-3-5-sonnet-20240620-v1:0:18k) - Only available in the US '
 'West (Oregon) region\n'
 '- Anthropic Claude 3.5 Sonnet 51K (model ID: '
 'anthropic.claude-3-5-sonnet-20240620-v1:0:51k) - Only available in the US '
 'West (Oregon) region\n'
 '- Anthropic Claude 3.5 Sonnet 200K (model ID: '
 'anthropic.claude-3-5-sonnet-20240620-v1:0:200k) - Only available in the US '
 'West (Oregon) region')
---------- The citations for the response:
[ 'request. Pro

# Contextual Grounding with Amazon Bedrock Guardrails
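
Contextual grounding scores a response on two axes: grounding (is the response supported by the source text?) and relevance (does it address the query?). A response scoring below a configured threshold is blocked. Besides attaching a guardrail to RetrieveAndGenerate, the same check can be run on an arbitrary response with the ApplyGuardrail API. The sketch below assumes that API is available in your boto3 version and Region; the helper names and the `_OfflineStubRuntime` class are hypothetical, and the stub only lets the example run without AWS access.

```python
def build_grounding_content(query, grounding_source, model_output):
    """Tag each text block with the qualifier used by the contextual grounding check."""
    return [
        {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
        {"text": {"text": query, "qualifiers": ["query"]}},
        {"text": {"text": model_output, "qualifiers": ["guard_content"]}},
    ]


def guardrail_intervened(runtime_client, guardrail_id, guardrail_version,
                         query, grounding_source, model_output):
    """Return True if the guardrail blocked the model output."""
    response = runtime_client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="OUTPUT",
        content=build_grounding_content(query, grounding_source, model_output),
    )
    return response["action"] == "GUARDRAIL_INTERVENED"


class _OfflineStubRuntime:
    """Stand-in for the bedrock-runtime client (illustration only)."""

    def apply_guardrail(self, **kwargs):
        self.last_request = kwargs
        return {"action": "NONE", "outputs": [], "assessments": []}


stub = _OfflineStubRuntime()
blocked = guardrail_intervened(
    stub, "GUARDRAIL_ID_PLACEHOLDER", "DRAFT",
    query="What is Amazon Bedrock?",
    grounding_source="Amazon Bedrock is a fully managed service.",
    model_output="Amazon Bedrock is a fully managed service.",
)
print(blocked)
```

In the notebook, you would pass a `bedrock-runtime` client together with the guardrail ID and version created in this section.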

In [32]:
# Create guardrail
bedrock_client = boto3.client('bedrock')
guardrail_name = f"bedrock-rag-grounding-guardrail-{random_id}"
print(guardrail_name)
guardrail_response = bedrock_client.create_guardrail(
    name=guardrail_name,
    description='Guardrail for ensuring relevance and grounding of model responses in RAG powered chatbot',
    contextualGroundingPolicyConfig={
        'filtersConfig': [
            {
                'type': 'GROUNDING',
                'threshold': 0.5
            },
            {
                'type': 'RELEVANCE',
                'threshold': 0.5
            },
        ]
    },
    blockedInputMessaging='Can you please rephrase your question?',
    blockedOutputsMessaging='Sorry, I am not able to find the correct answer to your query - Can you try reframing your query to be more specific'
)

bedrock-rag-grounding-guardrail-l2678edb


In [33]:
guardrailId = guardrail_response['guardrailId']
guardrail_response

{'ResponseMetadata': {'RequestId': 'a2b6d23a-fe09-4ad6-a905-35179f43f22c',
  'HTTPStatusCode': 202,
  'HTTPHeaders': {'date': 'Sun, 10 Nov 2024 18:02:49 GMT',
   'content-type': 'application/json',
   'content-length': '172',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'a2b6d23a-fe09-4ad6-a905-35179f43f22c'},
  'RetryAttempts': 0},
 'guardrailId': 'nj44ju4v72kr',
 'guardrailArn': 'arn:aws:bedrock:us-west-2:040238304754:guardrail/nj44ju4v72kr',
 'version': 'DRAFT',
 'createdAt': datetime.datetime(2024, 11, 10, 18, 2, 49, 42725, tzinfo=tzlocal())}

In [34]:
guardrail_version = bedrock_client.create_guardrail_version(
    guardrailIdentifier=guardrail_response['guardrailId'],
    description='Working version of RAG app guardrail with higher thresholds for contextual grounding'
)
print(guardrail_version)
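# Note: this is the version returned by create_guardrail ('DRAFT'), not the numbered version
# created above; use guardrail_version['version'] to pin the numbered version instead.
guardrailVersion = guardrail_response['version']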
print(guardrailId)
%store guardrailId

{'ResponseMetadata': {'RequestId': '705cd743-ebd9-4b9d-b8f4-d7bf06744f8d', 'HTTPStatusCode': 202, 'HTTPHeaders': {'date': 'Sun, 10 Nov 2024 18:02:49 GMT', 'content-type': 'application/json', 'content-length': '44', 'connection': 'keep-alive', 'x-amzn-requestid': '705cd743-ebd9-4b9d-b8f4-d7bf06744f8d'}, 'RetryAttempts': 0}, 'guardrailId': 'nj44ju4v72kr', 'version': '1'}
nj44ju4v72kr
Stored 'guardrailId' (str)


In [35]:
# Retrieve and Generate using Guardrail

bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region)


def retrieve_and_generate_with_guardrail(
    query,
    kb_id,
    model_arn=llm_model_id,
    session_id=None
):

    # Adjacent string literals are concatenated, which avoids stray whitespace from backslash continuations.
    prompt_template = (
        "You are a helpful AI assistant that answers user questions about Amazon Bedrock. "
        "Answer the user query based on the context retrieved. If you don't know the answer, don't make up anything. "
        "Only answer based on what you know from the provided context. "
        "You can ask the user clarifying questions if anything is unclear, "
        "but generate an answer only when you are confident about it and it is based on the provided context.\n"
        "User Query: $query$\n"
        "Context: $search_results$"
    )

    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'guardrailConfiguration': {
                        'guardrailId': guardrailId,
                        'guardrailVersion': guardrailVersion
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'topP': 0.25
                        }
                    },
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    }
                },
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'overrideSearchType': 'SEMANTIC'
                    }
                }
            }
        }
    )
    return response

In [36]:
# Query the knowledge base with the guardrail applied

query = 'What is Amazon Bedrock?'
#query = "Is it possible to purchase provisioned throughput for Anthropic Claude Sonnet on Amazon Bedrock?"

model_response = retrieve_and_generate_with_guardrail(query, kb_id)

print(model_response)

{'ResponseMetadata': {'RequestId': '70114096-52d5-4753-b31c-27bcbcf0f2aa', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 10 Nov 2024 18:02:57 GMT', 'content-type': 'application/json', 'content-length': '1545', 'connection': 'keep-alive', 'x-amzn-requestid': '70114096-52d5-4753-b31c-27bcbcf0f2aa'}, 'RetryAttempts': 0}, 'citations': [{'generatedResponsePart': {'textResponsePart': {'span': {'end': 50, 'start': 0}, 'text': 'Sorry, I am unable to assist you with this request.'}}, 'retrievedReferences': []}], 'guardrailAction': 'NONE', 'output': {'text': 'According to the context provided, Amazon Bedrock is a fully managed service from AWS that provides access to high-performing foundation models (FMs) from leading AI companies and Amazon through a unified API.\n\nSome key points about Amazon Bedrock:\n\n- It allows you to choose from a wide range of foundation models to find the best one for your use case.\n- It offers capabilities to build generative AI applications with security, p

# Evaluating RAG with RAGAS

In [37]:
import boto3
import pprint
from botocore.client import Config
from langchain.llms.bedrock import Bedrock
from langchain_community.chat_models.bedrock import BedrockChat
from langchain.embeddings import BedrockEmbeddings
from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever
from langchain.chains import RetrievalQA

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config
                              )

llm_for_text_generation = BedrockChat(model_id=llm_model_id, client=bedrock_client)

llm_for_evaluation = BedrockChat(model_id=llm_model_id, client=bedrock_client)

bedrock_embeddings = BedrockEmbeddings(model_id=embedding_model_id,client=bedrock_client)



In [38]:
import pandas as pd

# Load the question / ground-truth pairs and drop incomplete rows
test = pd.read_csv('data/bedrock-user-guide-test.csv')
test = test.dropna()

# Show full cell contents instead of truncating long answers
with pd.option_context("display.max_colwidth", None):
    pretty_print(test)

Unnamed: 0,Question/prompt,Correct answer
0,Are all models accessible on Amazon Bedrock by default?,"Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has sufficent IAM permissions to manage access to foundation models. Then, add or remove access to a model by following the instructions at Add or remove access to Amazon Bedrock foundation models."
1,What is the Model ID of Amazon Titan Text Premier,amazon.titan-text-premier-v1:0
2,With which Anthropic Claude models can I use the Text Completions API?,"Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1"
3,What policies can I configure in Amazon Bedrock guardrails?,"You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters – Adjust filter strengths to block input prompts or model responses containing harmful content. Denied topics – Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. Word filters – Configure filters to block undesirable words, phrases, and profanity. Such words can include offensive terms, competitor names etc. Sensitive information filters – Block or mask sensitive information such as personally identifiable information (PII) or custom regex in user inputs and model responses. Contextual grounding check – Detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query."
4,Which built in datasets are available on Amazon Bedrock for model evaluation of text generation?,"The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation."


In [39]:
from datasets import Dataset

questions = test['Question/prompt'].tolist()
ground_truths = [[gt] for gt in test['Correct answer'].tolist()]

answers = []
contexts = []

for query in questions:
    response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
    generatedResult = response['output']['text']
    answers.append(generatedResult)
    contexts.append([doc['content']['text'] for doc in response['citations'][0]['retrievedReferences']])

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truths": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)
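
The loop above reads retrieved references from the first citation only (`response['citations'][0]`). When a response carries several citations, the contexts behind the later ones are not included in the evaluation dataset. A minimal sketch of a helper that gathers the text of every retrieved reference is shown below. The name `collect_contexts` and the sample response are hypothetical; only the key layout (`citations`, `retrievedReferences`, `content`, `text`) is taken from the code above.

```python
from typing import List


def collect_contexts(response: dict) -> List[str]:
    """Return the text of every retrieved reference, without duplicates."""
    contexts: List[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            text = ref.get("content", {}).get("text")
            # Skip empty entries and chunks already collected from another citation.
            if text and text not in contexts:
                contexts.append(text)
    return contexts


# Hypothetical response with two citations that share one chunk.
sample_response = {
    "citations": [
        {"retrievedReferences": [{"content": {"text": "chunk A"}}]},
        {"retrievedReferences": [{"content": {"text": "chunk A"}},
                                 {"content": {"text": "chunk B"}}]},
    ]
}

all_contexts = collect_contexts(sample_response)
print(all_contexts)
```

An empty list from this helper is also a quick signal that the answer was generated without any retrieved context.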

In [40]:
%%capture
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    context_entity_recall,
    answer_similarity,
    answer_correctness
)

from ragas.metrics.critique import correctness

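# Metrics to compute. Only answer_relevancy is enabled here; add others from
# the import above as needed (for example faithfulness, which checks the
# answer against the retrieved context).
metrics = [
    answer_relevancy,
]

result = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=llm_for_evaluation,
    embeddings=bedrock_embeddings,
)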

ragas_df = result.to_pandas()

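AttributeError: type object 'Dataset' has no attribute 'from_list'

This error most likely means the installed `datasets` package is too old to provide `Dataset.from_list`, which is not called anywhere in the cell above and so is presumably called by RAGAS internally. Upgrading `datasets` and re-running the cell should resolve it. The `NameError` in the next cell follows from this failure, because `ragas_df` is never assigned.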

In [41]:
# Show full cell contents instead of truncating long answers
with pd.option_context("display.max_colwidth", None):
    pretty_print(ragas_df)

NameError: name 'ragas_df' is not defined
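
Of the RAGAS metrics imported above, `faithfulness` is the one most directly tied to hallucinations: it is the fraction of claims in the generated answer that can be inferred from the retrieved context. The sketch below only illustrates that ratio. In RAGAS the claims and their verdicts are produced by the evaluation LLM; here the verdicts are hypothetical, hand-labeled values, and returning `None` for an answer with no claims is an assumption of this sketch.

```python
from typing import List, Optional


def faithfulness_score(claim_verdicts: List[bool]) -> Optional[float]:
    """Fraction of answer claims supported by the retrieved context."""
    if not claim_verdicts:
        # Assumption of this sketch: no claims means the score is undefined.
        return None
    return sum(claim_verdicts) / len(claim_verdicts)


# Hypothetical verdicts for an answer that makes four claims,
# one of which is not supported by the retrieved context.
verdicts = [True, True, False, True]
score = faithfulness_score(verdicts)
print(score)  # 0.75
```

A score below 1.0 indicates that at least one statement in the answer is not grounded in the retrieved context, which is the behavior this lab treats as a hallucination.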

### <a >Challenge Exercise :: Try it Yourself! </a>


<div style="border: 4px solid coral; text-align: left; margin: auto;">
    <br>
    <p style="text-align: center; margin: auto;"><b>Try the following exercises in this lab and note your observations.</b></p>
<p style=" text-align: left; margin: auto;">
<ol>
    <li>Test the RAG-based LLM with more questions about Amazon Bedrock. </li>
<li>Look at the citations or retrieved references and check whether the answer generated by the RAG chatbot aligns with these retrieved contexts. What response do you get when the retrieved context comes up empty? </li>
<li>Apply system prompts to RAG as well as Amazon Bedrock Guardrails, and test which is more consistent in blocking responses when the model response is hallucinated. </li>
<li>Run the tutorial for RAGChecker and compare it with the RAGAS evaluation framework: https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md </li>
</ol>
<br>
</p>
</div>
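
For the third exercise, one way to compare consistency is to send the same question several times under each approach and measure how often the response is blocked. The sketch below is a minimal illustration under stated assumptions: a guardrail block is detected through `guardrailAction == "INTERVENED"`, a system-prompt block is detected by a refusal phrase in the answer text, and the function names, the refusal marker, and all sample responses are hypothetical.

```python
from typing import Callable, Iterable


def blocked_by_guardrail(response: dict) -> bool:
    """A guardrail block is reported in the guardrailAction field."""
    return response.get("guardrailAction") == "INTERVENED"


def blocked_by_system_prompt(response: dict, refusal_marker: str = "i don't know") -> bool:
    """Assumption: the system prompt tells the model to refuse with this phrase."""
    answer = response.get("output", {}).get("text", "")
    return refusal_marker in answer.lower()


def block_rate(responses: Iterable[dict], is_blocked: Callable[[dict], bool]) -> float:
    """Fraction of responses that were blocked according to is_blocked."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(1 for r in responses if is_blocked(r)) / len(responses)


# Hypothetical results from repeating one out-of-scope question four times.
guardrail_runs = [
    {"guardrailAction": "INTERVENED", "output": {"text": "Sorry, I am unable to assist you with this request."}},
    {"guardrailAction": "INTERVENED", "output": {"text": "Sorry, I am unable to assist you with this request."}},
    {"guardrailAction": "NONE", "output": {"text": "Amazon Bedrock supports this."}},
    {"guardrailAction": "INTERVENED", "output": {"text": "Sorry, I am unable to assist you with this request."}},
]
system_prompt_runs = [
    {"guardrailAction": "NONE", "output": {"text": "I don't know."}},
    {"guardrailAction": "NONE", "output": {"text": "Yes, it is possible."}},
    {"guardrailAction": "NONE", "output": {"text": "I don't know based on the context."}},
    {"guardrailAction": "NONE", "output": {"text": "It depends on the region."}},
]

guardrail_rate = block_rate(guardrail_runs, blocked_by_guardrail)
system_prompt_rate = block_rate(system_prompt_runs, blocked_by_system_prompt)
print(guardrail_rate, system_prompt_rate)
```

With real responses, a higher and more stable block rate on questions the knowledge base cannot answer suggests the more consistent approach.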


## Conclusion
We now understand which parameters influence hallucinations in Large Language Models. We learned how to set up Retrieval Augmented Generation to give the model context when answering.
We used contextual grounding in Amazon Bedrock Guardrails to intervene when hallucinations are detected.
Finally, we looked at the RAGAS metrics and how to use them to measure hallucinations in your RAG-powered chatbot.

In the next lab, we will:
1. Build a custom hallucination detector
2. Use Amazon Bedrock Agents to intervene when hallucinations are detected
3. Call a human for support when the LLM hallucinates
