<center><img src="images/2024_reInvent_Logo_wDate_Black_V3.png" alt="drawing" width="400" style="background-color:white; padding:1em;" /></center> <br/>

# <a name="0">re:Invent 2024 | Lab 1: Build your RAG powered chatbot  </a>
## <a name="0">Build a chatbot with Knowledge Bases and Guardrails to detect and remediate hallucinations </a>

## Lab Overview
In this lab, you will:
1. Take a deeper look at which LLM parameters influence or control for model hallucinations
2. Understand how Retrieval Augmented Generation can control for hallucinations
3. Apply contextual grounding in Amazon Bedrock Guardrails to intervene when a model hallucinates
4. Use RAGAS evaluation and understand which metrics help us measure hallucinations

## Dataset
For this workshop, we will use the [Bedrock User Guide](https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf) available as a PDF file.
## Use-Case Overview
In this lab, we want to develop a chatbot that can answer questions about Amazon Bedrock as factually as possible. We will work with Retrieval Augmented Generation using [Amazon Bedrock Knowledge Bases](https://aws.amazon.com/bedrock/knowledge-bases/) and apply [Amazon Bedrock Guardrails](https://aws.amazon.com/bedrock/guardrails/) to intervene when hallucinations are detected.


#### Lab Sections

This lab notebook has the following sections:
    
Please work through this notebook from top to bottom and don't skip sections, as skipping could lead to error messages due to missing code.


----

# Star the GitHub repository for future reference

In [1]:
%%html

<a class="github-button" href="https://github.com/aws-samples/responsible_ai_aim325_reduce_hallucinations_for_genai_apps" data-color-scheme="no-preference: light; light: light; dark: dark;" data-icon="octicon-star" data-size="large" data-show-count="true" aria-label="Star Reduce Hallucinations workshop on GitHub">Star</a>
<script async defer src="https://buttons.github.io/buttons.js"></script>

# Environment Setup

In [4]:
%pip install -r ../requirements.txt --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gluonts 0.13.7 requires pydantic~=1.7, but you have pydantic 2.9.2 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [5]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [8]:
import time
import os
import json
import boto3
from time import gmtime, strftime, sleep
import random
import zipfile
import uuid
from rag_setup.create_kb_utils import *
import warnings
warnings.filterwarnings('ignore')
from botocore.config import Config

import numpy as np  
import pandas as pd 
import sagemaker
from botocore.exceptions import ClientError

import pprint
pp = pprint.PrettyPrinter(indent=4)

(sagemaker.__version__,boto3.__version__)





sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


('2.227.0', '1.35.15')

## Set constants

In [10]:
# Get some variables you need to interact with SageMaker service
boto_session = boto3.Session()
region = boto_session.region_name

In [8]:
embedding_model_id="amazon.titan-embed-text-v2:0"
llm_model_id="anthropic.claude-3-sonnet-20240229-v1:0"

In [11]:
# Test if bedrock model access has been enabled
input_prompt = "Who was the first person to land on the sun?"
test_llm_call(input_prompt)



"No one has ever landed on the sun. The sun is a star with extremely hot temperatures and harsh conditions that make landing on its surface impossible with current technology.\n\nThe sun's surface temperature is around 5,500°C (9,940°F). Its powerful gravitational pull and lack of a solid surface also make landing unfeasible. Any spacecraft would burn up long before reaching the sun's photosphere (visible surface) due to the intense heat and radiation.\n\nVisiting the sun up close has been done only by unmanned spacecraft like NASA's Parker Solar Probe, which flew through the sun's outer atmosphere in 2018-2019 to study the solar corona and solar wind. But even this probe did not actually land on the sun's surface. Landing humans or machines on the sun remains science fiction for now."

<div style="border: 4px solid coral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px">
    <h4>If LLM call to Bedrock did not work, enable model access on Amazon Bedrock console</h4>
</div>
<br/>

# 1. Chat with Anthropic Claude 3 Sonnet through Bedrock

In [26]:
RETRY_CONFIG = Config(
    retries={
        'max_attempts': 5,            # Maximum number of retry attempts
        'mode': 'adaptive'            # Adaptive mode adjusts based on request limits
    },
    read_timeout=1000,
    connect_timeout=1000
)

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
    config=RETRY_CONFIG)

def generate_message_claude(
    query, 
    system_prompt="", 
    max_tokens=1000,
    model_id='anthropic.claude-3-sonnet-20240229-v1:0',
    temperature=0.9,
    top_p=0.99,
    top_k=100
):
    # Prompt with user turn only.
    user_message = {"role": "user", "content": query}
    messages = [user_message]
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages,
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k
        }
    )

    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
    return response_body['content'][0]['text']

In [30]:
query = 'How does Amazon Bedrock Guardrails work?'

response = generate_message_claude(query)
pp.pprint(response)

('Amazon Bedrock Guardrails is a service provided by AWS that helps '
 'organizations establish and enforce secure baselines across their AWS '
 'accounts and resources. It allows organizations to define and implement '
 'guardrails, which are rules or controls that enforce best practices and '
 'ensure compliance with organizational policies and industry standards.\n'
 '\n'
 "Here's how Amazon Bedrock Guardrails works:\n"
 '\n'
 '1. Guardrail Definition: Organizations can define guardrails using '
 'Infrastructure as Code (IaC) templates, such as AWS CloudFormation or AWS '
 'CDK. These guardrails can cover various areas, including identity and access '
 'management, networking, storage, compute, and more.\n'
 '\n'
 '2. Guardrail Deployment: Bedrock Guardrails provides a centralized mechanism '
 'for deploying and managing guardrails across multiple AWS accounts and '
 'regions within an organization. This ensures consistent application of '
 "security controls and compliance requirem

## 1.1 Apply System Prompt

In [31]:
query = 'Is it possible to purchase provisioned throughput for Anthropic Claude models on Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt)
pp.pprint(response)

('I do not have any information about purchasing provisioned throughput for '
 'Anthropic Claude models on Amazon Bedrock. Amazon Bedrock is not a service '
 "I'm familiar with, and I don't have specific details about Anthropic's cloud "
 'offerings or partnerships with Amazon Web Services. My knowledge is limited '
 'in this domain.')


In [32]:
query = 'How do Amazon Bedrock Guardrails work?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt)
pp.pprint(response)

('Amazon Bedrock Guardrails is a service provided by AWS that helps '
 'organizations govern their use of AWS through automated, continuous '
 'monitoring and enforcement of policies and best practices. It allows '
 'organizations to define and apply guardrails across their AWS environments '
 'to help ensure compliance with security, operational, and cost management '
 'requirements.\n'
 '\n'
 "Here's a high-level overview of how Amazon Bedrock Guardrails work:\n"
 '\n'
 '1. Guardrail Definition: Organizations define guardrails as code using AWS '
 'CloudFormation templates or AWS Service Catalog products. Guardrails can '
 'cover various areas such as identity and access management, data protection, '
 'logging and monitoring, and resource quotas.\n'
 '\n'
 "2. Deployment: The defined guardrails are deployed across the organization's "
 'AWS accounts and regions using AWS Organizations and AWS Control Tower.\n'
 '\n'
 '3. Continuous Monitoring: Bedrock Guardrails continuously monitor

## 1.2 Understanding LLM generation parameters
### 1. Temperature: The amount of randomness injected into the response.
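To build intuition for what temperature does, here is a minimal sketch that rescales a made-up set of next-token scores before applying softmax. The logits and the helper name `softmax_with_temperature` are illustrative assumptions, not Claude's internals, and temperature 0 is treated here as greedy decoding for simplicity.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A lower temperature makes sampling more deterministic, but it does not make the most likely token factually correct, which is why the model can still hallucinate at `temperature=0`.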

In [33]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=1)
pp.pprint(response)

('Amazon Bedrock is an Amazon Web Services (AWS) service that provides '
 'businesses with the ability to build and deploy secure and resilient '
 'applications on a foundational layer of systems, runtimes, and services.\n'
 '\n'
 'Some key features and capabilities of Amazon Bedrock include:\n'
 '\n'
 '1) Automated Provisioning: It automates the provisioning and configuration '
 'of AWS accounts, networking, logging, and other foundational services '
 'according to best practices.\n'
 '\n'
 '2) Resilient Architecture: Bedrock provides a resilient architecture with '
 'multiple Availability Zones to protect applications from failures.\n'
 '\n'
 '3) Security Best Practices: It implements security best practices such as '
 'centralized logging, encryption of data at rest and in transit, and secure '
 'network configurations.\n'
 '\n'
 '4) Operational Visibility: Bedrock provides operational visibility through '
 'centralized logging and monitoring for AWS accounts provisioned using the '

In [34]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=0)
pp.pprint(response)

('Amazon Bedrock is a real-time operating system developed by Amazon for '
 'running applications on resource-constrained devices like microcontrollers '
 'and sensors.\n'
 '\n'
 'Some key points about Amazon Bedrock:\n'
 '\n'
 '- It is designed to be a secure, real-time operating system for internet of '
 'things (IoT) devices and embedded applications.\n'
 '\n'
 '- It provides a lightweight kernel and built-in libraries to enable '
 'real-time performance and efficient resource utilization on devices with '
 'limited compute power and memory.\n'
 '\n'
 '- It supports common microcontroller architectures like ARM, RISC-V, and '
 'x86.\n'
 '\n'
 '- Bedrock aims to simplify development and deployment of IoT applications by '
 'providing a consistent software foundation across different hardware '
 'platforms.\n'
 '\n'
 '- It includes built-in security features like memory protection, encrypted '
 'communication, and secure boot capabilities.\n'
 '\n'
 '- Bedrock integrates with AWS IoT 

### 2. top_p – Use nucleus sampling.

In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
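The cut-off described above can be sketched on a toy distribution. The token probabilities and the helper name `nucleus_filter` are made up for illustration; the real model works over its full vocabulary.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete; drop the remaining tail
    total = sum(p for _, p in kept)
    # Renormalize so the surviving probabilities sum to 1 before sampling.
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution
toy_probs = {"service": 0.5, "platform": 0.3, "OS": 0.15, "game": 0.05}
print(nucleus_filter(toy_probs, top_p=0.75))
```

With `top_p=1` nothing is cut off, so the long tail of unlikely tokens stays available for sampling.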

In [35]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=1, top_p=1)
pp.pprint(response)

('Amazon Bedrock is a secure, multi-tenant cloud service from Amazon Web '
 'Services (AWS) that makes it easier to set up and manage virtual private '
 'clouds.\n'
 '\n'
 'Some key points about Amazon Bedrock:\n'
 '\n'
 '- It provides a secure and governed way to create and manage multiple '
 'virtual private clouds (VPCs) across different AWS accounts and regions.\n'
 '\n'
 '- It uses a centralized approach to manage VPCs, networking resources, '
 'accounts, and teams through a single pane of glass.\n'
 '\n'
 '- It helps organizations gain operational visibility and control across '
 'their cloud networking environments.\n'
 '\n'
 '- It enables automated provisioning and secure connectivity between AWS '
 'resources spread across different VPCs and accounts.\n'
 '\n'
 '- It integrates with AWS services like AWS Organizations, AWS Control Tower, '
 'AWS Firewall Manager for central governance and compliance.\n'
 '\n'
 '- It aims to simplify and standardize how enterprises set up and o

### 3. top_k: Only sample from the top K options for each subsequent token.

Use top_k to remove long-tail, low-probability responses.
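As a rough illustration, top_k filtering keeps only the K most likely candidates and renormalizes them. The distribution and the helper name `top_k_filter` below are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}

# Hypothetical next-token distribution
toy_probs = {"service": 0.5, "platform": 0.3, "OS": 0.15, "game": 0.05}
print(top_k_filter(toy_probs, k=2))
```

Unlike top_p, the number of surviving candidates is fixed at K regardless of how the probability mass is distributed.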

In [36]:
query = 'What is Amazon Bedrock?'
system_prompt = 'You are a helpful AI assistant. You try to answer the user queries to the best of your knowledge. If you are unsure of the answer, do not make up any information.'

response = generate_message_claude(query, system_prompt, temperature=0, top_p=1, top_k=100)
pp.pprint(response)

('Amazon Bedrock is a real-time operating system developed by Amazon for '
 'running applications on resource-constrained devices like microcontrollers '
 'and sensors.\n'
 '\n'
 'Some key points about Amazon Bedrock:\n'
 '\n'
 '- It is designed to be a secure, real-time operating system for internet of '
 'things (IoT) devices and embedded applications.\n'
 '\n'
 '- It provides a lightweight environment with real-time performance for '
 'running multiple software components concurrently.\n'
 '\n'
 '- It supports common IoT communication protocols like MQTT, BLE, and Wi-Fi '
 'out of the box.\n'
 '\n'
 '- It includes built-in security features like code signing, encrypted '
 'storage, secure boot, and hardware security integration.\n'
 '\n'
 '- It allows developers to build and deploy applications in multiple '
 'programming languages like C, Rust, and Python.\n'
 '\n'
 '- Bedrock is open source and available under the Apache 2.0 license on '
 'GitHub.\n'
 '\n'
 '- It is optimized to r

# Retrieval Augmented Generation
We are using the Retrieval Augmented Generation (RAG) technique with Amazon Bedrock. A RAG implementation consists of two parts:

1. A data pipeline that ingests data from documents (typically stored in Amazon S3) into a Knowledge Base, i.e. a vector database such as Amazon OpenSearch Service Serverless (AOSS), so that it is available for lookup when a question is received.

The data pipeline represents undifferentiated heavy lifting and can be implemented using Amazon Bedrock Knowledge Bases. We can connect an S3 bucket to a vector database such as AOSS and have Bedrock Knowledge Bases read the objects (HTML, PDF, text, etc.), chunk them, convert the chunks into embeddings using the Amazon Titan Embeddings model, and store the embeddings in AOSS, all without having to build, deploy, and manage the data pipeline.

<center><img src="images/fully_managed_ingestion.png" alt="This image shows how Amazon Bedrock Knowledge Bases ingests objects in an S3 bucket into the Knowledge Base for use in a RAG setup. The objects are chunked, embedded, and then stored in a vector index." height="700" width="700" style="background-color:white; padding:1em;" /></center> <br/>
    

2. An application that receives a question from the user, looks up the knowledge base for relevant pieces of information (context), and then creates a prompt that includes the question and the context and provides it to an LLM for generating a response.






Once the data is available in the Bedrock knowledge base, then user questions can be answered using the following system design:

<center><img src="images/retrieveAndGenerate.png" alt="This image shows the retrieval augmented generation (RAG) system design setup with knowledge bases, S3, and AOSS. Knowledge corpus is ingested into a vector database using Amazon Bedrock Knowledge Base Agent and then RAG approach is used to work question answering. The question is converted into embeddings followed by semantic similarity search to get similar documents. With the user prompt being augmented with the RAG search response, the LLM is invoked to get the final raw response for the user." height="700" width="700" style="background-color:white; padding:1em;" /></center> <br/>
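The question-answering flow can be sketched locally without any AWS calls. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the two chunks are hand-written examples, and the helper names are hypothetical. The managed service performs these steps for you.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, context_chunks):
    """Augment the user question with the retrieved context."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hand-written example chunks standing in for an ingested document
chunks = [
    "Amazon Bedrock is a fully managed service that provides access to foundation models.",
    "Guardrails can apply a contextual grounding check to model responses.",
]
question = "What is Amazon Bedrock?"
print(build_prompt(question, retrieve(question, chunks)))
```

The resulting prompt is what would be sent to the LLM, so the model answers from the retrieved text rather than from memory alone.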


# Data
Let's use the publicly available [Bedrock user guide](https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf) to inform the model.

In [21]:
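# `!export` runs in a temporary subshell and is lost immediately; `%env` sets the variable for the kernel process and anything it launches
%env PYTHONPATH=./lab1/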

In [None]:
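# Control-plane client for Knowledge Bases (rag_setup.create_kb_utils may already provide one under this name)
bedrock_agent_client = boto3.client('bedrock-agent', region_name=region)

# Look up the ID of the pre-provisioned knowledge base by name
kb_id = None
kb_list = bedrock_agent_client.list_knowledge_bases()['knowledgeBaseSummaries']
for kb in kb_list:
    if kb['name'] == 'bedrock_user_guide_kb':
        kb_id = kb['knowledgeBaseId']
        break

if kb_id is None:
    print("Please navigate to Amazon Bedrock > Builder Tools > Knowledge Bases. Click on the 'bedrock_user_guide_kb' KB. Go to the Data source section and click the `Sync` button. Please wait for it to finish, then re-run this cell.")
print(kb_id)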

STU0PK5QDL


In [23]:
# keep the kb_id for invocation later in the invoke request
%store kb_id

Stored 'kb_id' (str)
Stored 'bucket_name' (str)


# Chat with the model using the knowledge base by providing the generated KB_ID
### Using RetrieveAndGenerate API
Behind the scenes, the RetrieveAndGenerate API converts queries into embeddings, searches the knowledge base, augments the foundation model (FM) prompt with the search results as context, and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manages short-term memory of the conversation to provide more contextual results. The output of the RetrieveAndGenerate API includes the generated response, source attribution, and the retrieved text chunks.
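The response is a nested dictionary, so a small helper for pulling out the answer and the retrieved chunks can be handy. The sketch below runs against a hand-written stand-in rather than a live API call. The helper name and the mock values are assumptions; the key names follow the response fields used in this notebook.

```python
def extract_answer_and_contexts(response):
    """Return the generated text and a flat list of retrieved chunk texts."""
    answer = response["output"]["text"]
    contexts = [
        ref["content"]["text"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
    ]
    return answer, contexts

# Hand-written stand-in for an API response; the values are made up.
mock_response = {
    "output": {"text": "Amazon Bedrock is a fully managed service."},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "chunk A"}},
            {"content": {"text": "chunk B"}},
        ]},
        {"retrievedReferences": []},
    ],
}
answer, contexts = extract_answer_and_contexts(mock_response)
print(answer)
print(contexts)
```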

In [26]:
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region)


def ask_bedrock_llm_with_knowledge_base(query,
                                        kb_id=kb_id,
                                        model_arn=llm_model_id,
                                        ) -> dict:
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn
            }
        },
    )

    return response

In [27]:
query = "What is Amazon Bedrock?"

response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
generated_text = response['output']['text']
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])
print(f"---------- Generated using Anthropic Claude 3 Sonnet:")
pp.pprint(generated_text )
print(f'---------- The citations for the response:')
pp.pprint(contexts)

---------- Generated using Anthropic Claude 3 Sonnet:
('Amazon Bedrock is a fully managed service that provides access to '
 'high-performing foundation models (FMs) from leading AI companies and Amazon '
 'through a unified API. It allows you to experiment with and evaluate '
 'different foundation models, customize them with your own data using '
 'techniques like fine-tuning and Retrieval Augmented Generation (RAG), and '
 'build agents that can execute tasks using your systems and data sources. '
 "With Amazon Bedrock's serverless experience, you can get started quickly, "
 'customize foundation models with your data, and easily integrate and deploy '
 'them into your applications using AWS tools without managing any '
 'infrastructure.')
---------- The citations for the response:
[ '........... 1652     Amazon Bedrock Runtime '
  '................................................................................................................... '
  '1654 Basics '
  '..............

In [28]:
query = "Is it possible to purchase provisioned throughput for Anthropic Claude Sonnet on Amazon Bedrock?"

response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
generated_text = response['output']['text']
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])
print(f"---------- Generated using Anthropic Claude 3 Sonnet:")
pp.pprint(generated_text )
print(f'---------- The citations for the response:')
pp.pprint(contexts)

---------- Generated using Anthropic Claude 3 Sonnet:
('Yes, it is possible to purchase provisioned throughput for Anthropic Claude '
 'Sonnet models on Amazon Bedrock. The search results show that the following '
 'Anthropic Claude Sonnet models are supported for provisioned throughput:\n'
 '\n'
 '- Claude 3 Sonnet 28k\n'
 '- Anthropic Claude 3.5 Sonnet 18k\n'
 '- Anthropic Claude 3.5 Sonnet 200k\n'
 '- Anthropic Claude 3.5 Sonnet 51k The search results provide details on the '
 'regions where provisioned throughput can be purchased for these models, as '
 'well as whether no-commitment purchases are supported for the base models.')
---------- The citations for the response:
[ 'Claude 3 Sonnet 28k     anthropic.claude-3- sonnet-20240229-v 1:0:28k     '
  'us-east-1     us-west-2     ap-northeast-1     ap-northeast-2     '
  'ap-south-1     ap-southeast-1     ap-southeast-2     eu-west-1     '
  'eu-west-3     Yes     Anthropic Claude 3.5 Sonnet 18k     '
  'anthropic.claude-3-5- sonne

# Contextual Grounding Check with Amazon Bedrock Guardrails
Contextual grounding check evaluates for hallucinations across two paradigms:

- Grounding – This checks if the model response is factually accurate based on the source and is grounded in the source. Any new information introduced in the response will be considered ungrounded.

- Relevance – This checks if the model response is relevant to the user query.
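Each check yields a confidence score that is compared against a threshold configured on the guardrail; responses scoring below a threshold are treated as hallucinations and blocked. The sketch below mimics only that comparison. The scores are made up (the service computes the real ones), the function name is hypothetical, and the return values mirror the `guardrailAction` field of the RetrieveAndGenerate response.

```python
def contextual_grounding_action(grounding_score, relevance_score,
                                grounding_threshold=0.5, relevance_threshold=0.5):
    """Return 'INTERVENED' if either score falls below its threshold, else 'NONE'."""
    if grounding_score < grounding_threshold or relevance_score < relevance_threshold:
        return "INTERVENED"
    return "NONE"

# Hypothetical scores for three responses
print(contextual_grounding_action(0.92, 0.88))  # grounded and relevant
print(contextual_grounding_action(0.31, 0.90))  # introduces unsupported information
print(contextual_grounding_action(0.85, 0.20))  # grounded but off-topic
```

Raising a threshold makes the guardrail stricter: more responses are blocked, including some that may have been acceptable.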

In [29]:
# Create guardrail

random_id_suffix = str(uuid.uuid1())[:6] # get first 6 characters of uuid string to generate guardrail name suffix

bedrock_client = boto3.client('bedrock')
guardrail_name = f"bedrock-rag-grounding-guardrail-{random_id_suffix}"
print(guardrail_name)

guardrail_response = bedrock_client.create_guardrail(
    name=guardrail_name,
    description='Guardrail for ensuring relevance and grounding of model responses in RAG powered chatbot',
    contextualGroundingPolicyConfig={
        'filtersConfig': [
            {
                'type': 'GROUNDING',
                'threshold': 0.5
            },
            {
                'type': 'RELEVANCE',
                'threshold': 0.5
            },
        ]
    },
    blockedInputMessaging='Can you please rephrase your question?',
    blockedOutputsMessaging='Sorry, I am not able to find the correct answer to your query - Can you try reframing your query to be more specific'
)

bedrock-rag-grounding-guardrail-4be13c


In [30]:
guardrailId = guardrail_response['guardrailId']
guardrail_response

{'ResponseMetadata': {'RequestId': '3ab5faab-6f00-4610-90a4-8da1bc5542a1',
  'HTTPStatusCode': 202,
  'HTTPHeaders': {'date': 'Thu, 21 Nov 2024 22:16:46 GMT',
   'content-type': 'application/json',
   'content-length': '172',
   'connection': 'keep-alive',
   'x-amzn-requestid': '3ab5faab-6f00-4610-90a4-8da1bc5542a1'},
  'RetryAttempts': 0},
 'guardrailId': 'vkxoa1vlr9wb',
 'guardrailArn': 'arn:aws:bedrock:us-west-2:615452588358:guardrail/vkxoa1vlr9wb',
 'version': 'DRAFT',
 'createdAt': datetime.datetime(2024, 11, 21, 22, 16, 46, 708786, tzinfo=tzlocal())}

In [31]:
guardrail_version = bedrock_client.create_guardrail_version(
    guardrailIdentifier=guardrail_response['guardrailId'],
    description='Working version of RAG app guardrail with higher thresholds for contextual grounding'
)
print(guardrail_version)

# Use the newly created numbered version, not the 'DRAFT' version returned by create_guardrail
guardrailVersion = guardrail_version['version']
print(guardrailId)

%store guardrailId

{'ResponseMetadata': {'RequestId': 'd109a047-b1eb-4497-adcd-4e16495d1ba9', 'HTTPStatusCode': 202, 'HTTPHeaders': {'date': 'Thu, 21 Nov 2024 22:16:47 GMT', 'content-type': 'application/json', 'content-length': '44', 'connection': 'keep-alive', 'x-amzn-requestid': 'd109a047-b1eb-4497-adcd-4e16495d1ba9'}, 'RetryAttempts': 0}, 'guardrailId': 'vkxoa1vlr9wb', 'version': '1'}
vkxoa1vlr9wb
Stored 'guardrailId' (str)


In [32]:
# Retrieve and Generate using Guardrail

bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region)


def retrieve_and_generate_with_guardrail(
    query,
    kb_id,
    model_arn=llm_model_id,
    session_id=None
):

    # $query$ and $search_results$ are placeholders filled in by Knowledge Bases at request time
    prompt_template = (
        "You are a helpful AI assistant to help users understand documented risks in various projects. "
        "Answer the user query based on the context retrieved. If you don't know the answer, don't make up anything. "
        "Only answer based on what you know from the provided context. You can ask the user clarifying questions if anything is unclear. "
        "But generate an answer only when you are confident about it and based on the provided context.\n"
        "User Query: $query$\n"
        "Context: $search_results$"
    )

    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'guardrailConfiguration': {
                        'guardrailId': guardrailId,
                        'guardrailVersion': guardrailVersion
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'topP': 0.25
                        }
                    },
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    }
                },
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'overrideSearchType': 'SEMANTIC'
                    }
                }
            }
        }
    )
    return response

In [33]:
# Query the knowledge base with the guardrail applied

query = 'What is Amazon Bedrock?'

model_response = retrieve_and_generate_with_guardrail(query, kb_id)

pp.pprint(model_response)

{'ResponseMetadata': {'RequestId': '4f277de6-0b5a-485e-ac8d-5964372008c3', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Thu, 21 Nov 2024 22:16:55 GMT', 'content-type': 'application/json', 'content-length': '1703', 'connection': 'keep-alive', 'x-amzn-requestid': '4f277de6-0b5a-485e-ac8d-5964372008c3'}, 'RetryAttempts': 0}, 'citations': [{'generatedResponsePart': {'textResponsePart': {'span': {'end': 50, 'start': 0}, 'text': 'Sorry, I am unable to assist you with this request.'}}, 'retrievedReferences': []}], 'guardrailAction': 'NONE', 'output': {'text': 'According to the context provided, Amazon Bedrock is a fully managed service from AWS that provides access to high-performing foundation models (FMs) from leading AI companies and Amazon through a unified API.\n\nSome key points about Amazon Bedrock:\n\n- It allows you to choose from a wide range of foundation models to find the one best suited for your use case.\n- It offers capabilities to build generative AI applications with secu

# Evaluating RAG with RAGAS

In [34]:
from botocore.client import Config
from langchain.llms.bedrock import Bedrock
from langchain_community.chat_models.bedrock import BedrockChat
from langchain.embeddings import BedrockEmbeddings
from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever
from langchain.chains import RetrievalQA

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config
                              )

llm_for_text_generation = BedrockChat(model_id=llm_model_id, client=bedrock_client)

llm_for_evaluation = BedrockChat(model_id=llm_model_id, client=bedrock_client)

bedrock_embeddings = BedrockEmbeddings(model_id=embedding_model_id,client=bedrock_client)



In [35]:
import pandas as pd

# Load the test questions and reference answers, dropping incomplete rows
test = pd.read_csv('data/bedrock-user-guide-test.csv')
test = test.dropna()
# Show full cell contents instead of truncating long answers
with pd.option_context("display.max_colwidth", None):
    pp.pprint(test)

Unnamed: 0,Question/prompt,Correct answer
0,Are all models accessible on Amazon Bedrock by default?,"Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has sufficent IAM permissions to manage access to foundation models. Then, add or remove access to a model by following the instructions at Add or remove access to Amazon Bedrock foundation models."
1,What is the Model ID of Amazon Titan Text Premier,amazon.titan-text-premier-v1:0
2,With which Anthropic Claude models can I use the Text Completions API?,"Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1"
3,What policies can I configure in Amazon Bedrock guardrails?,"You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters – Adjust filter strengths to block input prompts or model responses containing harmful content. Denied topics – Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. Word filters – Configure filters to block undesirable words, phrases, and profanity. Such words can include offensive terms, competitor names etc. Sensitive information filters – Block or mask sensitive information such as personally identifiable information (PII) or custom regex in user inputs and model responses. Contextual grounding check – Detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query."
4,Which built in datasets are available on Amazon Bedrock for model evaluation of text generation?,"The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation."


In [43]:
from datasets import Dataset

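# Questions and reference answers from the test set
questions = test['Question/prompt'].tolist()
ground_truth = test['Correct answer'].tolist()

answers = []
contexts = []

# Query the knowledge base for each question; keep the answer and its retrieved chunks
for query in questions:
    response = ask_bedrock_llm_with_knowledge_base(query, kb_id)
    generatedResult = response['output']['text']
    answers.append(generatedResult)
    # Only the references attached to the first citation are used as contexts
    contexts.append([doc['content']['text'] for doc in response['citations'][0]['retrievedReferences']])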

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truth
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)
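
The loop above reads contexts only from the first citation (`response['citations'][0]`). A response may carry several citations, or none when nothing was retrieved, so a more defensive variant could flatten all of them. The sketch below is a minimal illustration under those assumptions: `extract_contexts` is a hypothetical helper that is not part of the workshop code, and `sample_response` is a hand-written stand-in that only mimics the fields used above.

```python
def extract_contexts(response):
    """Collect the text of every retrieved reference across all citations."""
    contexts = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            text = ref.get("content", {}).get("text")
            # Skip empty entries and verbatim duplicates
            if text and text not in contexts:
                contexts.append(text)
    return contexts


# Hand-written stand-in for a response (hypothetical data, not real service output)
sample_response = {
    "output": {"text": "answer"},
    "citations": [
        {"retrievedReferences": [{"content": {"text": "chunk A"}}]},
        {"retrievedReferences": [{"content": {"text": "chunk B"}},
                                 {"content": {"text": "chunk A"}}]},
    ],
}

print(extract_contexts(sample_response))
```

In the loop, `contexts.append(extract_contexts(response))` would then replace the list comprehension. Note that this changes which chunks RAGAS sees, so scores may differ from the output shown in this notebook.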

In [44]:
ground_truth

["Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has\xa0sufficent IAM permissions\xa0to manage access to foundation models. Then, add or remove access to a model by following the instructions at\xa0Add or remove access to Amazon Bedrock foundation models.",
 'amazon.titan-text-premier-v1:0',
 'Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1',
 'You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters\xa0– Adjust filter strengths to block input prompts or model responses containing harmful content.\nDenied topics\xa0– Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses.\nWord filte

### A closer look at the two RAGAS metrics that we will also use in the next lab

- answer_relevancy: The Answer Relevancy metric assesses how pertinent the generated answer is to the given prompt. Answers that are incomplete or contain redundant information receive lower scores; higher scores indicate better relevancy. This metric is computed using the user_input, the retrieved_contexts and the response.

- answer_correctness: The Answer Correctness metric gauges the accuracy of the generated answer against the ground truth. The evaluation relies on the ground truth and the answer, with scores ranging from 0 to 1. A higher score indicates closer alignment between the generated answer and the ground truth. Answer correctness covers two aspects: semantic similarity and factual similarity between the generated answer and the ground truth.
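
To make the two definitions concrete, here is a minimal sketch of the arithmetic as we understand it from the RAGAS documentation; it is not the library's implementation. It assumes that answer relevancy averages the cosine similarity between the embedding of the original question and the embeddings of questions an LLM generates back from the answer, and that answer correctness is a weighted sum of a factual F1 score and a semantic similarity score. The function names, the toy vectors, and the 0.75/0.25 weights are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def answer_relevancy_score(question_vec, generated_question_vecs):
    """Mean similarity between the question and questions regenerated from the answer."""
    sims = [cosine_similarity(question_vec, g) for g in generated_question_vecs]
    return sum(sims) / len(sims)


def answer_correctness_score(tp, fp, fn, semantic_similarity, weights=(0.75, 0.25)):
    """Weighted sum of factual F1 and semantic similarity (weights are an assumption).

    tp: statements present in both answer and ground truth
    fp: statements only in the answer
    fn: statements only in the ground truth
    """
    denom = tp + 0.5 * (fp + fn)
    f1 = tp / denom if denom else 0.0
    w_factual, w_semantic = weights
    return w_factual * f1 + w_semantic * semantic_similarity


# Toy embeddings: one regenerated question matches the original, one is orthogonal
relevancy = answer_relevancy_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# Toy counts: 2 shared statements, 1 extra in the answer, 1 missing from it
correctness = answer_correctness_score(tp=2, fp=1, fn=1, semantic_similarity=0.8)

print(round(relevancy, 3), round(correctness, 3))
```

In the actual evaluation, the regenerated questions, the statement classification, and the embeddings all come from the evaluator LLM and embedding model passed to `evaluate`, so scores depend on those models.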

In [45]:
from ragas import evaluate
from ragas.metrics import answer_relevancy

# Metrics to compute; only answer_relevancy for now, more can be added to this list
metrics_ar = [answer_relevancy]

result_ar = evaluate(
    dataset=dataset,
    metrics=metrics_ar,
    llm=llm_for_evaluation,
    embeddings=bedrock_embeddings,
    raise_exceptions=False,  # rows that fail are scored NaN instead of raising
)

ragas_df_ar = result_ar.to_pandas()


In [46]:
# Print the full evaluation table without truncating long text columns
with pd.option_context("display.max_colwidth", None):
    pp.pprint(ragas_df_ar)

Unnamed: 0,user_input,retrieved_contexts,response,reference,answer_relevancy
0,Are all models accessible on Amazon Bedrock by default?,"[The following table lists product IDs for Amazon Bedrock foundation models: The following is the format of the IAM policy you can attach to a role to control model access permissions: Model Product ID AI21 Labs Jurassic-2 Mid 1d288c71-65f9-489a-a3e2-9c7f4f6e6a85 AI21 Labs Jurassic-2 Ultra cc0bdd50-279a-40d8-829c-4009b77a1fcc AI21 Jamba-Instruct prod-dr2vpvd4k73aq AI21 Labs Jamba 1.5 Large prod-evcp4w4lurj26 AI21 Labs Jamba 1.5 Mini prod-ggrzjm65qmjhm Anthropic Claude c468b48a-84df-43a4-8c46-8870630108a7 Anthropic Claude Instant b0eb9475-3a2c-43d1-94d3-56756fd43737 Anthropic Claude 3 Sonnet prod-6dw3qvchef7zy Anthropic Claude 3.5 Sonnet prod-m5ilt4siql27k Anthropic Claude 3.5 Sonnet v2 prod-cx7ovbu5wex7g Anthropic Claude 3 Haiku prod-ozonys2hmmpeu Anthropic Claude 3.5 Haiku prod-5oba7y7jpji56 Anthropic Claude 3 Opus prod-fm3feywmwerog Grant permissions to request access to foundation models 27Amazon Bedrock User Guide Model Product ID Cohere Command a61c46fe-1747-41aa-9af0-2e0ae8a9ce05 Cohere Command Light 216b69fd-07d5-4c7b-866b-936456d68311 Cohere Command R prod-tukx4z3hrewle Cohere Command R+ prod-nb4wqmplze2pm Cohere Embed (English) b7568428-a1ab-46d8-bab3-37def50f6f6a Cohere Embed (Multilingual) 38e55671-c3fe-4a44-9783-3584906e7cad Stable Diffusion XL 1.0 prod-2lvuzn4iy6n6o Stable Image Core 1.0 prod-eacdrmv7zfc5e Stable Diffusion 3 Large 1.0 prod-cqfmszl26sxu4 Stable Image Ultra 1.0 prod-7boen2z2wnxrg { ""Version"": ""2012-10-17"", ""Statement"": [ { ""Effect"": ""Allow|Deny"", ""Action"": [ ""aws-marketplace:Subscribe"" ], ""Resource"": ""*"", ""Condition"": { ""ForAnyValue:StringEquals"": { ""aws-marketplace:ProductId"": [ model-product-id-1, model-product-id-2, ... 
] } } }, Grant permissions to request access to foundation models 28Amazon Bedrock User Guide { ""Effect"": ""Allow|Deny"", ""Action"": [ ""aws-marketplace:Unsubscribe"" ""aws-marketplace:ViewSubscriptions"" ], ""Resource"": ""*"" } ] } To see an example policy, refer to Allow access to third-party model subscriptions. Add or remove access to Amazon Bedrock foundation models Before you can use a foundation model in Amazon Bedrock, you must request access to it. If you no longer need access to a model, you can remove access from it. Note You can't remove request access from the Amazon Titan, Mistral AI, and Meta Llama 3 Instruct models. You can prevent users from making inference calls to these models by using an IAM policy and specifying the model ID. For more information, see Deny access for inference on specific models. Once access is provided to a model, it is available for all users in the AWS account. To add or remove access to foundation models 1. Make sure you have permissions to request access, or modify access, to Amazon Bedrock foundation models. 2. Sign into the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/. 3. In the left navigation pane, under Bedrock configurations, choose Model access. 4. On the Model access page, choose Modify model access. 5. Select the models that you want the account to have access to and unselect the models that you don't want the account to have access to.]","No, not all models are accessible on Amazon Bedrock by default. Before you can use a foundation model in Amazon Bedrock, you must request access to it. The documentation states: ""Before you can use a foundation model in Amazon Bedrock, you must request access to it. If you no longer need access to a model, you can remove access from it."" However, the documentation also notes that you cannot remove access from certain models like Amazon Titan, Mistral AI, and Meta Llama 3 Instruct models. 
For these models, you can prevent users from making inference calls by using an IAM policy and specifying the model ID.","Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has sufficent IAM permissions to manage access to foundation models. Then, add or remove access to a model by following the instructions at Add or remove access to Amazon Bedrock foundation models.",1.0
1,What is the Model ID of Amazon Titan Text Premier,"[Separate keys with a space. aws bedrock-agent untag-resource \ --resource-arn ""arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345"" \ --tag-keys key=department facing List the tags for the agent. aws bedrock-agent list-tags-for-resource \ --resource-arn ""arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345"" Python (Boto) Add two tags to an agent. import boto3 bedrock = boto3.client(service_name='bedrock-agent') tags = [ { 'key': 'department', 'value': 'billing' }, { Use the API 1417 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_TagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_TagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UntagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_UntagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListTagsForResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListTagsForResource.htmlAmazon Bedrock User Guide 'key': 'facing', 'value': 'internal' } ] bedrock.tag_resource(resourceArn='arn:aws:bedrock:us-east-1:123456789012:agent/ AGENT12345', tags=tags) Remove the tags from the agent. bedrock.untag_resource( resourceArn='arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345', tagKeys=['department', 'facing'] ) List the tags for the agent. bedrock.list_tags_for_resource(resourceArn='arn:aws:bedrock:us- east-1:123456789012:agent/AGENT12345') Use the API 1418Amazon Bedrock User Guide Overview of Amazon Titan models Amazon Titan foundation models (FMs) are a family of FMs pretrained by AWS on large datasets, making them powerful, general-purpose models built to support a variety of use cases. Use them as-is or privately customize them with your own data. Amazon Titan supports the following models for Amazon Bedrock. ? Amazon Titan Text ? Amazon Titan Text Embeddings V2 ? Amazon Titan Multimodal Embeddings G1 ? 
Amazon Titan Image Generator G1 V1 Topics ? Amazon Titan Text models ? Amazon Titan Text Embeddings models ? Amazon Titan Multimodal Embeddings G1 model ? Amazon Titan Image Generator G1 models Amazon Titan Text models Amazon Titan text models include Amazon Titan Text G1 - Premier, Amazon Titan Text G1 - Express and Amazon Titan Text G1 - Lite. Amazon Titan Text G1 - Premier Amazon Titan Text G1 - Premier is a large language model for text generation. It is useful for a wide range of tasks including open-ended and context-based question answering, code generation, and summarization. This model is integrated with Amazon Bedrock Knowledge Base and Amazon Bedrock Agents. The model also supports Custom Finetuning in preview. ? Model ID ? amazon.titan-text-premier-v1:0 ? Max tokens ? 32K ? Languages ? English Amazon Titan Text 1419Amazon Bedrock User Guide ? Supported use cases ? 32k context window, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, chat, Knowledge Base support, Agents support, Model Customization (preview). ? Inference parameters ? Temperature, Top P (defaults: Temperature = 0.7, Top P = 0.9) AWS AI Service Card - Amazon Titan Text Premier Amazon Titan Text G1 - Express Amazon Titan Text G1 - Express is a large language model for text generation. It is useful for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG). At launch, the model is optimized for English, with multilingual support for more than 30 additional languages available in preview. ? Model ID ? amazon.titan-text-express-v1 ? Max tokens ? 8K ? Languages ? English (GA), 100 additional languages (Preview) ? Supported use cases ? 
Retrieval augmented generation, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat., Model ID ? amazon.titan-text-premier-v1:0 ? Max tokens ? 32K ? Languages ? English Amazon Titan Text 1419Amazon Bedrock User Guide ? Supported use cases ? 32k context window, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, chat, Knowledge Base support, Agents support, Model Customization (preview). ? Inference parameters ? Temperature, Top P (defaults: Temperature = 0.7, Top P = 0.9) AWS AI Service Card - Amazon Titan Text Premier Amazon Titan Text G1 - Express Amazon Titan Text G1 - Express is a large language model for text generation. It is useful for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG). At launch, the model is optimized for English, with multilingual support for more than 30 additional languages available in preview. ? Model ID ? amazon.titan-text-express-v1 ? Max tokens ? 8K ? Languages ? English (GA), 100 additional languages (Preview) ? Supported use cases ? Retrieval augmented generation, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat. Amazon Titan Text G1 - Lite Amazon Titan Text G1 - Lite is a light weight efficient model, ideal for fine-tuning of English- language tasks, including like summarizations and copy writing, where customers want a smaller, more cost-effective model that is also highly customizable. ? Model ID ? amazon.titan-text-lite-v1 ? Max tokens ? 4K ? Languages ? English ? Supported use cases ? 
Open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat. Amazon Titan Text G1 - Express 1420 https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text-premier/Amazon Bedrock User Guide Amazon Titan Text Model Customization For more information on customizing Amazon Titan text models, see the following pages. ? Prepare the datasets ? Amazon Titan text model customization hyperparameters Amazon Titan Text Prompt Engineering Guidelines Amazon Titan text models can be used in a wide variety of applications for different use cases. Amazon Titan Text models have prompt engineering guidelines for the following applications including: ? Chatbot ? Text2SQL ? Function Calling ? RAG (Retrieval Augmented Generation) For more information on Amazon Titan Text prompt engineering guidelines, see Amazon Titan Text Prompt Engineering Guidelines. For general prompt engineering guidelines, see Prompt Engineering Guidelines. AWS AI Service Card - Amazon Titan Text AI Service Cards provide transparency and document the intended use cases and fairness considerations for our AWS AI services. AI Service Cards provide a single place to find information on the intended use cases, responsible AI design choices, best practices, and performance for a set of AI service use cases. Amazon Titan Text Embeddings models Amazon Titan Embeddings text models include Amazon Titan Text Embeddings v2 and Titan Text Embeddings G1 model. Text embeddings represent meaningful vector representations of unstructured text such as documents, paragraphs, and sentences. You input a body of text and the output is a (1 x n) vector. You can use embedding vectors for a wide variety of applications. 
Amazon Titan Text Model Customization 1421 https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/Amazon Bedrock User Guide The Amazon Titan Text Embedding v2 model (amazon.titan-embed-text-v2:0) can intake up to 8,192 tokens and outputs a vector of 1,024 dimensions. The model also works in 100+ different languages. The model is optimized for text retrieval tasks, but can also perform additional tasks, such as semantic similarity and clustering. Amazon Titan Embeddings text v2 also supports long documents, however, for retrieval tasks it is recommended to segment documents into logical segments , such as paragraphs or sections. Amazon Titan Embeddings models generate meaningful semantic representation of documents, paragraphs and sentences.]",The Model ID of Amazon Titan Text Premier is amazon.titan-text-premier-v1:0.,amazon.titan-text-premier-v1:0,0.995229
2,With which Anthropic Claude models can I use the Text Completions API?,"[Anthropic Claude 3 model, such as Anthropic Claude 3 Opus For information about creating system prompts, see https://docs.anthropic.com/claude/ docs/how-to-use-system-prompts in the Anthropic Claude documentation. To avoid timeouts with Anthropic Claude version 2.1, we recommend limiting the input token count in the prompt field to 180K. We expect to address this timeout issue soon. In the inference call, fill the body field with a JSON object that conforms the type call you want to make, Anthropic Claude Text Completions API or Anthropic Claude Messages API. Topics ? Anthropic Claude Text Completions API ? Anthropic Claude Messages API Anthropic Claude models 147 https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags https://docs.anthropic.com/en/docs/welcome https://docs.anthropic.com/claude/docs/how-to-use-system-prompts https://docs.anthropic.com/claude/docs/how-to-use-system-promptsAmazon Bedrock User Guide Anthropic Claude Text Completions API This section provides inference parameters and code examples for using Anthropic Claude models with the Text Completions API. Topics ? Anthropic Claude Text Completions API overview ? Supported models ? Request and Response ? Code example Anthropic Claude Text Completions API overview Use the Text Completion API for single-turn text generation from a user supplied prompt. For example, you can use the Text Completion API to generate text for a blog post or to summarize text input from a user. For information about creating prompts for Anthropic Claude models, see Introduction to prompt design. If you want to use your existing Text Completions prompts with the Anthropic Claude Messages API, see Migrating from Text Completions. Supported models You can use the Text Completions API with the following Anthropic Claude models. ? 
Anthropic Claude Instant v1.2 ? Anthropic Claude v2 ? Anthropic Claude v2.1 Request and Response The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream. For more information, see https://docs.anthropic.com/claude/reference/complete_post in the Anthropic Claude documentation. Anthropic Claude models 148 https://docs.anthropic.com/claude/docs/introduction-to-prompt-design https://docs.anthropic.com/claude/docs/introduction-to-prompt-design https://docs.anthropic.com/claude/reference/migrating-from-text-completions-to-messages https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html https://docs.anthropic.com/claude/reference/complete_postAmazon Bedrock User Guide Request Anthropic Claude has the following inference parameters for a Text Completion inference call. { ""prompt"": "" Human:<prompt> Assistant:"", ""temperature"": float, ""top_p"": float, ""top_k"": int, ""max_tokens_to_sample"": int, ""stop_sequences"": [string] } The following are required parameters. ? prompt ? (Required) The prompt that you want Claude to complete. For proper response generation you need to format your prompt using alternating Human: and Assistant: conversational turns. For example: "" Human: {userQuestion} Assistant:"" For more information, see Prompt validation in the Anthropic Claude documentation. ? max_tokens_to_sample ? (Required) The maximum number of tokens to generate before stopping. We recommend a limit of 4,000 tokens for optimal performance. Note that Anthropic Claude models might stop generating tokens before reaching the value of max_tokens_to_sample. Different Anthropic Claude models have different maximum values for this parameter. For more information, see Model comparison in the Anthropic Claude documentation. Default Minimum Maximum 200 0 4096 The following are optional parameters. ? 
stop_sequences ? (Optional) Sequences that will cause the model to stop generating. Anthropic Claude models 149 https://docs.anthropic.com/claude/reference/prompt-validation https://docs.anthropic.com/claude/docs/models-overview#model-comparisonAmazon Bedrock User Guide Anthropic Claude models stop on "" Human:"", and may include additional built-in stop sequences in the future. Use the stop_sequences inference parameter to include additional strings that will signal the model to stop generating text. ? temperature ?]",You can use the Text Completions API with the following Anthropic Claude models: - Anthropic Claude Instant v1.2 - Anthropic Claude v2 - Anthropic Claude v2.1,"Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1",0.676724
3,What policies can I configure in Amazon Bedrock guardrails?,"[For more information about the fields in a content filter, see ContentFilter. ? Specify the category in the type field. ? Specify the strength of the filter for prompts in the strength field of the textToTextFiltersForPrompt field and for model responses in the strength field of the textToTextFiltersForResponse. ? (Optional) Attach any tags to the guardrail. For more information, see Tagging Amazon Bedrock resources. ? (Optional) For security, include the ARN of a KMS key in the kmsKeyId field. The response format is as follows: HTTP/1.1 202 Content-type: application/json { ""createdAt"": ""string"", ""guardrailArn"": ""string"", ""guardrailId"": ""string"", ""version"": ""string"" } Create a guardrail 480 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Topic.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ContentFilter.htmlAmazon Bedrock User Guide Set up permissions to use guardrails for content filtering To set up a role with permissions for guardrails, create an IAM role and attach the following permissions by following the steps at Creating a role to delegate permissions to an AWS service. If you are using guardrails with an agent, attach the permissions to a service role with permissions to create and manage agents. You can set up this role in the console or create a custom role by following the steps at Create a service role for Amazon Bedrock Agents. ? Permissions to invoke guardrails with foundation models ? Permissions to create and manage guardrails ? (Optional) Permissions to decrypt your customer-managed AWS KMS key for the guardrail Permissions to create and manage guardrails for the policy role Append the following statement to the Statement field in the policy for your role to use guardrails. 
{ ""Version"": ""2012-10-17"", ""Statement"": [ { ""Sid"": ""CreateAndManageGuardrails"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:CreateGuardrail"", ""bedrock:CreateGuardrailVersion"", ""bedrock:DeleteGuardrail"", ""bedrock:GetGuardrail"", ""bedrock:ListGuardrails"", ""bedrock:UpdateGuardrail"" ], ""Resource"": ""*"" } ] } Permissions for Amazon Bedrock Guardrails 481 https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.htmlAmazon Bedrock User Guide Permissions you need to invoke guardrails to filter content Append the following statement to the Statement field in the policy for the role to allow for model inference and to invoke guardrails. { ""Version"": ""2012-10-17"", ""Statement"": [ { ""Sid"": ""InvokeFoundationModel"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:InvokeModel"", ""bedrock:InvokeModelWithResponseStream"" ], ""Resource"": [ ""arn:aws:bedrock:region::foundation-model/*"" ] }, { ""Sid"": ""ApplyGuardrail"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:ApplyGuardrail"" ], ""Resource"": [ ""arn:aws:bedrock:region:account-id:guardrail/guardrail-id"" ] } ] } (Optional) Create a customer managed key for your guardrail for additional security Any user with CreateKey permissions can create customer managed keys using either the AWS Key Management Service (AWS KMS) console or the CreateKey operation. Make sure to create a symmetric encryption key. After you create your key, set up the following permissions. Permissions you need to invoke guardrails to filter content 482 https://docs.aws.amazon.com/kms/latest/APIReference/API_CreateKey.htmlAmazon Bedrock User Guide 1. Follow the steps at Creating a key policy to create a resource-based policy for your KMS key. Add the following policy statements to grant permissions to guardrails users and guardrails creators. Replace each role with the role that you want to allow to carry out the specified actions. 
{ ""Version"": ""2012-10-17"", ""Id"": ""KMS Key Policy"", ""Statement"": [ { ""Sid"": ""PermissionsForGuardrailsCreators"", ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::account-id:user/role"" }, ""Action"": [ ""kms:Decrypt"", ""kms:GenerateDataKey"", ""kms:DescribeKey"", ""kms:CreateGrant"" ], ""Resource"": ""*"" }, { ""Sid"": ""PermissionsForGuardrailsUusers"", ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::account-id:user/role"" }, ""Action"": ""kms:Decrypt"", ""Resource"": ""*"" } } 2. Attach the following identity-based policy to a role to allow it to create and manage guardrails., API To create a guardrail, send a CreateGuardrail request. The request format is as follows: POST /guardrails HTTP/1.1 Content-type: application/json { ""blockedInputMessaging"": ""string"", ""blockedOutputsMessaging"": ""string"", ""contentPolicyConfig"": { ""filtersConfig"": [ { ""inputStrength"": ""NONE | LOW | MEDIUM | HIGH"", ""outputStrength"": ""NONE | LOW | MEDIUM | HIGH"", ""type"": ""SEXUAL | VIOLENCE | HATE | INSULTS | MISCONDUCT | PROMPT_ATTACK"" } ] }, ""wordPolicyConfig"": { ""wordsConfig"": [ { ""text"": ""string"" } ], Create a guardrail 478 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateGuardrail.htmlAmazon Bedrock User Guide ""managedWordListsConfig"": [ { ""type"": ""string"" } ] }, ""sensitiveInformationPolicyConfig"": { ""piiEntitiesConfig"": [ { ""type"": ""string"", ""action"": ""string"" } ], ""regexesConfig"": [ { ""name"": ""string"", ""description"": ""string"", ""regex"": ""string"", ""action"": ""string"" } ] }, ""description"": ""string"", ""kmsKeyId"": ""string"", ""name"": ""string"", ""tags"": [ { ""key"": ""string"", ""value"": ""string"" } ], ""topicPolicyConfig"": { ""topicsConfig"": [ { ""definition"": ""string"", ""examples"": [ ""string"" ], ""name"": ""string"", ""type"": ""DENY"" } ] } } Create a guardrail 479Amazon Bedrock User Guide ? Specify a name and description for the guardrail. ? 
Specify messages for when the guardrail successfully blocks a prompt or a model response in the blockedInputMessaging and blockedOutputsMessaging fields. ? Specify topics for the guardrail to deny in the topicPolicy object. Each item in the topics list pertains to one topic. For more information about the fields in a topic, see Topic. ? Give a name and description so that the guardrail can properly identify the topic. ? Specify DENY in the action field. ? (Optional) Provide up to five examples that you would categorize as belonging to the topic in the examples list. ? Specify filter strengths for the harmful categories defined in Amazon Bedrock in the contentPolicy object. Each item in the filters list pertains to a harmful category. For more information, see Block harmful words and conversations with content filters. For more information about the fields in a content filter, see ContentFilter. ? Specify the category in the type field. ? Specify the strength of the filter for prompts in the strength field of the textToTextFiltersForPrompt field and for model responses in the strength field of the textToTextFiltersForResponse. ? (Optional) Attach any tags to the guardrail. For more information, see Tagging Amazon Bedrock resources. ? (Optional) For security, include the ARN of a KMS key in the kmsKeyId field. The response format is as follows: HTTP/1.1 202 Content-type: application/json { ""createdAt"": ""string"", ""guardrailArn"": ""string"", ""guardrailId"": ""string"", ""version"": ""string"" } Create a guardrail 480 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Topic.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ContentFilter.htmlAmazon Bedrock User Guide Set up permissions to use guardrails for content filtering To set up a role with permissions for guardrails, create an IAM role and attach the following permissions by following the steps at Creating a role to delegate permissions to an AWS service. 
If you are using guardrails with an agent, attach the permissions to a service role with permissions to create and manage agents. You can set up this role in the console or create a custom role by following the steps at Create a service role for Amazon Bedrock Agents. ? Permissions to invoke guardrails with foundation models ? Permissions to create and manage guardrails ? (Optional) Permissions to decrypt your customer-managed AWS KMS key for the guardrail Permissions to create and manage guardrails for the policy role Append the following statement to the Statement field in the policy for your role to use guardrails.]","In Amazon Bedrock guardrails, you can configure the following policies: 1. Content filters: You can specify filter strengths (none, low, medium, high) for harmful categories like sexual content, violence, hate speech, insults, and misconduct. 2. Word filters: You can specify lists of words or phrases to block or allow. 3. Sensitive information filters: You can specify types of personally identifiable information (PII) like names, addresses, etc. to block or mask, as well as custom regular expressions to match sensitive information. 4. Topic filters: You can specify topics to deny, including a definition, examples, and name for each topic.","You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters – Adjust filter strengths to block input prompts or model responses containing harmful content. Denied topics – Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. Word filters – Configure filters to block undesirable words, phrases, and profanity. Such words can include offensive terms, competitor names etc. 
Sensitive information filters – Block or mask sensitive information such as personally identifiable information (PII) or custom regex in user inputs and model responses. Contextual grounding check – Detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.",0.98129
4,Which built in datasets are available on Amazon Bedrock for model evaluation of text generation?,"[Applications that use text classification include content recommendation, spam detection, language identification and trend analysis on social media. Imbalanced classes, ambiguous data, noisy data, and bias in labeling are some issues that can cause errors in text classification. Model evaluation task types 555 https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/Amazon Bedrock User Guide Important For text classification, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets are recommended for use with the text classification task type. Women's E-Commerce Clothing Reviews Women's E-Commerce Clothing Reviews is a dataset that contains clothing reviews written by customers. This dataset is used in text classification tasks. The following table summarizes the metrics calculated, and recommended built-in datasets. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built-in datasets (API). 
Available built-in datasets in Amazon Bedrock Task type Metric Built-in datasets (console) Built- in datasets (API) Computed metric Accuracy Women's Ecommerce Clothing Reviews Builtin.W omensEcom merceClot hingBoolQ Accuracy (Binary Accuracy from class ification_accuracy_score) Text classific ation Robustnes s Women's Ecommerce Clothing Reviews Builtin.W omensEcom merceClot hingBoolQ classification_accuracy_score and delta_ classification_accuracy_score Model evaluation task types 556 https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviewsAmazon Bedrock User Guide To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Use prompt datasets for model evaluation in Amazon Bedrock To create a model evaluation job you must specify a prompt dataset the model uses during inference. Amazon Bedrock provides built-in datasets that can be used in automatic model evaluations, or you can bring your own prompt dataset. For model evaluation jobs that use human workers you must use your own prompt dataset. Use the following sections to learn more about available built-in prompt datasets and creating your custom prompt datasets. To learn more about creating your first model evaluation job in Amazon Bedrock, see Choose the best performing model using Amazon Bedrock evaluations. Topics ? 
Use built-in prompt datasets for automatic model evaluation in Amazon Bedrock ? Use custom prompt dataset for model evaluation in Amazon Bedrock Use built-in prompt datasets for automatic model evaluation in Amazon Bedrock Amazon Bedrock provides multiple built-in prompt datasets that you can use in an automatic model evaluation job. Each built-in dataset is based off an open-source dataset. We have randomly down sampled each open-source dataset to include only 100 prompts. When you create an automatic model evaluation job and choose a Task type Amazon Bedrock provides you with a list of recommended metrics. For each metric, Amazon Bedrock also provides recommended built-in datasets. To learn more about available task types, see Model evaluation task types in Amazon Bedrock. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. Prompt datasets for model evaluation 557Amazon Bedrock User Guide RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity., row=3Amazon Bedrock User Guide Task type MetricBuilt-in datasets Computed metric Robustnes s Gigaword BERTScore and deltaBERTScore BoolQ NaturalQu estions Accuracy TriviaQA NLP-F1 BoolQ NaturalQu estions Robustnes s TriviaQA F1 and deltaF1 BoolQ NaturalQu estions Question and answer Toxicity TriviaQA Toxicity AccuracyWomen's Ecommerce Clothing Reviews Accuracy (Binary accuracy from classification_accuracy_s core) Text classific ation Robustnes s Women's Ecommerce Clothing Reviews classification_accuracy_score and delta_classifica tion_accuracy_score Topics ? 
General text generation for model evaluation in Amazon Bedrock Model evaluation task types 550 https://huggingface.co/datasets/gigaword?row=3 https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviewsAmazon Bedrock User Guide ? Text summarization for model evaluation in Amazon Bedrock ? Question and answer for model evaluation in Amazon Bedrock ? Text classification for model evaluation in Amazon Bedrock General text generation for model evaluation in Amazon Bedrock General text generation is a task used by applications that include chatbots. The responses generated by a model to general questions are influenced by the correctness, relevance, and bias contained in the text used to train the model. 
Important For general text generation, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". Model evaluation task types 551Amazon Bedrock User Guide WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation. The following table summarizes the metrics calculated, and recommended built-in dataset that are available for automatic model evaluation jobs., RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. 
T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". Model evaluation task types 551Amazon Bedrock User Guide WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation. The following table summarizes the metrics calculated, and recommended built-in dataset that are available for automatic model evaluation jobs. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built- in datasets (API). Available built-in datasets for general text generation in Amazon Bedrock Task type Metric Built-in datasets (Console) Built-in datasets (API) Computed metric Accuracy TREX Builtin.T-REx Real world knowledge (RWK) score BOLD Builtin.BOLD WikiText2 Builtin.W ikiText2 Robustnes s TREX Builtin.T-REx Word error rate RealToxicityPrompts Builtin.R ealToxici tyPrompts General text generation Toxicity BOLD Builtin.Bold Toxicity To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Text summarization for model evaluation in Amazon Bedrock Text summarization is used for tasks including creating summaries of news, legal documents, academic papers, content previews, and content curation. 
The ambiguity, coherence, bias, and Model evaluation task types 552 https://hadyelsahar.github.io/t-rex/ https://github.com/amazon-science/bold https://huggingface.co/datasets/wikitext https://hadyelsahar.github.io/t-rex/ https://github.com/allenai/real-toxicity-prompts https://github.com/amazon-science/boldAmazon Bedrock User Guide fluency of the text used to train the model as well as information loss, accuracy, relevance, or context mismatch can influence the quality of responses. Important For text summarization, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in dataset is supported for use with the task summarization task type. Gigaword The Gigaword dataset consists of news article headlines. This dataset is used in text summarization tasks. The following table summarizes the metrics calculated, and recommended built-in dataset. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built-in datasets (API). Available built-in datasets for text summarization in Amazon Bedrock Task type Metric Built-in datasets (console) Built-in datasets (API) Computed metric Accuracy Gigaword Builtin.Gigaword BERTScore Toxicity Gigaword Builtin.Gigaword Toxicity Text summariza tion Robustnes s Gigaword Builtin.Gigaword BERTScore and deltaBERT Score To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Model evaluation task types 553 https://huggingface.co/datasets/gigaword?row=3 https://huggingface.co/datasets/gigaword?row=3 https://huggingface.co/datasets/gigaword?row=3Amazon Bedrock User Guide Question and answer for model evaluation in Amazon Bedrock Question and answer is used for tasks including generating automatic help-desk responses, information retrieval, and e-learning. 
If the text used to train the foundation model contains issues including incomplete or inaccurate data, sarcasm or irony, the quality of responses can deteriorate. Important For question and answer, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets are recommended for use with the question andg answer task type. BoolQ BoolQ is a dataset consisting of yes/no question and answer pairs.]","The following built-in datasets are available on Amazon Bedrock for model evaluation of text generation: - Bias in Open-ended Language Generation Dataset (BOLD): Evaluates fairness in general text generation across domains like profession, gender, race, religious ideologies, and political ideologies. - RealToxicityPrompts: Evaluates toxicity by attempting to get the model to generate racist, sexist, or otherwise toxic language. - T-Rex: A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX): Consists of Knowledge Base Triples extracted from Wikipedia to evaluate real-world knowledge. - WikiText2: Contains prompts used in general text generation.","The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. 
T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation.",0.958349


In [47]:
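# Metrics to compute. Only answer_correctness is used for now; add more to the list as needed.
from ragas import evaluate
from ragas.metrics import answer_correctness

metrics_ac = [answer_correctness]

# Score each response against its reference answer. llm_for_evaluation and
# bedrock_embeddings are expected to be defined in earlier cells of this notebook.
result_ac = evaluate(
    dataset=dataset,
    metrics=metrics_ac,
    llm=llm_for_evaluation,
    embeddings=bedrock_embeddings,
    raise_exceptions=False,  # a row that fails is scored NaN instead of aborting the run
)

# Convert the result to a pandas DataFrame for inspection.
ragas_df_ac = result_ac.to_pandas()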

Evaluating:   0%|          | 0/5 [00:00<?, ?it/s]
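
To make the `answer_correctness` column easier to interpret, here is a minimal sketch of how such a score can be composed. It assumes the metric is a weighted average of a factual F1 score (computed over statements classified as true positive, false positive, or false negative) and a semantic similarity score, and it assumes weights of 0.75 and 0.25, which is our understanding of the RAGAS defaults. The function names and the example counts are hypothetical; in RAGAS the statement classification is done by the evaluation LLM and the similarity comes from the embeddings model, neither of which is reproduced here.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 over statement counts; 0.0 when no statement is a true positive."""
    if tp == 0:
        return 0.0
    return tp / (tp + 0.5 * (fp + fn))


def answer_correctness_sketch(tp, fp, fn, semantic_similarity, weights=(0.75, 0.25)):
    """Weighted average of factual F1 and semantic similarity (assumed weights)."""
    factual = f1_from_counts(tp, fp, fn)
    w_factual, w_semantic = weights
    return (w_factual * factual + w_semantic * semantic_similarity) / (w_factual + w_semantic)


# Hypothetical counts: one statement in the response matches the reference (tp=1),
# three statements are not supported by the reference (fp=3), and nothing in the
# reference is missing from the response (fn=0).
score = answer_correctness_sketch(tp=1, fp=3, fn=0, semantic_similarity=0.8)
print(round(score, 3))
```

Under these assumptions, a response that is correct but adds statements not found in a short reference answer is counted as containing false positives, which pulls the factual F1 down. This is one possible reason a correct response can still receive a low `answer_correctness` score when the reference is very terse.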

In [48]:
# Print the full results table without truncating long text columns.
# pd (pandas) and pp (pretty printer) are expected to be set up in earlier cells.
with pd.option_context("display.max_colwidth", None):
    pp.pprint(ragas_df_ac)

Unnamed: 0,user_input,retrieved_contexts,response,reference,answer_correctness
0,Are all models accessible on Amazon Bedrock by default?,"[The following table lists product IDs for Amazon Bedrock foundation models: The following is the format of the IAM policy you can attach to a role to control model access permissions: Model Product ID AI21 Labs Jurassic-2 Mid 1d288c71-65f9-489a-a3e2-9c7f4f6e6a85 AI21 Labs Jurassic-2 Ultra cc0bdd50-279a-40d8-829c-4009b77a1fcc AI21 Jamba-Instruct prod-dr2vpvd4k73aq AI21 Labs Jamba 1.5 Large prod-evcp4w4lurj26 AI21 Labs Jamba 1.5 Mini prod-ggrzjm65qmjhm Anthropic Claude c468b48a-84df-43a4-8c46-8870630108a7 Anthropic Claude Instant b0eb9475-3a2c-43d1-94d3-56756fd43737 Anthropic Claude 3 Sonnet prod-6dw3qvchef7zy Anthropic Claude 3.5 Sonnet prod-m5ilt4siql27k Anthropic Claude 3.5 Sonnet v2 prod-cx7ovbu5wex7g Anthropic Claude 3 Haiku prod-ozonys2hmmpeu Anthropic Claude 3.5 Haiku prod-5oba7y7jpji56 Anthropic Claude 3 Opus prod-fm3feywmwerog Grant permissions to request access to foundation models 27Amazon Bedrock User Guide Model Product ID Cohere Command a61c46fe-1747-41aa-9af0-2e0ae8a9ce05 Cohere Command Light 216b69fd-07d5-4c7b-866b-936456d68311 Cohere Command R prod-tukx4z3hrewle Cohere Command R+ prod-nb4wqmplze2pm Cohere Embed (English) b7568428-a1ab-46d8-bab3-37def50f6f6a Cohere Embed (Multilingual) 38e55671-c3fe-4a44-9783-3584906e7cad Stable Diffusion XL 1.0 prod-2lvuzn4iy6n6o Stable Image Core 1.0 prod-eacdrmv7zfc5e Stable Diffusion 3 Large 1.0 prod-cqfmszl26sxu4 Stable Image Ultra 1.0 prod-7boen2z2wnxrg { ""Version"": ""2012-10-17"", ""Statement"": [ { ""Effect"": ""Allow|Deny"", ""Action"": [ ""aws-marketplace:Subscribe"" ], ""Resource"": ""*"", ""Condition"": { ""ForAnyValue:StringEquals"": { ""aws-marketplace:ProductId"": [ model-product-id-1, model-product-id-2, ... 
] } } }, Grant permissions to request access to foundation models 28Amazon Bedrock User Guide { ""Effect"": ""Allow|Deny"", ""Action"": [ ""aws-marketplace:Unsubscribe"" ""aws-marketplace:ViewSubscriptions"" ], ""Resource"": ""*"" } ] } To see an example policy, refer to Allow access to third-party model subscriptions. Add or remove access to Amazon Bedrock foundation models Before you can use a foundation model in Amazon Bedrock, you must request access to it. If you no longer need access to a model, you can remove access from it. Note You can't remove request access from the Amazon Titan, Mistral AI, and Meta Llama 3 Instruct models. You can prevent users from making inference calls to these models by using an IAM policy and specifying the model ID. For more information, see Deny access for inference on specific models. Once access is provided to a model, it is available for all users in the AWS account. To add or remove access to foundation models 1. Make sure you have permissions to request access, or modify access, to Amazon Bedrock foundation models. 2. Sign into the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/. 3. In the left navigation pane, under Bedrock configurations, choose Model access. 4. On the Model access page, choose Modify model access. 5. Select the models that you want the account to have access to and unselect the models that you don't want the account to have access to.]","No, not all models are accessible on Amazon Bedrock by default. Before you can use a foundation model in Amazon Bedrock, you must request access to it. The documentation states: ""Before you can use a foundation model in Amazon Bedrock, you must request access to it. If you no longer need access to a model, you can remove access from it."" However, the documentation also notes that you cannot remove access from certain models like Amazon Titan, Mistral AI, and Meta Llama 3 Instruct models. 
For these models, you can prevent users from making inference calls by using an IAM policy and specifying the model ID.","Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has sufficent IAM permissions to manage access to foundation models. Then, add or remove access to a model by following the instructions at Add or remove access to Amazon Bedrock foundation models.",0.551098
1,What is the Model ID of Amazon Titan Text Premier,"[Separate keys with a space. aws bedrock-agent untag-resource \ --resource-arn ""arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345"" \ --tag-keys key=department facing List the tags for the agent. aws bedrock-agent list-tags-for-resource \ --resource-arn ""arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345"" Python (Boto) Add two tags to an agent. import boto3 bedrock = boto3.client(service_name='bedrock-agent') tags = [ { 'key': 'department', 'value': 'billing' }, { Use the API 1417 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_TagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_TagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UntagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_UntagResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListTagsForResource.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListTagsForResource.htmlAmazon Bedrock User Guide 'key': 'facing', 'value': 'internal' } ] bedrock.tag_resource(resourceArn='arn:aws:bedrock:us-east-1:123456789012:agent/ AGENT12345', tags=tags) Remove the tags from the agent. bedrock.untag_resource( resourceArn='arn:aws:bedrock:us-east-1:123456789012:agent/AGENT12345', tagKeys=['department', 'facing'] ) List the tags for the agent. bedrock.list_tags_for_resource(resourceArn='arn:aws:bedrock:us- east-1:123456789012:agent/AGENT12345') Use the API 1418Amazon Bedrock User Guide Overview of Amazon Titan models Amazon Titan foundation models (FMs) are a family of FMs pretrained by AWS on large datasets, making them powerful, general-purpose models built to support a variety of use cases. Use them as-is or privately customize them with your own data. Amazon Titan supports the following models for Amazon Bedrock. ? Amazon Titan Text ? Amazon Titan Text Embeddings V2 ? Amazon Titan Multimodal Embeddings G1 ? 
Amazon Titan Image Generator G1 V1 Topics ? Amazon Titan Text models ? Amazon Titan Text Embeddings models ? Amazon Titan Multimodal Embeddings G1 model ? Amazon Titan Image Generator G1 models Amazon Titan Text models Amazon Titan text models include Amazon Titan Text G1 - Premier, Amazon Titan Text G1 - Express and Amazon Titan Text G1 - Lite. Amazon Titan Text G1 - Premier Amazon Titan Text G1 - Premier is a large language model for text generation. It is useful for a wide range of tasks including open-ended and context-based question answering, code generation, and summarization. This model is integrated with Amazon Bedrock Knowledge Base and Amazon Bedrock Agents. The model also supports Custom Finetuning in preview. ? Model ID ? amazon.titan-text-premier-v1:0 ? Max tokens ? 32K ? Languages ? English Amazon Titan Text 1419Amazon Bedrock User Guide ? Supported use cases ? 32k context window, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, chat, Knowledge Base support, Agents support, Model Customization (preview). ? Inference parameters ? Temperature, Top P (defaults: Temperature = 0.7, Top P = 0.9) AWS AI Service Card - Amazon Titan Text Premier Amazon Titan Text G1 - Express Amazon Titan Text G1 - Express is a large language model for text generation. It is useful for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG). At launch, the model is optimized for English, with multilingual support for more than 30 additional languages available in preview. ? Model ID ? amazon.titan-text-express-v1 ? Max tokens ? 8K ? Languages ? English (GA), 100 additional languages (Preview) ? Supported use cases ? 
Retrieval augmented generation, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat., Model ID ? amazon.titan-text-premier-v1:0 ? Max tokens ? 32K ? Languages ? English Amazon Titan Text 1419Amazon Bedrock User Guide ? Supported use cases ? 32k context window, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, chat, Knowledge Base support, Agents support, Model Customization (preview). ? Inference parameters ? Temperature, Top P (defaults: Temperature = 0.7, Top P = 0.9) AWS AI Service Card - Amazon Titan Text Premier Amazon Titan Text G1 - Express Amazon Titan Text G1 - Express is a large language model for text generation. It is useful for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG). At launch, the model is optimized for English, with multilingual support for more than 30 additional languages available in preview. ? Model ID ? amazon.titan-text-express-v1 ? Max tokens ? 8K ? Languages ? English (GA), 100 additional languages (Preview) ? Supported use cases ? Retrieval augmented generation, open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat. Amazon Titan Text G1 - Lite Amazon Titan Text G1 - Lite is a light weight efficient model, ideal for fine-tuning of English- language tasks, including like summarizations and copy writing, where customers want a smaller, more cost-effective model that is also highly customizable. ? Model ID ? amazon.titan-text-lite-v1 ? Max tokens ? 4K ? Languages ? English ? Supported use cases ? 
Open-ended text generation, brainstorming, summarizations, code generation, table creation, data formatting, paraphrasing, chain of thought, rewrite, extraction, QnA, and chat. Amazon Titan Text G1 - Express 1420 https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text-premier/Amazon Bedrock User Guide Amazon Titan Text Model Customization For more information on customizing Amazon Titan text models, see the following pages. ? Prepare the datasets ? Amazon Titan text model customization hyperparameters Amazon Titan Text Prompt Engineering Guidelines Amazon Titan text models can be used in a wide variety of applications for different use cases. Amazon Titan Text models have prompt engineering guidelines for the following applications including: ? Chatbot ? Text2SQL ? Function Calling ? RAG (Retrieval Augmented Generation) For more information on Amazon Titan Text prompt engineering guidelines, see Amazon Titan Text Prompt Engineering Guidelines. For general prompt engineering guidelines, see Prompt Engineering Guidelines. AWS AI Service Card - Amazon Titan Text AI Service Cards provide transparency and document the intended use cases and fairness considerations for our AWS AI services. AI Service Cards provide a single place to find information on the intended use cases, responsible AI design choices, best practices, and performance for a set of AI service use cases. Amazon Titan Text Embeddings models Amazon Titan Embeddings text models include Amazon Titan Text Embeddings v2 and Titan Text Embeddings G1 model. Text embeddings represent meaningful vector representations of unstructured text such as documents, paragraphs, and sentences. You input a body of text and the output is a (1 x n) vector. You can use embedding vectors for a wide variety of applications. 
Amazon Titan Text Model Customization 1421 https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html https://aws.amazon.com/machine-learning/responsible-machine-learning/titan-text/Amazon Bedrock User Guide The Amazon Titan Text Embedding v2 model (amazon.titan-embed-text-v2:0) can intake up to 8,192 tokens and outputs a vector of 1,024 dimensions. The model also works in 100+ different languages. The model is optimized for text retrieval tasks, but can also perform additional tasks, such as semantic similarity and clustering. Amazon Titan Embeddings text v2 also supports long documents, however, for retrieval tasks it is recommended to segment documents into logical segments , such as paragraphs or sections. Amazon Titan Embeddings models generate meaningful semantic representation of documents, paragraphs and sentences.]",The Model ID of Amazon Titan Text Premier is amazon.titan-text-premier-v1:0.,amazon.titan-text-premier-v1:0,0.164179
2,With which Anthropic Claude models can I use the Text Completions API?,"[Anthropic Claude 3 model, such as Anthropic Claude 3 Opus For information about creating system prompts, see https://docs.anthropic.com/claude/ docs/how-to-use-system-prompts in the Anthropic Claude documentation. To avoid timeouts with Anthropic Claude version 2.1, we recommend limiting the input token count in the prompt field to 180K. We expect to address this timeout issue soon. In the inference call, fill the body field with a JSON object that conforms the type call you want to make, Anthropic Claude Text Completions API or Anthropic Claude Messages API. Topics ? Anthropic Claude Text Completions API ? Anthropic Claude Messages API Anthropic Claude models 147 https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags https://docs.anthropic.com/en/docs/welcome https://docs.anthropic.com/claude/docs/how-to-use-system-prompts https://docs.anthropic.com/claude/docs/how-to-use-system-promptsAmazon Bedrock User Guide Anthropic Claude Text Completions API This section provides inference parameters and code examples for using Anthropic Claude models with the Text Completions API. Topics ? Anthropic Claude Text Completions API overview ? Supported models ? Request and Response ? Code example Anthropic Claude Text Completions API overview Use the Text Completion API for single-turn text generation from a user supplied prompt. For example, you can use the Text Completion API to generate text for a blog post or to summarize text input from a user. For information about creating prompts for Anthropic Claude models, see Introduction to prompt design. If you want to use your existing Text Completions prompts with the Anthropic Claude Messages API, see Migrating from Text Completions. Supported models You can use the Text Completions API with the following Anthropic Claude models. ? 
Anthropic Claude Instant v1.2 ? Anthropic Claude v2 ? Anthropic Claude v2.1 Request and Response The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream. For more information, see https://docs.anthropic.com/claude/reference/complete_post in the Anthropic Claude documentation. Anthropic Claude models 148 https://docs.anthropic.com/claude/docs/introduction-to-prompt-design https://docs.anthropic.com/claude/docs/introduction-to-prompt-design https://docs.anthropic.com/claude/reference/migrating-from-text-completions-to-messages https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html https://docs.anthropic.com/claude/reference/complete_postAmazon Bedrock User Guide Request Anthropic Claude has the following inference parameters for a Text Completion inference call. { ""prompt"": "" Human:<prompt> Assistant:"", ""temperature"": float, ""top_p"": float, ""top_k"": int, ""max_tokens_to_sample"": int, ""stop_sequences"": [string] } The following are required parameters. ? prompt ? (Required) The prompt that you want Claude to complete. For proper response generation you need to format your prompt using alternating Human: and Assistant: conversational turns. For example: "" Human: {userQuestion} Assistant:"" For more information, see Prompt validation in the Anthropic Claude documentation. ? max_tokens_to_sample ? (Required) The maximum number of tokens to generate before stopping. We recommend a limit of 4,000 tokens for optimal performance. Note that Anthropic Claude models might stop generating tokens before reaching the value of max_tokens_to_sample. Different Anthropic Claude models have different maximum values for this parameter. For more information, see Model comparison in the Anthropic Claude documentation. Default Minimum Maximum 200 0 4096 The following are optional parameters. ? 
stop_sequences ? (Optional) Sequences that will cause the model to stop generating. Anthropic Claude models 149 https://docs.anthropic.com/claude/reference/prompt-validation https://docs.anthropic.com/claude/docs/models-overview#model-comparisonAmazon Bedrock User Guide Anthropic Claude models stop on "" Human:"", and may include additional built-in stop sequences in the future. Use the stop_sequences inference parameter to include additional strings that will signal the model to stop generating text. ? temperature ?]",You can use the Text Completions API with the following Anthropic Claude models: - Anthropic Claude Instant v1.2 - Anthropic Claude v2 - Anthropic Claude v2.1,"Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1",0.887015
3,What policies can I configure in Amazon Bedrock guardrails?,"[For more information about the fields in a content filter, see ContentFilter. ? Specify the category in the type field. ? Specify the strength of the filter for prompts in the strength field of the textToTextFiltersForPrompt field and for model responses in the strength field of the textToTextFiltersForResponse. ? (Optional) Attach any tags to the guardrail. For more information, see Tagging Amazon Bedrock resources. ? (Optional) For security, include the ARN of a KMS key in the kmsKeyId field. The response format is as follows: HTTP/1.1 202 Content-type: application/json { ""createdAt"": ""string"", ""guardrailArn"": ""string"", ""guardrailId"": ""string"", ""version"": ""string"" } Create a guardrail 480 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Topic.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ContentFilter.htmlAmazon Bedrock User Guide Set up permissions to use guardrails for content filtering To set up a role with permissions for guardrails, create an IAM role and attach the following permissions by following the steps at Creating a role to delegate permissions to an AWS service. If you are using guardrails with an agent, attach the permissions to a service role with permissions to create and manage agents. You can set up this role in the console or create a custom role by following the steps at Create a service role for Amazon Bedrock Agents. ? Permissions to invoke guardrails with foundation models ? Permissions to create and manage guardrails ? (Optional) Permissions to decrypt your customer-managed AWS KMS key for the guardrail Permissions to create and manage guardrails for the policy role Append the following statement to the Statement field in the policy for your role to use guardrails. 
{ ""Version"": ""2012-10-17"", ""Statement"": [ { ""Sid"": ""CreateAndManageGuardrails"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:CreateGuardrail"", ""bedrock:CreateGuardrailVersion"", ""bedrock:DeleteGuardrail"", ""bedrock:GetGuardrail"", ""bedrock:ListGuardrails"", ""bedrock:UpdateGuardrail"" ], ""Resource"": ""*"" } ] } Permissions for Amazon Bedrock Guardrails 481 https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.htmlAmazon Bedrock User Guide Permissions you need to invoke guardrails to filter content Append the following statement to the Statement field in the policy for the role to allow for model inference and to invoke guardrails. { ""Version"": ""2012-10-17"", ""Statement"": [ { ""Sid"": ""InvokeFoundationModel"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:InvokeModel"", ""bedrock:InvokeModelWithResponseStream"" ], ""Resource"": [ ""arn:aws:bedrock:region::foundation-model/*"" ] }, { ""Sid"": ""ApplyGuardrail"", ""Effect"": ""Allow"", ""Action"": [ ""bedrock:ApplyGuardrail"" ], ""Resource"": [ ""arn:aws:bedrock:region:account-id:guardrail/guardrail-id"" ] } ] } (Optional) Create a customer managed key for your guardrail for additional security Any user with CreateKey permissions can create customer managed keys using either the AWS Key Management Service (AWS KMS) console or the CreateKey operation. Make sure to create a symmetric encryption key. After you create your key, set up the following permissions. Permissions you need to invoke guardrails to filter content 482 https://docs.aws.amazon.com/kms/latest/APIReference/API_CreateKey.htmlAmazon Bedrock User Guide 1. Follow the steps at Creating a key policy to create a resource-based policy for your KMS key. Add the following policy statements to grant permissions to guardrails users and guardrails creators. Replace each role with the role that you want to allow to carry out the specified actions. 
{ ""Version"": ""2012-10-17"", ""Id"": ""KMS Key Policy"", ""Statement"": [ { ""Sid"": ""PermissionsForGuardrailsCreators"", ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::account-id:user/role"" }, ""Action"": [ ""kms:Decrypt"", ""kms:GenerateDataKey"", ""kms:DescribeKey"", ""kms:CreateGrant"" ], ""Resource"": ""*"" }, { ""Sid"": ""PermissionsForGuardrailsUusers"", ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::account-id:user/role"" }, ""Action"": ""kms:Decrypt"", ""Resource"": ""*"" } } 2. Attach the following identity-based policy to a role to allow it to create and manage guardrails., API To create a guardrail, send a CreateGuardrail request. The request format is as follows: POST /guardrails HTTP/1.1 Content-type: application/json { ""blockedInputMessaging"": ""string"", ""blockedOutputsMessaging"": ""string"", ""contentPolicyConfig"": { ""filtersConfig"": [ { ""inputStrength"": ""NONE | LOW | MEDIUM | HIGH"", ""outputStrength"": ""NONE | LOW | MEDIUM | HIGH"", ""type"": ""SEXUAL | VIOLENCE | HATE | INSULTS | MISCONDUCT | PROMPT_ATTACK"" } ] }, ""wordPolicyConfig"": { ""wordsConfig"": [ { ""text"": ""string"" } ], Create a guardrail 478 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateGuardrail.htmlAmazon Bedrock User Guide ""managedWordListsConfig"": [ { ""type"": ""string"" } ] }, ""sensitiveInformationPolicyConfig"": { ""piiEntitiesConfig"": [ { ""type"": ""string"", ""action"": ""string"" } ], ""regexesConfig"": [ { ""name"": ""string"", ""description"": ""string"", ""regex"": ""string"", ""action"": ""string"" } ] }, ""description"": ""string"", ""kmsKeyId"": ""string"", ""name"": ""string"", ""tags"": [ { ""key"": ""string"", ""value"": ""string"" } ], ""topicPolicyConfig"": { ""topicsConfig"": [ { ""definition"": ""string"", ""examples"": [ ""string"" ], ""name"": ""string"", ""type"": ""DENY"" } ] } } Create a guardrail 479Amazon Bedrock User Guide ? Specify a name and description for the guardrail. ? 
Specify messages for when the guardrail successfully blocks a prompt or a model response in the blockedInputMessaging and blockedOutputsMessaging fields. ? Specify topics for the guardrail to deny in the topicPolicy object. Each item in the topics list pertains to one topic. For more information about the fields in a topic, see Topic. ? Give a name and description so that the guardrail can properly identify the topic. ? Specify DENY in the action field. ? (Optional) Provide up to five examples that you would categorize as belonging to the topic in the examples list. ? Specify filter strengths for the harmful categories defined in Amazon Bedrock in the contentPolicy object. Each item in the filters list pertains to a harmful category. For more information, see Block harmful words and conversations with content filters. For more information about the fields in a content filter, see ContentFilter. ? Specify the category in the type field. ? Specify the strength of the filter for prompts in the strength field of the textToTextFiltersForPrompt field and for model responses in the strength field of the textToTextFiltersForResponse. ? (Optional) Attach any tags to the guardrail. For more information, see Tagging Amazon Bedrock resources. ? (Optional) For security, include the ARN of a KMS key in the kmsKeyId field. The response format is as follows: HTTP/1.1 202 Content-type: application/json { ""createdAt"": ""string"", ""guardrailArn"": ""string"", ""guardrailId"": ""string"", ""version"": ""string"" } Create a guardrail 480 https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Topic.html https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ContentFilter.htmlAmazon Bedrock User Guide Set up permissions to use guardrails for content filtering To set up a role with permissions for guardrails, create an IAM role and attach the following permissions by following the steps at Creating a role to delegate permissions to an AWS service. 
If you are using guardrails with an agent, attach the permissions to a service role with permissions to create and manage agents. You can set up this role in the console or create a custom role by following the steps at Create a service role for Amazon Bedrock Agents. ? Permissions to invoke guardrails with foundation models ? Permissions to create and manage guardrails ? (Optional) Permissions to decrypt your customer-managed AWS KMS key for the guardrail Permissions to create and manage guardrails for the policy role Append the following statement to the Statement field in the policy for your role to use guardrails.]","In Amazon Bedrock guardrails, you can configure the following policies: 1. Content filters: You can specify filter strengths (none, low, medium, high) for harmful categories like sexual content, violence, hate speech, insults, and misconduct. 2. Word filters: You can specify lists of words or phrases to block or allow. 3. Sensitive information filters: You can specify types of personally identifiable information (PII) like names, addresses, etc. to block or mask, as well as custom regular expressions to match sensitive information. 4. Topic filters: You can specify topics to deny, including a definition, examples, and name for each topic.","You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters – Adjust filter strengths to block input prompts or model responses containing harmful content. Denied topics – Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. Word filters – Configure filters to block undesirable words, phrases, and profanity. Such words can include offensive terms, competitor names etc. 
Sensitive information filters – Block or mask sensitive information such as personally identifiable information (PII) or custom regex in user inputs and model responses. Contextual grounding check – Detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.",0.810356
4,Which built in datasets are available on Amazon Bedrock for model evaluation of text generation?,"[Applications that use text classification include content recommendation, spam detection, language identification and trend analysis on social media. Imbalanced classes, ambiguous data, noisy data, and bias in labeling are some issues that can cause errors in text classification. Model evaluation task types 555 https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/Amazon Bedrock User Guide Important For text classification, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets are recommended for use with the text classification task type. Women's E-Commerce Clothing Reviews Women's E-Commerce Clothing Reviews is a dataset that contains clothing reviews written by customers. This dataset is used in text classification tasks. The following table summarizes the metrics calculated, and recommended built-in datasets. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built-in datasets (API). 
Available built-in datasets in Amazon Bedrock Task type Metric Built-in datasets (console) Built- in datasets (API) Computed metric Accuracy Women's Ecommerce Clothing Reviews Builtin.W omensEcom merceClot hingBoolQ Accuracy (Binary Accuracy from class ification_accuracy_score) Text classific ation Robustnes s Women's Ecommerce Clothing Reviews Builtin.W omensEcom merceClot hingBoolQ classification_accuracy_score and delta_ classification_accuracy_score Model evaluation task types 556 https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviewsAmazon Bedrock User Guide To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Use prompt datasets for model evaluation in Amazon Bedrock To create a model evaluation job you must specify a prompt dataset the model uses during inference. Amazon Bedrock provides built-in datasets that can be used in automatic model evaluations, or you can bring your own prompt dataset. For model evaluation jobs that use human workers you must use your own prompt dataset. Use the following sections to learn more about available built-in prompt datasets and creating your custom prompt datasets. To learn more about creating your first model evaluation job in Amazon Bedrock, see Choose the best performing model using Amazon Bedrock evaluations. Topics ? 
Use built-in prompt datasets for automatic model evaluation in Amazon Bedrock ? Use custom prompt dataset for model evaluation in Amazon Bedrock Use built-in prompt datasets for automatic model evaluation in Amazon Bedrock Amazon Bedrock provides multiple built-in prompt datasets that you can use in an automatic model evaluation job. Each built-in dataset is based off an open-source dataset. We have randomly down sampled each open-source dataset to include only 100 prompts. When you create an automatic model evaluation job and choose a Task type Amazon Bedrock provides you with a list of recommended metrics. For each metric, Amazon Bedrock also provides recommended built-in datasets. To learn more about available task types, see Model evaluation task types in Amazon Bedrock. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. Prompt datasets for model evaluation 557Amazon Bedrock User Guide RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity., row=3Amazon Bedrock User Guide Task type MetricBuilt-in datasets Computed metric Robustnes s Gigaword BERTScore and deltaBERTScore BoolQ NaturalQu estions Accuracy TriviaQA NLP-F1 BoolQ NaturalQu estions Robustnes s TriviaQA F1 and deltaF1 BoolQ NaturalQu estions Question and answer Toxicity TriviaQA Toxicity AccuracyWomen's Ecommerce Clothing Reviews Accuracy (Binary accuracy from classification_accuracy_s core) Text classific ation Robustnes s Women's Ecommerce Clothing Reviews classification_accuracy_score and delta_classifica tion_accuracy_score Topics ? 
General text generation for model evaluation in Amazon Bedrock Model evaluation task types 550 https://huggingface.co/datasets/gigaword?row=3 https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://github.com/google-research-datasets/boolean-questions https://github.com/google-research-datasets/natural-questions https://github.com/google-research-datasets/natural-questions https://nlp.cs.washington.edu/triviaqa/ https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviewsAmazon Bedrock User Guide ? Text summarization for model evaluation in Amazon Bedrock ? Question and answer for model evaluation in Amazon Bedrock ? Text classification for model evaluation in Amazon Bedrock General text generation for model evaluation in Amazon Bedrock General text generation is a task used by applications that include chatbots. The responses generated by a model to general questions are influenced by the correctness, relevance, and bias contained in the text used to train the model. 
Important For general text generation, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". Model evaluation task types 551Amazon Bedrock User Guide WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation. The following table summarizes the metrics calculated, and recommended built-in dataset that are available for automatic model evaluation jobs., RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. 
T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". Model evaluation task types 551Amazon Bedrock User Guide WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation. The following table summarizes the metrics calculated, and recommended built-in dataset that are available for automatic model evaluation jobs. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built- in datasets (API). Available built-in datasets for general text generation in Amazon Bedrock Task type Metric Built-in datasets (Console) Built-in datasets (API) Computed metric Accuracy TREX Builtin.T-REx Real world knowledge (RWK) score BOLD Builtin.BOLD WikiText2 Builtin.W ikiText2 Robustnes s TREX Builtin.T-REx Word error rate RealToxicityPrompts Builtin.R ealToxici tyPrompts General text generation Toxicity BOLD Builtin.Bold Toxicity To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Text summarization for model evaluation in Amazon Bedrock Text summarization is used for tasks including creating summaries of news, legal documents, academic papers, content previews, and content curation. 
The ambiguity, coherence, bias, and Model evaluation task types 552 https://hadyelsahar.github.io/t-rex/ https://github.com/amazon-science/bold https://huggingface.co/datasets/wikitext https://hadyelsahar.github.io/t-rex/ https://github.com/allenai/real-toxicity-prompts https://github.com/amazon-science/boldAmazon Bedrock User Guide fluency of the text used to train the model as well as information loss, accuracy, relevance, or context mismatch can influence the quality of responses. Important For text summarization, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in dataset is supported for use with the task summarization task type. Gigaword The Gigaword dataset consists of news article headlines. This dataset is used in text summarization tasks. The following table summarizes the metrics calculated, and recommended built-in dataset. To successfully specify the available built-in datasets using the AWS CLI, or a supported AWSSDK use the parameter names in the column, Built-in datasets (API). Available built-in datasets for text summarization in Amazon Bedrock Task type Metric Built-in datasets (console) Built-in datasets (API) Computed metric Accuracy Gigaword Builtin.Gigaword BERTScore Toxicity Gigaword Builtin.Gigaword Toxicity Text summariza tion Robustnes s Gigaword Builtin.Gigaword BERTScore and deltaBERT Score To learn more about how the computed metric for each built-in dataset is calculated, see Review model evaluation job reports and metrics in Amazon Bedrock Model evaluation task types 553 https://huggingface.co/datasets/gigaword?row=3 https://huggingface.co/datasets/gigaword?row=3 https://huggingface.co/datasets/gigaword?row=3Amazon Bedrock User Guide Question and answer for model evaluation in Amazon Bedrock Question and answer is used for tasks including generating automatic help-desk responses, information retrieval, and e-learning. 
If the text used to train the foundation model contains issues including incomplete or inaccurate data, sarcasm or irony, the quality of responses can deteriorate. Important For question and answer, there is a known system issue that prevents Cohere models from completing the toxicity evaluation successfully. The following built-in datasets are recommended for use with the question andg answer task type. BoolQ BoolQ is a dataset consisting of yes/no question and answer pairs.]","The following built-in datasets are available on Amazon Bedrock for model evaluation of text generation: - Bias in Open-ended Language Generation Dataset (BOLD): Evaluates fairness in general text generation across domains like profession, gender, race, religious ideologies, and political ideologies. - RealToxicityPrompts: Evaluates toxicity by attempting to get the model to generate racist, sexist, or otherwise toxic language. - T-Rex: A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX): Consists of Knowledge Base Triples extracted from Wikipedia to evaluate real-world knowledge. - WikiText2: Contains prompts used in general text generation.","The following built-in datasets contain prompts that are well-suited for use in general text generation tasks. Bias in Open-ended Language Generation Dataset (BOLD) The Bias in Open-ended Language Generation Dataset (BOLD) is a dataset that evaluates fairness in general text generation, focusing on five domains: profession, gender, race, religious ideologies, and political ideologies. It contains 23,679 different text generation prompts. RealToxicityPrompts RealToxicityPrompts is a dataset that evaluates toxicity. It attempts to get the model to generate racist, sexist, or otherwise toxic language. This dataset contains 100,000 different text generation prompts. 
T-Rex : A Large Scale Alignment of Natural Language with Knowledge Base Triples (TREX) TREX is dataset consisting of Knowledge Base Triples (KBTs) extracted from Wikipedia. KBTs are a type of data structure used in natural language processing (NLP) and knowledge representation. They consist of a subject, predicate, and object, where the subject and object are linked by a relation. An example of a Knowledge Base Triple (KBT) is ""George Washington was the president of the United States"". The subject is ""George Washington"", the predicate is ""was the president of"", and the object is ""the United States"". WikiText2 WikiText2 is a HuggingFace dataset that contains prompts used in general text generation.",0.538792


In [49]:
ground_truth

["Access to Amazon Bedrock foundation models isn't granted by default. You can request access, or modify access, to foundation models only by using the Amazon Bedrock console. First, make sure the IAM role that you use has\xa0sufficent IAM permissions\xa0to manage access to foundation models. Then, add or remove access to a model by following the instructions at\xa0Add or remove access to Amazon Bedrock foundation models.",
 'amazon.titan-text-premier-v1:0',
 'Anthropic Claude Instant v1.2, Anthropic Claude v2, Anthropic Claude v2.1',
 'You can configure the following policies in a guardrail to avoid undesirable and harmful content and remove sensitive information for privacy protection. Content filters\xa0– Adjust filter strengths to block input prompts or model responses containing harmful content.\nDenied topics\xa0– Define a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses.\nWord filte

### <a >Challenge Exercise :: Try it Yourself! </a>


<div style="border: 4px solid coral; text-align: left; margin: auto;">
    <br>
    <p style="text-align: center; margin: auto;"><b>Try the following exercises on this lab and note the observations.</b></p>
<p style=" text-align: left; margin: auto;">
<ol>
    <li>Test the RAG-based LLM with more questions about Amazon Bedrock. </li>
<li>Look at the citations or retrieved references and check whether the answer generated by the RAG chatbot aligns with these retrieved contexts. What response do you get when the retrieved context comes up empty? </li>
<li>Apply system prompts to RAG as well as Amazon Bedrock Guardrails, and test which is more consistent in blocking responses when the model response is hallucinated. </li>
<li>Run the tutorial for RAGChecker and compare its results with those of the RAGAS evaluation framework: https://github.com/amazon-science/RAGChecker/blob/main/tutorial/ragchecker_tutorial_en.md </li>
</ol>
<br>
</p>
</div>
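
For the second exercise, a quick first pass before reading each citation by hand is to measure how many of the answer's words also occur in the retrieved contexts. The sketch below is a crude lexical heuristic only; it is not how RAGAS or Guardrails score grounding. The function names and the example strings are hypothetical and do not come from the lab code.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_overlap(answer, contexts):
    """Fraction of answer tokens that also appear in any retrieved context.

    Returns 0.0 when the answer is empty or no context was retrieved,
    which makes the empty-retrieval case easy to spot.
    """
    answer_tokens = tokenize(answer)
    context_tokens = set()
    for chunk in contexts:
        context_tokens |= tokenize(chunk)
    if not answer_tokens or not context_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)


# Hypothetical example values, not taken from the lab output.
example_contexts = ["Content filters adjust filter strengths to block harmful content."]
print(context_overlap("Content filters block harmful content.", example_contexts))
print(context_overlap("Guardrails cost ten dollars.", []))
```

A low score does not prove a hallucination, and a high score does not prove faithfulness, since paraphrases and negations defeat word matching. Treat it as a way to decide which answers to inspect first.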


## Conclusion
We now understand which parameters influence hallucinations in Large Language Models. We learned how to set up Retrieval Augmented Generation to give the model context when it answers.
We used contextual grounding in Amazon Bedrock Guardrails to intervene when hallucinations are detected.
Finally, we looked at the RAGAS metrics and how to use them to measure hallucinations in your RAG-powered chatbot.
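
As a recap of the contextual grounding step, the sketch below shows one way such a check could be run on its own with the `ApplyGuardrail` operation of the Bedrock runtime API. It assumes a guardrail with a contextual grounding policy already exists. The helper names and the guardrail ID and version arguments are placeholders for illustration, and the lab itself may wire the guardrail in differently.

```python
def build_grounding_content(source_text, query, answer):
    """Build the content list for a contextual grounding check.

    Each text block carries a qualifier that names its role:
    the reference source, the user query, and the answer to be guarded.
    """
    return [
        {"text": {"text": source_text, "qualifiers": ["grounding_source"]}},
        {"text": {"text": query, "qualifiers": ["query"]}},
        {"text": {"text": answer, "qualifiers": ["guard_content"]}},
    ]


def guardrail_intervened(guardrail_id, guardrail_version, source_text, query, answer):
    """Return True if the guardrail intervened on the answer.

    guardrail_id and guardrail_version are placeholders for your own guardrail.
    """
    import boto3  # imported lazily so the payload helper works without AWS access

    client = boto3.client("bedrock-runtime")
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="OUTPUT",  # we are checking a model response, not a user prompt
        content=build_grounding_content(source_text, query, answer),
    )
    return response["action"] == "GUARDRAIL_INTERVENED"
```

Separating the payload builder from the API call keeps the part that needs AWS credentials small, so the request shape can be inspected or unit-tested offline.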

In the next lab, we will:
1. Build a custom hallucination detector
2. Use Amazon Bedrock Agents to intervene when hallucinations are detected
3. Call a human for support when the LLM hallucinates
