# Applying Bedrock Guardrails to the Qwen3-4B-Instruct Model

----
Guardrails can be used to implement safeguards for your generative AI applications that are customized to your use cases and aligned with your responsible AI policies. Guardrails allows you to:

- Configure denied topics
- Filter harmful content
- Remove sensitive information


## The `ApplyGuardrail` API allows you to assess any text using pre-configured Bedrock Guardrails, without invoking the foundation models.

### Key Features:

1. **Content Validation**: Send any text input or output to the ApplyGuardrail API to have it evaluated against your defined topic avoidance rules, content filters, PII detectors, and word blocklists. You can evaluate user inputs and FM generated outputs independently.

2. **Flexible Deployment**: Integrate the Guardrails API anywhere in your application flow to validate data before processing or serving results to users. For example, in a RAG application you can evaluate the user input before performing the retrieval instead of waiting until the final response generation.

3. **Decoupled from Foundation Models**: ApplyGuardrail is decoupled from foundation models, so you can use Guardrails without invoking one.

You can use the assessment results to design the experience of your generative AI application. Let's now walk through a code sample.

## Prerequisites

In [None]:
%pip install -r ./scripts/requirements.txt

## This cell will restart the kernel. Click "OK".

In [None]:
from IPython import get_ipython
get_ipython().kernel.do_shutdown(True)

***

In [None]:
import boto3
import json
from botocore.exceptions import ClientError
from typing import Dict, Any
import sagemaker

sess = sagemaker.Session()
bedrock_client = boto3.client('bedrock', region_name=sess.boto_region_name)
bedrock_runtime = boto3.client('bedrock-runtime', region_name=sess.boto_region_name)

In [None]:
%store -r TUNED_ENDPOINT_NAME

# set the endpoint name manually by uncommenting below
#TUNED_ENDPOINT_NAME = ""  # Update with Fine-tuned model endpoint name

print(f"Tuned Endpoint: {TUNED_ENDPOINT_NAME}")

## Create a guardrail

Before you can apply a guardrail, you need to create one in Amazon Bedrock. We will create a guardrail that blocks input prompts and model responses involving non-medical advice, unverified medical information, or unsupported cure claims. It also anonymizes PII data and blocks generally harmful content.

In [None]:
import uuid

# Generate a unique client request token
client_request_token = str(uuid.uuid4())

# Create a Guardrail with specific filtering and compliance policies for medical use-case
response = bedrock_client.create_guardrail(
    name="MedicalContextGuardrails",
    description="Restrict responses to verified medical content only",
    blockedInputMessaging="Input Blocked: Sorry, I cannot provide non-medical advice, confirm/deny unverified medical information, or make any claims of guaranteed/definitive cures without sufficient evidence.",
    blockedOutputsMessaging="Output Blocked: Sorry, I cannot provide non-medical advice, confirm/deny unverified medical information, or make any claims of guaranteed/definitive cures without sufficient evidence.",

    # Topic-based restrictions (e.g., denying non-medical advice)
    topicPolicyConfig={
        'topicsConfig': [
            {'name': 'non-medical-advice', 'definition': 'Any recommendations outside medical expertise or context', 'type': 'DENY'},
            {'name': 'misinformation', 'definition': 'Dissemination of inaccurate or unverified medical information', 'type': 'DENY'},
            {'name': 'medical-cure-claims', 'definition': 'Claims of guaranteed or definitive cures for medical conditions without sufficient evidence', 'type': 'DENY'}
        ]
    },

    # Content filtering policies (e.g., blocking harmful or unethical content)
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'INSULTS', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'MISCONDUCT', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'PROMPT_ATTACK', 'inputStrength': 'HIGH', 'outputStrength': 'NONE'}
        ]
    },

    # List of restricted words related to sensitive medical topics
    wordPolicyConfig={
        # Example: blocking inappropriate usage of critical medical terms
        'wordsConfig': [
            {'text': "malpractice"}, {'text': "misdiagnosis"}, {'text': "unauthorized treatment"},
            {'text': "experimental drug"}, {'text': "unapproved therapy"}, {'text': "medical fraud"},
            {'text': "cure"}, {'text': "guaranteed cure"}, {'text': "permanent remission"}
        ]
    },

    # Sensitive data anonymization (e.g., patient information)
    sensitiveInformationPolicyConfig={
        # Anonymize identifiable patient information
        'piiEntitiesConfig': [
            {'type': "NAME", "action": "ANONYMIZE"}, {'type': "EMAIL", "action": "ANONYMIZE"},
            {'type': "PHONE", "action": "ANONYMIZE"}, {'type': "US_SOCIAL_SECURITY_NUMBER", "action": "ANONYMIZE"},
            {'type': "ADDRESS", "action": "ANONYMIZE"}, {'type': "CA_HEALTH_NUMBER", "action": "ANONYMIZE"},
            {'type': "PASSWORD", "action": "ANONYMIZE"}, {'type': "IP_ADDRESS", "action": "ANONYMIZE"},
            {'type': "CA_SOCIAL_INSURANCE_NUMBER", "action": "ANONYMIZE"}, {'type': "CREDIT_DEBIT_CARD_NUMBER", "action": "ANONYMIZE"},
            {'type': "AGE", "action": "ANONYMIZE"}, {'type': "US_BANK_ACCOUNT_NUMBER", "action": "ANONYMIZE"}
        ],
        # Example regex patterns for anonymizing sensitive medical data
        'regexesConfig': [
            {
                "name": "medical_procedure_code",
                "description": "Pattern for medical procedure codes",
                "pattern": "\\b[A-Z]{1,5}\\d{1,5}\\b",
                "action": "ANONYMIZE"
            },
            {
                "name": "clinical_trial_id",
                "description": "Pattern for clinical trial identifiers",
                "pattern": "\\bNCT\\d{8}\\b",
                "action": "ANONYMIZE"
            }
        ]
    },

    # Tags for environment tracking
    tags=[
        {"key": "Environment", "value": "Production"},
        {"key": "Department", "value": "Medical"}
    ],
    clientRequestToken=client_request_token,
)

# Retrieve and print the Guardrail ID, ARN, and version
guardrail_id = response['guardrailId']

print(f"Guardrail ID: {guardrail_id}")
print(f"Guardrail ARN: {response['guardrailArn']}")
print(f"Version: {response['version']}")
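
The two custom regex patterns above can be sanity-checked locally before the guardrail is created. The sketch below is only an approximation: it uses Python's `re` module, and the regex engine Bedrock applies may behave differently. The sample sentence and the code `CPT99213` are made up for illustration.

```python
import re

# Same patterns as in regexesConfig, written as raw strings for Python.
PROCEDURE_CODE = re.compile(r"\b[A-Z]{1,5}\d{1,5}\b")
CLINICAL_TRIAL_ID = re.compile(r"\bNCT\d{8}\b")

# Hypothetical sample sentence, used only for this local check.
sample = "Trial NCT12345678 used procedure code CPT99213."

trial_matches = CLINICAL_TRIAL_ID.findall(sample)
procedure_matches = PROCEDURE_CODE.findall(sample)

print("clinical_trial_id matches:", trial_matches)
print("medical_procedure_code matches:", procedure_matches)
```

In this check, `medical_procedure_code` does not match `NCT12345678`, because the pattern allows at most five digits before the word boundary.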

Next, publish the draft of the guardrail so it can be used.

In [None]:
import time
# First create a published version
version_response = bedrock_client.create_guardrail_version(
    guardrailIdentifier=response['guardrailId'],
    description="Production version 1.0"
)

time.sleep(10)

guardrail_version = version_response['version']

print(f"Guardrail published with version: {guardrail_version}")
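
A fixed sleep is often enough, but polling the guardrail status is an alternative. The following is a minimal sketch, assuming the `get_guardrail` response carries a `status` field that becomes `READY` once the version can be used. The helper name `wait_for_guardrail_ready` and its timeout parameters are hypothetical and not part of the original notebook.

```python
import time


def wait_for_guardrail_ready(client, guardrail_id, guardrail_version,
                             timeout_s=120, poll_s=5, sleep=time.sleep):
    """Poll get_guardrail until the version reports READY, or raise."""
    waited = 0
    while True:
        status = client.get_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
        )["status"]
        if status == "READY":
            return status
        if status == "FAILED":
            raise RuntimeError(
                f"Guardrail {guardrail_id} version {guardrail_version} failed"
            )
        if waited >= timeout_s:
            raise TimeoutError(
                f"Guardrail still in status {status} after {timeout_s}s"
            )
        # Wait before asking again so the API is not hammered.
        sleep(poll_s)
        waited += poll_s
```

In this notebook it could be called as `wait_for_guardrail_ready(bedrock_client, guardrail_id, guardrail_version)` right after `create_guardrail_version`.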

With a guardrail in place, you can now test its effectiveness. Processing an input query requires no model: the `apply_guardrail` API can be invoked on the incoming request alone. The response contains an `action` field showing whether the guardrail intervened, which you can use to decide whether to send the request to the downstream LLM or take some other action.

### Example: Blocking Medical Advice
In this example, the input prompt requests guidance on cures for COVID-19. Because the guardrail denies unverified medical information and claims of guaranteed cures, this request is expected to be blocked.

In [None]:
# Example of Input Prompt being Analyzed
content = [
    {
        "text": {
            "text": "Are there any cures for COVID-19?"
        }
    }
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='INPUT',  # or 'OUTPUT' depending on your use case
        content=content
    )
    
    # Process the response
    #print("API Response:")
    #print(json.dumps(response, indent=2))
    
    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("\nGuardrail intervened. \n\nOutput:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("\nGuardrail did not intervene.")
    
except Exception as e:
    # If the call failed, no new response was assigned, so only the error is reported
    print(f"An error occurred: {e}")
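
When the guardrail intervenes, the `assessments` list in the response records which policies matched. The helper below is a rough sketch of how those entries could be summarized. The function name `intervention_reasons` is hypothetical, and the sample dictionary is hand-written for illustration rather than captured from a real call.

```python
from typing import Any, Dict, List


def intervention_reasons(response: Dict[str, Any]) -> List[str]:
    """Collect short labels for the topic, content, and word matches."""
    reasons = []
    for assessment in response.get("assessments", []):
        # Denied topics that matched
        for topic in assessment.get("topicPolicy", {}).get("topics", []):
            reasons.append(f"topic:{topic.get('name')}")
        # Content filters (HATE, VIOLENCE, ...) that matched
        for flt in assessment.get("contentPolicy", {}).get("filters", []):
            reasons.append(f"content:{flt.get('type')}")
        # Custom blocked words that matched
        for word in assessment.get("wordPolicy", {}).get("customWords", []):
            reasons.append(f"word:{word.get('match')}")
    return reasons


# Hand-written sample in the general shape of an ApplyGuardrail response.
sample_response = {
    "action": "GUARDRAIL_INTERVENED",
    "assessments": [
        {
            "topicPolicy": {
                "topics": [
                    {"name": "medical-cure-claims", "type": "DENY", "action": "BLOCKED"}
                ]
            },
            "wordPolicy": {"customWords": [{"match": "cure", "action": "BLOCKED"}]},
        }
    ],
}
print(intervention_reasons(sample_response))
```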


### Example: Anonymization of PII in responses.

The guardrail in this example is configured to detect and anonymize a variety of PII fields:
- NAME
- EMAIL
- PHONE
- US_SOCIAL_SECURITY_NUMBER
- ADDRESS
- CA_HEALTH_NUMBER
- PASSWORD
- IP_ADDRESS
- CA_SOCIAL_INSURANCE_NUMBER
- CREDIT_DEBIT_CARD_NUMBER
- AGE
- US_BANK_ACCOUNT_NUMBER

In the input content there are 3 entries:
- grounding_source: ground-truth context for the model to base its response on (simulated here)
- query: the user input query
- guard_content: the model output (simulated here)

This guardrail is going to be applied on the model **output** in this example, which means it won't affect the inputs at all.

A full list of all available types is available in the [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-sensitive-filters.html).

In [None]:
# An example of analyzing an output response, this time using contextual grounding qualifiers

content = [
    {
        "text": {
            "text": "Clinical Trial NCT12345678 was performed by Dr. Vivek Murthy of Gainesville, Florida in 2022. Throughout a series of double-blind trials he was able to show the effects of catnip improving the cognitive functions of senior cats by 20%, but only for a few minutes.",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "Can you provide the contact information, including the phone number and email address, for Dr. Vivek Murthy, who led the clinical trial NCT12345678?",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Dr. Vivek Murthy is based in Gainesville, Florida.",
            "qualifiers": ["guard_content"],
        }
    },
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT',  # or 'INPUT' depending on your use case
        content=content
    )
    
    # Process the response
    #print("API Response:")
    #print(json.dumps(response, indent=2))
    
    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("\nGuardrail intervened. \n\nOutput:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("\nGuardrail did not intervene.")

except Exception as e:
    # If the call failed, no new response was assigned, so only the error is reported
    print(f"An error occurred: {e}")
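
To see which sensitive-information rules fired, the `sensitiveInformationPolicy` part of each assessment can be inspected. This is a small sketch under the assumption that matched PII entities and custom regexes are reported there with a `type` or `name` and an `action`. The helper name `pii_findings` and the sample data (including the name "Jane Doe") are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def pii_findings(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (entity type or regex name, action) pairs from the assessments."""
    findings = []
    for assessment in response.get("assessments", []):
        policy = assessment.get("sensitiveInformationPolicy", {})
        # Built-in PII entity types such as NAME or EMAIL
        for entity in policy.get("piiEntities", []):
            findings.append((entity.get("type"), entity.get("action")))
        # Custom regex rules such as clinical_trial_id
        for regex in policy.get("regexes", []):
            findings.append((regex.get("name"), regex.get("action")))
    return findings


# Hand-written sample for illustration only.
sample_pii_response = {
    "assessments": [
        {
            "sensitiveInformationPolicy": {
                "piiEntities": [
                    {"type": "NAME", "match": "Jane Doe", "action": "ANONYMIZED"}
                ],
                "regexes": [
                    {"name": "clinical_trial_id", "match": "NCT12345678", "action": "ANONYMIZED"}
                ],
            }
        }
    ]
}
print(pii_findings(sample_pii_response))
```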

## Using ApplyGuardrail API with a Third-Party or Self-Hosted Model

A common use case for the ApplyGuardrail API is pairing it with a language model from a non-Amazon Bedrock provider, or with a model that you self-host. This combination allows you to apply guardrails to the input or output of any request.

The general flow would be:
1. Receive an input for your Model
2. Apply the guardrail to this input using the ApplyGuardrail API
3. If the input passes the guardrail, send it to your Model for Inference
4. Receive the output from your Model
5. Apply the Guardrail to your output
6. Return the final (potentially modified) output
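
The six steps above can be sketched with plain callables, independent of any particular model or guardrail client. This is an illustrative outline only: `guarded_inference` and the stub functions are hypothetical names, and each check is assumed to return a `(passed, text)` pair where `text` is the message or modified output to use when the check does not pass.

```python
from typing import Callable, Tuple

GuardCheck = Callable[[str], Tuple[bool, str]]


def guarded_inference(prompt: str,
                      check_input: GuardCheck,
                      invoke_model: Callable[[str], str],
                      check_output: GuardCheck) -> str:
    """Run input check, model call, and output check in sequence."""
    # Steps 1-2: receive the input and apply the guardrail to it
    passed, text = check_input(prompt)
    if not passed:
        return text  # blocked: return the guardrail message instead

    # Steps 3-4: send the prompt to the model and receive its output
    raw_output = invoke_model(prompt)

    # Steps 5-6: apply the guardrail to the output and return the result
    passed, text = check_output(raw_output)
    return raw_output if passed else text


# Stubs standing in for the real guardrail and model calls.
def allow_all(text: str) -> Tuple[bool, str]:
    return True, text


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


print(guarded_inference("hello", allow_all, echo_model, allow_all))
```

Passing the checks and the model as arguments keeps the flow testable without AWS credentials, and the stubs can later be swapped for real calls.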

### Here's a diagram illustrating this process:

<div style="text-align: center;">
    <img src="images/applyguardrail.png" alt="ApplyGuardrail API Flow" style="max-width: 100%;">
</div>

Let's walk through this with a code example that demonstrates this process.

### These examples use a SageMaker-hosted model endpoint, but this could be any third-party model as well

We will use the `Qwen3-4B-Instruct-2507` model that we deployed earlier on a SageMaker Endpoint. 

### Incorporating the ApplyGuardrail API into our Self-Hosted Model

---
We've created a `TextGenerationWithGuardrails` class that integrates the ApplyGuardrail API with our SageMaker endpoint to ensure protected text generation. This class includes the following key methods:

1. `generate_text`: Calls our Language Model via a SageMaker endpoint to generate text based on the input.

2. `analyze_text`: A core method that applies our guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.

3. `analyze_prompt` and `analyze_output`: These methods use `analyze_text` to apply our guardrail to the input prompt and generated output, respectively. They return a tuple containing a pass/fail flag, any associated message, and the full API response.

The class implements the flow shown in the diagram above. It works as follows:

1. It first checks the input prompt using `analyze_prompt`.
2. If the input passes the guardrail, it generates text using `generate_text`.
3. The generated text is then checked using `analyze_output`.
4. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It's designed to easily integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.

In [None]:
from botocore.exceptions import ClientError
from typing import Tuple, List, Dict, Any
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

class TextGenerationWithGuardrails:
    def __init__(self, endpoint_name: str, guardrail_id: str, guardrail_version: str, sagemaker_session=None):
        """
        Initialize the text generation class with guardrails.
        
        Args:
            endpoint_name: The SageMaker endpoint name
            guardrail_id: The AWS Bedrock guardrail ID
            guardrail_version: The AWS Bedrock guardrail version
            sagemaker_session: SageMaker session object
        """
        # Create predictor directly instead of using retrieve_default
        self.predictor = Predictor(
            endpoint_name=endpoint_name,
            sagemaker_session=sagemaker_session,
            serializer=JSONSerializer(),
            deserializer=JSONDeserializer()
        )
        self.bedrock_runtime = boto3.client('bedrock-runtime')
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version

    def generate_text(self, inputs: str, max_new_tokens: int = 256, temperature: float = 0.0) -> str:
        """Generate text using the specified SageMaker endpoint."""

        messages = [{"role": "user", "content": inputs}]
        
        payload = {
            "messages": messages,
            "parameters": {
                "max_new_tokens": max_new_tokens,
                "temperature": temperature,
                # Qwen's end-of-turn token; adjust if your serving container expects a different stop sequence
                "stop": "<|im_end|>"
            }
        }
    
        response = self.predictor.predict(payload)
        return response["choices"][0]["message"]["content"]

    def analyze_text(self, grounding_source: str, query: str, guard_content: str, source: str) -> Tuple[bool, str, Dict[str, Any]]:
        """
        Analyze text using the ApplyGuardrail API with contextual grounding.
        Returns a tuple (passed, message, details) where:
        - passed is a boolean indicating if the guardrail passed,
        - message is either the guardrail message or an empty string,
        - details is a dictionary containing the full API response for further analysis if needed.
        """
        try:
            content = [
                {
                    "text": {
                        "text": grounding_source,
                        "qualifiers": ["grounding_source"]
                    }
                },
                {
                    "text": {
                        "text": query,
                        "qualifiers": ["query"]
                    }
                },
                {
                    "text": {
                        "text": guard_content,
                        "qualifiers": ["guard_content"]
                    }
                }
            ]

            response = self.bedrock_runtime.apply_guardrail(
                guardrailIdentifier=self.guardrail_id,
                guardrailVersion=self.guardrail_version,
                source=source,
                content=content
            )
            
            action = response.get("action", "")
            if action == "NONE":
                return True, "", response
            elif action == "GUARDRAIL_INTERVENED":
                # Fall back to a placeholder entry if the outputs list is missing or empty
                outputs = response.get("outputs") or [{}]
                message = outputs[-1].get("text", "Guardrail intervened")
                return False, message, response
            else:
                return False, f"Unknown action: {action}", response
        except ClientError as e:
            print(f"Error applying guardrail: {e}")
            raise

    def analyze_prompt(self, grounding_source: str, query: str) -> Tuple[bool, str, Dict[str, Any]]:
        """Analyze the input prompt."""
        return self.analyze_text(grounding_source, query, query, "INPUT")

    def analyze_output(self, grounding_source: str, query: str, generated_text: str) -> Tuple[bool, str, Dict[str, Any]]:
        """Analyze the generated output."""
        return self.analyze_text(grounding_source, query, generated_text, "OUTPUT")

    def generate_and_analyze(self, grounding_source: str, query: str, max_new_tokens: int = 256, temperature: float = 0.0) -> Tuple[bool, str, str]:
        """
        Generate text and analyze it with guardrails.
        Returns a tuple (passed, message, generated_text) where:
        - passed is a boolean indicating if the guardrail passed,
        - message is either the guardrail message or an empty string,
        - generated_text is the text generated by the model (if guardrail passed) or an empty string.
        """
        # First, analyze the prompt
        prompt_passed, prompt_message, _ = self.analyze_prompt(grounding_source, query)
        if not prompt_passed:
            return False, prompt_message, ""

        # If prompt passes, generate text
        generated_text = self.generate_text(query, max_new_tokens, temperature)

        # Analyze the generated text
        output_passed, output_message, _ = self.analyze_output(grounding_source, query, generated_text)
        if not output_passed:
            return False, output_message, ""

        return True, "", generated_text

### Examples

The following examples let you test guardrail functionality with your SageMaker-hosted FM.

The `test_generation_with_guardrail` function defined below takes a `TextGenerationWithGuardrails` instance along with model inputs. It checks the inputs against the supplied guardrail, sends them to your FM if they pass, then checks the model outputs against the guardrail before returning a final response.

In [None]:
# Bold text function
def bold(text):
    return f"\033[1m{text}\033[0m"

def test_generation_with_guardrail(text_gen: TextGenerationWithGuardrails, query, grounding_source="", max_new_tokens=512, temperature=0.0, print_api_responses=False):
    # Analyze input
    print(bold("\n=== Input Analysis ===\n"))
    input_passed, input_message, input_details = text_gen.analyze_prompt(grounding_source, query)
    if not input_passed:
        print(f"Input Guardrail Intervened. \n\nThe response to the User is: \n\n{input_message}\n")
        if print_api_responses:
            print("Full API Response:")
            print(json.dumps(input_details, indent=2))
        print()
    else:
        print("Input Prompt Passed The Guardrail Check - Moving to Generate the Response\n")
    
        # Generate text
        print(bold("\n=== Text Generation ===\n"))
        generated_text = text_gen.generate_text(query, max_new_tokens=max_new_tokens, temperature=temperature)
        print(f"Here is what the Model Responded with: \n\n{generated_text}\n")
        
        # Analyze output
        print(bold("\n=== Output Analysis ===\n"))
        print("Analyzing Model Response with the Response Guardrail\n")
        output_passed, output_message, output_details = text_gen.analyze_output(grounding_source, query, generated_text)
        if not output_passed:
            print(f"Output Guardrail Intervened. \n\nThe response to the User is: \n\n{output_message}\n")
            if print_api_responses:
                print("Full API Response:")
                print(json.dumps(output_details, indent=2))
            print()
        else:
            print(f"Model Response Passed. The information presented to the user is: \n\n{generated_text}\n")

Initialize the `TextGenerationWithGuardrails` class with the SageMaker endpoint and Guardrail, then test with a variety of scenarios.

In [None]:
text_gen = TextGenerationWithGuardrails(
    endpoint_name=TUNED_ENDPOINT_NAME,
    guardrail_id=guardrail_id,
    guardrail_version=guardrail_version
)

In [None]:
test_generation_with_guardrail(
    text_gen,
    query="Is there a cure for COVID-19?",
    max_new_tokens=512,
    temperature=0.0)

In [None]:
test_generation_with_guardrail(
    text_gen,
    query="Can you provide the contact information, including the phone number and email address, for Dr. Vivek Murthy, who led the clinical trial NCT12345678?",
    max_new_tokens=512,
    temperature=0.0)

In [None]:
test_generation_with_guardrail(
    text_gen,
    query="Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?",
    max_new_tokens=512,
    temperature=0.0)

Congratulations! You've successfully implemented a guardrail for your model to help protect the inputs and outputs of your application. Continue to the clean up section.

## Clean Up

In [None]:
bedrock_client.delete_guardrail(guardrailIdentifier=guardrail_id)

In [None]:
sagemaker_client = boto3.client('sagemaker')

delete_sft_response = sagemaker_client.delete_endpoint(
    EndpointName=TUNED_ENDPOINT_NAME
)

print(delete_sft_response)

In [None]:
# Assumes the endpoint configuration was created with the same name as the endpoint
delete_sftcfg_response = sagemaker_client.delete_endpoint_config(
    EndpointConfigName=TUNED_ENDPOINT_NAME
)
print(delete_sftcfg_response)