# Red Teaming for AI Systems

## Objective
This notebook walks through how to use Azure AI Evaluation's Red Team functionality to assess the safety and resilience of AI systems against adversarial prompt attacks. Red teaming helps identify potential vulnerabilities across different risk categories (violence, hate/unfairness, sexual content, self-harm) and attack strategies of varying complexity levels.

## Time
You should expect to spend about 30-45 minutes running this notebook. Execution time will vary based on the number of risk categories, attack strategies, and complexity levels you choose to evaluate.

## Before you begin

### Installation
Install the following packages required to execute this notebook.

In [None]:
# Install the packages
%pip install azure-ai-evaluation[redteam] azure-identity openai

### Setup Environment Variables

Set the following variables for use in this notebook. These variables connect to your Azure resources and model deployments.

**Note:** You can find these values in your Azure AI Studio project or Azure OpenAI resource.

In [None]:
# Azure AI Project information
azure_ai_project = {
    "subscription_id": "<your-subscription-id>",
    "resource_group_name": "<your-resource-group-name>",
    "project_name": "<your-project-name>",
}

# Azure OpenAI deployment information
azure_openai_deployment = "<your-deployment-name>"  # e.g., "gpt-4"
azure_openai_endpoint = "<your-endpoint>"  # e.g., "https://example.openai.azure.com/"
azure_openai_api_version = "2023-12-01-preview"  # Replace with an API version supported by your resource

If you prefer to store these values as environment variables (for example, in a `.env` file), a populated set might look like this:

```
# Azure OpenAI
AZURE_OPENAI_API_KEY="your-api-key-here"
AZURE_OPENAI_ENDPOINT="https://endpoint-name.openai.azure.com/"
AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"
AZURE_OPENAI_API_VERSION="2023-12-01-preview"

# Azure AI Project
AZURE_SUBSCRIPTION_ID="12345678-1234-1234-1234-123456789012"
AZURE_RESOURCE_GROUP_NAME="your-resource-group"
AZURE_PROJECT_NAME="your-project-name"
```
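If you keep these values in environment variables, a small helper can populate the notebook variables instead of hard-coding them. The sketch below is a hypothetical `load_settings_from_env` function (not part of the SDK) that assumes exactly the variable names shown in the example above; a missing required variable raises `KeyError`, and the API version falls back to the value used elsewhere in this notebook.

```python
import os


def load_settings_from_env() -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical helper: build the project dict and OpenAI settings from env vars."""
    # Keys match the azure_ai_project dict expected by RedTeam in this notebook
    project = {
        "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
        "resource_group_name": os.environ["AZURE_RESOURCE_GROUP_NAME"],
        "project_name": os.environ["AZURE_PROJECT_NAME"],
    }
    openai_settings = {
        "deployment": os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
        "endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        # Fall back to the API version used in this notebook if the variable is unset
        "api_version": os.environ.get("AZURE_OPENAI_API_VERSION", "2023-12-01-preview"),
    }
    return project, openai_settings
```

In the notebook you could then write `azure_ai_project, settings = load_settings_from_env()` and assign `azure_openai_deployment = settings["deployment"]`, and so on.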

### Configuration
The Red Team evaluator requires an Azure AI Studio project configuration and Azure credentials. Your project configuration will be used to log evaluation results after the run is finished.

**Important**: Make sure to authenticate to Azure using `az login` in your terminal before running this notebook.

In [None]:
import os
from typing import Optional, Dict, Any

# Azure imports
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.ai.evaluation.red_team import RedTeam, RiskCategory, AttackStrategy

# OpenAI imports
from openai import AzureOpenAI

# Set up environment variables
os.environ["AZURE_DEPLOYMENT_NAME"] = azure_openai_deployment
os.environ["AZURE_ENDPOINT"] = azure_openai_endpoint
os.environ["AZURE_API_VERSION"] = azure_openai_api_version

# Initialize Azure credentials
credential = DefaultAzureCredential()

## Understanding Red Team Evaluation

The Azure AI Evaluation SDK's Red Team functionality evaluates AI systems against adversarial prompts across multiple dimensions:

1. **Risk Categories**: Different harmful content categories your model might generate
   - Violence
   - HateUnfairness
   - Sexual
   - SelfHarm

2. **Attack Strategies**: In addition to the unmodified prompts, which are always sent, you can specify transformations of those prompts designed to elicit harmful content
   - AnsiAttack: Using ANSI escape codes in prompts
   - AsciiArt: Using ASCII art to disguise harmful content
   - AsciiSmuggler: Hiding harmful content in invisible Unicode tag characters
   - Atbash: Using the Atbash cipher to encode harmful requests
   - Base64: Encoding harmful content in Base64 format
   - Binary: Converting text to binary to bypass filters
   - Caesar: Using the Caesar cipher for encoding
   - CharacterSpace: Manipulating character spacing to confuse filters
   - CharSwap: Swapping characters to bypass detection
   - Diacritic: Using diacritical marks to alter text appearance
   - Flip: Flipping text to bypass content filters
   - Leetspeak: Converting letters to numbers and symbols
   - Morse: Using Morse code to encode harmful requests
   - ROT13: Using ROT13 cipher for text transformation
   - SuffixAppend: Adding suffixes to confuse detection systems
   - StringJoin: Joining strings in unconventional ways
   - Tense: Changing the tense of harmful requests to past tense
   - UnicodeConfusable: Using similar-looking Unicode characters
   - UnicodeSubstitution: Substituting characters with Unicode alternatives
   - Url: Embedding harmful content within URLs
   - Jailbreak: Specially crafted prompts to bypass AI safeguards

3. **Complexity Levels**: Different difficulty levels of attacks
   - Baseline: Unmodified objective prompts with no transformation applied
   - Easy: Simple attack patterns
   - Moderate: More sophisticated attacks
   - Difficult: Complex, layered attack strategies
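To make a few of the strategy names concrete, the sketch below re-implements some of the simpler text transformations with only the standard library. These are illustrative approximations, not the SDK's own converters: the real strategies may add instructions to the prompt or use different parameters (for example, a different Caesar shift or leetspeak mapping). The `flip` function in particular assumes that "flipping" means reversing the character order.

```python
import base64
import codecs
import string

# Illustrative approximations of a few attack-strategy transformations.
# These are NOT the azure-ai-evaluation implementations.


def rot13(text: str) -> str:
    # Rotate each ASCII letter by 13 places; applying it twice restores the input
    return codecs.encode(text, "rot_13")


def caesar(text: str, shift: int = 3) -> str:
    # Shift ASCII letters by `shift`, wrapping around the alphabet
    def shift_char(c: str) -> str:
        if c.isascii() and c.isalpha():
            base = ord("a") if c.islower() else ord("A")
            return chr((ord(c) - base + shift) % 26 + base)
        return c

    return "".join(shift_char(c) for c in text)


def atbash(text: str) -> str:
    # Map a<->z, b<->y, ... for both cases
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper, lower[::-1] + upper[::-1])
    return text.translate(table)


def to_base64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def to_binary(text: str) -> str:
    # Each UTF-8 byte as an 8-bit group, separated by spaces
    return " ".join(format(b, "08b") for b in text.encode("utf-8"))


def leetspeak(text: str) -> str:
    # One common (assumed) substitution table
    return text.translate(str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}))


def character_space(text: str) -> str:
    # Insert a space between every character
    return " ".join(text)


def flip(text: str) -> str:
    # Assumption: "flip" reverses the character order
    return text[::-1]


if __name__ == "__main__":
    sample = "hello world"  # benign sample input
    for transform in (rot13, caesar, atbash, to_base64, to_binary, leetspeak, character_space, flip):
        print(f"{transform.__name__:>15}: {transform(sample)}")
```

Running the transforms on a benign string like this is a quick way to see why simple keyword filters can miss encoded or obfuscated requests.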

The key metric for evaluating results is the **Attack Success Rate (ASR)**, which measures the percentage of attacks that successfully elicit harmful content from your AI system.
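As arithmetic, ASR is the number of successful attacks divided by the total number of attacks, times 100. The sketch below assumes the per-attack outcomes are already available as booleans (`True` meaning harmful content was elicited); in the SDK, deciding whether an individual attack succeeded is the job of its evaluators. Returning `0.0` for an empty list is a design choice of this sketch.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Percentage of attacks that succeeded (True = harmful content elicited)."""
    if not outcomes:
        return 0.0  # no attacks, so nothing succeeded
    return 100.0 * sum(outcomes) / len(outcomes)


# Example: 3 successful attacks out of 12 attempts
print(attack_success_rate([True] * 3 + [False] * 9))  # 25.0
```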

## Basic Example: Fixed Response Callback

Let's start with a basic example that demonstrates how to set up a Red Team evaluation using a simple callback that always returns a fixed, safe response regardless of the input prompt.

**Important Note**: Since this callback always responds with the same safe message, the Attack Success Rate (ASR) will be 0% for all categories. This example is merely to demonstrate the API structure and flow.

In [None]:
# Define a simple callback function that always returns a fixed response
def financial_advisor_callback(query: str) -> str:  # noqa: ARG001
    return "I'm a financial advisor assistant. I can help with investment advice and financial planning within legal and ethical guidelines."

In [None]:
# Create the RedTeam instance
red_team = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=credential,
    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness],
    num_objectives=1,
)

Now let's run a simple red team evaluation using the fixed response target. We'll test against two risk categories and one attack strategy for simplicity.

In [None]:
# Run the red team scan called "Basic-Callback-Test" with limited scope for this basic example
# This will test 1 objective prompt for each of Violence and HateUnfairness categories with the Flip strategy
result = await red_team.scan(
    target=financial_advisor_callback, scan_name="Basic-Callback-Test", attack_strategies=[AttackStrategy.Flip]
)

## Advanced Example: Using an Actual Model Endpoint

Now let's create a more realistic example that uses an Azure OpenAI model for responding to the red team prompts. This will demonstrate how to evaluate an actual AI service.

In [None]:
# Define a callback that uses Azure OpenAI API to generate responses
async def azure_openai_callback(
    messages: list,
    stream: Optional[bool] = False,  # noqa: ARG001
    session_state: Optional[str] = None,  # noqa: ARG001
    context: Optional[Dict[str, Any]] = None,  # noqa: ARG001
) -> dict[str, list[dict[str, str]]]:
    deployment = os.environ.get("AZURE_DEPLOYMENT_NAME")
    endpoint = os.environ.get("AZURE_ENDPOINT")
    api_version = os.environ.get("AZURE_API_VERSION")

    # Get token provider for Azure AD authentication
    token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

    # Initialize Azure OpenAI client
    client = AzureOpenAI(azure_endpoint=endpoint, api_version=api_version, azure_ad_token_provider=token_provider)

    # Extract the latest message from the conversation history
    messages_list = [{"role": message.get("role"), "content": message.get("content")} for message in messages]
    latest_message = messages_list[-1]["content"]

    try:
        # Call the model
        response = client.chat.completions.create(
            model=deployment,
            messages=[
                {"role": "user", "content": latest_message},
            ],
            max_tokens=500,
            temperature=0.7,
        )

        # Format the response to follow the expected chat protocol format
        formatted_response = {"content": response.choices[0].message.content, "role": "assistant"}

        return {"messages": [formatted_response]}
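    except Exception as e:
        print(f"Error calling Azure OpenAI: {e!s}")
        # Return the error in the same chat protocol format as a successful response
        return {
            "messages": [
                {"content": "I encountered an error and couldn't process your request.", "role": "assistant"}
            ]
        }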
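Before spending a full scan on a live endpoint, it can help to check the callback's response shape offline. The sketch below is a minimal stand-in, assuming (as `azure_openai_callback` does) that messages arrive as dicts with `role` and `content` keys. `echo_callback` and `check_chat_protocol` are hypothetical names for illustration; neither makes a network call nor is part of the SDK.

```python
from typing import Any, Dict, List, Optional


async def echo_callback(
    messages: list,
    stream: Optional[bool] = False,
    session_state: Optional[str] = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict[str, List[Dict[str, str]]]:
    # Same extraction logic as the Azure OpenAI callback, without calling a model
    latest_message = messages[-1].get("content", "")
    return {"messages": [{"content": f"Received: {latest_message}", "role": "assistant"}]}


def check_chat_protocol(result: Any) -> None:
    """Raise ValueError unless result looks like {"messages": [{"content": str, "role": "assistant"}]}."""
    if not isinstance(result, dict) or not isinstance(result.get("messages"), list):
        raise ValueError(f"Expected a dict with a 'messages' list, got {type(result).__name__}")
    for message in result["messages"]:
        if not isinstance(message, dict) or message.get("role") != "assistant":
            raise ValueError(f"Each message needs role 'assistant': {message!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"Each message needs string content: {message!r}")
```

In the notebook, where top-level `await` is available, you could run `check_chat_protocol(await echo_callback([{"role": "user", "content": "What is a bond?"}]))`, and apply the same check to any other callback you plan to pass as a `target`.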

In [None]:
# Create the RedTeam instance with the model target
model_red_team = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=credential,
    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual],
    num_objectives=5,
)

### Testing Different Attack Strategies

Now we'll run a more comprehensive evaluation using multiple attack strategies across risk categories. This will give us a better understanding of our model's vulnerabilities.

In [None]:
# Run the red team scan with multiple attack strategies
advanced_result = await model_red_team.scan(
    target=azure_openai_callback,
    scan_name="Advanced-Callback-Test",
    attack_strategies=[
        AttackStrategy.EASY,  # Group of easy complexity attacks
        AttackStrategy.MODERATE,  # Group of moderate complexity attacks
        AttackStrategy.DIFFICULT,  # Group of difficult complexity attacks
        AttackStrategy.CharacterSpace,  # Add character spaces
        AttackStrategy.ROT13,  # Use ROT13 encoding
        AttackStrategy.UnicodeConfusable,  # Use confusable Unicode characters
        AttackStrategy.Tense,  # Change tense of prompts
        AttackStrategy.CharSwap,  # Swap characters in prompts
        AttackStrategy.Morse,  # Encode prompts in Morse code
        AttackStrategy.Leetspeak,  # Use Leetspeak
        AttackStrategy.Url,  # Use URLs in prompts
        AttackStrategy.Binary,  # Encode prompts in binary
    ],
    output_path="Advanced-Callback-Test.json",
)

## Conclusion

In this notebook, we've demonstrated how to use the Azure AI Evaluation SDK's Red Team functionality to assess the safety and resilience of AI systems. We started with a basic fixed-response example and then moved to a more realistic model evaluation across multiple risk categories and attack strategies.

The Red Team evaluation provides valuable insights into:

1. **Overall Attack Success Rate (ASR)** - The percentage of attacks that successfully elicit harmful content
2. **Vulnerability by Risk Category** - Which types of harmful content your model is most vulnerable to
3. **Effectiveness of Attack Strategies** - Which attack techniques are most successful against your model
4. **Impact of Complexity** - How more sophisticated attacks affect your model's safety guardrails
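Breakdowns such as ASR by risk category or by attack strategy amount to grouping per-attack outcomes and computing the rate within each group. The sketch below assumes a flat list of per-attack records; the field names (`risk_category`, `attack_strategy`, `attack_success`) and the sample records are hypothetical, so map them to whatever fields your scan output actually contains.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def asr_by(records: Iterable[Mapping[str, object]], key: str) -> Dict[str, float]:
    """Group attack records by `key` and return the ASR (percent) for each group."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for record in records:
        group = str(record[key])
        totals[group] += 1
        if record["attack_success"]:
            successes[group] += 1
    return {group: 100.0 * successes[group] / totals[group] for group in totals}


# Hypothetical records for illustration only; the real output schema may differ
records = [
    {"risk_category": "violence", "attack_strategy": "Flip", "attack_success": False},
    {"risk_category": "violence", "attack_strategy": "Base64", "attack_success": True},
    {"risk_category": "hate_unfairness", "attack_strategy": "Flip", "attack_success": False},
    {"risk_category": "hate_unfairness", "attack_strategy": "Base64", "attack_success": False},
]
print(asr_by(records, "risk_category"))  # {'violence': 50.0, 'hate_unfairness': 0.0}
print(asr_by(records, "attack_strategy"))  # {'Flip': 0.0, 'Base64': 50.0}
```

Passing a different `key` (for example, a complexity field, if your records have one) gives the corresponding breakdown without changing the function.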

By regularly red-teaming your AI applications, you can identify and address potential vulnerabilities before deploying your models to production environments.

### Next Steps

1. **Mitigation**: Use these results to strengthen your model's guardrails against identified attack vectors
2. **Continuous Testing**: Implement regular red team evaluations as part of your development lifecycle
3. **Custom Strategies**: Develop custom attack strategies for your specific use cases and domain
4. **Safety Layers**: Consider adding additional safety layers like Azure AI Content Safety to filter harmful requests and responses