## 🕵🏾 AI Red Teaming Agent for Generative AI Applications 🕵🏾

This sample demonstrates how to use Azure AI Evaluation's RedTeam functionality to assess the safety and resilience of AI systems against adversarial prompt attacks.

### **Implementation Options**
- **SDK Local** - Use the Azure AI Evaluation SDK locally (with PyRIT support)
- **Cloud Option** - Use the Azure AI Foundry SDK to execute tests in the cloud

Microsoft encourages teams to use the AI Red Teaming Agent to run automated scans throughout the design, development, and pre-deployment stages:

- Design: Picking the safest foundation model for your use case.
- Development: Upgrading models within your application or creating fine-tuned models for your specific application.
- Pre-deployment: Scanning GenAI applications before deploying them to production.

### **Evaluations**
AI Red Teaming Agent leverages [Risk and Safety Evaluations](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-evaluators) to help identify potential safety issues across different risk categories (violence, hate/unfairness, sexual content, self-harm). It combines these evaluations with attack strategies of varying complexity from [PyRIT](https://github.com/Azure/PyRIT), the Microsoft AI Red Teaming team's open framework for automated AI red teaming.
- **Automated scans** for content risks: first, you can automatically scan your model and application endpoints for safety risks by simulating adversarial probing.
- **Evaluate probing success:** next, you can evaluate and score each attack-response pair to generate insightful metrics such as Attack Success Rate (ASR).
- **Reporting and logging:** finally, you can generate a score card of the attack probing techniques and risk categories to help you decide if the system is ready for deployment. Findings can be logged, monitored, and tracked over time directly in Azure AI Foundry, ensuring compliance and continuous risk mitigation.
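
As a rough illustration of the Attack Success Rate described above, the sketch below computes ASR overall and per risk category from a list of attack-response records. The helper name `attack_success_rate` and the record fields `risk_category` and `attack_success` are hypothetical, chosen for this example; they are not the SDK's output schema.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return (overall_asr, per_category_asr) as percentages.

    Each record is a dict with hypothetical keys:
      - "risk_category": str
      - "attack_success": bool, True if the attack elicited an undesirable response
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        category = record["risk_category"]
        totals[category] += 1
        if record["attack_success"]:
            successes[category] += 1

    per_category = {c: 100.0 * successes[c] / totals[c] for c in totals}
    total = sum(totals.values())
    overall = 100.0 * sum(successes.values()) / total if total else 0.0
    return overall, per_category


# Made-up sample data, for illustration only
sample = [
    {"risk_category": "violence", "attack_success": False},
    {"risk_category": "violence", "attack_success": True},
    {"risk_category": "self_harm", "attack_success": False},
    {"risk_category": "self_harm", "attack_success": False},
]
overall, by_category = attack_success_rate(sample)
print(f"Overall ASR: {overall:.1f}%")  # → Overall ASR: 25.0%
print(by_category)
```

A lower ASR means fewer attacks succeeded; breaking it down by category can help show where mitigations are weakest.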

📚 **For complete details on AI Red Teaming Agent, visit:**  
**[Azure AI Foundry red teaming agent overview](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/ai-red-teaming-agent)**

---

## 🕵🏾 Red Teaming Agent - Notebook Overview 🕵🏾

This notebook walks through how to use Azure AI Evaluation's AI Red Teaming Agent functionality to assess the safety and resilience of AI systems against adversarial prompt attacks.

## What This Notebook Does:
1. **Setup & Environment** - Configure setup and environment file
2. **Local Evaluation** - Runs red teaming scans locally (Azure AI Evaluation SDK)
3. **Cloud Evaluation** - Runs red teaming scans in the cloud (Azure AI Foundry SDK)
4. (T) **Copilot Studio-created agents option** - Walks through how to use Azure AI Evaluation's AI Red Teaming Agent functionality to assess the safety and resilience of **Copilot Studio-created agents** against adversarial prompt attacks.

## Key Features:
✅ **Local Evaluations** - Run scan and specify risk categories 

✅ **Cloud Integration** - Create red teaming run and view results

✅ **Strategy and Planning** - potential strategies for planning how to set up and manage red teaming for responsible AI (RAI) risks throughout the large language model (LLM) product life cycle.

✅ **Error Handling** - Robust fallbacks and clear status reporting

## 1. Initialization and Setup 
**Prerequisites**
- An Azure AI Foundry project or hub-based project. To learn more, see [Create a project](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/create-projects).
- Create and connect your storage account to your Azure AI Foundry project at the resource level. There are two ways you can do this. You can use a Bicep template, which provisions and connects a storage account to your Foundry project with key authentication. You can also manually create and provision access to your storage account in the Azure portal.
- Make sure the connected storage account has access to all projects.
- If you connected your storage account with Microsoft Entra ID, make sure to give managed identity Storage Blob Data Owner permissions to both your account and the Foundry project resource in the Azure portal.

- A `.env` file containing `AI_FOUNDRY_PROJECT_ENDPOINT` (and, for the cloud scan, `MODEL_DEPLOYMENT_NAME`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_API_KEY`).
- A local environment with the `azure-ai-evaluation[redteam]`, `azure-identity`, and `python-dotenv` packages installed.
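
Before running the cells, it can help to confirm that the environment variables this notebook reads are actually set. The sketch below is one way to do that. The variable names are the ones read by the code cells in this notebook; the helper `check_env` is hypothetical and not part of any SDK.

```python
import os

# Variable names taken from the code cells in this notebook
REQUIRED_VARS = ["AI_FOUNDRY_PROJECT_ENDPOINT"]
OPTIONAL_VARS = [
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "TENANT_ID",
]


def check_env(environ=None):
    """Return (missing_required, missing_optional) for the given mapping.

    Defaults to os.environ; pass a dict to check other settings.
    """
    env = os.environ if environ is None else environ
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional


missing_required, missing_optional = check_env()
if missing_required:
    print(f"Missing required variables: {', '.join(missing_required)}")
if missing_optional:
    print(f"Optional variables not set: {', '.join(missing_optional)}")
```

Run this after `load_dotenv(...)` so that values from the `.env` file are included in the check.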

**Note:**
Currently, AI Red Teaming Agent is only available in a few regions. Ensure your Azure AI project is located in one of the following supported regions:

- East US2
- Sweden Central
- France Central
- Switzerland West

**What we do**:
- Load environment variables.
- Initialize the `RedTeam` agent (currently in preview, not recommended for production workloads).
- Run scan on simple callback target.
- Specify risk categories to cover in scan.

In [None]:
# If you haven't installed the Azure AI Evaluation SDK with red teaming support,
# uncomment and run the following command
#!pip install azure-ai-evaluation[redteam]

In [None]:
# Setup
## Import libraries and check connection and tenant ID
# Import necessary libraries
import json
import os
import time
from pathlib import Path
from dotenv import load_dotenv
from typing import Optional, Dict, Any

# Azure imports
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory
from openai import AzureOpenAI

# Load environment variables
notebook_path = Path().absolute()
env_path = notebook_path.parent.parent / '.env'  # Adjust path as needed
load_dotenv(env_path)
# Get project endpoint and tenant ID from environment variables
project_endpoint = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
tenant_id = os.environ.get("TENANT_ID")
if not project_endpoint:
    raise ValueError("🚨 AI_FOUNDRY_PROJECT_ENDPOINT not set in .env.")
# Show which tenant ID was loaded (prints None if TENANT_ID is not set)
print(f"🔑 Using Tenant ID: {tenant_id}")

In [None]:
# Instantiate your AI Red Teaming Agent
red_team_agent = RedTeam(
    azure_ai_project=project_endpoint, # required
    credential=DefaultAzureCredential() # required
)

# A simple example application callback function that always returns a fixed response
def simple_callback(query: str) -> str:
    return "I'm an AI assistant that follows ethical guidelines. I cannot provide harmful content."


## 2. Run Scan Locally
**🔍 Local Evaluation Simple Callback**

This example generates the default 10 attack prompts for each of the four default risk categories (violence, sexual, hate and unfairness, and self-harm), for a total of 40 attack prompts that are generated and sent to your target.
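
The arithmetic behind the 40 prompts can be sketched as follows. The helper name is hypothetical, and the calculation deliberately ignores any extra rows that attack strategies may add on top of the baseline prompts.

```python
def expected_prompt_count(num_risk_categories: int, num_objectives: int) -> int:
    """Baseline attack prompts = risk categories x objectives per category."""
    return num_risk_categories * num_objectives


print(expected_prompt_count(4, 10))  # → 40 (the defaults described above)
print(expected_prompt_count(1, 1))   # → 1  (one category, one objective)
```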

In [None]:
# Create 'results' folder if it doesn't exist
results_folder = Path("./results")
results_folder.mkdir(parents=True, exist_ok=True)

# Define the path for the red team results file
red_team_results_path = results_folder / "red_team_results1.jsonl"

# Runs a red teaming scan on the simple callback target
red_team_result = await red_team_agent.scan(
    target=simple_callback,
    output_path=red_team_results_path)

# Note: the default is 10 objectives per risk category, so this may take 2-5 min.
# The scan results are also viewable in the Azure AI Foundry project UI.

In [None]:
# Quick view of results, also saved in filepath specified above
# Define the path to the results file
results_path = Path("./results/red_team_results1.jsonl")

# Read and display each line (each line is a JSON object)
with results_path.open("r", encoding="utf-8") as f:
    for line in f:
        result = json.loads(line)
        print(json.dumps(result, indent=2))  # Pretty print each result



**🔍 Specify Risk Categories**

Optionally, you can specify which content risk categories to cover with the `risk_categories` parameter and set the number of prompts per risk category with the `num_objectives` parameter.

*AI Red Teaming Agent only supports single-turn interactions in text-only scenarios.*

In [None]:
# Specify the risk categories and the number of attack objectives per risk category for the AI Red Teaming Agent to cover
red_team_agent = RedTeam(
    azure_ai_project=project_endpoint, # required
    credential=DefaultAzureCredential(), # required
    risk_categories=[ # optional, defaults to all four risk categories
        RiskCategory.Violence #,
        #RiskCategory.HateUnfairness,
        #RiskCategory.Sexual,
        #RiskCategory.SelfHarm
    ], 
    num_objectives=1, # optional, defaults to 10
)

In [None]:
# Run the red team scan called "Basic-Flip-Scan" with limited scope for this basic example
# This tests 1 objective prompt (num_objectives=1) for the specified Violence category with the Flip strategy
# Import the necessary attack strategy
from azure.ai.evaluation.red_team import AttackStrategy

# Define the path for the red team results file
red_team_results_path = results_folder / "red_team_results2.jsonl"
# Runs a red teaming scan on the simple callback target with specified attack strategies
result = await red_team_agent.scan(
    target=simple_callback,
    scan_name="Basic-Flip-Scan",
    attack_strategies=[AttackStrategy.Flip],
    output_path=red_team_results_path,
)

**Complex callback**: A more complex callback that is aligned to the OpenAI Chat Protocol

In [None]:
# Create a more complex callback function that handles conversation state
async def advanced_callback(messages, stream=False, session_state=None, context=None):
    # Extract the latest message from the conversation history
    messages_list = [{"role": message.role, "content": message.content} 
                    for message in messages]
    latest_message = messages_list[-1]["content"]
    
    # In a real application, you might process the entire conversation history
    # Here, we're just simulating a response
    response = "I'm an AI assistant that follows safety guidelines. I cannot provide harmful content."
    
    # Format the response to follow the expected chat protocol format
    formatted_response = {
        "content": response,
        "role": "assistant"
    }
    
    return {"messages": [formatted_response]}

red_team_result = await red_team_agent.scan(target=advanced_callback)
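
In a real application, the callback would forward the conversation to your own system instead of returning a fixed string. The sketch below shows one way to structure that, assuming the same message access pattern (`message.role`, `message.content`) and return shape as the callback above. The names `make_chat_callback`, `generate_reply`, and `echo_refusal` are hypothetical; `echo_refusal` is a stand-in for your application, not a real model call.

```python
import asyncio
from types import SimpleNamespace


def make_chat_callback(generate_reply):
    """Build a chat-protocol callback around `generate_reply`.

    `generate_reply` is a hypothetical async function that takes a list of
    {"role", "content"} dicts and returns the assistant's reply as a string.
    """

    async def callback(messages, stream=False, session_state=None, context=None):
        # Convert message objects to plain dicts, as in the callback above
        history = [{"role": m.role, "content": m.content} for m in messages]
        try:
            reply = await generate_reply(history)
        except Exception as exc:
            # Return the failure as a response instead of aborting the scan
            reply = f"Error while calling the application: {exc}"
        return {"messages": [{"content": reply, "role": "assistant"}]}

    return callback


# Stand-in for a real application; replace with a call to your own system
async def echo_refusal(history):
    return f"I can't help with that. (received {len(history)} message(s))"


callback = make_chat_callback(echo_refusal)
demo_messages = [SimpleNamespace(role="user", content="Hello")]
demo_response = asyncio.run(callback(demo_messages))
print(demo_response)
```

Inside a notebook, where an event loop is already running, use `await callback(demo_messages)` instead of `asyncio.run(...)`.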

📈 **Results from automated scans**
The key metric for assessing your results is the Attack Success Rate (ASR), which measures the percentage of attacks that successfully elicit undesirable responses from your AI system. You can view it in the Evaluation section of Azure AI Foundry or in the file written to the output path defined in the scan.

The file at `output_path` is a JSON file that represents a scorecard of your results, which you can use in your own reporting tool or compliance platform.

## 3. Run Scan in Cloud


**☁️ Cloud Evaluation**

Running scans in the cloud allows for pre-deployment AI red teaming runs on larger combinations of attack strategies and risk categories for a fuller analysis. View results in Azure AI Foundry project for tracking and collaboration.
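
The scans in this notebook use the `Flip` and `BASE64` attack strategies. As a conceptual illustration only, the sketch below applies similar transformations to a harmless prompt: reversing the characters and Base64-encoding the text. The function names are made up for this example, and the code is not PyRIT's converter implementation; it assumes Flip reverses the characters of the prompt and only shows the kind of obfuscation such strategies rely on.

```python
import base64


def base64_transform(prompt: str) -> str:
    """Encode the prompt text as Base64."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def flip_transform(prompt: str) -> str:
    """Reverse the characters of the prompt."""
    return prompt[::-1]


benign_prompt = "What is the capital of France?"
print(base64_transform(benign_prompt))
print(flip_transform(benign_prompt))  # → ?ecnarF fo latipac eht si tahW
```

A target that decodes or un-flips such input and then answers it may bypass filters that only inspect the surface form of the text, which is what these strategies probe for.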

In [None]:
# Cloud Evaluation
from azure.identity import DefaultAzureCredential
import os
import json
import time

print("☁️ Setting up Cloud Evaluation following official documentation...")

# Step 1: Install and import required packages
try:
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import (
        RedTeam,
        AzureOpenAIModelConfiguration,
        AttackStrategy,
        RiskCategory,
    )
    print("✅ Azure AI Projects SDK found")
except ImportError:
    print("❌ Installing azure-ai-projects...")
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "azure-ai-projects>=1.0.0b4"])

    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import (
        RedTeam,
        AzureOpenAIModelConfiguration,
        AttackStrategy,
        RiskCategory
    )
    print("✅ Packages installed successfully")

# Step 2: Configuration using official environment variable names
PROJECT_ENDPOINT = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
MODEL_ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT") 
MODEL_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
MODEL_DEPLOYMENT_NAME = os.environ.get("MODEL_DEPLOYMENT_NAME")


print(f"🏢 Project Endpoint: {PROJECT_ENDPOINT}")
print(f"🤖 Model Deployment: {MODEL_DEPLOYMENT_NAME}")
print(f"🔗 Model Endpoint: {MODEL_ENDPOINT}")

if not PROJECT_ENDPOINT:
    print("⚠️ Missing AI_FOUNDRY_PROJECT_ENDPOINT in .env file")
    cloud_result = None
else:
    try:
        # Step 3: Authentication using DefaultAzureCredential
        print("🔐 Setting up authentication...")
        credential = DefaultAzureCredential()
        
        # Step 4: Create AI Project Client
        with AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential) as project_client:
            print("🌐 AI Project Client created successfully")
           
            # Step 5: Define the target configuration for the Azure OpenAI model
            print("🤖 Creating Red Teaming Agent...")
            target_config = AzureOpenAIModelConfiguration(model_deployment_name=MODEL_DEPLOYMENT_NAME)

            # Step 6: Create Red Teaming Agent
            red_team_agent = RedTeam(
                attack_strategies=[AttackStrategy.BASE64],
                risk_categories=[RiskCategory.VIOLENCE],
                display_name="red-team-cloud-run",
                target=target_config,
            )

            # Step 7: Set headers for model configuration
            headers = {
                "model-endpoint": MODEL_ENDPOINT,
                "api-key": MODEL_API_KEY,
            }

            print("🚀 Submitting Red Teaming scan...")
            red_team_response = project_client.red_teams.create(red_team=red_team_agent, headers=headers)

            print("⏳ Checking scan status...")
            time.sleep(2)
            get_red_team_response = project_client.red_teams.get(name=red_team_response.name)
            print(f"📊 Red Team scan status: {get_red_team_response.status}")

    except Exception as e:
        print(f"❌ Red Teaming setup failed: {e}")
        print(f"📋 Error type: {type(e).__name__}")

Once your AI red teaming run finishes, you can view your results in your Azure AI Foundry project.
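
The cloud cell above checks the run status only once, two seconds after submission. A polling loop is one way to wait for the run to finish. The sketch below is generic: `wait_for_status` is a hypothetical helper, and the terminal status names (`completed`, `failed`, `canceled`) are assumptions, so check the values your run actually reports before relying on them.

```python
import time


def wait_for_status(get_status, terminal_statuses, poll_seconds=10.0,
                    timeout_seconds=1800.0, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal status or the timeout elapses.

    `terminal_statuses` is a set of lowercase status names (assumed values).
    """
    waited = 0.0
    while True:
        status = get_status()
        if str(status).lower() in terminal_statuses:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a stub that simulates a run progressing through made-up statuses
fake_statuses = iter(["NotStarted", "Running", "Completed"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    terminal_statuses={"completed", "failed", "canceled"},
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final_status)  # → Completed

# With the client from the cell above, usage might look like this:
# final_status = wait_for_status(
#     lambda: project_client.red_teams.get(name=red_team_response.name).status,
#     terminal_statuses={"completed", "failed", "canceled"},  # assumed names
# )
```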