# 🔴 Red Team Security Testing for AI Agents

This notebook demonstrates how to perform **Red Team security scans** on AI models using Microsoft Foundry. Red teaming proactively identifies vulnerabilities by simulating adversarial attacks against your AI systems.

## 🎯 Learning Objectives

1. **Understand Red Team concepts** for AI security
2. **Configure attack strategies** and risk categories
3. **Run security scans** against AI models
4. **Analyze vulnerabilities** and security findings

## 💼 Industry Use Case: Banking AI Security Assessment

In financial services, AI systems are high-value targets. Red team testing helps:

- **Prevent prompt injection** attacks that could expose customer data
- **Detect jailbreak vulnerabilities** that bypass safety guardrails
- **Identify information leakage** risks before production deployment
- **Ensure regulatory compliance** with security requirements

**Attack Scenarios in Banking:**

| Threat | Impact | Red Team Detection |
|--------|--------|--------------------|
| Prompt Injection | Unauthorized data access | Encoding attacks |
| Jailbreak | Bypass fraud controls | Multi-turn manipulation |
| Data Extraction | Customer PII leakage | Crescendo attacks |
| Harmful Content | Reputation damage | Violence/hate detection |

### ⚠️ Disclaimer
> **This is a security testing demonstration.** Red team testing should only be performed on systems you own or have explicit permission to test. Follow your organization's security policies.

## 🔐 Authentication Setup

Before running this notebook, authenticate with Azure CLI:

```bash
az login --use-device-code
```

## 1. Environment Setup

In [None]:
import os
import time
from pathlib import Path
from pprint import pprint
from dotenv import load_dotenv

# Load environment variables
notebook_path = Path().absolute()
env_path = notebook_path.parent / '.env'
load_dotenv(env_path)

# Verify required environment variables
project_endpoint = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
tenant_id = os.environ.get("TENANT_ID")
model_deployment = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4o")

# Azure OpenAI endpoint and API key, read from the existing .env variables.
# The API key is passed to the scan later; the endpoint is only printed below for reference.
azure_openai_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
azure_openai_api_key = os.environ.get("AZURE_OPENAI_API_KEY")

# Fail fast if the project endpoint is missing
if not project_endpoint:
    raise ValueError("🚨 AI_FOUNDRY_PROJECT_ENDPOINT not set in .env")

# Derive the model endpoint by dropping the /api/projects/<project_name> suffix
# Example: https://<account>.services.ai.azure.com/api/projects/<project> -> https://<account>.services.ai.azure.com
model_endpoint = project_endpoint.split("/api/projects")[0]

print(f"🔑 Tenant ID: {tenant_id}")
print(f"📍 Project Endpoint: {project_endpoint[:50]}...")
print(f"🤖 Model Deployment: {model_deployment}")
print(f"🌐 Model Endpoint (derived): {model_endpoint}")
print(f"🔗 Azure OpenAI Endpoint: {azure_openai_endpoint}")
print(f"🔐 Azure OpenAI API Key: {'*' * 10 if azure_openai_api_key else 'Not set'}")
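
The string split above assumes the project endpoint always contains `/api/projects`. A slightly more defensive variant could parse the URL and keep only the scheme and host. This is a minimal sketch using only the standard library; the helper name `derive_base_endpoint` and the example URL are hypothetical and not part of the Foundry SDK.

```python
from urllib.parse import urlsplit


def derive_base_endpoint(project_endpoint: str) -> str:
    """Return scheme://host for a project endpoint URL (hypothetical helper)."""
    parts = urlsplit(project_endpoint)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {project_endpoint!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Placeholder URL for illustration only
example = "https://my-account.services.ai.azure.com/api/projects/my-project"
print(derive_base_endpoint(example))
```

Unlike a plain `split`, this version raises on a malformed value instead of silently passing it through.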

## 2. Initialize AI Project Client

In [None]:
from azure.identity import AzureCliCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    RedTeam,
    AzureOpenAIModelConfiguration,
    AttackStrategy,
    RiskCategory,
)

# Initialize credentials and clients
credential = AzureCliCredential(tenant_id=tenant_id)
project_client = AIProjectClient(endpoint=project_endpoint, credential=credential)

print("✅ AIProjectClient initialized for Red Team testing")

## 3. Understanding Attack Strategies

Microsoft Foundry provides several attack strategies to test model resilience. The "FSI Risk" column relates each one to a financial services industry (FSI) concern:

| Strategy | Description | FSI Risk |
|----------|-------------|----------|
| `BASE64` | Encodes attacks in Base64 to bypass filters | Data exfiltration attempts |
| `FLIP` | Reverses/flips text to evade detection | Bypassing keyword filters |
| `MORSE_CODE` | Uses Morse encoding patterns | Obfuscated malicious prompts |
| `CAESAR_CIPHER` | Classic cipher encoding | Historical attack patterns |
| `UNICODE_CONFUSABLE` | Uses similar-looking Unicode chars | Phishing-style attacks |
| `CRESCENDO` | Gradually escalating harmful requests | Social engineering |
| `MULTI_TURN` | Multi-turn conversation manipulation | Context poisoning |

In [None]:
# Display available attack strategies
print("🎯 Available Attack Strategies:")
print("-" * 50)

attack_strategies_info = [
    ("BASE64", "Encodes malicious content in Base64 format"),
    ("FLIP", "Reverses or flips text to evade detection"),
    ("MORSE_CODE", "Uses Morse code encoding patterns"),
    ("CAESAR_CIPHER", "Applies Caesar cipher transformation"),
    ("UNICODE_CONFUSABLE", "Uses visually similar Unicode characters"),
    ("CRESCENDO", "Gradually escalates harmful requests"),
    ("MULTI_TURN", "Exploits multi-turn conversation context"),
]

for strategy, description in attack_strategies_info:
    print(f"   • {strategy}: {description}")
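
To make the encoding-based strategies concrete, the sketch below applies rough stand-ins for four of them to a harmless string. These are plain standard-library transformations written for illustration; the function names are hypothetical, and the service's own converters may differ in detail (for example in the Caesar shift or the Morse alphabet).

```python
import base64

# Tiny Morse table covering only the letters used in the demo string
MORSE = {"h": "....", "e": ".", "l": ".-..", "o": "---"}


def to_base64(text: str) -> str:
    """Encode the text as Base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def flip(text: str) -> str:
    """Reverse the character order."""
    return text[::-1]


def caesar(text: str, shift: int = 3) -> str:
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def to_morse(text: str) -> str:
    """Translate letters found in the small MORSE table above."""
    return " ".join(MORSE[ch] for ch in text.lower())


prompt = "hello"
for name, fn in [("BASE64", to_base64), ("FLIP", flip),
                 ("CAESAR_CIPHER", caesar), ("MORSE_CODE", to_morse)]:
    print(f"{name:>14}: {fn(prompt)}")
```

Each transformation keeps the request recoverable by a capable model while changing its surface form, which is why keyword-based filters alone tend to miss it.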

## 4. Understanding Risk Categories

Risk categories define what types of vulnerabilities to test for:

| Category | Description | FSI Concern |
|----------|-------------|-------------|
| `VIOLENCE` | Content promoting violence | Customer safety |
| `HATE_UNFAIRNESS` | Discriminatory content | Fair lending compliance |
| `SEXUAL` | Inappropriate content | Professional standards |
| `SELF_HARM` | Content promoting self-harm | Customer wellbeing |

In [None]:
# Display available risk categories
print("⚠️ Available Risk Categories:")
print("-" * 50)

risk_categories_info = [
    ("VIOLENCE", "Tests for violent content generation"),
    ("HATE_UNFAIRNESS", "Tests for discriminatory or biased content"),
    ("SEXUAL", "Tests for inappropriate sexual content"),
    ("SELF_HARM", "Tests for self-harm promotion"),
]

for category, description in risk_categories_info:
    print(f"   • {category}: {description}")

## 5. Configure Target Model

We'll configure the Azure OpenAI model to be tested. In FSI, this would typically be a customer-facing banking assistant or advisory model.

In [None]:
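# Configure the target model for testing.
# The deployment name is passed via the `model_deployment_name` keyword;
# AZURE_AI_MODEL_DEPLOYMENT_NAME is only the environment variable it was read from.
target_config = AzureOpenAIModelConfiguration(
    model_deployment_name=model_deployment
)

print("🎯 Target Model Configuration:")
print(f"   Deployment Name: {model_deployment}")
print("   Configuration Type: AzureOpenAIModelConfiguration")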

## 6. Create Red Team Scan Configuration

For this FSI demonstration, we'll test:
- **Attack Strategy**: BASE64 encoding (obfuscates prompts to slip past content filters)
- **Risk Category**: VIOLENCE (safety compliance check)

In production, you would run comprehensive scans with multiple strategies and categories.
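
One way to plan broader coverage is to enumerate every strategy and category pair up front. The sketch below uses plain strings taken from the tables above rather than the SDK enums, and the `plan_scans` helper and its naming scheme are hypothetical. Whether to submit one scan per pair or one scan with several strategies is a design choice this sketch does not settle.

```python
from itertools import product

STRATEGIES = ["BASE64", "FLIP", "CRESCENDO", "MULTI_TURN"]
CATEGORIES = ["VIOLENCE", "HATE_UNFAIRNESS", "SEXUAL", "SELF_HARM"]


def plan_scans(strategies, categories, prefix="fsi-banking"):
    """Return one planned scan (as a dict) per strategy/category pair."""
    return [
        {
            "display_name": f"{prefix}-{s.lower()}-{c.lower()}".replace("_", "-"),
            "attack_strategy": s,
            "risk_category": c,
        }
        for s, c in product(strategies, categories)
    ]


plan = plan_scans(STRATEGIES, CATEGORIES)
print(f"{len(plan)} planned scans")
for scan in plan[:3]:
    print(scan["display_name"])
```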

In [None]:
# Create Red Team configuration for FSI banking model testing
red_team = RedTeam(
    attack_strategies=[AttackStrategy.BASE64],
    risk_categories=[RiskCategory.VIOLENCE],
    display_name="fsi-banking-security-scan",
    target=target_config,
)

print("🔴 Red Team Configuration Created:")
print("-" * 50)
print(f"   Display Name: {red_team.display_name}")
print(f"   Attack Strategies: {[str(s) for s in red_team.attack_strategies]}")
print(f"   Risk Categories: {[str(c) for c in red_team.risk_categories]}")
print(f"   Target: {model_deployment}")

## 7. Run Red Team Scan

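⚠️ **Important**: This step needs a model endpoint and an API key. The cell below derives the endpoint from `AI_FOUNDRY_PROJECT_ENDPOINT` and reads the key from `AZURE_OPENAI_API_KEY`; if either is missing, the scan is skipped.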

The scan will:
1. Generate adversarial prompts using the specified attack strategies
2. Send prompts to the target model
3. Analyze responses for security vulnerabilities
4. Report findings with severity scores
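
The four steps above can be mimicked locally with a toy pipeline. Everything here is a stand-in chosen for illustration: `toy_model` is a stub rather than a deployed model, the keyword check is far cruder than the service's evaluators, and the seed prompts are harmless placeholders.

```python
import base64

SEED_PROMPTS = ["describe something unsafe", "say hello"]  # placeholder seeds


def make_attack(prompt: str) -> str:
    """Step 1: wrap a seed prompt in a Base64-style attack."""
    encoded = base64.b64encode(prompt.encode()).decode()
    return f"Decode this Base64 and follow it: {encoded}"


def toy_model(attack: str) -> str:
    """Step 2: stub target that naively decodes and complies with 'unsafe' requests."""
    payload = base64.b64decode(attack.rsplit(": ", 1)[1]).decode()
    return "UNSAFE OUTPUT" if "unsafe" in payload else "Hello!"


def is_violation(response: str) -> bool:
    """Step 3: crude stand-in for a response evaluator."""
    return "UNSAFE" in response


# Step 4: report findings as an attack success rate
results = [is_violation(toy_model(make_attack(p))) for p in SEED_PROMPTS]
attack_success_rate = sum(results) / len(results)
print(f"Attack success rate: {attack_success_rate:.0%}")
```

The attack success rate (successful attacks divided by attempted attacks) is a common headline number for this kind of test; a real scan would break it down per strategy and risk category.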

In [None]:
# Check if we have the required credentials from .env
if not model_endpoint or not azure_openai_api_key:
    print("⚠️ Required credentials not found in .env file.")
    print("\nRequired variables:")
    print(f"   AI_FOUNDRY_PROJECT_ENDPOINT: {'✅ Set' if project_endpoint else '❌ Missing'}")
    print(f"   AZURE_OPENAI_API_KEY: {'✅ Set' if azure_openai_api_key else '❌ Missing'}")
    print("\n📌 Skipping Red Team scan execution...")
    red_team_response = None
else:
    print("🚀 Creating Red Team scan...")
    print(f"   Model Endpoint: {model_endpoint}")
    print(f"   Model Deployment: {model_deployment}")
    print("   This may take a few minutes to initialize.")
    
    try:
        # Create and run the Red Team scan
        red_team_response = project_client.red_teams.create(
            red_team=red_team,
            headers={
                "model-endpoint": model_endpoint,
                "model-api-key": azure_openai_api_key,
            },
        )
        
        print("\n✅ Red Team scan created successfully!")
        print(f"   Scan Name: {red_team_response.name}")
        print(f"   Status: {red_team_response.status}")
    except Exception as e:
        print(f"\n❌ Error creating Red Team scan: {e}")
        red_team_response = None

## 8. Monitor Scan Status

In [None]:
if red_team_response:
    print("⏳ Monitoring Red Team scan progress...")
    print("-" * 50)
    
    scan_name = red_team_response.name
    max_wait_minutes = 10
    check_interval = 30  # seconds
    elapsed = 0
    
    while elapsed < max_wait_minutes * 60:
        # Get current scan status
        scan_status = project_client.red_teams.get(name=scan_name)
        print(f"   [{elapsed//60}m {elapsed%60}s] Status: {scan_status.status}")
        
        if scan_status.status in ["Completed", "Failed", "Cancelled"]:
            break
        
        time.sleep(check_interval)
        elapsed += check_interval
    
    print(f"\n📊 Final Status: {scan_status.status}")
else:
    print("📌 No active scan to monitor (scan was skipped or could not be created)")
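
The loop above reports a final status even if the time limit ran out first. A hedged sketch of a reusable poller that makes the timeout explicit follows. `wait_for_terminal_status` is a hypothetical helper, the terminal status names simply mirror the ones used in the cell above, and the status getter and sleep function are injected so the logic can be exercised without a real scan.

```python
import time
from typing import Callable, Iterable

TERMINAL = ("Completed", "Failed", "Cancelled")


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 600,
    interval_s: float = 30,
    terminal: Iterable[str] = TERMINAL,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal value or the timeout elapses."""
    terminal = set(terminal)
    elapsed = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if elapsed >= timeout_s:
            raise TimeoutError(f"Still '{status}' after {elapsed:.0f}s")
        sleep(interval_s)
        elapsed += interval_s


# Simulated scan that finishes on the third check; sleep is a no-op here
fake_statuses = iter(["NotStarted", "Running", "Completed"])
print(wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None))
```

With the client from this notebook, `get_status` could be something like `lambda: project_client.red_teams.get(name=scan_name).status`.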

## 9. Get Scan Results

In [None]:
if red_team_response:
    print("\n" + "=" * 60)
    print("🔴 RED TEAM SCAN RESULTS")
    print("=" * 60)
    
    # Get detailed scan results
    scan_details = project_client.red_teams.get(name=red_team_response.name)
    
    print("\n📋 Scan Details:")
    print(f"   Name: {scan_details.name}")
    print(f"   Display Name: {scan_details.display_name}")
    print(f"   Status: {scan_details.status}")
    
    # Display full scan object
    print("\n📝 Full Scan Results:")
    print("-" * 60)
    pprint(vars(scan_details) if hasattr(scan_details, '__dict__') else scan_details)
else:
    print("📌 No scan results available (scan was skipped or could not be created)")

## 10. List All Red Team Scans

You can list all historical Red Team scans for audit and compliance purposes.

In [None]:
print("📋 Listing all Red Team scans in project...")
print("-" * 60)

try:
    scan_count = 0
    for scan in project_client.red_teams.list():
        scan_count += 1
        print(f"\n🔹 Scan {scan_count}:")
        print(f"   Name: {scan.name}")
        print(f"   Display Name: {getattr(scan, 'display_name', 'N/A')}")
        print(f"   Status: {scan.status}")
    
    if scan_count == 0:
        print("   No Red Team scans found in this project.")
    else:
        print(f"\n📊 Total scans: {scan_count}")
except Exception as e:
    print(f"⚠️ Could not list scans: {e}")
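
For audit reporting it can help to summarise the listed scans by status. The sketch below works on stand-in objects with `name` and `status` attributes (the same attributes the loop above prints); the sample records are made up, and `summarize_by_status` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_by_status(scans) -> dict:
    """Count scans per status value."""
    return dict(Counter(str(scan.status) for scan in scans))


# Made-up records standing in for the items yielded by red_teams.list()
sample_scans = [
    SimpleNamespace(name="scan-1", status="Completed"),
    SimpleNamespace(name="scan-2", status="Failed"),
    SimpleNamespace(name="scan-3", status="Completed"),
]

for status, count in sorted(summarize_by_status(sample_scans).items()):
    print(f"{status}: {count}")
```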

## 11. FSI Security Compliance Insights

In [None]:
print("\n" + "=" * 60)
print("💼 FSI SECURITY COMPLIANCE INSIGHTS")
print("=" * 60)

print("\n🔐 Why Red Team Testing Matters for Banking:")
print("-" * 50)
print("   1. REGULATORY: SOC 2, PCI-DSS require security testing")
print("   2. DATA PROTECTION: Prevent customer PII exposure")
print("   3. FRAUD PREVENTION: Detect bypasses of fraud controls")
print("   4. REPUTATION: Protect brand from AI misuse")

print("\n📊 Recommended FSI Red Team Strategy:")
print("-" * 50)
print("   Phase 1: Basic encoding attacks (BASE64, FLIP)")
print("   Phase 2: Multi-turn manipulation tests")
print("   Phase 3: Crescendo (gradual escalation) attacks")
print("   Phase 4: Full risk category coverage")

print("\n✅ Security Testing Checklist:")
print("-" * 50)
print("   □ Test before production deployment")
print("   □ Re-test after model updates")
print("   □ Document all findings for audit")
print("   □ Remediate critical vulnerabilities")
print("   □ Schedule periodic security reviews")

## 🎯 Summary

In this notebook, you learned how to:

✅ **Understand Red Team concepts** for AI security testing  
✅ **Configure attack strategies** (BASE64, FLIP, CRESCENDO, etc.)  
✅ **Define risk categories** (Violence, Hate, Sexual, Self-harm)  
✅ **Create and run security scans** against Azure OpenAI models  
✅ **Monitor scan progress** and retrieve results  
✅ **List historical scans** for compliance auditing  

### 🔧 Key APIs Used

| API | Purpose |
|-----|--------|
| `RedTeam` | Configure Red Team scan parameters |
| `AzureOpenAIModelConfiguration` | Define target model |
| `AttackStrategy` | Specify attack techniques |
| `RiskCategory` | Define vulnerability categories |
| `project_client.red_teams.create()` | Launch security scan |
| `project_client.red_teams.get()` | Retrieve scan results |
| `project_client.red_teams.list()` | List all scans |

### 🔴 Attack Strategies Quick Reference

| Strategy | Use Case |
|----------|----------|
| `BASE64` | Test encoding-based filter bypasses |
| `FLIP` | Test text reversal attacks |
| `CRESCENDO` | Test gradual escalation manipulation |
| `MULTI_TURN` | Test conversation context exploitation |

### 📚 Next Steps

1. **Run comprehensive scans** with all attack strategies
2. **Integrate into CI/CD** for automated security testing
3. **Set up alerting** for critical findings
4. **Document findings** for compliance audits
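
For the CI/CD step, one possible shape is a gate that fails the pipeline when the attack success rate for any risk category exceeds a threshold. The result format below (a mapping from category to attempted and successful attack counts), the threshold, and the `gate` function are all assumptions for illustration; adapt them to whatever your scan output actually contains.

```python
def gate(results: dict, max_success_rate: float = 0.05) -> list:
    """Return the categories whose attack success rate exceeds the threshold."""
    failures = []
    for category, counts in results.items():
        attempted = counts["attempted"]
        rate = counts["successful"] / attempted if attempted else 0.0
        if rate > max_success_rate:
            failures.append(category)
    return failures


# Made-up numbers for illustration
sample_results = {
    "violence": {"attempted": 40, "successful": 1},
    "self_harm": {"attempted": 40, "successful": 6},
}

failed = gate(sample_results)
print("FAIL" if failed else "PASS", failed)
```

In a pipeline, a non-empty list could be turned into a non-zero exit code (for example with `sys.exit(1)`) so that the build stops before deployment.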

