[Evaluation] Additional red team e2e tests #45579
Pull request overview
Adds additional end-to-end coverage for RedTeam “Foundry” execution and aligns a few red team internals/outputs with expected contracts and error semantics.
Changes:
- Expanded `test_red_team_foundry.py` with new Foundry e2e scenarios (model-config targets, agent targets, new risk categories, multi-turn strategies, and contract error paths).
- Fixed Foundry baseline objective cache lookup keying to use the same risk-category→objective mapping as the generator path.
- Treated leftover `pending`/`running` statuses as terminal failures when producing final run status, and surfaced RAI evaluation service “error outcome” as undetermined instead of attack success.
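The status rule in the last bullet can be sketched roughly as follows. This is only an illustration and not the code in `_result_processor.py`: the function name `final_run_status` and the status strings `completed` and `failed` are assumptions.

```python
from typing import Iterable

# Statuses that should not survive past the end of a scan.
_NON_TERMINAL = {"pending", "running"}


def final_run_status(item_statuses: Iterable[str]) -> str:
    """Collapse per-item statuses into one run-level status (hypothetical helper).

    Any item still pending/running once the scan has finished counts as failed.
    """
    normalized = ["failed" if status in _NON_TERMINAL else status for status in item_statuses]
    if any(status == "failed" for status in normalized):
        return "failed"
    return "completed"
```

Under these assumptions, a run with one item left in `running` reports `failed` rather than staying in a non-terminal state.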
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_red_team_foundry.py` | Adds substantial Foundry red team e2e coverage across targets, strategies, and risk categories. |
| `sdk/evaluation/azure-ai-evaluation/tests/conftest.py` | Updates OpenAI/test-proxy routing configuration used by recordings/playback. |
| `sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_result_processor.py` | Adjusts final run-level status determination semantics after scan completion. |
| `sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py` | Fixes baseline objective cache key mismatch in Foundry execution path. |
| `sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_rai_scorer.py` | Detects evaluation-service error outcomes and raises so PyRIT marks results as undetermined. |
| `sdk/evaluation/azure-ai-evaluation/CHANGELOG.md` | Documents the bug fixes included in this PR. |
Tests cover: basic execution, XPIA, multiple risk categories, application scenarios, strategy combinations, model_config targets, agent callbacks, agent tool context, ProtectedMaterial/CodeVulnerability/TaskAdherence categories, SensitiveDataLeakage, agent-only risk rejection, multi-turn, and crescendo attacks. Also fixes PROXY_URL() TypeError in conftest.py (PROXY_URL is a str, not callable). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 250ada4 to 59748fa
- Revert PROXY_URL back to PROXY_URL() (it's a function, not a variable)
- Apply black formatting to assert statements in test_red_team_foundry.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t risk categories

- Add _safe_tqdm_write() wrapper to handle UnicodeEncodeError on Windows cp1252 terminals
- Replace all tqdm.write() calls with _safe_tqdm_write() in _red_team.py
- Add custom seed prompt files for agent-only risk categories (task_adherence, sensitive_data_leakage, prohibited_actions) that lack server-side seed data
- Update test_foundry_task_adherence_category and test_foundry_agent_sensitive_data_leakage to use custom_attack_seed_prompts, bypassing get_attack_objectives API
- Apply black formatting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
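A wrapper of the kind this commit describes might look like the sketch below. It is not the PR's `_safe_tqdm_write()`: the name `safe_write` is hypothetical, and the writer is passed in as a parameter (where the PR wraps `tqdm.write()`) so the example runs without tqdm installed. Replacing unencodable characters with `?` is one possible fallback, chosen here for simplicity.

```python
from typing import Callable


def safe_write(message: str, write: Callable[[str], None] = print) -> None:
    """Write a message, degrading gracefully when the terminal cannot encode it."""
    try:
        write(message)
    except UnicodeEncodeError:
        # Replace characters the terminal cannot encode (e.g. emoji on cp1252) with "?".
        write(message.encode("ascii", errors="replace").decode("ascii"))
```

The message still reaches the user on a cp1252 console, minus the characters that console cannot represent.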
- Merge upstream/main (7 commits) into foundry-e2e-tests branch
- Fix PROXY_URL() call in conftest.py (PROXY_URL is a string, not callable)
- Re-record all 15 foundry red team E2E tests with updated source code

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In CI, devtools_testutils.config.PROXY_URL is a function that must be called. Locally (pip-installed), it's a string constant. Use callable() check to handle both environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
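The `callable()` check described in this commit can be sketched as a small helper. The name `resolve_proxy_url` and the example URL are illustrative only; in `conftest.py` the value comes from `devtools_testutils.config.PROXY_URL`.

```python
from typing import Callable, Union


def resolve_proxy_url(proxy_url: Union[str, Callable[[], str]]) -> str:
    """Return the proxy URL whether the config exposes a string or a zero-argument function."""
    return proxy_url() if callable(proxy_url) else proxy_url
```

Both `resolve_proxy_url("http://localhost:5000")` and `resolve_proxy_url(lambda: "http://localhost:5000")` return the same string, which is what lets one `conftest.py` work in CI and locally.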
Patch random.sample and random.choice to return deterministic (first-N) results for the model config target test. This ensures the same objectives are selected during both recording and playback, preventing test proxy 404 mismatches caused by non-deterministic objective selection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
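One way to get the deterministic "first-N" behaviour this commit describes is `unittest.mock.patch` with a `side_effect`. The sketch below is an assumption about the approach, not the test's actual code; `pick_objectives` is a hypothetical stand-in for the objective-selection code under test.

```python
import random
from unittest.mock import patch


def first_n(population, k):
    """Deterministic stand-in for random.sample: take the first k items."""
    return list(population)[:k]


def first_item(seq):
    """Deterministic stand-in for random.choice: take the first item."""
    return seq[0]


def pick_objectives(objectives, count):
    # Hypothetical stand-in for code that selects objectives at random.
    return random.sample(objectives, count)


with patch("random.sample", side_effect=first_n), patch("random.choice", side_effect=first_item):
    selected = pick_objectives(["a", "b", "c", "d"], 2)
    chosen = random.choice(["x", "y"])
```

Because recording and playback then request the same objectives, the test proxy sees identical requests in both modes.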
# Conflicts:
#   sdk/evaluation/azure-ai-evaluation/assets.json
#   sdk/evaluation/azure-ai-evaluation/tests/conftest.py
Extend /openai/v1 path normalization to all Azure endpoint patterns (*.openai.azure.com, *.cognitiveservices.azure.com, sovereign clouds) not just Foundry endpoints. PyRIT 0.11+ uses AsyncOpenAI(base_url=) which appends /chat/completions directly, requiring the /openai/v1 prefix. Without this fix, model config targets using classic AOAI endpoints get 404 errors because PyRIT sends requests to the bare endpoint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
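The normalization described here could look roughly like the sketch below. It is not the PR's `get_chat_target()` logic: `normalize_endpoint` is a hypothetical name, the suffix tuple lists only the two public-cloud patterns named in the commit message (sovereign-cloud suffixes are omitted), and endpoints that already carry a path are deliberately left untouched.

```python
from urllib.parse import urlparse

# Assumed host suffixes for illustration; the PR's _AZURE_OPENAI_HOST_SUFFIXES may differ.
AZURE_HOST_SUFFIXES = (".openai.azure.com", ".cognitiveservices.azure.com")


def normalize_endpoint(endpoint: str) -> str:
    """Append /openai/v1 to bare Azure endpoints; leave every other URL as given."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    if not host.endswith(AZURE_HOST_SUFFIXES):
        return endpoint
    if parsed.path.rstrip("/"):
        # Already has a path (for example /openai/v1): do not modify it.
        return endpoint
    return f"{parsed.scheme}://{parsed.netloc}/openai/v1"
```

With the prefix in place, a client that appends `/chat/completions` to `base_url` ends up at `/openai/v1/chat/completions` instead of the bare host.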
RISK_CATEGORY_METRIC_MAP mapped HateUnfairness to HATE_FAIRNESS (legacy name), but the sync eval API returns results under hate_unfairness (canonical name). The scorer's result matching compared against the un-normalized hate_fairness, causing it to never match and silently fall back to score=0, making ASR always 0% for hate_unfairness regardless of actual model behavior.

Changes:
- metric_mapping.py: Map HateUnfairness to HATE_UNFAIRNESS (canonical name). The routing layer in evaluate_with_rai_service_sync normalizes to the legacy name when use_legacy_endpoint=True, so both paths work.
- _rai_scorer.py: Match results against both canonical and legacy aliases using _SYNC_TO_LEGACY_METRIC_NAMES, so future metric renames don't silently break scoring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
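Alias-tolerant matching of the kind described for `_rai_scorer.py` might be sketched as follows. The table `SYNC_TO_LEGACY`, the helper `find_result`, and the `name`/`score` keys of each result are assumptions made for illustration; the real scorer uses `_SYNC_TO_LEGACY_METRIC_NAMES` and the service's own result shape.

```python
from typing import Dict, List, Optional

# Assumed alias table: canonical (sync API) name -> legacy name.
SYNC_TO_LEGACY = {"hate_unfairness": "hate_fairness"}


def find_result(results: List[Dict], metric: str) -> Optional[Dict]:
    """Return the first result whose name matches the metric under either alias."""
    legacy_to_sync = {legacy: sync for sync, legacy in SYNC_TO_LEGACY.items()}
    aliases = {metric, SYNC_TO_LEGACY.get(metric, metric), legacy_to_sync.get(metric, metric)}
    for result in results:
        if result.get("name") in aliases:
            return result
    # No match: report it to the caller instead of silently defaulting to score 0.
    return None
```

A lookup for `hate_fairness` now finds a result reported as `hate_unfairness`, which is the mismatch that previously produced a 0% ASR.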
Tests now expect /openai/v1 suffix on all Azure endpoints, matching the updated get_chat_target() behavior needed for PyRIT 0.11+. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from ff6da95 to 35168c4
When target_type=agent and no client_id is provided (local execution, not ACA), fall back to the existing credential to set aml-aca-token header. Previously this header was only set via ACA managed identity, causing 'Authorization failed for seeds' when running agent-target red team scans locally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 77066a9 to abb47c4
This reverts commit abb47c4.
Two fixes:
1. _rai_service_target.py: Accept both 'message' (PyRIT 0.11+) and 'prompt_request' (legacy) parameter names in send_prompt_async(). PyRIT 0.11 changed the interface from prompt_request= to message=, causing TypeError on multi-turn and crescendo attacks.
2. _generated_rai_client.py: Set aml-aca-token header from existing credential for agent-type seed requests when no client_id (ACA managed identity) is available. Enables local SDK testing of agent targets without ACA.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
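The first fix, accepting either keyword name, can be illustrated with the sketch below. `DualNameTarget` and its `_send` placeholder are hypothetical and do not reproduce the class in `_rai_service_target.py`; only the two parameter names come from the commit message.

```python
import asyncio
from typing import Any, Optional


class DualNameTarget:
    """Hypothetical target that accepts both the new and the legacy keyword name."""

    async def send_prompt_async(
        self, *, message: Optional[Any] = None, prompt_request: Optional[Any] = None
    ) -> Any:
        # Prefer the newer name, fall back to the legacy one.
        request = message if message is not None else prompt_request
        if request is None:
            raise ValueError("Either 'message' or 'prompt_request' must be provided")
        return await self._send(request)

    async def _send(self, request: Any) -> Any:
        # Placeholder for the real service call.
        return f"echo: {request}"


new_style = asyncio.run(DualNameTarget().send_prompt_async(message="hi"))
old_style = asyncio.run(DualNameTarget().send_prompt_async(prompt_request="hi"))
```

Callers written against either interface get the same behaviour, so the target no longer raises `TypeError` when the framework switches keyword names.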
Force-pushed from 23c612c to 2e5b2ba
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix list[Message] -> List[Message] type hint for Python 3.8 compat
- Guard _fallback_response against None when retry kwargs are malformed
- Add CHANGELOG entries for metric fix, PyRIT compat, endpoint normalization, and agent token fallback
- Move _AZURE_OPENAI_HOST_SUFFIXES to module-level constant
- Use _validate_attack_details shared helper in multi-turn/crescendo tests
- Change agent token fallback log level from debug to warning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 79f30b9 to 24d0ff0
nagkumar91 left a comment
Code Review
🟡 XPIA agent fallback catches all exceptions silently (Medium)
In _red_team.py, the XPIA prompt fetch for agent targets catches Exception and logs only at debug level:

```python
except Exception as agent_error:
    if target_type_str == "agent":
        self.logger.debug(f"Agent-type XPIA prompt fetch failed ({agent_error}), falling back to model-type")
```

This swallows ALL exceptions (network failures, bugs, timeouts, JSON parse errors), not just expected 404/auth errors. For agent targets, these errors vanish into debug logs, making troubleshooting very difficult.
Fix: Catch specific expected exception types (e.g., HttpResponseError) and re-raise unexpected ones, or at minimum log at warning level.
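The suggested fix could take a shape like the sketch below. To keep it self-contained, `ServiceResponseError` is a local stand-in for an HTTP error type such as HttpResponseError, and `fetch_with_fallback`, its two callables, and the set of "expected" status codes are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("red_team_sketch")


class ServiceResponseError(Exception):
    """Local stand-in for an HTTP error type that carries a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"service returned {status_code}")
        self.status_code = status_code


def fetch_with_fallback(fetch_agent, fetch_model):
    """Try the agent-type fetch; fall back to model-type only on expected errors."""
    try:
        return fetch_agent()
    except ServiceResponseError as error:
        if error.status_code not in (401, 403, 404):
            raise  # unexpected service failure: let it propagate
        logger.warning("Agent-type fetch failed (%s), falling back to model-type", error)
        return fetch_model()
```

Anything that is not a `ServiceResponseError` (a bug, a timeout, a parse error) is never caught here, so it surfaces to the caller instead of disappearing into the log.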
🟡 Agent credential fallback — bare except at debug level (Medium)
In _generated_rai_client.py, the agent token fallback silently swallows all errors:

```python
try:
    token = self.token_manager.credential.get_token(TokenScope.DEFAULT_AZURE_MANAGEMENT.value).token
    headers["aml-aca-token"] = token
except Exception:
    self.logger.debug("Could not set aml-aca-token from existing credential", exc_info=True)
```

If the credential is misconfigured or expired, execution continues without the token header. The subsequent service call will fail with a cryptic auth error that's very hard to trace back to this swallowed exception.
Fix: Log at warning level instead of debug.
🟡 _fallback_response returns [] on missing request (Medium)
In _rai_service_target.py, when neither message nor prompt_request is in retry kwargs:

```python
request = retry_state.kwargs.get("message") or retry_state.kwargs.get("prompt_request")
if request is None:
    logger.warning("_fallback_response: no 'message' or 'prompt_request' in retry kwargs")
    return []
```

This is the retry error callback after 5 attempts are exhausted. Returning [] breaks the List[Message] contract: callers accessing response[0] will get IndexError. Consider raising an exception instead since this represents an unrecoverable error state.
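A raising variant of this callback might look like the sketch below. It is only an illustration: `fallback_response` returns a list of strings in place of real Message objects, and `SimpleNamespace` stands in for the retry state object, of which only the `kwargs` attribute is used.

```python
from types import SimpleNamespace


def fallback_response(retry_state):
    """Retry-exhausted callback: build a fallback from the original request or fail loudly."""
    request = retry_state.kwargs.get("message") or retry_state.kwargs.get("prompt_request")
    if request is None:
        # Nothing to build a fallback from; an empty list would only defer the failure.
        raise RuntimeError("fallback_response: no 'message' or 'prompt_request' in retry kwargs")
    return [f"fallback for: {request}"]


state = SimpleNamespace(kwargs={"prompt_request": "hi"})
fallback = fallback_response(state)
```

Raising at this point keeps the failure next to its cause, whereas an empty list would show up later as an `IndexError` in unrelated code.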
🟠 Import inside method in _rai_scorer.py (Low)
```python
async def _score_piece_async(self, ...):
    from azure.ai.evaluation._common.rai_service import (
        _SYNC_TO_LEGACY_METRIC_NAMES, _LEGACY_TO_SYNC_METRIC_NAMES,
    )
```

This runs on every scoring call. Move to module level for clarity and minor perf improvement.
- Upgrade XPIA agent fallback log from debug to warning (_red_team.py)
- Upgrade aml-aca-token credential fallback log from debug to warning (_generated_rai_client.py)
- Raise RuntimeError instead of returning [] in _fallback_response (_rai_service_target.py)
- Move metric name imports to module level (_rai_scorer.py)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nagkumar91 left a comment
All four issues from previous review are addressed:
- XPIA agent fallback: log level raised from `debug` to `warning` ✅
- Agent credential fallback: log level raised from `debug` to `warning` ✅
- `_fallback_response` empty list: now raises `RuntimeError` instead of returning `[]` ✅
- Import inside method: moved to module level in `_rai_scorer.py` ✅

LGTM.
* Add 15 Foundry red team E2E tests for full RAISvc contract coverage

  Tests cover: basic execution, XPIA, multiple risk categories, application scenarios, strategy combinations, model_config targets, agent callbacks, agent tool context, ProtectedMaterial/CodeVulnerability/TaskAdherence categories, SensitiveDataLeakage, agent-only risk rejection, multi-turn, and crescendo attacks.

  Also fixes PROXY_URL() TypeError in conftest.py (PROXY_URL is a str, not callable).

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix PROXY_URL() call and apply black formatting

  - Revert PROXY_URL back to PROXY_URL() (it's a function, not a variable)
  - Apply black formatting to assert statements in test_red_team_foundry.py

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Windows encoding bug in tqdm output and use custom seeds for agent risk categories

  - Add _safe_tqdm_write() wrapper to handle UnicodeEncodeError on Windows cp1252 terminals
  - Replace all tqdm.write() calls with _safe_tqdm_write() in _red_team.py
  - Add custom seed prompt files for agent-only risk categories (task_adherence, sensitive_data_leakage, prohibited_actions) that lack server-side seed data
  - Update test_foundry_task_adherence_category and test_foundry_agent_sensitive_data_leakage to use custom_attack_seed_prompts, bypassing get_attack_objectives API
  - Apply black formatting

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-record foundry E2E tests after merging upstream/main

  - Merge upstream/main (7 commits) into foundry-e2e-tests branch
  - Fix PROXY_URL() call in conftest.py (PROXY_URL is a string, not callable)
  - Re-record all 15 foundry red team E2E tests with updated source code

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix PROXY_URL handling for both callable and string variants

  In CI, devtools_testutils.config.PROXY_URL is a function that must be called. Locally (pip-installed), it's a string constant. Use callable() check to handle both environments.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix test_foundry_with_model_config_target recording playback failure

  Patch random.sample and random.choice to return deterministic (first-N) results for the model config target test. This ensures the same objectives are selected during both recording and playback, preventing test proxy 404 mismatches caused by non-deterministic objective selection.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Azure OpenAI endpoint normalization for PyRIT 0.11+ compatibility

  Extend /openai/v1 path normalization to all Azure endpoint patterns (*.openai.azure.com, *.cognitiveservices.azure.com, sovereign clouds) not just Foundry endpoints. PyRIT 0.11+ uses AsyncOpenAI(base_url=) which appends /chat/completions directly, requiring the /openai/v1 prefix. Without this fix, model config targets using classic AOAI endpoints get 404 errors because PyRIT sends requests to the bare endpoint.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix hate_unfairness metric name mismatch in RAI scorer

  RISK_CATEGORY_METRIC_MAP mapped HateUnfairness to HATE_FAIRNESS (legacy name), but the sync eval API returns results under hate_unfairness (canonical name). The scorer's result matching compared against the un-normalized hate_fairness, causing it to never match and silently fall back to score=0, making ASR always 0% for hate_unfairness regardless of actual model behavior.

  Changes:
  - metric_mapping.py: Map HateUnfairness to HATE_UNFAIRNESS (canonical name). The routing layer in evaluate_with_rai_service_sync normalizes to the legacy name when use_legacy_endpoint=True, so both paths work.
  - _rai_scorer.py: Match results against both canonical and legacy aliases using _SYNC_TO_LEGACY_METRIC_NAMES, so future metric renames don't silently break scoring.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update recording for model config target test

* Update unit tests for Azure OpenAI endpoint normalization

  Tests now expect /openai/v1 suffix on all Azure endpoints, matching the updated get_chat_target() behavior needed for PyRIT 0.11+.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix agent seed auth for local SDK usage

  When target_type=agent and no client_id is provided (local execution, not ACA), fall back to the existing credential to set aml-aca-token header. Previously this header was only set via ACA managed identity, causing 'Authorization failed for seeds' when running agent-target red team scans locally.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "Fix agent seed auth for local SDK usage"

  This reverts commit abb47c4.

* Fix send_prompt_async parameter name for PyRIT 0.11+ and agent seed auth

  Two fixes:
  1. _rai_service_target.py: Accept both 'message' (PyRIT 0.11+) and 'prompt_request' (legacy) parameter names in send_prompt_async(). PyRIT 0.11 changed the interface from prompt_request= to message=, causing TypeError on multi-turn and crescendo attacks.
  2. _generated_rai_client.py: Set aml-aca-token header from existing credential for agent-type seed requests when no client_id (ACA managed identity) is available. Enables local SDK testing of agent targets without ACA.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update recordings for foundry E2E tests

* Update unit tests

* Apply black formatting

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #45579 review feedback

  - Fix list[Message] -> List[Message] type hint for Python 3.8 compat
  - Guard _fallback_response against None when retry kwargs are malformed
  - Add CHANGELOG entries for metric fix, PyRIT compat, endpoint normalization, and agent token fallback
  - Move _AZURE_OPENAI_HOST_SUFFIXES to module-level constant
  - Use _validate_attack_details shared helper in multi-turn/crescendo tests
  - Change agent token fallback log level from debug to warning

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: improve logging, error handling, and imports

  - Upgrade XPIA agent fallback log from debug to warning (_red_team.py)
  - Upgrade aml-aca-token credential fallback log from debug to warning (_generated_rai_client.py)
  - Raise RuntimeError instead of returning [] in _fallback_response (_rai_service_target.py)
  - Move metric name imports to module level (_rai_scorer.py)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>