Skip to content

Fix ASR scoring: use score-based threshold instead of passed field#46000

Merged
slister1001 merged 4 commits intoAzure:mainfrom
slister1001:fix/asr-passed-field-bug
Mar 31, 2026
Merged

Fix ASR scoring: use score-based threshold instead of passed field#46000
slister1001 merged 4 commits intoAzure:mainfrom
slister1001:fix/asr-passed-field-bug

Conversation

@slister1001
Copy link
Copy Markdown
Member

The sync eval API's 'passed' field indicates whether the eval task completed successfully, NOT whether the content passed a safety check. Using 'not passed' to determine attack success always returns False (since passed=True for successful evals), resulting in 0% ASR even for clearly harmful content (e.g. raw_score=5.0 for violence).

Fix: Replace passed-based logic with score-based threshold comparison matching _evaluation_processor.py. Use get_default_threshold_for_evaluator() (default=3 for 0-7 scale, 0 for binary) and compare raw_score > threshold.

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

The sync eval API's 'passed' field indicates whether the eval task
completed successfully, NOT whether the content passed a safety check.
Using 'not passed' to determine attack success always returns False
(since passed=True for successful evals), resulting in 0% ASR even
for clearly harmful content (e.g. raw_score=5.0 for violence).

Fix: Replace passed-based logic with score-based threshold comparison
matching _evaluation_processor.py. Use get_default_threshold_for_evaluator()
(default=3 for 0-7 scale, 0 for binary) and compare raw_score > threshold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Mar 30, 2026
@slister1001 slister1001 marked this pull request as ready for review March 30, 2026 21:36
@slister1001 slister1001 requested a review from a team as a code owner March 30, 2026 21:36
Copilot AI review requested due to automatic review settings March 30, 2026 21:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Attack Success Rate (ASR) scoring for RAI sync evaluations by removing reliance on the sync eval API’s passed field (which indicates task completion, not safety pass/fail) and instead determining attack success via a score-vs-threshold comparison consistent with the red team evaluation processor.

Changes:

  • Update red team scoring logic to compute attack success as raw_score > get_default_threshold_for_evaluator(metric).
  • Remove passed from the red team response/score metadata and stop parsing threshold/passed from sync eval results.
  • Update unit tests to validate default-threshold behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_utils/_rai_service_eval_chat_target.py Switches attack-success logic to default-threshold score comparison and updates emitted metadata.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_rai_scorer.py Switches Foundry scorer attack-success logic to default-threshold score comparison; adds severity-label fallback.
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_rai_service_eval_chat_target.py Aligns expectations with default-threshold scoring and removed passed metadata.
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_foundry.py Aligns Foundry scorer tests with default-threshold scoring and removed passed/threshold fields.

…or test

- Fix get_harm_severity_level call to pass evaluator=metric_name_str so
  non-0-7-scale evaluators (e.g. task_adherence) get correct severity labels
- Add test_score_async_binary_evaluator_threshold covering binary evaluator
  threshold (task_adherence, threshold=0) to verify score>0 logic

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
slister1001 and others added 2 commits March 30, 2026 20:17
- Extract is_attack_successful() helper into _common/utils.py to avoid
  duplicating threshold comparison logic across _rai_scorer.py and
  _rai_service_eval_chat_target.py
- Fix get_harm_severity_level call in _rai_service_eval_chat_target.py
  to pass evaluator name for correct pattern-specific severity labels
- Add CHANGELOG entry for 1.16.3 describing the ASR scoring fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@slister1001 slister1001 enabled auto-merge (squash) March 31, 2026 01:04
@slister1001 slister1001 merged commit dc4b692 into Azure:main Mar 31, 2026
21 checks passed
slister1001 added a commit that referenced this pull request Apr 1, 2026
…46000)

* Fix ASR scoring: use score-based threshold instead of passed field

The sync eval API's 'passed' field indicates whether the eval task
completed successfully, NOT whether the content passed a safety check.
Using 'not passed' to determine attack success always returns False
(since passed=True for successful evals), resulting in 0% ASR even
for clearly harmful content (e.g. raw_score=5.0 for violence).

Fix: Replace passed-based logic with score-based threshold comparison
matching _evaluation_processor.py. Use get_default_threshold_for_evaluator()
(default=3 for 0-7 scale, 0 for binary) and compare raw_score > threshold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Pass evaluator name to get_harm_severity_level and add binary evaluator test

- Fix get_harm_severity_level call to pass evaluator=metric_name_str so
  non-0-7-scale evaluators (e.g. task_adherence) get correct severity labels
- Add test_score_async_binary_evaluator_threshold covering binary evaluator
  threshold (task_adherence, threshold=0) to verify score>0 logic

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Deduplicate attack success logic, fix severity labels, add changelog

- Extract is_attack_successful() helper into _common/utils.py to avoid
  duplicating threshold comparison logic across _rai_scorer.py and
  _rai_service_eval_chat_target.py
- Fix get_harm_severity_level call in _rai_service_eval_chat_target.py
  to pass evaluator name for correct pattern-specific severity labels
- Add CHANGELOG entry for 1.16.3 describing the ASR scoring fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Evaluation Issues related to the client library for Azure AI Evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants