
Fix inverse metric adjustment to skip string labels from code-based evaluators#46663

Merged
imatiach-msft merged 1 commit into Azure:main from imatiach-msft:fix/inverse-metric-string-labels
May 1, 2026

Conversation


@imatiach-msft imatiach-msft commented May 1, 2026

Fix inverse metric adjustment to skip string labels from code-based evaluators

Bug

Bug #5240742 - deflection_rate evaluator shows incorrect pass/fail labels in AppInsights (score=1.0 labeled "pass" instead of "fail").

Root Cause

_create_result_object() calls _adjust_for_inverse_metric(label) for all inverse (decrease+boolean) metrics. Code-based evaluators like deflection_rate return string labels ("pass"/"fail") that already reflect direction-aware semantics. But _adjust_for_inverse_metric only handles bool - it treats any non-bool (including strings) as False, mapping everything to "pass".

Fix

Skip _adjust_for_inverse_metric entirely when the label is already a string, since string labels mean the evaluator already computed the correct direction-aware pass/fail.

Before (buggy):

if is_inverse:
    score, label, passed = _adjust_for_inverse_metric(label)

After (fix):

if is_inverse and not isinstance(label, str):
    score, label, passed = _adjust_for_inverse_metric(label)

Boolean labels (from safety evaluators like indirect_attack, code_vulnerability) continue to be inverted as before.

Tests

  • TestAdjustForInverseMetric (3 tests): boolean True/False and None handling
  • TestIsInverseMetric (4 tests): hardcoded, configured, non-inverse, deflection_rate
  • TestCreateResultObjectInverseMetric (4 tests): integration tests verifying string labels preserved, boolean labels adjusted, non-inverse unmodified

Affected Evaluators

Only deflection_rate is affected - it is the only evaluator that is both code-based (string labels) AND decrease+boolean (triggers inverse path). All other inverse metrics return boolean labels and are unaffected.
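Why only one evaluator is affected can be made concrete with an illustrative audit (the metadata table below is hypothetical, not the SDK's schema; evaluator names are from this PR):

```python
# Among the inverse (decrease+boolean) metrics, only deflection_rate's
# evaluator emits string labels; the safety evaluators emit booleans.
INVERSE_EVALUATORS = {
    "deflection_rate":    str,   # code-based: emits "pass"/"fail" strings
    "indirect_attack":    bool,  # safety: emits boolean labels
    "code_vulnerability": bool,  # safety: emits boolean labels
}

affected = [name for name, label_type in INVERSE_EVALUATORS.items()
            if label_type is str]
print(affected)  # ['deflection_rate']
```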

Copilot AI review requested due to automatic review settings May 1, 2026 04:56
@imatiach-msft imatiach-msft requested a review from a team as a code owner May 1, 2026 04:56
@github-actions github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) May 1, 2026

Copilot AI left a comment


Pull request overview

Fixes inverse-metric adjustment logic so evaluators that emit string pass/fail labels (not booleans) don’t get their label incorrectly overwritten when is_inverse=True, addressing incorrect label="pass" reporting for deflection-rate-like metrics.

Changes:

  • Extend _adjust_for_inverse_metric to recognize string "pass" / "fail" labels (case/whitespace-insensitive).
  • Update _adjust_for_inverse_metric docstring to document string-label behavior and examples.

Comment thread sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py Outdated
@imatiach-msft imatiach-msft force-pushed the fix/inverse-metric-string-labels branch 4 times, most recently from 5a574e6 to 740e8c2 Compare May 1, 2026 16:14
…valuators

Code-based evaluators like deflection_rate return string pass/fail labels
that already reflect direction-aware semantics. The inverse metric
adjustment was incorrectly treating these strings as boolean False
(since isinstance('fail', bool) is False), flipping 'fail' to 'pass'.

Fix: skip _adjust_for_inverse_metric entirely when the label is a string,
since string labels mean the evaluator already computed the correct
direction-aware pass/fail. Boolean labels (from safety evaluators) still
get inverted as before.

Fixes Bug #5240742

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@imatiach-msft imatiach-msft force-pushed the fix/inverse-metric-string-labels branch from 740e8c2 to fe204e7 Compare May 1, 2026 16:15
@imatiach-msft imatiach-msft changed the title from "Fix _adjust_for_inverse_metric to handle string pass/fail labels" to "Fix inverse metric adjustment to skip string labels from code-based evaluators" May 1, 2026

@YoYoJa YoYoJa left a comment


It might be better to have a consistent value type across evaluators in the future?

@imatiach-msft imatiach-msft merged commit a4541c2 into Azure:main May 1, 2026
19 checks passed
imatiach-msft added a commit that referenced this pull request May 6, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
imatiach-msft added a commit that referenced this pull request May 6, 2026
…46763)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
