Normalize parameter types in TaskNavigationEfficiency comparison by Copilot · Pull Request #46227 · Azure/azure-sdk-for-python

Copilot · 2026-04-09T17:19:22Z

Description

Port of fix from Azure/azureml-assets#4901 to this repository.

Problem

When comparing agent tool call parameters against ground truth in TaskNavigationEfficiencyEvaluator, parameter type mismatches caused false comparison failures. For example, an agent returning {"count": 1} (int) would not match a ground truth of {"count": "1"} (str), even though they represent the same value. Similarly, dict/list values were converted to non-canonical string representations that didn't match JSON-formatted strings.
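The mismatch described above can be reproduced in plain Python. This is only an illustrative sketch: the variable names and sample values are hypothetical, and nothing here is taken from the evaluator's code.

```python
import json

# Hypothetical parameter dicts for a single tool call (illustration only).
agent_params = {"count": 1}           # agent returned an int
ground_truth_params = {"count": "1"}  # ground truth stores a str

# A direct comparison treats 1 and "1" as different values.
direct_match = agent_params == ground_truth_params

# str() on a dict produces Python repr syntax (single quotes, insertion order),
# which does not equal the JSON text for the same data.
nested_value = {"b": 2, "a": 1}
repr_form = str(nested_value)
json_form = json.dumps(nested_value, sort_keys=True)
```

Here `direct_match` is `False`, and `repr_form` differs from `json_form` in both quoting and key order, which is the kind of non-canonical string the description refers to.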

Fix

Added _normalize_param_value static method to _TaskNavigationEfficiencyEvaluator — normalizes parameter values to strings using json.dumps (with sort_keys=True) for dicts/lists, and str() for other types (int, float, bool).
Updated _prepare_steps_for_comparison — applies normalization on both agent and ground truth parameter values before comparison.
Updated _extract_tool_names_and_params_from_response in the base evaluator — removed premature str(v) conversion so raw values are preserved and normalization happens at comparison time instead.
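A minimal sketch of what such a normalization helper might look like, based only on the description above. The function name `normalize_param_value` is a stand-in for the real `_normalize_param_value` static method, and both the string pass-through and the fallback on serialization failure are assumptions rather than confirmed behavior.

```python
import json
from typing import Any


def normalize_param_value(value: Any) -> str:
    """Return a string form of a parameter value for equality checks.

    Sketch only: the pass-through for strings and the str() fallback
    when JSON serialization fails are assumptions.
    """
    if isinstance(value, str):
        # Assumed: strings are returned unchanged.
        return value
    if isinstance(value, (dict, list)):
        try:
            # sort_keys=True makes dict key order irrelevant.
            return json.dumps(value, sort_keys=True)
        except (TypeError, ValueError):
            # Assumed fallback for values json cannot serialize.
            return str(value)
    # int, float, bool and any other type go through str().
    return str(value)
```

Under this sketch, `normalize_param_value(1)` and `normalize_param_value("1")` both give `"1"`, and two dicts with the same items in a different key order give the same JSON string.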

Files Changed

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_task_navigation_efficiency_evaluators.py (13 new test cases)

Testing

13 new unit tests covering: int vs int, int vs str, str vs int, bool vs bool, bool vs str, dict vs dict, dict vs JSON string, JSON string vs dict, list vs list, list vs JSON string, stringified args vs dict, float vs float, float vs str
All normalization logic verified via standalone tests (full pytest suite requires network-dependent dependencies not available in sandbox)

Port fix from Azure/azureml-assets#4901. Adds _normalize_param_value static method for consistent string comparison of parameter values (int, float, bool, dict, list) between agent and ground truth. Updates _extract_tool_names_and_params_from_response to preserve original value types instead of premature str() conversion.

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/4888d4d1-bd21-46b6-a733-231b3ffefddd
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

…ine-length (#46232)

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/1c88d810-2e80-47a9-ad09-40adb6529219
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot

Pull request overview

This PR ports a fix to reduce false negatives when comparing tool-call parameters against ground truth in TaskNavigationEfficiencyEvaluator by normalizing parameter values before comparison.

Changes:

Added _normalize_param_value to stringify/JSON-serialize parameter values for consistent comparisons.
Updated _prepare_steps_for_comparison to normalize both agent and ground truth parameters before matching.
Updated _extract_tool_names_and_params_from_response to preserve raw argument values (avoid premature str() conversion) and added unit tests for type normalization scenarios.
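To illustrate the second change, here is a hedged sketch of normalizing both sides before matching. The step shape (a tool name paired with a parameter dict), the function `prepare_steps`, and the helper `_normalize` are all assumptions made for this example; the real signature of `_prepare_steps_for_comparison` is not shown on this page.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical step shape: (tool name, raw parameter dict).
Step = Tuple[str, Dict[str, Any]]
PreparedStep = Tuple[str, Tuple[Tuple[str, str], ...]]


def _normalize(value: Any) -> str:
    # Hypothetical stand-in for the evaluator's normalization method.
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return str(value)


def prepare_steps(steps: List[Step]) -> List[PreparedStep]:
    """Convert each step into a hashable form with normalized values."""
    prepared = []
    for tool_name, params in steps:
        # Sort by parameter name so dict ordering cannot affect equality.
        normalized = tuple(sorted((key, _normalize(val)) for key, val in params.items()))
        prepared.append((tool_name, normalized))
    return prepared


agent_steps = [("search", {"query": "weather", "count": 1})]
ground_truth_steps = [("search", {"count": "1", "query": "weather"})]

# The same preparation is applied to both sides before comparing.
steps_match = prepare_steps(agent_steps) == prepare_steps(ground_truth_steps)
```

Because raw values are kept until this point, the int `1` from the agent and the string `"1"` from the ground truth end up as the same normalized string, so `steps_match` is `True` in this sketch.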

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py	Adds parameter normalization and applies it during step preparation for matching.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py	Preserves raw tool argument values so normalization can happen at comparison time.
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_task_navigation_efficiency_evaluators.py	Adds unit tests to validate parameter type normalization behavior.

+            return value
+        if isinstance(value, (dict, list)):
+            try:
+                return json.dumps(value, sort_keys=True)


+            ),
+        )
+        assert result["task_navigation_efficiency_result"] == "pass"
+


…re#46227)

* Normalize parameter types in TaskNavigationEfficiency comparison

Port fix from Azure/azureml-assets#4901. Adds _normalize_param_value static method for consistent string comparison of parameter values (int, float, bool, dict, list) between agent and ground truth. Updates _extract_tool_names_and_params_from_response to preserve original value types instead of premature str() conversion.

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/4888d4d1-bd21-46b6-a733-231b3ffefddd
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

* Fix black formatting: collapse expressions that fit within 120 char line-length (Azure#46232)

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/1c88d810-2e80-47a9-ad09-40adb6529219
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

* Fix black issue

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Co-authored-by: mohessie <mohessie@microsoft.com>

Copilot AI assigned Copilot and m7md7sien Apr 9, 2026

Copilot created this pull request from a session on behalf of m7md7sien April 9, 2026 17:21

Copilot AI requested a review from m7md7sien April 9, 2026 17:21

Copilot finished work on behalf of m7md7sien April 9, 2026 17:21

Copilot started work on behalf of m7md7sien April 9, 2026 18:53

github-actions Bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) Apr 9, 2026

Copilot finished work on behalf of m7md7sien April 9, 2026 18:56

Copilot started work on behalf of m7md7sien April 9, 2026 19:28

Copilot finished work on behalf of m7md7sien April 9, 2026 19:32

Copilot AI and others added 2 commits April 9, 2026 22:40

Fix black issue

c0b522c

m7md7sien approved these changes Apr 9, 2026

m7md7sien added 6 commits April 10, 2026 00:15

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

dbf773e

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

ad2dd02

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

49bacb4

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

bf0a13a

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

fa93b61

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

1a5ac5e

m7md7sien approved these changes May 13, 2026

Merge branch 'main' into copilot/duplicate-fix-from-azureml-assets

f557192

YoYoJa approved these changes May 15, 2026

m7md7sien marked this pull request as ready for review May 15, 2026 03:03

Copilot AI review requested due to automatic review settings May 15, 2026 03:03

m7md7sien requested a review from a team as a code owner May 15, 2026 03:03

m7md7sien merged commit f502345 into main May 15, 2026
22 checks passed

m7md7sien deleted the copilot/duplicate-fix-from-azureml-assets branch May 15, 2026 03:03

Copilot started reviewing on behalf of m7md7sien May 15, 2026 03:06

Copilot AI reviewed May 15, 2026

m7md7sien merged 10 commits into main from copilot/duplicate-fix-from-azureml-assets

4 participants
