…efault 3.0 Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/e376f26a-4cd6-44a9-b271-81eb2b6e86d9 Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Pull request overview
Fixes incorrect default thresholding for the TaskNavigationEfficiencyEvaluator’s binary *_score output, ensuring EvaluatorBase._real_call computes consistent *_result/*_threshold values.
Changes:
- Set the `EvaluatorBase` threshold to `1.0` for `task_navigation_efficiency_score` by passing `threshold=1.0` to `super().__init__()`.
- Make the evaluator's explicit `task_navigation_efficiency_threshold` output consistent with the base-class threshold (`1.0`).
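To make the failure mode concrete, here is a minimal sketch of the pass/fail comparison described above. It assumes only what this PR states: the default threshold is `3.0` and a score passes when `score >= threshold`. The `classify` helper is hypothetical and is not part of `azure-ai-evaluation`.

```python
# Default threshold named in this PR; a binary metric only ever scores 0.0 or 1.0.
DEFAULT_THRESHOLD = 3.0


def classify(score: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Hypothetical helper: a score passes when it is >= the threshold."""
    return "pass" if score >= threshold else "fail"


# With the inherited default, even a perfect binary score is below the threshold.
before_fix = classify(1.0)

# With threshold=1.0, a perfect score passes and a miss still fails.
after_fix_hit = classify(1.0, threshold=1.0)
after_fix_miss = classify(0.0, threshold=1.0)

print(before_fix, after_fix_hit, after_fix_miss)
```

Because a binary metric can never reach `3.0`, the default threshold made every evaluation fail regardless of the score.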
…tors/_task_navigation_efficiency/_task_navigation_efficiency.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @copilot. We've noticed that new changes have been pushed to this pull request. Because it is set to automatically merge, we've reset the approvals to allow the opportunity to review the updates.
…rtions Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/28756174-3e26-4ea2-849c-9d5c0a28d6c3 Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
… binary metric (Azure#46542)

* Initial plan
* Fix TaskNavigationEfficiencyEvaluator threshold: use 1.0 instead of default 3.0
  Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/e376f26a-4cd6-44a9-b271-81eb2b6e86d9
  Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Use self._threshold in return dict and add result/threshold test assertions
  Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/28756174-3e26-4ea2-849c-9d5c0a28d6c3
  Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Co-authored-by: Mohamed Hessien <mohessie@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
`TaskNavigationEfficiencyEvaluator` returns a `_score` key, which triggers `EvaluatorBase._real_call` to auto-compute `_result`/`_threshold`. Because `super().__init__()` was called without a threshold, it defaulted to `3.0`, so a score of `1.0` was always evaluated as fail.

Changes:
- `__init__`: Pass `threshold=1.0` to `super().__init__()` so `EvaluatorBase` correctly classifies this binary (`0.0`/`1.0`) metric as passing when `score >= 1.0`.
- `_do_eval` return dict: Use `float(self._threshold)` instead of a hard-coded value, so the returned `task_navigation_efficiency_threshold` stays in sync with the base-class threshold and cannot drift if the threshold changes.
- Tests: Assert in `test_exact_match_scenario` that `task_navigation_efficiency_result == "pass"` and `task_navigation_efficiency_threshold == 1.0`, to prevent regressions of the original issue.
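The pattern behind these changes can be sketched with simplified stand-in classes. Everything below is illustrative: `SketchEvaluatorBase` and `SketchTaskNavigationEfficiency` are hypothetical names, the `response`/`ground_truth` inputs and the exact-match scoring rule are assumptions, and the real `EvaluatorBase._real_call` logic is more involved than this.

```python
class SketchEvaluatorBase:
    """Simplified stand-in for EvaluatorBase; not the SDK's implementation."""

    def __init__(self, threshold: float = 3.0):
        # 3.0 is the default threshold named in this PR.
        self._threshold = threshold

    def __call__(self, **kwargs):
        result = self._do_eval(kwargs)
        # Mimic the auto-computed "<metric>_result" for every "<metric>_score" key.
        for key in [k for k in result if k.endswith("_score")]:
            metric = key[: -len("_score")]
            passed = result[key] >= self._threshold
            result[f"{metric}_result"] = "pass" if passed else "fail"
        return result


class SketchTaskNavigationEfficiency(SketchEvaluatorBase):
    """Hypothetical binary evaluator that applies the two changes from this PR."""

    def __init__(self):
        # Change 1: pass the binary-metric threshold up to the base class.
        super().__init__(threshold=1.0)

    def _do_eval(self, eval_input):
        # Assumed scoring rule for illustration: exact match scores 1.0, else 0.0.
        matched = eval_input["response"] == eval_input["ground_truth"]
        return {
            "task_navigation_efficiency_score": 1.0 if matched else 0.0,
            # Change 2: report the base-class threshold instead of a literal.
            "task_navigation_efficiency_threshold": float(self._threshold),
        }


steps = ["search", "open", "submit"]
out = SketchTaskNavigationEfficiency()(response=steps, ground_truth=steps)
print(out)
```

Reading the threshold from `self._threshold` means the reported value and the value used for the pass/fail decision come from a single source, which is the drift the PR description is guarding against.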
This pull request was created from Copilot chat.