
Revert _evaluate.py changes from #46436 on mohessie/standardize_output_schema #46835

Merged
m7md7sien merged 2 commits into mohessie/standardize_output_schema from copilot/revert-evaluate-py-changes on May 11, 2026

Conversation

Copilot AI (Contributor) commented on May 11, 2026

This PR removes only the _evaluate.py portion of #46436 from mohessie/standardize_output_schema so those edits can be handled in a separate PR. The target file is restored to exactly match main.

  • Scope

    • Reverts only:
      • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
    • No other files are changed.
  • What was reverted (the inverse of PR #46436, "Standardize Output Schema for Evalautors", in this file)

    • Reintroduced EVALUATION_PASS_FAIL_MAPPING import.
    • Switched binary aggregation back from *_passed columns to *_result columns.
    • Restored pass counting logic to:
      • value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)
    • Restored _aggregate_metrics exclusion logic from *_passed back to *_result.
    • Restored related wording/comments ("pass" vs. "passed") to match main.
  • Result

    • _evaluate.py now aligns byte-for-byte with main, while keeping PR scope limited to the requested split.
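
A condensed view of the restored logic (an illustrative fragment, not the full SDK function; it assumes a pandas DataFrame df of evaluation outputs and the EVALUATION_PASS_FAIL_MAPPING constant imported from _constants):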
# Restored aggregation behavior: binary outcomes are read from "*_result" columns
result_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_result")]
for col in result_columns:
    value_counts = df[col].value_counts()
    pass_count = value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • centralus-2.in.applicationinsights.azure.com
    • Triggering command: /home/REDACTED/bin/azsdk /home/REDACTED/bin/azsdk ingest-telemetry --client-type copilot-cli --event-type skill_invocation --session-id 8462065c-c6cf-473a-9421-84eaf0a44b5b --skill-name find-package-skill (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This PR reverts the changes made to sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py in PR #46436, so that those changes can be split into a separate PR. When this PR is merged into the mohessie/standardize_output_schema branch, the _evaluate.py portion of PR #46436 will be undone.

Goal

On a new branch created off mohessie/standardize_output_schema, restore sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py to exactly match its content on the main branch (i.e., revert the _evaluate.py edits introduced by PR #46436), then open a PR targeting mohessie/standardize_output_schema.

Do not modify any other file in the repository. Do not run formatters, linters, or any tooling that may alter other lines. Do not touch the changelog, version files, or any other file changed in PR #46436.

Exact diff to apply

Apply this unified diff verbatim to sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py on the mohessie/standardize_output_schema branch. This is the inverse of the diff applied in PR #46436:

@@ -26,6 +26,7 @@
 
 from .._constants import (
     CONTENT_SAFETY_DEFECT_RATE_THRESHOLD_DEFAULT,
+    EVALUATION_PASS_FAIL_MAPPING,
     EvaluationMetrics,
     DefaultOpenEncoding,
     Prefixes,
@@ -250,7 +251,7 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
     """
     Aggregate binary output results (pass/fail) from evaluation dataframe.
 
-    For each evaluator, calculates the proportion of passed results.
+    For each evaluator, calculates the proportion of "pass" results.
 
     :param df: The dataframe of evaluation results.
     :type df: ~pandas.DataFrame
@@ -259,12 +260,12 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
     """
     results = {}
 
-    # Find all columns that end with "_passed"
-    passed_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_passed")]
+    # Find all columns that end with "_result"
+    result_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_result")]
 
-    for col in passed_columns:
+    for col in result_columns:
         # Extract the evaluator name from the column name
-        # (outputs.<evaluator>.<metric>_passed)
+        # (outputs.<evaluator>.<metric>_result)
         parts = col.split(".")
         evaluator_name = None
         if len(parts) >= 3:
@@ -291,7 +292,7 @@ def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
 
             # Calculate the proportion of EVALUATION_PASS_FAIL_MAPPING[True] results
             total_rows = len(df)
-            pass_count = df[col].sum()
+            pass_count = value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)
             proportion = pass_count / total_rows if total_rows > 0 else 0.0
 
             # Set the result with the evaluator name as the key
@@ -383,10 +384,10 @@ def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dic
     token_count_cols = _get_token_count_columns_to_exclude(df)
     handled_columns.extend(token_count_cols)
 
-    # Exclude threshold and passed columns from aggregation
+    # Exclude threshold and result columns from aggregation
     # These are per-row metadata, not metrics to be averaged
-    threshold_and_passed_cols = [col for col in df.columns if col.endswith("_threshold") or col.endswith("_passed")]
-    handled_columns.extend(threshold_and_passed_cols)
+    threshold_and_result_cols = [col for col in df.columns if col.endswith("_threshold") or col.endswith("_result")]
+    handled_columns.extend(threshold_and_result_cols)
 
     # For rest of metrics, we will calculate mean
     df.drop(columns=handled_columns, inplace=True)
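
For context, here is a condensed, self-contained sketch of the aggregation behavior this diff restores: binary outcomes are read from "*_result" columns and the pass rate is computed from the string values. This is an illustration only, not the SDK source; EVALUATION_PASS_FAIL_MAPPING is stubbed as {True: "pass", False: "fail"} to mirror how the constant is used here, and the evaluator-name extraction is simplified.

import pandas as pd
from typing import Dict

# Stand-in for azure.ai.evaluation._constants.EVALUATION_PASS_FAIL_MAPPING (assumed values).
EVALUATION_PASS_FAIL_MAPPING = {True: "pass", False: "fail"}

def aggregate_binary_output(df: pd.DataFrame) -> Dict[str, float]:
    """Compute per-evaluator pass rates from 'outputs.<evaluator>.<metric>_result' columns."""
    results: Dict[str, float] = {}
    result_columns = [
        col for col in df.columns if col.startswith("outputs.") and col.endswith("_result")
    ]
    for col in result_columns:
        parts = col.split(".")
        if len(parts) < 3:  # expect outputs.<evaluator>.<metric>_result
            continue
        evaluator_name = parts[1]
        value_counts = df[col].value_counts()
        total_rows = len(df)
        # Count rows whose string value equals the "pass" label, defaulting to 0 if absent.
        pass_count = value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)
        results[evaluator_name] = pass_count / total_rows if total_rows > 0 else 0.0
    return results

# Example usage:
# df = pd.DataFrame({"outputs.relevance.relevance_result": ["pass", "fail"]})
# aggregate_binary_output(df)  -> {"relevance": 0.5}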

Authoritative reference (simplest approach)

The fastest and safest way to do this is to copy the file from main and overwrite the file on the new branch. The content after revert must be byte-for-byte equal to the version of the file on main:

After the commit, git diff origin/main -- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py must produce no output (the file matches main exactly).
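
A minimal way to script that check (an illustrative sketch, not part of the PR; it assumes a local git checkout with an origin remote and shells out to the same git diff command quoted above):

import subprocess

TARGET = "sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py"

# Run the verification command from the PR description; an empty diff means the file
# on the current branch is byte-for-byte identical to origin/main.
completed = subprocess.run(
    ["git", "diff", "origin/main", "--", TARGET],
    capture_output=True,
    text=True,
    check=True,
)
if completed.stdout:
    raise SystemExit("Revert incomplete: _evaluate.py still differs from origin/main")
print("OK: _evaluate.py matches origin/main")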

Branch and PR requirements

  • Create a new branch off mohessie/standardize_output_schema (suggested name: m7md7sien/revert-evaluate-py-in-46436).
  • The PR must target the mohessie/standardize_output_schema branch (NOT main). This is critical: the goal is to remove the _evaluate.py changes from PR #46436 ("Standardize Output Schema for Evalautors"), so the PR base must be that PR's head branch.
  • Commit only the change...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Revert changes to _evaluate.py from PR #46436" to "Revert _evaluate.py changes from #46436 on mohessie/standardize_output_schema" on May 11, 2026
Copilot AI requested a review from m7md7sien on May 11, 2026 at 23:02
m7md7sien marked this pull request as ready for review on May 11, 2026 at 23:02
m7md7sien requested a review from a team as a code owner on May 11, 2026 at 23:02
m7md7sien merged commit 36e0cbb into mohessie/standardize_output_schema on May 11, 2026 (2 checks passed)
m7md7sien deleted the copilot/revert-evaluate-py-changes branch on May 11, 2026 at 23:02