fix(eval): use percentages in evaluation summary #1084

Merged
christso merged 1 commit into main from fix/eval-summary-percentages
Apr 13, 2026

@christso
Collaborator

Summary

  • Changed evaluation summary to display scores as percentages (e.g., 75%) instead of decimals (e.g., 0.750), matching the per-test progress log format
  • Updated threshold display in verdict line (e.g., >= 80% instead of >= 0.8)
  • Updated histogram bin labels to use percentage ranges (e.g., 0%-20% instead of 0.0-0.2)
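
The formatting change described above can be sketched roughly as follows. This is a minimal illustration, not the actual agentv code: the helper names `formatPercent` and `binLabels` are hypothetical, scores are assumed to be fractions in the range 0 to 1, and the histogram is assumed to use equal-width bins.

```typescript
// Hypothetical helper (not from the agentv source): render a fractional
// score as a whole-number percentage, e.g. 0.75 becomes "75%".
function formatPercent(score: number): string {
  // Assumes score is a fraction in [0, 1]; rounds to the nearest percent.
  return `${Math.round(score * 100)}%`;
}

// Hypothetical helper: build labels for equal-width histogram bins.
// The bin count is an assumption; the PR only shows the first two labels.
function binLabels(binCount: number): string[] {
  const labels: string[] = [];
  for (let i = 0; i < binCount; i++) {
    const lo = i / binCount;
    const hi = (i + 1) / binCount;
    labels.push(`${formatPercent(lo)}-${formatPercent(hi)}`);
  }
  return labels;
}

console.log(formatPercent(0.75)); // "75%"
console.log(formatPercent(0.8));  // "80%"
console.log(binLabels(5).slice(0, 2)); // [ "0%-20%", "20%-40%" ]
```

Rounding to whole percentages trades a little precision (0.594 becomes 59%) for consistency with the per-test progress log.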

Before:

RESULT: FAIL  (2/5 scored >= 0.8, mean: 0.594)
Mean score: 0.594
Score distribution:
  0.0-0.2: 0
  0.2-0.4: 1

After:

RESULT: FAIL  (2/5 scored >= 80%, mean: 59%)
Mean score: 59%
Score distribution:
  0%-20%: 0
  20%-40%: 1
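
As a rough sketch of how the "After" verdict line could be assembled: the `SummaryInput` shape and `formatVerdictLine` function below are hypothetical names for illustration only, and the verdict is passed in rather than computed because the PR does not state the pass/fail rule.

```typescript
// Hypothetical input shape; field names are assumptions, not agentv's types.
interface SummaryInput {
  verdict: "PASS" | "FAIL"; // supplied by the caller; the rule is not shown in the PR
  passed: number;           // number of tests at or above the threshold
  total: number;            // total number of tests
  threshold: number;        // fraction in [0, 1], e.g. 0.8
  mean: number;             // fraction in [0, 1], e.g. 0.594
}

// Hypothetical formatter that mirrors the "After" verdict line.
function formatVerdictLine(s: SummaryInput): string {
  const pct = (x: number): string => `${Math.round(x * 100)}%`;
  return `RESULT: ${s.verdict}  (${s.passed}/${s.total} scored >= ${pct(s.threshold)}, mean: ${pct(s.mean)})`;
}

const line = formatVerdictLine({
  verdict: "FAIL",
  passed: 2,
  total: 5,
  threshold: 0.8,
  mean: 0.594,
});
console.log(line); // RESULT: FAIL  (2/5 scored >= 80%, mean: 59%)
```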

Test plan

  • All existing tests updated and passing (179 eval/compare tests, 1618 core tests, 67 eval package tests, 443 CLI tests)
  • Integration test updated for new format
  • Matrix summary test updated for new format

🤖 Generated with Claude Code

The evaluation summary displayed scores as decimals (e.g., "mean: 0.594")
while per-test progress logs used percentages (e.g., "59% PASS"). This
inconsistency made it harder to quickly interpret results. Now the summary
uses the same percentage format throughout: verdict line, score statistics,
histogram bins, top/bottom results, and matrix table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: ecd59ca
Status: ✅  Deploy successful!
Preview URL: https://cf267f68.agentv.pages.dev
Branch Preview URL: https://fix-eval-summary-percentages.agentv.pages.dev

@christso merged commit 9ea1a4a into main Apr 13, 2026
4 checks passed
@christso deleted the fix/eval-summary-percentages branch April 13, 2026 21:16