fix(core): derive score from assertions when score absent in code-grader by christso · Pull Request #1212 · EntityProcess/agentv

christso · 2026-05-04T03:37:49Z

Summary

When a code-grader script returns { assertions } without an explicit score, the harness now derives the score as passing / total instead of defaulting to 0.
Removed redundant manual score computations from 6 example scripts that already computed passing / total before returning { score, assertions }; they now return { assertions } only.

What changed

packages/core/src/evaluation/graders/code-grader.ts — Reordered so assertions is built first, then score is derived from them when parsed.score is absent.
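The reordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of code-grader.ts: the `Assertion` and `ParsedGraderOutput` shapes and the `resolveScore` name are assumptions made for the example.

```typescript
// Hypothetical shapes: the real types in code-grader.ts may differ.
interface Assertion {
  text: string;
  passed: boolean;
}

interface ParsedGraderOutput {
  score?: number;
  assertions?: Assertion[];
}

// Build the assertion list first, then fall back to passing / total
// only when the script did not supply a score of its own.
function resolveScore(parsed: ParsedGraderOutput): number {
  const assertions = parsed.assertions ?? [];
  if (typeof parsed.score === "number") {
    return parsed.score; // an explicit score always wins
  }
  if (assertions.length === 0) {
    return 0; // nothing to derive a ratio from
  }
  const passing = assertions.filter((a) => a.passed).length;
  return passing / assertions.length;
}

const derived = resolveScore({
  assertions: [
    { text: "has header row", passed: true },
    { text: "has 3 columns", passed: false },
  ],
});
console.log(derived); // 0.5
```

Keeping the explicit-score branch first means scripts with intentional custom scores are unaffected by the fallback.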

6 example scripts simplified (drop redundant score field):

copilot-log-eval/graders/transcript-quality.ts
import-claude/graders/transcript-quality.ts
code-grader-sdk/scripts/verify-attachments.ts
execution-metrics/scripts/check-metrics-present.ts
workspace-artifact/scripts/check-csv-artifact.ts
file-changes-with-repos/scripts/check-file-changes.ts
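For illustration, the simplification applied to these scripts looks roughly like the before/after pair below. The `checkCsv`, `gradeBefore`, and `gradeAfter` functions and the `Assertion` shape are hypothetical and are not taken from any of the listed files.

```typescript
// Hypothetical assertion shape and grader functions, for illustration only.
interface Assertion {
  text: string;
  passed: boolean;
}

function checkCsv(rows: string[][]): Assertion[] {
  return [
    { text: "file is not empty", passed: rows.length > 0 },
    { text: "every row has 3 columns", passed: rows.every((r) => r.length === 3) },
  ];
}

// Before: the script computed the same ratio the harness now derives itself.
function gradeBefore(rows: string[][]): { score: number; assertions: Assertion[] } {
  const assertions = checkCsv(rows);
  const passing = assertions.filter((a) => a.passed).length;
  const score = assertions.length === 0 ? 0 : passing / assertions.length;
  return { score, assertions };
}

// After: return assertions only and let the harness derive the score.
function gradeAfter(rows: string[][]): { assertions: Assertion[] } {
  return { assertions: checkCsv(rows) };
}

const sampleRows = [["a", "b", "c"], ["d", "e"]];
console.log(JSON.stringify(gradeBefore(sampleRows)));
console.log(JSON.stringify(gradeAfter(sampleRows)));
```

Both versions report the same assertions; the second simply omits the field that the harness can now compute.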

Not simplified (intentional custom scores):

execution-metrics/check-efficiency.ts — rounds score and slices assertion list; derived score from slice would differ
trial-output-consistency — uses a custom floating-point similarity score
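A small sketch of why a sliced assertion list cannot stand in for an explicit score. The five checks, the two-decimal rounding, and the slice length of three are made-up values and do not reflect the actual check-efficiency.ts logic.

```typescript
// Made-up data: not the real check-efficiency.ts checks.
interface Assertion {
  text: string;
  passed: boolean;
}

const allChecks: Assertion[] = [
  { text: "check 1", passed: true },
  { text: "check 2", passed: true },
  { text: "check 3", passed: false },
  { text: "check 4", passed: true },
  { text: "check 5", passed: true },
];

const ratio = (list: Assertion[]): number =>
  list.length === 0 ? 0 : list.filter((a) => a.passed).length / list.length;

// Custom score: computed over every check, then rounded to two decimals.
const explicitScore = Math.round(ratio(allChecks) * 100) / 100;

// Only the first three assertions are reported back to the harness.
const reported = allChecks.slice(0, 3);
const derivedFromSlice = ratio(reported);

console.log(explicitScore); // 0.8
console.log(derivedFromSlice.toFixed(3)); // "0.667"
```

Because the harness would only see the reported slice, dropping the explicit score here would change the result, which is why such scripts keep their own `score`.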

Test plan

3 new unit tests: assertions without score → derived as passing/total, all passing → score 1, all failing → score 0
All existing tests pass (2318 total)
Manual e2e: see results below

Closes #1211

…grader

When a code-grader script returns `{ assertions }` without an explicit `score`, the harness now computes score as passing/total instead of defaulting to 0. Also removes redundant manual score computations from six example scripts that already had assertions covering the same logic.

Closes #1211

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-05-04T03:38:33Z

Deploying agentv with Cloudflare Pages

Latest commit:	`87c61fa`
Status:	✅ Deploy successful!
Preview URL:	https://66859700.agentv.pages.dev
Branch Preview URL:	https://fix-1211-derive-score-from-a.agentv.pages.dev


… test

Addresses code review feedback:

- Drop redundant passing/total score computation from functional-check.ts, validate-sync.ts, keyword-check.ts, and length-check.ts (same pattern as the 6 scripts updated in the previous commit)
- Add test for `{"assertions":[]}` without score → score 0 (empty guard)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
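The empty guard matters because a bare passing / total ratio over an empty list is 0 / 0, which is NaN in JavaScript. A minimal sketch, with a hypothetical `deriveScore` helper and `Assertion` shape standing in for the harness code:

```typescript
// Hypothetical helper: shows the empty-list guard, not the harness's real code.
interface Assertion {
  text: string;
  passed: boolean;
}

function deriveScore(assertions: Assertion[]): number {
  if (assertions.length === 0) {
    return 0; // guard: avoid 0 / 0
  }
  return assertions.filter((a) => a.passed).length / assertions.length;
}

const parsed = JSON.parse('{"assertions":[]}') as {
  score?: number;
  assertions: Assertion[];
};

// Without the guard, the ratio over an empty list is not a number.
const unguarded = 0 / parsed.assertions.length;
console.log(isNaN(unguarded)); // true

// With the guard, a missing score on empty assertions resolves to 0.
const resolved = parsed.score ?? deriveScore(parsed.assertions);
console.log(resolved); // 0
```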

christso and others added 2 commits May 4, 2026 05:30

style: fix biome formatting in code-grader-plain-text test

bc8af5d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

christso merged commit d33285c into main May 4, 2026
4 checks passed

christso deleted the fix/1211-derive-score-from-assertions branch May 4, 2026 04:09

christso merged 3 commits into main from fix/1211-derive-score-from-assertions
