fix(bench): findRunSeries/groupRunSeries respect matcher.scale by blove · Pull Request #158 · cacheplane/pretable

blove · 2026-06-05T23:55:22Z

Summary

findRunSeries and groupRunSeries in scripts/bench-matrix.mjs filtered by scenarioId + scriptName (+ adapterId) but silently ignored matcher.scale — even though evaluateH16/H17/H18 pass scale: "hypothesis". A runset that mixed scales for the same scenario+script would aggregate runs across scales into a single verdict.

This is latent today: each bench-matrix invocation runs a single --scale, so runs only ever carry one scale. It becomes a real correctness footgun the moment a multi-scale runset is fed to the evaluators, and the scale matcher field reads as if it filters when it doesn't.

Fix

Filter by scale when the matcher provides it — conditionally, since H1 / H6–H8 and the comparator matchers intentionally omit scale:

(matcher.scale === undefined || run.scale === matcher.scale)
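To make the conditional concrete, here is a minimal sketch of how the predicate could sit inside a run filter. It is not the actual code in scripts/bench-matrix.mjs: the function names `matchesRunSketch` and `findRunSeriesSketch`, the run record shape, and the treatment of adapterId as optional are assumptions made for illustration. Only the scale clause is taken verbatim from this PR.

```javascript
// Sketch only: the run shape and the helper names are hypothetical.
// A run is assumed to look like { scenarioId, scriptName, adapterId, scale }.
function matchesRunSketch(run, matcher) {
  return (
    run.scenarioId === matcher.scenarioId &&
    run.scriptName === matcher.scriptName &&
    // Assumption: adapterId is only compared when the matcher provides it.
    (matcher.adapterId === undefined || run.adapterId === matcher.adapterId) &&
    // The clause from this PR: filter by scale only when the matcher sets it.
    (matcher.scale === undefined || run.scale === matcher.scale)
  );
}

function findRunSeriesSketch(runs, matcher) {
  return runs.filter((run) => matchesRunSketch(run, matcher));
}

// Usage: a matcher with scale narrows the series, one without scale does not.
const exampleRuns = [
  { scenarioId: "s1", scriptName: "select-range-extend", scale: "hypothesis" },
  { scenarioId: "s1", scriptName: "select-range-extend", scale: "dev" },
];
const scaledSeries = findRunSeriesSketch(exampleRuns, {
  scenarioId: "s1",
  scriptName: "select-range-extend",
  scale: "hypothesis",
});
const unscaledSeries = findRunSeriesSketch(exampleRuns, {
  scenarioId: "s1",
  scriptName: "select-range-extend",
});
```

Keeping the check conditional means matchers that omit scale keep their previous behavior, which is the property the H1 / H6–H8 and comparator matchers rely on.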

Test

New regression test: two select-range-extend runs for the same scenario+script — a good hypothesis-scale run (10 ms) and a bad dev-scale run (120 ms). H16 must stay satisfied and report sampleCount === 1. Before the fix, the dev run aggregated in, dragging latency over the 16 ms budget and flipping the verdict to failing.
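The scenario above can be reproduced in isolation with a small sketch. Everything here except the numbers (10 ms, 120 ms, 16 ms budget) and the verdict words is hypothetical: `evaluateBudgetSketch` stands in for evaluateH16, the `respectScale` switch exists only to show the before/after behavior, and aggregating by mean is an assumption, since the PR does not say how the real evaluator combines runs.

```javascript
// Sketch only: field names other than `scale` are assumptions.
const regressionRuns = [
  { scenarioId: "s1", scriptName: "select-range-extend", scale: "hypothesis", latencyMs: 10 },
  { scenarioId: "s1", scriptName: "select-range-extend", scale: "dev", latencyMs: 120 },
];

// `respectScale: false` mimics the pre-fix filter that ignored matcher.scale.
function selectRunsSketch(allRuns, matcher, { respectScale }) {
  return allRuns.filter(
    (run) =>
      run.scenarioId === matcher.scenarioId &&
      run.scriptName === matcher.scriptName &&
      (!respectScale || matcher.scale === undefined || run.scale === matcher.scale),
  );
}

// Hypothetical stand-in for an evaluator: mean latency against a budget.
function evaluateBudgetSketch(allRuns, matcher, budgetMs, options) {
  const series = selectRunsSketch(allRuns, matcher, options);
  const meanMs = series.reduce((sum, run) => sum + run.latencyMs, 0) / series.length;
  return {
    status: meanMs <= budgetMs ? "satisfied" : "failing",
    sampleCount: series.length,
    meanMs,
  };
}

const h16Matcher = { scenarioId: "s1", scriptName: "select-range-extend", scale: "hypothesis" };
const beforeFix = evaluateBudgetSketch(regressionRuns, h16Matcher, 16, { respectScale: false });
const afterFix = evaluateBudgetSketch(regressionRuns, h16Matcher, 16, { respectScale: true });

console.log(beforeFix); // both runs aggregated: mean 65 ms, over budget
console.log(afterFix); // only the hypothesis-scale run: mean 10 ms, within budget
```

Asserting on sampleCount as well as the verdict is what pins the behavior down: a verdict alone could stay satisfied by accident if the stray run happened to be fast.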

Gates

node --test scripts/__tests__/bench-matrix.test.mjs — 81 passed (incl. new guard)
prettier --check clean

🤖 Generated with Claude Code

The run-series matchers filtered by scenarioId + scriptName (+ adapterId) but silently ignored matcher.scale, even though H16/H17/H18 pass scale: "hypothesis". A runset mixing scales for the same scenario+script would aggregate runs across scales into one verdict. Latent today (each bench-matrix invocation is single-scale) but a real correctness footgun.

Filter by scale when the matcher provides it (conditional, since H1/H6-H8 and the comparator matchers intentionally omit it).

Adds a regression test: a bad dev-scale select-range-extend run no longer pollutes the hypothesis-scale H16 verdict.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-05T23:55:28Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
pretable	Ready	Preview, Comment	Jun 5, 2026 11:56pm

github-actions · 2026-06-05T23:59:32Z

Vercel preview ready

Preview: https://pretable-oh0cfy24r-cacheplane.vercel.app
Commit: d35696cdd3a070c2e462af60425c74c4e3d22c37

Updated automatically by the deploy-preview job.

blove enabled auto-merge (squash) June 5, 2026 23:55

blove merged commit 6ebdd39 into main Jun 5, 2026
13 checks passed

blove deleted the bench-findrunseries-scale-filter branch June 5, 2026 23:57
