fix(bench): correct H1 status and gate parity check on minimum repeats by blove · Pull Request #125 · cacheplane/pretable

blove · 2026-05-09T03:07:02Z

Summary

PR #124 ran a high-repeat (n=20) re-measurement of S2/hypothesis/scroll for pretable + mui and found that the B2 H1 failing verdict was a low-sample artifact. The memo at docs/research/2026-05-09-pretable-vs-mui-scroll-perf.md recommended raising the repeat protocol but did not carry out the downstream cleanup. This PR closes that loop.

What changed

status/milestones/2026-05-09-b2-h1-high-repeat-correction.json (new) — overlays the original B2 evidence with the n=20 result and correctedH1.status = "satisfied". The original B2 milestone is left intact for historical reference.
apps/website/app/bench/page.tsx — loads both the n=3 milestone and the n=20 correction. verdictFor now respects a parityAdapters set so adapters with parity verdicts get parity at n=20 (full quality pass) instead of being crowned. Prose rewritten: parity is the headline; the original snapshot is described as a low-sample artifact.
scripts/bench-matrix.mjs evaluateH1 — adds a minimum-repeat gate. When the pretable / best-full-grid frame-p95 ratio is in the tight zone (0.9 ≤ r ≤ 1.2) and either adapter has < 10 repeats, returns insufficient with guidance to re-run at --repeats=10+. Outside the tight zone the existing path still fires.
scripts/__tests__/bench-matrix.test.mjs — new test for the gated insufficient case (using the actual B2 ratio of 1.115); existing failing test rewritten to use a clearly out-of-zone ratio (1.6) so the failing path stays exercised.
docs/research/repo-memory.md — appends a 2026-05-09 entry overturning the H1 flip narrative.
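
The minimum-repeat gate described for evaluateH1 can be sketched roughly as follows. This is an illustration only: the function name `minRepeatGate`, its parameter object, and the `null` return meaning "gate does not apply, continue with the existing path" are assumptions, not the actual code in scripts/bench-matrix.mjs. Only the tight-zone bounds (0.9 to 1.2), the 10-repeat minimum, and the `insufficient` status come from the description above.

```javascript
// Hypothetical sketch of the minimum-repeat gate (not the real evaluateH1).
const MIN_REPEATS = 10; // from "< 10 repeats" in the PR description
const TIGHT_ZONE_MIN = 0.9; // lower bound of the tight zone
const TIGHT_ZONE_MAX = 1.2; // upper bound of the tight zone

function minRepeatGate({ pretableP95, comparatorP95, pretableRepeats, comparatorRepeats }) {
  // Ratio of pretable frame p95 to the best full-grid comparator's frame p95.
  const ratio = pretableP95 / comparatorP95;
  const inTightZone = ratio >= TIGHT_ZONE_MIN && ratio <= TIGHT_ZONE_MAX;
  // "Either adapter has < 10 repeats".
  const underSampled = pretableRepeats < MIN_REPEATS || comparatorRepeats < MIN_REPEATS;

  if (inTightZone && underSampled) {
    return { status: "insufficient", ratio, guidance: "re-run at --repeats=10+" };
  }
  // Assumed convention: null means the gate does not apply and the
  // pre-existing satisfied/failing logic should decide.
  return null;
}
```

Under these assumptions, a ratio of 1.115 at 3 repeats is gated as `insufficient`, while a ratio of 1.6 at 3 repeats (or 1.115 at 20 repeats) passes through to the existing path.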

What's NOT changed

AG Grid (16.7 ms p95, 1 blank gap, 2 px row-height drift) and TanStack (16.7 ms p95, 1 blank gap) status from the B2 n=3 runset is not corrected. Both are >50% above pretable, well outside the noise zone. They remain ~1.7× pretable's scroll_frame_p95_ms with quality gaps pretable does not have.
No public-package source changes. Affects only apps/website, scripts/bench-matrix.mjs, status milestones, and docs.

Test plan

`pnpm -w typecheck` passes
`pnpm -w test` passes (190 tests)
`node --test scripts/__tests__/bench-matrix.test.mjs` 68/68 pass (added 1 new, modified 1 existing)
`pnpm -w lint` 0 errors
`pnpm format` clean

🤖 Generated with Claude Code

PR #124 (perf-diag rerun at n=20) showed the B2 H1 "failing" verdict was a low-sample artifact: pretable 9.07 ms ± 0.20 vs MUI 9.14 ms ± 0.19, mean diff −0.065 ms inside the 2σ noise floor of 0.40 ms. The original n=3 ratio of 1.115 was sample noise, not a real regression.

Five targeted corrections:

- Add status/milestones/2026-05-09-b2-h1-high-repeat-correction.json overlaying the original B2 evidence with the n=20 result and correctedH1.status = "satisfied". Original B2 milestone left intact.
- Rewrite the apps/website/app/bench/page.tsx prose to a parity framing at high repeats. verdictFor now respects a parityAdapters set so the table doesn't crown a "fastest" off n=3 noise; H1 status reflects the corrected verdict.
- Add a min-repeat gate to scripts/bench-matrix.mjs evaluateH1: when the pretable / best-full-grid frame-p95 ratio is in the tight zone (0.9 ≤ r ≤ 1.2) AND either adapter has < 10 repeats, return insufficient with guidance to re-run at --repeats=10+. Outside the tight zone the existing behavior is unchanged. New test covers the insufficient case; existing failing test rewritten to use a clearly out-of-zone ratio (1.6) so the failing path stays exercised.
- Append a 2026-05-09 entry to docs/research/repo-memory.md overturning the H1 flip narrative, with the new evaluator gate documented.

AG Grid (16.7 ms p95, 1 blank gap, 2 px row-height drift) and TanStack (16.7 ms p95, 1 blank gap) status from the B2 n=3 runset is not corrected here; both are >50% above pretable, well outside the noise zone. They remain ~1.7× pretable's scroll_frame_p95_ms with quality gaps that pretable does not have.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
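
The noise-floor reasoning in the commit message can be reproduced with a few lines of arithmetic. This is a sketch under a stated assumption: the source does not say how the 2σ noise floor was derived, so the code takes it to be twice the larger of the two reported standard deviations (2 × 0.20 = 0.40 ms). The helper name `withinNoiseFloor` is hypothetical.

```javascript
// Hypothetical helper: is the difference of two means inside an N-sigma noise floor?
// Assumption: noise floor = sigmas × max(stdDev of a, stdDev of b).
function withinNoiseFloor(a, b, sigmas = 2) {
  const meanDiff = a.mean - b.mean;
  const noiseFloor = sigmas * Math.max(a.stdDev, b.stdDev);
  return { meanDiff, noiseFloor, withinNoise: Math.abs(meanDiff) < noiseFloor };
}

// Rounded n=20 figures quoted in the commit message (milliseconds).
const pretableScroll = { mean: 9.07, stdDev: 0.2 };
const muiScroll = { mean: 9.14, stdDev: 0.19 };

const noiseCheck = withinNoiseFloor(pretableScroll, muiScroll);
// With the rounded inputs the mean difference is about -0.07 ms; the message
// reports -0.065 ms, presumably from unrounded data. Either value is well
// inside the 0.40 ms floor, so noiseCheck.withinNoise is true.
console.log(noiseCheck);
```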

vercel · 2026-05-09T03:07:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
pretable	Ready	Preview, Comment	May 9, 2026 3:08am

github-actions · 2026-05-09T03:11:22Z

Vercel preview ready

Preview: https://pretable-215yyjsfq-cacheplane.vercel.app
Commit: 73348c492bde6221f067ffd51b097a7f61d8231c

Updated automatically by the deploy-preview job.

…ound B2 evidence (#126)

The B2 corrections PR (#125) confirmed pretable / MUI parity at n=20 and overturned the H1 flip narrative. Three homepage components still referenced the old gridalpha-stub "4× faster" claim with stub-era numbers. This PR brings them in line with the real B2 runset.

ComparisonTable.tsx:
- Drop the "4× faster scroll" header badge.
- Replace the gridalpha / gridbeta / gridgammaX columns with real ag-grid / tanstack / mui columns; rename Row interface fields.
- Replace scroll-row data with real B2 numbers (pretable 9.07, MUI 9.14, AG Grid 16.7, TanStack 16.7) and add row-height-fidelity, blank-gap, anchor-shift rows that surface the quality wedge.
- Drop streaming rows (S5/updates) until follow-up #6 lands real-comparator S5 evidence; replace with headless-engine + streaming-pipeline rows that distinguish pretable's surface honestly.
- Update trail-marker labels to fact-checkable characterizations: AG Grid "Slower scroll; row-height drift", TanStack "Headless; you wire selection and nav", MUI X "Parity at scroll p95; full-grid feature surface".
- Rewrite the section subhead to a parity framing.

ReceiptsBand.tsx:
- Drop the "4×" hero stat.
- Replace stats with the quality wedge: 0 blank gaps (accent), 9 ms frame p95, ≤1 px row-height fidelity, 25k/s max sustained update rate. The 25k/s figure is pretable's own from the May-1 streaming runset; comparative S5 evidence is still pending.

FeatureGrid.tsx:
- Replace "16ms p95 ... 4× faster than Grid Alpha Community" with a parity + quality-wedge description that names real comparators.

Test updates:
- ReceiptsBand.test.tsx asserts the new "0" + "9ms" hero stats.
- ComparisonTable.test.tsx asserts the new fact-checkable trail-marker labels (regex-matched so prose tweaks don't break the tests).

No source/package changes outside apps/website. All 190 website tests pass; 68/68 bench-matrix tests pass; pnpm -w lint / typecheck / format clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* docs(specs): B2 follow-up #3 - autosize end-to-end wiring design

  End-to-end autosize harness wiring (pretable + ag-grid + mui; tanstack unsupported), with H22 comparator-parity hypothesis evaluator reusing the min-repeat gate from PR #125, and a full B2 matrix re-run with autosize included.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(plans): B2 follow-up #3 - autosize wiring implementation plan

  Six-task plan for wiring autosize through the bench harness end-to-end, adding evaluateH22 with the min-repeat gate, and re-running the B2 matrix with autosize included.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench-runner): accept autosize script through harness pipeline

  Adds "autosize" to the bench-runner supportedScripts allowlist (gated to S2 and to pretable | ag-grid | mui; tanstack remains unsupported per the B2 spec), to the apps/bench query-state parser, and to the BenchScriptName Extract narrow in bench-types.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench): measureBenchAutosizeRun helper for autosize script

  Adds a single-event autosize latency helper that awaits the adapter's autosize callback and one rAF, reporting interaction_latency_ms as "call-to-paint" timing. Mirrors the shape of measureBenchKeySequenceRun.

  Also unblocks the now-accepted "autosize" script in the query-state parser by retargeting the existing fallback-to-defaults test to an unrelated bogus value.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench): wire onAutosizeReady on pretable/ag-grid/mui adapters

  Pretable, AG Grid, and MUI adapters now publish their autosize entry point through a new onAutosizeReady callback. bench-app.tsx captures it in autosizeApiRef and dispatches measureBenchAutosizeRun on the autosize script, mirroring the updateApiRef + measureBenchUpdatesRun chain.

  Replaces AG Grid's pre-emptive onGridReady autosize branch (which only ran at mount) with a callback so autosize fires on bench-script dispatch. MUI now exposes apiRef via useGridApiRef so the harness can call apiRef.current.autosizeColumns({ includeOutliers: true }) (async on v7+). TanStack accepts the prop for harness uniformity but the bench-runner returns "unsupported" before the adapter ever mounts.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench-matrix): evaluateH22 autosize comparator-parity hypothesis

  Adds H22 ("pretable autosize is within a single 60Hz frame and within 10% of the best ag-grid/mui comparator on S2"). Reuses the H1 comparator-parity pattern: 16 ms single-frame floor, 10% parity band, ≥10 repeats per side before resolving a tight-zone (0.9–1.2) ratio. Hoists COMPARATOR_PARITY_MIN_REPEATS to module scope so H1 and H22 share a single source of truth.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(bench): B2 matrix re-run with autosize; H22 evaluated

  S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3, ~5 min wall-clock. H22 satisfied: pretable autosize 5.3 ms vs MUI 11 ms (ratio 0.482, outside the tight zone; gate does not apply). H1 also flipped from failing → satisfied vs the 2026-05-08 milestone (parity at n=3 with mui this run; matches the n=20 correction documented in the previous repo-memory entry).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(format): prettier formatting for B2 follow-up #3

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
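
The H22 rule quoted in the commit messages above (16 ms single-frame floor, 10% parity band, at least 10 repeats per side before resolving a tight-zone ratio) might look something like the sketch below. The function name `evaluateH22Sketch`, the input shapes, and the reading of "within 10%" as ratio ≤ 1.10 are assumptions for illustration; only the thresholds and the constant name COMPARATOR_PARITY_MIN_REPEATS come from the commit text.

```javascript
// Hypothetical sketch of an H22-style evaluator (not the real evaluateH22).
const COMPARATOR_PARITY_MIN_REPEATS = 10; // shared minimum named in the commit message
const FRAME_FLOOR_MS = 16; // single 60 Hz frame
const PARITY_BAND = 0.1; // assumed to mean ratio <= 1.10
const TIGHT = { min: 0.9, max: 1.2 };

function evaluateH22Sketch(pretable, comparators) {
  // Best comparator = lowest autosize latency among those supplied.
  const best = comparators.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));
  const ratio = pretable.latencyMs / best.latencyMs;

  // Minimum-repeat gate: refuse to resolve a close call on too few repeats.
  const inTightZone = ratio >= TIGHT.min && ratio <= TIGHT.max;
  const underSampled =
    Math.min(pretable.repeats, best.repeats) < COMPARATOR_PARITY_MIN_REPEATS;
  if (inTightZone && underSampled) {
    return { status: "insufficient", ratio, comparator: best.name };
  }

  const withinFrame = pretable.latencyMs <= FRAME_FLOOR_MS;
  const withinBand = ratio <= 1 + PARITY_BAND;
  const status = withinFrame && withinBand ? "satisfied" : "failing";
  return { status, ratio, comparator: best.name };
}

// Figures from the re-run commit: pretable 5.3 ms vs MUI 11 ms at repeats=3.
const h22 = evaluateH22Sketch(
  { latencyMs: 5.3, repeats: 3 },
  [{ name: "mui", latencyMs: 11, repeats: 3 }]
);
console.log(h22.status, h22.ratio.toFixed(3));
```

With those inputs the ratio is about 0.482, which falls outside the tight zone, so the gate does not apply even at 3 repeats and the sketch resolves to `satisfied`, matching the outcome reported in the commit.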

blove enabled auto-merge (squash) May 9, 2026 03:07

blove merged commit e3811cf into main May 9, 2026
13 checks passed

blove deleted the b2-followup-1-corrections branch May 9, 2026 03:09

blove mentioned this pull request May 9, 2026

fix(website): retire stub-era "4× faster" claims; reframe homepage around B2 evidence #126

Merged

4 tasks

blove merged 1 commit into main from b2-followup-1-corrections
