fix(bench): correct H1 status and gate parity check on minimum repeats #125
Merged
PR #124 (perf-diag rerun at n=20) showed that the B2 H1 "failing" verdict was a low-sample artifact: pretable 9.07 ms ± 0.20 vs MUI 9.14 ms ± 0.19, a mean difference of −0.065 ms, well inside the 2σ noise floor of 0.40 ms. The original n=3 ratio of 1.115 was sample noise, not a real regression.

Five targeted corrections:

- Add status/milestones/2026-05-09-b2-h1-high-repeat-correction.json, overlaying the original B2 evidence with the n=20 result and correctedH1.status = "satisfied". The original B2 milestone is left intact.
- Rewrite the apps/website/app/bench/page.tsx prose to a parity framing at high repeats. verdictFor now respects a parityAdapters set so the table doesn't crown a "fastest" adapter off n=3 noise; the H1 status reflects the corrected verdict.
- Add a min-repeat gate to evaluateH1 in scripts/bench-matrix.mjs: when the pretable / best-full-grid frame-p95 ratio is in the tight zone (0.9 ≤ r ≤ 1.2) AND either adapter has fewer than 10 repeats, return insufficient with guidance to re-run at --repeats=10+. Outside the tight zone the existing behavior is unchanged.
- Add a test covering the insufficient case, and rewrite the existing failing test to use a clearly out-of-zone ratio (1.6) so the failing path stays exercised.
- Append a 2026-05-09 entry to docs/research/repo-memory.md overturning the H1 flip narrative and documenting the new evaluator gate.

AG Grid (16.7 ms p95, 1 blank gap, 2 px row-height drift) and TanStack (16.7 ms p95, 1 blank gap) status from the B2 n=3 runset is not corrected here: both are more than 50% above pretable, well outside the noise zone. They remain ~1.7× pretable's scroll_frame_p95_ms, with quality gaps that pretable does not have.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
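A minimal sketch of the gated evaluator described above. Assumptions: the { frameP95Ms, repeats } input shape and the returned objects are hypothetical, and the non-gated branch uses the 16 ms single-frame floor and 10% parity band that the autosize commit further down attributes to the H1 pattern. The real evaluateH1 in scripts/bench-matrix.mjs may be structured differently.

```javascript
// Both sides need at least this many repeats to resolve a tight-zone ratio.
const COMPARATOR_PARITY_MIN_REPEATS = 10;
// Ratios in [0.9, 1.2] are too close to call from a low-repeat run.
const TIGHT_ZONE_MIN = 0.9;
const TIGHT_ZONE_MAX = 1.2;
// Assumed non-gated rule: within one 60 Hz frame and within 10% of the comparator.
const SINGLE_FRAME_MS = 16;
const PARITY_BAND = 1.1;

// Hypothetical input shape per side: { frameP95Ms: number, repeats: number }.
function evaluateH1(pretable, bestFullGrid) {
  const ratio = pretable.frameP95Ms / bestFullGrid.frameP95Ms;
  const inTightZone = ratio >= TIGHT_ZONE_MIN && ratio <= TIGHT_ZONE_MAX;
  const fewestRepeats = Math.min(pretable.repeats, bestFullGrid.repeats);

  // Min-repeat gate: checked first, so noise-level ratios never become verdicts.
  if (inTightZone && fewestRepeats < COMPARATOR_PARITY_MIN_REPEATS) {
    return {
      status: "insufficient",
      ratio,
      guidance:
        `ratio ${ratio.toFixed(3)} is in the ${TIGHT_ZONE_MIN}-${TIGHT_ZONE_MAX} tight zone ` +
        `with only ${fewestRepeats} repeats; re-run at --repeats=${COMPARATOR_PARITY_MIN_REPEATS}+`,
    };
  }

  // Existing path (assumed thresholds): single-frame floor plus parity band.
  const satisfied = pretable.frameP95Ms <= SINGLE_FRAME_MS && ratio <= PARITY_BAND;
  return { status: satisfied ? "satisfied" : "failing", ratio };
}
```

Because the gate is checked first, a tight-zone ratio such as 1.115 at n=3 reports insufficient instead of a verdict, while an out-of-zone ratio such as 1.6 still takes the existing path at any repeat count. The n=20 numbers (9.07 vs 9.14 ms, ratio ≈ 0.99) pass the gate and resolve normally.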
Vercel preview ready: https://pretable-215yyjsfq-cacheplane.vercel.app
blove added a commit that referenced this pull request on May 9, 2026
…ound B2 evidence (#126)

The B2 corrections PR (#125) confirmed pretable / MUI parity at n=20 and overturned the H1 flip narrative. Three homepage components still referenced the old gridalpha-stub "4× faster" claim with stub-era numbers. This PR brings them in line with the real B2 runset.

ComparisonTable.tsx:
- Drop the "4× faster scroll" header badge.
- Replace the gridalpha / gridbeta / gridgammaX columns with real ag-grid / tanstack / mui columns; rename the Row interface fields.
- Replace the scroll-row data with real B2 numbers (pretable 9.07, MUI 9.14, AG Grid 16.7, TanStack 16.7) and add row-height-fidelity, blank-gap, and anchor-shift rows that surface the quality wedge.
- Drop the streaming rows (S5/updates) until follow-up #6 lands real-comparator S5 evidence; replace them with headless-engine and streaming-pipeline rows that distinguish pretable's surface honestly.
- Update trail-marker labels to fact-checkable characterizations: AG Grid "Slower scroll; row-height drift", TanStack "Headless; you wire selection and nav", MUI X "Parity at scroll p95; full-grid feature surface".
- Rewrite the section subhead to a parity framing.

ReceiptsBand.tsx:
- Drop the "4×" hero stat.
- Replace the stats with the quality wedge: 0 blank gaps (accent), 9 ms frame p95, ≤1 px row-height fidelity, 25k/s max sustained update rate. The 25k/s figure is pretable's own, from the May-1 streaming runset; comparative S5 evidence is still pending.

FeatureGrid.tsx:
- Replace "16ms p95 ... 4× faster than Grid Alpha Community" with a parity + quality-wedge description that names real comparators.

Test updates:
- ReceiptsBand.test.tsx asserts the new "0" + "9ms" hero stats.
- ComparisonTable.test.tsx asserts the new fact-checkable trail-marker labels (regex-matched so prose tweaks don't break the tests).

No source/package changes outside apps/website. All 190 website tests pass; 68/68 bench-matrix tests pass; pnpm -w lint / typecheck / format are clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
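The regex-matched label assertions mentioned in the test updates can be illustrated on plain strings. This is a hypothetical sketch, not the contents of ComparisonTable.test.tsx: the real test presumably queries the rendered component, and the patterns and the unmatchedClaims helper are assumptions. Each pattern pins a fact-checkable claim rather than the exact wording.

```javascript
// Trail-marker labels quoted from this PR's ComparisonTable copy.
const trailMarkers = {
  "ag-grid": "Slower scroll; row-height drift",
  tanstack: "Headless; you wire selection and nav",
  mui: "Parity at scroll p95; full-grid feature surface",
};

// Hypothetical patterns: each pins a claim, not the surrounding prose.
const expectedClaims = {
  "ag-grid": [/slower scroll/i, /row-height drift/i],
  tanstack: [/headless/i],
  mui: [/parity/i, /scroll p95/i],
};

// Returns one "adapter: /pattern/" entry per claim a label no longer makes.
function unmatchedClaims(labels, claims) {
  const misses = [];
  for (const [adapter, patterns] of Object.entries(claims)) {
    const label = labels[adapter] ?? "";
    for (const pattern of patterns) {
      if (!pattern.test(label)) misses.push(`${adapter}: ${pattern}`);
    }
  }
  return misses;
}
```

A reworded MUI X label such as "At parity on scroll p95, with a full-grid feature surface" still passes, while an AG Grid label that drops both claims reports two misses.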
blove added a commit that referenced this pull request on May 9, 2026
* docs(specs): B2 follow-up #3: autosize end-to-end wiring design

End-to-end autosize harness wiring (pretable + ag-grid + mui; tanstack unsupported), with an H22 comparator-parity hypothesis evaluator that reuses the min-repeat gate from PR #125, and a full B2 matrix re-run with autosize included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(plans): B2 follow-up #3: autosize wiring implementation plan

Six-task plan for wiring autosize through the bench harness end-to-end, adding evaluateH22 with the min-repeat gate, and re-running the B2 matrix with autosize included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench-runner): accept autosize script through harness pipeline

Adds "autosize" to the bench-runner supportedScripts allowlist (gated to S2 and to pretable | ag-grid | mui; tanstack remains unsupported per the B2 spec), to the apps/bench query-state parser, and to the BenchScriptName Extract narrow in bench-types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench): measureBenchAutosizeRun helper for autosize script

Adds a single-event autosize latency helper that awaits the adapter's autosize callback and one rAF, reporting interaction_latency_ms as "call-to-paint" timing. Mirrors the shape of measureBenchKeySequenceRun. Also unblocks the now-accepted "autosize" script in the query-state parser by retargeting the existing fallback-to-defaults test to an unrelated bogus value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench): wire onAutosizeReady on pretable/ag-grid/mui adapters

Pretable, AG Grid, and MUI adapters now publish their autosize entry point through a new onAutosizeReady callback. bench-app.tsx captures it in autosizeApiRef and dispatches measureBenchAutosizeRun on the autosize script, mirroring the updateApiRef + measureBenchUpdatesRun chain.

Replaces AG Grid's pre-emptive onGridReady autosize branch (which only ran at mount) with a callback so autosize fires on bench-script dispatch. MUI now exposes apiRef via useGridApiRef so the harness can call apiRef.current.autosizeColumns({ includeOutliers: true }), which is async on v7+. TanStack accepts the prop for harness uniformity, but the bench-runner returns "unsupported" before the adapter ever mounts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(bench-matrix): evaluateH22 autosize comparator-parity hypothesis

Adds H22 ("pretable autosize is within a single 60Hz frame and within 10% of the best ag-grid/mui comparator on S2"). Reuses the H1 comparator-parity pattern: 16 ms single-frame floor, 10% parity band, ≥10 repeats per side before resolving a tight-zone (0.9–1.2) ratio. Hoists COMPARATOR_PARITY_MIN_REPEATS to module scope so H1 and H22 share a single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(bench): B2 matrix re-run with autosize; H22 evaluated

S2/hypothesis/Chromium, all 13 scripts including autosize, repeats=3, ~5 min wall-clock. H22 satisfied: pretable autosize 5.3 ms vs MUI 11 ms (ratio 0.482, outside the tight zone, so the gate does not apply). H1 also flipped from failing → satisfied vs the 2026-05-08 milestone (parity at n=3 with mui this run; matches the n=20 correction documented in the previous repo-memory entry).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(format): prettier formatting for B2 follow-up #3

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
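The bench-runner gating described above (autosize allowed only on S2 and only for pretable, ag-grid, and mui) can be sketched as a lookup performed before any adapter mounts. This is a minimal sketch under assumptions: the per-script rule object, the resolveRun name, and the "run" status are hypothetical, since the PR does not show the real supportedScripts shape. Only the "unsupported" result comes from the commit text.

```javascript
// Hypothetical allowlist: only the autosize rule from this PR is modeled.
const supportedScripts = {
  autosize: {
    scenarios: new Set(["S2"]),
    // tanstack is deliberately absent: unsupported per the B2 spec.
    adapters: new Set(["pretable", "ag-grid", "mui"]),
  },
};

// Decide whether a (script, scenario, adapter) combination should run.
// Intended to be called before mounting, so unsupported combinations
// never render a grid.
function resolveRun(script, scenario, adapter) {
  const rule = supportedScripts[script];
  const allowed =
    rule !== undefined && rule.scenarios.has(scenario) && rule.adapters.has(adapter);
  return { script, scenario, adapter, status: allowed ? "run" : "unsupported" };
}

// Usage: the tanstack autosize cell is reported without mounting the adapter.
for (const adapter of ["pretable", "ag-grid", "mui", "tanstack"]) {
  console.log(adapter, resolveRun("autosize", "S2", adapter).status);
}
```

Keeping the decision in the runner rather than in each adapter is why TanStack can accept the onAutosizeReady prop for uniformity without ever being exercised.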
Summary
PR #124 ran a high-repeat (n=20) re-measurement of S2/hypothesis/scroll for pretable + mui and found that the B2 H1 "failing" verdict was a low-sample artifact. The memo at docs/research/2026-05-09-pretable-vs-mui-scroll-perf.md recommended raising the repeat protocol but didn't carry out the downstream cleanup. This PR closes that loop.

What changed
- status/milestones/2026-05-09-b2-h1-high-repeat-correction.json (new): overlays the original B2 evidence with the n=20 result and correctedH1.status = "satisfied". The original B2 milestone is left intact for historical reference.
- apps/website/app/bench/page.tsx: loads both the n=3 milestone and the n=20 correction. verdictFor now respects a parityAdapters set, so adapters with parity verdicts get "parity at n=20 (full quality pass)" instead of being crowned. Prose rewritten: parity is the headline; the original snapshot is described as a low-sample artifact.
- scripts/bench-matrix.mjs evaluateH1: adds a minimum-repeat gate. When the pretable / best-full-grid frame-p95 ratio is in the tight zone (0.9 ≤ r ≤ 1.2) and either adapter has fewer than 10 repeats, it returns insufficient with guidance to re-run at --repeats=10+. Outside the tight zone the existing path still fires.
- scripts/__tests__/bench-matrix.test.mjs: new test for the gated insufficient case (using the actual B2 ratio of 1.115); the existing failing test is rewritten to use a clearly out-of-zone ratio (1.6) so the failing path stays exercised.
- docs/research/repo-memory.md: appends a 2026-05-09 entry overturning the H1 flip narrative.

What's NOT changed
- AG Grid and TanStack status from the B2 n=3 runset: both remain ~1.7× pretable's scroll_frame_p95_ms, with quality gaps pretable does not have.
- … apps/website, scripts/bench-matrix.mjs, status milestones, and docs.

Test plan
🤖 Generated with Claude Code