
docs(research): pretable vs MUI scroll perf diagnostic #124

Merged
blove merged 3 commits into main from b2-followup-1-perf-diag
May 9, 2026
Conversation


@blove blove commented May 9, 2026

Summary

B2 follow-up #1: diagnosed the 1 ms scroll-frame-p95 gap between pretable and MUI X DataGrid Community on S2/hypothesis.

  • Phase A ran pnpm bench:matrix at 20 repeats for pretable + mui only.
  • Aggregated stats are in status/milestones/2026-05-09-perf-diag-high-repeat.scroll.json.
  • Phase B trace capture was skipped because Phase A's verdict was noise.
  • Phase C wrote the research memo at docs/research/2026-05-09-pretable-vs-mui-scroll-perf.md.

Verdict

noise

Pretable averaged 9.07 ms; MUI averaged 9.14 ms. The mean gap was −0.065 ms (pretable − MUI) against a 2σ noise floor of 0.401 ms, so the original B2 n=3 MUI advantage did not survive high-repeat measurement.
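The verdict logic above can be sketched as follows. This is a hypothetical reconstruction, not the actual bench-matrix code: the exact noise-floor formula is an assumption (here, 2× the larger per-adapter sample standard deviation), and the function names are made up for illustration.

```javascript
// Sketch of the "noise vs real gap" check described above.
// Assumption: the 2-sigma noise floor is twice the larger of the two
// adapters' sample standard deviations; the real script may differ.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stddev(xs) {
  const m = mean(xs);
  // Sample (n-1) standard deviation.
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
}

// Returns "noise" when the mean gap sits inside the 2-sigma floor.
function gapVerdict(pretableMs, muiMs) {
  const gap = mean(pretableMs) - mean(muiMs);
  const floor = 2 * Math.max(stddev(pretableMs), stddev(muiMs));
  return { gap, floor, verdict: Math.abs(gap) <= floor ? "noise" : "real" };
}
```

With the reported numbers (mean gap −0.065 ms vs a 0.401 ms floor), this check lands squarely on "noise".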

What's NOT in this PR

  • Code fixes to pretable's scroll path.
  • Other scenarios (S5 streaming, S7 filter-metadata).
  • Other browsers (WebKit, Firefox).
  • Updates to H1's evaluator threshold.

Test plan

  • pnpm --filter @pretable/app-bench build
  • pnpm bench:matrix --project=chromium --adapters=pretable,mui --scenarios=S2 --scripts=scroll --scale=hypothesis --repeats=20
  • pnpm -w typecheck
  • pnpm -w test
  • pnpm -w lint
  • pnpm format
  • Spec compliance subagent review approved

blove and others added 3 commits May 8, 2026 19:11
Phase C of B2 follow-up #1. Verdict: gap is noise. The high-repeat S2/hypothesis/scroll rerun shows no meaningful MUI advantage and recommends tightening H1-sensitive repeat protocol instead of scoping a perf-fix PR.

Spec: docs/superpowers/specs/2026-05-09-b2-followup-perf-diagnostic-design.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented May 9, 2026

The latest updates on your projects.

| Project  | Deployment | Actions          | Updated (UTC)      |
| -------- | ---------- | ---------------- | ------------------ |
| pretable | Ready      | Preview, Comment | May 9, 2026 2:19am |

@blove blove enabled auto-merge (squash) May 9, 2026 02:18
@blove blove merged commit b5ea678 into main May 9, 2026
11 checks passed
@blove blove deleted the b2-followup-1-perf-diag branch May 9, 2026 02:20

github-actions Bot commented May 9, 2026

Vercel preview ready

Preview: https://pretable-mrdvgvitb-cacheplane.vercel.app
Commit: 2a4b269777b9761da8d451e03a3b9181b03492e2

Updated automatically by the deploy-preview job.

blove added a commit that referenced this pull request May 9, 2026
#125)

PR #124 (perf-diag rerun at n=20) showed the B2 H1 "failing" verdict was
a low-sample artifact: pretable 9.07 ms ± 0.20 vs MUI 9.14 ms ± 0.19,
mean diff −0.065 ms inside the 2σ noise floor of 0.40 ms. The original
n=3 ratio of 1.115 was sample noise, not a real regression.

Five targeted corrections:

- Add status/milestones/2026-05-09-b2-h1-high-repeat-correction.json
  overlaying the original B2 evidence with the n=20 result and
  correctedH1.status = "satisfied". Original B2 milestone left intact.
- Rewrite the apps/website/app/bench/page.tsx prose to a parity framing
  at high repeats. verdictFor now respects a parityAdapters set so the
  table doesn't crown a "fastest" off n=3 noise; H1 status reflects the
  corrected verdict.
- Add a min-repeat gate to scripts/bench-matrix.mjs evaluateH1: when
  the pretable / best-full-grid frame-p95 ratio is in the tight zone
  (0.9 ≤ r ≤ 1.2) AND either adapter has < 10 repeats, return
  insufficient with guidance to re-run at --repeats=10+. Outside the
  tight zone the existing behavior is unchanged. New test covers the
  insufficient case; existing failing test rewritten to use a clearly
  out-of-zone ratio (1.6) so the failing path stays exercised.
- Append a 2026-05-09 entry to docs/research/repo-memory.md overturning
  the H1 flip narrative, with the new evaluator gate documented.
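The min-repeat gate described in the corrections above can be sketched like this. The shape is hypothetical (the real `evaluateH1` lives in `scripts/bench-matrix.mjs`), and the pass/fail threshold outside the gate is an assumption for illustration:

```javascript
// Sketch of the evaluateH1 min-repeat gate: when the pretable vs
// best-full-grid frame-p95 ratio is in the tight zone AND either
// adapter has too few repeats, return "insufficient" instead of a
// verdict. Constants and return shape are assumptions.
const TIGHT_ZONE = { min: 0.9, max: 1.2 };
const MIN_REPEATS = 10;

function evaluateH1Gate(ratio, pretableRepeats, bestGridRepeats) {
  const inTightZone = ratio >= TIGHT_ZONE.min && ratio <= TIGHT_ZONE.max;
  const underSampled =
    pretableRepeats < MIN_REPEATS || bestGridRepeats < MIN_REPEATS;
  if (inTightZone && underSampled) {
    return { status: "insufficient", guidance: "re-run at --repeats=10+" };
  }
  // Outside the tight zone (or with enough repeats) the existing
  // behavior applies unchanged; the 1.2 cutoff here is illustrative.
  return { status: ratio <= TIGHT_ZONE.max ? "satisfied" : "failing" };
}
```

Under this sketch, the original n=3 ratio of 1.115 would have returned `insufficient` rather than flipping H1 to failing, while a clearly out-of-zone ratio like 1.6 still fails regardless of repeat count.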

AG Grid (16.7 ms p95, 1 blank gap, 2 px row-height drift) and TanStack
(16.7 ms p95, 1 blank gap) status from the B2 n=3 runset is not
corrected here — both are >50% above pretable, well outside the noise
zone. They remain ~1.7× pretable's scroll_frame_p95_ms with quality
gaps that pretable does not have.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
blove added a commit that referenced this pull request May 11, 2026
…rdict) (#133)

* docs(specs): pretable scroll-with-render perf diagnostic design

Three-phase research PR mirroring PR #124's pattern: high-repeat (n=20)
re-run, conditional Playwright trace capture, research memo. Diagnoses
whether the PR #130 cheap-render anomaly (16.4 ms vs 10.3 ms for format
and heavy-render) is real or a low-sample artifact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(plans): pretable scroll-with-render perf diagnostic plan

Seven-task plan mirroring PR #124's three-phase pattern: n=20 matrix
re-run, conditional Playwright trace capture, research memo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(research): pretable scroll-with-render perf diagnostic memo

Verdict: noise. The PR #130 cheap-render anomaly (16.4 ms vs ~10.3 ms
for format and heavy-render at n=3) was a sampling artifact. At higher
repeats, scroll-with-render is at parity with (in fact marginally
faster than) the other two:

| Script                     |   n | mean (ms) |   σ (ms) |
| -------------------------- | --: | --------: | -------: |
| scroll-with-format         |   8 |      9.36 |     0.80 |
| scroll-with-render         |   7 |      8.97 |     0.35 |
| scroll-with-heavy-render   |   6 |      9.15 |     0.13 |

Both 2σ pairs (cheap-vs-format, cheap-vs-heavy) are well within the
noise floor. Same shape as PR #124's finding at larger magnitude.

The matrix run completed only ~36% of planned repeats (Playwright
flake; not investigated) but the observed σ values make the verdict
unambiguous — PR #130's 6 ms gap is ~21σ away from the observed
distribution. No perf-fix PR needed.
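The "~21σ" figure above is a simple z-score; a back-of-envelope check, using the memo's reported numbers:

```javascript
// Distance of the PR #130 n=3 outlier (16.4 ms) from the observed
// scroll-with-render distribution (mean 8.97 ms, sigma 0.35 ms),
// expressed in standard deviations.
function zScore(observed, mean, sigma) {
  return (observed - mean) / sigma;
}
// (16.4 - 8.97) / 0.35 ≈ 21.2 sigma
```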

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: prettier-format perf-diag artifacts

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
