chore(bench): B2 follow-up #6 — S5/S7 cross-validation by blove · Pull Request #128 · cacheplane/pretable

blove · 2026-05-09T05:36:15Z

Summary

Matrix re-run at S5 (streaming updates) and S7 (filter-metadata) for all four real adapters (pretable, ag-grid, tanstack, mui) populates H9, H13, H14, H15 — all previously insufficient because the B2 Phase 4 retry was S2-only.
No source-code changes. Just a matrix run, a committed milestone, and a repo-memory entry.
Wall-clock ~3.5 min on hypothesis scale; command exactly as specified in the plan (--repeats=3 --update-rates=1000,25000).

Hypothesis status delta

H#	Before	After	Notes
H9	insufficient	satisfied	Pretable matches MUI on S7 scroll: 9.2 ms p95, 0 blank gaps, 0 long tasks, row-height error ≤ 1 px. TanStack 16.7 ms p95 with 1 blank gap. Mirrors H1's parity story on S7.
H13	insufficient	directional	Pretable holds the frame budget at 1000/sec and 25000/sec; AG Grid also clears it. Frame-budget threshold alone does not differentiate.
H14	insufficient	directional	Pretable reaches 25000/sec; AG Grid also reaches 25000/sec — no order-of-magnitude gap inside the configured rates.
H15	insufficient	directional	Pretable visible-row drift = 1, AG Grid drift = 0; differentiation threshold (5 rows) not exceeded by either side.

No other hypothesis status changed. S2-dependent hypotheses (H1, H6–H8, H10–H12, H16–H22) remain insufficient because S2 was not in this matrix — expected. Existing B2 + B2-with-autosize S2 milestones are unchanged.
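One way to read the H15 row is as a threshold comparison between the two adapters. The sketch below is only an illustration of that reading: the function name `classifyDrift`, the rule for what counts as "satisfied", and the treatment of missing measurements are all assumptions, not the bench's actual evaluation logic.

```typescript
type Status = "insufficient" | "directional" | "satisfied";

// Hypothetical rule (an assumption, not the repo's evaluator):
// - no measurement on either side      -> "insufficient"
// - comparator drifts past threshold
//   while the subject stays within it  -> "satisfied"
// - otherwise                          -> "directional"
function classifyDrift(
  subjectDrift: number | undefined,
  comparatorDrift: number | undefined,
  thresholdRows: number,
): Status {
  if (subjectDrift === undefined || comparatorDrift === undefined) {
    return "insufficient";
  }
  const subjectWithin = subjectDrift <= thresholdRows;
  const comparatorExceeds = comparatorDrift > thresholdRows;
  return subjectWithin && comparatorExceeds ? "satisfied" : "directional";
}

// Values from the H15 row: pretable drift 1, AG Grid drift 0, threshold 5 rows.
const h15Status = classifyDrift(1, 0, 5);
console.log(h15Status);
```

Under this assumed rule neither side exceeds the 5-row threshold, so the result is directional rather than satisfied, which matches the table.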

What's NOT in this PR

Editorial homepage refresh (potentially repopulating the deleted streaming row from this evidence) — distinct prose work, separate follow-up.
Comparative interaction scripts (sort, filter-text, filter-metadata, cell-renderer) on S7 are still pretable-only per the supportedScripts gate; tracked as B2 follow-up #5.
No threshold tuning. H13/H14/H15 came back directional (not satisfied) because AG Grid Community's native streaming clears the same bars; that's the news, not a threshold problem.

Test plan

`pnpm --filter @pretable/app-bench build`
`pnpm bench:matrix` (S5+S7, 4 adapters, repeats=3, update rates 1000+25000) — completed exit 0
Inspect `status/runsets/.../hypotheses.json` for H9/H13/H14/H15 status flips and unexpected changes elsewhere — none
Copy to `status/milestones/2026-05-09-b2-s5-s7-cross-validation.hypotheses.json`
Append repo-memory entry
`pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` — clean
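The "inspect for status flips and unexpected changes elsewhere" step could be done mechanically. The following is a minimal sketch, assuming each hypotheses file can be reduced to a flat map from hypothesis ID to status; the real `hypotheses.json` shape is not shown in this PR, and the helper names `diffStatuses` and `unexpectedFlips` are hypothetical.

```typescript
type Status = "insufficient" | "directional" | "satisfied";
type StatusMap = Record<string, Status>;

interface Flip {
  id: string;
  before: Status | undefined;
  after: Status | undefined;
}

// List every hypothesis whose status differs between two snapshots.
function diffStatuses(before: StatusMap, after: StatusMap): Flip[] {
  const ids = Array.from(
    new Set([...Object.keys(before), ...Object.keys(after)]),
  ).sort();
  const flips: Flip[] = [];
  for (const id of ids) {
    if (before[id] !== after[id]) {
      flips.push({ id, before: before[id], after: after[id] });
    }
  }
  return flips;
}

// Keep only the flips that were not anticipated by the plan.
function unexpectedFlips(flips: Flip[], expectedIds: string[]): Flip[] {
  const allowed = new Set(expectedIds);
  return flips.filter((flip) => !allowed.has(flip.id));
}

// Illustrative data only: a subset of the statuses reported in this PR.
const beforeRun: StatusMap = { H1: "insufficient", H9: "insufficient", H13: "insufficient" };
const afterRun: StatusMap = { H1: "insufficient", H9: "satisfied", H13: "directional" };

const flips = diffStatuses(beforeRun, afterRun);
const surprises = unexpectedFlips(flips, ["H9", "H13", "H14", "H15"]);
console.log(flips.map((f) => f.id), surprises.length);
```

An empty `surprises` array corresponds to the "none" outcome recorded in the test plan.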

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Matrix re-run at S5 (streaming updates) + S7 (filter-metadata) for all four real adapters; populates H9, H13, H14, H15 (previously insufficient).

Run command:

    pnpm bench:matrix --project=chromium \
      --adapters=pretable,ag-grid,tanstack,mui \
      --scenarios=S5,S7 --scripts=scroll,updates \
      --scale=hypothesis --repeats=3 --update-rates=1000,25000

Wall-clock ~3.5 min.

Milestone: status/milestones/2026-05-09-b2-s5-s7-cross-validation.hypotheses.json

Status delta:

| H# | Before | After | Notes |
| --- | --- | --- | --- |
| H9 | insufficient | satisfied | Mirrors H1 parity story on S7 scroll (9.2ms p95, 0 blank gaps). |
| H13 | insufficient | directional | AG Grid clears the streaming frame budget too (9.2ms vs 9.2ms). |
| H14 | insufficient | directional | AG Grid sustains 25k/sec; no order-of-magnitude gap. |
| H15 | insufficient | directional | AG Grid drift 0 vs pretable drift 1; threshold not exceeded. |

The streaming-uniqueness wedge (H13/H14/H15) is no longer numeric on hypothesis scale: AG Grid Community's native applyTransaction matches or beats pretable on every measured streaming metric. Pretable's streaming wedge in the project narrative is integration (the @pretable/stream-adapter + @cacheplane/json-stream pipeline), not raw throughput. Editorial homepage refresh based on this finding is a separate follow-up.

S2-dependent hypotheses (H1, H6-H8, H10-H12, H16-H22) remain insufficient because S2 was not in this matrix; expected. Existing B2 + B2-with-autosize milestones for S2 are unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-09T05:36:20Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
pretable	Ready	Preview, Comment	May 9, 2026 5:37am

github-actions · 2026-05-09T05:40:26Z

Vercel preview ready

Preview: https://pretable-d7eieafpk-cacheplane.vercel.app
Commit: d607e4c9bf29cbb45bb5dfbe9c612e53e2cba607

Updated automatically by the deploy-preview job.

…5 cross-validation

PR #128's S5/S7 cross-validation matrix surfaced a finding: AG Grid Community matches pretable on every measured streaming numeric (frame p95, 25k/sec envelope, visible-row drift). The homepage's stub-era "purpose-built streaming pipeline" framing, and the implication that pretable is uniquely fast at streaming, is no longer supportable on hypothesis-scale numerics. The honest wedge is package surface: pretable ships the SSE → partial-JSON → batcher → applyTransaction pipeline as a single import; AG Grid expects you to wire it yourself.

Three editorial edits:

- ComparisonTable.tsx: streaming row renamed from "purpose-built streaming pipeline" to "streaming pipeline (SSE → partial JSON → batcher → applyTransaction)". Same yes/n/a/n/a/n/a shape, sharper capability claim. Header docblock updated to cite the S5/S7 cross-validation milestone alongside the existing B2 sources.
- ReceiptsBand.tsx: replaced the "25k/s · max sustained update rate" hero stat (no longer pretable-unique) with "OpenAI · Anthropic · SSE · streaming sources, one import". Added a `compact: true` flag to the Stat interface so the longer label renders at 20–24 px instead of 44–56 px, preserving the four-cell grid without overflowing the hero font scale.
- FeatureGrid.tsx: Stream-aware card. Dropped the "sustained from 100 to 25,000 updates/sec" tail; rewrote the description around the pipeline that ships as one import.

Test added: ReceiptsBand.test.tsx regression-guards the new capability anchor (`streaming sources` + `openai`).

Repo-memory entry appended (B2 follow-up #7); MEMORY.md index updated; project_b2_followups.md regenerated to reflect everything resolved except item #5 (open comparator interaction scripts).

No source/package changes outside apps/website + the docs entry; all 190+ website tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
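To make the "batcher → applyTransaction" stage of that pipeline concrete, here is a minimal sketch of a row-update batcher. It is not the @pretable/stream-adapter API: `createBatcher`, its parameters, and the merge-by-`id` behaviour are assumptions for illustration, and the `flush` callback stands in for whatever applies the batch to the grid (for AG Grid, something like a call to `applyTransaction` with an `update` array).

```typescript
// Hypothetical batcher: coalesces updates per row id and hands them to
// `flush` once `maxBatch` distinct rows are pending, or when drained manually.
function createBatcher<T extends { id: string }>(
  maxBatch: number,
  flush: (rows: T[]) => void,
) {
  const pending = new Map<string, T>();

  const drain = (): void => {
    if (pending.size === 0) return;
    const rows = Array.from(pending.values());
    pending.clear();
    flush(rows);
  };

  return {
    push(row: T): void {
      // Later fields for the same row overwrite earlier ones.
      const merged = { ...(pending.get(row.id) ?? {}), ...row } as T;
      pending.set(row.id, merged);
      if (pending.size >= maxBatch) drain();
    },
    drain,
  };
}

// Usage: record batches instead of touching a real grid.
type Quote = { id: string; price?: number; qty?: number };
const batches: Quote[][] = [];
const batcher = createBatcher<Quote>(2, (rows) => {
  batches.push(rows);
});

batcher.push({ id: "a", price: 1 });
batcher.push({ id: "a", qty: 5 }); // merged into the pending "a" row
batcher.push({ id: "b", price: 2 }); // second distinct row triggers a flush
console.log(batches.length, batches[0]);
```

A size-based trigger keeps the sketch deterministic; a real pipeline would more likely flush on a timer or animation frame so that a slow stream still reaches the grid promptly.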


blove and others added 2 commits May 8, 2026 22:28

docs(plans): B2 follow-up #6 — S5/S7 cross-validation

2d10d35

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

blove enabled auto-merge (squash) May 9, 2026 05:36

blove merged commit a505779 into main May 9, 2026
13 checks passed

blove deleted the b2-followup-6-s5-s7 branch May 9, 2026 05:38

blove mentioned this pull request May 9, 2026

fix(website): reframe streaming claims as capability-anchored after S5 cross-validation #129

Merged

4 tasks

blove mentioned this pull request May 13, 2026

chore(deps): bump ag-grid-community from 33.3.2 to 35.2.1 #136

Closed
