chore(bench): B2 follow-up #6 — S5/S7 cross-validation#128

Merged
blove merged 2 commits into main from b2-followup-6-s5-s7 on May 9, 2026

@blove blove commented May 9, 2026

Summary

  • Matrix re-run at S5 (streaming updates) and S7 (filter-metadata) for all four real adapters (pretable, ag-grid, tanstack, mui) populates H9, H13, H14, H15 — all previously insufficient because the B2 Phase 4 retry was S2-only.
  • No source-code changes. Just a matrix run, a committed milestone, and a repo-memory entry.
  • Wall-clock ~3.5 min on hypothesis scale; command exactly as specified in the plan (`--repeats=3 --update-rates=1000,25000`).

Hypothesis status delta

| H#  | Before       | After       | Notes |
| --- | ------------ | ----------- | ----- |
| H9  | insufficient | satisfied   | Pretable matches MUI on S7 scroll: 9.2 ms p95, 0 blank gaps, 0 long tasks, row-height error ≤ 1 px. TanStack 16.7 ms p95 with 1 blank gap. Mirrors H1's parity story on S7. |
| H13 | insufficient | directional | Pretable holds the frame budget at 1000/sec and 25000/sec; AG Grid also clears it. Frame-budget threshold alone does not differentiate. |
| H14 | insufficient | directional | Pretable reaches 25000/sec; AG Grid also reaches 25000/sec, so there is no order-of-magnitude gap inside the configured rates. |
| H15 | insufficient | directional | Pretable visible-row drift = 1, AG Grid drift = 0; differentiation threshold (5 rows) not exceeded by either side. |

No other hypothesis status changed. S2-dependent hypotheses (H1, H6–H8, H10–H12, H16–H22) remain insufficient because S2 was not in this matrix — expected. Existing B2 + B2-with-autosize S2 milestones are unchanged.
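
One way to read the `directional` outcomes above: both grids stay inside the differentiation threshold, so the data exists but does not separate them. The sketch below encodes that reading for H15's visible-row drift. It is a hypothetical illustration, not the bench harness's logic: `DriftSample`, `classifyDifferentiation`, and the rule itself are assumptions. The PR does not describe the case where the subject grid exceeds the threshold, so the sketch refuses to classify it.

```typescript
// Statuses that appear in this PR's tables.
type Status = "insufficient" | "directional" | "satisfied";

// Hypothetical sample shape; the real hypotheses.json schema may differ.
interface DriftSample {
  adapter: string;
  visibleRowDrift: number; // rows
}

// Assumed rule for a differentiation hypothesis such as H15:
//   no data on either side               -> insufficient
//   both sides within the threshold      -> directional
//   subject within, comparator exceeding -> satisfied
function classifyDifferentiation(
  subject: DriftSample | undefined,
  comparator: DriftSample | undefined,
  thresholdRows: number,
): Status {
  if (subject === undefined || comparator === undefined) return "insufficient";
  const subjectWithin = subject.visibleRowDrift <= thresholdRows;
  const comparatorWithin = comparator.visibleRowDrift <= thresholdRows;
  if (!subjectWithin) {
    throw new Error("subject exceeds threshold: outcome not described in the PR");
  }
  return comparatorWithin ? "directional" : "satisfied";
}

// Values from the H15 row: pretable drift 1, AG Grid drift 0, threshold 5 rows.
const h15 = classifyDifferentiation(
  { adapter: "pretable", visibleRowDrift: 1 },
  { adapter: "ag-grid", visibleRowDrift: 0 },
  5,
);
console.log(h15); // "directional"
```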

What's NOT in this PR

  • Editorial homepage refresh (potentially repopulating the deleted streaming row from this evidence) — distinct prose work, separate follow-up.
  • Comparative interaction scripts (sort, filter-text, filter-metadata, cell-renderer) on S7: still pretable-only per the `supportedScripts` gate; tracked as B2 follow-up #5.
  • No threshold tuning. H13/H14/H15 came back directional (not satisfied) because AG Grid Community's native streaming clears the same bars; that's the news, not a threshold problem.

Test plan

  • `pnpm --filter @pretable/app-bench build`
  • `pnpm bench:matrix` (S5+S7, 4 adapters, repeats=3, update rates 1000+25000) — completed exit 0
  • Inspect `status/runsets/.../hypotheses.json` for H9/H13/H14/H15 status flips and unexpected changes elsewhere — none
  • Copy to `status/milestones/2026-05-09-b2-s5-s7-cross-validation.hypotheses.json`
  • Append repo-memory entry
  • `pnpm -w typecheck && pnpm -w test && pnpm -w lint && pnpm format` — clean
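
The "inspect for status flips and unexpected changes" step can be scripted. The sketch below assumes each `hypotheses.json` can be reduced to a flat map from hypothesis id to status string; that shape, and the names `StatusMap`, `diffStatuses`, and `unexpectedFlips`, are hypothetical and not taken from the repository.

```typescript
// Assumed reduction of a hypotheses.json file: id -> status string.
type StatusMap = Record<string, string>;

interface StatusFlip {
  id: string;
  before: string;
  after: string;
}

// Lists every hypothesis whose status differs between two runs.
// Ids present on only one side are reported with the placeholder "missing".
function diffStatuses(before: StatusMap, after: StatusMap): StatusFlip[] {
  const ids = Array.from(
    new Set(Object.keys(before).concat(Object.keys(after))),
  );
  const flips: StatusFlip[] = [];
  for (const id of ids) {
    const b = before[id] ?? "missing";
    const a = after[id] ?? "missing";
    if (a !== b) flips.push({ id, before: b, after: a });
  }
  // Numeric collation so that H9 sorts before H13.
  return flips.sort((x, y) =>
    x.id.localeCompare(y.id, undefined, { numeric: true }),
  );
}

// Keeps only the flips that fall outside the ids the run was meant to populate.
function unexpectedFlips(
  flips: StatusFlip[],
  expectedIds: readonly string[],
): StatusFlip[] {
  return flips.filter((flip) => expectedIds.indexOf(flip.id) === -1);
}
```

For this PR the expected ids would be H9, H13, H14, and H15, and an empty result from `unexpectedFlips` corresponds to the "none" recorded in the test plan.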

🤖 Generated with Claude Code

blove and others added 2 commits May 8, 2026 22:28
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Matrix re-run at S5 (streaming updates) + S7 (filter-metadata) for all
four real adapters; populates H9, H13, H14, H15 (previously insufficient).

Run command:
  pnpm bench:matrix --project=chromium \
    --adapters=pretable,ag-grid,tanstack,mui \
    --scenarios=S5,S7 --scripts=scroll,updates \
    --scale=hypothesis --repeats=3 --update-rates=1000,25000

Wall-clock ~3.5 min. Milestone:
  status/milestones/2026-05-09-b2-s5-s7-cross-validation.hypotheses.json

Status delta:

| H#  | Before        | After       | Notes                                                           |
| --- | ------------- | ----------- | --------------------------------------------------------------- |
| H9  | insufficient  | satisfied   | Mirrors H1 parity story on S7 scroll (9.2ms p95, 0 blank gaps). |
| H13 | insufficient  | directional | AG Grid clears the streaming frame budget too (9.2ms vs 9.2ms). |
| H14 | insufficient  | directional | AG Grid sustains 25k/sec — no order-of-magnitude gap.           |
| H15 | insufficient  | directional | AG Grid drift 0 vs pretable drift 1; threshold not exceeded.    |
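
The p95 figures in the table are percentiles over per-frame durations. How the bench harness computes them is not stated here; the sketch below uses the nearest-rank definition as an assumption, and `percentile` is an illustrative name rather than a harness function.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in p / 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("rank out of range");
  return value;
}
```

With 20 samples, the 95th percentile under this definition is the 19th smallest value, so a single slow frame in a short run does not move p95, but two do.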

The streaming-uniqueness wedge (H13/H14/H15) is no longer numeric on
hypothesis scale — AG Grid Community's native applyTransaction matches
or beats pretable on every measured streaming metric. Pretable's
streaming wedge in the project narrative is integration (the
@pretable/stream-adapter + @cacheplane/json-stream pipeline), not raw
throughput. Editorial homepage refresh based on this finding is a
separate follow-up.
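
To make the "batcher" stage of that pipeline concrete, here is a minimal sketch of coalescing per-row updates and handing them to a grid in one call. It is not the `@pretable/stream-adapter` implementation, which this PR does not show. `RowUpdate`, `UpdateBatcher`, and the `maxBatch` default are hypothetical, and the `{ update: [...] }` argument only mirrors the `update` field of the transaction object that AG Grid's `applyTransaction` accepts.

```typescript
// Hypothetical row shape: an id plus arbitrary changed fields.
interface RowUpdate {
  id: string;
  [field: string]: unknown;
}

// Sink that receives one coalesced batch, e.g. a wrapper around a grid API.
type ApplyBatch = (batch: { update: RowUpdate[] }) => void;

class UpdateBatcher {
  private readonly pending = new Map<string, RowUpdate>();
  private readonly apply: ApplyBatch;
  private readonly maxBatch: number;

  constructor(apply: ApplyBatch, maxBatch = 500) {
    this.apply = apply;
    this.maxBatch = maxBatch;
  }

  // Merges the update into any pending update for the same row, so a row
  // that changes many times between flushes is applied only once.
  push(update: RowUpdate): void {
    const previous = this.pending.get(update.id);
    this.pending.set(
      update.id,
      previous ? { ...previous, ...update } : update,
    );
    if (this.pending.size >= this.maxBatch) this.flush();
  }

  // Emits all pending rows as a single batch and resets the buffer.
  flush(): void {
    if (this.pending.size === 0) return;
    const update = Array.from(this.pending.values());
    this.pending.clear();
    this.apply({ update });
  }
}
```

In a browser, a caller would typically invoke `flush()` once per animation frame, so the grid sees at most one transaction per frame regardless of the incoming update rate.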

S2-dependent hypotheses (H1, H6-H8, H10-H12, H16-H22) remain
insufficient because S2 was not in this matrix; expected. Existing B2
+ B2-with-autosize milestones for S2 are unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vercel Bot commented May 9, 2026

| Project  | Deployment | Actions          | Updated (UTC)      |
| -------- | ---------- | ---------------- | ------------------ |
| pretable | Ready      | Preview, Comment | May 9, 2026 5:37am |

@blove blove enabled auto-merge (squash) May 9, 2026 05:36
@blove blove merged commit a505779 into main May 9, 2026
13 checks passed
@blove blove deleted the b2-followup-6-s5-s7 branch May 9, 2026 05:38
github-actions Bot commented May 9, 2026

Vercel preview ready

Preview: https://pretable-d7eieafpk-cacheplane.vercel.app
Commit: d607e4c9bf29cbb45bb5dfbe9c612e53e2cba607

Updated automatically by the deploy-preview job.

blove added a commit that referenced this pull request May 11, 2026
…5 cross-validation

PR #128's S5/S7 cross-validation matrix surfaced a finding: AG Grid
Community matches pretable on every measured streaming numeric (frame
p95, 25k/sec envelope, visible-row drift). The homepage's stub-era
"purpose-built streaming pipeline" framing — and the implication that
pretable is uniquely fast at streaming — is no longer supportable on
hypothesis-scale numerics. The honest wedge is package surface:
pretable ships the SSE → partial-JSON → batcher → applyTransaction
pipeline as a single import; AG Grid expects you to wire it yourself.
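
For readers unfamiliar with what "wiring it yourself" involves, the first stage of such a pipeline is turning a chunked SSE text stream into complete `data` payloads. The sketch below is a simplified, hypothetical parser, not code from pretable or `@cacheplane/json-stream`. It handles only `data:` fields and LF or CRLF line endings, and it ignores `event:`, `id:`, `retry:`, and comment lines.

```typescript
class SseDataParser {
  private buffer = "";

  // Accepts the next decoded text chunk and returns the data payload of every
  // event that the chunk completes. Incomplete events stay buffered.
  feed(chunk: string): string[] {
    // Normalize on the whole buffer so a CRLF split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const payloads: string[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      const dataLines: string[] = [];
      for (const line of rawEvent.split("\n")) {
        if (line.startsWith("data:")) {
          const value = line.slice("data:".length);
          // A single leading space after the colon is not part of the value.
          dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
        }
      }
      // Multiple data lines in one event are joined with newlines.
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
      boundary = this.buffer.indexOf("\n\n");
    }
    return payloads;
  }
}
```

Each returned payload may itself be a fragment of a larger JSON document, which is where a partial-JSON stage would take over before rows reach the batcher.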

Three editorial edits:

- ComparisonTable.tsx: streaming row renamed from "purpose-built
  streaming pipeline" to "streaming pipeline (SSE → partial JSON →
  batcher → applyTransaction)" — same yes/n/a/n/a/n/a shape, sharper
  capability claim. Header docblock updated to cite the S5/S7
  cross-validation milestone alongside the existing B2 sources.

- ReceiptsBand.tsx: replaced the "25k/s · max sustained update rate"
  hero stat (no longer pretable-unique) with "OpenAI · Anthropic · SSE
  · streaming sources, one import". Added a `compact: true` flag to
  the Stat interface so the longer label renders at 20–24 px instead
  of 44–56 px, preserving the four-cell grid without overflowing the
  hero font scale.

- FeatureGrid.tsx: Stream-aware card — dropped "sustained from 100 to
  25,000 updates/sec" tail; rewrote the description around the pipeline
  that ships as one import.
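
The `compact` flag described for ReceiptsBand.tsx could look roughly like the following. Only the flag name and the two pixel ranges come from the commit message. The other `Stat` fields, the `statFontSize` helper, and the viewport terms inside `clamp()` are assumptions made for illustration.

```typescript
// Hypothetical shape; only `compact` is named in the commit message.
interface Stat {
  value: string;
  label: string;
  compact?: boolean;
}

// Returns a CSS font-size expression for the stat's headline value.
// Compact stats use the 20-24 px range, all others the 44-56 px hero range.
function statFontSize(stat: Stat): string {
  return stat.compact
    ? "clamp(20px, 2vw, 24px)"
    : "clamp(44px, 5vw, 56px)";
}

const streamingStat: Stat = {
  value: "OpenAI · Anthropic · SSE",
  label: "streaming sources, one import",
  compact: true,
};
console.log(statFontSize(streamingStat)); // "clamp(20px, 2vw, 24px)"
```

Keeping the size decision behind an optional boolean leaves every existing stat untouched, since an omitted `compact` falls through to the hero range.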

Test added: ReceiptsBand.test.tsx regression-guards the new capability
anchor (`streaming sources` + `openai`).

Repo-memory entry appended (B2 follow-up #7); MEMORY.md index updated;
project_b2_followups.md regenerated to reflect everything resolved
except item #5 (open comparator interaction scripts).

No source/package changes outside apps/website + the docs entry; all
190+ website tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
blove added a commit that referenced this pull request May 11, 2026
…5 cross-validation (#129)
