[FOLLOW-UP — parked] Perf instrumentation, harness, cross-commit report#579
Open
SuuBro wants to merge 76 commits into
- src/app/perf-trace.ts: tiny client-side span/mark primitive with ring
buffer, cost-when-disabled invariant (no-op singleton handle), localStorage
/ ?perf=1 opt-in, window.__bobbitPerf surface. Pinned by tests/perf-trace.spec.ts
(12 tests, including 100k startSpan heap-growth check).
- src/app/perf-flags.ts: feature-flag helper for Phase 2 experiments.
- Instrumentation hooks (Phase 1 owner — instrumentation only):
- main.ts: 'app.boot' mark as first statement.
- api.ts: gatewayFetch wrapper dispatches api.session.fetch /
api.goal.fetch / api.goal.gates.fetch / api.goal.agents.fetch /
paint.tool-content.lazy by URL pattern. Cheap when perf disabled.
- sidebar-nav.ts: nav.click + nav.session.ready/nav.goal.ready opened on
openForNavItem(); pending span stashed on state.
- routing.ts: closes nav.click on setHashRoute completion.
- render.ts: paint.first span wrapping doRenderApp; ends pending nav span
on next rAF once the destination view's sentinel is in the DOM
(pi-chat-panel for session, wf-checklist-row/.gate-detail-panel/.tab-empty
for goal). Sets data-perf-ready on #app for harness wait.
- message-reducer.ts: reducer.rehydrate wraps the snapshot case.
- state.ts: pendingNavSpan field.
- src/server/server.ts: extended _timingEnabled block — always-on logging
with BOBBIT_TIMING_LOG_MIN_MS threshold env var; response wrapper tallies
bytes; per-request io counter bumped at entry to the five hot endpoints
(GET /api/sessions/:id, /api/goals/:id, /api/goals/:id/gates,
/api/goals/:id/team/agents, /api/sessions/:id/tool-content/:mi/:bi). Log
format: '[timing] METHOD path Xms bytes=B io=N'.
- tests/manual-integration/perf-sidebar-nav.spec.ts: Playwright harness —
boots gateway, seeds 10 sessions + 1 goal via REST, drives cold/warm/goal
passes, dumps client perf entries + server [timing] tail to .perf-out/
JSON + HTML report. Hard-fails (process.exit(1)) when any of the five
canonical gate spans has zero samples. NOT in CI.
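The server log format described above lends itself to a small parser when reading the [timing] tail. The following is a minimal sketch, not the harness's actual code: `TimingLine`, `parseTimingLine` and `slowRequests` are hypothetical names, and it assumes X may be an integer or a decimal while B and N are integers.

```typescript
// Hypothetical parser for the '[timing] METHOD path Xms bytes=B io=N' line.
// Field names on TimingLine are illustrative, not taken from server.ts.
interface TimingLine {
  method: string;
  path: string;
  ms: number;
  bytes: number;
  io: number;
}

// Assumption: X is an integer or a decimal; B and N are integers.
const TIMING_RE = /^\[timing\] (\S+) (\S+) (\d+(?:\.\d+)?)ms bytes=(\d+) io=(\d+)$/;

function parseTimingLine(line: string): TimingLine | null {
  const match = TIMING_RE.exec(line.trim());
  if (match === null) return null;
  const [, method = "", path = "", ms = "", bytes = "", io = ""] = match;
  return { method, path, ms: Number(ms), bytes: Number(bytes), io: Number(io) };
}

// Keep only requests at or above a threshold. This mirrors the idea behind
// BOBBIT_TIMING_LOG_MIN_MS, applied after the fact to an already-captured tail.
function slowRequests(lines: readonly string[], minMs: number): TimingLine[] {
  const parsed: TimingLine[] = [];
  for (const line of lines) {
    const entry = parseTimingLine(line);
    if (entry !== null && entry.ms >= minMs) parsed.push(entry);
  }
  return parsed;
}
```

Lines that do not match the format are skipped rather than treated as errors, so ordinary gateway log output can be mixed into the same tail.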
Status:
- npm run check + test:unit + test:e2e all green when last run.
- Perf-trace unit suite (12 tests) passes.
- Harness boots and produces api.* / reducer.rehydrate / paint.first samples
but nav.click / nav.session.ready / nav.goal.ready don't fire yet because
the sidebar row selectors don't match the seeded sessions (sessions land
under an 'ungrouped' header that may need expansion, or the seeded REST
sessions render in a sidebar shape the harness doesn't click into).
Follow-up coder needs to: get the sidebar row click path working, then
produce the docs/perf/sidebar-nav-baseline.md with real numbers, then
build the cross-commit comparison report (docs/perf/history/ + scripts/
perf-report.mjs + docs/perf/sidebar-nav-report.html) per the scope
addition. Harness exit-1-on-missing-spans invariant is intentional and
protects against silently-broken instrumentation.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
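The "cost-when-disabled invariant (no-op singleton handle)" and the ring buffer from the first bullet of this commit can be illustrated roughly as follows. This is a sketch under assumptions, not the contents of src/app/perf-trace.ts: the `PerfTrace` class, its constructor arguments and the `PerfEntry` fields are all hypothetical, and the clock is injected so the example has no browser dependency.

```typescript
// Hypothetical sketch of a span primitive with a ring buffer and a shared
// no-op handle. Names and shapes are illustrative, not the real perf-trace.ts.
interface PerfEntry {
  name: string;
  startMs: number;
  durationMs: number;
}

interface SpanHandle {
  end(): void;
}

// One frozen handle shared by every disabled startSpan() call, so the
// disabled path allocates nothing per call.
const NOOP_HANDLE: SpanHandle = Object.freeze({ end(): void {} });

class PerfTrace {
  private readonly enabled: boolean;
  private readonly capacity: number;
  private readonly now: () => number;
  private readonly buffer: (PerfEntry | undefined)[];
  private writeIndex = 0;
  private count = 0;

  constructor(enabled: boolean, capacity = 1024, now: () => number = () => Date.now()) {
    this.enabled = enabled;
    this.capacity = capacity;
    this.now = now;
    this.buffer = new Array<PerfEntry | undefined>(capacity);
  }

  startSpan(name: string): SpanHandle {
    if (!this.enabled) return NOOP_HANDLE;
    const startMs = this.now();
    let ended = false;
    return {
      end: (): void => {
        if (ended) return; // ending twice records only once
        ended = true;
        this.push({ name, startMs, durationMs: this.now() - startMs });
      },
    };
  }

  // Oldest-to-newest copy of whatever the ring currently holds.
  entries(): PerfEntry[] {
    const out: PerfEntry[] = [];
    const start = (this.writeIndex - this.count + this.capacity) % this.capacity;
    for (let i = 0; i < this.count; i++) {
      const entry = this.buffer[(start + i) % this.capacity];
      if (entry !== undefined) out.push(entry);
    }
    return out;
  }

  private push(entry: PerfEntry): void {
    this.buffer[this.writeIndex] = entry; // overwrites the oldest when full
    this.writeIndex = (this.writeIndex + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }
}
```

Returning the same frozen object from every disabled call is one way a heap-growth test over 100k `startSpan` calls could stay flat; whether the real module does exactly this is not stated in the commit message.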
…port Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
All five canonical gate spans produce non-zero samples on a single harness run. nav.session.ready p50 ~89ms is the dominant hotspot on click→ready; see docs/perf/sidebar-nav-baseline.md for the full ranking + reproduction recipe. Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…les) Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Static design reference for the cross-commit perf report at
docs/perf/mockups/sidebar-nav-report.html.
- Synthetic 8-commit data telling a mixed story (improvements,
regressions, flat spans, a newly-appearing span).
- Headlines strip surfaces top movers by |Δp50|.
- Summary table grouped by nav / api / render with green / red tinted Δ
pills and inline p50 sparklines.
- Per-span trend cards with inline-SVG line charts (p50 solid, p95
dashed), auto-scaled Y axis, commit SHAs on X.
Uses Bobbit CSS tokens only (--chart-1/4, --positive, --negative, surface
tokens) with :root fallbacks for the preview-bridge HMR race per
defaults/docs/html-rendering.md. No hardcoded colours, no
prefers-color-scheme, no external libs.
Also relax .gitignore's '*-report.html' rule (which silently covered
docs/perf reports) by re-including docs/perf/**/*-report.html so the
committed report stays under version control.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Bring the generated docs/perf/sidebar-nav-report.html in line with the
static mockup at docs/perf/mockups/sidebar-nav-report.html:
- Header line: 'Generated YYYY-MM-DD HH:MM from N commits' + range +
branch.
- Headlines strip: top 6 spans by |Δp50 ms|, classified good/bad/flat,
with green/red border accent and tinted delta caption.
- Summary table: grouped by category (nav / api / render) with
sub-headers, rows sorted by |Δp50 ms| within each group; Δ cells rendered
as tinted pills; per-span p50 sparkline column (skips gaps for missing
samples).
- Per-span charts: inline-SVG line charts with auto-scaled 'nice' Y range,
4 gridlines + tabular Y labels, p50 solid fill + line, p95 dashed line,
hover <title> tooltips on every dot, evenly-spaced SHA ticks on X.
- Runs table: sortable visual, latest row highlighted with '← latest' tag.
- Empty / single-run states render a clean explanatory card instead of a
misleading 'no data' table.
- Classifier treats <1ms AND <5% movement as 'flat' so reducer.rehydrate
doesn't flash red for sub-ms jitter.
All theming via Bobbit CSS tokens with :root fallbacks for the
preview-bridge HMR race (see defaults/docs/html-rendering.md). No
hardcoded colours, no prefers-color-scheme, no external libs.
Regenerated docs/perf/sidebar-nav-report.html against the single existing
history entry (commit 999bdc2) is included so the in-repo report matches
the new generator. Re-running the manual harness will refresh it.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Scaffolds tests/e2e/ui/perf-sidebar-nav.spec.ts. Drives one cold session load (for api.session.fetch + reducer.rehydrate), one warm sidebar-row click (for nav.session.ready), and one warm goal-dashboard click (for nav.goal.ready + api.goal.fetch). Reads window.__bobbitPerf.entries() and asserts each of the five canonical gate spans has at least one sample below a generous regression-net budget derived from docs/perf/sidebar-nav-baseline.md (commit 999bdc2). Budgets are inflated ~10-100x p95 so transient CI slowness never trips the assert; a real regression still trips. Each budget cites its source baseline number inline. Skips cleanly (test.skip) if window.__bobbitPerf is gated off so the test doesn't silently no-op. Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
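The regression-net assertion described in this commit (each gate span needs at least one sample below its budget) could be expressed along these lines. A hedged sketch only: `Sample`, `findBudgetFailures` and the budget map are hypothetical, and the real spec reads its entries from `window.__bobbitPerf.entries()` with budgets taken from the baseline doc.

```typescript
// Hypothetical budget check: every budgeted span must have at least one
// sample strictly below its budget. Shapes and names are illustrative.
interface Sample {
  name: string;
  durationMs: number;
}

function findBudgetFailures(
  samples: readonly Sample[],
  budgetsMs: Readonly<Record<string, number>>,
): string[] {
  const failures: string[] = [];
  for (const [span, budget] of Object.entries(budgetsMs)) {
    const durations = samples.filter((s) => s.name === span).map((s) => s.durationMs);
    if (durations.length === 0) {
      // A span with zero samples means the instrumentation is broken.
      failures.push(`${span}: no samples`);
    } else if (!durations.some((d) => d < budget)) {
      failures.push(`${span}: no sample under ${budget}ms`);
    }
  }
  return failures;
}
```

An empty result means the run passes. Reporting missing spans separately from over-budget spans keeps a silently broken hook distinguishable from a real slowdown.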
Behind the new `lazyToolContent` perf flag:
- Server: `GET /api/sessions/:id?stripToolContent=1` opts in to include a
`messages` array with tool-call content blocks above a configurable
threshold (default 4KB) replaced by the existing
`{ _truncated, _originalLength, preview }` shape. The renderer +
fetchToolContent flow already lazy-load via the existing
`/tool-content/:mi/:bi` endpoint. Default response unchanged.
- Strip helper: src/server/agent/strip-tool-content.ts. Pure
data-shape function, referential-equality fast path when no strip is
needed.
- Client: gatewayFetch rewrites GET /api/sessions/:id to add
`?stripToolContent=1` when the flag is on. Idempotent.
- Pinning test: tests/session-strip-tool-content.test.ts (12 cases
covering both tool_use and toolCall shapes, custom thresholds,
referential equality, parseStripThreshold).
- docs/perf/sidebar-nav-baseline.md: Phase 2B A/B section with run
instructions and decision rule.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
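The strip helper's "pure data-shape function, referential-equality fast path" can be sketched as below. Only the `{ _truncated, _originalLength, preview }` shape, the 4KB default and the tool_use / toolCall block types come from the commit message. The message layout (`blocks`, `content`), the preview length and measuring size in string characters are assumptions made for illustration.

```typescript
// Hypothetical sketch of a strip helper with a referential-equality fast path.
interface TruncatedContent {
  _truncated: true;
  _originalLength: number;
  preview: string;
}

interface ContentBlock {
  type: string;
  content?: string | TruncatedContent;
}

interface TranscriptMessage {
  role: string;
  blocks: readonly ContentBlock[];
}

const DEFAULT_THRESHOLD_CHARS = 4 * 1024; // "default 4KB", measured in chars here
const PREVIEW_CHARS = 200; // assumed preview length
const TOOL_BLOCK_TYPES: ReadonlySet<string> = new Set(["tool_use", "toolCall"]);

function stripBlock(block: ContentBlock, threshold: number): ContentBlock {
  const { content } = block;
  if (!TOOL_BLOCK_TYPES.has(block.type)) return block;
  if (typeof content !== "string" || content.length <= threshold) return block;
  return {
    ...block,
    content: {
      _truncated: true,
      _originalLength: content.length,
      preview: content.slice(0, PREVIEW_CHARS),
    },
  };
}

// Returns the input array itself when nothing needed stripping, so callers
// can detect "no change" with ===. Untouched messages keep their identity.
function stripToolContent(
  messages: readonly TranscriptMessage[],
  threshold: number = DEFAULT_THRESHOLD_CHARS,
): readonly TranscriptMessage[] {
  let result: TranscriptMessage[] | null = null;
  for (let i = 0; i < messages.length; i++) {
    const message = messages[i];
    if (message === undefined) continue;
    const blocks = message.blocks.map((b) => stripBlock(b, threshold));
    const changed = blocks.some((b, j) => b !== message.blocks[j]);
    if (!changed) continue;
    if (result === null) result = [...messages]; // copy lazily, on first change
    result[i] = { ...message, blocks };
  }
  return result ?? messages;
}
```

Calling the function twice is safe because an already-truncated block no longer has string content and is returned unchanged.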
Closes the design-doc §2.2 gap flagged by implementation gate verification:
the canonical `nav.session.cold` and `nav.goal.cold` spans were specified
but never wired up. `mark("app.boot")` existed; no consumer did.
main.ts now captures BOOT_T0 immediately after the boot mark and installs
a MutationObserver on `#app` for `data-perf-ready` transitions. The first
transition to "session" or "goal" records the corresponding cold-load
perf span (with sessionId/goalId pulled from location.hash) and disconnects
— it is a one-shot, only meaningful on hard refresh.
Cheap when disabled: returns a noop disposer without installing the
observer when `perfIsEnabled()` is false.
Tests in `tests/perf-trace-cold-spans.spec.ts` extract the function from
a transpile of main.ts (no bundling — main.ts is wired into the UI graph
that parallel coders own) and exercise it in a real browser against the
real perf-trace module. Covers: session/goal sentinels, one-shot
behaviour, non-sentinel ignored, pre-set attribute synchronous emission,
disabled path noop, null target, and hash-based detail capture.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
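The one-shot behaviour of the cold-load spans can be shown without a DOM. In the sketch below, `createColdSpanRecorder` is a hypothetical helper: the real code observes `data-perf-ready` on `#app` with a MutationObserver, whereas here the attribute value is passed in directly and the clock and recorder are injected. Only the span names `nav.session.cold` / `nav.goal.cold` and the "session" / "goal" sentinels come from the commit message.

```typescript
// Hypothetical one-shot recorder for the cold-load span. The caller feeds it
// each new data-perf-ready value; only the first sentinel value records.
type RecordSpan = (name: string, durationMs: number) => void;

function createColdSpanRecorder(
  bootT0: number,
  now: () => number,
  record: RecordSpan,
): (readyValue: string | null) => void {
  let done = false;
  return (readyValue: string | null): void => {
    if (done) return; // one-shot: later transitions are ignored
    if (readyValue !== "session" && readyValue !== "goal") return; // non-sentinel
    done = true;
    record(`nav.${readyValue}.cold`, now() - bootT0);
  };
}
```

In the browser the returned function would be called from the observer callback, and the observer disconnected once the span is recorded.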
Replace the empty-session warm/cold passes with a deterministic, seeded
archived-session fixture so the harness measures reducer.rehydrate,
api.session.fetch, and nav.session.ready against realistic transcript
sizes — not the artificial messages:0 baseline.
Mechanism (no src/ changes):
- After project registration, stop the gateway, write N archived rows
to <projectStateDir>/sessions.json pointing at synthetic JSONL files,
restart. ProjectContext.SessionStore reads them on boot.
- WS archived-attach (getArchivedMessages) parses the JSONLs and emits
real messages frames, driving reducer.rehydrate with non-trivial work.
- Warm pass drives nav via window.__bobbitOpenForNavItem (the keyboard
path) so nav.click + nav.session.ready fire identically for archived
and live rows. Direct row clicks on archived sessions bypass
openForNavItem (see render-helpers.ts:501).
Fixture mix per session: ~50% user/assistant text, 5–10 tool_use +
tool_result pairs, plus one >=50 KB tool-result blob (deterministic ASCII
so JSONL sizes are stable across runs).
BOBBIT_PERF_FIXTURE_SIZE = small | medium | large selects 10 / 50 / 200
messages per session. Default medium.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
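The fixture sizing and the deterministic ASCII blob could look roughly like this. The size-to-count mapping and the medium default are from the commit message. The helper names, the fallback to medium on unrecognised values, and the linear congruential generator are assumptions chosen to keep the sketch self-contained.

```typescript
// Hypothetical sketch of fixture sizing plus a deterministic ASCII blob.
type FixtureSize = "small" | "medium" | "large";

const MESSAGES_PER_SESSION: Record<FixtureSize, number> = {
  small: 10,
  medium: 50,
  large: 200,
};

// Assumption: anything other than "small" or "large" falls back to medium.
function resolveFixtureSize(raw: string | undefined): FixtureSize {
  return raw === "small" || raw === "large" ? raw : "medium";
}

// Same seed and length always produce the same printable-ASCII string, so
// generated JSONL sizes do not drift between runs. The LCG constants are the
// widely used Numerical Recipes pair; any fixed generator would do.
function deterministicAscii(length: number, seed: number): string {
  let state = seed >>> 0;
  const chars: string[] = [];
  for (let i = 0; i < length; i++) {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    chars.push(String.fromCharCode(32 + ((state >>> 16) % 95))); // 32..126
  }
  return chars.join("");
}
```

Avoiding `Math.random()` is the point: a seeded generator is what makes byte counts comparable across commits.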
Append Realistic-fixture baseline section to sidebar-nav-baseline.md.
Key findings from the seeded archived fixture (medium = 50 msgs/session,
large = 200 msgs/session):
- reducer.rehydrate is decisively NOT a hotspot. 0.2ms p50 at medium,
0.4ms p50 at large, max 3.6ms across all runs. The Phase 1 candidate
'LRU-cache reducer state by session id' can be deprioritised.
- paint.first is the new transcript-scaling hotspot: p95 = 27.5ms medium
→ 103ms large; max 73ms → 177ms. Synchronous markdown /
syntax-highlight render of the whole transcript on click dominates at
scale.
- nav.session.ready p50 stays under the 100ms snappy threshold at medium
(34.1ms) but p95 clears it at large (208ms), and the driver is
paint.first scaling; that is the real perceived-snappiness lever.
- Doc explicitly flags the live-vs-archived caveat: archived attach is
lighter than live (no rpcClient, no event-buffer subscribe), so absolute
numbers improve vs Phase 1's empty-live baseline. Once Phase 2B
lazy-tool-content lands the harness should add a live-fixture pass.
Cross-commit JSON files:
docs/perf/history/c25e40be730b.json (medium, canonical)
docs/perf/history/c25e40be730b-large.json (stress)
Harness side-tweak: history filename now suffixes non-medium fixture sizes
so multiple runs at the same SHA don't overwrite each other.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
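The filename rule from the harness side-tweak can be written as a one-liner. A minimal sketch: `historyFileName` is a hypothetical name, and truncating the SHA to 12 characters is inferred from the two example filenames in this commit rather than stated anywhere.

```typescript
// Hypothetical naming helper: medium (the canonical size) gets the bare SHA,
// any other size is suffixed so runs at the same SHA do not collide.
function historyFileName(sha: string, fixtureSize: "small" | "medium" | "large"): string {
  const shortSha = sha.slice(0, 12); // assumption: 12-char short SHA
  const suffix = fixtureSize === "medium" ? "" : `-${fixtureSize}`;
  return `${shortSha}${suffix}.json`;
}
```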
Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # docs/perf/sidebar-nav-baseline.md
Read-only profiling of ~1.94K agent-CLI session transcripts under ~/.bobbit/agent/sessions. Produces docs/perf/real-session-profile.md covering corpus stats, message-type and role distribution, per-tool result-byte distribution, top-10 large-blob shapes, and concrete recommendations + anti-recommendations for buildRealisticJsonl() in tests/manual-integration/perf-sidebar-nav.spec.ts. Adds scripts/perf-profile-real-sessions.mjs, a one-shot Node helper that emits the underlying JSON aggregates. Filters out e2e/manual/ observe/restart-harness fixture directories. No source under src/ or tests/ touched. No PII or raw transcript content included in the report. Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
- New docs/perf/README.md: orientation, harness env vars, opt-in flags,
cross-commit report, workflow for adding optimisations.
- docs/debugging.md: new 'Sidebar nav feels slow' walkthrough next to
Render performance, linking to perf docs.
- AGENTS.md: one architecture-map bullet + footer link to
docs/perf/README.md.
- docs/perf/sidebar-nav-baseline.md: one-line cross-link to README at top.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Adds 4 rapid-nav sub-passes (cached/uncached × 150ms/50ms cadence) that
fire the canonical Ctrl+ArrowDown shortcut without awaiting the previous
nav's sentinel. Derives rapidnav.keystroke.{cached,uncached},
rapidnav.gap, and rapidnav.stall.ms spans from the existing
nav.session.ready / nav.goal.ready entries.
Fixture seed count bumped from 10 to 32 with disjoint zones so each
cadence pass gets 10 run-wide-fresh rows on lap 1 and 10 cached rows on
lap 2 with no boundary contamination.
§5.6 verdict: walking the sidebar with Ctrl+Down does NOT feel smooth -
median keystroke→ready 100-170ms across all 8 cells, with no path under
the 16.7ms one-frame budget. Render-side cost (paint.first) dominates
even the cached path; cache misses add only ~10-40ms p50. Opt-A
(defer off-screen paint) becomes the headline target.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…l+Down keystrokes
`getActiveNavId` previously discarded `state.keyboardNavActiveId` whenever
the URL hash hadn't caught up to the override's expected hash. Session
navigation goes through an async dynamic import + connectToSession, so
rapid Ctrl+Down keystrokes landing on a live session at the top of the
sidebar (~200ms attach) would each fall back to a cold start in
`navigateSidebar` and re-open the same row, eating 3-4 keystrokes during
the attach window.
The override is installed synchronously by `openForNavItem` and reflects
the most recent user intent. `installKeyboardNavOverrideClearListener`
continues to clear it on any subsequent hashchange whose URL doesn't match
the override, so staleness is bounded.
Pinned by tests/rapid-keystroke-nav.spec.ts:
- behavioural mirror with buggy + fixed `getActiveNavId` proves the drop
pre-fix and 10-for-10 distinct rows post-fix
- source-level grep asserts the buggy `window.location.hash === expected`
gate doesn't get reintroduced
Updates docs/perf/sidebar-nav-baseline.md §5.6 with before/after numbers.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
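The dropped-keystroke behaviour in this commit can be reproduced in a few lines. This is a simplified model, not the code in the repository: the route shape in `hashForNavId`, both `getActiveNavId*` bodies and the `simulateKeystrokes` driver are hypothetical, and the hash is held stale for the whole run to stand in for the attach window.

```typescript
// Hypothetical model of the override lookup before and after the fix.
interface NavState {
  keyboardNavActiveId: string | null;
}

const ROUTE_PREFIX = "#/session/"; // assumed route shape
const hashForNavId = (id: string): string => `${ROUTE_PREFIX}${id}`;
const navIdFromHash = (hash: string): string | null =>
  hash.startsWith(ROUTE_PREFIX) ? hash.slice(ROUTE_PREFIX.length) : null;

// Before: the override is trusted only once the URL hash has caught up.
function getActiveNavIdBefore(state: NavState, hash: string): string | null {
  const override = state.keyboardNavActiveId;
  if (override !== null && hash === hashForNavId(override)) return override;
  return navIdFromHash(hash);
}

// After: the synchronously installed override wins whenever it is set.
function getActiveNavIdAfter(state: NavState, hash: string): string | null {
  return state.keyboardNavActiveId ?? navIdFromHash(hash);
}

// Press "down" repeatedly while the hash still points at the first row.
function simulateKeystrokes(
  rows: readonly string[],
  presses: number,
  getActive: (state: NavState, hash: string) => string | null,
): string[] {
  const state: NavState = { keyboardNavActiveId: null };
  const staleHash = hashForNavId(rows[0] ?? "");
  const opened: string[] = [];
  for (let i = 0; i < presses; i++) {
    const active = getActive(state, staleHash);
    const index = active === null ? -1 : rows.indexOf(active);
    const next = rows[Math.min(index + 1, rows.length - 1)];
    if (next === undefined) break;
    state.keyboardNavActiveId = next; // installed synchronously on open
    opened.push(next);
  }
  return opened;
}
```

With the pre-fix lookup every press resolves the active row from the stale hash and re-opens the same neighbour. With the post-fix lookup each press advances from the previous override.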
Phase 2 Opt-A: target paint.first p95 on large transcripts (medium p95
27.5ms → large p95 103ms, max 177ms per
docs/perf/sidebar-nav-baseline.md §5.4).
Synchronous markdown / syntax-highlight render of every message dominates
first paint when the session has 100+ messages; rendering only the
bottom-tail synchronously and deferring the rest via IntersectionObserver
+ requestIdleCallback should cut large-fixture paint.first p95
dramatically without affecting median.
- New <deferred-block> Lit element wraps each transcript item when the
flag is on. Eager items (last 8 in <message-list>) render inline; the rest
render a height-preserving placeholder until IO (rootMargin 500px) fires
and rIC swaps in the real template.
- Ctrl+F / Cmd+F / F3 trigger DeferredBlock.forceResolveAll() so native
browser-find sees the full transcript.
- Perf-flag OFF path is unchanged (no <deferred-block> wrapper at all).
- 7 new unit tests under tests/defer-offscreen-render.spec.ts pin the
eager path, deferred-then-intersect resolve, Ctrl+F escape hatch, and the
perf-flag-OFF historical behaviour.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
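The Ctrl+F / Cmd+F / F3 escape hatch amounts to a key predicate plus a registry of pending deferred blocks. The sketch below is DOM-free and hypothetical throughout: `KeyLike` mirrors only the `KeyboardEvent` fields it needs, and `DeferredRegistry` stands in for whatever bookkeeping `DeferredBlock.forceResolveAll()` actually uses.

```typescript
// Hypothetical sketch of the browser-find escape hatch.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
}

function isFindShortcut(event: KeyLike): boolean {
  if (event.key === "F3") return true;
  return (event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "f";
}

class DeferredRegistry {
  private readonly pending = new Set<() => void>();

  // Returns an unregister function for blocks that resolve on their own.
  register(resolve: () => void): () => void {
    this.pending.add(resolve);
    return (): void => {
      this.pending.delete(resolve);
    };
  }

  // Resolve every still-deferred block so in-page find sees all the text.
  forceResolveAll(): number {
    const callbacks = Array.from(this.pending);
    this.pending.clear();
    for (const resolve of callbacks) resolve();
    return callbacks.length;
  }
}

function handleKeydown(event: KeyLike, registry: DeferredRegistry): boolean {
  if (!isFindShortcut(event)) return false;
  registry.forceResolveAll();
  return true;
}
```

The handler does not swallow the key event, so the browser's own find UI still opens after the transcript has been fully rendered.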
Postmortem retained in docs/perf/sidebar-nav-baseline.md §6.1. Root cause
is architectural: REST is metadata-only and the transcript ships over WS,
so ?stripToolContent=1 doubles the work for negative gain. Fixing properly
is out of scope for this goal.
- DELETE src/server/agent/strip-tool-content.ts
- DELETE tests/session-strip-tool-content.test.ts
- src/server/server.ts: drop ?stripToolContent=1 parsing + invocation
- src/app/api.ts: drop _maybeLazyToolContent URL rewrite; keep perf-trace
dispatch and Opt-C prefetch logic intact
- src/app/perf-flags.ts: remove lazyToolContent registry entry +
PERF_FLAG_LAZY_TOOL_CONTENT const
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
n=5 confirmed within-noise; original "win" was a cold-cache artefact.
Postmortem retained in docs/perf/sidebar-nav-baseline.md §6.3.
- Restore loadDashboardData to pre-Opt-D 7-fetch Promise.all + sequential
getTeamState await.
- Drop src/app/goal-dashboard-fetches.ts helper.
- Drop parallelGoalFetches entry + PERF_FLAG_PARALLEL_GOAL_FETCHES const
from src/app/perf-flags.ts.
- Drop tests/parallel-goal-fetches.spec.ts + fixtures.
- Drop one-off analysers scripts/opt-d-analyse.mjs +
scripts/opt-c-summary.mjs (their aggregated results live in the
cross-commit report).
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Per task spec, append a one-line revert note pointing at the revert commit. Postmortem stays intact. Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
n=5 confirmed only ~19ms median gain on nav.session.ready, far below the
100ms ship bar. Removing the cache + listener complexity per
docs/perf/sidebar-nav-baseline.md §6.2 disposition revisited.
- src/app/api.ts: remove prefetchUrl/Session/Goal + 20-entry LRU + the
_consumePrefetch consultation in gatewayFetch. Phase 1 perf-trace
URL-dispatch and Opt-B's _maybeLazyToolContent are untouched.
- src/app/sidebar.ts: remove installSidebarPrefetchListener and its
pointerover/focusin delegated handler.
- src/app/main.ts: remove the prefetch listener install call.
- src/app/perf-flags.ts: drop the prefetchOnHover registry entry and
PERF_FLAG_PREFETCH_ON_HOVER const.
- tests/prefetch-on-hover.spec.ts: deleted.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # src/app/api.ts # src/app/perf-flags.ts
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…ss-commit report Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # docs/perf/sidebar-nav-baseline.md
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Add <deferred-code-block> wrapper that renders a plain <pre><code>
placeholder synchronously and upgrades to a real <code-block> (which runs
hljs.highlight()) on requestIdleCallback (200ms timeout fallback to
setTimeout(0)).
Transcript-path renderers now go through codeBlock(code, lang) helper:
flag OFF emits <code-block> directly (byte-identical to today); flag ON
emits the deferred wrapper. Eager-tail messages (Opt-A) can paint their
visible code blocks immediately as plain monospace, freeing the click →
first-paint critical path of hljs work.
Files:
- src/ui/components/syntax-highlight.ts (new, owns the element + helper)
- src/app/perf-flags.ts (deferSyntaxHighlight, default OFF)
- Swaps in Messages.ts + all transcript tool renderers
- Unit test tests/defer-syntax-highlight.spec.ts (4 cases)
Artifact viewers (src/ui/tools/artifacts/*) still call hljs directly via
unsafeHTML; they sit on a separate panel off the sidebar-nav critical path
and are out of scope.
Not touched: MessageList.ts (Opt-H), DeferredBlock.ts (Opt-A frozen),
src/app/* other than perf-flags.ts.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…for bench spawn Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Replaces Opt-A's fixed DEFER_EAGER_TAIL = 8 with a viewport-driven eager
set when the new virtualiseTail perf flag is on: walk items bottom-up,
accumulating estimateMessageHeight, eager only the bottom-most messages
whose cumulative height fills window.innerHeight (plus the one that
partially overflows the top edge). On a 1280x800 desktop with chunky
messages (~400px each) this is 2-3 eager messages instead of 8 -> fewer
synchronous renders at first paint.
OFF path is byte-for-byte the Opt-A baseline (verified by existing
defer-offscreen-render.spec.ts).
New tests/virtualise-tail.spec.ts pins:
- flag-on, 200 fat msgs, 800px viewport -> 2 eager / 198 placeholders
- flag-on, short transcript -> all eager
- flag-on, single message taller than viewport -> bottom-most stays eager
- flag-off -> 8 eager regardless of message size
Append-only flag entry in src/app/perf-flags.ts (default-OFF, experiment).
A/B benchmarking still to run.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
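The bottom-up walk described in this commit can be sketched as a pure function over estimated heights. `eagerStartIndex` is a hypothetical name and takes the heights as an array, whereas the real code calls `estimateMessageHeight` per item and reads `window.innerHeight`. Only `DEFER_EAGER_TAIL = 8` and the pinned test cases come from the commit message.

```typescript
// Hypothetical sketch of the viewport-driven eager set.
// Items at or after the returned index render eagerly; earlier items get
// placeholders.
const DEFER_EAGER_TAIL = 8;

function eagerStartIndex(
  heightsPx: readonly number[],
  viewportPx: number,
  virtualiseTail: boolean,
): number {
  if (!virtualiseTail) {
    // Flag OFF: the fixed Opt-A tail, regardless of message size.
    return Math.max(0, heightsPx.length - DEFER_EAGER_TAIL);
  }
  let filledPx = 0;
  let index = heightsPx.length;
  // Walk bottom-up until the viewport is filled. The item that crosses the
  // top edge is still included because the check runs before adding it.
  while (index > 0 && filledPx < viewportPx) {
    index -= 1;
    filledPx += heightsPx[index] ?? 0;
  }
  return index;
}
```

For 200 messages of 400px each in an 800px viewport the function returns 198, which gives the 2 eager / 198 placeholder split that the commit's first test case pins.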
Opt-C revert (commit 1fd1563) renamed the BOBBIT_PERF_FLAGS local from perfFlagsArg to perfFlagsCsv but missed the wantHoverWarmup reference on line 843, leaving a ReferenceError that fails every harness run. Blocks all parallel Phase 2 A/B experiments (Opt-F WS-attach, Opt-G defer-highlight, Opt-H virtualise-tail) on the goal branch. One-token rename. Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…apper Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
A/B'd at commit 6edd880 (large fixture, n=5 per arm, interleaved).
Hypothesis was that Opt-A's fixed eager-tail of 8 was still over-eager and
shrinking it to a viewport-driven 2-3 would cut paint.first further. Data
says no:
span                         OFF p95   ON p95    Δmed
paint.first                     57.7     56.2    −1.5
nav.session.ready              182.3    188.9    +6.6
rapidnav.keystroke.cached      174.9    170.0    −4.9
nav.session.cold p50           330.6    315.1   −15.5
Largest delta is −1.5ms on paint.first p95 with fully overlapping
replicate ranges. No critical span moves ≥100ms or past the 100ms snappy
threshold.
Postmortem in docs/perf/sidebar-nav-baseline.md §6.4 explains why: Opt-A's
win came from collapsing the 190+ off-screen messages to placeholders
(200 → 8); shaving 8 → 2-3 is rounding-error because the Lit reconciler +
IO bookkeeping over 200 wrappers dominates whatever per-message render
cost we save in the tail.
Files reverted:
src/app/perf-flags.ts: flag entry + const removed
src/ui/components/MessageList.ts: back to DEFER_EAGER_TAIL = 8
Files deleted:
tests/virtualise-tail.spec.ts
tests/fixtures/virtualise-tail-{entry.ts,.html}
.gitignore entries for the test bundle
Data retained:
docs/perf/history/6edd880cb47b-opt-h-{off,on}-{1..5}.json as the durable
record behind the postmortem.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…ints
n=5 replicates each on the canonical realistic-large fixture, SHA d9750ca.
step 0 (baseline, Opt-A off via -deferOffscreenRender):
nav.session.ready p50 median 140.1ms
paint.first p50 median 25.3ms
rapidnav.keystroke.cached p50 median 140.1ms
step 1 (+Opt-A, default flags):
nav.session.ready p50 median 132.8ms
paint.first p50 median 22.9ms
rapidnav.keystroke.cached p50 median 133.5ms
Future steps land via scripts/perf-progression.mjs --step N+1.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # docs/perf/sidebar-nav-report.html # tests/manual-integration/perf-sidebar-nav.spec.ts
Tried deferring `RemoteAgent.connect()` off the click → first-paint
critical path, with pre-`auth_ok` send buffering and a small inline
indicator on the chat panel. n=5 medium A/B at SHA 552fb246b4b4
(10 history JSONs under docs/perf/history/).
Headline numbers (median across replicates):
nav.session.ready p50: 115 → 107 ms (−8 ms, within noise)
nav.session.ready p95: 156 → 171 ms (+15 ms, within noise)
paint.first p50: 25 → 23 ms (noise)
ws.attach p50: 46 → 59 ms (+13 ms, opposite direction)
rapidnav.keystroke.cached p50: 131 → 117 ms (−13 ms, within noise)
No span clears the ≥100 ms p50 reduction bar, none move from
>100 ms to <100 ms, and all median deltas sit inside the per-arm
min/max ranges (i.e. inside the noise floor).
Why the hypothesis missed: `connectToSession()` already constructs
the ChatPanel and calls `renderApp()` BEFORE `await remote.connect()`,
so the `nav.session.ready` sentinel (`pi-chat-panel` committed +
`appView === 'authenticated'`) closes on the first paint and never
sees ws.attach on its critical path. Removing the await can't move a
span that didn't include it.
Per the HOW-TO-REPEAT §7 discipline:
- Reverted all src changes (remote-agent.ts, session-manager.ts,
perf-flags.ts entry + const).
- Deleted the unit test (tests/defer-ws-attach.spec.ts +
fixtures/defer-ws-attach.html); the plumbing is gone, so the test
has nothing to pin.
- Kept the 10 history JSONs + §6.4 postmortem in
docs/perf/sidebar-nav-baseline.md as the durable record.
- docs/perf/sidebar-nav-report.html regenerated by the harness with
the opt-f-{off,on} A/B pair included.
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
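The decision rule applied in this postmortem (≥100 ms p50 reduction, or a move from above to below 100 ms, with overlapping per-arm ranges read as noise) can be written down as a small comparison helper. This is a sketch under assumptions: `median`, `compareArms` and `ArmComparison` are hypothetical, and "within noise" is modelled here as the OFF and ON min/max ranges overlapping, which is one reading of the text.

```typescript
// Hypothetical A/B comparison over per-replicate p50 values (milliseconds).
function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error("median of empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? 0;
  if (sorted.length % 2 === 1) return upper;
  const lower = sorted[mid - 1] ?? 0;
  return (lower + upper) / 2;
}

interface ArmComparison {
  offMedian: number;
  onMedian: number;
  deltaMs: number; // negative means the ON arm is faster
  withinNoise: boolean; // assumption: the two min/max ranges overlap
  meetsShipBar: boolean;
}

function compareArms(
  off: readonly number[],
  on: readonly number[],
  shipBarMs = 100,
  snappyMs = 100,
): ArmComparison {
  const offMedian = median(off);
  const onMedian = median(on);
  const deltaMs = onMedian - offMedian;
  const overlapLow = Math.max(Math.min(...off), Math.min(...on));
  const overlapHigh = Math.min(Math.max(...off), Math.max(...on));
  const clearsBar = -deltaMs >= shipBarMs;
  const crossesSnappy = offMedian > snappyMs && onMedian < snappyMs;
  return {
    offMedian,
    onMedian,
    deltaMs,
    withinNoise: overlapLow <= overlapHigh,
    meetsShipBar: clearsBar || crossesSnappy,
  };
}
```

Keeping `withinNoise` and `meetsShipBar` as separate fields lets a report show both facts per span instead of collapsing them into one verdict.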
Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # docs/perf/sidebar-nav-baseline.md # docs/perf/sidebar-nav-report.html
…ure)
A/B at ea634d6, n=5, realistic-large fixture (200 msgs x 32 sessions).
Critical-span p50 medians:
nav.session.ready 127.0 -> 116.4 (-10.6 ms, within noise)
nav.session.cold 332.5 -> 318.2 (-14.3 ms, within noise)
nav.goal.ready 31.4 -> 32.3 (+0.9 ms, noise)
nav.goal.cold 1752.8 -> 1753.2 (+0.4 ms, noise)
paint.first 23.3 -> 22.9 (-0.4 ms, noise)
rapidnav.keystroke.cached 135.4 -> 126.0 (-9.4 ms, within noise)
rapidnav.keystroke.uncached 136.6 -> 138.4 (+1.8 ms, noise)
Largest critical-span move is -14 ms (nav.session.cold), an order of
magnitude below the >=100 ms p50 threshold from HOW-TO-REPEAT section 5.
Every delta sits inside the run-to-run noise floor (off/on ranges overlap
on every row). No span crosses the 100 ms snappy threshold under either
arm. The report marks all four opt-g pair-rows as "within noise".
Why the theoretical win didn't materialise: Opt-A already defers
off-screen messages behind an IntersectionObserver, so the bulk of
code-block density (which lives off-screen in realistic transcripts) is
already deferred. The eager-tail messages that render synchronously are
dominated by markdown / DOM layout cost, not hljs tokenisation --
paint.first p50 is flat +/-0.4 ms across arms. Opt-G stacked deferral on
deferral and ran out of marginal wins on the metric the harness keys the
decision on.
Reverts 5f0fdeb (Opt-G implementation). Keeps:
- docs/perf/history/ea634d62dc3b-opt-g-{off,on}-{1..5}.json (10 history
JSONs -- durable evidence behind the decision)
- docs/perf/sidebar-nav-baseline.md section 6.6 -- postmortem
- regenerated docs/perf/sidebar-nav-report.html (now 4 A/B pairs)
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…532b0 Co-authored-by: bobbit-ai <bobbit@bobbit.ai> # Conflicts: # .gitignore
Parked as a follow-up. The user-facing perf win + the dropped-keystroke fix were extracted into a much smaller, targeted PR: #584 (`feat/defer-offscreen-render`). This branch contains the supporting infrastructure that enabled the discovery and would let future engineers reproduce / extend the analysis:
Not for merge as-is. Either:
🤖 Generated with Bobbit