[FOLLOW-UP — parked] Perf instrumentation, harness, cross-commit report #579

Open
SuuBro wants to merge 76 commits into master from goal/profile-si-320532b0

SuuBro (Owner) commented May 13, 2026

Parked as a follow-up. The user-facing perf win + the dropped-keystroke fix were extracted into a much smaller, targeted PR: #584 (`feat/defer-offscreen-render`). This branch contains the supporting infrastructure that enabled the discovery and would let future engineers reproduce / extend the analysis:

  • Client perf-trace primitive (`src/app/perf-trace.ts`)
  • Server `[timing]` log extension
  • Sidebar/render/api/ws instrumentation hooks
  • Manual harness `tests/manual-integration/perf-sidebar-nav.spec.ts` with realistic-corpus-tuned fixture
  • Cross-commit comparison report (`scripts/perf-{bench,report,progression}.mjs` + `docs/perf/sidebar-nav-report.html`)
  • Phase 3 budget E2E test
  • Real-session JSONL corpus profile
  • `docs/perf/HOW-TO-REPEAT.md` workflow doc
  • `docs/perf/sidebar-nav-baseline.md` with all the postmortems for the experiments that didn't pay off (Opt-B / C / D / F / G / H)

Not for merge as-is. Either:

  1. Land a curated subset (e.g. just the perf-trace primitive + server timing) when there's appetite for ongoing perf work, or
  2. Close this PR if the team prefers to keep master lean and revive the harness from scratch when the next perf goal arrives.

🤖 Generated with Bobbit

SuuBro and others added 30 commits May 13, 2026 21:19
- src/app/perf-trace.ts: tiny client-side span/mark primitive with ring
  buffer, cost-when-disabled invariant (no-op singleton handle), localStorage
  / ?perf=1 opt-in, window.__bobbitPerf surface. Pinned by tests/perf-trace.spec.ts
  (12 tests, including 100k startSpan heap-growth check).
- src/app/perf-flags.ts: feature-flag helper for Phase 2 experiments.
- Instrumentation hooks (Phase 1 owner — instrumentation only):
  - main.ts: 'app.boot' mark as first statement.
  - api.ts: gatewayFetch wrapper dispatches api.session.fetch /
    api.goal.fetch / api.goal.gates.fetch / api.goal.agents.fetch /
    paint.tool-content.lazy by URL pattern. Cheap when perf disabled.
  - sidebar-nav.ts: nav.click + nav.session.ready/nav.goal.ready opened on
    openForNavItem(); pending span stashed on state.
  - routing.ts: closes nav.click on setHashRoute completion.
  - render.ts: paint.first span wrapping doRenderApp; ends pending nav span
    on next rAF once the destination view's sentinel is in the DOM
    (pi-chat-panel for session, wf-checklist-row/.gate-detail-panel/.tab-empty
    for goal). Sets data-perf-ready on #app for harness wait.
  - message-reducer.ts: reducer.rehydrate wraps the snapshot case.
  - state.ts: pendingNavSpan field.
- src/server/server.ts: extended _timingEnabled block — always-on logging
  with BOBBIT_TIMING_LOG_MIN_MS threshold env var; response wrapper tallies
  bytes; per-request io counter bumped at entry to the five hot endpoints
  (GET /api/sessions/:id, /api/goals/:id, /api/goals/:id/gates,
  /api/goals/:id/team/agents, /api/sessions/:id/tool-content/:mi/:bi). Log
  format: '[timing] METHOD path Xms bytes=B io=N'.
- tests/manual-integration/perf-sidebar-nav.spec.ts: Playwright harness —
  boots gateway, seeds 10 sessions + 1 goal via REST, drives cold/warm/goal
  passes, dumps client perf entries + server [timing] tail to .perf-out/
  JSON + HTML report. Hard-fails (process.exit(1)) when any of the five
  canonical gate spans has zero samples. NOT in CI.
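
The span primitive described in the first bullet can be pictured roughly as follows. This is a minimal sketch, not the contents of `src/app/perf-trace.ts`: the `PerfTrace` class, its constructor arguments, and the `PerfEntry` shape are assumptions made for illustration. It shows the two properties the commit message names, a fixed-size ring buffer and a shared no-op handle on the disabled path.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real perf-trace.ts API.
interface PerfEntry {
  name: string;
  startMs: number;
  durationMs: number;
}

interface SpanHandle {
  end(): void;
}

// One shared no-op handle, so the disabled path allocates nothing per span.
const NOOP_SPAN: SpanHandle = Object.freeze({ end(): void {} });

class PerfTrace {
  private readonly buffer: PerfEntry[] = [];
  private next = 0; // slot the next entry overwrites once the buffer is full

  constructor(
    private readonly enabled: boolean,
    private readonly capacity: number,
    private readonly now: () => number = Date.now,
  ) {
    if (capacity < 1) throw new RangeError("capacity must be >= 1");
  }

  startSpan(name: string): SpanHandle {
    if (!this.enabled) return NOOP_SPAN;
    const startMs = this.now();
    let ended = false;
    return {
      end: (): void => {
        if (ended) return; // ending a span twice records it only once
        ended = true;
        this.push({ name, startMs, durationMs: this.now() - startMs });
      },
    };
  }

  private push(entry: PerfEntry): void {
    if (this.buffer.length < this.capacity) {
      this.buffer.push(entry);
    } else {
      this.buffer[this.next] = entry; // overwrite the oldest entry
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Entries in oldest-to-newest order.
  entries(): PerfEntry[] {
    if (this.buffer.length < this.capacity) return this.buffer.slice();
    return this.buffer.slice(this.next).concat(this.buffer.slice(0, this.next));
  }
}
```

Because every disabled `startSpan` call returns the same frozen object, a heap-growth check like the 100k-call test mentioned above has nothing to accumulate.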

Status:
- npm run check + test:unit + test:e2e all green when last run.
- Perf-trace unit suite (12 tests) passes.
- Harness boots and produces api.* / reducer.rehydrate / paint.first samples
  but nav.click / nav.session.ready / nav.goal.ready don't fire yet because
  the sidebar row selectors don't match the seeded sessions (sessions land
  under an 'ungrouped' header that may need expansion, or the seeded REST
  sessions render in a sidebar shape the harness doesn't click into).
  Follow-up coder needs to: get the sidebar row click path working, then
  produce the docs/perf/sidebar-nav-baseline.md with real numbers, then
  build the cross-commit comparison report (docs/perf/history/ + scripts/
  perf-report.mjs + docs/perf/sidebar-nav-report.html) per the scope
  addition. Harness exit-1-on-missing-spans invariant is intentional and
  protects against silently-broken instrumentation.
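
The exit-1-on-missing-spans invariant boils down to a small check over the collected entries. The sketch below is an assumption-laden illustration: the five span names are inferred from other commit messages in this PR rather than taken from the harness, and `missingGateSpans` is a hypothetical helper name.

```typescript
// Assumption: these five names are inferred from this PR's commit messages,
// not copied from the harness source.
const GATE_SPANS: readonly string[] = [
  "api.session.fetch",
  "api.goal.fetch",
  "reducer.rehydrate",
  "nav.session.ready",
  "nav.goal.ready",
];

interface GateEntry {
  name: string;
  durationMs: number;
}

// Returns the required span names that have zero samples in `entries`.
function missingGateSpans(
  entries: readonly GateEntry[],
  required: readonly string[] = GATE_SPANS,
): string[] {
  return required.filter((name) => !entries.some((e) => e.name === name));
}
```

A harness following this pattern would fail the run (for example with a non-zero exit code) whenever the returned list is non-empty, which is what keeps silently broken instrumentation from producing a plausible-looking report.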

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…port

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
All five canonical gate spans produce non-zero samples on a single harness
run. nav.session.ready p50 ~89ms is the dominant hotspot on click→ready;
see docs/perf/sidebar-nav-baseline.md for the full ranking + reproduction
recipe.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…les)

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Static design reference for the cross-commit perf report at
docs/perf/mockups/sidebar-nav-report.html.

- Synthetic 8-commit data telling a mixed story (improvements,
  regressions, flat spans, a newly-appearing span).
- Headlines strip surfaces top movers by |Δp50|.
- Summary table grouped by nav / api / render with green / red
  tinted Δ pills and inline p50 sparklines.
- Per-span trend cards with inline-SVG line charts (p50 solid,
  p95 dashed), auto-scaled Y axis, commit SHAs on X.

Uses Bobbit CSS tokens only (--chart-1/4, --positive, --negative,
surface tokens) with :root fallbacks for the preview-bridge HMR
race per defaults/docs/html-rendering.md. No hardcoded colours,
no prefers-color-scheme, no external libs.

Also relax .gitignore's '*-report.html' rule (which silently
covered docs/perf reports) by re-including docs/perf/**/*-report.html
so the committed report stays under version control.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Bring the generated docs/perf/sidebar-nav-report.html in line with the
static mockup at docs/perf/mockups/sidebar-nav-report.html:

- Header line: 'Generated YYYY-MM-DD HH:MM from N commits' + range + branch.
- Headlines strip: top 6 spans by |Δp50 ms|, classified good/bad/flat,
  with green/red border accent and tinted delta caption.
- Summary table: grouped by category (nav / api / render) with sub-headers,
  rows sorted by |Δp50 ms| within each group; Δ cells rendered as tinted
  pills; per-span p50 sparkline column (skips gaps for missing samples).
- Per-span charts: inline-SVG line charts with auto-scaled 'nice' Y range,
  4 gridlines + tabular Y labels, p50 solid fill + line, p95 dashed line,
  hover <title> tooltips on every dot, evenly-spaced SHA ticks on X.
- Runs table: sortable visual, latest row highlighted with '← latest' tag.
- Empty / single-run states render a clean explanatory card instead of
  a misleading 'no data' table.
- Classifier treats <1ms AND <5% movement as 'flat' so reducer.rehydrate
  doesn't flash red for sub-ms jitter.
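
The flat/good/bad classification in the last bullet could look roughly like this. The wording "<1ms AND <5%" can be read two ways; this sketch assumes the reading in which either condition alone is enough to call a span flat, because that is the one that keeps a sub-millisecond span such as reducer.rehydrate from being flagged. The function name and the treatment of a zero baseline are assumptions.

```typescript
type Verdict = "good" | "bad" | "flat";

// Assumption: a move is "flat" when it is under 1ms in absolute terms
// or under 5% in relative terms. Lower latency is better.
function classifyDelta(baselineMs: number, latestMs: number): Verdict {
  const deltaMs = latestMs - baselineMs;
  const absDelta = Math.abs(deltaMs);
  // With a zero baseline the relative change is undefined, so only the
  // absolute rule can mark the move as flat.
  const relative = baselineMs > 0 ? absDelta / baselineMs : Infinity;
  if (absDelta < 1 || relative < 0.05) return "flat";
  return deltaMs < 0 ? "good" : "bad";
}
```

Under the stricter reading (both conditions required), a 0.2ms to 0.4ms move would count as a 100% regression, which is the jitter the bullet says the classifier is meant to suppress.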

All theming via Bobbit CSS tokens with :root fallbacks for the preview-bridge
HMR race (see defaults/docs/html-rendering.md). No hardcoded colours,
no prefers-color-scheme, no external libs.

A regenerated docs/perf/sidebar-nav-report.html, built against the single
existing history entry (commit 999bdc2), is included so the in-repo report
matches the new generator. Re-running the manual harness will refresh it.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Scaffolds tests/e2e/ui/perf-sidebar-nav.spec.ts. Drives one cold session
load (for api.session.fetch + reducer.rehydrate), one warm sidebar-row
click (for nav.session.ready), and one warm goal-dashboard click (for
nav.goal.ready + api.goal.fetch). Reads window.__bobbitPerf.entries()
and asserts each of the five canonical gate spans has at least one
sample below a generous regression-net budget derived from
docs/perf/sidebar-nav-baseline.md (commit 999bdc2).

Budgets are inflated ~10-100x p95 so transient CI slowness never trips
the assert; a real regression still trips. Each budget cites its source
baseline number inline.

Skips cleanly (test.skip) if window.__bobbitPerf is gated off so the
test doesn't silently no-op.
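
The budget assertion described above amounts to "each gated span has at least one sample under its budget". A minimal sketch, assuming a simple entry shape and a hypothetical `spansOverBudget` helper; the real budgets and their baseline citations live in the test file and are not reproduced here.

```typescript
interface BudgetEntry {
  name: string;
  durationMs: number;
}

interface SpanBudget {
  name: string;
  budgetMs: number; // hypothetical value; the real test derives it from the baseline doc
}

// Returns the spans that have no sample strictly below their budget.
// A span with zero samples is reported too, so missing instrumentation
// cannot pass the check by accident.
function spansOverBudget(
  entries: readonly BudgetEntry[],
  budgets: readonly SpanBudget[],
): string[] {
  return budgets
    .filter((b) => !entries.some((e) => e.name === b.name && e.durationMs < b.budgetMs))
    .map((b) => b.name);
}
```

In a Playwright test the entries would come from `window.__bobbitPerf.entries()` via `page.evaluate`, and the assertion would be that the returned list is empty.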

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Behind the new `lazyToolContent` perf flag:

- Server: `GET /api/sessions/:id?stripToolContent=1` opts in to a
  `messages` array in which tool-call content blocks above a configurable
  threshold (default 4KB) are replaced by the existing
  `{ _truncated, _originalLength, preview }` shape. The renderer +
  fetchToolContent flow already lazy-loads via the existing
  `/tool-content/:mi/:bi` endpoint. Default response unchanged.
- Strip helper: src/server/agent/strip-tool-content.ts. Pure
  data-shape function, referential-equality fast path when no strip is
  needed.
- Client: gatewayFetch rewrites GET /api/sessions/:id to add
  `?stripToolContent=1` when the flag is on. Idempotent.
- Pinning test: tests/session-strip-tool-content.test.ts (12 cases
  covering both tool_use and toolCall shapes, custom thresholds,
  referential equality, parseStripThreshold).
- docs/perf/sidebar-nav-baseline.md: Phase 2B A/B section with run
  instructions and decision rule.
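
A pure strip helper with a referential-equality fast path, as described in the bullets above, might be sketched like this. Only the `{ _truncated, _originalLength, preview }` shape and the 4KB default come from the commit message; the `Message` and `Block` types, the preview length, and measuring the threshold in characters rather than bytes are all assumptions.

```typescript
interface TruncatedContent {
  _truncated: true;
  _originalLength: number;
  preview: string;
}

// Assumed message shape, for illustration only.
interface Block {
  type: string;
  content: string | TruncatedContent;
}

interface Message {
  role: string;
  blocks: Block[];
}

const DEFAULT_THRESHOLD_CHARS = 4 * 1024;
const PREVIEW_CHARS = 200; // assumption: the real preview length is not stated

function stripToolContent(
  messages: Message[],
  thresholdChars: number = DEFAULT_THRESHOLD_CHARS,
): Message[] {
  const out: Message[] = [];
  let anyStripped = false;
  for (const msg of messages) {
    const blocks: Block[] = [];
    let msgStripped = false;
    for (const block of msg.blocks) {
      const c = block.content;
      if (typeof c === "string" && c.length > thresholdChars) {
        blocks.push({
          ...block,
          content: { _truncated: true, _originalLength: c.length, preview: c.slice(0, PREVIEW_CHARS) },
        });
        msgStripped = true;
      } else {
        blocks.push(block);
      }
    }
    // Untouched messages keep their identity.
    out.push(msgStripped ? { ...msg, blocks } : msg);
    anyStripped = anyStripped || msgStripped;
  }
  // Fast path: return the input array itself when nothing was stripped.
  return anyStripped ? out : messages;
}
```

Returning the original array when nothing changes lets callers detect "no strip needed" with a plain `===` comparison.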

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Closes the design-doc §2.2 gap flagged by implementation gate verification:
the canonical `nav.session.cold` and `nav.goal.cold` spans were specified
but never wired up. `mark("app.boot")` existed; no consumer did.

main.ts now captures BOOT_T0 immediately after the boot mark and installs
a MutationObserver on `#app` for `data-perf-ready` transitions. The first
transition to "session" or "goal" records the corresponding cold-load
perf span (with sessionId/goalId pulled from location.hash) and disconnects
— it is a one-shot, only meaningful on hard refresh.

Cheap when disabled: returns a noop disposer without installing the
observer when `perfIsEnabled()` is false.

Tests in `tests/perf-trace-cold-spans.spec.ts` extract the function from
a transpile of main.ts (no bundling — main.ts is wired into the UI graph
that parallel coders own) and exercise it in a real browser against the
real perf-trace module. Covers: session/goal sentinels, one-shot
behaviour, non-sentinel ignored, pre-set attribute synchronous emission,
disabled path noop, null target, and hash-based detail capture.
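
The one-shot logic behind the cold-load spans can be separated from the DOM observer. In the sketch below, `createColdSpanRecorder` is a hypothetical name and the injected `now` and `record` callbacks are assumptions; in the real code the returned callback's role would be played by a MutationObserver watching `data-perf-ready` on `#app`.

```typescript
interface ColdSpan {
  name: string;
  durationMs: number;
}

// Returns a callback to invoke with each new data-perf-ready value.
// The first "session" or "goal" value records one cold span; everything
// after that is ignored.
function createColdSpanRecorder(
  enabled: boolean,
  bootT0: number,
  now: () => number,
  record: (span: ColdSpan) => void,
): (value: string | null) => void {
  if (!enabled) return (): void => {}; // disabled path: nothing installed
  let done = false;
  return (value: string | null): void => {
    if (done) return;
    if (value !== "session" && value !== "goal") return; // non-sentinel values are ignored
    done = true;
    record({ name: `nav.${value}.cold`, durationMs: now() - bootT0 });
  };
}
```

Keeping the decision logic free of DOM types is one way to make the one-shot and disabled-path behaviour testable without a browser, although the tests described above run in a real browser.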

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Replace the empty-session warm/cold passes with a deterministic, seeded
archived-session fixture so the harness measures reducer.rehydrate,
api.session.fetch, and nav.session.ready against realistic transcript
sizes — not the artificial messages:0 baseline.

Mechanism (no src/ changes):
  - After project registration, stop the gateway, write N archived rows
    to <projectStateDir>/sessions.json pointing at synthetic JSONL files,
    restart. ProjectContext.SessionStore reads them on boot.
  - WS archived-attach (getArchivedMessages) parses the JSONLs and emits
    real messages frames, driving reducer.rehydrate with non-trivial work.
  - Warm pass drives nav via window.__bobbitOpenForNavItem (the keyboard
    path) so nav.click + nav.session.ready fire identically for archived
    and live rows. Direct row clicks on archived sessions bypass
    openForNavItem (see render-helpers.ts:501).

Fixture mix per session: ~50% user/assistant text, 5–10 tool_use +
tool_result pairs, plus one >=50 KB tool-result blob (deterministic ASCII
so JSONL sizes are stable across runs).

BOBBIT_PERF_FIXTURE_SIZE = small | medium | large selects 10 / 50 / 200
messages per session. Default medium.
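
A deterministic fixture generator in the spirit of the description above could be sketched as follows. The size-to-count mapping (10 / 50 / 200) and the >=50 KB blob come from the commit message; everything else is an assumption, including the `{ role, text }` line shape, the small seeded PRNG, and the simplification of emitting only text messages plus one blob instead of the full tool_use / tool_result mix.

```typescript
type FixtureSize = "small" | "medium" | "large";

const MESSAGES_PER_SESSION: Record<FixtureSize, number> = { small: 10, medium: 50, large: 200 };

// Unknown or missing values fall back to the default, medium.
function parseFixtureSize(raw: string | undefined): FixtureSize {
  return raw === "small" || raw === "large" ? raw : "medium";
}

// Small seeded PRNG (mulberry32-style) so output is stable across runs.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return (): number => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Lowercase ASCII only, so JSONL byte sizes are stable.
function asciiText(rand: () => number, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(97 + Math.floor(rand() * 26));
  }
  return out;
}

// Hypothetical line shape: one JSON object per message.
function buildFixtureJsonl(size: FixtureSize, seed: number): string {
  const rand = seededRandom(seed);
  const count = MESSAGES_PER_SESSION[size];
  const blobIndex = Math.floor(count / 2); // one message carries the large blob
  const lines: string[] = [];
  for (let i = 0; i < count; i++) {
    const role = i % 2 === 0 ? "user" : "assistant";
    const length = i === blobIndex ? 50 * 1024 : 20 + Math.floor(rand() * 180);
    lines.push(JSON.stringify({ role, text: asciiText(rand, length) }));
  }
  return lines.join("\n");
}
```

The same seed always yields the same bytes, which is the property that keeps run-to-run comparisons from being polluted by fixture variance.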

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Append Realistic-fixture baseline section to sidebar-nav-baseline.md.

Key findings from the seeded archived fixture (medium = 50 msgs/session,
large = 200 msgs/session):

- reducer.rehydrate is decisively NOT a hotspot. 0.2ms p50 at medium,
  0.4ms p50 at large, max 3.6ms across all runs. The Phase 1 candidate
  'LRU-cache reducer state by session id' can be deprioritised.
- paint.first is the new transcript-scaling hotspot: p95 = 27.5ms medium
  → 103ms large; max 73ms → 177ms. Synchronous markdown / syntax-highlight
  render of the whole transcript on click dominates at scale.
- nav.session.ready p50 stays under the 100ms snappy threshold at medium
  (34.1ms) but p95 clears it at large (208ms), and the driver is
  paint.first scaling — that is the real perceived-snappiness lever.
- Doc explicitly flags the live-vs-archived caveat: archived attach is
  lighter than live (no rpcClient, no event-buffer subscribe), so absolute
  numbers improve vs Phase 1's empty-live baseline. Once Phase 2B lazy-
  tool-content lands the harness should add a live-fixture pass.

Cross-commit JSON files:
  docs/perf/history/c25e40be730b.json         (medium, canonical)
  docs/perf/history/c25e40be730b-large.json   (stress)

Harness side-tweak: history filename now suffixes non-medium fixture
sizes so multiple runs at the same SHA don't overwrite each other.
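
The filename rule in the harness side-tweak can be expressed as a one-line helper. `historyFileName` is a hypothetical name, and the 12-character SHA prefix is inferred from the two filenames listed above rather than stated anywhere.

```typescript
type HistoryFixtureSize = "small" | "medium" | "large";

// Medium is the canonical size and gets no suffix; other sizes are suffixed
// so several runs at one SHA do not overwrite each other.
function historyFileName(sha: string, size: HistoryFixtureSize): string {
  const shortSha = sha.slice(0, 12); // assumption: 12-char prefix, as in the listed files
  return size === "medium" ? `${shortSha}.json` : `${shortSha}-${size}.json`;
}
```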

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	docs/perf/sidebar-nav-baseline.md
Read-only profiling of ~1.94K agent-CLI session transcripts under
~/.bobbit/agent/sessions. Produces docs/perf/real-session-profile.md
covering corpus stats, message-type and role distribution, per-tool
result-byte distribution, top-10 large-blob shapes, and concrete
recommendations + anti-recommendations for buildRealisticJsonl() in
tests/manual-integration/perf-sidebar-nav.spec.ts.

Adds scripts/perf-profile-real-sessions.mjs, a one-shot Node helper
that emits the underlying JSON aggregates. Filters out e2e/manual/
observe/restart-harness fixture directories.

No source under src/ or tests/ touched. No PII or raw transcript
content included in the report.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
- New docs/perf/README.md: orientation, harness env vars, opt-in flags, cross-commit report, workflow for adding optimisations.
- docs/debugging.md: new 'Sidebar nav feels slow' walkthrough next to Render performance, linking to perf docs.
- AGENTS.md: one architecture-map bullet + footer link to docs/perf/README.md.
- docs/perf/sidebar-nav-baseline.md: one-line cross-link to README at top.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Adds 4 rapid-nav sub-passes (cached/uncached × 150ms/50ms cadence) that
fire the canonical Ctrl+ArrowDown shortcut without awaiting the previous
nav's sentinel. Derives rapidnav.keystroke.{cached,uncached},
rapidnav.gap, and rapidnav.stall.ms spans from the existing
nav.session.ready / nav.goal.ready entries.

Fixture seed count bumped from 10 to 32 with disjoint zones so each
cadence pass gets 10 run-wide-fresh rows on lap 1 and 10 cached rows on
lap 2 with no boundary contamination.

§5.6 verdict: walking the sidebar with Ctrl+Down does NOT feel smooth -
median keystroke→ready 100-170ms across all 8 cells, with no path under
the 16.7ms one-frame budget. Render-side cost (paint.first) dominates
even the cached path; cache misses add only ~10-40ms p50. Opt-A
(defer off-screen paint) becomes the headline target.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…l+Down keystrokes

`getActiveNavId` previously discarded `state.keyboardNavActiveId` whenever the URL
hash hadn't caught up to the override's expected hash. Session navigation goes through
an async dynamic import + connectToSession, so rapid Ctrl+Down keystrokes landing on a
live session at the top of the sidebar (~200ms attach) would each fall back to a cold
start in `navigateSidebar` and re-open the same row, eating 3-4 keystrokes during the
attach window.

The override is installed synchronously by `openForNavItem` and reflects the most
recent user intent. `installKeyboardNavOverrideClearListener` continues to clear it
on any subsequent hashchange whose URL doesn't match the override, so staleness is
bounded.

Pinned by tests/rapid-keystroke-nav.spec.ts:
- behavioural mirror with buggy + fixed `getActiveNavId` proves the drop pre-fix
  and 10-for-10 distinct rows post-fix
- source-level grep asserts the buggy `window.location.hash === expected` gate
  doesn't get reintroduced
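
The dropped-keystroke mechanism is easier to see in a small model. The sketch below is a behavioural mirror written under assumptions: the `#/session/<id>` route shape, the `NavState` type, and the `pressCtrlDown` driver are all hypothetical, and the URL hash is held constant to model an attach window during which it never catches up.

```typescript
interface NavState {
  keyboardNavActiveId: string | null;
}

// Assumed route shape, for illustration only.
const SESSION_PREFIX = "#/session/";
const expectedHashFor = (id: string): string => SESSION_PREFIX + id;
const idFromHash = (hash: string): string | null =>
  hash.indexOf(SESSION_PREFIX) === 0 ? hash.slice(SESSION_PREFIX.length) : null;

type GetActive = (state: NavState, hash: string) => string | null;

// Pre-fix behaviour: the override is discarded unless the hash already matches it.
const getActiveNavIdBuggy: GetActive = (state, hash) => {
  const override = state.keyboardNavActiveId;
  if (override !== null && hash === expectedHashFor(override)) return override;
  return idFromHash(hash);
};

// Post-fix behaviour: the override reflects the latest user intent and wins.
const getActiveNavIdFixed: GetActive = (state, hash) =>
  state.keyboardNavActiveId !== null ? state.keyboardNavActiveId : idFromHash(hash);

// Simulates rapid presses while the hash stays on the first row.
function pressCtrlDown(rows: readonly string[], presses: number, getActive: GetActive): string[] {
  const state: NavState = { keyboardNavActiveId: null };
  const first = rows[0];
  if (first === undefined) return [];
  const hash = expectedHashFor(first);
  const opened: string[] = [];
  for (let i = 0; i < presses; i++) {
    const active = getActive(state, hash);
    const index = active === null ? -1 : rows.indexOf(active);
    const next = rows[Math.min(index + 1, rows.length - 1)];
    if (next === undefined) break;
    state.keyboardNavActiveId = next; // the override is installed synchronously
    opened.push(next);
  }
  return opened;
}
```

With the buggy lookup every press falls back to the stale hash and re-opens the same row; with the fixed lookup each press advances one row.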

Updates docs/perf/sidebar-nav-baseline.md §5.6 with before/after numbers.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Phase 2 Opt-A — target paint.first p95 on large transcripts (medium p95
27.5ms → large p95 103ms, max 177ms per docs/perf/sidebar-nav-baseline.md
§5.4). Synchronous markdown / syntax-highlight render of every message
dominates first paint when the session has 100+ messages; rendering only
the bottom-tail synchronously and deferring the rest via
IntersectionObserver + requestIdleCallback should cut large-fixture
paint.first p95 dramatically without affecting median.

- New <deferred-block> Lit element wraps each transcript item when the
  flag is on. Eager items (last 8 in <message-list>) render inline; the
  rest render a height-preserving placeholder until IO (rootMargin 500px)
  fires and rIC swaps in the real template.
- Ctrl+F / Cmd+F / F3 trigger DeferredBlock.forceResolveAll() so native
  browser-find sees the full transcript.
- Perf-flag OFF path is unchanged (no <deferred-block> wrapper at all).
- 7 new unit tests under tests/defer-offscreen-render.spec.ts pin the
  eager path, deferred-then-intersect resolve, Ctrl+F escape hatch, and
  the perf-flag-OFF historical behaviour.
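
The eager-tail split at the heart of this change is a simple partition. A minimal sketch, assuming a hypothetical `splitEagerTail` helper; the tail size of 8 is the value named in the commit message, while the wrapping in `<deferred-block>` and the IntersectionObserver wiring are left out.

```typescript
const DEFER_EAGER_TAIL = 8;

// Splits transcript items into those rendered as placeholders (deferred)
// and the bottom tail rendered synchronously (eager).
function splitEagerTail<T>(
  items: readonly T[],
  eagerTail: number = DEFER_EAGER_TAIL,
): { deferred: T[]; eager: T[] } {
  const cut = Math.max(0, items.length - eagerTail);
  return { deferred: items.slice(0, cut), eager: items.slice(cut) };
}
```

For a 200-message transcript this leaves 192 placeholders and 8 synchronously rendered messages; transcripts of 8 messages or fewer render entirely eagerly.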

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
SuuBro and others added 27 commits May 14, 2026 08:49
Postmortem retained in docs/perf/sidebar-nav-baseline.md §6.1. Root cause
is architectural: REST is metadata-only and the transcript ships over WS,
so ?stripToolContent=1 doubles the work for negative gain. Fixing
properly is out of scope for this goal.

- DELETE src/server/agent/strip-tool-content.ts
- DELETE tests/session-strip-tool-content.test.ts
- src/server/server.ts: drop ?stripToolContent=1 parsing + invocation
- src/app/api.ts: drop _maybeLazyToolContent URL rewrite; keep
  perf-trace dispatch and Opt-C prefetch logic intact
- src/app/perf-flags.ts: remove lazyToolContent registry entry +
  PERF_FLAG_LAZY_TOOL_CONTENT const

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
n=5 confirmed within-noise; original "win" was a cold-cache artefact.
Postmortem retained in docs/perf/sidebar-nav-baseline.md §6.3.

- Restore loadDashboardData to pre-Opt-D 7-fetch Promise.all + sequential
  getTeamState await.
- Drop src/app/goal-dashboard-fetches.ts helper.
- Drop parallelGoalFetches entry + PERF_FLAG_PARALLEL_GOAL_FETCHES const
  from src/app/perf-flags.ts.
- Drop tests/parallel-goal-fetches.spec.ts + fixtures.
- Drop one-off analysers scripts/opt-d-analyse.mjs + scripts/opt-c-summary.mjs
  (their aggregated results live in the cross-commit report).

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Per task spec, append a one-line revert note pointing at the revert
commit. Postmortem stays intact.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
n=5 confirmed only ~19ms median gain on nav.session.ready, far below
the 100ms ship bar. Removing the cache + listener complexity per
docs/perf/sidebar-nav-baseline.md §6.2 disposition revisited.

- src/app/api.ts: remove prefetchUrl/Session/Goal + 20-entry LRU + the
  _consumePrefetch consultation in gatewayFetch. Phase 1 perf-trace
  URL-dispatch and Opt-B's _maybeLazyToolContent are untouched.
- src/app/sidebar.ts: remove installSidebarPrefetchListener and its
  pointerover/focusin delegated handler.
- src/app/main.ts: remove the prefetch listener install call.
- src/app/perf-flags.ts: drop the prefetchOnHover registry entry and
  PERF_FLAG_PREFETCH_ON_HOVER const.
- tests/prefetch-on-hover.spec.ts: deleted.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	src/app/api.ts
#	src/app/perf-flags.ts
…ss-commit report

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	docs/perf/sidebar-nav-baseline.md
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Add <deferred-code-block> wrapper that renders a plain <pre><code>
placeholder synchronously and upgrades to a real <code-block> (which
runs hljs.highlight()) on requestIdleCallback (200ms timeout fallback
to setTimeout(0)).

Transcript-path renderers now go through codeBlock(code, lang) helper:
flag OFF emits <code-block> directly (byte-identical to today); flag
ON emits the deferred wrapper. Eager-tail messages (Opt-A) can paint
their visible code blocks immediately as plain monospace, freeing the
click → first-paint critical path of hljs work.

Files:
- src/ui/components/syntax-highlight.ts (new, owns the element + helper)
- src/app/perf-flags.ts (deferSyntaxHighlight, default OFF)
- Swaps in Messages.ts + all transcript tool renderers
- Unit test tests/defer-syntax-highlight.spec.ts (4 cases)

Artifact viewers (src/ui/tools/artifacts/*) still call hljs directly
via unsafeHTML; they sit on a separate panel off the sidebar-nav
critical path and are out of scope.

Not touched: MessageList.ts (Opt-H), DeferredBlock.ts (Opt-A frozen),
src/app/* other than perf-flags.ts.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…for bench spawn

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Replaces Opt-A's fixed DEFER_EAGER_TAIL = 8 with a viewport-driven eager set
when the new virtualiseTail perf flag is on: walk items bottom-up,
accumulating estimateMessageHeight, eager only the bottom-most messages
whose cumulative height fills window.innerHeight (plus the one that
partially overflows the top edge). On a 1280x800 desktop with chunky
messages (~400px each) this is 2-3 eager messages instead of 8 -> fewer
synchronous renders at first paint.

OFF path is byte-for-byte the Opt-A baseline (verified by existing
defer-offscreen-render.spec.ts). New tests/virtualise-tail.spec.ts pins:
  - flag-on, 200 fat msgs, 800px viewport -> 2 eager / 198 placeholders
  - flag-on, short transcript -> all eager
  - flag-on, single message taller than viewport -> bottom-most stays eager
  - flag-off -> 8 eager regardless of message size

Append-only flag entry in src/app/perf-flags.ts (default-OFF, experiment).
A/B benchmarking still to run.
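
The bottom-up walk described above can be sketched as a function over estimated heights. `countViewportEager` is a hypothetical name, and the heights are taken as a plain array rather than computed by `estimateMessageHeight`.

```typescript
// Counts how many bottom-most messages must render eagerly to fill the
// viewport. The message that pushes the total past the viewport height is
// included, since it is partially visible at the top edge.
function countViewportEager(heightsPx: readonly number[], viewportPx: number): number {
  let filledPx = 0;
  let eager = 0;
  for (let i = heightsPx.length - 1; i >= 0; i--) {
    // The bottom-most message is always eager, even if it alone overflows.
    if (eager > 0 && filledPx >= viewportPx) break;
    filledPx += heightsPx[i] ?? 0;
    eager++;
  }
  return eager;
}
```

With 200 messages of 400px each and an 800px viewport the walk stops after two messages, matching the "2 eager / 198 placeholders" case pinned by the test list above.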

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Opt-C revert (commit 1fd1563) renamed the BOBBIT_PERF_FLAGS local from
perfFlagsArg to perfFlagsCsv but missed the wantHoverWarmup reference on
line 843, leaving a ReferenceError that fails every harness run. Blocks
all parallel Phase 2 A/B experiments (Opt-F WS-attach, Opt-G defer-highlight,
Opt-H virtualise-tail) on the goal branch. One-token rename.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…apper

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
A/B'd at commit 6edd880 (large fixture, n=5 per arm, interleaved).
Hypothesis was that Opt-A's fixed eager-tail of 8 was still over-eager
and shrinking it to a viewport-driven 2-3 would cut paint.first further.
Data says no:

  span                           OFF p95    ON p95     Δmed
  paint.first                     57.7       56.2      −1.5
  nav.session.ready              182.3      188.9      +6.6
  rapidnav.keystroke.cached      174.9      170.0      −4.9
  nav.session.cold p50           330.6      315.1     −15.5

Largest delta is −1.5ms on paint.first p95 with fully overlapping
replicate ranges. No critical span moves ≥100ms or past the 100ms
snappy threshold. Postmortem in docs/perf/sidebar-nav-baseline.md §6.4
explains why: Opt-A's win came from collapsing the 190+ off-screen
messages to placeholders (200 → 8); shaving 8 → 2-3 is rounding-error
because the Lit reconciler + IO bookkeeping over 200 wrappers
dominates whatever per-message render cost we save in the tail.

Files reverted:
  src/app/perf-flags.ts          — flag entry + const removed
  src/ui/components/MessageList.ts — back to DEFER_EAGER_TAIL = 8

Files deleted:
  tests/virtualise-tail.spec.ts
  tests/fixtures/virtualise-tail-{entry.ts,.html}
  .gitignore entries for the test bundle

Data retained: docs/perf/history/6edd880cb47b-opt-h-{off,on}-{1..5}.json
as the durable record behind the postmortem.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
…ints

n=5 replicates each on the canonical realistic-large fixture, SHA d9750ca.

step 0 (baseline, Opt-A off via -deferOffscreenRender):
  nav.session.ready  p50 median 140.1ms
  paint.first        p50 median 25.3ms
  rapidnav.keystroke.cached p50 median 140.1ms

step 1 (+Opt-A, default flags):
  nav.session.ready  p50 median 132.8ms
  paint.first        p50 median 22.9ms
  rapidnav.keystroke.cached p50 median 133.5ms

Future steps land via scripts/perf-progression.mjs --step N+1.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	docs/perf/sidebar-nav-report.html
#	tests/manual-integration/perf-sidebar-nav.spec.ts
Tried deferring `RemoteAgent.connect()` off the click → first-paint
critical path, with pre-`auth_ok` send buffering and a small inline
indicator on the chat panel. n=5 medium A/B at SHA 552fb246b4b4
(10 history JSONs under docs/perf/history/).

Headline numbers (median across replicates):
  nav.session.ready p50: 115 → 107 ms  (−8 ms, within noise)
  nav.session.ready p95: 156 → 171 ms  (+15 ms, within noise)
  paint.first      p50:  25 → 23 ms   (noise)
  ws.attach        p50:  46 → 59 ms   (+13 ms, opposite direction)
  rapidnav.keystroke.cached p50: 131 → 117 ms (−13 ms, within noise)

No span clears the ≥100 ms p50 reduction bar, none move from
>100 ms to <100 ms, and all median deltas sit inside the per-arm
min/max ranges (i.e. inside the noise floor).
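
The decision rule applied here can be written down as a small comparison over per-replicate p50 values. This is a sketch under assumptions: `compareArms` is a hypothetical helper, "within noise" is interpreted as the two arms' min/max ranges overlapping, and the 100 ms values for the ship bar and the snappy threshold are the ones quoted in this PR.

```typescript
interface AbVerdict {
  offMedianMs: number;
  onMedianMs: number;
  deltaMs: number; // negative means the ON arm is faster
  withinNoise: boolean;
  meetsShipBar: boolean;
}

const SHIP_BAR_MS = 100;
const SNAPPY_THRESHOLD_MS = 100;

function median(values: readonly number[]): number {
  if (values.length === 0) throw new RangeError("median needs at least one sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// `off` and `on` hold one p50 value per replicate for a single span.
function compareArms(off: readonly number[], on: readonly number[]): AbVerdict {
  const offMedianMs = median(off);
  const onMedianMs = median(on);
  const deltaMs = onMedianMs - offMedianMs;
  // Assumed reading of "inside the noise floor": the arms' ranges overlap.
  const withinNoise =
    Math.min(...on) <= Math.max(...off) && Math.min(...off) <= Math.max(...on);
  const bigReduction = -deltaMs >= SHIP_BAR_MS;
  const crossesSnappy = offMedianMs > SNAPPY_THRESHOLD_MS && onMedianMs < SNAPPY_THRESHOLD_MS;
  return { offMedianMs, onMedianMs, deltaMs, withinNoise, meetsShipBar: bigReduction || crossesSnappy };
}
```

Applied to replicate sets whose medians are 115 ms and 107 ms with overlapping ranges, the verdict is `withinNoise: true` and `meetsShipBar: false`, which is the shape of result that leads to a revert.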

Why the hypothesis missed: `connectToSession()` already constructs
the ChatPanel and calls `renderApp()` BEFORE `await remote.connect()`,
so the `nav.session.ready` sentinel (`pi-chat-panel` committed +
`appView === 'authenticated'`) closes on the first paint and never
sees ws.attach on its critical path. Removing the await can't move a
span that didn't include it.

Per the HOW-TO-REPEAT §7 discipline:
- Reverted all src changes (remote-agent.ts, session-manager.ts,
  perf-flags.ts entry + const).
- Deleted the unit test (tests/defer-ws-attach.spec.ts +
  fixtures/defer-ws-attach.html); the plumbing is gone, so the test
  has nothing to pin.
- Kept the 10 history JSONs + §6.4 postmortem in
  docs/perf/sidebar-nav-baseline.md as the durable record.
- docs/perf/sidebar-nav-report.html regenerated by the harness with
  the opt-f-{off,on} A/B pair included.

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	docs/perf/sidebar-nav-baseline.md
#	docs/perf/sidebar-nav-report.html
…ure)

A/B at ea634d6, n=5, realistic-large fixture (200 msgs x 32 sessions).
Critical-span p50 medians:

  nav.session.ready           127.0 -> 116.4  (-10.6 ms, within noise)
  nav.session.cold            332.5 -> 318.2  (-14.3 ms, within noise)
  nav.goal.ready               31.4 ->  32.3  (+0.9  ms, noise)
  nav.goal.cold              1752.8 -> 1753.2 (+0.4  ms, noise)
  paint.first                  23.3 ->  22.9  (-0.4  ms, noise)
  rapidnav.keystroke.cached   135.4 -> 126.0  (-9.4  ms, within noise)
  rapidnav.keystroke.uncached 136.6 -> 138.4  (+1.8  ms, noise)

Largest critical-span move is -14 ms (nav.session.cold), an order of
magnitude below the >=100 ms p50 threshold from HOW-TO-REPEAT section 5.
Every delta sits inside the run-to-run noise floor (off/on ranges
overlap on every row). No span crosses the 100 ms snappy threshold
under either arm. The report marks all four opt-g pair-rows as
"within noise".

Why the theoretical win didn't materialise: Opt-A already defers
off-screen messages behind an IntersectionObserver, so the bulk of
code-block density (which lives off-screen in realistic transcripts)
is already deferred. The eager-tail messages that render synchronously
are dominated by markdown / DOM layout cost, not hljs tokenisation --
paint.first p50 is flat +/-0.4 ms across arms. Opt-G stacked deferral
on deferral and ran out of marginal wins on the metric the harness
keys the decision on.

Reverts 5f0fdeb (Opt-G implementation). Keeps:
  - docs/perf/history/ea634d62dc3b-opt-g-{off,on}-{1..5}.json
    (10 history JSONs -- durable evidence behind the decision)
  - docs/perf/sidebar-nav-baseline.md section 6.6 -- postmortem
  - regenerated docs/perf/sidebar-nav-report.html (now 4 A/B pairs)

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>
SuuBro changed the title from "Profile sidebar nav perf — instrumentation, baseline, cross-commit report" to "[FOLLOW-UP — parked] Perf instrumentation, harness, cross-commit report" on May 14, 2026
…532b0

Co-authored-by: bobbit-ai <bobbit@bobbit.ai>

# Conflicts:
#	.gitignore