Skip to content

perf(sessions): coalesce each flush batch into one store append per notification run#3147

Open
KarloAldrete wants to merge 2 commits into
PostHog:mainfrom
KarloAldrete:perf/batch-session-flush-appends
Open

perf(sessions): coalesce each flush batch into one store append per notification run#3147
KarloAldrete wants to merge 2 commits into
PostHog:mainfrom
KarloAldrete:perf/batch-session-flush-appends

Conversation

@KarloAldrete

Copy link
Copy Markdown
Contributor

Problem

The 16ms flush introduced in #3063 batches processing, but not store writes: flushSessionEvents still hands its buffered events to handleSessionEvent one at a time, so a 15-event burst does 15 setState calls through the session store. Each write is O(transcript) — immer produces a new session map and every store subscriber gets notified — so during heavy streaming the store write path alone costs, measured on a real 30k-event transcript, ~248ms of main-thread time per streamed second (V8, see below). It scales linearly with transcript length, which is why long-running tasks get progressively more janky while streaming: components subscribed to the session (useSessionForTask callers) get re-render-notified ~900 times/second.

Changes

handleSessionEvents (new) applies one flush batch per task: runs of plain notifications — agent message chunks and tool-call updates, the bulk of a turn — are coalesced into a single appendEvents call, followed by one updatePromptStateFromEvents for the run (it already takes arrays).

  • Events carrying a JSON-RPC id (user-prompt echoes, stop-reason responses) still go through handleSessionEvent individually, unchanged, so turn-lifecycle semantics (optimistic replacement, currentPromptId matching, queue draining, prompt-complete notifications) are untouched.
  • Global order is preserved: the pending run is flushed before each id-carrying event is handled.
  • The notification side effects (config/usage/adapter/compaction updates) split out into applyNotificationEffects, shared verbatim by both paths.

Complements #3132 (which adds useSessionSelector to narrow the read side); this PR narrows the write side. No overlap in files.

How did you test this?

  • New unit tests in sessionEventBatching.test.ts: a 3-chunk burst produces exactly one appendEvents call in order; an id-carrying echo splits the run but preserves global transcript order; a stop-reason response batched together with chunks still fires notifyPromptComplete. (The harness updateSession mock now replaces the session object like the real store, instead of mutating in place — the stop-reason check depends on pre-update snapshots.)
  • Full suites: @posthog/core 1,972 passed, @posthog/ui sessions 317 passed; core typecheck and Biome clean.
  • Measured with the included scripts/measure-append-hotpath.ts (imports the real store module; run with tsx on V8): 60 flushes × 15 events over a seeded transcript, median of 7 runs:
Transcript per-event (main) coalesced (this PR)
10,000 events 20.9 ms/s 2.9 ms/s ~7x
30,000 events 248.1 ms/s 25.9 ms/s ~10x

Store subscriber notifications during streaming drop from ~900/s to ~60/s.

Honest caveats: numbers are store-write CPU measured in isolation (Node/V8), not end-to-end frame timings in the app; and a handwritten (immer-free) appendEvents was also tried and measured within noise of the immer version on the real module, so it was deliberately left out to keep the diff minimal.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Created with PostHog Code

…otification run

Every 16ms flush handed its buffered events to handleSessionEvent one at a
time, so a 15-event burst did 15 setState calls through the session store —
each one an O(transcript) walk (session-map copy + subscriber notification
fan-out). On a 30k-event transcript that is ~248ms of main-thread time per
second of heavy streaming, burned before React paints anything.

handleSessionEvents now coalesces runs of plain notifications (agent message
chunks, tool-call updates — the bulk of a turn) into a single appendEvents
call per flush. Events carrying a JSON-RPC id (user-prompt echoes,
stop-reason responses) still go through handleSessionEvent individually, so
turn-lifecycle semantics are untouched; global order is preserved by flushing
the pending run before each id-carrying event. The notification side effects
(config/usage/adapter/compaction updates) split into applyNotificationEffects
so both paths share them.

Measured with scripts/measure-append-hotpath.ts (real store module, V8 via
tsx): 60 flushes x 15 events over a 30k-event transcript drops from ~248ms to
~26ms per streamed second (~10x); store subscriber notifications drop from
900/s to 60/s.

Generated-By: PostHog Code
Task-Id: e3bcc511-15af-4f46-a3c4-7b29e65f026b
@trunk-io

trunk-io Bot commented Jul 4, 2026

Copy link
Copy Markdown

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@greptile-apps

greptile-apps Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Reviews (1): Last reviewed commit: "perf(sessions): coalesce each flush batc..." | Re-trigger Greptile

Comment on lines +186 to +219
it("coalesces a run of plain notifications into one appendEvents call", () => {
const h = createHarness();

h.emit(chunk("a"));
h.emit(chunk("b"));
h.emit(chunk("c"));
vi.advanceTimersByTime(FLUSH_MS);

// One store write for the whole run — not one per event. Each write costs
// O(transcript) (map copy + subscriber fan-out), so this is the batching
// that keeps big transcripts smooth during streaming.
expect(h.appendEvents).toHaveBeenCalledTimes(1);
expect(h.appendEvents.mock.calls[0][1].map(chunkText)).toEqual([
"a",
"b",
"c",
]);
});

it("an id-carrying event splits the run but preserves global order", () => {
const h = createHarness();

h.emit(chunk("a"));
h.emit(chunk("b"));
h.emit(promptEcho(7, "user steer-less prompt"));
h.emit(chunk("c"));
vi.advanceTimersByTime(FLUSH_MS);

// Two coalesced appends around the individually-handled echo.
expect(h.appendEvents).toHaveBeenCalledTimes(2);
expect(h.appendEvents.mock.calls[0][1].map(chunkText)).toEqual(["a", "b"]);
expect(h.appendEvents.mock.calls[1][1].map(chunkText)).toEqual(["c"]);
expect(h.events().map(chunkText)).toEqual(["a", "b", "c"]);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Prefer parameterised tests for the two coalescing cases

The team rule is to prefer parameterised tests. "coalesces a run of plain notifications" and "an id-carrying event splits the run" both exercise the same handleSessionEvents coalescing behaviour with different event sequences, producing different expected appendEvents call counts and groupings. They are strong candidates for a it.each table covering (events, expectedGroups) pairs, which would also make it easy to add edge cases (e.g. two consecutive id-carrying events, or a batch of a single notification) without new boilerplate createHarness blocks.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4abb864 — the two cases are now an it.each table of (emitted events → expected append groups) pairs, plus two new edge cases: consecutive id-carrying events (no empty appends) and a single-notification batch.

Comment thread scripts/measure-append-hotpath.ts Outdated
Comment on lines +1 to +4
// Mide el hot path real de appendEvents (módulo real, sin réplicas):
// 1 segundo de streaming pesado = 60 flushes x 15 eventos sobre un transcript
// de N eventos. Correr en main y en la rama para comparar.
// bun scripts/measure-append-hotpath.ts

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Spanish comments and output strings in an English codebase

The top-of-file comment (Mide el hot path real de appendEvents…), the output labels (per-evento x15, batcheado x1, mediana, ms/seg-streaming), and variable names inside console.log are all in Spanish, while the rest of the codebase uses English. A developer unfamiliar with Spanish reading the benchmark output won't immediately understand the columns. The PR description table already uses English equivalents ("per-event", "coalesced", "median").

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4abb864 — comments and output labels are now in English (matching the PR description's table wording).

…pt to English

Review feedback: the coalescing tests now share an it.each table (and gained
two edge cases: consecutive id-carrying events, single-notification batch),
and the benchmark script's comments and output labels are in English.

Generated-By: PostHog Code
Task-Id: e3bcc511-15af-4f46-a3c4-7b29e65f026b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant