perf(sessions): coalesce each flush batch into one store append per notification run by KarloAldrete · Pull Request #3147 · PostHog/code

KarloAldrete · 2026-07-04T22:48:40Z

Problem

The 16ms flush introduced in #3063 batches processing, but not store writes: flushSessionEvents still hands its buffered events to handleSessionEvent one at a time, so a 15-event burst does 15 setState calls through the session store. Each write is O(transcript) — immer produces a new session map and every store subscriber gets notified — so during heavy streaming the store write path alone costs, measured on a real 30k-event transcript, ~248ms of main-thread time per streamed second (V8, see below). It scales linearly with transcript length, which is why long-running tasks get progressively more janky while streaming: components subscribed to the session (useSessionForTask callers) get re-render-notified ~900 times/second.

Changes

handleSessionEvents (new) applies one flush batch per task: runs of plain notifications — agent message chunks and tool-call updates, the bulk of a turn — are coalesced into a single appendEvents call, followed by one updatePromptStateFromEvents for the run (it already takes arrays).

Events carrying a JSON-RPC id (user-prompt echoes, stop-reason responses) still go through handleSessionEvent individually, unchanged, so turn-lifecycle semantics (optimistic replacement, currentPromptId matching, queue draining, prompt-complete notifications) are untouched.
Global order is preserved: the pending run is flushed before each id-carrying event is handled.
The notification side effects (config/usage/adapter/compaction updates) split out into applyNotificationEffects, shared verbatim by both paths.

Complements #3132 (which adds useSessionSelector to narrow the read side); this PR narrows the write side. No overlap in files.

How did you test this?

New unit tests in sessionEventBatching.test.ts: a 3-chunk burst produces exactly one appendEvents call in order; an id-carrying echo splits the run but preserves global transcript order; a stop-reason response batched together with chunks still fires notifyPromptComplete. (The harness updateSession mock now replaces the session object like the real store, instead of mutating in place — the stop-reason check depends on pre-update snapshots.)
Full suites: @posthog/core 1,972 passed, @posthog/ui sessions 317 passed; core typecheck and Biome clean.
Measured with the included scripts/measure-append-hotpath.ts (imports the real store module; run with tsx on V8): 60 flushes × 15 events over a seeded transcript, median of 7 runs:

Transcript	per-event (main)	coalesced (this PR)
10,000 events	20.9 ms/s	2.9 ms/s	~7x
30,000 events	248.1 ms/s	25.9 ms/s	~10x

Store subscriber notifications during streaming drop from ~900/s to ~60/s.

Honest caveats: numbers are store-write CPU measured in isolation (Node/V8), not end-to-end frame timings in the app; and a handwritten (immer-free) appendEvents was also tried and measured within noise of the immer version on the real module, so it was deliberately left out to keep the diff minimal.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

Created with PostHog Code

…otification run Every 16ms flush handed its buffered events to handleSessionEvent one at a time, so a 15-event burst did 15 setState calls through the session store — each one an O(transcript) walk (session-map copy + subscriber notification fan-out). On a 30k-event transcript that is ~248ms of main-thread time per second of heavy streaming, burned before React paints anything. handleSessionEvents now coalesces runs of plain notifications (agent message chunks, tool-call updates — the bulk of a turn) into a single appendEvents call per flush. Events carrying a JSON-RPC id (user-prompt echoes, stop-reason responses) still go through handleSessionEvent individually, so turn-lifecycle semantics are untouched; global order is preserved by flushing the pending run before each id-carrying event. The notification side effects (config/usage/adapter/compaction updates) split into applyNotificationEffects so both paths share them. Measured with scripts/measure-append-hotpath.ts (real store module, V8 via tsx): 60 flushes x 15 events over a 30k-event transcript drops from ~248ms to ~26ms per streamed second (~10x); store subscriber notifications drop from 900/s to 60/s. Generated-By: PostHog Code Task-Id: e3bcc511-15af-4f46-a3c4-7b29e65f026b

trunk-io · 2026-07-04T22:48:44Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

greptile-apps · 2026-07-04T22:54:07Z

_{Reviews (1): Last reviewed commit: "perf(sessions): coalesce each flush batc..." | Re-trigger Greptile}

greptile-apps · 2026-07-04T22:54:14Z

+  it("coalesces a run of plain notifications into one appendEvents call", () => {
+    const h = createHarness();
+
+    h.emit(chunk("a"));
+    h.emit(chunk("b"));
+    h.emit(chunk("c"));
+    vi.advanceTimersByTime(FLUSH_MS);
+
+    // One store write for the whole run — not one per event. Each write costs
+    // O(transcript) (map copy + subscriber fan-out), so this is the batching
+    // that keeps big transcripts smooth during streaming.
+    expect(h.appendEvents).toHaveBeenCalledTimes(1);
+    expect(h.appendEvents.mock.calls[0][1].map(chunkText)).toEqual([
+      "a",
+      "b",
+      "c",
+    ]);
+  });
+
+  it("an id-carrying event splits the run but preserves global order", () => {
+    const h = createHarness();
+
+    h.emit(chunk("a"));
+    h.emit(chunk("b"));
+    h.emit(promptEcho(7, "user steer-less prompt"));
+    h.emit(chunk("c"));
+    vi.advanceTimersByTime(FLUSH_MS);
+
+    // Two coalesced appends around the individually-handled echo.
+    expect(h.appendEvents).toHaveBeenCalledTimes(2);
+    expect(h.appendEvents.mock.calls[0][1].map(chunkText)).toEqual(["a", "b"]);
+    expect(h.appendEvents.mock.calls[1][1].map(chunkText)).toEqual(["c"]);
+    expect(h.events().map(chunkText)).toEqual(["a", "b", "c"]);
+  });


Prefer parameterised tests for the two coalescing cases

The team rule is to prefer parameterised tests. "coalesces a run of plain notifications" and "an id-carrying event splits the run" both exercise the same handleSessionEvents coalescing behaviour with different event sequences, producing different expected appendEvents call counts and groupings. They are strong candidates for a it.each table covering (events, expectedGroups) pairs, which would also make it easy to add edge cases (e.g. two consecutive id-carrying events, or a batch of a single notification) without new boilerplate createHarness blocks.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Done in 4abb864 — the two cases are now an it.each table of (emitted events → expected append groups) pairs, plus two new edge cases: consecutive id-carrying events (no empty appends) and a single-notification batch.

greptile-apps · 2026-07-04T22:54:15Z

+// Mide el hot path real de appendEvents (módulo real, sin réplicas):
+// 1 segundo de streaming pesado = 60 flushes x 15 eventos sobre un transcript
+// de N eventos. Correr en main y en la rama para comparar.
+//   bun scripts/measure-append-hotpath.ts


Spanish comments and output strings in an English codebase

The top-of-file comment (Mide el hot path real de appendEvents…), the output labels (per-evento x15, batcheado x1, mediana, ms/seg-streaming), and variable names inside console.log are all in Spanish, while the rest of the codebase uses English. A developer unfamiliar with Spanish reading the benchmark output won't immediately understand the columns. The PR description table already uses English equivalents ("per-event", "coalesced", "median").

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Done in 4abb864 — comments and output labels are now in English (matching the PR description's table wording).

…pt to English Review feedback: the coalescing tests now share an it.each table (and gained two edge cases: consecutive id-carrying events, single-notification batch), and the benchmark script's comments and output labels are in English. Generated-By: PostHog Code Task-Id: e3bcc511-15af-4f46-a3c4-7b29e65f026b

greptile-apps Bot reviewed Jul 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(sessions): coalesce each flush batch into one store append per notification run#3147

perf(sessions): coalesce each flush batch into one store append per notification run#3147
KarloAldrete wants to merge 2 commits into
PostHog:mainfrom
KarloAldrete:perf/batch-session-flush-appends

KarloAldrete commented Jul 4, 2026

Uh oh!

trunk-io Bot commented Jul 4, 2026

Uh oh!

greptile-apps Bot commented Jul 4, 2026

Uh oh!

greptile-apps Bot Jul 4, 2026

Uh oh!

KarloAldrete Jul 4, 2026

Uh oh!

greptile-apps Bot Jul 4, 2026

Uh oh!

KarloAldrete Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

KarloAldrete commented Jul 4, 2026

Problem

Changes

How did you test this?

Automatic notifications

Uh oh!

trunk-io Bot commented Jul 4, 2026

Uh oh!

greptile-apps Bot commented Jul 4, 2026

Uh oh!

greptile-apps Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

KarloAldrete Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

KarloAldrete Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant