
perf(chat): batch and pack SQLite writes for chat persistence#1686

Merged
threepointone merged 1 commit into main from perf/batch-pack-sqlite-writes on Jun 5, 2026

@threepointone
Contributor

@threepointone threepointone commented Jun 5, 2026

Summary

Reduces SQLite statements written, rows written, and rows scanned on replay across the chat-persistence paths in agents, @cloudflare/ai-chat, and @cloudflare/think.

The headline change is chunk packing in ResumableStream: instead of writing one SQLite row per streamed chunk, each buffer flush now writes a single packed row. Combined with batched deletes elsewhere, a normal turn drops from hundreds of single-row writes to a handful.

There are no user-facing behaviour changes beyond an internal, forward-compatible storage-format change for stream chunks (details below).

Motivation

Stream chunks were persisted one-INSERT-per-chunk, one-row-per-chunk. For a medium assistant reply (~250 chunks) that is ~250 statements and ~250 rows per turn, repeated for every turn. Rows-written and rows-scanned are the meaningful SQLite cost/perf metrics, so collapsing them is a direct win for cost, latency, and replay/reconstruction time.

What changed

1. Pack stream chunks into one row per flush (agents: ResumableStream)

  • flushBuffer() now writes one row per flush:
    • Multi-chunk segment → body is a JSON array of the chunk bodies.
    • Single-chunk segment → body stored unwrapped (legacy object shape), so a large lone chunk avoids JSON array-escaping inflation.
  • storeChunk() gains a per-segment byte cap (SEGMENT_MAX_BYTES = 512 KB): if appending a chunk would push the buffered segment over the cap, it flushes first. So a large chunk lands alone (unwrapped), and packed multi-chunk segments stay well under the 2 MB SQLite row limit even after re-escaping. The existing >1.8 MB per-chunk skip is unchanged.
  • All reads transparently unpack both packed segments and legacy per-chunk rows via unpackSegmentBody():
    • replayChunks, replayCompletedChunksByRequestId
    • getStreamChunks (now returns a running per-chunk index — stable across calls because rows are append-only)
  • chunk_index is now a per-segment ordering index; restore() resumes it past max(chunk_index).
  • Removed the now-unused multi-row INSERT helper (buildMultiRowInsertStrings) — packing supersedes it. Kept MAX_BOUND_PARAMS and buildInClauseStrings (used by the batched deletes), exported from agents/chat.
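The write path described above can be sketched as a small in-memory model. This is a sketch under stated assumptions, not the actual ResumableStream code: `SegmentWriter` and `SegmentRow` are hypothetical names, an array stands in for the SQLite chunk table, and each chunk body is assumed to be a JSON-object string. It models the 10-chunk flush cadence, the 512 KB segment cap, and restore()-style resumption of `chunk_index`; the existing >1.8 MB per-chunk skip is omitted.

```typescript
// Hypothetical in-memory model of per-flush packing; not the real ResumableStream.
const SEGMENT_MAX_BYTES = 512 * 1024; // per-segment cap quoted in the PR
const FLUSH_EVERY = 10; // flush cadence quoted in the PR

interface SegmentRow {
  chunkIndex: number; // per-segment ordering index (one per row, not per chunk)
  body: string; // a lone chunk body unwrapped, or a JSON array of chunk bodies
}

class SegmentWriter {
  readonly rows: SegmentRow[] = [];
  private buffer: string[] = [];
  private bufferedBytes = 0;
  private nextIndex: number;

  constructor(existing: SegmentRow[] = []) {
    this.rows.push(...existing);
    // Like restore(): resume the ordering index past max(chunk_index).
    this.nextIndex = existing.reduce((max, r) => Math.max(max, r.chunkIndex + 1), 0);
  }

  storeChunk(body: string): void {
    const size = new TextEncoder().encode(body).length;
    // If this chunk would push the buffered segment over the cap, flush first.
    // A chunk larger than the cap therefore always ends up alone in its own row.
    if (this.buffer.length > 0 && this.bufferedBytes + size > SEGMENT_MAX_BYTES) {
      this.flushBuffer();
    }
    this.buffer.push(body);
    this.bufferedBytes += size;
    if (this.buffer.length >= FLUSH_EVERY) this.flushBuffer();
  }

  flushBuffer(): void {
    if (this.buffer.length === 0) return;
    // One row per flush. A single chunk is stored unwrapped, avoiding the
    // escaping overhead of nesting it inside a JSON array of strings.
    const body = this.buffer.length === 1 ? this.buffer[0] : JSON.stringify(this.buffer);
    this.rows.push({ chunkIndex: this.nextIndex++, body });
    this.buffer = [];
    this.bufferedBytes = 0;
  }
}
```

In this model, writing 45 small chunks and then calling `flushBuffer()` yields 5 rows (four packed rows of 10 and one of 5), in line with the "45 chunks → 5 rows" test case listed under Tests.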

2. Fix agent-as-tool forwarding for the packed format (@cloudflare/ai-chat)

_getAgentToolStoredChunks() previously read the chunk table raw and filtered on chunk_index. Under packing, body became a packed array and chunk_index became a segment index — breaking tailing. It now delegates to the unpacking getStreamChunks(), preserving the exact per-chunk sequence semantics that align with the in-memory live counter (_agentToolLiveSequences), so a tailing parent transitions from stored replay to live forwarding without gaps or duplicates.
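A minimal sketch of the read side this relies on, under assumptions: chunk bodies are JSON-object strings, packed rows hold a JSON array of those strings, and `getStreamChunks` / `storedChunksAfter` here are illustrative stand-ins rather than the library's actual signatures. Rows are expanded in `chunk_index` order and numbered with a running per-chunk counter, so a cursor over that counter behaves the same for packed and legacy rows.

```typescript
// Illustrative stand-ins; the real getStreamChunks signature may differ.
interface SegmentRow {
  chunkIndex: number; // per-segment ordering index
  body: string; // unwrapped chunk body, or JSON array of chunk bodies
}

interface IndexedChunk {
  index: number; // running per-chunk sequence across all segments
  body: string;
}

// Expand one row into its chunk bodies (packed array or legacy single chunk).
function unpackSegmentBody(body: string): string[] {
  const parsed: unknown = JSON.parse(body);
  return Array.isArray(parsed) ? parsed.map((b) => String(b)) : [body];
}

function getStreamChunks(rows: readonly SegmentRow[]): IndexedChunk[] {
  const ordered = [...rows].sort((a, b) => a.chunkIndex - b.chunkIndex);
  const chunks: IndexedChunk[] = [];
  let index = 0;
  for (const row of ordered) {
    for (const body of unpackSegmentBody(row.body)) {
      chunks.push({ index: index++, body });
    }
  }
  return chunks;
}

// Stored chunks a tailing parent has not yet seen. Because rows are append-only,
// the running index is stable across calls, so the cursor neither skips nor repeats.
function storedChunksAfter(rows: readonly SegmentRow[], afterSequence: number): IndexedChunk[] {
  return getStreamChunks(rows).filter((c) => c.index > afterSequence);
}
```

In this model, live forwarding would pick up at index `chunks.length` once stored replay is exhausted, which is presumably what aligning with `_agentToolLiveSequences` guarantees.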

3. Batched deletes

  • @cloudflare/ai-chat: stale-row pruning and maxPersistedMessages enforcement now delete via batched DELETE ... WHERE id IN (...) (capped at 100 bound params/query) instead of one DELETE per row.
  • @cloudflare/think: deleteSubmissions() cleanup now uses batched DELETE ... WHERE submission_id IN (...).
  • @cloudflare/ai-chat & @cloudflare/think: the chat-recovery incident TTL sweep now deletes via batched storage.delete(keys) (≤128 keys/call), which also re-enables Durable Object write-coalescing (previously defeated by per-key awaited deletes).
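The batching pattern can be sketched as below. `chunk` and `buildBatchedDeletes` are hypothetical helpers (the PR's own `MAX_BOUND_PARAMS` and `buildInClauseStrings` are not shown here, so their signatures are not assumed); the caps of 100 bound parameters per query and 128 keys per `storage.delete()` call are taken from the list above.

```typescript
const MAX_PARAMS_PER_DELETE = 100; // bound-param cap per DELETE, per the PR
const MAX_KEYS_PER_STORAGE_DELETE = 128; // keys per storage.delete(keys) call

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface DeleteStatement {
  sql: string;
  params: string[];
}

// One DELETE ... WHERE col IN (?, ?, ...) per batch instead of one per row.
// `table` and `column` are interpolated, so they must be trusted identifiers;
// the ids themselves are always passed as bound parameters.
function buildBatchedDeletes(table: string, column: string, ids: readonly string[]): DeleteStatement[] {
  return chunk(ids, MAX_PARAMS_PER_DELETE).map((batch) => ({
    sql: `DELETE FROM ${table} WHERE ${column} IN (${batch.map(() => "?").join(", ")})`,
    params: batch,
  }));
}
```

Inside a Durable Object, each statement could then be run with `this.ctx.storage.sql.exec(stmt.sql, ...stmt.params)`, and KV keys removed with `await this.ctx.storage.delete(batch)` for each `batch` of `chunk(keys, MAX_KEYS_PER_STORAGE_DELETE)`. For N ids the statement count falls from N to ⌈N / 100⌉.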

Estimated savings (per turn)

Flush cadence is every 10 chunks (or the byte cap), so rows ≈ ⌈chunks / 10⌉.

| Reply size | Chunks | INSERT statements (before → after) | Rows written (before → after) |
|---|---|---|---|
| Short (~60) | ~70 | 70 → 7 | 70 → 7 |
| Medium (~250) | ~260 | 260 → 26 | 260 → 26 |
| Long (~800) | ~810 | 810 → 81 | 810 → 81 |

~90% fewer chunk INSERT statements and ~90% fewer chunk rows written per turn (≈ 0.9 × chunks saved on each). Replay and orphan reconstruction likewise scan ~10× fewer rows. A clean turn does no chunk reads, so read savings apply only per reconnect/resume and during agent-as-tool tailing.
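The row counts in the table follow from one packed row per flush of up to 10 chunks. A quick check, ignoring the 512 KB byte cap (which only adds rows when a segment would otherwise exceed it):

```typescript
const FLUSH_EVERY = 10; // chunks per flush, per the PR

// One packed row per flush; a final partial flush still costs a row.
function packedRows(chunks: number): number {
  return Math.ceil(chunks / FLUSH_EVERY);
}

// Fraction of chunk rows (and INSERT statements) saved versus one row per chunk.
function savedFraction(chunks: number): number {
  return 1 - packedRows(chunks) / chunks;
}

const turns: Array<[label: string, chunks: number]> = [
  ["Short", 70],
  ["Medium", 260],
  ["Long", 810],
];
for (const [label, chunks] of turns) {
  const pct = (savedFraction(chunks) * 100).toFixed(0);
  console.log(`${label}: ${chunks} -> ${packedRows(chunks)} rows (${pct}% fewer)`);
}
```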

Compatibility

  • Forward-compatible: new code reads existing legacy per-chunk rows (unpackSegmentBody handles both shapes). Verified by tests that seed legacy rows.
  • Rollback caveat: old code cannot interpret packed rows, so a rollback would mis-replay only the streams in-flight at rollback time. Stream chunks are ephemeral (24h TTL) and chat recovery papers over it. Accepted by design (discussed and deemed acceptable for ephemeral data).
  • Durability: unchanged flush cadence; a flush is now a single-row INSERT (marginally more atomic than the previous multi-row write).
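The compatibility trade-off can be illustrated with a small model. This sketch assumes, as elsewhere, that chunk bodies are JSON-object strings and that packed rows are JSON arrays of them; `unpackSegmentBody` mirrors the behaviour described above, and `legacyReplay` is a hypothetical stand-in for pre-change code that treated each row as exactly one chunk.

```typescript
// New reader: accepts both legacy per-chunk rows and packed segments.
function unpackSegmentBody(body: string): unknown[] {
  const parsed: unknown = JSON.parse(body);
  if (Array.isArray(parsed)) {
    // Packed segment: each element is one chunk body (a JSON string).
    return parsed.map((b) => JSON.parse(String(b)) as unknown);
  }
  return [parsed]; // Legacy row: the body is a single chunk object.
}

function replay(rows: readonly string[]): unknown[] {
  const chunks: unknown[] = [];
  for (const body of rows) chunks.push(...unpackSegmentBody(body));
  return chunks;
}

// Hypothetical old reader: one chunk per row, no unpacking.
function legacyReplay(rows: readonly string[]): unknown[] {
  return rows.map((body) => JSON.parse(body) as unknown);
}
```

Given the same two chunks stored either as two legacy rows or as one packed row, `replay` returns identical results, while `legacyReplay` on the packed row returns a single array of strings instead of two chunk objects. That mis-replay is the rollback caveat, and it only affects streams still in flight since chunks expire after 24h.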

Tests

  • resumable-streaming: packing into fewer rows (45 chunks → 5 rows), single-flush packing, single unwrapped row, byte-cap splitting a large chunk into its own row, and backward-compat reads of legacy per-chunk rows. Added getStreamChunkRowCount + insertLegacyChunkRows test helpers.
  • agent-tools: assert forwarded chunks are individual (non-array) events with contiguous per-chunk sequences and a correct afterSequence cursor (laterChunks === chunks.slice(1)).
  • think-session / worker fixtures: updated to read via the unpacking getStreamChunks.
  • Chat-recovery suites exercised against the packed format (orphan reconstruction, settled-tool-result durability boundary, persist/no-persist, retry-vs-continue, fiber recovery).

Verification

  • pnpm run check — sherif, export checks, oxfmt, oxlint, and typecheck (92 projects): green.
  • Tests — @cloudflare/ai-chat 633, @cloudflare/think 549, agents chat 231; recovery-focused: ai-chat 59, agents 5.

Changeset

.changeset/batch-stream-chunk-writes.md — patch bumps for agents, @cloudflare/ai-chat, @cloudflare/think.



Cost impact (Durable Objects pricing)

This change targets rows written (chunk packing ~90% fewer, plus batched deletes) and rows read (~90% fewer rows scanned on replay/reconstruction). It does not materially change duration or request billing.

Why it matters: LLM streaming emits hundreds of tiny chunks per turn, each previously its own row write at $1.00 / M rows. Duration, by contrast, costs $12.50 / M GB-s, but a turn uses only a couple of GB-s (1.875 GB-s in the worked example below), so for chat agents row writes dominate the bill (~10× duration).

Worked "medium" turn (~250-chunk reply, ~15 s active streaming, hibernates between turns):

| Dimension | Before | After | Cost before | Cost after |
|---|---|---|---|---|
| Duration (0.125 GB × 15 s = 1.875 GB-s) | 1.875 GB-s | ~1.875 GB-s | ~23.4 µ$ | ~23.4 µ$ |
| Rows written (chunks ~250 + msgs/meta ~30) | ~280 | ~40 | ~280 µ$ | ~40 µ$ |
| Total | | | ~304 µ$ | ~64 µ$ |

79% lower per-turn cost for a streaming-dominant turn (the dominant write term drops ~85%; unchanged duration is the floor).
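The per-turn figures in the table above can be re-derived from the quoted rates ($12.50 per million GB-s of duration, $1.00 per million rows written). Because both rates are per million units, dollars per million is numerically the same as micro-dollars (µ$) per unit. The row counts (~280 before, ~40 after) are the PR's own estimates and are taken as given.

```typescript
const USD_PER_M_GB_S = 12.5; // duration rate quoted above
const USD_PER_M_ROWS_WRITTEN = 1.0; // rows-written rate quoted above

// $ per million units is the same number as µ$ per unit.
function turnCostMicroUsd(gbSeconds: number, rowsWritten: number): number {
  return gbSeconds * USD_PER_M_GB_S + rowsWritten * USD_PER_M_ROWS_WRITTEN;
}

const gbSeconds = 0.125 * 15; // 128 MB for ~15 s of active streaming
const before = turnCostMicroUsd(gbSeconds, 280);
const after = turnCostMicroUsd(gbSeconds, 40);
const reduction = 1 - after / before;

console.log(before.toFixed(1), after.toFixed(1), `${(reduction * 100).toFixed(0)}%`);
```

This gives roughly 303 µ$ before and 63 µ$ after, a ~79% reduction, which matches the table within rounding.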

Overall savings are workload-dependent:

  • Chat/streaming-dominant (hibernates between turns — i.e. AIChatAgent/think chat): ~60–80% off DO compute + storage-write cost.
  • Mixed/typical agent: ~30–50%.
  • Compute/tool-heavy (long wall-clock dominates duration): ~10–25% (duration is unchanged by this PR).

Scale "cliff": the included allowance is 50 M rows written/month. At ~1 M turns/month, row writes drop from ~280 M rows (≈ $230/mo billable beyond the allowance) to ~40 M rows, which is below the included allowance, i.e. potentially ~$0.
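The allowance arithmetic, under the same assumptions (about 1 M turns/month, ~280 vs ~40 rows written per turn, 50 M rows written included, $1.00 per million beyond that), can be checked as follows:

```typescript
const INCLUDED_ROWS_WRITTEN = 50_000_000; // monthly allowance quoted above
const USD_PER_M_ROWS_WRITTEN = 1.0; // rows-written rate quoted above

// Only rows above the monthly allowance are billed.
function monthlyRowWriteCostUsd(turnsPerMonth: number, rowsPerTurn: number): number {
  const rows = turnsPerMonth * rowsPerTurn;
  const billableRows = Math.max(0, rows - INCLUDED_ROWS_WRITTEN);
  return (billableRows / 1_000_000) * USD_PER_M_ROWS_WRITTEN;
}

const beforeUsd = monthlyRowWriteCostUsd(1_000_000, 280);
const afterUsd = monthlyRowWriteCostUsd(1_000_000, 40);
console.log(`before: $${beforeUsd}/mo, after: $${afterUsd}/mo`);
```

Because billing starts only above the allowance, the result is sensitive to the per-turn row estimate near the threshold, which at 1 M turns/month sits at 50 rows per turn.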

Caveats: hibernation behaviour is unchanged (idle-between-turns is already free); the duration win from 260→26 INSERTs/turn is negligible (<1%); rows-read savings are real but financially tiny at $0.001/M (the win there is latency, not cost). All figures are order-of-magnitude estimates anchored to published Workers Paid rates; SQLite storage billing has been in effect since Jan 7, 2026.

Reduces SQL statements, rows written, and rows scanned on replay across
the chat-persistence paths in agents, @cloudflare/ai-chat, and
@cloudflare/think. No user-facing behaviour changes beyond an internal,
forward-compatible storage-format change for stream chunks.

ResumableStream — pack stream chunks (agents)
- flushBuffer() now writes ONE row per flush instead of one row per chunk.
  A multi-chunk segment is stored as a JSON array of chunk bodies; a
  single-chunk segment is stored unwrapped (legacy object shape) so large
  chunks avoid array-escaping inflation.
- storeChunk() gains a per-segment byte cap (SEGMENT_MAX_BYTES = 512 KB):
  if adding a chunk would push the buffered segment over the cap, it flushes
  first, so a large chunk lands alone (unwrapped) and packed segments stay
  well under the 2 MB SQLite row limit even after JSON re-escaping. The
  existing >1.8 MB per-chunk skip is unchanged.
- All reads transparently unpack both packed segments and legacy per-chunk
  rows via unpackSegmentBody(): replayChunks, replayCompletedChunksByRequestId,
  and getStreamChunks (which now returns a running per-chunk index, stable
  across calls because rows are append-only).
- chunk_index is now a per-segment ordering index; restore() resumes it past
  max(chunk_index). Removed the now-unused multi-row INSERT machinery
  (buildMultiRowInsertStrings) from sql-batch.ts and the agents/chat barrel.

Net effect: ~10x fewer chunk INSERT statements AND ~10x fewer chunk rows
written per turn; replay/reconstruction scan ~10x fewer rows.

agent-as-tool forwarding fix (ai-chat)
- _getAgentToolStoredChunks() previously read the chunk table raw and filtered
  on chunk_index. With packing, body became a packed array and chunk_index
  became a segment index, breaking tailing. It now delegates to the unpacking
  getStreamChunks(), preserving the exact per-chunk sequence semantics that
  align with the in-memory live counter (_agentToolLiveSequences) so a tailing
  parent transitions from stored replay to live without gaps or duplicates.

Batched deletes
- ai-chat: stale-row pruning and maxPersistedMessages enforcement now delete
  via batched DELETE ... WHERE id IN (...) (capped at 100 bound params).
- think: deleteSubmissions() cleanup now uses batched
  DELETE ... WHERE submission_id IN (...).
- ai-chat & think: chat-recovery incident TTL sweep now deletes via batched
  storage.delete(keys) (<=128 keys/call), re-enabling DO write-coalescing.

Shared helpers (agents/chat)
- Export MAX_BOUND_PARAMS and buildInClauseStrings from sql-batch.ts.

Tests
- resumable-streaming: packing into fewer rows, single-flush packing,
  single unwrapped row, byte-cap splitting large chunks into their own row,
  and backward-compat reads of legacy per-chunk rows (+ getStreamChunkRowCount
  and insertLegacyChunkRows test helpers).
- agent-tools: assert forwarded chunks are individual (non-array) events with
  contiguous per-chunk sequences and a correct afterSequence cursor.
- think-session / worker fixtures updated to read via the unpacking
  getStreamChunks.

Compatibility
- Forward-compatible: new code reads existing legacy rows. Rollback is not
  backward-compatible for streams in-flight at rollback time (old code cannot
  interpret packed rows); chunks are ephemeral (24h TTL) and recovery papers
  over it. Accepted by design.

Verification: npm run check (sherif, export checks, oxfmt, oxlint, typecheck
across 92 projects) green; tests pass — ai-chat 633, think 549, agents chat
231, plus recovery suites (ai-chat 59, agents 5).
@changeset-bot

changeset-bot Bot commented Jun 5, 2026

🦋 Changeset detected

Latest commit: 9e8dc8b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
|---|---|
| agents | Patch |
| @cloudflare/ai-chat | Patch |
| @cloudflare/think | Patch |


@pkg-pr-new

pkg-pr-new Bot commented Jun 5, 2026


agents

npm i https://pkg.pr.new/agents@1686

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1686

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1686

hono-agents

npm i https://pkg.pr.new/hono-agents@1686

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1686

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1686

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1686

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1686

commit: 9e8dc8b


@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


@threepointone threepointone merged commit 1e49880 into main Jun 5, 2026
5 checks passed
@threepointone threepointone deleted the perf/batch-pack-sqlite-writes branch June 5, 2026 22:25
@github-actions github-actions Bot mentioned this pull request Jun 5, 2026