perf(chat): batch and pack SQLite writes for chat persistence by threepointone · Pull Request #1686 · cloudflare/agents

threepointone · 2026-06-05T22:06:51Z

Summary

Reduces SQLite statements written, rows written, and rows scanned on replay across the chat-persistence paths in agents, @cloudflare/ai-chat, and @cloudflare/think.

The headline change is chunk packing in ResumableStream: instead of writing one SQLite row per streamed chunk, each buffer flush now writes a single packed row. Combined with batched deletes elsewhere, a typical turn drops from hundreds of single-row chunk writes to roughly a tenth as many packed rows.

There are no user-facing behaviour changes beyond an internal, forward-compatible storage-format change for stream chunks (details below).

Motivation

Stream chunks were persisted one-INSERT-per-chunk, one-row-per-chunk. For a medium assistant reply (~250 chunks) that is ~250 statements and ~250 rows per turn, repeated for every turn. Rows-written and rows-scanned are the meaningful SQLite cost/perf metrics, so collapsing them is a direct win for cost, latency, and replay/reconstruction time.

What changed

1. Pack stream chunks into one row per flush (`agents` — `ResumableStream`)

- flushBuffer() now writes one row per flush:
  - Multi-chunk segment → body is a JSON array of the chunk bodies.
  - Single-chunk segment → body stored unwrapped (legacy object shape), so a large lone chunk avoids JSON array-escaping inflation.
- storeChunk() gains a per-segment byte cap (SEGMENT_MAX_BYTES = 512 KB): if appending a chunk would push the buffered segment over the cap, it flushes first. So a large chunk lands alone (unwrapped), and packed multi-chunk segments stay well under the 2 MB SQLite row limit even after re-escaping. The existing >1.8 MB per-chunk skip is unchanged.
- All reads transparently unpack both packed segments and legacy per-chunk rows via unpackSegmentBody():
  - replayChunks, replayCompletedChunksByRequestId
  - getStreamChunks (now returns a running per-chunk index, stable across calls because rows are append-only)
- chunk_index is now a per-segment ordering index; restore() resumes it past max(chunk_index).
- Removed the now-unused multi-row INSERT helper (buildMultiRowInsertStrings); packing supersedes it. Kept MAX_BOUND_PARAMS and buildInClauseStrings (used by the batched deletes), exported from agents/chat.
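A minimal sketch of the packing scheme described above, assuming each chunk body is a JSON object serialized to a string (so only a packed segment can start with `[`). `packSegmentBody`, `SegmentBuffer`, and the `writeRow` callback are hypothetical names; only `unpackSegmentBody`, the 512 KB cap, and the 10-chunk cadence come from the description, and the real signatures may differ. The >1.8 MB per-chunk skip is not modeled.

```typescript
// Hypothetical sketch of ResumableStream-style chunk packing; not the library's code.
const SEGMENT_MAX_BYTES = 512 * 1024; // per-segment byte cap quoted in the PR
const FLUSH_EVERY_CHUNKS = 10; // flush cadence quoted in the PR
const encoder = new TextEncoder();

// One row body per segment: a lone chunk stays unwrapped (legacy shape),
// several chunks become a JSON array of the original body strings.
function packSegmentBody(bodies: readonly string[]): string {
  if (bodies.length === 0) throw new Error("cannot pack an empty segment");
  return bodies.length === 1 ? bodies[0] : JSON.stringify(bodies);
}

// Reads both packed segments and legacy per-chunk rows.
// Assumption: chunk bodies are JSON objects, so only packed rows start with "[".
function unpackSegmentBody(body: string): string[] {
  if (!body.trimStart().startsWith("[")) return [body];
  const parsed: unknown = JSON.parse(body);
  if (!Array.isArray(parsed) || !parsed.every((b) => typeof b === "string")) {
    throw new Error("malformed packed segment");
  }
  return parsed as string[];
}

class SegmentBuffer {
  private bodies: string[] = [];
  private bytes = 0;
  private nextSegmentIndex: number;
  private readonly writeRow: (segmentIndex: number, body: string) => void;

  // writeRow stands in for the single-row INSERT; startIndex mirrors
  // restore() resuming past max(chunk_index).
  constructor(writeRow: (segmentIndex: number, body: string) => void, startIndex = 0) {
    this.writeRow = writeRow;
    this.nextSegmentIndex = startIndex;
  }

  storeChunk(body: string): void {
    const size = encoder.encode(body).byteLength;
    // Flush first if this chunk would push the segment over the cap,
    // so an oversized chunk ends up alone in its own (unwrapped) row.
    if (this.bodies.length > 0 && this.bytes + size > SEGMENT_MAX_BYTES) this.flush();
    this.bodies.push(body);
    this.bytes += size;
    if (this.bodies.length >= FLUSH_EVERY_CHUNKS) this.flush();
  }

  flush(): void {
    if (this.bodies.length === 0) return;
    this.writeRow(this.nextSegmentIndex++, packSegmentBody(this.bodies));
    this.bodies = [];
    this.bytes = 0;
  }
}
```

With this shape, 45 small chunks produce 5 rows (four full segments of 10 plus a final flush of 5), matching the packing test described under Tests.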

2. Fix agent-as-tool forwarding for the packed format (`@cloudflare/ai-chat`)

_getAgentToolStoredChunks() previously read the chunk table raw and filtered on chunk_index. Under packing, body became a packed array and chunk_index became a segment index — breaking tailing. It now delegates to the unpacking getStreamChunks(), preserving the exact per-chunk sequence semantics that align with the in-memory live counter (_agentToolLiveSequences), so a tailing parent transitions from stored replay to live forwarding without gaps or duplicates.
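To illustrate why tailing needs per-chunk sequences rather than the raw `chunk_index`, here is a minimal sketch that flattens segment rows into sequenced chunks and applies an `afterSequence` cursor. `SegmentRow`, `SequencedChunk`, and `readChunksAfter` are hypothetical stand-ins for the real table row and `getStreamChunks()`, and the unpacking rule assumes every chunk body is a JSON object string.

```typescript
// Hypothetical shapes; the real table/column layout may differ.
interface SegmentRow {
  chunk_index: number; // per-segment ordering index after packing
  body: string; // packed JSON array, or one unwrapped (legacy) chunk body
}

interface SequencedChunk {
  sequence: number; // running per-chunk index, stable because rows are append-only
  body: string;
}

// Assumed unpacking rule: only packed segments start with "[".
function unpackSegmentBody(body: string): string[] {
  return body.trimStart().startsWith("[") ? (JSON.parse(body) as string[]) : [body];
}

// Flatten segments into per-chunk events and keep only those after the cursor,
// so a tailing reader can hand off to a live counter without gaps or duplicates.
function readChunksAfter(rows: readonly SegmentRow[], afterSequence = -1): SequencedChunk[] {
  const ordered = rows.slice().sort((a, b) => a.chunk_index - b.chunk_index);
  const result: SequencedChunk[] = [];
  let sequence = 0;
  for (const row of ordered) {
    for (const body of unpackSegmentBody(row.body)) {
      if (sequence > afterSequence) result.push({ sequence, body });
      sequence++;
    }
  }
  return result;
}
```

Because the sequence counts chunks rather than rows, a mix of legacy single-chunk rows and packed rows yields one contiguous sequence, which is the property the agent-tools tests assert.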

3. Batched deletes

- @cloudflare/ai-chat: stale-row pruning and maxPersistedMessages enforcement now delete via batched DELETE ... WHERE id IN (...) (capped at 100 bound params/query) instead of one DELETE per row.
- @cloudflare/think: deleteSubmissions() cleanup now uses batched DELETE ... WHERE submission_id IN (...).
- @cloudflare/ai-chat & @cloudflare/think: the chat-recovery incident TTL sweep now deletes via batched storage.delete(keys) (≤128 keys/call), which also re-enables Durable Object write-coalescing (previously defeated by per-key awaited deletes).
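A minimal sketch of the batched-delete pattern, assuming string IDs and a trusted, caller-supplied table and column name. `buildBatchedDeletes` and `batches` are hypothetical helpers; the real `buildInClauseStrings` exported from `agents/chat` may have a different signature.

```typescript
// Hypothetical helper; only the 100-param cap comes from the PR description.
const MAX_BOUND_PARAMS = 100;

interface BoundStatement {
  sql: string;
  params: string[];
}

// Split a list into consecutive batches of at most `size` items.
function batches<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// One DELETE ... WHERE <column> IN (?, ?, ...) per batch instead of one DELETE per row.
// `table` and `column` are interpolated into SQL, so they must be trusted identifiers.
function buildBatchedDeletes(table: string, column: string, ids: readonly string[]): BoundStatement[] {
  return batches(ids, MAX_BOUND_PARAMS).map((batch) => ({
    sql: `DELETE FROM ${table} WHERE ${column} IN (${batch.map(() => "?").join(", ")})`,
    params: batch,
  }));
}
```

Inside a Durable Object, each statement could be run with `ctx.storage.sql.exec(sql, ...params)`. The key-value TTL sweep has the same shape: pass `batches(keys, 128)` to `storage.delete(keys)`, which accepts at most 128 keys per call.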

Estimated savings (per turn)

Flush cadence is every 10 chunks (or the byte cap), so rows ≈ ⌈chunks / 10⌉.

| Reply size | Chunks | INSERT statements (before → after) | Rows written (before → after) |
|---|---|---|---|
| Short (~60) | ~70 | 70 → 7 | 70 → 7 |
| Medium (~250) | ~260 | 260 → 26 | 260 → 26 |
| Long (~800) | ~810 | 810 → 81 | 810 → 81 |

≈ 90% fewer chunk INSERT statements and ~90% fewer chunk rows written per turn (about 0.9 × chunks saved on each). Replay and orphan reconstruction likewise scan ~10× fewer rows. A clean turn does no chunk reads, so read savings apply only per reconnect/resume and during agent-as-tool tailing.
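The table follows directly from the flush cadence. A quick check of the arithmetic, ignoring byte-cap flushes (which only add rows when chunks are large); `chunkRows` is a hypothetical helper.

```typescript
// One row per chunk before; one packed row per 10-chunk flush after.
const FLUSH_EVERY_CHUNKS = 10;

function chunkRows(chunks: number): { before: number; after: number; saved: number } {
  const before = chunks; // one INSERT and one row per chunk
  const after = Math.ceil(chunks / FLUSH_EVERY_CHUNKS); // one packed row per flush
  return { before, after, saved: 1 - after / before };
}

for (const chunks of [70, 260, 810]) {
  const { before, after, saved } = chunkRows(chunks);
  console.log(`${chunks} chunks: ${before} -> ${after} rows (${(saved * 100).toFixed(0)}% fewer)`);
}
```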

Compatibility

- Forward-compatible: new code reads existing legacy per-chunk rows (unpackSegmentBody handles both shapes). Verified by tests that seed legacy rows.
- Rollback caveat: old code cannot interpret packed rows, so a rollback would mis-replay only the streams in flight at rollback time. Stream chunks are ephemeral (24h TTL) and chat recovery papers over the gap, so this was discussed and accepted by design.
- Durability: unchanged flush cadence; a flush is now a single-row INSERT (marginally more atomic than the previous multi-row write).

Tests

- resumable-streaming: packing into fewer rows (45 chunks → 5 rows), single-flush packing, single unwrapped row, byte-cap splitting a large chunk into its own row, and backward-compat reads of legacy per-chunk rows. Added getStreamChunkRowCount + insertLegacyChunkRows test helpers.
- agent-tools: assert forwarded chunks are individual (non-array) events with contiguous per-chunk sequences and a correct afterSequence cursor (laterChunks === chunks.slice(1)).
- think-session / worker fixtures: updated to read via the unpacking getStreamChunks.
- Chat-recovery suites exercised against the packed format (orphan reconstruction, settled-tool-result durability boundary, persist/no-persist, retry-vs-continue, fiber recovery).

Verification

- pnpm run check (sherif, export checks, oxfmt, oxlint, and typecheck across 92 projects): green.
- Tests passing: @cloudflare/ai-chat 633, @cloudflare/think 549, agents chat 231; recovery-focused: ai-chat 59, agents 5.

Changeset

.changeset/batch-stream-chunk-writes.md — patch bumps for agents, @cloudflare/ai-chat, @cloudflare/think.

Cost impact (Durable Objects pricing)

This change targets rows written (chunk packing ~90% fewer, plus batched deletes) and rows read (~90% fewer rows scanned on replay/reconstruction). It does not materially change duration or request billing.

Why it matters: LLM streaming emits hundreds of tiny chunks per turn, and each was previously its own row write at $1.00 / M rows. Duration costs $12.50 / M GB-s, but a streaming turn uses only a couple of GB-s (~1.9 GB-s in the worked example below), so for chat agents row writes dominate the bill (~10× duration).

Worked "medium" turn (~250-chunk reply, ~15 s active streaming, hibernates between turns):

| Dimension | Before | After | Cost before | Cost after |
|---|---|---|---|---|
| Duration (0.125 GB × 15 s = 1.875 GB-s) | 1.875 GB-s | ~1.875 GB-s | ~23.4 µ$ | ~23.4 µ$ |
| Rows written (chunks ~250 + msgs/meta ~30) | ~280 | ~40 | ~280 µ$ | ~40 µ$ |
| Total | | | ~304 µ$ | ~64 µ$ |

≈ 79% lower per-turn cost for a streaming-dominant turn (the dominant write term drops ~85%; unchanged duration is the floor).
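The worked example can be reproduced from the rates quoted above. This sketch hard-codes those quoted rates (they are not fetched from any pricing source) and the example's 0.125 GB, 15 s, and 280 → 40 row figures; `turnCostUsd` is a hypothetical helper that ignores request and rows-read charges.

```typescript
// Rates as quoted in this PR (Workers Paid); check current pricing before relying on them.
const DURATION_USD_PER_GB_S = 12.5 / 1e6;
const ROWS_WRITTEN_USD_PER_ROW = 1.0 / 1e6;

function turnCostUsd(memoryGb: number, activeSeconds: number, rowsWritten: number): number {
  const durationUsd = memoryGb * activeSeconds * DURATION_USD_PER_GB_S;
  const writesUsd = rowsWritten * ROWS_WRITTEN_USD_PER_ROW;
  return durationUsd + writesUsd;
}

const before = turnCostUsd(0.125, 15, 280);
const after = turnCostUsd(0.125, 15, 40);
const reductionPct = (1 - after / before) * 100;
console.log(`${(before * 1e6).toFixed(1)} µ$ -> ${(after * 1e6).toFixed(1)} µ$ (${reductionPct.toFixed(0)}% lower)`);
```

It prints about 303.4 µ$ before and 63.4 µ$ after (79% lower), in line with the table's ~304 µ$ and ~64 µ$ estimates.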

Overall savings are workload-dependent:

- Chat/streaming-dominant (hibernates between turns, i.e. AIChatAgent/think chat): ~60–80% off DO compute + storage-write cost.
- Mixed/typical agent: ~30–50%.
- Compute/tool-heavy (long wall-clock dominates duration): ~10–25% (duration is unchanged by this PR).

Scale "cliff": the included allowance is 50 M rows written/month. At ~1 M turns/month, row writes (chunks plus messages/metadata) drop from ~280 M rows (≈ $230/mo billable above the allowance) to ~40 M rows, below the included allowance, i.e. potentially ~$0 in row-write charges.
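The cliff is a simple threshold on the included allowance. A sketch using the 50 M rows/month allowance and $1.00 / M rate quoted above; `monthlyRowWriteUsd` is a hypothetical helper and ignores every other billing dimension.

```typescript
// Allowance and rate as quoted in this PR; verify against current Durable Objects pricing.
const INCLUDED_ROWS_WRITTEN_PER_MONTH = 50e6;
const USD_PER_MILLION_ROWS_WRITTEN = 1.0;

function monthlyRowWriteUsd(turnsPerMonth: number, rowsPerTurn: number): number {
  const rows = turnsPerMonth * rowsPerTurn;
  // Only rows above the included allowance are billed.
  const billable = Math.max(0, rows - INCLUDED_ROWS_WRITTEN_PER_MONTH);
  return (billable / 1e6) * USD_PER_MILLION_ROWS_WRITTEN;
}

console.log(monthlyRowWriteUsd(1e6, 280), monthlyRowWriteUsd(1e6, 40));
```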

Caveats: hibernation behaviour is unchanged (idle-between-turns is already free); the duration win from ~260 → ~26 INSERTs/turn is negligible (<1%); rows-read savings are real but financially tiny at $0.001/M (the win there is latency, not cost). All figures are order-of-magnitude estimates anchored to published Workers Paid rates; SQLite storage billing has been in effect since Jan 7, 2026.

Reduces SQL statements, rows written, and rows scanned on replay across the chat-persistence paths in agents, @cloudflare/ai-chat, and @cloudflare/think. No user-facing behaviour changes beyond an internal, forward-compatible storage-format change for stream chunks.

ResumableStream: pack stream chunks (agents)
- flushBuffer() now writes ONE row per flush instead of one row per chunk. A multi-chunk segment is stored as a JSON array of chunk bodies; a single-chunk segment is stored unwrapped (legacy object shape) so large chunks avoid array-escaping inflation.
- storeChunk() gains a per-segment byte cap (SEGMENT_MAX_BYTES = 512 KB): if adding a chunk would push the buffered segment over the cap, it flushes first, so a large chunk lands alone (unwrapped) and packed segments stay well under the 2 MB SQLite row limit even after JSON re-escaping. The existing >1.8 MB per-chunk skip is unchanged.
- All reads transparently unpack both packed segments and legacy per-chunk rows via unpackSegmentBody(): replayChunks, replayCompletedChunksByRequestId, and getStreamChunks (which now returns a running per-chunk index, stable across calls because rows are append-only).
- chunk_index is now a per-segment ordering index; restore() resumes it past max(chunk_index).

Removed the now-unused multi-row INSERT machinery (buildMultiRowInsertStrings) from sql-batch.ts and the agents/chat barrel. Net effect: ~10x fewer chunk INSERT statements AND ~10x fewer chunk rows written per turn; replay/reconstruction scan ~10x fewer rows.

agent-as-tool forwarding fix (ai-chat)
- _getAgentToolStoredChunks() previously read the chunk table raw and filtered on chunk_index. With packing, body became a packed array and chunk_index became a segment index, breaking tailing. It now delegates to the unpacking getStreamChunks(), preserving the exact per-chunk sequence semantics that align with the in-memory live counter (_agentToolLiveSequences) so a tailing parent transitions from stored replay to live without gaps or duplicates.

Batched deletes
- ai-chat: stale-row pruning and maxPersistedMessages enforcement now delete via batched DELETE ... WHERE id IN (...) (capped at 100 bound params).
- think: deleteSubmissions() cleanup now uses batched DELETE ... WHERE submission_id IN (...).
- ai-chat & think: chat-recovery incident TTL sweep now deletes via batched storage.delete(keys) (<=128 keys/call), re-enabling DO write-coalescing.

Shared helpers (agents/chat)
- Export MAX_BOUND_PARAMS and buildInClauseStrings from sql-batch.ts.

Tests
- resumable-streaming: packing into fewer rows, single-flush packing, single unwrapped row, byte-cap splitting large chunks into their own row, and backward-compat reads of legacy per-chunk rows (+ getStreamChunkRowCount and insertLegacyChunkRows test helpers).
- agent-tools: assert forwarded chunks are individual (non-array) events with contiguous per-chunk sequences and a correct afterSequence cursor.
- think-session / worker fixtures updated to read via the unpacking getStreamChunks.

Compatibility
- Forward-compatible: new code reads existing legacy rows. Rollback is not backward-compatible for streams in-flight at rollback time (old code cannot interpret packed rows); chunks are ephemeral (24h TTL) and recovery papers over it. Accepted by design.

Verification: npm run check (sherif, export checks, oxfmt, oxlint, typecheck across 92 projects) green; tests pass: ai-chat 633, think 549, agents chat 231, plus recovery suites (ai-chat 59, agents 5).

changeset-bot · 2026-06-05T22:06:56Z

🦋 Changeset detected

Latest commit: 9e8dc8b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
agents	Patch
@cloudflare/ai-chat	Patch
@cloudflare/think	Patch


pkg-pr-new · 2026-06-05T22:13:37Z


agents

npm i https://pkg.pr.new/agents@1686

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1686

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1686

hono-agents

npm i https://pkg.pr.new/hono-agents@1686

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1686

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1686

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1686

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1686

commit: 9e8dc8b

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

devin-ai-integration Bot reviewed Jun 5, 2026


threepointone merged commit 1e49880 into main Jun 5, 2026
5 checks passed

threepointone deleted the perf/batch-pack-sqlite-writes branch June 5, 2026 22:25

github-actions Bot mentioned this pull request Jun 5, 2026

Version Packages #1687

Merged

redbg mentioned this pull request Jun 6, 2026

Add server-side option to disable resumable stream chunk persistence #1681

Open
