feat(chat): durable internal tool trace persistence #54
Merged
Before this PR, the chat loop stored exactly two rows per POST: the
user prompt and the final assistant message (collapsed text + the
final iteration's tool_calls jsonb). Intermediate assistant turns
were dropped on the floor and tool_result blocks were never persisted
— the engine streamed them to Anthropic, fed them back through the
in-memory `config.messages` array, then forgot them. Resume reads
couldn't reconstruct the Anthropic protocol shape Claude had seen on
the prior turn, so multi-iteration conversations effectively lost
their tool history.
This PR persists the full Anthropic-protocol trace under a single
`turn_id` per POST while keeping the user-facing transcript clean.
Schema (migration 009):
  ALTER TABLE messages
    ADD content_blocks jsonb    NULL,      -- structured Anthropic blocks
    ADD turn_id        uuid     NOT NULL,  -- groups all rows from one POST
    ADD turn_sequence  smallint NOT NULL,  -- deterministic order in batch
    ADD iteration      smallint NULL,      -- engine iteration counter
    ADD internal       boolean  NOT NULL DEFAULT false;
RLS rewrite:
  SELECT: internal = false AND user owns conversation
  INSERT: internal = false AND user owns conversation
Service-role queries bypass RLS and read/write the full trace —
this is a defense-in-depth boundary so a stray client query OR a
buggy route forgetting to pass `includeInternal: false` cannot
leak internal trace rows. Two indexes: one ordering by
(conversation, created_at, turn_sequence) for the resume path, one
partial on `internal = false` for the public transcript hot path.
Persistence shape per POST:
- 1 seed user row: internal=false, iteration=NULL, turn_sequence=0
- N assistant rows: internal=true for intermediate, false for final
- 0..N tool_result rows: internal=true, role='user'
All share one `turn_id`; `turn_sequence` increments per row.
Token columns land only on the final visible assistant row —
Anthropic returns usage as a per-call total, not per-iteration.
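
The row shape above can be sketched as a pure function. This is a minimal illustration, not the PR's actual `buildTraceRows`: the `IterationTrace` and `TraceRow` field names are assumptions modelled on the column list, the zero-based iteration counter is a guess, and token columns are left out.

```typescript
// Hypothetical shapes: field names mirror the columns described above,
// but the real types in the codebase may differ.
type Block = { type: string; [key: string]: unknown };

interface IterationTrace {
  assistantBlocks: Block[]; // blocks the model returned in this iteration
  toolResults: Block[];     // tool_result blocks fed back (empty on the final iteration)
}

interface TraceRow {
  turnId: string;
  turnSequence: number;
  iteration: number | null;
  role: "user" | "assistant";
  internal: boolean;
  contentBlocks: Block[];
}

function buildTraceRows(
  turnId: string,
  userText: string,
  iterations: IterationTrace[],
): TraceRow[] {
  const rows: TraceRow[] = [];
  // turnSequence is simply the row's position in the batch.
  const push = (row: Omit<TraceRow, "turnId" | "turnSequence">): void => {
    rows.push({ turnId, turnSequence: rows.length, ...row });
  };

  // Seed user row: visible, no iteration, always first.
  push({
    role: "user",
    internal: false,
    iteration: null,
    contentBlocks: [{ type: "text", text: userText }],
  });

  iterations.forEach((trace, index) => {
    const isFinal = index === iterations.length - 1;
    // Intermediate assistant rows are internal; only the final one is visible.
    push({
      role: "assistant",
      internal: !isFinal,
      iteration: index,
      contentBlocks: trace.assistantBlocks,
    });
    // tool_result rows are always internal and use role 'user'.
    if (trace.toolResults.length > 0) {
      push({
        role: "user",
        internal: true,
        iteration: index,
        contentBlocks: trace.toolResults,
      });
    }
  });

  return rows;
}
```

For a two-iteration turn (one tool call, then a final answer) this yields four rows: seed user, internal assistant, internal tool_result, visible assistant.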
Engine:
- `runConversationLoop` accumulates an `IterationTrace[]` and
surfaces it on the `done` event. The in-memory `config.messages`
push for the next AI call is unchanged.
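
A rough sketch of the accumulation, under simplifying assumptions: `callModel` and `runTool` are hypothetical synchronous stand-ins for the real (asynchronous, streaming) calls, and the `done` event is modelled as the function's return value rather than a stream event.

```typescript
type Block = { type: string; [key: string]: unknown };
type Message = { role: "user" | "assistant"; content: Block[] };

// Hypothetical trace shape; the real IterationTrace may carry more fields.
interface IterationTrace {
  iteration: number;
  assistantBlocks: Block[];
  toolResults: Block[];
}

interface LoopConfig {
  messages: Message[];
  maxIterations: number;
  callModel: (messages: Message[]) => Block[]; // stand-in for the AI call
  runTool: (toolUse: Block) => Block;          // stand-in for tool execution
}

function runConversationLoop(
  config: LoopConfig,
): { type: "done"; iterations: IterationTrace[] } {
  const iterations: IterationTrace[] = [];

  for (let i = 0; i < config.maxIterations; i++) {
    const assistantBlocks = config.callModel(config.messages);
    // The in-memory push for the next call stays as it was.
    config.messages.push({ role: "assistant", content: assistantBlocks });

    const toolResults = assistantBlocks
      .filter((block) => block.type === "tool_use")
      .map((toolUse) => config.runTool(toolUse));

    // New: record what happened in this iteration for later persistence.
    iterations.push({ iteration: i, assistantBlocks, toolResults });

    if (toolResults.length === 0) break; // no tool calls, so this was the final answer
    config.messages.push({ role: "user", content: toolResults });
  }

  return { type: "done", iterations };
}
```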
Persistence helpers (`saveChatResult` / `saveApiChatResult`):
- Object-form args.
- One `randomUUID()` per POST allocated as `turnId`.
- `buildTraceRows` composes the row list deterministically and
`db.insertMessages` writes them as a single batched INSERT — one
round-trip instead of N, atomic at the batch level.
Provider surface:
- `MessageInsertInput` exposes the new columns + `internal`.
- `insertMessages(rows[])` is the batch path; `insertMessage` stays
for the rare one-row sites.
- `loadConversationMessages(..., { includeInternal })` — defaults
to false; the chat / Conversation API resume paths set true.
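
To illustrate the default, here is a sketch of the read path against an in-memory array. The `StoredMessage` shape and the `store` parameter are assumptions for the example; the real provider queries the database.

```typescript
// Hypothetical row shape, reduced to the fields the filter needs.
interface StoredMessage {
  conversationId: string;
  role: "user" | "assistant";
  internal: boolean;
  text: string;
}

interface LoadOptions {
  includeInternal?: boolean;
}

function loadConversationMessages(
  store: StoredMessage[],
  conversationId: string,
  { includeInternal = false }: LoadOptions = {},
): StoredMessage[] {
  // Internal trace rows are only returned when the caller opts in.
  return store.filter(
    (message) =>
      message.conversationId === conversationId &&
      (includeInternal || !message.internal),
  );
}
```

A public transcript route would call it with no options and get only the visible rows; a resume path would pass `{ includeInternal: true }` to get the full trace.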
Turn-safe history budget (`buildPromptMessages`):
The single most protocol-critical change in this PR. The old
row-level cutoff could drop an assistant `tool_use` while keeping
its matching `tool_result` (or vice-versa), and Anthropic rejects
that — orphaned tool_use blocks invalidate the request, orphaned
tool_result blocks silently drift the conversation. The walker
now:
1. Groups rows by `turn_id` preserving DB order.
2. Walks groups newest → oldest summing per-group estimates.
3. Drops ENTIRE turns at the budget boundary, never half.
4. If the DB row limit truncated mid-turn (rare), drops the
leading partial group so a turn never starts with a
tool_result missing its tool_use.
Legacy rows without `turn_id` fall back to single-row "turns"
through the helper's null-handling — protocol-safe by definition
since the legacy path never persisted tool blocks.
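
The four steps can be sketched as follows. This is an illustration under stated assumptions, not the PR's `buildPromptMessages`: the `HistoryRow` shape is hypothetical, `estimateTokens` is a crude characters/4 placeholder, and a truncated leading turn is detected by its first row not having `turnSequence` 0, which may not be how the real walker decides.

```typescript
// Hypothetical row shape; only the fields the walker needs.
interface HistoryRow {
  turnId: string | null; // null for legacy rows
  turnSequence: number;
  role: "user" | "assistant";
  text: string;
}

// Placeholder estimate (assumption): roughly 4 characters per token.
function estimateTokens(row: HistoryRow): number {
  return Math.ceil(row.text.length / 4);
}

// Step 1: group consecutive rows by turnId, preserving DB order.
// Legacy rows (turnId null) each become their own single-row group.
function groupByTurn(rows: HistoryRow[]): HistoryRow[][] {
  const groups: HistoryRow[][] = [];
  let lastTurnId: string | null = null;
  for (const row of rows) {
    const current = groups[groups.length - 1];
    if (row.turnId !== null && current !== undefined && row.turnId === lastTurnId) {
      current.push(row);
    } else {
      groups.push([row]);
    }
    lastTurnId = row.turnId;
  }
  return groups;
}

function buildPromptMessages(rows: HistoryRow[], budget: number): HistoryRow[] {
  const groups = groupByTurn(rows);

  // Step 4: if the row limit cut the oldest turn in half, drop that partial group.
  const head = groups[0]?.[0];
  if (head !== undefined && head.turnId !== null && head.turnSequence !== 0) {
    groups.shift();
  }

  // Steps 2 and 3: walk newest to oldest, keeping whole turns while they fit.
  const kept: HistoryRow[][] = [];
  let used = 0;
  for (let i = groups.length - 1; i >= 0; i--) {
    const group = groups[i];
    if (group === undefined) continue;
    const cost = group.reduce((sum, row) => sum + estimateTokens(row), 0);
    if (used + cost > budget) break; // drop this turn and everything older
    used += cost;
    kept.unshift(group);
  }

  const result: HistoryRow[] = [];
  for (const group of kept) result.push(...group);
  return result;
}
```

Because the budget check runs per group rather than per row, a `tool_use` and its `tool_result` are either both kept or both dropped.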
Read priority in `extractContent`:
content_blocks (post-009) → tool_calls (legacy) → content (text)
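
A minimal sketch of that fallback chain. The `MessageRow` shape and the tagged return value are assumptions made so the chosen source is visible; treating an empty array as absent is also a guess, and the real `extractContent` presumably converts the result into Anthropic message content.

```typescript
type Block = { type: string; [key: string]: unknown };

// Hypothetical row shape covering the three candidate columns.
interface MessageRow {
  content: string | null;
  toolCalls: Block[] | null;     // legacy column
  contentBlocks: Block[] | null; // added by migration 009
}

type Extracted =
  | { source: "content_blocks"; value: Block[] }
  | { source: "tool_calls"; value: Block[] }
  | { source: "content"; value: string };

function extractContent(row: MessageRow): Extracted {
  // 1. Structured blocks win whenever they are present.
  if (row.contentBlocks !== null && row.contentBlocks.length > 0) {
    return { source: "content_blocks", value: row.contentBlocks };
  }
  // 2. Legacy rows may still carry tool calls in the old column.
  if (row.toolCalls !== null && row.toolCalls.length > 0) {
    return { source: "tool_calls", value: row.toolCalls };
  }
  // 3. Otherwise fall back to plain text.
  return { source: "content", value: row.content ?? "" };
}
```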
Public transcript routes (Studio `/messages.get`, EE `/history.get`)
use the provider default `includeInternal: false`; internal rows
stay hidden at the DB layer (RLS) AND at the provider layer.
Quota unchanged: `agent_usage.message_count` and
`api_message_usage.message_count` increment once per POST via the
existing reservation step. Multiple persisted rows do NOT bill the
user multiple messages — cache and trace persistence are
Contentrain-side observability, not customer-facing meters.
Tests:
- db: trace shape (seed user + assistant + tool_result rows under
one turn_id), single batched insertMessages call, intermediate
rows internal=true / final internal=false, cache token landing,
Conversation API symmetric path.
- conversation-history: content_blocks priority over legacy
columns; turn-grouped budget keeps whole multi-row turns
together (assistant tool_use + matching tool_result); whole
older turns dropped when budget overflows — never half.
- chat-route integration: resume path explicitly passes
includeInternal: true; saveChatResult receives iterations array
instead of the old assistantText/assistantContent pair.
Out of scope:
- Backfill of pre-009 rows. Pre-launch system, no real data.
Legacy rows get distinct turn_id defaults via gen_random_uuid()
and live as single-row turns under the new walker.
- block_kind discriminator column. Discriminator lives inside
each block's `type`; query patterns can grow expression indexes
when there's a real need.
- Per-iteration token breakdown. Provider returns per-call totals
only.
- UI rendering of tool_use chips. Data is there; UI can adapt at
its own pace.
Summary
Before this PR the chat loop stored exactly two rows per POST (user prompt and final assistant message), collapsing every multi-iteration tool turn into a flat row pair. Intermediate assistant narration was dropped; `tool_result` blocks were never persisted at all. Resume reads couldn't reconstruct the Anthropic-protocol shape Claude saw on prior turns, so multi-iteration conversations effectively lost their tool history once the request ended.

This PR persists the full protocol-replay trace under a single `turn_id` per POST while keeping the user-facing transcript clean.

Schema (migration 009)

  ALTER TABLE messages
    ADD content_blocks jsonb    NULL,      -- structured Anthropic blocks
    ADD turn_id        uuid     NOT NULL,  -- groups all rows from one POST
    ADD turn_sequence  smallint NOT NULL,  -- deterministic order in batch
    ADD iteration      smallint NULL,      -- engine iteration counter
    ADD internal       boolean  NOT NULL DEFAULT false;

RLS rewrite (the security boundary):

  SELECT: internal = false AND user owns conversation
  INSERT: internal = false AND user owns conversation

Service-role queries bypass RLS and load the full trace. This is defense-in-depth: a stray client query OR a buggy route forgetting `includeInternal: false` cannot leak internal rows.

Persistence shape per POST

- 1 seed user row: `internal=false`, `iteration=NULL`, `turn_sequence=0`
- N assistant rows: `internal=true` for intermediate iterations, `internal=false` for the final one
- 0..N tool_result rows: `internal=true`, `role='user'` (matches Anthropic protocol where tool_results are sent as user messages)
- All rows share one `turn_id`; `turn_sequence` increments per row

Turn-safe history budget: the protocol-critical change

The old row-level cutoff could drop an assistant `tool_use` block while keeping its matching `tool_result` (or vice-versa); Anthropic rejects orphaned `tool_use` blocks and silently drifts on orphaned `tool_result`. The walker now:

1. Groups rows by `turn_id` preserving DB order
2. Walks groups newest → oldest summing per-group estimates
3. Drops ENTIRE turns at the budget boundary, never half
4. If the DB row limit truncated mid-turn, drops the leading partial group

Legacy rows (no `turn_id`) fall back to single-row "turns", protocol-safe by definition since they never carried tool blocks.

Provider surface

- `MessageInsertInput` exposes new columns + `internal`
- `insertMessages(rows[])`: single batched INSERT, one round-trip instead of N, atomic at the batch level
- `loadConversationMessages(..., { includeInternal })`: defaults `false`; resume paths set `true`

Quota: UNCHANGED

`agent_usage.message_count` and `api_message_usage.message_count` still increment exactly once per POST via the existing reservation step. Multiple persisted rows do NOT bill the user multiple messages. Trace persistence is Contentrain-side observability/replay, not customer-facing metering.

Test plan

- `pnpm typecheck` clean
- `pnpm lint`: 0 errors on changed files
- `pnpm test`: 622 passed (618 + 4 new)
- `db`: trace row shape (seed user + assistant + tool_result under one `turn_id`), single batched insertMessages call, intermediate rows `internal=true` / final `internal=false`, cache token landing on final, Conversation API symmetric path
- `conversation-history`: `content_blocks` priority over legacy columns; turn-grouped budget keeps whole multi-row turns together; older turns dropped whole when budget overflows
- `chat-route` integration: resume passes `includeInternal: true`; `saveChatResult` receives `iterations` array instead of the old `assistantText`/`assistantContent` pair

Out of scope (separate follow-ups)

- `block_kind` discriminator column. Discriminator lives inside each block's `type`; partial indexes can grow when there's a real query pattern.

Diff stat

11 files changed, +684 / −137. Migration + 4 new unit tests + 1 new integration assertion.