feat(chat): durable internal tool trace persistence by ABB65 · Pull Request #54 · Contentrain/studio

ABB65 · 2026-05-18T10:09:33Z

Summary

Before this PR the chat loop stored exactly two rows per POST — user prompt and final assistant message — collapsing every multi-iteration tool turn into a flat row pair. Intermediate assistant narration was dropped; tool_result blocks were never persisted at all. Resume reads couldn't reconstruct the Anthropic-protocol shape Claude saw on prior turns, so multi-iteration conversations effectively lost their tool history once the request ended.

This PR persists the full protocol-replay trace under a single turn_id per POST while keeping the user-facing transcript clean.

Schema (migration 009)

ALTER TABLE messages
  ADD content_blocks jsonb NULL,          -- structured Anthropic blocks
  ADD turn_id uuid NOT NULL,              -- groups all rows from one POST
  ADD turn_sequence smallint NOT NULL,    -- deterministic order in batch
  ADD iteration smallint NULL,            -- engine iteration counter
  ADD internal boolean NOT NULL DEFAULT false;

RLS rewrite (the security boundary):

SELECT  internal = false AND user owns conversation
INSERT  internal = false AND user owns conversation

Service-role queries bypass RLS and load the full trace. This is defense-in-depth — a stray client query OR a buggy route forgetting includeInternal: false cannot leak internal rows.

Persistence shape per POST

- 1 seed user row: internal=false, iteration=NULL, turn_sequence=0
- N assistant rows: internal=true for intermediate iterations, internal=false for the final one
- 0..N tool_result rows: internal=true, role='user' (matches the Anthropic protocol, where tool_results are sent as user messages)
- All share one turn_id; turn_sequence increments per row
- Token columns land only on the final visible assistant row (Anthropic returns per-call totals, not per-iteration)
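
The row shape above can be sketched as a pure function. This is a minimal illustration, not the PR's `buildTraceRows`: the `IterationTrace` fields, the `TraceRow` type, and the 0-based `iteration` counter are assumptions made for the example, and the turn id is passed in rather than generated.

```typescript
// Hypothetical shapes. The PR names IterationTrace and buildTraceRows,
// but the fields below are assumptions for illustration only.
type Block = { type: string; [key: string]: unknown };

interface IterationTrace {
  assistantBlocks: Block[];   // assistant output of one engine iteration
  toolResultBlocks: Block[];  // tool_result blocks fed back; empty on the final iteration
}

interface TraceRow {
  role: 'user' | 'assistant';
  content_blocks: Block[];
  turn_id: string;
  turn_sequence: number;
  iteration: number | null;
  internal: boolean;
}

function buildTraceRowsSketch(
  turnId: string,
  userBlocks: Block[],
  iterations: IterationTrace[],
): TraceRow[] {
  const rows: TraceRow[] = [];
  // Every row shares the turn id; turn_sequence is the insertion index.
  const push = (row: Omit<TraceRow, 'turn_id' | 'turn_sequence'>): void => {
    rows.push({ ...row, turn_id: turnId, turn_sequence: rows.length });
  };

  // Seed user row: visible, no iteration, turn_sequence 0.
  push({ role: 'user', content_blocks: userBlocks, iteration: null, internal: false });

  iterations.forEach((it, i) => {
    const isFinal = i === iterations.length - 1;
    // Intermediate assistant rows are internal; only the final one is visible.
    push({ role: 'assistant', content_blocks: it.assistantBlocks, iteration: i, internal: !isFinal });
    // tool_result rows use role 'user' and are always internal.
    if (it.toolResultBlocks.length > 0) {
      push({ role: 'user', content_blocks: it.toolResultBlocks, iteration: i, internal: true });
    }
  });
  return rows;
}
```

For a POST with one tool call followed by a final answer, this yields four rows: the seed user row, an internal assistant row, an internal tool_result row, and the visible final assistant row.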

Turn-safe history budget — the protocol-critical change

The old row-level cutoff could drop an assistant tool_use block while keeping its matching tool_result (or vice-versa); Anthropic rejects orphaned tool_use blocks and silently drifts on orphaned tool_result. The walker now:

1. Groups rows by turn_id preserving DB order
2. Walks groups newest → oldest summing per-group estimates
3. Drops ENTIRE turns at the budget boundary, never half
4. If the DB row limit truncated mid-turn (rare), drops the leading partial group

Legacy rows (no turn_id) fall back to single-row "turns" — protocol-safe by definition since they never carried tool blocks.
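
The walker can be sketched as follows. This is an illustration under stated assumptions rather than the code in `buildPromptMessages`: the `HistoryRow` type and `estimatedTokens` field are hypothetical, and a truncated leading turn is detected here by checking that the oldest group does not start at `turn_sequence` 0, which is one possible way to detect it.

```typescript
// Hypothetical row shape; estimatedTokens stands in for the per-row estimate.
interface HistoryRow {
  id: number;
  turn_id: string | null;   // null models a legacy row
  turn_sequence: number;
  estimatedTokens: number;
}

// Group consecutive rows by turn_id, preserving DB order.
// Rows with a null turn_id always form their own single-row group.
function groupByTurn(rows: HistoryRow[]): HistoryRow[][] {
  const groups: HistoryRow[][] = [];
  for (const row of rows) {
    const last = groups[groups.length - 1];
    if (last !== undefined && row.turn_id !== null && last[0]?.turn_id === row.turn_id) {
      last.push(row);
    } else {
      groups.push([row]);
    }
  }
  return groups;
}

function selectWholeTurns(rows: HistoryRow[], budget: number): HistoryRow[] {
  let groups = groupByTurn(rows);

  // Assumption: a turn whose first loaded row is not turn_sequence 0 was cut
  // by the DB row limit, so the leading partial group is dropped.
  const head = groups[0]?.[0];
  if (head !== undefined && head.turn_id !== null && head.turn_sequence !== 0) {
    groups = groups.slice(1);
  }

  // Walk newest to oldest, keeping whole groups until the budget is exceeded.
  const kept: HistoryRow[][] = [];
  let used = 0;
  for (const group of groups.slice().reverse()) {
    const cost = group.reduce((sum, r) => sum + r.estimatedTokens, 0);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(group);
  }

  const result: HistoryRow[] = [];
  for (const group of kept) result.push(...group);
  return result;
}
```

Because the cutoff only ever falls between groups, a `tool_use` row and its matching `tool_result` row are either both kept or both dropped. In this sketch a newest turn that alone exceeds the budget yields an empty history; how the real walker handles that case is not stated in the PR.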

Provider surface

- MessageInsertInput exposes new columns + internal
- insertMessages(rows[]): single batched INSERT, one round-trip instead of N, atomic at the batch level
- loadConversationMessages(..., { includeInternal }): defaults false; resume paths set true
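
The `includeInternal` default can be illustrated with an in-memory stand-in. The function below is hypothetical (the real `loadConversationMessages` takes arguments that the PR elides and queries the database); it only shows the opt-in behaviour of the flag.

```typescript
// Hypothetical in-memory model; the real provider reads from the database.
interface StoredMessage {
  id: number;
  conversationId: string;
  internal: boolean;
}

interface LoadOptions {
  includeInternal?: boolean;
}

function loadConversationMessagesSketch(
  store: StoredMessage[],
  conversationId: string,
  options: LoadOptions = {},
): StoredMessage[] {
  // Internal trace rows are opt-in: callers that say nothing get the public transcript.
  const includeInternal = options.includeInternal ?? false;
  return store.filter(
    m => m.conversationId === conversationId && (includeInternal || !m.internal),
  );
}
```

A transcript route would call it with no options, while a resume path would pass `{ includeInternal: true }` to get the full trace.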

Quota — UNCHANGED

agent_usage.message_count and api_message_usage.message_count still increment exactly once per POST via the existing reservation step. Multiple persisted rows do NOT bill the user multiple messages. Trace persistence is Contentrain-side observability/replay, not customer-facing metering.

Test plan

- pnpm typecheck: clean
- pnpm lint: 0 errors on changed files
- pnpm test: 622 passed (618 + 4 new)
  - db: trace row shape (seed user + assistant + tool_result under one turn_id), single batched insertMessages call, intermediate rows internal=true / final internal=false, cache token landing on final, Conversation API symmetric path
  - conversation-history: content_blocks priority over legacy columns; turn-grouped budget keeps whole multi-row turns together; older turns dropped whole when budget overflows
  - chat-route integration: resume passes includeInternal: true; saveChatResult receives iterations array instead of the old assistantText/assistantContent pair

Out of scope (separate follow-ups)

- Backfill of pre-009 rows. Pre-launch system, no real data; legacy rows live as single-row turns under the new walker.
- block_kind discriminator column. Discriminator lives inside each block's type; partial indexes can grow when there's a real query pattern.
- Per-iteration token breakdown. Provider returns per-call totals only.
- UI rendering of tool_use chips. Data is there; UI can adapt at its own pace.

Diff stat

11 files changed, +684 / −137. Migration + 4 new unit tests + 1 new integration assertion.

Before this PR, the chat loop stored exactly two rows per POST: the user prompt and the final assistant message (collapsed text + the final iteration's tool_calls jsonb). Intermediate assistant turns were dropped on the floor and tool_result blocks were never persisted: the engine streamed them to Anthropic, fed them back through the in-memory `config.messages` array, then forgot them. Resume reads couldn't reconstruct the Anthropic protocol shape Claude had seen on the prior turn, so multi-iteration conversations effectively lost their tool history.

This PR persists the full Anthropic-protocol trace under a single `turn_id` per POST while keeping the user-facing transcript clean.

Schema (migration 009):

ALTER TABLE messages
  ADD content_blocks jsonb NULL,          -- structured Anthropic blocks
  ADD turn_id uuid NOT NULL,              -- groups all rows from one POST
  ADD turn_sequence smallint NOT NULL,    -- deterministic order in batch
  ADD iteration smallint NULL,            -- engine iteration counter
  ADD internal boolean NOT NULL DEFAULT false;

RLS rewrite:

SELECT  internal = false AND user owns conversation
INSERT  internal = false AND user owns conversation

Service-role queries bypass RLS and read/write the full trace. This is a defense-in-depth boundary, so a stray client query OR a buggy route forgetting to pass `includeInternal: false` cannot leak internal trace rows.

Two indexes: one ordering by (conversation, created_at, turn_sequence) for the resume path, one partial on `internal = false` for the public transcript hot path.

Persistence shape per POST:

- 1 seed user row         internal=false, iteration=NULL, turn_sequence=0
- N assistant rows        internal=true for intermediate, false for final
- 0..N tool_result rows   internal=true, role='user'

All share one `turn_id`; `turn_sequence` increments per row. Token columns land only on the final visible assistant row, because Anthropic returns usage as a per-call total, not per-iteration.

Engine:

- `runConversationLoop` accumulates an `IterationTrace[]` and surfaces it on the `done` event. The in-memory `config.messages` push for the next AI call is unchanged.

Persistence helpers (`saveChatResult` / `saveApiChatResult`):

- Object-form args.
- One `randomUUID()` per POST allocated as `turnId`.
- `buildTraceRows` composes the row list deterministically and `db.insertMessages` writes them as a single batched INSERT: one round-trip instead of N, atomic at the batch level.

Provider surface:

- `MessageInsertInput` exposes the new columns + `internal`.
- `insertMessages(rows[])` is the batch path; `insertMessage` stays for the rare one-row sites.
- `loadConversationMessages(..., { includeInternal })` defaults to false; the chat / Conversation API resume paths set true.

Turn-safe history budget (`buildPromptMessages`):

The single most protocol-critical change in this PR. The old row-level cutoff could drop an assistant `tool_use` while keeping its matching `tool_result` (or vice-versa), and Anthropic rejects that: orphaned tool_use blocks invalidate the request, orphaned tool_result blocks silently drift the conversation. The walker now:

1. Groups rows by `turn_id` preserving DB order.
2. Walks groups newest → oldest summing per-group estimates.
3. Drops ENTIRE turns at the budget boundary, never half.
4. If the DB row limit truncated mid-turn (rare), drops the leading partial group so a turn never starts with a tool_result missing its tool_use.

Legacy rows without `turn_id` fall back to single-row "turns" through the helper's null-handling. They are protocol-safe by definition since the legacy path never persisted tool blocks.

Read priority in `extractContent`:

content_blocks (post-009) → tool_calls (legacy) → content (text)

Public transcript routes (Studio `/messages.get`, EE `/history.get`) use the provider default `includeInternal: false`; internal rows stay hidden at the DB layer (RLS) AND at the provider layer.
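
The read priority stated for `extractContent` can be sketched as a simple fallback chain. The row type and return shape here are assumptions; in particular, the commit message does not say how a legacy row's `tool_calls` is combined with its text, so this sketch returns the first non-empty source as-is.

```typescript
// Hypothetical row shape covering both post-009 and legacy columns.
type ContentBlock = { type: string; [key: string]: unknown };

interface MessageRow {
  content_blocks: ContentBlock[] | null;  // post-009 structured blocks
  tool_calls: ContentBlock[] | null;      // legacy jsonb column
  content: string | null;                 // legacy plain text
}

function extractContentSketch(row: MessageRow): ContentBlock[] | string {
  // 1. Structured blocks written after migration 009 win.
  if (row.content_blocks !== null && row.content_blocks.length > 0) {
    return row.content_blocks;
  }
  // 2. Otherwise fall back to the legacy tool_calls column.
  if (row.tool_calls !== null && row.tool_calls.length > 0) {
    return row.tool_calls;
  }
  // 3. Finally fall back to plain text.
  return row.content ?? '';
}
```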

Quota unchanged:

`agent_usage.message_count` and `api_message_usage.message_count` increment once per POST via the existing reservation step. Multiple persisted rows do NOT bill the user multiple messages: cache and trace persistence are Contentrain-side observability, not customer-facing meters.

Tests:

- db: trace shape (seed user + assistant + tool_result rows under one turn_id), single batched insertMessages call, intermediate rows internal=true / final internal=false, cache token landing, Conversation API symmetric path.
- conversation-history: content_blocks priority over legacy columns; turn-grouped budget keeps whole multi-row turns together (assistant tool_use + matching tool_result); whole older turns dropped when budget overflows, never half.
- chat-route integration: resume path explicitly passes includeInternal: true; saveChatResult receives iterations array instead of the old assistantText/assistantContent pair.

Out of scope:

- Backfill of pre-009 rows. Pre-launch system, no real data. Legacy rows get distinct turn_id defaults via gen_random_uuid() and live as single-row turns under the new walker.
- block_kind discriminator column. Discriminator lives inside each block's `type`; query patterns can grow expression indexes when there's a real need.
- Per-iteration token breakdown. Provider returns per-call totals only.
- UI rendering of tool_use chips. Data is there; UI can adapt at its own pace.

ABB65 merged commit 9369614 into main May 18, 2026
1 check passed

ABB65 deleted the feat/chat-tool-trace-persistence branch May 18, 2026 13:01
