feat(billing): revert reserved message slot on failed/aborted chats #51
Merged
Both the Studio chat handler and the Conversation API handler reserve
a message slot up-front via `incrementAgentUsageIfAllowed` /
`incrementAPIUsageIfAllowed` so concurrent requests can't double-spend
the last cap slot. Previously every reservation was terminal — a
pre-AI DB error, brain-cache failure, Anthropic auth failure, or
client abort before any tokens streamed back still consumed a slot
even though no LLM cost was incurred.
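The reserve-then-maybe-refund semantics can be modeled with a minimal in-memory sketch (the `UsageStore` class, its cap, and its method names are illustrative stand-ins, not the real Supabase-backed helpers):

```typescript
// Minimal in-memory model of increment-if-allowed semantics: the cap check
// and the increment happen as one step, so two requests racing for the last
// slot cannot both succeed. In production this is a single atomic RPC, not
// app-level read-then-write logic.
class UsageStore {
  private counts = new Map<string, number>();

  constructor(private readonly cap: number) {}

  /** Reserve one message slot; returns false when the cap is exhausted. */
  incrementIfAllowed(key: string): boolean {
    const current = this.counts.get(key) ?? 0;
    if (current >= this.cap) return false;
    this.counts.set(key, current + 1);
    return true;
  }

  /** Refund a reservation, clamped at zero (mirrors GREATEST(count - 1, 0)). */
  decrement(key: string): void {
    const current = this.counts.get(key) ?? 0;
    this.counts.set(key, Math.max(current - 1, 0));
  }

  count(key: string): number {
    return this.counts.get(key) ?? 0;
  }
}
```

The clamp in `decrement` is what makes a double revert harmless: a second call on an already-refunded key is a no-op rather than a negative count.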
The runtime rule established in review: a message becomes billable
the moment Anthropic streams its first real provider event (`text` or
`tool_use`). `tool_result` is internal (engine-emitted post tool
exec), `done` is synthetic, and `error` can fire before any tokens —
none of those qualify. Any failure before the first billable event
refunds the slot. After the first billable event we keep the
reservation regardless of downstream outcome (mid-stream error,
client abort, DB save failure) — we paid Anthropic for those tokens.
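A sketch of that commit gate, shaped like the handler's stream loop (the `ProviderEvent` type, `runStream`, and the `tryRevert` signature are stand-ins, not the actual handler code):

```typescript
// Hypothetical sketch of the first-billable-event commit gate.
type ProviderEvent = { type: "text" | "tool_use" | "tool_result" | "done" | "error" };

// Only real provider output commits the reservation; tool_result, done,
// and error never do.
const BILLABLE = new Set(["text", "tool_use"]);

/** Returns true iff the reservation was committed (first billable event seen). */
async function runStream(
  events: AsyncIterable<ProviderEvent>,
  tryRevert: (reason: string) => Promise<void>,
): Promise<boolean> {
  let committed = false;
  try {
    for await (const ev of events) {
      if (!committed && BILLABLE.has(ev.type)) {
        committed = true; // slot is spent for good; meter event may now be written
      }
      // ... forward ev to the SSE pipeline ...
    }
  } finally {
    // Any exit path that never saw a billable event refunds the slot,
    // including mid-loop throws (finally runs before the error propagates).
    if (!committed) await tryRevert("post-reserve");
  }
  return committed;
}
```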
Migration 007 adds `decrement_agent_usage` and `decrement_api_usage`
SECURITY DEFINER RPCs with a `GREATEST(message_count - 1, 0)` clamp
so a double-revert race (caller's outer catch + inner finally both
calling through best-effort wrappers) can't drive the counter
negative.
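For illustration, such an RPC might look roughly like this (table and parameter names are assumptions; the actual 007 migration may differ in naming, grants, and arguments):

```sql
-- Sketch of a revert RPC (illustrative, not the real migration).
-- SECURITY DEFINER lets the restricted API role decrement a row it cannot
-- otherwise update; GREATEST clamps a double-revert race at zero.
create or replace function decrement_agent_usage(
  p_workspace_id uuid,
  p_month text
) returns void
language sql
security definer
as $$
  update agent_usage
     set message_count = greatest(message_count - 1, 0)
   where workspace_id = p_workspace_id
     and month = p_month;
$$;
```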
Handler restructure mirrors the rule:
- `reserved` / `committed` flags scoped above the request body.
- `tryRevert(reason)` helper logs RPC failures rather than swallowing
them (silent revert failure makes quota drift hard to debug, per
review feedback).
- Studio chat: outer try wraps reserve + all pre-AI work + processChat
launch; outer catch hits `tryRevert('pre-AI')` then rethrows.
ProcessChat's own finally hits `tryRevert('post-reserve')` for any
failure once the SSE pipeline is live.
- Conversation API: single try/finally — non-SSE request/response, so
one scope covers both pre-AI and AI paths.
- `recordAIUsage` / `recordAPIUsage` outbox writes moved into the
for-await loop, gated on the same first-event commit so failed
requests never emit a meter event to Polar.
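The scoping above can be reduced to a simplified sketch (helper names are hypothetical; the `committed` gate and processChat's own inner finally are elided for brevity):

```typescript
// Hypothetical shape of the handler's revert scoping; reserveSlot,
// revertSlot, and processChat stand in for the real helpers.
async function handleChat(deps: {
  reserveSlot: () => Promise<boolean>;
  revertSlot: () => Promise<void>;
  processChat: () => Promise<void>;
}) {
  let reserved = false;

  // Best-effort revert: log failures instead of swallowing them, so a stuck
  // counter shows up in logs rather than as silent quota drift.
  const tryRevert = async (reason: string) => {
    if (!reserved) return;
    reserved = false; // guard against a double revert from nested scopes
    try {
      await deps.revertSlot();
    } catch (err) {
      console.error(`usage revert failed (${reason})`, err);
    }
  };

  try {
    reserved = await deps.reserveSlot();
    if (!reserved) throw new Error("message cap reached");
    // ... all pre-AI work: load conversation, history, brain, branches ...
    await deps.processChat();
  } catch (err) {
    await tryRevert("pre-AI"); // refund any failure that reached this scope un-reverted
    throw err; // rethrow so the framework converts it to an HTTP error
  }
}
```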
Three new integration tests cover the matrix:
- Provider errors before any event → revert called, no meter event.
- Provider yields a text event then errors → no revert, meter event
written (committed = true).
- Pre-AI step (loadConversationMessages) throws after reservation →
outer catch fires revert, no meter event.
Token accounting on aborted/partial streams stays out of scope — the
engine still only captures totals on `message_end`, so a stream that
errors after first token writes 0 tokens to the usage row. That's a
separate observability follow-up.
Summary
Both chat handlers reserve a message slot up-front for race-free cap enforcement, but every reservation was previously terminal. A pre-AI failure (DB error, brain-cache failure, Anthropic auth failure, abort before first token) consumed a slot even though no LLM cost was incurred — workspaces lost monthly quota to work that never happened.
Billing semantics (review-agreed)
- `tool_result` is internal (engine-emitted post tool exec) → not a commit trigger
- `done` is synthetic → not a commit trigger
- `error` may fire before any tokens → not a commit trigger

Pre-first-event failure → refund slot. Post-first-event failure (mid-stream error, abort, save fail) → keep slot, we paid Anthropic.
Changes
Migration
`007_agent_usage_revert_rpcs.sql`

Adds `decrement_agent_usage` and `decrement_api_usage` SECURITY DEFINER RPCs with a `GREATEST(message_count - 1, 0)` clamp — a double-revert race (caller's outer catch + inner finally both invoking best-effort wrappers) cannot drive the counter negative.

Provider surface

- `decrementAgentUsage({ workspaceId, userId, month, source })`
- `decrementAPIUsage({ workspaceId, apiKeyId, month })`

Handlers
Per review feedback (a `finally` inside processChat would miss pre-AI failures, because the conversation/history/brain/branches load happens at handler scope), the structure now scopes the revert correctly:

- `reserved` / `committed` declared above the request body; outer try wraps reserve + all pre-AI work + processChat launch + `eventStream.send()`; outer catch invokes `tryRevert('pre-AI')` then rethrows so h3 converts it to an HTTP error. ProcessChat's own finally invokes `tryRevert('post-reserve')` for failures once the SSE pipeline is live.
- `recordAIUsage` / `recordAPIUsage` moved into the for-await loop, gated on the same first-event commit so failed/aborted requests never emit a meter event to Polar.
- `tryRevert(reason)` logs RPC failures (console.error) rather than silently swallowing them — silent revert failure makes quota drift hard to debug.

Out of scope
Partial token accounting on aborted streams. The engine still only captures `inputTokens` / `outputTokens` on `message_end`, so a stream that errors after the first token writes 0 tokens to the usage row. Anthropic billed us for those tokens but the dashboard underreports. Separate observability follow-up — a different concern (cost capture vs. quota refund).

Test plan
- `pnpm test` — 590 passed (587 + 3 new integration tests)
- `pnpm typecheck` — clean
- `pnpm lint` — 0 errors on changed files

Three new integration tests in `chat-route.integration.test.ts`:

- refunds the reserved slot when the provider errors before yielding any event (provider `next()` throws): `decrementAgentUsage` called once, `recordAIUsage` NOT called
- keeps the reservation when the provider yields at least one event before erroring: `decrementAgentUsage` NOT called, `recordAIUsage` called once
- refunds the reserved slot when a pre-AI step fails after reservation (`loadConversationMessages` rejects): `decrementAgentUsage` called once, `recordAIUsage` NOT called, HTTP 500
The Conversation API path uses the symmetric helper (`decrementAPIUsage`); its integration coverage will follow the same E2E harness pattern once the Conversation API integration suite lands (separate PR for the harness).