fix(observability): truncate raw SSE frames in error logs + better classification of mid-stream 400s #705
Merged
The `[legacy] DEBUG Agent result with error: API Error: 400 ...` log
line was dumping the full failing SSE response body — hundreds of
`event: message_start` / `data: {"type":"content_block_delta",...}`
framing lines plus `partial_json` `tool_use` deltas — into the
user-visible TUI Logs tab when the Anthropic gateway terminated a
streaming response with a 4xx (most commonly `400 terminated`
mid-stream; Sentry #7442894144).
Two leaks: the SDK Result-message branch in `agent-interface.ts`
called `logToFile('Agent result with error:', message.result)` with
no truncation, AND `agent-runner.ts`'s `abortOnApiError` /
`GATEWAY_DOWN` / soft-error branches interpolated the raw `rawMessage`
straight into user-facing copy and Sentry context. Both surfaced the
SSE body verbatim; past sessions produced 50KB+ `log.message`
strings that polluted orchestrator context.
Add `suppressSseFrames(message)` and `sanitizeErrorMessageForLog`
helpers to `agent-events.ts` that:
- detect runs of Anthropic SSE protocol frames (event:/data:/bare-JSON
forms for the eight known stream-event subtypes)
- collapse each run into a single `[N SSE frames suppressed]` marker
- cap the result at `MAX_LOG_MESSAGE_LENGTH` (existing 2KB budget)
- preserve any non-frame content (real errors / stack traces riding
alongside the protocol noise survive — same defense as the
existing `stripStreamEventNoise` / `partitionHookBridgeRace` pair)
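The helper behavior listed above can be sketched roughly as follows. This is a hypothetical reconstruction, not the code merged here: the specific list of eight event types (assumed to be the Anthropic Messages API streaming events), the 2048-character value for `MAX_LOG_MESSAGE_LENGTH`, the `... [truncated]` suffix, and swallowing blank lines inside a run are all assumptions.

```typescript
// Hypothetical sketch: names mirror the PR, but the internals are assumptions.
export const MAX_LOG_MESSAGE_LENGTH = 2048; // assumed value for the "2KB budget"

// Assumed set of eight stream-event subtypes (Anthropic streaming event types).
const SSE_EVENT_TYPES = [
  "message_start",
  "message_delta",
  "message_stop",
  "content_block_start",
  "content_block_delta",
  "content_block_stop",
  "ping",
  "error",
];
const TYPES = SSE_EVENT_TYPES.join("|");

// Start of a frame in any of the three forms:
//   event: <type>   |   data: {"type":"<type>"...   |   {"type":"<type>"...
const FRAME_START = new RegExp(
  `event:\\s*(?:${TYPES})\\b|data:\\s*\\{"type":"(?:${TYPES})"|\\{"type":"(?:${TYPES})"`,
);

const TRUNCATION_SUFFIX = "... [truncated]";

export function suppressSseFrames(message: string): string {
  // Fast path: nothing frame-like anywhere, so return the input unchanged.
  if (!FRAME_START.test(message)) return message;

  const out: string[] = [];
  let run = 0; // number of frames in the current contiguous run

  // Replace the current run (if any) with a single marker line.
  const flush = (): void => {
    if (run > 0) {
      out.push(`[${run} SSE frame${run === 1 ? "" : "s"} suppressed]`);
      run = 0;
    }
  };

  for (const line of message.split("\n")) {
    const match = FRAME_START.exec(line);
    if (match === null) {
      // Blank lines separate SSE frames; swallow them while a run is open.
      if (run > 0 && line.trim() === "") continue;
      flush();
      out.push(line); // real content (error text, stack frames) survives
      continue;
    }
    // Inline prefix such as "API Error: 400 event: message_start": keep the
    // non-frame head as its own line and count the remainder as one frame.
    const prefix = line.slice(0, match.index).trimEnd();
    if (prefix.length > 0) {
      flush();
      out.push(prefix);
    }
    run += 1;
  }
  flush();
  return out.join("\n");
}

export function sanitizeErrorMessageForLog(message: string): string {
  const collapsed = suppressSseFrames(message);
  if (collapsed.length <= MAX_LOG_MESSAGE_LENGTH) return collapsed;
  // Hard cap: the result, suffix included, never exceeds the budget.
  return (
    collapsed.slice(0, MAX_LOG_MESSAGE_LENGTH - TRUNCATION_SUFFIX.length) +
    TRUNCATION_SUFFIX
  );
}
```

For example, an input made of `API Error: 400 terminated`, two `event:` / `data:` pairs separated by a blank line, and a trailing `TypeError: boom` line comes out as three lines: the head, `[4 SSE frames suppressed]`, and `TypeError: boom`. Collapsing frames before truncating is what lets a 50KB protocol dump shrink to one short marker instead of 2KB of cut-off framing.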
Apply the sanitizer at every callsite that logs / interpolates an
agent error string: the two `logToFile('Agent result with error:',
message.result)` paths in `agent-interface.ts`, and the GATEWAY_DOWN /
GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT branches plus the
soft-error pushStatus path in `agent-runner.ts`. Classification still
runs against the raw form (the `400 terminated` regex matches the head
of the message, not the SSE body) — only logging / user-surface paths
take the sanitized form.
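The callsite rule above (classify on the raw string, log and display only the sanitized one) can be illustrated with a minimal sketch. `handleApiError`, `sanitizeForLog`, and the regex inside the `classifyApiErrorSubtype` stand-in are hypothetical; the real classifier and the `agent-runner.ts` branches are not shown in this PR, and the stand-in sanitizer only truncates.

```typescript
// Hypothetical sketch of the callsite rule; not the agent-runner.ts code.
type ApiErrorSubtype = "terminated_400" | "other";

// Stand-in for classifyApiErrorSubtype. The real regex is not shown in the PR;
// this one only looks at the head of the message, as the PR describes.
function classifyApiErrorSubtype(raw: string): ApiErrorSubtype {
  return /^API Error: 400 terminated\b/.test(raw) ? "terminated_400" : "other";
}

// Stand-in for sanitizeErrorMessageForLog (truncation only, to stay short).
function sanitizeForLog(raw: string, max = 2048): string {
  return raw.length <= max ? raw : raw.slice(0, max);
}

interface HandledError {
  subtype: ApiErrorSubtype;
  userMessage: string;
}

function handleApiError(raw: string, log: (msg: string) => void): HandledError {
  // Route on the raw string so sanitization can never change the subtype.
  const subtype = classifyApiErrorSubtype(raw);
  // Everything that reaches a log, the TUI, or Sentry uses the sanitized form.
  const safe = sanitizeForLog(raw);
  log(`Agent result with error: ${safe}`);
  return { subtype, userMessage: `API error (${subtype}): ${safe}` };
}
```

Because the classifier sees the untouched head, a very large failing body still routes to `terminated_400`, while only the capped string reaches the log line and the user-facing message.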
Tests: `agent-events-sse-suppression.test.ts` (10 cases — fast-path
no-op, contiguous-block collapse, inline-prefix split, real-error
preservation, singular vs plural wording, bare-JSON form, unknown
event-type passthrough, oversized-input pipeline) covers the matcher
end-to-end and the truncation cap.
Verdict: pre-existing leak in `agent-interface.ts:4220` going back to
the original SDK Result handler; unrelated to #698 (which only adds
TUI per-event status; doesn't touch error logging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bug report: the TUI Logs tab dumped raw Anthropic SSE protocol frames as `[legacy] DEBUG` text.
Root cause, Part 1 (logging): confirmed
The `src/lib/agent-interface.ts` SDK Result-message handler at lines 4220 / 4249 calls `logToFile('Agent result with error:', message.result)` with NO truncation or SSE-frame stripping. When the Anthropic gateway terminates a stream with a 4xx, `message.result` contains the full failing SSE response body: kilobytes of `event:` / `data:` / `partial_json` lines. The comment at that callsite even says "Full message already logged above via JSON dump", but that "JSON dump" was the unfiltered call.
Same leak in the `agent-runner.ts` `abortOnApiError` / `GATEWAY_DOWN` / `GATEWAY_INVALID_REQUEST` / soft-error branches: `rawMessage` got interpolated straight into `userMessage`, the `WizardError` ctx, and `pushStatus`, surfacing the SSE body verbatim in the TUI Outro screen and Sentry payloads.
Root cause, Part 2 (gateway 400 retry storm): not addressed at source
The 400s themselves originate at the LLM gateway (intermittent: the agent's `tool_use` payload was constructing a glob like `~/worktree-repos/Next*` from `partial_json` fragments). The "6 errors of 6" the user saw is the wizard's existing inner-retry loop hitting 400s repeatedly. That is an upstream gateway issue, and the existing classifier (`classifyApiErrorSubtype`) already routes it to `terminated_400` with a clean user message. The fix here is purely "stop dumping the protocol noise"; the underlying retry-storm behavior is unchanged.
Fix
Add `suppressSseFrames(msg)` and `sanitizeErrorMessageForLog(msg)` to `src/lib/agent-events.ts`:
- Detect runs of Anthropic SSE protocol frames (`event:` / `data:` / bare-JSON for the 8 known stream-event subtypes)
- Collapse each run into a single `[N SSE frames suppressed]` marker
- Cap the result at `MAX_LOG_MESSAGE_LENGTH`
- Preserve non-frame content (`TypeError: ...` / stack traces survive)
Apply at every leak site: both `logToFile('Agent result with error:', ...)` paths in `agent-interface.ts`, and the GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT abort + soft-error branches in `agent-runner.ts`. Classification still runs on the raw form (the `400 terminated` regex matches the head of the message, not the SSE body); only logging / user-surface callsites use the sanitized form.
Verdict
Pre-existing leak, not a regression from main. The `logToFile('Agent result with error:', message.result)` callsite has been there since the original SDK Result-message handler. Unrelated to #698: that PR only adds TUI per-event status tracking and does not touch error logging or the model-stream pipeline.
Test plan
- `src/lib/__tests__/agent-events-sse-suppression.test.ts` (10 cases): fast-path no-op, contiguous-block collapse, inline-prefix split, real-error preservation alongside SSE noise, singular vs plural wording, bare-JSON form, unknown event-type passthrough, oversized-input pipeline (50KB SSE body → ≤2KB sanitized output)
- `pnpm test`: 254 files / 3815 tests pass
- `pnpm test:bdd`: 100 scenarios / 445 steps pass
- `pnpm lint`: clean (only pre-existing unrelated warning)
- `pnpm build`: clean
🤖 Generated with Claude Code
Note
Low Risk
Low risk: changes are limited to sanitizing/truncating error strings before logging and displaying them, without altering agent control flow or API retry behavior.
Overview
Prevents raw Anthropic streaming (SSE) protocol frames from leaking into user-visible logs, status messages, and Sentry context by collapsing recognized stream frames into a single `[.. SSE frame(s) suppressed]` marker and then enforcing the existing `MAX_LOG_MESSAGE_LENGTH` cap.
Adds `suppressSseFrames` / `sanitizeErrorMessageForLog` in `agent-events.ts`, updates `agent-interface.ts` and `agent-runner.ts` to use the sanitized form anywhere SDK/gateway error strings are logged or interpolated (while keeping subtype classification on the raw message), and introduces focused unit tests covering frame detection, inline-prefix handling, preservation of real errors, and truncation behavior.
Reviewed by Cursor Bugbot for commit 55c0305. Bugbot is set up for automated code reviews on this repo.