fix(observability): truncate raw SSE frames in error logs + better classification of mid-stream 400s#705

Merged: kelsonpw merged 1 commit into main from fix/observability-truncate-sse-frames on May 10, 2026
@kelsonpw kelsonpw commented May 9, 2026

Summary

Bug report: TUI Logs tab dumped raw Anthropic SSE protocol frames as [legacy] DEBUG text:

[2026-05-09T...] [legacy] DEBUG Agent result with error: API Error: 400 event: message_start
data: {"type":"message_start","message":{"model":"claude-sonnet-4-6",...}}
event: ping
data: {"type":"ping"}
event: content_block_start
... (hundreds more lines including partial_json tool_use deltas with file-path fragments)

Root cause — Part 1 (logging) — confirmed

In src/lib/agent-interface.ts, the SDK Result-message handler (lines 4220 / 4249) calls logToFile('Agent result with error:', message.result) with no truncation or SSE-frame stripping. When the Anthropic gateway terminates a stream with a 4xx, message.result contains the full failing SSE response body: kilobytes of event: / data: / partial_json lines. The comment at that callsite even says "Full message already logged above via JSON dump", but that "JSON dump" was itself the unfiltered call.

Same leak in agent-runner.ts abortOnApiError / GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / soft-error branches: rawMessage got interpolated straight into userMessage, WizardError ctx, and pushStatus — surfacing the SSE body verbatim in the TUI Outro screen and Sentry payloads.

Root cause — Part 2 (gateway 400 retry storm) — not addressed at source

The 400s themselves originate at the LLM gateway (intermittent — agent's tool_use payload was constructing a glob like ~/worktree-repos/Next* from partial_json fragments). The "6 errors of 6" the user saw is the wizard's existing inner-retry loop hitting 400s repeatedly. That's an upstream gateway issue and the existing classifier (classifyApiErrorSubtype) already routes it to terminated_400 with a clean user message. The fix here is purely "stop dumping the protocol noise" — the underlying retry-storm behavior is unchanged.

Fix

Add suppressSseFrames(msg) and sanitizeErrorMessageForLog(msg) to src/lib/agent-events.ts:

  • Detect runs of SSE protocol frames (event: / data: / bare-JSON for the 8 known stream-event subtypes)
  • Collapse each run into a single [N SSE frames suppressed] marker
  • Cap result at existing 2KB MAX_LOG_MESSAGE_LENGTH
  • Preserve non-frame content (real TypeError: ... / stack traces survive)
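
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the event-type list assumes the eight subtypes are the stream events documented for the Anthropic Messages API, the 2048-byte value for MAX_LOG_MESSAGE_LENGTH is inferred from the "2KB" wording, and the helpers frameType and splitFrame and the truncation suffix are hypothetical.

```typescript
// Assumed value for the existing "2KB" cap; the real constant lives in the repo.
const MAX_LOG_MESSAGE_LENGTH = 2048;

// Assumption: the eight known subtypes are the documented Anthropic stream events.
const STREAM_EVENT_TYPES = new Set<string>([
  "message_start", "message_delta", "message_stop",
  "content_block_start", "content_block_delta", "content_block_stop",
  "ping", "error",
]);

// Reads the leading "type" field of a JSON-looking payload without parsing it,
// so truncated or partial frames are still recognized.
function frameType(payload: string): string | null {
  const m = /^\{\s*"type"\s*:\s*"([a-z_]+)"/.exec(payload);
  return m ? (m[1] ?? null) : null;
}

// Classifies one line. A frame may follow real text on the same line
// ("API Error: 400 event: message_start"); that text is returned as prefix.
function splitFrame(line: string): { prefix: string; isFrame: boolean } {
  const trimmed = line.trim();
  const ev = /^(.*?)\s*\bevent: ([a-z_]+)$/.exec(trimmed);
  if (ev && STREAM_EVENT_TYPES.has(ev[2] ?? "")) {
    return { prefix: ev[1] ?? "", isFrame: true };
  }
  const payload = trimmed.startsWith("data: ") ? trimmed.slice(6) : trimmed;
  const type = frameType(payload);
  if (type !== null && STREAM_EVENT_TYPES.has(type)) {
    return { prefix: "", isFrame: true };
  }
  return { prefix: line, isFrame: false };
}

function suppressSseFrames(message: string): string {
  // Fast path: nothing that could be a frame, return the input untouched.
  if (!message.includes("event: ") && !message.includes('"type"')) return message;
  const out: string[] = [];
  let run = 0;
  const flush = (): void => {
    if (run > 0) {
      out.push(`[${run} SSE ${run === 1 ? "frame" : "frames"} suppressed]`);
      run = 0;
    }
  };
  for (const line of message.split("\n")) {
    const { prefix, isFrame } = splitFrame(line);
    if (isFrame) {
      if (prefix !== "") {
        flush();
        out.push(prefix); // keep the real text that preceded the inline frame
      }
      run += 1;
    } else if (line.trim() === "" && run > 0) {
      continue; // blank separator lines inside a run are dropped
    } else {
      flush();
      out.push(line); // non-frame content (errors, stack traces) survives
    }
  }
  flush();
  return out.join("\n");
}

function sanitizeErrorMessageForLog(message: string): string {
  const collapsed = suppressSseFrames(message);
  if (collapsed.length <= MAX_LOG_MESSAGE_LENGTH) return collapsed;
  const suffix = "... [truncated]"; // hypothetical suffix
  return collapsed.slice(0, MAX_LOG_MESSAGE_LENGTH - suffix.length) + suffix;
}
```

With this sketch, an input whose first line is "API Error: 400 event: message_start", followed by three more frame lines and then "TypeError: boom", collapses to three lines: "API Error: 400", "[4 SSE frames suppressed]", and "TypeError: boom". An unrecognized line such as "event: custom_thing" passes through unchanged.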

Apply at every leak site: both logToFile('Agent result with error:', ...) paths in agent-interface.ts, and the GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT abort + soft-error branches in agent-runner.ts. Classification still runs on the raw form (the 400 terminated regex matches the head of the message, not the SSE body) — only logging / user-surface callsites use the sanitized form.
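
The split between "classify on raw, surface the sanitized form" might look like the sketch below. Everything here is illustrative: classifyHead stands in for classifyApiErrorSubtype (whose real regex is not shown in this PR), buildErrorReport and the user-facing copy are hypothetical, and the sanitizer is passed in as a parameter so the sketch does not assume any particular implementation.

```typescript
// Hypothetical subtype union; the real classifier has its own set of subtypes.
type ApiErrorSubtype = "terminated_400" | "unclassified";

interface ErrorReport {
  subtype: ApiErrorSubtype;
  logLine: string;
  userMessage: string;
}

// Stand-in classifier: anchored at the head of the message, so it never
// depends on the SSE body that follows the status prefix.
function classifyHead(raw: string): ApiErrorSubtype {
  return /^API Error: 400\b/.test(raw) ? "terminated_400" : "unclassified";
}

function buildErrorReport(
  raw: string,
  sanitize: (message: string) => string,
): ErrorReport {
  const subtype = classifyHead(raw); // classification sees the raw string
  const safe = sanitize(raw); // everything surfaced below uses the sanitized one
  return {
    subtype,
    logLine: `Agent result with error: ${safe}`,
    userMessage:
      subtype === "terminated_400"
        ? "The gateway rejected the request mid-stream (400)." // placeholder copy
        : `Agent error: ${safe}`,
  };
}
```

Keeping the two strings in separate variables makes it harder for a later change to interpolate the raw message into a log line or status update by accident.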

Verdict

Pre-existing leak, not a regression from main. The logToFile('Agent result with error:', message.result) callsite has been there since the original SDK Result-message handler. Unrelated to #698 — that PR only adds TUI per-event status tracking and does not touch error logging or the model-stream pipeline.

Test plan

  • New unit tests src/lib/__tests__/agent-events-sse-suppression.test.ts (10 cases): fast-path no-op, contiguous-block collapse, inline-prefix split, real-error preservation alongside SSE noise, singular vs plural wording, bare-JSON form, unknown event-type passthrough, oversized-input pipeline (50KB SSE body → ≤2KB sanitized output)
  • pnpm test — 254 files / 3815 tests pass
  • pnpm test:bdd — 100 scenarios / 445 steps pass
  • pnpm lint — clean (only pre-existing unrelated warning)
  • pnpm build — clean

🤖 Generated with Claude Code


Note

Low Risk
Low risk: changes are limited to sanitizing/truncating error strings before logging and displaying them, without altering agent control flow or API retry behavior.

Overview
Prevents raw Anthropic streaming (SSE) protocol frames from leaking into user-visible logs, status messages, and Sentry context by collapsing recognized stream frames into a single [.. SSE frame(s) suppressed] marker and then enforcing the existing MAX_LOG_MESSAGE_LENGTH cap.

Adds suppressSseFrames/sanitizeErrorMessageForLog in agent-events.ts, updates agent-interface.ts and agent-runner.ts to use the sanitized form anywhere SDK/gateway error strings are logged or interpolated (while keeping subtype classification on the raw message), and introduces focused unit tests covering frame detection, inline-prefix handling, preservation of real errors, and truncation behavior.

Reviewed by Cursor Bugbot for commit 55c0305.

…assification of mid-stream 400s

The `[legacy] DEBUG Agent result with error: API Error: 400 ...` log
line was dumping the full failing SSE response body — hundreds of
`event: message_start` / `data: {"type":"content_block_delta",...}`
framing lines plus `partial_json` `tool_use` deltas — into the
user-visible TUI Logs tab when the Anthropic gateway terminated a
streaming response with a 4xx (most commonly `400 terminated`
mid-stream; Sentry #7442894144).

Two leaks: the SDK Result-message branch in `agent-interface.ts`
called `logToFile('Agent result with error:', message.result)` with
no truncation, AND `agent-runner.ts`'s `abortOnApiError` /
`GATEWAY_DOWN` / soft-error branches interpolated the raw `rawMessage`
straight into user-facing copy and Sentry context. Both surfaced the
SSE body verbatim — past sessions surfaced 50KB+ `log.message`
strings polluting orchestrator context.

Add `suppressSseFrames(message)` and `sanitizeErrorMessageForLog`
helpers to `agent-events.ts` that:
  - detect runs of Anthropic SSE protocol frames (event:/data:/bare-JSON
    forms for the eight known stream-event subtypes)
  - collapse each run into a single `[N SSE frames suppressed]` marker
  - cap the result at `MAX_LOG_MESSAGE_LENGTH` (existing 2KB budget)
  - preserve any non-frame content (real errors / stack traces riding
    alongside the protocol noise survive — same defense as the
    existing `stripStreamEventNoise` / `partitionHookBridgeRace` pair)

Apply the sanitizer at every callsite that logs / interpolates an
agent error string: the two `logToFile('Agent result with error:',
message.result)` paths in `agent-interface.ts`, and the GATEWAY_DOWN /
GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT branches plus the
soft-error pushStatus path in `agent-runner.ts`. Classification still
runs against the raw form (the `400 terminated` regex matches the head
of the message, not the SSE body) — only logging / user-surface paths
take the sanitized form.

Tests: `agent-events-sse-suppression.test.ts` (10 cases — fast-path
no-op, contiguous-block collapse, inline-prefix split, real-error
preservation, singular vs plural wording, bare-JSON form, unknown
event-type passthrough, oversized-input pipeline) covers the matcher
end-to-end and the truncation cap.

Verdict: pre-existing leak in `agent-interface.ts:4220` going back to
the original SDK Result handler; unrelated to #698 (which only adds
TUI per-event status; doesn't touch error logging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 9, 2026 21:38
@kelsonpw kelsonpw merged commit 95be862 into main May 10, 2026
14 checks passed