
fix(image-handling): close 4 audit follow-ups from PR #155 #156

Merged
ericleepi314 merged 1 commit into main from fix/image-handling-audit-followups on May 16, 2026

Conversation

@ericleepi314
Collaborator

Summary

Closes the four pre-existing gaps the PR #155 audit surfaced. Each was out of scope for the user's original bug but a real correctness hazard.

1. validate_images_for_api skipped on every provider path

Only the Anthropic-direct services/api/claude.py:call_model path ran validation; all provider.chat_stream_response paths (anthropic, openai, glm, deepseek, openrouter, minimax) bypassed it. A >5 MB base64 that slipped past the @-mention compression fallback would surface as an opaque API error.

  • Promoted into BaseProvider._prepare_messages — every provider now validates client-side. OpenAICompatibleProvider._prepare_messages calls super() first so validation runs on the Anthropic-shape payload BEFORE translation to image_url/file.
  • query._call_model_sync catches ImageSizeError and emits a media_size assistant API-error message (matches the existing server-side classification used by is_media_size_error).
  • agent_loop._call_provider_for_turn re-raises ImageSizeError instead of falling back to provider.chat(), since chat() runs the same _prepare_messages and would re-trigger the validation, masking the original error.
  • Bonus fix uncovered during the work: _iter_image_blocks only walked top-level content. Post-#154 (feat(read): port TS image-handling pipeline to Python, Tier C parity) the Read tool ships images inside tool_result.content, which the validator was blind to. New recursion with _MAX_TOOL_RESULT_DEPTH=32 depth cap walks nested tool_result content while bounding pathological depth.

2. OpenAI tool→user split: asymmetric tool_use_id correlation

The pre-existing [multimodal content delivered ...] placeholder only fired when the original tool_result was image-only. For text+image tool_results the text body landed on the role=tool message with no marker, and the synthetic role=user carrying the image_url had no link back to the parent tool_call at all.

  • Tool message now ALWAYS carries [multimodal content for tool_use_id=X delivered in the following message] when a synthetic user message follows (appended after the text body for text+image; standalone for image-only).
  • Synthetic user message leads with a [content for tool_use_id=X] text block, then the image_url / file blocks. Correlation visible from both directions.
  • Empty-content sentinel ([empty tool result]) honors OpenAI's "tool message content must be non-empty" requirement for content: [] edge cases.
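The bidirectional correlation described above can be sketched like this. The function name and exact marker wording here are illustrative assumptions based on the bullets; only the marker patterns and the empty-content sentinel come from the description.

```python
# Hypothetical sketch of the tool -> user split for OpenAI-compat providers:
# text stays on the role=tool message, images move to a synthetic role=user
# message, and markers on both sides carry the tool_use_id correlation.
def split_tool_result(tool_use_id: str, blocks: list[dict]) -> list[dict]:
    texts = [b["text"] for b in blocks if b.get("type") == "text"]
    images = [b for b in blocks if b.get("type") == "image"]
    body = "\n".join(texts)

    if not images:
        # OpenAI requires non-empty tool message content, so content=[] cases
        # get a sentinel rather than an empty string.
        content = body or "[empty tool result]"
        return [{"role": "tool", "tool_call_id": tool_use_id, "content": content}]

    marker = (f"[multimodal content for tool_use_id={tool_use_id} "
              "delivered in the following message]")
    tool_content = f"{body}\n{marker}" if body else marker

    # Synthetic user message leads with the back-reference, then the images.
    user_content = [{"type": "text",
                     "text": f"[content for tool_use_id={tool_use_id}]"}]
    for img in images:
        src = img["source"]
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{src['media_type']};base64,{src['data']}"},
        })
    return [
        {"role": "tool", "tool_call_id": tool_use_id, "content": tool_content},
        {"role": "user", "content": user_content},
    ]
```

With this shape the model can match the synthetic user message back to its tool_call from either direction, for both text+image and image-only tool results.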

3. @file.pdf mojibake (Bug A pattern for non-image binaries)

The pre-#155 text-mode open(path, "r", encoding="utf-8", errors="replace") produced utf-8 replacement chars over PDF/zip/docx binary bytes — exact Bug A failure mode, just for a different file class. TS routes PDFs through the Read tool's pages parameter, not @-mention.

  • Known binary extensions (pdf/zip/tar/docx/...) skip the text read. PDF gets a Read-tool hint mentioning the pages parameter; other binaries get a generic hint.
  • NUL-byte content sniff (_looks_like_binary) catches binaries whose extension we haven't enumerated.
  • BOM-aware decoder (_read_text_with_encoding) handles Windows-emitted UTF-16 LE/BE, UTF-32 LE/BE, and UTF-8 BOM files via the BOM-aware codecs (utf-16, utf-32, utf-8-sig) so the BOM is consumed rather than preserved as a literal U+FEFF char.
  • Post-decode garble detector (_decoded_text_looks_garbled): NUL char zero-tolerance + 3% U+FFFD threshold. Falls back to the binary branch when an adversarial spoofed-BOM blob would otherwise leak NUL bytes or replacement-char mojibake into the prompt.
  • New binary attachment kind in expand_at_mentions + REPL rendering (Skipped binary file <path>).

4. DocumentBlock translation missing in OpenAI-compat converter

_anthropic_image_block_to_openai handled type=image only. If a DocumentBlock ever reached an OpenAI-compat provider it would pass through unchanged and the server would reject or drop it.

  • New _anthropic_document_block_to_openai translates {type:"document", source:{type:"base64", media_type:"application/pdf", ...}} → {type:"file", file:{filename, file_data}}. Mirrors the image translator's defensive contract (None on bad shape, empty-data guard).
  • New _translate_anthropic_multimodal_block dispatcher centralises image+document translation for both the user-message and tool_result branches.
  • Defensive: no production path currently produces DocumentBlock for OpenAI-compat (PDFs go through Read tool's text path), but a future caller won't silently pass through.

Test plan

  • 94 new tests pass across:
    • tests/test_at_mention_binary_files.py (18) — PDF/zip/docx routing, UTF-16/UTF-32/UTF-8 BOM decoding with clean rendered output, NUL-heavy and high-FFD adversarial cases, image branch unchanged.
    • tests/test_base_provider_image_validation.py (10) — direct _prepare_messages validation for both BaseProvider and OpenAICompatibleProvider.
    • tests/test_agent_loop_image_size_propagation.py (3) — _call_provider_for_turn re-raises ImageSizeError; NotImplementedError + generic Exception fallbacks still work.
    • tests/test_openai_compat_document_translation.py (13) — document translator + multimodal tool_result split.
    • Extensions to test_image_validation.py (5: nested-tool_result, deep-nesting depth-cap, depth-cap-within-limit, sentinel paths), test_openai_compat_image_translation.py (1 new + 1 updated for tool_use_id correlation), test_query_loop.py (1: ImageSizeError → media_size mapping).
  • Wider suite: 5015 pass, 10 pre-existing failures unchanged (mcp/zhipuai import issues + workspace-path tests + REPL/skill — same failures on a stashed baseline).
  • Critic review loop: APPROVE after three rounds. Round-2 caught a UTF-16 BOM regression I introduced; round-3 fixed it with the proper-decode pipeline and added the post-decode garble detector with NUL zero-tolerance + 3% U+FFFD threshold.

🤖 Generated with Claude Code

Addresses the four pre-existing gaps the PR #155 audit surfaced:

1. validate_images_for_api ran only on the Anthropic-direct call_model
   path; all provider.chat_stream_response paths skipped it. Promoted
   into BaseProvider._prepare_messages so every provider validates
   client-side. query._call_model_sync catches ImageSizeError and
   surfaces a media_size assistant error (matches the existing
   server-side classification). agent_loop._call_provider_for_turn
   re-raises ImageSizeError instead of falling back to chat() (which
   would re-trigger the same validation). Validator also now walks
   images nested inside tool_result.content — post-PR #154 the Read
   tool returns image blocks there and the old walker missed them.

2. OpenAI tool->user split: symmetric correlation. Tool message body
   now always carries [multimodal content for tool_use_id=X ...] when
   the synthetic role=user message follows; the user message leads
   with a [content for tool_use_id=X] text block so the link is
   visible from both directions. Empty content sentinel ([empty tool
   result]) guards the "non-empty content required" invariant for
   content=[] cases.

3. @file.pdf and other binary @-mentions no longer hit the text-mode
   open() that produced mojibake-in-system-reminder. Known binary
   extensions (pdf/zip/docx/...) plus a NUL-byte content sniff route
   to a new "binary" attachment kind with a Read-tool hint (PDF
   specifically mentions the pages parameter). BOM-aware decoder
   (_read_text_with_encoding) handles Windows-emitted UTF-16/UTF-32
   files via the BOM-aware codecs (utf-16, utf-32, utf-8-sig) so the
   BOM is consumed rather than preserved as a literal U+FEFF char.
   Post-decode garbled-text detector (NUL zero-tolerance + 3% U+FFFD
   threshold) falls back to a binary attachment when an adversarial
   spoofed-BOM blob would otherwise leak.

4. _anthropic_image_block_to_openai now has a sibling
   _anthropic_document_block_to_openai that translates
   Anthropic DocumentBlock (PDF) to OpenAI's {type:"file"} shape.
   Defensive: no production path currently produces DocumentBlock
   for OpenAI-compat, but future ones won't silently pass through.

Test plan:
- 94 new tests across test_at_mention_binary_files.py,
  test_base_provider_image_validation.py,
  test_agent_loop_image_size_propagation.py,
  test_openai_compat_document_translation.py, plus extensions to
  test_image_validation.py, test_openai_compat_image_translation.py,
  test_query_loop.py.
- Wider suite: 5015 pass / 10 pre-existing failures unchanged.
- Critic review loop: APPROVE after three rounds (round-2 caught a
  UTF-16 BOM regression I introduced; round-3 fixed it and added the
  post-decode garble detector).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
