
fix(image-handling): close 4 audit follow-ups from PR #155 #156

Merged
ericleepi314 merged 1 commit into main from fix/image-handling-audit-followups on May 16, 2026

Conversation

@ericleepi314
Collaborator

Summary

Closes the four pre-existing gaps the PR #155 audit surfaced. Each was out of scope for the user's original bug but a real correctness hazard.

1. validate_images_for_api skipped on every provider path

Only the Anthropic-direct services/api/claude.py:call_model path ran validation; all provider.chat_stream_response paths (anthropic, openai, glm, deepseek, openrouter, minimax) bypassed it. A >5 MB base64 that slipped past the @-mention compression fallback would surface as an opaque API error.

  • Promoted into BaseProvider._prepare_messages — every provider now validates client-side. OpenAICompatibleProvider._prepare_messages calls super() first so validation runs on the Anthropic-shape payload BEFORE translation to image_url/file.
  • query._call_model_sync catches ImageSizeError and emits a media_size assistant API-error message (matches the existing server-side classification used by is_media_size_error).
  • agent_loop._call_provider_for_turn re-raises ImageSizeError instead of falling back to provider.chat(), since chat() runs the same _prepare_messages and would re-trigger the validation, masking the original error.
  • Bonus fix uncovered during the work: _iter_image_blocks only walked top-level content. Post-#154 (feat(read): port TS image-handling pipeline to Python, Tier C parity) the Read tool ships images inside tool_result.content, which the validator was blind to. New recursion with _MAX_TOOL_RESULT_DEPTH=32 depth cap walks nested tool_result content while bounding pathological depth.

2. OpenAI tool→user split: asymmetric tool_use_id correlation

The pre-existing [multimodal content delivered ...] placeholder only fired when the original tool_result was image-only. For text+image tool_results the text body landed on the role=tool message with no marker, and the synthetic role=user carrying the image_url had no link back to the parent tool_call at all.

  • Tool message now ALWAYS carries [multimodal content for tool_use_id=X delivered in the following message] when a synthetic user message follows (appended after the text body for text+image; standalone for image-only).
  • Synthetic user message leads with a [content for tool_use_id=X] text block, then the image_url / file blocks. Correlation visible from both directions.
  • Empty-content sentinel ([empty tool result]) honors OpenAI's "tool message content must be non-empty" requirement for content: [] edge cases.
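The bidirectional correlation described above can be sketched like this. The function name and exact marker wording here are illustrative assumptions based on the bullets; only the marker patterns and the empty-content sentinel come from the description.

```python
# Hypothetical sketch of the tool -> user split for OpenAI-compat providers:
# text stays on the role=tool message, images move to a synthetic role=user
# message, and markers on both sides carry the tool_use_id correlation.
def split_tool_result(tool_use_id: str, blocks: list[dict]) -> list[dict]:
    texts = [b["text"] for b in blocks if b.get("type") == "text"]
    images = [b for b in blocks if b.get("type") == "image"]
    body = "\n".join(texts)

    if not images:
        # OpenAI requires non-empty tool message content, so content=[] cases
        # get a sentinel rather than an empty string.
        content = body or "[empty tool result]"
        return [{"role": "tool", "tool_call_id": tool_use_id, "content": content}]

    marker = (f"[multimodal content for tool_use_id={tool_use_id} "
              "delivered in the following message]")
    tool_content = f"{body}\n{marker}" if body else marker

    # Synthetic user message leads with the back-reference, then the images.
    user_content = [{"type": "text",
                     "text": f"[content for tool_use_id={tool_use_id}]"}]
    for img in images:
        src = img["source"]
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{src['media_type']};base64,{src['data']}"},
        })
    return [
        {"role": "tool", "tool_call_id": tool_use_id, "content": tool_content},
        {"role": "user", "content": user_content},
    ]
```

With this shape the model can match the synthetic user message back to its tool_call from either direction, for both text+image and image-only tool results.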

3. @file.pdf mojibake (Bug A pattern for non-image binaries)

The pre-#155 text-mode open(path, "r", encoding="utf-8", errors="replace") produced utf-8 replacement chars over PDF/zip/docx binary bytes — exact Bug A failure mode, just for a different file class. TS routes PDFs through the Read tool's pages parameter, not @-mention.

  • Known binary extensions (pdf/zip/tar/docx/...) skip the text read. PDF gets a Read-tool hint mentioning the pages parameter; other binaries get a generic hint.
  • NUL-byte content sniff (_looks_like_binary) catches binaries whose extension we haven't enumerated.
  • BOM-aware decoder (_read_text_with_encoding) handles Windows-emitted UTF-16 LE/BE, UTF-32 LE/BE, and UTF-8 BOM files via the BOM-aware codecs (utf-16, utf-32, utf-8-sig) so the BOM is consumed rather than preserved as a literal U+FEFF char.
  • Post-decode garble detector (_decoded_text_looks_garbled): NUL char zero-tolerance + 3% U+FFFD threshold. Falls back to the binary branch when an adversarial spoofed-BOM blob would otherwise leak NUL bytes or replacement-char mojibake into the prompt.
  • New binary attachment kind in expand_at_mentions + REPL rendering (Skipped binary file <path>).

4. DocumentBlock translation missing in OpenAI-compat converter

_anthropic_image_block_to_openai handled type=image only. If a DocumentBlock ever reached an OpenAI-compat provider it would pass through unchanged and the server would reject or drop it.

  • New _anthropic_document_block_to_openai translates {type:"document", source:{type:"base64", media_type:"application/pdf", ...}} → {type:"file", file:{filename, file_data}}. Mirrors the image translator's defensive contract (None on bad shape, empty-data guard).
  • New _translate_anthropic_multimodal_block dispatcher centralises image+document translation for both the user-message and tool_result branches.
  • Defensive: no production path currently produces DocumentBlock for OpenAI-compat (PDFs go through Read tool's text path), but a future caller won't silently pass through.

Test plan

  • 94 new tests pass across:
    • tests/test_at_mention_binary_files.py (18) — PDF/zip/docx routing, UTF-16/UTF-32/UTF-8 BOM decoding with clean rendered output, NUL-heavy and high-FFD adversarial cases, image branch unchanged.
    • tests/test_base_provider_image_validation.py (10) — direct _prepare_messages validation for both BaseProvider and OpenAICompatibleProvider.
    • tests/test_agent_loop_image_size_propagation.py (3) — _call_provider_for_turn re-raises ImageSizeError; NotImplementedError + generic Exception fallbacks still work.
    • tests/test_openai_compat_document_translation.py (13) — document translator + multimodal tool_result split.
    • Extensions to test_image_validation.py (5: nested-tool_result, deep-nesting depth-cap, depth-cap-within-limit, sentinel paths), test_openai_compat_image_translation.py (1 new + 1 updated for tool_use_id correlation), test_query_loop.py (1: ImageSizeError → media_size mapping).
  • Wider suite: 5015 pass, 10 pre-existing failures unchanged (mcp/zhipuai import issues + workspace-path tests + REPL/skill — same failures on a stashed baseline).
  • Critic review loop: APPROVE after three rounds. Round-2 caught a UTF-16 BOM regression I introduced; round-3 fixed it with the proper-decode pipeline and added the post-decode garble detector with NUL zero-tolerance + 3% U+FFFD threshold.

🤖 Generated with Claude Code

Addresses the four pre-existing gaps the PR #155 audit surfaced:

1. validate_images_for_api ran only on the Anthropic-direct call_model
   path; all provider.chat_stream_response paths skipped it. Promoted
   into BaseProvider._prepare_messages so every provider validates
   client-side. query._call_model_sync catches ImageSizeError and
   surfaces a media_size assistant error (matches the existing
   server-side classification). agent_loop._call_provider_for_turn
   re-raises ImageSizeError instead of falling back to chat() (which
   would re-trigger the same validation). Validator also now walks
   images nested inside tool_result.content — post-PR #154 the Read
   tool returns image blocks there and the old walker missed them.

2. OpenAI tool->user split: symmetric correlation. Tool message body
   now always carries [multimodal content for tool_use_id=X ...] when
   the synthetic role=user message follows; the user message leads
   with a [content for tool_use_id=X] text block so the link is
   visible from both directions. Empty content sentinel ([empty tool
   result]) guards the "non-empty content required" invariant for
   content=[] cases.

3. @file.pdf and other binary @-mentions no longer hit the text-mode
   open() that produced mojibake-in-system-reminder. Known binary
   extensions (pdf/zip/docx/...) plus a NUL-byte content sniff route
   to a new "binary" attachment kind with a Read-tool hint (PDF
   specifically mentions the pages parameter). BOM-aware decoder
   (_read_text_with_encoding) handles Windows-emitted UTF-16/UTF-32
   files via the BOM-aware codecs (utf-16, utf-32, utf-8-sig) so the
   BOM is consumed rather than preserved as a literal U+FEFF char.
   Post-decode garbled-text detector (NUL zero-tolerance + 3% U+FFFD
   threshold) falls back to a binary attachment when an adversarial
   spoofed-BOM blob would otherwise leak.

4. _anthropic_image_block_to_openai now has a sibling
   _anthropic_document_block_to_openai that translates
   Anthropic DocumentBlock (PDF) to OpenAI's {type:"file"} shape.
   Defensive: no production path currently produces DocumentBlock
   for OpenAI-compat, but future ones won't silently pass through.

Test plan:
- 94 new tests across test_at_mention_binary_files.py,
  test_base_provider_image_validation.py,
  test_agent_loop_image_size_propagation.py,
  test_openai_compat_document_translation.py, plus extensions to
  test_image_validation.py, test_openai_compat_image_translation.py,
  test_query_loop.py.
- Wider suite: 5015 pass / 10 pre-existing failures unchanged.
- Critic review loop: APPROVE after three rounds (round-2 caught a
  UTF-16 BOM regression I introduced; round-3 fixed it and added the
  post-decode garble detector).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
