fix: @image.png mentions hallucinate; image tool_result stringified by ericleepi314 · Pull Request #155 · agentforce314/clawcodex

ericleepi314 · 2026-05-16T16:35:47Z

Summary

User reported wildly wrong output when asking about an @-mentioned image. Same provider + model:

Python (before): "The screenshot shows a Google search results page for the query 'how to use claude code'."
TypeScript: "The image shows an aerial view of a white research or expedition-style ship..."

Two independent bugs both contribute. The recently merged PR #154 (image-handling parity) fixed the Read tool's image pipeline but missed both of these paths.

Bug A — `@image.png` reads PNG as UTF-8 text and floods system-reminder with mojibake

src/command_system/input_processing.py:expand_at_mentions was opening every @-mentioned file with open(path, "r", encoding="utf-8", errors="replace"). For a PNG that produces utf-8 replacement chars over the binary bytes — and format_at_mention_attachments wrapped the garbage in <system-reminder>Contents of foo.png:\n\``\n\n```\n` and prepended it to the user message. The model latched onto ASCII fragments (PNG XMP "Screenshot" metadata, type tags) and hallucinated the rest.

Fix: detect image extensions (png/jpg/jpeg/gif/webp, aligned with the Read tool's IMAGE_EXTENSIONS) BEFORE the text-mode open(). Image files go through the same pipeline the Read tool uses (post-#154): bounded byte read → magic-byte format sniff → maybe_resize_image → compress_image_to_token_budget fallback when still over the 5 MB base64 API ceiling. Return a kind="image" attachment carrying base64 + media_type. The REPL builds a mixed [TextBlock, ImageBlock, ...] user message via new helper build_image_content_blocks so the API receives a real multimodal payload — matching TS's auto-Read-on-@image behaviour.

Bug B — `_dispatch_single_tool` JSON-stringifies list content, destroying image blocks

src/query/query.py:_dispatch_single_tool was running json.dumps on any non-string tool_result content, turning Read's properly-shaped [{"type": "image", "source": {...}}] into a text JSON blob. The Anthropic API then received an image tool_result whose content was text JSON; the model literally could not see the image and would continue hallucinating even after explicitly calling Read. PR #154's regression test only checked for the synthetic-error placeholder so this slipped through.

Fix: preserve list shape end-to-end. ToolResultBlock.content already accepts str | list[Any]; maybe_persist_large_tool_result already short-circuits image lists via _has_image_block; content_block_to_dict already serializes per-element. The dispatcher just needed to stop coercing.

Collateral fix — Anthropic image blocks didn't translate for OpenAI-compatible providers

_convert_anthropic_messages_to_openai was passing Anthropic image blocks through unchanged — OpenAI/GLM/Minimax/DeepSeek/OpenRouter either rejected the request or silently dropped the image. This was pre-existing (Read tool images had the same problem post-#154) but my @image.png work would have widened the attack surface, so addressed it here.

New _anthropic_image_block_to_openai translates {"type": "image", "source": {"type": "base64", ...}} → {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}.
For tool_result content with images: emits role=tool (text body, with a placeholder when image-only) followed by a synthetic role=user carrying the image_url blocks. Unavoidable split since OpenAI's role=tool doesn't accept multimodal content. Comment in-place documents the model-perception risk.
Empty-data guard returns None rather than producing data:image/png;base64, (which OpenAI rejects with a confusing error).

Signature widening

Conversation.add_user_message(content: str | list[ContentBlock])
QueryEngine.submit_message(prompt: str | list[ContentBlock])

Both bodies already supported list content via _normalize_message_content / MessageContent; only the annotations changed, so existing string-callers (TUI, headless) are unaffected.

REPL UX

Image @-mentions skip the direct-stream short-circuit (it can only carry plain text) and print Read image <path> instead of Listed file <path>.

Test plan

29 new tests across tests/test_at_mention_images.py (17), tests/test_openai_compat_image_translation.py (13), and tests/parity/test_e2e_file_read.py (+2 covering Bug B dispatcher list-preservation and the aggregate-budget pressure path).
Bug A coverage: single + multi-image + mixed text+image @-mentions; magic-byte detection beats extension (misnamed .png with JPEG bytes); empty/undecodable images dropped silently; oversize-image compression brings base64 under API limit (real 4000×4000 random-noise PNG); doubly-oversize image dropped (monkeypatched); end-to-end through normalize_messages_for_api.
Bug B coverage: dispatcher returns list content with {"type": "image", ...}; full pipeline through normalize_messages_for_api yields proper image block in API payload; aggregate-pressure path (image still survives at tool_result_chars_so_far = MAX - 1, counter NOT bumped by image bytes).
OpenAI translation coverage: valid/invalid block translation; JPEG/PNG/missing-media-type defaults; empty-data guard; tool_result image-only + image+text + text-only-no-regression paths.
Wider suite: 4984 pass, 0 new failures. 9 pre-existing failures (mcp/zhipuai import issues + workspace-path tests) unchanged.
Manual REPL smoke: critic-agent verified the BLOCKER fix against a real 48 MB random-noise PNG (compresses to 1.16 MB base64, doubly-oversize drops). Final manual REPL test with a real screenshot still recommended before declaring shipped.
Critic review loop: APPROVE after three rounds.

🤖 Generated with Claude Code

@image

…mage tool_result shape User-reported correctness bug: the Python build hallucinated wildly when asked about an @-mentioned image ("a Google search results page") while TS correctly described it ("an aerial view of a research ship"). Root cause was two independent bugs both contributing to the failure. Bug A: ``expand_at_mentions`` opened every @-mentioned file with ``open(path, "r", encoding="utf-8", errors="replace")`` regardless of extension. For a PNG that produced mojibake (utf-8 replacement chars over binary bytes) which ``format_at_mention_attachments`` wrapped in a ``<system-reminder>Contents of foo.png:`` block and prepended to the user message. The model latched onto ASCII fragments (XMP "Screenshot" metadata, type tags) and hallucinated. Fix: detect image extensions (png/jpg/jpeg/gif/webp, matching the Read tool's IMAGE_EXTENSIONS) BEFORE the text-mode open. For image files, run the same image pipeline the Read tool now uses (post PR #154): bounded read -> magic-byte format sniff -> resize-to-envelope -> ``compress_image_to_token_budget`` fallback when still over the 5 MB base64 API ceiling. Return a ``kind="image"`` attachment carrying ``base64`` + ``media_type``. New helper ``build_image_content_blocks`` materialises ``ImageBlock`` instances and the REPL composes a mixed ``[TextBlock, ImageBlock, ...]`` user message so the API receives a real multimodal payload matching the TS auto-Read-on-@image behaviour. Bug B: ``_dispatch_single_tool`` ran ``json.dumps`` on any non-string tool_result content, turning Read's properly-shaped image list ``[{"type": "image", "source": {...}}]`` into a text JSON blob. The Anthropic API then received an image tool_result whose content was text JSON; the model literally could not see the image and would hallucinate. PR #154's regression test only checked for the synthetic- error placeholder so this slipped through. Fix: preserve list shape end-to-end. ``ToolResultBlock.content`` already accepts ``str | list[Any]``, ``maybe_persist_large_tool_result`` already short- circuits image lists via ``_has_image_block``, and ``content_block_to_dict`` already serializes per-element -- the dispatcher just needed to stop coercing. Collateral fix: PR #154's Read tool image content also failed silently on OpenAI-compatible providers (GLM, Minimax, DeepSeek, OpenRouter) because ``_convert_anthropic_messages_to_openai`` passed Anthropic image blocks through unchanged and OpenAI rejects them. New helper ``_anthropic_image_block_to_openai`` translates the base64-source shape to OpenAI's ``image_url`` data-URI shape. For tool_result image content the converter now emits ``role=tool`` (text body, with a placeholder when the original was image-only) followed by a synthetic ``role=user`` message carrying the image_url blocks, since OpenAI's ``role=tool`` doesn't accept multimodal content. Empty-data guard returns ``None`` rather than producing ``data:image/png;base64,``. Signature widening to support the multi-block flow: - ``Conversation.add_user_message(content: str | list[ContentBlock])`` - ``QueryEngine.submit_message(prompt: str | list[ContentBlock])`` Both bodies already supported list content via ``_normalize_message_ content`` / ``MessageContent``; only the type annotations changed, so existing string-callers (TUI, headless) are unaffected. REPL UX: image @-mentions skip the direct-stream short-circuit (it can only carry plain text) and print ``Read image <path>`` instead of ``Listed file <path>`` so the user sees the image was attached. Test coverage (29 new tests): - tests/test_at_mention_images.py (17): single + multi-image + mixed text+image @-mentions, magic-byte detection beats extension, empty/ undecodable images dropped silently, oversize-image compression brings base64 under API limit (real 4000x4000 random-noise PNG), doubly-oversize image dropped (monkeypatched), end-to-end through ``normalize_messages_for_api``. - tests/test_openai_compat_image_translation.py (13): user message text+image, multi-image, JPEG/PNG/missing-media-type defaults, empty-data guard, tool_result image-only + image+text + text-only- no-regression. - tests/parity/test_e2e_file_read.py (+2): Bug B dispatcher list- preservation; aggregate-budget pressure path (image still survives when ``tool_result_chars_so_far = MAX - 1``, counter NOT bumped by image bytes). Wider suite: 4984 pass, 0 new failures, 9 pre-existing failures (mcp/ zhipuai import issues + workspace-path tests) unchanged. Critic review loop: APPROVE after three rounds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ollowups fix(image-handling): close 4 audit follow-ups from PR #155

ericleepi314 merged commit 25dbdc7 into main May 16, 2026

ericleepi314 mentioned this pull request May 16, 2026

fix(image-handling): close 4 audit follow-ups from PR #155 #156

Merged

3 tasks

ericleepi314 added a commit that referenced this pull request May 16, 2026

Merge pull request #156 from agentforce314/fix/image-handling-audit-f…

2581a20

…ollowups fix(image-handling): close 4 audit follow-ups from PR #155

This was referenced May 16, 2026

docs(readme): refresh stats + news + Core Systems for image/ESC work #157

Merged

docs(readme): shorten 2026-05-16 image-handling-parity news item #158

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: @image.png mentions hallucinate; image tool_result stringified#155

fix: @image.png mentions hallucinate; image tool_result stringified#155
ericleepi314 merged 1 commit into
mainfrom
fix/at-image-mention-hallucination

ericleepi314 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericleepi314 commented May 16, 2026

Summary

Bug A — @image.png reads PNG as UTF-8 text and floods system-reminder with mojibake

Bug B — _dispatch_single_tool JSON-stringifies list content, destroying image blocks

Collateral fix — Anthropic image blocks didn't translate for OpenAI-compatible providers

Signature widening

REPL UX

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug A — `@image.png` reads PNG as UTF-8 text and floods system-reminder with mojibake

Bug B — `_dispatch_single_tool` JSON-stringifies list content, destroying image blocks