fix: @image.png mentions hallucinate; image tool_result stringified#155
Merged
Conversation
…mage tool_result shape
User-reported correctness bug: the Python build hallucinated wildly when
asked about an @-mentioned image ("a Google search results page") while
TS correctly described it ("an aerial view of a research ship"). Root
cause was two independent bugs both contributing to the failure.
Bug A: ``expand_at_mentions`` opened every @-mentioned file with
``open(path, "r", encoding="utf-8", errors="replace")`` regardless of
extension. For a PNG that produced mojibake (utf-8 replacement chars
over binary bytes) which ``format_at_mention_attachments`` wrapped in a
``<system-reminder>Contents of foo.png:`` block and prepended to the
user message. The model latched onto ASCII fragments (XMP "Screenshot"
metadata, type tags) and hallucinated. Fix: detect image extensions
(png/jpg/jpeg/gif/webp, matching the Read tool's IMAGE_EXTENSIONS)
BEFORE the text-mode open. For image files, run the same image
pipeline the Read tool now uses (post PR #154): bounded read ->
magic-byte format sniff -> resize-to-envelope ->
``compress_image_to_token_budget`` fallback when still over the 5 MB
base64 API ceiling. Return a ``kind="image"`` attachment carrying
``base64`` + ``media_type``. New helper ``build_image_content_blocks``
materialises ``ImageBlock`` instances and the REPL composes a mixed
``[TextBlock, ImageBlock, ...]`` user message so the API receives a
real multimodal payload matching the TS auto-Read-on-@image behaviour.
Bug B: ``_dispatch_single_tool`` ran ``json.dumps`` on any non-string
tool_result content, turning Read's properly-shaped image list
``[{"type": "image", "source": {...}}]`` into a text JSON blob. The
Anthropic API then received an image tool_result whose content was
text JSON; the model literally could not see the image and would
hallucinate. PR #154's regression test only checked for the synthetic-
error placeholder so this slipped through. Fix: preserve list shape
end-to-end. ``ToolResultBlock.content`` already accepts ``str |
list[Any]``, ``maybe_persist_large_tool_result`` already short-
circuits image lists via ``_has_image_block``, and
``content_block_to_dict`` already serializes per-element -- the
dispatcher just needed to stop coercing.
Collateral fix: PR #154's Read tool image content also failed silently
on OpenAI-compatible providers (GLM, Minimax, DeepSeek, OpenRouter)
because ``_convert_anthropic_messages_to_openai`` passed Anthropic
image blocks through unchanged and OpenAI rejects them. New helper
``_anthropic_image_block_to_openai`` translates the base64-source
shape to OpenAI's ``image_url`` data-URI shape. For tool_result image
content the converter now emits ``role=tool`` (text body, with a
placeholder when the original was image-only) followed by a synthetic
``role=user`` message carrying the image_url blocks, since OpenAI's
``role=tool`` doesn't accept multimodal content. Empty-data guard
returns ``None`` rather than producing ``data:image/png;base64,``.
Signature widening to support the multi-block flow:
- ``Conversation.add_user_message(content: str | list[ContentBlock])``
- ``QueryEngine.submit_message(prompt: str | list[ContentBlock])``
Both bodies already supported list content via ``_normalize_message_
content`` / ``MessageContent``; only the type annotations changed, so
existing string-callers (TUI, headless) are unaffected.
REPL UX: image @-mentions skip the direct-stream short-circuit (it
can only carry plain text) and print ``Read image <path>`` instead of
``Listed file <path>`` so the user sees the image was attached.
Test coverage (29 new tests):
- tests/test_at_mention_images.py (17): single + multi-image + mixed
text+image @-mentions, magic-byte detection beats extension, empty/
undecodable images dropped silently, oversize-image compression
brings base64 under API limit (real 4000x4000 random-noise PNG),
doubly-oversize image dropped (monkeypatched), end-to-end through
``normalize_messages_for_api``.
- tests/test_openai_compat_image_translation.py (13): user message
text+image, multi-image, JPEG/PNG/missing-media-type defaults,
empty-data guard, tool_result image-only + image+text + text-only-
no-regression.
- tests/parity/test_e2e_file_read.py (+2): Bug B dispatcher list-
preservation; aggregate-budget pressure path (image still survives
when ``tool_result_chars_so_far = MAX - 1``, counter NOT bumped by
image bytes).
Wider suite: 4984 pass, 0 new failures, 9 pre-existing failures (mcp/
zhipuai import issues + workspace-path tests) unchanged.
Critic review loop: APPROVE after three rounds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
ericleepi314
added a commit
that referenced
this pull request
May 16, 2026
…ollowups fix(image-handling): close 4 audit follow-ups from PR #155
This was referenced May 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
User reported wildly wrong output when asking about an
@-mentioned image. Same provider + model:Two independent bugs both contribute. The recently merged PR #154 (image-handling parity) fixed the Read tool's image pipeline but missed both of these paths.
Bug A —
@image.pngreads PNG as UTF-8 text and floods system-reminder with mojibakesrc/command_system/input_processing.py:expand_at_mentionswas opening every@-mentioned file withopen(path, "r", encoding="utf-8", errors="replace"). For a PNG that produces utf-8 replacement chars over the binary bytes — andformat_at_mention_attachmentswrapped the garbage in<system-reminder>Contents of foo.png:\n\``\n\n```\n` and prepended it to the user message. The model latched onto ASCII fragments (PNG XMP "Screenshot" metadata, type tags) and hallucinated the rest.Fix: detect image extensions (
png/jpg/jpeg/gif/webp, aligned with the Read tool'sIMAGE_EXTENSIONS) BEFORE the text-modeopen(). Image files go through the same pipeline the Read tool uses (post-#154): bounded byte read → magic-byte format sniff →maybe_resize_image→compress_image_to_token_budgetfallback when still over the 5 MB base64 API ceiling. Return akind="image"attachment carryingbase64+media_type. The REPL builds a mixed[TextBlock, ImageBlock, ...]user message via new helperbuild_image_content_blocksso the API receives a real multimodal payload — matching TS's auto-Read-on-@imagebehaviour.Bug B —
_dispatch_single_toolJSON-stringifies list content, destroying image blockssrc/query/query.py:_dispatch_single_toolwas runningjson.dumpson any non-string tool_result content, turning Read's properly-shaped[{"type": "image", "source": {...}}]into a text JSON blob. The Anthropic API then received an image tool_result whose content was text JSON; the model literally could not see the image and would continue hallucinating even after explicitly calling Read. PR #154's regression test only checked for the synthetic-error placeholder so this slipped through.Fix: preserve list shape end-to-end.
ToolResultBlock.contentalready acceptsstr | list[Any];maybe_persist_large_tool_resultalready short-circuits image lists via_has_image_block;content_block_to_dictalready serializes per-element. The dispatcher just needed to stop coercing.Collateral fix — Anthropic image blocks didn't translate for OpenAI-compatible providers
_convert_anthropic_messages_to_openaiwas passing Anthropic image blocks through unchanged — OpenAI/GLM/Minimax/DeepSeek/OpenRouter either rejected the request or silently dropped the image. This was pre-existing (Read tool images had the same problem post-#154) but my@image.pngwork would have widened the attack surface, so addressed it here._anthropic_image_block_to_openaitranslates{"type": "image", "source": {"type": "base64", ...}}→{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}.role=tool(text body, with a placeholder when image-only) followed by a syntheticrole=usercarrying theimage_urlblocks. Unavoidable split since OpenAI'srole=tooldoesn't accept multimodal content. Comment in-place documents the model-perception risk.Nonerather than producingdata:image/png;base64,(which OpenAI rejects with a confusing error).Signature widening
Conversation.add_user_message(content: str | list[ContentBlock])QueryEngine.submit_message(prompt: str | list[ContentBlock])Both bodies already supported list content via
_normalize_message_content/MessageContent; only the annotations changed, so existing string-callers (TUI, headless) are unaffected.REPL UX
Image @-mentions skip the direct-stream short-circuit (it can only carry plain text) and print
Read image <path>instead ofListed file <path>.Test plan
tests/test_at_mention_images.py(17),tests/test_openai_compat_image_translation.py(13), andtests/parity/test_e2e_file_read.py(+2 covering Bug B dispatcher list-preservation and the aggregate-budget pressure path).@-mentions; magic-byte detection beats extension (misnamed.pngwith JPEG bytes); empty/undecodable images dropped silently; oversize-image compression brings base64 under API limit (real 4000×4000 random-noise PNG); doubly-oversize image dropped (monkeypatched); end-to-end throughnormalize_messages_for_api.{"type": "image", ...}; full pipeline throughnormalize_messages_for_apiyields proper image block in API payload; aggregate-pressure path (image still survives attool_result_chars_so_far = MAX - 1, counter NOT bumped by image bytes).🤖 Generated with Claude Code