feat(read): port TS image-handling pipeline to Python (Tier C parity) #154
Merged
Brings the Read tool's image handling to full parity with the TypeScript
reference at typescript/src/utils/imageResizer.ts + FileReadTool.ts image
flow. Pillow replaces sharp.
What changes for the user:
- Oversized images (>3.75 MB) are now downscaled to fit the API's 5 MB
base64 limit instead of being rejected outright.
- File extension is no longer trusted blindly — magic-byte detection
sniffs the actual format (PNG/JPEG/GIF/WebP), so a misnamed file.png
containing JPEG bytes gets media_type=image/jpeg correctly.
- Read emits a supplemental isMeta UserMessage carrying image dimensions
("Multiply coordinates by X to map to original image") for
coordinate-mapping prompts.
- Read on a PDF with the pages="N-M" parameter now extracts page images
via poppler's pdftoppm and routes them through the same image pipeline.
- Bash commands that print data:image/...;base64,... (matplotlib,
mermaid, etc.) surface as image content blocks in the tool_result
instead of garbage text.
- Images >5 MB base64 are rejected pre-API with an actionable error
rather than letting the API round-trip fail.
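A minimal sketch of the magic-byte sniffing described above, assuming only the four formats listed (PNG/JPEG/GIF/WebP). The function name sniff_image_media_type is hypothetical and is not the name used in image_processor.py; the byte signatures are the standard ones for each format.

```python
from typing import Optional


def sniff_image_media_type(data: bytes) -> Optional[str]:
    """Return a media type from the leading bytes, or None if unrecognized.

    Hypothetical helper for illustration; the file extension is never consulted.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

With this approach a file named file.png whose bytes start with FF D8 FF is reported as image/jpeg, which is the misnamed-file case the bullet above describes.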
What changes architecturally:
- New module src/utils/image_processor.py: magic-byte detection,
bounded read_file_bytes, maybe_resize_image (3.75 MB / 1568px envelope,
JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget
(progressive scale × quality, 800×800 PNG palette, 400×400 q=20
ultra-fallback), compress_image_to_token_budget,
create_image_metadata_text.
- New module src/utils/image_validation.py: validate_images_for_api
walks messages, rejects any base64 image >5 MB before the API call.
Wired into claude.py:call_model.
- New module src/utils/pdf_extraction.py: extract_pdf_pages shells out
to pdftoppm, 100 MB input cap, empty-file guard, install-hint error
when poppler-utils is missing.
- New module src/tool_system/tools/bash/image_output.py:
is_image_output / parse_data_uri / build_image_tool_result for shell
image output, with 25 MB pre-decode cap to prevent OOM from hostile
shells.
- src/tool_system/tools/read.py image branch rewritten: bounded read
-> magic-byte sniff -> resize -> token-budget compress -> returns
type='image' with dimensions field + supplemental metadata message.
PDF pages= path wired with try/finally tempdir cleanup. Old
MAX_IMAGE_SIZE_BYTES rejection removed.
- src/query/query.py _dispatch_single_tool now returns
tuple[UserMessage, list[UserMessage]] (primary, extras) so
result.new_messages reach the model. Callers concatenate all
primaries first, then all extras, so multi-tool batches don't break
ensure_tool_result_pairing (a regression the critic caught and the
primaries-first ordering fixes).
- New EventType.IMAGE_PROCESSING analytics event with subtype in data.
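The pre-API size check attributed to image_validation.py above can be sketched roughly as follows. This is not the module's actual code: it assumes messages are plain dicts in the Anthropic Messages shape (content blocks with type "image" and a base64 source), and it assumes "5 MB" means 5 * 1024 * 1024 base64 characters. The helper names are hypothetical. The base64_length helper also shows why a 3.75 MB raw envelope lines up with a 5 MB base64 limit: base64 expands data by a factor of 4/3.

```python
from typing import Any

MAX_BASE64_BYTES = 5 * 1024 * 1024  # assumed reading of the "5 MB" limit


def base64_length(raw_len: int) -> int:
    """Length of the padded base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


def oversized_images(
    messages: list[dict[str, Any]], limit: int = MAX_BASE64_BYTES
) -> list[tuple[int, int, int]]:
    """Return (message_index, block_index, base64_size) for images over limit."""
    found = []
    for mi, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image blocks
        for bi, block in enumerate(content):
            if not isinstance(block, dict) or block.get("type") != "image":
                continue
            source = block.get("source") or {}
            if source.get("type") != "base64":
                continue
            size = len(source.get("data", ""))
            if size > limit:
                found.append((mi, bi, size))
    return found
```

A caller would raise an actionable error when the returned list is non-empty instead of sending the request. Under the assumed definition of a megabyte, base64_length(3932160) is 5242880, so 3.75 MB of raw bytes encodes to exactly 5 MB.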
Dependency: Pillow>=10.0 (prebuilt binary wheels on common platforms, so
no native compilation is needed).
System dependency for PDF page extraction (optional, runtime-detected):
pdftoppm from poppler-utils.
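A sketch of how runtime detection and the pages="N-M" parameter could fit together, under stated assumptions: the function names, the 100 dpi default, and the wording of the install hint are all hypothetical and not taken from pdf_extraction.py. The pdftoppm flags used (-png, -r, -f, -l) are standard poppler options.

```python
import shutil


def find_pdftoppm() -> str:
    """Locate pdftoppm on PATH or raise with an install hint (hypothetical text)."""
    binary = shutil.which("pdftoppm")
    if binary is None:
        raise RuntimeError(
            "pdftoppm not found. Install poppler-utils "
            "(e.g. 'apt install poppler-utils' or 'brew install poppler')."
        )
    return binary


def parse_page_range(pages: str) -> tuple[int, int]:
    """Parse "N" or "N-M" (1-based, inclusive) into (first, last)."""
    first_text, sep, last_text = pages.strip().partition("-")
    first = int(first_text)
    last = int(last_text) if sep else first
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {pages!r}")
    return first, last


def build_pdftoppm_command(
    binary: str, pdf_path: str, out_prefix: str, pages: str, dpi: int = 100
) -> list[str]:
    """Build an argv list that renders the requested pages as PNG files."""
    first, last = parse_page_range(pages)
    return [
        binary, "-png", "-r", str(dpi),
        "-f", str(first), "-l", str(last),
        pdf_path, out_prefix,
    ]
```

The resulting list would be passed to subprocess.run with out_prefix pointing inside a temporary directory, which the caller removes in a finally block.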
Tests: 60+ new tests across tests/test_image_processor.py,
test_image_validation.py, test_bash_image_output.py,
test_pdf_extraction.py, plus 8 added to tests/parity/test_e2e_file_read.py.
The critical regression test test_multi_tool_batch_preserves_tool_result_pairing
drives _run_tools_partitioned -> normalize_messages_for_api end-to-end to
lock in the primaries-first ordering. Wider parity suite: 442 pass,
4 pre-existing unrelated failures.
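The primaries-first ordering that this regression test locks in can be illustrated with a small sketch. The Msg dataclass and merge_primaries_first are hypothetical stand-ins, not the real UserMessage type or the code in query.py. The point is only that every tool_result is emitted before any supplemental message, so each tool_use keeps its matching result directly after it.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for UserMessage."""
    kind: str         # "tool_result" or "meta"
    tool_use_id: str


def merge_primaries_first(results: list[tuple[Msg, list[Msg]]]) -> list[Msg]:
    """Concatenate all primary messages, then all extras, preserving order."""
    primaries = [primary for primary, _ in results]
    extras = [extra for _, extra_list in results for extra in extra_list]
    return primaries + extras


def merge_interleaved(results: list[tuple[Msg, list[Msg]]]) -> list[Msg]:
    """The problematic ordering: each primary followed by its own extras."""
    merged: list[Msg] = []
    for primary, extra_list in results:
        merged.append(primary)
        merged.extend(extra_list)
    return merged
```

With two image Reads in one batch, the interleaved form places a meta message between the two tool_result messages, which is the situation the commit message describes as breaking ensure_tool_result_pairing.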
Critic review loop completed with APPROVE after two rounds.
See my-docs/image-handling-gap-analysis.md and
my-docs/image-handling-refactoring-plan.md for the full analysis.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 16, 2026
Summary
Full parity with the TypeScript reference at typescript/src/utils/imageResizer.ts + the FileReadTool image flow. Pillow replaces sharp.
Adds the image processing pipeline that was missing: oversized images are now downscaled instead of rejected, file format is sniffed from magic bytes (not the extension), image dimensions are surfaced to the model for coordinate-mapping prompts, PDF pages can be extracted as images, and shell commands that print data-URI images (matplotlib, mermaid) get rendered natively.
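As an illustration of the data-URI handling mentioned in this summary, here is a minimal sketch that recognizes a data:image/...;base64,... string and refuses oversized input before decoding. It is not the code in image_output.py: the function name, the regular expression, and the reading of the 25 MB cap as a character count on the encoded text are all assumptions.

```python
import base64
import binascii
import re
from typing import Optional

MAX_ENCODED_CHARS = 25 * 1024 * 1024  # assumed reading of the 25 MB pre-decode cap

_DATA_URI = re.compile(
    r"^data:(image/[A-Za-z0-9.+-]+);base64,([A-Za-z0-9+/=\s]+)$"
)


def parse_image_data_uri(text: str) -> Optional[tuple[str, bytes]]:
    """Return (media_type, raw_bytes) for an image data URI, else None."""
    if len(text) > MAX_ENCODED_CHARS:
        return None  # reject before any decoding work is done
    match = _DATA_URI.match(text.strip())
    if match is None:
        return None
    media_type, payload = match.groups()
    try:
        # Drop embedded whitespace, then decode strictly.
        raw = base64.b64decode("".join(payload.split()), validate=True)
    except (binascii.Error, ValueError):
        return None
    return media_type, raw
```

Checking the length first keeps memory use bounded by the size of the captured output. The decoded bytes could then be passed through magic-byte detection rather than trusting the media type declared in the URI.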
See my-docs/image-handling-gap-analysis.md and my-docs/image-handling-refactoring-plan.md for the gap analysis and approved plan this PR implements.
What changes for the user
- foo.png containing JPEG bytes correctly gets media_type=image/jpeg so the API doesn't reject the mislabeled image.
- isMeta user message with [Image: source: /path, original WxH, displayed at WxH. Multiply coordinates by X to map to original image.] for coordinate-mapping workflows.
- Read(file_path=foo.pdf, pages='1-5') extracts pages via pdftoppm (poppler-utils) and routes them through the same image pipeline.
- python -c 'matplotlib stuff; print(base64...)' surfaces as an image content block instead of garbage text.
Files
New modules (lazy Pillow import, so module load is cheap)
- src/utils/image_processor.py: magic-byte detection, bounded read_file_bytes, maybe_resize_image (3.75 MB / 1568px envelope, JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget (progressive scale × quality, 800×800 PNG palette, 400×400 q=20 ultra-fallback), compress_image_to_token_budget, create_image_metadata_text.
- src/utils/image_validation.py: validate_images_for_api (walks messages, rejects >5 MB base64).
- src/utils/pdf_extraction.py: extract_pdf_pages shells out to pdftoppm, 100 MB input cap, empty-file guard, install-hint error when poppler is missing.
- src/tool_system/tools/bash/image_output.py: is_image_output / parse_data_uri / build_image_tool_result with 25 MB pre-decode cap.
Modified
- src/tool_system/tools/read.py: image branch rewritten: bounded read → magic-byte sniff → resize → token-budget compress → returns type='image' with dimensions + supplemental metadata message. PDF pages= path wired with try/finally tempdir cleanup. Dead MAX_IMAGE_SIZE_BYTES removed.
- src/query/query.py: _dispatch_single_tool returns tuple[UserMessage, list[UserMessage]] so result.new_messages reach the model. Critical: callers concatenate all primaries first, then all extras, so multi-tool batches don't break ensure_tool_result_pairing.
- src/services/api/claude.py: wires validate_images_for_api into call_model.
- src/services/analytics/events.py: new EventType.IMAGE_PROCESSING event type with subtype in data.
- src/tool_system/tools/bash/bash_tool.py: detects data-URI output, routes through build_image_tool_result.
- pyproject.toml + uv.lock: adds Pillow>=10.0.
Dependencies
- Pillow>=10.0 (prebuilt binary wheels on common platforms; no native compilation needed).
- pdftoppm from poppler-utils. PDF extraction without it gives an install-hint error rather than failing silently.
Test plan
- tests/test_image_processor.py, test_image_validation.py, test_bash_image_output.py, test_pdf_extraction.py. 1 PDF test skipped when poppler isn't installed.
- tests/parity/test_e2e_file_read.py including the critical regression test test_multi_tool_batch_preserves_tool_result_pairing that drives _run_tools_partitioned end-to-end through normalize_messages_for_api with two image Reads.
- tests/test_tool_result_budget.py and tests/test_esc_reject_message_dispatch.py for the new _dispatch_single_tool tuple return shape.
- *_outside_workspace_blocked from bypassPermissions default).
- ensure_tool_result_pairing orphaned the second tool's result. Fix and lock-in test are in this PR.
Out of scope (intentional)
- validate_images_for_api into the non-Anthropic provider paths (TODO note in claude.py). They don't currently emit image content blocks.
- image-processor-napi equivalent (TS's preferred backend). Pillow is sufficient.
🤖 Generated with Claude Code