feat(read): port TS image-handling pipeline to Python (Tier C parity) by ericleepi314 · Pull Request #154 · agentforce314/clawcodex

ericleepi314 · 2026-05-16T15:16:35Z

Summary

Full parity with the TypeScript reference at typescript/src/utils/imageResizer.ts + the FileReadTool image flow. Pillow replaces sharp.

Adds the image processing pipeline that was missing: oversized images are now downscaled instead of rejected, file format is sniffed from magic bytes (not the extension), image dimensions are surfaced to the model for coordinate-mapping prompts, PDF pages can be extracted as images, and shell commands that print data-URI images (matplotlib, mermaid) get rendered natively.

See my-docs/image-handling-gap-analysis.md and my-docs/image-handling-refactoring-plan.md for the gap analysis and approved plan this PR implements.

What changes for the user

Oversized images (>3.75 MB) are now downscaled to fit the API's 5 MB base64 limit instead of being rejected with a hard error. Real-world phone screenshots (often 4-8 MB) work without manual resizing.
Magic-byte format detection: a foo.png containing JPEG bytes correctly gets media_type=image/jpeg so the API doesn't reject the mislabeled image.
Image dimensions metadata: Read emits a supplemental isMeta user message with [Image: source: /path, original WxH, displayed at WxH. Multiply coordinates by X to map to original image.] for coordinate-mapping workflows.
PDF page extraction: Read(file_path=foo.pdf, pages='1-5') extracts pages via pdftoppm (poppler-utils) and routes them through the same image pipeline.
Bash data-URI images: python -c 'matplotlib stuff; print(base64...)' surfaces as an image content block instead of garbage text.
Pre-API image validation rejects any image >5 MB base64 with an actionable error rather than letting the API round-trip fail.

Files

New modules (lazy Pillow import — module load is cheap)

src/utils/image_processor.py — magic-byte detection, bounded read_file_bytes, maybe_resize_image (3.75 MB / 1568px envelope, JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget (progressive scale × quality, 800×800 PNG palette, 400×400 q=20 ultra-fallback), compress_image_to_token_budget, create_image_metadata_text.
src/utils/image_validation.py — validate_images_for_api (walks messages, rejects >5 MB base64).
src/utils/pdf_extraction.py — extract_pdf_pages shells out to pdftoppm, 100 MB input cap, empty-file guard, install-hint error when poppler is missing.
src/tool_system/tools/bash/image_output.py — is_image_output / parse_data_uri / build_image_tool_result with 25 MB pre-decode cap.

Modified

src/tool_system/tools/read.py — image branch rewritten: bounded read → magic-byte sniff → resize → token-budget compress → returns type='image' with dimensions + supplemental metadata message. PDF pages= path wired with try/finally tempdir cleanup. Dead MAX_IMAGE_SIZE_BYTES removed.
src/query/query.py — _dispatch_single_tool returns tuple[UserMessage, list[UserMessage]] so result.new_messages reach the model. Critical: callers concatenate all primaries first, then all extras, so multi-tool batches don't break ensure_tool_result_pairing.
src/services/api/claude.py — wires validate_images_for_api into call_model.
src/services/analytics/events.py — new EventType.IMAGE_PROCESSING event type with subtype in data.
src/tool_system/tools/bash/bash_tool.py — detects data-URI output, routes through build_image_tool_result.
pyproject.toml + uv.lock — adds Pillow>=10.0.

Dependencies

Python: Pillow>=10.0 (pure-Python wheels on all platforms; no native compilation needed).
System (optional, runtime-detected): pdftoppm from poppler-utils. PDF extraction without it gives an install-hint error rather than failing silently.

Test plan

60+ new unit tests across tests/test_image_processor.py, test_image_validation.py, test_bash_image_output.py, test_pdf_extraction.py. 1 PDF test skipped when poppler isn't installed.
8 new e2e tests in tests/parity/test_e2e_file_read.py including the critical regression test test_multi_tool_batch_preserves_tool_result_pairing that drives _run_tools_partitioned end-to-end through normalize_messages_for_api with two image Reads.
Updated tests/test_tool_result_budget.py and tests/test_esc_reject_message_dispatch.py for the new _dispatch_single_tool tuple return shape.
Wider test run: 524 pass, 4 pre-existing unrelated failures (MCP module missing, two *_outside_workspace_blocked from bypassPermissions default).
Critic review loop completed with APPROVE after two rounds. Round 2 caught a real regression I introduced: my initial flatten broke multi-tool batches by interleaving primaries and extras → ensure_tool_result_pairing orphaned the second tool's result. Fix and lock-in test are in this PR.

Out of scope (intentional)

Wiring validate_images_for_api into the non-Anthropic provider paths (TODO note in claude.py). They don't currently emit image content blocks.
Native image-processor-napi equivalent (TS's preferred backend). Pillow is sufficient.
Image content-hash dedup. TS doesn't dedupe images either.

🤖 Generated with Claude Code

Brings the Read tool's image handling to full parity with the TypeScript reference at typescript/src/utils/imageResizer.ts + FileReadTool.ts image flow. Pillow replaces sharp. What changes for the user: - Oversized images (>3.75 MB) are now downscaled to fit the API's 5 MB base64 limit instead of being rejected outright. - File extension is no longer trusted blindly — magic-byte detection sniffs the actual format (PNG/JPEG/GIF/WebP), so a misnamed file.png containing JPEG bytes gets media_type=image/jpeg correctly. - Read emits a supplemental isMeta UserMessage carrying image dimensions ("Multiply coordinates by X to map to original image") for coordinate-mapping prompts. - The PDF Read with pages="N-M" parameter now extracts page images via poppler's pdftoppm and routes them through the same image pipeline. - Bash commands that print data:image/...;base64,... (matplotlib, mermaid, etc.) surface as image content blocks in the tool_result instead of garbage text. - Images >5 MB base64 are rejected pre-API with an actionable error rather than letting the API round-trip fail. What changes architecturally: - New module src/utils/image_processor.py: magic-byte detection, bounded readFileBytes, maybe_resize_image (3.75 MB / 1568px envelope, JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget (progressive scale × quality, 800×800 PNG palette, 400×400 q=20 ultra-fallback), compress_image_to_token_budget, create_image_metadata_text. - New module src/utils/image_validation.py: validate_images_for_api walks messages, rejects any base64 image >5 MB before the API call. Wired into claude.py:call_model. - New module src/utils/pdf_extraction.py: extract_pdf_pages shells out to pdftoppm, 100 MB input cap, empty-file guard, install-hint error when poppler-utils is missing. - New module src/tool_system/tools/bash/image_output.py: is_image_output / parse_data_uri / build_image_tool_result for shell image output, with 25 MB pre-decode cap to prevent OOM from hostile shells. - src/tool_system/tools/read.py image branch rewritten: bounded read -> magic-byte sniff -> resize -> token-budget compress -> returns type='image' with dimensions field + supplemental metadata message. PDF pages= path wired with try/finally tempdir cleanup. Old MAX_IMAGE_SIZE_BYTES rejection removed. - src/query/query.py _dispatch_single_tool now returns tuple[UserMessage, list[UserMessage]] (primary, extras) so result.new_messages reach the model. Callers concatenate all primaries first, then all extras, so multi-tool batches don't break ensure_tool_result_pairing (a regression the critic caught and the primaries-first ordering fixes). - New EventType.IMAGE_PROCESSING analytics event with subtype in data. Dependency: Pillow>=10.0 (pure-Python wheels on all platforms). System dependency for PDF page extraction (optional, runtime-detected): pdftoppm from poppler-utils. Tests: 60+ new tests across tests/test_image_processor.py, test_image_validation.py, test_bash_image_output.py, test_pdf_extraction.py, plus 8 added to tests/parity/test_e2e_file_read.py. The critical regression test test_multi_tool_batch_preserves_tool_result_pairing drives _run_tools_partitioned -> normalize_messages_for_api end-to-end to lock in the primaries-first ordering. Wider parity suite: 442 pass, 4 pre-existing unrelated failures. Critic review loop completed with APPROVE after two rounds. See my-docs/image-handling-gap-analysis.md and my-docs/image-handling-refactoring-plan.md for the full analysis. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ericleepi314 merged commit e12f86f into main May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(read): port TS image-handling pipeline to Python (Tier C parity)#154

feat(read): port TS image-handling pipeline to Python (Tier C parity)#154
ericleepi314 merged 1 commit into
mainfrom
feat/image-handling-parity

ericleepi314 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericleepi314 commented May 16, 2026

Summary

What changes for the user

Files

Dependencies

Test plan

Out of scope (intentional)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant