Skip to content

feat(read): port TS image-handling pipeline to Python (Tier C parity)#154

Merged
ericleepi314 merged 1 commit into
mainfrom
feat/image-handling-parity
May 16, 2026
Merged

feat(read): port TS image-handling pipeline to Python (Tier C parity)#154
ericleepi314 merged 1 commit into
mainfrom
feat/image-handling-parity

Conversation

@ericleepi314
Copy link
Copy Markdown
Collaborator

Summary

Full parity with the TypeScript reference at typescript/src/utils/imageResizer.ts + the FileReadTool image flow. Pillow replaces sharp.

Adds the image processing pipeline that was missing: oversized images are now downscaled instead of rejected, file format is sniffed from magic bytes (not the extension), image dimensions are surfaced to the model for coordinate-mapping prompts, PDF pages can be extracted as images, and shell commands that print data-URI images (matplotlib, mermaid) get rendered natively.

See my-docs/image-handling-gap-analysis.md and my-docs/image-handling-refactoring-plan.md for the gap analysis and approved plan this PR implements.

What changes for the user

  • Oversized images (>3.75 MB) are now downscaled to fit the API's 5 MB base64 limit instead of being rejected with a hard error. Real-world phone screenshots (often 4-8 MB) work without manual resizing.
  • Magic-byte format detection: a foo.png containing JPEG bytes correctly gets media_type=image/jpeg so the API doesn't reject the mislabeled image.
  • Image dimensions metadata: Read emits a supplemental isMeta user message with [Image: source: /path, original WxH, displayed at WxH. Multiply coordinates by X to map to original image.] for coordinate-mapping workflows.
  • PDF page extraction: Read(file_path=foo.pdf, pages='1-5') extracts pages via pdftoppm (poppler-utils) and routes them through the same image pipeline.
  • Bash data-URI images: python -c 'matplotlib stuff; print(base64...)' surfaces as an image content block instead of garbage text.
  • Pre-API image validation rejects any image >5 MB base64 with an actionable error rather than letting the API round-trip fail.

Files

New modules (lazy Pillow import — module load is cheap)

  • src/utils/image_processor.py — magic-byte detection, bounded read_file_bytes, maybe_resize_image (3.75 MB / 1568px envelope, JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget (progressive scale × quality, 800×800 PNG palette, 400×400 q=20 ultra-fallback), compress_image_to_token_budget, create_image_metadata_text.
  • src/utils/image_validation.pyvalidate_images_for_api (walks messages, rejects >5 MB base64).
  • src/utils/pdf_extraction.pyextract_pdf_pages shells out to pdftoppm, 100 MB input cap, empty-file guard, install-hint error when poppler is missing.
  • src/tool_system/tools/bash/image_output.pyis_image_output / parse_data_uri / build_image_tool_result with 25 MB pre-decode cap.

Modified

  • src/tool_system/tools/read.py — image branch rewritten: bounded read → magic-byte sniff → resize → token-budget compress → returns type='image' with dimensions + supplemental metadata message. PDF pages= path wired with try/finally tempdir cleanup. Dead MAX_IMAGE_SIZE_BYTES removed.
  • src/query/query.py_dispatch_single_tool returns tuple[UserMessage, list[UserMessage]] so result.new_messages reach the model. Critical: callers concatenate all primaries first, then all extras, so multi-tool batches don't break ensure_tool_result_pairing.
  • src/services/api/claude.py — wires validate_images_for_api into call_model.
  • src/services/analytics/events.py — new EventType.IMAGE_PROCESSING event type with subtype in data.
  • src/tool_system/tools/bash/bash_tool.py — detects data-URI output, routes through build_image_tool_result.
  • pyproject.toml + uv.lock — adds Pillow>=10.0.

Dependencies

  • Python: Pillow>=10.0 (pure-Python wheels on all platforms; no native compilation needed).
  • System (optional, runtime-detected): pdftoppm from poppler-utils. PDF extraction without it gives an install-hint error rather than failing silently.

Test plan

  • 60+ new unit tests across tests/test_image_processor.py, test_image_validation.py, test_bash_image_output.py, test_pdf_extraction.py. 1 PDF test skipped when poppler isn't installed.
  • 8 new e2e tests in tests/parity/test_e2e_file_read.py including the critical regression test test_multi_tool_batch_preserves_tool_result_pairing that drives _run_tools_partitioned end-to-end through normalize_messages_for_api with two image Reads.
  • Updated tests/test_tool_result_budget.py and tests/test_esc_reject_message_dispatch.py for the new _dispatch_single_tool tuple return shape.
  • Wider test run: 524 pass, 4 pre-existing unrelated failures (MCP module missing, two *_outside_workspace_blocked from bypassPermissions default).
  • Critic review loop completed with APPROVE after two rounds. Round 2 caught a real regression I introduced: my initial flatten broke multi-tool batches by interleaving primaries and extras → ensure_tool_result_pairing orphaned the second tool's result. Fix and lock-in test are in this PR.

Out of scope (intentional)

  • Wiring validate_images_for_api into the non-Anthropic provider paths (TODO note in claude.py). They don't currently emit image content blocks.
  • Native image-processor-napi equivalent (TS's preferred backend). Pillow is sufficient.
  • Image content-hash dedup. TS doesn't dedupe images either.

🤖 Generated with Claude Code

Brings the Read tool's image handling to full parity with the TypeScript
reference at typescript/src/utils/imageResizer.ts + FileReadTool.ts image
flow. Pillow replaces sharp.

What changes for the user:
- Oversized images (>3.75 MB) are now downscaled to fit the API's 5 MB
  base64 limit instead of being rejected outright.
- File extension is no longer trusted blindly — magic-byte detection
  sniffs the actual format (PNG/JPEG/GIF/WebP), so a misnamed file.png
  containing JPEG bytes gets media_type=image/jpeg correctly.
- Read emits a supplemental isMeta UserMessage carrying image dimensions
  ("Multiply coordinates by X to map to original image") for
  coordinate-mapping prompts.
- The PDF Read with pages="N-M" parameter now extracts page images via
  poppler's pdftoppm and routes them through the same image pipeline.
- Bash commands that print data:image/...;base64,... (matplotlib,
  mermaid, etc.) surface as image content blocks in the tool_result
  instead of garbage text.
- Images >5 MB base64 are rejected pre-API with an actionable error
  rather than letting the API round-trip fail.

What changes architecturally:
- New module src/utils/image_processor.py: magic-byte detection,
  bounded readFileBytes, maybe_resize_image (3.75 MB / 1568px envelope,
  JPEG fallback at q={80,60,40,20}), compress_image_to_byte_budget
  (progressive scale × quality, 800×800 PNG palette, 400×400 q=20
  ultra-fallback), compress_image_to_token_budget,
  create_image_metadata_text.
- New module src/utils/image_validation.py: validate_images_for_api
  walks messages, rejects any base64 image >5 MB before the API call.
  Wired into claude.py:call_model.
- New module src/utils/pdf_extraction.py: extract_pdf_pages shells out
  to pdftoppm, 100 MB input cap, empty-file guard, install-hint error
  when poppler-utils is missing.
- New module src/tool_system/tools/bash/image_output.py:
  is_image_output / parse_data_uri / build_image_tool_result for shell
  image output, with 25 MB pre-decode cap to prevent OOM from hostile
  shells.
- src/tool_system/tools/read.py image branch rewritten: bounded read
  -> magic-byte sniff -> resize -> token-budget compress -> returns
  type='image' with dimensions field + supplemental metadata message.
  PDF pages= path wired with try/finally tempdir cleanup. Old
  MAX_IMAGE_SIZE_BYTES rejection removed.
- src/query/query.py _dispatch_single_tool now returns
  tuple[UserMessage, list[UserMessage]] (primary, extras) so
  result.new_messages reach the model. Callers concatenate all
  primaries first, then all extras, so multi-tool batches don't break
  ensure_tool_result_pairing (a regression the critic caught and the
  primaries-first ordering fixes).
- New EventType.IMAGE_PROCESSING analytics event with subtype in data.

Dependency: Pillow>=10.0 (pure-Python wheels on all platforms).
System dependency for PDF page extraction (optional, runtime-detected):
pdftoppm from poppler-utils.

Tests: 60+ new tests across tests/test_image_processor.py,
test_image_validation.py, test_bash_image_output.py,
test_pdf_extraction.py, plus 8 added to tests/parity/test_e2e_file_read.py.
The critical regression test test_multi_tool_batch_preserves_tool_result_pairing
drives _run_tools_partitioned -> normalize_messages_for_api end-to-end to
lock in the primaries-first ordering. Wider parity suite: 442 pass,
4 pre-existing unrelated failures.

Critic review loop completed with APPROVE after two rounds.

See my-docs/image-handling-gap-analysis.md and
my-docs/image-handling-refactoring-plan.md for the full analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant