
dev-mcp: add view_image tool #602

Merged: tlongwell-block merged 2 commits into main from dawn/view-image-tool, May 15, 2026

@tlongwell-block tlongwell-block commented May 15, 2026

Summary

Adds one new tool to sprout-dev-mcp: view_image.

view_image loads an image from a workspace-safe file path, an http(s):// URL, or a data:image/...;base64,... URL and returns it as a standard MCP image content block (Content::image(base64, mime_type)).

This PR also updates sprout-agent so agents can actually see those MCP image tool results:

  • MCP RawContent::Image is preserved internally instead of flattened to [image elided: ...].
  • Anthropic requests serialize image tool results as {type: "image", source: {type: "base64", media_type, data}} inside tool_result.content.
  • OpenAI-compatible requests follow Goose prior art: the role: "tool" message remains textual and any image blocks are sent in a follow-up role: "user" message as image_url data URLs.
  • ACP/session update output stays text-safe: clients see concise summaries, not raw base64 payloads.
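
A minimal sketch of the OpenAI-compatible split described above, using only the standard library. `Part`, `Message`, `openai_tool_messages`, and the placeholder text left in the tool message are hypothetical names chosen for illustration; the actual sprout-agent types and wording are not shown in this PR description.

```rust
/// Structured tool-result content, mirroring the two variants this PR names.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolResultContent {
    Text(String),
    /// `data` holds the base64-encoded image bytes.
    Image { data: String, mime_type: String },
}

/// Hypothetical, simplified message part for an OpenAI-compatible request.
#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Text(String),
    /// A `data:` URL carried in an `image_url` part.
    ImageUrl(String),
}

/// Hypothetical, simplified chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub parts: Vec<Part>,
}

/// Keep the `tool` message textual and move images into a follow-up `user` message.
pub fn openai_tool_messages(content: &[ToolResultContent]) -> Vec<Message> {
    let mut text = Vec::new();
    let mut images = Vec::new();
    for item in content {
        match item {
            ToolResultContent::Text(t) => text.push(t.clone()),
            ToolResultContent::Image { data, mime_type } => {
                // Assumed placeholder wording; the real summary text may differ.
                text.push(format!("[image: {mime_type}, attached in the next user message]"));
                images.push(Part::ImageUrl(format!("data:{mime_type};base64,{data}")));
            }
        }
    }
    let mut out = vec![Message {
        role: "tool",
        parts: vec![Part::Text(text.join("\n"))],
    }];
    // Only emit the follow-up message when there is at least one image.
    if !images.is_empty() {
        out.push(Message { role: "user", parts: images });
    }
    out
}
```

The tool message stays plain text, so providers that reject multimodal tool messages still accept the request.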

Behaviour

  • Default resize: longest edge capped at 1568px (Anthropic's published recommendation, comfortably inside OpenAI's high-detail tile budget). Overridable via max_dim (clamped to [64, 2048]).
  • Pass-through when the input is already small enough (longest edge ≤ max_dim AND raw bytes ≤ 3 MiB) — preserves originals byte-for-byte for tiny PNGs/JPEGs/GIFs/WebPs.
  • Resize+transcode otherwise: Lanczos3 → PNG if the decoded image has an alpha channel, else JPEG q85.
  • Animation: rejected with the error "animated images not supported; provide a still frame". Detection is an allocation-free byte-level scan of the GIF block structure / WebP VP8X+ANIM bit, so attacker-controlled dimensions are never handed to a decoder.
  • MIME: sniffed from magic bytes only (extensions and Content-Type are not trusted). Supports png/jpeg/gif/webp.
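
For illustration, magic-byte sniffing for the four supported formats can be sketched as below. `sniff_mime` is a hypothetical name, and the signatures are the well-known file-format prefixes rather than code taken from this PR.

```rust
/// Identify an image type from its leading bytes only.
/// File extensions and HTTP Content-Type headers are deliberately ignored.
pub fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        // WebP is a RIFF container whose form type is "WEBP".
        Some("image/webp")
    } else {
        // Anything else (BMP, TIFF, SVG, ...) is unsupported.
        None
    }
}
```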

Agent bridge

Before this PR, sprout-agent had a text-only ToolResult; MCP images were collapsed in mcp.rs to a string marker before the LLM request was built. That made view_image callable but not visible to the model.

The bridge now uses structured tool-result content:

  • ToolResultContent::Text(String)
  • ToolResultContent::Image { data, mime_type }

Image bytes are counted at their serialized base64 size for history/request pressure accounting. This is intentional: the accounting protects request/body size, not visual-token cost. To make a single valid view_image result survive to the next model turn, the tool-result cap is raised to 8 MiB and the default agent history cap is raised to 16 MiB.
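
The arithmetic behind these caps can be checked with a short sketch. The constant and function names are hypothetical; only the numeric values (3 MiB raw, 8 MiB tool-result cap, 16 MiB history cap) come from this description.

```rust
const MIB: usize = 1024 * 1024;

/// Largest raw image payload view_image is described as returning.
pub const MAX_OUTPUT_RAW: usize = 3 * MIB;
/// Tool-result cap after this PR.
pub const TOOL_RESULT_CAP: usize = 8 * MIB;
/// Default agent history cap after this PR.
pub const HISTORY_CAP: usize = 16 * MIB;

/// Length of padded base64 for `raw` input bytes: 4 output chars per 3-byte group.
pub const fn base64_len(raw: usize) -> usize {
    (raw + 2) / 3 * 4
}

/// Whether an image of `raw` bytes fits the tool-result cap once serialized.
pub fn fits_tool_result(raw: usize) -> bool {
    base64_len(raw) <= TOOL_RESULT_CAP
}
```

A maximal 3 MiB image serializes to exactly 4 MiB of base64, which fits under the 8 MiB tool-result cap with room for surrounding text.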

Defences

All reachable from attacker-controlled source:

| Layer | Mechanism |
| --- | --- |
| Source bytes | Hard cap 20 MiB across path/URL/data URL |
| HTTP fetch | reqwest::chunk() streaming loop with running counter; up-front rejection if Content-Length advertises over budget; 10s connect+read timeout; http(s) only |
| Path source | Funnels through paths::resolve_within; rejects escapes via .., absolute, or symlinks; File::take(cap + 1) so a file growing between metadata and read still cannot exceed budget |
| Data URL | Encoded-length precheck before .decode(); base64 only (percent-encoded data URLs rejected); non-image MIME rejected |
| Decompression bomb | 64 megapixel cap enforced after a cheap header-only dim probe and before any decode; image::Limits { max_alloc: 256 MiB } set on the resize decoder as defence in depth |
| Unknown schemes | ftp://, file://, etc. rejected explicitly with a clear error |
| Final size | Output capped at ~4 MiB on the wire (3 MiB raw × 4/3 base64); one-shot 75 % retry if the first encode is over |
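
As an illustration of the data-URL row, the precheck can run on the encoded string before any decoding or allocation. This is a sketch under assumptions: `precheck_data_url` and `DataUrlError` are hypothetical names, the scheme match is case-sensitive for brevity, and the declared MIME is only a first filter because the real type is sniffed from the decoded bytes.

```rust
#[derive(Debug, PartialEq)]
pub enum DataUrlError {
    NotDataUrl,
    NotBase64,
    NotImage,
    TooLarge,
}

/// Validate a `data:` URL and return `(declared_mime, base64_payload)` without decoding.
/// `max_decoded` is the byte budget for the decoded payload.
pub fn precheck_data_url(url: &str, max_decoded: usize) -> Result<(&str, &str), DataUrlError> {
    let rest = url.strip_prefix("data:").ok_or(DataUrlError::NotDataUrl)?;
    let (header, payload) = rest.split_once(',').ok_or(DataUrlError::NotDataUrl)?;
    // Percent-encoded data URLs carry no ";base64" marker and are rejected.
    let mime = header.strip_suffix(";base64").ok_or(DataUrlError::NotBase64)?;
    if !mime.starts_with("image/") {
        return Err(DataUrlError::NotImage);
    }
    // Padded base64 needs 4 chars per 3 decoded bytes, so any payload longer
    // than this bound must decode to more than `max_decoded` bytes.
    let max_encoded = (max_decoded + 2) / 3 * 4;
    if payload.len() > max_encoded {
        return Err(DataUrlError::TooLarge);
    }
    Ok((mime, payload))
}
```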

Dependencies

Added to sprout-dev-mcp/Cargo.toml:

  • base64 = "0.22" — already transitive in the workspace
  • reqwest = { workspace = true } — already used by sprout-cli (a direct dep of sprout-dev-mcp), so net binary impact is ~zero
  • image = { default-features = false, features = ["jpeg", "png", "gif", "webp"] } — the one non-trivial addition; needed for honest resize/transcode. Alternatives (zune-image, hand-rolled) were considered and rejected as worse on minimalism/correctness trade-offs

No URL/HTTP parser, no hand-rolled crypto.

Refactor

resolve_within is moved from str_replace.rs to a new paths module shared by str_replace and view_image. The existing symlink-escape test moves with it.
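
The containment check that `resolve_within` performs can be approximated with the standard library as follows. This is a sketch of the general technique, not the code in the `paths` module: the signature and error kinds are assumptions, and canonicalisation here requires the target to exist.

```rust
use std::io;
use std::path::{Path, PathBuf};

fn denied(why: &str) -> io::Error {
    io::Error::new(io::ErrorKind::PermissionDenied, why.to_string())
}

/// Resolve `requested` relative to `workdir`, refusing anything that lands outside it.
pub fn resolve_within(workdir: &Path, requested: &str) -> io::Result<PathBuf> {
    let requested = Path::new(requested);
    if requested.is_absolute() {
        return Err(denied("absolute paths are not allowed"));
    }
    // Canonicalising both sides resolves `..` components and symlinks, so the
    // prefix check below compares real filesystem locations.
    let root = workdir.canonicalize()?;
    let resolved = root.join(requested).canonicalize()?;
    if !resolved.starts_with(&root) {
        return Err(denied("path escapes the workspace"));
    }
    Ok(resolved)
}
```

Because the comparison happens after canonicalisation, a symlink inside the workspace that points outside it fails the same prefix check as a `..` escape.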

Tests

Local verification:

  • cargo test -p sprout-agent -p sprout-dev-mcp → 101 tests passed
  • cargo fmt --check → clean
  • cargo clippy -p sprout-agent -p sprout-dev-mcp --all-targets -- -D warnings → clean
  • cargo build -p sprout-agent -p sprout-dev-mcp → clean
  • Push hooks also passed the broader web/desktop/mobile/rust suite.

New coverage includes:

  • small PNG passes through verbatim (byte equality)
  • oversize PNG with alpha resizes to PNG, dims ≤ max_dim
  • oversize JPEG resizes to JPEG
  • path outside workspace rejected
  • BMP / unsupported MIME rejected at sniff
  • animated GIF (real 2-frame encoded sample) rejected
  • single-frame GIF accepted (no false positive)
  • animated WebP detected via VP8X+ANIM flag
  • data: URL round-trip
  • data: URL non-base64 rejected
  • data: URL non-image MIME rejected
  • data: URL oversized payload rejected before decode
  • unknown URL scheme (ftp://) rejected
  • decompression bomb (synthetic 9000×9000 IHDR-only PNG, ~50 bytes on disk) rejected at pixel-count cap
  • magic-byte sniff recognises png/jpeg/gif/webp
  • MCP image tool result is preserved as structured image content
  • oversized image tool result is elided instead of exceeding cap
  • Anthropic request body emits image blocks inside tool_result.content
  • OpenAI-compatible request body emits text tool result plus follow-up user image_url message
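
One of the cases above, animated WebP detection via the VP8X flag, can be sketched without any decoder. This simplified version assumes VP8X is the first chunk after the RIFF header (as the WebP container format requires for extended files) and uses the animation flag bit `0x02`; `webp_is_animated` is a hypothetical name, and the GIF block scan is not shown.

```rust
/// Return true if the bytes look like an extended-format WebP with the animation flag set.
/// Performs no allocation and never parses image dimensions.
pub fn webp_is_animated(bytes: &[u8]) -> bool {
    // Layout: "RIFF" + u32 size + "WEBP" + "VP8X" + u32 chunk size + flags byte.
    const ANIMATION_FLAG: u8 = 0x02;
    bytes.len() >= 21
        && &bytes[0..4] == b"RIFF"
        && &bytes[8..12] == b"WEBP"
        && &bytes[12..16] == b"VP8X"
        && (bytes[20] & ANIMATION_FLAG) != 0
}
```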

Live test

Built sprout-agent + sprout-dev-mcp from this branch and ran an ACP-style live test against Anthropic with the agent instructed to inspect:

~/Downloads/ChatGPT Image May 14, 2026, 12_49_39 PM.png

Constraints: only dev__view_image allowed; fail if any shell/rg/file-read/etc. tool is called.

Result:

  • Exactly one tool call: dev__view_image
  • No other tools used
  • Stop reason: end_turn
  • Model accurately described the Sprout “PLANT YOUR RELAY TODAY!” poster, including the sprout character, eco-city background, four callouts, Sprout branding, and GitHub URL.

Repeated once with default agent history settings after the cap changes; same result.

Prior art surveyed

  • modelcontextprotocol/servers filesystem::read_media_file (path → base64, no resize) and everything::get-tiny-image (protocol smoke test)
  • block/goose computercontroller emits Content::image(data, "image/png"); provider code converts image content to Anthropic/OpenAI shapes. Goose also uses the safer OpenAI-compatible pattern of a textual tool result plus follow-up user image message.
  • mirrorange/LiteCode Read (extension-based image return, no resize)
  • m13253/pdflens-mcp (image_dimension knob = our max_dim)
  • catalystneuro/mcp_read_images, ah-wq/mcp-vision-relay, dangpolly927-eng/mcp-vision-web-bridge (helpful limit precedents, mostly proxy-to-vision-model, not local viewer)

Coordination

Designed, reviewed, and live-tested in #sprout-agent-image-viewing by Dawn, Max, and Mari. Dawn independently verified the original bridge bug and reviewed the agent-side fix; her feedback on tests and the safer OpenAI-compatible message shape is incorporated here.

A new tool on `sprout-dev-mcp` that loads an image from a file path,
http(s) URL, or `data:` URL and returns it as an MCP `image` content
block. The MCP host translates the block into the right multimodal
shape for whichever provider it talks to — for OpenAI-compatible
endpoints it becomes `image_url` with a data URL; for Anthropic it
becomes a base64 `image` source. There is no provider-specific code
in this crate.

Default behaviour caps the longest edge at 1568px (Anthropic's
published recommendation, comfortably inside OpenAI's high-detail
tile budget) and the final payload at ~4 MiB on the wire (≤ 3 MiB raw
× 4/3 base64 expansion). Already-small PNG/JPEG/GIF/WebP pass through
verbatim; oversized inputs are decoded, resized with Lanczos3, and
re-encoded as PNG (if the decoded image has alpha) or JPEG q85.
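
The sizing decisions in this paragraph can be sketched as pure functions. All names are hypothetical, and the rounding rule in `target_dims` is an assumption; only the constants (1568 default, `[64, 2048]` clamp, 3 MiB pass-through limit) and the alpha-based format choice come from this description.

```rust
pub const DEFAULT_MAX_DIM: u32 = 1568;
pub const MAX_PASSTHROUGH_BYTES: usize = 3 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum OutputFormat {
    Png,
    Jpeg { quality: u8 },
}

/// Apply the default and clamp a caller-supplied `max_dim` to [64, 2048].
pub fn clamp_max_dim(requested: Option<u32>) -> u32 {
    requested.unwrap_or(DEFAULT_MAX_DIM).clamp(64, 2048)
}

/// Originals are returned byte-for-byte when both limits already hold.
pub fn can_pass_through(width: u32, height: u32, raw_len: usize, max_dim: u32) -> bool {
    width.max(height) <= max_dim && raw_len <= MAX_PASSTHROUGH_BYTES
}

/// Scale so the longest edge equals `max_dim`, preserving aspect ratio.
pub fn target_dims(width: u32, height: u32, max_dim: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest <= max_dim {
        return (width, height);
    }
    // Round to nearest (assumed rule); never collapse an edge to zero.
    let scale = |side: u32| -> u32 {
        let scaled = (u64::from(side) * u64::from(max_dim) + u64::from(longest) / 2)
            / u64::from(longest);
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}

/// PNG keeps transparency; everything else is re-encoded as JPEG q85.
pub fn choose_format(has_alpha: bool) -> OutputFormat {
    if has_alpha {
        OutputFormat::Png
    } else {
        OutputFormat::Jpeg { quality: 85 }
    }
}
```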

Defences (all reachable from attacker-controlled `source`):

- Source bytes hard-capped at 20 MiB across path/URL/data URL.
- HTTP fetch streams via reqwest `chunk()` with a running byte
  counter; up-front rejection if Content-Length advertises over
  budget. 10-second connect+read timeout. Only http(s) accepted —
  other `scheme://` forms are rejected so they don't become
  filesystem paths.
- Data URLs precheck encoded length before base64 decoding so we
  cannot allocate past the cap. Only base64 payloads accepted (no
  percent-encoded data URLs).
- Pixel-count cap of 64 megapixels enforced after a cheap header-only
  dimension probe and before any decode, protecting against
  compressed sources that expand to hundreds of megabytes.
- `image::Limits { max_alloc: 256 MiB }` set on the resize decoder
  as defence in depth.
- Animation rejected with a clear error rather than collapsed to a
  silent first frame. Detection is an allocation-free byte-level
  scan of the GIF block structure / WebP VP8X+ANIM bit — we do not
  hand attacker-controlled dimensions to a decoder.
- Path sources funnel through a shared `paths::resolve_within` helper
  that canonicalises against `workdir` (default cwd) and rejects any
  escape via `..`, absolute paths, or symlinks. File reads use
  `File::take(MAX_SOURCE_BYTES + 1)` so a file growing between the
  metadata check and the read still cannot exceed budget.
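
The `take(cap + 1)` pattern from the last item can be sketched with `std::io::Read::take`. `read_capped` is a hypothetical helper name; the 20 MiB constant is the source cap stated above.

```rust
use std::io::{self, Read};

pub const MAX_SOURCE_BYTES: u64 = 20 * 1024 * 1024;

/// Read at most `cap` bytes from `reader`, failing if more are available.
pub fn read_capped<R: Read>(reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Reading one byte past the cap distinguishes "exactly at the cap" from
    // "over the cap" without trusting any previously reported length.
    reader.take(cap + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "source exceeds byte budget",
        ));
    }
    Ok(buf)
}
```

Since the limit is enforced on the read itself, a file that grows after its metadata was checked still cannot push more than `cap + 1` bytes into memory.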

Deps added to dev-mcp: `base64 = "0.22"`, `reqwest` (workspace —
already used by sprout-cli, so this is essentially free in the
sprout-dev-mcp binary), and `image = { default-features = false,
features = ["jpeg", "png", "gif", "webp"] }`. No URL/HTTP parsing
or hand-rolled crypto.

`resolve_within` is moved to a new `paths` module shared by
`str_replace` and `view_image`; the symlink-escape test moves with
it.

Tests cover: small-PNG pass-through (byte-equality), oversize-PNG
with alpha resizes to PNG, oversize JPEG resizes to JPEG, path
escape rejection, BMP rejected at magic-byte sniff, animated GIF
rejected, single-frame GIF accepted, animated WebP VP8X+ANIM
detection, data-URL round-trip, data-URL non-base64 rejected,
data-URL non-image MIME rejected, oversized base64 payload pre-cap
rejection, unknown URL scheme rejected, decompression bomb
(synthetic 9000×9000 PNG IHDR) rejected at pixel cap.
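
The decompression-bomb case can be illustrated with a header-only PNG probe followed by a pixel-count check. This is a sketch: `png_dimensions` and `within_pixel_cap` are hypothetical names, and the cap is assumed to mean 64 × 10^6 pixels, which this description does not specify exactly.

```rust
/// Assumed reading of "64 megapixels".
pub const MAX_PIXELS: u64 = 64_000_000;

fn be_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Read width and height from a PNG's IHDR chunk without decoding any pixel data.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // Layout: 8-byte signature, 4-byte chunk length, "IHDR", width, height.
    if bytes.len() < 24 || &bytes[0..8] != b"\x89PNG\r\n\x1a\n" || &bytes[12..16] != b"IHDR" {
        return None;
    }
    Some((be_u32(&bytes[16..20]), be_u32(&bytes[20..24])))
}

/// Reject empty or oversized images before any decoder sees them.
pub fn within_pixel_cap(width: u32, height: u32) -> bool {
    // Multiply in u64 so large u32 dimensions cannot overflow.
    width != 0 && height != 0 && u64::from(width) * u64::from(height) <= MAX_PIXELS
}
```

A 9000×9000 header declares 81 million pixels, so it fails the check even though the file itself is only a few dozen bytes.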

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Teach sprout-agent to carry MCP image tool results through to the model instead of collapsing them to text. Tool results now retain structured text/image content internally while UI and ACP tool-call updates still render lossy text summaries so base64 payloads are not emitted to clients or logs.

Anthropic requests serialize image tool results as base64 image blocks inside tool_result content. OpenAI-compatible requests follow Goose prior art: the tool message remains textual and any image blocks are sent in a follow-up user message as image_url data URLs, preserving compatibility with providers that reject multimodal tool messages.

Raise the default tool-result and history budgets so view_image's bounded ~4 MiB base64 payload survives long enough for the next model turn. The image bytes are still counted at their serialized base64 size for request-size pressure accounting, so image-heavy sessions cannot silently exceed provider body limits.

Tests cover MCP image preservation plus Anthropic and OpenAI-compatible request shapes.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
@tlongwell-block tlongwell-block merged commit 92c8590 into main May 15, 2026
15 checks passed
@tlongwell-block tlongwell-block deleted the dawn/view-image-tool branch May 15, 2026 20:21
tlongwell-block added a commit that referenced this pull request May 15, 2026
* origin/main: (33 commits)
  dev-mcp: add view_image tool (#602)
  fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601)
  fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599)
  docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597)
  fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595)
  feat(desktop): per-persona and per-agent env var overrides (#594)
  fix(desktop): stop pinning agents to deprecated SPROUT_ACP_TURN_TIMEOUT (#592)
  fix(desktop): populate member_count in get_channels so channel browser shows real counts (#548)
  fix(desktop): autofocus message composer on channel/thread open (#572)
  refactor(cli): restructure flat commands into 12 subcommand groups (#585)
  feat(sdk): add builder functions for workflows, DMs, and presence (#589)
  feat(desktop): add message more-actions dropdown menu (#590)
  fix(mobile): preserve channel list across background/resume reconnection (#588)
  Redesign Home as an inbox (#582)
  fix(desktop): drive unread badges from live subscription, not refetched lastMessageAt (#581)
  fix(desktop): refine header scaling and shadow (#573)
  fix(desktop): keep day dividers below header (#574)
  Move agent activity below composer (#579)
  docs(nips): NIP-AE — Agent Engrams (#575)
  refactor: extract shared @mention resolver into sprout-sdk (#580)
  ...

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
tlongwell-block added a commit that referenced this pull request May 15, 2026
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

* origin/main:
  dev-mcp: add view_image tool (#602)
  fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601)
  fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599)
  docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597)
  fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595)
  feat(desktop): per-persona and per-agent env var overrides (#594)