dev-mcp: add view_image tool #602
Merged
A new tool on `sprout-dev-mcp` that loads an image from a file path,
http(s) URL, or `data:` URL and returns it as an MCP `image` content
block. The MCP host translates the block into the right multimodal
shape for whichever provider it talks to — for OpenAI-compatible
endpoints it becomes `image_url` with a data URL; for Anthropic it
becomes a base64 `image` source. There is no provider-specific code
in this crate.
Default behaviour caps the longest edge at 1568px (Anthropic's
published recommendation, comfortably inside OpenAI's high-detail
tile budget) and the final payload at ~4 MiB on the wire (≤ 3 MiB raw
× 4/3 base64 expansion). Already-small PNG/JPEG/GIF/WebP pass through
verbatim; oversized inputs are decoded, resized with Lanczos3, and
re-encoded as PNG (if the decoded image has alpha) or JPEG q85.
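
The sizing arithmetic above can be checked with a short sketch. This is an illustration, not the crate's code: the constant and function names are hypothetical, and the rounding rule is an assumption.

```rust
/// Hypothetical constants mirroring the numbers quoted above.
const DEFAULT_MAX_DIM: u32 = 1568;
const MAX_RAW_BYTES: usize = 3 * 1024 * 1024; // 3 MiB before base64

/// Length of `n` raw bytes after standard padded base64 encoding:
/// every group of up to 3 input bytes becomes 4 output characters.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Scale (w, h) so the longest edge is at most `max_dim`, keeping the
/// aspect ratio. Images already within the limit are returned unchanged.
fn fit_within(w: u32, h: u32, max_dim: u32) -> (u32, u32) {
    let longest = w.max(h);
    if longest <= max_dim {
        return (w, h);
    }
    let scale = max_dim as f64 / longest as f64;
    // Rounding to nearest and clamping to 1 is an assumption of this sketch.
    let nw = ((w as f64 * scale).round() as u32).max(1);
    let nh = ((h as f64 * scale).round() as u32).max(1);
    (nw, nh)
}
```

With these definitions, `base64_len(MAX_RAW_BYTES)` is exactly 4 MiB, which is where the "~4 MiB on the wire" figure comes from.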
Defences (all reachable from attacker-controlled `source`):
- Source bytes hard-capped at 20 MiB across path/URL/data URL.
- HTTP fetch streams via reqwest `chunk()` with a running byte
counter; up-front rejection if Content-Length advertises over
budget. 10-second connect+read timeout. Only http(s) accepted —
other `scheme://` forms are rejected so they don't become
filesystem paths.
- Data URLs precheck encoded length before base64 decoding so we
cannot allocate past the cap. Only base64 payloads accepted (no
percent-encoded data URLs).
- Pixel-count cap of 64 megapixels enforced after a cheap header-only
dimension probe and before any decode, protecting against
compressed sources that expand to hundreds of megabytes.
- `image::Limits { max_alloc: 256 MiB }` set on the resize decoder
as defence in depth.
- Animation rejected with a clear error rather than collapsed to a
silent first frame. Detection is an allocation-free byte-level
scan of the GIF block structure / WebP VP8X+ANIM bit — we do not
hand attacker-controlled dimensions to a decoder.
- Path sources funnel through a shared `paths::resolve_within` helper
that canonicalises against `workdir` (default cwd) and rejects any
escape via `..`, absolute paths, or symlinks. File reads use
`File::take(MAX_SOURCE_BYTES + 1)` so a file growing between the
metadata check and the read still cannot exceed budget.
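
As an illustration of the data-URL defence in the list above, the precheck can be done on the encoded string alone, before any base64 decoder allocates. This is a minimal sketch under stated assumptions: `precheck_data_url` and its error strings are hypothetical, and the real tool may parse MIME parameters differently.

```rust
/// 20 MiB source cap, as quoted in the list above.
const MAX_SOURCE_BYTES: usize = 20 * 1024 * 1024;

/// Validate a `data:` URL and return `(mime, base64_payload)` without
/// decoding anything. Hypothetical helper, not the crate's actual API.
fn precheck_data_url(url: &str, cap: usize) -> Result<(&str, &str), String> {
    let rest = url.strip_prefix("data:").ok_or("not a data: URL")?;
    let (meta, payload) = rest.split_once(',').ok_or("missing ',' separator")?;
    // Percent-encoded data URLs carry no ";base64" marker and are refused.
    let mime = meta
        .strip_suffix(";base64")
        .ok_or("only base64 data URLs are accepted")?;
    if !mime.starts_with("image/") {
        return Err(format!("non-image MIME type: {mime}"));
    }
    // Every 4 base64 characters decode to at most 3 bytes, so this is an
    // upper bound on the decoded size, computed without allocating.
    let max_decoded = (payload.len() + 3) / 4 * 3;
    if max_decoded > cap {
        return Err(format!("payload may decode to {max_decoded} bytes, cap is {cap}"));
    }
    Ok((mime, payload))
}
```

A caller would pass `MAX_SOURCE_BYTES` as `cap` and only hand the returned payload to the decoder once this check has passed.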
Deps added to dev-mcp: `base64 = "0.22"`, `reqwest` (workspace —
already used by sprout-cli, so this is essentially free in the
sprout-dev-mcp binary), and `image = { default-features = false,
features = ["jpeg", "png", "gif", "webp"] }`. No URL/HTTP parsing
or hand-rolled crypto.
`resolve_within` is moved to a new `paths` module shared by
`str_replace` and `view_image`; the symlink-escape test moves with
it.
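
For readers unfamiliar with the pattern, a containment check of this kind can be sketched with the standard library alone. The signature and error messages below are assumptions for illustration; the actual `paths::resolve_within` may differ, for example in how it treats paths that do not exist yet.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `candidate` relative to `workdir` and refuse anything that
/// ends up outside it. Sketch only: requires the target to exist,
/// because `canonicalize` resolves `..` and symlinks on disk.
fn resolve_within(workdir: &Path, candidate: &str) -> io::Result<PathBuf> {
    let candidate = Path::new(candidate);
    if candidate.is_absolute() {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "absolute paths are not allowed",
        ));
    }
    let root = workdir.canonicalize()?;
    let resolved = root.join(candidate).canonicalize()?;
    // After canonicalisation a `..` or symlink escape shows up as a path
    // that no longer has the canonical workdir as a prefix.
    if !resolved.starts_with(&root) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the working directory",
        ));
    }
    Ok(resolved)
}
```

Comparing canonical paths component-wise with `starts_with`, rather than comparing strings, avoids false matches such as `/work` versus `/workspace`.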
Tests cover: small-PNG pass-through (byte-equality), oversize-PNG
with alpha resizes to PNG, oversize JPEG resizes to JPEG, path
escape rejection, BMP rejected at magic-byte sniff, animated GIF
rejected, single-frame GIF accepted, animated WebP VP8X+ANIM
detection, data-URL round-trip, data-URL non-base64 rejected,
data-URL non-image MIME rejected, oversized base64 payload pre-cap
rejection, unknown URL scheme rejected, decompression bomb
(synthetic 9000×9000 PNG IHDR) rejected at pixel cap.
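
The decompression-bomb case can be reproduced without a decoder, because a PNG stores its dimensions at fixed offsets in the IHDR chunk. The sketch below is illustrative only: the helper names are hypothetical, and treating "64 megapixels" as 64,000,000 pixels is an assumption.

```rust
/// Assumption: "64 megapixels" taken as 64 * 10^6 pixels.
const MAX_PIXELS: u64 = 64_000_000;

const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Read width and height from the IHDR chunk without decoding pixels.
/// Layout: 8-byte signature, 4-byte chunk length, "IHDR", then
/// big-endian u32 width and height.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    let w = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let h = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((w, h))
}

/// Reject images whose pixel count exceeds the cap, before any decode.
fn check_pixel_cap(bytes: &[u8]) -> Result<(u32, u32), String> {
    let (w, h) = png_dimensions(bytes).ok_or("not a PNG or truncated header")?;
    // Multiply in u64 so attacker-chosen dimensions cannot overflow.
    let pixels = w as u64 * h as u64;
    if pixels > MAX_PIXELS {
        return Err(format!("{w}x{h} is {pixels} pixels, cap is {MAX_PIXELS}"));
    }
    Ok((w, h))
}

/// Build just enough of a PNG (signature + start of IHDR) for the probe.
fn synthetic_png_header(w: u32, h: u32) -> Vec<u8> {
    let mut v = PNG_SIGNATURE.to_vec();
    v.extend_from_slice(&13u32.to_be_bytes()); // IHDR data length
    v.extend_from_slice(b"IHDR");
    v.extend_from_slice(&w.to_be_bytes());
    v.extend_from_slice(&h.to_be_bytes());
    v
}
```

A 9000×9000 header describes 81,000,000 pixels, so it is rejected by the cap even though the header itself is only 24 bytes long.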
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Teach sprout-agent to carry MCP image tool results through to the model instead of collapsing them to text.

Tool results now retain structured text/image content internally while UI and ACP tool-call updates still render lossy text summaries so base64 payloads are not emitted to clients or logs.

Anthropic requests serialize image tool results as base64 image blocks inside tool_result content. OpenAI-compatible requests follow Goose prior art: the tool message remains textual and any image blocks are sent in a follow-up user message as image_url data URLs, preserving compatibility with providers that reject multimodal tool messages.

Raise the default tool-result and history budgets so view_image's bounded ~4 MiB base64 payload survives long enough for the next model turn. The image bytes are still counted at their serialized base64 size for request-size pressure accounting, so image-heavy sessions cannot silently exceed provider body limits.

Tests cover MCP image preservation plus Anthropic and OpenAI-compatible request shapes.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
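
The split between structured content, size accounting, and lossy summaries described in this commit message can be sketched as follows. The variant names match the PR description; the field types, method names, and summary format are assumptions made for illustration, not the agent's actual code.

```rust
/// Structured tool-result content. Field types are assumed: `data` is
/// taken to hold the base64 text exactly as it would be serialized.
#[derive(Debug, Clone, PartialEq)]
enum ToolResultContent {
    Text(String),
    Image { data: String, mime_type: String },
}

impl ToolResultContent {
    /// Bytes charged against history/request budgets. Images count at
    /// their base64 length, since that is what lands in the request body.
    fn accounted_bytes(&self) -> usize {
        match self {
            ToolResultContent::Text(t) => t.len(),
            ToolResultContent::Image { data, .. } => data.len(),
        }
    }

    /// Lossy text rendering for UI and logs: never includes the payload.
    /// The exact wording here is hypothetical.
    fn summary(&self) -> String {
        match self {
            ToolResultContent::Text(t) => t.clone(),
            ToolResultContent::Image { data, mime_type } => {
                format!("[image: {mime_type}, {} base64 bytes]", data.len())
            }
        }
    }
}
```

Keeping the summary separate from the stored content is what lets the model receive the full image while clients and logs only ever see a short marker.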
tlongwell-block added a commit that referenced this pull request May 15, 2026
* origin/main: (33 commits) dev-mcp: add view_image tool (#602) fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601) fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599) docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597) fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595) feat(desktop): per-persona and per-agent env var overrides (#594) fix(desktop): stop pinning agents to deprecated SPROUT_ACP_TURN_TIMEOUT (#592) fix(desktop): populate member_count in get_channels so channel browser shows real counts (#548) fix(desktop): autofocus message composer on channel/thread open (#572) refactor(cli): restructure flat commands into 12 subcommand groups (#585) feat(sdk): add builder functions for workflows, DMs, and presence (#589) feat(desktop): add message more-actions dropdown menu (#590) fix(mobile): preserve channel list across background/resume reconnection (#588) Redesign Home as an inbox (#582) fix(desktop): drive unread badges from live subscription, not refetched lastMessageAt (#581) fix(desktop): refine header scaling and shadow (#573) fix(desktop): keep day dividers below header (#574) Move agent activity below composer (#579) docs(nips): NIP-AE — Agent Engrams (#575) refactor: extract shared @mention resolver into sprout-sdk (#580) ... Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
tlongwell-block added a commit that referenced this pull request May 15, 2026
Signed-off-by: Tyler Longwell <tlongwell@squareup.com> * origin/main: dev-mcp: add view_image tool (#602) fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601) fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599) docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597) fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595) feat(desktop): per-persona and per-agent env var overrides (#594)
Summary

Adds one new tool to `sprout-dev-mcp`: `view_image`.

`view_image` loads an image from a workspace-safe file path, an `http(s)://` URL, or a `data:image/...;base64,...` URL and returns it as a standard MCP `image` content block (`Content::image(base64, mime_type)`).

This PR also updates `sprout-agent` so agents can actually see those MCP image tool results:

- MCP `RawContent::Image` is preserved internally instead of flattened to `[image elided: ...]`.
- Anthropic: image tool results are serialized as `{type: "image", source: {type: "base64", media_type, data}}` inside `tool_result.content`.
- OpenAI-compatible: the `role: "tool"` message remains textual and any image blocks are sent in a follow-up `role: "user"` message as `image_url` data URLs.

Behaviour
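
- Longest edge is capped at 1568px by default; the cap is set by `max_dim` (clamped to `[64, 2048]`).
- Pass-through for already-small inputs (longest edge ≤ `max_dim` AND raw bytes ≤ 3 MiB): preserves originals byte-for-byte for tiny PNGs/JPEGs/GIFs/WebPs.
- Animated images are rejected with `animated images not supported; provide a still frame`. Detection is an allocation-free byte-level scan of the GIF block structure / WebP `VP8X+ANIM` bit: we never hand attacker-controlled dimensions to a decoder.
- Format is detected by magic-byte sniff (headers such as `Content-Type` are not trusted). Supports png/jpeg/gif/webp.

Agent bridge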

Before this PR, `sprout-agent` had a text-only `ToolResult`; MCP images were collapsed in `mcp.rs` to a string marker before the LLM request was built. That made `view_image` callable but not visible to the model.

The bridge now uses structured tool-result content:

- `ToolResultContent::Text(String)`
- `ToolResultContent::Image { data, mime_type }`

Image bytes are counted at their serialized base64 size for history/request pressure accounting. This is intentional: the accounting protects request/body size, not visual-token cost. To make a single valid `view_image` result survive to the next model turn, the tool-result cap is raised to 8 MiB and the default agent history cap is raised to 16 MiB.

Defences

All reachable from attacker-controlled `source`:

- HTTP: `reqwest` `Response::chunk()` streaming loop with running counter; up-front rejection if `Content-Length` advertises over budget; 10s connect+read timeout; `http(s)` only.
- Path: `paths::resolve_within`; rejects escapes via `..`, absolute paths, or symlinks; `File::take(cap + 1)` so a file growing between metadata and read still cannot exceed budget.
- Data URL: encoded length is prechecked before `decode()`; base64 only (percent-encoded data URLs rejected); non-image MIME rejected.
- Decoder: `image::Limits { max_alloc: 256 MiB }` set on the resize decoder as defence in depth.
- Other schemes: `ftp://`, `file://`, etc. rejected explicitly with a clear error.

Dependencies

Added to `sprout-dev-mcp/Cargo.toml`:

- `base64 = "0.22"`: already transitive in the workspace.
- `reqwest = { workspace = true }`: already used by `sprout-cli` (a direct dep of `sprout-dev-mcp`), so net binary impact is ~zero.
- `image = { default-features = false, features = ["jpeg", "png", "gif", "webp"] }`: the one non-trivial addition; needed for honest resize/transcode. Alternatives (`zune-image`, hand-rolled) were considered and rejected as worse on minimalism/correctness trade-offs.

No URL/HTTP parser, no hand-rolled crypto.

Refactor

`resolve_within` is moved from `str_replace.rs` to a new `paths` module shared by `str_replace` and `view_image`. The existing symlink-escape test moves with it.

Tests
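
Local verification:

- `cargo test -p sprout-agent -p sprout-dev-mcp` → 101 tests passed
- `cargo fmt --check` → clean
- `cargo clippy -p sprout-agent -p sprout-dev-mcp --all-targets -- -D warnings` → clean
- `cargo build -p sprout-agent -p sprout-dev-mcp` → clean

New coverage includes:

- animated WebP detected via the `VP8X+ANIM` flag
- unknown URL scheme (`ftp://`) rejected
- Anthropic request shape: image inside `tool_result.content`
- OpenAI-compatible request shape: follow-up `image_url` message

Live test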

Built `sprout-agent` + `sprout-dev-mcp` from this branch and ran an ACP-style live test against Anthropic with the agent instructed to inspect:

`~/Downloads/ChatGPT Image May 14, 2026, 12_49_39 PM.png`

Constraints: only `dev__view_image` allowed; fail if any shell/rg/file-read/etc. tool is called.

Result:

- tool called: `dev__view_image`
- stop reason: `end_turn`

Repeated once with default agent history settings after the cap changes; same result.

Prior art surveyed

- `modelcontextprotocol/servers`: `filesystem::read_media_file` (path → base64, no resize) and `everything::get-tiny-image` (protocol smoke test)
- `block/goose`: `computercontroller` emits `Content::image(data, "image/png")`; provider code converts image content to Anthropic/OpenAI shapes. Goose also uses the safer OpenAI-compatible pattern of a textual tool result plus follow-up user image message.
- `mirrorange/LiteCode`: `Read` (extension-based image return, no resize)
- `m13253/pdflens-mcp` (`image_dimension` knob = our `max_dim`)
- `catalystneuro/mcp_read_images`, `ah-wq/mcp-vision-relay`, `dangpolly927-eng/mcp-vision-web-bridge` (helpful limit precedents, mostly proxy-to-vision-model, not local viewer)

Coordination
Designed, reviewed, and live-tested in #sprout-agent-image-viewing by Dawn, Max, and Mari. Dawn independently verified the original bridge bug and reviewed the agent-side fix; her feedback on tests and the safer OpenAI-compatible message shape is incorporated here.