fix(tui): send /attach images as multimodal content#2587
Conversation
|
Thanks @xyuai for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
There was a problem hiding this comment.
Code Review
This pull request introduces support for multimodal user messages by allowing images to be attached and sent as base64-encoded data URLs to OpenAI-compatible endpoints. It updates message inspection, session serialization, context purging, and the seam manager to handle the new ContentBlock::ImageUrl variant. Feedback on the changes highlights two key areas for improvement: restricting the supported image formats in image_mime_type_for_path to only those officially supported by major LLM providers (PNG, JPEG, WEBP, GIF) to prevent API errors, and implementing a file size limit check using std::fs::metadata before reading image files to avoid excessive memory usage and potential API failures.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| fn image_mime_type_for_path(path: &Path) -> Option<&'static str> { | ||
| let ext = path.extension()?.to_str()?.to_ascii_lowercase(); | ||
| match ext.as_str() { | ||
| "png" => Some("image/png"), | ||
| "jpg" | "jpeg" => Some("image/jpeg"), | ||
| "gif" => Some("image/gif"), | ||
| "webp" => Some("image/webp"), | ||
| "bmp" => Some("image/bmp"), | ||
| "tif" | "tiff" => Some("image/tiff"), | ||
| "ppm" => Some("image/x-portable-pixmap"), | ||
| _ => None, | ||
| } | ||
| if buffer.contains(&0) { | ||
| return Err(std::io::Error::new( | ||
| std::io::ErrorKind::InvalidData, | ||
| "file appears to be binary", | ||
| )); | ||
| } | ||
| let text = std::str::from_utf8(&buffer) | ||
| .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "file is not UTF-8"))? | ||
| .to_string(); | ||
| Ok((text, truncated)) | ||
| } |
There was a problem hiding this comment.
The image MIME type resolver currently supports formats like BMP, TIFF, and PPM. However, major LLM providers (such as OpenAI and Anthropic) only officially support PNG, JPEG, WEBP, and non-animated GIF. Sending unsupported image formats will result in a 400 Bad Request API error from the provider, disrupting the chat session.
It is recommended to restrict the supported image extensions to only those officially supported by the LLM APIs.
fn image_mime_type_for_path(path: &Path) -> Option<&'static str> {
let ext = path.extension()?.to_str()?.to_ascii_lowercase();
match ext.as_str() {
"png" => Some("image/png"),
"jpg" | "jpeg" => Some("image/jpeg"),
"gif" => Some("image/gif"),
"webp" => Some("image/webp"),
_ => None,
}
}| match std::fs::read(path) { | ||
| Ok(bytes) => blocks.push(ContentBlock::ImageUrl { | ||
| image_url: ImageUrlContent { | ||
| url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)), | ||
| }, | ||
| }), | ||
| Err(err) => blocks.push(ContentBlock::Text { | ||
| text: format!( | ||
| "<unreadable-attachment kind=\"image\" path=\"{}\">\n{err}\n</unreadable-attachment>", | ||
| reference.path | ||
| ), | ||
| cache_control: None, | ||
| }), | ||
| } |
There was a problem hiding this comment.
Reading arbitrary image files without a size limit can lead to high memory consumption, TUI lag, or API errors (OpenAI limits image uploads to 20MB, and Anthropic limits them to 5MB).
Checking the file size using std::fs::metadata before reading the file prevents loading excessively large files into memory and avoids guaranteed API failures.
match std::fs::metadata(path) {
Ok(meta) if meta.len() > 10 * 1024 * 1024 => {
blocks.push(ContentBlock::Text {
text: format!(
"<attachment-too-large kind=\"image\" path=\"{}\" limit=\"10MB\" />",
reference.path
),
cache_control: None,
});
continue;
}
_ => {}
}
match std::fs::read(path) {
Ok(bytes) => blocks.push(ContentBlock::ImageUrl {
image_url: ImageUrlContent {
url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)),
},
}),
Err(err) => blocks.push(ContentBlock::Text {
text: format!(
"<unreadable-attachment kind=\"image\" path=\"{}\">\n{err}\n</unreadable-attachment>",
reference.path
),
cache_control: None,
}),
}| match std::fs::read(path) { | ||
| Ok(bytes) => blocks.push(ContentBlock::ImageUrl { | ||
| image_url: ImageUrlContent { | ||
| url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)), | ||
| }, | ||
| }), |
There was a problem hiding this comment.
Unbounded image file read — no size cap
std::fs::read(path) loads the entire file into memory with no ceiling. The text-mention path enforces MAX_MENTION_FILE_BYTES = 128 KiB via take(), but this image path has no equivalent guard. A user who accidentally attaches a multi-hundred-MB raw image (or a file masquerading as one via its path) will fully base64-encode it in memory, blocking the async runtime and potentially sending a payload large enough to be rejected by the API or exhaust process memory.
| content.extend(crate::tui::file_mention::media_attachment_content_blocks( | ||
| &text, | ||
| )); |
There was a problem hiding this comment.
Blocking file I/O on the async executor thread
media_attachment_content_blocks calls std::fs::read synchronously from a non-async function that is itself called from the async fn handle_send_message. The codebase's own comment at line ~1316 ("Run the git work on the blocking pool so the async runtime stays responsive") documents why this pattern is avoided elsewhere. For small images the impact is negligible, but the same call path is now used unconditionally on every message send that contains any [Attached image: …] placeholder, including messages with large files.
| fn summarize_image_url_for_inspect(url: &str) -> String { | ||
| let Some((prefix, encoded)) = url.split_once(";base64,") else { | ||
| return first_chars(url, 96); | ||
| }; | ||
| format!("{prefix};base64,<{} chars>", encoded.len()) | ||
| } | ||
|
|
There was a problem hiding this comment.
Duplicate summarisation helper
summarize_image_url_for_inspect here and image_url_summary_for_compaction in compaction.rs (line 306) are byte-for-byte identical. If the truncation length (96 chars) or the format string ever needs to change, both copies must be updated. Extracting a shared helper to utils.rs or models.rs would eliminate the divergence risk.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Summary
/attachimage placeholders into OpenAI-compatible multimodalimage_urlcontent blocksdata:image/...;base64,...URLs while keeping the transcript placeholder readableFixes #2584.
Verification
cargo fmtCARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo check -p codewhale-tuiCARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo test -p codewhale-tui --bin codewhale-tui media_attachment_content_blocks_embed_image_as_data_url -- --nocaptureCARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo test -p codewhale-tui --bin codewhale-tui request_builder_emits_openai_image_url_parts_for_user_images -- --nocaptureGreptile Summary
This PR wires
/attachimage placeholders into OpenAI-compatible multimodal requests by adding aContentBlock::ImageUrlvariant and a newmedia_attachment_content_blockshelper that reads local image files and encodes them asdata:image/…;base64,…URLs.ContentBlock::ImageUrlinmodels.rs: addsImageUrlContentstruct and updates everymatchacrosscompaction.rs,purge.rs,working_set.rs, andclient/chat.rs; most non-chat files just emit a compact placeholder or skip the block.file_mention.rs—media_attachment_content_blocks: parses[Attached image: … at <path>]lines, resolves the extension to a MIME type, and encodes the file as base64; unsupported types and I/O errors produce informative<Text>blocks instead of panicking.client/chat.rs—build_chat_messages_with_reasoning: whenimage_partsis non-empty, user message content is serialised as a JSON array with separatetextandimage_urlparts instead of a bare string; the bulk of the diff in several other files is whitespace-only reformatting.Confidence Score: 3/5
Safe to review but needs the unbounded file-read fixed before merging — a large attached image will block the event loop and send an oversized payload.
The core multimodal wiring in chat.rs and models.rs is correct and the tests pass. The main concern is in media_attachment_content_blocks: it reads image files with no byte ceiling, while the analogous text-mention code explicitly caps reads at 128 KiB. On a system where a user attaches a large raw image (or accidentally attaches a non-image large file whose extension maps to a known MIME type), the entire file is read synchronously inside the async send-message handler, base64-encoded in memory, and forwarded to the API — with no guard against OOM or a payload the API will reject.
crates/tui/src/tui/file_mention.rs — the new media_attachment_content_blocks function needs a file-size cap matching the existing MAX_MENTION_FILE_BYTES pattern.
Important Files Changed
media_attachment_content_blockswhich reads image files and emitsContentBlock::ImageUrl— no file-size cap, unlike the existing text-mention path that enforces MAX_MENTION_FILE_BYTES.ContentBlock::ImageUrlblocks into OpenAI multimodal array content for user messages; addssummarize_image_url_for_inspectwhich duplicatesimage_url_summary_for_compactionfrom compaction.rs.media_attachment_content_blocks(which performs blocking fs::read) from within the asynchandle_send_message, inconsistent with the codebase's pattern of offloading blocking I/O to the blocking pool.ImageUrlContentstruct andContentBlock::ImageUrlvariant; straightforward, well-typed additions with correct serde rename attributes.ContentBlock::ImageUrlarm to all match statements; bulk of diff is whitespace reformatting with no logic changes. The new arms correctly emit a placeholder summary and return an empty path list.Sequence Diagram
sequenceDiagram participant User participant Engine as engine.rs participant FM as file_mention.rs participant FS as Filesystem participant Chat as client/chat.rs participant API as LLM API User->>Engine: handle_send_message(content) Engine->>FM: media_attachment_content_blocks(text) FM->>FM: extract_media_attachment_references(input) FM->>FM: image_mime_type_for_path(path) FM->>FS: fs::read(path) blocking, no size cap FS-->>FM: bytes FM-->>Engine: Vec ContentBlock::ImageUrl Engine->>Engine: push ImageUrl blocks into user Message Engine->>Chat: build_chat_messages(messages) Chat->>Chat: collect image_parts from ContentBlock::ImageUrl Chat->>Chat: wrap as multimodal array with text and image_url parts Chat-->>API: POST /chat/completions with content arrayReviews (1): Last reviewed commit: "fix(tui): send attached images as multim..." | Re-trigger Greptile