Skip to content

fix(tui): send /attach images as multimodal content#2587

Open
xyuai wants to merge 1 commit into
Hmbown:mainfrom
xyuai:fix-attach-image-base64-2584
Open

fix(tui): send /attach images as multimodal content#2587
xyuai wants to merge 1 commit into
Hmbown:mainfrom
xyuai:fix-attach-image-base64-2584

Conversation

@xyuai
Copy link
Copy Markdown
Contributor

@xyuai xyuai commented Jun 2, 2026

Summary

  • convert /attach image placeholders into OpenAI-compatible multimodal image_url content blocks
  • read local image files and send them as data:image/...;base64,... URLs while keeping the transcript placeholder readable
  • add regression tests for attachment base64 conversion and chat request image parts

Fixes #2584.

Verification

  • cargo fmt
  • CARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo check -p codewhale-tui
  • CARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo test -p codewhale-tui --bin codewhale-tui media_attachment_content_blocks_embed_image_as_data_url -- --nocapture
  • CARGO_TARGET_DIR=C:\Users\ky\AppData\Local\Temp\codewhale-target cargo test -p codewhale-tui --bin codewhale-tui request_builder_emits_openai_image_url_parts_for_user_images -- --nocapture

Greptile Summary

This PR wires /attach image placeholders into OpenAI-compatible multimodal requests by adding a ContentBlock::ImageUrl variant and a new media_attachment_content_blocks helper that reads local image files and encodes them as data:image/…;base64,… URLs.

  • New ContentBlock::ImageUrl in models.rs: adds ImageUrlContent struct and updates every match across compaction.rs, purge.rs, working_set.rs, and client/chat.rs; most non-chat files just emit a compact placeholder or skip the block.
  • file_mention.rsmedia_attachment_content_blocks: parses [Attached image: … at <path>] lines, resolves the extension to a MIME type, and encodes the file as base64; unsupported types and I/O errors produce informative <Text> blocks instead of panicking.
  • client/chat.rsbuild_chat_messages_with_reasoning: when image_parts is non-empty, user message content is serialised as a JSON array with separate text and image_url parts instead of a bare string; the bulk of the diff in several other files is whitespace-only reformatting.

Confidence Score: 3/5

Safe to review but needs the unbounded file-read fixed before merging — a large attached image will block the event loop and send an oversized payload.

The core multimodal wiring in chat.rs and models.rs is correct and the tests pass. The main concern is in media_attachment_content_blocks: it reads image files with no byte ceiling, while the analogous text-mention code explicitly caps reads at 128 KiB. On a system where a user attaches a large raw image (or accidentally attaches a non-image large file whose extension maps to a known MIME type), the entire file is read synchronously inside the async send-message handler, base64-encoded in memory, and forwarded to the API — with no guard against OOM or a payload the API will reject.

crates/tui/src/tui/file_mention.rs — the new media_attachment_content_blocks function needs a file-size cap matching the existing MAX_MENTION_FILE_BYTES pattern.

Important Files Changed

Filename Overview
crates/tui/src/tui/file_mention.rs Adds media_attachment_content_blocks which reads image files and emits ContentBlock::ImageUrl — no file-size cap, unlike the existing text-mention path that enforces MAX_MENTION_FILE_BYTES.
crates/tui/src/client/chat.rs Correctly routes ContentBlock::ImageUrl blocks into OpenAI multimodal array content for user messages; adds summarize_image_url_for_inspect which duplicates image_url_summary_for_compaction from compaction.rs.
crates/tui/src/core/engine.rs Calls media_attachment_content_blocks (which performs blocking fs::read) from within the async handle_send_message, inconsistent with the codebase's pattern of offloading blocking I/O to the blocking pool.
crates/tui/src/models.rs Adds ImageUrlContent struct and ContentBlock::ImageUrl variant; straightforward, well-typed additions with correct serde rename attributes.
crates/tui/src/compaction.rs Adds ContentBlock::ImageUrl arm to all match statements; bulk of diff is whitespace reformatting with no logic changes. The new arms correctly emit a placeholder summary and return an empty path list.

Sequence Diagram

sequenceDiagram
    participant User
    participant Engine as engine.rs
    participant FM as file_mention.rs
    participant FS as Filesystem
    participant Chat as client/chat.rs
    participant API as LLM API

    User->>Engine: handle_send_message(content)
    Engine->>FM: media_attachment_content_blocks(text)
    FM->>FM: extract_media_attachment_references(input)
    FM->>FM: image_mime_type_for_path(path)
    FM->>FS: fs::read(path) blocking, no size cap
    FS-->>FM: bytes
    FM-->>Engine: Vec ContentBlock::ImageUrl
    Engine->>Engine: push ImageUrl blocks into user Message
    Engine->>Chat: build_chat_messages(messages)
    Chat->>Chat: collect image_parts from ContentBlock::ImageUrl
    Chat->>Chat: wrap as multimodal array with text and image_url parts
    Chat-->>API: POST /chat/completions with content array
Loading

Fix All in Codex Fix All in Claude Code Fix All in Cursor

Reviews (1): Last reviewed commit: "fix(tui): send attached images as multim..." | Re-trigger Greptile

Greptile also left 3 inline comments on this PR.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Thanks @xyuai for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for multimodal user messages by allowing images to be attached and sent as base64-encoded data URLs to OpenAI-compatible endpoints. It updates message inspection, session serialization, context purging, and the seam manager to handle the new ContentBlock::ImageUrl variant. Feedback on the changes highlights two key areas for improvement: restricting the supported image formats in image_mime_type_for_path to only those officially supported by major LLM providers (PNG, JPEG, WEBP, GIF) to prevent API errors, and implementing a file size limit check using std::fs::metadata before reading image files to avoid excessive memory usage and potential API failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +665 to 677
fn image_mime_type_for_path(path: &Path) -> Option<&'static str> {
let ext = path.extension()?.to_str()?.to_ascii_lowercase();
match ext.as_str() {
"png" => Some("image/png"),
"jpg" | "jpeg" => Some("image/jpeg"),
"gif" => Some("image/gif"),
"webp" => Some("image/webp"),
"bmp" => Some("image/bmp"),
"tif" | "tiff" => Some("image/tiff"),
"ppm" => Some("image/x-portable-pixmap"),
_ => None,
}
if buffer.contains(&0) {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,
"file appears to be binary",
));
}
let text = std::str::from_utf8(&buffer)
.map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "file is not UTF-8"))?
.to_string();
Ok((text, truncated))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The image MIME type resolver currently supports formats like BMP, TIFF, and PPM. However, major LLM providers (such as OpenAI and Anthropic) only officially support PNG, JPEG, WEBP, and non-animated GIF. Sending unsupported image formats will result in a 400 Bad Request API error from the provider, disrupting the chat session.

It is recommended to restrict the supported image extensions to only those officially supported by the LLM APIs.

fn image_mime_type_for_path(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

Comment on lines +646 to 659
match std::fs::read(path) {
Ok(bytes) => blocks.push(ContentBlock::ImageUrl {
image_url: ImageUrlContent {
url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)),
},
}),
Err(err) => blocks.push(ContentBlock::Text {
text: format!(
"<unreadable-attachment kind=\"image\" path=\"{}\">\n{err}\n</unreadable-attachment>",
reference.path
),
cache_control: None,
}),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reading arbitrary image files without a size limit can lead to high memory consumption, TUI lag, or API errors (OpenAI limits image uploads to 20MB, and Anthropic limits them to 5MB).

Checking the file size using std::fs::metadata before reading the file prevents loading excessively large files into memory and avoids guaranteed API failures.

        match std::fs::metadata(path) {
            Ok(meta) if meta.len() > 10 * 1024 * 1024 => {
                blocks.push(ContentBlock::Text {
                    text: format!(
                        "<attachment-too-large kind=\"image\" path=\"{}\" limit=\"10MB\" />",
                        reference.path
                    ),
                    cache_control: None,
                });
                continue;
            }
            _ => {}
        }

        match std::fs::read(path) {
            Ok(bytes) => blocks.push(ContentBlock::ImageUrl {
                image_url: ImageUrlContent {
                    url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)),
                },
            }),
            Err(err) => blocks.push(ContentBlock::Text {
                text: format!(
                    "<unreadable-attachment kind=\"image\" path=\"{}\">\n{err}\n</unreadable-attachment>",
                    reference.path
                ),
                cache_control: None,
            }),
        }

Comment on lines +646 to +651
match std::fs::read(path) {
Ok(bytes) => blocks.push(ContentBlock::ImageUrl {
image_url: ImageUrlContent {
url: format!("data:{mime_type};base64,{}", BASE64.encode(bytes)),
},
}),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unbounded image file read — no size cap

std::fs::read(path) loads the entire file into memory with no ceiling. The text-mention path enforces MAX_MENTION_FILE_BYTES = 128 KiB via take(), but this image path has no equivalent guard. A user who accidentally attaches a multi-hundred-MB raw image (or a file masquerading as one via its path) will fully base64-encode it in memory, blocking the async runtime and potentially sending a payload large enough to be rejected by the API or exhaust process memory.

Fix in Codex Fix in Claude Code Fix in Cursor

Comment on lines +1264 to +1266
content.extend(crate::tui::file_mention::media_attachment_content_blocks(
&text,
));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Blocking file I/O on the async executor thread

media_attachment_content_blocks calls std::fs::read synchronously from a non-async function that is itself called from the async fn handle_send_message. The codebase's own comment at line ~1316 ("Run the git work on the blocking pool so the async runtime stays responsive") documents why this pattern is avoided elsewhere. For small images the impact is negligible, but the same call path is now used unconditionally on every message send that contains any [Attached image: …] placeholder, including messages with large files.

Fix in Codex Fix in Claude Code Fix in Cursor

Comment on lines +806 to +812
fn summarize_image_url_for_inspect(url: &str) -> String {
let Some((prefix, encoded)) = url.split_once(";base64,") else {
return first_chars(url, 96);
};
format!("{prefix};base64,<{} chars>", encoded.len())
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicate summarisation helper

summarize_image_url_for_inspect here and image_url_summary_for_compaction in compaction.rs (line 306) are byte-for-byte identical. If the truncation length (96 chars) or the format string ever needs to change, both copies must be updated. Extracting a shared helper to utils.rs or models.rs would eliminate the divergence risk.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex Fix in Claude Code Fix in Cursor

@Hmbown Hmbown added this to the v0.8.51 milestone Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

无法上传本地图片

2 participants