feat(tui): send /attach images as multimodal content (#2584)#2607
Conversation
Adds OpenAI-compatible image_url content blocks to the chat message model, wiring attached images through build_chat_messages_with_reasoning as multimodal user-content arrays. When images are present, user messages emit a content array of text + image_url parts instead of a plain string, matching the OpenAI vision API shape. - models.rs: new ImageUrlContent struct, ContentBlock::ImageUrl variant - client/chat.rs: image_parts collection, multimodal wire format for user messages, image-aware message inspection, stream-event no-op - Exhaustiveness arms added across 10 files (compaction, seam_manager, capacity_flow, purge, notifications, session_picker, utils, working_set, rlm/session, runtime_api) - Test: request_builder_emits_openai_image_url_parts_for_user_images Credit: @xyuai (PR #2587 — root cause + initial implementation) Closes: #2584 Co-authored-by: xyuai <xyuai@users.noreply.github.com>
There was a problem hiding this comment.
Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Code Review
This pull request adds support for multimodal ImageUrl content blocks across the TUI client, updating message serialization, compaction, token estimation, and UI rendering. The feedback highlights three critical issues where ImageUrl blocks are handled incorrectly: returning serde_json::Value::Null in runtime_api.rs and rlm/session.rs introduces invalid nulls into serialized content arrays, and estimating ImageUrl blocks as 0 tokens in compaction.rs can lead to context window overflows. Suggestions are provided to properly serialize the image blocks and use a default estimate of 85 tokens.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| crate::models::ContentBlock::CodeExecutionToolResult { tool_use_id, content } => { | ||
| json!({ "type": "tool_result", "tool_use_id": tool_use_id, "content": content }) | ||
| } | ||
| crate::models::ContentBlock::ImageUrl { .. } => serde_json::Value::Null, |
There was a problem hiding this comment.
Returning serde_json::Value::Null for ImageUrl blocks introduces null elements into the content array of the API response. This violates the expected schema (where all elements should be valid ContentBlock objects) and can cause deserialization or rendering failures in API clients. Instead, represent the ImageUrl block properly using its type and URL.
crate::models::ContentBlock::ImageUrl { image_url } => {
json!({ "type": "image_url", "image_url": { "url": image_url.url.clone() } })
}| "tool_use_id": tool_use_id, | ||
| "content": content, | ||
| }), | ||
| ContentBlock::ImageUrl { .. } => serde_json::Value::Null, |
There was a problem hiding this comment.
Returning serde_json::Value::Null for ImageUrl blocks introduces null elements into the compacted content array. This violates the expected schema of ContentBlock objects and can cause deserialization or parsing errors when the RLM session is processed. Instead, represent the ImageUrl block as a valid JSON object, truncating the URL to prevent context bloat from large base64 strings.
ContentBlock::ImageUrl { image_url } => json!({
"type": "image_url",
"image_url": {
"url": truncate_chars(&image_url.url, 128),
}
}),| | ContentBlock::CodeExecutionToolResult { .. } | ||
| | ContentBlock::ImageUrl { .. } => 0, |
There was a problem hiding this comment.
Estimating ImageUrl blocks as 0 tokens will cause the compaction logic to significantly underestimate the total token count of the conversation. This can lead to context window overflow errors (HTTP 400) from the LLM provider because compaction won't be triggered in time when images are attached. Every image in OpenAI's multimodal API costs at least 85 tokens (low detail base cost). We should use a conservative default estimate of 85 tokens for ImageUrl blocks.
| | ContentBlock::CodeExecutionToolResult { .. } | |
| | ContentBlock::ImageUrl { .. } => 0, | |
| | ContentBlock::CodeExecutionToolResult { .. } => 0, | |
| ContentBlock::ImageUrl { .. } => 85, |
…bown#2587) (Hmbown#2607) Adds OpenAI-compatible image_url content blocks to the chat message model, wiring attached images through build_chat_messages_with_reasoning as multimodal user-content arrays. When images are present, user messages emit a content array of text + image_url parts instead of a plain string, matching the OpenAI vision API shape. - models.rs: new ImageUrlContent struct, ContentBlock::ImageUrl variant - client/chat.rs: image_parts collection, multimodal wire format for user messages, image-aware message inspection, stream-event no-op - Exhaustiveness arms added across 10 files (compaction, seam_manager, capacity_flow, purge, notifications, session_picker, utils, working_set, rlm/session, runtime_api) - Test: request_builder_emits_openai_image_url_parts_for_user_images Credit: @xyuai (PR Hmbown#2587 — root cause + initial implementation) Closes: Hmbown#2584 Co-authored-by: xyuai <xyuai@users.noreply.github.com>
Adds OpenAI-compatible image_url content blocks for multimodal image support. Rebased from @xyuai's PR #2587 onto v0.8.51-era main with cycle-removal and compaction-refactor conflicts resolved.
Changes
models.rs:ImageUrlContentstruct,ContentBlock::ImageUrlvariantclient/chat.rs: image_parts collection, multimodal wire format, image-aware inspection, stream-event no-opTest
cargo test -p codewhale-tui: 3,931 passed, 0 failed (includes newrequest_builder_emits_openai_image_url_parts_for_user_imagestest)Closes #2584. Credit: @xyuai for the original implementation in #2587.