feat(tui): send /attach images as multimodal content (#2584) #2607

Merged
Hmbown merged 1 commit into main from fix/image-attach-2587
Jun 3, 2026

@Hmbown Hmbown commented Jun 3, 2026

Adds OpenAI-compatible image_url content blocks for multimodal image support. Rebased from @xyuai's PR #2587 onto v0.8.51-era main with cycle-removal and compaction-refactor conflicts resolved.

Changes

  • models.rs: ImageUrlContent struct, ContentBlock::ImageUrl variant
  • client/chat.rs: image_parts collection, multimodal wire format, image-aware inspection, stream-event no-op
  • 10 files: exhaustiveness arms for new variant

Test

cargo test -p codewhale-tui: 3,931 passed, 0 failed (includes new request_builder_emits_openai_image_url_parts_for_user_images test)

Closes #2584. Credit: @xyuai for the original implementation in #2587.

Adds OpenAI-compatible image_url content blocks to the chat message
model, wiring attached images through build_chat_messages_with_reasoning
as multimodal user-content arrays. When images are present, user
messages emit a content array of text + image_url parts instead of a
plain string, matching the OpenAI vision API shape.

- models.rs: new ImageUrlContent struct, ContentBlock::ImageUrl variant
- client/chat.rs: image_parts collection, multimodal wire format for
  user messages, image-aware message inspection, stream-event no-op
- Exhaustiveness arms added across 10 files (compaction, seam_manager,
  capacity_flow, purge, notifications, session_picker, utils,
  working_set, rlm/session, runtime_api)
- Test: request_builder_emits_openai_image_url_parts_for_user_images

Credit: @xyuai (PR #2587 — root cause + initial implementation)
Closes: #2584

Co-authored-by: xyuai <xyuai@users.noreply.github.com>
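
The wire-format switch described in the commit message above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in client/chat.rs: the `Part` and `UserContent` types and the `build_user_content` and `to_json` functions are hypothetical names, and the JSON is rendered by hand only to keep the sketch free of dependencies.

```rust
/// One part of a multimodal user message (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),
    ImageUrl(String),
}

/// User content is either a plain string or an array of parts.
#[derive(Debug, Clone, PartialEq)]
enum UserContent {
    Plain(String),
    Parts(Vec<Part>),
}

/// Keep the plain-string shape when no images are attached;
/// otherwise emit the text part followed by one part per image.
fn build_user_content(text: &str, image_urls: &[String]) -> UserContent {
    if image_urls.is_empty() {
        return UserContent::Plain(text.to_string());
    }
    let mut parts = vec![Part::Text(text.to_string())];
    parts.extend(image_urls.iter().cloned().map(Part::ImageUrl));
    UserContent::Parts(parts)
}

/// Minimal JSON string escaping, enough for this sketch.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render the `content` field in the OpenAI-style shape:
/// a JSON string, or an array of `text` and `image_url` parts.
fn to_json(content: &UserContent) -> String {
    match content {
        UserContent::Plain(text) => format!("\"{}\"", escape_json(text)),
        UserContent::Parts(parts) => {
            let rendered: Vec<String> = parts
                .iter()
                .map(|part| match part {
                    Part::Text(text) => {
                        format!("{{\"type\":\"text\",\"text\":\"{}\"}}", escape_json(text))
                    }
                    Part::ImageUrl(url) => format!(
                        "{{\"type\":\"image_url\",\"image_url\":{{\"url\":\"{}\"}}}}",
                        escape_json(url)
                    ),
                })
                .collect();
            format!("[{}]", rendered.join(","))
        }
    }
}
```

Calling `build_user_content("hi", &[])` keeps the plain-string form, so text-only requests are unchanged; passing one or more URLs (for example a `data:` URL) switches to the array form.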


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for multimodal ImageUrl content blocks across the TUI client, updating message serialization, compaction, token estimation, and UI rendering. The feedback highlights three critical issues where ImageUrl blocks are handled incorrectly: returning serde_json::Value::Null in runtime_api.rs and rlm/session.rs introduces invalid nulls into serialized content arrays, and estimating ImageUrl blocks as 0 tokens in compaction.rs can lead to context window overflows. Suggestions are provided to properly serialize the image blocks and use a default estimate of 85 tokens.


crate::models::ContentBlock::CodeExecutionToolResult { tool_use_id, content } => {
    json!({ "type": "tool_result", "tool_use_id": tool_use_id, "content": content })
}
crate::models::ContentBlock::ImageUrl { .. } => serde_json::Value::Null,

high

Returning serde_json::Value::Null for ImageUrl blocks introduces null elements into the content array of the API response. This violates the expected schema (where all elements should be valid ContentBlock objects) and can cause deserialization or rendering failures in API clients. Instead, represent the ImageUrl block properly using its type and URL.

crate::models::ContentBlock::ImageUrl { image_url } => {
    json!({ "type": "image_url", "image_url": { "url": image_url.url.clone() } })
}

"tool_use_id": tool_use_id,
"content": content,
}),
ContentBlock::ImageUrl { .. } => serde_json::Value::Null,

high

Returning serde_json::Value::Null for ImageUrl blocks introduces null elements into the compacted content array. This violates the expected schema of ContentBlock objects and can cause deserialization or parsing errors when the RLM session is processed. Instead, represent the ImageUrl block as a valid JSON object, truncating the URL to prevent context bloat from large base64 strings.

        ContentBlock::ImageUrl { image_url } => json!({
            "type": "image_url",
            "image_url": {
                "url": truncate_chars(&image_url.url, 128),
            }
        }),
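
For readers unfamiliar with the helper used in the suggestion above, a character-safe truncation could look roughly like the sketch below. `truncate_chars_sketch` is a hypothetical stand-in; the repository's actual `truncate_chars` may have a different signature or may mark the cut differently.

```rust
/// Hypothetical stand-in for the repository's `truncate_chars`;
/// the real helper may behave differently.
/// Keeps at most `max_chars` characters and appends "..." when text was cut.
fn truncate_chars_sketch(input: &str, max_chars: usize) -> String {
    match input.char_indices().nth(max_chars) {
        // A character exists at position `max_chars`, so the input is too long:
        // slice at that character's byte offset, which is always a valid boundary.
        Some((byte_index, _)) => format!("{}...", &input[..byte_index]),
        // The input has `max_chars` characters or fewer: return it unchanged.
        None => input.to_string(),
    }
}
```

Slicing by `char_indices` rather than by raw byte offset avoids a panic on multi-byte characters. A base64 `data:` URL is ASCII, so either approach would work there, but the character-based version is also safe for arbitrary URLs.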

Comment on lines +567 to +568
| ContentBlock::CodeExecutionToolResult { .. }
| ContentBlock::ImageUrl { .. } => 0,

high

Estimating ImageUrl blocks as 0 tokens will cause the compaction logic to significantly underestimate the total token count of the conversation. This can lead to context window overflow errors (HTTP 400) from the LLM provider because compaction won't be triggered in time when images are attached. Every image in OpenAI's multimodal API costs at least 85 tokens (low detail base cost). We should use a conservative default estimate of 85 tokens for ImageUrl blocks.

Suggested change
- | ContentBlock::CodeExecutionToolResult { .. }
- | ContentBlock::ImageUrl { .. } => 0,
+ | ContentBlock::CodeExecutionToolResult { .. } => 0,
+ ContentBlock::ImageUrl { .. } => 85,
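
The effect of the suggested floor on a running estimate can be illustrated with a small sketch. The `Block` enum, `estimate_tokens`, and the four-characters-per-token heuristic are assumptions made for illustration and do not reflect the actual estimator in compaction.rs; the 85-token figure is the one quoted in the review comment.

```rust
/// Simplified stand-in for the project's `ContentBlock` (hypothetical).
#[derive(Debug)]
enum Block {
    Text(String),
    ImageUrl(String),
}

/// Per-image floor quoted in the review comment above.
const IMAGE_TOKEN_FLOOR: usize = 85;

/// Rough text heuristic; an assumption for this sketch only.
const CHARS_PER_TOKEN: usize = 4;

/// Sum a conservative token estimate over all blocks.
fn estimate_tokens(blocks: &[Block]) -> usize {
    blocks
        .iter()
        .map(|block| match block {
            // Round up so short non-empty text still counts as at least one token.
            Block::Text(text) => {
                (text.chars().count() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
            }
            // Count every image at the floor instead of zero, so attached
            // images move the total toward the compaction threshold.
            Block::ImageUrl(_) => IMAGE_TOKEN_FLOOR,
        })
        .sum()
}
```

With a zero estimate, a conversation made mostly of images would never appear to grow. With the floor, each attached image adds 85 to the total, which is still a lower bound rather than an exact cost.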

@Hmbown Hmbown merged commit dd26114 into main Jun 3, 2026
14 of 16 checks passed
@Hmbown Hmbown deleted the fix/image-attach-2587 branch June 3, 2026 04:27
yanghucai pushed a commit to yanghucai/CodeWhale that referenced this pull request Jun 3, 2026
…bown#2587) (Hmbown#2607)


Development

Successfully merging this pull request may close these issues.

Unable to upload local images
