fix(llm): emit structured input_image content for tool-result media in OpenAI Responses #28754
The native OpenAI Responses protocol previously lowered every tool-result
into a string via toolResultText, which for content-typed results
(`{ type: 'content', value: [text, media] }`) JSON-stringified the entire
array — including multi-megabyte base64 image data URLs — into a single
`function_call_output.output` string. OpenAI Responses rejects this
shape and emits a contentless stream `error` event, surfacing to the
caller as the bare "OpenAI Responses stream error".
Widen `function_call_output.output` in the body schema to accept the
real API shape (string or array of input_text/input_image) and add a
media-aware lowering helper that:
- emits structured `input_image` items for image media in tool results
- keeps the legacy string path for text/json/error results so existing
cassettes and provider expectations are unchanged
- raises a clear LLMError for unsupported tool-result media types (e.g.
audio) instead of silently encoding them
Adds three protocol-level reproducer tests for the lowering and a
RECORD-gated golden scenario (`image-tool-result`) that exercises a
real OpenAI Responses tool-image roundtrip end-to-end.
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
The following comment was made by an LLM, it may be inaccurate:
Based on my search results, I found one related PR:
- fix(llm): emit structured image blocks for tool-result media in Anthropic Messages (PR #28755)
This is a sibling fix mentioned in the current PR's description. It addresses the same issue but for Anthropic Messages instead of OpenAI Responses. Both PRs are part of the same effort to properly handle tool-result media with structured content types rather than JSON-stringifying them, which prevents token bloat when switching between providers and avoids OpenAI/Anthropic API rejections.
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Issue for this PR
Closes #28859
Type of change
What does this PR do?
OpenAI Responses was lowering all tool results through `toolResultText`. For content-typed tool results, that JSON-stringified the entire content array, including base64 image media, into `function_call_output.output`.

This PR widens the Responses request schema so `function_call_output.output` can be either a string or an array of structured content items. Image tool-result media is now emitted as `input_image` content, while text/json/error tool results keep the existing string behavior. Unsupported non-image tool-result media now returns a clear `LLMError`.

It also adds protocol-level regression tests and a recorded golden scenario for a real image returned from a tool result.
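For readers who want to see the shape of the change, here is a minimal sketch of a media-aware lowering, not the code in this PR. The type names, the `lowerToolResult` helper, the `mediaType`/`data` fields, and the stand-in `LLMError` class are all assumptions made for illustration; only the `input_text`/`input_image` item types and the string-or-array `output` come from the description above. How a text part inside a content-typed result is lowered is also an assumption here.

```typescript
// Hypothetical shapes for illustration; the real package types may differ.
type ToolResultPart =
  | { type: "text"; text: string }
  | { type: "media"; mediaType: string; data: string }; // data: base64 payload

type ToolResult =
  | { type: "text"; value: string }
  | { type: "json"; value: unknown }
  | { type: "error"; value: string }
  | { type: "content"; value: ToolResultPart[] };

// Structured items accepted in function_call_output.output.
type OutputItem =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url: string };

// Stand-in for the package's LLMError (assumed, not the real class).
class LLMError extends Error {}

function lowerToolResult(result: ToolResult): string | OutputItem[] {
  // Legacy path: text/json/error results stay a plain string.
  if (result.type !== "content") {
    return typeof result.value === "string"
      ? result.value
      : JSON.stringify(result.value);
  }
  // Content-typed results become structured items instead of one JSON blob.
  return result.value.map((part): OutputItem => {
    if (part.type === "text") {
      return { type: "input_text", text: part.text };
    }
    if (part.mediaType.startsWith("image/")) {
      // Images are passed as a data URL on a dedicated input_image item.
      return {
        type: "input_image",
        image_url: `data:${part.mediaType};base64,${part.data}`,
      };
    }
    // Fail loudly rather than silently encoding unsupported media.
    throw new LLMError(`Unsupported tool-result media type: ${part.mediaType}`);
  });
}
```

In this sketch, a result such as `{ type: "content", value: [text, imageMedia] }` lowers to a two-item array, while `{ type: "text", value: "ok" }` still lowers to the string `"ok"`, which is the property that keeps existing cassettes unchanged.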
How did you verify your code works?
- packages/llm: `bun run test` passed with 209 pass, 28 skip
- `bun run typecheck` passed with 15 successful tasks
- `openai-responses-gpt-5-5-image-tool-result` passed
Screenshots / recordings
Not applicable. This is an LLM protocol wire-shape fix.
Checklist