fix(provider/openai-compatible): trust user-declared modalities, don't inject fake ERROR text #83
Merged
…t inject fake ERROR text

unsupportedParts (transform.ts) replaces image/file parts with an inline 'ERROR: Cannot read <name> (this model does not support <modality> input)' text part when model.capabilities.input.<modality> is false. The capability is derived in provider.ts:1298-1304. For @ai-sdk/openai-compatible (the user-configured proxy) there is no models.dev entry, so image/audio/video/pdf all default to false unless the user explicitly declares modalities in opencode.json.

This is a silent footgun in the user-configured-proxy case. The screenshot is stripped before the wire, and the model receives the fabricated 'ERROR: Cannot read image' text as if it came from the user, producing nonsensical replies like 'I can't see the image' even though upstream supports vision.

Forwarding the part for openai-compatible providers is honest: if upstream truly can't handle it, we get a real provider error from the API call, which is far more debuggable than fabricated capability text.

Native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) keep the existing check because models.dev IS authoritative for them. There the filter prevents a real 4xx upstream and gives the user a clear local error.

Reproed against browser-use cloud's V4 LLM gateway (an openai-compatible proxy in front of Anthropic): the vision smoketest replied 'I cannot see the image' on every run, and a gateway-side request audit confirmed zero image_url / file / data:image/ signals in the body even when the agent attached a screenshot via browser_execute. With this change image parts reach the wire and the model produces real vision replies.
Summary
`packages/opencode/src/provider/transform.ts`: `unsupportedParts` replaces image/file parts with an inline `ERROR: Cannot read <name> (this model does not support <modality> input)` text part when `model.capabilities.input.<modality>` is false. This is the right behavior for native providers where models.dev knows the truth. For `@ai-sdk/openai-compatible` (the user-configured proxy provider) it's a silent footgun: there is no models.dev entry to consult, so `image`/`audio`/`video`/`pdf` all default to `false` unless the user explicitly declares `modalities` in `opencode.json`, and bcode silently strips screenshots from vision-capable upstream models.

Repro
Browser Use cloud's V4 LLM gateway is an openai-compatible proxy in front of Anthropic. Worker `opencode.json` declared:

Vision smoketest: agent screenshots example.com via `browser_execute`, writes its verdict to a file. Result every run: "I CANNOT SEE THE IMAGE". Gateway-side request audit confirmed zero `image_url` / `file` / `data:image/` signals in the wire body even when the synthetic "Attached media from tool result:" user message was hoisted with file parts. The file part was being replaced by `unsupportedParts` before `streamText` serialized the body, so the model saw the fabricated error text as if it came from the user.

Capability derivation reference (provider.ts:1298-1304):
When neither the user-declared model nor a models.dev fallback declares modalities, image defaults to false.
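The default-to-false behavior described here can be illustrated with a minimal sketch. This is not the code at provider.ts:1298-1304; the type names, the `deriveInputCapabilities` helper, and the choice to default `text` to `true` are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real types in provider.ts may differ.
type Modality = "text" | "image" | "audio" | "video" | "pdf";

interface ModelDeclaration {
  modalities?: { input?: Modality[] };
}

type InputCapabilities = Record<Modality, boolean>;

// Resolve input capabilities from the user-declared model first, then from a
// models.dev-style fallback entry. A modality nobody declares ends up false.
function deriveInputCapabilities(
  userModel?: ModelDeclaration,
  fallback?: ModelDeclaration,
): InputCapabilities {
  const declared = userModel?.modalities?.input ?? fallback?.modalities?.input;
  const has = (m: Modality): boolean => declared?.includes(m) ?? false;
  return {
    // Assumption: text input is treated as supported when nothing is declared.
    text: declared?.includes("text") ?? true,
    image: has("image"),
    audio: has("audio"),
    video: has("video"),
    pdf: has("pdf"),
  };
}

// A proxy model with no declaration and no fallback entry: image is false.
const undeclaredCaps = deriveInputCapabilities({}, undefined);

// The same model once the user declares modalities: image is true.
const declaredCaps = deriveInputCapabilities({
  modalities: { input: ["text", "image"] },
});
```

Under these assumptions, a proxy model without a models.dev entry only gains image support once the user declares it, which is the situation the repro ran into.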
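`unsupportedParts` then replaces the file part with a text part of the form `ERROR: Cannot read <name> (this model does not support <modality> input)`.

Fix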
Early-return for `@ai-sdk/openai-compatible`: forward image/file parts as-is. If upstream truly can't handle them, the provider call returns a real error (e.g. Anthropic 400 "unsupported media type"), which is far more debuggable than fabricated capability text the model reads back to the user.

Native providers (`@ai-sdk/anthropic`, `@ai-sdk/openai`, `@ai-sdk/google`, `@ai-sdk/amazon-bedrock`) keep the existing check: models.dev IS authoritative for them, and the filter prevents a real 4xx upstream while giving the user a clear local error.

Why not require users to declare modalities?
That's the workaround on our side (declaring `modalities: { input: ['text', 'image'] }` in `opencode.json`), which we're also shipping. But:

Risk
Other openai-compatible users who point bcode at a non-vision endpoint will now receive an upstream 4xx instead of a local stripped-with-fake-error. Strictly an error-message-quality regression for them; the model never received the image in either case.
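For reviewers who want the shape of the change at a glance, the early return described under Fix can be sketched roughly as follows. This is a simplified illustration, not the actual contents of transform.ts: the `Part` and `Model` types, the `providerNpm` field, and the `modalityOf` helper are hypothetical, and only file parts with a media type are modeled.

```typescript
// Illustrative types only; the real transform.ts shapes may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "file"; filename: string; mediaType: string };

interface Model {
  providerNpm: string; // hypothetical field: npm package backing the provider
  capabilities: { input: Record<string, boolean> };
}

// Hypothetical helper: map a media type to a modality name.
function modalityOf(mediaType: string): string {
  if (mediaType === "application/pdf") return "pdf";
  const [major = ""] = mediaType.split("/"); // "image/png" -> "image"
  return major;
}

function unsupportedParts(model: Model, parts: Part[]): Part[] {
  // Early return: for the user-configured proxy, forward media untouched and
  // let upstream reject it with a real error if it cannot handle it.
  if (model.providerNpm === "@ai-sdk/openai-compatible") return parts;

  // Native providers keep the local capability check.
  return parts.map((part): Part => {
    if (part.type !== "file") return part;
    const modality = modalityOf(part.mediaType);
    if (model.capabilities.input[modality]) return part;
    return {
      type: "text",
      text: `ERROR: Cannot read ${part.filename} (this model does not support ${modality} input)`,
    };
  });
}
```

In this sketch the proxy path never consults `capabilities`, so a missing `modalities` declaration can no longer strip a screenshot; the trade-off is the one noted above, where a non-vision endpoint now surfaces an upstream 4xx.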
Summary by cubic
Forward media parts for `@ai-sdk/openai-compatible` instead of replacing them with fake "ERROR: Cannot read…" text. This fixes silent stripping of images/files and lets upstream return real, debuggable errors when a modality isn't supported.

- Skip `unsupportedParts` for `@ai-sdk/openai-compatible`; media is forwarded as-is.
- Keep the existing check for native providers (`@ai-sdk/anthropic`, `@ai-sdk/openai`, `@ai-sdk/google`, `@ai-sdk/amazon-bedrock`) where models.dev is authoritative.

Written for commit cbd37a8.