fix(provider/openai-compatible): trust user-declared modalities, dont inject fake ERROR text by Alezander9 · Pull Request #83 · browser-use/browsercode

Alezander9 · 2026-05-18T21:55:46Z

Summary

packages/opencode/src/provider/transform.ts — unsupportedParts replaces image/file parts with an inline ERROR: Cannot read <name> (this model does not support <modality> input) text part when model.capabilities.input.<modality> is false. This is the right behavior for native providers where models.dev knows the truth. For @ai-sdk/openai-compatible (the user-configured proxy provider) it's a silent footgun: there is no models.dev entry to consult, so image/audio/video/pdf all default to false unless the user explicitly declares modalities in opencode.json, and bcode silently strips screenshots from vision-capable upstream models.

Repro

Browser Use cloud's V4 LLM gateway is an openai-compatible proxy in front of Anthropic. Worker opencode.json declared:

"models": { "claude-opus-4.7": { "name": "claude-opus-4.7 via V4 gateway" } }

Vision smoketest: agent screenshots example.com via browser_execute, writes its verdict to a file. Result every run: "I CANNOT SEE THE IMAGE". Gateway-side request audit confirmed zero image_url / file / data:image/ signals in the wire body even when the synthetic "Attached media from tool result:" user message was hoisted with file parts. The file part was being replaced by unsupportedParts before streamText serialized the body — the model saw the fabricated error text as if it came from the user.
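For anyone repeating the gateway-side audit, the check can be sketched roughly as below. This is a minimal sketch, not the gateway's actual audit code: the helper name `findMediaSignals`, the `Json` type, and the sample bodies (including the `shot.png` filename) are hypothetical. It also assumes the signals appear as `type: "image_url"` / `type: "file"` content parts or as `data:image/` string values in the serialized request.

```typescript
// Hypothetical audit helper; names and sample bodies are illustrative only.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Recursively walk a parsed request body and collect the media signals found.
function findMediaSignals(node: Json, found: Set<string> = new Set()): Set<string> {
  if (typeof node === "string") {
    // Inline base64 images show up as data URLs.
    if (node.startsWith("data:image/")) found.add("data:image/")
    return found
  }
  if (Array.isArray(node)) {
    for (const item of node) findMediaSignals(item, found)
    return found
  }
  if (node !== null && typeof node === "object") {
    const type = node["type"]
    if (type === "image_url" || type === "file") found.add(type)
    for (const value of Object.values(node)) findMediaSignals(value, found)
  }
  return found
}

// Body shape observed in the bug: the media part is gone, only text remains.
const strippedBody: Json = {
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: 'ERROR: Cannot read "shot.png" (this model does not support image input). Inform the user.',
        },
      ],
    },
  ],
}

// Body shape expected once media parts are forwarded (assumed wire format).
const forwardedBody: Json = {
  messages: [
    {
      role: "user",
      content: [{ type: "image_url", image_url: { url: "data:image/png;base64,AAAA" } }],
    },
  ],
}

console.log([...findMediaSignals(strippedBody)])
console.log([...findMediaSignals(forwardedBody)])
```

An empty result for a request that was supposed to carry a screenshot is the symptom described above.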

Capability derivation reference (provider.ts:1298-1304):

image: model.modalities?.input?.includes("image") ?? existingModel?.capabilities.input.image ?? false,

When neither the user-declared model nor a models.dev fallback declares modalities, image defaults to false. unsupportedParts then replaces the file part with:

{
  type: "text",
  text: `ERROR: Cannot read "${filename}" (this model does not support image input). Inform the user.`,
}
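The interaction of the two snippets above can be reproduced in isolation. The following is a simplified sketch under stated assumptions, not the code in provider.ts or transform.ts: the type names, `deriveImageCapability`, `replaceIfUnsupported`, and the `shot.png` filename are hypothetical, and only the image modality is modelled.

```typescript
// Simplified, hypothetical types; the real ones in the codebase are richer.
type Modality = "text" | "image" | "audio" | "video" | "pdf"
interface DeclaredModel {
  name?: string
  modalities?: { input?: Modality[] }
}
interface KnownModel {
  capabilities: { input: { image: boolean } }
}

// Same nullish-coalescing chain as the derivation quoted above.
function deriveImageCapability(model: DeclaredModel, existingModel?: KnownModel): boolean {
  return (
    model.modalities?.input?.includes("image") ??
    existingModel?.capabilities.input.image ??
    false
  )
}

type Part =
  | { type: "text"; text: string }
  | { type: "file"; filename: string; mediaType: string }

// Mirrors the replacement described above for image file parts only.
function replaceIfUnsupported(part: Part, imageSupported: boolean): Part {
  if (part.type !== "file" || !part.mediaType.startsWith("image/")) return part
  if (imageSupported) return part
  return {
    type: "text",
    text: `ERROR: Cannot read "${part.filename}" (this model does not support image input). Inform the user.`,
  }
}

// The gateway model from the repro declares only a name, so image is false.
const gatewayModel: DeclaredModel = { name: "claude-opus-4.7 via V4 gateway" }
const screenshot: Part = { type: "file", filename: "shot.png", mediaType: "image/png" }

const supported = deriveImageCapability(gatewayModel)
console.log(supported)
console.log(replaceIfUnsupported(screenshot, supported))
```

Note that a model declaring `input: ["text"]` also yields `false` here, because `includes` returns a boolean rather than `undefined`, so the models.dev fallback is only consulted when no input modalities are declared at all.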

Fix

Early-return for @ai-sdk/openai-compatible — forward image/file parts as-is. If upstream truly can't handle them, the provider call returns a real error (e.g. Anthropic 400 "unsupported media type") which is far more debuggable than fabricated capability text the model reads back to the user.

if (model.api.npm === "@ai-sdk/openai-compatible") return msgs

Native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) keep the existing check — models.dev IS authoritative for them, and the filter prevents a real 4xx upstream while giving the user a clear local error.
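To make the resulting split concrete, here is a minimal sketch of the gate in context. Only the `model.api.npm` comparison comes from this PR; the function name `filterUnsupportedParts`, the message and model types, and the image-only filter are simplified assumptions, not the real transform.ts implementation.

```typescript
// Hypothetical, reduced shapes for illustration.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "file"; filename: string; mediaType: string }

interface Message {
  role: "user" | "assistant"
  content: MessagePart[]
}

interface Model {
  api: { npm: string }
  capabilities: { input: { image: boolean } }
}

function filterUnsupportedParts(msgs: Message[], model: Model): Message[] {
  // User-configured proxy: capabilities are unknown, so forward parts untouched.
  if (model.api.npm === "@ai-sdk/openai-compatible") return msgs

  // Native providers: keep the capability check (image modality only in this sketch).
  return msgs.map((msg) => ({
    ...msg,
    content: msg.content.map((part): MessagePart => {
      if (part.type !== "file" || !part.mediaType.startsWith("image/")) return part
      if (model.capabilities.input.image) return part
      return {
        type: "text",
        text: `ERROR: Cannot read "${part.filename}" (this model does not support image input). Inform the user.`,
      }
    }),
  }))
}

const msgs: Message[] = [
  { role: "user", content: [{ type: "file", filename: "shot.png", mediaType: "image/png" }] },
]
const noImage = { input: { image: false } }

const viaProxy = filterUnsupportedParts(msgs, {
  api: { npm: "@ai-sdk/openai-compatible" },
  capabilities: noImage,
})
const viaNative = filterUnsupportedParts(msgs, {
  api: { npm: "@ai-sdk/anthropic" },
  capabilities: noImage,
})

console.log(viaProxy[0].content[0].type)
console.log(viaNative[0].content[0].type)
```

With identical (false) capabilities, the proxy path returns the input array unchanged while the native path still substitutes the text part.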

Why not require users to declare modalities?

That's the workaround on our side (declaring modalities: { input: ['text', 'image'] } in opencode.json), which we're also shipping. But:

- The current behavior is a silent footgun — there's no warning, the model just receives weird text.
- The error text masquerades as user input, which trains the model to "explain why it can't see the image" instead of surfacing the bug.
- For user-configured proxies bcode genuinely doesn't know what's downstream; defaulting to "strip everything" is the wrong direction. "Forward and let upstream answer" is honest.
- Diff is 1 line + comment.
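For reference, the config-side workaround mentioned above would look roughly like the fragment below, written as a TypeScript object so it can be type-checked and serialized. This is a sketch under assumptions: only the `models` fragment and the `modalities.input` value quoted in this PR are modelled, the `ModelEntry` interface is hypothetical, and the real opencode.json schema may require additional fields.

```typescript
// Hypothetical shape covering only the fields quoted in this PR.
interface ModelEntry {
  name: string
  modalities?: { input: string[] }
}

const models: Record<string, ModelEntry> = {
  "claude-opus-4.7": {
    name: "claude-opus-4.7 via V4 gateway",
    // Declaring image input makes the derived image capability true.
    modalities: { input: ["text", "image"] },
  },
}

// Serialize the fragment as it would sit inside the provider entry.
console.log(JSON.stringify({ models }, null, 2))
```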

Risk

Other openai-compatible users who point bcode at a non-vision endpoint will now receive an upstream 4xx instead of a locally stripped part with fake error text. For them this is strictly an error-message-quality regression; the model never received the image in either case.

Summary by cubic

Forward media parts for @ai-sdk/openai-compatible instead of replacing them with fake "ERROR: Cannot read…" text. This fixes silent stripping of images/files and lets upstream return real, debuggable errors when a modality isn’t supported.

Bug Fixes
- Bypass unsupportedParts for @ai-sdk/openai-compatible; media is forwarded as-is.
- Keep capability filtering for native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) where models.dev is authoritative.

Written for commit cbd37a8. Summary will update on new commits.

…t inject fake ERROR text

unsupportedParts (transform.ts) replaces image/file parts with an inline 'ERROR: Cannot read <name> (this model does not support <modality> input)' text part when model.capabilities.input.<modality> is false. The capability is derived in provider.ts:1298-1304; for @ai-sdk/openai-compatible (the user-configured proxy) there is no models.dev entry, so image/audio/video/pdf all default to false unless the user explicitly declares modalities in opencode.json.

This is a silent footgun in the user-configured-proxy case. The screenshot is stripped before the wire, and the model receives the fabricated 'ERROR: Cannot read image' text as if it came from the user, producing nonsensical replies like 'I can't see the image' even though upstream supports vision.

Forwarding the part for openai-compatible providers is honest: if upstream truly can't handle it, we get a real provider error from the API call, which is far more debuggable than fabricated capability text.

Native providers (@ai-sdk/anthropic, @ai-sdk/openai, @ai-sdk/google, @ai-sdk/amazon-bedrock) keep the existing check because models.dev IS authoritative for them; there the filter prevents a real 4xx upstream and gives the user a clear local error.

Reproed against browser-use cloud's V4 LLM gateway (an openai-compatible proxy in front of Anthropic): vision smoketest replied 'I cannot see the image' on every run; gateway-side request audit confirmed zero image_url / file / data:image/ signals in the body even when the agent attached a screenshot via browser_execute. With this change image parts reach the wire and the model produces real vision replies.

cubic-dev-ai

No issues found across 1 file

cubic-dev-ai Bot reviewed May 18, 2026

Alezander9 merged commit 8f9db17 into main May 18, 2026
3 checks passed

Alezander9 mentioned this pull request May 19, 2026

Revert: trust user-declared modalities in openai-compatible #84

Merged

Alezander9 merged 1 commit into main from alex/openai-compat-trust-user-modalities
