fix(cua): stop custom tools from leaving Gemini on stale screenshots #2061

Draft
cooleryu wants to merge 1 commit into browserbase:main from cooleryu:fix-google-cua-custom-tool-screenshot

@cooleryu cooleryu commented Apr 28, 2026

Summary

Fixes stale visual feedback after Google CUA custom tools by attaching a fresh screenshot to custom tool function responses.

Also normalizes side-effect-only custom tool results so tools like Playwright fill() no longer report undefined back to the model, and keeps the CUA client's current URL synchronized whenever the screenshot provider captures the active page.

Motivation

In #1635, custom tools can mutate the page successfully, but the next Google CUA turn may not receive a fresh visual observation. That can make the model conclude that a tool did nothing and then retry the same work with lower-level computer actions.

The repro also uses tools whose implementations return undefined, which is common for side-effect-only browser operations. Passing that raw value back makes a successful action look ambiguous in the model transcript.
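One way to remove that ambiguity is to normalize the value before it is sent back to the model. The following is a minimal sketch, not the PR's actual formatter: the helper name `formatCustomToolResult`, the string pass-through, and the JSON fallback are all assumptions made for illustration.

```typescript
// Hypothetical helper: the name and the fallback rules are assumptions,
// not the implementation in customToolResult.ts.
const DEFAULT_SUCCESS_MESSAGE = "Tool executed successfully";

function formatCustomToolResult(result: unknown): string {
  // Side-effect-only tools resolve to undefined; report an explicit success.
  if (result === undefined) {
    return DEFAULT_SUCCESS_MESSAGE;
  }
  // Strings pass through unchanged so tool authors keep control of the wording.
  if (typeof result === "string") {
    return result;
  }
  // Other values are serialized. JSON.stringify can yield undefined at runtime
  // (e.g. for functions) or throw on cycles, so fall back to String().
  try {
    const json: string | undefined = JSON.stringify(result);
    return json ?? String(result);
  } catch {
    return String(result);
  }
}
```

With this shape, a tool that only performs a side effect and a tool that returns a structured value both produce an unambiguous string in the transcript.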

Changes

  • Attach one post-action screenshot to Google custom tool functionResponse parts.
  • Reuse that screenshot when a step includes both custom tools and computer-use function calls, avoiding duplicate screenshots for the same post-action state.
  • Add a shared custom tool result formatter so undefined becomes "Tool executed successfully" across Google, OpenAI, and Anthropic CUA clients.
  • Update the V3 CUA screenshot provider to refresh the client URL from the active page whenever it captures a screenshot.
  • Add unit coverage for Google custom tool screenshots, undefined custom tool results, mixed custom/computer-use responses, and screenshot-provider URL synchronization.
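The screenshot-provider change in the list above can be pictured roughly as follows. This is a sketch under stated assumptions: `PageLike`, `CuaClientLike`, and `createScreenshotProvider` are hypothetical stand-ins defined here for illustration, not the types used in v3CuaAgentHandler.ts.

```typescript
// All names below are hypothetical stand-ins, not Stagehand's real types.
interface PageLike {
  url(): string;
  screenshot(): Promise<string>; // base64-encoded image
}

interface CuaClientLike {
  setCurrentUrl(url: string): void;
}

function createScreenshotProvider(
  getActivePage: () => PageLike,
  client: CuaClientLike,
): () => Promise<string> {
  return async () => {
    // Resolve the page at capture time: a custom tool may have navigated
    // or switched tabs since the previous screenshot.
    const page = getActivePage();
    const image = await page.screenshot();
    // Refresh the URL together with the image so both describe the same state.
    client.setCurrentUrl(page.url());
    return image;
  };
}
```

Tying the URL update to the capture means any code path that takes a screenshot also refreshes the client's URL, without each caller having to remember to do so.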

Test Plan

corepack pnpm --filter @browserbasehq/stagehand run build:esm
corepack pnpm --dir packages/core exec vitest run --config vitest.esm.config.mjs dist/esm/tests/unit/google-cua-client.test.js dist/esm/tests/unit/openai-cua-client.test.js dist/esm/tests/unit/anthropic-cua-client.test.js dist/esm/tests/unit/agent-captcha-hooks.test.js dist/esm/tests/unit/safety-confirmation.test.js --reporter=default
corepack pnpm --dir packages/core exec prettier --check lib/v3/agent/GoogleCUAClient.ts lib/v3/agent/OpenAICUAClient.ts lib/v3/agent/AnthropicCUAClient.ts lib/v3/agent/utils/googleCustomToolHandler.ts lib/v3/agent/utils/customToolResult.ts lib/v3/handlers/v3CuaAgentHandler.ts tests/unit/google-cua-client.test.ts tests/unit/openai-cua-client.test.ts tests/unit/anthropic-cua-client.test.ts tests/unit/agent-captcha-hooks.test.ts
git diff --check

Result: build passed; 5 unit test files passed with 23 tests; Prettier passed; git diff --check passed.

Risk

The Google CUA path now includes an image in custom tool function responses, which is the intended behavior for this issue but may slightly increase payload size after custom tools. The screenshot is captured once per post-action state and reused for mixed responses to keep that cost bounded.
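The capture-once-and-reuse idea can be sketched as a small per-step memoizer. The names `oncePerStep`, `buildStepResponses`, and `ResponsePart` are hypothetical and exist only for this illustration; the real response parts in the Google client carry more fields.

```typescript
// Hypothetical sketch: one lazily captured screenshot shared by every
// response part produced in the same step.
type Capture = () => Promise<string>;

interface ResponsePart {
  name: string;
  screenshot: string;
}

function oncePerStep(capture: Capture): Capture {
  let pending: Promise<string> | undefined;
  return () => {
    // The first caller triggers the capture; later callers reuse its promise.
    if (pending === undefined) {
      pending = capture();
    }
    return pending;
  };
}

async function buildStepResponses(
  names: string[],
  capture: Capture,
): Promise<ResponsePart[]> {
  // Create the memoizer only after every action in the step has run,
  // so the shared image reflects the post-action state.
  const sharedShot = oncePerStep(capture);
  const parts: ResponsePart[] = [];
  for (const name of names) {
    parts.push({ name, screenshot: await sharedShot() });
  }
  return parts;
}
```

A new memoizer per step keeps the cost at one capture per post-action state, and it avoids carrying an image over from an earlier step.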

For OpenAI and Anthropic, this PR only normalizes successful undefined custom tool results; it does not add screenshots to their custom tool outputs.

Related Issue

Fixes #1635


Summary by cubic

Fixes stale screenshots in Google CUA after custom tools by attaching a fresh post-action screenshot and reusing it within the step. Also normalizes side-effect-only tool results and keeps the client URL in sync when screenshots are captured.

  • Bug Fixes
    • Google CUA: Attach one post-action screenshot to custom tool functionResponse, and reuse it for any computer-use responses in the same step.
    • Custom tools: Convert undefined returns to "Tool executed successfully" across Google, OpenAI, and Anthropic via a shared formatter.
    • Screenshot provider: Update the CUA client’s current URL from the active page on each capture.

Written for commit 7350c86. Summary will update on new commits.

changeset-bot Bot commented Apr 28, 2026

⚠️ No Changeset found

Latest commit: 7350c86

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@github-actions

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added labels Apr 28, 2026: external-contributor (Tracks PRs mirrored from external contributor forks.) and external-contributor:awaiting-approval (Waiting for a stagehand team member to approve the latest external commit.)
Development

Successfully merging this pull request may close these issues.

Agent screenshot timing issues
