fix: prevent false success reporting on unresolved blocking error #4324

Merged: sauravpanda merged 5 commits into main from add-better-data-grounding on Mar 11, 2026
sauravpanda (Collaborator) commented Mar 11, 2026

Summary by cubic

Prevents false success by adding a blocking‑error check and tightens data grounding to verbatim copy across all browser agent prompts. Still forbids fabricated data; derived values are allowed with screenshot-based verification. Also reverts google-genai to 1.60.0 for stability.

  • Bug Fixes

    • Blocking error policy: payment declined, login without creds, email/verification wall, required paywall, access denied → set success=false. Temporary obstacles (CAPTCHA, popups) don’t count.
    • Data grounding: every URL/price/name/value must be observed verbatim in tool outputs, browser_state, or screenshot — copy exactly; don’t paraphrase or normalize URLs. Derived values (counts/totals) from observed data are allowed. Never construct; say not found.
    • Prompt cleanup: removed hard‑constraint enforcement, explicit scroll exceptions/“pages below” rules, and extract dedup guidance (already_collected/results file). Applied across all prompt variants.
  • Dependencies

    • Reverted google-genai from 1.65.0 to 1.60.0.
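
The blocking-error policy above lives in prompt text, not in code, but its decision rule can be sketched as follows. This is a minimal illustration only: the `Obstacle` enum, the `BLOCKING` set, and `final_success` are hypothetical names and are not part of browser-use.

```python
from enum import Enum


class Obstacle(Enum):
    """Hypothetical labels for the obstacles named in the summary."""
    PAYMENT_DECLINED = "payment_declined"
    LOGIN_WITHOUT_CREDENTIALS = "login_without_credentials"
    VERIFICATION_WALL = "verification_wall"
    REQUIRED_PAYWALL = "required_paywall"
    ACCESS_DENIED = "access_denied"
    CAPTCHA = "captcha"  # temporary, does not force failure
    POPUP = "popup"      # temporary, does not force failure


# Obstacles that the summary says must lead to success=false when unresolved.
BLOCKING = frozenset({
    Obstacle.PAYMENT_DECLINED,
    Obstacle.LOGIN_WITHOUT_CREDENTIALS,
    Obstacle.VERIFICATION_WALL,
    Obstacle.REQUIRED_PAYWALL,
    Obstacle.ACCESS_DENIED,
})


def final_success(task_completed: bool, unresolved: list[Obstacle]) -> bool:
    """Report success only if no unresolved obstacle is a blocking one."""
    if any(obstacle in BLOCKING for obstacle in unresolved):
        return False
    return task_completed


print(final_success(True, [Obstacle.PAYMENT_DECLINED]))  # False
print(final_success(True, [Obstacle.CAPTCHA]))           # True
```

The point of the sketch is that a blocking error overrides the agent's own belief that the task finished, while temporary obstacles leave that belief untouched.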

Written for commit b43c7dd. Summary will update on new commits.

cubic-dev-ai (bot, Contributor) left a comment

2 issues found across 5 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="browser_use/agent/system_prompts/system_prompt_anthropic_flash.md">

<violation number="1" location="browser_use/agent/system_prompts/system_prompt_anthropic_flash.md:96">
P2: This grounding check omits `browser_vision`, so screenshot-only facts can no longer be treated as verified even though the prompt defines screenshots as the ground truth.</violation>
</file>

<file name="browser_use/agent/system_prompts/system_prompt.md">

<violation number="1" location="browser_use/agent/system_prompts/system_prompt.md:148">
P2: This wording forbids grounded derived results like counts, so tasks that require counting or simple computation can be incorrectly reported as `not found` or unsuccessful.</violation>
</file>
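
Both issues above concern what counts as grounded. A rough sketch of the intended behavior, assuming observations are available as plain text per source: the source keys, the sample strings, and both helper functions are hypothetical, and the `screenshot` entry assumes text has already been read from the image.

```python
def is_grounded(value: str, sources: dict[str, str]) -> bool:
    """True if `value` appears verbatim in at least one observed source."""
    return any(value in text for text in sources.values())


def is_grounded_count(claimed: int, items: list[str], sources: dict[str, str]) -> bool:
    """A derived count is acceptable when every counted item is itself grounded."""
    return claimed == len(items) and all(is_grounded(item, sources) for item in items)


# Hypothetical observations; the price is visible only in the screenshot text.
observed = {
    "tool_output": "Found listings: Laptop A, Laptop B",
    "browser_state": "[12]<a>Laptop A</a> [13]<a>Laptop B</a>",
    "screenshot": "Laptop A $899.00  Laptop B $1,049.00",
}
without_screenshot = {k: v for k, v in observed.items() if k != "screenshot"}

print(is_grounded("$899.00", observed))            # True
print(is_grounded("$899.00", without_screenshot))  # False
print(is_grounded_count(2, ["Laptop A", "Laptop B"], observed))  # True
```

Dropping the screenshot source makes a screenshot-only price look ungrounded, which mirrors the first issue. The count check mirrors the second: a total is not observed verbatim anywhere, yet it is derived entirely from observed items.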

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread browser_use/agent/system_prompts/system_prompt_anthropic_flash.md Outdated
Comment thread browser_use/agent/system_prompts/system_prompt.md Outdated
github-actions (bot) commented Mar 11, 2026

Agent Task Evaluation Results: 2/2 (100%)

View detailed results
| Task | Result | Reason |
| --- | --- | --- |
| amazon_laptop | ✅ Pass | Skipped - API key not available (fork PR or missing secret) |
| browser_use_pip | ✅ Pass | Skipped - API key not available (fork PR or missing secret) |

Check the evaluate-tasks job for detailed task execution logs.

cubic-dev-ai (bot, Contributor) left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="browser_use/agent/system_prompts/system_prompt.md">

<violation number="1" location="browser_use/agent/system_prompts/system_prompt.md:148">
P2: Keep a verbatim requirement for direct fields; `observed` alone is loose enough to allow paraphrased names or normalized URLs back into `success=true` responses.</violation>
</file>
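
To illustrate why `observed` alone may be too loose, here is a small sketch contrasting a normalizing comparison with a verbatim one. The function names, the normalization rule, and the sample URL are assumptions made for this example, not code from the PR.

```python
def normalize(url: str) -> str:
    """A loose normalization: trim, lowercase, and drop trailing slashes."""
    return url.strip().lower().rstrip("/")


def loosely_observed(value: str, observed_text: str) -> bool:
    """Accepts a value that matches some observed token after normalization."""
    return any(normalize(value) == normalize(token) for token in observed_text.split())


def verbatim_observed(value: str, observed_text: str) -> bool:
    """Accepts a value only if it appears character-for-character."""
    return value in observed_text


observed_text = "Product page: https://example.com/Item/42/ listed at $19.99"
reported = "https://example.com/item/42"  # lowercased, trailing slash dropped

print(loosely_observed(reported, observed_text))   # True
print(verbatim_observed(reported, observed_text))  # False
print(verbatim_observed("https://example.com/Item/42/", observed_text))  # True
```

Under the loose check the altered URL would pass as grounded, whereas the verbatim check rejects it and accepts only the exact string that was seen.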

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread browser_use/agent/system_prompts/system_prompt.md Outdated
@sauravpanda sauravpanda merged commit d2272ab into main Mar 11, 2026
80 checks passed
@sauravpanda sauravpanda deleted the add-better-data-grounding branch March 11, 2026 22:33