[STG-NEW] Add ui-test skill for adversarial UI testing#56

Open
shrey150 wants to merge 7 commits into main from shrey/ui-test-skill

shrey150 commented Mar 26, 2026

Summary

  • Adds ui-test skill for AI-powered adversarial UI testing via the browse CLI
  • Builds on #52 (Add ui-test skill for agentic UI testing): keeps the UX heuristics, browser recipes, codebase analysis, and exploratory testing references; adds local/remote mode selection, diff-driven testing, structured assertions, and adversarial patterns
  • Smoke-tested against a local Next.js app — found real bugs (Escape not closing modals, undersized mobile touch targets)

What's new vs #52

| Feature | #52 | This PR |
| --- | --- | --- |
| Local browser for localhost | No (always Browserbase) | Yes: `browse env local`, no API key needed |
| Cookie-sync for remote auth | Mentioned but not wired up | Full workflow with examples |
| Diff-driven testing | No (full suite only) | `git diff` → targeted tests for what changed |
| Assertion protocol | Freeform | `STEP_PASS\|id\|evidence` / `STEP_FAIL\|id\|expected → actual` |
| Before/after comparison | No | Snapshot before act, snapshot after, compare trees |
| Adversarial patterns | No | XSS, empty submit, rapid click, keyboard-only, focus trap |
| EXAMPLES.md | No | 8 examples with exact commands and expected output |
| Console capture fix | about:blank injection (broken) | On-page injection (working) |
| browse eval await fix | Uses `await` (broken) | Uses `.then()` (working) |
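
To make the assertion protocol row concrete, here is a minimal sketch of how the `STEP_PASS|id|evidence` and `STEP_FAIL|id|expected → actual` markers could be parsed into a summary. The function names `parseStepLine` and `summarize` and the shape of the returned objects are hypothetical; the skill itself may consume these markers differently.

```javascript
// Hypothetical helper: parse one line of agent output.
// Assumed marker formats, taken from the comparison table:
//   STEP_PASS|<id>|<evidence>
//   STEP_FAIL|<id>|<expected> → <actual>
function parseStepLine(line) {
  const parts = line.trim().split("|");
  const [status, id] = parts;
  const known = status === "STEP_PASS" || status === "STEP_FAIL";
  if (!known || parts.length < 3) return null; // not a marker line

  // Evidence may itself contain "|", so rejoin everything after the id.
  const rest = parts.slice(2).join("|");
  if (status === "STEP_PASS") {
    return { status: "pass", id, evidence: rest.trim() };
  }

  // Split "expected → actual" on the first arrow, if there is one.
  const arrow = rest.indexOf("→");
  if (arrow === -1) {
    return { status: "fail", id, expected: rest.trim(), actual: null };
  }
  return {
    status: "fail",
    id,
    expected: rest.slice(0, arrow).trim(),
    actual: rest.slice(arrow + 1).trim(),
  };
}

// Hypothetical helper: count passes and failures in a block of output.
function summarize(output) {
  const results = output.split("\n").map(parseStepLine).filter(Boolean);
  return {
    passed: results.filter((r) => r.status === "pass").length,
    failed: results.filter((r) => r.status === "fail").length,
    results,
  };
}
```

Lines that do not start with one of the two markers are ignored, so free-form commentary around the markers would not affect the counts.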

Files

skills/ui-test/
├── SKILL.md                              # Skill definition (478 lines)
├── EXAMPLES.md                           # 8 worked examples with assertions
├── LICENSE.txt                           # MIT
├── README.md                             # Overview (from #52)
├── rules/ux-heuristics.md               # 6 evaluation frameworks (from #52)
├── references/
│   ├── browser-recipes.md               # Deterministic check recipes (fixed)
│   ├── codebase-analysis.md             # 8-step suite generation (from #52)
│   └── exploratory-testing.md           # Agent-driven QA guide (from #52)
└── examples/
    └── browserbase-dashboard-suite.yml   # Example suite (from #52)

Test plan

  • Smoke tested component rendering (before/after snapshot comparison)
  • Smoke tested form validation (happy path + adversarial: empty, XSS, long input, keyboard-only)
  • Smoke tested modal lifecycle (open, cancel, escape, confirm, focus trap)
  • Smoke tested axe-core accessibility audit (deterministic violation count)
  • Smoke tested responsive screenshots + deterministic overflow/touch-target checks
  • Smoke tested console error capture (on-page injection pattern)
  • Smoke tested remote Browserbase mode with API key
  • Found 2 real bugs in test app confirming adversarial patterns work

🤖 Generated with Claude Code


Note

Low Risk
Low risk: this PR adds new Markdown-based skill documentation and examples without changing runtime application code or existing behaviors.

Overview
Adds a new ui-test skill under skills/ui-test/ that defines a structured, evidence-based UI testing workflow using the browse CLI, including diff-driven, exploratory, and parallel (multi-session) testing modes.

Includes extensive worked examples (EXAMPLES.md), deterministic check recipes (axe-core, console/resource errors, responsive overflow/touch targets), and supporting reference/heuristics docs, plus an MIT LICENSE.txt and top-level README.md for installation and usage.

Written by Cursor Bugbot for commit 4b04ef0. This will update automatically on new commits.

Builds on #52 with three key additions:

1. Local/remote mode selection — localhost uses local browser (no API key),
   deployed sites use Browserbase via cookie-sync for authenticated testing

2. Diff-driven testing — analyze git diff, generate targeted tests for what
   changed, execute with before/after snapshot comparison

3. Structured assertion protocol — STEP_PASS/STEP_FAIL markers with evidence,
   deterministic checks (axe-core, console errors, overflow detection), and
   adversarial testing patterns (XSS, empty submit, rapid click, keyboard-only)

Smoke-tested against a local Next.js app: found real bugs (Escape not closing
modals, undersized mobile touch targets) that confirmed the adversarial patterns
work. Fixed browse eval recipes (no top-level await, console capture on-page not
about:blank).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
shubh24 and others added 4 commits March 26, 2026 12:41
…ssions

Enables concurrent test execution by leveraging browse CLI's --session flag
to spin up independent Browserbase browsers per test group, with fan-out
via Agent tool and merged result reporting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Documents how to add Bash(browse:*) to project or user settings
so users don't get prompted on every browse snapshot/click/eval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t figure it out

- Remove .ui-tests/suite.yml format and generation pipeline
- Replace Workflow B (8-step codebase analysis) with lightweight exploratory testing
- Simplify references/codebase-analysis.md to quick hints (framework detection, route finding)
- Remove example YAML suite file
- Update README to reflect no-artifacts philosophy
- Drop Write tool from allowed-tools (no files to generate)

The codegen/suite approach can ship as v2 later.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- XSS check: replace false-positive inline script count with input value check
- Console capture: preserve original console.error in Example 6 snippets
- Form labels: use native i.labels API in browser-recipes.md (matches SKILL.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

### Form structure

```bash
browse eval "JSON.stringify(Array.from(document.querySelectorAll('form')).map(f => ({ action: f.action, inputs: Array.from(f.querySelectorAll('input,select,textarea')).map(i => ({ name: i.name, type: i.type, required: i.required, hasLabel: !!i.labels?.length })) })))"
```

Inconsistent hasLabel check misses aria-label inputs

Medium Severity

The form structure recipe in SKILL.md computes hasLabel as !!i.labels?.length, which only checks for associated <label> elements. The same recipe in browser-recipes.md correctly uses !!i.labels?.length || !!i.getAttribute('aria-label'), also covering aria-label attributes. Since SKILL.md is the primary instruction file and tells the agent "any false = accessibility FAIL," inputs that use aria-label instead of <label> will produce false accessibility failures.
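
One way to keep the two recipes in agreement is to express the check once. The sketch below is an assumption about what a shared check could look like, not the PR's code: `hasAccessibleLabel` is a hypothetical name, and it treats a whitespace-only `aria-label` as missing, which is slightly stricter than the `browser-recipes.md` expression quoted above.

```javascript
// Hypothetical shared check: an input counts as labelled if it has an
// associated <label> element OR a non-empty aria-label attribute.
function hasAccessibleLabel(input) {
  // input.labels is null for some input types (for example type="hidden").
  const hasLabelElement = !!(input.labels && input.labels.length);
  const ariaLabel = input.getAttribute("aria-label");
  const hasAriaLabel = !!(ariaLabel && ariaLabel.trim());
  return hasLabelElement || hasAriaLabel;
}
```

Inside a `browse eval` payload the same expression would have to be inlined, since the page has no access to helpers defined outside the evaluated string.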


```bash
# Step 2: Wait for script to load, then run audit
# (wait 2-3 seconds for the script to load)
browse eval "axe.run().then(r => JSON.stringify({ violations: r.violations.map(v => ({ id: v.id, impact: v.impact, description: v.description, nodes: v.nodes.length, help: v.helpUrl })), passes: r.passes.length, incomplete: r.incomplete.length }))"
```

axe-core recipe lacks wait between load and run

Low Severity

The axe-core recipes across all three files inject the script via eval and immediately call axe.run() on the next line. There's only a comment ("wait 2-3 seconds") but no actual browse wait timeout command between them. Since browser-recipes.md is framed as "copy-paste recipes," the missing wait can cause a ReferenceError on axe if the script hasn't finished loading. A browse wait timeout 3000 between the two evals would match the pattern used elsewhere (e.g., the responsive screenshot sweep).
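
As an alternative to a fixed 3-second wait, the audit could poll until the `axe` global exists. This is only a sketch under assumptions: `waitForGlobal` is a hypothetical helper, the default timeout and interval are arbitrary, and it relies on `browse eval` resolving a returned promise, which the `.then()` recipes in this PR suggest it does.

```javascript
// Hypothetical helper: resolve once scope[name] is defined, or reject
// after timeoutMs. Polls every intervalMs.
function waitForGlobal(scope, name, timeoutMs = 5000, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    (function poll() {
      if (scope[name] !== undefined) {
        resolve(scope[name]);
        return;
      }
      if (Date.now() >= deadline) {
        reject(new Error(`${name} not defined after ${timeoutMs} ms`));
        return;
      }
      setTimeout(poll, intervalMs);
    })();
  });
}
```

In the page this would be chained as `waitForGlobal(window, "axe").then(axe => axe.run())`, keeping the `.then()` style instead of top-level `await`. The helper would need to be inlined in the evaluated string, so a plain `browse wait timeout 3000` between the two evals remains the simpler fix.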

```bash
browse screenshot /tmp/explore-home.png

# Console health check
browse eval "JSON.stringify({errors: (window.__capturedErrors || []).length})"
```

Console error variable name mismatch with injection recipe

Medium Severity

Example 8's console health check reads from window.__capturedErrors, but every console capture injection recipe across all files (SKILL.md, EXAMPLES.md Example 6, browser-recipes.md, exploratory-testing.md) stores errors in window.__logs. Since window.__capturedErrors is never defined anywhere, the fallback || [] ensures it always reports {errors: 0} — silently hiding any console errors the agent was supposed to detect.
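
The mismatch is easier to avoid if the injection and the health check share one key, and if the check reports "capture never installed" separately from "zero errors". The sketch below illustrates that idea only. `installConsoleCapture`, `consoleHealth`, and the `{ level, message }` entry shape are hypothetical; only the `window.__logs` name comes from the recipes described above.

```javascript
// Single source of truth for the storage key (the name the recipes use).
const LOG_KEY = "__logs";

// Hypothetical injection: record console.error calls on win[LOG_KEY]
// while still calling the original console.error.
function installConsoleCapture(win) {
  if (Array.isArray(win[LOG_KEY])) return; // already installed, do not wrap twice
  win[LOG_KEY] = [];
  const originalError = win.console.error;
  win.console.error = function (...args) {
    win[LOG_KEY].push({ level: "error", message: args.map(String).join(" ") });
    return originalError.apply(win.console, args);
  };
}

// Hypothetical health check: a missing store is reported explicitly
// instead of being silently treated as zero errors.
function consoleHealth(win) {
  const logs = win[LOG_KEY];
  if (!Array.isArray(logs)) return { installed: false, errors: null };
  const errors = logs.filter((entry) => entry.level === "error").length;
  return { installed: true, errors };
}
```

With this shape, a check that runs before the injection returns `installed: false` rather than a misleading `errors: 0`.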

shubh24 and others added 2 commits March 26, 2026 13:32
Also strengthens auto-select rule: localhost → browse env local,
deployed URLs → browse env remote, applied consistently across
all workflows including parallel sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
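
The auto-select rule from the commit message above (localhost → `browse env local`, deployed URLs → `browse env remote`) can be sketched as a small function. `selectBrowseEnv` is a hypothetical name, and treating `127.0.0.1`, `[::1]`, and `*.localhost` as local is an assumption that goes beyond what the commit message states.

```javascript
// Hypothetical helper: decide which argument to pass to `browse env`.
function selectBrowseEnv(targetUrl) {
  const { hostname } = new URL(targetUrl);
  const isLocal =
    hostname === "localhost" ||
    hostname === "127.0.0.1" || // assumption: loopback IPv4 counts as local
    hostname === "[::1]" ||     // assumption: loopback IPv6 counts as local
    hostname.endsWith(".localhost");
  return isLocal ? "local" : "remote";
}
```

Applying the same function to every session's target URL is one way to keep parallel sessions consistent with the rule.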

2 participants