v3.1.0 — Richer clickable naming for DOM grounding

github-actions released this 29 May 21:41
· 26 commits to main since this release
cd4ee22

v3.1.0 — Richer clickable naming

Improves the DOM-grounding clickable map the agent sees, so it can distinguish elements using the text channel instead of relying on the screenshot alone. The gain is largest for anchors, JS-handler buttons, and icon-only controls.

What changed

  • Detection now unions the markup heuristic with CDP's native isClickable, catching elements made clickable via JS event listeners (not just <a>/<button>/onclick).
  • Naming falls back through a chain so elements stop arriving as bare [N] tag:
    aria-label/alt/title/name/placeholder → descendant text (e.g. <a><span>Home</span></a> → "Home") → onclick handler name (viewBankAccount('0') → "view bank account", verb only) → FontAwesome icon class (fa-eye → "view", fa-plus-circle → "add") → role.
  • role overrides the displayed tag, so <i role="button"> is shown as button, not i.
  • Clickable map cap raised 80 → 200, sorted visible-first so on-screen elements survive truncation on dense pages (e.g. long sidebars).
  • snapshot_parser split into snapshot_parser / clickables / clickable_naming to honor the 200-LOC module cap.
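
A minimal sketch of how the naming chain, the role override, and the visible-first cap described above could fit together. This is illustrative only and not the project's actual code: the Element class, the function names, and the icon table are hypothetical, and trying the labelling attributes in the order they are listed is an assumption.

```python
import re
from dataclasses import dataclass, field

# Hypothetical icon table holding only the two mappings named in these notes.
FA_ICON_NAMES = {"fa-eye": "view", "fa-plus-circle": "add"}


@dataclass
class Element:
    """Hypothetical, simplified stand-in for a parsed DOM node."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""          # concatenated descendant text
    visible: bool = True    # whether the element is currently on screen


def split_camel(identifier: str) -> str:
    """Turn a camelCase identifier into lowercase words."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier).lower()


def name_for(el: Element) -> str:
    """Return the first non-empty name found by walking the fallback chain."""
    # 1. Explicit labelling attributes (order is an assumption).
    for attr in ("aria-label", "alt", "title", "name", "placeholder"):
        value = el.attrs.get(attr, "").strip()
        if value:
            return value
    # 2. Descendant text, with whitespace collapsed.
    if el.text.strip():
        return " ".join(el.text.split())
    # 3. Name of the function called by the inline onclick handler.
    match = re.match(r"\s*([A-Za-z_$][\w$]*)\s*\(", el.attrs.get("onclick", ""))
    if match:
        return split_camel(match.group(1))
    # 4. FontAwesome icon class.
    for cls in el.attrs.get("class", "").split():
        if cls in FA_ICON_NAMES:
            return FA_ICON_NAMES[cls]
    # 5. Last resort: the ARIA role, or an empty string if there is none.
    return el.attrs.get("role", "")


def display_tag(el: Element) -> str:
    """An explicit role overrides the tag shown to the agent."""
    return el.attrs.get("role") or el.tag


def cap_visible_first(elements: list, cap: int = 200) -> list:
    """Keep at most `cap` elements, preferring the visible ones.

    sorted() is stable, so document order is preserved within each group.
    """
    return sorted(elements, key=lambda e: not e.visible)[:cap]
```

With these definitions, name_for(Element("button", {"onclick": "viewBankAccount('0')"})) returns "view bank account", name_for(Element("i", {"class": "fa fa-eye"})) returns "view", and display_tag(Element("i", {"role": "button"})) returns "button". Sorting on a single boolean key keeps the truncation step to one line and leaves the relative order of on-screen elements unchanged.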

Verification

  • Unit tests for the naming chain (text/onclick/icon/role precedence, FontAwesome mapping, style-token filtering).
  • Live e2e against headless Chrome confirming anchor text, nested-span text, onclick-derived names, and role → button all resolve on a real page.

🤖 Generated with Claude Code