
v2.5.0


@github-actions github-actions released this 10 May 06:05
· 66 commits to main since this release
2a1187b

Highlights

DomExtractor has a real implementation. Before v2.5.0 the trait existed in ras-dom with zero implementations, so BrowserStateSummary was unreachable from any action and Phase B (numbered clickable index map for prompts) was blocked.

v2.5.0 ships ChromiumoxideDomExtractor using pure CDP via DOMSnapshot.captureSnapshot, the same primitive Puppeteer and Playwright use for fast structural snapshots.

This is Phase C of the agent grounding fix. Phase B (prompt wiring) ships as 2.6.0.

This release also carries the v2.4.1 anthropic ImageUrl fix (#27) to crates.io. That patch was held back by the publish.yml patch-skip gate and folded into this minor release.

What's new

ras-dom::ChromiumoxideDomExtractor

  • New module ras-dom/src/infrastructure/chromiumoxide/ with extractor.rs, snapshot.rs, snapshot_parser.rs, highlight.rs.
  • ChromiumoxideDomExtractor::new(Arc<Mutex<Browser>>, Duration) — wires a browser handle plus request timeout.
  • Implements the DomExtractor trait that has lived without an impl since 2.0.
  • Re-exported as ras_dom::ChromiumoxideDomExtractor.

snapshot() — pure-CDP path

  • One Page.execute(CaptureSnapshotParams) round-trip with includePaintOrder = true, includeDOMRects = true.
  • Parser walks NodeTreeSnapshot parallel arrays (node_name, attributes, backendNodeId) resolving StringIndex references through resp.strings.
  • Layout: node_index → BoundingBox map from LayoutTreeSnapshot.bounds.
  • Clickable detection: tag in {a, button, input, select, textarea, summary, label, details} OR presence of onclick / tabindex / role / aria-pressed / aria-checked.
  • ax_name derived from first non-empty of aria-label, alt, title, name, placeholder.
  • label from value attribute.
  • Tabs via Browser.execute(GetTargetsParams) filtered to type=="page".
  • Inline screenshot via Page.screenshot (PNG, viewport).
  • Whole flow wrapped in tokio::time::timeout(request_timeout).
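
The parsing rules in the list above can be sketched in plain Rust. This is a minimal illustration, not the code in snapshot_parser.rs: NodeArrays, ResolvedNode, and the three functions are hypothetical stand-ins for the chromiumoxide_cdp types, and lowercasing tag and attribute names is an assumption made because DOMSnapshot reports HTML element names in upper case.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the parallel arrays of a DOMSnapshot node tree.
/// Every integer is an index into the shared string table; each node's
/// attributes are one flattened [name, value, name, value, ...] list.
pub struct NodeArrays {
    pub node_name: Vec<usize>,
    pub attributes: Vec<Vec<usize>>,
    pub backend_node_id: Vec<i64>,
}

/// A node after string indices are resolved (hypothetical shape).
#[derive(Debug)]
pub struct ResolvedNode {
    pub backend_node_id: i64,
    pub tag: String,
    pub attrs: HashMap<String, String>,
}

const CLICKABLE_TAGS: [&str; 8] = [
    "a", "button", "input", "select", "textarea", "summary", "label", "details",
];
const CLICKABLE_ATTRS: [&str; 5] =
    ["onclick", "tabindex", "role", "aria-pressed", "aria-checked"];
const AX_NAME_ATTRS: [&str; 5] = ["aria-label", "alt", "title", "name", "placeholder"];

/// Resolve each node's tag and attributes through the string table.
/// Out-of-range indices resolve to an empty string instead of panicking.
pub fn resolve_nodes(nodes: &NodeArrays, strings: &[String]) -> Vec<ResolvedNode> {
    let get = |i: usize| strings.get(i).cloned().unwrap_or_default();
    nodes
        .node_name
        .iter()
        .enumerate()
        .map(|(idx, &name_idx)| {
            let mut attrs = HashMap::new();
            if let Some(flat) = nodes.attributes.get(idx) {
                // Walk the flattened list two entries at a time: (name, value).
                for pair in flat.chunks_exact(2) {
                    attrs.insert(get(pair[0]).to_ascii_lowercase(), get(pair[1]));
                }
            }
            ResolvedNode {
                backend_node_id: nodes.backend_node_id.get(idx).copied().unwrap_or(-1),
                tag: get(name_idx).to_ascii_lowercase(),
                attrs,
            }
        })
        .collect()
}

/// Clickable if the tag is in the fixed set OR any marker attribute is present.
pub fn is_clickable(node: &ResolvedNode) -> bool {
    CLICKABLE_TAGS.contains(&node.tag.as_str())
        || CLICKABLE_ATTRS.iter().any(|a| node.attrs.contains_key(*a))
}

/// First non-empty value among the name-like attributes, in priority order.
pub fn ax_name(node: &ResolvedNode) -> Option<&str> {
    AX_NAME_ATTRS
        .iter()
        .filter_map(|a| node.attrs.get(*a))
        .map(|v| v.trim())
        .find(|v| !v.is_empty())
}
```

Resolving strings once per node keeps the clickable and ax_name checks as simple map lookups; a real parser would also join in the layout bounds by node index.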

highlight() — draw-bbox canvas overlay

  • Page.evaluate installs a fixed-position 100vw/100vh pointer-events: none overlay div at z-index 2^31-1 (highest valid value, sits above app UI without intercepting events).
  • Same selector set as the snapshot parser. Slices to options.max_index (default 200).
  • Per visible element: 2px #ff3366 border box (gated on options.draw_bounding_boxes) and [N] index label above it (gated on options.include_text_labels).
  • After screenshot, a second Page.evaluate unconditionally removes the overlay.
  • Index labels match the index space snapshot() produces — a model that sees [3] in the highlighted screenshot can call click_element(index=3) directly. Phase B wires the prompt plumbing.
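
The slicing and gating rules above can be modelled without a browser. The sketch below plans what the overlay would draw; the type names, the zero-based index, the `true` defaults for the two flags, and the "non-zero size means visible" test are all assumptions for illustration, not the contents of highlight.rs.

```rust
/// Hypothetical options mirroring the three knobs named in the notes.
#[derive(Debug, Clone)]
pub struct HighlightOptions {
    pub max_index: usize,
    pub draw_bounding_boxes: bool,
    pub include_text_labels: bool,
}

impl Default for HighlightOptions {
    fn default() -> Self {
        // max_index = 200 comes from the notes; the two `true` values are assumed.
        Self { max_index: 200, draw_bounding_boxes: true, include_text_labels: true }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundingBox {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// What the overlay would draw for one element.
#[derive(Debug, PartialEq)]
pub struct OverlayItem {
    pub index: usize,
    pub border: Option<BoundingBox>,
    pub label: Option<String>,
}

/// The overlay z-index: 2^31 - 1.
pub const OVERLAY_Z_INDEX: i32 = i32::MAX;

/// Slice to `max_index`, number the elements, then drop invisible ones.
/// Numbering happens before the visibility filter so that a label always
/// equals the element's position in the clickable list.
pub fn plan_overlay(boxes: &[BoundingBox], opts: &HighlightOptions) -> Vec<OverlayItem> {
    boxes
        .iter()
        .take(opts.max_index)
        .enumerate()
        .filter(|(_, b)| b.width > 0.0 && b.height > 0.0)
        .map(|(index, b)| OverlayItem {
            index,
            border: opts.draw_bounding_boxes.then_some(*b),
            label: opts.include_text_labels.then(|| format!("[{index}]")),
        })
        .collect()
}
```

Keeping the numbering independent of visibility is what lets a label such as [3] map straight to click_element(index=3), under the assumption that snapshot() numbers clickables the same way.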

ras-llm-anthropic — ImageUrl native source.type=url (carried from v2.4.1 / #27)

  • AnthropicImageSource refactored from struct to enum; ContentPart::ImageUrl now emits Anthropic's native {"type":"image","source":{"type":"url","url":"..."}} shape.
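
To make the struct-to-enum change concrete, here is a small sketch of an image source with two variants and the JSON each one produces. The variant names, field names, and the hand-rolled serializer are assumptions chosen to keep the example dependency-free; the real crate most likely derives its serialization instead.

```rust
/// Sketch of an image source as an enum (variant and field names are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum AnthropicImageSource {
    /// Inline image bytes, base64-encoded.
    Base64 { media_type: String, data: String },
    /// Image referenced by URL.
    Url { url: String },
}

/// Minimal JSON string escaping, enough for this illustration.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl AnthropicImageSource {
    /// Render the full `image` content block for this source.
    pub fn to_content_block_json(&self) -> String {
        let source = match self {
            Self::Base64 { media_type, data } => format!(
                r#"{{"type":"base64","media_type":"{}","data":"{}"}}"#,
                json_escape(media_type),
                json_escape(data)
            ),
            Self::Url { url } => {
                format!(r#"{{"type":"url","url":"{}"}}"#, json_escape(url))
            }
        };
        format!(r#"{{"type":"image","source":{source}}}"#)
    }
}
```

An enum makes the two shapes mutually exclusive at the type level, so a URL source cannot accidentally carry a media_type or data field.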

Known gap (#31)

ChromiumoxideAdapter does not yet expose a browser_arc() accessor for its Arc<Mutex<Browser>> field. Today the only ways to construct ChromiumoxideDomExtractor are:

  1. Open a second Browser::connect_with_config to the same CDP URL (doubles WebSocket connections, separate target space).
  2. Custom adapter path.

The accessor will land in v2.6.0 alongside Phase B's ToolContext wiring. Tracked at #31.

Architecture decisions

  • Impl lives in ras-dom, not ras-cdp, because ras-dom → ras-cdp is the existing dependency direction. Reversing it would have caused a cycle. ras-dom now depends on chromiumoxide and tokio directly.
  • ChromiumoxideDomExtractor takes a shared Arc<Mutex<Browser>> instead of owning its own connection — caller decides how the handle is shared.
  • No fixture-JSON parser unit tests in this release. The chromiumoxide_cdp types are codegen'd from a .pdl file; constructing valid CaptureSnapshotReturns by hand is mechanical busywork that doesn't catch real bugs (which live at the CDP wire level). Real verification needs a live Chrome.
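
The shared-handle decision can be illustrated with stand-in types. Nothing below is the ras-dom API: Browser is a dummy struct, the trait is a simplified synchronous version of DomExtractor, and std::sync::Mutex is used only so the sketch runs without an async runtime (the real extractor presumably uses an async mutex).

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Stand-in for the real browser handle; NOT chromiumoxide's Browser.
pub struct Browser {
    pub snapshots_taken: u32,
}

/// Simplified, synchronous stand-in for the DomExtractor trait.
pub trait DomExtractor {
    fn snapshot(&self) -> Result<u32, String>;
}

/// Extractor that borrows a shared handle instead of owning a connection.
pub struct SharedHandleExtractor {
    browser: Arc<Mutex<Browser>>,
    request_timeout: Duration,
}

impl SharedHandleExtractor {
    /// Same argument shape as the constructor described in the notes.
    pub fn new(browser: Arc<Mutex<Browser>>, request_timeout: Duration) -> Self {
        Self { browser, request_timeout }
    }

    pub fn request_timeout(&self) -> Duration {
        self.request_timeout
    }
}

impl DomExtractor for SharedHandleExtractor {
    fn snapshot(&self) -> Result<u32, String> {
        // Lock only for the duration of one request; other holders of the
        // same Arc see the updated state afterwards.
        let mut browser = self.browser.lock().map_err(|e| e.to_string())?;
        browser.snapshots_taken += 1;
        Ok(browser.snapshots_taken)
    }
}
```

Because the extractor holds only an Arc clone, the caller can hand the same handle to an adapter and to any number of extractors, and can later store the extractor as Arc<dyn DomExtractor>.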

Deferred to follow-ups

  • #31: ChromiumoxideAdapter::browser_arc() accessor (target: v2.6.0).
  • Full EnhancedDomTreeNode tree: the tree field is None in BrowserStateSummary. Phase B prompt injection only needs clickables.
  • stable_hash: empty string in ClickableElement.stable_hash. Wiring ras_dom::application::stable_hash requires building the tree first.
  • Real AX tree via Accessibility.getFullAXTree: the current ax_name from attributes is a sound MVP but misses computed accessibility names.
  • Paint-order occlusion: paint_orders is requested in the CDP call but not yet used. ras_dom::application::paint_order exists and can be plugged in.
  • Phase B (v2.6.0): wire Arc<dyn DomExtractor> into ToolContext, post-action snapshot in click/navigate/scroll, numbered index map in agent prompt.

Verification

  • cargo test --workspace --no-fail-fast — all 97 test groups pass
  • cargo clippy --workspace --all-targets -- -D clippy::unwrap_used -D clippy::dbg_macro — clean
  • cargo fmt --all -- --check — clean
  • cargo doc --workspace --no-deps — clean

LOC per file (200 cap):

  • extractor.rs 51
  • snapshot.rs 150
  • snapshot_parser.rs 178
  • highlight.rs 115

Compatibility

  • New types are purely additive (ChromiumoxideDomExtractor, new module path).
  • ras-dom now depends on chromiumoxide and tokio directly; both were previously pulled in only transitively through ras-cdp.
  • No breaking changes to public APIs in any existing crate.
  • Workspace MSRV unchanged.

Artifacts

  • Linux x86_64: ras-x86_64-unknown-linux-gnu, ras-daemon-x86_64-unknown-linux-gnu
  • macOS arm64: ras-aarch64-apple-darwin, ras-daemon-aarch64-apple-darwin
  • crates.io: all ras-* workspace crates published at 2.5.0 once publish.yml finishes (v2.4.1 anthropic fix carried)

Pull requests

  • #29 feat(dom): ChromiumoxideDomExtractor via DOMSnapshot.captureSnapshot (v2.5.0)
  • #30 release: v2.5.0 (CDP DomExtractor)

Sub-phase commits

  • feat(dom): scaffold ChromiumoxideDomExtractor (Phase C1) — 2.4.2
  • feat(dom): implement snapshot() via DOMSnapshot.captureSnapshot (Phase C2) — 2.4.3
  • feat(dom): implement highlight() with draw-bbox canvas overlay (Phase C3) — 2.4.4
  • chore: bump to 2.5.0

Full changelog: v2.4.1...v2.5.0