v2.5.0
Highlights
DomExtractor now has a real implementation. Before v2.5.0 the trait existed in ras-dom with zero implementations, so BrowserStateSummary was unreachable from any action and Phase B (a numbered clickable-index map for prompts) was blocked.
v2.5.0 ships ChromiumoxideDomExtractor using pure CDP via DOMSnapshot.captureSnapshot, the same primitive Puppeteer and Playwright use for fast structural snapshots.
This is Phase C of the agent grounding fix. Phase B (prompt wiring) ships as 2.6.0.
This release also carries the v2.4.1 Anthropic ImageUrl fix (#27) to crates.io. That patch was held back by the publish.yml patch-skip gate and folded into this minor release.
What's new
ras-dom::ChromiumoxideDomExtractor
- New module ras-dom/src/infrastructure/chromiumoxide/ with extractor.rs, snapshot.rs, snapshot_parser.rs, highlight.rs.
- ChromiumoxideDomExtractor::new(Arc<Mutex<Browser>>, Duration) wires a browser handle plus a request timeout.
- Implements the DomExtractor trait, which has lived without an impl since 2.0.
- Re-exported as ras_dom::ChromiumoxideDomExtractor.
snapshot() — pure-CDP path
- One Page.execute(CaptureSnapshotParams) round-trip with includePaintOrder = true, includeDOMRects = true.
- The parser walks the NodeTreeSnapshot parallel arrays (node_name, attributes, backendNodeId), resolving StringIndex references through resp.strings.
- Layout: node_index → BoundingBox map from LayoutTreeSnapshot.bounds.
- Clickable detection: tag in {a, button, input, select, textarea, summary, label, details} OR presence of onclick/tabindex/role/aria-pressed/aria-checked.
- ax_name is derived from the first non-empty of aria-label, alt, title, name, placeholder; label comes from the value attribute.
- Tabs via Browser.execute(GetTargetsParams), filtered to type == "page".
- Inline screenshot via Page.screenshot (PNG, viewport).
- The whole flow is wrapped in tokio::time::timeout(request_timeout).
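The parallel-array walk above can be sketched in std-only Rust. This is a simplified illustration, not the shipped snapshot_parser.rs: NodeTree, LayoutTree, Clickable, and parse_clickables are hypothetical stand-ins (the real chromiumoxide_cdp types wrap indices in a StringIndex newtype and use optional fields), and each bounds entry is assumed to be [x, y, width, height].

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for NodeTreeSnapshot: one entry per node in each array,
/// with strings stored as indices into the shared resp.strings table.
struct NodeTree {
    node_name: Vec<usize>,
    /// Per node: flattened [name, value, name, value, ...] string indices.
    attributes: Vec<Vec<usize>>,
    backend_node_id: Vec<i64>,
}

/// Hypothetical stand-in for LayoutTreeSnapshot: entry i belongs to node node_index[i].
struct LayoutTree {
    node_index: Vec<usize>,
    /// Assumed [x, y, width, height] per layout entry.
    bounds: Vec<[f64; 4]>,
}

#[derive(Debug, PartialEq)]
struct Clickable {
    index: usize,
    backend_node_id: i64,
    tag: String,
    ax_name: Option<String>,
    label: Option<String>,
    bbox: Option<[f64; 4]>,
}

const CLICKABLE_TAGS: [&str; 8] = ["a", "button", "input", "select", "textarea", "summary", "label", "details"];
const CLICKABLE_ATTRS: [&str; 5] = ["onclick", "tabindex", "role", "aria-pressed", "aria-checked"];
const AX_NAME_ATTRS: [&str; 5] = ["aria-label", "alt", "title", "name", "placeholder"];

/// Resolve a string index; out-of-range indices resolve to "".
fn resolve(strings: &[String], idx: usize) -> &str {
    strings.get(idx).map_or("", String::as_str)
}

/// Turn one node's flattened name/value index pairs into a lookup map.
fn attr_map<'a>(strings: &'a [String], flat: &[usize]) -> HashMap<&'a str, &'a str> {
    flat.chunks_exact(2)
        .map(|pair| (resolve(strings, pair[0]), resolve(strings, pair[1])))
        .collect()
}

fn parse_clickables(strings: &[String], nodes: &NodeTree, layout: &LayoutTree) -> Vec<Clickable> {
    // node_index -> bounding box, built by zipping the layout tree's parallel arrays.
    let boxes: HashMap<usize, [f64; 4]> = layout
        .node_index
        .iter()
        .copied()
        .zip(layout.bounds.iter().copied())
        .collect();

    let mut out = Vec::new();
    for (i, &name_idx) in nodes.node_name.iter().enumerate() {
        // Element node names come back upper-case (e.g. "BUTTON").
        let tag = resolve(strings, name_idx).to_ascii_lowercase();
        let attrs = nodes
            .attributes
            .get(i)
            .map(|flat| attr_map(strings, flat))
            .unwrap_or_default();
        let clickable = CLICKABLE_TAGS.contains(&tag.as_str())
            || CLICKABLE_ATTRS.iter().any(|name| attrs.contains_key(name));
        if !clickable {
            continue;
        }
        // First non-empty of aria-label, alt, title, name, placeholder.
        let ax_name = AX_NAME_ATTRS
            .iter()
            .filter_map(|name| attrs.get(name))
            .find(|value| !value.is_empty())
            .map(|value| value.to_string());
        let label = attrs.get("value").map(|value| value.to_string());
        // Clickables are numbered in document order; this is the index space labels use.
        let index = out.len();
        out.push(Clickable {
            index,
            backend_node_id: nodes.backend_node_id.get(i).copied().unwrap_or_default(),
            tag,
            ax_name,
            label,
            bbox: boxes.get(&i).copied(),
        });
    }
    out
}
```

Building the bounding-box map once and looking it up per node keeps the walk linear in the number of nodes; nodes without a layout entry simply get bbox = None.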
highlight() — draw-bbox canvas overlay
- Page.evaluate installs a fixed-position 100vw/100vh pointer-events: none overlay div at z-index 2^31 - 1 (2147483647, the highest valid value), so it sits above app UI without intercepting events.
- Same selector set as the snapshot parser. Slices to options.max_index (default 200).
- Per visible element: a 2px #ff3366 border box (gated on options.draw_bounding_boxes) and an [N] index label above it (gated on options.include_text_labels).
- After the screenshot, a second Page.evaluate unconditionally removes the overlay.
- Index labels match the index space snapshot() produces: a model that sees [3] in the highlighted screenshot can call click_element(index=3) directly. Phase B wires the prompt plumbing.
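The overlay itself is JavaScript run through Page.evaluate; the Rust sketch below models only the decisions listed above as a list of draw operations: slice to max_index, skip invisible elements while keeping their index, and gate borders and labels on the two options. Rect, HighlightOptions, DrawOp, plan_overlay, and the 14px label height are hypothetical, and treating a zero-width or zero-height box as invisible is an assumption about what "visible" means here.

```rust
/// Axis-aligned box in CSS pixels (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f64,
    y: f64,
    w: f64,
    h: f64,
}

/// Hypothetical mirror of the options named in these notes.
struct HighlightOptions {
    max_index: usize,
    draw_bounding_boxes: bool,
    include_text_labels: bool,
}

const DEFAULT_MAX_INDEX: usize = 200;
/// 2^31 - 1: the largest 32-bit signed integer.
const OVERLAY_Z_INDEX: i32 = i32::MAX;
const BORDER_COLOR: &str = "#ff3366";
const BORDER_WIDTH_PX: u32 = 2;
/// Hypothetical label height used to place "[N]" just above the box.
const LABEL_HEIGHT: f64 = 14.0;

#[derive(Debug, PartialEq)]
enum DrawOp {
    Border { rect: Rect, color: &'static str, width_px: u32 },
    Label { text: String, x: f64, y: f64 },
}

/// Plan overlay drawing for clickables given in snapshot index order.
fn plan_overlay(boxes: &[Rect], opts: &HighlightOptions) -> Vec<DrawOp> {
    let mut ops = Vec::new();
    // enumerate() before filtering, so a label's N stays equal to the snapshot index.
    for (index, rect) in boxes.iter().enumerate().take(opts.max_index) {
        if rect.w <= 0.0 || rect.h <= 0.0 {
            continue; // treated as not visible: nothing drawn, index still consumed
        }
        if opts.draw_bounding_boxes {
            ops.push(DrawOp::Border { rect: *rect, color: BORDER_COLOR, width_px: BORDER_WIDTH_PX });
        }
        if opts.include_text_labels {
            // Clamp to 0 so labels for elements at the top edge stay on screen.
            ops.push(DrawOp::Label {
                text: format!("[{}]", index),
                x: rect.x,
                y: (rect.y - LABEL_HEIGHT).max(0.0),
            });
        }
    }
    ops
}
```

Numbering before filtering is what keeps a skipped element from shifting every later label, which would break the [3] → click_element(index=3) correspondence.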
ras-llm-anthropic — ImageUrl native source.type=url (carried from v2.4.1 / #27)
AnthropicImageSource was refactored from a struct to an enum; ContentPart::ImageUrl now emits Anthropic's native {"type":"image","source":{"type":"url","url":"..."}} shape.
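A minimal sketch of why an enum fits here, assuming a Base64 variant alongside Url (Anthropic's Messages API accepts both base64 and url image sources): each variant carries only its own fields, and the Url variant serializes to the native shape quoted above. The variant names, the hand-rolled json_str escaper, and image_block_json are illustrative; the real type's serialization may differ.

```rust
/// Hypothetical sketch of the enum; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum AnthropicImageSource {
    Base64 { media_type: String, data: String },
    Url { url: String },
}

/// Minimal JSON string escaper (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl AnthropicImageSource {
    /// Each variant emits only the fields that belong to its source.type.
    fn to_json(&self) -> String {
        match self {
            Self::Base64 { media_type, data } => format!(
                r#"{{"type":"base64","media_type":{},"data":{}}}"#,
                json_str(media_type),
                json_str(data)
            ),
            Self::Url { url } => format!(r#"{{"type":"url","url":{}}}"#, json_str(url)),
        }
    }
}

/// Wrap a source in an image content block.
fn image_block_json(source: &AnthropicImageSource) -> String {
    format!(r#"{{"type":"image","source":{}}}"#, source.to_json())
}
```

With a single struct, a URL image would have to either carry empty media_type/data fields or be downloaded and re-encoded; the enum makes the invalid combinations unrepresentable.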
Known gap (#31)
ChromiumoxideAdapter does not yet expose a browser_arc() accessor for its Arc<Mutex<Browser>> field. Today the only ways to construct ChromiumoxideDomExtractor are:
- Open a second connection with Browser::connect_with_config to the same CDP URL (doubles WebSocket connections, separate target space).
- Custom adapter path.
The accessor will land in v2.6.0 alongside Phase B's ToolContext wiring. Tracked at #31.
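Once #31 lands, the accessor could look roughly like the sketch below: it hands out a clone of the adapter's existing Arc, so the extractor shares the adapter's connection instead of opening a second one. Browser here is a stand-in struct, std::sync::Mutex stands in for whichever mutex the real field uses, and the field names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Stand-in for chromiumoxide::Browser, so the sketch compiles without the crate.
struct Browser {
    cdp_url: String,
}

/// Hypothetical, trimmed-down adapter: only the shared handle matters here.
struct ChromiumoxideAdapter {
    browser: Arc<Mutex<Browser>>,
}

impl ChromiumoxideAdapter {
    /// Possible shape of the #31 accessor: clone the Arc (a refcount bump),
    /// not the Browser, so no second CDP connection is opened.
    fn browser_arc(&self) -> Arc<Mutex<Browser>> {
        Arc::clone(&self.browser)
    }
}

/// Hypothetical fields mirroring ChromiumoxideDomExtractor::new(Arc<Mutex<Browser>>, Duration).
struct ChromiumoxideDomExtractor {
    browser: Arc<Mutex<Browser>>,
    request_timeout: Duration,
}

impl ChromiumoxideDomExtractor {
    fn new(browser: Arc<Mutex<Browser>>, request_timeout: Duration) -> Self {
        Self { browser, request_timeout }
    }
}
```

Arc::ptr_eq on the two handles confirms they point at the same Browser, which is exactly the property the second-connection workaround lacks.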
Architecture decisions
- The impl lives in ras-dom, not ras-cdp, because ras-dom → ras-cdp is the existing dependency direction. Putting it in ras-cdp would have required ras-cdp to depend on ras-dom for the trait, creating a cycle. ras-dom now depends on chromiumoxide and tokio directly.
- ChromiumoxideDomExtractor takes a shared Arc<Mutex<Browser>> instead of owning its own connection, so the caller decides how the handle is shared.
- No fixture-JSON parser unit tests in this release. The chromiumoxide_cdp types are codegen'd from a .pdl file; constructing a valid CaptureSnapshotReturns by hand is mechanical busywork that doesn't catch real bugs (which live at the CDP wire level). Real verification needs a live Chrome.
Deferred to follow-ups
- #31: ChromiumoxideAdapter::browser_arc() accessor (target: v2.6.0).
- Full EnhancedDomTreeNode tree: tree is None in BrowserStateSummary. Phase B prompt injection only needs clickables.
- stable_hash: ClickableElement.stable_hash is an empty string. Wiring ras_dom::application::stable_hash requires building the tree first.
- Real AX tree via Accessibility.getFullAXTree: the current attribute-derived ax_name is a sound MVP but misses computed accessibility names.
- Paint-order occlusion: paint_orders is requested in the CDP call but not yet used. ras_dom::application::paint_order exists and can plug in.
- Phase B (v2.6.0): wire Arc<dyn DomExtractor> into ToolContext, add a post-action snapshot in click/navigate/scroll, and put a numbered index map in the agent prompt.
Verification
- cargo test --workspace --no-fail-fast: all 97 test groups pass
- cargo clippy --workspace --all-targets -- -D clippy::unwrap_used -D clippy::dbg_macro: clean
- cargo fmt --all -- --check: clean
- cargo doc --workspace --no-deps: clean
LOC per file (200 cap):
- extractor.rs: 51
- snapshot.rs: 150
- snapshot_parser.rs: 178
- highlight.rs: 115
Compatibility
- New types are purely additive (ChromiumoxideDomExtractor, new module path).
- ras-dom direct dependencies grew: it now depends on chromiumoxide and tokio directly (previously only transitively via ras-cdp, now explicit).
- No breaking changes to public APIs in any existing crate.
- Workspace MSRV unchanged.
Artifacts
- Linux x86_64: ras-x86_64-unknown-linux-gnu, ras-daemon-x86_64-unknown-linux-gnu
- macOS arm64: ras-aarch64-apple-darwin, ras-daemon-aarch64-apple-darwin
- crates.io: all ras-* workspace crates published at 2.5.0 once publish.yml finishes (v2.4.1 anthropic fix carried)
Pull requests
- #29: feat(dom): ChromiumoxideDomExtractor via DOMSnapshot.captureSnapshot (v2.5.0)
- #30: release: v2.5.0 (CDP DomExtractor)
Sub-phase commits
- 2.4.2: feat(dom): scaffold ChromiumoxideDomExtractor (Phase C1)
- 2.4.3: feat(dom): implement snapshot() via DOMSnapshot.captureSnapshot (Phase C2)
- 2.4.4: feat(dom): implement highlight() with draw-bbox canvas overlay (Phase C3)
- chore: bump to 2.5.0
Full changelog: v2.4.1...v2.5.0