feat(computer-use): TuriX-CUA inspired Interactive-View workflow + accuracy hardening#492
Merged
bobleer merged 1 commit intoGCWing:mainfrom Apr 23, 2026
Merged
Conversation
…curacy hardening Inspired by the TuriX-CUA open-source project, this overhauls BitFun's desktop Computer Use stack so the agent can reliably "see → plan → act → verify" on macOS GUIs. Highlights - Interactive-View pipeline (S1-S4): new `AxNode`-derived `InteractiveElement` / `InteractiveView` types, AX-tree filtering (`interactive_filter.rs`), Set-of-Mark JPEG overlay (`som_overlay.rs`), and `desktop_host.rs` wiring on top of the macOS AX dump. - ControlHub desktop actions (S5): four new `interactive_*` / `app_*` actions with a single `i` index and `before_view_digest` optimistic-locking. - Click reliability: digest is now geometry/role-stable (ignores focus/value jitter); `interactive_click` auto-rebuilds the view on `STALE_INTERACTIVE_VIEW` and falls back from AX-press to image-pixel pointer click; `click_element` accepts `text_contains` / `node_idx` directly. - Card-merging heuristic in `interactive_filter` collapses redundant child widgets inside actionable containers (cells, rows, buttons, links, groups), cutting overlay clutter on real apps. - Prompt update (`claw_mode.md`): mandatory OBSERVE → PLAN → EXPECT → VERIFY loop and Interactive-View-first guidance. - Supporting macOS plumbing: `macos_ax_dump`, `macos_ax_write`, `macos_bg_input`, `macos_list_apps` (background-input event injection, AX press, app enumeration). - Adds `recursion_limit = "256"` for the new generic-heavy modules. Tested with `cargo check -p bitfun-desktop -p bitfun-core` and focused unit tests in `interactive_filter` and `desktop_host`. Made-with: Cursor
bobleer
added a commit
to bobleer/BitFun
that referenced
this pull request
Apr 24, 2026
…curacy hardening (GCWing#492) Inspired by the TuriX-CUA open-source project, this overhauls BitFun's desktop Computer Use stack so the agent can reliably "see → plan → act → verify" on macOS GUIs. Highlights - Interactive-View pipeline (S1-S4): new `AxNode`-derived `InteractiveElement` / `InteractiveView` types, AX-tree filtering (`interactive_filter.rs`), Set-of-Mark JPEG overlay (`som_overlay.rs`), and `desktop_host.rs` wiring on top of the macOS AX dump. - ControlHub desktop actions (S5): four new `interactive_*` / `app_*` actions with a single `i` index and `before_view_digest` optimistic-locking. - Click reliability: digest is now geometry/role-stable (ignores focus/value jitter); `interactive_click` auto-rebuilds the view on `STALE_INTERACTIVE_VIEW` and falls back from AX-press to image-pixel pointer click; `click_element` accepts `text_contains` / `node_idx` directly. - Card-merging heuristic in `interactive_filter` collapses redundant child widgets inside actionable containers (cells, rows, buttons, links, groups), cutting overlay clutter on real apps. - Prompt update (`claw_mode.md`): mandatory OBSERVE → PLAN → EXPECT → VERIFY loop and Interactive-View-first guidance. - Supporting macOS plumbing: `macos_ax_dump`, `macos_ax_write`, `macos_bg_input`, `macos_list_apps` (background-input event injection, AX press, app enumeration). - Adds `recursion_limit = "256"` for the new generic-heavy modules. Tested with `cargo check -p bitfun-desktop -p bitfun-core` and focused unit tests in `interactive_filter` and `desktop_host`.
bobleer
added a commit
that referenced
this pull request
Apr 24, 2026
* feat: improve ControlHub browser session handling (#476) * feat: improve controlhub browser sessions Tighten ControlHub browser session routing and desktop browser guards, improve relay reconnect handling, and persist FlowChat session title updates alongside model config polish. Generated with BitFun Co-Authored-By: BitFun * fix: resolve SessionModule lint error Convert requireSessionWorkspacePath to a function declaration so eslint no-use-before-define passes in SessionModule. Generated with BitFun Co-Authored-By: BitFun * fix: handle Windows cert DER bytes correctly Replace the invalid to_der().ok() call with direct DER byte conversion so bitfun-core compiles on Windows CI. Generated with BitFun Co-Authored-By: BitFun * fix(web-ui): add horizontal padding to CLI auth empty state in AI model settings (#478) Co-authored-by: bowen628 <bowen628@noreply.gitcode.com> * Add browser restart confirm flow (#479) * chore: remove selfcontrol integration (#480) * feat(computer-use): TuriX-CUA inspired Interactive-View workflow + accuracy hardening (#492) Inspired by the TuriX-CUA open-source project, this overhauls BitFun's desktop Computer Use stack so the agent can reliably "see → plan → act → verify" on macOS GUIs. Highlights - Interactive-View pipeline (S1-S4): new `AxNode`-derived `InteractiveElement` / `InteractiveView` types, AX-tree filtering (`interactive_filter.rs`), Set-of-Mark JPEG overlay (`som_overlay.rs`), and `desktop_host.rs` wiring on top of the macOS AX dump. - ControlHub desktop actions (S5): four new `interactive_*` / `app_*` actions with a single `i` index and `before_view_digest` optimistic-locking. - Click reliability: digest is now geometry/role-stable (ignores focus/value jitter); `interactive_click` auto-rebuilds the view on `STALE_INTERACTIVE_VIEW` and falls back from AX-press to image-pixel pointer click; `click_element` accepts `text_contains` / `node_idx` directly. - Card-merging heuristic in `interactive_filter` collapses redundant child widgets inside actionable containers (cells, rows, buttons, links, groups), cutting overlay clutter on real apps. - Prompt update (`claw_mode.md`): mandatory OBSERVE → PLAN → EXPECT → VERIFY loop and Interactive-View-first guidance. - Supporting macOS plumbing: `macos_ax_dump`, `macos_ax_write`, `macos_bg_input`, `macos_list_apps` (background-input event injection, AX press, app enumeration). - Adds `recursion_limit = "256"` for the new generic-heavy modules. Tested with `cargo check -p bitfun-desktop -p bitfun-core` and focused unit tests in `interactive_filter` and `desktop_host`. * feat: align Codex client_version with local CLI; honor proxy in AI config tests (#499) - Resolve codex CLI version via codex --version for User-Agent and backend model discovery - Derive client_version query param from User-Agent in OpenAI common adapter - Use create_transient_ai_client_for_config for test/list-models so proxy and stream options apply * Fix stale Remote SSH restore entries (#501) * fix: adapt agentic_os main sync --------- Co-authored-by: bowen628 <bowen628@noreply.gitcode.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Inspired by the TuriX-CUA open-source project, this PR overhauls BitFun's desktop Computer Use stack on macOS so the agent can reliably see → plan → act → verify on real GUIs (Tauri WebViews, Cocoa apps, mini-apps embedded in BitFun itself).
The previous AX-only / coordinate-clicking flow had two systemic failure modes:
What's in this PR
Interactive-View pipeline (S1-S4)
bitfun-core:AxNode→InteractiveElement→InteractiveView(with stableiindex, role/label, image-pixel + screen geometry, optimistic-lockingdigest).interactive_filter.rs— turns the raw macOS AX dump into a clean, indexed, deduplicated list of actionable elements.som_overlay.rs— Set-of-Mark JPEG overlay that renders numbered bounding boxes on top of a focused-window screenshot (drop-in for the agent's vision).desktop_host.rs— wires the above intobuild_interactive_view, on top of the newmacos_ax_dump.ControlHub desktop actions (S5)
Four new actions exposed via ControlHub, all addressed by a single
iindex plus abefore_view_digest:interactive_click(with click count, modifiers, mouse button)interactive_type_text(focus an element byi, then type)interactive_scrollinteractive_key_chordClick reliability hardening
compute_interactive_view_digestnow hashes only role + bucketed geometry; ignores transientfocus/value/labeljitter, so harmless re-renders no longer invalidate the view.interactive_clickcatchesSTALE_INTERACTIVE_VIEW, rebuilds the view once, retries with the new digest, and tags the result withauto_rebuilt_view_after_staleso the agent knows to re-verify.AXPressnot supported, etc.), we fall back toapp_click { target: ImageXy }at the element's image-pixel center; result is taggedfallback_image_xy.click_elementvalidation:text_containsandnode_idxare now valid lone locators (previously you had to also passtitle_contains/role_substring/identifier_contains).UI consolidation
interactive_filter: when an actionable container (AXCell,AXRow,AXButton,AXLink,AXGroup, …) fully contains smaller actionable children and is ≥1.5× their area, the children are absorbed. This dramatically cuts overlay clutter on real apps (table rows, list items, card grids).Prompt update (
claw_mode.md)interactive_*turn (single biggest accuracy lever in our internal tests).auto_rebuilt_view_after_stale/fallback_image_xyrecovery notes and how to react to them.Supporting macOS plumbing
macos_ax_dump.rs— non-throwing AX tree snapshot with cachedAXUIElementReflookup by node idx.macos_ax_write.rs— safeAXPresswrapper (@try/@catch) returning structuredAxWriteOutcome.macos_bg_input.rs— background CGEvent-based mouse / scroll / keyboard / typing into a target pid (no cursor hijack).macos_list_apps.rs— running app enumeration.recursion_limit = "256"for the new generic-heavy modules.Test plan
cargo check -p bitfun-desktop -p bitfun-core— clean.interactive_filter(incl. newcard_container_absorbs_contained_actionable_children),desktop_hostdigest helpers.interactive_click, typing into search fields viainteractive_type_text, scrolling lists viainteractive_scroll— all succeed without falling back to coordinate guessing.Notes
"not available").computer_use_*tools — the newinteractive_*actions live alongside them.