Context
The cursorless UI automation PR eliminates cursor/focus conflicts for accessibility-enabled elements. This issue tracks the next layer of agent/user coexistence — workspace-level isolation so the agent and user can genuinely work in parallel.
Backlog items
1. Linux AT-SPI support
Add pyatspi (D-Bus accessibility API) as an optional dependency. Implement cursorless click/type for GNOME/GTK apps on Linux. Wayland needs ydotool for input injection.
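One way to keep pyatspi optional is to probe for it at startup and decide separately whether a ydotool fallback is usable. The sketch below is only an illustration of that probe: `LinuxBackend` and `detect_linux_backend` are hypothetical names, not part of Operator-Use, and it assumes the session type can be read from the `XDG_SESSION_TYPE` environment variable.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class LinuxBackend:
    """Result of the capability probe (hypothetical type, for illustration)."""
    atspi_available: bool             # True if the pyatspi module can be imported
    fallback_injector: Optional[str]  # "ydotool" on Wayland when installed, else None


def _module_exists(name: str) -> bool:
    # find_spec returns None for a missing top-level module without importing it.
    return importlib.util.find_spec(name) is not None


def detect_linux_backend(
    env: Optional[Mapping[str, str]] = None,
    has_module: Callable[[str], bool] = _module_exists,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> LinuxBackend:
    """Probe for the optional AT-SPI dependency and a Wayland input injector.

    env, has_module and which are injectable so the probe can be exercised
    without a Linux desktop session.
    """
    env = os.environ if env is None else env
    atspi = has_module("pyatspi")
    # Assumption: the desktop session exports XDG_SESSION_TYPE=wayland.
    on_wayland = env.get("XDG_SESSION_TYPE", "").lower() == "wayland"
    injector = "ydotool" if on_wayland and which("ydotool") else None
    return LinuxBackend(atspi_available=atspi, fallback_injector=injector)
```

Calling `detect_linux_backend()` with no arguments probes the real environment. The cursorless path would be enabled only when `atspi_available` is true, and coordinate fallback on Wayland only when `fallback_injector` is set.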
2. Cooperative input locking
Detect user mouse/keyboard activity via CGEventTap (macOS) or a low-level WH_MOUSE_LL/WH_KEYBOARD_LL hook (Windows). Queue agent actions while the user is actively typing or clicking. Resume on idle. Prevents conflicts when the cursorless path falls back to coordinate events.
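The queue-then-resume behaviour can be sketched independently of the platform hook. In the illustration below, `InputGate` is a hypothetical class and the 0.5 second idle threshold is an assumed value. The CGEventTap or WH_MOUSE_LL/WH_KEYBOARD_LL callback would only need to call `note_user_activity()`. The sketch is single-threaded; a real hook callback usually runs on another thread and would need a lock around the shared state.

```python
import time
from collections import deque
from typing import Callable, Deque


class InputGate:
    """Holds agent actions while the user is active and runs them once idle."""

    def __init__(
        self,
        idle_seconds: float = 0.5,  # assumed threshold, tune per platform
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._last_user_event = float("-inf")  # no user activity seen yet
        self._pending: Deque[Callable[[], None]] = deque()

    def note_user_activity(self) -> None:
        """Called from the OS input hook on every user mouse or key event."""
        self._last_user_event = self._clock()

    def user_is_idle(self) -> bool:
        return self._clock() - self._last_user_event >= self._idle_seconds

    def submit(self, action: Callable[[], None]) -> None:
        """Queue an agent action; it runs immediately only if the user is idle."""
        self._pending.append(action)
        self.drain()

    def drain(self) -> int:
        """Run queued actions in order while the user stays idle.

        Returns the number of actions executed. Idleness is re-checked before
        each action so new user input pauses the rest of the queue.
        """
        ran = 0
        while self._pending and self.user_is_idle():
            self._pending.popleft()()
            ran += 1
        return ran
```

A caller would invoke `drain()` periodically, for example from a timer, so that queued coordinate events are replayed shortly after the user stops typing or clicking.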
3. Agent Space (macOS Spaces)
Subscribe to NSWorkspaceActiveSpaceDidChangeNotification. When the agent opens an app, force it to Space 2 using CGSMoveWindowsToManagedSpace (private API) or AppleScript. User stays on Space 1 and sees zero agent windows.
4. Picture-in-picture agent monitor
A floating PySide6 or Electron window on the user's screen showing a live view of the agent's active window. User can observe without switching Space.
5. Windows virtual display confinement (WindowsPC-MCP integration)
Port the Parsec Virtual Display Driver layer from https://github.com/ShikeChen01/WindowsPC-MCP into Operator-Use as an optional Windows-only plugin. Creates a dedicated virtual monitor for the agent — spatial isolation on top of the API-level isolation already shipped.
Reference
docs/plans/2026-03-31-cursorless-ui-automation-design.md