A macOS MCP clone of Codex @Computer Use, built to make the same tool interfaces and desktop-control capabilities available to other coding agents.
This project reproduces the Codex Computer Use tool surface with a Node MCP server and a native Swift helper so any MCP client that supports local stdio servers can use a similar interface on macOS.
- macOS
- Node.js 20+ recommended
- Xcode Command Line Tools with swift
- a host app with:
  - Accessibility permission
  - Screen Recording permission
For most source installs, the host app is whichever app launches the MCP server, for example:
- Terminal
- iTerm
- Warp
- Codex
- Cursor
From npm:
npm install -g mac-use
From the repo root for development:
npm install
npm run build
npm run check
npm test
Grant permissions to the app that will launch the MCP server.
Examples:
- if you run it from Terminal, enable Terminal
- if you run it from Codex, enable Codex
- if you run it from Cursor, enable Cursor
macOS settings to enable:
- System Settings > Privacy & Security > Accessibility
- System Settings > Privacy & Security > Screen Recording or Screen & System Audio Recording
After changing permissions, fully restart the host app.
Recommended source-development path:
npm run start
Installed package path:
mac-use
Alternative CLI backend:
npm run start:cli
The native-helper backend is the default path. The CLI backend exists mainly for comparison and fallback.
Smoke test:
npm run smoke
Optional app override:
npm run smoke -- TextEdit
Use the packaged launcher:
{
  "mcpServers": {
    "computer-use": {
      "command": "mac-use"
    }
  }
}
If you are running from a local checkout instead of a global install:
{
  "mcpServers": {
    "computer-use": {
      "command": "node",
      "args": ["/absolute/path/to/mac-computer-use/bin/mac-use.js"]
    }
  }
}
The server exposes the same high-level tool set we observed from Codex Computer Use:
- list_apps
- get_app_state
- click
- drag
- type_text
- press_key
- set_value
- scroll
- perform_secondary_action
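As a rough illustration of how a client might invoke one of these tools, the sketch below builds a JSON-RPC request for the MCP tools/call method. The method name and the name/arguments parameters come from the MCP specification; the helper name buildToolCall is hypothetical, and the empty argument object for list_apps is an assumption about this server's schema.

```javascript
// Hypothetical helper: builds an MCP JSON-RPC request for a tool call.
function buildToolCall(id, name, args = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The MCP stdio transport sends one JSON message per line.
// Assumption: list_apps takes no arguments.
const request = buildToolCall(1, "list_apps");
console.log(JSON.stringify(request));
```

Most MCP clients build these messages for you, so this is mainly useful when debugging the server by hand.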
Under the hood:
- the MCP server is a Node stdio process
- the native behavior lives in helper/ComputerUseNativeHelper.swift
- the Swift helper is compiled into ~/Library/Caches/mac-use at runtime, or into MAC_USE_CACHE_DIR when that environment variable is set
- the helper handles accessibility inspection/actions, pointer/keyboard events, screenshots, and the visible second cursor overlay
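The cache-directory rule above can be sketched as a small function. This is not the project's actual code: the name resolveCacheDir is hypothetical, and treating an empty MAC_USE_CACHE_DIR as unset is an assumption.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: pick the directory for the compiled Swift helper.
// MAC_USE_CACHE_DIR wins when set; otherwise use the default location
// under the user's home directory.
function resolveCacheDir(env = process.env, home = os.homedir()) {
  const override = env.MAC_USE_CACHE_DIR;
  // Assumption: an empty or whitespace-only value counts as "not set".
  if (override && override.trim() !== "") {
    return path.resolve(override);
  }
  return path.join(home, "Library", "Caches", "mac-use");
}

console.log(resolveCacheDir());
```

Passing env and home as parameters keeps the function easy to exercise without touching the real environment.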
list_apps returns a user-facing inventory of running apps, plus recent non-running apps when metadata is available.
Structured result includes:
- name
- bundleId
- pid
- running
- frontmost
- visible
- lastUsed
- uses
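To show how a client might consume this result, the sketch below filters a sample inventory down to running apps and puts the frontmost one first. The field names follow the list above; the value types, the sample data, and the helper name runningApps are assumptions for illustration, not real server output.

```javascript
// Illustrative sample only: value types and data are assumptions.
const apps = [
  { name: "TextEdit", bundleId: "com.apple.TextEdit", pid: 412, running: true,
    frontmost: false, visible: true, lastUsed: null, uses: 3 },
  { name: "Terminal", bundleId: "com.apple.Terminal", pid: 388, running: true,
    frontmost: true, visible: true, lastUsed: null, uses: 12 },
  { name: "Calculator", bundleId: "com.apple.calculator", pid: null, running: false,
    frontmost: false, visible: false, lastUsed: null, uses: 1 },
];

// Hypothetical helper: keep running apps, frontmost app first.
function runningApps(list) {
  return list
    .filter((app) => app.running)
    .sort((a, b) => Number(b.frontmost) - Number(a.frontmost));
}

for (const app of runningApps(apps)) {
  console.log(app.name, app.bundleId);
}
```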
get_app_state returns:
- app identity
- window title
- accessibility tree text
- structured elements
- screenshot artifact when available
Structured elements include fields such as:
- index
- id
- role
- title
- description
- value
- focused
- settable
- actions
- bounds
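One way a client could use these fields is to prefer AX-backed tools, which this README describes as background-first, and fall back to a coordinate click otherwise. The sketch below is an assumption about client-side policy, not part of this project: planInteraction is a hypothetical name, settable is assumed to be a boolean, and actions is assumed to be an array of strings.

```javascript
// Hypothetical planner: choose a tool based on an element's fields.
function planInteraction(element) {
  // Settable elements can be changed through direct AX value mutation.
  if (element.settable) return "set_value";
  // Elements that advertise AX actions can be driven without the pointer.
  if (Array.isArray(element.actions) && element.actions.length > 0) {
    return "perform_secondary_action";
  }
  // Otherwise fall back to a coordinate click.
  return "click";
}

// Illustrative sample elements; the roles and values are made up.
const elements = [
  { index: 3, id: "main", role: "AXTextArea", settable: true, actions: [] },
  { index: 9, id: "AllClear", role: "AXButton", settable: false, actions: ["Press"] },
  { index: 12, id: "", role: "AXImage", settable: false, actions: [] },
];

for (const el of elements) {
  console.log(el.index, planInteraction(el));
}
```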
click is currently implemented with coordinate clicks on the native backend.
drag performs a native pointer drag between coordinates.
type_text enters literal text using native event synthesis.
press_key provides native key press support for:
- printable keys
- common special keys
- modifier combinations such as cmd+c, shift+tab
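For illustration, a combination string such as cmd+c can be split into modifiers and a final key as sketched below. This is not the server's parser: parseKeyCombo is a hypothetical name, and the accepted modifier names are an assumption.

```javascript
// Assumption: these are the modifier names a client would use.
const MODIFIERS = new Set(["cmd", "shift", "ctrl", "alt"]);

// Hypothetical parser for combos such as "cmd+c" or "shift+tab".
// The last segment is the key; everything before it must be a modifier.
// A literal "+" key is not handled by this sketch.
function parseKeyCombo(combo) {
  const parts = combo
    .toLowerCase()
    .split("+")
    .map((part) => part.trim())
    .filter(Boolean);
  if (parts.length === 0) {
    throw new Error(`empty key combo: "${combo}"`);
  }
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1);
  for (const modifier of modifiers) {
    if (!MODIFIERS.has(modifier)) {
      throw new Error(`unknown modifier: ${modifier}`);
    }
  }
  return { modifiers, key };
}

console.log(parseKeyCombo("cmd+c"));
console.log(parseKeyCombo("shift+tab"));
```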
set_value performs direct AX value mutation for settable UI elements.
This is one of the strongest background-safe paths.
scroll performs a native scroll at the target app/window center.
perform_secondary_action executes AX actions such as:
- Press
- Raise
- ShowMenu
Accepts either:
- traversal index like 9
- semantic element ID like AllClear
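The two targeting forms can be told apart with a simple rule, sketched below under stated assumptions: an all-digit target is treated as a traversal index, and anything else as a semantic element ID. The function name resolveTarget and the sample elements are hypothetical, and the server may resolve targets differently.

```javascript
// Hypothetical resolver: map a target (index or semantic ID) to an element.
function resolveTarget(elements, target) {
  const text = String(target).trim();
  // Assumption: all-digit targets are traversal indexes.
  if (/^\d+$/.test(text)) {
    const index = Number(text);
    return elements.find((el) => el.index === index) ?? null;
  }
  // Otherwise treat the target as a semantic element ID.
  return elements.find((el) => el.id === text) ?? null;
}

// Illustrative sample elements only.
const sampleElements = [
  { index: 8, id: "Delete" },
  { index: 9, id: "AllClear" },
];

console.log(resolveTarget(sampleElements, 9));
console.log(resolveTarget(sampleElements, "AllClear"));
```

Both calls resolve to the same element in this sample, since index 9 and the ID AllClear refer to one entry.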
What already works:
- app listing
- app/window state snapshots with accessibility tree text
- screenshot artifacts in MCP responses
- Stage Manager thumbnail materialization for screenshots without moving the hardware cursor
- semantic element IDs like main, AllClear, Delete
- background-first AX actions where macOS allows it
- pointer actions with focus restore
- a visible animated second cursor overlay
- cursor observability during state reads, app activation, text input, and pointer actions
Stage Manager note:
- when a target app is represented only as a Stage Manager side thumbnail, the native helper attempts to add it to the current stage through WindowManager Accessibility actions before capturing
- this avoids using the user's real cursor and then restores the previous frontmost app
- the mechanism relies on nonstandard macOS AX actions such as AXAddToStage, so behavior may vary across macOS versions
Current limitations:
- unsigned packaging, so this is not a mainstream one-click install yet
- exact bundled text/localization parity is incomplete
- background semantics are strongest for AX-backed actions; pointer/keyboard actions are still best-effort restore, not guaranteed true background control
Useful commands:
npm run build
npm run check
npm test
npm run smoke
npm run start:native
- restart the MCP host app after pulling new changes
- re-run a real pointer action like click or scroll
- verify you are using the default native-helper backend:
npm run start
Make sure the app launching the MCP server is enabled in:
System Settings > Privacy & Security > Accessibility
Then fully quit and reopen that host app.
Common examples:
- Terminal
- Codex
- Cursor
- Warp
Make sure the host app is enabled in:
System Settings > Privacy & Security > Screen Recording
Then restart the host app.
Try again with an app that is currently open and visible:
npm run smoke -- com.apple.calculator
or:
npm run smoke -- TextEdit
That is expected.
- AX-backed actions like set_value and some perform_secondary_action cases can stay background-first
- pointer and keyboard actions are still best-effort restore, not guaranteed true background control across all apps
Recently completed:
- made the native helper the default backend and kept the CLI backend as an explicit fallback
- improved normal interaction latency with shorter fixed waits and native helper reuse
- moved screenshots to ScreenCaptureKit-only window capture
- improved app listing speed and ordering with Spotlight metadata
- added Stage Manager thumbnail materialization without moving the hardware cursor
Likely next steps:
- expand background control capabilities closer to Codex @Computer Use
- harden Stage Manager behavior across macOS versions
- improve second cursor visualization and motion quality
- polish source-install packaging
- signed helper app packaging
- notarization
- cleaner permission onboarding
- tighter parity for text formatting and localization
