A cross-platform computer automation and control library supporting multiple interfaces:
- MCP Server (stdio + HTTP/SSE modes)
 - HTTP REST API (with OpenAPI spec)
 - CLI (command execution and server management)
 - Programmatic Module (stateless Python functions)
 
Provides tools for mouse/keyboard automation, screenshot capture, window management, virtual desktops, and comprehensive state tracking including accessibility tree support.
- Mouse Control: Click, double-click, triple-click, button down/up, drag operations
- Keyboard Control: Type text, key down/up/press
- Screenshot Capture: Fast cross-platform screenshots using mss; returns images as base64 or PNG
- Window Management: List, switch, move, resize, minimize, maximize, snap windows
- Virtual Desktops: List, switch, and move windows between virtual desktops
- State Tracking: Configurable tracking of mouse position/buttons, keyboard keys, focused app, and accessibility tree
- Accessibility Tree: Full platform-specific implementation for Windows, macOS, and Linux/Ubuntu
 
# Install core dependencies
pip install -e .
# Optional: Install API/HTTP dependencies
pip install -e ".[api]"    # For HTTP REST API server
pip install -e ".[http]"   # For MCP HTTP/SSE mode
pip install -e ".[dev]"    # All optional dependencies
# Platform-specific optional dependencies (for enhanced features)
pip install -e ".[windows]"   # Windows: pywin32 for accessibility tree
pip install -e ".[macos]"      # macOS: pyobjc for native accessibility (AppleScript fallback available)
pip install -e ".[linux]"      # Linux: PyGObject for AT-SPI (requires: sudo apt install python3-gi gir1.2-atspi-2.0)

Stdio is the default mode for MCP clients like Cursor or Claude Desktop.
Configuration (e.g., ~/.cursor/mcp.json):
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\Jacob\\Code\\computer-mcp",
        "run",
        "computer-mcp"
      ]
    }
  }
}
Or using uvx:
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uvx",
      "args": ["computer-mcp"]
    }
  }
}
Note: uvx automatically installs the package if needed and then runs it. Make sure you have uv installed.
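Editing the client configuration by hand risks clobbering other server entries. The following is a minimal sketch of merging the entry shown above into an existing file; `add_mcp_server` is a hypothetical helper, not part of computer-mcp, and it assumes the file (if present) holds valid JSON.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or update one entry under "mcpServers", keeping other entries intact."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example (not executed here):
# add_mcp_server(Path.home() / ".cursor" / "mcp.json",
#                "computer-mcp", "uvx", ["computer-mcp"])
```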
For remote access via HTTP/SSE:
python -m computer_mcp serve mcp --http --host 127.0.0.1 --port 8000
This starts the MCP server with:
- SSE endpoint: http://127.0.0.1:8000/sse
- Tool call endpoint: http://127.0.0.1:8000/mcp

Start the FastAPI server with automatic OpenAPI documentation:
python -m computer_mcp serve api --host 127.0.0.1 --port 8000
Or using the CLI:
computer-mcp serve api --port 8000
Then access:
- API Docs: http://127.0.0.1:8000/docs
 - ReDoc: http://127.0.0.1:8000/redoc
 - OpenAPI JSON: http://127.0.0.1:8000/openapi.json
 
Example API calls:
# Click mouse
curl -X POST http://localhost:8000/mouse/click -H "Content-Type: application/json" -d '{"button": "left"}'
# Type text
curl -X POST http://localhost:8000/keyboard/type -H "Content-Type: application/json" -d '{"text": "Hello World"}'
# Get screenshot as PNG
curl http://localhost:8000/screenshot/image -o screenshot.png
# List windows
curl http://localhost:8000/windows
# Switch to window
curl -X POST http://localhost:8000/windows/switch -H "Content-Type: application/json" -d '{"hwnd": 123456}'

Execute commands directly from the command line:
# Mouse commands
computer-mcp mouse click --button right
computer-mcp mouse double-click
computer-mcp mouse move --x 500 --y 300
# Keyboard commands
computer-mcp keyboard type "Hello World"
computer-mcp keyboard key-press ctrl
# Window commands
computer-mcp window list
computer-mcp window switch --hwnd 123456
computer-mcp window snap-left --hwnd 123456
computer-mcp window close --hwnd 123456
# Screenshot
computer-mcp screenshot --save screenshot.png
# Start servers
computer-mcp serve api --port 8000
computer-mcp serve mcp --http --port 8001
# JSON output
computer-mcp mouse click --json

Import and use stateless functions directly in your code:
from computer_mcp import (
    click, double_click, move_mouse, drag,
    type_text, key_press, key_down, key_up,
    get_screenshot,
    list_windows, switch_to_window, close_window,
    snap_window_left, snap_window_right,
)
# Mouse operations
click("left")
double_click("right")
move_mouse(500, 300)
drag({"x": 100, "y": 200}, {"x": 300, "y": 400})
# Keyboard operations
type_text("Hello World")
key_press("ctrl")
key_down("shift")
key_up("shift")
# Screenshot
screenshot_data = get_screenshot()
print(f"Screenshot: {screenshot_data['width']}x{screenshot_data['height']}")
# Window management
windows = list_windows()
for window in windows.get("windows", []):
    print(f"{window['title']} (hwnd: {window['hwnd']})")
# Switch to a window by title
switch_to_window(title="Notepad")
# Snap window to left half
snap_window_left(hwnd=123456)

- click(button='left'|'middle'|'right') - Click at current cursor position
- double_click(button='left'|'middle'|'right') - Double-click at current cursor position
- triple_click(button='left'|'middle'|'right') - Triple-click at current cursor position
- button_down(button='left'|'middle'|'right') - Press and hold a mouse button
- button_up(button='left'|'middle'|'right') - Release a mouse button
- drag(start={x, y}, end={x, y}, button='left') - Drag from start to end position
- mouse_move(x, y) - Move cursor to specified coordinates
REST API: POST /mouse/click, POST /mouse/drag, POST /mouse/move, etc.
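The REST routes can also be called from Python with only the standard library. This is a sketch, assuming the API server is listening on localhost:8000 as in the curl examples; `build_request` and `call` are hypothetical helper names, and the paths and payloads are the ones shown in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: server started with --port 8000


def build_request(method, path, payload=None):
    """Build (but do not send) a request for the REST API."""
    data = None
    headers = {}
    if payload is not None:
        # JSON bodies mirror the -d '{...}' arguments of the curl examples.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Examples (need a running server, so they are left commented out):
# call("POST", "/mouse/click", {"button": "left"})
# call("POST", "/keyboard/type", {"text": "Hello World"})
# call("GET", "/windows")
```

Separating `build_request` from `call` keeps request construction inspectable without touching the network.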
- type(text) - Type text string
- key_down(key) - Press and hold a key
- key_up(key) - Release a key
- key_press(key) - Press and release a key (convenience)
REST API: POST /keyboard/type, POST /keyboard/key-press, etc.
- screenshot() / get_screenshot() - Capture screenshot (included by default in MCP responses)
REST API:
- GET /screenshot - Returns JSON with base64 data
- GET /screenshot/image - Returns PNG image
- list_windows() - List all visible windows
- switch_to_window(hwnd=<int>|title=<str>) - Switch focus to a window
- move_window(hwnd, x, y, width?, height?) - Move and/or resize window
- resize_window(hwnd, width, height) - Resize window
- minimize_window(hwnd) - Minimize window
- maximize_window(hwnd) - Maximize window
- restore_window(hwnd) - Restore window
- set_window_topmost(hwnd, topmost=true) - Set window always-on-top
- get_window_info(hwnd) - Get detailed window information
- close_window(hwnd) - Close window
- snap_window_left(hwnd) - Snap to left half
- snap_window_right(hwnd) - Snap to right half
- snap_window_top(hwnd) - Snap to top half
- snap_window_bottom(hwnd) - Snap to bottom half
- screenshot_window(hwnd) - Capture screenshot of specific window
REST API:
- GET /windows - List windows
- POST /windows/switch - Switch by handle
- POST /windows/switch-by-title - Switch by title
- GET /windows/{hwnd} - Get window info
- DELETE /windows/{hwnd} - Close window
- POST /windows/{hwnd}/snap-left - Snap left, etc.
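Most window operations take an hwnd, so a common first step is looking one up by title. A small sketch, assuming the listing has the shape used in the module example (a dict with a "windows" list whose entries carry "title" and "hwnd"); `find_windows_by_title` is a hypothetical helper.

```python
def find_windows_by_title(listing, needle):
    """Return window entries whose title contains needle (case-insensitive)."""
    needle = needle.lower()
    return [
        window
        for window in listing.get("windows", [])
        if needle in str(window.get("title", "")).lower()
    ]


# Sample data in the assumed shape; real values come from list_windows().
sample = {
    "windows": [
        {"title": "Untitled - Notepad", "hwnd": 123456},
        {"title": "main.py - computer-mcp", "hwnd": 654321},
    ]
}
matches = find_windows_by_title(sample, "notepad")
print([window["hwnd"] for window in matches])
```

Returning every match rather than the first one leaves it to the caller to decide what to do when several windows share a title.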
- list_virtual_desktops() - List all virtual desktops
- switch_virtual_desktop(desktop_id=<int>|name=<str>) - Switch to virtual desktop
- move_window_to_virtual_desktop(hwnd, desktop_id) - Move window to desktop
REST API:
- GET /virtual-desktops - List desktops
- POST /virtual-desktops/switch - Switch desktop
- POST /windows/{hwnd}/move-to-desktop - Move window
- set_config(...) - Configure observation options:
  - observe_screen (bool, default: true): Include screenshots in all responses
  - observe_mouse_position (bool, default: false): Track and include mouse position
  - observe_mouse_button_states (bool, default: false): Track and include mouse button states
  - observe_keyboard_key_states (bool, default: false): Track and include keyboard key states
  - observe_focused_app (bool, default: false): Include focused application information
  - observe_accessibility_tree (bool, default: false): Include accessibility tree
REST API: POST /config - Update configuration
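A misspelled option name is easy to miss, so it can help to validate a configuration before sending it. The sketch below uses the option names and defaults listed above; `build_config` is a hypothetical helper, and whether the server itself rejects unknown keys is not stated here.

```python
# Defaults as documented for set_config.
DEFAULTS = {
    "observe_screen": True,
    "observe_mouse_position": False,
    "observe_mouse_button_states": False,
    "observe_keyboard_key_states": False,
    "observe_focused_app": False,
    "observe_accessibility_tree": False,
}


def build_config(**overrides):
    """Return the full option set with overrides applied, rejecting bad input."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    for name, value in overrides.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")
    return {**DEFAULTS, **overrides}


# Turn off screenshots but track the mouse position.
print(build_config(observe_screen=False, observe_mouse_position=True))
```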
Special keys can be specified as strings:
- "ctrl", "alt", "shift", "cmd" (or "win" on Windows)
- "space", "enter", "tab", "esc", "backspace"
- Arrow keys: "up", "down", "left", "right"
- Function keys: "f1" through "f12"
- Regular characters: "a", "b", etc.

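Key combinations can be composed from key_down, key_press, and key_up. This is a sketch only: `held` and `chord` are hypothetical helpers, and the three key functions are passed in as parameters so the example runs without the library installed. In real use you would pass the functions imported from computer_mcp.

```python
from contextlib import contextmanager


@contextmanager
def held(keys, key_down, key_up):
    """Hold keys for the duration of the block, releasing them in reverse order."""
    pressed = []
    try:
        for key in keys:
            key_down(key)
            pressed.append(key)
        yield
    finally:
        # Release even if the body raised, so no modifier stays stuck.
        for key in reversed(pressed):
            key_up(key)


def chord(combo, key_down, key_up, key_press):
    """Press a combination written as "ctrl+shift+t"."""
    *modifiers, last = combo.lower().split("+")
    with held(modifiers, key_down, key_up):
        key_press(last)


# Dry run with recording stand-ins instead of the real key functions.
log = []
chord(
    "ctrl+shift+t",
    key_down=lambda key: log.append(("down", key)),
    key_up=lambda key: log.append(("up", key)),
    key_press=lambda key: log.append(("press", key)),
)
print(log)
```

The "+" separator means a literal plus key cannot be expressed in this notation.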
Windows:
- Full Support: All mouse/keyboard operations work
- Window Management: Full support via pywin32 (included in [windows] extras)
- Virtual Desktops: Full support via VirtualDesktopAccessor.dll
- Focused App: Requires pywin32 (install with pip install -e ".[windows]")
- Accessibility Tree: Uses Windows UI Automation API (requires pywin32)

macOS:
- Full Support: All mouse/keyboard operations work
- Window Management: Limited support via AppleScript (some operations not yet implemented)
- Virtual Desktops: Limited support (Spaces enumeration/switching via Mission Control API)
- Focused App: Uses AppleScript (no dependencies)
- Accessibility Tree:
  - Native: Uses AXUIElement via pyobjc (install with pip install -e ".[macos]")
  - Fallback: Uses AppleScript (works without dependencies, limited tree depth)
 
 
Linux:
- Full Support: All mouse/keyboard operations work
- Window Management: Full support via xdotool (install: sudo apt install xdotool)
- Virtual Desktops: Full support via wmctrl or xdotool (install: sudo apt install wmctrl)
- Focused App: Uses xdotool (install: sudo apt install xdotool)
- Accessibility Tree:
  - Native: Uses AT-SPI via PyGObject (install: sudo apt install python3-gi gir1.2-atspi-2.0, then pip install -e ".[linux]")
  - Fallback: Basic window info via xdotool
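Since several features depend on optional packages or system tools, a quick probe can show what is available before relying on them. This sketch is not how computer-mcp detects its backends; `probe_optional_backends` is a hypothetical helper, and the import names checked (win32gui for pywin32, objc for pyobjc, gi for PyGObject) are the usual top-level modules of those packages.

```python
import importlib.util
import shutil
import sys


def probe_optional_backends():
    """Report which optional backends look available on this machine."""

    def has_module(name):
        # find_spec returns None when a top-level module cannot be found.
        return importlib.util.find_spec(name) is not None

    if sys.platform.startswith("win"):
        return {"pywin32": has_module("win32gui")}
    if sys.platform == "darwin":
        return {"pyobjc": has_module("objc")}
    # Anything else is treated as Linux, where tools are looked up on PATH.
    return {
        "xdotool": shutil.which("xdotool") is not None,
        "wmctrl": shutil.which("wmctrl") is not None,
        "PyGObject": has_module("gi"),
    }


for backend, available in probe_optional_backends().items():
    print(f"{backend}: {'found' if available else 'missing'}")
```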
 
The codebase is organized into clear layers:
computer_mcp/
├── __init__.py          # Module API (stateless functions)
├── __main__.py          # CLI entry point
├── cli.py               # CLI implementation
├── mcp.py               # MCP server (stdio + HTTP/SSE)
├── api.py               # HTTP REST API server
├── actions/             # Business logic (pure functions)
│   ├── mouse.py
│   ├── keyboard.py
│   ├── window.py
│   ├── screenshot.py
│   ├── config.py
│   ├── focused_app.py
│   └── accessibility_tree.py
├── core/                # Core utilities
│   ├── state.py
│   ├── platform.py
│   ├── screenshot.py
│   ├── response.py
│   └── utils.py
└── resources/           # Platform-specific resources
Key Design Principles:
- Actions layer: Pure business logic functions, no interface dependencies
 - Interface adapters: MCP, API, CLI wrap the actions layer
 - Stateless module API: Clean functions for direct Python usage
 - State management: Optional, configurable per interface
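The layering can be pictured with one pure action and two thin adapters. This is an illustrative sketch, not the project's code: `click_action`, `cli_adapter`, and `api_adapter` are hypothetical names, the action only reports instead of driving the mouse, and the error shape is an assumption.

```python
import argparse
import json


# Actions layer: a plain function that knows nothing about CLI, HTTP, or MCP.
def click_action(button="left"):
    if button not in ("left", "middle", "right"):
        # Assumed error shape; the README only documents the success case.
        return {"success": False, "error": f"unknown button: {button}"}
    # A real action would drive the mouse here.
    return {"success": True, "action": "click", "button": button}


# CLI adapter: parses arguments, calls the action, formats the result.
def cli_adapter(argv):
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("--button", default="left")
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args(argv)
    result = click_action(args.button)
    if args.json:
        return json.dumps(result)
    return "ok" if result["success"] else f"error: {result['error']}"


# API-style adapter: takes a decoded JSON body and returns the result dict.
def api_adapter(body):
    return click_action(body.get("button", "left"))


print(cli_adapter(["--button", "right", "--json"]))
print(api_adapter({"button": "middle"}))
```

Because both adapters call the same function, the result shape stays identical across interfaces, which is what lets the CLI's --json output match the API format.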
 
By default (with observe_screen: true), all tool responses include a screenshot as MCP ImageContent:
Response Structure:
- ImageContent (type: "image"): Contains the screenshot as base64-encoded PNG with mimeType "image/png"
- TextContent (type: "text"): Contains JSON with action results and screenshot metadata:
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  }
}
With full observation enabled, the TextContent includes additional state:
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  },
  "mouse_position": {"x": 500, "y": 300},
  "mouse_button_states": ["Button.left"],
  "keyboard_key_states": ["ctrl"],
  "focused_app": {
    "name": "Code",
    "pid": 12345,
    "title": "main.py - computer-mcp"
  },
  "accessibility_tree": {
    "tree": {
      "name": "Application",
      "control_type": "...",
      "bounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
      "children": [...]
    }
  }
}
The HTTP REST API returns JSON directly:
{
  "success": true,
  "action": "click",
  "button": "left"
}
Screenshots are returned as base64-encoded strings in JSON. For a raw PNG, use the /screenshot/image endpoint.
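A base64 screenshot string can be decoded and sanity-checked without an imaging library. This sketch takes the string directly, because the name of the JSON field that carries it is not given here; `decode_png` is a hypothetical helper that reads the dimensions from the PNG header.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_text):
    """Decode base64 text and return (raw_bytes, width, height)."""
    raw = base64.b64decode(b64_text)
    if raw[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG image")
    # The first chunk is IHDR: 4-byte length, the tag b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers.
    if raw[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", raw[16:24])
    return raw, width, height


# Hand-built PNG header standing in for a real screenshot.
fake_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 1920, 1080)
    + b"\x08\x02\x00\x00\x00"
)
_, width, height = decode_png(base64.b64encode(fake_header).decode("ascii"))
print(f"{width}x{height}")
```

Writing the returned bytes to a file opened in "wb" mode yields the same kind of file as the /screenshot/image download.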
CLI output:
- Default: Human-readable success/error messages
- With --json: JSON output matching the API format
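The --json flag makes the CLI usable from scripts. This is a sketch with assumptions stated up front: it assumes the CLI prints exactly one JSON document to stdout and exits non-zero on failure, neither of which is stated here. `build_argv` and `run_json` are hypothetical helpers.

```python
import json
import subprocess


def build_argv(*parts, **options):
    """Turn ("mouse", "move", x=500) into a computer-mcp argument list."""
    argv = ["computer-mcp", *parts]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switch such as --json
        else:
            argv.extend([flag, str(value)])
    return argv


def run_json(*parts, **options):
    """Run a command with --json and parse its stdout."""
    argv = build_argv(*parts, **options, json=True)
    # check=True raises CalledProcessError on a non-zero exit status.
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(done.stdout)


print(build_argv("mouse", "click", button="right", json=True))
# run_json("mouse", "click", button="right")  # needs computer-mcp on PATH
```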
The Python module returns plain Python dictionaries:
result = click("left")
# result = {"success": True, "action": "click", "button": "left"}

- Screenshots are included by default in MCP tool responses (when observe_screen: true)
- Mouse tools operate at the current cursor position unless you explicitly move the mouse first
- State tracking listeners are automatically started/stopped based on configuration
- Accessibility tree implementations may vary in depth and detail across platforms
- Some platform-specific features require optional dependencies or system packages
- Window management features vary by platform (Windows has full support, macOS/Linux have partial support)
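Because tree depth and detail differ by platform, code that reads the accessibility tree is safer when it tolerates missing fields. A sketch, assuming nodes look like the response example (name, control_type, bounds, children); `walk` and `find_by_control_type` are hypothetical helpers, and the control type values in the sample are invented.

```python
def walk(node, depth=0):
    """Yield (depth, node) for a node and all of its descendants, depth-first."""
    yield depth, node
    # Treat a missing or null "children" entry as an empty list.
    for child in node.get("children") or []:
        yield from walk(child, depth + 1)


def find_by_control_type(tree, control_type):
    """Return every node whose control_type equals the given value."""
    return [node for _, node in walk(tree) if node.get("control_type") == control_type]


# Invented sample in the assumed shape.
sample_tree = {
    "name": "Application",
    "control_type": "Window",
    "children": [
        {"name": "File", "control_type": "MenuItem"},
        {
            "name": "Toolbar",
            "control_type": "ToolBar",
            "children": [{"name": "Save", "control_type": "Button", "children": None}],
        },
    ],
}
for depth, node in walk(sample_tree):
    print("  " * depth + node.get("name", "<unnamed>"))
```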
 
MIT