Hands is a Rust MCP (Model Context Protocol) server that gives AI agents full desktop control through three automation tiers — not just pixel-guessing from screenshots.
See the examples/ directory for sample configurations and walkthroughs.
Part of CPC (Copy Paste Compute) — a multi-agent AI orchestration platform. Related repos: manager · local · workflow · cpc-paths · cpc-breadcrumbs
- Clippy + dead_code + unused cleanup: 3 crate-level allows removed, 60+ targeted allows added with justification, 22 supplemental mechanical fixes in src/meta/*
v1.3.1
- HTTP dashboard endpoint migrated to tiny_http (smaller binary, simpler stack)
- Duration tracking for tool calls in dashboard status
- Credential redaction in dashboard output
- Field name alignment across dashboard JSON responses
- Metadata cleanup and documentation fixes
Previous releases
v1.3.0 (2026-04-16) — Path deps to git tags, Cargo.lock committed, README MIT to Apache-2.0, version sync. First version that builds as standalone public clone.
v1.2.2 — Phase C Fix3: meta-tool dispatch, async Send bound, notify parity.
v1.2.1 — Phase C fixes, meta-tool dispatch improvements.
v1.1.1 — Initial public release with 71 MCP tools across 3 automation tiers.
- Download hands-v1.3.2-x64.exe from the latest release.
- Rename it to hands.exe and place it in %LOCALAPPDATA%\CPC\servers\.
- Add to your claude_desktop_config.json:

      {
        "mcpServers": {
          "hands": {
            "command": "%LOCALAPPDATA%\\CPC\\servers\\hands.exe"
          }
        }
      }

- Restart Claude Desktop.
- Download hands-v1.3.2-aarch64.exe from the latest release.
- Rename it to hands.exe and place it in %LOCALAPPDATA%\CPC\servers\.
- Add to your claude_desktop_config.json:

      {
        "mcpServers": {
          "hands": {
            "command": "%LOCALAPPDATA%\\CPC\\servers\\hands.exe"
          }
        }
      }

- Restart Claude Desktop.
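The install steps above write the server path into claude_desktop_config.json with doubled backslashes. This README does not say whether the client expands %LOCALAPPDATA% itself, so the following is a minimal sketch that resolves the directory when the entry is generated and escapes the resulting path for JSON. The helper names `json_escape` and `hands_config` are hypothetical and not part of hands.

```rust
/// Escape a string for use inside a JSON string literal.
/// Only backslash and double quote are handled, which covers Windows paths.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            _ => out.push(c),
        }
    }
    out
}

/// Build the `mcpServers` entry for a given LOCALAPPDATA directory.
/// (Hypothetical helper; the directory layout is the one from the install steps.)
fn hands_config(local_app_data: &str) -> String {
    let exe = format!(
        "{}\\CPC\\servers\\hands.exe",
        local_app_data.trim_end_matches('\\')
    );
    format!(
        "{{ \"mcpServers\": {{ \"hands\": {{ \"command\": \"{}\" }} }} }}",
        json_escape(&exe)
    )
}

fn main() {
    // Falls back to a placeholder path when LOCALAPPDATA is not set.
    let base = std::env::var("LOCALAPPDATA")
        .unwrap_or_else(|_| String::from("C:\\Users\\you\\AppData\\Local"));
    println!("{}", hands_config(&base));
}
```

The printed line can be merged into an existing config by hand. The sketch does not read or modify the file itself.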
- Windows 10/11 (x64 or ARM64)
- Claude Desktop or any MCP-compatible client
For full per-machine setup (paths, skills, credentials), see docs/per_machine_setup.md.
Anthropic's Claude Computer Use takes screenshots and clicks pixel coordinates. It works, but it's slow (screenshot after every action), imprecise (guessing where to click), and blind (no DOM, no accessibility tree, no structured data).
Hands takes a different approach: use the right automation layer for each task.
| Layer | What it does | When to use it |
|---|---|---|
| Browser (Playwright) | Full DOM access, JS eval, network interception, form fill, multi-tab | Web apps, scraping, testing |
| UIA (Win UI Automation) | Accessibility tree, named elements, window management, app launch | Native Windows apps |
| Vision (OCR + template match) | Screenshot, OCR, image diff, visual analysis | Anything else, verification |
| Capability | Claude Computer Use | Hands |
|---|---|---|
| Element identification | Screenshot → guess coordinates | CSS selectors, XPath, UIA names, accessibility tree |
| Speed | ~2s per action (screenshot cycle) | Milliseconds (direct API calls) |
| Browser JS execution | No | Full eval, inject, extract |
| Network interception | No | Route, mock, log requests |
| Form handling | Type into coordinates | fill_form, submit_form, get_forms |
| Multi-tab/context | No | Full tab and context management |
| Native app control | Screenshot + click | Full UIA: find, click, type, read values, window snap/resize |
| OCR | Relies on vision model | Dedicated OCR engine |
| Template matching | No | vision_find_template |
| Image diff | No | vision_diff |
| Batch operations | One action per turn | browser_batch, uia_batch |
| Tracing | No | trace_start/stop/save, metrics |
| Platform | Anthropic-hosted sandbox or managed OS image | Windows (UIA + browser + vision) on your own machine |
| Setup | Zero (built into Claude) | MCP server binary |
Browser layer: navigate, click, type, screenshot, extract content, fill forms, eval JS, manage tabs/contexts, intercept network, scroll-and-collect, accessibility snapshots, smart browse with auto-retry, batch operations, API discovery from traffic.
UIA layer: find elements by name/type/automation ID, click, type, read values, get state, window management (snap, move, resize), app launch, keyboard shortcuts, batch operations, event watching.
Vision layer: screenshot (full/window/region), OCR, template matching, image diff, visual analysis, screenshot+OCR combo.
Smart orchestration layer: reads page, clicks, navigates, captures, finds, types, fills forms, verifies, scans QR, launches apps, runs scripts, recovers login flows — picks the right tier automatically.
Cross-tier tools: find-and-click (OCR→UIA), read screen text, wait for visual, window screenshot, type into window, drag, element drag, retry click, file upload, status, health check.
# Build
cargo build --release -p hands
# Run as MCP server (stdio transport)
./hands.exe
# Add to Claude Desktop config
{
  "mcpServers": {
    "hands": {
      "command": "C:/path/to/hands.exe",
      "args": []
    }
  }
}

hands runs standalone: one binary, one client, and you have browser + UIA + vision automation. Pair with other CPC servers when an automation task needs orchestration, file I/O, or credential-backed HTTP replay.
- Pair with manager to delegate long-running browser chores to a coding agent and monitor via breadcrumbs.
- Pair with workflow to graduate discovered API calls from browser automation to direct-HTTP replay (browser_learn_api feeds api_store).
- Pair with local for filesystem and shell steps before or after automation runs.
Host clients: Claude Desktop, Claude Code, OpenAI Codex CLI, Gemini CLI, or any MCP-compatible host.
hands exposes 116 tools spanning browser, UIA, and vision. Before the first call, enable the "tools always loaded" option in your Claude client's tool settings. A lazy-loaded client sometimes misses whole layers on initial discovery, which shows up as "tool not found" errors mid-session.
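One way to catch a partial discovery early is to check the tool names the client reports against the three layers. The sketch below assumes tools follow the browser_, uia_, and vision_ prefixes seen in names such as browser_batch, uia_batch, and vision_diff. The function `missing_tiers` is hypothetical and not something hands ships.

```rust
/// Tool-name prefixes for the three layers, inferred from names used in this
/// README (an assumption, not a documented contract).
const TIER_PREFIXES: [&str; 3] = ["browser_", "uia_", "vision_"];

/// Return every prefix for which the client reported no tool at all.
fn missing_tiers(tool_names: &[&str]) -> Vec<&'static str> {
    TIER_PREFIXES
        .iter()
        .copied()
        .filter(|prefix| !tool_names.iter().any(|name| name.starts_with(*prefix)))
        .collect()
}

fn main() {
    // Example: a client that only discovered browser and vision tools.
    let reported = ["browser_batch", "vision_diff", "vision_find_template"];
    let missing = missing_tiers(&reported);
    if missing.is_empty() {
        println!("all three layers are visible");
    } else {
        println!("no tools seen for: {:?}", missing);
    }
}
```

A non-empty result suggests the client has not loaded every layer yet, so it is worth revisiting the tool settings before starting a long session.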
hands.exe (MCP server, stdin/stdout JSON-RPC)
├── browser.rs — Playwright CDP automation
├── uia.rs — Windows UI Automation COM
├── vision.rs — Screenshot + OCR + template match
└── tools.rs — Tool definitions + dispatch
Single binary, no runtime dependencies. Playwright browser binaries are auto-managed.
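Because the server speaks JSON-RPC over stdin/stdout, a client invokes a tool by writing one JSON message per line. The following sketch builds an MCP `tools/call` request for vision_ocr. The argument schema of that tool is not documented here, so the arguments object is left empty as a placeholder, and `tools_call_line` is a hypothetical helper. A real session also needs the MCP `initialize` handshake first, which is omitted.

```rust
/// Build one newline-terminated JSON-RPC 2.0 `tools/call` request.
///
/// `tool` is assumed to be a plain identifier that needs no JSON escaping,
/// and `arguments_json` must already be valid JSON (for example `{}`).
fn tools_call_line(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{}}}}}\n",
        id, tool, arguments_json
    )
}

fn main() {
    // Empty arguments are a placeholder; the real schema comes from `tools/list`.
    let line = tools_call_line(1, "vision_ocr", "{}");
    print!("{}", line);
}
```

In practice an MCP host such as Claude Desktop produces these messages for you. The sketch is only meant to show what crosses the pipe.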
- Browser automation powered by Playwright (Apache-2.0). Chromium/Firefox/WebKit binaries are downloaded on first use via Playwright's own install flow.
- Windows automation layer uses native UIA COM interfaces — no third-party dependency.
- OCR is done via an embedded Rust OCR crate (not Tesseract binaries) — no external install needed.
Is it a web page?
→ Yes → Browser layer (fast, structured, reliable)
→ No → Is it a Windows app?
→ Yes → UIA layer (named elements, accessibility tree)
→ No → Vision layer (screenshot + OCR fallback)
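The decision tree above can be written as a small function. This is an illustrative sketch only: the `Tier` and `Target` types are hypothetical, and the README does not describe how hands decides internally.

```rust
/// The three automation layers described in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Browser,
    Uia,
    Vision,
}

/// What is known about the thing being automated (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Target {
    is_web_page: bool,
    is_windows_app: bool,
}

/// Follow the decision tree: browser first, then UIA, then vision as fallback.
fn choose_tier(target: Target) -> Tier {
    if target.is_web_page {
        Tier::Browser
    } else if target.is_windows_app {
        Tier::Uia
    } else {
        Tier::Vision
    }
}

fn main() {
    let native_app = Target { is_web_page: false, is_windows_app: true };
    println!("{:?}", choose_tier(native_app));
}
```

The order of the checks matters: a web page hosted inside a Windows app still goes to the browser layer, matching the tree's first question.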
git clone https://github.com/AIWander/hands.git
cd hands
cargo build --release

Binary appears at target/release/hands.exe. Requires Rust stable toolchain; nightly is not required.
- Windows 10/11 (x64 or ARM64) — required for UIA (Windows UI Automation) and CDP browser automation
- Rust stable toolchain (build from source only)
- Playwright browser binaries are auto-managed on first use
Hands is Windows-only. The UIA automation layer depends on Windows COM interfaces, and the vision layer uses Windows-specific screen capture APIs.
Automation across three different layers (browser, UIA, vision) means each layer has its own characteristic failures:
- Browser profile locked: a previous Chromium process still holds the profile. browser_launch returns profile_locked; close the stuck Chrome or use a fresh context via browser_context_create.
- UIA element not found: selector name drift after an app update. Call uia_find with a broader query, or snapshot the accessibility tree with browser_a11y_snapshot / UIA equivalents to see current names.
- OCR misreads on tiny or low-contrast text: the vision layer returns its best guess. Use vision_zoom before vision_ocr, or fall back to browser_extract_content if the target is a web page with real text.
- Playwright binary download blocked: the first run triggers a Chromium/Firefox/WebKit download. If your network blocks it, pre-seed the Playwright cache manually (see Playwright docs).
- Popup or OS dialog steals focus mid-sequence: UIA actions target the wrong window. Use uia_focus_window before sensitive sequences, or batch via uia_batch, which rechecks focus between steps.
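For an agent loop that wants to react to these failures, the list above can be kept as a lookup from failure kind to the tools worth trying next. The sketch below is a plain restatement of that list. The `Failure` enum and `recovery_tools` function are hypothetical, and hands does not necessarily report errors in this shape.

```rust
/// Failure modes from the troubleshooting list (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure {
    ProfileLocked,
    UiaElementNotFound,
    OcrMisread,
    PlaywrightDownloadBlocked,
    FocusStolen,
}

/// Tools to try next, in order. An empty slice means a manual fix is needed.
fn recovery_tools(failure: Failure) -> &'static [&'static str] {
    match failure {
        Failure::ProfileLocked => &["browser_context_create"],
        Failure::UiaElementNotFound => &["uia_find", "browser_a11y_snapshot"],
        Failure::OcrMisread => &["vision_zoom", "vision_ocr"],
        // Pre-seeding the Playwright cache is done by hand, not through a tool.
        Failure::PlaywrightDownloadBlocked => &[],
        Failure::FocusStolen => &["uia_focus_window", "uia_batch"],
    }
}

fn main() {
    for tool in recovery_tools(Failure::OcrMisread) {
        println!("try: {}", tool);
    }
}
```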
Issues welcome; PRs considered but this is primarily maintained as part of the CPC stack.
Apache License 2.0 — see LICENSE.
Copyright 2026 Joseph Wander.
Joseph Wander
- GitHub: github.com/AIWander
- Email: josephwander@gmail.com