Skip to content

codingmachineedge/lowlevel-computer-use-mcp

Repository files navigation

lowlevel-computer-use-mcp

A low-level computer-use MCP server for Windows (most tools also work on macOS/Linux). It exposes raw desktop control to any MCP client — Claude Code, Codex, Claude Desktop, etc. — as a set of well-described tools:

  • 🖱️ Mouse — move, click, double/right/middle click, drag, scroll, cursor position
  • ⌨️ Keyboard — type text, press hotkey combinations (Ctrl+C, Alt+Tab, …)
  • 🖥️ Shell commands — run arbitrary system commands and capture output
  • 🪟 Windows — list, move, resize, focus, minimize, maximize, restore, close, show/hide
  • ⚙️ Processes — list running processes; kill by PID or name
  • 📸 Screenshots — all monitors, one monitor, a region, or one window via PrintWindow
  • ✂️ Cropping — crop any saved image to a sub-region
  • 🎥 Screen recording — record a monitor or region to mp4 in the background
  • 🎯 Background / unfocused targeting — drive a specific window via Win32 messages without focusing it
  • 🫥 Headless GUI — run real GUI apps on an off-screen desktop; show them only when a human login is needed
  • 🛡️ Run-as-admin — per-command UAC elevation or whole-server elevation
  • 🚀 Auto-start on boot — register a logon scheduled task (optionally as admin)
  • 🟢 AutoHotkey add-in — run AHK scripts; ControlSend/ControlClick for rock-solid background input
  • 🐧 Cross-platform — the same tools work natively on Linux (X11 via xdotool/wmctrl), with Xvfb virtual displays for headless-with-GUI
  • 🌀 Ephemeral WSL — on a Windows host, spin up a throwaway Linux distro on demand, run commands, tear it down
  • 🧩 GUI installer — one window that installs everything automatically
  • 💸 Cheap Version — a no-MCP command-line fallback that runs any tool directly from CLI args, for when MCP connections keep failing

⚠️ This server performs real, unsandboxed actions on the host machine — clicking, typing, killing processes and running shell/elevated commands with your user's privileges. Only register it in environments where that is acceptable.


Tools

Tool Description
get_screen_size Primary screen resolution
get_cursor_position Current mouse position
mouse_move Move cursor to (x, y)
mouse_click Click; hwnd/window_title → background click (client coords)
mouse_drag Press-drag-release between two points
mouse_scroll Scroll the wheel up/down
type_text Type text; hwnd/window_title → background WM_CHAR
press_keys Press a key / hotkey combo, e.g. ["ctrl","c"]
run_command Run a shell command, capture output/exit code
list_windows List top-level windows (title, handle, geometry, state)
get_active_window Info about the focused window
move_window / resize_window Move / resize a window
window_action focus / minimize / maximize / restore / close
show_window / hide_window Bring a window forward (e.g. for login), then hide it again
list_child_windows Enumerate a window's child controls (class, text, rect, handle)
win_set_control_text Set a control's text via WM_SETTEXT (reliable background text)
win_send_keys Post key presses to a window without focusing it
list_processes / kill_process List processes; kill by PID or name
screenshot Monitor/region/single-window (PrintWindow) capture to PNG
crop_image Crop an existing image to a box
start_screen_recording / stop_screen_recording / recording_status mp4 recording
create_headless_desktop Create an off-screen desktop
launch_on_headless_desktop Launch a GUI app onto it
list_headless_windows List windows on the off-screen desktop
show_headless_desktop / hide_headless_desktop Temporarily make it interactive (login), then hide
close_headless_desktop Release the off-screen desktop handle
ahk_status Whether AutoHotkey is installed
run_ahk Run an inline AutoHotkey script
ahk_control_send AHK ControlSend to a background window/control
is_admin Whether the server is running elevated
run_command_as_admin Run a shell command elevated (UAC prompt)
install_startup / uninstall_startup / startup_status Boot auto-start
linux_status Linux: X11 automation tooling availability
create_virtual_display Linux: start an Xvfb headless display
launch_on_virtual_display Linux: launch a GUI app on the Xvfb display
list_virtual_display_windows Linux: windows on the Xvfb display
screenshot_virtual_display Linux: capture the whole Xvfb display
stop_virtual_display Linux: stop the Xvfb display
wsl_status / wsl_list_distros WSL availability + installed distros
wsl_create_temp Provision a throwaway WSL distro (Alpine by default)
wsl_run Run a command inside a WSL distro
wsl_list_temp / wsl_destroy / wsl_destroy_all_temp Manage throwaway distros

Every tool returns a JSON string {"ok": true, ...} on success or {"ok": false, "error": "..."} on failure.


Quick start (GUI installer — fully automatic)

The easiest path. It installs uv if missing, runs uv sync, and registers the server with both Claude Code and Codex automatically on launch.

Clone the repo:

git clone https://github.com/codingmachineedge/lowlevel-computer-use-mcp.git

Enter it:

cd lowlevel-computer-use-mcp

Launch the installer (it auto-runs the full install):

uv run lowlevel-computer-use-mcp-installer

Then restart Claude Code / Codex so they spawn the server.


Manual install

Install dependencies and start the stdio server:

uv run lowlevel-computer-use-mcp

Or with pip — install in editable mode:

pip install -e .

Then run it:

lowlevel-computer-use-mcp

Captures (screenshots, recordings) are written to ~/lowlevel-computer-use-captures by default. Override with the LOWLEVEL_CU_CAPTURE_DIR environment variable.


Registering with clients

Replace the path below with wherever you cloned this repo.

Claude Code

Register at user scope (one line):

claude mcp add lowlevel-computer-use --scope user -- uv run --directory "C:\path\to\lowlevel-computer-use-mcp" lowlevel-computer-use-mcp

Or add this to ~/.claude.json under mcpServers:

{
  "mcpServers": {
    "lowlevel-computer-use": {
      "command": "uv",
      "args": ["run", "--directory", "C:\\path\\to\\lowlevel-computer-use-mcp", "lowlevel-computer-use-mcp"]
    }
  }
}

Codex (OpenAI Codex CLI)

Add this block to ~/.codex/config.toml:

[mcp_servers.lowlevel-computer-use]
command = "uv"
args = ["run", "--directory", "C:\\path\\to\\lowlevel-computer-use-mcp", "lowlevel-computer-use-mcp"]
startup_timeout_sec = 60

YOLO mode (auto-approve, no permission prompts)

The GUI installer enables this by default. To do it manually for Claude Code, add a wildcard allow rule for this server's tools to ~/.claude/settings.json:

{ "permissions": { "allow": ["mcp__lowlevel-computer-use__*"] } }

For Codex, set the global approval policy in ~/.codex/config.toml:

approval_policy = "never"

⚠️ YOLO means every tool runs without asking — including destructive ones (kill_process, run_command, run_command_as_admin, wsl_destroy). Only enable it if you trust the agents driving this server.


Cheap Version (no-MCP fallback)

If MCP connections keep failing, you don't need MCP at all. The Cheap Version is a command-line fallback that runs the exact same tool functions in-process and prints the JSON result — no client, transport, or server connection involved.

List every available tool:

uv run lowlevel-computer-use-cheap --list

Take a screenshot:

uv run lowlevel-computer-use-cheap screenshot --monitor 1

Double-click at a point:

uv run lowlevel-computer-use-cheap mouse_click --x 960 --y 540 --clicks 2

Press a hotkey (values are parsed as JSON, so lists work):

uv run lowlevel-computer-use-cheap press_keys --keys '["ctrl","s"]'

Run a shell command:

uv run lowlevel-computer-use-cheap run_command --command "ipconfig /all"

Pass a whole argument object as JSON:

uv run lowlevel-computer-use-cheap screenshot --json '{"window_title":"Notepad"}'

The same thing is also available as a subcommand of the main entry point:

uv run lowlevel-computer-use-mcp cheap get_screen_size

Every parameter from the Tool Reference is accepted as --<param> <value> (or via --json). Output is the identical {"ok": true, ...} JSON the MCP tools return.


Agent skill — every feature documented

A companion skill (skills/lowlevel-computer-use/) teaches agents the entire toolset: a top-level SKILL.md plus reference/TOOLS.md (exhaustive per-tool parameters), reference/WORKFLOWS.md (end-to-end recipes) and reference/PLATFORMS.md (Windows/Linux specifics & gotchas). Install it for Claude Code by copying it into your skills directory:

cp -r skills/lowlevel-computer-use ~/.claude/skills/

The MCP server also ships condensed instructions inline, so any MCP client gets a full feature overview on connect.


Background / unfocused window targeting

mouse_click, type_text and screenshot accept hwnd or window_title. When set, input is delivered to that exact window via Win32 messages without focusing or foregrounding it, and screenshot uses PrintWindow so the window is captured even if it's behind others, minimized, or on an off-screen desktop.

Typical flow:

  1. Find the window:
list_windows { "title_filter": "Notepad" }
  1. Find the control to target:
list_child_windows { "window_title": "Notepad" }
  1. Set its text in the background (most reliable for edit controls):
win_set_control_text { "hwnd": 23924320, "text": "typed without focus" }
  1. Or click it in the background (x/y are client coords of the window):
mouse_click { "window_title": "Notepad", "x": 200, "y": 120 }
  1. See the result without bringing it forward:
screenshot { "window_title": "Notepad" }

Caveat: message-based input is ignored by some apps (raw input / DirectInput / physical-key-state checks). For those, use the AutoHotkey ahk_control_send tool, or bring the window forward briefly.


Headless-but-with-GUI mode

Run a real GUI app on an off-screen Win32 desktop so it never touches your visible desktop, then automate and screenshot it via the background tools.

Create the off-screen desktop:

create_headless_desktop { "name": "work" }

Launch an app onto it:

launch_on_headless_desktop { "name": "work", "command": "notepad.exe" }

List its windows (to get handles):

list_headless_windows { "name": "work" }

Capture a window on it (works even though it's off-screen):

screenshot { "hwnd": 2495156 }

Showing it for an interactive login, then hiding again

Some steps (sign-in) need a human. Temporarily switch the live screen to the off-screen desktop:

show_headless_desktop { "name": "work" }

…let the user log in, then switch back to the normal desktop:

hide_headless_desktop { "name": "work" }

For an ordinary hidden window on the normal desktop, use show_window / hide_window instead:

show_window { "window_title": "My App" }
hide_window { "window_title": "My App" }

AutoHotkey add-in

Optional but powerful. AHK's ControlSend/ControlClick drive background windows very reliably, and run_ahk is a full scripting escape hatch.

Install AutoHotkey (one line):

winget install -e --id AutoHotkey.AutoHotkey

Check the server can find it:

ahk_status {}

Send text to a background window by HWND:

ahk_control_send { "text": "hello", "window": "ahk_id 0x1A2B3C" }

Run an arbitrary AHK script (must call ExitApp):

run_ahk { "code": "ControlSendText \"hi\", , \"ahk_exe notepad.exe\"\nExitApp" }

Point the server at a specific AHK exe by setting LOWLEVEL_CU_AHK to its path.


Macros — save repeated sequences as Skills

When you run a multi-step UI sequence the user is likely to repeat, don't leave it as ad-hoc tool calls — capture it as a reusable macro Skill. The server tells agents to do this automatically; see macros/MACRO_SKILL_TEMPLATE.md for the template and rules (resolve handles at run time, prefer background tools, parameterize the variable parts, verify with a screenshot).


Linux (native X11)

The mouse, keyboard, screenshot, process, recording, window management and background/unfocused targeting tools all work natively on Linux. Window control, background input and per-window capture use X11 CLI tools; on Linux hwnd is an X11 window id.

Install the X11 helpers (Debian/Ubuntu):

sudo apt install xdotool wmctrl x11-utils imagemagick xvfb

Check what the server can see:

linux_status {}

Background-type into a window without focusing it (X11):

type_text { "window_title": "Editor", "text": "typed in the background" }

Caveat: X11 background typing uses XSendEvent; most apps accept it, but a few (notably xterm with its default allowSendEvents: false) ignore synthetic events. For those, focus the window first or use a real X session.

Headless-with-GUI on Linux (Xvfb)

Start a virtual display:

create_virtual_display { "display": 99, "width": 1280, "height": 800 }

Launch a GUI app onto it:

launch_on_virtual_display { "display": 99, "command": "xterm -e bash" }

List its windows:

list_virtual_display_windows { "display": 99 }

Drive a window on that display (note the display field routes input there):

type_text { "hwnd": 2097164, "display": 99, "text": "hello from headless" }

Capture the whole virtual display:

screenshot_virtual_display { "display": 99 }

Stop it when done:

stop_virtual_display { "display": 99 }

Ephemeral WSL (Linux on a Windows host)

On Windows, spin up a throwaway Linux distro on demand to run Linux software, then tear it down. By default a tiny Alpine minirootfs is downloaded and imported in seconds — your existing distros are untouched.

Check WSL is available:

wsl_status {}

Provision a throwaway distro (downloads latest Alpine minirootfs):

wsl_create_temp {}

Run a command in it (use the name returned above):

wsl_run { "distro": "llcu-tmp-1782754365-53b8", "command": "apk add --no-cache curl && curl --version" }

Tear it down (irreversible — deletes the distro):

wsl_destroy { "name": "llcu-tmp-1782754365-53b8" }

You can also clone an existing distro instead of downloading:

wsl_create_temp { "clone_from": "Ubuntu-24.04" }

Run-as-admin mode

Per-command elevation — call run_command_as_admin (UAC prompt unless already elevated):

run_command_as_admin { "command": "net session" }

Or run the whole server elevated (intended for HTTP mode):

uv run lowlevel-computer-use-mcp --http --admin

Check elevation:

is_admin {}

Auto-start on boot

Register a logon Scheduled Task (admin + HTTP by default):

uv run lowlevel-computer-use-mcp install-startup

Install without admin privileges:

uv run lowlevel-computer-use-mcp install-startup --no-admin

Use a custom port:

uv run lowlevel-computer-use-mcp install-startup --port 9000

Check the task status:

uv run lowlevel-computer-use-mcp startup-status

Remove it:

uv run lowlevel-computer-use-mcp uninstall-startup

Once the boot service runs in HTTP mode, point a client at it as a remote MCP server:

claude mcp add --transport http lowlevel-computer-use-boot http://127.0.0.1:8765/mcp

The task uses an interactive logon trigger (not SYSTEM) so the desktop-automation tools keep access to your session.


Requirements

  • Python 3.10+
  • uv (recommended; the GUI installer can bootstrap it)
  • Windows: mouse/keyboard via pyautogui; windows via pygetwindow; background input + capture + headless desktop via ctypes/Win32 (winio.py); screenshots via mss; recording via imageio + bundled ffmpeg. Optional: AutoHotkey, WSL.
  • Linux: mouse/keyboard via pyautogui (X11); window mgmt, background input and per-window capture via xdotool/wmctrl/x11-utils/ImageMagick (linuxio.py); headless-with-GUI via Xvfb; screenshots/recording via mss. Install the X11 helpers with your package manager (see the Linux section). X11 (or XWayland) session.

Safety notes

  • run_command, run_command_as_admin, kill_process, run_ahk and window_action(close) are marked destructive.
  • pyautogui's fail-safe is disabled so automation isn't interrupted by the cursor reaching a screen corner; be deliberate with coordinates.
  • The server has no authentication of its own — it trusts the MCP client that spawns it.

License

MIT

About

Low-level computer-use MCP server for Windows: mouse, keyboard, shell commands, window/process management, screenshots, cropping and screen recording.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages