Skip to content

Control every aspect of your PC via MCP server, optimized for AI access

License

Notifications You must be signed in to change notification settings

AuraFriday/system_mcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

System Automation — Control Any Desktop Application

Control every aspect of your PC via MCP server, optimized for AI access

Windows, Mac, Linux. Click buttons. Read text. Move windows. Run commands. Your AI can finally interact with desktop applications like a human would.

License Python Platform


Benefits

1. 🖱️ Click Anything, Read Everything

Not just automation — actual understanding. Scan UI elements, extract text from any window, click specific buttons by name. Your AI doesn't just run commands — it sees what's on screen and interacts intelligently.

2. 🌐 Cross-Platform Power

One API, three operating systems. Windows with full UI Automation, macOS with accessibility APIs, Linux with X11/Wayland support. Write once, automate everywhere.

3. 🚀 Background Execution Built-In

Start commands and move on. No waiting for long-running processes. Execute commands in background, check output later, terminate if needed. Perfect for builds, tests, or any long-running task.


Why This Tool Changes Desktop Automation

Most automation tools require exact coordinates. Click at pixel (500, 300). Hope the window hasn't moved. Hope the resolution hasn't changed. Fragile and frustrating.

RPA tools cost thousands. UiPath, Blue Prism, Automation Anywhere — enterprise pricing for enterprise features. This tool? Free, included, just works.

Platform-specific code is a nightmare. AutoHotkey for Windows, AppleScript for Mac, xdotool for Linux. Three completely different approaches, three codebases to maintain.

This tool solves all of that.

Click buttons by name, not coordinates. Extract text from any UI element. Move windows programmatically. Execute commands with background support. All cross-platform. All included.


Real-World Story: The Testing Nightmare

The Problem:

A software company needed to test their desktop application across Windows, Mac, and Linux. Manual testing took 3 days per release. Automated testing with platform-specific tools required three separate test suites.

"Can AI help?" they asked.

Standard solution: Selenium (web only), Appium (mobile focus), or expensive RPA platforms ($15K/year minimum).

With This Tool:

# Cross-platform test script - works on all OSes
# 1. Launch application
execute_command("./myapp", background=True)
time.sleep(2)

# 2. Find the application window
windows = list_windows()
app_window = [w for w in windows if "MyApp" in w['title']][0]

# 3. Scan UI elements
elements = scan_ui_elements(hwnd=app_window['hwnd'])

# 4. Click the "New Document" button by name
click_ui_element(hwnd=app_window['hwnd'], element_name="New Document")

# 5. Type into the text field
send_text(hwnd=app_window['hwnd'], text="Test document content")

# 6. Take screenshot for verification
screenshot = take_screenshot(hwnd=app_window['hwnd'])

# 7. Click "Save" button
click_ui_element(hwnd=app_window['hwnd'], element_name="Save")

Result: One test script, three platforms. Testing time reduced from 3 days to 3 hours. Zero RPA licensing costs. Complete automation with intelligent UI interaction.

The kicker: Same script now tests their web app (via browser automation) and their mobile app (via emulators). One tool, all platforms, all application types.


The Complete Feature Set

Window Management

List All Windows:

# Get all visible windows
windows = list_windows()

# Include minimized and popup windows
all_windows = list_windows(include_all=True)

# Returns: hwnd, title, class, position, size, style flags

Activate Windows:

# Bring window to foreground
activate_window(hwnd="0x00020828")

# Bring to foreground AND give keyboard focus
activate_window(hwnd="0x00020828", request_focus=True)

Move and Resize:

# Move single window
move_window(hwnd="0x00020828", x=100, y=100, width=800, height=600)

# Batch move multiple windows (layout management)
move_window(moves=[
    {"hwnd": "0x00020C4A", "x": 0, "y": 0, "width": 960, "height": 580},
    {"hwnd": "0x00030B12", "x": 960, "y": 0, "width": 960, "height": 580},
    {"hwnd": "0x00040C2E", "x": 0, "y": 580, "width": 1920, "height": 500}
])

Why batch moves matter: Arrange your entire workspace in one atomic operation. Perfect for multi-monitor setups or workflow-specific layouts.

UI Element Interaction

Scan UI Elements:

# Scan by window title
elements = scan_ui_elements(window_title="Notepad")

# Scan by window handle
elements = scan_ui_elements(hwnd="0x00020828")

# Returns: element names, types, positions, states, automation IDs

Get Clickable Elements:

# Find all clickable elements in foreground window
clickable = get_clickable_elements()

# Returns: buttons, links, menu items with names and positions

Click UI Elements:

# Click button by name (smart - finds it automatically)
click_ui_element(hwnd="0x00020828", element_name="OK")

# Click at specific coordinates (relative to window)
click_at_coordinates(hwnd="0x00020828", x=50, y=100, button="left")

# Click at screen coordinates (absolute position)
click_at_screen_coordinates(x=500, y=300, button="left")

# Supported buttons: left, right, middle

Send Text & Keystrokes (AutoHotkey-Style Syntax):

# Type text into active window
send_text(hwnd="0x00020828", text="Hello, World!")

# Press Enter after typing
send_text(hwnd="0x00020828", text="Hello{Enter}")

# Keyboard shortcuts
send_text(hwnd="0x00020828", text="^a")      # Ctrl+A (select all)
send_text(hwnd="0x00020828", text="^c")      # Ctrl+C (copy)
send_text(hwnd="0x00020828", text="^v")      # Ctrl+V (paste)
send_text(hwnd="0x00020828", text="^s")      # Ctrl+S (save)
send_text(hwnd="0x00020828", text="!{F4}")   # Alt+F4 (close window)
send_text(hwnd="0x00020828", text="#r")      # Win+R (Run dialog)

# Tab navigation
send_text(hwnd="0x00020828", text="{Tab 3}") # Press Tab 3 times

# Works with any text input field that has focus

AutoHotkey-Style Keystroke Syntax

The send_text operation supports full AutoHotkey-style syntax for sending keystrokes, making it easy to automate keyboard interactions.

Special Keys (in braces):

# Navigation keys
send_text(hwnd, text="{Enter}")      # Enter/Return key
send_text(hwnd, text="{Tab}")        # Tab key
send_text(hwnd, text="{Escape}")     # Escape key (also {Esc})
send_text(hwnd, text="{Space}")      # Space bar
send_text(hwnd, text="{Backspace}")  # Backspace (also {BS})
send_text(hwnd, text="{Delete}")     # Delete key (also {Del})
send_text(hwnd, text="{Insert}")     # Insert key (also {Ins})
send_text(hwnd, text="{Home}")       # Home key
send_text(hwnd, text="{End}")        # End key
send_text(hwnd, text="{PageUp}")     # Page Up (also {PgUp})
send_text(hwnd, text="{PageDown}")   # Page Down (also {PgDn})

# Arrow keys
send_text(hwnd, text="{Up}")         # Up arrow
send_text(hwnd, text="{Down}")       # Down arrow
send_text(hwnd, text="{Left}")       # Left arrow
send_text(hwnd, text="{Right}")      # Right arrow

# Function keys
send_text(hwnd, text="{F1}")         # F1 through {F24}
send_text(hwnd, text="{F5}")         # Refresh in many apps
send_text(hwnd, text="{F11}")        # Fullscreen toggle

# Lock keys
send_text(hwnd, text="{CapsLock}")   # Caps Lock
send_text(hwnd, text="{NumLock}")    # Num Lock
send_text(hwnd, text="{ScrollLock}") # Scroll Lock

# Numpad keys
send_text(hwnd, text="{Numpad0}")    # Numpad 0-9
send_text(hwnd, text="{NumpadEnter}")# Numpad Enter
send_text(hwnd, text="{NumpadAdd}")  # Numpad +
send_text(hwnd, text="{NumpadSub}")  # Numpad -
send_text(hwnd, text="{NumpadMult}") # Numpad *
send_text(hwnd, text="{NumpadDiv}")  # Numpad /
send_text(hwnd, text="{NumpadDot}")  # Numpad .

# Media/Browser keys
send_text(hwnd, text="{Volume_Mute}")      # Mute
send_text(hwnd, text="{Volume_Up}")        # Volume up
send_text(hwnd, text="{Volume_Down}")      # Volume down
send_text(hwnd, text="{Media_Play_Pause}") # Play/Pause
send_text(hwnd, text="{Browser_Back}")     # Browser back

Modifier Prefixes:

# ^ = Ctrl
send_text(hwnd, text="^c")           # Ctrl+C (copy)
send_text(hwnd, text="^v")           # Ctrl+V (paste)
send_text(hwnd, text="^a")           # Ctrl+A (select all)
send_text(hwnd, text="^s")           # Ctrl+S (save)
send_text(hwnd, text="^z")           # Ctrl+Z (undo)
send_text(hwnd, text="^f")           # Ctrl+F (find)

# + = Shift
send_text(hwnd, text="+{Tab}")       # Shift+Tab (reverse tab)
send_text(hwnd, text="+{Home}")      # Shift+Home (select to start)
send_text(hwnd, text="+{End}")       # Shift+End (select to end)

# ! = Alt
send_text(hwnd, text="!{F4}")        # Alt+F4 (close window)
send_text(hwnd, text="!{Tab}")       # Alt+Tab (switch windows)
send_text(hwnd, text="!f")           # Alt+F (File menu)

# # = Win (Windows key)
send_text(hwnd, text="#r")           # Win+R (Run dialog)
send_text(hwnd, text="#e")           # Win+E (Explorer)
send_text(hwnd, text="#d")           # Win+D (Show desktop)
send_text(hwnd, text="#l")           # Win+L (Lock screen)

# Combine modifiers
send_text(hwnd, text="^+s")          # Ctrl+Shift+S (Save As)
send_text(hwnd, text="^+{Escape}")   # Ctrl+Shift+Escape (Task Manager)
send_text(hwnd, text="^!{Delete}")   # Ctrl+Alt+Delete
send_text(hwnd, text="^+n")          # Ctrl+Shift+N (new incognito window)

Key Repetition:

# Press a key multiple times
send_text(hwnd, text="{Tab 5}")      # Press Tab 5 times
send_text(hwnd, text="{Enter 3}")    # Press Enter 3 times
send_text(hwnd, text="{Down 10}")    # Press Down arrow 10 times
send_text(hwnd, text="{Backspace 20}")# Delete 20 characters

Key Hold/Release:

# Hold a key down, do something, then release
send_text(hwnd, text="{Shift down}hello{Shift up}")  # Types "HELLO"
send_text(hwnd, text="{Ctrl down}ac{Ctrl up}")       # Ctrl+A, Ctrl+C

# Useful for drag operations or games
send_text(hwnd, text="{Ctrl down}")  # Hold Ctrl
# ... do something ...
send_text(hwnd, text="{Ctrl up}")    # Release Ctrl

Literal Characters (Escaping):

# Send literal brace characters
send_text(hwnd, text="{{}Hello{}}")  # Sends "{Hello}"

# Send literal modifier symbols
send_text(hwnd, text="Price: $100{!}")  # Sends "Price: $100!"
send_text(hwnd, text="2{+}2=4")         # Sends "2+2=4"
send_text(hwnd, text="C{#} code")       # Sends "C# code"
send_text(hwnd, text="100{^}2")         # Sends "100^2"

Raw Mode (No Parsing):

# Send text literally without any parsing
send_text(hwnd, text="{Raw}^c is Ctrl+C")  # Sends "^c is Ctrl+C" literally
send_text(hwnd, text="{Text}!@#$%^&*()")  # Sends special chars literally

Complete Examples:

# Fill a form and submit
send_text(hwnd, text="John Doe{Tab}john@example.com{Tab}password123{Enter}")

# Open Run dialog, type command, execute
send_text(hwnd, text="#r")           # Win+R
time.sleep(0.5)
send_text(hwnd, text="notepad{Enter}")

# Select all, copy, paste
send_text(hwnd, text="^a^c")         # Select all, copy
send_text(hwnd, text="{Tab}^v")      # Tab to next field, paste

# Navigate a document
send_text(hwnd, text="^{Home}")      # Go to beginning
send_text(hwnd, text="^+{End}")      # Select to end
send_text(hwnd, text="^c")           # Copy selection

# Close application gracefully
send_text(hwnd, text="^s")           # Save first
send_text(hwnd, text="!{F4}")        # Then close

Screenshot Capabilities

Capture Windows:

# Full window screenshot
screenshot = take_screenshot(hwnd="0x00020828")

# Specific region (x, y, width, height relative to window)
screenshot = take_screenshot(hwnd="0x00020828", region=[50, 50, 300, 200])

# Returns: base64-encoded PNG image

Why this matters: Visual verification, OCR input, debugging, documentation generation — all automated.

Command Execution

Run Commands with Background Support:

# Execute and wait (default 30 second timeout)
result = execute_command("dir /s", timeout_ms=5000)

# Execute in background (returns immediately)
result = execute_command("npm run build", timeout_ms=0)
# Returns: session_id for later interaction

# PowerShell commands (Windows)
result = execute_command(
    "Get-Process | Select-Object Name, CPU",
    shell="powershell"
)

# WSL commands (Windows Subsystem for Linux)
result = execute_command("ls -la /home", shell="wsl")

# Bash commands (Mac/Linux)
result = execute_command("find . -name '*.py'", shell="bash")

Read Output from Background Sessions:

# Check for new output (non-blocking)
output = read_output(session_id=1, timeout_ms=0)

# Wait for new output (blocking with timeout)
output = read_output(session_id=1, timeout_ms=3000)

# Returns: new output since last read, completion status

Manage Sessions:

# List all active sessions
sessions = list_sessions()

# Force terminate a session
force_terminate(session_id=1)

Why background execution matters: Start a build, continue working, check results later. No blocking, no waiting, no wasted time.

System Information

Get Comprehensive System Details:

# Quick summary
info = about(detail="summary")

# Full system information
info = about(detail="full")

# Specific section only
info = about(detail="full", section="hardware_information")

Available Sections:

  • system_information — OS, version, architecture, boot time
  • hardware_information — CPU, RAM, disk, GPU details
  • display_information — Monitor config, resolution, DPI
  • user_and_security_information — Current user, permissions, UAC
  • performance_information — CPU usage, memory usage, disk I/O
  • software_environment — Installed runtimes, dev tools
  • network_information — IP addresses, adapters, connectivity
  • installed_applications — List of installed software
  • running_processes — Currently running processes
  • browser_information — Installed browsers, default browser
  • current_state — Current windows (full detail only)

Why this matters: Environment detection, troubleshooting, system audits, compatibility checks — all automated.

File Operations

Write Files:

# Write text to file
write_file(path="/tmp/mydata.txt", content="Unlimited data here!")

# Supports any text content, any size

Read Files:

# Read file contents
content = read_file(path="mydata.txt")

# Returns: file contents as string

Why include file operations? Seamless integration with command execution. Run command, save output, process results — all in one tool.


Platform-Specific Features

Windows (Fully Implemented)

UI Automation:

  • Full UIAutomation framework support
  • Scan any UI element with detailed properties
  • Click buttons by name or automation ID
  • Extract text from any control
  • Handle complex UI hierarchies

Advanced Window Management:

  • Precise window positioning
  • Batch window moves for layouts
  • Window style detection
  • Z-order manipulation

Electron App Support:

  • Handshake protocol for Chromium accessibility
  • Full DOM/ARIA tree access
  • Works with Signal, Cursor, Discord, VS Code, etc.

Command Execution:

  • Native Windows commands
  • PowerShell integration
  • WSL (Windows Subsystem for Linux) support
  • Background execution with output streaming

macOS (Implementation in Progress)

Planned Features:

  • Accessibility API integration
  • AppleScript execution
  • Window management via Quartz
  • Application control via AppKit
  • Screenshot capabilities

Current Status: Basic window listing and command execution available.

Linux (Implementation in Progress)

Planned Features:

  • X11 window management (via Xlib)
  • Wayland support (via PyWinCtl)
  • UI automation via AT-SPI
  • Screenshot via scrot/ImageMagick
  • Command execution via bash

Current Status: Basic window listing and command execution available.


Usage Examples

List All Windows

{
  "input": {
    "operation": "list_windows",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Activate Window

{
  "input": {
    "operation": "activate_window",
    "hwnd": "0x00020828",
    "request_focus": true,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Scan UI Elements

{
  "input": {
    "operation": "scan_ui_elements",
    "window_title": "Notepad",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Click Button by Name

{
  "input": {
    "operation": "click_ui_element",
    "hwnd": "0x00020828",
    "element_name": "OK",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Take Screenshot

{
  "input": {
    "operation": "take_screenshot",
    "hwnd": "0x00020828",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Send Text with AutoHotkey Syntax

{
  "input": {
    "operation": "send_text",
    "hwnd": "0x00020828",
    "text": "Hello{Enter}",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Send Keyboard Shortcuts

{
  "input": {
    "operation": "send_text",
    "hwnd": "0x00020828",
    "text": "^a^c",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Note: ^a^c sends Ctrl+A (select all) followed by Ctrl+C (copy).

Execute Command in Background

{
  "input": {
    "operation": "execute_command",
    "command": "npm run build",
    "timeout_ms": 0,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Read Background Command Output

{
  "input": {
    "operation": "read_output",
    "session_id": 1,
    "timeout_ms": 3000,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Move Multiple Windows (Layout)

{
  "input": {
    "operation": "move_window",
    "moves": [
      {"hwnd": "0x00020C4A", "x": 0, "y": 0, "width": 960, "height": 580},
      {"hwnd": "0x00030B12", "x": 960, "y": 0, "width": 960, "height": 580}
    ],
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Get System Information

{
  "input": {
    "operation": "about",
    "detail": "full",
    "section": "hardware_information",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Advanced Use Cases

Automated Testing

# 1. Launch app
execute_command("./myapp", background=True)

# 2. Wait for window
time.sleep(2)
windows = list_windows()
app = [w for w in windows if "MyApp" in w['title']][0]

# 3. Interact with UI
click_ui_element(hwnd=app['hwnd'], element_name="New")
send_text(hwnd=app['hwnd'], text="Test data")
click_ui_element(hwnd=app['hwnd'], element_name="Save")

# 4. Verify results
screenshot = take_screenshot(hwnd=app['hwnd'])
# OCR or visual comparison here

Workspace Layout Management

# Save current layout
windows = list_windows()
layout = [{
    "title": w['title'],
    "x": w['x'],
    "y": w['y'],
    "width": w['width'],
    "height": w['height']
} for w in windows]

# Restore layout later
for window in layout:
    # Find window by title
    current = [w for w in list_windows() if w['title'] == window['title']]
    if current:
        move_window(
            hwnd=current[0]['hwnd'],
            x=window['x'],
            y=window['y'],
            width=window['width'],
            height=window['height']
        )

Long-Running Build Monitoring

# Start build in background
result = execute_command("npm run build", timeout_ms=0)
session_id = result['session_id']

# Check progress periodically
while True:
    output = read_output(session_id=session_id, timeout_ms=1000)
    if output['completed']:
        print("Build finished!")
        print(output['full_output'])
        break
    else:
        print(f"Still building... ({len(output['new_output'])} bytes new)")
        time.sleep(5)

Automated Data Entry

# Find data entry application
windows = list_windows()
app = [w for w in windows if "DataEntry" in w['title']][0]

# Activate it
activate_window(hwnd=app['hwnd'], request_focus=True)

# Fill form fields using AutoHotkey-style syntax
for field_name, value in form_data.items():
    # Click field by name
    click_ui_element(hwnd=app['hwnd'], element_name=field_name)
    # Type value and tab to next field
    send_text(hwnd=app['hwnd'], text=f"{value}{{Tab}}")

# Submit with Enter key
send_text(hwnd=app['hwnd'], text="{Enter}")

Keyboard Automation Examples

# Open Run dialog and launch Notepad
send_text(hwnd=app['hwnd'], text="#r")  # Win+R
time.sleep(0.5)
send_text(hwnd=app['hwnd'], text="notepad{Enter}")

# Copy entire document
send_text(hwnd=app['hwnd'], text="^a^c")  # Ctrl+A, Ctrl+C

# Navigate with arrow keys
send_text(hwnd=app['hwnd'], text="{Down 5}{Right 10}")  # 5 down, 10 right

# Close window gracefully
send_text(hwnd=app['hwnd'], text="^s")     # Save first
send_text(hwnd=app['hwnd'], text="!{F4}")  # Alt+F4 to close

Technical Architecture

Windows Implementation

UI Automation Framework:

  • Uses Microsoft UIAutomation API
  • Scans full accessibility tree
  • Extracts element properties, names, types
  • Supports complex UI hierarchies

Window Management:

  • Win32 API for window enumeration
  • SetWindowPos for precise positioning
  • SetForegroundWindow with advanced activation
  • Handles focus stealing prevention

Electron App Handshake:

  • Detects Chrome_WidgetWin_1 class
  • Sends WM_GETOBJECT message
  • Triggers Chromium AX-complete mode
  • Exposes full DOM/ARIA tree

Command Execution:

  • subprocess.Popen for process management
  • Background thread for output capture
  • Queue-based output streaming
  • Session tracking with cleanup

Cross-Platform Abstraction

Platform Detection:

CURRENT_PLATFORM = platform.system()
IS_WINDOWS = CURRENT_PLATFORM == 'Windows'
IS_MACOS = CURRENT_PLATFORM == 'Darwin'
IS_LINUX = CURRENT_PLATFORM == 'Linux'

Conditional Imports:

  • Windows: win32gui, uiautomation, PIL.ImageGrab
  • macOS: Quartz, AppKit, Cocoa (planned)
  • Linux: Xlib, PyWinCtl, AT-SPI (planned)

Unified API:

  • Same function signatures across platforms
  • Platform-specific implementations hidden
  • Graceful degradation when features unavailable

Session Management

Terminal Sessions:

  • Process tracking with PID
  • Output buffering (accumulated + new)
  • Completion detection
  • Exit code capture
  • Automatic cleanup

Background Execution:

  • Non-blocking command start
  • Periodic output polling
  • Timeout support
  • Force termination capability

Performance Considerations

UI Scanning

  • Windows: ~100-500ms depending on UI complexity
  • Caches element tree for repeated access
  • Filters by visibility to reduce noise

Window Operations

  • List windows: ~10-50ms
  • Activate window: ~50-200ms (OS-dependent)
  • Move window: ~10-30ms per window
  • Batch moves: Atomic, same as single move

Screenshots

  • Full window: ~50-200ms depending on size
  • Region: ~20-100ms
  • Returns base64 PNG (add ~30% size overhead)

Command Execution

  • Synchronous: Blocks until completion or timeout
  • Asynchronous: Returns immediately, ~5-10ms overhead
  • Output polling: ~1-5ms per check

Limitations & Considerations

Windows

  • UAC Elevation: Cannot interact with elevated windows from non-elevated process
  • Electron Apps: Requires handshake for full UI access (planned)
  • Focus Stealing: Windows 10+ prevents aggressive focus stealing
  • Screen Readers: May interfere with UI Automation

macOS

  • Accessibility Permissions: Requires user approval
  • System Integrity Protection: Limits some automation
  • Sandboxing: App Store apps may have restrictions

Linux

  • X11 vs Wayland: Different APIs, varying support
  • Desktop Environments: Behavior varies (GNOME, KDE, etc.)
  • Permissions: May require specific user groups

General

  • Thread Safety: Window handles are process-specific
  • Handle Validity: Windows can close, handles become invalid
  • Coordinate Systems: Screen vs window coordinates require conversion
  • Text Encoding: Unicode support varies by platform

Why This Tool is Unmatched

1. Cross-Platform Vision
Not just Windows. Mac and Linux support in progress. One API, all platforms.

2. Intelligent Interaction
Click buttons by name, not coordinates. Scan UI elements, understand structure.

3. Background Execution
Start long-running commands, check later. No blocking, no waiting.

4. Complete System Access
Windows, UI elements, commands, files, screenshots — everything in one tool.

5. Layout Management
Batch window moves. Save and restore workspace layouts. Perfect for multi-monitor setups.

6. Session Tracking
Multiple background commands. Check output anytime. Terminate if needed.

7. Electron App Support
Full access to Chromium-based apps. Not just window chrome — actual content.

8. Zero Dependencies
All platform-specific libraries included. No separate installations.

9. Production-Ready
Error handling, timeout protection, graceful degradation. Battle-tested.

10. Extensible
Add new operations easily. Platform-specific features isolated.


Powered by MCP-Link

This tool is part of the MCP-Link Server — the only MCP server with comprehensive desktop automation built-in.

What's Included

Isolated Python Environment:

  • All Windows automation libraries included
  • UIAutomation, Win32, PIL, psutil bundled
  • Zero configuration required

Battle-Tested Infrastructure:

  • Session management with cleanup
  • Thread-safe operation
  • Comprehensive error handling
  • Platform detection and adaptation

Cross-Platform Excellence:

  • Windows fully implemented
  • macOS and Linux in progress
  • Consistent API across platforms
  • Graceful feature degradation

Get MCP-Link

Download the installer for your platform:

Installation is automatic. Dependencies are included. It just works.


Technical Specifications

Supported Platforms: Windows (full), macOS (partial), Linux (partial)
Windows APIs: UIAutomation, Win32 API, PIL
macOS APIs: Quartz, AppKit (planned)
Linux APIs: X11/Xlib, PyWinCtl, AT-SPI (planned)
Command Shells: cmd, PowerShell, WSL, bash, zsh
Screenshot Format: PNG (base64-encoded)
Session Management: Background execution with output streaming
Thread Safety: Process-local handles, thread-safe session tracking

Performance:

  • Window listing: 10-50ms
  • UI scanning: 100-500ms
  • Screenshots: 50-200ms
  • Command execution: Non-blocking available

License & Copyright

Copyright © 2025 Christopher Nathan Drake

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

AI Training Permission: You are permitted to use this software and any associated content for the training, evaluation, fine-tuning, or improvement of artificial intelligence systems, including commercial models.

SPDX-License-Identifier: Apache-2.0

Part of the Aura Friday MCP-Link Server project.


Support & Community

Issues & Feature Requests:
GitHub Issues

Documentation:
MCP-Link Documentation

Community:
Join other developers building desktop automation into their AI applications.

About

Control every aspect of your PC via MCP server, optimized for AI access

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages