System Automation — Control Any Desktop Application

Control every aspect of your PC via MCP server, optimized for AI access

Windows, Mac, Linux. Click buttons. Read text. Move windows. Run commands. Your AI can finally interact with desktop applications like a human would.

Benefits

1. 🖱️ Click Anything, Read Everything

Not just automation — actual understanding. Scan UI elements, extract text from any window, click specific buttons by name. Your AI doesn't just run commands — it sees what's on screen and interacts intelligently.

2. 🌐 Cross-Platform Power

One API, three operating systems. Windows with full UI Automation, macOS with accessibility APIs, Linux with X11/Wayland support. Write once, automate everywhere.

3. 🚀 Background Execution Built-In

Start commands and move on. No waiting for long-running processes. Execute commands in background, check output later, terminate if needed. Perfect for builds, tests, or any long-running task.

Why This Tool Changes Desktop Automation

Most automation tools require exact coordinates. Click at pixel (500, 300). Hope the window hasn't moved. Hope the resolution hasn't changed. Fragile and frustrating.

RPA tools cost thousands. UiPath, Blue Prism, Automation Anywhere — enterprise pricing for enterprise features. This tool? Free, included, just works.

Platform-specific code is a nightmare. AutoHotkey for Windows, AppleScript for Mac, xdotool for Linux. Three completely different approaches, three codebases to maintain.

This tool solves all of that.

Click buttons by name, not coordinates. Extract text from any UI element. Move windows programmatically. Execute commands with background support. All cross-platform. All included.

Real-World Story: The Testing Nightmare

The Problem:

A software company needed to test their desktop application across Windows, Mac, and Linux. Manual testing took 3 days per release. Automated testing with platform-specific tools required three separate test suites.

"Can AI help?" they asked.

Standard solution: Selenium (web only), Appium (mobile focus), or expensive RPA platforms ($15K/year minimum).

With This Tool:

# Cross-platform test script - works on all OSes
# 1. Launch application
execute_command("./myapp", background=True)
time.sleep(2)

# 2. Find the application window
windows = list_windows()
app_window = [w for w in windows if "MyApp" in w['title']][0]

# 3. Scan UI elements
elements = scan_ui_elements(hwnd=app_window['hwnd'])

# 4. Click the "New Document" button by name
click_ui_element(hwnd=app_window['hwnd'], element_name="New Document")

# 5. Type into the text field
send_text(hwnd=app_window['hwnd'], text="Test document content")

# 6. Take screenshot for verification
screenshot = take_screenshot(hwnd=app_window['hwnd'])

# 7. Click "Save" button
click_ui_element(hwnd=app_window['hwnd'], element_name="Save")

Result: One test script, three platforms. Testing time reduced from 3 days to 3 hours. Zero RPA licensing costs. Complete automation with intelligent UI interaction.

The kicker: Same script now tests their web app (via browser automation) and their mobile app (via emulators). One tool, all platforms, all application types.

The Complete Feature Set

Window Management

List All Windows:

# Get all visible windows
windows = list_windows()

# Include minimized and popup windows
all_windows = list_windows(include_all=True)

# Returns: hwnd, title, class, position, size, style flags

Activate Windows:

# Bring window to foreground
activate_window(hwnd="0x00020828")

# Bring to foreground AND give keyboard focus
activate_window(hwnd="0x00020828", request_focus=True)

Move and Resize:

# Move single window
move_window(hwnd="0x00020828", x=100, y=100, width=800, height=600)

# Batch move multiple windows (layout management)
move_window(moves=[
    {"hwnd": "0x00020C4A", "x": 0, "y": 0, "width": 960, "height": 580},
    {"hwnd": "0x00030B12", "x": 960, "y": 0, "width": 960, "height": 580},
    {"hwnd": "0x00040C2E", "x": 0, "y": 580, "width": 1920, "height": 500}
])

Why batch moves matter: Arrange your entire workspace in one atomic operation. Perfect for multi-monitor setups or workflow-specific layouts.

UI Element Interaction

Scan UI Elements:

# Scan by window title
elements = scan_ui_elements(window_title="Notepad")

# Scan by window handle
elements = scan_ui_elements(hwnd="0x00020828")

# Returns: element names, types, positions, states, automation IDs

Get Clickable Elements:

# Find all clickable elements in foreground window
clickable = get_clickable_elements()

# Returns: buttons, links, menu items with names and positions

Click UI Elements:

# Click button by name (smart - finds it automatically)
click_ui_element(hwnd="0x00020828", element_name="OK")

# Click at specific coordinates (relative to window)
click_at_coordinates(hwnd="0x00020828", x=50, y=100, button="left")

# Click at screen coordinates (absolute position)
click_at_screen_coordinates(x=500, y=300, button="left")

# Supported buttons: left, right, middle

Send Text & Keystrokes (AutoHotkey-Style Syntax):

# Type text into active window
send_text(hwnd="0x00020828", text="Hello, World!")

# Press Enter after typing
send_text(hwnd="0x00020828", text="Hello{Enter}")

# Keyboard shortcuts
send_text(hwnd="0x00020828", text="^a")      # Ctrl+A (select all)
send_text(hwnd="0x00020828", text="^c")      # Ctrl+C (copy)
send_text(hwnd="0x00020828", text="^v")      # Ctrl+V (paste)
send_text(hwnd="0x00020828", text="^s")      # Ctrl+S (save)
send_text(hwnd="0x00020828", text="!{F4}")   # Alt+F4 (close window)
send_text(hwnd="0x00020828", text="#r")      # Win+R (Run dialog)

# Tab navigation
send_text(hwnd="0x00020828", text="{Tab 3}") # Press Tab 3 times

# Works with any text input field that has focus

AutoHotkey-Style Keystroke Syntax

The send_text operation supports full AutoHotkey-style syntax for sending keystrokes, making it easy to automate keyboard interactions.

Special Keys (in braces):

# Navigation keys
send_text(hwnd, text="{Enter}")      # Enter/Return key
send_text(hwnd, text="{Tab}")        # Tab key
send_text(hwnd, text="{Escape}")     # Escape key (also {Esc})
send_text(hwnd, text="{Space}")      # Space bar
send_text(hwnd, text="{Backspace}")  # Backspace (also {BS})
send_text(hwnd, text="{Delete}")     # Delete key (also {Del})
send_text(hwnd, text="{Insert}")     # Insert key (also {Ins})
send_text(hwnd, text="{Home}")       # Home key
send_text(hwnd, text="{End}")        # End key
send_text(hwnd, text="{PageUp}")     # Page Up (also {PgUp})
send_text(hwnd, text="{PageDown}")   # Page Down (also {PgDn})

# Arrow keys
send_text(hwnd, text="{Up}")         # Up arrow
send_text(hwnd, text="{Down}")       # Down arrow
send_text(hwnd, text="{Left}")       # Left arrow
send_text(hwnd, text="{Right}")      # Right arrow

# Function keys
send_text(hwnd, text="{F1}")         # F1 through {F24}
send_text(hwnd, text="{F5}")         # Refresh in many apps
send_text(hwnd, text="{F11}")        # Fullscreen toggle

# Lock keys
send_text(hwnd, text="{CapsLock}")   # Caps Lock
send_text(hwnd, text="{NumLock}")    # Num Lock
send_text(hwnd, text="{ScrollLock}") # Scroll Lock

# Numpad keys
send_text(hwnd, text="{Numpad0}")    # Numpad 0-9
send_text(hwnd, text="{NumpadEnter}")# Numpad Enter
send_text(hwnd, text="{NumpadAdd}")  # Numpad +
send_text(hwnd, text="{NumpadSub}")  # Numpad -
send_text(hwnd, text="{NumpadMult}") # Numpad *
send_text(hwnd, text="{NumpadDiv}")  # Numpad /
send_text(hwnd, text="{NumpadDot}")  # Numpad .

# Media/Browser keys
send_text(hwnd, text="{Volume_Mute}")      # Mute
send_text(hwnd, text="{Volume_Up}")        # Volume up
send_text(hwnd, text="{Volume_Down}")      # Volume down
send_text(hwnd, text="{Media_Play_Pause}") # Play/Pause
send_text(hwnd, text="{Browser_Back}")     # Browser back

Modifier Prefixes:

# ^ = Ctrl
send_text(hwnd, text="^c")           # Ctrl+C (copy)
send_text(hwnd, text="^v")           # Ctrl+V (paste)
send_text(hwnd, text="^a")           # Ctrl+A (select all)
send_text(hwnd, text="^s")           # Ctrl+S (save)
send_text(hwnd, text="^z")           # Ctrl+Z (undo)
send_text(hwnd, text="^f")           # Ctrl+F (find)

# + = Shift
send_text(hwnd, text="+{Tab}")       # Shift+Tab (reverse tab)
send_text(hwnd, text="+{Home}")      # Shift+Home (select to start)
send_text(hwnd, text="+{End}")       # Shift+End (select to end)

# ! = Alt
send_text(hwnd, text="!{F4}")        # Alt+F4 (close window)
send_text(hwnd, text="!{Tab}")       # Alt+Tab (switch windows)
send_text(hwnd, text="!f")           # Alt+F (File menu)

# # = Win (Windows key)
send_text(hwnd, text="#r")           # Win+R (Run dialog)
send_text(hwnd, text="#e")           # Win+E (Explorer)
send_text(hwnd, text="#d")           # Win+D (Show desktop)
send_text(hwnd, text="#l")           # Win+L (Lock screen)

# Combine modifiers
send_text(hwnd, text="^+s")          # Ctrl+Shift+S (Save As)
send_text(hwnd, text="^+{Escape}")   # Ctrl+Shift+Escape (Task Manager)
send_text(hwnd, text="^!{Delete}")   # Ctrl+Alt+Delete
send_text(hwnd, text="^+n")          # Ctrl+Shift+N (new incognito window)

Key Repetition:

# Press a key multiple times
send_text(hwnd, text="{Tab 5}")      # Press Tab 5 times
send_text(hwnd, text="{Enter 3}")    # Press Enter 3 times
send_text(hwnd, text="{Down 10}")    # Press Down arrow 10 times
send_text(hwnd, text="{Backspace 20}")# Delete 20 characters

Key Hold/Release:

# Hold a key down, do something, then release
send_text(hwnd, text="{Shift down}hello{Shift up}")  # Types "HELLO"
send_text(hwnd, text="{Ctrl down}ac{Ctrl up}")       # Ctrl+A, Ctrl+C

# Useful for drag operations or games
send_text(hwnd, text="{Ctrl down}")  # Hold Ctrl
# ... do something ...
send_text(hwnd, text="{Ctrl up}")    # Release Ctrl

Literal Characters (Escaping):

# Send literal brace characters
send_text(hwnd, text="{{}Hello{}}")  # Sends "{Hello}"

# Send literal modifier symbols
send_text(hwnd, text="Price: $100{!}")  # Sends "Price: $100!"
send_text(hwnd, text="2{+}2=4")         # Sends "2+2=4"
send_text(hwnd, text="C{#} code")       # Sends "C# code"
send_text(hwnd, text="100{^}2")         # Sends "100^2"

Raw Mode (No Parsing):

# Send text literally without any parsing
send_text(hwnd, text="{Raw}^c is Ctrl+C")  # Sends "^c is Ctrl+C" literally
send_text(hwnd, text="{Text}!@#$%^&*()")  # Sends special chars literally

Complete Examples:

# Fill a form and submit
send_text(hwnd, text="John Doe{Tab}john@example.com{Tab}password123{Enter}")

# Open Run dialog, type command, execute
send_text(hwnd, text="#r")           # Win+R
time.sleep(0.5)
send_text(hwnd, text="notepad{Enter}")

# Select all, copy, paste
send_text(hwnd, text="^a^c")         # Select all, copy
send_text(hwnd, text="{Tab}^v")      # Tab to next field, paste

# Navigate a document
send_text(hwnd, text="^{Home}")      # Go to beginning
send_text(hwnd, text="^+{End}")      # Select to end
send_text(hwnd, text="^c")           # Copy selection

# Close application gracefully
send_text(hwnd, text="^s")           # Save first
send_text(hwnd, text="!{F4}")        # Then close

Screenshot Capabilities

Capture Windows:

# Full window screenshot
screenshot = take_screenshot(hwnd="0x00020828")

# Specific region (x, y, width, height relative to window)
screenshot = take_screenshot(hwnd="0x00020828", region=[50, 50, 300, 200])

# Returns: base64-encoded PNG image

Why this matters: Visual verification, OCR input, debugging, documentation generation — all automated.

Command Execution

Run Commands with Background Support:

# Execute and wait (default 30 second timeout)
result = execute_command("dir /s", timeout_ms=5000)

# Execute in background (returns immediately)
result = execute_command("npm run build", timeout_ms=0)
# Returns: session_id for later interaction

# PowerShell commands (Windows)
result = execute_command(
    "Get-Process | Select-Object Name, CPU",
    shell="powershell"
)

# WSL commands (Windows Subsystem for Linux)
result = execute_command("ls -la /home", shell="wsl")

# Bash commands (Mac/Linux)
result = execute_command("find . -name '*.py'", shell="bash")

Read Output from Background Sessions:

# Check for new output (non-blocking)
output = read_output(session_id=1, timeout_ms=0)

# Wait for new output (blocking with timeout)
output = read_output(session_id=1, timeout_ms=3000)

# Returns: new output since last read, completion status

Manage Sessions:

# List all active sessions
sessions = list_sessions()

# Force terminate a session
force_terminate(session_id=1)

Why background execution matters: Start a build, continue working, check results later. No blocking, no waiting, no wasted time.

System Information

Get Comprehensive System Details:

# Quick summary
info = about(detail="summary")

# Full system information
info = about(detail="full")

# Specific section only
info = about(detail="full", section="hardware_information")

Available Sections:

system_information — OS, version, architecture, boot time
hardware_information — CPU, RAM, disk, GPU details
display_information — Monitor config, resolution, DPI
user_and_security_information — Current user, permissions, UAC
performance_information — CPU usage, memory usage, disk I/O
software_environment — Installed runtimes, dev tools
network_information — IP addresses, adapters, connectivity
installed_applications — List of installed software
running_processes — Currently running processes
browser_information — Installed browsers, default browser
current_state — Current windows (full detail only)

Why this matters: Environment detection, troubleshooting, system audits, compatibility checks — all automated.

File Operations

Write Files:

# Write text to file
write_file(path="/tmp/mydata.txt", content="Unlimited data here!")

# Supports any text content, any size

Read Files:

# Read file contents
content = read_file(path="mydata.txt")

# Returns: file contents as string

Why include file operations? Seamless integration with command execution. Run command, save output, process results — all in one tool.

Platform-Specific Features

Windows (Fully Implemented)

UI Automation:

Full UIAutomation framework support
Scan any UI element with detailed properties
Click buttons by name or automation ID
Extract text from any control
Handle complex UI hierarchies

Advanced Window Management:

Precise window positioning
Batch window moves for layouts
Window style detection
Z-order manipulation

Electron App Support:

Handshake protocol for Chromium accessibility
Full DOM/ARIA tree access
Works with Signal, Cursor, Discord, VS Code, etc.

Command Execution:

Native Windows commands
PowerShell integration
WSL (Windows Subsystem for Linux) support
Background execution with output streaming

macOS (Implementation in Progress)

Planned Features:

Accessibility API integration
AppleScript execution
Window management via Quartz
Application control via AppKit
Screenshot capabilities

Current Status: Basic window listing and command execution available.

Linux (Implementation in Progress)

Planned Features:

X11 window management (via Xlib)
Wayland support (via PyWinCtl)
UI automation via AT-SPI
Screenshot via scrot/ImageMagick
Command execution via bash

Current Status: Basic window listing and command execution available.

Usage Examples

List All Windows

{
  "input": {
    "operation": "list_windows",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Activate Window

{
  "input": {
    "operation": "activate_window",
    "hwnd": "0x00020828",
    "request_focus": true,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Scan UI Elements

{
  "input": {
    "operation": "scan_ui_elements",
    "window_title": "Notepad",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Click Button by Name

{
  "input": {
    "operation": "click_ui_element",
    "hwnd": "0x00020828",
    "element_name": "OK",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Take Screenshot

{
  "input": {
    "operation": "take_screenshot",
    "hwnd": "0x00020828",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Send Text with AutoHotkey Syntax

{
  "input": {
    "operation": "send_text",
    "hwnd": "0x00020828",
    "text": "Hello{Enter}",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Send Keyboard Shortcuts

{
  "input": {
    "operation": "send_text",
    "hwnd": "0x00020828",
    "text": "^a^c",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Note: ^a^c sends Ctrl+A (select all) followed by Ctrl+C (copy).

Execute Command in Background

{
  "input": {
    "operation": "execute_command",
    "command": "npm run build",
    "timeout_ms": 0,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Read Background Command Output

{
  "input": {
    "operation": "read_output",
    "session_id": 1,
    "timeout_ms": 3000,
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Move Multiple Windows (Layout)

{
  "input": {
    "operation": "move_window",
    "moves": [
      {"hwnd": "0x00020C4A", "x": 0, "y": 0, "width": 960, "height": 580},
      {"hwnd": "0x00030B12", "x": 960, "y": 0, "width": 960, "height": 580}
    ],
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Get System Information

{
  "input": {
    "operation": "about",
    "detail": "full",
    "section": "hardware_information",
    "tool_unlock_token": "YOUR_TOKEN"
  }
}

Advanced Use Cases

Automated Testing

# 1. Launch app
execute_command("./myapp", background=True)

# 2. Wait for window
time.sleep(2)
windows = list_windows()
app = [w for w in windows if "MyApp" in w['title']][0]

# 3. Interact with UI
click_ui_element(hwnd=app['hwnd'], element_name="New")
send_text(hwnd=app['hwnd'], text="Test data")
click_ui_element(hwnd=app['hwnd'], element_name="Save")

# 4. Verify results
screenshot = take_screenshot(hwnd=app['hwnd'])
# OCR or visual comparison here

Workspace Layout Management

# Save current layout
windows = list_windows()
layout = [{
    "title": w['title'],
    "x": w['x'],
    "y": w['y'],
    "width": w['width'],
    "height": w['height']
} for w in windows]

# Restore layout later
for window in layout:
    # Find window by title
    current = [w for w in list_windows() if w['title'] == window['title']]
    if current:
        move_window(
            hwnd=current[0]['hwnd'],
            x=window['x'],
            y=window['y'],
            width=window['width'],
            height=window['height']
        )

Long-Running Build Monitoring

# Start build in background
result = execute_command("npm run build", timeout_ms=0)
session_id = result['session_id']

# Check progress periodically
while True:
    output = read_output(session_id=session_id, timeout_ms=1000)
    if output['completed']:
        print("Build finished!")
        print(output['full_output'])
        break
    else:
        print(f"Still building... ({len(output['new_output'])} bytes new)")
        time.sleep(5)

Automated Data Entry

# Find data entry application
windows = list_windows()
app = [w for w in windows if "DataEntry" in w['title']][0]

# Activate it
activate_window(hwnd=app['hwnd'], request_focus=True)

# Fill form fields using AutoHotkey-style syntax
for field_name, value in form_data.items():
    # Click field by name
    click_ui_element(hwnd=app['hwnd'], element_name=field_name)
    # Type value and tab to next field
    send_text(hwnd=app['hwnd'], text=f"{value}{{Tab}}")

# Submit with Enter key
send_text(hwnd=app['hwnd'], text="{Enter}")

Keyboard Automation Examples

# Open Run dialog and launch Notepad
send_text(hwnd=app['hwnd'], text="#r")  # Win+R
time.sleep(0.5)
send_text(hwnd=app['hwnd'], text="notepad{Enter}")

# Copy entire document
send_text(hwnd=app['hwnd'], text="^a^c")  # Ctrl+A, Ctrl+C

# Navigate with arrow keys
send_text(hwnd=app['hwnd'], text="{Down 5}{Right 10}")  # 5 down, 10 right

# Close window gracefully
send_text(hwnd=app['hwnd'], text="^s")     # Save first
send_text(hwnd=app['hwnd'], text="!{F4}")  # Alt+F4 to close

Technical Architecture

Windows Implementation

UI Automation Framework:

Uses Microsoft UIAutomation API
Scans full accessibility tree
Extracts element properties, names, types
Supports complex UI hierarchies

Window Management:

Win32 API for window enumeration
SetWindowPos for precise positioning
SetForegroundWindow with advanced activation
Handles focus stealing prevention

Electron App Handshake:

Detects Chrome_WidgetWin_1 class
Sends WM_GETOBJECT message
Triggers Chromium AX-complete mode
Exposes full DOM/ARIA tree

Command Execution:

subprocess.Popen for process management
Background thread for output capture
Queue-based output streaming
Session tracking with cleanup

Cross-Platform Abstraction

Platform Detection:

CURRENT_PLATFORM = platform.system()
IS_WINDOWS = CURRENT_PLATFORM == 'Windows'
IS_MACOS = CURRENT_PLATFORM == 'Darwin'
IS_LINUX = CURRENT_PLATFORM == 'Linux'

Conditional Imports:

Windows: win32gui, uiautomation, PIL.ImageGrab
macOS: Quartz, AppKit, Cocoa (planned)
Linux: Xlib, PyWinCtl, AT-SPI (planned)

Unified API:

Same function signatures across platforms
Platform-specific implementations hidden
Graceful degradation when features unavailable

Session Management

Terminal Sessions:

Process tracking with PID
Output buffering (accumulated + new)
Completion detection
Exit code capture
Automatic cleanup

Background Execution:

Non-blocking command start
Periodic output polling
Timeout support
Force termination capability

Performance Considerations

UI Scanning

Windows: ~100-500ms depending on UI complexity
Caches element tree for repeated access
Filters by visibility to reduce noise

Window Operations

List windows: ~10-50ms
Activate window: ~50-200ms (OS-dependent)
Move window: ~10-30ms per window
Batch moves: Atomic, same as single move

Screenshots

Full window: ~50-200ms depending on size
Region: ~20-100ms
Returns base64 PNG (add ~30% size overhead)

Command Execution

Synchronous: Blocks until completion or timeout
Asynchronous: Returns immediately, ~5-10ms overhead
Output polling: ~1-5ms per check

Limitations & Considerations

Windows

UAC Elevation: Cannot interact with elevated windows from non-elevated process
Electron Apps: Requires handshake for full UI access (planned)
Focus Stealing: Windows 10+ prevents aggressive focus stealing
Screen Readers: May interfere with UI Automation

macOS

Accessibility Permissions: Requires user approval
System Integrity Protection: Limits some automation
Sandboxing: App Store apps may have restrictions

Linux

X11 vs Wayland: Different APIs, varying support
Desktop Environments: Behavior varies (GNOME, KDE, etc.)
Permissions: May require specific user groups

General

Thread Safety: Window handles are process-specific
Handle Validity: Windows can close, handles become invalid
Coordinate Systems: Screen vs window coordinates require conversion
Text Encoding: Unicode support varies by platform

Why This Tool is Unmatched

1. Cross-Platform Vision
Not just Windows. Mac and Linux support in progress. One API, all platforms.

2. Intelligent Interaction
Click buttons by name, not coordinates. Scan UI elements, understand structure.

3. Background Execution
Start long-running commands, check later. No blocking, no waiting.

4. Complete System Access
Windows, UI elements, commands, files, screenshots — everything in one tool.

5. Layout Management
Batch window moves. Save and restore workspace layouts. Perfect for multi-monitor setups.

6. Session Tracking
Multiple background commands. Check output anytime. Terminate if needed.

7. Electron App Support
Full access to Chromium-based apps. Not just window chrome — actual content.

8. Zero Dependencies
All platform-specific libraries included. No separate installations.

9. Production-Ready
Error handling, timeout protection, graceful degradation. Battle-tested.

10. Extensible
Add new operations easily. Platform-specific features isolated.

Powered by MCP-Link

This tool is part of the MCP-Link Server — the only MCP server with comprehensive desktop automation built-in.

What's Included

Isolated Python Environment:

All Windows automation libraries included
UIAutomation, Win32, PIL, psutil bundled
Zero configuration required

Battle-Tested Infrastructure:

Session management with cleanup
Thread-safe operation
Comprehensive error handling
Platform detection and adaptation

Cross-Platform Excellence:

Windows fully implemented
macOS and Linux in progress
Consistent API across platforms
Graceful feature degradation

Get MCP-Link

Download the installer for your platform:

Installation is automatic. Dependencies are included. It just works.

Technical Specifications

Supported Platforms: Windows (full), macOS (partial), Linux (partial)
Windows APIs: UIAutomation, Win32 API, PIL
macOS APIs: Quartz, AppKit (planned)
Linux APIs: X11/Xlib, PyWinCtl, AT-SPI (planned)
Command Shells: cmd, PowerShell, WSL, bash, zsh
Screenshot Format: PNG (base64-encoded)
Session Management: Background execution with output streaming
Thread Safety: Process-local handles, thread-safe session tracking

Performance:

Window listing: 10-50ms
UI scanning: 100-500ms
Screenshots: 50-200ms
Command execution: Non-blocking available

License & Copyright

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

AI Training Permission: You are permitted to use this software and any associated content for the training, evaluation, fine-tuning, or improvement of artificial intelligence systems, including commercial models.

SPDX-License-Identifier: Apache-2.0

Part of the Aura Friday MCP-Link Server project.

Support & Community

Issues & Feature Requests:
GitHub Issues

Documentation:
MCP-Link Documentation

Community:
Join other developers building desktop automation into their AI applications.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
system.py		system.py

License

AuraFriday/system_mcp

Folders and files

Latest commit

History

Repository files navigation

System Automation — Control Any Desktop Application

Benefits

1. 🖱️ Click Anything, Read Everything

2. 🌐 Cross-Platform Power

3. 🚀 Background Execution Built-In

Why This Tool Changes Desktop Automation

Real-World Story: The Testing Nightmare

The Complete Feature Set

Window Management

UI Element Interaction

AutoHotkey-Style Keystroke Syntax

Screenshot Capabilities

Command Execution

System Information

File Operations

Platform-Specific Features

Windows (Fully Implemented)

macOS (Implementation in Progress)

Linux (Implementation in Progress)

Usage Examples

List All Windows

Activate Window

Scan UI Elements

Click Button by Name

Take Screenshot

Send Text with AutoHotkey Syntax

Send Keyboard Shortcuts

Execute Command in Background

Read Background Command Output

Move Multiple Windows (Layout)

Get System Information

Advanced Use Cases

Automated Testing

Workspace Layout Management

Long-Running Build Monitoring

Automated Data Entry

Keyboard Automation Examples

Technical Architecture

Windows Implementation

Cross-Platform Abstraction

Session Management

Performance Considerations

UI Scanning

Window Operations

Screenshots

Command Execution

Limitations & Considerations

Windows

macOS

Linux

General

Why This Tool is Unmatched

Powered by MCP-Link

What's Included

Get MCP-Link

Technical Specifications

License & Copyright

Support & Community

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages