Comprehensive Windows automation MCP server for AI agents
Full control over Windows desktop applications with 25+ tools: screenshots, OCR, mouse/keyboard control, window management, process control, clipboard operations, and more.
- Full screen screenshots
- Window-specific capture
- Region capture
- Full screen text extraction
- Region-based OCR
- Powered by Tesseract
- Click (left/right/middle)
- Double-click
- Drag and drop
- Mouse movement with duration
- Scroll (up/down)
- Get mouse position
- Type text with configurable speed
- Press individual keys
- Execute hotkey combinations (Ctrl+C, Alt+F4, etc.)
- Full keyboard shortcuts support
- Copy text to clipboard
- Paste/read clipboard content
- Seamless clipboard integration
- List all open windows
- Focus/activate windows
- Close windows
- Minimize/maximize/restore
- Resize windows
- Move windows
- Get window details (position, size, state)
- List running processes with PIDs
- Filter processes by name
- Kill processes by PID
- Memory usage monitoring
- Python 3.10+ installed
- Tesseract OCR for text recognition:
  - Download: https://github.com/UB-Mannheim/tesseract/wiki
  - Install to default location or add to PATH
  - Verify: `tesseract --version`
Option 1: Install from PyPI (Recommended)

    pip install win32-mcp-server

Option 2: Install from GitHub

    pip install git+https://github.com/RandyNorthrup/win32-mcp-server.git

Option 3: Install from source

    # Clone repository
    git clone https://github.com/RandyNorthrup/win32-mcp-server.git
    cd win32-mcp-server
    # Install with dependencies
    pip install -e .

After installing via pip, add to your MCP configuration (%APPDATA%\Code\User\mcp.json):

    {
      "servers": {
        "win32-inspector": {
          "type": "stdio",
          "command": "win32-mcp-server"
        }
      }
    }

Or install from VS Code MCP Extensions:
- Open VS Code
- Press Ctrl+Shift+P
- Type "MCP: Install Server"
- Search for "Windows Automation Inspector"
- Click Install
After installing via pip, add to %APPDATA%\Claude\claude_desktop_config.json:

    {
      "mcpServers": {
        "win32-inspector": {
          "command": "win32-mcp-server"
        }
      }
    }

The server uses STDIO transport and works with any MCP-compatible client that supports stdio.
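Both client configurations above are plain JSON files, so the server entry can also be added with a script instead of by hand. The sketch below merges one entry into an existing config without discarding other servers. `add_mcp_server` is a hypothetical helper, and it assumes the file contains standard JSON (no comments).

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, top_level_key: str, name: str, entry: dict) -> dict:
    """Merge one server entry into an MCP client config file and return the new config."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    # setdefault keeps any servers that are already configured under the same key.
    config.setdefault(top_level_key, {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call for Claude Desktop (not executed here; the path comes from the text above):
#   import os
#   path = Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
#   add_mcp_server(path, "mcpServers", "win32-inspector", {"command": "win32-mcp-server"})
```

For VS Code, the same helper would be called with the top-level key `"servers"` and an entry that also sets `"type": "stdio"`.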
"Capture screenshot of the window titled 'Compliance Guard'"
"Extract text from the screen using OCR"
"OCR the region at x=100, y=100, width=500, height=300"
"Click at coordinates (500, 300)"
"Double-click the button at (450, 250)"
"Drag from (100, 100) to (500, 500)"
"Type 'Hello World' at the current cursor position"
"Press Ctrl+C to copy"
"Execute Alt+F4 to close the window"
"List all open windows"
"Focus the window titled 'Visual Studio Code'"
"Maximize the Chrome window"
"Resize Notepad to 800x600"
"List all running processes"
"Show processes containing 'chrome'"
"Kill process with PID 1234"
| Tool | Description |
|---|---|
| `capture_screen` | Capture full screen screenshot |
| `capture_window` | Capture specific window by title |
| `list_windows` | List all open windows with details |
| `ocr_screen` | Extract text from full screen |
| `ocr_region` | Extract text from specified region |
| `click` | Click at coordinates (left/right/middle) |
| `double_click` | Double-click at coordinates |
| `drag` | Drag from start to end coordinates |
| `type_text` | Type text at current position |
| `press_key` | Press keyboard key or shortcut |
| `hotkey` | Execute hotkey combination |
| `clipboard_copy` | Copy text to clipboard |
| `clipboard_paste` | Get clipboard content |
| `mouse_position` | Get current mouse position |
| `mouse_move` | Move mouse to position |
| `scroll` | Scroll up/down |
| `list_processes` | List running processes with PIDs |
| `kill_process` | Terminate process by PID |
| `focus_window` | Activate window |
| `close_window` | Close window by title |
| `minimize_window` | Minimize window |
| `maximize_window` | Maximize window |
| `restore_window` | Restore window |
| `resize_window` | Resize window |
| `move_window` | Move window position |
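An MCP client invokes each tool in the table above by sending a JSON-RPC `tools/call` request over the stdio transport. The sketch below builds such a message by hand to show its shape. The argument names (`x`, `y`, `width`, `height`) are assumptions for illustration; the server's real input schema is what a `tools/list` request returns.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical arguments mirroring the prompt "OCR the region at x=100, y=100, ...".
    print(build_tool_call("ocr_region", {"x": 100, "y": 100, "width": 500, "height": 300}))
```

In practice the MCP client library handles this framing; the sketch is only meant to make the request structure visible.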
WARNING: This server has powerful system control capabilities including:
- Mouse and keyboard control
- Process termination
- Clipboard access
- Screen capture
Only use in trusted environments where you control the MCP client.
- Restrict Usage: Only enable when actively needed
- Review Logs: Monitor all automated actions
- Sandbox Testing: Test in isolated environments first
- Access Control: Limit who can access the MCP client
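- PyAutoGUI Failsafe Is Disabled: The server turns off PyAutoGUI's failsafe so automation is not interrupted, which means moving the mouse to a screen corner will not abort a running action. Be cautious.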
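One way to apply the "Restrict Usage" and "Review Logs" practices is a small client-side gate that logs every requested tool and only permits state-changing tools when explicitly enabled. This is a hypothetical sketch, not a feature of win32-mcp-server; the read-only grouping is an assumption based on the tool descriptions.

```python
import logging

logger = logging.getLogger("mcp_audit")

# Assumed grouping: tools that only observe the system, taken from the tool table.
READ_ONLY_TOOLS = {
    "capture_screen", "capture_window", "list_windows", "ocr_screen",
    "ocr_region", "mouse_position", "list_processes", "clipboard_paste",
}


def is_allowed(tool_name: str, allow_actions: bool = False) -> bool:
    """Return True if the tool may run, and record the decision in the audit log."""
    allowed = allow_actions or tool_name in READ_ONLY_TOOLS
    logger.info("tool=%s allowed=%s", tool_name, allowed)
    return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for tool in ("list_windows", "kill_process"):
        print(tool, is_allowed(tool))
```

Note that `clipboard_paste` is read-only but can still expose sensitive data, so a stricter policy might exclude it as well.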
TesseractNotFoundError: tesseract is not installed
Solution: Install Tesseract OCR from https://github.com/UB-Mannheim/tesseract/wiki
PermissionError: [WinError 5] Access is denied
Solution: Run VS Code or your MCP client as Administrator to use the process control features
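To confirm whether the current process is actually elevated, a check like the following can help. It is a sketch that relies on the Windows `IsUserAnAdmin` shell function through `ctypes`; `is_admin` is a hypothetical helper name, and on non-Windows systems it simply returns False.

```python
import ctypes
import os


def is_admin() -> bool:
    """Return True when running elevated on Windows, False otherwise."""
    if os.name != "nt":
        return False
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        # Treat any failure to query the shell as "not elevated".
        return False


if __name__ == "__main__":
    print("elevated" if is_admin() else "not elevated")
```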
ModuleNotFoundError: No module named 'mcp'
Solution: Reinstall dependencies: pip install -e . (from a source checkout) or pip install win32-mcp-server (from PyPI)
Window not found: [title]
Solution: Use partial window title matching. Check exact title with list_windows first.
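Partial title matching can be pictured as a case-insensitive substring search over the titles that `list_windows` reports. The sketch below illustrates that idea only; it is an assumption about the matching behavior, not the server's actual implementation, and `match_windows` is a hypothetical helper.

```python
def match_windows(titles: list[str], query: str) -> list[str]:
    """Return the titles that contain the query, ignoring case."""
    needle = query.casefold().strip()
    if not needle:
        # An empty query would match everything, so return nothing instead.
        return []
    return [title for title in titles if needle in title.casefold()]


if __name__ == "__main__":
    open_titles = ["Untitled - Notepad", "README.md - Visual Studio Code", "Google Chrome"]
    print(match_windows(open_titles, "notepad"))
```

If a query matches several windows, passing a longer fragment of the title narrows the result.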
win32-mcp-server/
├── server.py # Main MCP server implementation
├── pyproject.toml # Package configuration
├── README.md # This file
└── LICENSE # MIT License
- mcp: Model Context Protocol SDK
- mss: Cross-platform screen capture
- Pillow: Image processing
- pyautogui: Mouse and keyboard automation
- pygetwindow: Window management
- pyperclip: Clipboard operations
- pytesseract: OCR text extraction
- psutil: Process management
MIT License - see LICENSE file
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
- Repository: https://github.com/RandyNorthrup/win32-mcp-server
- Issues: https://github.com/RandyNorthrup/win32-mcp-server/issues
- MCP Documentation: https://modelcontextprotocol.io/
For bugs and feature requests, please use GitHub Issues.
Author: Randy Northrup
GitHub: @RandyNorthrup
Built with Python, MCP SDK, and the following open-source libraries:
- mcp - Model Context Protocol SDK
- mss - Fast screenshot capture
- PyAutoGUI - Mouse and keyboard automation
- pygetwindow - Window management
- pytesseract - OCR wrapper for Tesseract
- psutil - Process and system utilities
- pyperclip - Clipboard operations
- Pillow - Image processing
Made for Windows automation and AI agents