A powerful Model Context Protocol (MCP) server that enables Claude to control Windows through a secure API interface while viewing live desktop feedback. This project implements a three-layer architecture for reliable and safe system control.
    graph TD
        A[MCP Server] --> B[FastAPI Server]
        B --> C[Computer Control]
        C --> D[Windows System]
The system consists of three main layers:
- MCP Server Layer (TypeScript) - Handles Claude interaction and command processing
- API Layer (Python/FastAPI) - Provides RESTful endpoints for system control
- Computer Control Layer (Python) - Manages low-level Windows operations
- Mouse Operations
  - Precise cursor movement
  - Single/double clicks (left, right, middle buttons)
  - Drag operations
  - Position tracking
- Keyboard Input
  - Text typing with configurable delays
  - Special key handling
  - Key combinations (Ctrl, Alt, Shift modifiers)
  - International character support
- Screen Operations
  - Real-time screenshot capture
  - Screen dimension detection
  - Image processing and optimization
  - Base64 encoding for transmission
- Window Management
  - Window focus control
  - State management (minimize, maximize, restore)
  - Smart coordinate scaling
  - Boundary protection
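The coordinate scaling and boundary protection items above can be sketched as follows. This is a minimal illustration that assumes screenshots may be downscaled before transmission, so a point picked in the image has to be mapped back to real screen pixels and then clamped. The function names and the example sizes are hypothetical and are not taken from the project's code.

```python
from typing import Tuple

Size = Tuple[int, int]


def scale_to_screen(x: int, y: int, image_size: Size, screen_size: Size) -> Tuple[int, int]:
    """Map a point in a (possibly downscaled) screenshot to screen pixels."""
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image_size must be positive")
    return round(x * scr_w / img_w), round(y * scr_h / img_h)


def clamp_to_screen(x: int, y: int, screen_size: Size) -> Tuple[int, int]:
    """Keep a point inside the visible screen area."""
    scr_w, scr_h = screen_size
    return min(max(x, 0), scr_w - 1), min(max(y, 0), scr_h - 1)
```

Scaling first and clamping second means that rounding at the image edge cannot push the cursor past the last valid pixel.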
- Node.js 18 or higher
- Python 3.8 or higher
- Windows 10 or higher
- Install Python dependencies:
  pip install -r requirements.txt
- Install Node.js dependencies:
  npm install
- Build the MCP server:
  npm run build
- Start the Python API server:
  python main.py
- Configure Claude Desktop by adding the following to %APPDATA%\Claude\claude_desktop_config.json:
{
"mcpServers": {
"windows-control": {
"command": "C:\\Program Files\\nodejs\\node.exe",
"args": [
"C:\\Users\\YourUsername\\path\\to\\windows-implementation\\dist\\index.js",
"--api-url=http://localhost:8000"
],
"cwd": "C:\\Users\\YourUsername\\path\\to\\windows-implementation"
}
}
}
- API Settings
  - --api-url: API server URL (default: http://localhost:8000)
- Performance Settings
  - Mouse movement duration
  - Click delays
  - Keyboard input timing
  - Screenshot quality
- Input Validation
  - Strict coordinate boundary checking
  - Key mapping validation
  - Screenshot size limits
- Safety Measures
  - PyAutoGUI failsafe mechanism
  - Screen boundary protection
  - Controlled input delays
  - Resource management
- Error Handling
  - Comprehensive exception handling
  - Detailed logging
  - Automatic recovery mechanisms
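As an illustration of key mapping validation, the sketch below checks a requested key against an allow-list before any input is sent. The `ALLOWED_KEYS` set and the `build_key_sequence` helper are assumptions made for this example; the project's actual key map may differ.

```python
from typing import List

# Hypothetical allow-list of named keys; single characters are accepted separately.
ALLOWED_KEYS = {
    "enter", "tab", "esc", "space", "backspace", "delete",
    "up", "down", "left", "right", "home", "end",
}
MODIFIERS = ("ctrl", "alt", "shift")


def build_key_sequence(key: str, ctrl: bool = False, alt: bool = False,
                       shift: bool = False) -> List[str]:
    """Validate a key name and return modifiers followed by the key."""
    name = key.strip().lower()
    if len(name) != 1 and name not in ALLOWED_KEYS:
        raise ValueError(f"unsupported key: {key!r}")
    flags = {"ctrl": ctrl, "alt": alt, "shift": shift}
    return [m for m in MODIFIERS if flags[m]] + [name]
```

Rejecting unknown names up front keeps malformed requests from ever reaching the input layer. A validated sequence could then be handed to a call such as `pyautogui.hotkey(*sequence)`, though whether the project does it this way is not stated here.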
npm run dev
npm start -- --api-url=http://localhost:8000
- POST /mouse/move - Move cursor to coordinates
  - Parameters: x, y
- POST /mouse/click - Click at current position
  - Optional: button ("left", "right", "middle")
- POST /mouse/double-click - Double click at position
  - Optional: x, y
- POST /keyboard/type - Type text
  - Parameter: text
- POST /keyboard/press - Press specific key
  - Parameter: key
  - Optional: ctrl, alt, shift
- GET /screenshot - Capture screen
  - Returns base64 encoded JPEG
- GET /screen/size - Get screen dimensions
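A small client sketch for the endpoints above, using only the Python standard library. It assumes that parameters are sent as a JSON request body, which is not confirmed by this README (they could equally be query parameters), and the helper names `build_post` and `decode_jpeg` are hypothetical.

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:8000"  # default value of the --api-url setting


def build_post(path: str, payload: dict, api_url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for an endpoint such as /mouse/move."""
    return urllib.request.Request(
        api_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_jpeg(b64_text: str) -> bytes:
    """Decode base64 text and check that it starts with the JPEG magic bytes."""
    raw = base64.b64decode(b64_text)
    if raw[:2] != b"\xff\xd8":
        raise ValueError("decoded data is not a JPEG image")
    return raw


# With the API server running (python main.py), a request could be sent with:
#   urllib.request.urlopen(build_post("/mouse/move", {"x": 100, "y": 200}))
```

Building the request separately from sending it makes the payload easy to inspect before anything moves on the desktop.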
- Frontend/MCP
  - TypeScript
  - Model Context Protocol SDK
  - Axios
- Backend/API
  - Python 3.8+
  - FastAPI
  - Uvicorn
  - Pydantic
- System Integration
  - PyAutoGUI
  - Win32GUI
  - OpenCV
  - PIL
  - NumPy
The system implements a layered error handling approach:
- MCP Server Level
  - Request validation
  - Response formatting
  - Connection management
- API Level
  - Endpoint validation
  - Request parsing
  - Response standardization
- System Level
  - Operation validation
  - Resource management
  - Recovery procedures
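The layering above can be pictured with a short sketch in which the system level raises on invalid operations and the API level converts every outcome into one response shape. All names here (`OperationError`, `move_cursor`, `api_call`, the `success`/`error` fields, the logger name) are illustrative assumptions rather than the project's real interfaces, and the actual mouse call is omitted.

```python
import logging

logger = logging.getLogger("windows_control")  # hypothetical logger name


class OperationError(Exception):
    """Raised by the system level when an operation fails validation."""


def move_cursor(x: int, y: int, screen_size=(1920, 1080)) -> dict:
    """System level: validate before acting (the real mouse call is omitted)."""
    width, height = screen_size
    if not (0 <= x < width and 0 <= y < height):
        raise OperationError(f"({x}, {y}) is outside the {width}x{height} screen")
    return {"x": x, "y": y}


def api_call(operation, *args, **kwargs) -> dict:
    """API level: turn results and failures into one standard response shape."""
    try:
        return {"success": True, "result": operation(*args, **kwargs)}
    except OperationError as exc:
        logger.warning("operation rejected: %s", exc)
        return {"success": False, "error": str(exc)}
```

For example, `api_call(move_cursor, -1, 0)` returns a failure response instead of raising, so the MCP server level only has to format one kind of payload.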
- Efficient screenshot compression
- Smart coordinate scaling
- Controlled input delays
- Memory-efficient image handling
- Proper resource cleanup
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.