OmniMCP: Direct Host Control Bridge Between OmniParser and Claude MCP #947

Status: Open. Wants to merge 24 commits into base branch main.

Commits (all by abrichr, Mar 16, 2025):
- 9e94a05 Implement OmniMCP for Claude computer control
- 8e070e3 Add standalone OmniMCP package with minimal dependencies
- 266c44c Standardize CLI mode terminology in documentation
- e2ddf84 Use monitor dimensions for default visualization size
- 0a4c658 Set executable permission on install.sh and update README
- a058f63 Add robust path handling for OmniMCP standalone package
- ca33de9 Implement lazy imports for BeautifulSoup and update OmniMCP dependencies
- 524787c Add posthog to OmniMCP dependencies and keep BeautifulSoup lazy loading
- a346a19 Add multiprocessing-utils dependency and update README
- 99eed07 Add numpy to OmniMCP dependencies
- b30c6a7 Add orjson to OmniMCP dependencies
- 7fa4483 Add dictalchemy to OmniMCP dependencies
- f7876f5 Make BeautifulSoup import lazy in models.py
- a13e399 Add joblib to OmniMCP dependencies
- b854f42 Add AWS dependencies for OmniParser deployment
- e31a8dd Improve OmniParser integration with strict validation
- 22ac392 Add Anthropic ComputerUse integration information
- 47da97a Improve OmniParser deployment and configuration options
- 9cec405 Fix OmniMCP deployment and add utility files
- c435c4a WIP: Move core functionality to omnimcp package
- 8855310 Update OmniMCP for independent operation
- 8391c67 Fix OmniParser auto-deployment with AWS integration
- 729a643 simplfiy loop.py
- 4900fbc gitignore
4 changes: 4 additions & 0 deletions .gitignore
@@ -41,3 +41,7 @@ build/

OpenAdapt.spec
build_scripts/OpenAdapt.iss

omnimcp/omnimcp.egg-info
**/__pycache__
omnimcp/.env
298 changes: 258 additions & 40 deletions deploy/deploy/models/omniparser/deploy.py

Large diffs are not rendered by default.

9 changes: 9 additions & 0 deletions omnimcp/.env.example
@@ -0,0 +1,9 @@
# OmniMCP AWS Configuration Example
# Copy this file to .env and fill in your AWS credentials

# Anthropic API key (used by Claude)
ANTHROPIC_API_KEY=your_anthropic_api_key

# AWS credentials for OmniParser deployment
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=us-east-1
PROJECT_NAME=omnimcp2
124 changes: 124 additions & 0 deletions omnimcp/CLAUDE.md
@@ -0,0 +1,124 @@
# OmniMCP Development Notes

**FOCUS: GET THIS WORKING ASAP**

⚠️ **CRITICAL RULES** ⚠️
- NEVER VIEW the contents of any .env file
- NEVER ASK to see the contents of any .env file
- NEVER SUGGEST viewing the contents of any .env file
- These files contain sensitive credentials that must remain private
- ALWAYS USE --auto-deploy-parser when running OmniMCP
- NEVER USE --allow-no-parser under any circumstances

## Installation Commands

```bash
# Install OmniMCP with minimal dependencies
./install.sh

# Install additional dependencies for OmniParser deployment
# For temporary use (doesn't modify pyproject.toml):
uv pip install paramiko

# For permanent addition (modifies pyproject.toml):
# uv add paramiko
```

## AWS Configuration for OmniParser

OmniParser deployment requires AWS credentials. These need to be set in OpenAdapt's deploy module:

```bash
# Copy the deploy example file to the actual .env file
cp /path/to/OpenAdapt/deploy/.env.example /path/to/OpenAdapt/deploy/.env

# Edit the .env file to add your AWS credentials
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION must be set
```
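Because these notes forbid viewing `.env` contents, a pre-flight check that reports only *which* required variables are missing (never their values) can help diagnose deployment failures. A minimal Python sketch, assuming the credentials have already been loaded into the process environment; `missing_aws_vars` is a hypothetical helper, not part of OmniMCP or the deploy module.

```python
import os
from typing import Mapping, Optional

# Variables these notes say must be set for OmniParser deployment.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_aws_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names (never the values) of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling `missing_aws_vars()` with no argument checks the live environment; passing a dict makes the check easy to test without touching real credentials.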

### Important Deployment Fixes

If OmniParser deployment fails, check for these common issues:

1. **Correct import path**: The correct import path in `omnimcp/adapters/omniparser.py` should be:
```python
from deploy.deploy.models.omniparser.deploy import Deploy
```

2. **AWS Region**: Make sure to use a region where your AWS account has a properly configured default VPC with subnets. For example:
```
AWS_REGION=us-east-1
```

3. **VPC Subnet issue**: If you encounter a "No subnets found in VPC" error, note that the deploy script has been modified to automatically create a subnet in your default VPC.

4. **Key pair path**: The EC2 key pair is now stored in the deployment script directory to avoid permission issues.

5. **Remote URL connection**: OmniMCP now captures the EC2 instance's public IP address and updates the OmniParser client URL to connect to the remote server instead of localhost.

6. **Deployment time**: Expected OmniParser deployment timeline:
   - First-time container build: ~5 minutes (includes downloading models)
   - Server ready time: ~1 minute after the container starts
   - Subsequent connections: should be near-instantaneous (< 1 second)
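Given the timeline above (roughly a minute between container start and a ready server), a client has to poll the remote URL rather than assume the server is up. A minimal standard-library sketch, assuming the server answers plain HTTP GET requests at its base URL and listens on port 8000 (the local default mentioned in the README); `server_url_from_ip`, `wait_until_ready`, and the remote port are assumptions, not OmniMCP's actual code.

```python
import time
import urllib.error
import urllib.request


def server_url_from_ip(public_ip: str, port: int = 8000) -> str:
    """Build the OmniParser client URL from the EC2 public IP (port 8000 is an assumption)."""
    return f"http://{public_ip}:{port}"


def wait_until_ready(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until the server answers, or give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # a successful response means the server is accepting requests
        except urllib.error.HTTPError:
            return True  # an HTTP error status still proves something is listening
        except (urllib.error.URLError, OSError):
            time.sleep(interval_s)  # refused or timed out: not up yet, retry
    return False
```

`HTTPError` is caught before `URLError` because it is a subclass; an error status counts as "ready" here since only reachability is being tested.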

**TODO:** Implement functionality to override the .env file location to allow keeping credentials in the omnimcp directory.
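One way the TODO could be approached: check for an explicit override first, then fall back to the deploy module's `.env`. A minimal sketch assuming a hypothetical `OMNIMCP_ENV_FILE` environment variable and hypothetical `resolve_env_file` / `load_env_file` helpers; none of these exist in this PR. It parses simple `KEY=VALUE` lines itself so it has no dependencies, and it returns values without printing them, in keeping with the rule against viewing `.env` contents.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_env_file(default: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer the hypothetical OMNIMCP_ENV_FILE override, else fall back to `default`."""
    env = os.environ if env is None else env
    override = env.get("OMNIMCP_ENV_FILE")
    return Path(override).expanduser() if override else default


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

If adding a dependency is acceptable, python-dotenv's `load_dotenv(dotenv_path=...)` handles more of the `.env` syntax than this sketch does.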

## Running OmniMCP

```bash
# Run in debug mode with auto-deploy OmniParser (no confirmation)
omnimcp debug --auto-deploy-parser --skip-confirmation

# Run in CLI mode with auto-deploy OmniParser (no confirmation)
omnimcp cli --auto-deploy-parser --skip-confirmation

# Run as MCP server with auto-deploy OmniParser (no confirmation)
omnimcp server --auto-deploy-parser --skip-confirmation

# Always use auto-deploy with skip-confirmation for best results
# DO NOT use --allow-no-parser as it provides limited functionality
```

## Managing OmniParser EC2 Instances

```bash
# To stop an OmniParser EC2 instance (prevents additional AWS charges)
cd /path/to/OpenAdapt/deploy
uv run python deploy/models/omniparser/deploy.py stop
```

## OmniMCP Testing Plan

### 1. Installation
- Navigate to the omnimcp directory
- Run the installation script
- Verify that omnimcp is available in PATH
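The PATH check in step 1 can be scripted with the standard library's `shutil.which`. A small sketch; `missing_commands` is a hypothetical helper, and the command names are the ones the README documents (`omnimcp` and `computer-use`).

```python
import shutil
from typing import Iterable, Optional

# Console commands documented in this PR's README.
EXPECTED_COMMANDS = ("omnimcp", "computer-use")


def missing_commands(commands: Iterable[str] = EXPECTED_COMMANDS, path: Optional[str] = None) -> list[str]:
    """Return commands that shutil.which cannot find on PATH (or on `path`, if given)."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]
```

An empty result from `missing_commands()` after activating `.venv` means the entry points were installed.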

### 2. Debug Mode
- Run omnimcp in debug mode without auto-deploy-parser
- Verify that it takes a screenshot and attempts to analyze UI elements
- Save the debug visualization

### 3. OmniParser Deployment (if AWS credentials are available)
- Run omnimcp with auto-deploy-parser flag
- Verify that it deploys OmniParser to AWS EC2
- Check the deployment status and get the server URL

### 4. CLI Mode
- Run omnimcp in CLI mode with the server URL from previous step
- Test simple commands like 'find the close button'
- Verify that it can analyze the screen and take actions

### 5. MCP Server Mode
- Run omnimcp in server mode
- Test connection with Claude Desktop (if available)
- Verify that Claude can use the MCP tools

### 6. Computer Use Mode
- Run the computer-use command (if Docker is available)
- Verify that it launches the Anthropic Computer Use container
- Test browser access to the web interfaces

### 7. Cleanup
- Stop any running OmniParser instances on AWS
- Clean up any temporary files
177 changes: 177 additions & 0 deletions omnimcp/README.md
@@ -0,0 +1,177 @@
# OmniMCP

OmniMCP is a UI automation system that enables Claude to control the computer through the Model Context Protocol (MCP). It combines OmniParser's visual understanding with Claude's natural language capabilities to automate UI interactions.

## Standalone Installation (minimal dependencies)

This standalone package provides OmniMCP with minimal dependencies, letting you use the core functionality without installing all of OpenAdapt's dependencies. It's part of a larger refactoring effort to make components more modular and easier to use.

### Prerequisites

- Python 3.10 or 3.11
- [uv](https://github.com/astral-sh/uv) - Fast Python package installer and resolver
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### Install OmniMCP

```bash
# Clone the OpenAdapt repository
git clone https://github.com/OpenAdaptAI/OpenAdapt.git
cd OpenAdapt/omnimcp

# Run the installation script (creates a virtual environment using uv)
# For Unix/Mac:
./install.sh
# Note: If you get a permission error, run: chmod +x ./install.sh

# For Windows:
install.bat
```

This installation method:
1. Creates an isolated virtual environment using uv
2. Only installs the dependencies needed for OmniMCP
3. Sets up Python to find the required OpenAdapt modules without installing the full package

## Usage

After installation, activate the virtual environment:

```bash
# For Unix/Mac
source .venv/bin/activate

# For Windows
.venv\Scripts\activate.bat
```
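To confirm activation worked before running `omnimcp`, Python can report whether it is running inside a virtual environment: in a venv, `sys.prefix` differs from `sys.base_prefix`. A minimal sketch; the `.venv` directory name matches the install scripts in this PR, but both helpers are hypothetical.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def using_project_venv(project_dir: Path) -> bool:
    """True when the active interpreter belongs to project_dir/.venv (created by install.sh/install.bat)."""
    return in_virtualenv() and Path(sys.prefix).resolve() == (project_dir / ".venv").resolve()
```

For example, `using_project_venv(Path("/path/to/OpenAdapt/omnimcp"))` should return `True` once `.venv` is activated.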

### Development

For development and testing, you can reset the environment with:

```bash
# Reset the virtual environment and reinstall dependencies
cd /path/to/OpenAdapt/omnimcp
rm -rf .venv && chmod +x install.sh && ./install.sh
```

### Running OmniMCP

```bash
# Run CLI mode (direct command input)
omnimcp cli

# Run MCP server (for Claude Desktop)
omnimcp server

# Run in debug mode to visualize screen elements
omnimcp debug

# Run Computer Use mode (Anthropic's official Computer Use integration)
computer-use

# Connect to a remote OmniParser server
omnimcp cli --server-url=https://your-omniparser-server.example.com

# Deploy OmniParser automatically without confirming
# IMPORTANT: Always use auto-deploy with skip-confirmation
omnimcp cli --auto-deploy-parser --skip-confirmation

# Disable automatic OmniParser deployment attempt
omnimcp cli --auto-deploy-parser=False

# With additional options
omnimcp cli --use-normalized-coordinates
omnimcp debug --debug-dir=/path/to/debug/folder

# Computer Use with specific model
computer-use --model=claude-3-5-sonnet-20241022

# Computer Use with auto-deploy of OmniParser
computer-use --auto-deploy-parser --skip-confirmation
```
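`--use-normalized-coordinates` implies that element positions can be expressed as fractions of the screen rather than pixels. Assuming "normalized" means `(x1, y1, x2, y2)` values in [0, 1] relative to the monitor's width and height (the PR does not spell out the convention), converting a box to the pixel point a click would target looks like this; `to_pixels` and `click_point` are hypothetical names.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


def to_pixels(bbox: BBox, screen_w: int, screen_h: int) -> Tuple[int, int, int, int]:
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = bbox
    return (round(x1 * screen_w), round(y1 * screen_h), round(x2 * screen_w), round(y2 * screen_h))


def click_point(bbox: BBox, screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Pixel coordinates of the box centre, e.g. as the target for a pynput mouse move."""
    x1, y1, x2, y2 = to_pixels(bbox, screen_w, screen_h)
    return ((x1 + x2) // 2, (y1 + y2) // 2)
```

For a 1920x1080 monitor, the box `(0.25, 0.5, 0.75, 1.0)` maps to the click point `(960, 810)`.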

### OmniParser Configuration

OmniMCP requires access to an OmniParser server for analyzing screenshots:

1. **Use a Remote OmniParser Server** (Recommended)
```bash
omnimcp cli --server-url=https://your-omniparser-server.example.com
```

2. **Auto-Deploy OmniParser** (Convenient but requires AWS credentials)
- By default, OmniMCP will offer to deploy OmniParser if not available
- You can control this behavior with these flags:
```bash
# Deploy without asking for confirmation
omnimcp cli --auto-deploy-parser --skip-confirmation

# Disable auto-deployment completely
omnimcp cli --auto-deploy-parser=False
```

3. **Use the Default Local Server**
- OmniMCP will try to connect to `http://localhost:8000` by default
- This requires running an OmniParser server locally

4. **IMPORTANT: Always Use Auto-Deploy with Skip-Confirmation**
- For best results, always use these flags together:
```bash
omnimcp cli --auto-deploy-parser --skip-confirmation
```
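The options above amount to a precedence order: an explicit `--server-url` wins, otherwise the local default is tried, and auto-deployment is the fallback. A rough sketch of that decision, assuming this ordering (the diff does not show the actual logic) and using hypothetical names; deployment is represented by a callback rather than any real OmniMCP function.

```python
from typing import Callable, Optional

DEFAULT_LOCAL_URL = "http://localhost:8000"  # default from option 3 above


def resolve_server_url(
    server_url: Optional[str],
    is_reachable: Callable[[str], bool],
    auto_deploy: bool,
    deploy: Callable[[], str],
) -> str:
    """Pick the OmniParser URL: explicit URL, then local default, then auto-deploy."""
    if server_url:
        return server_url
    if is_reachable(DEFAULT_LOCAL_URL):
        return DEFAULT_LOCAL_URL
    if auto_deploy:
        return deploy()  # hypothetical: returns the remote EC2 server URL
    raise RuntimeError("No OmniParser server available and auto-deploy is disabled")
```

Passing `is_reachable` and `deploy` as callables keeps the policy testable without a network or AWS account.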

### Future Direction: Anthropic ComputerUse Integration

OmniMCP and Anthropic's [ComputerUse](https://docs.anthropic.com/en/docs/agents-and-tools/computer-use) both enable Claude to control computers, but with different architectural approaches:

#### Key Differences

**Integration Approach:**
- **OmniMCP** uses OmniParser for understanding UI elements
- **ComputerUse** captures screenshots and provides them directly to Claude

**Environment:**
- **OmniMCP** runs directly on the host system with minimal dependencies
- **ComputerUse** operates in a containerized virtual desktop environment

**MCP vs. Anthropic-defined Tools:**
- **OmniMCP** uses the Model Context Protocol (MCP), a structured protocol for AI models to interact with tools
- **ComputerUse** uses Anthropic-defined tools (`computer`, `text_editor`, and `bash`) via Claude's tool use API
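To make the protocol difference concrete: an MCP client invokes a server-side tool with a JSON-RPC 2.0 `tools/call` request, whereas with Computer Use the application declares Anthropic-defined tools (such as `computer`) in the Messages API `tools` list. The sketch below builds both payloads as plain dictionaries. The MCP tool name `click_element` and its arguments are hypothetical (this PR's actual tool names are not shown in the diff); the `computer_20241022` type string and display fields follow Anthropic's October 2024 computer-use beta and may differ in later versions.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 request an MCP client sends to invoke a server tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def anthropic_computer_tool(width_px: int, height_px: int) -> dict:
    """Anthropic-defined `computer` tool declaration for the Messages API (beta)."""
    return {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
        "display_number": 1,
    }


# Hypothetical OmniMCP tool call; over the stdio transport this is sent as one JSON line.
print(json.dumps(mcp_tool_call(1, "click_element", {"description": "close button"})))
```

In the MCP case the server decides how to act on the request; in the Computer Use case Claude returns `tool_use` blocks that the application must execute itself.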

#### Potential Integration Paths

Future OmniMCP development could:
1. **Dual Protocol Support**: Support both MCP and Anthropic-defined tools
2. **Container Option**: Provide a containerized deployment similar to ComputerUse
3. **Unified Approach**: Create a bridge between MCP and ComputerUse tools
4. **Feature Parity**: Incorporate ComputerUse capabilities while maintaining MCP compatibility

Both approaches have merits, and integrating aspects of ComputerUse could enhance OmniMCP's capabilities while preserving its lightweight nature and existing MCP integration.

## Features

- Visual UI analysis with OmniParser
- Natural language understanding with Claude
- Keyboard and mouse control with pynput
- Model Context Protocol integration
- Debug visualizations

## Structure

OmniMCP uses code from the OpenAdapt repository but with a minimal set of dependencies. The key components are:

- `omnimcp/pyproject.toml`: Minimal dependency list
- `omnimcp/setup.py`: Setup script that adds OpenAdapt to the Python path
- `omnimcp/omnimcp/` package:
  - `omnimcp/omnimcp/omnimcp.py`: Core OmniMCP functionality
  - `omnimcp/omnimcp/run_omnimcp.py`: CLI interface
  - `omnimcp/omnimcp/computer_use.py`: Computer Use integration
  - `omnimcp/omnimcp/pathing.py`: Python path configuration
  - `omnimcp/omnimcp/adapters/omniparser.py`: OmniParser client and provider
  - `omnimcp/omnimcp/mcp/server.py`: Model Context Protocol server implementation
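`pathing.py` is imported first in `__init__.py` so that OpenAdapt modules resolve without installing OpenAdapt. Its contents are not shown in this diff; the following is a minimal sketch of what such a module could do, assuming the package lives at `OpenAdapt/omnimcp/omnimcp/` so that the repository root is two directories above it. `add_repo_root_to_path` is a hypothetical name.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(package_dir: Path, levels_up: int = 2) -> Path:
    """Prepend the directory `levels_up` above `package_dir` to sys.path (once) and return it."""
    root = package_dir.resolve().parents[levels_up - 1]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Inside pathing.py this would be invoked as:
# add_repo_root_to_path(Path(__file__).parent)
```

With the repository root on `sys.path`, an import such as `from deploy.deploy.models.omniparser.deploy import Deploy` can resolve from the standalone package.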
23 changes: 23 additions & 0 deletions omnimcp/install.bat
@@ -0,0 +1,23 @@
@echo off
REM OmniMCP installation script for Windows

echo Creating virtual environment...
uv venv

echo Activating virtual environment...
call .venv\Scripts\activate.bat

echo Installing OmniMCP with minimal dependencies...
uv pip install -e .

echo.
echo OmniMCP installed successfully!
echo.
echo To activate the environment in the future:
echo call .venv\Scripts\activate.bat
echo.
echo To run OmniMCP:
echo omnimcp cli # For CLI mode
echo omnimcp server # For MCP server mode
echo omnimcp debug # For debug mode
echo.
35 changes: 35 additions & 0 deletions omnimcp/install.sh
@@ -0,0 +1,35 @@
#!/bin/bash

# OmniMCP installation script

# Create virtual environment
echo "Creating virtual environment..."
uv venv

# Activate virtual environment
echo "Activating virtual environment..."
if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "win32" ]]; then
source .venv/Scripts/activate
else
source .venv/bin/activate
fi

# Install OmniMCP
echo "Installing OmniMCP with minimal dependencies..."
uv pip install -e .

echo ""
echo "OmniMCP installed successfully!"
echo ""
echo "To activate the environment in the future:"
if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "win32" ]]; then
echo " source .venv/Scripts/activate"
else
echo " source .venv/bin/activate"
fi
echo ""
echo "To run OmniMCP:"
echo " omnimcp cli # For CLI mode"
echo " omnimcp server # For MCP server mode"
echo " omnimcp debug # For debug mode"
echo ""
9 changes: 9 additions & 0 deletions omnimcp/omnimcp/__init__.py
@@ -0,0 +1,9 @@
"""OmniMCP - Model Context Protocol for UI Automation."""

# Setup path to include OpenAdapt modules
from . import pathing

# Import from local modules
from .omnimcp import OmniMCP

__version__ = "0.1.0"