OmniMCP: Direct Host Control Bridge Between OmniParser and Claude MCP #947

Open
wants to merge 24 commits into
base: main
from


abrichr
Member

@abrichr abrichr commented Mar 16, 2025

OmniMCP: Direct Host Control Bridge Between OmniParser and Claude MCP

What Makes OmniMCP Unique

OmniMCP bridges Microsoft's OmniParser (for UI detection) with Anthropic's Model Context Protocol (MCP) to enable direct control of the host computer:

  • Automatic OmniParser Deployment: Deploys OmniParser to AWS automatically behind the scenes
  • Direct Host Control: Works on the host machine itself, not in a virtual machine
  • Cross-Platform Support: Uses pynput primitives for OS-agnostic computer control
  • MCP Integration: Embeds rich UI element descriptions directly into the MCP protocol
  • Claude's Intelligence Loop: Leverages Claude's reasoning rather than a custom decision loop

Unlike Computer Use (which runs in a VM with custom tools), OmniMCP provides a lightweight bridge that runs directly on the host and captures the entire screen, making it more flexible for general automation tasks outside a sandbox.
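
To illustrate the "direct host control" idea, here is a minimal sketch of turning a detected element into a click with pynput. It assumes bounding boxes arrive as normalized (x_min, y_min, x_max, y_max) fractions of the screen; the Element class and both function names are hypothetical and are not taken from the OmniMCP code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical UI element; bbox is (x_min, y_min, x_max, y_max) in the 0..1 range."""

    label: str
    bbox: tuple[float, float, float, float]


def center_in_pixels(element: Element, screen_w: int, screen_h: int) -> tuple[int, int]:
    """Map the center of a normalized bounding box to absolute screen pixels."""
    x_min, y_min, x_max, y_max = element.bbox
    cx = (x_min + x_max) / 2 * screen_w
    cy = (y_min + y_max) / 2 * screen_h
    return round(cx), round(cy)


def click_element(element: Element, screen_w: int, screen_h: int) -> None:
    """Move the real host cursor to the element's center and left-click once."""
    # Imported lazily so the geometry helper works without pynput or a display.
    from pynput.mouse import Button, Controller

    mouse = Controller()
    mouse.position = center_in_pixels(element, screen_w, screen_h)
    mouse.click(Button.left, 1)
```

Because this acts on the host rather than a sandbox, calling click_element moves the user's actual cursor; center_in_pixels can be checked on its own without any side effects.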

Key Improvements

Fixed OmniParser Auto-Deployment

  • Corrected import paths for deploy module
  • Added subnet creation for VPCs without existing subnets
  • Fixed key path handling to avoid permission issues
  • Enhanced EC2 instance discovery
  • Improved error handling and AWS resource management

Modular Package Structure

  • Self-contained directory with minimal dependencies
  • Clean separation from main OpenAdapt codebase
  • Simple configuration system focused on deployment

Three Operational Modes

CLI Mode:

  • Interactive command-line interface for entering commands
  • Maintains a session where you can issue multiple commands sequentially
  • Purpose: Direct human interaction for testing or simple automation

Server Mode:

  • Runs as a persistent server exposing UI automation via MCP protocol
  • Listens for external connections rather than accepting direct input
  • Purpose: Integration point for applications that need UI automation capabilities

Debug Mode:

  • One-time operation that visualizes and analyzes the current screen
  • Creates images showing detected UI elements with bounding boxes
  • Purpose: Troubleshooting what UI elements OmniParser detects
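
As a rough sketch of what a debug overlay of this kind involves, the snippet below scales normalized boxes to the image size and draws them with Pillow. The normalized box format, the function names, and the use of Pillow are assumptions made for illustration; the PR's debug mode may work differently.

```python
from __future__ import annotations


def to_pixel_box(bbox, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized (x_min, y_min, x_max, y_max) box to pixels, clamped to the image."""

    def clamp(value: float, upper: int) -> int:
        return max(0, min(upper, round(value)))

    x_min, y_min, x_max, y_max = bbox
    return (
        clamp(x_min * width, width - 1),
        clamp(y_min * height, height - 1),
        clamp(x_max * width, width - 1),
        clamp(y_max * height, height - 1),
    )


def annotate(image, elements):
    """Draw one labeled rectangle per (label, bbox) pair onto a PIL image."""
    # Imported lazily so to_pixel_box stays usable without Pillow installed.
    from PIL import ImageDraw

    draw = ImageDraw.Draw(image)
    width, height = image.size
    for label, bbox in elements:
        box = to_pixel_box(bbox, width, height)
        draw.rectangle(box, outline="red", width=2)
        draw.text((box[0] + 4, box[1] + 4), label, fill="red")
    return image
```

Scaling against the actual image size, rather than a fixed canvas, keeps the boxes aligned with the screenshot on any display resolution.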

Installation and Usage

# Install from within OpenAdapt repo
cd OpenAdapt/omnimcp
./install.sh  # (Unix/Mac) or install.bat (Windows)

# Run in CLI mode with OmniParser auto-deployment 
omnimcp cli --auto-deploy-parser --skip-confirmation

# Run as MCP server
omnimcp server --auto-deploy-parser --skip-confirmation

# Debug mode for visualizing UI elements
omnimcp debug --auto-deploy-parser --skip-confirmation
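
The command shape above (one subcommand per mode, with two shared flags) can be sketched with argparse as follows. This only models the interface shown in the commands; it is not the actual OmniMCP entry point, which may be built with a different CLI library, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring `omnimcp <mode> [--auto-deploy-parser] [--skip-confirmation]`."""
    parser = argparse.ArgumentParser(prog="omnimcp")

    # Flags shared by every mode live on a parent parser so they are declared once.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--auto-deploy-parser",
        action="store_true",
        help="deploy OmniParser automatically if no server is available",
    )
    common.add_argument(
        "--skip-confirmation",
        action="store_true",
        help="deploy without prompting for confirmation",
    )

    modes = parser.add_subparsers(dest="mode", required=True)
    for mode, description in (
        ("cli", "interactive command-line session"),
        ("server", "persistent MCP server"),
        ("debug", "one-time visualization of detected UI elements"),
    ):
        modes.add_parser(mode, parents=[common], help=description)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this layout, `parse_args(["debug", "--auto-deploy-parser"])` yields mode "debug" with auto_deploy_parser set and skip_confirmation left false.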

AWS Requirements

For OmniParser deployment to work properly:

  1. AWS credentials in .env file
  2. Default VPC (subnet will be created if needed)
  3. EC2 permissions (instances, security groups, key pairs)
  4. GPU instance quota (g4dn.xlarge - T4 GPU)
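
A small preflight check for the first requirement might look like the sketch below. The key names in REQUIRED_KEYS are the standard AWS environment variable names and are an assumption here; the authoritative list is whatever the PR's .env.example specifies. The parser handles only simple KEY=value lines.

```python
# Assumed key names; check them against the project's .env.example.
REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Running `missing_keys(open(".env").read())` before an auto-deploy would surface missing credentials early, instead of partway through EC2 provisioning.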

Key Implementation Files

  • omnimcp/omnimcp.py: Core implementation
  • omnimcp/adapters/omniparser.py: OmniParser client and deployment logic
  • omnimcp/mcp/server.py: MCP server implementation
  • deploy/models/omniparser/deploy.py: AWS deployment script with fixes

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

This commit adds OmniMCP, a system that enables Claude to control the computer using the Model Context Protocol.

Key components:
- OmniParser adapter for UI element detection
- MCP server implementation
- CLI interface for commands and debugging
- Comprehensive documentation

OmniMCP combines OmniParser's visual understanding with Claude's natural language capabilities to automate UI interactions.
abrichr and others added 10 commits March 15, 2025 21:29
- Create dedicated omnimcp folder with pyproject.toml and setup.py
- Add installation scripts for Windows (install.bat) and Unix (install.sh)
- Set up minimal package structure that uses OpenAdapt imports
- Configure entry points for CLI commands

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updated comment in omnimcp.py to use "CLI mode" instead of "interactively"
for consistency with other documentation and code.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace hardcoded 800x600 visualization size with actual monitor dimensions
from utils.get_monitor_dims() to ensure consistent scaling across different
display configurations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Mark install.sh as executable for Unix/Mac users
- Add a note to the README about permissions in case Git doesn't preserve them

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Create a dedicated pathing.py module for OpenAdapt path management
- Add descriptive error messages for troubleshooting import issues
- Centralize path setup logic with proper error handling
- Update importing modules to use the new path handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add lazy imports for BeautifulSoup in utils.py functions
- Add jinja2 to OmniMCP dependencies
- Simplify setup.py to use dependencies from pyproject.toml
- Preserve OpenAdapt path handling in setup.py

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add posthog to OmniMCP dependencies
- Keep BeautifulSoup lazy loaded in utils.py functions
- Revert DistinctIDPosthog class to its original implementation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add multiprocessing-utils to OmniMCP dependencies
- Restore original implementation of process_local storage
- Add development command to README.md for resetting environment

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add numpy as a dependency for array operations
- Required by utils.py

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add orjson as a dependency for fast JSON handling
- Required by utils.py

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@abrichr abrichr force-pushed the feat/omnimcp-clean branch from 77b04c2 to b30c6a7 Compare March 16, 2025 03:15
abrichr and others added 11 commits March 15, 2025 23:17
- Add dictalchemy for SQLAlchemy dict utilities
- Required for openadapt.db module

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Update models.py to use string literals for BeautifulSoup types
- Allow OmniMCP to run without BeautifulSoup dependency

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add joblib for caching functionality
- Required by openadapt.cache module

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add boto3 and botocore for AWS SDK
- Required for deploying OmniParser service

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add allow_no_parser flag to make it explicit when running without OmniParser
- Fail by default if OmniParser server is not available
- Update README with clear instructions for OmniParser configuration
- Add TODO for future Anthropic ComputerUse integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add detailed comparison of OmniMCP and Anthropic ComputerUse approaches
- Describe key architectural differences and integration opportunities
- Add TODO comment for future ComputerUse integration possibilities

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add auto-deploy functionality with user confirmation
- Add skip-confirmation flag to deploy without prompting
- Add TODO for simplified AWS configuration in the future
- Update documentation with new options and deployment scenarios
- Expand README with detailed OmniParser configuration instructions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Added an environment variable override for PROJECT_NAME
- Added .env.example to show required AWS credentials
- Updated README with clearer installation instructions
- Added CLAUDE.md with important command notes
- Added paramiko dependency for OmniParser deployment
- Modified omnimcp.py to ensure PROJECT_NAME consistency
- Simplified openadapt/adapters/__init__.py imports
This is a work-in-progress commit that:
1. Moves OmniMCP, OmniParser adapter, and MCP server to omnimcp package
2. Updates imports and dependencies to match new structure
3. Adds Computer Use integration (loop.py) as a demo
4. Updates setup.py to include the new entry points

Still TODO:
- Ensure all imports from OpenAdapt are minimal (just utils.py)
- Finish testing the OmniParser + MCP integration
- Clean up any remaining references to OpenAdapt
This commit makes OmniMCP more independent from OpenAdapt:
1. Create a local config.py to replace openadapt.config dependency
2. Use the Anthropic SDK directly instead of openadapt.drivers.anthropic
3. Update the Claude model to use latest versions (3.5/3.7)
4. Replace run_omnimcp.py with a local implementation
5. Update imports throughout the codebase to use local modules
- Fixed import path in omniparser.py to use correct deploy.deploy.models.omniparser.deploy
- Added subnet creation for VPCs without subnets
- Fixed key path handling to avoid permission issues
- Improved EC2 instance discovery to connect to remote server
- Enhanced documentation in CLAUDE.md with detailed troubleshooting steps
- Added PROJECT_NAME to .env.example for consistency
- Fixed string formatting in deploy.py Docker commands

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@abrichr abrichr changed the title Implement OmniMCP for Claude computer control OmniMCP: Direct Host Control Bridge Between OmniParser and Claude MCP Mar 16, 2025