camel-ai/browser_agent

Hybrid Browser MCP

A lightweight MCP server that exposes the CAMEL framework's HybridBrowserToolkit as MCP-compatible tools.

Overview

This project provides an MCP (Model Context Protocol) interface for CAMEL's HybridBrowserToolkit, enabling browser automation through a standardized protocol. It allows LLM-based applications to control web browsers, navigate pages, interact with elements, and capture screenshots.

Key features:

  • Full browser automation capabilities (click, type, navigate, etc.)
  • Screenshot capture with visual element identification
  • Multi-tab management
  • JavaScript execution in browser console
  • Async operation support

Installation

You can install the package directly from source:

git clone https://github.com/yourusername/hybrid-browser-mcp.git
cd hybrid-browser-mcp
pip install -e .

Or using pip:

pip install hybrid-browser-mcp

Claude Desktop Configuration

To use this MCP server with Claude Desktop, add it to your configuration file.

Configuration File Location

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json
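As a rough sketch, the three locations above can be resolved from a script. The function name claude_config_path and its parameters are hypothetical helpers for illustration, not part of this package:

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform=None, home=None, appdata=None):
    """Return the expected location of the Claude Desktop config file.

    The arguments exist only so the lookup can be exercised without
    touching the real environment; by default they are read from the system.
    """
    platform = platform or sys.platform
    home = Path(home) if home else Path.home()
    if platform == "darwin":  # macOS
        return home / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):  # Windows
        base = appdata or os.environ.get("APPDATA", "")
        return Path(base) / "Claude" / CONFIG_NAME
    # Everything else is treated as Linux.
    return home / ".config" / "Claude" / CONFIG_NAME


print(claude_config_path())
```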

Configuration

Add the following to your claude_desktop_config.json:

{
  "mcpServers": {
    "hybrid-browser": {
      "command": "python",
      "args": [
        "-m",
        "hybrid_browser_mcp.server"
      ]
    }
  }
}

Make sure to:

  1. Use the correct path to your Python interpreter (you can find it with which python)
  2. Ensure the package is installed in that Python environment
  3. Restart Claude Desktop completely after updating the configuration
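If the configuration file already lists other servers, the entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge; add_hybrid_browser is a hypothetical helper, and it assumes you want the interpreter that runs the script (sys.executable) as the server command:

```python
import json
import sys


def add_hybrid_browser(config, python_path=sys.executable):
    """Return a copy of config with the hybrid-browser entry added."""
    merged = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["hybrid-browser"] = {
        "command": python_path,
        "args": ["-m", "hybrid_browser_mcp.server"],
    }
    merged["mcpServers"] = servers
    return merged


# "other-server" stands in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_hybrid_browser(existing), indent=2))
```

Write the result back to claude_desktop_config.json only after checking the printed output, and keep a backup of the original file.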

Verify Connection

After restarting Claude Desktop:

  1. Click the 🔌 (plug icon) in the conversation interface
  2. You should see "hybrid-browser" listed among available tools
  3. The browser automation tools will be available (browser_open, browser_click, etc.)

Configuration Success Example:

[Image: claude_desktop_config.json with hybrid-browser MCP server configured]

Browser Tools in Action:

[Image: Using browser automation tools in Claude Desktop to interact with web pages]

Browser Configuration

The browser behavior is configured through hybrid_browser_mcp/config.py. You can modify this file to customize the browser settings:

BROWSER_CONFIG = {
    "headless": False,              # Run browser in headless mode
    "stealth": True,                # Enable stealth mode
    "viewport_limit": False,        # Include all elements in snapshots
    "cache_dir": "tmp/",           # Cache directory for screenshots
    "enabled_tools": [             # List of enabled browser tools
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_som_screenshot",
        "browser_click", "browser_type", "browser_select",
        "browser_scroll", "browser_enter", "browser_mouse_control",
        "browser_mouse_drag", "browser_press_key", "browser_switch_tab",
        # Uncomment to enable additional tools:
        # "browser_get_page_snapshot",
        # "browser_close_tab",
        # "browser_console_view",
        # "browser_console_exec",
    ],
}

Configuration Options

Option          Description                                             Default  Type
headless        Run browser in headless mode (no window)                False    bool
stealth         Enable stealth mode to avoid detection                  False    bool
viewport_limit  Only include elements in current viewport in snapshots  False    bool
cache_dir       Directory for storing cache files                       "tmp/"   str
enabled_tools   List of enabled tools                                   None*    list or None

*When enabled_tools is None, these default tools are enabled: browser_open, browser_close, browser_visit_page, browser_back, browser_forward, browser_click, browser_type, browser_switch_tab
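One way to picture how these defaults combine with user settings is the sketch below. It is an assumption about the behavior, not the package's actual code: resolve_config, DEFAULTS, and DEFAULT_TOOLS are hypothetical names, and the values are copied from the table and footnote above.

```python
# Defaults as listed in the options table (assumed, not read from the package).
DEFAULTS = {
    "headless": False,
    "stealth": False,
    "viewport_limit": False,
    "cache_dir": "tmp/",
    "enabled_tools": None,
}

# Tools enabled when enabled_tools is None, per the footnote.
DEFAULT_TOOLS = [
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_click",
    "browser_type", "browser_switch_tab",
]


def resolve_config(user_config):
    """Overlay user settings on the defaults and expand enabled_tools=None."""
    unknown = sorted(set(user_config) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    config = {**DEFAULTS, **user_config}
    if config["enabled_tools"] is None:
        config["enabled_tools"] = list(DEFAULT_TOOLS)
    return config


print(resolve_config({"headless": True}))
```

Rejecting unknown keys is a design choice in this sketch; it surfaces typos such as "headles" instead of silently ignoring them.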

Example Configurations

1. Headless mode for automation:

USER_BROWSER_CONFIG = {
    "headless": True,
}

2. Stealth mode with visible browser:

USER_BROWSER_CONFIG = {
    "headless": False,
    "stealth": True,
}

3. Limited tools for safety:

USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open",
        "browser_visit_page",
        "browser_get_page_snapshot",
        "browser_close",
    ],
}

4. Enable all available tools:

USER_BROWSER_CONFIG = {
    "enabled_tools": [
        "browser_open", "browser_close", "browser_visit_page",
        "browser_back", "browser_forward", "browser_get_page_snapshot",
        "browser_get_som_screenshot", "browser_click", "browser_type",
        "browser_select", "browser_scroll", "browser_enter",
        "browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
        "browser_mouse_control", "browser_mouse_drag", "browser_press_key",
        "browser_wait_user", "browser_console_view", "browser_console_exec",
    ],
}

Available Tools

The server exposes the following browser control tools:

Core Navigation

  • browser_open(): Opens a new browser session
  • browser_close(): Closes the browser session
  • browser_visit_page(url): Navigates to a specific URL
  • browser_back(): Goes back in browser history
  • browser_forward(): Goes forward in browser history

Page Interaction

  • browser_click(ref): Clicks on an element by its reference ID
  • browser_type(ref, text, inputs): Types text into input fields
  • browser_select(ref, value): Selects an option in a dropdown
  • browser_scroll(direction, amount): Scrolls the page
  • browser_enter(): Presses the Enter key
  • browser_press_key(keys): Presses specific keyboard keys

Page Analysis

  • browser_get_page_snapshot(): Gets a textual snapshot of interactive elements
  • browser_get_som_screenshot(read_image, instruction): Captures a screenshot with element annotations
  • list_browser_functions(): Lists all available browser functions

Tab Management

  • browser_switch_tab(tab_id): Switches to a different browser tab
  • browser_close_tab(tab_id): Closes a specific tab
  • browser_get_tab_info(): Gets information about all open tabs

Advanced Features

  • browser_console_view(): Views console logs
  • browser_console_exec(code): Executes JavaScript in the browser console
  • browser_mouse_control(control, x, y): Controls mouse actions at coordinates
  • browser_mouse_drag(from_ref, to_ref): Drags elements
  • browser_wait_user(timeout_sec): Waits for user input
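Because enabled_tools is a list of plain strings, a misspelled name is easy to miss. The sketch below checks a list against the tool names documented in this README and suggests the closest match; check_enabled_tools and KNOWN_TOOLS are hypothetical names, and the list is copied from the "Enable all available tools" example rather than read from the package.

```python
import difflib

# Tool names as documented in this README (not queried from the server).
KNOWN_TOOLS = [
    "browser_open", "browser_close", "browser_visit_page",
    "browser_back", "browser_forward", "browser_get_page_snapshot",
    "browser_get_som_screenshot", "browser_click", "browser_type",
    "browser_select", "browser_scroll", "browser_enter",
    "browser_switch_tab", "browser_close_tab", "browser_get_tab_info",
    "browser_mouse_control", "browser_mouse_drag", "browser_press_key",
    "browser_wait_user", "browser_console_view", "browser_console_exec",
]


def check_enabled_tools(enabled):
    """Map each unrecognized tool name to its closest known name, or None."""
    report = {}
    for name in enabled:
        if name not in KNOWN_TOOLS:
            close = difflib.get_close_matches(name, KNOWN_TOOLS, n=1)
            report[name] = close[0] if close else None
    return report


print(check_enabled_tools(["browser_open", "browser_clik"]))
```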

Example Usage

# Open browser and navigate
await browser_open()
await browser_visit_page("https://www.google.com")

# Get page snapshot to see available elements
snapshot = await browser_get_page_snapshot()
print(snapshot)

# Interact with elements ("search-input" is a placeholder; use a ref reported by the snapshot)
await browser_type(ref="search-input", text="CAMEL AI framework")
await browser_enter()

# Take a screenshot
await browser_get_som_screenshot()

# Close browser
await browser_close()
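The calls above read most naturally as tool invocations made by an MCP client rather than as importable Python functions. As a hedged sketch, the same flow can be driven from a script with the official MCP Python SDK (the mcp package, assumed to be installed separately). It assumes the server exposes the tools under the names listed in this README and that browser_visit_page takes a url argument:

```python
import asyncio

# Command and arguments taken from the Claude Desktop configuration in this README.
SERVER_COMMAND = "python"
SERVER_ARGS = ["-m", "hybrid_browser_mcp.server"]


async def main() -> None:
    # Imported here so this module still loads when the mcp package is missing.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=SERVER_COMMAND, args=SERVER_ARGS)
    # Start the server as a subprocess and talk to it over stdio.
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Show which tools the server actually advertises.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            await session.call_tool("browser_open", arguments={})
            await session.call_tool(
                "browser_visit_page", arguments={"url": "https://www.google.com"}
            )
            await session.call_tool("browser_close", arguments={})


# To run against an installed server: asyncio.run(main())
```

The final call is left commented out because it launches a real browser session.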

Architecture

The server works by:

  1. Wrapping CAMEL's HybridBrowserToolkit with async support
  2. Exposing toolkit methods as MCP-compatible tools
  3. Managing a singleton browser instance per session
  4. Handling WebSocket communication for real-time browser control
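Point 3, the singleton browser instance, can be illustrated with a small lazy-initialization pattern. This is a sketch of the general technique under stated assumptions, not the server's implementation: BrowserSession and the factory argument are hypothetical, and a stand-in object replaces the real toolkit.

```python
import asyncio


class BrowserSession:
    """Create one shared toolkit instance on first use and reuse it afterwards."""

    def __init__(self, factory):
        self._factory = factory  # async callable that builds the toolkit
        self._toolkit = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock keeps concurrent tool calls from creating two browsers.
        async with self._lock:
            if self._toolkit is None:
                self._toolkit = await self._factory()
            return self._toolkit

    async def reset(self):
        # Drop the instance so the next get() builds a fresh one.
        async with self._lock:
            self._toolkit = None


async def demo():
    created = []

    async def fake_toolkit():
        # Stand-in for constructing the real toolkit.
        created.append(1)
        await asyncio.sleep(0)
        return object()

    session = BrowserSession(fake_toolkit)
    first, second = await asyncio.gather(session.get(), session.get())
    return first is second, len(created)


print(asyncio.run(demo()))
```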

Development

To set up a development environment:

pip install -e ".[dev]"

Run tests:

pytest

Troubleshooting

Server Not Appearing in Claude Desktop

  1. Check if the package is installed correctly:

    # Should output the path to the executable
    which hybrid-browser-mcp
  2. Test the server manually:

    hybrid-browser-mcp
    # Should start without errors
    # Press Ctrl+C to stop
  3. Check Claude Desktop logs for errors:

    # macOS
    tail -f ~/Library/Logs/Claude/mcp*.log
    
    # Windows (PowerShell)
    Get-Content "$env:APPDATA\Claude\logs\mcp*.log" -Tail 20 -Wait
  4. Verify the configuration file:

    # macOS
    cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
    
    # Windows (Command Prompt)
    type %APPDATA%\Claude\claude_desktop_config.json
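Reading the file by eye can miss small mistakes such as a trailing comma or a comment, neither of which is valid JSON. The sketch below parses the text and checks for the entry described in this README; check_config is a hypothetical helper, and the checks reflect only the configuration shown above.

```python
import json


def check_config(text):
    """Return a list of problems found in a claude_desktop_config.json string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or "hybrid-browser" not in servers:
        return ['no "hybrid-browser" entry under "mcpServers"']

    problems = []
    server = servers["hybrid-browser"]
    if not server.get("command"):
        problems.append('"command" is missing or empty')
    if "hybrid_browser_mcp.server" not in server.get("args", []):
        problems.append('"args" does not reference hybrid_browser_mcp.server')
    return problems


sample = '{"mcpServers": {"hybrid-browser": {"command": "python", "args": ["-m", "hybrid_browser_mcp.server"]}}}'
print(check_config(sample) or "configuration looks fine")
```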

Common Issues

Issue: "Command not found" error

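Solution: Use the full path to your Python interpreter in your configuration (replace /usr/bin/python3 with your own path; JSON does not allow comments, so keep the file free of them):

{
  "mcpServers": {
    "hybrid-browser": {
      "command": "/usr/bin/python3",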
      "args": ["-m", "hybrid_browser_mcp.server"]
    }
  }
}

Issue: Browser doesn't open or shows errors

Solution: The HybridBrowserToolkit uses a TypeScript-based browser controller that runs on Node.js. It will automatically download and manage browser binaries. If you encounter issues:

  1. Ensure Node.js is installed on your system
  2. The TypeScript server will start automatically when needed
  3. Browser binaries will be downloaded on first use
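A quick way to confirm the first point is to look for a node executable on the PATH. The sketch below does that with the standard library; node_version is a hypothetical helper, and its which and run parameters exist only so the check can be exercised without Node.js installed.

```python
import shutil
import subprocess


def node_version(which=shutil.which, run=subprocess.run):
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = which("node")
    if node is None:
        return None
    result = run([node, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


version = node_version()
print(f"Node.js found: {version}" if version else "Node.js not found on PATH")
```

Keep in mind that Claude Desktop may start the server with a different PATH than your shell, so a successful check in a terminal is a hint rather than a guarantee.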

Debug Mode

To see detailed logs, you can run the server with debug output:

python -m hybrid_browser_mcp.server 2> debug.log

Then check debug.log for any error messages.

License

This project is licensed under the MIT License - see the LICENSE file for details.
