BrowserGenie MCP Server

An MCP (Model Context Protocol) server that gives AI models full control over a Chrome browser. It pairs with the BrowserGenie Extension to expose 50+ browser automation tools over stdio — navigation, clicking, typing, screenshots, touch gestures, macro recording, and complete DevTools access.

Two-repo setup: This is the server half. The Chrome extension lives in a separate repository. Both are required.

How It Works

AI Client (Claude, Cursor, etc.)
    │  stdio  (JSON-RPC / MCP)
    ▼
MCP Server  ◄── this repo
    │  WebSocket  ws://localhost:7890
    ▼
Chrome Extension
    ├── chrome.tabs         → Navigation, tab management
    ├── chrome.debugger     → DevTools Protocol (CDP)
    ├── chrome.scripting    → Content script injection
    ├── chrome.cookies      → Cookie management
    └── Content Scripts      → Real DOM event simulation

The MCP server bridges your AI client (over stdio) and the Chrome extension (over WebSocket). Each MCP tool call is forwarded to the extension and executed in the browser, and the result is returned to the AI client.
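The forwarding step relies on matching each response from the extension to the request that caused it. A minimal sketch of that request/response correlation follows. The message shapes and the `RequestCorrelator` class are hypothetical illustrations, not the actual contents of `websocket-bridge.ts`, and `send` stands in for the WebSocket's send method.

```typescript
// Hypothetical message shapes; the real ones are defined in the project and may differ.
interface BridgeRequest { id: number; tool: string; args: Record<string, unknown>; }
interface BridgeResponse { id: number; result?: unknown; error?: string; }

type Pending = { resolve: (value: unknown) => void; reject: (error: Error) => void };

class RequestCorrelator {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (raw: string) => void;

  // `send` is an assumed stand-in for the WebSocket connection's send method.
  constructor(send: (raw: string) => void) {
    this.send = send;
  }

  // Forward a tool call and return a promise that settles when the reply arrives.
  call(tool: string, args: Record<string, unknown>): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      const request: BridgeRequest = { id, tool, args };
      this.send(JSON.stringify(request));
    });
  }

  // Feed every message received from the extension into this method.
  // Returns false when the id does not match any outstanding request.
  handleMessage(raw: string): boolean {
    const response = JSON.parse(raw) as BridgeResponse;
    const entry = this.pending.get(response.id);
    if (entry === undefined) return false;
    this.pending.delete(response.id);
    if (response.error !== undefined) entry.reject(new Error(response.error));
    else entry.resolve(response.result);
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

Keying the pending map by a numeric id lets several tool calls be in flight at once without their results getting mixed up.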

Requirements

Installation

Option A — npx (recommended)

No installation required. Run directly from npm:

npx browser-genie-mcp-server

Or install globally:

npm install -g browser-genie-mcp-server
browser-genie-mcp-server

Option B — Run from source

For development or to use the latest unreleased changes:

git clone https://github.com/BrowserGenie/mcp.git
cd mcp
npm install
npm run build
node dist/index.js

Configuration

Add the server to your AI client's MCP configuration.

Claude Desktop

File: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
File: %APPDATA%\Claude\claude_desktop_config.json (Windows)

{
  "mcpServers": {
    "browser-genie": {
      "command": "npx",
      "args": ["browser-genie-mcp-server"]
    }
  }
}

Claude Code

File: ~/.claude/settings.json or project-level .mcp.json:

{
  "mcpServers": {
    "browser-genie": {
      "command": "npx",
      "args": ["browser-genie-mcp-server"]
    }
  }
}

Cursor / other MCP clients

{
  "mcpServers": {
    "browser-genie": {
      "command": "npx",
      "args": ["browser-genie-mcp-server"]
    }
  }
}

If you installed the package globally with npm install -g browser-genie-mcp-server, you can use "command": "browser-genie-mcp-server" with no args instead.

After saving the config, restart your AI client.

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| WEBSOCKET_PORT | 7890 | Port the WebSocket server listens on. The extension must use the same port. |

Pass via the MCP config env block:

{
  "mcpServers": {
    "browser-genie": {
      "command": "npx",
      "args": ["browser-genie-mcp-server"],
      "env": { "WEBSOCKET_PORT": "8080" }
    }
  }
}

If you change the port, also update WEBSOCKET_URL in constants.ts inside the extension repo.
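As an illustration of how the port setting could be resolved, here is a small sketch that reads `WEBSOCKET_PORT` from an environment map, falls back to 7890, and rejects values that are not valid ports. The function names are hypothetical and the server's actual parsing may differ.

```typescript
const DEFAULT_WEBSOCKET_PORT = 7890;

// Hypothetical helper: resolve the port from an environment map such as process.env.
function resolveWebSocketPort(env: Record<string, string | undefined>): number {
  const raw = env["WEBSOCKET_PORT"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_WEBSOCKET_PORT;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid WEBSOCKET_PORT: ${raw}`);
  }
  return port;
}

// The URL the extension would need to connect to for a given port.
function webSocketUrl(port: number): string {
  return `ws://localhost:${port}`;
}
```

With the `env` block above, `webSocketUrl(resolveWebSocketPort({ WEBSOCKET_PORT: "8080" }))` yields `ws://localhost:8080`, which is the value the extension side has to match.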

Verifying the Connection

Once the server is running and the extension is loaded in Chrome, click the extension icon. The popup should show a green Connected indicator. If it shows Disconnected, ensure the MCP server process is running.

MCP Tools Reference

Every tool accepts two optional parameters: apiKey (string), which is needed when authentication is enabled in the extension popup, and tabId (number), which targets a specific tab (default: the active tab).

Tip: If you see "No active tab found" or "Cannot access a chrome:// URL", use list_tabs to find a valid tab ID, or navigate to a regular web page first.
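To make the shared parameters concrete, the sketch below builds the JSON-RPC `tools/call` request that an MCP client sends over stdio, merging the optional apiKey and tabId into the tool arguments. The `buildToolCall` helper is hypothetical; in practice your AI client constructs these requests for you.

```typescript
// Parameters that every tool accepts in addition to its own.
interface CommonToolArgs { apiKey?: string; tabId?: number; }

// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that merges the common parameters into the tool arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
  common: CommonToolArgs = {},
): ToolCallRequest {
  const merged: Record<string, unknown> = { ...args };
  if (common.apiKey !== undefined) merged["apiKey"] = common.apiKey;
  if (common.tabId !== undefined) merged["tabId"] = common.tabId;
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: merged } };
}

// Example: navigate a specific tab rather than the active one.
const exampleRequest = buildToolCall(
  1,
  "navigate_to_url",
  { url: "https://example.com" },
  { tabId: 42 },
);
```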

Navigation

| Tool | Description | Key Parameters |
| --- | --- | --- |
| navigate_to_url | Navigate to a URL | url (required) |
| navigate_back | Go back in history | |
| navigate_forward | Go forward in history | |
| navigate_reload | Reload the page | ignoreCache (bool) |

Tab Management

| Tool | Description | Key Parameters |
| --- | --- | --- |
| list_tabs | List all open tabs | |
| select_tab | Focus a tab | tabId (required) |
| new_tab | Open a new tab | url (optional) |
| close_tab | Close a tab | tabId (required) |
| get_tab_state | Capture URL, title, and DOM hash for state comparison | |
| assert_tabs_match | Verify two tabs have identical state | tabIdA, tabIdB |
| test_storage_sync | Test cross-tab localStorage sync | tabIdA, tabIdB, key, value |

Keyboard

| Tool | Description | Key Parameters |
| --- | --- | --- |
| press_key | Press a key with optional modifiers | key, modifiers[] |
| type_text | Type text character by character | text, delay (ms) |

Mouse & Interaction

| Tool | Description | Key Parameters |
| --- | --- | --- |
| click_element | Click via coordinates, CSS, or XPath with human-like Bézier curve movement and randomized delays | target.type, target.value, button, doubleClick |
| input_and_type | Click a field, optionally clear it, then type text with per-character jitter | selector, text, clearFirst, submit |
| drag_and_drop | Drag from one point to another | from, to |
| hover_element | Hover to trigger CSS states / tooltips with randomized dwell time | target |

Click behavior: doubleClick: true fires two rapid clicks at the same position. The element's own handlers determine focus/select behavior — the tool does not automatically select text or set focus beyond what the browser does natively.
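The Bézier curve movement mentioned for click_element can be pictured with the sketch below, which samples a cubic Bézier path between two points and bends it with randomized control points. This is only an illustration under assumed parameters (the 20% sideways offset and the function names are hypothetical), not the extension's actual movement code.

```typescript
interface Point { x: number; y: number; }

// Evaluate a cubic Bézier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Sample steps + 1 points from `from` to `to`. The two control points sit at one third
// and two thirds of the way, pushed sideways by a random fraction of the travel distance.
function mousePath(
  from: Point,
  to: Point,
  steps: number,
  random: () => number = Math.random,
): Point[] {
  if (!Number.isInteger(steps) || steps < 1) throw new Error("steps must be a positive integer");
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Assumed offset range: up to 20% of the distance, to either side of the straight line.
  const j1 = (random() - 0.5) * 0.4;
  const j2 = (random() - 0.5) * 0.4;
  // (-dy, dx) is perpendicular to the direction of travel.
  const c1 = { x: from.x + dx / 3 - dy * j1, y: from.y + dy / 3 + dx * j1 };
  const c2 = { x: from.x + (2 * dx) / 3 - dy * j2, y: from.y + (2 * dy) / 3 + dx * j2 };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(from, c1, c2, to, i / steps));
  }
  return path;
}
```

Passing a fixed `random` function makes the path deterministic, which is convenient when testing.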

Touch Gestures

| Tool | Description | Key Parameters |
| --- | --- | --- |
| swipe | Touch swipe from point A to B with configurable duration | from, to, duration |
| long_press | Long-press on an element or coordinates | target, duration |
| pinch | Pinch zoom with two-finger convergence/divergence | center, startRadius, endRadius |
| double_tap | Double-tap for mobile interactions (zoom, edit) | target, interval |

Ensure the viewport is set to mobile with touch: true via resize_viewport or emulate_device before using touch gestures.

Screenshots

| Tool | Description | Key Parameters |
| --- | --- | --- |
| screenshot_viewport | Capture visible viewport | format, quality |
| screenshot_full_page | Capture full scrollable page | format, quality |

Both return an image content block.

DevTools — Sources

| Tool | Description | Key Parameters |
| --- | --- | --- |
| read_page_html | Full outerHTML of the page | |
| read_stylesheets | CSS stylesheet sources | url (filter) |
| read_scripts | JavaScript sources | url (filter) |
| read_page_resources | List all resources with URLs & sizes | type filter |
| find_in_source | Search regex pattern across HTML and all loaded scripts | pattern, contextLines |

DevTools — Modify

| Tool | Description | Key Parameters |
| --- | --- | --- |
| modify_html | Live DOM mutation | selector, action, value, attributeName |
| modify_css | Set inline styles | selector, styles (object) |

DevTools — Network

| Tool | Description | Key Parameters |
| --- | --- | --- |
| get_network_logs | Get collected request/response logs | filter.urlPattern, filter.method, filter.statusCode |
| get_network_request_detail | Full details of one request | requestId, includeBody |
| clear_network_logs | Clear collected logs | |
| get_network_errors | Get only failed/errored requests (4xx, 5xx, failed) | clear |

Network logs are collected only from the moment the debugger attaches. To capture specific traffic, call any DevTools tool beforehand so that the debugger is already attached when the requests are made.
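The filter fields of get_network_logs and the error categories of get_network_errors can be read as simple predicates over log entries. The sketch below shows one way to express them. The `NetworkLogEntry` shape is hypothetical, and treating urlPattern as a regular expression is an assumption; the extension may match differently.

```typescript
// Hypothetical shape of one collected log entry.
interface NetworkLogEntry {
  requestId: string;
  url: string;
  method: string;
  statusCode?: number; // absent when no response was received
  failed?: boolean;    // true when the request failed at the network level
}

interface NetworkLogFilter { urlPattern?: string; method?: string; statusCode?: number; }

// Keep only the entries that satisfy every filter field that is set.
function filterNetworkLogs(logs: NetworkLogEntry[], filter: NetworkLogFilter): NetworkLogEntry[] {
  // Assumption: urlPattern is interpreted as a regular expression.
  const pattern = filter.urlPattern !== undefined ? new RegExp(filter.urlPattern) : undefined;
  const method = filter.method?.toUpperCase();
  return logs.filter((entry) =>
    (pattern === undefined || pattern.test(entry.url)) &&
    (method === undefined || entry.method.toUpperCase() === method) &&
    (filter.statusCode === undefined || entry.statusCode === filter.statusCode));
}

// An entry counts as an error if it failed outright or returned a 4xx/5xx status.
function isNetworkError(entry: NetworkLogEntry): boolean {
  return entry.failed === true || (entry.statusCode !== undefined && entry.statusCode >= 400);
}
```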

DevTools — Storage

| Tool | Description |
| --- | --- |
| get_cookies / set_cookie / delete_cookie | Cookie CRUD |
| get_local_storage / set_local_storage / remove_local_storage | localStorage |
| get_session_storage / set_session_storage / remove_session_storage | sessionStorage |

DevTools — Console

| Tool | Description | Key Parameters |
| --- | --- | --- |
| get_console_logs | Retrieve console messages by level | level, clear |
| execute_javascript | Run JS in the page and return result | expression |

Accessibility & Auditing

| Tool | Description | Key Parameters |
| --- | --- | --- |
| browser_snapshot | Text-based accessibility tree snapshot | |
| get_accessibility_tree | Raw accessibility tree as JSON | selector (optional filter) |
| run_accessibility_audit | Run axe-core WCAG audit | selector, tags |
| check_color_contrast | Check text contrast ratios | selector |
| get_tab_order | Static list of focusable elements in tab order with unique selectors | |
| record_focus_path | Interactive: simulate Tab presses and record where focus lands, flagging invisible targets | steps |
| get_performance_metrics | Single snapshot of navigation timing, LCP, CLS, FID, memory | |
| record_performance_timeline | Start/stop/get timeline recording of memory, LCP, CLS over time | action, interval |
| check_font_loading | Verify web font loading status | |
| audit_broken_resources | Find broken images, stylesheets, fonts | |
| check_security_headers | Inspect CSP, HSTS, X-Frame-Options, etc. | url |
| detect_cookie_banners | Detect cookie consent banners and CMP patterns | |

When to use get_tab_order vs record_focus_path: Use get_tab_order for a one-time snapshot of all focusable elements and their order. Use record_focus_path when you want to verify the actual focus behavior during keyboard navigation — it presses Tab repeatedly and records where focus lands, catching invisible or hidden focus traps.
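For check_color_contrast in the table above, it may help to see what a contrast ratio is. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas for sRGB colors. It shows the standard calculation only; how the tool itself extracts colors and evaluates them is not described here, so treat the function names as hypothetical.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0 to 255

// Linearize one sRGB channel as defined by WCAG 2.x.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: Rgb, background: Rgb): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA asks for at least 4.5:1 for normal text and 3:1 for large text.
function meetsAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```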

Element Inspection

| Tool | Description | Key Parameters |
| --- | --- | --- |
| find_element | Find by text, role, aria-label, CSS, or XPath | text, role, css, xpath, nth |
| get_element_state | Get exists, visible, enabled, focused, checked, etc. | selector, selectorType |
| query_shadow_dom | Query inside a single shadow root | hostSelector, innerSelector |
| deep_query_shadow_dom | Query through nested shadow roots by host path | hostPath[], innerSelector |
| get_shadow_dom_tree | Return full shadow DOM tree as JSON | hostSelector, maxDepth |
| get_computed_styles | Get computed CSS styles for an element | selector, properties, pseudoElement |

QA & Assertions

| Tool | Description | Key Parameters |
| --- | --- | --- |
| assert_element | Assert conditions on an element (exists, visible, text, etc.) | assertion, selector |
| assert_no_console_errors | Assert zero console errors/warnings | level, clear |
| assert_no_network_errors | Assert zero failed network requests | clear |
| assert_css_property | Assert computed style value | selector, property, expected |
| assert_network_request_made | Assert a matching request was made | urlPattern, method, minCount |
| assert_page_load_time | Assert navigation timing is within threshold | threshold |
| check_form_validity | Check HTML5 form validation state | selector, checkAll |
| tab_to_next | Simulate Tab key and track focus movement | direction, shift |
| set_input_files | Set files on a file input | selector, files |
| emulate_network_conditions | Simulate slow/offline network | offline, latency, throughput |
| intercept_requests | Block, modify, or allow network requests | action, urlPattern |
| snapshot_page_state | Capture full page state (HTML, storage, cookies) | |
| restore_page_state | Restore from a snapshot | snapshot |
| wait_for_condition | Poll a JS expression until true or timeout | expression, timeout, interval |
| stress_test_refresh | Refresh page N times and run assertion each time | iterations, assertionScript, bypassCache |
| get_all_issues | Unified diagnostic: console + network + resource errors in one call | includeConsole, includeNetwork, includeResources |

Interaction Inspection

| Tool | Description | Key Parameters |
| --- | --- | --- |
| hover_and_inspect | Hover and capture DOM/style changes | target, captureChanges |
| force_pseudo_state | Force hover/focus/active and read computed styles | selector, pseudoState |
| get_tooltip_text | Extract tooltip text from title, aria-describedby, aria-labelledby, or CSS | target, waitForTooltip |

Macro Recording

| Tool | Description | Key Parameters |
| --- | --- | --- |
| start_recording_macro | Start recording clicks, typing, and changes | |
| stop_recording_macro | Stop recording and return events JSON | |
| replay_macro | Replay recorded events with speed multiplier | events, speed |
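
One plausible reading of the speed multiplier in replay_macro is that the gaps between recorded events are divided by it, so that a speed of 2 replays twice as fast. The sketch below computes the wait before each event under that assumption. The `RecordedEvent` shape, with a millisecond timestamp, is hypothetical and may not match the events JSON returned by stop_recording_macro.

```typescript
// Hypothetical recorded event: a type plus a timestamp in milliseconds.
interface RecordedEvent { type: string; timestamp: number; }

// Return how long to wait before dispatching each event during replay.
// The first event fires immediately; later gaps are scaled by 1 / speed.
function replayDelays(events: RecordedEvent[], speed: number): number[] {
  if (!(speed > 0)) throw new Error("speed must be greater than 0");
  const delays: number[] = [];
  let previous: number | undefined;
  for (const event of events) {
    const gap = previous === undefined ? 0 : Math.max(0, event.timestamp - previous);
    delays.push(gap / speed);
    previous = event.timestamp;
  }
  return delays;
}
```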

Visual Regression

| Tool | Description | Key Parameters |
| --- | --- | --- |
| compare_screenshots | Pixel-by-pixel diff of two base64 PNGs using pixelmatch | beforeImage, afterImage, threshold |
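
To show what a pixel-by-pixel diff means, here is a deliberately simplified stand-in that counts pixels whose RGBA channels differ by more than a threshold. It is not pixelmatch's algorithm, which uses a perceptual color difference and anti-aliasing detection, and it assumes the PNGs have already been decoded into raw RGBA buffers of equal size.

```typescript
// Count pixels that differ between two decoded RGBA images of the same size.
// `threshold` is a fraction of the full 0-255 channel range, so 0.1 tolerates
// differences of up to 25.5 per channel.
function countDifferentPixels(
  before: Uint8Array,
  after: Uint8Array,
  width: number,
  height: number,
  threshold = 0.1,
): number {
  const expectedLength = width * height * 4; // 4 bytes per pixel: R, G, B, A
  if (before.length !== expectedLength || after.length !== expectedLength) {
    throw new Error("Both images must be RGBA buffers of width * height * 4 bytes");
  }
  const maxDelta = threshold * 255;
  let different = 0;
  for (let offset = 0; offset < expectedLength; offset += 4) {
    for (let channel = 0; channel < 4; channel++) {
      const a = before[offset + channel] ?? 0;
      const b = after[offset + channel] ?? 0;
      if (Math.abs(a - b) > maxDelta) {
        different++;
        break; // count each pixel at most once
      }
    }
  }
  return different;
}
```

Dividing the returned count by `width * height` gives the fraction of the image that changed, which is a convenient number to assert on in a regression check.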

Project Structure

browser-genie-mcp-server/
├── src/
│   ├── index.ts              # Entry point — stdio MCP transport
│   ├── server.ts             # McpServer setup & tool registration
│   ├── websocket-bridge.ts   # WebSocket server + request/response correlation
│   ├── auth.ts               # API key validation helper
│   ├── types.ts              # Shared types & constants (port, message shapes)
│   └── tools/                # One file per tool category
│       ├── navigation.ts
│       ├── tab-management.ts
│       ├── click.ts
│       ├── input.ts
│       ├── keyboard.ts
│       ├── hover.ts
│       ├── drag-drop.ts
│       ├── screenshot.ts
│       ├── gestures.ts
│       ├── macros.ts
│       ├── visual-regression.ts
│       ├── devtools-sources.ts
│       ├── devtools-modify.ts
│       ├── devtools-network.ts
│       ├── devtools-storage.ts
│       ├── devtools-console.ts
│       ├── accessibility.ts
│       ├── emulation.ts
│       ├── elements.ts
│       ├── audit.ts
│       ├── interaction.ts
│       ├── monitoring.ts
│       └── qa.ts
├── dist/                     # Compiled output (after npm run build)
├── package.json
├── tsconfig.json
├── .gitignore
└── LICENSE

Development

# Watch mode — recompiles on every file save
npm run dev

# One-shot build
npm run build

# Run the compiled server directly
npm start

Important Notes

Debugger Banner

When any DevTools feature is first used on a tab, Chrome shows an "Extension is debugging this browser" banner. This is a Chrome security requirement and cannot be suppressed. The debugger attaches lazily — only when a DevTools tool is first called for that tab.

Service Worker Lifecycle

Chrome MV3 service workers terminate after ~30 seconds of inactivity. The extension uses chrome.alarms to keep the WebSocket alive, with automatic exponential-backoff reconnection (1 s → 2 s → 4 s → … → 30 s max).
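
The reconnection schedule above (1 s, 2 s, 4 s, and so on up to 30 s) corresponds to a capped exponential backoff. A minimal sketch, using a hypothetical `reconnectDelayMs` helper rather than the extension's actual code:

```typescript
// Delay before reconnection attempt number `attempt` (0-based):
// the base delay doubles each time and is capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(baseMs * 2 ** safeAttempt, maxMs);
}

// First seven delays of the schedule, in milliseconds.
const schedule = Array.from({ length: 7 }, (_, attempt) => reconnectDelayMs(attempt));
// schedule is [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Resetting the attempt counter to 0 after a successful connection keeps later reconnects fast.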

Human-Like Interactions

All mouse and keyboard interactions include randomized delays and natural movement patterns to avoid bot detection. Clicks follow Bézier curves, typing has per-character jitter, and hovers use a randomized dwell time.
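
Per-character jitter can be sketched as a base delay plus a random offset for each keystroke. The helper below is a hypothetical illustration with assumed parameter names; the extension's real timing values and distribution are not documented here.

```typescript
// Produce one delay (in ms) per character: baseDelayMs plus or minus up to jitterMs.
// `random` is injectable so the output can be made deterministic in tests.
function keystrokeDelays(
  text: string,
  baseDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number[] {
  return Array.from(text, () => {
    const offset = (random() * 2 - 1) * jitterMs; // uniform in [-jitterMs, +jitterMs)
    return Math.max(0, baseDelayMs + offset);     // never wait a negative time
  });
}
```

For example, `keystrokeDelays("hello", 80, 30)` returns five delays, each between 50 and 110 ms.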

Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change, then submit a pull request.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feat/my-feature
  3. Commit your changes: git commit -m 'feat: add my feature'
  4. Push and open a Pull Request

Related

License

Apache License 2.0
