An MCP (Model Context Protocol) server that gives AI models full control over a Chrome browser. It pairs with the BrowserGenie Extension to expose 50+ browser automation tools over stdio — navigation, clicking, typing, screenshots, touch gestures, macro recording, and complete DevTools access.
Two-repo setup: This is the server half. The Chrome extension lives in a separate repository. Both are required.
AI Client (Claude, Cursor, etc.)
│ stdio (JSON-RPC / MCP)
▼
MCP Server ◄── this repo
│ WebSocket ws://localhost:7890
▼
Chrome Extension
├── chrome.tabs → Navigation, tab management
├── chrome.debugger → DevTools Protocol (CDP)
├── chrome.scripting → Content script injection
├── chrome.cookies → Cookie management
└── Content Scripts → Real DOM event simulation
The MCP server bridges your AI client (over stdio) and the Chrome extension (over WebSocket). Every MCP tool call is forwarded to the extension, executed in the browser, and the result is returned to the AI client.
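The forwarding described above depends on matching each reply from the extension to the tool call that triggered it. The following is a minimal sketch of that request/response correlation, not the server's actual code: the `BridgeRequest` and `BridgeResponse` shapes and the `PendingRequests` class are hypothetical names chosen for illustration, and the WebSocket transport itself is left out.

```typescript
// Hypothetical message shapes; the real ones are defined by the server and extension.
interface BridgeRequest {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

interface BridgeResponse {
  id: string;
  result?: unknown;
  error?: string;
}

type Settler = { resolve: (value: unknown) => void; reject: (reason: Error) => void };

class PendingRequests {
  private nextId = 1;
  private pending = new Map<string, Settler>();

  // Register a call: returns the message to send and a promise for its reply.
  create(tool: string, args: Record<string, unknown>): { message: BridgeRequest; reply: Promise<unknown> } {
    const id = String(this.nextId++);
    const reply = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { message: { id, tool, args }, reply };
  }

  // Settle the matching promise when a reply arrives; returns false for unknown ids.
  settle(response: BridgeResponse): boolean {
    const entry = this.pending.get(response.id);
    if (!entry) return false;
    this.pending.delete(response.id);
    if (response.error !== undefined) {
      entry.reject(new Error(response.error));
    } else {
      entry.resolve(response.result);
    }
    return true;
  }

  // Number of calls still waiting for a reply.
  get size(): number {
    return this.pending.size;
  }
}
```

In a real bridge, `message` would be serialized onto the WebSocket and `settle` would be called from the socket's message handler. Keying replies by id lets several tool calls be in flight at once.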
- Node.js 18+
- npm
- The companion BrowserGenie Extension installed in Chrome
No installation required. Run directly from npm:
npx browser-genie-mcp-server
Or install globally:
npm install -g browser-genie-mcp-server
browser-genie-mcp-server
For development or to use the latest unreleased changes:
git clone https://github.com/BrowserGenie/mcp.git
cd mcp
npm install
npm run build
node dist/index.js
Add the server to your AI client's MCP configuration.
File: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
File: %APPDATA%\Claude\claude_desktop_config.json (Windows)
{
"mcpServers": {
"browser-genie": {
"command": "npx",
"args": ["browser-genie-mcp-server"]
}
}
}
File: ~/.claude/settings.json or project-level .mcp.json:
{
"mcpServers": {
"browser-genie": {
"command": "npx",
"args": ["browser-genie-mcp-server"]
}
}
}

{
"mcpServers": {
"browser-genie": {
"command": "npx",
"args": ["browser-genie-mcp-server"]
}
}
}
If you installed the package globally with npm install -g browser-genie-mcp-server, you can use "command": "browser-genie-mcp-server" with no args instead.
After saving the config, restart your AI client.
| Variable | Default | Description |
|---|---|---|
| WEBSOCKET_PORT | 7890 | Port the WebSocket server listens on. The extension must use the same port. |
Pass via the MCP config env block:
{
"mcpServers": {
"browser-genie": {
"command": "npx",
"args": ["browser-genie-mcp-server"],
"env": { "WEBSOCKET_PORT": "8080" }
}
}
}
If you change the port, also update WEBSOCKET_URL in constants.ts inside the extension repo.
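As an illustration of the port setting, here is one way a server could resolve WEBSOCKET_PORT with the documented default of 7890. This is a sketch under assumptions: `resolveWebSocketPort` is a hypothetical helper, and the validation rules shown may differ from what the real server does.

```typescript
const DEFAULT_WEBSOCKET_PORT = 7890;

// Read WEBSOCKET_PORT from an environment-like object, falling back to the default.
function resolveWebSocketPort(env: Record<string, string | undefined>): number {
  const raw = env["WEBSOCKET_PORT"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_WEBSOCKET_PORT;
  }
  const port = Number(raw);
  // Reject anything that is not a valid TCP port number.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid WEBSOCKET_PORT: ${raw}`);
  }
  return port;
}
```

In a Node.js process this would typically be called as `resolveWebSocketPort(process.env)`. Failing fast on a malformed value makes a port mismatch with the extension easier to diagnose than a silent fallback.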
Once the server is running and the extension is loaded in Chrome, click the extension icon. The popup should show a green Connected indicator. If it shows Disconnected, ensure the MCP server process is running.
Every tool accepts an optional apiKey (string) when authentication is enabled in the extension popup, and an optional tabId (number) to target a specific tab (defaults to the active tab).
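To show how the optional apiKey and tabId arguments sit alongside a tool's own parameters, the sketch below builds an MCP `tools/call` JSON-RPC request. The `buildToolCall` helper and its types are hypothetical; AI clients normally construct this message for you, so treat it as an illustration of the payload shape only.

```typescript
// Arguments accepted by every tool, per the paragraph above.
interface CommonToolArgs {
  apiKey?: string;
  tabId?: number;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Merge tool-specific arguments with the optional common ones.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
  common: CommonToolArgs = {}
): ToolCallRequest {
  const merged: Record<string, unknown> = { ...args };
  if (common.apiKey !== undefined) merged["apiKey"] = common.apiKey;
  if (common.tabId !== undefined) merged["tabId"] = common.tabId;
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: merged } };
}

// Example: navigate a specific tab (tab ID 42 is made up for illustration).
const exampleCall = buildToolCall(1, "navigate_to_url", { url: "https://example.com" }, { tabId: 42 });
```

Omitting `tabId` leaves the field out entirely, so the call targets the active tab as described above.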
Tip: If you see "No active tab found" or "Cannot access a chrome:// URL", use list_tabs to find a valid tab ID, or navigate to a regular web page first.
| Tool | Description | Key Parameters |
|---|---|---|
| navigate_to_url | Navigate to a URL | url (required) |
| navigate_back | Go back in history | — |
| navigate_forward | Go forward in history | — |
| navigate_reload | Reload the page | ignoreCache (bool) |
| Tool | Description | Key Parameters |
|---|---|---|
| list_tabs | List all open tabs | — |
| select_tab | Focus a tab | tabId (required) |
| new_tab | Open a new tab | url (optional) |
| close_tab | Close a tab | tabId (required) |
| get_tab_state | Capture URL, title, and DOM hash for state comparison | — |
| assert_tabs_match | Verify two tabs have identical state | tabIdA, tabIdB |
| test_storage_sync | Test cross-tab localStorage sync | tabIdA, tabIdB, key, value |
| Tool | Description | Key Parameters |
|---|---|---|
| press_key | Press a key with optional modifiers | key, modifiers[] |
| type_text | Type text character by character | text, delay (ms) |
| Tool | Description | Key Parameters |
|---|---|---|
| click_element | Click via coordinates, CSS, or XPath with human-like Bézier curve movement and randomized delays | target.type, target.value, button, doubleClick |
| input_and_type | Click a field, optionally clear it, then type text with per-character jitter | selector, text, clearFirst, submit |
| drag_and_drop | Drag from one point to another | from, to |
| hover_element | Hover to trigger CSS states / tooltips with randomized dwell time | target |
Click behavior:
- doubleClick: true fires two rapid clicks at the same position. The element's own handlers determine focus/select behavior; the tool does not automatically select text or set focus beyond what the browser does natively.
| Tool | Description | Key Parameters |
|---|---|---|
| swipe | Touch swipe from point A to B with configurable duration | from, to, duration |
| long_press | Long-press on an element or coordinates | target, duration |
| pinch | Pinch zoom with two-finger convergence/divergence | center, startRadius, endRadius |
| double_tap | Double-tap for mobile interactions (zoom, edit) | target, interval |
Ensure the viewport is set to mobile with touch: true via resize_viewport or emulate_device before using touch gestures.
| Tool | Description | Key Parameters |
|---|---|---|
| screenshot_viewport | Capture visible viewport | format, quality |
| screenshot_full_page | Capture full scrollable page | format, quality |
Both return an image content block.
| Tool | Description | Key Parameters |
|---|---|---|
| read_page_html | Full outerHTML of the page | — |
| read_stylesheets | CSS stylesheet sources | url (filter) |
| read_scripts | JavaScript sources | url (filter) |
| read_page_resources | List all resources with URLs and sizes | type (filter) |
| find_in_source | Search a regex pattern across HTML and all loaded scripts | pattern, contextLines |
| Tool | Description | Key Parameters |
|---|---|---|
| modify_html | Live DOM mutation | selector, action, value, attributeName |
| modify_css | Set inline styles | selector, styles (object) |
| Tool | Description | Key Parameters |
|---|---|---|
| get_network_logs | Get collected request/response logs | filter.urlPattern, filter.method, filter.statusCode |
| get_network_request_detail | Full details of one request | requestId, includeBody |
| clear_network_logs | Clear collected logs | — |
| get_network_errors | Get only failed/errored requests (4xx, 5xx, failed) | clear |
Network logs are collected from when the debugger attaches. Call any DevTools tool first to trigger attachment before the traffic you want to capture.
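The ordering requirement above can be sketched as a planned sequence of tool calls. This is illustrative only: `planNetworkCapture` and `PlannedCall` are hypothetical names, and the choice of clear_network_logs as the call that triggers debugger attachment is an assumption, since any DevTools tool is described as sufficient.

```typescript
interface PlannedCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Build the call order: attach first, then generate traffic, then read the logs.
function planNetworkCapture(url: string): PlannedCall[] {
  return [
    // 1. Call a DevTools tool so the debugger attaches (assumed to trigger attachment).
    { name: "clear_network_logs", arguments: {} },
    // 2. Produce the traffic you want to observe.
    { name: "navigate_to_url", arguments: { url } },
    // 3. Read back what was collected since attachment.
    { name: "get_network_logs", arguments: {} },
  ];
}
```

Requests made before step 1 would not appear in the result of step 3, which is why the attach step comes before navigation.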
| Tool | Description |
|---|---|
| get_cookies / set_cookie / delete_cookie | Cookie CRUD |
| get_local_storage / set_local_storage / remove_local_storage | localStorage |
| get_session_storage / set_session_storage / remove_session_storage | sessionStorage |
| Tool | Description | Key Parameters |
|---|---|---|
| get_console_logs | Retrieve console messages by level | level, clear |
| execute_javascript | Run JS in the page and return the result | expression |
| Tool | Description | Key Parameters |
|---|---|---|
| browser_snapshot | Text-based accessibility tree snapshot | — |
| get_accessibility_tree | Raw accessibility tree as JSON | selector (optional filter) |
| run_accessibility_audit | Run axe-core WCAG audit | selector, tags |
| check_color_contrast | Check text contrast ratios | selector |
| get_tab_order | Static list of focusable elements in tab order with unique selectors | — |
| record_focus_path | Interactive: simulates Tab presses and records where focus lands, flagging invisible targets | steps |
| get_performance_metrics | Single snapshot of navigation timing, LCP, CLS, FID, memory | — |
| record_performance_timeline | Start/stop/get timeline recording of memory, LCP, CLS over time | action, interval |
| check_font_loading | Verify web font loading status | — |
| audit_broken_resources | Find broken images, stylesheets, fonts | — |
| check_security_headers | Inspect CSP, HSTS, X-Frame-Options, etc. | url |
| detect_cookie_banners | Detect cookie consent banners and CMP patterns | — |
When to use get_tab_order vs record_focus_path: Use get_tab_order for a one-time snapshot of all focusable elements and their order. Use record_focus_path when you want to verify the actual focus behavior during keyboard navigation: it presses Tab repeatedly and records where focus lands, catching invisible or hidden focus traps.
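For context on what a contrast check such as check_color_contrast reports, the sketch below computes the contrast ratio defined in WCAG 2.x from two sRGB colors. It shows the standard formula only; whether the tool computes it this way internally is an assumption, and the function names are illustrative.

```typescript
type Rgb = [number, number, number];

// Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x definition).
function channelToLinear(value: number): number {
  const s = value / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance: a weighted sum of the linearized channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: Rgb, background: Rgb): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}
```

WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text and 3:1 for large text, so a result below those values is what an audit would typically flag.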
| Tool | Description | Key Parameters |
|---|---|---|
| find_element | Find by text, role, aria-label, CSS, or XPath | text, role, css, xpath, nth |
| get_element_state | Get exists, visible, enabled, focused, checked, etc. | selector, selectorType |
| query_shadow_dom | Query inside a single shadow root | hostSelector, innerSelector |
| deep_query_shadow_dom | Query through nested shadow roots by host path | hostPath[], innerSelector |
| get_shadow_dom_tree | Return full shadow DOM tree as JSON | hostSelector, maxDepth |
| get_computed_styles | Get computed CSS styles for an element | selector, properties, pseudoElement |
| Tool | Description | Key Parameters |
|---|---|---|
| assert_element | Assert conditions on an element (exists, visible, text, etc.) | assertion, selector |
| assert_no_console_errors | Assert zero console errors/warnings | level, clear |
| assert_no_network_errors | Assert zero failed network requests | clear |
| assert_css_property | Assert computed style value | selector, property, expected |
| assert_network_request_made | Assert a matching request was made | urlPattern, method, minCount |
| assert_page_load_time | Assert navigation timing is within threshold | threshold |
| check_form_validity | Check HTML5 form validation state | selector, checkAll |
| tab_to_next | Simulate Tab key and track focus movement | direction, shift |
| set_input_files | Set files on a file input | selector, files |
| emulate_network_conditions | Simulate slow/offline network | offline, latency, throughput |
| intercept_requests | Block, modify, or allow network requests | action, urlPattern |
| snapshot_page_state | Capture full page state (HTML, storage, cookies) | — |
| restore_page_state | Restore from a snapshot | snapshot |
| wait_for_condition | Poll a JS expression until true or timeout | expression, timeout, interval |
| stress_test_refresh | Refresh page N times and run assertion each time | iterations, assertionScript, bypassCache |
| get_all_issues | Unified diagnostic: console, network, and resource errors in one call | includeConsole, includeNetwork, includeResources |
| Tool | Description | Key Parameters |
|---|---|---|
| hover_and_inspect | Hover and capture DOM/style changes | target, captureChanges |
| force_pseudo_state | Force hover/focus/active and read computed styles | selector, pseudoState |
| get_tooltip_text | Extract tooltip text from title, aria-describedby, aria-labelledby, or CSS | target, waitForTooltip |
| Tool | Description | Key Parameters |
|---|---|---|
| start_recording_macro | Start recording clicks, typing, and changes | — |
| stop_recording_macro | Stop recording and return events JSON | — |
| replay_macro | Replay recorded events with speed multiplier | events, speed |
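To illustrate what a speed multiplier means during replay, the sketch below turns recorded timestamps into the delays to wait between events. The `RecordedEvent` shape (a `type` and a millisecond `timestamp`) is hypothetical, since the JSON returned by stop_recording_macro is not documented here, and `replayDelays` is not part of the tool API.

```typescript
// Hypothetical event shape; the real recording format may differ.
interface RecordedEvent {
  type: string;
  timestamp: number; // milliseconds since recording started
}

// Compute the wait before each event after the first, scaled by the speed multiplier.
function replayDelays(events: RecordedEvent[], speed: number): number[] {
  if (!(speed > 0)) {
    throw new Error("speed must be a positive number");
  }
  const delays: number[] = [];
  let previous: RecordedEvent | undefined;
  for (const event of events) {
    if (previous !== undefined) {
      // Clamp negative gaps to zero in case timestamps are out of order.
      delays.push(Math.max(0, event.timestamp - previous.timestamp) / speed);
    }
    previous = event;
  }
  return delays;
}
```

With this interpretation, a speed of 2 halves every gap and a speed of 0.5 doubles it, while the order of events stays unchanged.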
| Tool | Description | Key Parameters |
|---|---|---|
| compare_screenshots | Pixel-by-pixel diff of two base64 PNGs using pixelmatch | beforeImage, afterImage, threshold |
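The idea behind a thresholded pixel diff can be shown with a deliberately simplified stand-in. This is not pixelmatch: pixelmatch uses a perceptual color-distance metric and anti-aliasing detection, whereas the hypothetical `countDifferentPixels` below only compares raw channel values of two already-decoded RGBA buffers.

```typescript
// Count pixels whose largest per-channel difference exceeds the threshold (0..1).
// Both buffers hold RGBA bytes, 4 per pixel, and must have the same dimensions.
function countDifferentPixels(before: Uint8Array, after: Uint8Array, threshold: number): number {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("buffers must be RGBA data of equal size");
  }
  let different = 0;
  for (let i = 0; i < before.length; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      // Normalize each channel difference to the 0..1 range.
      maxDelta = Math.max(maxDelta, Math.abs(before[i + c] - after[i + c]) / 255);
    }
    if (maxDelta > threshold) {
      different++;
    }
  }
  return different;
}
```

A higher threshold tolerates small rendering differences, which helps when comparing screenshots that should look the same but were captured at different times.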
browser-genie-mcp-server/
├── src/
│ ├── index.ts # Entry point — stdio MCP transport
│ ├── server.ts # McpServer setup & tool registration
│ ├── websocket-bridge.ts # WebSocket server + request/response correlation
│ ├── auth.ts # API key validation helper
│ ├── types.ts # Shared types & constants (port, message shapes)
│ └── tools/ # One file per tool category
│ ├── navigation.ts
│ ├── tab-management.ts
│ ├── click.ts
│ ├── input.ts
│ ├── keyboard.ts
│ ├── hover.ts
│ ├── drag-drop.ts
│ ├── screenshot.ts
│ ├── gestures.ts
│ ├── macros.ts
│ ├── visual-regression.ts
│ ├── devtools-sources.ts
│ ├── devtools-modify.ts
│ ├── devtools-network.ts
│ ├── devtools-storage.ts
│ ├── devtools-console.ts
│ ├── accessibility.ts
│ ├── emulation.ts
│ ├── elements.ts
│ ├── audit.ts
│ ├── interaction.ts
│ ├── monitoring.ts
│ └── qa.ts
├── dist/ # Compiled output (after npm run build)
├── package.json
├── tsconfig.json
├── .gitignore
└── LICENSE
# Watch mode — recompiles on every file save
npm run dev
# One-shot build
npm run build
# Run the compiled server directly
npm start
When any DevTools feature is first used on a tab, Chrome shows an "Extension is debugging this browser" banner. This is a Chrome security requirement, and the extension cannot suppress it. The debugger attaches lazily, only when a DevTools tool is first called for that tab.
Chrome MV3 service workers terminate after ~30 seconds of inactivity. The extension uses chrome.alarms to keep the WebSocket alive, with automatic exponential-backoff reconnection (1 s → 2 s → 4 s → … → 30 s max).
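The reconnection schedule above (1 s, 2 s, 4 s, and so on up to 30 s) corresponds to a capped exponential backoff. A minimal sketch of that calculation follows; `reconnectDelayMs` and its parameter names are hypothetical, and the extension's real implementation may add jitter or reset rules not shown here.

```typescript
// Delay before reconnection attempt N (0-based): base * 2^N, capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// First six delays in milliseconds: 1000, 2000, 4000, 8000, 16000, 30000.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => reconnectDelayMs(attempt));
```

The cap keeps a long outage from stretching the retry interval indefinitely, so the extension reconnects within about 30 seconds once the server is back.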
All mouse and keyboard interactions include randomized delays and natural movement patterns to avoid bot detection. Click uses Bézier curves, typing has per-character jitter, and hover has randomized dwell time.
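As an illustration of curve-based pointer movement, the sketch below samples points along a cubic Bézier curve between a start and end position. It is a generic example of the technique, not the extension's code: the function names are hypothetical, and how the real implementation picks control points or randomizes timing is not shown.

```typescript
interface Point {
  x: number;
  y: number;
}

// Evaluate a cubic Bézier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Sample steps + 1 evenly spaced points along the curve, including both endpoints.
function mousePath(from: Point, to: Point, control1: Point, control2: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new Error("steps must be a positive integer");
  }
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(from, control1, control2, to, i / steps));
  }
  return path;
}
```

Moving the two control points away from the straight line between `from` and `to` bends the path, which is what makes the movement look less mechanical than a linear interpolation.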
Contributions are welcome! Please open an issue first to discuss what you'd like to change, then submit a pull request.
- Fork the repository
- Create a feature branch: git checkout -b feat/my-feature
- Commit your changes: git commit -m 'feat: add my feature'
- Push and open a Pull Request
- BrowserGenie Extension — the Chrome extension that pairs with this server
- Model Context Protocol — the open protocol this server implements