CLI, Skill, and MCP bridge that lets Codex (and any other agent) drive a real Chrome browser — including your logged-in profile — without dragging in a heavyweight automation stack.
中文文档: README.zh-CN.md
Three integration layers over a single self-contained runtime:
- CLI:
ga-browser ...for humans, scripts, and CLI-first agents. - MCP server:
ga-browser mcpfor MCP-capable agents. - Codex skill:
skills/ga-browser-bridgefor Codex routing and usage guidance.
The design goal is simple: keep Codex or another agent as the planner, and use a token-efficient real-browser layer as the executor. The browser runtime (a vendored copy of lsdefine/GenericAgent's TMWebDriver + simphtml modules) ships inside this package — pip install is enough, no external checkout required.
# 1. Install
pip install codex-ga-browser-bridge
# 2. Load the bundled Chrome extension (chrome://extensions → Developer mode → Load unpacked)
ga-browser extension-path # prints the absolute path to paste into Chrome
# 3. Verify
ga-browser doctor --jsonOnce doctor reports chrome_extension_connected: true, you can drive the browser:
ga-browser tabs --json
ga-browser summarize --json --profile default
ga-browser click --selector "#login" --jsonIf you're hacking on the upstream GenericAgent repo and want the bridge to call into your working copy instead of the vendored modules, set:
export GAB_GENERICAGENT_HOME=/path/to/your/GenericAgentLeave it unset for normal use.
ga-browser doctor --jsonga-browser servega-browser serve --http --jsonga-browser tabs --jsonga-browser scan --json --text-only --max-chars 12000ga-browser summarize --json --profile formsga-browser exec --json --script "return location.href" --no-monitorga-browser exec --json --script "document.querySelector('button').click()" --wait-js "return document.body.innerText.includes('Done')"ga-browser cdp --json --method Runtime.evaluate --params '{"expression":"location.href"}'ga-browser batch --json --file commands.jsonga-browser cookies --jsonga-browser content-settings --type automaticDownloads --setting allow --jsonga-browser new-tab https://example.com --jsonga-browser switch-tab --tab-id 123 --jsonga-browser close-tab --tab-id 123 --jsonga-browser open https://example.com --jsonga-browser mcpga-browser skill-pathga-browser extension-path
Eight token-efficient primitives. Each takes either a CSS selector or a short ref returned by summarize:
ga-browser click --selector "#login" --jsonorga-browser click --ref e3 --json(optional--visualizeinjects a transient cursor + ripple overlay; optional--wait-jspolls a post-click condition without another agent turn)ga-browser fill --selector "#user" --value "alice" --jsonor--ref e4 --value "alice"ga-browser type --text "hello" --jsonga-browser press --key Enter --jsonga-browser get-text --ref e1 --jsonga-browser get-attr --ref e2 --attr href --jsonga-browser wait-for --ref e5 --timeout-ms 5000 --jsonga-browser drag --ref-from e1 --ref-to e2 --json
Each primitive returns path: "cdp" | "js" | "failed". CDP path uses Input.dispatch* so events carry isTrusted=true (clicks pass page hardening checks). JS fallback only kicks in when CDP errors. Drop to exec/cdp only for actions the primitives cannot express (shadow DOM, custom dropdowns, etc.).
Use --wait-js when an action should be followed by a page condition. This avoids a separate agent loop that repeatedly calls summarize/exec:
ga-browser exec --json \
--script "document.querySelector('button[type=submit]').click()" \
--wait-js "return document.body.innerText.includes('Saved')" \
--wait-timeout-ms 5000
ga-browser click --ref e3 --json \
--wait-js "return location.href.includes('/dashboard')"When --wait-js is present, CLI exec automatically uses no_monitor=true; the wait result is returned under wait.
browser_summarize emits short e1, e2, ... ids for each element. The bridge stores the ref → selector mapping per tab and resolves on the next primitive call. Savings vs. repeating CSS selectors: ~12% on clean pages, ~40% on deep-path SPAs where selectors otherwise dominate the JSON. If a ref is stale (page changed, refs got replaced), the primitive returns {kind: "ref_stale"} so the agent re-summarizes and retries. See docs/BENCHMARK.md §6 for numbers.
For one-shot CLI use, the bridge waits up to 7 seconds for the Chrome extension to reconnect. For repeated commands, keep a master running in another terminal:
ga-browser serveThen run tabs, summarize, exec, and other commands from another shell.
For fastest repeated CLI reuse, run the outer HTTP service and point later commands at it:
ga-browser serve --http --json
export GAB_SERVICE_URL=http://127.0.0.1:18767
ga-browser tabs --json
ga-browser summarize --json --profile formsYou can also pass --service-url http://127.0.0.1:18767 per command instead of
setting the environment variable.
Use explicit tab lifecycle commands for cross-tab workflows:
ga-browser new-tab https://example.com --json
ga-browser new-tab https://example.com --background --json
ga-browser switch-tab --tab-id 123 --json
ga-browser bring-to-front --tab-id 123 --json
ga-browser close-tab --tab-id 123 --jsonnew-tab returns the created tab_id; pass --background if you do not want
the bridge focus to switch to the new tab.
Cookie values are redacted by default:
ga-browser cookies --json
ga-browser cookies --url https://example.com --include-values --jsonUse content-settings for extension-backed Chrome settings, for example:
ga-browser content-settings --type automaticDownloads --setting allow --pattern '<all_urls>' --jsonUse commands in this order:
ga-browser doctor --json
ga-browser tabs --json
ga-browser summarize --json --profile default
ga-browser scan --json --text-only --max-chars 12000
ga-browser exec --json --no-monitor --script "return document.title"For page-changing operations, omit --no-monitor:
ga-browser exec --json --script "document.querySelector('button').click(); return true"For CDP:
ga-browser cdp --json --method Runtime.evaluate --params '{"expression":"location.href","returnByValue":true}'Use tabs and summarize before scan. Prefer JS and CDP batch calls over visual clicking. Use --no-monitor for pure reads.
The browser runtime that powers this bridge is vendored from lsdefine/GenericAgent (MIT, © 2025 lsdefine). Specifically:
src/codex_ga_browser_bridge/vendor/TMWebDriver.pysrc/codex_ga_browser_bridge/vendor/simphtml.pysrc/codex_ga_browser_bridge/assets/tmwd_cdp_bridge/(Chrome extension)
This project ships these files unmodified, with per-file attribution headers and a third-party notice in LICENSE. All credit for the underlying CDP-over-WebSocket transport and token-efficient HTML compression belongs to the GenericAgent project.
- Usage guide: docs/USAGE.md
- Agent/MCP integration: docs/AGENT_INTEGRATION.md
- Token benchmark: docs/BENCHMARK.md
- Roadmap: docs/ROADMAP.md
- Chinese usage guide: docs/USAGE.zh-CN.md
- Chinese agent integration: docs/AGENT_INTEGRATION.zh-CN.md