Skip to content

500tpig/ga-browser-bridge

Repository files navigation

codex-ga-browser-bridge

CLI, Skill, and MCP bridge that lets Codex (and any other agent) drive a real Chrome browser — including your logged-in profile — without dragging in a heavyweight automation stack.

中文文档: README.zh-CN.md

What this is

Three integration layers over a single self-contained runtime:

  • CLI: ga-browser ... for humans, scripts, and CLI-first agents.
  • MCP server: ga-browser mcp for MCP-capable agents.
  • Codex skill: skills/ga-browser-bridge for Codex routing and usage guidance.

The design goal is simple: keep Codex or another agent as the planner, and use a token-efficient real-browser layer as the executor. The browser runtime (a vendored copy of lsdefine/GenericAgent's TMWebDriver + simphtml modules) ships inside this package — pip install is enough, no external checkout required.

Quick start (3 steps)

# 1. Install
pip install codex-ga-browser-bridge

# 2. Load the bundled Chrome extension (chrome://extensions → Developer mode → Load unpacked)
ga-browser extension-path        # prints the absolute path to paste into Chrome

# 3. Verify
ga-browser doctor --json

Once doctor reports chrome_extension_connected: true, you can drive the browser:

ga-browser tabs --json
ga-browser summarize --json --profile default
ga-browser click --selector "#login" --json

Optional: point at an external GenericAgent checkout

If you're hacking on the upstream GenericAgent repo and want the bridge to call into your working copy instead of the vendored modules, set:

export GAB_GENERICAGENT_HOME=/path/to/your/GenericAgent

Leave it unset for normal use.

Core commands

  • ga-browser doctor --json
  • ga-browser serve
  • ga-browser serve --http --json
  • ga-browser tabs --json
  • ga-browser scan --json --text-only --max-chars 12000
  • ga-browser summarize --json --profile forms
  • ga-browser exec --json --script "return location.href" --no-monitor
  • ga-browser exec --json --script "document.querySelector('button').click()" --wait-js "return document.body.innerText.includes('Done')"
  • ga-browser cdp --json --method Runtime.evaluate --params '{"expression":"location.href"}'
  • ga-browser batch --json --file commands.json
  • ga-browser cookies --json
  • ga-browser content-settings --type automaticDownloads --setting allow --json
  • ga-browser new-tab https://example.com --json
  • ga-browser switch-tab --tab-id 123 --json
  • ga-browser close-tab --tab-id 123 --json
  • ga-browser open https://example.com --json
  • ga-browser mcp
  • ga-browser skill-path
  • ga-browser extension-path

High-level primitives (prefer over exec)

Eight token-efficient primitives. Each takes either a CSS selector or a short ref returned by summarize:

  • ga-browser click --selector "#login" --json or ga-browser click --ref e3 --json (optional --visualize injects a transient cursor + ripple overlay; optional --wait-js polls a post-click condition without another agent turn)
  • ga-browser fill --selector "#user" --value "alice" --json or --ref e4 --value "alice"
  • ga-browser type --text "hello" --json
  • ga-browser press --key Enter --json
  • ga-browser get-text --ref e1 --json
  • ga-browser get-attr --ref e2 --attr href --json
  • ga-browser wait-for --ref e5 --timeout-ms 5000 --json
  • ga-browser drag --ref-from e1 --ref-to e2 --json

Each primitive returns path: "cdp" | "js" | "failed". CDP path uses Input.dispatch* so events carry isTrusted=true (clicks pass page hardening checks). JS fallback only kicks in when CDP errors. Drop to exec/cdp only for actions the primitives cannot express (shadow DOM, custom dropdowns, etc.).

Action + wait in one tool call

Use --wait-js when an action should be followed by a page condition. This avoids a separate agent loop that repeatedly calls summarize/exec:

ga-browser exec --json \
  --script "document.querySelector('button[type=submit]').click()" \
  --wait-js "return document.body.innerText.includes('Saved')" \
  --wait-timeout-ms 5000

ga-browser click --ref e3 --json \
  --wait-js "return location.href.includes('/dashboard')"

When --wait-js is present, CLI exec automatically uses no_monitor=true; the wait result is returned under wait.

Ref system (v0.3.0)

browser_summarize emits short e1, e2, ... ids for each element. The bridge stores the ref → selector mapping per tab and resolves on the next primitive call. Savings vs. repeating CSS selectors: ~12% on clean pages, ~40% on deep-path SPAs where selectors otherwise dominate the JSON. If a ref is stale (page changed, refs got replaced), the primitive returns {kind: "ref_stale"} so the agent re-summarizes and retries. See docs/BENCHMARK.md §6 for numbers.

For one-shot CLI use, the bridge waits up to 7 seconds for the Chrome extension to reconnect. For repeated commands, keep a master running in another terminal:

ga-browser serve

Then run tabs, summarize, exec, and other commands from another shell.

For fastest repeated CLI reuse, run the outer HTTP service and point later commands at it:

ga-browser serve --http --json
export GAB_SERVICE_URL=http://127.0.0.1:18767
ga-browser tabs --json
ga-browser summarize --json --profile forms

You can also pass --service-url http://127.0.0.1:18767 per command instead of setting the environment variable.

Tab lifecycle

Use explicit tab lifecycle commands for cross-tab workflows:

ga-browser new-tab https://example.com --json
ga-browser new-tab https://example.com --background --json
ga-browser switch-tab --tab-id 123 --json
ga-browser bring-to-front --tab-id 123 --json
ga-browser close-tab --tab-id 123 --json

new-tab returns the created tab_id; pass --background if you do not want the bridge focus to switch to the new tab.

Cookies and content settings

Cookie values are redacted by default:

ga-browser cookies --json
ga-browser cookies --url https://example.com --include-values --json

Use content-settings for extension-backed Chrome settings, for example:

ga-browser content-settings --type automaticDownloads --setting allow --pattern '<all_urls>' --json

Usage pattern

Use commands in this order:

ga-browser doctor --json
ga-browser tabs --json
ga-browser summarize --json --profile default
ga-browser scan --json --text-only --max-chars 12000
ga-browser exec --json --no-monitor --script "return document.title"

For page-changing operations, omit --no-monitor:

ga-browser exec --json --script "document.querySelector('button').click(); return true"

For CDP:

ga-browser cdp --json --method Runtime.evaluate --params '{"expression":"location.href","returnByValue":true}'

Token policy

Use tabs and summarize before scan. Prefer JS and CDP batch calls over visual clicking. Use --no-monitor for pure reads.

Acknowledgements

The browser runtime that powers this bridge is vendored from lsdefine/GenericAgent (MIT, © 2025 lsdefine). Specifically:

  • src/codex_ga_browser_bridge/vendor/TMWebDriver.py
  • src/codex_ga_browser_bridge/vendor/simphtml.py
  • src/codex_ga_browser_bridge/assets/tmwd_cdp_bridge/ (Chrome extension)

This project ships these files unmodified, with per-file attribution headers and a third-party notice in LICENSE. All credit for the underlying CDP-over-WebSocket transport and token-efficient HTML compression belongs to the GenericAgent project.

Documentation

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors