Completely undetectable. Your real Chrome as you. Playwright and PinchTab are old news. agent-browser replaces them. No screenshots. Reads page structure. 93% less context. Near-instant. 100s of parallel sessions working together agentically. "Find 5-star Amazon sellers and order from the best." "Message 20 Alibaba suppliers." Whatever you do in Chrome, it handles. Ships with a Claude Code skill that researches the UI, plans the smartest path, and verifies its own work. Open source.
- Real Chrome, real cookies — log in once, stay logged in. Your AI uses your actual browser session with persistent cookies
- Invisible to bot detection: no `navigator.webdriver` flag, no fingerprint mismatches. Sites see a real human, not a bot
- 93% fewer tokens than Playwright MCP: ~200 tokens per page vs ~13,700. Your AI does more with less
- 5x faster — direct WebSocket to Chrome, no middleware relay. Every call saves seconds
- Independent sessions — run multiple AI agents simultaneously on the same machine. Zero conflicts, zero shared state
- Headed, not headless — you see everything the AI does in real time. Watch it work, jump in anytime, take over when you want
- Claude Code skill included — drop one file and Claude knows how to drive your browser. No setup, no configuration
agent-browser is a CLI that gives AI agents direct control of your real, visible Chrome browser — without Playwright, without Puppeteer, without downloading bundled browser binaries, and without burning through your token budget.
It speaks raw Chrome DevTools Protocol (CDP) over a single WebSocket. That's it. No middleware. No relay servers. No 50MB dependency you never asked for.
Built by Caleb Dane. Originally forked from vercel-labs/agent-browser — CDP transport layer rewritten from scratch.
If you're using Playwright or Puppeteer for AI browser automation, here's what's actually happening under the hood:
Your AI → Playwright/Puppeteer → Node.js WebSocket relay → Chrome → back through all of that
That middle layer — the Node.js relay — adds an extra network hop on every single browser call. Click a button? Extra hop. Take a screenshot? Extra hop. Read the page? Extra hop. Multiply that by hundreds of operations per session and you get real, measurable slowdowns.
And then there's the size. Playwright alone adds ~50MB to your node_modules. It downloads its own browser binaries. It bundles Firefox and WebKit engines you'll never use for AI automation.
The industry is moving away from this. browser-use reported 5x faster element extraction after dropping Playwright for raw CDP. Stagehand (Browserbase) is making the same move. Even Microsoft built Playwright CLI to work around their own tool's token bloat.
Newer tools like PinchTab still add an HTTP relay layer between your AI and Chrome. agent-browser skips that entirely.
Your AI → agent-browser → Chrome
That's the whole stack. One WebSocket connection. Zero relay layers. Your commands go straight to Chrome and the response comes straight back.
| | agent-browser | Playwright MCP | Playwright CLI |
|---|---|---|---|
| Tokens per page | ~200-400 | ~13,700 per step | ~2,700 per step |
| 10-step workflow | ~7,000 tokens | ~114,000 tokens | ~27,000 tokens |
| Install size | Lightweight (uses your Chrome) | ~50MB + browser binaries | ~50MB + browser binaries |
| Network hops per call | 1 (direct to Chrome) | 2 (relay + Chrome) | 2 (relay + Chrome) |
| Extra browser download? | No — uses your existing Chrome | Yes — downloads Chromium | Yes — downloads Chromium |
Under the same token budget, agent-browser runs 5.7x more automation cycles than Playwright MCP. That's not a minor optimization — it's the difference between your AI agent finishing the job or running out of context halfway through.
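
The "93% less context" headline can be reproduced from the 10-step row of the table above. The sketch below only does that arithmetic; the figures are copied from the table, and the function name is illustrative.

```python
# Token totals for a 10-step workflow, copied from the comparison table.
TOKENS_PER_WORKFLOW = {
    "agent-browser": 7_000,
    "Playwright MCP": 114_000,
    "Playwright CLI": 27_000,
}


def savings_vs(baseline: str, tool: str) -> float:
    """Percent of tokens saved by `tool` relative to `baseline`, to one decimal."""
    base = TOKENS_PER_WORKFLOW[baseline]
    return round(100 * (1 - TOKENS_PER_WORKFLOW[tool] / base), 1)


if __name__ == "__main__":
    for tool in ("agent-browser", "Playwright CLI"):
        print(f"{tool}: {savings_vs('Playwright MCP', tool)}% fewer tokens than Playwright MCP")
```

With the table's numbers, agent-browser comes out at roughly 93.9% fewer tokens than Playwright MCP over ten steps, and Playwright CLI at roughly 76.3%.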
This isn't headless automation running invisibly in the background. agent-browser is headed — it controls your real, visible Chrome window. You can watch everything the AI does in real time.
Think of it like handing someone a remote control to your computer:
- Watch the AI work — see it click buttons, fill forms, navigate pages, all on your actual screen
- Jump in anytime — navigate to a page manually, then tell the AI "now fill out this form" or "click that button"
- Hand control back and forth — you browse to the right page, the AI handles the tedious parts, you verify the result
- Pair browse — stream the viewport via WebSocket so you can watch from another machine or share with a teammate
- Debug in real time — when something goes wrong, you see exactly what the AI sees. No guessing what happened in a headless void
Other automation tools run in a hidden browser you can't see or interact with. agent-browser runs in your browser — the one you're already looking at.
If you're new to browser automation, here's the simple version:
agent-browser lets an AI control your Chrome browser the same way you do — it can open websites, click buttons, fill out forms, read what's on the page, and take screenshots. You see everything it does because it's working in your real, visible browser — not some hidden process running in the background.
Here's everything it automates:
- Open any website — navigate to URLs, go back, go forward, refresh
- Read the page — get a structured snapshot of everything on the page (buttons, links, text fields, headings) that an AI can understand in ~200 tokens instead of thousands
- Click things: buttons, links, checkboxes, dropdowns, all by simple reference like `@e1` instead of fragile CSS selectors
- Fill out forms: type into text fields, select options, check boxes
- Take screenshots — capture what the page looks like for visual verification
- Run JavaScript — execute any code in the browser for advanced automation
- Track errors — catch console errors and broken pages automatically
- Manage tabs — open new tabs, switch between them, close them
- Intercept network requests — mock API responses, block tracking scripts, test error states
- Stream the viewport — watch what the browser is doing in real time via screencast
All of this through one simple CLI: `agent-browser <command>`.
This is a big deal if you're running multiple AI agents at the same time.
Every session is completely independent. Each AI session (like each Claude Code window) gets its own daemon process through an environment variable:

    AGENT_BROWSER_SESSION="claude-$$"  # Each session gets a unique ID

What this means in practice:
- Session A can be testing your login page while Session B tests the checkout flow — simultaneously, on the same machine
- No shared state between sessions — different cookies, different tabs, different browsing history
- No race conditions — one agent clicking a button won't interfere with another agent reading a page
- Sessions clean up after themselves — close one and the others keep running
If you've ever had two Playwright scripts fight over the same browser instance, you know why this matters.
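
If you launch agents from a script rather than from separate terminals, the same idea can be sketched in Python. This is a minimal sketch, assuming only what the README states: that the session is selected by the `AGENT_BROWSER_SESSION` environment variable and that `agent-browser` is on your `PATH`. The helper names are hypothetical.

```python
import os
import subprocess


def default_session_id() -> str:
    """Mirror the shell's "claude-$$": one ID per calling process."""
    return f"claude-{os.getpid()}"


def session_env(session_id: str) -> dict[str, str]:
    """Copy the current environment and pin AGENT_BROWSER_SESSION to one ID."""
    env = dict(os.environ)  # a copy, so the parent environment is untouched
    env["AGENT_BROWSER_SESSION"] = session_id
    return env


def run_in_session(session_id: str, *args: str) -> subprocess.CompletedProcess:
    """Run one agent-browser command inside the named session.

    Example (hypothetical URL):
        run_in_session("login-tests", "open", "https://example.com/login")
    """
    return subprocess.run(
        ["agent-browser", *args],
        env=session_env(session_id),
        capture_output=True,
        text=True,
        check=False,
    )
```

Because each call gets its own copy of the environment, two workers can use different session IDs without either one changing what the other sees.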
This is the part most automation tools get wrong.
Playwright and Puppeteer download their own Chromium binary — a stripped-down, identifiable browser that websites can detect instantly. They set navigator.webdriver = true. They leave fingerprint mismatches in canvas rendering, WebGL, and device memory. Even with "stealth" plugins, they fail advanced detection systems like Cloudflare and Pixelscan.
agent-browser doesn't have this problem. It connects to your real Chrome — the same browser you use every day, with your real cookies, your real extensions, your real fingerprint. Websites can't tell the difference between you and your AI agent because there is no difference. It's the same browser.
Log in once, stay logged in. Sign into Amazon, Gmail, your bank, whatever. Those cookies persist in your Chrome profile, so the next time your AI agent opens that site it is already authenticated for as long as the site keeps the session valid. No re-entering passwords or repeating 2FA on every run.
Shop on Amazon. Your AI can browse products, compare prices, add items to your cart, and go through checkout — on your real account, with your saved payment methods, at your saved addresses. The same workflow that gets blocked instantly with Playwright just works here because Amazon sees a real Chrome browser with a real browsing history.
Manage any authenticated account. Banking dashboards, social media, email, admin panels, SaaS tools — if you can access it in Chrome, your AI agent can too. Same cookies. Same session. No bot flags.
Get past Cloudflare, CAPTCHAs, and bot walls. Sites that block automated browsers don't block yours — because yours isn't automated in the way they're detecting. There's no navigator.webdriver flag. No stripped-down Chromium binary. No fingerprint inconsistencies. It's your real Chrome, headed and visible.
| | agent-browser | Playwright / Puppeteer |
|---|---|---|
| Browser used | Your real Chrome | Downloaded Chromium binary |
| `navigator.webdriver` | `false` (real browser) | `true` (automation flag) |
| Cookies | Your real cookies, persistent | Fresh/empty every session |
| Browser fingerprint | Genuine (canvas, WebGL, etc.) | Detectable mismatches |
| Bot detection result | Passes as human | Detected and blocked |

    npm install -g agent-browser

    # Start Chrome with debugging enabled
    google-chrome --remote-debugging-port=9222 &

    # Open a website
    agent-browser open https://example.com

    # See what's on the page (AI-readable snapshot)
    agent-browser snapshot -i --compact
    # Output:
    # - heading "Example Domain" [level=1]
    # - paragraph "This domain is for use in illustrative examples..."
    # - link "More information..." [ref=e1]

    # Click the link
    agent-browser click @e1

    # Take a screenshot
    agent-browser screenshot

That `@e1` is an element reference. Instead of writing brittle CSS selectors like `#main > div:nth-child(3) > a.link-class`, you just say "click element 1." The AI reads the snapshot, picks the right ref, and acts on it.
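
If a script rather than an LLM consumes the snapshot, the refs can be pulled out with a small parser. This is a sketch that assumes the snapshot lines look exactly like the sample output in the quick start (`- role "name" [ref=eN]`); the real format may carry more attributes, and `extract_refs` is a hypothetical helper, not part of the CLI.

```python
import re

# Matches snapshot lines such as:  - link "More information..." [ref=e1]
REF_LINE = re.compile(
    r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*\[ref=(?P<ref>e\d+)\]'
)


def extract_refs(snapshot: str) -> dict[str, tuple[str, str]]:
    """Map each ref (for example "e1") to its (role, accessible name)."""
    refs = {}
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if match:  # lines without a [ref=...] marker are skipped
            refs[match["ref"]] = (match["role"], match["name"])
    return refs


# Sample taken from the quick-start output.
SAMPLE = """\
- heading "Example Domain" [level=1]
- paragraph "This domain is for use in illustrative examples..."
- link "More information..." [ref=e1]
"""

if __name__ == "__main__":
    for ref, (role, name) in extract_refs(SAMPLE).items():
        print(f"@{ref}: {role} {name!r}")
```

Only the link carries a ref in the sample, so it is the only entry returned; the heading and paragraph are readable context but not click targets.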
Drop one file and Claude Code knows how to drive a browser:

    mkdir -p ~/.claude/skills/agent-browser
    cp SKILL.md ~/.claude/skills/agent-browser/SKILL.md

Now you can tell Claude things like:
- "Test the login page and make sure it works"
- "Check if the homepage has any console errors"
- "Fill out the contact form and submit it"
- "Take a screenshot of the dashboard"
Claude will use agent-browser automatically — opening the browser, navigating, clicking, filling forms, taking screenshots, and reporting back what it found.
| Command | What it does |
|---|---|
| `open <url>` | Navigate to a URL |
| `snapshot -i --compact` | AI-readable page snapshot (interactive elements only) |
| `snapshot` | Full page structure |
| `click @e1` | Click an element by ref |
| `fill @e1 "text"` | Clear a field and type text |
| `type @e1 "text"` | Append text to a field |
| `hover @e1` | Hover over an element |
| `press Enter` | Press a keyboard key |
| `screenshot` | Capture the viewport as PNG |
| `eval "document.title"` | Run JavaScript in the browser |
| `errors` | Show console errors |
| `back` / `forward` | Navigate browser history |
| `wait --load networkidle` | Wait for the page to finish loading |
| `close` | Close the browser connection |

    Claude Code → agent-browser CLI (Rust) → daemon (Node.js) → Chrome CDP (WebSocket)
                                                |
                                           cdp.js        Raw WebSocket JSON-RPC
                                           browser.js    Page/Locator/Context API
                                           snapshot.js   Accessibility tree + refs
                                           actions.js    Command handlers

cdp.js — The engine. ~950 lines of raw WebSocket CDP transport. Connects to ws://localhost:9222, sends JSON-RPC commands, handles sessions, lifecycle events, dialogs, and network idle detection. No npm CDP libraries.
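
To make "raw WebSocket JSON-RPC" concrete, here is a minimal sketch of CDP message framing in Python. It is not the code in cdp.js: the class and method names are hypothetical, and the WebSocket itself is left out so the framing logic can be read on its own. The frame shape (`id`, `method`, `params`, optional `sessionId`) follows the Chrome DevTools Protocol.

```python
from __future__ import annotations

import itertools
import json


class CdpFramer:
    """Builds CDP command frames and pairs replies with pending commands."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: dict[int, str] = {}  # command id -> method name

    def command(self, method: str, params: dict | None = None,
                session_id: str | None = None) -> str:
        """Serialize one command; Chrome echoes `id` back in its reply."""
        msg_id = next(self._ids)
        frame = {"id": msg_id, "method": method, "params": params or {}}
        if session_id is not None:
            frame["sessionId"] = session_id  # routes the command to one attached target
        self.pending[msg_id] = method
        return json.dumps(frame)

    def receive(self, raw: str) -> tuple[str, str, dict]:
        """Classify an incoming frame as a result, an error, or an event."""
        frame = json.loads(raw)
        if "id" in frame:  # a reply to a command we sent
            method = self.pending.pop(frame["id"])
            if "error" in frame:
                return ("error", method, frame["error"])
            return ("result", method, frame.get("result", {}))
        # Frames without an id are unsolicited events such as Page.loadEventFired.
        return ("event", frame["method"], frame.get("params", {}))
```

Replies and events share one socket, so matching on `id` is what lets a client keep several commands in flight while still reacting to lifecycle events.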
browser.js — Wraps the raw CDP calls into a clean Page/Locator/Context API so the rest of the code doesn't need to think about WebSocket frames.
snapshot.js — Calls Chrome's Accessibility.getFullAXTree() and formats it into the compact text tree with element refs (@e1, @e2, ...) that AI agents read.
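
The formatting step can be sketched as a walk over the flat node list that `Accessibility.getFullAXTree` returns. This is a simplified illustration, not the logic in snapshot.js: the set of roles that receive a ref is an assumption, unnamed nodes are simply skipped, and extra attributes such as `[level=1]` are left out.

```python
from __future__ import annotations

# Roles that receive a ref in this sketch (an assumed list, not the tool's).
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def format_snapshot(nodes: list[dict]) -> tuple[str, dict[str, int]]:
    """Render a flat AXNode list as an indented text tree with eN refs.

    Returns the text plus a map from ref to backendDOMNodeId.
    """
    by_id = {node["nodeId"]: node for node in nodes}
    child_ids = {cid for node in nodes for cid in node.get("childIds", [])}
    roots = [node for node in nodes if node["nodeId"] not in child_ids]
    lines: list[str] = []
    refs: dict[str, int] = {}

    def visit(node: dict, depth: int) -> None:
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        shown = not node.get("ignored", False) and bool(name)
        if shown:
            line = f'{"  " * depth}- {role} "{name}"'
            if role in INTERACTIVE_ROLES and "backendDOMNodeId" in node:
                ref = f"e{len(refs) + 1}"
                refs[ref] = node["backendDOMNodeId"]
                line += f" [ref={ref}]"
            lines.append(line)
        for cid in node.get("childIds", []):
            if cid in by_id:
                # Children of a skipped node stay at the parent's depth.
                visit(by_id[cid], depth + 1 if shown else depth)

    for root in roots:
        visit(root, 0)
    return "\n".join(lines), refs
```

Keeping the ref-to-node map alongside the text is what allows a later `click @e1` to be resolved back to a concrete DOM node.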
actions.js — Maps CLI commands to browser actions. click @e1 resolves the ref, scrolls the element into view, gets its coordinates, and dispatches a click event through CDP.
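
The click path described above can be sketched as a short sequence of CDP calls. The specific methods used here (`DOM.scrollIntoViewIfNeeded`, `DOM.getBoxModel`, `Input.dispatchMouseEvent`) are real CDP methods, but choosing them is an assumption about how actions.js works. `send` is a hypothetical callable that takes a method name and params and returns the result dict.

```python
from typing import Callable


def quad_center(quad: list[float]) -> tuple[float, float]:
    """Center of a CDP quad given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    xs, ys = quad[0::2], quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def click(send: Callable[[str, dict], dict], backend_node_id: int) -> tuple[float, float]:
    """Scroll a node into view, find its center, then press and release there."""
    node = {"backendNodeId": backend_node_id}
    send("DOM.scrollIntoViewIfNeeded", node)
    # Read the box model after scrolling so the coordinates are viewport-relative.
    model = send("DOM.getBoxModel", node)["model"]
    x, y = quad_center(model["content"])
    for event_type in ("mousePressed", "mouseReleased"):
        send("Input.dispatchMouseEvent", {
            "type": event_type,
            "x": x,
            "y": y,
            "button": "left",
            "clickCount": 1,
        })
    return x, y
```

Dispatching real mouse events at coordinates, rather than calling `element.click()` in JavaScript, means the page receives the same input events a human click would produce.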
- AI developers building agents that need to interact with real websites
- Claude Code users who want their AI to test, verify, and automate browser tasks
- Teams running parallel AI agents that need session isolation
- Anyone frustrated with Playwright/Puppeteer bloat who just wants to talk to Chrome
- People who want AI to handle real-world tasks — shopping on Amazon, managing accounts, interacting with sites that block bots
- New developers who want a simple CLI instead of learning a complex automation framework
| Feature | agent-browser | PinchTab | Playwright | Puppeteer | Playwright MCP | Selenium |
|---|---|---|---|---|---|---|
| Direct CDP (no relay) | Yes | No (HTTP→CDP) | No | No | No | No |
| Token-efficient snapshots | ~200-400/page | ~800/page | N/A | N/A | ~13,700/step | N/A |
| Session isolation | Built-in | Per-instance | Manual | Manual | Manual | Manual |
| Install size | Lightweight | 12MB Go binary | ~50MB | ~30MB | ~50MB | ~100MB+ |
| Downloads browsers | No | Yes (its own Chrome) | Yes | Yes | Yes | Yes |
| AI-native refs (`@e1`) | Yes | No | No | No | Yes | No |
| CLI-first design | Yes | No (HTTP API) | No | No | Partial | No |
| Persistent cookies | Yes (real Chrome profile) | No (fresh instances) | No (fresh each run) | No (fresh each run) | No (fresh each run) | No (fresh each run) |
| Invisible to bot detection | Yes (real browser) | No (stealth injection) | No (`webdriver=true`) | No (`webdriver=true`) | No (`webdriver=true`) | No (`webdriver=true`) |
| Visible browser (headed) | Yes — you watch it work | No (headless default) | No (headless default) | No (headless default) | No (headless default) | No (headless default) |
| Cross-browser | Chrome only | Chrome only | Chrome, Firefox, WebKit | Chrome only | Chrome only | All |
The trade-off is intentional: agent-browser only supports Chrome because that's what AI agents need. Dropping Firefox and WebKit means zero bundled browsers, zero extra downloads, and a much simpler codebase.
PinchTab (7,300+ stars) markets itself as "5-13x cheaper than screenshots." That's true — but it's comparing against the worst-case baseline. When you compare PinchTab against agent-browser, the picture flips:
- 2-4x fewer tokens — agent-browser uses ~200-400 tokens per page. PinchTab uses ~800. PinchTab compares itself against screenshots (~3,600+ tokens), not against snapshot-based tools like agent-browser
- One fewer network hop — agent-browser talks directly to Chrome over WebSocket. PinchTab adds an HTTP server between your AI and Chrome (HTTP→CDP), doubling the round trips
- Real Chrome, real cookies — PinchTab launches its own Chrome instances with fresh sessions. agent-browser uses your actual browser with your actual cookies. Log in once, stay logged in
- No HTTP server to manage — agent-browser is a CLI. Call it directly. PinchTab runs a localhost daemon that your AI talks to through HTTP — an extra process to start, monitor, and kill
- 50+ commands vs a basic set — agent-browser includes video recording, network interception, device emulation, frame support, semantic locators, and profiling. PinchTab covers navigate, click, type, and extract
- Headed by default — you watch agent-browser work in your real browser. PinchTab is headless-first — your AI works in a browser you can't see
- No bot detection flags — agent-browser is invisible because it's your real Chrome. PinchTab uses stealth injection, which advanced detection systems can still catch
If you're searching for a PinchTab alternative, browser control for AI agents, or the most token-efficient way to automate Chrome — agent-browser does more with less.
License: Apache-2.0
Author: Caleb Dane (@CalebDane7)
Originally forked from vercel-labs/agent-browser. CDP transport layer (cdp.js, browser.js) rewritten from scratch — zero Playwright code, zero Puppeteer code, zero browser automation library dependencies.
The claims in this README are backed by real benchmarks, migration reports, and industry analysis:
- Closer to the Metal: Leaving Playwright for CDP — browser-use's migration report documenting 5x faster element extraction after dropping Playwright
- Why Vercel's agent-browser Is Winning the Token Efficiency War — 5.7x more test cycles under the same token budget
- Agent-Browser: AI-First Browser Automation That Saves 93% of Your Context Window — Deep dive on token savings
- Playwright CLI: The Token-Efficient Alternative to Playwright MCP — Microsoft's own acknowledgment of the MCP token problem (~114K tokens vs ~27K with CLI)
- MCP vs Playwright CLI: Best Browser Control for Agents — Head-to-head comparison
- CDP vs Playwright vs Puppeteer: Is This the Wrong Question? — Architectural analysis of the relay layer overhead
- Playwright vs Puppeteer: Which to Choose in 2026? — Puppeteer runs 15-20% faster than Playwright on identical Chromium tasks
- Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared — Industry comparison of AI browser approaches
- Top Playwright Alternatives in 2026 — BrowserStack's overview of the alternative landscape
- How to Detect Headless Chrome Bots Instrumented with Playwright: why Playwright's `navigator.webdriver=true` is an instant detection signal
- From Puppeteer Stealth to Nodriver: How Anti-Detect Frameworks Evolved (the industry shift toward CDP-minimal frameworks)
- Stealth AI Browser Agents: Ultimate 2026 Guide — Comprehensive guide on browser fingerprinting and detection evasion
- The Best Headless Chrome Browser for Bypassing Anti-Bot Systems — Testing results showing Playwright/Puppeteer fail advanced detection
- PinchTab — Popular HTTP-based alternative (7,300+ stars). Comparison: agent-browser uses 2-4x fewer tokens (~200-400 vs ~800 per page) and connects directly to Chrome without an HTTP relay
- 11 Best AI Browser Agents in 2026 — Firecrawl's comprehensive review
- Top 10 Browser AI Agents 2026: Complete Review & Guide — o-mega's agent comparison
- The Agentic Browser Landscape in 2026 — Full landscape analysis
- Browser Agent Security Risks: CDP Automation Leaking Cookies — Security considerations for CDP-based agents
