agent-browser

Breaking. Browser automation just got completely replaced.

Completely undetectable. Your real Chrome as you. Playwright and PinchTab are old news. agent-browser replaces them. No screenshots. Reads page structure. 93% less context. Near-instant. 100s of parallel sessions working together agentically. "Find 5-star Amazon sellers and order from the best." "Message 20 Alibaba suppliers." Whatever you do in Chrome, it handles. Ships with a Claude Code skill that researches the UI, plans the smartest path, and verifies its own work. Open source.

Real Chrome, real cookies — log in once, stay logged in. Your AI uses your actual browser session with persistent cookies
Invisible to bot detection — no navigator.webdriver flag, no fingerprint mismatches. Sites see a real human, not a bot
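93% fewer tokens than Playwright MCP — about 7,000 tokens for a 10-step workflow vs about 114,000 (~200-400 tokens per page snapshot). Your AI does more with less
Up to 5x faster — direct WebSocket to Chrome, no middleware relay. browser-use reported 5x faster element extraction after moving from Playwright to raw CDP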
Independent sessions — run multiple AI agents simultaneously on the same machine. Zero conflicts, zero shared state
Headed, not headless — you see everything the AI does in real time. Watch it work, jump in anytime, take over when you want
Claude Code skill included — drop one file and Claude knows how to drive your browser. No setup, no configuration

agent-browser is a CLI that gives AI agents direct control of your real, visible Chrome browser — without Playwright, without Puppeteer, without downloading bundled browser binaries, and without burning through your token budget.

It speaks raw Chrome DevTools Protocol (CDP) over a single WebSocket. That's it. No middleware. No relay servers. No 50MB dependency you never asked for.

Built by Caleb Dane. Originally forked from vercel-labs/agent-browser — CDP transport layer rewritten from scratch.

The Problem with Playwright and Puppeteer

If you're using Playwright or Puppeteer for AI browser automation, here's what's actually happening under the hood:

Your AI  →  Playwright/Puppeteer  →  Node.js WebSocket relay  →  Chrome  →  back through all of that

That middle layer — the Node.js relay — adds an extra network hop on every single browser call. Click a button? Extra hop. Take a screenshot? Extra hop. Read the page? Extra hop. Multiply that by hundreds of operations per session and you get real, measurable slowdowns.

And then there's the size. Playwright alone adds ~50MB to your node_modules. Its browser install step downloads separate Chromium, Firefox, and WebKit builds by default, and an AI agent driving Chrome never uses the last two.

The industry is moving away from this. browser-use reported 5x faster element extraction after dropping Playwright for raw CDP. Stagehand (Browserbase) is making the same move. Even Microsoft built Playwright CLI to work around their own tool's token bloat.

Newer tools like PinchTab still add an HTTP relay layer between your AI and Chrome. agent-browser skips that entirely.

How agent-browser Is Different

Your AI  →  agent-browser  →  Chrome

That's the whole stack. One WebSocket connection. Zero relay layers. Your commands go straight to Chrome and the response comes straight back.
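
As a rough illustration of where that single WebSocket comes from: a Chrome started with --remote-debugging-port serves a small discovery document at /json/version, and its webSocketDebuggerUrl field is the address a CDP client connects to. The sketch below only extracts that URL. The function names are hypothetical, and whether agent-browser discovers the URL this way is an assumption.

```javascript
// Hypothetical helper names; this is not agent-browser's actual code.

// Pull the browser-level WebSocket URL out of the JSON that Chrome
// serves at http://localhost:<port>/json/version.
function pickDebuggerUrl(versionInfo) {
  const url = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof url !== "string" || !url.startsWith("ws://")) {
    throw new Error("Chrome did not report a WebSocket debugger URL");
  }
  return url;
}

// Fetch the discovery document from a running Chrome (needs Node 18+ for
// the global fetch). Not called here, since it requires a live browser.
async function discoverDebuggerUrl(port = 9222) {
  const res = await fetch(`http://localhost:${port}/json/version`);
  return pickDebuggerUrl(await res.json());
}

// Example with a response shaped like Chrome's (the trailing id is made up).
const sample = {
  Browser: "Chrome/120.0.0.0",
  webSocketDebuggerUrl: "ws://localhost:9222/devtools/browser/abc-123",
};
console.log(pickDebuggerUrl(sample));
```

Everything after this step is JSON messages sent over that one socket.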

By the numbers

	agent-browser	Playwright MCP	Playwright CLI
Tokens per page	~200-400	~13,700 per step	~2,700 per step
10-step workflow	~7,000 tokens	~114,000 tokens	~27,000 tokens
Install size	Lightweight (uses your Chrome)	~50MB + browser binaries	~50MB + browser binaries
Network hops per call	1 (direct to Chrome)	2 (relay + Chrome)	2 (relay + Chrome)
Extra browser download?	No — uses your existing Chrome	Yes — downloads Chromium	Yes — downloads Chromium

Under the same token budget, agent-browser runs 5.7x more automation cycles than Playwright MCP. That's not a minor optimization — it's the difference between your AI agent finishing the job or running out of context halfway through.
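
The headline 93% figure can be checked against the "10-step workflow" row of the table. The short calculation below uses only the numbers printed there; the helper name is made up for this example.

```javascript
// Token figures copied from the "10-step workflow" row of the table above.
const workflowTokens = {
  agentBrowser: 7000,
  playwrightMcp: 114000,
  playwrightCli: 27000,
};

// Percentage of tokens saved relative to a baseline, rounded to one decimal.
function savingsPercent(ours, baseline) {
  return Math.round((1 - ours / baseline) * 1000) / 10;
}

console.log(savingsPercent(workflowTokens.agentBrowser, workflowTokens.playwrightMcp)); // 93.9
console.log(savingsPercent(workflowTokens.agentBrowser, workflowTokens.playwrightCli)); // 74.1
```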

You and Your AI Share the Same Browser

This isn't headless automation running invisibly in the background. agent-browser is headed — it controls your real, visible Chrome window. You can watch everything the AI does in real time.

Think of it like handing someone a remote control to your computer:

Watch the AI work — see it click buttons, fill forms, navigate pages, all on your actual screen
Jump in anytime — navigate to a page manually, then tell the AI "now fill out this form" or "click that button"
Hand control back and forth — you browse to the right page, the AI handles the tedious parts, you verify the result
Pair browse — stream the viewport via WebSocket so you can watch from another machine or share with a teammate
Debug in real time — when something goes wrong, you see exactly what the AI sees. No guessing what happened in a headless void

Other automation tools run in a hidden browser you can't see or interact with. agent-browser runs in your browser — the one you're already looking at.

What It Actually Does (Plain English)

If you're new to browser automation, here's the simple version:

agent-browser lets an AI control your Chrome browser the same way you do — it can open websites, click buttons, fill out forms, read what's on the page, and take screenshots. You see everything it does because it's working in your real, visible browser — not some hidden process running in the background.

Here's everything it automates:

Open any website — navigate to URLs, go back, go forward, refresh
Read the page — get a structured snapshot of everything on the page (buttons, links, text fields, headings) that an AI can understand in ~200 tokens instead of thousands
Click things — buttons, links, checkboxes, dropdowns — by simple reference like @e1 instead of fragile CSS selectors
Fill out forms — type into text fields, select options, check boxes
Take screenshots — capture what the page looks like for visual verification
Run JavaScript — execute any code in the browser for advanced automation
Track errors — catch console errors and broken pages automatically
Manage tabs — open new tabs, switch between them, close them
Intercept network requests — mock API responses, block tracking scripts, test error states
Stream the viewport — watch what the browser is doing in real time via screencast

All of this through one simple CLI: agent-browser <command>.

Sessions That Don't Step on Each Other

This is a big deal if you're running multiple AI agents at the same time.

Every session is completely independent. Each AI session (like each Claude Code window) gets its own daemon process through an environment variable:

export AGENT_BROWSER_SESSION="claude-$$"  # $$ expands to the shell's PID, so each session gets a unique ID

What this means in practice:

Session A can be testing your login page while Session B tests the checkout flow — simultaneously, on the same machine
No shared state between sessions — different cookies, different tabs, different browsing history
No race conditions — one agent clicking a button won't interfere with another agent reading a page
Sessions clean up after themselves — close one and the others keep running
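
For agents launched from Node.js rather than a shell, the same idea might look like the sketch below. Only the AGENT_BROWSER_SESSION variable name comes from this README; the helper names and the session labels are illustrative assumptions.

```javascript
const { spawn } = require("node:child_process");

// Build an environment for one isolated session. The variable name comes
// from the README; the "claude-<pid>" naming mirrors the shell example.
function sessionEnv(label, baseEnv = process.env) {
  return { ...baseEnv, AGENT_BROWSER_SESSION: label };
}

// Launch one CLI command inside a named session. Not invoked here,
// because it needs agent-browser installed and Chrome running.
function runInSession(label, args) {
  return spawn("agent-browser", args, {
    env: sessionEnv(label),
    stdio: "inherit",
  });
}

// Two agents get two distinct session IDs, which per the README means
// two separate daemons.
const envA = sessionEnv(`claude-${process.pid}-login`);
const envB = sessionEnv(`claude-${process.pid}-checkout`);
console.log(envA.AGENT_BROWSER_SESSION !== envB.AGENT_BROWSER_SESSION); // true
```

A call such as runInSession("claude-login", ["open", "https://example.com"]) would then run in its own session.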

If you've ever had two Playwright scripts fight over the same browser instance, you know why this matters.

Real Chrome. Real Cookies. Invisible to Bot Detection.

This is the part most automation tools get wrong.

Playwright and Puppeteer download their own Chromium binary — a stripped-down, identifiable browser that websites can detect instantly. They set navigator.webdriver = true. They leave fingerprint mismatches in canvas rendering, WebGL, and device memory. Even with "stealth" plugins, they can still fail advanced detection systems like Cloudflare and Pixelscan.

agent-browser doesn't have this problem. It connects to your real Chrome — the same browser you use every day, with your real cookies, your real extensions, your real fingerprint. Websites can't tell the difference between you and your AI agent because there is no difference. It's the same browser.

What this means in practice

Log in once, stay logged in. Sign into Amazon, Gmail, your bank — whatever. Those cookies persist in your Chrome profile. Next time your AI agent opens that site, it's already authenticated for as long as the site keeps that session valid. No re-entering passwords or repeating 2FA on every run.

Shop on Amazon. Your AI can browse products, compare prices, add items to your cart, and go through checkout — on your real account, with your saved payment methods, at your saved addresses. The same workflow that gets blocked instantly with Playwright just works here because Amazon sees a real Chrome browser with a real browsing history.

Manage any authenticated account. Banking dashboards, social media, email, admin panels, SaaS tools — if you can access it in Chrome, your AI agent can too. Same cookies. Same session. No bot flags.

Get past Cloudflare, CAPTCHAs, and bot walls. Sites that block automated browsers don't block yours — because yours isn't automated in the way they're detecting. There's no navigator.webdriver flag. No stripped-down Chromium binary. No fingerprint inconsistencies. It's your real Chrome, headed and visible.

Why this works

	agent-browser	Playwright / Puppeteer
Browser used	Your real Chrome	Downloaded Chromium binary
`navigator.webdriver`	`false` (real browser)	`true` (automation flag)
Cookies	Your real cookies, persistent	Fresh/empty every session
Browser fingerprint	Genuine (canvas, WebGL, etc.)	Detectable mismatches
Bot detection result	Passes as human	Detected and blocked

Quick Start

Install

npm install -g agent-browser

Use it right now

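# Start Chrome with debugging enabled
# (Chrome 136 and later ignore this flag on the default profile unless
#  --user-data-dir points to a separate directory)
google-chrome --remote-debugging-port=9222 &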

# Open a website
agent-browser open https://example.com

# See what's on the page (AI-readable snapshot)
agent-browser snapshot -i --compact
# Output:
# - heading "Example Domain" [level=1]
# - paragraph "This domain is for use in illustrative examples..."
# - link "More information..." [ref=e1]

# Click the link
agent-browser click @e1

# Take a screenshot
agent-browser screenshot

That @e1 is an element reference. Instead of writing brittle CSS selectors like #main > div:nth-child(3) > a.link-class, you just say "click element 1." The AI reads the snapshot, picks the right ref, and acts on it.
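
To make the "reads the snapshot, picks the right ref" step concrete, here is a minimal sketch that parses the snapshot lines shown above and looks up a ref by role and name. It assumes the line format is exactly as in the example output (names without embedded quotes); the function names are hypothetical.

```javascript
// Parse one line of compact snapshot output, e.g.
//   - link "More information..." [ref=e1]
// into { role, name, ref }. Lines without a ref get ref: null.
function parseSnapshotLine(line) {
  const match = /^\s*-\s+(\w+)\s+"([^"]*)"(.*)$/.exec(line);
  if (!match) return null;
  const refMatch = /\[ref=(e\d+)\]/.exec(match[3]);
  return { role: match[1], name: match[2], ref: refMatch ? refMatch[1] : null };
}

// Find the ref for an element by role and (partial) accessible name.
function findRef(snapshotText, role, nameFragment) {
  for (const line of snapshotText.split("\n")) {
    const node = parseSnapshotLine(line);
    if (node && node.ref && node.role === role && node.name.includes(nameFragment)) {
      return "@" + node.ref;
    }
  }
  return null;
}

const snapshot = [
  '- heading "Example Domain" [level=1]',
  '- paragraph "This domain is for use in illustrative examples..."',
  '- link "More information..." [ref=e1]',
].join("\n");

console.log(findRef(snapshot, "link", "More information")); // @e1
```

The returned string can be passed straight to a command such as click.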

Use It as a Claude Code Skill

Drop one file and Claude Code knows how to drive a browser:

mkdir -p ~/.claude/skills/agent-browser
cp SKILL.md ~/.claude/skills/agent-browser/SKILL.md

Now you can tell Claude things like:

"Test the login page and make sure it works"
"Check if the homepage has any console errors"
"Fill out the contact form and submit it"
"Take a screenshot of the dashboard"

Claude will use agent-browser automatically — opening the browser, navigating, clicking, filling forms, taking screenshots, and reporting back what it found.

Every Command

Command	What it does
`open <url>`	Navigate to a URL
`snapshot -i --compact`	AI-readable page snapshot (interactive elements only)
`snapshot`	Full page structure
`click @e1`	Click an element by ref
`fill @e1 "text"`	Clear a field and type text
`type @e1 "text"`	Append text to a field
`hover @e1`	Hover over an element
`press Enter`	Press a keyboard key
`screenshot`	Capture the viewport as PNG
`eval "document.title"`	Run JavaScript in the browser
`errors`	Show console errors
`back` / `forward`	Navigate browser history
`wait --load networkidle`	Wait for the page to finish loading
`close`	Close the browser connection
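
The commands in the table compose into scripted workflows. The sketch below builds a login check from them and shows one way to run the steps from Node.js. The URL, credentials, and refs are placeholders: real refs have to be read from a fresh snapshot, and the helper names are invented for this example.

```javascript
const { execFileSync } = require("node:child_process");

// Build the argument lists for a login check, using only commands from
// the table above. The refs (@e1..@e3) are placeholders.
function loginWorkflow(url, user, password) {
  return [
    ["open", url],
    ["snapshot", "-i", "--compact"],
    ["fill", "@e1", user],
    ["fill", "@e2", password],
    ["click", "@e3"],
    ["wait", "--load", "networkidle"],
    ["errors"],
  ];
}

// Run each step in order and collect the CLI's output. Not called here,
// because it needs agent-browser installed and Chrome running.
function runWorkflow(steps) {
  return steps.map((args) =>
    execFileSync("agent-browser", args, { encoding: "utf8" })
  );
}

const steps = loginWorkflow("https://example.com/login", "demo", "secret");
console.log(steps.map((s) => "agent-browser " + s.join(" ")).join("\n"));
```

Passing arguments as an array to execFileSync avoids shell quoting problems when the text to fill contains spaces or quotes.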

How It Works Under the Hood

Claude Code  →  agent-browser CLI (Rust)  →  daemon (Node.js)  →  Chrome CDP (WebSocket)
                                                   |
                                                 cdp.js      Raw WebSocket JSON-RPC
                                                 browser.js   Page/Locator/Context API
                                                 snapshot.js  Accessibility tree + refs
                                                 actions.js   Command handlers

cdp.js — The engine. ~950 lines of raw WebSocket CDP transport. Connects to ws://localhost:9222, sends JSON-RPC commands, handles sessions, lifecycle events, dialogs, and network idle detection. No npm CDP libraries.
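
The core of any such transport is matching replies to requests by id. The following is a minimal sketch of that idea, not the contents of cdp.js: the class name is hypothetical, and only the message shape (id, method, params on the way out; id plus result or error on the way back) follows the Chrome DevTools Protocol.

```javascript
// Minimal request/response correlation for CDP-style messages.
// Hypothetical class; the real cdp.js may be organised differently.
class CdpCorrelator {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // id -> { resolve, reject }
  }

  // Serialize a command and register a promise for its reply.
  send(method, params = {}) {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { wire: JSON.stringify({ id, method, params }), promise };
  }

  // Feed one incoming frame. Replies carry an id; events do not.
  // Returns the parsed event, or null if the frame was a reply.
  handle(frame) {
    const msg = JSON.parse(frame);
    if (msg.id === undefined) return msg; // event such as Page.loadEventFired
    const entry = this.pending.get(msg.id);
    if (!entry) return null; // reply to an unknown or already settled id
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
    return null;
  }
}

const cdp = new CdpCorrelator();
const nav = cdp.send("Page.navigate", { url: "https://example.com" });
console.log(nav.wire);
// {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}

cdp.handle('{"id":1,"result":{"frameId":"F1"}}');
nav.promise.then((result) => console.log(result.frameId)); // F1
```

In a real client, the wire string would be written to the WebSocket and handle would be called from its message listener.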

browser.js — Wraps the raw CDP calls into a clean Page/Locator/Context API so the rest of the code doesn't need to think about WebSocket frames.

snapshot.js — Calls Chrome's Accessibility.getFullAXTree() and formats it into the compact text tree with element refs (@e1, @e2, ...) that AI agents read.
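
A simplified sketch of that formatting step is shown below. It takes nodes shaped like the AXNode objects Chrome returns (role and name wrapped in value objects, plus ignored and backendDOMNodeId) and emits lines in the style of the Quick Start output. The list of roles that receive refs and the function name are assumptions, and the real formatter also handles nesting and attributes such as level.

```javascript
// Roles that get a clickable ref in this sketch (an assumed list).
const INTERACTIVE = new Set(["link", "button", "textbox", "checkbox", "combobox"]);

// Turn a flat list of AXNode-like objects into compact snapshot lines.
// Each node mimics the CDP shape: { role: { value }, name: { value } }.
function formatSnapshot(nodes) {
  let counter = 0;
  const refs = new Map(); // ref -> backendDOMNodeId, for later actions
  const lines = [];
  for (const node of nodes) {
    if (node.ignored) continue; // Chrome marks irrelevant nodes as ignored
    const role = node.role ? node.role.value : "generic";
    const name = node.name ? node.name.value : "";
    let line = `- ${role} "${name}"`;
    if (INTERACTIVE.has(role)) {
      const ref = `e${++counter}`;
      refs.set(ref, node.backendDOMNodeId);
      line += ` [ref=${ref}]`;
    }
    lines.push(line);
  }
  return { text: lines.join("\n"), refs };
}

const sampleNodes = [
  { role: { value: "heading" }, name: { value: "Example Domain" }, backendDOMNodeId: 4 },
  { role: { value: "generic" }, ignored: true },
  { role: { value: "link" }, name: { value: "More information..." }, backendDOMNodeId: 9 },
];
const { text, refs } = formatSnapshot(sampleNodes);
console.log(text);
console.log(refs.get("e1")); // 9
```

Keeping the ref-to-node map alongside the text is what lets a later click @e1 find the right DOM node.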

actions.js — Maps CLI commands to browser actions. click @e1 resolves the ref, scrolls the element into view, gets its coordinates, and dispatches a click event through CDP.
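
The coordinate step of that click path can be sketched as follows. DOM.getBoxModel reports an element's content box as a quad of eight numbers, and Input.dispatchMouseEvent takes a type, x, y, button, and clickCount; those are real CDP methods, while the helper names and the exact sequence agent-browser uses are assumptions.

```javascript
// Center of a CDP quad: eight numbers [x1, y1, x2, y2, x3, y3, x4, y4],
// as returned in the "content" field of DOM.getBoxModel.
function quadCenter(quad) {
  let x = 0;
  let y = 0;
  for (let i = 0; i < 8; i += 2) {
    x += quad[i];
    y += quad[i + 1];
  }
  return { x: x / 4, y: y / 4 };
}

// The two mouse commands that make up one left click at the quad's center.
// An assumed full sequence would be: DOM.scrollIntoViewIfNeeded, then
// DOM.getBoxModel to obtain the quad, then these two events in order.
function clickCommands(quad) {
  const { x, y } = quadCenter(quad);
  const mouse = (type) => ({
    method: "Input.dispatchMouseEvent",
    params: { type, x, y, button: "left", clickCount: 1 },
  });
  return [mouse("mousePressed"), mouse("mouseReleased")];
}

// An 80x30 box whose top-left corner is at (100, 200).
const quad = [100, 200, 180, 200, 180, 230, 100, 230];
console.log(quadCenter(quad)); // { x: 140, y: 215 }
console.log(clickCommands(quad).map((c) => c.params.type).join(", "));
// mousePressed, mouseReleased
```

Clicking by coordinates rather than calling element.click() in page JavaScript means the page receives the same input events a physical mouse would produce.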

Who This Is For

AI developers building agents that need to interact with real websites
Claude Code users who want their AI to test, verify, and automate browser tasks
Teams running parallel AI agents that need session isolation
Anyone frustrated with Playwright/Puppeteer bloat who just wants to talk to Chrome
People who want AI to handle real-world tasks — shopping on Amazon, managing accounts, interacting with sites that block bots
New developers who want a simple CLI instead of learning a complex automation framework

Compared to the Alternatives

Feature	agent-browser	PinchTab	Playwright	Puppeteer	Playwright MCP	Selenium
Direct CDP (no relay)	Yes	No (HTTP→CDP)	No	No	No	No
Token-efficient snapshots	~200-400/page	~800/page	N/A	N/A	~13,700/step	N/A
Session isolation	Built-in	Per-instance	Manual	Manual	Manual	Manual
Install size	Lightweight	12MB Go binary	~50MB	~30MB	~50MB	~100MB+
Downloads browsers	No	Yes (its own Chrome)	Yes	Yes	Yes	Yes
AI-native refs (`@e1`)	Yes	No	No	No	Yes	No
CLI-first design	Yes	No (HTTP API)	No	No	Partial	No
Persistent cookies	Yes (real Chrome profile)	No (fresh instances)	No (fresh each run)	No (fresh each run)	No (fresh each run)	No (fresh each run)
Invisible to bot detection	Yes (real browser)	No (stealth injection)	No (`webdriver=true`)	No (`webdriver=true`)	No (`webdriver=true`)	No (`webdriver=true`)
Visible browser (headed)	Yes — you watch it work	No (headless default)	No (headless default)	No (headless default)	Yes (headed by default)	Yes (headed by default)
Cross-browser	Chrome only	Chrome only	Chrome, Firefox, WebKit	Chrome, Firefox	Chrome, Firefox, WebKit	All

The trade-off is intentional: agent-browser only supports Chrome because that's what AI agents need. Dropping Firefox and WebKit means zero bundled browsers, zero extra downloads, and a much simpler codebase.

agent-browser vs PinchTab

PinchTab (7,300+ stars) markets itself as "5-13x cheaper than screenshots." That's true — but it's comparing against the worst-case baseline. When you compare PinchTab against agent-browser, the picture flips:

2-4x fewer tokens — agent-browser uses ~200-400 tokens per page. PinchTab uses ~800. PinchTab compares itself against screenshots (~3,600+ tokens), not against snapshot-based tools like agent-browser
One fewer network hop — agent-browser talks directly to Chrome over WebSocket. PinchTab adds an HTTP server between your AI and Chrome (HTTP→CDP), doubling the round trips
Real Chrome, real cookies — PinchTab launches its own Chrome instances with fresh sessions. agent-browser uses your actual browser with your actual cookies. Log in once, stay logged in
No HTTP server to manage — agent-browser is a CLI. Call it directly. PinchTab runs a localhost daemon that your AI talks to through HTTP — an extra process to start, monitor, and kill
50+ commands vs a basic set — agent-browser includes video recording, network interception, device emulation, frame support, semantic locators, and profiling. PinchTab covers navigate, click, type, and extract
Headed by default — you watch agent-browser work in your real browser. PinchTab is headless-first — your AI works in a browser you can't see
No bot detection flags — agent-browser is invisible because it's your real Chrome. PinchTab uses stealth injection, which advanced detection systems can still catch

If you're searching for a PinchTab alternative, browser control for AI agents, or the most token-efficient way to automate Chrome — agent-browser does more with less.

License

Apache-2.0

Author

Caleb Dane (@CalebDane7)

Originally forked from vercel-labs/agent-browser. CDP transport layer (cdp.js, browser.js) rewritten from scratch — zero Playwright code, zero Puppeteer code, zero browser automation library dependencies.

Research & References

The claims in this README are backed by real benchmarks, migration reports, and industry analysis:

Performance & Token Efficiency

Closer to the Metal: Leaving Playwright for CDP — browser-use's migration report documenting 5x faster element extraction after dropping Playwright
Why Vercel's agent-browser Is Winning the Token Efficiency War — 5.7x more test cycles under the same token budget
Agent-Browser: AI-First Browser Automation That Saves 93% of Your Context Window — Deep dive on token savings
Playwright CLI: The Token-Efficient Alternative to Playwright MCP — Microsoft's own acknowledgment of the MCP token problem (~114K tokens vs ~27K with CLI)
MCP vs Playwright CLI: Best Browser Control for Agents — Head-to-head comparison

CDP vs Playwright vs Puppeteer

CDP vs Playwright vs Puppeteer: Is This the Wrong Question? — Architectural analysis of the relay layer overhead
Playwright vs Puppeteer: Which to Choose in 2026? — Puppeteer runs 15-20% faster than Playwright on identical Chromium tasks
Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared — Industry comparison of AI browser approaches
Top Playwright Alternatives in 2026 — BrowserStack's overview of the alternative landscape

Bot Detection & Real Chrome

How to Detect Headless Chrome Bots Instrumented with Playwright — Why Playwright's navigator.webdriver=true is an instant detection signal
From Puppeteer Stealth to Nodriver: How Anti-Detect Frameworks Evolved — The industry shift toward CDP-minimal frameworks
Stealth AI Browser Agents: Ultimate 2026 Guide — Comprehensive guide on browser fingerprinting and detection evasion
The Best Headless Chrome Browser for Bypassing Anti-Bot Systems — Testing results showing Playwright/Puppeteer fail advanced detection

PinchTab Comparison

PinchTab — Popular HTTP-based alternative (7,300+ stars). Comparison: agent-browser uses 2-4x fewer tokens (~200-400 vs ~800 per page) and connects directly to Chrome without an HTTP relay

AI Browser Agents Landscape

11 Best AI Browser Agents in 2026 — Firecrawl's comprehensive review
Top 10 Browser AI Agents 2026: Complete Review & Guide — o-mega's agent comparison
The Agentic Browser Landscape in 2026 — Full landscape analysis
Browser Agent Security Risks: CDP Automation Leaking Cookies — Security considerations for CDP-based agents
