Give your AI agent a real browser.
AgentBrowse is the browser layer for agent systems that need to work with real web pages — launch a browser, read what's on screen, interact with it, and extract structured data. Your app keeps full control of orchestration and business logic; AgentBrowse handles the page.
Typical workflow:
- open a browser (or attach to an existing one) and get a `session`;
- ask AgentBrowse what's on the page with `observe(...)`;
- act on what you found with `act(...)`;
- use `extract(...)` when you need structured data instead of an action;
- close the session when you are done.
That shape fits naturally into a worker, backend service, CLI, or agent runtime you already have.
Three terms come up repeatedly in the API:
- `session`: the handle AgentBrowse returns from `launch(...)` or `attach(...)`. You pass it into every later call. The session carries browser identity, runtime state, and sticky-owner metadata so healthy commands reuse one browser owner instead of opening a fresh root CDP attach for every call. If you persist a session across process runs, the next command may repair that owner while the underlying browser is still alive; otherwise the session fails closed and you should `launch(...)` or `attach(...)` again.
- `ref` (also `targetRef`, `scopeRef`, `fillRef`): a stable reference returned by `observe(...)`. You act on references, never on raw CSS selectors. Refs are valid for the page state that produced them, not forever. Navigation, route changes, or a major DOM re-render invalidate them; call `observe(...)` again to get fresh refs.
- CDP: the Chrome DevTools Protocol. Chrome, Chromium, and Playwright all speak it. If a browser exposes a CDP WebSocket URL, AgentBrowse can attach to it.
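Because refs expire when the page changes, callers often want a "re-observe and retry once" wrapper. The sketch below is self-contained and uses hypothetical stand-in types (`PageApi`, `Target`, `clickByLabel`) rather than the real AgentBrowse signatures; the `label` field in particular is an assumption made for illustration.

```typescript
// Hypothetical stand-ins for illustration; not the real AgentBrowse types.
type Target = { ref: string; label: string };
type ObserveResult =
  | { success: true; targets: Target[] }
  | { success: false; message: string };
type ActResult = { success: true } | { success: false; message: string };

interface PageApi {
  observe(): Promise<ObserveResult>;
  act(ref: string, action: 'click'): Promise<ActResult>;
}

// Look up a fresh ref by label and click it. If the click fails (for example
// because a re-render invalidated the ref), observe again and retry.
// Returns the number of attempts that were needed.
async function clickByLabel(page: PageApi, label: string, maxAttempts = 2): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const observed = await page.observe();
    if (!observed.success) throw new Error(observed.message);

    const target = observed.targets.find((t) => t.label === label);
    if (!target) throw new Error(`No target labelled "${label}"`);

    const acted = await page.act(target.ref, 'click');
    if (acted.success) return attempt;
  }
  throw new Error(`Click on "${label}" failed after ${maxAttempts} attempts`);
}
```

The point of the design is that a ref is never cached across attempts: each retry starts from a new observation.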
Optionally, AgentBrowse can call an LLM to understand pages at a higher
level. That layer is called the assistive runtime and is only required
for extract(...) and goal-driven observe(session, goal).
```bash
npm i @mercuryo-ai/agentbrowse
```

If you want the operator-facing CLI command, install the separate global CLI package:

```bash
npm i -g @mercuryo-ai/agentbrowse-cli@latest
```

`@mercuryo-ai/agentbrowse` is the library package for imports. It does not
install the `agentbrowse` shell command.
This is the normal managed-browser flow. It does not require LLM setup.
```typescript
import {
  act,
  close,
  launch,
  navigate,
  observe,
  screenshot,
  status,
} from '@mercuryo-ai/agentbrowse';

const launchResult = await launch('https://example.com');
if (!launchResult.success) {
  throw new Error(launchResult.reason ?? launchResult.message);
}
const { session } = launchResult;

try {
  const observeResult = await observe(session);
  if (!observeResult.success) {
    throw new Error(observeResult.reason ?? observeResult.message);
  }

  const firstActionableTarget = observeResult.targets.find(
    (target) => typeof target.ref === 'string',
  );
  if (firstActionableTarget?.ref) {
    const actResult = await act(session, firstActionableTarget.ref, 'click');
    if (!actResult.success) {
      throw new Error(actResult.reason ?? actResult.message);
    }
  }

  const navigateResult = await navigate(session, 'https://example.com/checkout');
  if (!navigateResult.success) {
    throw new Error(navigateResult.reason ?? navigateResult.message);
  }

  const screenshotResult = await screenshot(session, '/tmp/checkout.png');
  if (!screenshotResult.success) {
    throw new Error(screenshotResult.reason ?? screenshotResult.message);
  }

  const statusResult = await status(session);
  if (!statusResult.alive) {
    throw new Error('Browser is no longer reachable.');
  }
} finally {
  await close(session);
}
```

Runnable examples live in `examples/`:
- first run `npm run build` when executing them from this repo
- `npx tsx examples/basic.ts`
- `npx tsx examples/attach.ts`
- `npx tsx examples/extract.ts`
The library entrypoint does not load .env files. Environment loading only
happens in the CLI entrypoint.
Both launch(...) and attach(...) bootstrap the same sticky-owner
lifecycle. That owner may live in-process or in an internal detached host,
but it is not a user-managed daemon contract.
If you already have a browser that exposes a CDP WebSocket URL, use
attach(...) instead of launch(...).
Common sources of a CDP URL:
- a local Chrome or Chromium started with the `--remote-debugging-port` flag;
- a managed cloud browser (Browserbase, Browserless, and similar) that hands you a WebSocket URL;
- any other browser runtime Playwright can reach through CDP.
```typescript
import { attach, observe } from '@mercuryo-ai/agentbrowse';

const attached = await attach('ws://127.0.0.1:9222/devtools/browser/browser-id');
if (!attached.success) {
  throw new Error(attached.reason ?? attached.message);
}

const observeResult = await observe(attached.session);
if (!observeResult.success) {
  throw new Error(observeResult.reason ?? observeResult.message);
}
```

If your provider gives you a labeled remote session, you can carry that label in the session handle:
```typescript
const attached = await attach(remoteCdpUrl, {
  provider: 'browserbase',
});
```

attach(...) is not a separate reconnect mode. It is the second bootstrap
path into the same sticky-owner execution model as launch(...). After
attach succeeds, later browser commands reuse or repair that owner instead of
performing a fresh provider-level root attach on every healthy command.
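For the local Chrome case, the browser-level WebSocket URL can be read from Chrome's `/json/version` HTTP endpoint, which reports it as `webSocketDebuggerUrl`. The sketch below shows that lookup. The helper names `parseCdpUrl` and `discoverCdpUrl`, the default port 9222, and the injectable `fetchJson` parameter are assumptions for illustration and are not part of AgentBrowse.

```typescript
// Hypothetical helpers; AgentBrowse itself only needs the resulting URL.
type FetchJson = (url: string) => Promise<unknown>;

// Pull the browser-level WebSocket URL out of a /json/version payload.
function parseCdpUrl(payload: unknown): string {
  const url = (payload as { webSocketDebuggerUrl?: unknown } | null)?.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !/^wss?:\/\//.test(url)) {
    throw new Error('Response does not contain a webSocketDebuggerUrl');
  }
  return url;
}

// Default fetcher, using the global fetch available in Node 18+.
const defaultFetchJson: FetchJson = async (url) => {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} from ${url}`);
  return response.json();
};

// Ask a locally running Chrome or Chromium for its CDP WebSocket URL.
async function discoverCdpUrl(
  port = 9222,
  fetchJson: FetchJson = defaultFetchJson,
): Promise<string> {
  return parseCdpUrl(await fetchJson(`http://127.0.0.1:${port}/json/version`));
}
```

The returned string is the kind of value `attach(...)` expects as its first argument.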
| API | Use it when | Typical result |
|---|---|---|
| `launch(url?, options?)` | You need a new browser session | session, current url, current title |
| `attach(cdpUrl, options?)` | You already have a running browser that exposes CDP | session, current url, current title |
| `observe(session, goal?)` | You want to understand the page | targets, scopes, signals, fillable forms |
| `act(session, targetRef, action, value?)` | You want to click, type, select, fill, or press | action result and target metadata |
| `navigate(session, url)` | You want to move to another page | page metadata after navigation |
| `extract(session, schema, scopeRef?)` | You want structured JSON from the page | data that matches your schema |
| `screenshot(session, path?)` | You want a screenshot artifact | saved path and page metadata |
| `status(session)` | You want to know whether the session is still healthy | liveness, page info, runtime summary |
| `close(session)` | You are done with the browser | close result |
Two common questions:
- `observe(session)` gives you a general inventory of the page.
- `observe(session, goal)` focuses that inventory around a question such as `"find the checkout total"` or `"find the email field"`.
All main APIs return the same broad result shape:
- success path: `{ success: true, ... }`
- failure path: `{ success: false, error, message, reason, ... }`
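Because every call reports failure through the same fields, the repeated `if (!result.success) throw ...` checks can be folded into one helper. The sketch below relies only on the fields listed above; the names `unwrap`, `Failure`, and `ApiResult`, and the choice to type `error`, `message`, and `reason` as optional strings, are illustrative assumptions rather than the library's published types.

```typescript
// Assumed result shape, based only on the fields listed above.
type Failure = { success: false; error?: string; message?: string; reason?: string };
type Success<T extends object> = { success: true } & T;
type ApiResult<T extends object> = Success<T> | Failure;

// Return the success payload, or throw with the most specific text available.
// `step` names the call that produced the result, for readable error messages.
function unwrap<T extends object>(result: ApiResult<T>, step: string): Success<T> {
  if (result.success) return result;
  const detail = result.reason ?? result.message ?? result.error ?? 'unknown error';
  throw new Error(`${step} failed: ${detail}`);
}
```

The helper prefers `reason` over `message`, matching the order used in the quick-start checks.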
You only need the assistive runtime when AgentBrowse should call an LLM.
In practice, that mainly means:
- `extract(...)`
- better-quality goal-based `observe(session, goal)`
The runtime interface is intentionally small: you provide an object that can create an OpenAI-compatible chat-completions client.
```typescript
// Pseudocode shape only. For a runnable fetch-based adapter, see
// `examples/extract.ts` and `docs/assistive-runtime.md`.
import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  assistiveRuntime: createMyFetchBackedRuntime(),
});
```

The same pattern works for OpenRouter and other OpenAI-compatible backends.
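For orientation, an OpenAI-compatible backend accepts a `POST` to `/chat/completions` with a `model` and a `messages` array, and answers with the text in `choices[0].message.content`. The sketch below builds that request and reads the reply. It does not implement AgentBrowse's runtime interface, which is documented in `docs/assistive-runtime.md`; the helper names, base URL, and model name are placeholders.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Build a chat-completions request for any OpenAI-compatible base URL.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Read the assistant text from a chat-completions response body.
function readFirstChoice(payload: unknown): string {
  const content = (payload as { choices?: Array<{ message?: { content?: unknown } }> } | null)
    ?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('No assistant message in response');
  return content;
}
```

Sending the request is then a matter of `fetch(request.url, request.init)`, and switching providers amounts to changing the base URL and model name.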
Normal usage is explicit-session based:
- call `launch(...)` or `attach(...)`
- keep the returned `session`
- pass that session into later calls
If you want to restore a session across process runs, use the optional store helpers:
```typescript
import {
  createBrowserSessionStore,
  loadBrowserSession,
  saveBrowserSession,
} from '@mercuryo-ai/agentbrowse';

saveBrowserSession(session);
const restored = loadBrowserSession();

const store = createBrowserSessionStore({
  rootDir: '/tmp/my-app/browser-state',
});
store.save(session);
const restoredFromCustomRoot = store.load();
```

Persisted session files contain versioned sticky-owner metadata, not a live
Playwright connection. loadBrowserSession() and custom stores intentionally
return null for incompatible reconnect-era records or incomplete owner
metadata instead of auto-migrating them. After loading a session, call
status(restored) or let the next browser command verify or repair
ownership. If the underlying browser is gone, the session fails closed and
you start fresh with launch(...) or attach(...).
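The load, verify, and fall-back sequence described above can be sketched as follows. The dependencies are injected so the block stays self-contained; `restoreOrLaunch` and the `SessionDeps` interface are hypothetical, and in real code the callbacks would wrap `loadBrowserSession()`, `status(...)`, `launch(...)`, and `saveBrowserSession(...)`.

```typescript
// Hypothetical wiring: each callback wraps the corresponding AgentBrowse call.
interface SessionDeps<S> {
  load(): S | null;                      // e.g. loadBrowserSession()
  isAlive(session: S): Promise<boolean>; // e.g. (await status(session)).alive
  launchFresh(): Promise<S>;             // e.g. launch(...) plus a success check
  save(session: S): void;                // e.g. saveBrowserSession(session)
}

// Reuse a persisted session when its browser is still reachable;
// otherwise start fresh and persist the new handle.
async function restoreOrLaunch<S>(
  deps: SessionDeps<S>,
): Promise<{ session: S; restored: boolean }> {
  const stored = deps.load();
  if (stored !== null && (await deps.isAlive(stored))) {
    return { session: stored, restored: true };
  }
  const session = await deps.launchFresh();
  deps.save(session);
  return { session, restored: false };
}
```

Keeping the liveness check explicit means a stale record never reaches later browser commands.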
There is no separate daemon API to supervise. close(session) is the public
lifecycle boundary for shutting down the internal owner host and, when
applicable, the managed browser itself.
If you want to use a proxy, pass it directly to launch(...):
```typescript
const launchResult = await launch('https://example.com', {
  useProxy: true,
  proxy: 'http://user:pass@proxy.example:8080',
});
```

Diagnostics are optional. If you need tracing or custom logging, use a client:
```typescript
import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  diagnostics: {
    startStep() {
      return {
        finish() {},
      };
    },
  },
});
```
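As one possible use of that hook, the sketch below records how long each step takes. Only `startStep()` returning an object with `finish()` comes from the snippet above; the arguments AgentBrowse passes to these hooks are not shown there, so the sketch ignores them, and `createTimingDiagnostics` is a hypothetical helper.

```typescript
// Hypothetical helper: collects step durations in memory.
// `now` is injectable so the clock can be replaced in tests.
function createTimingDiagnostics(now: () => number = () => Date.now()) {
  const durationsMs: number[] = [];
  return {
    durationsMs,
    diagnostics: {
      // Any arguments supplied by the caller are deliberately ignored here.
      startStep(..._startArgs: unknown[]) {
        const startedAt = now();
        return {
          finish(..._finishArgs: unknown[]) {
            durationsMs.push(now() - startedAt);
          },
        };
      },
    },
  };
}
```

The `diagnostics` property of the returned object has the same shape as the `diagnostics` option in the client example.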
If your package wraps AgentBrowse and you want a stable test helper for the assistive runtime, use the dedicated testing subpath:
```typescript
import {
  installFetchBackedTestAssistiveRuntime,
  uninstallTestAssistiveRuntime,
} from '@mercuryo-ai/agentbrowse/testing';
```
Protected fill is for cases where your application already has sensitive values and wants AgentBrowse to apply them to a previously observed form through a guarded browser execution path.
Import it separately:
```typescript
import { fillProtectedForm } from '@mercuryo-ai/agentbrowse/protected-fill';
```