@mercuryo-ai/agentbrowse

Give your AI agent a real browser.

AgentBrowse is the browser layer for agent systems that need to work with real web pages — launch a browser, read what's on screen, interact with it, and extract structured data. Your app keeps full control of orchestration and business logic; AgentBrowse handles the page.

Typical workflow:

open a browser (or attach to an existing one) and get a session;
ask AgentBrowse what's on the page with observe(...);
act on what you found with act(...);
use extract(...) when you need structured data instead of an action;
close the session when you are done.

That shape fits naturally into a worker, backend service, CLI, or agent runtime you already have.

Key Terms

Three terms come up repeatedly in the API:

session: the handle AgentBrowse returns from launch(...) or attach(...). You pass it into every later call. The session carries browser identity, runtime state, and sticky-owner metadata, so healthy commands reuse one browser owner instead of opening a fresh root CDP attach for every call. If you persist a session across process runs, the next command can repair that owner as long as the underlying browser is still alive; if the browser is gone, the session fails closed and you should call launch(...) or attach(...) again.
ref (also targetRef, scopeRef, fillRef): a stable reference returned by observe(...). You act on references, never on raw CSS selectors. Refs are valid for the page state that produced them, not forever. Navigation, route changes, or a major DOM re-render invalidate them; call observe(...) again to get fresh refs.
CDP: the Chrome DevTools Protocol. Chrome, Chromium, and Playwright all speak it. If a browser exposes a CDP WebSocket URL, AgentBrowse can attach to it.
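
Because refs are tied to the page state that produced them, a caller may want to remember which URL a set of refs came from and re-observe when it changes. The sketch below shows that bookkeeping only. RefSnapshot and needsReobserve are hypothetical names, not part of AgentBrowse, and the check only detects URL changes, so a major DOM re-render on the same URL still calls for a fresh observe(...).

```typescript
// Hypothetical bookkeeping types; not part of the AgentBrowse API.
interface RefSnapshot {
  url: string; // page URL at the time observe(...) ran
  refs: string[]; // refs returned by that observe(...) call
}

// Returns true when cached refs should be discarded and observe(...) called again.
// Conservative heuristic: any URL difference (or no snapshot at all) means "stale".
function needsReobserve(snapshot: RefSnapshot | undefined, currentUrl: string): boolean {
  if (snapshot === undefined) return true;
  return snapshot.url !== currentUrl;
}
```

A caller would store a snapshot after each successful observe(...) and consult needsReobserve(...) before reusing a ref in act(...).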

Optionally, AgentBrowse can call an LLM to understand pages at a higher level. That layer is called the assistive runtime and is only required for extract(...) and goal-driven observe(session, goal).

Install

npm i @mercuryo-ai/agentbrowse

If you want the operator-facing CLI command, install the separate global CLI package:

npm i -g @mercuryo-ai/agentbrowse-cli@latest

@mercuryo-ai/agentbrowse is the library package for imports. It does not install the agentbrowse shell command.

Quick Start

This is the normal managed-browser flow. It does not require LLM setup.

import {
  act,
  close,
  launch,
  navigate,
  observe,
  screenshot,
  status,
} from '@mercuryo-ai/agentbrowse';

const launchResult = await launch('https://example.com');
if (!launchResult.success) {
  throw new Error(launchResult.reason ?? launchResult.message);
}

const { session } = launchResult;

try {
  const observeResult = await observe(session);
  if (!observeResult.success) {
    throw new Error(observeResult.reason ?? observeResult.message);
  }

  const firstActionableTarget = observeResult.targets.find((target) => typeof target.ref === 'string');

  if (firstActionableTarget?.ref) {
    const actResult = await act(session, firstActionableTarget.ref, 'click');
    if (!actResult.success) {
      throw new Error(actResult.reason ?? actResult.message);
    }
  }

  const navigateResult = await navigate(session, 'https://example.com/checkout');
  if (!navigateResult.success) {
    throw new Error(navigateResult.reason ?? navigateResult.message);
  }

  const screenshotResult = await screenshot(session, '/tmp/checkout.png');
  if (!screenshotResult.success) {
    throw new Error(screenshotResult.reason ?? screenshotResult.message);
  }

  const statusResult = await status(session);
  if (!statusResult.alive) {
    throw new Error('Browser is no longer reachable.');
  }
} finally {
  await close(session);
}

Runnable examples live in examples/:

# first run npm run build when executing them from this repo
npx tsx examples/basic.ts
npx tsx examples/attach.ts
npx tsx examples/extract.ts

The library entrypoint does not load .env files. Environment loading only happens in the CLI entrypoint.

Both launch(...) and attach(...) bootstrap the same sticky-owner lifecycle. That owner may live in-process or in an internal detached host, but it is not a user-managed daemon contract.

Attach To An Existing Browser

If you already have a browser that exposes a CDP WebSocket URL, use attach(...) instead of launch(...).

Common sources of a CDP URL:

a local Chrome or Chromium started with the --remote-debugging-port flag;
a managed cloud browser (Browserbase, Browserless, and similar) that hands you a WebSocket URL;
any other browser runtime Playwright can reach through CDP.

import { attach, observe } from '@mercuryo-ai/agentbrowse';

const attached = await attach('ws://127.0.0.1:9222/devtools/browser/browser-id');
if (!attached.success) {
  throw new Error(attached.reason ?? attached.message);
}

const observeResult = await observe(attached.session);
if (!observeResult.success) {
  throw new Error(observeResult.reason ?? observeResult.message);
}
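
For a local Chrome or Chromium started with --remote-debugging-port, the browser-level WebSocket URL can usually be read from the debugging port's /json/version HTTP endpoint, which reports it in a webSocketDebuggerUrl field. The helper names below (extractCdpUrl, discoverLocalCdpUrl) are illustrative, not AgentBrowse APIs, and the sketch assumes a Node version with a global fetch.

```typescript
// Only the field used here; the real /json/version payload has more keys.
interface VersionPayload {
  webSocketDebuggerUrl?: unknown;
}

// Validates the payload and returns the browser-level CDP WebSocket URL.
function extractCdpUrl(payload: VersionPayload): string {
  const url = payload.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !/^wss?:\/\//.test(url)) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Asks a locally running browser for its CDP URL. Host and port defaults are assumptions.
async function discoverLocalCdpUrl(port = 9222, host = '127.0.0.1'): Promise<string> {
  const response = await fetch(`http://${host}:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`Debugging endpoint returned HTTP ${response.status}`);
  }
  return extractCdpUrl((await response.json()) as VersionPayload);
}
```

The returned string is what you would pass to attach(...).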

If your provider gives you a labeled remote session, you can carry that label in the session handle:

const attached = await attach(remoteCdpUrl, {
  provider: 'browserbase',
});

attach(...) is not a separate reconnect mode. It is the second bootstrap path into the same sticky-owner execution model as launch(...). After attach succeeds, later browser commands reuse or repair that owner instead of performing a fresh provider-level root attach on every healthy command.

What Each Main API Does

| API | Use it when | Typical result |
| --- | --- | --- |
| `launch(url?, options?)` | You need a new browser session | `session`, current `url`, current `title` |
| `attach(cdpUrl, options?)` | You already have a running browser that exposes CDP | `session`, current `url`, current `title` |
| `observe(session, goal?)` | You want to understand the page | targets, scopes, signals, fillable forms |
| `act(session, targetRef, action, value?)` | You want to click, type, select, fill, or press | action result and target metadata |
| `navigate(session, url)` | You want to move to another page | page metadata after navigation |
| `extract(session, schema, scopeRef?)` | You want structured JSON from the page | `data` that matches your schema |
| `screenshot(session, path?)` | You want a screenshot artifact | saved path and page metadata |
| `status(session)` | You want to know whether the session is still healthy | liveness, page info, runtime summary |
| `close(session)` | You are done with the browser | close result |

observe(...) can be called in two ways:

observe(session) gives you a general inventory of the page.
observe(session, goal) focuses that inventory around a question such as "find the checkout total" or "find the email field".

All main APIs return the same broad result shape:

success path: { success: true, ... }
failure path: { success: false, error, message, reason, ... }
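
Since every call follows this success/failure shape, the repeated "check success, then throw" pattern from the Quick Start can be folded into one helper. This is a minimal sketch: the SuccessResult and FailureResult types are inferred from the shape described above rather than copied from the library's type definitions, and unwrap is a hypothetical helper name.

```typescript
// Assumed shapes, inferred from this README; the library's real types may carry more fields.
interface FailureResult {
  success: false;
  error?: string;
  message?: string;
  reason?: string;
}

interface SuccessResult {
  success: true;
}

function isFailure(result: SuccessResult | FailureResult): result is FailureResult {
  return result.success === false;
}

// Returns the success payload unchanged, or throws with the most specific failure text.
function unwrap<S extends SuccessResult>(result: S | FailureResult, label: string): S {
  if (isFailure(result)) {
    const detail = result.reason ?? result.message ?? result.error ?? 'unknown error';
    throw new Error(`${label} failed: ${detail}`);
  }
  return result;
}
```

With it, a call site could read `const { session } = unwrap(await launch(url), 'launch');`, provided the library's result types are compatible with the assumed shapes.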

When You Need An Assistive Runtime

You only need an assistive runtime when AgentBrowse should call an LLM.

In practice, that mainly means:

extract(...)
higher-quality goal-based observe(session, goal)

The runtime interface is intentionally small: you provide an object that can create an OpenAI-compatible chat-completions client.

// Pseudocode shape only. For a runnable fetch-based adapter, see
// `examples/extract.ts` and `docs/assistive-runtime.md`.
import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  assistiveRuntime: createMyFetchBackedRuntime(),
});

The same pattern works for OpenRouter and other OpenAI-compatible backends.
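
For orientation, "OpenAI-compatible" here refers to the chat-completions HTTP convention: a POST to `<base URL>/chat/completions` with a bearer token and a JSON body containing `model` and `messages`. The sketch below only builds such a request. buildChatCompletionRequest is a hypothetical helper, and how a fetch-backed runtime is wired into assistiveRuntime is not shown here; the guide in docs/assistive-runtime.md is the authority on that interface.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch options for an OpenAI-compatible chat-completions call.
function buildChatCompletionRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Strip trailing slashes so both ".../v1" and ".../v1/" produce the same URL.
    url: `${baseUrl.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

A fetch-based adapter would then send it with `fetch(request.url, request.init)`; switching backends is a matter of changing the base URL, key, and model name.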

See:

Assistive Runtime Guide

Session Persistence, Proxy, And Diagnostics

Normal usage is explicit-session based:

call launch(...) or attach(...)
keep the returned session
pass that session into later calls

If you want to restore a session across process runs, use the optional store helpers:

import {
  createBrowserSessionStore,
  loadBrowserSession,
  saveBrowserSession,
} from '@mercuryo-ai/agentbrowse';

saveBrowserSession(session);
const restored = loadBrowserSession();

const store = createBrowserSessionStore({
  rootDir: '/tmp/my-app/browser-state',
});

store.save(session);
const restoredFromCustomRoot = store.load();

Persisted session files contain versioned sticky-owner metadata, not a live Playwright connection. loadBrowserSession() and custom stores intentionally return null for incompatible reconnect-era records or incomplete owner metadata instead of auto-migrating them. After loading a session, call status(restored) or let the next browser command verify or repair ownership. If the underlying browser is gone, the session fails closed and you start fresh with launch(...) or attach(...).
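
The load, verify, fall-back sequence described above can be sketched as a small function. To keep the example self-contained, the AgentBrowse calls are passed in as dependencies; restoreOrLaunch, RestoreDeps, and the simplified SessionLike, StatusLike, and LaunchLike shapes are hypothetical, based only on what this README shows (a synchronous load that may return null, a status result with `alive`, and a launch result with `success` and `session`).

```typescript
// Simplified stand-ins for the library's types; assumptions, not the real definitions.
type SessionLike = Record<string, unknown>;
interface StatusLike { alive: boolean }
type LaunchLike =
  | { success: true; session: SessionLike }
  | { success: false; reason?: string; message?: string };

interface RestoreDeps {
  load: () => SessionLike | null; // e.g. loadBrowserSession
  status: (session: SessionLike) => Promise<StatusLike>; // e.g. status
  launch: () => Promise<LaunchLike>; // e.g. () => launch(url)
  save: (session: SessionLike) => void; // e.g. saveBrowserSession
}

// Reuses a persisted session when its browser is still alive; otherwise starts fresh.
async function restoreOrLaunch(deps: RestoreDeps): Promise<SessionLike> {
  const restored = deps.load();
  if (restored !== null) {
    const health = await deps.status(restored);
    if (health.alive) return restored;
  }
  const launched = await deps.launch();
  if (!launched.success) {
    throw new Error(launched.reason ?? launched.message ?? 'launch failed');
  }
  deps.save(launched.session); // persist the fresh session for the next run
  return launched.session;
}
```

Injecting the calls also makes the fallback logic easy to unit-test without a real browser.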

There is no separate daemon API to supervise. close(session) is the public lifecycle boundary for shutting down the internal owner host and, when applicable, the managed browser itself.

If you want to use a proxy, pass it directly to launch(...):

const launchResult = await launch('https://example.com', {
  useProxy: true,
  proxy: 'http://user:pass@proxy.example:8080',
});

Diagnostics are optional. If you need tracing or custom logging, use a client:

import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  diagnostics: {
    startStep() {
      return {
        finish() {},
      };
    },
  },
});
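
As an illustration of what a diagnostics object might do, here is a minimal sketch that measures how long each step takes. It assumes only the shape shown above (startStep() returning an object with finish()); any arguments the library passes to these hooks are ignored, and createTimingDiagnostics is a hypothetical helper name.

```typescript
interface StepHandle {
  finish(): void;
}

interface TimingDiagnostics {
  startStep(): StepHandle;
}

// Reports the elapsed milliseconds between startStep() and finish() for each step.
// The clock is injectable so the behavior can be tested deterministically.
function createTimingDiagnostics(
  report: (elapsedMs: number) => void,
  now: () => number = () => Date.now(),
): TimingDiagnostics {
  return {
    startStep() {
      const startedAt = now();
      return {
        finish() {
          report(now() - startedAt);
        },
      };
    },
  };
}
```

The result could be passed as the `diagnostics` option, for example `createTimingDiagnostics((ms) => console.log(`step took ${ms} ms`))`.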

See:

Configuration Guide

Testing Wrappers Around AgentBrowse

If your package wraps AgentBrowse and you want a stable test helper for the assistive runtime, use the dedicated testing subpath:

import {
  installFetchBackedTestAssistiveRuntime,
  uninstallTestAssistiveRuntime,
} from '@mercuryo-ai/agentbrowse/testing';

See:

Testing Guide

Protected Fill

Protected fill is for cases where your application already has sensitive values and wants AgentBrowse to apply them to a previously observed form through a guarded browser execution path.

Import it separately:

import { fillProtectedForm } from '@mercuryo-ai/agentbrowse/protected-fill';

See:

Protected Fill Guide
