MercuryoAI/agentbrowse

@mercuryo-ai/agentbrowse

npm version | MIT License | Node.js >= 18

Give your AI agent a real browser.

AgentBrowse is the browser layer for agent systems that need to work with real web pages — launch a browser, read what's on screen, interact with it, and extract structured data. Your app keeps full control of orchestration and business logic; AgentBrowse handles the page.

Typical workflow:

  1. open a browser (or attach to an existing one) and get a session;
  2. ask AgentBrowse what's on the page with observe(...);
  3. act on what you found with act(...);
  4. use extract(...) when you need structured data instead of an action;
  5. close the session when you are done.

That shape fits naturally into a worker, backend service, CLI, or agent runtime you already have.
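
The open, use, close shape above can be wrapped in a small helper so the session is always closed, even when a step throws. This is a minimal sketch and not part of AgentBrowse: withSession and its parameter names are hypothetical, and the open and closeSession callbacks stand in for launch(...) or attach(...) and close(...).

```typescript
// Hypothetical helper (not an AgentBrowse API): run `work` with a session and
// always close it afterwards. `open` and `closeSession` are injected so the
// sketch stays independent of the real launch(...)/close(...) signatures.
async function withSession<S, T>(
  open: () => Promise<S>,
  closeSession: (session: S) => Promise<unknown>,
  work: (session: S) => Promise<T>,
): Promise<T> {
  const session = await open();
  try {
    return await work(session);
  } finally {
    // Runs on success and on failure, mirroring step 5 of the workflow.
    await closeSession(session);
  }
}
```

In real use, open would call launch(...), check success, and return the session, while closeSession would simply be close.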

Key Terms

Three terms come up repeatedly in the API:

  • session — the handle AgentBrowse returns from launch(...) or attach(...). You pass it into every later call. The session carries browser identity, runtime state, and sticky-owner metadata so healthy commands reuse one browser owner instead of opening a fresh root CDP attach for every call. If you persist a session across process runs, the next command may repair that owner while the underlying browser is still alive; otherwise the session fails closed and you should launch(...) or attach(...) again.
  • ref (also targetRef, scopeRef, fillRef) — a stable reference returned by observe(...). You act on references, never on raw CSS selectors. Refs are valid for the page state that produced them, not forever. Navigation, route changes, or a major DOM re-render invalidate them — call observe(...) again to get fresh refs.
  • CDP — the Chrome DevTools Protocol. Chrome, Chromium, and Playwright all speak it. If a browser exposes a CDP WebSocket URL, AgentBrowse can attach to it.
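
Because refs are tied to the page state that produced them, it can help to store them together with the URL they were observed on and refuse to reuse them once the page has moved. The sketch below is illustrative only: RefSnapshot, takeSnapshot, and refIfFresh are hypothetical names, and comparing URLs is a deliberate simplification that cannot detect an in-place DOM re-render.

```typescript
// Hypothetical bookkeeping (not an AgentBrowse API): remember which URL a set
// of refs came from, and only hand a ref back while the page is still there.
type RefSnapshot = {
  url: string; // URL of the page when observe(...) produced the refs
  refs: readonly string[];
};

function takeSnapshot(url: string, refs: readonly string[]): RefSnapshot {
  return { url, refs: [...refs] };
}

// Returns the ref when it belongs to the snapshot and the URL is unchanged;
// returns undefined to signal "call observe(...) again".
function refIfFresh(
  snapshot: RefSnapshot,
  currentUrl: string,
  ref: string,
): string | undefined {
  if (snapshot.url !== currentUrl) return undefined;
  return snapshot.refs.includes(ref) ? ref : undefined;
}
```

A caller would treat undefined as the cue to re-run observe(...) and take a new snapshot before acting.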

Optionally, AgentBrowse can call an LLM to understand pages at a higher level. That layer is called the assistive runtime and is only required for extract(...) and goal-driven observe(session, goal).

Install

npm i @mercuryo-ai/agentbrowse

If you want the operator-facing CLI command, install the separate global CLI package:

npm i -g @mercuryo-ai/agentbrowse-cli@latest

@mercuryo-ai/agentbrowse is the library package for imports. It does not install the agentbrowse shell command.

Quick Start

This is the normal managed-browser flow. It does not require LLM setup.

import {
  act,
  close,
  launch,
  navigate,
  observe,
  screenshot,
  status,
} from '@mercuryo-ai/agentbrowse';

const launchResult = await launch('https://example.com');
if (!launchResult.success) {
  throw new Error(launchResult.reason ?? launchResult.message);
}

const { session } = launchResult;

try {
  const observeResult = await observe(session);
  if (!observeResult.success) {
    throw new Error(observeResult.reason ?? observeResult.message);
  }

  const firstActionableTarget = observeResult.targets.find((target) => typeof target.ref === 'string');

  if (firstActionableTarget?.ref) {
    const actResult = await act(session, firstActionableTarget.ref, 'click');
    if (!actResult.success) {
      throw new Error(actResult.reason ?? actResult.message);
    }
  }

  const navigateResult = await navigate(session, 'https://example.com/checkout');
  if (!navigateResult.success) {
    throw new Error(navigateResult.reason ?? navigateResult.message);
  }

  const screenshotResult = await screenshot(session, '/tmp/checkout.png');
  if (!screenshotResult.success) {
    throw new Error(screenshotResult.reason ?? screenshotResult.message);
  }

  const statusResult = await status(session);
  if (!statusResult.alive) {
    throw new Error('Browser is no longer reachable.');
  }
} finally {
  await close(session);
}

Runnable examples live in examples/. When executing them from this repo, run npm run build first:

  • npx tsx examples/basic.ts
  • npx tsx examples/attach.ts
  • npx tsx examples/extract.ts

The library entrypoint does not load .env files. Environment loading only happens in the CLI entrypoint.

Both launch(...) and attach(...) bootstrap the same sticky-owner lifecycle. That owner may live in-process or in an internal detached host, but it is not a user-managed daemon contract.

Attach To An Existing Browser

If you already have a browser that exposes a CDP WebSocket URL, use attach(...) instead of launch(...).

Common sources of a CDP URL:

  • a local Chrome or Chromium started with the --remote-debugging-port flag;
  • a managed cloud browser (Browserbase, Browserless, and similar) that hands you a WebSocket URL;
  • any other browser runtime Playwright can reach through CDP.

import { attach, observe } from '@mercuryo-ai/agentbrowse';

const attached = await attach('ws://127.0.0.1:9222/devtools/browser/browser-id');
if (!attached.success) {
  throw new Error(attached.reason ?? attached.message);
}

const observeResult = await observe(attached.session);
if (!observeResult.success) {
  throw new Error(observeResult.reason ?? observeResult.message);
}
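
For a local Chrome or Chromium started with --remote-debugging-port, the browser-level WebSocket URL can be read from the DevTools HTTP endpoint /json/version, which reports it as webSocketDebuggerUrl. The sketch below shows one way to look it up. The function names are hypothetical, and the default port 9222 and host 127.0.0.1 are assumptions that must match how the browser was started.

```typescript
// Pull the browser-level WebSocket URL out of a /json/version payload.
function readWebSocketDebuggerUrl(payload: unknown): string {
  const url = (payload as { webSocketDebuggerUrl?: unknown } | null)
    ?.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !url.startsWith('ws')) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Ask a local Chrome/Chromium started with --remote-debugging-port=<port>
// for its CDP URL. Uses the global fetch available in Node.js 18+.
async function discoverLocalCdpUrl(
  port = 9222,
  host = '127.0.0.1',
): Promise<string> {
  const response = await fetch(`http://${host}:${port}/json/version`);
  if (!response.ok) {
    throw new Error(`DevTools endpoint answered HTTP ${response.status}`);
  }
  return readWebSocketDebuggerUrl(await response.json());
}
```

The string returned by discoverLocalCdpUrl() is what you would pass to attach(...).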

If your provider gives you a labeled remote session, you can carry that label in the session handle:

const attached = await attach(remoteCdpUrl, {
  provider: 'browserbase',
});

attach(...) is not a separate reconnect mode. It is the second bootstrap path into the same sticky-owner execution model as launch(...). After attach succeeds, later browser commands reuse or repair that owner instead of performing a fresh provider-level root attach on every healthy command.

What Each Main API Does

| API | Use it when | Typical result |
| --- | --- | --- |
| launch(url?, options?) | You need a new browser session | session, current url, current title |
| attach(cdpUrl, options?) | You already have a running browser that exposes CDP | session, current url, current title |
| observe(session, goal?) | You want to understand the page | targets, scopes, signals, fillable forms |
| act(session, targetRef, action, value?) | You want to click, type, select, fill, or press | action result and target metadata |
| navigate(session, url) | You want to move to another page | page metadata after navigation |
| extract(session, schema, scopeRef?) | You want structured JSON from the page | data that matches your schema |
| screenshot(session, path?) | You want a screenshot artifact | saved path and page metadata |
| status(session) | You want to know whether the session is still healthy | liveness, page info, runtime summary |
| close(session) | You are done with the browser | close result |

Two common ways to call observe(...):

  • observe(session) gives you a general inventory of the page.
  • observe(session, goal) focuses that inventory around a question such as "find the checkout total" or "find the email field".

All main APIs return the same broad result shape:

  • success path: { success: true, ... }
  • failure path: { success: false, error, message, reason, ... }
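
Since every call reports failure through the same fields, the repeated "check success, then throw" step can be factored into one helper. The following is a sketch under stated assumptions: unwrap is a hypothetical helper, not an AgentBrowse export, and FailureResult only guesses at the field types of the failure shape listed above.

```typescript
// Assumed failure shape, based on the fields listed above; the exact field
// types are a guess.
type FailureResult = {
  success: false;
  error?: unknown;
  message?: string;
  reason?: string;
};

// Hypothetical helper: return the success variant or throw with the most
// specific text available (reason first, then message).
function unwrap<R extends { success: boolean }>(
  result: R,
): Extract<R, { success: true }> {
  if (!result.success) {
    const failure = result as unknown as FailureResult;
    throw new Error(
      failure.reason ?? failure.message ?? 'AgentBrowse call failed',
    );
  }
  return result as Extract<R, { success: true }>;
}
```

With a helper like this, a call site could read const { session } = unwrap(await launch(url)); instead of an explicit if block.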

When You Need An Assistive Runtime

You only need an assistive runtime when AgentBrowse should call an LLM.

In practice, that mainly means:

  • extract(...)
  • higher-quality goal-based observe(session, goal)

The runtime interface is intentionally small: you provide an object that can create an OpenAI-compatible chat-completions client.

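// Pseudocode shape only. For a runnable fetch-based adapter, see
// `examples/extract.ts` and `docs/assistive-runtime.md`.
import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

// createMyFetchBackedRuntime is a placeholder for a factory you write yourself;
// it must return an object that can create an OpenAI-compatible
// chat-completions client.
const client = createAgentbrowseClient({
  assistiveRuntime: createMyFetchBackedRuntime(),
});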

The same pattern works for OpenRouter and other OpenAI-compatible backends.

See examples/extract.ts and docs/assistive-runtime.md.

Session Persistence, Proxy, And Diagnostics

Normal usage is explicit-session based:

  1. call launch(...) or attach(...)
  2. keep the returned session
  3. pass that session into later calls

If you want to restore a session across process runs, use the optional store helpers:

import {
  createBrowserSessionStore,
  loadBrowserSession,
  saveBrowserSession,
} from '@mercuryo-ai/agentbrowse';

saveBrowserSession(session);
const restored = loadBrowserSession();

const store = createBrowserSessionStore({
  rootDir: '/tmp/my-app/browser-state',
});

store.save(session);
const restoredFromCustomRoot = store.load();

Persisted session files contain versioned sticky-owner metadata, not a live Playwright connection. loadBrowserSession() and custom stores intentionally return null for incompatible reconnect-era records or incomplete owner metadata instead of auto-migrating them. After loading a session, call status(restored) or let the next browser command verify or repair ownership. If the underlying browser is gone, the session fails closed and you start fresh with launch(...) or attach(...).
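
The restore path described above (load, verify, fall back to a fresh start) can be sketched as a single function. This is hypothetical wiring, not an AgentBrowse API: restoreOrStartFresh and RestoreDeps are invented names, and the four callbacks stand in for loadBrowserSession(), status(...), launch(...) or attach(...), and saveBrowserSession(...), so the sketch makes no claims about their exact types.

```typescript
// Hypothetical dependencies, injected so the flow can be exercised with fakes.
type RestoreDeps<S> = {
  load: () => S | null;
  isAlive: (session: S) => Promise<boolean>;
  startFresh: () => Promise<S>;
  save: (session: S) => void;
};

async function restoreOrStartFresh<S>(deps: RestoreDeps<S>): Promise<S> {
  const restored = deps.load();
  // null covers both "nothing saved" and "incompatible or incomplete record".
  if (restored !== null && (await deps.isAlive(restored))) {
    return restored;
  }
  // The browser is gone or the record was unusable: fail closed, start over.
  const fresh = await deps.startFresh();
  deps.save(fresh);
  return fresh;
}
```

In practice isAlive could be built on the alive flag that status(session) returns in the Quick Start example.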

There is no separate daemon API to supervise. close(session) is the public lifecycle boundary for shutting down the internal owner host and, when applicable, the managed browser itself.

If you want to use a proxy, pass it directly to launch(...):

const launchResult = await launch('https://example.com', {
  useProxy: true,
  proxy: 'http://user:pass@proxy.example:8080',
});
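
When proxy credentials contain reserved characters such as "@" or ":", writing the URL by hand is error-prone. The sketch below builds the string with the standard URL class, which percent-encodes the credentials. buildProxyUrl and ProxyParts are hypothetical names, and whether AgentBrowse expects credentials in percent-encoded form is an assumption to verify against your proxy setup.

```typescript
// Hypothetical helper: assemble an HTTP proxy URL with encoded credentials.
type ProxyParts = {
  host: string;
  port: number;
  username?: string;
  password?: string;
};

function buildProxyUrl({ host, port, username, password }: ProxyParts): string {
  const url = new URL(`http://${host}:${port}`);
  // The URL setters percent-encode reserved characters such as "@" or ":".
  if (username !== undefined) url.username = username;
  if (password !== undefined) url.password = password;
  // Drop the trailing "/" that URL serialization adds for an empty path.
  return url.href.replace(/\/$/, '');
}
```

The result can be passed as the proxy option alongside useProxy: true. Note that the URL class omits the port when it is the scheme default (80 for http).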

Diagnostics are optional. If you need tracing or custom logging, use a client:

import { createAgentbrowseClient } from '@mercuryo-ai/agentbrowse';

const client = createAgentbrowseClient({
  diagnostics: {
    startStep() {
      return {
        finish() {},
      };
    },
  },
});
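
As an illustration of what a non-empty diagnostics object might do, the sketch below records how long each step takes. It assumes AgentBrowse calls startStep(...) when a step begins and finish() on the returned handle when it ends. The arguments passed to either method are not documented here, so the sketch accepts and ignores them. createTimingDiagnostics and StepTiming are hypothetical names.

```typescript
type StepTiming = { index: number; durationMs: number };

// Hypothetical factory: returns a diagnostics object in the startStep/finish
// shape shown above, plus the array it fills with per-step durations.
function createTimingDiagnostics(now: () => number = Date.now) {
  const timings: StepTiming[] = [];
  let nextIndex = 0;

  const diagnostics = {
    startStep(..._stepArgs: unknown[]) {
      const index = nextIndex++;
      const startedAt = now();
      let finished = false;
      return {
        finish(..._finishArgs: unknown[]) {
          if (finished) return; // ignore a repeated finish() call
          finished = true;
          timings.push({ index, durationMs: now() - startedAt });
        },
      };
    },
  };

  return { diagnostics, timings };
}
```

You would pass diagnostics to createAgentbrowseClient({ diagnostics }) and inspect timings afterwards. The injectable now parameter exists only to make the helper testable with a fake clock.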

Testing Wrappers Around AgentBrowse

If your package wraps AgentBrowse and you want a stable test helper for the assistive runtime, use the dedicated testing subpath:

import {
  installFetchBackedTestAssistiveRuntime,
  uninstallTestAssistiveRuntime,
} from '@mercuryo-ai/agentbrowse/testing';

Protected Fill

Protected fill is for cases where your application already has sensitive values and wants AgentBrowse to apply them to a previously observed form through a guarded browser execution path.

Import it separately:

import { fillProtectedForm } from '@mercuryo-ai/agentbrowse/protected-fill';
