Give your AI agent a real browser and a disciplined way to operate it.
MagicBrowse is an LLM-first browser automation runtime for systems that need to work with real web pages. It drives Chromium over CDP with a planner and navigator loop, while still exposing deterministic browser verbs for precise click, type, fill, select, and press steps.
Your application stays in charge of business logic, credentials, approvals, and orchestration. MagicBrowse owns the browser session, page observation, goal-driven browser work, guarded deterministic actions, and diagnostics.
The package has four main parts:
- Browser sessions -
launch(...),attach(...),status(...),close(...), and persisted current-session state underMAGICBROWSE_HOME. - Page understanding -
observe(...)andscreenshot(...)return a redacted view of the active page and the actionable/fillable targets on it. - LLM-driven action -
act(...)runs the planner and navigator against an explicit LLM adapter and returns a typed terminal status. - Deterministic apply steps -
click(...),type(...),fill(...),select(...),press(...), open-data fill helpers, and protected-fill helpers execute known actions without asking the LLM to invent values.
Typical workflow:
- open a browser with
launch(...)or connect to one withattach(...); - ask what is on the page with
observe(...), or delegate a bounded goal withact(...); - use deterministic verbs when you already know the target and value;
- stop at protected-data, approval, login, or human-verification edges and let the owning application decide what happens next;
- close the session when the workflow is done.
That shape fits backend workers, CLIs, agent runtimes, browser handoff flows, and application-specific checkout or form automation.
Three terms come up repeatedly in the API:
session- the managed browser session returned bylaunch(...)orattach(...). Top-level calls use the current persisted session; the returned session object exposes the same operations pre-bound to that session id.targetRef- a stable reference returned byobserve(...)for an actionable or fillable page element. Refs are tied to the page state that produced them. Navigation, route changes, or a major DOM re-render can make old refs stale, so callobserve(...)again after meaningful page changes.- CDP - the Chrome DevTools Protocol. Chrome, Chromium, Browserbase, and other browser providers can expose a CDP endpoint that MagicBrowse can launch or attach to.
MagicBrowse does not load .env files in the library entrypoint. Host
applications own configuration loading and pass credentials or LLM adapters
explicitly.
npm i @mercuryo-ai/magicbrowseThe library is ESM-only and requires Node.js >=18.
If you want the operator-facing CLI, install the separate CLI package:
npm i -g @mercuryo-ai/magicbrowse-cli@latest@mercuryo-ai/magicbrowse is the library package for imports. It does not
install the magicbrowse shell command.
This example launches Chromium, asks the browser agent to read a page, and then closes the session.
import {
act,
close,
createDirectLlmAdapter,
launch,
} from '@mercuryo-ai/magicbrowse';
const llmAdapter = createDirectLlmAdapter({
provider: 'openrouter',
apiKey: process.env.OPENROUTER_API_KEY,
navigatorModel: 'google/gemini-2.5-flash',
plannerModel: 'google/gemini-2.5-pro',
});
const session = await launch({
url: 'https://news.ycombinator.com',
headless: true,
});
try {
const result = await act({
goal: 'Report the title and current score of the front-page top story, then stop.',
maxSteps: 5,
llmAdapter,
onEvent(event) {
process.stderr.write(`[${event.actor}.${event.state}] ${event.details}\n`);
},
});
if (result.status !== 'completed') {
throw new Error(result.finalMessage ?? `MagicBrowse stopped with ${result.status}`);
}
process.stdout.write(`${result.finalMessage ?? 'Done.'}\n`);
} finally {
await close({ sessionId: session.id });
}act(...) requires an explicit LLM. Pass either a MagicBrowseLlmAdapter or a
LangChain BaseChatModel. There is no package-level environment fallback.
The built-in direct adapter supports openai, azure-openai, anthropic,
deepseek, gemini, xai, groq, cerebras, ollama, openrouter,
llama, and custom provider families. Product CLIs and apps can pass their
own backend adapter instead.
Use observe(...) when your host application wants to inspect the page before
choosing a deterministic action.
import { click, launch, observe } from '@mercuryo-ai/magicbrowse';
await launch({ url: 'https://example.com', headless: true });
const page = await observe({
includeOrchestration: true,
});
const firstButton = page.orchestration?.actionTargets?.descriptors.find(
(target) => target.kind === 'button'
);
if (firstButton) {
await click({ target: firstButton });
}Deterministic verbs do not call the LLM. They take the observed target plus caller-supplied input and dispatch through the same browser executor used by the LLM-driven path.
If a browser already exposes a CDP endpoint, use attach(...) instead of
launch(...).
Common CDP sources:
- local Chrome or Chromium started with
--remote-debugging-port; - Browserbase or another managed browser provider;
- any browser runtime that gives you a CDP WebSocket endpoint.
import { attach, observe } from '@mercuryo-ai/magicbrowse';
const session = await attach({
cdpUrl: 'ws://127.0.0.1:9222/devtools/browser/browser-id',
});
const page = await observe({ sessionId: session.id });
process.stdout.write(`${page.plannerView}\n`);Attach is not a different API family. It is the second entry point into the
same session manager, run store, and action executor used by launch(...).
MagicBrowse can create or attach to Browserbase sessions when the host provides Browserbase credentials in the process environment.
import { launch } from '@mercuryo-ai/magicbrowse';
await launch({
cloud: true,
url: 'https://example.com',
});Required Browserbase environment variables:
BROWSERBASE_API_KEYBROWSERBASE_PROJECT_ID
Optional variables include BROWSERBASE_REGION and BROWSERBASE_TIMEOUT.
Cloud session metadata is persisted in the MagicBrowse session state, while
provider connection URLs are redacted from returned data.
| API | Use it when | Typical result |
|---|---|---|
launch(options?) |
You need a new owned browser session | managed session, current session file, run record |
attach(options) |
You already have a running CDP browser | managed session bound to the external browser |
currentSession() |
You want the current persisted session handle | session object or undefined |
status() |
You want to know whether the current session is reachable | browser_alive, browser_not_running, or browser_mismatch |
observe(options?) |
You want a redacted page inventory | page identity, signals, action targets, fillable targets |
act(options) |
You want goal-driven browser work through planner + navigator | typed terminal status, steps, final message, optional handoff |
screenshot(options?) |
You want a screenshot of the active page | screenshot bytes or saved file metadata |
click(options) |
You know which observed target to click | deterministic action result |
type(options) |
You want to type text into an observed target | deterministic action result |
fill(options) |
You want to clear and fill an observed target | deterministic action result |
select(options) |
You want to choose an option on an observed select target | deterministic action result |
press(options) |
You want to send a key chord to the focused page | deterministic action result |
fillOpenDataTarget(options) |
You want to apply non-protected user-provided data | field-level fill result |
fillProtectedGroup(options) |
You want to apply protected data through a guarded reader | group-level protected fill result |
submitFormTarget(options) |
You want to submit a known observed form target | submit result |
markCaptchaResolved(options?) |
A human or approved solver resolved CAPTCHA on the current page | single-use trusted continuation marker |
close(options?) |
You are done with the browser | owned browser closed or attached session cleared |
Use machine-readable fields for control flow. finalMessage is for humans and
logs, not for parsing.
MagicBrowseActResult.status is intentionally typed:
completed- the delegated browser task completed.blocked- the task cannot continue without missing input, a different strategy, or a page state change.needs_handoff- the task reached protected data, login/OTP, payment, identity/KYC data, CAPTCHA, or human verification.needs_approval- the next useful action would commit a consequential external side effect and needs explicit approval.failed- runtime, model, or browser-tool failure.max_steps- the step budget ended before a terminal decision.cancelled- caller or user cancellation.
For protected forms, needs_handoff may include
handoff: { kind: 'protected_form', resumeObjective }. After the approved
protected-data owner completes the fill, call act(...) again with that narrow
resumeObjective to continue.
MagicBrowse separates ordinary open-data fills from protected-data fills.
Open data is information the user has already provided clearly for the current task, such as a city, travel date, quantity, or public profile detail.
import { fillOpenDataTarget, observe } from '@mercuryo-ai/magicbrowse';
const page = await observe({ includeOrchestration: true });
const cityTarget = page.orchestration?.fillableTargets?.descriptors.find(
(target) => target.label?.toLowerCase().includes('city')
);
if (cityTarget) {
await fillOpenDataTarget({
target: cityTarget,
value: {
value: 'Singapore',
source: 'user',
},
});
}Protected data is sensitive information such as passwords, OTPs, payment card data, identity documents, bank details, API keys, private keys, and secrets. MagicBrowse can apply protected values only through an explicit artifact reader provided by the caller. The LLM is not asked to invent or handle those values.
Core rules:
- Protected values are not sent to the planner or navigator.
- Observe output and run records are redacted after protected fills.
- Login, identity, checkout, payment, CAPTCHA, and human-verification edges
stop as
needs_handoffunless the caller uses the dedicated guarded path. - Consequential external actions should go through the caller's approval flow before the browser continues.
MagicBrowse stores current-session state and run diagnostics under
MAGICBROWSE_HOME.
Default location:
~/.magicbrowse/
Important files:
current-session.json- active session pointer, CDP endpoint, page identity, and run id.runs/<runId>.json- session events, act starts/results, executor events, LLM trace markers, page identity changes, close/detach events, and redacted browser observations.run-index.json- mapping from session ids to run ids, so repeatedact(...)calls inside one session append to the same record.
Use a separate MAGICBROWSE_HOME per concurrent workflow. The default current
session file is a singleton.
MAGICBROWSE_HOME=/tmp/my-workflow node ./my-browser-worker.mjsMagicBrowse is designed for browser delegation, not unchecked autonomy.
- It should not type passwords, OTPs, payment details, identity details, API
keys, private keys, or secrets through the LLM-driven
act(...)path. - It should not bypass CAPTCHA or human verification.
- It should not submit consequential external actions without caller approval.
- It should not invent missing user data.
- It should fail closed when semantic field matching is unavailable, invalid, or uncertain.
These boundaries are part of the runtime contract, not only prompt wording.
The CLI lives in @mercuryo-ai/magicbrowse-cli.
npm i -g @mercuryo-ai/magicbrowse-cli@latest
magicbrowse --helpCommon commands:
magicbrowse init <apiKey>
magicbrowse doctor
magicbrowse launch https://example.com
magicbrowse observe
magicbrowse act "Find the checkout button and stop before payment"
magicbrowse closeThe CLI owns its credential loading and product integration behavior. The library package remains explicit and host-controlled.
npm install
npm run check-types
npm run build
npm run lint
npm run pack:verify
npm run smoke:pack-installLive browser-agent tests require provider credentials and are intentionally not part of the default public verification workflow.
Mercuryo-authored code is licensed under the MIT License. MagicBrowse also
includes adapted Nanobrowser components under Apache-2.0; see
THIRD_PARTY_NOTICES.md for provenance and license
details.
- Public package: https://www.npmjs.com/package/@mercuryo-ai/magicbrowse
- Source repository: https://github.com/MercuryoAI/magicbrowse
- CLI package: https://www.npmjs.com/package/@mercuryo-ai/magicbrowse-cli
- Issues: https://github.com/MercuryoAI/magicbrowse/issues
Vendor adaptation notes live in vendor-notes/ for
engineers who need to understand the bundled browser/planner internals.