Merged
2 changes: 2 additions & 0 deletions Makefile
@@ -92,6 +92,8 @@ sync: _check-ansible
--rsync-path="sudo -u anton rsync" \
-e "ssh $$SSH_OPTS" \
"$(REPO_ROOT)/" "$$USER@$$IP:$(REMOTE_REPO)/" 2>&1; \
+ echo " ○ Installing remote dependencies..."; \
+ ssh $$SSH_OPTS "$$USER@$$IP" "cd $(REMOTE_REPO) && sudo -u anton bash -c 'pnpm --config.confirmModulesPurge=false --filter=\"./packages/*\" --filter=\"!@anton/desktop\" --filter=\"!@anton/mobile\" install --frozen-lockfile'" 2>&1 || exit 1; \
echo " ○ Building on remote..."; \
ssh $$SSH_OPTS "$$USER@$$IP" "cd $(REMOTE_REPO) && sudo -u anton bash -c 'pnpm -r --filter=\"./packages/*\" --filter=\"!@anton/desktop\" --filter=\"!@anton/mobile\" build'" 2>&1 | tail -10; \
echo " ○ Rebuilding native modules on remote..."; \
1 change: 1 addition & 0 deletions package.json
@@ -46,6 +46,7 @@
],
"pnpm": {
"onlyBuiltDependencies": [
+ "agent-browser",
"better-sqlite3"
]
},
4 changes: 2 additions & 2 deletions packages/agent-config/prompts/system.md
@@ -12,8 +12,8 @@ You are a doer, not a describer. When the user asks you to do something, use you
- **glob**: Find files by pattern (e.g. "*.ts", "**/*.tsx"). **Always use instead of shell find/ls.**
- **list**: List directory contents or show directory tree structure.
- **browser**: Browse and interact with web pages. Two modes:
- - **fetch/extract**: Fast, lightweight content retrieval (no JS). Use for reading articles, docs, APIs behind the scenes.
- - **open** (+ snapshot/click/fill/scroll/screenshot/get/wait/close): Full browser automation with live screenshots shown in the user's sidebar. **Use `open` when the user asks to visit, browse, scrape, or interact with a website** — this shows the browser UI live. Chromium auto-installs on first use.
+ - **fetch/extract**: Fast Lightpanda engine for reading and extracting pages behind the scenes.
+ - **open** (+ snapshot/click/fill/scroll/screenshot/get/wait/close): Visible Chromium browser with persistent Anton cookies/profile and live stream in the user's Browser pane. **Use `open` when the user asks to visit, browse, preview localhost, scrape an app, or interact with a website** — this shows the browser UI live.
- **web_search**: Fast single-pass web search (Exa). Use for **single-fact lookups**, **finding a specific URL**, **quick time-sensitive checks** (price, score, "is X live"), or when you already know what you're searching for. **Do NOT loop this tool** to answer research questions — switch to `web_research`.
- **web_research**: Deep multi-hop research (Parallel). Runs several queries in parallel, fetches pages, synthesises excerpts, returns research-grade results with citations. **PREFER THIS** whenever the user asks for: "give me a brief on X", "overview / background / writeup of X", "research / investigate / look into X", "due diligence on X", "find me reliable sources on X", "what's known about X", "what's the latest on X", "compare X and Y", "X vs Y", or any question you'd otherwise answer with 3+ back-to-back `web_search` calls. One `web_research` call replaces an entire research loop. If not configured, guide the user to enable the Deep Research connector in Settings → Connectors — do NOT silently fall back to looping `web_search`.

8 changes: 2 additions & 6 deletions packages/agent-core/package.json
@@ -33,18 +33,14 @@
"@anton/protocol": "workspace:*",
"@mariozechner/pi-agent-core": "^0.60.0",
"@mariozechner/pi-ai": "^0.60.0",
- "@mozilla/readability": "^0.6.0",
"@sinclair/typebox": "^0.34.0",
+ "agent-browser": "0.26.0",
"autoevals": "^0.0.80",
"braintrust": "^0.0.182",
- "linkedom": "^0.18.12",
- "marked": "^18.0.0",
- "playwright": "^1.52.0",
- "turndown": "^7.2.2"
+ "marked": "^18.0.0"
},
"devDependencies": {
"@types/node": "^22.0.0",
- "@types/turndown": "^5.0.6",
"tsx": "^4.0.0",
"typescript": "^5.6.0"
}
4 changes: 3 additions & 1 deletion packages/agent-core/src/agent.ts
@@ -219,6 +219,8 @@ export interface ToolCallbacks {
screenshot?: string
lastAction: import('@anton/protocol').BrowserAction
elementCount?: number
+ stream?: import('@anton/protocol').BrowserStreamState
+ engine?: import('@anton/protocol').BrowserEngine
}) => void
/** Callback when the browser is closed. */
onBrowserClose?: () => void
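Because `stream` and `engine` are both optional, a consumer of `onBrowserState` can read them defensively and still accept payloads from producers that predate this change. The sketch below illustrates that idea only. The real `BrowserStreamState`, `BrowserEngine`, and `BrowserAction` types live in `@anton/protocol` and are not shown in this diff, so the shapes used here (`active`, `url`, `'chrome' | 'lightpanda'`) and the helper `describeBrowserState` are hypothetical stand-ins.

```typescript
// Hypothetical stand-ins: the real types come from '@anton/protocol'
// and are not visible in this diff.
type BrowserEngine = 'chrome' | 'lightpanda'
interface BrowserStreamState {
  active: boolean
  url?: string
}
type BrowserAction = { type: string }

// Only the fields visible in the hunk; the real payload may carry more.
interface BrowserStatePayload {
  screenshot?: string
  lastAction: BrowserAction
  elementCount?: number
  stream?: BrowserStreamState
  engine?: BrowserEngine
}

// Hypothetical helper: decide what a UI pane could render. Both new fields
// are treated as optional, so older payloads without them still work.
function describeBrowserState(state: BrowserStatePayload): string {
  const engine = state.engine ?? 'unknown engine'
  if (state.stream?.active) return `live stream (${engine})`
  if (state.screenshot) return `static screenshot (${engine})`
  return `no preview (${engine})`
}
```

A payload with neither `stream` nor `engine` falls through to the screenshot branch, which is presumably how pre-existing callers would continue to behave.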
@@ -468,7 +470,7 @@ export function buildTools(
defineTool({
name: BROWSER_TOOL_NAME,
label: 'Browser',
- description: `Web browsing and browser automation. Two modes:\n• **fetch/extract** — Fast, lightweight. Use for reading articles, docs, APIs behind the scenes. No JS execution.\n• **open/snapshot/click/fill/scroll/screenshot/get/wait/close** — Full browser with live screenshots shown in the user sidebar. Use \`open\` when the user asks to visit, browse, scrape, or interact with a website. Chromium auto-installs on first use.\nFor local files, use the ${READ_TOOL_NAME} tool.`,
+ description: `Web browsing and browser automation. Two modes:\n• **fetch/extract** — Fast Lightpanda engine for reading and extracting pages behind the scenes.\n• **open/snapshot/click/fill/scroll/screenshot/get/wait/close** — Visible Chromium browser with persistent Anton cookies/profile and live stream in the desktop Browser pane. Use \`open\` when the user asks to visit, browse, preview localhost, scrape an app, or interact with a website.\nFor local files, use the ${READ_TOOL_NAME} tool.`,
parameters: Type.Object({
operation: Type.Union(
[
2 changes: 2 additions & 0 deletions packages/agent-core/src/harness/codex-harness-session.ts
@@ -879,6 +879,8 @@ export class CodexHarnessSession {
screenshot?: string
lastAction: import('@anton/protocol').BrowserAction
elementCount?: number
+ stream?: import('@anton/protocol').BrowserStreamState
+ engine?: import('@anton/protocol').BrowserEngine
}) {
this.emit({ type: 'browser_state', ...state })
}
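The handler forwards its argument by spreading it into a tagged event, which is why adding `stream` and `engine` only requires touching the parameter type and not the body. A minimal sketch of that pattern follows. The event and field types are trimmed-down, hypothetical versions (the real `SessionEvent` union is larger), and `EventSink` is an illustrative class, not one from this codebase.

```typescript
// Hypothetical, reduced types for illustration only.
type BrowserStateFields = {
  lastAction: string
  elementCount?: number
  engine?: string
}

type SessionEvent =
  | ({ type: 'browser_state' } & BrowserStateFields)
  | { type: 'browser_close' }

class EventSink {
  readonly events: SessionEvent[] = []

  emit(event: SessionEvent): void {
    this.events.push(event)
  }

  onBrowserState(state: BrowserStateFields): void {
    // The spread copies every field of `state`, so new optional fields
    // reach the event without any change to this method body.
    this.emit({ type: 'browser_state', ...state })
  }

  onBrowserClose(): void {
    this.emit({ type: 'browser_close' })
  }
}
```

The discriminant `type` lets a receiver narrow the union before reading browser-specific fields.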
2 changes: 2 additions & 0 deletions packages/agent-core/src/harness/harness-session.ts
@@ -453,6 +453,8 @@ export class HarnessSession {
screenshot?: string
lastAction: import('@anton/protocol').BrowserAction
elementCount?: number
+ stream?: import('@anton/protocol').BrowserStreamState
+ engine?: import('@anton/protocol').BrowserEngine
}) {
this.pushEvent?.({ type: 'browser_state', ...state })
}
10 changes: 9 additions & 1 deletion packages/agent-core/src/index.ts
@@ -71,7 +71,15 @@ export {
type ResolvedProviderToken,
} from './tools/factories.js'
export { initTracing, flushTraces, hashPromptVersion, logSpanFeedback } from './tracing.js'
- export { closeBrowserSession } from './tools/browser.js'
+ export {
+ closeBrowserSession,
+ executeBrowser,
+ getBrowserRuntimeStatus,
+ installBrowserRuntime,
+ refreshVisibleBrowserState,
+ setVisibleBrowserViewport,
+ type BrowserCallbacks,
+ } from './tools/browser.js'
export {
type HarnessAdapter,
ClaudeAdapter,
4 changes: 4 additions & 0 deletions packages/agent-core/src/session.ts
@@ -458,6 +458,8 @@ export type SessionEvent =
screenshot?: string
lastAction: import('@anton/protocol').BrowserAction
elementCount?: number
+ stream?: import('@anton/protocol').BrowserStreamState
+ engine?: import('@anton/protocol').BrowserEngine
}
| { type: 'browser_close' }

@@ -981,6 +983,8 @@ export class Session {
screenshot?: string
lastAction: import('@anton/protocol').BrowserAction
elementCount?: number
+ stream?: import('@anton/protocol').BrowserStreamState
+ engine?: import('@anton/protocol').BrowserEngine
}) {
this.pushEvent?.({ type: 'browser_state', ...state })
}
24 changes: 12 additions & 12 deletions packages/agent-core/src/tools/browser-factory.ts
@@ -1,18 +1,18 @@
/**
- * `browser` — fetch / extract / Playwright automation. Lifted out of
+ * `browser` — Lightpanda-backed fetch / extract plus visible Chrome automation.
+ * Lifted out of
* agent.ts so the harness MCP shim can hand it to Codex / Claude Code.
*
- * The lightweight `fetch` / `extract` operations don't need any
- * callbacks. The full-browser operations (`open` / `snapshot` /
+ * The lightweight `fetch` / `extract` operations use agent-browser's
+ * Lightpanda engine and don't need callbacks. The full-browser operations (`open` / `snapshot` /
* `click` / `fill` / `scroll` / `screenshot` / `get` / `wait` /
- * `close`) drive `onBrowserState` to push live screenshots into the
- * desktop sidebar — same callback shape Pi SDK uses.
+ * `close`) use agent-browser's Chrome engine with an Anton-owned persistent
+ * profile and drive `onBrowserState` to push live state into the desktop
+ * browser pane — same callback shape Pi SDK uses.
*
- * Note on per-session scoping: the underlying Playwright instance in
- * `tools/browser.ts` is process-scoped today, just like in Pi SDK. If
- * we ever run multiple harness sessions concurrently driving a real
- * browser, we'll need to scope it per-session. For now the constraint
- * matches Pi SDK's, so behavior is identical.
+ * Note on per-session scoping: agent-browser sessions are named. The default
+ * visible browser uses `anton-visible`; the background Lightpanda browser uses
+ * `anton-lightpanda`.
*/

import type { AgentTool } from '@mariozechner/pi-agent-core'
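The doc comment describes a split between background operations on a Lightpanda session and visible operations on a Chrome session, each with a fixed session name. One way to picture that routing is sketched below. The operation names and the two session names come from this diff; the helper `targetFor`, the `SessionTarget` shape, and the engine labels `'lightpanda'` / `'chrome'` are assumptions made for illustration and may not match how `tools/browser.ts` actually dispatches.

```typescript
// Operation names as listed in the tool description in this diff.
const BACKGROUND_OPS = ['fetch', 'extract'] as const
const VISIBLE_OPS = [
  'open', 'snapshot', 'click', 'fill', 'scroll',
  'screenshot', 'get', 'wait', 'close',
] as const

type BackgroundOp = (typeof BACKGROUND_OPS)[number]
type VisibleOp = (typeof VISIBLE_OPS)[number]
type BrowserOp = BackgroundOp | VisibleOp

// Hypothetical shape: the engine labels are assumed, not taken from the source.
interface SessionTarget {
  session: 'anton-lightpanda' | 'anton-visible'
  engine: 'lightpanda' | 'chrome'
}

// Hypothetical helper: pick the named session an operation would run in.
function targetFor(op: BrowserOp): SessionTarget {
  const isBackground = (BACKGROUND_OPS as readonly string[]).indexOf(op) !== -1
  return isBackground
    ? { session: 'anton-lightpanda', engine: 'lightpanda' }
    : { session: 'anton-visible', engine: 'chrome' }
}
```

Keeping the operation lists as `as const` tuples means the `BrowserOp` union is derived from them, so adding an operation in one place updates the type as well.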
@@ -26,8 +26,8 @@ export function buildBrowserTool(callbacks?: BrowserCallbacks): AgentTool {
label: 'Browser',
description:
'Web browsing and browser automation. Two modes:\n' +
- '• fetch/extract — Fast, lightweight. Use for reading articles, docs, APIs behind the scenes. No JS execution.\n' +
- '• open/snapshot/click/fill/scroll/screenshot/get/wait/close — Full browser with live screenshots shown in the user sidebar. Use `open` when the user asks to visit, browse, scrape, or interact with a website. Chromium auto-installs on first use.\n' +
+ '• fetch/extract — Fast Lightpanda engine for reading and extracting pages behind the scenes.\n' +
+ '• open/snapshot/click/fill/scroll/screenshot/get/wait/close — Visible Chromium browser with persistent Anton cookies/profile and live stream in the desktop Browser pane. Use `open` when the user asks to visit, browse, preview localhost, scrape an app, or interact with a website.\n' +
'For local files, use the read tool instead.',
parameters: Type.Object({
operation: Type.Union(