diff --git a/packages/bcode-browser/README.md b/packages/bcode-browser/README.md
index e79eecdf0..3100920cb 100644
--- a/packages/bcode-browser/README.md
+++ b/packages/bcode-browser/README.md
@@ -12,7 +12,7 @@ See `decisions.md §1c` (three-level model) and `§1d` (this package) in the Bro
 | `src/browser-execute.ts` | In-process JS-eval `browser_execute` body. |
 | `src/session-store.ts` | Per-opencode-session CDP `Session` map. The agent calls `session.connect(...)` from a snippet; subsequent snippets find the same Session. |
 | `src/skills.ts` | Runtime resolver for embedded skills (extract on first call in compiled mode; in-tree path in dev). |
-| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`), `cloud-browser.md` (Way 3 — provision/stop a Browser Use cloud browser via raw HTTP from inside a snippet), and `interaction-skills/*.md` (UI mechanic reference docs). Embedded into the binary by `script/embed-skills.ts`. |
+| `skills/` | `BROWSER.md` (the agent's prompt for `browser_execute`) and `cloud-browser.md` (Way 3 — provision/stop a Browser Use cloud browser via raw HTTP from inside a snippet). Embedded into the binary by `script/embed-skills.ts`. The interaction-skills set inherited from the Python harness was archived 2026-05-09 — we'll reintroduce only what evals show is needed, one skill at a time. |
 | `script/embed-skills.ts` | Build-time embed; emits `bcode-skills.gen.ts` consumed by the compiled binary. |
 | `test/` | `bun test` smoke coverage for the workspace dynamic-import pattern. |
 
diff --git a/packages/bcode-browser/skills/BROWSER.md b/packages/bcode-browser/skills/BROWSER.md
index ee13cae03..2a7670933 100644
--- a/packages/bcode-browser/skills/BROWSER.md
+++ b/packages/bcode-browser/skills/BROWSER.md
@@ -5,11 +5,11 @@ Use the `browser_execute` tool to run JavaScript against a connected browser via
 
 **Locations:**
 - Workspace (read/write your reusable scripts): `/.bcode/agent-workspace/`.
The bcode CLI runs from the project root, so `./.bcode/agent-workspace/foo.ts` works directly with the `read`/`write`/`edit` tools.
-- Skills (read-only reference docs): `{{SKILLS_DIR}}/`. Run `read {{SKILLS_DIR}}/interaction-skills/` to list every available interaction skill before reading any one of them.
+- Skills (read-only reference docs): `{{SKILLS_DIR}}/`. Currently `BROWSER.md` (this file) and `cloud-browser.md`.
 
 ## The model in one paragraph
 
-`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet. Same mechanism for a 5-line wrapper and a 500-line script. Skills under `{{SKILLS_DIR}}/` are documentation you `read`, not modules you `import` — they teach you the CDP patterns; you write the code.
+`browser_execute` evaluates whatever JS you write against `session`. There is no auto-loaded library, no privileged file, no helper namespace — just `session` and standard JS globals. To reuse code from a previous snippet, save it as a `.ts` file under `./.bcode/agent-workspace/` (using the `write` tool) and `await import("/abs/path?t=" + Date.now())` it from a later snippet. The import takes an **absolute** path — construct it from `process.cwd()` inside the snippet. Same mechanism for a 5-line wrapper and a 500-line script.
 
 ## Connecting
 
@@ -148,8 +148,6 @@ await session.Page.captureScreenshot({ format: "png" })
 // for the rare case you want to process it programmatically.
 ```
 
-For the full menu of UI mechanics — dropdowns, dialogs, iframes, shadow DOM, uploads, scrolling, screenshots-with-highlights — list `{{SKILLS_DIR}}/interaction-skills/` to see all available topics, then read the relevant one.
-
 ## Switching browsers mid-session
 
 You own the connection. To swap:
@@ -202,7 +200,7 @@ Cache-bust (`?t=${Date.now()}`) is your responsibility: without it, edits to the
 ## When something doesn't work
 
 - **`session.Page.navigate` hangs forever** → the page is showing a native dialog. Use `session.Page.handleJavaScriptDialog({ accept: true })` to dismiss.
-- **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Read `{{SKILLS_DIR}}/interaction-skills/iframes.md` or `shadow-dom.md`.
+- **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Walk frames via `Page.getFrameTree` / `Target.attachToTarget`, or pierce shadow roots with `element.shadowRoot.querySelector(...)`.
 - **Actions silently no-op** → the page is mid-load. After `Page.navigate`, await `session.waitFor("Page.loadEventFired")` before driving inputs.
 - **Connection refused, 403, or `WS closed before open` on connect()** → see the Way 1 failure-mode list above. Most often: the `chrome://inspect/#remote-debugging` checkbox isn't ticked, or the Chrome 144+ "Allow remote debugging?" popup hasn't been clicked. Pass `{ profileDir, timeoutMs: 30000 }` (Way 1, user's profile) to wait up to 30s for the click, or fall back to Way 2.
 - **Cloud `connect()` fails after a successful provision** → check that `cdp_url` came back in the POST response; some BU regions return `cdpUrl` (camelCase) — accept both. See `{{SKILLS_DIR}}/cloud-browser.md`.
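Reviewer note on the replacement "Selectors don't find elements" bullet above: with `iframes.md` archived, the bullet now carries the whole frame-walking recipe in one sentence. The following is a minimal sketch of the first half, assuming the standard CDP `Page.getFrameTree` result shape (`frameTree.frame` plus optional `childFrames`). `flattenFrameTree` is a hypothetical helper, not part of bcode-browser, and the `session.Page.getFrameTree()` call in the trailing comment assumes `session` exposes it the same way it exposes the other `Page` methods. Frames that run as separate CDP targets may not appear in this tree, which is where `Target.attachToTarget` comes in.

```typescript
// Fields of a CDP frame that this sketch reads (the real object has more).
interface Frame {
  id: string;
  parentId?: string;
  url: string;
}

// Nested shape returned under `frameTree` by CDP `Page.getFrameTree`.
interface FrameTree {
  frame: Frame;
  childFrames?: FrameTree[];
}

interface FlatFrame {
  id: string;
  url: string;
  depth: number;
}

// Hypothetical helper: depth-first flatten so frames can be searched by URL.
function flattenFrameTree(tree: FrameTree, depth = 0): FlatFrame[] {
  const out: FlatFrame[] = [{ id: tree.frame.id, url: tree.frame.url, depth }];
  for (const child of tree.childFrames ?? []) {
    out.push(...flattenFrameTree(child, depth + 1));
  }
  return out;
}

// Possible use inside a snippet (assumes a connected `session`):
//   const { frameTree } = await session.Page.getFrameTree();
//   const frames = flattenFrameTree(frameTree);
```

The flat list keeps each frame's `depth`, so a snippet can tell a top-level document from a nested frame before deciding which one to evaluate in.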
diff --git a/packages/bcode-browser/skills/interaction-skills/connection.md b/packages/bcode-browser/skills/interaction-skills/connection.md
deleted file mode 100644
index b619c2e41..000000000
--- a/packages/bcode-browser/skills/interaction-skills/connection.md
+++ /dev/null
@@ -1,104 +0,0 @@
-# Connection & Tab Visibility
-
-## Just call `session.connect()`
-
-No args required. It scans OS-specific profile dirs for every running Chromium-based browser (Chrome, Chromium, Edge, Brave, Arc, Vivaldi, Opera, Comet, Canary), picks the most-recently-launched one whose WebSocket accepts, and attaches. Dead ports and permission-denied (403) candidates fall through in <100ms each, so the loop is fast.
-
-```js
-await session.connect()
-```
-
-Inspect what's available (e.g. to let the user choose) with `detectBrowsers()`:
-
-```js
-const browsers = await detectBrowsers()
-// [{ name: 'Google Chrome', profileDir, port, wsPath, wsUrl, mtimeMs }, ...]
-```
-
-### Explicit forms (override auto-detect)
-
-Use only when auto-detect picks the wrong browser or you already know the destination.
-
-| Form | When |
-|---|---|
-| `{ profileDir }` | Target a specific running browser. Reads its `DevToolsActivePort` directly. OS-agnostic. |
-| `{ wsUrl }` | You already have `ws://…/devtools/browser/`. |
-
-```js
-await session.connect({ profileDir: '/Users//Library/Application Support/Google/Chrome' })
-await session.connect({ wsUrl: 'ws://127.0.0.1:9222/devtools/browser/' })
-```
-
-### Timeouts and the Allow popup
-
-Per-candidate WS-open timeout defaults to **5s**. A live browser either opens or closes the connection within ~100ms, so 5s is always enough — unless the user has to click **Allow** on Chrome's remote-debugging popup.
In that case, pass `timeoutMs: 30000` to give them time:
-
-```js
-await session.connect({ profileDir, timeoutMs: 30_000 })
-```
-
-If `session.connect()` reports `No detected browser accepted a connection`, it means every browser with `DevToolsActivePort` answered 403 or closed without opening — most likely the user hasn't clicked Allow yet. Ask them to, then retry.
-
-## The omnibox popup problem
-
-When Chrome opens fresh, the only CDP `type: "page"` targets may be `chrome://inspect` and `chrome://omnibox-popup.top-chrome/` (a 1px invisible viewport). If you attach to the omnibox popup, every subsequent action happens on a tab the user cannot see.
-
-`listPageTargets()` already filters `chrome://` and `devtools://` URLs. If you call `Target.getTargets` directly, filter these manually:
-
-```js
-const { targetInfos } = await session.Target.getTargets({})
-const realTabs = targetInfos.filter(t =>
-  t.type === 'page' &&
-  !t.url.startsWith('chrome://') &&
-  !t.url.startsWith('devtools://')
-)
-```
-
-If no real pages exist yet, create one instead of attaching to nothing:
-
-```js
-const tabs = await listPageTargets()
-let targetId = tabs[0]?.targetId
-if (!targetId) {
-  ({ targetId } = await session.Target.createTarget({ url: 'about:blank' }))
-}
-await session.use(targetId)
-```
-
-## Startup sequence
-
-```js
-await session.connect() // 1. auto-detect the running browser
-const tabs = await listPageTargets() // 2. real pages only (chrome:// already filtered)
-let targetId = tabs[0]?.targetId
-if (!targetId) { // 3. handle the empty case (fresh window, omnibox-only)
-  ({ targetId } = await session.Target.createTarget({ url: 'about:blank' }))
-}
-await session.use(targetId) // 4. route Page/DOM/Runtime/Network to that target
-await session.Target.activateTarget({ targetId }) // 5. bring it visually to front
-await session.Page.enable() // 6.
enable the domains you need
-```
-
-## CDP target order ≠ visible tab-strip order
-
-When the user says "the first tab I can see", do NOT trust the order of `Target.getTargets`. Use:
-
-- A screenshot (`session.Page.captureScreenshot()`) to identify visually.
-- Page title / URL heuristics.
-- Or platform UI automation (macOS: AppleScript; Linux: `xdotool`/`wmctrl`).
-
-`Target.activateTarget` only switches to a targetId you already know — it cannot resolve "leftmost tab".
-
-## Bringing Chrome to front
-
-```bash
-# macOS — prefer AppleScript over `open -a` (reuses current profile, avoids the profile picker)
-osascript -e 'tell application "Google Chrome" to activate'
-
-# Linux (X11) — use wmctrl or xdotool
-wmctrl -a 'Google Chrome'
-xdotool search --name 'Google Chrome' windowactivate
-
-# Windows (PowerShell)
-powershell -NoProfile -Command "(New-Object -ComObject WScript.Shell).AppActivate('Google Chrome')"
-```
diff --git a/packages/bcode-browser/skills/interaction-skills/cookies.md b/packages/bcode-browser/skills/interaction-skills/cookies.md
deleted file mode 100644
index f72984b72..000000000
--- a/packages/bcode-browser/skills/interaction-skills/cookies.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Cookies
-
-Use `Network.*` for cookies scoped to the attached page/context; use `Storage.getCookies` / `Storage.setCookies` for every cookie in the browser.
-
-## Read
-
-```js
-await session.Network.enable({})
-
-// All cookies visible to the attached page (current origin + its frames)
-const { cookies } = await session.Network.getCookies({})
-
-// Cookies for specific URLs
-const { cookies: github } = await session.Network.getCookies({
-  urls: ['https://github.com/'],
-})
-
-// Every cookie across the whole browser (requires Storage domain)
-const { cookies: all } = await session.Storage.getCookies({})
-```
-
-Shape: `{ name, value, domain, path, expires, size, httpOnly, secure, session, sameSite?, sourceScheme?, priority? }`.
-
-## Write
-
-```js
-// Single cookie on the attached page
-await session.Network.setCookie({
-  name: 'session',
-  value: 'abc123',
-  domain: '.example.com',
-  path: '/',
-  secure: true,
-  httpOnly: true,
-  sameSite: 'Lax',
-  expires: Date.now() / 1000 + 86400, // seconds since epoch
-})
-
-// Bulk import (e.g. to preload an auth session)
-await session.Network.setCookies({
-  cookies: [
-    { name: 'a', value: '1', domain: '.example.com', path: '/' },
-    { name: 'b', value: '2', domain: '.example.com', path: '/' },
-  ],
-})
-```
-
-## Delete / clear
-
-```js
-await session.Network.deleteCookies({ name: 'session', domain: '.example.com' })
-await session.Network.clearBrowserCookies() // nukes everything in the default context
-```
-
-## Gotchas
-
-- `Network.setCookie` silently fails with no error if `domain` doesn't match any origin in the current profile — you'll get `{ success: true }` and the cookie just won't be there. Verify with `getCookies` after.
-- `expires` is seconds (float), **not** milliseconds. A common mistake.
-- Session cookies: pass no `expires` and Chrome treats them as session-scoped. Setting `expires: 0` also works.
-- `sameSite` values are `'Strict'` | `'Lax'` | `'None'`. For `'None'`, Chrome also requires `secure: true`.
-- Clearing cookies does NOT clear localStorage/IndexedDB. For a full logout, also call `Storage.clearDataForOrigin({ origin, storageTypes: 'all' })`.
diff --git a/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md b/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md
deleted file mode 100644
index d0909b570..000000000
--- a/packages/bcode-browser/skills/interaction-skills/cross-origin-iframes.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Cross-Origin Iframes (OOPIFs)
-
-Cross-origin iframes (stripe.com checkout, recaptcha, Salesforce Lightning, Azure blades) run in **out-of-process iframes (OOPIFs)** with their own CDP target. You cannot reach them via `contentDocument` from the parent.
-
-## First try: coordinate clicks
-
-Compositor-level input passes through OOPIFs transparently. If the thing you want is a button you can see in a screenshot, try this first — it's simpler, undetectable, and doesn't need attaching to anything:
-
-```js
-// Click a "Pay" button inside a Stripe iframe by page coordinates
-await session.Input.dispatchMouseEvent({ type: 'mousePressed', x, y, button: 'left', clickCount: 1 })
-await session.Input.dispatchMouseEvent({ type: 'mouseReleased', x, y, button: 'left', clickCount: 1 })
-```
-
-Coordinate-based typing also works if you click first, then `Input.insertText`/`Input.dispatchKeyEvent`.
-
-## When you need DOM inside the OOPIF
-
-Find the iframe target and route Runtime/DOM calls to it. Remember the parent's `targetId` first so you can switch back:
-
-```js
-// Capture the parent target before switching — `session.use` doesn't expose it.
-const parentTargetId = (await session.Target.getTargets({}))
-  .targetInfos.find(t => t.type === 'page' && !t.url.startsWith('chrome://'))?.targetId
-
-const { targetInfos } = await session.Target.getTargets({})
-const iframe = targetInfos.find(t => t.type === 'iframe' && t.url.includes('stripe.com'))
-if (!iframe) {
-  // OOPIF targets are lazy. Interact with the parent input first
-  // (a coordinate click on the card-number area), then re-query Target.getTargets.
-  throw new Error('Stripe iframe target not present yet — interact and retry')
-}
-
-// Route subsequent calls to the iframe target
-await session.use(iframe.targetId)
-
-await session.Runtime.enable()
-const { result } = await session.Runtime.evaluate({
-  expression: 'document.querySelector("[name=cardnumber]").value',
-  returnByValue: true,
-})
-
-// Switch back to the parent page when done
-if (parentTargetId) await session.use(parentTargetId)
-```
-
-`session.use(iframe.targetId)` auto-attaches if not already attached, and routes Page/DOM/Runtime/Network to it.
`Target.*` and `Browser.*` always hit the browser endpoint regardless of `use`.
-
-## Which target is which?
-
-`Target.getTargets` returns **all** OOPIFs in the page, flat. If multiple iframes share an origin (e.g. multiple Stripe Elements), you need more than URL to disambiguate:
-
-- Filter by URL path (`cardNumber` vs `cardExpiry` vs `cvc` in Stripe).
-- Enumerate in DOM order from the parent: find all `