[STG-2404] feat(cli): attribute usage via install_id + cli_version on sessions and cloud API headers #2277
…nd cloud API headers
Stamp a stable anonymous install_id and cli_version onto remote browser session userMetadata and onto cloud Search/Fetch request headers so CLI-driven Browserbase usage is attributable to an install/account.
- Extract the anonymous install-id resolver into a shared lib/identity.ts module with an async resolveInstallId (memoized) + sync peekInstallId cache, plus a getCliVersion helper that reads the package's own version (the same source oclif uses). Telemetry imports from here; behavior is byte-identical (same marker path, UUID-on-miss, swallow-on-failure).
- remote.ts: remoteStagehandOptions() is now async and adds install_id + cli_version to userMetadata; falls back to browse_cli + cli_version if the install id can't be resolved. Interface, local-only stub, and the session-manager call site updated accordingly.
- api.ts: send x-bb-client and (when resolved) x-bb-install-id on both the raw request helper (search) and the SDK client defaultHeaders (fetch). Install-id resolution is kicked off lazily and never blocks a request.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
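The memoized resolver, the non-blocking peek, and the header builder described in this commit might fit together roughly as follows. This is a minimal sketch, not the code in identity.ts or api.ts: `loadOrCreateInstallId` is a hypothetical stand-in for the marker-file logic, and `attributionHeaders` taking the version as a parameter is an assumption made to keep the example self-contained.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stand-in for the real resolver, which reads/writes a marker file.
async function loadOrCreateInstallId(): Promise<string> {
  return randomUUID();
}

let cachedInstallId: string | undefined;
let pending: Promise<string> | undefined;

// Async and memoized: every caller shares the same in-flight promise.
function resolveInstallId(): Promise<string> {
  if (!pending) {
    pending = loadOrCreateInstallId().then((id) => {
      cachedInstallId = id;
      return id;
    });
  }
  return pending;
}

// Sync peek: returns the id only if resolution has already completed.
function peekInstallId(): string | undefined {
  return cachedInstallId;
}

// x-bb-client is always sent; x-bb-install-id only once an id is known,
// so a request never waits on disk I/O and never carries an empty header.
function attributionHeaders(cliVersion: string): Record<string, string> {
  const headers: Record<string, string> = {
    "x-bb-client": `browse-cli/${cliVersion}`,
  };
  const id = peekInstallId();
  if (id) headers["x-bb-install-id"] = id;
  return headers;
}
```

The trade-off in this shape is that the very first request in a process may go out without `x-bb-install-id` if resolution has not finished yet.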
🦋 Changeset detected
Latest commit: 855738c
The changes in this PR will be included in the next version bump. This PR includes changesets to release 0 packages. When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.
4 issues found across 8 files
Confidence score: 3/5
- packages/cli/src/lib/identity.ts can race on concurrent first-run install-id creation, so different processes may persist/use different IDs and produce inconsistent attribution or downstream identity behavior. Reserve the marker file with exclusive create and, on EEXIST, read and reuse the persisted value before merging.
- packages/cli/src/lib/driver/remote.disabled.ts exposes a Promise-shaped API but throws synchronously, which can bypass callers' .catch handling and surface as an unhandled error path in local-only mode. Return a rejected Promise instead (or make callers handle sync throws explicitly) to align behavior.
- packages/cli/src/lib/cloud/api.ts and packages/cli/src/lib/driver/remote.ts add attribution/header logic without focused coverage, so regressions in always-on fields vs conditional install_id could slip through unnoticed. Add targeted unit/integration tests for header/session success and fallback paths before merging.
Architecture diagram
sequenceDiagram
participant CLI as CLI Command
participant Identity as identity.ts
participant Telemetry as telemetry.ts
participant Remote as remote.ts
participant CloudAPI as cloud/api.ts
participant SDK as Browserbase SDK
participant Platform as Browserbase Platform
Note over CLI,Platform: Module Load Time
CloudAPI->>CloudAPI: Kick off resolveInstallId(process.env)
CloudAPI->>Identity: resolveInstallId()
Identity->>Identity: Check cachedInstallId
alt Cache miss (first call)
Identity->>Identity: Read marker file from disk
alt File not found
Identity->>Identity: Generate UUID
Identity->>Identity: Write marker file (best-effort)
end
Identity->>Identity: Cache result
end
Identity-->>CloudAPI: install_id (promise, not awaited here)
Note over CLI,Platform: Remote Session Flow (e.g., browse open --remote)
CLI->>Remote: remoteStagehandOptions()
Remote->>Identity: getCliVersion()
Identity-->>Remote: "x.y.z"
Remote->>Remote: Build userMetadata { browse_cli: "true", cli_version: "x.y.z" }
Remote->>Identity: resolveInstallId(process.env)
alt Resolution succeeds
Identity-->>Remote: "uuid-abc-123"
Remote->>Remote: Add install_id to userMetadata
else Resolution fails (caught)
Remote->>Remote: Continue without install_id
end
Remote-->>CLI: StagehandConstructorOptions with userMetadata
CLI->>SDK: Stagehand.init() with userMetadata
SDK->>Platform: Create session with userMetadata
Platform-->>SDK: Session created
SDK-->>CLI: Stagehand instance
Note over CLI,Platform: Cloud Search/Fetch Flow (e.g., browse cloud search)
CLI->>CloudAPI: search() / fetch()
Note over CloudAPI: attributionHeaders() called
CloudAPI->>Identity: getCliVersion()
Identity-->>CloudAPI: "x.y.z"
CloudAPI->>CloudAPI: Set x-bb-client: browse-cli/x.y.z
CloudAPI->>Identity: peekInstallId()
alt install_id already resolved
Identity-->>CloudAPI: "uuid-abc-123"
CloudAPI->>CloudAPI: Set x-bb-install-id: uuid-abc-123
else Still resolving or failed
Identity-->>CloudAPI: undefined
CloudAPI->>CloudAPI: Omit x-bb-install-id header
end
alt Raw request path (search)
CloudAPI->>Platform: GET /v1/search with x-bb-client + possibly x-bb-install-id
Platform-->>CloudAPI: Search results
else SDK client path (fetch)
CloudAPI->>SDK: createBrowserbaseClient() with defaultHeaders
SDK->>Platform: GET /v1/fetch with x-bb-client + possibly x-bb-install-id
Platform-->>SDK: Fetch response
SDK-->>CloudAPI: Markdown/text
end
CloudAPI-->>CLI: Command output
Note over CLI,Platform: Telemetry Flow (unchanged, now shared)
CLI->>Telemetry: initialize telemetry
Telemetry->>Identity: resolveInstallId(env, sessionId)
Identity-->>Telemetry: install_id (same as above)
Telemetry->>Platform: POST telemetry events with install_id
The Browserbase session-create endpoint validates userMetadata values against [\w\-_,;:.()&$%#@!?~] and rejects anything else with HTTP 400. A semver version with a +build suffix (or any other unexpected char) would break every remote session. Add `toMetadataValue()` in identity.ts to strip disallowed chars and truncate to 64 chars, then wrap cli_version and install_id before stamping them into userMetadata. HTTP headers are unaffected (no validator constraint there). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
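A sanitizer along the lines this commit describes can be sketched as below. The character class and the 64-character cap come from the commit message; the constant names and the exact implementation are assumptions, not the code in identity.ts.

```typescript
// Negation of the allowed set quoted above: [\w\-_,;:.()&$%#@!?~]
// (underscore is already covered by \w).
const DISALLOWED_METADATA_CHARS = /[^\w\-,;:.()&$%#@!?~]/g;
const MAX_METADATA_VALUE_LENGTH = 64;

// Strip characters the validator would reject, then truncate.
function toMetadataValue(value: string): string {
  return value
    .replace(DISALLOWED_METADATA_CHARS, "")
    .slice(0, MAX_METADATA_VALUE_LENGTH);
}
```

With this sketch a version such as `0.9.0+build.5` loses only the `+`, while a UUID (hex digits and hyphens) passes through unchanged.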
1 issue found across 2 files (changes from recent commits).
…ub, attribution tests
- remote.disabled.ts: make remoteStagehandOptions() properly async so Promise .catch() handlers catch the rejection (was throwing synchronously despite a Promise return type)
- identity.ts: make install-id persistence atomic using exclusive-create (open flag 'wx') + EEXIST re-read fallback; concurrent first-run processes now converge on the same id rather than racing on a non-atomic writeFile (this was pre-existing behavior moved from telemetry, now hardened)
- tests/identity-attribution.test.ts: focused unit tests for toMetadataValue (allowed chars, truncation, semver +build suffix), remoteStagehandOptions userMetadata (browse_cli+cli_version always present, install_id included on success, omitted on failure without throwing)
- tests/remote-disabled.test.ts: update remoteStagehandOptions assertion to use rejects.toThrow() matching the now-async signature
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
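The difference between the old and new stub behavior can be illustrated with a small sketch. The function names and the error message here are hypothetical; only the sync-throw versus rejected-Promise distinction is taken from the commit message.

```typescript
// Before: typed as returning a Promise, but the throw happens synchronously,
// so it escapes before any .catch() handler can be attached.
function throwsSync(): Promise<never> {
  throw new Error("remote mode is disabled in this build");
}

// After: `async` turns the same throw into a rejected Promise.
async function rejectsAsync(): Promise<never> {
  throw new Error("remote mode is disabled in this build");
}

// A caller that relies only on Promise-style error handling.
function callWithCatch(fn: () => Promise<never>): Promise<string> {
  return fn().catch((err: Error) => `handled: ${err.message}`);
}
```

`callWithCatch(rejectsAsync)` resolves to the handled message, whereas `callWithCatch(throwsSync)` throws out of `callWithCatch` itself, which is the failure mode the review flagged.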
1 issue found across 4 files (changes from recent commits).
The previous EEXIST recovery path read back the marker file once; if that
file was EMPTY (another process created it via 'wx' but hadn't written yet,
or a stale empty marker), the re-read returned "" and the function fell
through to return a non-persisted in-memory id — diverging across
concurrent first runs and defeating the intended convergence to one id.
Rewrite resolveAnonymousInstallId as a bounded retry loop (5 attempts):
1. read marker; non-empty id wins → return it
2. exclusive-create ('wx') + write our id → we won the race, return it
3. EEXIST (file exists but empty) → tiny backoff (1<<attempt ms) and loop
so the next read picks up the winner's id
4. non-EEXIST open error (read-only/permissions) → best-effort in-memory id
After the loop (marker stayed empty, no winner ever wrote) → take ownership
with a truncating writeFile so we still return a persisted id.
Net: returns an existing non-empty id if present; otherwise persists and
returns one. No non-persistent return path remains.
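The retry loop described above could be sketched like this. It follows the numbered steps in the commit message, but it is an illustration under assumptions: the helper names, the up-front `mkdir`, and the best-effort handling of the final write are not confirmed by the source.

```typescript
import { mkdir, open, readFile, writeFile } from "node:fs/promises";
import { randomUUID } from "node:crypto";
import { dirname } from "node:path";

const MAX_ATTEMPTS = 5;
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Missing or unreadable marker is treated the same as an empty one.
async function readMarker(path: string): Promise<string> {
  try {
    return (await readFile(path, "utf8")).trim();
  } catch {
    return "";
  }
}

async function resolveAnonymousInstallId(path: string): Promise<string> {
  const candidate = randomUUID();
  // Assumption: create the parent directory best-effort before the loop.
  await mkdir(dirname(path), { recursive: true }).catch(() => undefined);
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    // 1. An existing non-empty id always wins.
    const existing = await readMarker(path);
    if (existing) return existing;
    try {
      // 2. Exclusive create: "wx" fails with EEXIST if the file exists.
      const handle = await open(path, "wx");
      try {
        await handle.writeFile(candidate, "utf8");
      } finally {
        await handle.close();
      }
      return candidate;
    } catch (err) {
      // 4. Anything other than EEXIST (read-only dir, permissions):
      //    fall back to a best-effort in-memory id.
      if ((err as { code?: string }).code !== "EEXIST") return candidate;
      // 3. File exists but is still empty: brief backoff, then re-read.
      await sleep(1 << attempt);
    }
  }
  // Marker stayed empty: take ownership with a truncating write
  // (swallowing a write failure here is an assumption of this sketch).
  await writeFile(path, candidate, "utf8").catch(() => undefined);
  return candidate;
}
```

With five attempts the backoff adds up to 1 + 2 + 4 + 8 + 16 = 31 ms in the worst case before the loop gives up and takes ownership of the empty marker.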
Add a focused test for the empty-marker case: pre-create an EMPTY file at
the install-id path, call resolveInstallId, assert it returns a non-empty
id AND the file is now non-empty and equals the returned id. Also cover
existing-id passthrough and fresh-create persistence. Uses vi.resetModules()
to defeat the module-level install-id cache between cases.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 issue found across 2 files (changes from recent commits).
…ge.json fs read
getCliVersion previously read the CLI's own package.json from disk on first use. Replace that with a single source of truth: seed the version once from oclif's `this.config.version` at startup.
- identity.ts: remove the package.json read (and the readFileSync import). Add `setCliVersion(version)` (stores only truthy values) and make `getCliVersion()` return the cached value or "unknown", with no fs access.
- base.ts: add a `BrowseCommand.init()` override that calls `setCliVersion(this.config.version)` after super.init(). Every command extends BrowseCommand, so the cache is seeded in whichever process builds a session or header.
Process coverage (the crux): the background `browse daemon` process, spawned detached by daemon/client.ts:spawnDaemon, is what actually builds the remote userMetadata and creates the Browserbase session (session-manager.ts → remoteStagehandOptions). Because `Daemon extends BrowseCommand` and oclif runs init() before run(), the daemon seeds setCliVersion(this.config.version) before runDriverDaemon ever builds userMetadata. No reliance on env/argv plumbing.
Verified live: a daemon-created remote session reports cli_version="0.9.0" (not "unknown"), and `cloud fetch` sends x-bb-client: browse-cli/0.9.0.
- tests: getCliVersion now returns "unknown" when unseeded (no fs fallback); add focused setCliVersion/getCliVersion tests (seeded value, empty-seed ignored, prior value preserved on empty re-seed); update the remoteStagehandOptions test to seed via setCliVersion and assert the seeded cli_version.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
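The seed-once version cache described in this commit is small enough to sketch in full. The function names match the commit message; the module-level variable name and the accepted parameter type are assumptions.

```typescript
// Module-level cache, seeded once at startup (for example from a command's init()).
let cachedCliVersion: string | undefined;

// Stores only truthy values, so an empty or missing seed never clobbers
// a version that was set earlier.
function setCliVersion(version: string | undefined): void {
  if (version) cachedCliVersion = version;
}

// No filesystem access: returns the seeded value or the "unknown" sentinel.
function getCliVersion(): string {
  return cachedCliVersion ?? "unknown";
}
```

The design choice here is that `getCliVersion()` stays synchronous and side-effect free, which is what lets the header and metadata builders call it without any I/O.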
1 issue found across 3 files (changes from recent commits).
…undary Add a focused test that loads the package's real oclif Config, runs a BrowseCommand subclass's init() lifecycle, and asserts getCliVersion() is seeded from config.version (not "unknown") afterward. This exercises the exact path used by both the foreground command and the background `browse daemon` process that creates Browserbase sessions. Uses fileURLToPath for a Windows-safe config root. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eate Until now only the driver `open --remote` path stamped attribution onto sessions; `cloud sessions create` set no userMetadata at all, so those CLI-created sessions were unattributable. Stamp browse_cli + install_id + cli_version onto the create body (sanitized via toMetadataValue), preserving any user-supplied userMetadata while keeping the attribution keys authoritative. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 issue found across 2 files (changes from recent commits).
…o prevent spoofing When install-id resolution fails, existing user-supplied `install_id` from --body/--stdin was retained in the final userMetadata. Strip it before spreading `existing` so the field is only present when we resolve it authoritatively. Add a focused regression test: caller passes install_id in --body with no marker file → final body must not contain the spoofed value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
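The merge rule this commit describes (user metadata preserved, attribution keys authoritative, caller-supplied `install_id` never retained) might look like the sketch below. `buildUserMetadata` and its signature are hypothetical, and the `toMetadataValue` sanitization step is left out to keep the example short.

```typescript
type Metadata = Record<string, unknown>;

function buildUserMetadata(
  existing: Metadata,
  cliVersion: string,
  installId: string | undefined,
): Metadata {
  // Strip any caller-supplied install_id before spreading, so the field is
  // only present when it was resolved authoritatively.
  const rest: Metadata = { ...existing };
  delete rest.install_id;
  return {
    ...rest,
    // Attribution keys are written last so they override user-supplied values.
    browse_cli: "true",
    cli_version: cliVersion,
    ...(installId ? { install_id: installId } : {}),
  };
}
```

Because the attribution keys come after the spread, a body that sets `browse_cli` to `"false"` still ends up with `"true"`, while unrelated user keys pass through untouched.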
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 issue found across 2 files (changes from recent commits).
…all-id with migration from legacy path Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
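The lookup order for the standardized marker path, as the PR description lays it out (file override first, then `BROWSERBASE_CONFIG_DIR`, then `XDG_CONFIG_HOME`, then `~/.config`), could be sketched as follows. `installIdPath` is a hypothetical helper, and treating `BROWSERBASE_CONFIG_DIR` as the directory that directly contains `install-id` is an assumption; legacy migration is not shown.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

function installIdPath(env: Env, home: string = homedir()): string {
  // The explicit file override short-circuits everything else.
  if (env.BROWSERBASE_TELEMETRY_INSTALL_ID_FILE) {
    return env.BROWSERBASE_TELEMETRY_INSTALL_ID_FILE;
  }
  // Assumption: this variable points at the browserbase config dir itself.
  if (env.BROWSERBASE_CONFIG_DIR) {
    return join(env.BROWSERBASE_CONFIG_DIR, "install-id");
  }
  // Same fallback on every platform: XDG_CONFIG_HOME, else ~/.config.
  const configHome = env.XDG_CONFIG_HOME || join(home, ".config");
  return join(configHome, "browserbase", "install-id");
}
```

Passing `env` and `home` explicitly keeps the function pure, which makes the precedence rules easy to test without touching the real home directory.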
The override path is creatable, so resolveInstallId succeeds with a fresh UUID rather than failing; the test's real guarantee is that the caller's spoofed install_id is stripped. Reword the comment and drop the inaccurate "when resolution fails" from the test name (Cubic P3).
What & why
CLI-driven Browserbase usage isn't fully attributable today:
- Remote sessions carry `userMetadata.browse_cli:"true"`, but carry no install/version, so we can't tie usage to an install or correlate with the CLI's anonymous PostHog telemetry.
- `browse cloud search` (`/v1/search`) and `browse cloud fetch` create no session at all, so they're invisible in session metadata.
This PR stamps a stable anonymous `install_id` + `cli_version` onto both paths.

Changes (`packages/cli` only)
- `src/lib/identity.ts`: single source for install identity: `resolveInstallId` (async, memoized, atomic write via exclusive-create + EEXIST re-read), `peekInstallId` (sync, never blocks), `getCliVersion`, and `toMetadataValue` (sanitizes session-metadata values). Install-id logic moved verbatim out of `telemetry.ts`; its tests pass unchanged.
- `driver/remote.ts`: `remoteStagehandOptions()` adds sanitized `install_id` + `cli_version` to `userMetadata` (made async; resolver awaited with a safe fallback so telemetry never throws). Interface, local-only stub (now properly async so `.catch` works), and the call site updated.
- `lib/cloud/api.ts` sends `x-bb-client: browse-cli/<version>` (+ `x-bb-install-id` when resolved) on both transports: the raw `requestBrowserbaseJson` helper (covers `search`, sessions, contexts, projects, extensions) and `createBrowserbaseClient()` `defaultHeaders` (covers `fetch`, functions). Never emits empty-value headers.

Why sanitize values
Browserbase session-create runs `validateMetadataObject`: values must match `[\w\-_,;:.()&$%#@!?~]` and total ≤512 chars. A `+build` semver would otherwise 400 every remote session, so `cli_version`/`install_id` are passed through `toMetadataValue()` before reaching `userMetadata`. (HTTP headers are unconstrained, so the full version stays in `x-bb-client`.)

E2E Test Matrix
- `<local build> open https://example.com --remote` → `cloud sessions list --status RUNNING`: `userMetadata={stagehand:"true", browse_cli:"true", install_id:"<uuid>", cli_version:"0.9.0"}`. `install_id` equals the on-disk marker. (Server-side ingestion out of scope.)
- `cloud search "..."` and `cloud fetch https://example.com` against a local capture server (`--base-url`): `/v1/search` and `/v1/fetch` each received `x-bb-client: browse-cli/0.9.0` + `x-bb-install-id: <uuid>`.
- `cloud search "..."` and `cloud fetch https://example.com` (live API): `200` + markdown.
- `dist/lib/identity.js`: seed legacy `~/…/cli/telemetry-id` = `1111…5555`, then `resolveInstallId` with `XDG_CONFIG_HOME` pointed at a temp dir: `1111…5555`; new `<tmp>/browserbase/install-id` contains `1111…5555` (fresh-dir case mints a new uuid instead).
- `~/Library/.../telemetry-id` id copied to `~/.config/browserbase/install-id`, legacy file intact.
- `dist`: (a) `HOME=/tmp/...` writable, (b) read-only `0555` dir, (c) `ENOTDIR` under `/dev/null`, (d) unwritable `HOME=/var/empty`, plus real `browse --version` and `browse cloud sessions list` under unwritable `HOME`: `/tmp/.../.config/browserbase/install-id`; (b)(c)(d) return a valid in-memory UUID, nothing written, no throw; real CLI exits cleanly (version prints; cloud cmd returns a clean `401`, not an FS crash); no illegal writes under `/var/empty`. `.catch`. On Lambda (`HOME=/tmp`) it persists; if the dir/file is unwritable it degrades to a per-invocation in-memory id without failing the command.
- `turbo run build --filter=browse` · `pnpm lint` · `pnpm test:cli`

Review follow-ups (addressed)
All 5 Cubic threads resolved: async local-only stub (so `.catch` works), atomic install-id write (race-safe on concurrent first runs; pre-existing behavior, hardened), and 12 focused unit tests for `toMetadataValue` (allowed-char filtering, `+build` stripping, truncation, UUID round-trip), the attribution headers, and the `remoteStagehandOptions` success + fallback paths.

Dependency / follow-up (not in this PR)
Session `userMetadata` keys are queryable in Snowflake today (`STG_SESSIONS.SESSION_METADATA`). The search/fetch headers only become useful once Platform logs `x-bb-client`/`x-bb-install-id` on those endpoints (`/v1/fetch` has an unpopulated `fetch_tasks.headers` column not yet in the Estuary mirror; `/v1/search` writes no DB row). This is tracked as a server-side follow-up.

Update: also tags `cloud sessions create`
Extended attribution to the `browse cloud sessions create` path too (previously only the driver `open --remote` path carried it): its `userMetadata` now includes `browse_cli` + `install_id` + `cli_version` (sanitized via `toMetadataValue`), merged with any user-supplied `--body` metadata while keeping the attribution keys authoritative, so a user can't spoof `browse_cli` to `"false"`. Every CLI-created session is therefore attributable, whether it comes from the driver or from `cloud sessions create`. Verified live: `cloud sessions create` → readback `userMetadata: { browse_cli:"true", install_id:"…", cli_version:"0.9.0" }` → released. +2 tests (292 total cli tests).

Update: standardized install-id path (review follow-up)
Per review (thanks @pirate), moved the anonymous install-id marker off the bespoke per-OS path (`~/Library/Application Support/Browserbase/cli/telemetry-id`, `%APPDATA%/Browserbase/cli/telemetry-id`, `<xdg>/browserbase/cli/telemetry-id`) to the standardized `~/.config/browserbase/install-id`, consistent with core (`BROWSERBASE_CONFIG_DIR`) and the CLI's own `~/.config/browserbase/skills`. Honors `BROWSERBASE_CONFIG_DIR`, falls back to `XDG_CONFIG_HOME` / `~/.config` on every platform; the `BROWSERBASE_TELEMETRY_INSTALL_ID_FILE` override still short-circuits everything (incl. migration).

Backwards-compatible: if the canonical file is absent but a legacy marker exists, its UUID is copied forward so existing installs keep their stable id (no attribution reset); the legacy file is left intact. Renamed `telemetry-id` → `install-id` since it's no longer telemetry-only (and `install-id`, not `device-id`, because it's a per-install id, not a hardware fingerprint). Considered and declined `node-machine-id`: a cross-app hardware id doesn't fix ephemeral-fleet counting (unpredictable across customer image strategies) and conflicts with the anonymous, install-scoped intent.

Closes STG-2404.
🤖 Generated with Claude Code