[STG-2404] feat(cli): attribute usage via install_id + cli_version on sessions and cloud API headers #2277

Merged: shrey150 merged 11 commits into main from shrey/cli-install-id-attribution on Jun 26, 2026

Conversation

@shrey150 commented Jun 25, 2026 · Contributor

What & why

CLI-driven Browserbase usage isn't fully attributable today:

  • Remote browser sessions the CLI creates are tagged userMetadata.browse_cli:"true", but carry no install/version, so we can't tie usage to an install or correlate with the CLI's anonymous PostHog telemetry.
  • browse cloud search (/v1/search) and browse cloud fetch create no session at all, so they're invisible in session metadata.

This PR stamps a stable anonymous install_id + cli_version onto both paths.

Changes (packages/cli only)

  • New src/lib/identity.ts — single source for install identity: resolveInstallId (async, memoized, atomic write via exclusive-create + EEXIST re-read), peekInstallId (sync, never blocks), getCliVersion, and toMetadataValue (sanitizes session-metadata values). install-id logic moved verbatim out of telemetry.ts; its tests pass unchanged.
  • Sessions: driver/remote.ts remoteStagehandOptions() adds sanitized install_id + cli_version to userMetadata (made async; resolver awaited with a safe fallback so telemetry never throws). Interface, local-only stub (now properly async so .catch works), and the call site updated.
  • Cloud headers: lib/cloud/api.ts sends x-bb-client: browse-cli/<version> (+ x-bb-install-id when resolved) on both transports: the raw requestBrowserbaseJson helper (covers search, sessions, contexts, projects, extensions) and createBrowserbaseClient() defaultHeaders (covers fetch, functions). Never emits empty-value headers.
  • Patch changeset.
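
The header rule in the "Cloud headers" bullet can be sketched as below. This is a minimal illustration, not the code in lib/cloud/api.ts: the function name `attributionHeaders` is taken from the review diagram, but its signature here (version and install id passed in as arguments) is an assumption made so the example is self-contained.

```typescript
// Sketch only: builds the two attribution headers named in the PR description.
// Passing the version and install id as arguments is an assumption of this example.
function attributionHeaders(
  cliVersion: string,
  installId: string | undefined,
): Record<string, string> {
  const headers: Record<string, string> = {
    "x-bb-client": `browse-cli/${cliVersion}`,
  };
  // Attach the install id only once it has resolved to a non-empty value,
  // so an empty-value header is never emitted.
  if (installId) {
    headers["x-bb-install-id"] = installId;
  }
  return headers;
}
```

Because the same object would feed both the raw request helper and the SDK client's defaultHeaders, one helper keeps the two transports consistent.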

Why sanitize values

Browserbase session-create runs validateMetadataObject — values must match [\w\-_,;:.()&$%#@!?~] and total ≤512 chars. A +build semver would otherwise 400 every remote session, so cli_version/install_id are passed through toMetadataValue() before reaching userMetadata. (HTTP headers are unconstrained, so the full version stays in x-bb-client.)
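
A minimal sketch of the sanitizer, assuming it does nothing more than drop characters outside the allow-list quoted above and cap the length. The 64-character cap comes from the commit message later in this PR; whether the real toMetadataValue() removes only the "+" or the whole "+build" suffix is not stated, and this sketch removes only the disallowed characters.

```typescript
// Anything outside the allow-list quoted in the PR description is removed.
const DISALLOWED_METADATA_CHARS = /[^\w\-_,;:.()&$%#@!?~]/g;
// Per-value cap taken from the commit message; treat it as an assumption here.
const MAX_METADATA_VALUE_LENGTH = 64;

function toMetadataValue(value: string): string {
  return value
    .replace(DISALLOWED_METADATA_CHARS, "")
    .slice(0, MAX_METADATA_VALUE_LENGTH);
}
```

Under these assumptions a UUID passes through unchanged, since hex digits and hyphens are all allowed, while a version such as 0.9.0+build.5 loses its "+".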

E2E Test Matrix

| Command / flow | Observed output | Confidence / sufficiency |
| --- | --- | --- |
| <local build> open https://example.com --remote → cloud sessions list --status RUNNING | session userMetadata = {stagehand:"true", browse_cli:"true", install_id:"<uuid>", cli_version:"0.9.0"} | All 3 attribution keys land on a driver-created remote session; install_id equals the on-disk marker. (Server-side ingestion out of scope.) |
| cloud search "..." and cloud fetch https://example.com against a local capture server (--base-url) | /v1/search and /v1/fetch each received x-bb-client: browse-cli/0.9.0 + x-bb-install-id: <uuid> | Exact outgoing headers confirmed on both the raw-helper (search) and SDK (fetch) paths. |
| cloud search "..." and cloud fetch https://example.com (live API) | real results JSON; fetch 200 + markdown | New headers don't break live calls. |
| Migration smoke against built dist/lib/identity.js: seed legacy ~/…/cli/telemetry-id = 1111…5555, then resolveInstallId with XDG_CONFIG_HOME pointed at a temp dir | returned 1111…5555; new <tmp>/browserbase/install-id contains 1111…5555 (fresh-dir case mints a new uuid instead) | Legacy id is carried forward to the new canonical path, not reset; first-run still mints. Also confirmed on real disk: existing ~/Library/.../telemetry-id id copied to ~/.config/browserbase/install-id, legacy file intact. |
| Read-only-FS / Lambda resilience against built dist: (a) HOME=/tmp/... writable, (b) read-only 0555 dir, (c) ENOTDIR under /dev/null, (d) unwritable HOME=/var/empty, plus real browse --version and browse cloud sessions list under unwritable HOME | (a) persists to /tmp/.../.config/browserbase/install-id; (b)(c)(d) return a valid in-memory UUID, nothing written, no throw; real CLI exits cleanly (version prints; cloud cmd returns a clean 401, not an FS crash); no illegal writes under /var/empty | install-id resolution + migration are best-effort: every read/mkdir/write is guarded and all 4 callers wrap in .catch. On Lambda (HOME=/tmp) it persists; if the dir/file is unwritable it degrades to a per-invocation in-memory id without failing the command. |
| turbo run build --filter=browse · pnpm lint · pnpm test:cli | build + lint clean · 299/299 tests pass (+7 path-resolution / migration tests over the prior 292) | No regressions; path change + migration covered. |

Review follow-ups (addressed)

All 5 Cubic threads resolved: async local-only stub (so .catch works), atomic install-id write (race-safe on concurrent first runs — pre-existing behavior, hardened), and 12 focused unit tests for toMetadataValue (allowed-char filtering, +build stripping, truncation, UUID round-trip), the attribution headers, and the remoteStagehandOptions success + fallback paths.

Dependency / follow-up (not in this PR)

Session userMetadata keys are queryable in Snowflake today (STG_SESSIONS.SESSION_METADATA). The search/fetch headers only become useful once Platform logs x-bb-client / x-bb-install-id on those endpoints (/v1/fetch has an unpopulated fetch_tasks.headers column not yet in the Estuary mirror; /v1/search writes no DB row) — tracked as a server-side follow-up.

Update — also tags cloud sessions create

Extended attribution to the browse cloud sessions create path too (previously only the driver open --remote path carried it): its userMetadata now includes browse_cli + install_id + cli_version (sanitized via toMetadataValue), merged with any user-supplied --body metadata while keeping the attribution keys authoritative — a user can't spoof browse_cli to "false". So every CLI-created session is attributable — driver and cloud sessions create. Verified live: cloud sessions create → readback userMetadata: { browse_cli:"true", install_id:"…", cli_version:"0.9.0" } → released. +2 tests (292 total cli tests).

Update — standardized install-id path (review follow-up)

Per review (thanks @pirate), moved the anonymous install-id marker off the bespoke per-OS path (~/Library/Application Support/Browserbase/cli/telemetry-id, %APPDATA%/Browserbase/cli/telemetry-id, <xdg>/browserbase/cli/telemetry-id) to the standardized ~/.config/browserbase/install-id — consistent with core (BROWSERBASE_CONFIG_DIR) and the CLI's own ~/.config/browserbase/skills. Honors BROWSERBASE_CONFIG_DIR, falls back to XDG_CONFIG_HOME/~/.config on every platform; the BROWSERBASE_TELEMETRY_INSTALL_ID_FILE override still short-circuits everything (incl. migration).

Backwards-compatible: if the canonical file is absent but a legacy marker exists, its UUID is copied forward so existing installs keep their stable id, with no attribution reset; the legacy file is left intact. Renamed telemetry-id → install-id since it's no longer telemetry-only (and install-id, not device-id, because it's a per-install id, not a hardware fingerprint). Considered and declined node-machine-id: a cross-app hardware id doesn't fix ephemeral-fleet counting (unpredictable across customer image strategies) and conflicts with the anonymous, install-scoped intent.
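
The lookup order described in this update could look roughly like the sketch below. The function name `installIdPath` and the `homeDir` parameter are hypothetical, and two details are assumptions rather than facts from the PR: that BROWSERBASE_CONFIG_DIR names the browserbase directory itself, and that empty environment values are treated as unset. Legacy-marker migration is left out.

```typescript
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Sketch only: resolves where the install-id marker would live.
function installIdPath(env: Env, homeDir: string): string {
  // The explicit file override short-circuits everything else.
  const override = env.BROWSERBASE_TELEMETRY_INSTALL_ID_FILE;
  if (override) return override;

  // Assumption: BROWSERBASE_CONFIG_DIR points at the browserbase dir itself.
  const configDir =
    env.BROWSERBASE_CONFIG_DIR ||
    path.join(env.XDG_CONFIG_HOME || path.join(homeDir, ".config"), "browserbase");

  return path.join(configDir, "install-id");
}
```

With XDG_CONFIG_HOME pointed at a temp dir this yields <tmp>/browserbase/install-id, which matches the path reported in the migration smoke test.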

Closes STG-2404.

🤖 Generated with Claude Code

…nd cloud API headers

Stamp a stable anonymous install_id and cli_version onto remote browser
session userMetadata and onto cloud Search/Fetch request headers so
CLI-driven Browserbase usage is attributable to an install/account.

- Extract the anonymous install-id resolver into a shared lib/identity.ts
  module with an async resolveInstallId (memoized) + sync peekInstallId
  cache, plus a getCliVersion helper that reads the package's own version
  (the same source oclif uses). Telemetry imports from here; behavior is
  byte-identical (same marker path, UUID-on-miss, swallow-on-failure).
- remote.ts: remoteStagehandOptions() is now async and adds install_id +
  cli_version to userMetadata; falls back to browse_cli + cli_version if
  the install id can't be resolved. Interface, local-only stub, and the
  session-manager call site updated accordingly.
- api.ts: send x-bb-client and (when resolved) x-bb-install-id on both the
  raw request helper (search) and the SDK client defaultHeaders (fetch).
  Install-id resolution is kicked off lazily and never blocks a request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
changeset-bot Bot commented Jun 25, 2026

🦋 Changeset detected

Latest commit: 855738c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 0 packages

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 8 files

Confidence score: 3/5

  • packages/cli/src/lib/identity.ts can race on concurrent first-run install-id creation, so different processes may persist/use different IDs and produce inconsistent attribution or downstream identity behavior — reserve the marker file with exclusive create and on EEXIST read and reuse the persisted value before merging.
  • packages/cli/src/lib/driver/remote.disabled.ts exposes a Promise-shaped API but throws synchronously, which can bypass callers’ .catch handling and surface as an unhandled error path in local-only mode — return a rejected Promise instead (or make callers handle sync throws explicitly) to align behavior.
  • packages/cli/src/lib/cloud/api.ts and packages/cli/src/lib/driver/remote.ts add attribution/header logic without focused coverage, so regressions in always-on fields vs conditional install_id could slip through unnoticed — add targeted unit/integration tests for header/session success and fallback paths before merging.
Architecture diagram
sequenceDiagram
    participant CLI as CLI Command
    participant Identity as identity.ts
    participant Telemetry as telemetry.ts
    participant Remote as remote.ts
    participant CloudAPI as cloud/api.ts
    participant SDK as Browserbase SDK
    participant Platform as Browserbase Platform
    
    Note over CLI,Platform: Module Load Time
    
    CloudAPI->>CloudAPI: Kick off resolveInstallId(process.env)
    CloudAPI->>Identity: resolveInstallId()
    Identity->>Identity: Check cachedInstallId
    alt Cache miss (first call)
        Identity->>Identity: Read marker file from disk
        alt File not found
            Identity->>Identity: Generate UUID
            Identity->>Identity: Write marker file (best-effort)
        end
        Identity->>Identity: Cache result
    end
    Identity-->>CloudAPI: install_id (promise, not awaited here)
    
    Note over CLI,Platform: Remote Session Flow (e.g., browse open --remote)
    
    CLI->>Remote: remoteStagehandOptions()
    Remote->>Identity: getCliVersion()
    Identity-->>Remote: "x.y.z"
    Remote->>Remote: Build userMetadata { browse_cli: "true", cli_version: "x.y.z" }
    Remote->>Identity: resolveInstallId(process.env)
    alt Resolution succeeds
        Identity-->>Remote: "uuid-abc-123"
        Remote->>Remote: Add install_id to userMetadata
    else Resolution fails (caught)
        Remote->>Remote: Continue without install_id
    end
    Remote-->>CLI: StagehandConstructorOptions with userMetadata
    
    CLI->>SDK: Stagehand.init() with userMetadata
    SDK->>Platform: Create session with userMetadata
    Platform-->>SDK: Session created
    SDK-->>CLI: Stagehand instance
    
    Note over CLI,Platform: Cloud Search/Fetch Flow (e.g., browse cloud search)
    
    CLI->>CloudAPI: search() / fetch()
    
    Note over CloudAPI: attributionHeaders() called
    CloudAPI->>Identity: getCliVersion()
    Identity-->>CloudAPI: "x.y.z"
    CloudAPI->>CloudAPI: Set x-bb-client: browse-cli/x.y.z
    CloudAPI->>Identity: peekInstallId()
    alt install_id already resolved
        Identity-->>CloudAPI: "uuid-abc-123"
        CloudAPI->>CloudAPI: Set x-bb-install-id: uuid-abc-123
    else Still resolving or failed
        Identity-->>CloudAPI: undefined
        CloudAPI->>CloudAPI: Omit x-bb-install-id header
    end
    
    alt Raw request path (search)
        CloudAPI->>Platform: GET /v1/search with x-bb-client + possibly x-bb-install-id
        Platform-->>CloudAPI: Search results
    else SDK client path (fetch)
        CloudAPI->>SDK: createBrowserbaseClient() with defaultHeaders
        SDK->>Platform: GET /v1/fetch with x-bb-client + possibly x-bb-install-id
        Platform-->>SDK: Fetch response
        SDK-->>CloudAPI: Markdown/text
    end
    CloudAPI-->>CLI: Command output
    
    Note over CLI,Platform: Telemetry Flow (unchanged, now shared)
    
    CLI->>Telemetry: initialize telemetry
    Telemetry->>Identity: resolveInstallId(env, sessionId)
    Identity-->>Telemetry: install_id (same as above)
    Telemetry->>Platform: POST telemetry events with install_id

Comment thread packages/cli/src/lib/driver/remote.disabled.ts Outdated
Comment thread packages/cli/src/lib/identity.ts
Comment thread packages/cli/src/lib/cloud/api.ts
Comment thread packages/cli/src/lib/driver/remote.ts
The Browserbase session-create endpoint validates userMetadata values
against [\w\-_,;:.()&$%#@!?~] and rejects anything else with HTTP 400.
A semver version with a +build suffix (or any other unexpected char)
would break every remote session. Add `toMetadataValue()` in identity.ts
to strip disallowed chars and truncate to 64 chars, then wrap cli_version
and install_id before stamping them into userMetadata. HTTP headers are
unaffected (no validator constraint there).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread packages/cli/src/lib/identity.ts
…ub, attribution tests

- remote.disabled.ts: make remoteStagehandOptions() properly async so
  Promise .catch() handlers catch the rejection (was throwing synchronously
  despite a Promise return type)
- identity.ts: make install-id persistence atomic using exclusive-create
  (open flag 'wx') + EEXIST re-read fallback; concurrent first-run
  processes now converge on the same id rather than racing on a non-atomic
  writeFile (this was pre-existing behavior moved from telemetry, now
  hardened)
- tests/identity-attribution.test.ts: focused unit tests for
  toMetadataValue (allowed chars, truncation, semver +build suffix),
  remoteStagehandOptions userMetadata (browse_cli+cli_version always
  present, install_id included on success, omitted on failure without
  throwing)
- tests/remote-disabled.test.ts: update remoteStagehandOptions assertion
  to use rejects.toThrow() matching the now-async signature
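
The difference the first bullet describes can be shown in isolation. The two stubs below are hypothetical stand-ins for the local-only remoteStagehandOptions(), and the error message is invented for the example.

```typescript
// Declared to return a Promise but throws before one exists:
// the error escapes synchronously and a chained .catch() never runs.
function syncThrowingStub(): Promise<never> {
  throw new Error("remote mode is disabled in this build");
}

// Marked async: the throw becomes a rejected Promise,
// so callers' .catch() handlers receive it.
async function asyncStub(): Promise<never> {
  throw new Error("remote mode is disabled in this build");
}

// Reports whether calling fn() throws before .catch() can be attached.
function escapesSynchronously(fn: () => Promise<unknown>): boolean {
  try {
    fn().catch(() => undefined); // swallow the rejection, if there is one
    return false;
  } catch {
    return true;
  }
}
```

Calling escapesSynchronously(syncThrowingStub) reports true and escapesSynchronously(asyncStub) reports false, which is why the updated test asserts with rejects.toThrow().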

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files (changes from recent commits).

Comment thread packages/cli/src/lib/identity.ts Outdated
The previous EEXIST recovery path read back the marker file once; if that
file was EMPTY (another process created it via 'wx' but hadn't written yet,
or a stale empty marker), the re-read returned "" and the function fell
through to return a non-persisted in-memory id — diverging across
concurrent first runs and defeating the intended convergence to one id.

Rewrite resolveAnonymousInstallId as a bounded retry loop (5 attempts):
  1. read marker; non-empty id wins → return it
  2. exclusive-create ('wx') + write our id → we won the race, return it
  3. EEXIST (file exists but empty) → tiny backoff (1<<attempt ms) and loop
     so the next read picks up the winner's id
  4. non-EEXIST open error (read-only/permissions) → best-effort in-memory id
After the loop (marker stayed empty, no winner ever wrote) → take ownership
with a truncating writeFile so we still return a persisted id.

Net: returns an existing non-empty id if present; otherwise persists and
returns one. The only non-persisted return path left is the non-EEXIST
open error in step 4.
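
The loop described above might be sketched as follows. This is an illustration of the steps in this commit message, not the code in identity.ts: the names `resolveMarkerId` and `readMarker` are hypothetical, and the sketch assumes the marker's parent directory already exists (the real resolver is described elsewhere as guarding its own mkdir).

```typescript
import { randomUUID } from "node:crypto";
import { open, readFile, writeFile } from "node:fs/promises";

const MAX_ATTEMPTS = 5;

// A missing or unreadable marker counts as "no id yet".
async function readMarker(file: string): Promise<string> {
  try {
    return (await readFile(file, "utf8")).trim();
  } catch {
    return "";
  }
}

async function resolveMarkerId(file: string): Promise<string> {
  const candidate = randomUUID();
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    // 1. A non-empty persisted id always wins.
    const existing = await readMarker(file);
    if (existing) return existing;
    try {
      // 2. Exclusive create: rejects with EEXIST if the file already exists.
      const handle = await open(file, "wx");
      try {
        await handle.writeFile(candidate);
      } finally {
        await handle.close();
      }
      return candidate;
    } catch (err) {
      // 4. Any error other than EEXIST: fall back to an in-memory id.
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") return candidate;
      // 3. File exists but is still empty: back off briefly, then re-read.
      await new Promise((resolve) => setTimeout(resolve, 1 << attempt));
    }
  }
  // Marker stayed empty on every attempt: take ownership with a truncating write.
  await writeFile(file, candidate).catch(() => undefined);
  return candidate;
}
```

With five attempts the backoff adds at most 1 + 2 + 4 + 8 + 16 ms before the final truncating write.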

Add a focused test for the empty-marker case: pre-create an EMPTY file at
the install-id path, call resolveInstallId, assert it returns a non-empty
id AND the file is now non-empty and equals the returned id. Also cover
existing-id passthrough and fresh-create persistence. Uses vi.resetModules()
to defeat the module-level install-id cache between cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread packages/cli/src/lib/identity.ts
Comment thread packages/cli/src/lib/identity.ts Outdated
…ge.json fs read

getCliVersion previously read the CLI's own package.json from disk on first
use. Replace that with a single source of truth: seed the version once from
oclif's `this.config.version` at startup.

- identity.ts: remove the package.json read (and the readFileSync import).
  Add `setCliVersion(version)` (stores only truthy values) and make
  `getCliVersion()` return the cached value or "unknown" — no fs.
- base.ts: add a `BrowseCommand.init()` override that calls
  `setCliVersion(this.config.version)` after super.init(). Every command
  extends BrowseCommand, so the cache is seeded in whichever process builds
  a session or header.
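
A minimal sketch of the seeded cache these two bullets describe, assuming a single module-level variable. The variable name `cachedCliVersion` is hypothetical; the function names and the "unknown" fallback come from the commit message.

```typescript
// Module-level cache, seeded once at startup from oclif's config.version.
let cachedCliVersion: string | undefined;

function setCliVersion(version: string | undefined): void {
  // Only truthy values are stored, so an empty seed never clobbers a good one.
  if (version) cachedCliVersion = version;
}

function getCliVersion(): string {
  // No filesystem fallback: unseeded processes report "unknown".
  return cachedCliVersion ?? "unknown";
}
```

In the PR the seeding call is made from BrowseCommand.init(), so any process that runs a command, including the background daemon, populates the cache before it is read.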

Process coverage (the crux): the background `browse daemon` process — spawned
detached by daemon/client.ts:spawnDaemon — is what actually builds the remote
userMetadata and creates the Browserbase session (session-manager.ts →
remoteStagehandOptions). Because `Daemon extends BrowseCommand` and oclif runs
init() before run(), the daemon seeds setCliVersion(this.config.version)
before runDriverDaemon ever builds userMetadata. No reliance on env/argv
plumbing. Verified live: a daemon-created remote session reports
cli_version="0.9.0" (not "unknown"), and `cloud fetch` sends
x-bb-client: browse-cli/0.9.0.

- tests: getCliVersion now returns "unknown" when unseeded (no fs fallback);
  add focused setCliVersion/getCliVersion tests (seeded value, empty-seed
  ignored, prior value preserved on empty re-seed); update the
  remoteStagehandOptions test to seed via setCliVersion and assert the
  seeded cli_version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files (changes from recent commits).

Comment thread packages/cli/src/base.ts
…undary

Add a focused test that loads the package's real oclif Config, runs a
BrowseCommand subclass's init() lifecycle, and asserts getCliVersion() is
seeded from config.version (not "unknown") afterward. This exercises the
exact path used by both the foreground command and the background
`browse daemon` process that creates Browserbase sessions. Uses
fileURLToPath for a Windows-safe config root.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/cli/src/lib/identity.ts
…eate

Until now only the driver `open --remote` path stamped attribution onto
sessions; `cloud sessions create` set no userMetadata at all, so those
CLI-created sessions were unattributable. Stamp browse_cli + install_id +
cli_version onto the create body (sanitized via toMetadataValue), preserving
any user-supplied userMetadata while keeping the attribution keys
authoritative.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread packages/cli/src/commands/cloud/sessions/create.ts
shrey150 and others added 2 commits June 26, 2026 14:34
…o prevent spoofing

When install-id resolution fails, existing user-supplied `install_id`
from --body/--stdin was retained in the final userMetadata. Strip it
before spreading `existing` so the field is only present when we
resolve it authoritatively.

Add a focused regression test: caller passes install_id in --body with
no marker file → final body must not contain the spoofed value.
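
The merge rule can be sketched as below. The function name `withAttribution` and its signature are hypothetical; in the PR the values are also passed through toMetadataValue, which is omitted here to keep the example focused on key precedence.

```typescript
type Metadata = Record<string, string>;

// Sketch only: merges user-supplied metadata with authoritative attribution keys.
function withAttribution(
  userSupplied: Metadata,
  cliVersion: string,
  installId: string | undefined,
): Metadata {
  // Drop any caller-provided install_id so it can only come from the resolver.
  const { install_id: _spoofed, ...existing } = userSupplied;
  return {
    ...existing,
    // Attribution keys are spread last so they override user-supplied values.
    browse_cli: "true",
    cli_version: cliVersion,
    ...(installId ? { install_id: installId } : {}),
  };
}
```

Spreading `existing` first keeps unrelated user keys intact, while a user-supplied browse_cli or install_id can never survive into the final body.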

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread packages/cli/tests/cli-cloud-contract.test.ts Outdated
shrey150 and others added 2 commits June 26, 2026 14:56
…all-id with migration from legacy path

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The override path is creatable, so resolveInstallId succeeds with a fresh
UUID rather than failing; the test's real guarantee is that the caller's
spoofed install_id is stripped. Reword the comment and drop the inaccurate
"when resolution fails" from the test name (Cubic P3).
@shrey150 shrey150 merged commit 263e4d4 into main Jun 26, 2026
41 checks passed

2 participants