feat(cli): posthog feature flags + fast_provision experiment#3366

Merged
la14-1 merged 2 commits into OpenRouterTeam:main from AhmedTMM:feat/feature-flags-fast-provision
Apr 28, 2026
@AhmedTMM
Collaborator

Summary

Wires PostHog `/decide` into the CLI so we can A/B-test provisioning behaviors with feature flags. First experiment: `fast_provision` — for users who didn't pass `--beta` or `--fast` manually, the `test` variant turns on `tarball + images` by default to see if faster provisioning lifts the late-funnel conversion rate.

The PostHog experiment was already created in the dashboard; this PR is the code side of it.

Design calls

Why `tarball,images` and not the full `--fast` set (`+parallel,docker`)? Clean attribution. The hypothesis is specifically about tarball/image; if we ship the full `--fast` bundle we can't tell which feature moved the metric. `--fast` stays as the power-user knob.

Why share `distinct_id` with telemetry? PostHog identity needs to match across telemetry events and flag decisions, otherwise the experiment's exposure events don't line up with the funnel events they're supposed to attribute. Telemetry already had a persistent user-id at `~/.config/spawn/.telemetry-id` — moved that into a shared `install-id.ts` module so feature flags reuse it. Existing users keep their bucket.
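
As a rough illustration of the shared-ID idea, here is a minimal sketch of a read-or-create helper with a disk-failure fallback. The function name `getOrCreateInstallId`, the regex, and the choice to take the path as a parameter are assumptions made for this example; the actual `install-id.ts` may differ.

```typescript
import { randomUUID } from "node:crypto";
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Loose UUID shape check (a stand-in for the "format guard"); assumption only.
export const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the ID persisted at `idPath`, creating it if needed.
export function getOrCreateInstallId(idPath: string): string {
  try {
    const existing = readFileSync(idPath, "utf8").trim();
    // A valid persisted ID is reused, so existing users keep their bucket.
    if (UUID_RE.test(existing)) return existing;
  } catch {
    // Missing or unreadable file: fall through and generate a new ID.
  }
  const fresh = randomUUID();
  try {
    mkdirSync(dirname(idPath), { recursive: true });
    writeFileSync(idPath, `${fresh}\n`, "utf8");
  } catch {
    // Disk-write failure: the ID stays ephemeral for this run only.
  }
  return fresh;
}
```

Both telemetry and feature flags would call the same helper with the same path, so the `distinct_id` on funnel events and on flag decisions cannot drift apart.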

On-disk cache with 1h TTL. Without a cache, every `spawn` invocation pays a network call of up to 1.5s (the timeout). Stale-while-revalidate via the cache file means a new process with a cached value gets its variant near-instantly, and refreshes happen lazily.

User-wins. If the user passes `--beta tarball` or `--fast`, the flag is bypassed entirely. `SPAWN_FEATURE_FLAGS_DISABLED=1` is a hard kill switch.
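
The precedence rules above can be sketched as a pure function. This is an illustration under assumptions, not the code in `index.ts`: the `BetaInputs` shape, the function name, and the exact way `--beta` and `--fast` combine are hypothetical, and the `--fast` expansion is inferred from the "+parallel,docker" note.

```typescript
// Hypothetical shape of the inputs; the names here are illustrative only.
export interface BetaInputs {
  userBeta: string[]; // values passed via --beta, e.g. ["tarball"]
  userFast: boolean; // true when --fast was passed
  variant?: string; // fast_provision variant from PostHog, if any
  flagsDisabled: boolean; // SPAWN_FEATURE_FLAGS_DISABLED=1
}

// Assumed expansion of --fast, based on the "+parallel,docker" note.
const FAST_BUNDLE = ["tarball", "images", "parallel", "docker"];
const EXPERIMENT_FEATURES = ["tarball", "images"];

export function resolveBetaFeatures(input: BetaInputs): string[] {
  const fromUser = [...input.userBeta, ...(input.userFast ? FAST_BUNDLE : [])];
  // User wins: any explicit --beta or --fast bypasses the experiment.
  if (fromUser.length > 0) return Array.from(new Set(fromUser));
  // Hard kill switch, and control behavior when no test variant was resolved.
  if (input.flagsDisabled || input.variant !== "test") return [];
  return [...EXPERIMENT_FEATURES];
}
```

A caller could then set `SPAWN_BETA` to `resolveBetaFeatures(...).join(",")`. Keeping the decision in one pure function makes the "user wins" guarantee easy to unit-test without touching the network.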

Files

  • `shared/install-id.ts` (new) — UUID generation/read with disk-failure fallback
  • `shared/feature-flags.ts` (new) — hand-rolled `/decide` POST, 1.5s timeout, fail-open, on-disk cache, exposure events
  • `shared/telemetry.ts` — `distinct_id` now sourced from `install-id.ts`
  • `shared/paths.ts` — adds `getInstallIdPath()` (returns existing telemetry-id path)
  • `index.ts` — `await initFeatureFlags()` early in `main()`; applies `fast_provision` test variant after `--beta`/`--fast` composition (so they win)
  • 14 new unit tests across `install-id.test.ts` and `feature-flags.test.ts`
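
For readers unfamiliar with a hand-rolled `/decide` call, the following is a minimal sketch of a fail-open POST with a 1.5s timeout. The function name, the injectable `fetchImpl` parameter, and the `host`/`apiKey` arguments are assumptions for this example; the request body and `featureFlags` response field reflect my understanding of PostHog's `/decide?v=3` endpoint rather than the code in `shared/feature-flags.ts`.

```typescript
export type FlagMap = Record<string, string | boolean>;
export type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Hypothetical helper: resolves to {} on any failure so callers fall back to control.
export async function fetchFlags(
  host: string,
  apiKey: string,
  distinctId: string,
  fetchImpl: FetchLike = fetch,
  timeoutMs = 1500,
): Promise<FlagMap> {
  try {
    const res = await fetchImpl(`${host}/decide?v=3`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ api_key: apiKey, distinct_id: distinctId }),
      // Aborts the request once the timeout elapses.
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return {}; // e.g. HTTP 500
    const body: unknown = await res.json(); // throws on malformed JSON
    const flags = (body as { featureFlags?: unknown } | null)?.featureFlags;
    const isMap = typeof flags === "object" && flags !== null && !Array.isArray(flags);
    return isMap ? (flags as FlagMap) : {};
  } catch {
    // Timeout, network error, or parse error: fail open.
    return {};
  }
}
```

Injecting `fetchImpl` is one way to cover the HTTP 500, malformed-response, and network-failure cases in unit tests without real network access.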

Rollout

Recommend ramping the PostHog flag at 5% → 25% → 50% → 100% on the `test` variant with 24h between bumps. The 1.5s fail-open timeout is itself a soft kill switch — if PostHog is down, every user gets control.

Test plan

  • `bunx @biomejs/biome check src/` — 0 errors over 199 files
  • `bunx tsc --noEmit -p .` — 0 production errors
  • `bun test` — 2183 pass, same 4 pre-existing failures as upstream/main
  • New tests: install-id roundtrip + format guard; feature-flags fetch/HTTP500/malformed/disabled/idempotent/stale-cache; exposure event capture
  • End-to-end: spawn with experiment flag set to `test` in PostHog → confirm `SPAWN_BETA=tarball,images` is set
  • Verify `$feature_flag_called` events arrive in PostHog tagged correctly to the experiment
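
When verifying exposure events, it can help to know roughly what one looks like. The sketch below builds a `$feature_flag_called` payload; the builder name and interface are hypothetical, and the `$feature_flag` / `$feature_flag_response` property names are taken from my understanding of PostHog's capture conventions, not from this PR's code.

```typescript
export interface ExposureEvent {
  api_key: string;
  event: "$feature_flag_called";
  distinct_id: string;
  properties: {
    $feature_flag: string;
    $feature_flag_response: string | boolean;
  };
  timestamp: string;
}

// Hypothetical builder: one event per flag evaluation, for both arms.
export function buildExposureEvent(
  apiKey: string,
  distinctId: string,
  flagKey: string,
  response: string | boolean,
  now: Date = new Date(),
): ExposureEvent {
  return {
    api_key: apiKey,
    event: "$feature_flag_called",
    // Must match the distinct_id used by telemetry for attribution to line up.
    distinct_id: distinctId,
    properties: { $feature_flag: flagKey, $feature_flag_response: response },
    timestamp: now.toISOString(),
  };
}
```

Emitting the event for the control arm as well as the test arm matters, because PostHog needs exposures from both groups to compute conversion.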

Bumps CLI to 1.0.23.

🤖 Generated with Claude Code

AhmedTMM and others added 2 commits April 27, 2026 16:27
Wires PostHog `/decide` into the CLI so we can A/B-test provisioning
behaviors. First experiment: `fast_provision` — for users who didn't
pass --beta or --fast manually, the `test` variant turns on
`tarball + images` by default. Hypothesis: faster provisioning →
fewer drop-offs in the "VM ready → install completed" leg of the
funnel.

What's added:

- `shared/install-id.ts` — stable per-machine UUID, persisted at
  ~/.config/spawn/.telemetry-id. Reuses telemetry's existing path
  so existing users keep their PostHog identity. Falls back to an
  ephemeral UUID on disk-write failure.
- `shared/feature-flags.ts` — hand-rolled POST to PostHog /decide
  (no SDK dep). 1.5s timeout, fail-open. On-disk cache at
  $SPAWN_HOME/feature-flags-cache.json with 1h TTL so cold starts
  don't pay the network cost. SPAWN_FEATURE_FLAGS_DISABLED=1 kill
  switch. Captures `$feature_flag_called` exposure events for both
  arms so PostHog can compute conversion.
- `shared/telemetry.ts` — moves user-id loading into install-id.ts
  so flags and events share the same `distinct_id`.
- `index.ts` — `await initFeatureFlags()` at the top of `main()`,
  then applies `fast_provision`'s `test` variant by appending
  `tarball,images` to SPAWN_BETA — but only if the user didn't
  pass --beta or --fast (those always win, so opt-out is free).

Why tarball+images and not all four (`+parallel,docker`):
clean attribution. The hypothesis is about tarball/image; if we
ship the full --fast bundle we can't tell which feature moved the
metric. Keep --fast as the user-facing power-user knob.

Tests: 14 new (install-id roundtrip + format guard, feature-flags
fetch/timeout/HTTP500/malformed/disabled/idempotent/stale-cache,
exposure-event behavior). Full suite: 2183 pass, same 4 pre-existing
failures as upstream/main.

Bumps CLI to 1.0.23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt real SWR

Two review-fix commits from PR feedback squashed into one:

1. Move `await initFeatureFlags()` below the `spawn pick` and
   `spawn feedback` bypass clauses in `main()`. Both commands are called
   from bash scripts and must stay fast; neither gates on a flag, so
   there's no reason to pay up to 1.5s of network latency on cold cache.

2. Implement real stale-while-revalidate in `shared/feature-flags.ts`.
   The prior implementation did a synchronous fetch on stale cache,
   which contradicted the docstring and PR description. Now:
     - fresh cache (<TTL)  → use cache, no network
     - stale cache (>=TTL) → use cache immediately, refresh in background
     - no cache            → await sync fetch (first run only)

   Adds `_awaitBackgroundRefreshForTest()` so tests can deterministically
   wait for the background refresh before asserting. Updated the existing
   "stale cache" test to verify SWR semantics (stale served first, fresh
   lands next invocation) and added a "fresh cache does not fetch" test.
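
The three cache states above can be sketched as follows. This is a simplified illustration, not the code in `shared/feature-flags.ts`: the `CacheStore` interface, the `initFlags` name, and the injected `fetcher` and `now` parameters are assumptions made so the example stays self-contained.

```typescript
export type FlagMap = Record<string, string | boolean>;
export interface CacheEntry { fetchedAt: number; flags: FlagMap }
export interface CacheStore {
  read(): CacheEntry | undefined;
  write(entry: CacheEntry): void;
}

export const TTL_MS = 60 * 60 * 1000; // 1h
let backgroundRefresh: Promise<void> | undefined;

export async function initFlags(
  store: CacheStore,
  fetcher: () => Promise<FlagMap>, // assumed to reject on failure
  now: () => number = Date.now,
): Promise<FlagMap> {
  const refresh = async (): Promise<FlagMap> => {
    const flags = await fetcher();
    store.write({ fetchedAt: now(), flags });
    return flags;
  };
  const cached = store.read();
  // No cache: await the fetch (first run only), failing open to {}.
  if (!cached) return refresh().catch(() => ({}));
  // Stale cache: kick off a background refresh but do not wait for it.
  if (now() - cached.fetchedAt >= TTL_MS) {
    backgroundRefresh = refresh().then(() => undefined, () => undefined);
  }
  // Fresh or stale, the cached value is served immediately.
  return cached.flags;
}

// Stand-in for the test hook described above.
export function awaitBackgroundRefreshForTest(): Promise<void> {
  return backgroundRefresh ?? Promise.resolve();
}
```

A failed background refresh leaves the old cache entry in place, so a PostHog outage never replaces a known variant with an empty result in this sketch.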

All 2127 tests pass; biome clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@la14-1
Member

la14-1 commented Apr 28, 2026

Applied the two must-fix items from review in d2ec13d:

  1. Fast-path guard: moved `await initFeatureFlags()` below the `spawn pick` and `spawn feedback` bypass clauses in `main()`. Shell-invoked fast paths no longer pay the up-to-1.5s flag-fetch cost.
  2. Real SWR: `initFeatureFlags()` now serves stale cache immediately and refreshes in the background (fire-and-forget), matching the docstring and PR description. No cache → still awaits a bounded sync fetch for the first-run case.

Test coverage:

  • Renamed the `>1h stale cache re-fetches` test to assert SWR semantics (stale served first, fresh lands next invocation) via a new `_awaitBackgroundRefreshForTest()` helper.
  • Added `does NOT fetch when cache is fresh (<1h old)` to pin down the no-network path.

All 2127 tests pass, biome clean.

Member

@la14-1 la14-1 left a comment

Review fixes applied: fast-path skip + real SWR. All checks green.
