Skip to content

fix(auth): structural fallback gate for env-picker so Auth blocks even when pendingEnvSelection gets clobbered#775

Merged
kelsonpw merged 1 commit into
mainfrom
fix/env-picker-race-actual
May 14, 2026
Merged

fix(auth): structural fallback gate for env-picker so Auth blocks even when pendingEnvSelection gets clobbered#775
kelsonpw merged 1 commit into
mainfrom
fix/env-picker-race-actual

Conversation

@kelsonpw
Copy link
Copy Markdown
Member

@kelsonpw kelsonpw commented May 14, 2026

Summary

Three prior PRs each claimed to close the recurring "stuck on Setup with no env-picker" bug:

Each ship passed its unit tests. The bug kept reproducing in user-reported runs: successful run → git reset --hard → rerun → wizard navigates the TUI, stops at ✓ Welcome ─ ✓ Auth ─ ● Setup ←, and hangs.

What the prior PRs got right

The prior diagnoses are all internally consistent and each PR's regression test passes both before and after this PR. They form a layered defense, and they keep working.

What the prior PRs missed

Per-run logs end at [intro] no framework matched — falling back to Generic and nothing fires after. Static analysis of every write to pendingEnvSelection across src/ found exactly one set-true path (applyEnvSelectionDeferral) and four set-false paths (all inside AuthScreen, gated on the user picking an env). None of them explain how the flag flips back to false before the user has interacted with the picker — but the journey stepper's ● Setup ← state requires both Auth.isComplete=true AND pendingEnvSelection=false, so something is clobbering the flag through a path that's not yet pinned down.

The fix

Rather than ship a fourth speculative root-cause fix, this PR adds a structural fallback gate to Auth.isComplete:

!s.pendingEnvSelection &&        // primary gate (PR #760, still load-bearing)
!needsEnvPickStillRequired(s)    // new structural fallback

`needsEnvPickStillRequired` returns true iff EVERY one of these holds:

  1. pendingOrgs is non-null (the resolver populated it — distinguishes deferred path from manual API key path).
  2. The session has a resolved selectedOrgId and selectedProjectId.
  3. That project has ≥ 2 environments with usable API keys.
  4. selectedEnvName === null (the user hasn't picked one yet).

When ALL of those hold, Auth.isComplete returns false regardless of pendingEnvSelection's value. The predicate reads observable session shape — no mutable flag for a future regression to clobber.

The flag stays in place as the primary gate; the structural check is a second, redundant safety net. Both point at the same "env still needs picking" state, but one is mutable and one is derived, so any single bug that clobbers the flag is caught by the structural check and vice versa.

Carve-outs

  • pendingOrgs === null short-circuits to false so the manual-API-key path (apiKeyNotice resolver outcome) passes through unchanged.
  • Single-env projects don't trigger the gate.
  • Stale checkpoint selectedProjectId not in pendingOrgs short-circuits to false; the existing stale-org useEffect in AuthScreen handles that case.

Test plan

  • New file: src/ui/tui/__tests__/env-picker-repro.test.ts (6 tests covering the smoking-gun scenario plus all four carve-outs).
  • Tests fail on origin/main (verified by stashing the flows.ts change and running the test file — both "router parks on Auth" tests fail with expected 'data-setup' to be 'auth').
  • Tests pass with the fix.
  • All 4112 existing tests still pass (pnpm test).
  • Lint clean (pnpm lint).
  • Build clean (pnpm build includes smoke test).
  • src/utils/wizard-abort.ts untouched.
  • No --no-verify used.

🤖 Generated with Claude Code


Note

Medium Risk
Changes the Auth.isComplete routing predicate, which can alter wizard navigation in auth/onboarding flows; guard conditions and targeted tests reduce but don’t eliminate risk of unintended blocking/advancing.

Overview
Prevents the recurring “stuck on Setup with no env-picker” issue by adding a defense-in-depth structural gate (needsEnvPickStillRequired) to Auth.isComplete, blocking progression when pendingOrgs indicates a multi-env project and selectedEnvName is still unset (while explicitly not affecting manual API-key, single-env, or stale-checkpoint scenarios).

Adds a new Vitest regression suite (env-picker-repro.test.ts) that reproduces the clobbered-flag scenario and asserts the intended carve-outs and unblock behavior once an env is selected.

Reviewed by Cursor Bugbot for commit 9ea8487. Bugbot is set up for automated code reviews on this repo. Configure here.

…n when pendingEnvSelection gets clobbered

Three prior PRs claimed to fix the "stuck on Setup with no env-picker"
bug — #747 cleared session.credentials when resolveCredentials returned
needs_user_choice/environment_selection, #760 added the
pendingEnvSelection flag and gated every post-Auth show: predicate on
it, and #762 patched nanostores' map.setKey to mutate in-place so the
bin.ts session reference stays aligned with $session.value. Each ship
passed its unit tests. The bug kept reproducing in user-reported runs
(pnpm try:prod --install-dir=~/excalidraw → successful run → git reset
--hard → rerun → wizard navigates the TUI, stops at "✓ Welcome ─ ✓
Auth ─ ● Setup ←", and hangs).

Per-run logs end at "[intro] no framework matched — falling back to
Generic" and nothing fires after. Static analysis of every write to
pendingEnvSelection across src/ found exactly one direct set-true path
(applyEnvSelectionDeferral) and four explicit set-false paths (all
inside AuthScreen, gated on the user picking an env). None of them
explain how the flag flips back to false before the user has
interacted with the picker — but the journey stepper's "● Setup ←"
state requires both Auth.isComplete=true AND pendingEnvSelection=false,
so something is clobbering the flag through a path we don't yet
understand.

Rather than ship a fourth speculative root-cause fix, this commit adds
a structural fallback gate to Auth.isComplete: when pendingOrgs is
populated AND the resolved project has ≥ 2 environments with API keys
AND selectedEnvName is null, Auth.isComplete returns false regardless
of pendingEnvSelection's value. That predicate reads observable
session shape — pendingOrgs is set by resolveCredentials in the
deferred path, selectedEnvName is cleared by applyEnvSelectionDeferral
— so there's no mutable flag for a future regression to clobber. The
flag stays in place as the primary gate; the structural check is a
second, redundant safety net.

Scoped carve-outs:
- pendingOrgs===null short-circuits to false so the manual-API-key
  path (apiKeyNotice resolver outcome) passes through unchanged.
- Single-env projects don't trigger the gate (no picker needed).
- Stale checkpoint selectedProjectId not in pendingOrgs short-circuits
  to false; the existing stale-org useEffect in AuthScreen handles
  that case.

Regression coverage in src/ui/tui/__tests__/env-picker-repro.test.ts
includes the smoking-gun scenario (multi-env + selectedEnvName=null +
pendingEnvSelection=false → must still land on Auth) plus four
carve-out cases (env picked, manual key, single-env, stale checkpoint)
to guard the fix's scope. All six fail without the change and pass
with it.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 14, 2026 03:44
@kelsonpw kelsonpw merged commit f195bbd into main May 14, 2026
15 of 16 checks passed
kelsonpw added a commit that referenced this pull request May 15, 2026
… no IDs selected (#780)

When `git reset --hard` wipes `<installDir>/.amplitude/` and PR #778's
`loadCheckpoint` invalidates the per-user checkpoint, the session
reaches Auth with `selectedOrgId === null` AND `selectedProjectId ===
null` — `resolveCredentials` does NOT set those in the multi-env defer
branch. PR #775's structural gate (`needsEnvPickStillRequired`)
short-circuited to `false` in that case, leaving only
`pendingEnvSelection` as a gate. The recurring "stuck on Setup with no
env-picker" bug class then reproduces if anything clobbers that flag —
exactly the symptom users keep reporting (`✓ Welcome ─ ✓ Auth ─ ●
Setup ←`, no env picker rendered, TUI stuck).

This commit extends the structural gate with a fallback path that
mirrors `resolveCredentials`' own "first project of first org"
heuristic (credential-resolution.ts ~line 483). When no specific
selection has been made, the gate consults `pendingOrgs[0]
.projects[0]` — the exact (org, project) tuple `resolveCredentials`
walked when it issued the deferral. Once the user picks an org/project
via AuthScreen, the existing specific-IDs guard takes over.

Guard reordering: `pendingOrgs === null` short-circuit stays first
(unchanged manual-API-key carve-out). `selectedEnvName !== null`
short-circuit also stays — picking an env (real or auto-select) is
still the canonical "we're done with this gate" signal regardless of
which (org, project) tuple we resolved through.

Regression coverage in `flow-invariants.test.ts`:
  - `structural gate alone holds Auth in restart-after-reset`: pins
    the smoking-gun scenario (credentials + names set, IDs null,
    pendingEnvSelection clobbered). Fails without the fix.
  - `structural gate does not block manual-API-key path`: counter-test
    keeping `pendingOrgs === null` carve-out intact.
  - `parks on Auth after checkpoint invalidation + multi-env defer`:
    wider integration test through the full bin.ts mutation sequence.
  - Plus the AuthScreen snapshot test confirming the env picker
    renders when `pendingEnvSelection: true`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 15, 2026
…stale selectedOrgId/ProjectId (#797)

Prior PRs #747 / #760 / #762 / #775 / #778 / #780 each tried to fix the
recurring "stuck on Setup with no env-picker" hang after `git reset --hard`
on a previously-instrumented project. Each landed a gate that worked at
the unit-test level, but the live repro kept reproducing.

This pins the actual gap PR #780 left behind. PR #780's structural fallback
(`needsEnvPickStillRequired` → `pendingOrgs[0].projects[0]`) only fires
when BOTH `selectedOrgId` AND `selectedProjectId` are null. In the real
restart-after-reset sequence:

  1. `resolveCredentials` populates `pendingOrgs` (first fetch) and defers
     on multi-env.
  2. `applyEnvSelectionDeferral` sets `pendingEnvSelection=true`.
  3. AuthScreen mounts and its single-org/single-project auto-resolve
     effect writes `selectedOrgId='org-1'`, `selectedProjectId='proj-1'`
     from that first snapshot.
  4. `authTask` runs OAuth, `fetchAmplitudeUser` fetches a SECOND time,
     and `setOAuthComplete` REPLACES `pendingOrgs` with the fresh data.
     If the two snapshots' IDs don't match (re-fetch race, ordering,
     account changes, stale in-memory IDs), the existing
     `pendingOrgs.find(o => o.id === selectedOrgId)` lookup returns
     `undefined`.
  5. The structural gate short-circuits to `false`. If anything clobbers
     `pendingEnvSelection=false` (the recurring failure-mode the 6 prior
     PRs chased — every fix relied on the flag staying true), Auth.isComplete
     returns true and the router walks past Auth into the Setup-bucket
     screens. User sees `✓ Auth ─ ● Setup ←` with no env picker on
     screen.

Fix: when stale `selectedOrgId/ProjectId` don't resolve a project in
`pendingOrgs`, fall through to the same `pendingOrgs[0].projects[0]`
heuristic instead of bailing to `false`. The structural gate is now
load-bearing across all three states:

  (a) no IDs picked yet (PR #780)
  (b) IDs picked and valid in pendingOrgs (existing happy path)
  (c) IDs picked but stale relative to fresh pendingOrgs (this PR)

Regression test: src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts
drives the complete bin.ts startup sequence — buildSession → store.session →
resolveCredentials multi-env defer → applyEnvSelectionDeferral →
concludeIntro → setOAuthComplete → AuthScreen's setOrgAndProject — and
asserts the router parks on Auth at every step. The new "stale IDs"
test fails without the fix (router resolves to data-setup) and passes
with it.

Pre-existing test in env-picker-repro.test.ts that asserted the OPPOSITE
behavior ("does not block when the resolved project is missing from
pendingOrgs") was inverted to match the new (correct) behavior — that
test was inadvertently encoding the bug as expected behavior.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant