fix(auth): structural fallback gate for env-picker so Auth blocks even when pendingEnvSelection gets clobbered by kelsonpw · Pull Request #775 · amplitude/wizard

kelsonpw · 2026-05-14T03:44:19Z

Summary

Three prior PRs each claimed to close the recurring "stuck on Setup with no env-picker" bug:

fix(auth): clear credentials on needs_user_choice/environment_selection #747 clears session.credentials when resolveCredentials returns needs_user_choice/environment_selection.
fix(auth): block flow forward-walk while env selection is pending #760 adds the pendingEnvSelection flag and gates every post-Auth show: predicate on it.
fix(tui): patch nanostores map.setKey to mutate $session in place #762 patches nanostores' map.setKey to mutate $session.value in place so bin.ts's session ref stays aligned with the store.

Each ship passed its unit tests. The bug kept reproducing in user-reported runs: successful run → git reset --hard → rerun → wizard navigates the TUI, stops at ✓ Welcome ─ ✓ Auth ─ ● Setup ←, and hangs.

What the prior PRs got right

The prior diagnoses are all internally consistent and each PR's regression test passes both before and after this PR. They form a layered defense, and they keep working.

What the prior PRs missed

Per-run logs end at [intro] no framework matched — falling back to Generic and nothing fires after. Static analysis of every write to pendingEnvSelection across src/ found exactly one set-true path (applyEnvSelectionDeferral) and four set-false paths (all inside AuthScreen, gated on the user picking an env). None of them explain how the flag flips back to false before the user has interacted with the picker — but the journey stepper's ● Setup ← state requires both Auth.isComplete=true AND pendingEnvSelection=false, so something is clobbering the flag through a path that's not yet pinned down.

The fix

Rather than ship a fourth speculative root-cause fix, this PR adds a structural fallback gate to Auth.isComplete:

!s.pendingEnvSelection &&        // primary gate (PR #760, still load-bearing)
!needsEnvPickStillRequired(s)    // new structural fallback

`needsEnvPickStillRequired` returns true iff EVERY one of these holds:

pendingOrgs is non-null (the resolver populated it — distinguishes deferred path from manual API key path).
The session has a resolved selectedOrgId and selectedProjectId.
That project has ≥ 2 environments with usable API keys.
selectedEnvName === null (the user hasn't picked one yet).

When ALL of those hold, Auth.isComplete returns false regardless of pendingEnvSelection's value. The predicate reads observable session shape — no mutable flag for a future regression to clobber.

The flag stays in place as the primary gate; the structural check is a second, redundant safety net. Both point at the same "env still needs picking" state, but one is mutable and one is derived, so any single bug that clobbers the flag is caught by the structural check and vice versa.

Carve-outs

pendingOrgs === null short-circuits to false so the manual-API-key path (apiKeyNotice resolver outcome) passes through unchanged.
Single-env projects don't trigger the gate.
Stale checkpoint selectedProjectId not in pendingOrgs short-circuits to false; the existing stale-org useEffect in AuthScreen handles that case.

Test plan

New file: src/ui/tui/__tests__/env-picker-repro.test.ts (6 tests covering the smoking-gun scenario plus all four carve-outs).
Tests fail on origin/main (verified by stashing the flows.ts change and running the test file — both "router parks on Auth" tests fail with expected 'data-setup' to be 'auth').
Tests pass with the fix.
All 4112 existing tests still pass (pnpm test).
Lint clean (pnpm lint).
Build clean (pnpm build includes smoke test).
src/utils/wizard-abort.ts untouched.
No --no-verify used.

🤖 Generated with Claude Code

Note

Medium Risk
Changes the Auth.isComplete routing predicate, which can alter wizard navigation in auth/onboarding flows; guard conditions and targeted tests reduce but don’t eliminate risk of unintended blocking/advancing.

Overview
Prevents the recurring “stuck on Setup with no env-picker” issue by adding a defense-in-depth structural gate (needsEnvPickStillRequired) to Auth.isComplete, blocking progression when pendingOrgs indicates a multi-env project and selectedEnvName is still unset (while explicitly not affecting manual API-key, single-env, or stale-checkpoint scenarios).

Adds a new Vitest regression suite (env-picker-repro.test.ts) that reproduces the clobbered-flag scenario and asserts the intended carve-outs and unblock behavior once an env is selected.

^{Reviewed by Cursor Bugbot for commit 9ea8487. Bugbot is set up for automated code reviews on this repo. Configure here.}

…n when pendingEnvSelection gets clobbered Three prior PRs claimed to fix the "stuck on Setup with no env-picker" bug — #747 cleared session.credentials when resolveCredentials returned needs_user_choice/environment_selection, #760 added the pendingEnvSelection flag and gated every post-Auth show: predicate on it, and #762 patched nanostores' map.setKey to mutate in-place so the bin.ts session reference stays aligned with $session.value. Each ship passed its unit tests. The bug kept reproducing in user-reported runs (pnpm try:prod --install-dir=~/excalidraw → successful run → git reset --hard → rerun → wizard navigates the TUI, stops at "✓ Welcome ─ ✓ Auth ─ ● Setup ←", and hangs). Per-run logs end at "[intro] no framework matched — falling back to Generic" and nothing fires after. Static analysis of every write to pendingEnvSelection across src/ found exactly one direct set-true path (applyEnvSelectionDeferral) and four explicit set-false paths (all inside AuthScreen, gated on the user picking an env). None of them explain how the flag flips back to false before the user has interacted with the picker — but the journey stepper's "● Setup ←" state requires both Auth.isComplete=true AND pendingEnvSelection=false, so something is clobbering the flag through a path we don't yet understand. Rather than ship a fourth speculative root-cause fix, this commit adds a structural fallback gate to Auth.isComplete: when pendingOrgs is populated AND the resolved project has ≥ 2 environments with API keys AND selectedEnvName is null, Auth.isComplete returns false regardless of pendingEnvSelection's value. That predicate reads observable session shape — pendingOrgs is set by resolveCredentials in the deferred path, selectedEnvName is cleared by applyEnvSelectionDeferral — so there's no mutable flag for a future regression to clobber. The flag stays in place as the primary gate; the structural check is a second, redundant safety net. Scoped carve-outs: - pendingOrgs===null short-circuits to false so the manual-API-key path (apiKeyNotice resolver outcome) passes through unchanged. - Single-env projects don't trigger the gate (no picker needed). - Stale checkpoint selectedProjectId not in pendingOrgs short-circuits to false; the existing stale-org useEffect in AuthScreen handles that case. Regression coverage in src/ui/tui/__tests__/env-picker-repro.test.ts includes the smoking-gun scenario (multi-env + selectedEnvName=null + pendingEnvSelection=false → must still land on Auth) plus four carve-out cases (env picked, manual key, single-env, stale checkpoint) to guard the fix's scope. All six fail without the change and pass with it. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… no IDs selected (#780) When `git reset --hard` wipes `<installDir>/.amplitude/` and PR #778's `loadCheckpoint` invalidates the per-user checkpoint, the session reaches Auth with `selectedOrgId === null` AND `selectedProjectId === null` — `resolveCredentials` does NOT set those in the multi-env defer branch. PR #775's structural gate (`needsEnvPickStillRequired`) short-circuited to `false` in that case, leaving only `pendingEnvSelection` as a gate. The recurring "stuck on Setup with no env-picker" bug class then reproduces if anything clobbers that flag — exactly the symptom users keep reporting (`✓ Welcome ─ ✓ Auth ─ ● Setup ←`, no env picker rendered, TUI stuck). This commit extends the structural gate with a fallback path that mirrors `resolveCredentials`' own "first project of first org" heuristic (credential-resolution.ts ~line 483). When no specific selection has been made, the gate consults `pendingOrgs[0] .projects[0]` — the exact (org, project) tuple `resolveCredentials` walked when it issued the deferral. Once the user picks an org/project via AuthScreen, the existing specific-IDs guard takes over. Guard reordering: `pendingOrgs === null` short-circuit stays first (unchanged manual-API-key carve-out). `selectedEnvName !== null` short-circuit also stays — picking an env (real or auto-select) is still the canonical "we're done with this gate" signal regardless of which (org, project) tuple we resolved through. Regression coverage in `flow-invariants.test.ts`: - `structural gate alone holds Auth in restart-after-reset`: pins the smoking-gun scenario (credentials + names set, IDs null, pendingEnvSelection clobbered). Fails without the fix. - `structural gate does not block manual-API-key path`: counter-test keeping `pendingOrgs === null` carve-out intact. - `parks on Auth after checkpoint invalidation + multi-env defer`: wider integration test through the full bin.ts mutation sequence. - Plus the AuthScreen snapshot test confirming the env picker renders when `pendingEnvSelection: true`. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…stale selectedOrgId/ProjectId (#797) Prior PRs #747 / #760 / #762 / #775 / #778 / #780 each tried to fix the recurring "stuck on Setup with no env-picker" hang after `git reset --hard` on a previously-instrumented project. Each landed a gate that worked at the unit-test level, but the live repro kept reproducing. This pins the actual gap PR #780 left behind. PR #780's structural fallback (`needsEnvPickStillRequired` → `pendingOrgs[0].projects[0]`) only fires when BOTH `selectedOrgId` AND `selectedProjectId` are null. In the real restart-after-reset sequence: 1. `resolveCredentials` populates `pendingOrgs` (first fetch) and defers on multi-env. 2. `applyEnvSelectionDeferral` sets `pendingEnvSelection=true`. 3. AuthScreen mounts and its single-org/single-project auto-resolve effect writes `selectedOrgId='org-1'`, `selectedProjectId='proj-1'` from that first snapshot. 4. `authTask` runs OAuth, `fetchAmplitudeUser` fetches a SECOND time, and `setOAuthComplete` REPLACES `pendingOrgs` with the fresh data. If the two snapshots' IDs don't match (re-fetch race, ordering, account changes, stale in-memory IDs), the existing `pendingOrgs.find(o => o.id === selectedOrgId)` lookup returns `undefined`. 5. The structural gate short-circuits to `false`. If anything clobbers `pendingEnvSelection=false` (the recurring failure-mode the 6 prior PRs chased — every fix relied on the flag staying true), Auth.isComplete returns true and the router walks past Auth into the Setup-bucket screens. User sees `✓ Auth ─ ● Setup ←` with no env picker on screen. Fix: when stale `selectedOrgId/ProjectId` don't resolve a project in `pendingOrgs`, fall through to the same `pendingOrgs[0].projects[0]` heuristic instead of bailing to `false`. The structural gate is now load-bearing across all three states: (a) no IDs picked yet (PR #780) (b) IDs picked and valid in pendingOrgs (existing happy path) (c) IDs picked but stale relative to fresh pendingOrgs (this PR) Regression test: src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts drives the complete bin.ts startup sequence — buildSession → store.session → resolveCredentials multi-env defer → applyEnvSelectionDeferral → concludeIntro → setOAuthComplete → AuthScreen's setOrgAndProject — and asserts the router parks on Auth at every step. The new "stale IDs" test fails without the fix (router resolves to data-setup) and passes with it. Pre-existing test in env-picker-repro.test.ts that asserted the OPPOSITE behavior ("does not block when the resolved project is missing from pendingOrgs") was inverted to match the new (correct) behavior — that test was inadvertently encoding the bug as expected behavior. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kelsonpw requested a review from a team as a code owner May 14, 2026 03:44

kelsonpw merged commit f195bbd into main May 14, 2026
15 of 16 checks passed

amplitude-release-bot Bot mentioned this pull request May 14, 2026

chore(main): release wizard 1.18.0 #694

Open

kelsonpw mentioned this pull request May 14, 2026

fix(auth): structural gate covers no-pre-selection case (env-picker hang follow-up) #780

Merged

6 tasks

kelsonpw mentioned this pull request May 15, 2026

fix(auth): structural gate falls through to first-org/first-project when stale IDs miss pendingOrgs #797

Merged

6 tasks

kelsonpw mentioned this pull request May 16, 2026

fix(auth): env picker renders when fresh fetch returns asymmetric env keys (#797 follow-up) #805

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(auth): structural fallback gate for env-picker so Auth blocks even when pendingEnvSelection gets clobbered#775

fix(auth): structural fallback gate for env-picker so Auth blocks even when pendingEnvSelection gets clobbered#775
kelsonpw merged 1 commit into
mainfrom
fix/env-picker-race-actual

kelsonpw commented May 14, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kelsonpw commented May 14, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What the prior PRs got right

What the prior PRs missed

The fix

Carve-outs

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kelsonpw commented May 14, 2026 •

edited by cursor Bot

Loading