fix(auth): structural gate falls through to first-org/first-project when stale IDs miss pendingOrgs #797
Merged
…stale selectedOrgId/ProjectId

Prior PRs #747 / #760 / #762 / #775 / #778 / #780 each tried to fix the recurring "stuck on Setup with no env-picker" hang after `git reset --hard` on a previously-instrumented project. Each landed a gate that worked at the unit-test level, but the live repro kept reproducing. This pins the actual gap PR #780 left behind.

PR #780's structural fallback (`needsEnvPickStillRequired` → `pendingOrgs[0].projects[0]`) only fires when BOTH `selectedOrgId` AND `selectedProjectId` are null. In the real restart-after-reset sequence:

1. `resolveCredentials` populates `pendingOrgs` (first fetch) and defers on multi-env.
2. `applyEnvSelectionDeferral` sets `pendingEnvSelection=true`.
3. AuthScreen mounts and its single-org/single-project auto-resolve effect writes `selectedOrgId='org-1'`, `selectedProjectId='proj-1'` from that first snapshot.
4. `authTask` runs OAuth, `fetchAmplitudeUser` fetches a SECOND time, and `setOAuthComplete` REPLACES `pendingOrgs` with the fresh data. If the two snapshots' IDs don't match (re-fetch race, ordering, account changes, stale in-memory IDs), the existing `pendingOrgs.find(o => o.id === selectedOrgId)` lookup returns `undefined`.
5. The structural gate short-circuits to `false`.

If anything clobbers `pendingEnvSelection=false` (the recurring failure mode the 6 prior PRs chased; every fix relied on the flag staying true), `Auth.isComplete` returns true and the router walks past Auth into the Setup-bucket screens. User sees `✓ Auth ─ ● Setup ←` with no env picker on screen.

Fix: when stale `selectedOrgId/ProjectId` don't resolve a project in `pendingOrgs`, fall through to the same `pendingOrgs[0].projects[0]` heuristic instead of bailing to `false`.

The structural gate is now load-bearing across all three states:

(a) no IDs picked yet (PR #780)
(b) IDs picked and valid in `pendingOrgs` (existing happy path)
(c) IDs picked but stale relative to fresh `pendingOrgs` (this PR)

Regression test: `src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts` drives the complete `bin.ts` startup sequence (buildSession → store.session → resolveCredentials multi-env defer → applyEnvSelectionDeferral → concludeIntro → setOAuthComplete → AuthScreen's setOrgAndProject) and asserts the router parks on Auth at every step. The new "stale IDs" test fails without the fix (router resolves to `data-setup`) and passes with it.

Pre-existing test in `env-picker-repro.test.ts` that asserted the OPPOSITE behavior ("does not block when the resolved project is missing from pendingOrgs") was inverted to match the new (correct) behavior. That test was inadvertently encoding the bug as expected behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
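The three states (a), (b), (c) above can be illustrated with a minimal sketch of the gate. This is not the code in `src/ui/tui/flows.ts`: the `Project`, `Org`, and `AuthState` shapes, the `resolveSelectedProject` helper, and the "more than one environment" check are all assumptions made for illustration. Only `needsEnvPickStillRequired`, `pendingOrgs`, `selectedOrgId`, and `selectedProjectId` are names taken from the description.

```typescript
// Hypothetical shapes; the real types in the wizard may differ.
interface Project { id: string; environments: string[] }
interface Org { id: string; projects: Project[] }
interface AuthState {
  pendingOrgs: Org[];
  selectedOrgId: string | null;
  selectedProjectId: string | null;
}

// Hypothetical helper: returns the selected project, or undefined when the
// IDs are null (state a) or no longer present in pendingOrgs (state c).
function resolveSelectedProject(s: AuthState): Project | undefined {
  if (s.selectedOrgId === null || s.selectedProjectId === null) return undefined;
  const org = s.pendingOrgs.find((o) => o.id === s.selectedOrgId);
  return org?.projects.find((p) => p.id === s.selectedProjectId);
}

// Sketch of the gate: states (a) and (c) both fall through to the
// first-org/first-project heuristic instead of returning false.
function needsEnvPickStillRequired(s: AuthState): boolean {
  const project = resolveSelectedProject(s) ?? s.pendingOrgs[0]?.projects[0];
  if (project === undefined) return false; // nothing to pick from
  // Assumption: "multi-env" means the project has more than one environment.
  return project.environments.length > 1;
}
```

Under these assumptions, a state whose selected IDs point at `org-1`/`proj-1` while `pendingOrgs` only contains `org-2` still keeps the gate closed, because the fallback inspects the first project of the fresh snapshot.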
This was referenced May 16, 2026
Merged
kelsonpw added a commit that referenced this pull request on May 22, 2026
… keys (#797 follow-up) (#805)

8 PRs deep into the recurring "frozen-after-reset" bug. Live integration render exposed the actual gap: when `pendingEnvSelection=true` and the fresh `setOAuthComplete` brings back pendingOrgs where one of the two envs has lost its `app.apiKey` (provisioning lag, role change, key rotation, transient server response; many production races land here), `selectableEnvs.length === 1`, AuthScreen's auto-select effect fires, the credential-loading effect takes the "selected env has apiKey" path, and the deferral flag gets cleared. The env picker NEVER renders; the router walks past Auth into Run.

The user's "✓ Auth ─ ● Setup" stepper was actually `Screen.Run` (Run is in the "Setup" bucket of WIZARD_STEPS). Confirmed by the "Progress Logs Snake" tabs, which are RunScreen's tabs.

Fix:
- Gate the auto-select-when-1-env effect on `!pendingEnvSelection`. The resolver explicitly said "user must choose"; auto-selecting silently bypasses that intent.
- Widen `needsEnvPick` to render the picker even with a single-env list when `pendingEnvSelection=true`. Without this, gating the auto-select alone would freeze the user on Auth with no actionable surface: it fixes the auto-resolve bug but creates a new dead-screen bug.

Integration test boots App via ink-testing-library, drives the post-deferral state through `setOAuthComplete` with asymmetric env keys, and asserts the router stays on Auth. Confirmed FAILS without the fix (router resolves to Run) and PASSES with it.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
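The two changes in this follow-up can be sketched as a pair of pure predicates. This is a rough illustration, not the AuthScreen code: the `Env` and `PickerState` shapes and the `shouldAutoSelect` name are hypothetical, and the real effects presumably live inside React hooks. `selectableEnvs`, `needsEnvPick`, and `pendingEnvSelection` are the names used in the commit message.

```typescript
// Hypothetical shapes for illustration only.
interface Env { name: string; apiKey: string | null }
interface PickerState { envs: Env[]; pendingEnvSelection: boolean }

// Assumption: an environment is selectable only while it still has an API key.
function selectableEnvs(s: PickerState): Env[] {
  return s.envs.filter((e) => e.apiKey !== null);
}

// Change 1: auto-select a lone environment only when the resolver
// did NOT defer the choice to the user.
function shouldAutoSelect(s: PickerState): boolean {
  return !s.pendingEnvSelection && selectableEnvs(s).length === 1;
}

// Change 2: render the picker for a multi-env list, or for any
// non-empty list while the deferral flag is still set.
function needsEnvPick(s: PickerState): boolean {
  const count = selectableEnvs(s).length;
  return count > 1 || (s.pendingEnvSelection && count >= 1);
}
```

Taken together, the predicates never leave the user on Auth with neither an auto-selection nor a picker when exactly one selectable environment remains, which is the dead-screen case the second bullet guards against.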
Summary
Fixes the recurring env-picker hang after `git reset --hard` on a previously-instrumented project. 6 prior PRs (#747 / #760 / #762 / #775 / #778 / #780) each landed a gate that worked at the unit-test level, but the live repro kept reproducing. This pins the actual gap.
Root cause
PR #780 added a structural fallback (`needsEnvPickStillRequired` → `pendingOrgs[0].projects[0]`) but only when BOTH `selectedOrgId` AND `selectedProjectId` are null. The live restart-after-reset sequence ends up with stale `selectedOrgId/ProjectId` (not null), so the fallback doesn't fire:
1. `resolveCredentials` populates `pendingOrgs` (first fetch) and defers on multi-env.
2. `applyEnvSelectionDeferral` sets `pendingEnvSelection=true`.
3. AuthScreen mounts and its single-org/single-project auto-resolve effect writes `selectedOrgId='org-1'`, `selectedProjectId='proj-1'` from that first snapshot.
4. `authTask` runs OAuth, `fetchAmplitudeUser` fetches a SECOND time, `setOAuthComplete` REPLACES `pendingOrgs` with the fresh data. If IDs don't match across the two snapshots (re-fetch race, ordering, account changes, stale memory), the existing `pendingOrgs.find(o => o.id === selectedOrgId)` returns `undefined`.
5. The structural gate short-circuits to `false`. If anything then clobbers `pendingEnvSelection=false`, `Auth.isComplete` returns true and the router walks past Auth into the Setup-bucket screens. User sees `✓ Auth ─ ● Setup ←` with no env picker.
Fix
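When stale `selectedOrgId/ProjectId` don't resolve a project in `pendingOrgs`, fall through to the same `pendingOrgs[0].projects[0]` heuristic PR #780 added instead of bailing to `false`. The gate is now load-bearing across:
(a) no IDs picked yet (PR #780)
(b) IDs picked and valid in `pendingOrgs` (existing happy path)
(c) IDs picked but stale relative to fresh `pendingOrgs` (this PR)
Single point of change in `src/ui/tui/flows.ts`; no new defensive layer.
Regression test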
New `src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts` drives the complete `bin.ts` startup sequence and asserts the router parks on Auth at every step. The dedicated stale-IDs test failed pre-fix (router resolved to `data-setup`) and passes post-fix.
A pre-existing test in `env-picker-repro.test.ts` that asserted the OPPOSITE behavior ("does not block when the resolved project is missing from pendingOrgs") was inverted; it was inadvertently encoding the bug as expected behavior.
Test plan
- `pnpm exec vitest run --pool=forks --maxWorkers=1 src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts` (new test passes)
- `pnpm exec vitest run --pool=forks --maxWorkers=1 src/ui/tui/__tests__/env-picker-repro.test.ts src/ui/tui/__tests__/flow-invariants.test.ts src/ui/tui/__tests__/router.test.ts src/ui/tui/screens/__tests__/AuthScreen.snap.test.tsx` (all neighbours pass)
- `pnpm tsc --noEmit` clean
- `pnpm lint` clean
- `pnpm test`: all 4293 tests pass
- `src/utils/wizard-abort.ts` untouched
🤖 Generated with Claude Code
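The "parks on Auth at every step" assertion pattern used by the regression test can be sketched without any test framework. Everything here is a simplified stand-in: the `Session` shape, the `resolveScreen` router, the `envCount` field, and the `replay` helper are hypothetical, and the real test drives the actual store and router rather than a hand-rolled state object.

```typescript
type Screen = "auth" | "data-setup";
interface Project { id: string; envCount: number }
interface Org { id: string; projects: Project[] }
interface Session {
  pendingOrgs: Org[];
  pendingEnvSelection: boolean;
  selectedOrgId: string | null;
  selectedProjectId: string | null;
}

// Hypothetical router: stay on Auth while the deferral flag is set OR the
// structural gate (with first-org/first-project fallback) still holds.
function resolveScreen(s: Session): Screen {
  const org = s.pendingOrgs.find((o) => o.id === s.selectedOrgId);
  const picked = org?.projects.find((p) => p.id === s.selectedProjectId);
  const project = picked ?? s.pendingOrgs[0]?.projects[0];
  const structural = project !== undefined && project.envCount > 1;
  return (s.pendingEnvSelection || structural) ? "auth" : "data-setup";
}

// Replay the restart-after-reset steps and record the screen after each one.
function replay(): Screen[] {
  const seen: Screen[] = [];
  let s: Session = {
    pendingOrgs: [], pendingEnvSelection: false,
    selectedOrgId: null, selectedProjectId: null,
  };
  const step = (patch: Partial<Session>): void => {
    s = { ...s, ...patch };
    seen.push(resolveScreen(s));
  };
  // First fetch populates pendingOrgs with a multi-env project.
  step({ pendingOrgs: [{ id: "org-1", projects: [{ id: "proj-1", envCount: 2 }] }] });
  // Deferral flag is set.
  step({ pendingEnvSelection: true });
  // Auto-resolve writes IDs from the first snapshot.
  step({ selectedOrgId: "org-1", selectedProjectId: "proj-1" });
  // Second fetch replaces pendingOrgs; the selected IDs are now stale.
  step({ pendingOrgs: [{ id: "org-9", projects: [{ id: "proj-9", envCount: 2 }] }] });
  // Something clobbers the flag; only the structural gate is left.
  step({ pendingEnvSelection: false });
  return seen;
}
```

In this sketch the last step is the interesting one: with the flag cleared and the IDs stale, the router stays on Auth only because of the fallback to the first project.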
Note
Medium Risk
Changes the wizard router’s Auth completion gating logic; mistakes could trap users on Auth or incorrectly skip the env picker, but the change is small and covered by new regression tests.
Overview
Fixes a regression where the TUI wizard could advance past `Auth` into Setup screens without rendering the environment picker when `pendingEnvSelection` was cleared and `selectedOrgId`/`selectedProjectId` no longer matched the refreshed `pendingOrgs`.
`needsEnvPickStillRequired` now falls back to `pendingOrgs[0].projects[0]` when the selected IDs are stale (not just null), and tests were updated/added to assert the router consistently parks on `Auth` across the restart-after-reset sequence and stale-ID scenarios.
Reviewed by Cursor Bugbot for commit 771af11.