
fix(auth): structural gate falls through to first-org/first-project when stale IDs miss pendingOrgs #797

Merged
kelsonpw merged 1 commit into main from fix/env-picker-actually-renders-after-reset on May 15, 2026

kelsonpw commented May 15, 2026

Summary

Fixes the recurring env-picker hang after git reset --hard on a previously-instrumented project. Six prior PRs (#747 / #760 / #762 / #775 / #778 / #780) each landed a gate that passed at the unit-test level, but the hang kept reproducing live. This pins the actual gap.

Root cause

PR #780 added a structural fallback (needsEnvPickStillRequired → pendingOrgs[0].projects[0]), but it only fires when BOTH selectedOrgId AND selectedProjectId are null. The live restart-after-reset sequence ends up with stale (non-null) selectedOrgId/selectedProjectId, so the fallback doesn't fire:

  1. resolveCredentials populates pendingOrgs (first fetch) and defers on multi-env.
  2. applyEnvSelectionDeferral sets pendingEnvSelection=true.
  3. AuthScreen mounts; its single-org/single-project auto-resolve effect writes selectedOrgId='org-1', selectedProjectId='proj-1' from that first snapshot.
  4. authTask runs OAuth, fetchAmplitudeUser fetches a SECOND time, and setOAuthComplete REPLACES pendingOrgs with the fresh data. If IDs don't match across the two snapshots (re-fetch race, ordering, account changes, stale in-memory IDs), the existing pendingOrgs.find(o => o.id === selectedOrgId) lookup returns undefined.
  5. The structural gate short-circuits to false. If anything then clears pendingEnvSelection to false, Auth.isComplete returns true and the router walks past Auth into the Setup-bucket screens. The user sees ✓ Auth ─ ● Setup ← with no env picker on screen.
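
The lookup miss in step 4 can be sketched in isolation. This is a minimal illustration, not the code in src/ui/tui/flows.ts: the `Org` and `Project` shapes, the helper name `findSelectedProject`, and the second snapshot's IDs are all assumptions made up for the example.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface Project { id: string }
interface Org { id: string; projects: Project[] }

// Hypothetical helper: resolve the selected project inside a snapshot.
function findSelectedProject(
  orgs: Org[],
  orgId: string | null,
  projectId: string | null,
): Project | undefined {
  return orgs
    .find((o) => o.id === orgId)
    ?.projects.find((p) => p.id === projectId);
}

// Step 3: the auto-resolve effect writes IDs from the FIRST snapshot.
const firstSnapshot: Org[] = [{ id: "org-1", projects: [{ id: "proj-1" }] }];
const selectedOrgId = "org-1";
const selectedProjectId = "proj-1";

// Step 4: setOAuthComplete replaces pendingOrgs with the SECOND snapshot.
// The differing IDs are invented here to model the mismatch.
const secondSnapshot: Org[] = [{ id: "org-9", projects: [{ id: "proj-9" }] }];

// Resolves against the snapshot the IDs came from...
const beforeReplace = findSelectedProject(firstSnapshot, selectedOrgId, selectedProjectId);
// ...but misses against the replacement, although neither ID is null.
const afterReplace = findSelectedProject(secondSnapshot, selectedOrgId, selectedProjectId);
```

Because both IDs are non-null, a fallback that is keyed on "both IDs are null" never gets a chance to run in this state.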

Fix

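When stale selectedOrgId/selectedProjectId don't resolve a project in pendingOrgs, fall through to the same pendingOrgs[0].projects[0] heuristic PR #780 added, instead of bailing to false. The gate is now load-bearing across all three states:

  (a) no IDs picked yet (PR #780)
  (b) IDs picked and valid in pendingOrgs (existing happy path)
  (c) IDs picked but stale relative to fresh pendingOrgs (this PR)

Single point of change in src/ui/tui/flows.ts; no new defensive layer.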
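
A minimal sketch of the before/after gate behavior, under stated assumptions: the function names `gateBeforeFix` and `gateAfterFix`, the `GateState` shape, and the "more than one environment means a pick is needed" criterion are hypothetical stand-ins for whatever needsEnvPickStillRequired actually checks.

```typescript
// Hypothetical shapes; the real session/org types may differ.
interface Project { id: string; environments: string[] }
interface Org { id: string; projects: Project[] }
interface GateState {
  pendingOrgs: Org[];
  selectedOrgId: string | null;
  selectedProjectId: string | null;
}

// Assumption: a project needs an env pick when it has more than one environment.
function needsPick(project: Project | undefined): boolean {
  return project !== undefined && project.environments.length > 1;
}

function findSelected(s: GateState): Project | undefined {
  return s.pendingOrgs
    .find((o) => o.id === s.selectedOrgId)
    ?.projects.find((p) => p.id === s.selectedProjectId);
}

// Models the PR #780 behavior: the fallback fires only when BOTH IDs are null.
function gateBeforeFix(s: GateState): boolean {
  if (s.selectedOrgId === null && s.selectedProjectId === null) {
    return needsPick(s.pendingOrgs[0]?.projects[0]);
  }
  return needsPick(findSelected(s)); // stale IDs -> undefined -> false
}

// Models this PR: a failed lookup also falls through to first-org/first-project.
function gateAfterFix(s: GateState): boolean {
  return needsPick(findSelected(s) ?? s.pendingOrgs[0]?.projects[0]);
}

const pendingOrgs: Org[] = [
  { id: "org-1", projects: [{ id: "proj-1", environments: ["dev", "prod"] }] },
];

const stateA: GateState = { pendingOrgs, selectedOrgId: null, selectedProjectId: null };
const stateB: GateState = { pendingOrgs, selectedOrgId: "org-1", selectedProjectId: "proj-1" };
const stateC: GateState = { pendingOrgs, selectedOrgId: "org-old", selectedProjectId: "proj-old" };
```

In this sketch states (a) and (b) behave the same in both versions; only state (c) differs, where `gateBeforeFix` returns false and `gateAfterFix` keeps the gate closed.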

Regression test

New src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts drives the complete bin.ts startup sequence and asserts the router parks on Auth at every step. The dedicated stale-IDs test failed pre-fix (router resolved to data-setup) and passes post-fix.

A pre-existing test in env-picker-repro.test.ts that asserted the OPPOSITE behavior ("does not block when the resolved project is missing from pendingOrgs") was inverted — it was inadvertently encoding the bug as expected behavior.

Test plan

  • pnpm exec vitest run --pool=forks --maxWorkers=1 src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts (new test passes)
  • pnpm exec vitest run --pool=forks --maxWorkers=1 src/ui/tui/__tests__/env-picker-repro.test.ts src/ui/tui/__tests__/flow-invariants.test.ts src/ui/tui/__tests__/router.test.ts src/ui/tui/screens/__tests__/AuthScreen.snap.test.tsx (all neighbours pass)
  • pnpm tsc --noEmit clean
  • pnpm lint clean
  • pnpm test — all 4293 tests pass
  • src/utils/wizard-abort.ts untouched

🤖 Generated with Claude Code


Note

Medium Risk
Changes the wizard router’s Auth completion gating logic; mistakes could trap users on Auth or incorrectly skip the env picker, but the change is small and covered by new regression tests.

Overview
Fixes a regression where the TUI wizard could advance past Auth into Setup screens without rendering the environment picker when pendingEnvSelection was cleared and selectedOrgId/selectedProjectId no longer matched the refreshed pendingOrgs.

needsEnvPickStillRequired now falls back to pendingOrgs[0].projects[0] when the selected IDs are stale (not just null), and tests were updated/added to assert the router consistently parks on Auth across the restart-after-reset sequence and stale-ID scenarios.

Reviewed by Cursor Bugbot for commit 771af11.

…stale selectedOrgId/ProjectId

Prior PRs #747 / #760 / #762 / #775 / #778 / #780 each tried to fix the
recurring "stuck on Setup with no env-picker" hang after `git reset --hard`
on a previously-instrumented project. Each landed a gate that passed at
the unit-test level, but the hang kept reproducing live.

This pins the actual gap PR #780 left behind. PR #780's structural fallback
(`needsEnvPickStillRequired` → `pendingOrgs[0].projects[0]`) only fires
when BOTH `selectedOrgId` AND `selectedProjectId` are null. In the real
restart-after-reset sequence:

  1. `resolveCredentials` populates `pendingOrgs` (first fetch) and defers
     on multi-env.
  2. `applyEnvSelectionDeferral` sets `pendingEnvSelection=true`.
  3. AuthScreen mounts and its single-org/single-project auto-resolve
     effect writes `selectedOrgId='org-1'`, `selectedProjectId='proj-1'`
     from that first snapshot.
  4. `authTask` runs OAuth, `fetchAmplitudeUser` fetches a SECOND time,
     and `setOAuthComplete` REPLACES `pendingOrgs` with the fresh data.
     If the two snapshots' IDs don't match (re-fetch race, ordering,
     account changes, stale in-memory IDs), the existing
     `pendingOrgs.find(o => o.id === selectedOrgId)` lookup returns
     `undefined`.
  5. The structural gate short-circuits to `false`. If anything clobbers
     `pendingEnvSelection=false` (the recurring failure-mode the 6 prior
     PRs chased — every fix relied on the flag staying true), Auth.isComplete
     returns true and the router walks past Auth into the Setup-bucket
     screens. User sees `✓ Auth ─ ● Setup ←` with no env picker on
     screen.

Fix: when stale `selectedOrgId/ProjectId` don't resolve a project in
`pendingOrgs`, fall through to the same `pendingOrgs[0].projects[0]`
heuristic instead of bailing to `false`. The structural gate is now
load-bearing across all three states:

  (a) no IDs picked yet (PR #780)
  (b) IDs picked and valid in pendingOrgs (existing happy path)
  (c) IDs picked but stale relative to fresh pendingOrgs (this PR)

Regression test: src/ui/tui/__tests__/env-picker-restart-after-reset-hang.test.ts
drives the complete bin.ts startup sequence — buildSession → store.session →
resolveCredentials multi-env defer → applyEnvSelectionDeferral →
concludeIntro → setOAuthComplete → AuthScreen's setOrgAndProject — and
asserts the router parks on Auth at every step. The new "stale IDs"
test fails without the fix (router resolves to data-setup) and passes
with it.

Pre-existing test in env-picker-repro.test.ts that asserted the OPPOSITE
behavior ("does not block when the resolved project is missing from
pendingOrgs") was inverted to match the new (correct) behavior — that
test was inadvertently encoding the bug as expected behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 15, 2026 22:24
@kelsonpw kelsonpw merged commit 37b3cc2 into main May 15, 2026
11 checks passed
kelsonpw added a commit that referenced this pull request May 22, 2026
… keys (#797 follow-up) (#805)

8 PRs deep into the recurring "frozen-after-reset" bug. Live integration
render exposed the actual gap: when `pendingEnvSelection=true` and the
fresh `setOAuthComplete` brings back pendingOrgs where one of the two
envs has lost its `app.apiKey` (provisioning lag, role change, key
rotation, transient server response — many production races land
here), `selectableEnvs.length === 1`, AuthScreen's auto-select effect
fires, the credential-loading effect takes the "selected env has
apiKey" path, and the deferral flag gets cleared. The env picker
NEVER renders — router walks past Auth into Run.

The user's "✓ Auth ─ ● Setup" stepper was actually `Screen.Run` (Run is
in the "Setup" bucket of WIZARD_STEPS). Confirmed by the
"Progress Logs Snake" tabs — those are RunScreen's tabs.
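
Why a stepper reading "Setup" can hide `Screen.Run` can be sketched as a bucket lookup. The shape of `WIZARD_STEPS`, the screen names other than Run, and the helper `stepLabelFor` are assumptions for illustration only; the real table may be structured differently.

```typescript
// Hypothetical screen identifiers; only "run" being in the Setup bucket
// is taken from the text above.
type Screen = "auth" | "data-setup" | "run";

// Hypothetical shape: each stepper label covers a bucket of screens.
const WIZARD_STEPS: { label: string; screens: Screen[] }[] = [
  { label: "Auth", screens: ["auth"] },
  { label: "Setup", screens: ["data-setup", "run"] },
];

// The stepper shows only the bucket label, not the concrete screen.
function stepLabelFor(screen: Screen): string | undefined {
  return WIZARD_STEPS.find((step) => step.screens.includes(screen))?.label;
}

const labelForRun = stepLabelFor("run");
const labelForDataSetup = stepLabelFor("data-setup");
```

Under this assumption two different screens render the same stepper label, so the stepper alone cannot tell which screen the router landed on.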

Fix:
  - Gate the auto-select-when-1-env effect on `!pendingEnvSelection`.
    The resolver explicitly said "user must choose" — auto-selecting
    silently bypasses that intent.
  - Widen `needsEnvPick` to render the picker even with a single-env
    list when `pendingEnvSelection=true`. Without this, gating the
    auto-select alone would freeze the user on Auth with no actionable
    surface — fixes the auto-resolve bug but creates a new dead-screen
    bug.
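
The two changes can be sketched as a pair of predicates. This is a sketch under assumptions, not AuthScreen's actual code: the `Env` and `AuthState` shapes, the helper names, and the pre-existing "more than one selectable env" rule for `needsEnvPick` are hypothetical.

```typescript
// Hypothetical shapes; an env is selectable only if it still has an apiKey.
interface Env { name: string; apiKey?: string }
interface AuthState { pendingEnvSelection: boolean; envs: Env[] }

function selectableEnvs(s: AuthState): Env[] {
  return s.envs.filter((e) => e.apiKey !== undefined);
}

// Change 1: auto-select the lone env only when the resolver has NOT
// deferred to the user.
function shouldAutoSelect(s: AuthState): boolean {
  return !s.pendingEnvSelection && selectableEnvs(s).length === 1;
}

// Change 2: render the picker for a single-env list too when a pick is
// pending, so gating the auto-select does not leave a dead screen.
function needsEnvPick(s: AuthState): boolean {
  const count = selectableEnvs(s).length;
  return count > 1 || (s.pendingEnvSelection && count === 1);
}

// Asymmetric keys: one of the two envs has lost its apiKey.
const asymmetric: Env[] = [{ name: "dev", apiKey: "key-dev" }, { name: "prod" }];
const deferred: AuthState = { pendingEnvSelection: true, envs: asymmetric };
const notDeferred: AuthState = { pendingEnvSelection: false, envs: asymmetric };
```

With `deferred`, the sketch never auto-selects and always shows the picker; with `notDeferred`, the single selectable env is auto-selected as before. The two predicates are mutually exclusive for a one-env list, which is the property that avoids both the silent bypass and the dead screen.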

Integration test boots App via ink-testing-library, drives the post-
deferral state through `setOAuthComplete` with asymmetric env keys,
and asserts the router stays on Auth. Confirmed FAILS without the fix
(router resolves to Run) and PASSES with it.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>