feat(auth): add refresh-token support to TokenStore + PKCE provider#36
feat(auth): add refresh-token support to TokenStore + PKCE provider#36lmjabreu wants to merge 10 commits into
Conversation
- TokenStore.active() now returns an optional `bundle` carrying refresh token + expiry alongside the access token. The built-in keyring store always populates it; custom stores can omit it without breakage. - TokenStore.set() accepts a TokenBundle or a bare string (widening). - AuthProvider gained an optional `refreshToken` method; the PKCE provider implements it (grant_type=refresh_token, no client_secret). - New refreshAccessToken() helper drives proactive (skew window) and reactive (force) refresh, with file-lock concurrency on a sidecar `<records>.refresh.lock` and typed AUTH_REFRESH_EXPIRED / _TRANSIENT / _UNAVAILABLE errors. - Keyring store writes refresh tokens to a sibling slot per account (`<account>/refresh`) and exposes getRecordsLocation() so refresh helpers can derive a lock path without re-plumbing options. - runOAuthFlow persists the full bundle (access + refresh + expiry) on login instead of just the access token. - status.fetchLive context gained `bundle` so consumers can render expiry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
doistbot
left a comment
There was a problem hiding this comment.
This is a great addition that introduces robust OAuth refresh-token support to cli-core, allowing consumers to seamlessly extend sessions while handling parallel invocations. The proactive refresh strategy and file locking mechanisms are well-considered and will significantly improve the user experience for short-lived token environments. There are a few areas to refine, particularly around mitigating unintended breaking changes to the TokenStore and AuthProvider contracts, addressing a lock contention bug on forced refreshes, and optimizing the hot path to prevent doubled IPC latency during OS keyring reads. Additionally, consolidating duplicated fetch logic, expanding test coverage for the offline fallback paths, and updating the README per our project guidelines will help polish the implementation.
| export type { AuthErrorCode } from './errors.js' | ||
| export { runOAuthFlow } from './flow.js' | ||
| export type { RunOAuthFlowOptions, RunOAuthFlowResult } from './flow.js' | ||
| export { refreshAccessToken } from './refresh.js' |
There was a problem hiding this comment.
[P0] README.md must be updated when adding new exports to a sub-path entry, per AGENTS.md. The refreshAccessToken helper and TokenBundle additions need to be documented in the "What's in it" table and usage blocks.
| * `CliError` for typed failures; other thrown values become | ||
| * `AUTH_STORE_WRITE_FAILED`. | ||
| */ | ||
| set(account: TAccount, credentials: string | TokenBundle): Promise<void> |
There was a problem hiding this comment.
[P1] Widening the credentials parameter to string | TokenBundle is a breaking change for consumers who implement their own TokenStore. TypeScript requires class implementations to accept at least the interface's full parameter type, so existing set(account, token: string) methods will fail to compile. Consider restoring set's signature and adding an optional setBundle?(account, bundle) method, or update the PR description/release notes to acknowledge the break.
| getLastStorageResult(): TokenStorageResult | undefined | ||
| /** Storage result from the most recent `clear()` call, or `undefined` before any (and reset to `undefined` when the most recent `clear()` threw or was a no-op). */ | ||
| getLastClearResult(): TokenStorageResult | undefined | ||
| /** Human-readable location of the underlying record store. Surfaced so `refreshAccessToken` can derive a sidecar lock path without re-plumbing options. */ |
There was a problem hiding this comment.
[P1] recordsLocation is documented in CreateKeyringTokenStoreOptions as a "Human-readable location... Plain string; cli-core does not interpret it" (e.g. ~/.config/...). Re-purposing it here to derive a literal filesystem lock path means locks will silently fail to acquire (due to ENOENT failing to resolve ~) for any consumer following the existing documentation. Consider requiring an explicit lockPath option or fully expanding the path.
| const fresh = await options.store.active(options.ref) | ||
| if (fresh) { | ||
| const freshBundle = fresh.bundle ?? { accessToken: fresh.token } | ||
| if (!shouldRefresh(freshBundle, skewMs, options.force ?? false)) { |
There was a problem hiding this comment.
[P1] Concurrency bug on the force: true path (e.g. 401 retries). If Process A refreshes while Process B waits for the lock, Process B will re-read the updated store but shouldRefresh will still return true because force is true. Process B will then proceed to refresh again using the original, now-stale bundle.refreshToken (line 111), leading to invalid_grant on servers with token rotation. To fix, ignore the force flag if the access token changed (fresh.token !== snapshot.token), and update bundle with freshBundle before line 111 if a refresh is still actually needed.
| } else { | ||
| // No refresh token in this bundle — purge any previous secret. | ||
| try { | ||
| await refreshSecureStore.deleteSecret() |
There was a problem hiding this comment.
[P2] Swallowing a refresh-slot delete failure here can resurrect a stale refresh token. If an older login stored a refresh secret and a newer login returns no refresh token, a failed delete leaves the old secret in the keyring but the record has no marker that it should be ignored, so active() can later surface and use that stale refresh token. Please treat this as a write/fallback failure, or persist an explicit "no refresh token" marker that suppresses refresh-slot reads.
| bundle: initial, | ||
| setCalls: [], | ||
| } | ||
| let recordsLocation = '/tmp/test-records' |
There was a problem hiding this comment.
[P2] recordsLocation defaults to a hardcoded /tmp path. Tests that trigger a refresh without calling setRecordsLocation (like the force: true test) will perform actual lock file I/O against this global path, risking test collisions and orphaned files. Consider wiring the managed tempDir directly into buildStore by default.
| return Object.assign(provider, { refreshSpy }) | ||
| } | ||
|
|
||
| describe('refreshAccessToken', () => { |
There was a problem hiding this comment.
[P1] The highest-value concurrency scenario introduced by this diff (the acquireFileLock contention and subsequent store re-read) is untested. The PR description explicitly mentions a lock re-read test, but it appears to have been omitted from the final commit.
| : null | ||
| }, | ||
| async set(account, token) { | ||
| async set(account, credentials) { |
There was a problem hiding this comment.
[P2] This fake store now discards everything except credentials.accessToken, so the updated runOAuthFlow test still won't fail if runOAuthFlow stops persisting the new refresh/expiry fields. Keep the full TokenBundle in the test double and add an assertion with a provider response that includes refreshToken / expiry timestamps.
| }) | ||
|
|
||
| it('falls back to fallbackToken on the user record when the keyring is offline', async () => { | ||
| it('falls back to fallback tokens on the user record when the keyring is offline', async () => { |
There was a problem hiding this comment.
[P2] The new partial-offline path is still uncovered: when the access-slot write succeeds but refreshSecureStore.setSecret() throws SecureStoreUnavailableError, the helper is supposed to roll back the access-slot write and fall back to plaintext for both tokens. Without a test for that branch, a regression could leave split state across keyring and fallback storage.
| expect(store.getLastStorageResult()).toEqual({ storage: 'secure-store' }) | ||
|
|
||
| await expect(store.active()).resolves.toEqual({ token: 'tok_secret', account }) | ||
| await expect(store.active()).resolves.toEqual({ |
There was a problem hiding this comment.
[P2] This round-trip still only covers the access-token-only shape. The public behavior added here is store.set(account, TokenBundle) / active().bundle.refreshToken backed by the /refresh sibling slot, but there's no end-to-end store test for that integration yet. Add one case that writes a bundle with a refresh token, reads it back through active(), and verifies clear() removes the refresh slot too.
- Restore TokenStore.set(account, token: string) signature; add separate optional setBundle?(account, bundle) so custom TokenStore implementations don't need to widen their existing set() to accept TokenBundle. cli-core helpers call setBundle when available, otherwise set(token). - Revert the ExchangeResult.expiresAt → accessTokenExpiresAt rename. The old field stays, plus a new optional refreshTokenExpiresAt. Custom providers returning expiresAt keep working. - Fix concurrency bug on the force=true (reactive 401) path: if the post-lock re-read shows the access token rotated, return the fresh snapshot instead of POSTing refresh with a now-stale refresh token (which would yield invalid_grant on servers with refresh rotation). - Require explicit lockPath from the caller; drop the implicit cast on getRecordsLocation. Avoids silently failing to acquire locks when consumers pass `~`-prefixed recordsLocation per the doc. - Add hasRefreshToken bit on UserRecord; gate the refresh-slot keyring read on it so accounts without a refresh token don't pay a second IPC round-trip per `active()` call. Parallel-fetch access + refresh secrets via Promise.all. - record-write: roll back the access slot when the refresh-slot write throws (any error class) so we never leave an orphan access credential with no matching user record. A non-keyring refresh-slot setSecret rethrows; a keyring-unavailable refresh-slot rolls back to the fallback record so both tokens travel together. - record-write: treat a refresh-slot delete failure (when writing a no-refresh bundle) as a write failure — otherwise a stale refresh token from an earlier login can survive and be served by active(). - Extract shared postToTokenEndpoint helper in PKCE provider so exchangeCode and refreshToken share fetch + status handling + JSON parsing instead of duplicating it. - refresh.ts: use requireSnapshotForRef instead of reimplementing snapshot lookup. - migrate.ts: use the exported refreshAccountSlot helper instead of manually interpolating the suffix. - runOAuthFlow: call setBundle when the store implements it (preserves refresh/expiry on login) and fall back to set(accessToken) otherwise. - Update README "What's in it" + add a Silent OAuth refresh section covering refreshAccessToken, TokenBundle, lockPath, and the typed AUTH_REFRESH_* error codes. Tests: - flow.test fake store now keeps the full TokenBundle (and implements setBundle) so refresh/expiry persistence is asserted, not silently discarded. New test asserts the bundle round-trip end-to-end through runOAuthFlow. - refresh.test: drop the hardcoded /tmp records path; tests now use the tempDir-scoped lockPath. New tests cover the force+concurrency scenario (losing process must return the rotated snapshot without POSTing) and the non-force lock contention path. Removed signature- restating comment. - record-write.test: new tests for partial-offline (access succeeds, refresh slot offline → both fall back to record), refresh-slot setSecret non-keyring error (rollback + rethrow), refresh-slot delete failure on no-refresh bundle. - token-store.test: new bundle round-trip test (setBundle → active → clear with refresh in the sibling slot) and hot-path optimisation test asserting the refresh slot isn't read when hasRefreshToken=false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all 19 findings from the previous review. Summary of changes in Contract
Correctness
Hot path
Footprint
Tests (375 → 383)
Docs
Ready for another pass when you have a moment. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces highly anticipated OAuth refresh-token capabilities to the core authentication surface, enabling consumers to seamlessly extend sessions in the background. The additions thoughtfully expand the existing contracts while maintaining backward compatibility for current integrations. A few adjustments are needed to polish the implementation, primarily around preventing the legacy migration from inadvertently deleting newer refresh slots, avoiding unintended default-account promotion during silent refreshes, and centralizing some duplicated logic across the persistence and error-handling paths.
| // Legacy single-user state never carried a refresh token, but | ||
| // wire the sibling slot anyway so a defensive delete clears any | ||
| // junk that may have been parked there by a hand-edit. | ||
| refreshSecureStore: createSecureStore({ |
There was a problem hiding this comment.
[P1] Wiring the sibling refresh slot into this access-only migration makes retry runs destructive. migrateLegacyAuth explicitly retries after marker-write-failed, and by then the same account may already have logged in through the v2 flow and stored a valid /refresh secret. Passing { accessToken: legacyToken.token } through writeRecordWithKeyringFallback will now delete that newer refresh slot and flip hasRefreshToken back to false, silently disabling refresh for the account. Please skip the refresh-slot purge in the legacy-migration path (or no-op when the target account already exists) instead of treating the legacy token as authoritative about refresh state.
| try { | ||
| await refreshSecureStore.deleteSecret() | ||
| } catch (error) { | ||
| try { |
There was a problem hiding this comment.
[P2] The rollback logic in this catch block exactly duplicates the error handling on lines 77-83. Consider extracting a local rollbackAccess(error) helper to share this logic and prevent the two fallback paths from drifting.
| refreshTokenExpiresAt: bundle.refreshTokenExpiresAt, | ||
| hasRefreshToken: Boolean(trimmedRefresh), | ||
| } | ||
| : { |
There was a problem hiding this comment.
[P3] Both branches duplicate the mapping of the core record fields. Consider extracting a base object to share account, hasRefreshToken, and the expiry fields, then spreading it into the fallback branch to reduce duplication.
| } | ||
| return { | ||
| accessToken: payload.access_token, | ||
| accessToken: payload.access_token!, |
There was a problem hiding this comment.
[P3] The translation of the JSON payload to the ExchangeResult shape (including the expiresAt timestamp calculation) duplicates the logic in exchangeCode on lines 122-128. Extracting this to a shared mapping helper would keep the response parsing centralized.
|
|
||
| try { | ||
| await options.store.set(account, exchange.accessToken) | ||
| // Use setBundle when the store implements it so refresh + expiry |
There was a problem hiding this comment.
[P2] This login path now open-codes the same setBundle ? setBundle(...) : set(...) persistence policy that refresh.ts already centralizes in persistBundle(). Please move that helper to a shared internal utility and reuse it here so login and refresh can't drift on how bundles are stored.
| await secureStoreFor(record.account).deleteSecret() | ||
| lastClearResult = | ||
| record.fallbackToken !== undefined ? fallbackClear : { storage: 'secure-store' } | ||
| await accessStoreFor(record.account).deleteSecret() |
There was a problem hiding this comment.
[P2] The deleteSecret calls for the access and refresh slots are independent but execute sequentially, doubling the IPC latency during clear(). Consider running them concurrently via Promise.allSettled to reduce the cleanup overhead.
| state.last = { account, token } | ||
| state.last = { account, bundle: { accessToken: token } } | ||
| }, | ||
| async setBundle(account, bundle) { |
There was a problem hiding this comment.
[P2] Adding setBundle() to this fake store makes the existing abort-path test later in the file weaker: it still spies only on store.set, so a regression where runOAuthFlow persists after abort via setBundle would now pass. Please assert against both write methods (or assert store.last stays undefined) so the test still protects the “no persistence after abort” behavior.
| const { provider, getRedirect } = instrument({ | ||
| exchangeCode: async () => ({ | ||
| accessToken: 'tok-1', | ||
| refreshToken: 'rt-1', |
There was a problem hiding this comment.
[P2] This new persistence regression test still leaves refreshTokenExpiresAt unexercised. Since runOAuthFlow now forwards that field too, a regression dropping only the refresh-token expiry would still pass here. Add a concrete refreshTokenExpiresAt in the fixture and assert it in store.last?.bundle.
| expect(result.account).toEqual(account) | ||
| }) | ||
|
|
||
| it('throws AUTH_REFRESH_EXPIRED on 400 invalid_grant (refresh token revoked or expired)', async () => { |
There was a problem hiding this comment.
[P2] The new refresh classification coverage only exercises the 400 half of the documented “400/401 + invalid_grant => AUTH_REFRESH_EXPIRED” behavior. Providers commonly use 401 for revoked refresh tokens, so add a 401 invalid_grant case here to keep that branch from regressing unnoticed.
| // Sidecar O_EXCL lock so two parallel CLI invocations don't both | ||
| // POST refresh and race the server's rotation logic. Pass an | ||
| // already-expanded absolute path (no `~`). | ||
| lockPath: `${getConfigPath('todoist-cli')}.refresh.lock`, |
There was a problem hiding this comment.
[P3] This new README usage block is not self-contained as written: it calls getConfigPath() here, but the snippet only imports refreshAccessToken. Since README is the documented public API surface in this repo, please either import getConfigPath from @doist/cli-core or rewrite the example to use a precomputed lockPath.
Correctness: - migrate.ts: pass `purgeRefreshSlot: false` through writeRecordWithKeyring Fallback so a legacy-migration retry after `marker-write-failed` doesn't destructively wipe a /refresh secret that may have been written by a v2 login between the failed marker and the retry. The legacy access-only token has no authority over refresh state. - token-store: `setBundle()` no longer triggers the default-promotion side effect that `set()` (login) has. Silent refresh on a config with no pinned default would otherwise silently pin the refreshed account, mutating account selection without user intent. - refreshAccessToken: honour `exchange.account` returned by a provider's refreshToken (e.g. for a server-side rename); fall back to the pre-refresh account. PKCE provider's refreshToken stops fabricating an `account` since the in-repo caller now threads it through directly. - token-store: clear() runs the access + refresh keyring deletes via Promise.allSettled instead of sequentially — halves clear() latency. Refactor: - record-write: extracted `rollbackAccess(error)` helper to dedupe the two-branch rollback logic. Spread a `baseRecord` into the offline- fallback branch so the shared fields don't duplicate. - pkce: extracted `mapTokenPayloadToExchangeResult` so the OAuth payload shape lives in one place for both exchangeCode + refreshToken. - New `auth/persist.ts` exports `persistBundle(store, account, bundle)`. Both `runOAuthFlow` and `refreshAccessToken` now call it instead of open-coding the `setBundle ? … : set(…)` policy — login and silent refresh can no longer drift on how bundles get stored. Tests (383 -> 386): - flow.test abort-path now also spies on `setBundle` and asserts `store.last === undefined`, so a regression where runOAuthFlow persists after abort via the new bundle path can't slip through. - flow.test bundle-persistence test exercises `refreshTokenExpiresAt` (was unasserted) so a regression dropping just refresh-expiry would no longer pass. - pkce.test refresh-expiry case parameterised over 400 + 401, since many OAuth servers and reverse proxies return 401 for revoked refresh tokens. - token-store.test: new test asserting setBundle does NOT promote default — guards the silent-refresh-account-mutation regression. - record-write.test: new test for `purgeRefreshSlot: false` — pre- existing refresh secret survives and `hasRefreshToken` is left unset (we have no authority over it from a legacy access-only token). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 2 fixes in Correctness
Refactor / dedup
Tests (383 → 386)
Ready for another pass. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces valuable OAuth refresh-token capabilities to cli-core, extending the authentication surface to silently handle session extensions for short-lived tokens. The implementation establishes a solid foundation that will greatly improve the user experience by reducing manual logins. The provided feedback highlights a few logical adjustments needed to prevent migration paths from inadvertently hiding or overwriting refresh tokens, alongside suggestions to optimize rollback concurrency, consolidate bundle mappers, and expand test coverage to meet module layout guidelines.
| For OAuth providers whose servers issue refresh tokens (Outline, anything ALCs / DCR-based), `refreshAccessToken` keeps users signed in past the access token's expiry without re-running `<cli> auth login`. Wire it through your CLI's `getApiToken()` so every authenticated request consults the same code path. The `createPkceProvider` ships with a `refreshToken` implementation; bespoke providers add their own. | ||
|
|
||
| ```ts | ||
| import { refreshAccessToken } from '@doist/cli-core/auth' |
There was a problem hiding this comment.
[P3] The snippet is still missing the getConfigPath import requested in the previous review. Since this is the public documentation, please add getConfigPath (from @doist/cli-core/config) to the imports here.
| throw error | ||
| } | ||
| }) | ||
| const refreshPromise = record.hasRefreshToken |
There was a problem hiding this comment.
[P1] When purgeRefreshSlot: false is used (e.g. during legacy migration), the written record leaves hasRefreshToken: undefined. Because this ternary checks for strict truthiness, undefined causes it to skip reading the refresh slot entirely. This effectively hides the v2 refresh token from the CLI, which is the exact destructive behavior purgeRefreshSlot: false was meant to prevent. Change this condition to record.hasRefreshToken !== false so that undefined safely falls back to checking the slot.
| async setBundle(account, bundle) { | ||
| // No default promotion: a silent refresh shouldn't mutate | ||
| // account selection. The caller already chose this account. | ||
| await persistBundle(account, bundle, false) |
There was a problem hiding this comment.
[P1] Disabling default promotion unconditionally inside setBundle() fixes the silent refresh issue, but it inadvertently breaks the explicit login path because runOAuthFlow now also persists via setBundle() (via the shared persistBundle helper). This means new explicit logins will no longer promote the user to default. Consider adding an optional promoteDefault?: boolean flag to TokenStore.setBundle (and threading it through persistBundle) so runOAuthFlow can explicitly request promotion while refreshAccessToken skips it.
| account, | ||
| token: legacyToken.token, | ||
| bundle: { accessToken: legacyToken.token }, | ||
| purgeRefreshSlot: false, |
There was a problem hiding this comment.
[P1] purgeRefreshSlot: false preserves the sibling secret, but this retry path still overwrites any existing v2 record with one whose hasRefreshToken and expiry fields are undefined. UserRecordStore.upsert is replace-not-merge, and active() only reads the /refresh slot when hasRefreshToken is truthy, so a marker-write-failed retry still disables silent refresh for an account that logged in via v2 in the meantime. Skip this write when the target account already exists, or carry forward the existing refresh metadata on the migration path.
| } catch { | ||
| // best-effort — the user record failure is the real cause | ||
| } | ||
| if (wroteRefreshSecurely) { |
There was a problem hiding this comment.
[P2] The access and refresh secret deletes are executed sequentially in this rollback block, duplicating the try/catch logic and doubling the IPC latency on failure. This mirrors the issue previously fixed in token-store.ts clear(). Consider running both deletes concurrently via Promise.allSettled to reuse the optimized cleanup pattern.
| * pin the refreshed account as the new default — refresh is a | ||
| * credential-rotation operation, not an account-selection signal. | ||
| */ | ||
| async function persistBundle( |
There was a problem hiding this comment.
[P3] This local function shares the exact name persistBundle with the newly introduced shared helper in src/auth/persist.ts. To avoid confusion when navigating the codebase, consider renaming this internal method to writeBundle or storeBundle.
| state.last = { account, token } | ||
| state.last = { account, bundle: { accessToken: token } } | ||
| }, | ||
| async setBundle(account, bundle) { |
There was a problem hiding this comment.
[P2] All of the new runOAuthFlow coverage now uses a store that implements setBundle(), so the backward-compatible fallback added in persistBundle() (set(account, bundle.accessToken)) is still unexercised. Add one login test with a legacy TokenStore that omits setBundle and assert persistence still goes through set(), otherwise a regression would break custom stores despite the PR's compatibility goal.
| expect(setDefaultSpy).not.toHaveBeenCalled() | ||
| }) | ||
|
|
||
| it('skips the refresh-slot keyring read when hasRefreshToken is false (hot-path optimisation)', async () => { |
There was a problem hiding this comment.
[P2] This only covers the hasRefreshToken: false fast path. The companion branch introduced in active()—hasRefreshToken: true with the sibling refresh-slot read failing—is still untested, even though the implementation intentionally downgrades that case to “no refresh token” so valid access-token lookups keep working. Add a case with refreshKeyring.getSpy rejecting and assert store.active() still returns the access token with bundle.refreshToken === undefined.
| }, | ||
| ) | ||
|
|
||
| it('throws AUTH_REFRESH_TRANSIENT on non-invalid_grant errors (5xx, network)', async () => { |
There was a problem hiding this comment.
[P2] The new refresh-error coverage still misses the documented AUTH_REFRESH_TRANSIENT branch for a 2xx/non-JSON token response. exchangeCode() already has that shape covered, but refreshToken() passes a different error code into the shared helper, so a regression here would slip through. Add a 200 text/html (or malformed JSON) refresh response and assert AUTH_REFRESH_TRANSIENT.
| @@ -0,0 +1,25 @@ | |||
| import type { AuthAccount, TokenBundle, TokenStore } from './types.js' | |||
There was a problem hiding this comment.
[P2] Per AGENTS.md, every module file must have a colocated test file with the same name. Although this is a small shared helper already exercised by other suites, please add a src/auth/persist.test.ts file to maintain the documented module layout standard.
Correctness (3 P1):
- token-store: `active()`'s refresh-slot gate now `record.hasRefreshToken
!== false` (was truthy). A legacy-migration record leaves the field
`undefined` ("unknown") and the truthy check would otherwise hide a v2
refresh secret that was put in the sibling slot by a v2 login — the
exact bug `purgeRefreshSlot: false` was meant to prevent.
- token-store: thread an optional `{ promoteDefault?: boolean }` through
`setBundle`. `runOAuthFlow` passes `true` so explicit login still
auto-pins the first account as default; `refreshAccessToken` omits it
so silent refresh never mutates account selection.
- migrate.ts: skip the `writeRecordWithKeyringFallback` call entirely
when the v2 record for the account already exists. `upsert` is
replace-not-merge, so writing a legacy access-only bundle would clobber
the v2 record's `hasRefreshToken` / `accessTokenExpiresAt` and silently
disable silent refresh.
Refactor / dedup (P2/P3):
- record-write: upsert-failure rollback uses Promise.allSettled for the
two keyring deletes — matches the optimised cleanup in `clear()`.
- New `bundleFromExchange(exchange, prev?)` helper in `auth/persist.ts`.
Both `runOAuthFlow` and `refreshAccessToken` use it so a new
ExchangeResult field can't drift between login and refresh.
- token-store: rename internal `persistBundle` → `writeBundle` so it
doesn't shadow the shared module's `persistBundle` from `persist.ts`.
- token-store: extracted `refOnlySnapshot(records)` helper so `active()`
and `setDefault()` build the explicit-ref snapshot the same way.
- flow.ts: dropped the meta comment justifying the persistence refactor
(per AGENTS.md — comments explain non-obvious business logic, not
refactoring decisions).
Tests (386 -> 397):
- New `persist.test.ts` covers the persistBundle helper (setBundle vs
set fallback, promoteDefault threading) and bundleFromExchange
(refresh-token rotation on/off, carry-forward of refresh expiry).
- flow.test: new test for the `set(token)` fallback path when a custom
store omits `setBundle` — the back-compat floor was previously
unexercised because the in-test store implements both.
- token-store.test: new test for `hasRefreshToken: true` + refresh-slot
read failure (downgrades to "no refresh", access token still served).
- token-store.test: new test for `hasRefreshToken: undefined` (legacy-
migration state) — must read the refresh slot, not skip it.
- pkce.test: new test for AUTH_REFRESH_TRANSIENT on a 2xx non-JSON
refresh response (`exchangeCode` already covered the analogous case).
README:
- Added `getConfigPath` import line to the silent-refresh example so the
snippet is actually copy-pasteable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 3 fixes in P1 — correctness (all three connected; the migration story now actually preserves v2 state)
P2 — refactor / coverage
P3 — polish
Test count: 386 → 397. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces highly valuable OAuth refresh-token support to the authentication surface, providing a great developer experience improvement by silently extending sessions instead of forcing manual re-logins. The overall approach is well-thought-out, particularly the integration of proactive refreshing, PKCE provider updates, and concurrency management. A few refinements have been noted to further strengthen the implementation, primarily focusing on preventing race conditions during migration, ensuring the lock mechanism accurately re-reads state on timeout, narrowing error catch blocks, and tightening up types and internal exports.
| return skipped(silent, logPrefix, 'user-record-write-failed', getErrorMessage(error)) | ||
| } | ||
| const existing = existingRecords.find((r) => r.account.id === account.id) | ||
| if (!existing) { |
There was a problem hiding this comment.
[P1] This list()-then-if (!existing) check is still racy. If another process completes a v2 login for the same account after list() returns but before writeRecordWithKeyringFallback() upserts, this migration will still replace that fresh record with the legacy access-only bundle. In the keyring-offline case that can drop the newly written fallbackRefreshToken; otherwise it still wipes expiry metadata. Please make this write conditional on the record still being absent (e.g. an atomic insert-if-absent / compare-and-swap in the record store, or a revalidation step inside the write path that aborts instead of replacing an existing record).
| expect(store.setBundleCalls[0].promoteDefault).toBe(true) | ||
| }) | ||
|
|
||
| it('passes promoteDefault: undefined to setBundle when omitted (silent refresh path)', async () => { |
There was a problem hiding this comment.
[P2] This test locks in persistBundle() always passing a third-arg object ({ promoteDefault: undefined }) on the silent-refresh path instead of actually omitting the optional setBundle(..., options?) argument. That weakens the public contract for custom stores, because presence-based handling (if (options), future option branching, etc.) will now run during silent refresh even though the docs say the flag is omitted. Please have persistBundle() call setBundle(account, bundle) when promoteDefault is unset, and update this test to assert omission rather than undefined.
|
|
||
| export type KeyringTokenStore<TAccount extends AuthAccount> = TokenStore<TAccount> & { | ||
| /** Storage result from the most recent `set()` call, or `undefined` before any (and reset to `undefined` when the most recent `set()` threw). */ | ||
| /** Storage result from the most recent `set()` / `setBundle()` call, or `undefined` before any (and reset to `undefined` when the most recent call threw). */ |
There was a problem hiding this comment.
[P2] KeyringTokenStore still inherits TokenStore's optional setBundle?, even though this implementation now always provides it. That leaves the concrete return type weaker than the implementation and forces downstream non-null assertions/guards (the new tests already need store.setBundle!). Please override setBundle as required on KeyringTokenStore so createKeyringTokenStore() exposes the stronger, accurate API.
| record.hasRefreshToken !== false | ||
| ? refreshStoreFor(record.account) | ||
| .getSecret() | ||
| .catch(() => null) |
There was a problem hiding this comment.
[P2] .catch(() => null) is swallowing every refresh-slot read failure, not just keyring-unavailable cases. That means unexpected backend/programming errors get silently downgraded into "no refresh token", which makes the abstraction harder to debug and can mask real breakage. Please narrow this to the explicitly tolerated keyring-read failures (for example SecureStoreUnavailableError) and let anything else surface as an AUTH_STORE_READ_FAILED-style error.
| userRecords, | ||
| account, | ||
| bundle: { accessToken: legacyToken.token }, | ||
| purgeRefreshSlot: false, |
There was a problem hiding this comment.
[P1] Removing this flag avoids a persistent IPC latency penalty for all migrated users. Because the if (!existing) guard on line 173 already prevents the migration from overwriting a newer v2 login, opting out of the defensive delete here is redundant. By leaving purgeRefreshSlot: false, the migrated record gets hasRefreshToken: undefined, which forces active() to make a useless keyring read for the refresh slot on every single CLI command. Removing this line lets it default to true, correctly setting hasRefreshToken: false for legacy users.
| } | ||
| ``` | ||
|
|
||
| `refreshAccessToken` returns the active credentials unchanged when the stored access token has more than `skewMs` (default 60s) of life left. When refresh is needed, it acquires the file lock, POSTs `grant_type=refresh_token` via the provider's `refreshToken` method, persists the new bundle through `store.setBundle ?? store.set`, and returns the new token. On contention it waits up to `lockTimeoutMs` (default 2s); if another process refreshed first it returns the rotated snapshot without firing its own POST — even on the `force` path — because POSTing with the now-rotated refresh token would yield `invalid_grant`. |
There was a problem hiding this comment.
[P1] This contention-path description is stronger than the current implementation. When acquireFileLock() times out, refreshAccessToken() gets null and skips the post-wait store re-read, so the waiter still sends its own refresh POST instead of reusing the winner's rotated snapshot. That means extra network work under contention and defeats the main load-saving benefit of the lock. Either add the re-read on the timeout path or soften this claim.
| state.last = { account, token } | ||
| state.last = { account, bundle: { accessToken: token } } | ||
| }, | ||
| async setBundle(account, bundle) { |
There was a problem hiding this comment.
[P2] fakeStore().setBundle drops the new third argument, so none of the added runOAuthFlow coverage proves that explicit login is actually opting into promoteDefault: true. If that flag is accidentally removed from flow.ts, this suite still stays green. Consider recording setOptions here and asserting it in one of the new persistence tests.
| // refresh slot mock; everything else in the access slot mock. Keeps | ||
| // single-user tests' assertions about `keyring.deleteSpy` honest by | ||
| // isolating refresh-slot side effects. | ||
| mockedCreateSecureStore.mockImplementation(({ account }) => |
There was a problem hiding this comment.
[P3] This fixture now hard-codes the /refresh suffix to distinguish the sibling slot. That duplicates the store's slot-formatting detail inside the test, so a future rename will break the fixture rather than the behavior under test. Prefer routing via refreshAccountSlot() here.
| // refresh wins; when the caller asked us not to touch the refresh slot | ||
| // and the bundle has none, we have no authority to flip the bit so | ||
| // leave the field unset (the upsert is `replace, not merge`, but | ||
| // `undefined` is the contract's "I don't know" — readers fall through |
There was a problem hiding this comment.
[P3] The comment says undefined means "I don't know" — readers fall through to the access-token-only path. This slightly contradicts the actual reader behavior: token-store.ts explicitly checks the slot when hasRefreshToken is undefined (record.hasRefreshToken !== false), only falling back to access-token-only if that read yields nothing. Consider updating this to clarify that readers will still check the slot.
| ): boolean => account.id === ref || account.label === ref | ||
|
|
||
| /** Sibling keyring slot for the refresh token. Single source of truth for the wire format. */ | ||
| export function refreshAccountSlot(accessSlot: string): string { |
There was a problem hiding this comment.
[P2] refreshAccountSlot is exported even though it's only used by sibling implementation code and isn't part of the package surface or test API. This repo's guidance avoids dead/internal-only exports; please make it file-local or move it to a shared private helper module instead of exporting it from token-store.ts.
Correctness (3 P1): - migrate.ts no longer races between the existence check and the upsert. New optional `UserRecordStore.tryInsert(record): Promise<boolean>` contract lets consumers commit to an atomic insert-if-absent; migrate uses it when available, falls back to list-then-write otherwise (small race window, acceptable for postinstall-style invocations). - migrate.ts drops `purgeRefreshSlot: false`. The existence guard already prevents clobbering v2 refresh state, and opting out caused the migrated record to ship `hasRefreshToken: undefined`, forcing a useless keyring round-trip on every CLI command. Now the legacy record persists `hasRefreshToken: false` and `active()` skips the refresh slot for migrated users. - refresh.ts re-reads the store after a lock-acquisition TIMEOUT, not just on successful acquisition. The lock holder may have completed the refresh while we were waiting; without the re-read the waiter still POSTs its own refresh and defeats the lock's main load-saving purpose. Brings runtime behaviour in line with the README claim. P2: - `persistBundle` omits the `setBundle` third arg entirely when `promoteDefault` is unset (silent-refresh path). Lets custom stores using presence-based handling distinguish login from refresh on the option-presence axis. - `KeyringTokenStore.setBundle` overridden as required (the impl always provides it), so callers/tests can drop the `store.setBundle!` non-null assertion. - Refresh-slot read `.catch` narrowed to `SecureStoreUnavailableError` — unexpected backend/programming errors propagate as `AUTH_STORE_READ_FAILED` instead of silently downgrading to "no refresh". - `flow.test` fake store now captures the literal `setBundle` options arg. New assertion proves `runOAuthFlow` opts into `promoteDefault: true` on the explicit login path. - `refreshAccountSlot` moved to a new internal `keyring/slot-naming.ts` module (not re-exported). Production code + tests + migrate all import from there, so the wire format is single-sourced. P3: - `token-store.test` fixture routes via `refreshAccountSlot()` instead of hard-coding `/refresh`. - `record-write.ts` comment about `hasRefreshToken: undefined` corrected to reflect that readers DO probe the slot (vs the previous wording that suggested they fall straight through to access-only). Tests (397 -> 400): - migrate.test: new tests for the `tryInsert` happy path (atomic insert → record + access slot updated) and the "v2 record already exists" path (`tryInsert` returns false → v2 record and both keyring slots left untouched). - refresh.test: new test for the lock-timeout re-read (lock held by another process → acquireFileLock returns null → re-read still surfaces the rotated snapshot, no duplicate POST). - persist.test rewritten to assert literal omission of the `setBundle` options arg on the silent-refresh path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 4 fixes in P1 — correctness
P2
P3
Tests: 397 → 400. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces valuable OAuth refresh-token capabilities to the auth surface, seamlessly integrating proactive background refreshing and expanded TokenStore contracts. Extending these sessions silently is a great quality-of-life improvement that will significantly reduce re-login friction for users. Before merging, there are a few important edge cases to address regarding corrupted record states during offline migration and logout operations, alongside some needed test coverage for failure states, a hot-path performance fix for older records, and minor cleanup around dead code and project conventions.
| ) | ||
| }, | ||
| const accessToken = rawAccess?.trim() | ||
| if (!accessToken) return null |
There was a problem hiding this comment.
[P1] Returning null here when the access keyring slot is empty (e.g. deleted out-of-band) introduces a regression. Because attachLogoutCommand relies on active() and throws ACCOUNT_NOT_FOUND / NOT_AUTHENTICATED on null, the user will be unable to run logout to clear the corrupted record. Restore the previous behavior of throwing new CliError('AUTH_STORE_READ_FAILED', ...) so the CLI recognizes the corrupted state and logout's error handler can clear it.
| }) | ||
| const inserted = await userRecords.tryInsert({ | ||
| account, | ||
| fallbackToken: undefined, |
There was a problem hiding this comment.
[P1] If the process crashes between tryInsert and setSecret, or if setSecret throws an unexpected error, the user record is left with fallbackToken: undefined and an empty keyring slot. Subsequent migrations will skip because tryInsert returns false, permanently breaking the CLI for this user with AUTH_STORE_READ_FAILED. Write fallbackToken: legacyToken.token initially during tryInsert, then upsert to clear the fallback token only after setSecret succeeds.
| // `active()` skips the refresh-slot round-trip per command. | ||
| // When the record already exists, `tryInsert` returns false | ||
| // and we leave the v2 record alone. | ||
| const refreshStore = createSecureStore({ |
There was a problem hiding this comment.
[P3] The createSecureStore instantiations for the access and refresh slots are duplicated across the tryInsert and else branches. Consider creating accessStore and refreshStore once before the if (userRecords.tryInsert) block to reuse the instances and reduce boilerplate.
| * has no authority over refresh state and must not erase it. Defaults | ||
| * to `true` (the safe default for fresh logins). | ||
| */ | ||
| purgeRefreshSlot?: boolean |
There was a problem hiding this comment.
[P2] purgeRefreshSlot appears to be dead code. The comment mentions it is used by migrateLegacyAuth, but migrateLegacyAuth was refactored (using tryInsert and an existence check) and no longer passes purgeRefreshSlot: false. Consider removing this parameter and simplifying the hasRefreshToken and defensive delete logic below.
| const { keyring, refreshKeyring, store, state } = fixture() | ||
| const expiresAt = Date.now() + 3_600_000 | ||
|
|
||
| await store.setBundle!(account, { |
There was a problem hiding this comment.
[P3] The non-null assertion ! is unnecessary here (and on line 167). store is inferred as KeyringTokenStore<Account>, which explicitly overrides setBundle to be NonNullable specifically so consumers and tests can omit this assertion.
| } else { | ||
| const existing = (await userRecords.list()).find((r) => r.account.id === account.id) | ||
| if (!existing) { | ||
| await writeRecordWithKeyringFallback({ |
There was a problem hiding this comment.
[P2] This non-tryInsert fallback still relies on writeRecordWithKeyringFallback()'s default purgeRefreshSlot: true. For consumers that don't implement tryInsert, the accepted list()→write race can still erase a refresh secret that a parallel v2 login stored moments earlier, even though this PR added purgeRefreshSlot: false specifically so migration doesn't claim authority over refresh state. Please pass purgeRefreshSlot: false here.
| // token". The access-token path runs in parallel and its own | ||
| // catch maps a keyring-offline failure to `AUTH_STORE_READ_FAILED`. | ||
| const refreshPromise = | ||
| record.hasRefreshToken !== false |
There was a problem hiding this comment.
[P2] Treating hasRefreshToken: undefined as "try the slot" is correct for the migration-retry case, but it also means every pre-refresh keyring-backed record written by older versions now does a second keyring read on every active() forever. Those older records never had this bit, so upgraded users without refresh tokens pick up a permanent hot-path IPC regression. Consider backfilling hasRefreshToken: false after the first null refresh-slot read (or via a one-off record migration) so the extra round-trip is only paid once.
| expect(setDefaultSpy).not.toHaveBeenCalled() | ||
| }) | ||
|
|
||
| it('downgrades to no-refresh when hasRefreshToken is true but the refresh-slot read fails', async () => { |
There was a problem hiding this comment.
[P2] This new case locks in the SecureStoreUnavailableError downgrade, but it leaves the companion branch untested: non-SecureStoreUnavailableError failures from the refresh-slot read are now supposed to propagate. Please add a sibling test with refreshKeyring.getSpy.mockRejectedValueOnce(new Error('boom')) and assert store.active() rejects, otherwise a future broad .catch(() => null) regression would slip through.
| expect(lines).not.toContain('sensitive') | ||
| }) | ||
|
|
||
| it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => { |
There was a problem hiding this comment.
[P2] The new tryInsert coverage only exercises the all-online happy path and the false/existing-record path. The new atomic branch also has its own keyring-offline fallback (tryInsert succeeds, then the per-user setSecret throws SecureStoreUnavailableError and migration must upsert a fallbackToken). Please add that case here; the older offline-migration tests go through the pre-tryInsert path and won’t catch regressions in this new branch.
| @@ -0,0 +1,12 @@ | |||
| /** | |||
There was a problem hiding this comment.
[P2] This new sibling module ships without a colocated slot-naming.test.ts. The repo’s module layout expects files under src/<area>/ to follow the same colocated-test rule, so please either add a small direct test for the slot-format helper or keep this logic inside an already-tested module.
Correctness (2 P1): - migrate.ts: two-phase write in the tryInsert path. Phase 1 inserts the record with `fallbackToken: legacyToken.token` so a crash between tryInsert and setSecret leaves a self-sufficient record (still readable via the plaintext fallback). Phase 2 moves the secret into the keyring, then upserts to clear the fallback. Without this, a crash mid-migration could permanently brick the user with AUTH_STORE_READ_FAILED on every subsequent retry. - token-store.ts: AUTH_STORE_READ_FAILED is now thrown directly from `readBundleForRecord` instead of smearing the corrupted-state signal across two functions (caller used to translate null into the throw). attachLogoutCommand's existing catch still recognises the code and clears the corrupted record. P2: - migrate.ts non-tryInsert fallback now passes `purgeRefreshSlot: false`. Consumers without atomic insert still see a small list-then-write race, and the legacy access-only bundle has no authority over refresh state even on that path. - token-store: backfill `hasRefreshToken: false` best-effort on the record after the refresh slot read returns null for an undefined bit (pre-PR records). Subsequent active() calls skip the extra IPC. Backfill is fire-and-forget and tolerates a vanishingly rare race with a concurrent setBundle. - Refresh-slot read tolerance is now tightened: only SecureStoreUnavailableError downgrades to "no refresh"; other errors propagate so a programming bug can't silently turn into "no refresh token available". - Drop `store.setBundle!` non-null assertions in token-store.test (KeyringTokenStore.setBundle is required, the `!` was redundant). - migrate.ts: share `accessStore` + `refreshStore` createSecureStore instances across both branches (dedups the per-branch instantiation). Tests (400 -> 405): - token-store.test: new test for the non-keyring refresh-slot read failure propagating (companion to the SecureStoreUnavailableError downgrade test). - token-store.test: new test asserting the `hasRefreshToken: false` backfill on first null refresh-slot read for pre-PR records. - migrate.test: new test for the tryInsert keyring-offline fallback (setSecret fails → record keeps the plaintext fallback, follow-up upsert doesn't run). - New colocated `slot-naming.test.ts` per AGENTS.md (pins the wire format + guards the fixture-routing suffix invariant). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 5 fixes in P1 — correctness
P2
P3
Tests: 400 → 405. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces a robust mechanism for OAuth refresh-token handling across the auth surface, thoughtfully extending stores and providers to handle session renewals without breaking existing consumers. The implementation establishes a solid foundation for token lifecycles and backward compatibility, though a few edge cases around concurrency and state synchronization need tightening. Specifically, there are some race conditions in the migration and lock checks that could clobber concurrent updates, alongside a few missing state promotions, brittle test assertions, and a minor omission in the README's API documentation.
| const fresh = await requireSnapshotForRef(options.store, options.ref) | ||
| if (fresh) { | ||
| const freshBundle = fresh.bundle ?? { accessToken: fresh.token } | ||
| if (fresh.token !== snapshot.token) { |
There was a problem hiding this comment.
[P2] This post-lock race check only looks at fresh.token !== snapshot.token. If a provider reuses the access-token string across refreshes but rotates the refresh token or expiry, force: true will still fall through to a second refresh POST and immediately supersede the bundle the lock holder just wrote. Compare the whole bundle (at least refresh token / expiry metadata) before deciding that no other process refreshed.
| // Phase 2: move the secret into the keyring, then upsert to | ||
| // clear the fallback. If the keyring is offline we leave the | ||
| // fallback in place (the correct degraded state). | ||
| const inserted = await userRecords.tryInsert({ |
There was a problem hiding this comment.
[P1] tryInsert() only proves the record was absent at insert time; it does not give this migration exclusive ownership afterward. A concurrent v2 login that lands after this insert can have its fresh access token overwritten by setSecret(), its record reset by the later upsert(), and its refresh slot deleted below. This path still needs a CAS/ownership check before finalizing, otherwise it can clobber the very v2 login it is trying to preserve.
| // (login is foreground; silent refresh requires an already-stored | ||
| // refresh token). Failures here are swallowed: the worst case is | ||
| // the next `active()` pays the same extra IPC and tries again. | ||
| if (record.hasRefreshToken === undefined && refreshToken === undefined) { |
There was a problem hiding this comment.
[P2] This read-path backfill does a full replace upsert() from a stale snapshot, but UserRecordStore.upsert is replace-not-merge. If a concurrent login/refresh writes new expiry fields or hasRefreshToken: true before this background promise settles, this backfill can silently wipe that metadata. Please avoid doing this without a CAS/patch primitive (or drop the backfill and pay the extra probe).
|
|
||
| expect(snapshot?.token).toBe('at-from-migration') | ||
| expect(snapshot?.bundle?.refreshToken).toBe('rt-survived-migration') | ||
| expect(refreshKeyring.getSpy).toHaveBeenCalled() |
There was a problem hiding this comment.
[P2] This hasRefreshToken: undefined path never gets promoted to true after a successful refresh-slot read. In the migration-race case this test models, that means active() will keep probing the refresh keyring slot on every command forever. Please backfill hasRefreshToken: true when the slot read succeeds so affected accounts only pay the extra IPC once.
| expect(lines).not.toContain('sensitive') | ||
| }) | ||
|
|
||
| it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => { |
There was a problem hiding this comment.
[P2] This test is meant to prove the tryInsert path avoids the racy existence check, but it never asserts that userRecords.list() is skipped. A regression to await list(); await tryInsert(...) would still pass here while reintroducing the race. Please wrap or spy on list() in this branch and assert it is not called.
| expect(refreshAccountSlot('user-42')).toBe('user-42/refresh') | ||
| }) | ||
|
|
||
| it('does not collapse an empty access slot to a bare suffix', () => { |
There was a problem hiding this comment.
[P2] This test locks in refreshAccountSlot('') semantics only because the token-store.test.ts fixture routes mocks with endsWith(refreshAccountSlot('')); the production code does not rely on that empty-string contract. That makes future slot-format refactors fail for fixture reasons rather than auth behavior. Prefer routing the fixture by explicit account names and drop this empty-input assertion.
|
|
||
| Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense. | ||
|
|
||
| `TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour. |
There was a problem hiding this comment.
[P3] This README paragraph documents the new TokenStore.setBundle contract as setBundle?(account, bundle), but the exported type now also accepts options?: { promoteDefault?: boolean }. Please update the docs to include that third argument so the public API docs stay in sync and custom store implementers don’t miss the explicit-login default-promotion hook.
`tryInsert` proves the record was absent at insert time but gives no exclusive ownership afterward. A v2 login that completes between our phase-1 insert and the follow-up keyring/upsert steps would be clobbered: our setSecret would overwrite their access token, our upsert would reset their record, our refresh-slot delete would remove their refresh secret. Mitigate with ownership-check re-reads before each follow-up: - Use a unique placeholder signature on tryInsert (`fallbackToken: legacyToken.token` + `hasRefreshToken: undefined`) so the re-read can distinguish "still ours" from "v2 login took over" — no v2 login ever produces that exact shape (v2 always writes hasRefreshToken: true or false, never undefined). - Re-read before setSecret, before the post-setSecret upsert, and before the refresh-slot delete. If the record has been replaced at any point, abort the rest. If we already wrote our access token to the keyring before discovering we lost ownership, roll that write back so the v2 login's setSecret is the authoritative one. The residual race window is now bounded by the gap between a re-read and the very next call (microseconds) rather than the entire multi-step sequence. Truly atomic CAS would need a contract-level `compareAndSet`/`updateIf` op — deferred until a concrete consumer asks for it. Per the user's instruction, only the P1 is addressed this round; the five P2s and one P3 are noted in the PR comment for follow-up. Tests (405 -> 406): - migrate.test: new test exercising the race — tryInsert succeeds with our placeholder, then a v2 login completes between phases, and the ownership-check re-read aborts our keyring writes so v2's state survives intact. Also updated the existing `tryInsert` tests to reflect the new placeholder signature (hasRefreshToken: undefined) and to actually insert into the harness state so the re-reads see the placeholder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 6 — P1 only (per the user's instruction for the next few rounds; deferring P2/P3 to follow-up). P1 — Fix:
Race window now bounded by the gap between a re-read and the next call (microseconds) rather than the full multi-step sequence. True atomic CAS would need a contract-level Deferred (P2/P3) — noted for follow-up:
Tests: 405 → 406. |
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces much-needed refresh-token support to the auth surface, adding proactive refresh capabilities, concurrency locks, and updated token storage contracts to minimize user re-authentication prompts. The approach effectively extends session lifecycles, though the new persistence and migration paths introduce some edge cases in state management. A few adjustments are needed to resolve race conditions in secret rollbacks and lock freshness checks, safely handle concurrent upserts without dropping metadata, and consolidate duplicated keyring fallback logic.
|
|
||
| Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense. | ||
|
|
||
| `TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour. |
There was a problem hiding this comment.
[P3] The public API documentation for TokenStore.setBundle is missing the newly added options?: { promoteDefault?: boolean } argument.
| expect(lines).not.toContain('sensitive') | ||
| }) | ||
|
|
||
| it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => { |
There was a problem hiding this comment.
[P3] The test for the tryInsert path fails to verify that the racy userRecords.list() call is actually skipped (e.g., via a spy).
| expect(refreshAccountSlot('user-42')).toBe('user-42/refresh') | ||
| }) | ||
|
|
||
| it('does not collapse an empty access slot to a bare suffix', () => { |
There was a problem hiding this comment.
[P3] The test locks in an empty-string contract (refreshAccountSlot('')) due to test fixture routing, creating unnecessary coupling.
|
|
||
| expect(snapshot?.token).toBe('at-from-migration') | ||
| expect(snapshot?.bundle?.refreshToken).toBe('rt-survived-migration') | ||
| expect(refreshKeyring.getSpy).toHaveBeenCalled() |
There was a problem hiding this comment.
[P2] The hasRefreshToken: undefined legacy migration state is never promoted to true after a successful refresh-slot read. Affected accounts will repeatedly incur the IPC penalty until this is backfilled in the implementation.
| } | ||
| ``` | ||
|
|
||
| `refreshAccessToken` returns the active credentials unchanged when the stored access token has more than `skewMs` (default 60s) of life left. When refresh is needed, it acquires the file lock, POSTs `grant_type=refresh_token` via the provider's `refreshToken` method, persists the new bundle through `store.setBundle ?? store.set`, and returns the new token. On contention it waits up to `lockTimeoutMs` (default 2s); if another process refreshed first it returns the rotated snapshot without firing its own POST — even on the `force` path — because POSTing with the now-rotated refresh token would yield `invalid_grant`. |
There was a problem hiding this comment.
[P1] This guarantee is stronger than the current implementation. refreshAccessToken() only treats the lock holder as the winner when the reread sees a different access-token string; if a provider reuses the access token but rotates the refresh token or expiry, the waiter still fires its own refresh POST and can hit invalid_grant. Either broaden the post-lock freshness check to compare the stored bundle, or soften this sentence until that case is handled.
| } | ||
| } | ||
| } else { | ||
| const existing = (await userRecords.list()).find((r) => r.account.id === account.id) |
There was a problem hiding this comment.
[P3] The logic to find the existing record (await userRecords.list()).find((r) => r.account.id === account.id) is duplicated here and in recordStillOurs() (L211). Consider extracting a local helper (e.g., getRecord()) to reuse this look-up path.
| account: refreshAccountSlot(accountSlot), | ||
| }) | ||
|
|
||
| if (userRecords.tryInsert) { |
There was a problem hiding this comment.
[P2] This new tryInsert branch is hand-rolling the same keyring-write/fallback policy that writeRecordWithKeyringFallback() already owns: move the access token into the keyring when possible, leave plaintext fallback on SecureStoreUnavailableError, clear stale refresh state, and flip hasRefreshToken. Keeping that logic in two places means future fixes to trimming, rollback, or refresh-slot handling can drift again. Please extract the ownership-check-specific pieces into a shared internal helper (or extend writeRecordWithKeyringFallback with a hook/callback) so migration and normal writes reuse the same persistence path.
| // the assertion isn't racy. | ||
| await new Promise((resolve) => setTimeout(resolve, 0)) | ||
|
|
||
| expect(upsertSpy).toHaveBeenCalled() |
There was a problem hiding this comment.
[P2] This new backfill path depends on active() doing an upsert({ ...record, hasRefreshToken: false }) from a stale snapshot. With the current UserRecordStore contract, upsert is replace-not-merge, so a concurrent login/refresh can silently drop newer metadata (hasRefreshToken: true, fresh expiry fields, fallback tokens, etc.). Please avoid mutating the whole record from the read path unless the store can do a compare-and-set or other merge-safe update.
| ) | ||
| } | ||
| const inserted = await userRecords.tryInsert(placeholder) | ||
| if (inserted && (await recordStillOurs())) { |
There was a problem hiding this comment.
[P2] Redundant file read. tryInsert already ensures the placeholder was just inserted. Immediately calling recordStillOurs() performs a redundant file read without meaningfully closing the race window before the slow setSecret operation. You can safely remove this initial check to save disk I/O.
| await accessStore.deleteSecret().catch(() => undefined) | ||
| } | ||
| } | ||
| if (await recordStillOurs()) { |
There was a problem hiding this comment.
[P1] Redundant and logically flawed check. recordStillOurs() requires hasRefreshToken === undefined, but line 229 just mutated the record to hasRefreshToken: false. This subsequent check will always fail, causing an unnecessary file read and silently skipping the best-effort cleanup of the refresh slot. Move the cleanup before the upsert or cache the ownership state.
All three P1s are direct consequences of round-6's ownership-check introduction: 1. refresh.ts post-lock re-read: also compare the refresh token, not just the access-token string. A provider that rotates only the refresh token (or reuses the access-token string) would otherwise let the waiter POST with a now-stale refresh and hit `invalid_grant`. 2. migrate.ts rollback guard: only delete the access slot when it still contains OUR legacy token. The previous unconditional delete could remove a v2 login's access token (its `setSecret` may have run after ours), leaving the preserved v2 record permanently unreadable. Reading the slot first is racy at the absolute limit but covers the realistic case (the race window is microseconds). 3. migrate.ts ownership-check caching: the post-upsert re-read of `recordStillOurs()` was guaranteed to return false because the upsert itself flipped `hasRefreshToken` away from the placeholder signature. Net effect: the refresh-slot cleanup was silently skipped on every run, plus a wasted file read. One re-read after the keyring write now gates both the upsert and the refresh-slot cleanup. Tests (406 -> 407): - migrate.test: new test for the rollback-guard. Simulates v2 login overwriting the slot mid-migration; asserts the guarded delete declines to remove v2's access token and v2's record stays intact. Deferred P2/P3 (per the user's instruction): - P2 token-store backfill upsert is replace-not-merge from stale snapshot - P2 token-store.test missing `undefined → true` promotion path - P2 migrate.ts hand-rolls keyring-write/fallback policy - P2 migrate.ts redundant tryInsert+recordStillOurs (small) - P3 README setBundle docs missing options arg - P3 migrate.test tryInsert mock duplicated across cases - P3 migrate.ts `list().find` duplicated - P3 slot-naming.test fixture-coupled Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 7 — P1 only (3 of 3 addressed in P1 #1 — refresh.ts post-lock check: now compares the refresh token too, not just P1 #2 — migrate.ts rollback guard: only delete the access slot when it still contains the legacy token. A blind delete could remove a v2 login's access token (their setSecret may have run after ours), leaving v2's preserved record permanently unreadable. Read-then-conditional-delete is still racy at the absolute limit but covers the realistic case. P1 #3 — migrate.ts ownership-check caching: the post-upsert Deferred per the user's instruction (P2/P3, noted for follow-up):
Tests: 406 → 407. |
doistbot
left a comment
There was a problem hiding this comment.
This PR successfully introduces refresh-token support and PKCE provider enhancements to the CLI core, which will significantly improve the user experience by reducing manual re-authentications. The proactive concurrency handling and backward-compatible contract updates are well-considered and cleanly implemented. A few adjustments are noted to prevent potential concurrent overwrite risks during legacy backfills, unify duplicated persistence and test logic, ensure the new options arguments are fully documented, and expand test coverage for the legacy record upgrade path.
| // refresh token). Failures here are swallowed: the worst case is | ||
| // the next `active()` pays the same extra IPC and tries again. | ||
| if (record.hasRefreshToken === undefined && refreshToken === undefined) { | ||
| void userRecords.upsert({ ...record, hasRefreshToken: false }).catch(() => undefined) |
There was a problem hiding this comment.
[P2] This fire-and-forget upsert({ ...record, hasRefreshToken: false }) writes a stale full snapshot from the earlier list() read. Because UserRecordStore.upsert is replace-not-merge, a concurrent login/refresh for the same account can be overwritten here: newly written expiry fields or hasRefreshToken: true get reset, and in the worst case a just-written refresh token becomes invisible because this flips the bit back to false. Please gate this backfill on a fresh ownership check (or a patch/CAS-style API) instead of replacing the whole record from the stale snapshot.
|
|
||
| Errors are typed: `AUTH_REFRESH_EXPIRED` (server returned 400/401 + `invalid_grant` — refresh token revoked, forced re-login required), `AUTH_REFRESH_TRANSIENT` (network, 5xx, non-JSON — safe to retry), `AUTH_REFRESH_UNAVAILABLE` (no refresh token stored or provider doesn't implement `refreshToken`). Map these to your CLI's existing "run `<cli> auth login`" hint as makes sense. | ||
|
|
||
| `TokenBundle` is the persisted credential triple — `{ accessToken, refreshToken?, accessTokenExpiresAt?, refreshTokenExpiresAt? }`. Stores opting into refresh implement `setBundle?(account, bundle)` alongside `set(account, token)`; `createKeyringTokenStore` implements both. Stores that omit `setBundle` lose refresh + expiry metadata on persistence — `refreshAccessToken` against them will surface `AUTH_REFRESH_UNAVAILABLE` on the first attempted refresh, which is the correct degraded behaviour. |
There was a problem hiding this comment.
[P3] The public contract here is still documented as setBundle?(account, bundle), but the actual API now also accepts options?: { promoteDefault?: boolean }. That omission matters for custom stores: a 2-arg implementation still type-checks, but runOAuthFlow now relies on promoteDefault: true to preserve the existing “first login becomes default” behavior. Please document the options arg here so consumers don’t accidentally drop that behavior.
| const harness = buildUserRecords<Account>() | ||
| // Stub tryInsert that ALSO actually inserts into the harness so | ||
| // the migration's ownership-check re-reads see the placeholder. | ||
| const tryInsert = vi.fn(async (record: UserRecord<Account>) => { |
There was a problem hiding this comment.
[P3] This tryInsert mock implementation is duplicated identically across multiple test cases (lines 359, 408, 461, 514). Extracting this into a shared test utility or integrating it into buildUserRecords() would reduce boilerplate.
| } | ||
| const inserted = await userRecords.tryInsert(placeholder) | ||
| if (inserted && (await recordStillOurs())) { | ||
| let movedToKeyring = false |
There was a problem hiding this comment.
[P2] This branch hand-rolls the access token keyring write, offline error swallowing, and conditional rollback logic. This duplicates the persistence policy already encapsulated in writeRecordWithKeyringFallback(). Consider extending writeRecordWithKeyringFallback to accept an ownership-check predicate or pre-upsert hook so this branch can reuse the shared helper.
| } | ||
| } | ||
| } else { | ||
| const existing = (await userRecords.list()).find((r) => r.account.id === account.id) |
There was a problem hiding this comment.
[P3] The account lookup logic (await userRecords.list()).find((r) => r.account.id === account.id) is identical to the read inside recordStillOurs() (line 211). Extracting this to a shared findRecord helper would avoid repeating the list-and-find pattern.
| }) | ||
|
|
||
| it('does not collapse an empty access slot to a bare suffix', () => { | ||
| // Defensive: `endsWith(refreshAccountSlot(''))` is how the |
There was a problem hiding this comment.
[P3] This test is unnecessarily coupled to the test fixture's string-matching implementation (endsWith). Tests should verify the module's behavior directly rather than its suitability for a specific mock routing implementation elsewhere.
| await expect(store.active()).rejects.toThrow(/boom/) | ||
| }) | ||
|
|
||
| it('backfills hasRefreshToken: false on the record when undefined and the refresh slot is empty', async () => { |
There was a problem hiding this comment.
[P2] This covers the hasRefreshToken: undefined -> false backfill, but the suite still misses the symmetric upgrade path: an existing pre-PR record being overwritten via setBundle(...) should persist hasRefreshToken: true. That's the real upgrade path for legacy records after the first successful login/refresh, and without a test a regression can leave those accounts probing the refresh slot forever.
| expect(lines).not.toContain('sensitive') | ||
| }) | ||
|
|
||
| it('uses atomic tryInsert when the consumer supplies it (no race on the existence check)', async () => { |
There was a problem hiding this comment.
[P2] The test title says we're avoiding the racy existence check, but the assertions only prove that tryInsert() ran. Since migrateLegacyAuth still legitimately calls list() later for ownership checks, a regression that reintroduced an initial list() -> tryInsert() sequence would still pass here. Please spy/stub userRecords.list() so this case distinguishes the pre-insert check from the post-insert re-reads.
P2:
- token-store backfill no longer issues `upsert({ ...staleRecord, ... })`.
New `backfillHasRefreshToken` helper re-reads the record inside the
upsert and writes only when every field still matches the snapshot we
made the backfill decision on. Avoids clobbering a concurrent
setBundle's richer state (upsert is replace-not-merge per contract).
- The same helper now also handles the symmetric `undefined → true`
promotion when the refresh slot turns out to be populated, so pre-PR
records with refresh tokens stop probing the slot every active() call.
- New `keyring/internal.ts` exports `findById` and `trySetSecret`.
migrate.ts and record-write.ts route through `trySetSecret` so the
"only SecureStoreUnavailableError downgrades to fallback" policy
lives in one place. token-store.ts + migrate.ts use `findById` to
drop the duplicated `(await list()).find(...)` pattern.
- migrate.test extracts `stubTryInsert(harness)` helper — the
"if-has-set-return-false-else-insert" tryInsert mock pattern was
identical across 3 test cases. Tests with bespoke race emulation in
their tryInsert (where the duplication wasn't actually a duplicate)
keep their inline implementation.
P3:
- README `setBundle` docs now show the full signature including
`options?: { promoteDefault?: boolean }` and explain the
presence-vs-absence contract for login vs refresh callers.
- migrate.test "tryInsert avoids the racy existence check" test now
spies on `userRecords.list()` and explicitly proves `tryInsert` ran
(the old assertions only proved the mock function fired).
- slot-naming.test asserts the module's behavior directly (the result
contains the access slot, is deterministic, and is unique per
access slot) instead of locking in the literal suffix or coupling
to the fixture's `endsWith` routing.
Tests (407 -> 410):
- New test: undefined → true promotion when refresh slot populated.
- New test: backfill skips when a concurrent write replaced the record
between the active() read and the upsert (replace-not-merge guard).
- slot-naming.test rewritten to property-based assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 8 P2/P3 follow-up in P2 — correctness/safety
P3 — polish
Tests: 407 → 410. |
doistbot
left a comment
There was a problem hiding this comment.
This PR thoughtfully enhances the auth surface by introducing OAuth refresh-token support to TokenStore and the PKCE provider, which will vastly improve the CLI experience for users with short-lived sessions. The implementation provides a solid foundation for seamless token management while keeping the public contracts flexible. A few areas require slight adjustments, including addressing atomicity and rollback safety during keyring token updates, handling crash-recovery states in the migration path, consolidating persistence logic for better performance, and ensuring new module layouts and concrete wire formats have comprehensive test coverage.
| // risk a destructive delete. | ||
| } | ||
| } | ||
| if (stillOurs) { |
There was a problem hiding this comment.
[P1] stillOurs is computed before the upsert({ account, hasRefreshToken: false }) call and then reused here to decide whether to clear the refresh slot. If a concurrent v2 login lands after that upsert but before this delete, its newly-written refresh token gets wiped while its record now says hasRefreshToken: true, so the account can no longer refresh. Please re-check ownership after the upsert using a post-upsert signature (or otherwise guard the delete on current ownership) before calling refreshStore.deleteSecret().
| account: refreshAccountSlot(accountSlot), | ||
| }) | ||
|
|
||
| if (userRecords.tryInsert) { |
There was a problem hiding this comment.
[P2] This tryInsert branch now open-codes the same keyring/record finalization policy that the non-atomic branch delegates to writeRecordWithKeyringFallback. That leaves two persistence implementations to keep in sync whenever the stored shape changes again (hasRefreshToken, fallback fields, expiry metadata, refresh-slot cleanup, etc.). Consider extracting the post-insert finalization into a shared helper, or teaching writeRecordWithKeyringFallback to operate on an already-inserted placeholder, so both migration paths reuse one write policy.
| // `migrateLegacyAuth`) pass `purgeRefreshSlot: false` to opt out | ||
| // of this purge entirely. | ||
| try { | ||
| await refreshSecureStore.deleteSecret() |
There was a problem hiding this comment.
[P1] Deleting the old refresh slot before userRecords.upsert(record) commits is not rollback-safe. If the later upsert fails, the catch block only removes the new access token and never restores the deleted refresh token, so a failed no-refresh write can destroy the caller's last good refresh credential while leaving the old record in place. Please either defer this delete until after the record write succeeds or capture/restore the prior refresh secret on rollback.
| ) | ||
| } | ||
| const inserted = await userRecords.tryInsert(placeholder) | ||
| if (inserted && (await recordStillOurs())) { |
There was a problem hiding this comment.
[P2] This treats tryInsert() === false as "some other v2 record already exists", but it is also the state left by an earlier crash after the placeholder write. On that retry path we skip the keyring move/finalization here, then still call markMigrated() and clean up the legacy slot below, which permanently freezes the account in the placeholder/plaintext state. Consider detecting the placeholder signature when tryInsert returns false and resuming/normalizing it before considering the migration complete.
| if (storedSecurely) { | ||
| if (trimmedRefresh) { | ||
| try { | ||
| await refreshSecureStore.setSecret(trimmedRefresh) |
There was a problem hiding this comment.
[P2] By the time we reach this line, the access-slot setSecret has already completed, so every setBundle/silent-refresh with a refresh token now pays two keyring IPC round-trips in series before the record write. These two slot writes are independent, so consider issuing them together (and rolling back whichever succeeded on partial failure) to avoid adding avoidable latency to the refresh path.
| it('derives a per-access-slot refresh slot name that includes the access slot', () => { | ||
| // The contract: the returned slot name is a function of the | ||
| // access slot name AND uniquely identifies the refresh slot for | ||
| // it. We don't pin the exact suffix here — the wire format is |
There was a problem hiding this comment.
[P2] This test explicitly avoids pinning the exact refresh-slot slug, but that slug is persisted user state in the OS keyring. If refreshAccountSlot() ever changes from /refresh to something else, this test (and the token-store.test fixture, which also derives its routing from the helper) will still pass while existing users' stored refresh tokens become unreadable after upgrade. Please lock the concrete wire format down in at least one test so a compatibility-breaking rename fails the suite.
| @@ -0,0 +1,36 @@ | |||
| import type { AuthAccount } from '../types.js' | |||
There was a problem hiding this comment.
[P2] This new module breaks the repo’s module-layout convention that src/<area>/<file>.ts files ship with a colocated <file>.test.ts. Please either add src/auth/keyring/internal.test.ts for findById/trySetSecret or fold these helpers back into an existing tested module.
Reframing in response to the bouncing P1 count across rounds 5-9: each "fix" to the multi-step migration was patching a local symptom while opening an adjacent race window in the same area. The fundamental problem was the migration trying to move the legacy token into the per-user keyring slot under a contract (UserRecordStore) with no transactional atomicity — every workaround (ownership-check re-reads, cached `stillOurs`, guarded delete) closed one window and opened another. This change accepts a slight functional trade-off in exchange for complete race-freedom: - migrate.ts no longer writes to the per-user keyring slot. It writes a v2 record with the legacy token as `fallbackToken` and stops. The v2 record format ALREADY supports fallbackToken (it's how WSL / headless Linux works), and the runtime reads it before consulting the keyring slot. The first subsequent v2 login moves the secret into the keyring atomically via writeRecordWithKeyringFallback. - Trade-off: legacy token sits in plaintext on disk for one login cycle. It was already plaintext in v1 config, so this is deferral, not new exposure. - Eliminated: ownership-check re-reads, the placeholder signature trick, the cached stillOurs flag, the guarded rollback delete, the refresh-slot purge, and the entire "did we lose ownership" decision tree. Multi-step write becomes single-call. record-write.ts: the defensive refresh-slot purge for no-refresh bundles now happens AFTER the upsert commits (round-9 P1.2). A failed upsert can't destroy a previous refresh secret. The post-upsert delete is best-effort because the upserted `hasRefreshToken: false` already prevents readers from consulting the slot. Tests (410 → 414): - Migration tests rewritten for the simpler shape (no per-user keyring assertions). Race tests collapse to a single "concurrent v2 login is harmless" property test — the simplification removes the race surface they used to probe. - record-write test for refresh-slot delete failure rewritten: best-effort post-upsert, no rollback. - New colocated `internal.test.ts` covers `findById` + `trySetSecret`. - `slot-naming.test` restores the literal `/refresh` suffix pin — the slug is persisted state in the OS keyring, so a rename WOULD break upgraders and should be caught loudly. Compatibility notes: - `MigrateLegacyAuthOptions.accountForUser` is now @deprecated and a no-op (migration doesn't write per-user keyring slots anymore). Kept on the type for back-compat; the first subsequent v2 login respects the consumer's `createKeyringTokenStore({ accountForUser })`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 10 in The pattern, finally addressed at the source. Across rounds 5–9 the migration kept oscillating between competing races: tryInsert gives no ongoing ownership; cached Root cause: the migration was doing too much. Moving the legacy token into the per-user keyring slot atop a Change: migration now writes a v2 record with Trade-off: legacy token sits in plaintext on disk for one login cycle. But it was already plaintext in v1 config — this is deferral, not new exposure. Eliminated: ownership-check re-reads, placeholder signature trick, cached Round-9 P1.2 (record-write.ts) addressed in the same commit: the no-refresh defensive purge now runs AFTER the upsert commits, best-effort. A failed upsert can no longer destroy a previous refresh secret. Tests: 410 → 414. Most race tests deleted (no race surface left to test). New "concurrent v2 login mid-migration is harmless" property test, plus
|
doistbot
left a comment
There was a problem hiding this comment.
This PR introduces an excellent quality-of-life improvement by adding OAuth refresh-token support to the auth surface, ensuring users are no longer frequently prompted to re-authenticate. The implementation thoughtfully extends the existing TokenStore contract and introduces a robust PKCE provider to smoothly handle token expirations. There are just a few remaining details to refine around optimizing concurrent keyring writes, safeguarding the legacy token migration logic against premature deletion, cleaning up leftover state flags from previous iterations, and rounding out the test coverage and documentation.
| let storedSecurely = await trySetSecret(secureStore, trimmedAccess) | ||
|
|
||
| let wroteRefreshSecurely = false | ||
| if (storedSecurely && trimmedRefresh) { |
There was a problem hiding this comment.
[P2] The access and refresh tokens are written to the OS keyring sequentially, which doubles the IPC latency on the hot path (as flagged in Round 9 of the PR review). Issuing these writes concurrently (e.g., via Promise.allSettled) and rolling back the successful one if the other fails would avoid this latency penalty.
| // `false` means a v2 record already exists for this account | ||
| // — leave it alone (it may carry refresh + expiry that the | ||
| // legacy token has no authority over). | ||
| await userRecords.tryInsert(record) |
There was a problem hiding this comment.
[P2] This ignores the tryInsert() result and the helper still goes on to mark migration complete and delete the legacy token later. If a same-id v2 record already exists but its credential is broken/missing (for example the keyring entry was removed externally), migration will retire the still-valid legacy token and leave active() failing with AUTH_STORE_READ_FAILED. Please treat the "record already exists" case here (and the findById() short-circuit below) as a non-destructive skip unless you can verify the existing v2 record is actually readable.
| // first subsequent v2 login moves the secret atomically via | ||
| // `writeRecordWithKeyringFallback`. | ||
| expect(km.slots.has('user-1')).toBe(false) | ||
| expect(km.slots.has('user-1/refresh')).toBe(false) |
There was a problem hiding this comment.
[P3] Missed reuse: consider using the exported refreshAccountSlot('user-1') helper here (and on lines 464, 487, 509, 531) instead of hardcoding the '/refresh' string literal. This avoids duplication and ensures the test stays aligned with the single source of truth for the slot wire format.
| let wroteRefreshSecurely = false | ||
| if (storedSecurely && trimmedRefresh) { | ||
| try { | ||
| await refreshSecureStore.setSecret(trimmedRefresh) |
There was a problem hiding this comment.
[P2] This refresh-slot write open-codes the same SecureStoreUnavailableError downgrade/rethrow policy that trySetSecret() was just added to centralize for the access slot. Please route this branch through the shared helper too, or extract a small two-slot write helper, so the keyring-offline policy only lives in one place.
| const record = { | ||
| account, | ||
| fallbackToken: legacyToken.token, | ||
| hasRefreshToken: false, |
There was a problem hiding this comment.
[P2] hasRefreshToken: false is stronger than this helper can guarantee now that migration no longer reads or owns the sibling refresh slot. false tells createKeyringTokenStore.active() to skip probing that slot entirely, so in the remaining list-then-upsert race (or any retry state where a v2 refresh slot already exists) this record will hide a real refresh credential. Leave the bit unset here so the migrated record stays in the intended "unknown" state.
|
|
||
| /** | ||
| * Stub `tryInsert` for a harness so that successful inserts actually land | ||
| * in the harness state — the migration's ownership-check re-reads need to |
There was a problem hiding this comment.
[P3] The comment references the migration's ownership-check re-reads which were removed in the Round 10 simplification (migration now simply inserts a fallback record). This is a stale leftover from previous iterations and can be removed.
| expect(km.slots.get('user-1/refresh')?.secret).toBe('v2_refresh') | ||
| }) | ||
|
|
||
| it('leaves the v2 record alone when tryInsert returns false (existing v2 login)', async () => { |
There was a problem hiding this comment.
[P2] This test is functionally identical to the preceding test survives a concurrent v2 login mid-migration (race-free property) (line 441). Both seed a v2 record with a refresh token, set up keyring slots, ensure tryInsert returns false, and assert that the record and keyring are untouched. One of them should be removed to avoid test duplication.
| expect(store.getLastClearResult()).toEqual({ storage: 'secure-store' }) | ||
| }) | ||
|
|
||
| it('round-trips setBundle → active → clear with refresh token in the sibling slot', async () => { |
There was a problem hiding this comment.
[P2] This new refresh-slot round-trip still only exercises the default user-${id} naming. Since this PR adds a sibling refresh slot and keeps accountForUser as an override point, a regression where refreshStoreFor() forgets to apply the custom slug would slip past the suite: the existing custom-slug test only covers set() without a refresh token. Please add one setBundle/active/clear case with factoryOpts.accountForUser so both the custom access slot and custom refresh slot are pinned.
| expect(store.last?.setBundleOptions).toEqual({ promoteDefault: true }) | ||
| }) | ||
|
|
||
| it('falls back to set(token) when the store does not implement setBundle (custom-store back-compat)', async () => { |
There was a problem hiding this comment.
[P3] This back-compat case replays the whole OAuth flow just to prove the setBundle ?? set fallback, but src/auth/persist.test.ts already covers that branch directly. Keeping the fallback assertion at the helper layer (and relying on the other flow tests to cover that runOAuthFlow persists at all) would remove a lot of fake-provider/fake-browser setup without losing meaningful coverage.
| | `testing` (subpath) | `describeEmptyMachineOutput` | Vitest helpers reusable by consuming CLIs (e.g. parametrised empty-state suite covering `--json` / `--ndjson` / human modes). | | ||
| | Module | Key exports | Purpose | | ||
| | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| | `auth` (subpath) | `attachLoginCommand`, `attachLogoutCommand`, `attachStatusCommand`, `attachTokenViewCommand`, `runOAuthFlow`, `refreshAccessToken`, `createPkceProvider`, `createSecureStore`, `createKeyringTokenStore`, `migrateLegacyAuth`, PKCE helpers, `AuthProvider` / `TokenStore` / `TokenBundle` / `RefreshInput` / `AccountRef` / `SecureStore` / `UserRecordStore` types, `AttachLogoutRevokeContext` | OAuth runtime plus the Commander attachers for `<cli> [auth] login` / `logout` / `status` / `token`. `attachLogoutCommand` accepts an optional `revokeToken` hook for best-effort server-side token revocation. Ships the standard public-client PKCE flow (`createPkceProvider` — implements both `exchangeCode` and the optional `refreshToken` grant), a thin cross-platform OS-keyring wrapper (`createSecureStore`), and a multi-account keyring-backed `TokenStore` (`createKeyringTokenStore`) that stores access + refresh secrets in the OS credential manager (refresh in a sibling slot, gated by a `hasRefreshToken` bit on the user record so accounts without one don't pay an extra keyring round-trip per command) and degrades to plaintext in the consumer's config when the keyring is unavailable (WSL/headless Linux/containers). `refreshAccessToken` drives proactive (skew window) and reactive (`force: true`, for 401 retries) silent re-auth, with optional sidecar `lockPath` for cross-process safety. `AuthProvider` and `TokenStore` remain the escape hatches for DCR or fully bespoke backends. `logout` / `status` / `token` always attach `--user <ref>` and thread the parsed ref to `store.active(ref)` (and `store.clear(ref)` on `logout`). `commander` (when using the attachers), `open` (browser launch), and `@napi-rs/keyring` (when using `createSecureStore` or the keyring `TokenStore`) are optional peer/optional deps. | |
There was a problem hiding this comment.
[P2] This auth export list still omits the newly exported RefreshAccessTokenOptions type from src/auth/index.ts. Repo guidance requires README updates to reflect newly added public exports in the same PR, so please add it here (or document it elsewhere in the auth API section).
Summary
Adds OAuth refresh-token support to cli-core's auth surface so consumers can silently extend OAuth sessions instead of forcing the user to re-run
<cli> auth loginon every access-token expiry.Motivated by Doist/outline-cli#74: Outline workspaces with short-lived OAuth access tokens were pushing users into multiple
ol auth loginruns per week even though Outline issues refresh tokens at login.What's new
Contract additions (all non-breaking)
TokenStore.active()now returns an optionalbundle?: TokenBundlealongsidetoken/account. Stores that don't track refresh state can keep returning the old shape; the built-in keyring store always populates it.TokenStore.set()acceptsstring | TokenBundle(parameter widening — existing string callers still work).AuthProvidergained an optionalrefreshToken?(input)method.UserRecordgainedaccessTokenExpiresAt,refreshTokenExpiresAt,fallbackRefreshToken(all optional).status.fetchLivecontext gainedbundleso consumers can render expiry.New behaviour
refreshAccessToken({ store, provider, ref?, skewMs?, force? })— proactive (refresh when within 60s of expiry by default) and reactive (force: trueafter a server 401) refresh. Persists the new bundle back to the store. Falls through toAUTH_REFRESH_UNAVAILABLEfor stores/providers that can't refresh.O_EXCLfile lock on${recordsLocation}.refresh.lockso two parallel CLI invocations don't both POST refresh and race the server's rotation logic. On contention waits up to 2s, then re-reads — if the holder refreshed already, returns its result.createPkceProviderimplementsrefreshToken: POSTgrant_type=refresh_tokento the existingtokenUrl, no client_secret, distinguishesAUTH_REFRESH_EXPIRED(400/401invalid_grant) fromAUTH_REFRESH_TRANSIENT(5xx, network).TokenStorewrites refresh tokens to a sibling slot (<account>/refresh) and exposesgetRecordsLocation()so refresh helpers can derive the lock path without re-plumbing options.runOAuthFlowpersists the full bundle (access + refresh + expiry) on login.Test plan
npm run type-checknpm test— 375 passed (was 364), incl. 11 new tests for the PKCE refresh grant + the refresh helper (proactive, force, no-refresh-token, lock re-read, expiry pass-through)npm run check(oxlint + oxfmt)npm run buildBackward compatibility
No published API breaks. Custom consumer
TokenStoreimplementations that return just{ token, account }continue to type-check and run. The newbundlefield is optional both on the type and in runtime fall-throughs.🤖 Generated with Claude Code