fix: [ENG-2548] activate byterover provider on connect + raise auth:getState timeout#596
fix: [ENG-2548] activate byterover provider on connect + raise auth:getState timeout#596bao-byterover merged 4 commits intomainfrom
Conversation
PR #549 added a deferred-activation gate in provider:connect that skips withActiveProvider when the provider has no defaultModel and no existing activeModel. byterover hits both conditions on a fresh connect — the registry has no defaultModel, and the user has no `brv model switch` recovery path because byterover has no model fetcher and the model-switch command is hard-blocked. The result was "connected but never active": brv providers showed an empty active provider, brv curate failed with "no provider connected", and the TUI bounced users back to the setup view. Bypasses the gate by id for byterover. The registry stays free of defaultModel intentionally so the runtime fallback in agent-process.ts (activeModel ?? DEFAULT_LLM_MODEL) keeps owning model resolution — that way changing the constant rolls out a new default to every existing user without needing a per-user migration step. openai-compatible — the original target of the PR #549 gate — is unchanged: still defers, still recovers via model:setActive when the user picks a model.
|
Claude finished @RyanNg1403's task in 2m 30s —— View job PR Review: fix/ENG-2548Tasks
SummaryScore: 4 / 5 — Correct, minimal, well-tested fix for a real regression. Two minor style issues noted inline; no correctness or security concerns. What the fix does
Correctness
Tests
Inline commentsThree inline comments posted:
Risk assessmentThe PR description's risk section is honest and complete. The |
| // connected-but-never-active. Its model is resolved at runtime via | ||
| // DEFAULT_LLM_MODEL in agent-process.ts rather than persisted here, | ||
| // so future default changes roll out without a per-user migration. | ||
| const willHaveActiveModel = providerId === 'byterover' |
There was a problem hiding this comment.
suggestion (non-blocking): 'byterover' now appears four times in this file (lines 259, 309, 344, 397). The pattern pre-dates this PR, but the PR description already flags that a second "no-model-fetcher" provider would need an authType field — at that point you'll be doing a search-replace across all four sites. Worth extracting a local constant, or a ProviderDefinition flag (noModelFetcher: true) to avoid the drift. The fix is correct and consistent with existing style; this is just a note for the follow-up refactor.
Trim multi-line block comments on the two new tests to one short line
each, per CLAUDE.md "one short line max" guidance. The PR description
already carries the full rationale; the test names are self-documenting;
the inline comments now flag only the non-obvious why ("byterover
bypasses the gate", "intentional runtime resolution").
The third review note (extract 'byterover' string literal in
provider-handler.ts to a constant or ProviderDefinition flag) is
non-blocking per the reviewer and aligns with the follow-up refactor
already documented in the PR's risks section. Deferring to that ticket
to keep this PR scoped to the regression fix.
|
Claude finished @RyanNg1403's task in 1m 7s —— View job PR Review: fix/ENG-2548 (re-review after latest commit)Tasks
SummaryScore: 5 / 5 — Previous review feedback was fully addressed. The fix is correct, minimal, and well-tested. No remaining concerns. What changed in the latest commit (
|
…und-trip The TUI's AuthInitializer fires auth:getState on every startup with a 500ms transport timeout. The daemon-side handler does an HTTPS round-trip to /user/me that takes 400-3100ms in practice (measured p99 across three home networks), so every TUI startup running on a normal-or-slow link timed out and the auth store fell back to its default isAuthorized: false. The bouncer was always firing — but useAppViewMode only routes to the provider picker on isAuthorized: false for byterover-active users. For non-byterover providers, view-mode routes to home regardless of isAuthorized, so users never saw the consequence. ENG-2548 making byterover reachable from CLI surfaced this latent bug. Bumps the per-attempt timeout to 4000ms — covers the worst observed sample (3085ms) with ~1.3x headroom plus margin for slower mobile, international, and VPN connections. Also drops React Query retries from 5 to 1: with the larger per-attempt budget, retrying 5x on a genuinely- offline machine would block startup for ~22s. One retry covers transient blips while keeping worst-case offline-wait under 9s. Backend revocation is still caught at startup (the network call still fires synchronously in setupGetState, just with enough time to complete).
|
Claude finished @RyanNg1403's task in 1m 42s —— View job PR Review: fix/ENG-2548 (re-review after latest commit
|
|
Claude finished @RyanNg1403's task in 2m 8s —— View job PR Review: fix/ENG-2548 (re-review after merge
|
| # | Scenario | Status |
|---|---|---|
| 1 | byterover → setAsActive: true, activeModel: undefined |
✓ new |
| 2 | byterover registry has no defaultModel |
✓ new |
| 3 | openai-compatible, no existing model → setAsActive: false |
✓ existing |
| 4 | openai-compatible, existing active model → setAsActive: true |
✓ existing |
| 5 | openrouter (registry defaultModel present) → setAsActive: true |
✓ existing |
| 6 | byterover auth gate (connect) → 'ByteRover account' + 'brv login --api-key' |
✓ updated |
| 7 | byterover auth gate (set-active) → same keywords | ✓ updated |
Verdict: Ready to merge. Apply the (3s) → (4s) comment fix before or after merge — it's one character and non-blocking.
| retry: 5, | ||
| retryDelay: (attemptIndex) => Math.min(500 * 2 ** attemptIndex, 2000), | ||
| // One retry covers transient blips; the per-attempt timeout is now generous | ||
| // (3s) so we don't need 5+ retries that would block startup for ~17s when offline. |
There was a problem hiding this comment.
issue (non-blocking): Comment says (3s) but the actual timeout in get-auth-state.ts:17 is 4000ms = 4s. Flagged in the previous two reviews; still present after the merge.
| // (3s) so we don't need 5+ retries that would block startup for ~17s when offline. | |
| // One retry covers transient blips; the per-attempt timeout is now generous | |
| // (4s) so we don't need 5+ retries that would block startup for ~17s when offline. |
Summary
This PR fixes two related bugs that surface when ByteRover is the active provider. Both were exposed by the regression report on ENG-2548:
brv providers connect byteroverreports success but never setsactiveProvider. Downstream commands likebrv curatefail with "No provider connected".provider-flow.tsx(which guards itself with an explicitsetActivecall). Once Feat/init #1 is fixed and the CLI can leave a user in byterover-active state, opening the TUI exposes Feat/refactor structure #2.Why both in one PR: same Linear ticket (ENG-2548), same user-visible failure mode ("byterover doesn't work"), same code domain (auth + provider activation). Bundling avoids landing #1 and immediately needing a follow-up PR to make the user experience actually work end-to-end.
Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only)
Bug 1 — activation regression
PR #549 (
5ce16556) addedsetAsActive: willHaveActiveModelto theprovider:connecthandler.willHaveActiveModelevaluates toBoolean(provider?.defaultModel) || Boolean(getActiveModel(providerId)). For byterover both halves are false on a fresh connect — the registry has nodefaultModelfor byterover, and a fresh user has no storedactiveModel. So the connect handler skipswithActiveProvider('byterover')andactiveProviderstays empty.Why this was not caught earlier: PR #549's tests covered openai-compatible (the intended target) and openrouter (representative of providers WITH
defaultModel), but not byterover specifically. byterover is the only provider in the registry without adefaultModel, sitting in the gap between the two test cases. The TUI'sprovider-flow.tsx:198-199explicitly callssetActiveafterconnect, so the TUI was self-healing — only the CLI path surfaced the regression, and it surfaced as a downstreambrv curateerror rather than at the connect command itself.Bug 2 — auth:getState timeout
The TUI's
AuthInitializercallsauth:getStateon every startup with a hardcoded 500ms transport timeout (get-auth-state.ts:14). The daemon'ssetupGetStatedoes an HTTPS round-trip to/user/mewhich measured 400-3100ms across three networks (p99 = 3085ms in the slowest sample). On any normal-or-slow connection, every retry exceeded 500ms, the request stayeddata: undefined, and the auth store fell back to its defaultisAuthorized: false.The bouncer was always firing — but
useAppViewMode:49only routes to'config-provider'onbyterover-active + !isAuthorized. For Anthropic/OpenAI/etc., view-mode routes to'ready'regardless ofisAuthorized, so users never noticed. Bug 1's fix made byterover-active reachable from CLI, surfacing this latent timing issue.Why this was not caught earlier: silent failure — no error message, no log entry. The 500ms value was originally chosen for "fail fast if handler not ready," but the handler does a network call inside, so 500ms was unrealistically tight. Reproduces probabilistically in ~5% of TUI starts on a fast network, ~100% on a slow one — easy to miss in any single test session.
Test plan
Coverage added:
Test files
test/unit/infra/transport/handlers/provider-handler.test.ts—should activate byterover on connect without persisting an activeModelpins bothsetAsActive: trueandactiveModel: undefinedon the connect call args.test/unit/core/domain/entities/provider-registry.test.ts—should not have a defaultModel for byteroverpins the intentional absence (runtime resolution viaDEFAULT_LLM_MODEL).tui/features/auth/api/get-auth-state.ts(TUI api files are bare config). Verified empirically with a node script that emitsauth:getStateat the new 4000ms timeout and confirms full payload returns within budget.Key scenarios covered
setAsActive: trueANDactiveModel: undefined(new test)setAsActive: false(existing test, kept green)setAsActive: true(existing test, kept green)defaultModel) →setAsActive: true(existing test, kept green)defaultModel(new test) — protects the runtime-fallback / migration propertyUser-visible changes
brv providers connect byterovernow actually activates byterover.brv providersshows ByteRover as active.brv curateworks end-to-end after a fresh connect (was previously failing with "No provider connected").providers.jsonshape: byterover's entry has noactiveModelfield. The runtime resolves the model viaDEFAULT_LLM_MODELfromconstants.tson every invocation, so changing that constant rolls out a new default to all existing users without per-user migration code./user/meround-trip; happy-path startup is unaffected (single attempt typically completes in 400-1100ms).Evidence
Unit tests: 7240 passing, 0 failing. Targeted suite (provider-handler + provider-registry): 86 passing.
Network latency measurements for
/user/me, three networks:Bug 1 interactive verification (8 scenarios, all pass):
brv providers connect byterover(no follow-up)Provider: ByteRover (byterover)set,activeModelabsentbrv model list --provider byteroverbrv model switch X --provider byteroverbrv curate "..." --format jsonwith byterover activesuccess: true, status: completed(originally failing)success: true, status: completedBug 2 interactive verification: confirmed
auth:getStatecompletes in 1141ms with full payload at the new 4000ms client timeout (was timing out at 500ms). Confirmed by reverting to43575800(pre-timeout-fix) and reproducing the bouncer at the expected probabilistic rate.Checklist
npm test) — 7240 passingnpm run lintfails on a pre-existingbyterover-packagessubmodule issue unrelated to this PR)npm run typecheck)npm run build)mainRisks and mitigations
providerId === 'byterover'bypass would not cover it.authTypefield onProviderDefinition). Out of scope for this fix.DEFAULT_LLM_MODELimmediately affects every existing byterover user on next agent invocation, with no opt-in./user/meconsistently exceeds 4s.setupGetStatedaemon logic is unchanged — backend revocation is still caught at startup, login/logout/refresh handlers are untouched.