feat: [ENG-2464] enhance cold start-up daemon #546
Claude finished @bao-byterover's task in 2m 45s

PR Review: feat: [ENG-2464] enhance cold start-up daemon

Summary
Score: 4 / 5
This is a well-diagnosed, minimal, and targeted fix for a real production regression. The root cause analysis in the PR description is thorough and accurate. The change is correct, the test directly validates the new invariant, and no API contracts are broken.

What's correct

Issues
1. Increased ECONNREFUSED window (non-blocking, worth tracking)
2. Verbose inline comment blocks (nitpick)
3. Test client not registered in outer

What didn't change (confirmed)
Claude finished @bao-byterover's task in 3m 39s

PR Review: feat: [ENG-2464] enhance cold start-up daemon

Summary
Score: 3 / 5
The core fix remains technically correct and well-reasoned. The second commit (

What's correct

Issues (see inline comments)
1. 2. 3. 4. 5. 6.

Non-blocking observation (unchanged from previous review)
Increased ECONNREFUSED window

What didn't change (confirmed)
Claude finished @bao-byterover's task in 4m 10s

PR Review: feat: [ENG-2464] enhance cold start-up daemon

Summary
Score: 2 / 5
The core fix remains technically correct and well-reasoned. The sync trigger is a merge-from-main commit (

New issue (introduced by this PR, not caught in previous passes)

Persistent issues (unresolved across all three passes; see inline suggestions)

What's still correct
Claude finished @bao-byterover's task in 1m 45s

PR Review: feat: [ENG-2464] enhance cold start-up daemon

Summary
Score: 5 / 5
This pass is clean. Every issue raised across the previous three review passes has been resolved in

Previous issues: all resolved ✅

What's correct
Summary
- Cold invocations of daemon-routed commands (`brv status`, `brv locations`, `brv vc *`, `brv push`, `brv pull`, `brv login`, `brv hub *`, `brv providers list`, `brv worktree list`, `brv source list`, etc.) take noticeably long to return. Warm invocations are fast.
- The first command after `brv restart`, or after the daemon idle-timeout, is perceptibly slow, which affects nearly every daemon-routed command. This has been in production since v2.0.0.
- Tuned the `withDaemonRetry` defaults in `src/oclif/lib/daemon-client.ts` from `MAX_RETRIES=3` × 2 s (4 s budget) to `MAX_RETRIES=10` × 1 s (9 s budget). With the shorter delay, the cold-start ECONNREFUSED window (heartbeat fresh but port not open during handler setup) recovers in ~1 s in the typical case instead of ~2 s, and the wider budget covers slow-OIDC cold starts that would previously fail.
- `AuthHandler` untouched. `setupFeatureHandlers` untouched. OIDC flow untouched. Token storage / auth / agent IPC / MCP transport untouched. Heartbeat ordering unchanged (intentionally; see "Risks" §1 below).

Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write N/A)
- `transportServer.start(port)` opened the Socket.IO port early in daemon bootstrap (before handlers were registered), but `onRequest` was guarded with `if (!io) throw`, forcing all handler registrations to happen after `start()`. The first CLI command of a cold session connected quickly, fired its business request, and hit a not-yet-registered listener; Socket.IO silently drops unhandled events. The client's `requestWithAck` timeout was the only thing that ended the request, followed by a `withDaemonRetry` backoff and a retry. By the time of the retry, the handlers were registered and the request succeeded immediately.
- `feature-handlers.ts` with blocking OIDC before handler registration was harmless at the time because no command routed through those handlers. ENG-1186 then migrated ~10 local commands to daemon-routed. Either PR in isolation looked correct; their intersection shipped in v2.0.0.

Test plan
- `test/unit/infra/transport/socket-io-transport-server.test.ts`: updated one test that asserted the now-wrong invariant (`onRequest` throws before `start()`) to instead assert the new correct behavior (pre-`start()` registration stores the handler and applies it to the first connecting socket).
- Pre-`start()` `onRequest` no longer throws.
- A handler registered before `start()` is correctly applied to the first incoming socket after `start()` (via the existing `io.on('connection')` loop over the `requestHandlers` map).
- Tests for `onRequest`, `broadcast`, `broadcastTo`, `sendTo`, `getPort`, `isRunning` still pass.
- `broadcast` / `broadcastTo` still correctly throw `TransportServerNotStartedError` when called before `start()` (they genuinely need `io`).
- `brv status`, `brv locations`, `brv vc status`, `brv vc branch`, `brv providers list`, `brv hub list`, `brv worktree list`, `brv source list`.

User-visible changes
Faster cold startup for every daemon-routed command. Warm invocations unchanged. No CLI flag, config, defaults, or output format changes. No breaking changes.
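To make the pre-`start()` registration invariant from the root cause and test plan concrete, here is a minimal TypeScript sketch. It is not the project's `socket-io-transport-server.ts`: `SketchTransportServer`, `FakeSocket`, and `acceptConnection` are hypothetical names, and an in-memory socket stands in for Socket.IO. It only illustrates the described behavior: `onRequest` stores handlers in a map before `start()`, each connecting socket receives the stored handlers, and `broadcast` still throws until the server has started.

```typescript
type Ack = (response: unknown) => void;
type RequestHandler = (payload: unknown) => unknown;
type SocketListener = (payload: unknown, ack: Ack) => void;

// Hypothetical in-memory socket standing in for a Socket.IO socket.
class FakeSocket {
  private readonly listeners = new Map<string, SocketListener>();

  on(event: string, listener: SocketListener): void {
    this.listeners.set(event, listener);
  }

  // Returns false when nothing is listening, modelling a dropped event.
  emit(event: string, payload: unknown, ack: Ack): boolean {
    const listener = this.listeners.get(event);
    if (!listener) return false;
    listener(payload, ack);
    return true;
  }
}

// Hypothetical stand-in for the transport server described in the PR.
class SketchTransportServer {
  private readonly requestHandlers = new Map<string, RequestHandler>();
  private started = false;

  // Safe before start(): the handler is only recorded in the map.
  onRequest(event: string, handler: RequestHandler): void {
    this.requestHandlers.set(event, handler);
  }

  // Still requires start(): broadcasting needs a live server.
  broadcast(_event: string, _payload: unknown): void {
    if (!this.started) throw new Error("TransportServerNotStartedError");
  }

  start(): void {
    this.started = true;
  }

  // Models the connection loop: every stored handler is attached to the
  // newly connected socket.
  acceptConnection(): FakeSocket {
    if (!this.started) throw new Error("ECONNREFUSED");
    const socket = new FakeSocket();
    this.requestHandlers.forEach((handler, event) => {
      socket.on(event, (payload, ack) => ack(handler(payload)));
    });
    return socket;
  }
}
```

In this sketch, emitting an event with no registered handler returns `false` from `FakeSocket.emit`, which stands in for the silently dropped request described in the root cause.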
Evidence
The benchmark comparison was run against both the published `npm install -g byterover-cli@3.8.3` (baseline) and this branch installed from `npm pack` (fix), on the same machine, alternating back-to-back. For each command, three cold runs were executed (`brv restart` + `sleep 1` before each) across all handler categories listed above. The fix consistently reduces cold-start latency across every tested command. See the ticket attachments / comments for the full run log.

Checklist
- `npm test`: full suite green
- `npm run lint`: 0 errors on touched files (a pre-existing submodule warning exists on `main`, unrelated)
- `npm run typecheck`: clean for both root and webui
- `npm run build`: clean
- `feat: [ENG-2464] enhance cold start-up daemon`
- `main`

Risks and mitigations
1. Clients can see `ECONNREFUSED` between the moment the daemon writes its heartbeat (so `pollForDaemon` considers it ready) and the moment `transport.start(port)` actually opens the port. After this PR, that window spans the entire handler-setup phase, including `setupFeatureHandlers`'s OIDC discovery (typically <1 s, up to several seconds on slow networks).
   - `withDaemonRetry` (`src/oclif/lib/daemon-client.ts`) catches `ConnectionFailedError` and sleeps `DEFAULT_RETRY_DELAY_MS` between attempts. This PR tunes those defaults from 3 × 2 s (4 s budget) to 10 × 1 s (9 s budget): short enough to recover in ~1 s in the typical case, wide enough to cover slow OIDC. Verified in manual cold runs.
   - Deferring `heartbeatWriter.start()` until after `transport.start(port)` would let `pollForDaemon` itself wait for port-readiness, removing the ECONNREFUSED window entirely. Rejected because `pollForDaemon` has a 5 s `DAEMON_READY_TIMEOUT_MS`, and on a retry after that timeout, `daemon-spawner.js`'s `gracefullyStopDaemon` step SIGTERMs the daemon for "stale heartbeat", which would kill an in-progress slow-OIDC bootstrap and trigger a respawn loop. Keeping the heartbeat early avoids that failure mode entirely. Properly fixing this in `pollForDaemon` (port-readiness probe + spawn-aware stale detection) belongs in `@campfirein/brv-transport-client`, not here.
2. Code that depends on post-`start()` state (e.g., `transport.getPort()`, which returns `undefined` before start) silently misbehaves.
   - `onRequest` no longer requires `start()`, but `broadcast`, `broadcastTo`, and `getPort` still require it (they genuinely need `io`). Tests asserting `TransportServerNotStartedError` for those remain in place. The invariant is documented in a comment in `socket-io-transport-server.ts` explaining why `onRequest` is now pre-start-safe while the others are not.
3. `setupFeatureHandlers` still awaits `getAuthConfig`, still throws on network failure, and still propagates up through the existing `try/catch` in `brv-server.ts:main`, which cleans up and exits. No new failure modes.
4. `GlobalInstanceManager.acquire()` still runs first and is atomic: only one daemon holds the lock. If a second daemon is spawned during the bootstrap window, it sees the lock and exits with `already_running`. Unchanged from `main`.
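The retry budgets quoted above (4 s and 9 s) follow from one sleep between each pair of attempts. The sketch below illustrates that arithmetic and the general shape of such a retry loop. It is not the code in `src/oclif/lib/daemon-client.ts`: `withRetrySketch`, `retryBudgetMs`, the `RetryOptions` shape, and the local `ConnectionFailedError` class are assumptions made for illustration.

```typescript
// Hypothetical local error type; the real ConnectionFailedError is defined
// elsewhere in the project and may differ.
class ConnectionFailedError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, ConnectionFailedError.prototype);
  }
}

interface RetryOptions {
  maxRetries: number; // total number of attempts
  retryDelayMs: number; // sleep between consecutive attempts
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries fn only on ConnectionFailedError; any other error is rethrown
// immediately. After the final attempt the last connection error is thrown.
async function withRetrySketch<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!(error instanceof ConnectionFailedError)) throw error;
      lastError = error;
      // No sleep after the final attempt.
      if (attempt < opts.maxRetries) await sleep(opts.retryDelayMs);
    }
  }
  throw lastError;
}

// Worst-case time spent sleeping: one delay between each pair of attempts.
function retryBudgetMs(maxRetries: number, retryDelayMs: number): number {
  return (maxRetries - 1) * retryDelayMs;
}
```

Under these assumptions, `retryBudgetMs(3, 2000)` gives 4000 ms and `retryBudgetMs(10, 1000)` gives 9000 ms, matching the budgets in the PR description. A shorter delay also means the first retry fires sooner once the port opens.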