core: extract transport-agnostic reconnect policy; adopt in vscode + dashboard terminals + tunnel client #972
Merged
backoffDelayMs (shared curve), BackoffController (counter + status + give-up), and classifyUpgradeError (session-unknown rule). Bootstraps a vitest suite in core (the package had none) with 17 unit tests.
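A minimal sketch of how the pure curve and the stateful controller could fit together. It assumes a 0-based `attempt` for `backoffDelayMs`, a 1 s base, and a `{ kind: ... }` result shape for `recordFailure()`. The result shape and `reset()` are illustrative, not the module's actual API, and status tracking is omitted. With a budget of 6 and a 30 s cap, the sketch reproduces the `[1s, 2s, 4s, 8s, 16s, 30s]` → give-up sequence that the vscode close-loop test asserts.

```typescript
// Shared curve: min(base * 2^attempt, cap), with a 0-based attempt (assumption).
function backoffDelayMs(attempt: number, capMs: number, baseMs: number = 1000): number {
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

// Hypothetical result shape for recordFailure().
type FailureOutcome = { kind: "retry"; delayMs: number } | { kind: "give-up" };

// Stateful layer: counts consecutive failures on top of the pure curve.
class BackoffController {
  private attempt = 0; // failures recorded since the last reset

  constructor(private readonly maxAttempts: number, private readonly capMs: number) {}

  // Returns the next retry delay, or give-up once the budget is exhausted.
  recordFailure(): FailureOutcome {
    if (this.attempt >= this.maxAttempts) {
      return { kind: "give-up" };
    }
    this.attempt += 1;
    return { kind: "retry", delayMs: this.nextDelayMs() };
  }

  // Intended to be called after recordFailure(): attempt is then 1-based,
  // so subtract 1 and the first retry waits exactly the base delay.
  nextDelayMs(): number {
    return backoffDelayMs(this.attempt - 1, this.capMs);
  }

  // Hypothetical: called after a successful open or a manual reconnect.
  reset(): void {
    this.attempt = 0;
  }
}
```

Keeping `backoffDelayMs` a pure function of an explicit `attempt` lets call sites with different orderings reuse the curve without sharing one counter.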
Web terminal uses BackoffController (max attempts 50 -> 6, unified with VSCode). The refresh affordance now performs a true reconnect from the given-up state. classifyUpgradeError is wired on CloseEvent.code (behavior-neutral today, since browsers only see 1006). Tests updated for the 6-attempt contract and the recovery affordance.
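The recovery affordance can be modeled as a small state decision. In this sketch, the `ConnState` union, the `RefreshAction` names, and the no-op while a retry is already pending are all assumptions, not the code in `Terminal.tsx`. Refresh on a live socket keeps the old resize nudge. Refresh from the given-up state resets the attempt counter and starts a fresh connection.

```typescript
// Hypothetical connection state for the dashboard terminal (names are illustrative).
type ConnState =
  | { status: "open" }
  | { status: "reconnecting"; attempt: number }
  | { status: "given-up" };

type RefreshAction = "send-resize" | "reset-and-reconnect" | "noop";

// Decide what the refresh button does in each state and which state follows.
function onRefresh(state: ConnState): { action: RefreshAction; next: ConnState } {
  switch (state.status) {
    case "open":
      // Live socket: the pre-PR behavior, a resize nudge (SIGWINCH) so the pty repaints.
      return { action: "send-resize", next: state };
    case "given-up":
      // Dead socket with an exhausted budget: reset the counter and open a new socket.
      return { action: "reset-and-reconnect", next: { status: "reconnecting", attempt: 0 } };
    case "reconnecting":
      // Assumption: a retry is already scheduled, so the button does nothing extra.
      return { action: "noop", next: state };
    default: {
      const unreachable: never = state;
      throw new Error("unknown state: " + JSON.stringify(unreachable));
    }
  }
}
```

The two changes belong together: with a 6-attempt budget the given-up state is reachable in practice, so the refresh button has to be able to leave it.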
Same signature/behavior (jitter, 60s cap, 5-min floor after 10); the duplicated curve is gone. Host-side auth/rate-limit circuit breaker unchanged.
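Below is one way the tunnel's `calculateBackoff` could sit on the shared curve while keeping its own tuning. The sketch assumes a 0-based `attempt`, additive jitter of up to 1 s, and a 5-minute floor that applies from attempt 10 onward. The jitter bound, the floor boundary, and the injectable `random` parameter are guesses for illustration. The real function's signature is not shown in this PR.

```typescript
// Shared curve, as in core: min(base * 2^attempt, cap) with a 0-based attempt (assumption).
function backoffDelayMs(attempt: number, capMs: number, baseMs: number = 1000): number {
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

const TUNNEL_CAP_MS = 60000;         // 60 s cap (from the PR)
const LONG_FLOOR_MS = 5 * 60 * 1000; // 5-minute floor (from the PR)
const FLOOR_FROM_ATTEMPT = 10;       // assumption: "after 10" means attempt >= 10
const MAX_JITTER_MS = 1000;          // assumption: the real jitter bound is not stated

// Hypothetical tunnel backoff: tunnel-specific tuning layered over the shared curve.
function calculateBackoff(attempt: number, random: () => number = Math.random): number {
  const curve = backoffDelayMs(attempt, TUNNEL_CAP_MS);
  const floored = attempt >= FLOOR_FROM_ATTEMPT ? Math.max(curve, LONG_FLOOR_MS) : curve;
  return floored + Math.floor(random() * MAX_JITTER_MS); // jitter in [0, MAX_JITTER_MS)
}
```

The doubling comes from the shared function, while cap, floor, and jitter stay local to the tunnel. That split is why the unchanged `tunnel-client.test.ts` is enough to guard parity.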
The web terminal cannot see Tower's upgrade-stage 404 (browsers get 1006), so classifyUpgradeError on CloseEvent.code was inert and its HTTP-4xx numeric contract wouldn't match a future WS close code anyway. Keep web on blind retry; the session-unknown fast-path lands in #971.
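This sketch expresses the session-unknown rule as an HTTP-status classifier. It assumes Tower's upgrade-stage 404 is the only session-unknown signal; `UpgradeVerdict` and the constant names are hypothetical. A Node `ws` client can read that status from the rejected upgrade response (for example via its `unexpected-response` event). A browser only sees `CloseEvent.code` 1006. A future app-defined close code would most plausibly come from the 4000-4999 range that RFC 6455 reserves for private use. Neither value matches the 4xx contract.

```typescript
type UpgradeVerdict = "session-unknown" | "retry";

// Classify a failed WebSocket upgrade by its HTTP status (numeric HTTP-4xx contract).
function classifyUpgradeError(httpStatus: number): UpgradeVerdict {
  // Assumption: Tower rejects the upgrade with 404 when the session no longer exists.
  return httpStatus === 404 ? "session-unknown" : "retry";
}

// What a browser sees after any rejected upgrade: 1006 (abnormal closure), no HTTP status.
const BROWSER_ABNORMAL_CLOSE = 1006;

// Hypothetical future close code from Tower, in RFC 6455's 4000-4999 private-use range.
const HYPOTHETICAL_SESSION_UNKNOWN_CLOSE = 4404;

// Both values fall through to "retry", so wiring the classifier on CloseEvent.code is inert.
const browserVerdicts = [BROWSER_ABNORMAL_CLOSE, HYPOTHETICAL_SESSION_UNKNOWN_CLOSE].map(classifyUpgradeError);
```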
amrmelsayed added a commit that referenced this pull request on Jun 3, 2026
PIR Review: Extract transport-agnostic reconnect policy into `@cluesmith/codev-core`
Fixes #961
Summary
The exponential-backoff reconnect curve (`min(1000 · 2^attempt, cap)` ms) was hand-rolled at four call sites across three packages, with divergent tuning (max attempts 6 vs. 50) and divergent session-unknown handling. This PR extracts a pure `ReconnectPolicy` module into `@cluesmith/codev-core` (`backoffDelayMs` for the shared curve, `BackoffController` for counter + status + give-up, and `classifyUpgradeError` for the session-unknown rule) and adopts it at all four sites. Three adoptions are behavior-preserving refactors (VSCode terminal, SSE health-check, tunnel). The web terminal is the one place with deliberate behavior changes: give-up is unified from 50 to 6 attempts, and a real recovery affordance is added. Both changes were resolved at the plan gate.
Files Changed
- `packages/core/src/reconnect-policy.ts` (+219 / -0): new pure module
- `packages/core/src/__tests__/reconnect-policy.test.ts` (+148 / -0): new, 17 unit tests
- `packages/core/package.json` (+7 / -1): `./reconnect-policy` export, `vitest` devDep, `test` script
- `packages/core/vitest.config.ts` (+8 / -0): new (core had no test runner)
- `packages/core/tsconfig.json` (+2 / -1): exclude tests from the build (codev's convention)
- `packages/vscode/src/terminal-adapter.ts` (+13 / -23): adopt `BackoffController` + `classifyUpgradeError`
- `packages/vscode/src/connection-manager.ts` (+5 / -1): SSE adopts `backoffDelayMs`
- `packages/dashboard/src/components/Terminal.tsx` (+27 / -19): adopt controller; 50→6; recovery affordance
- `packages/dashboard/__tests__/Terminal.reconnect.test.tsx` (+38 / -6): updated for the 6-attempt contract + recovery test
- `packages/codev/src/agent-farm/lib/tunnel-client.ts` (+13 / -3): `calculateBackoff` reimplemented over the shared curve
- `.github/workflows/test.yml` (+4 / -0): run core unit tests in CI
- `codev/resources/arch.md` (+2 / -2): `ReconnectPolicy` added as a shared core primitive
- `codev/resources/lessons-learned.md` (+2 / -0): two durable lessons
- `codev/plans/961-*.md`, `codev/state/pir-961_thread.md`: plan + builder thread
Commits
- `89a8a7e9` [PIR #961] Add transport-agnostic ReconnectPolicy to codev-core
- `f9fb2961` [PIR #961] Adopt ReconnectPolicy in vscode terminal + SSE clients
- `3118a52d` [PIR #961] Adopt ReconnectPolicy in web terminal; unify give-up at 6
- `febd1e56` [PIR #961] Reimplement tunnel calculateBackoff over shared curve
- `8ac242a0` [PIR #961] Run core unit tests in CI; update thread
- `07414aeb` [PIR #961] Align core tsconfig exclude with codev's convention
- `1e49da4a` [PIR #961] Drop dormant web-terminal classifier seam (defer to #971, "web terminal: adopt session-unknown fast-path once Tower emits a browser-visible WS close code")
Test Results
- `pnpm build` (root: core + codev incl. dashboard): ✓ pass
- `pnpm test` (codev suite): ✓ 3210 passed, 13 skipped
- `packages/core`: ✓ 17 new tests
- `packages/vscode` `test:unit`: ✓ 222 (the `terminal-adapter` close-loop drives the real `BackoffController` and still asserts `[1s, 2s, 4s, 8s, 16s, 30s]` → give-up)
- `packages/dashboard` reconnect suite: ✓ 12 (incl. a new recovery-affordance test)
- `dev-approval` gate: ran the worktree; exercised VSCode + dashboard terminals against a forced give-up.
Architecture Updates
Updated `codev/resources/arch.md`: added `ReconnectPolicy` alongside `EscapeBuffer` in the two places that enumerate `@cluesmith/codev-core`'s shared primitives (the subsystem guide and the package table). This PR follows, rather than introduces, the existing "pure cross-host logic lives in core" pattern (`EscapeBuffer` is the cited precedent), so no new module-boundary documentation was needed beyond naming the new primitive.
Lessons Learned Updates
Added two entries to `codev/resources/lessons-learned.md` (Architecture section):
- A browser `WebSocket` can't see a failed upgrade's HTTP status (only `CloseEvent` 1006), so the Node-`ws` session-unknown fast-path can't be ported to the dashboard without a server-side close-code change. A classifier wired against `CloseEvent.code` must not assume HTTP-4xx semantics.
- … `attempt` arg and layer stateful give-up logic on top only where needed. A single shared stateful counter would silently re-tune the differently-ordered sites.
Things to Look At During PR Review
- `BackoffController` give-up sequencing (`reconnect-policy.ts`): the off-by-one is deliberate. `nextDelayMs()` uses `attempt - 1` so the first retry gets the base delay, and `recordFailure()` returns `give-up` only once the budget is exhausted. The contract is pinned by `reconnect-policy.test.ts` ("reproduces the terminal-adapter give-up sequence") and by the real-controller vscode close-loop test.
- `tunnel-client.ts`: `calculateBackoff` keeps its exact signature and behavior (jitter, 60s cap, 5-min floor after 10). The guard is the unchanged `tunnel-client.test.ts`; confirm it still passes (it does).
- `Terminal.tsx`: lowering the attempt budget from 50 to 6 makes give-up actually reachable, so the refresh button was extended to perform a true reconnect from the dead-socket state (previously it only triggered a SIGWINCH on a live socket). If you review one change without the other, it looks like a regression.
How to Test Locally
- `pir-961` → View Diff
- `afx dev pir-961`
Flaky Tests
None skipped. One pre-existing, deterministic failure is unrelated to this change and left untouched per protocol:
`packages/dashboard/__tests__/scrollController.test.ts > ScrollController > onScroll handler > warns on unexpected scroll-to-top` fails on a clean tree too (verified by stashing this PR's only dashboard change; the test imports `ScrollController`, never `Terminal`). It must be fixed or quarantined before the dashboard unit suite can gate CI; this is captured in #967.