feat(act): per-reaction retry backoff (ACT-601) #724
Merged
Adds a `backoff` option on reaction handlers that paces inter-attempt timing: fixed, linear, or exponential with optional jitter. Closes the last gap in drain's retry semantics: today the framework re-claims a failed stream on the next cycle (typically within ms), turning transient outages into exhausted retry budgets.

The controller maintains the backoff window in process memory; deferred streams hold their existing lease via runDrainCycle's claim-but-skip path, so there is no Store contract change and no DB schema change. With N competing workers, retries escalate up to N× faster than configured. This is intentional: per-worker pacing speeds recovery on transient per-worker faults, and poison messages quarantine sooner.

Wires wolfdesk's MessageAdded delivery reaction to exponential backoff with jitter, with a flaky-delivery stub to make the pacing observable in dev logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
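For readers skimming the commit, the three pacing strategies can be illustrated with a minimal sketch. This is not the code from the PR: the option names (`strategy`, `baseMs`, `maxMs`, `jitter`), the doubling factor, and the "full jitter" form (uniform in `[0, delay)`) are all assumptions made for illustration.

```typescript
// Hypothetical option shape; the real `backoff` option in @rotorsoft/act may differ.
type BackoffStrategy = "fixed" | "linear" | "exponential";

interface BackoffOptions {
  strategy: BackoffStrategy;
  baseMs: number; // assumed name: delay before the first retry
  maxMs?: number; // assumed name: optional upper bound on the delay
  jitter?: boolean; // assumed: when true, apply full jitter
}

// How the delay grows with the 1-based retry attempt for each strategy.
const growth: Record<BackoffStrategy, (base: number, n: number) => number> = {
  fixed: (base) => base,
  linear: (base, n) => base * n,
  exponential: (base, n) => base * 2 ** (n - 1),
};

// Returns the delay in milliseconds before the given retry attempt.
// `random` is injectable so the jitter path can be tested deterministically.
function backoffDelay(
  opts: BackoffOptions,
  attempt: number,
  random: () => number = Math.random
): number {
  const n = Math.max(1, Math.floor(attempt));
  let delay = growth[opts.strategy](opts.baseMs, n);
  if (opts.maxMs !== undefined) delay = Math.min(delay, opts.maxMs);
  // Full jitter (assumed form): pick uniformly in [0, delay).
  return opts.jitter ? Math.floor(random() * delay) : delay;
}
```

Keeping the growth functions in a lookup table makes it cheap to add a strategy later, and injecting `random` keeps jitter bounds checkable in unit tests.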
- Drop unreachable `dispose()` method and the now-redundant `size === 0` guard inside `scheduleBackoffWake` (caller already enforces it).
- Inline `gcExpiredBackoff` into the timer callback: the separate method was only called from one place and made coverage harder to read.
- Drop optional chain on `unref()`: Node's setTimeout always returns a Timeout with `unref()`, and the optional chain registered as an uncovered branch.
- Add a multi-stream test that puts two streams in the backoff map at different expiries, forcing the callback to iterate both entries and exercise the "delete expired / keep pending" branch.

drain-cycle.ts now 100% on statements, lines, functions, and branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🎉 This PR is included in version @rotorsoft/act-v0.41.0 🎉
Your semantic-release bot 📦🚀
Summary
Closes #687.
Adds a `backoff` option on reaction handlers that paces inter-attempt timing: `fixed`, `linear`, or `exponential` with optional jitter. Closes the last gap in drain's retry semantics: today the framework re-claims a failed stream on the next cycle (typically within ms), turning transient outages into exhausted retry budgets.
Design notes
- `DrainController` owns a `Map<stream, nextAttemptAt>` in process memory. Deferred streams hold their existing lease via a new claim-but-skip path in `runDrainCycle`; the lease itself is the per-worker pacing primitive. A `setTimeout` re-arms drain at the earliest pending expiry.
- `retry_count` on the watermark climbs across all N competing workers, so `blockOnError` fires up to N× sooner than configured. Transient per-worker faults recover faster, and poison messages quarantine sooner. Documented in `concepts/error-handling.md` and a CLAUDE.md safety one-liner.
- `leaseMillis` as floor. Because the controller holds the lease during the backoff window, `effective_backoff = max(configured, leaseMillis)`. Never shorter than configured.
- Reactions without `backoff` continue to retry as soon as the lease expires.
Why this isn't an "outbox" subsystem: drain already provides ordered, at-least-once delivery, retries, dead-lettering, and competing-consumer semantics via `SKIP LOCKED`. The only missing primitive was inter-attempt timing, which is one knob, not a parallel system.
Test plan
- Tests in `libs/act/test/backoff.spec.ts` cover all 4 strategies, jitter bounds, deferral behavior, success-clears-entry, block-clears-entry, and the no-backoff default
- `@rotorsoft/act` suite passes (544 tests, no regressions)
- `libs/act`, `libs/act-sqlite`, `libs/act-tck`, `packages/wolfdesk` (1172 tests total)
- `MessageAdded` reaction wired to exponential+jitter with a flaky-delivery stub, observable in dev logs
Docs
- `docs/docs/concepts/error-handling.md`: new Backoff section with strategy table, per-worker semantics, `leaseMillis` floor
- `CLAUDE.md`: safety-critical one-liner for per-worker pacing
- `project_book_backoff.md` book notes for the error-handling and scaling chapters
🤖 Generated with Claude Code
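To make the design notes concrete, here is a rough sketch of an in-memory backoff window with the `leaseMillis` floor and the "delete expired / keep pending" sweep. It is not the `DrainController` from this PR: the class name `BackoffWindow` and its methods (`defer`, `isDeferred`, `clear`, `sweep`) are hypothetical, and the clock is passed in explicitly instead of arming a real `setTimeout`, so the logic stays easy to test.

```typescript
// Hypothetical stand-in for the controller's Map<stream, nextAttemptAt>.
class BackoffWindow {
  private readonly nextAttemptAt = new Map<string, number>();
  private readonly leaseMillis: number;

  constructor(leaseMillis: number) {
    this.leaseMillis = leaseMillis;
  }

  // Record a failed attempt. The lease is held for the whole window, so the
  // effective backoff is max(configured, leaseMillis).
  defer(stream: string, configuredMs: number, now: number): number {
    const effective = Math.max(configuredMs, this.leaseMillis);
    this.nextAttemptAt.set(stream, now + effective);
    return effective;
  }

  // True while the stream should be claimed but skipped.
  isDeferred(stream: string, now: number): boolean {
    const at = this.nextAttemptAt.get(stream);
    return at !== undefined && at > now;
  }

  // Called on success or when the stream is blocked.
  clear(stream: string): void {
    this.nextAttemptAt.delete(stream);
  }

  // Drop expired entries and return the delay until the earliest pending
  // expiry, or undefined when nothing is pending. A caller could feed the
  // result to setTimeout to re-arm drain.
  sweep(now: number): number | undefined {
    let earliest: number | undefined;
    // Snapshot the entries so deleting while looping is unambiguous.
    for (const [stream, at] of Array.from(this.nextAttemptAt)) {
      if (at <= now) this.nextAttemptAt.delete(stream);
      else if (earliest === undefined || at < earliest) earliest = at;
    }
    return earliest === undefined ? undefined : earliest - now;
  }
}
```

With `leaseMillis = 1000`, a configured backoff of 500 ms is stretched to 1000 ms, while a configured 5000 ms is left unchanged, which matches the "never shorter than configured" note above.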