feat(act): per-reaction retry backoff (ACT-601) by Rotorsoft · Pull Request #724 · Rotorsoft/act-root

Rotorsoft · 2026-05-14T21:51:46Z

Summary

Closes #687.

Adds a backoff option on reaction handlers that paces inter-attempt timing — fixed, linear, or exponential with optional jitter. Closes the last gap in drain's retry semantics: today the framework re-claims a failed stream on the next cycle (typically within ms), turning transient outages into exhausted retry budgets.

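```typescript
// Fragment of a reaction declaration in the builder chain;
// `handler` and `resolver` are assumed to be defined elsewhere.
.on("OrderPlaced")
  .do(handler, {
    maxRetries: 5,
    backoff: { strategy: "exponential", baseMs: 200, maxMs: 30_000, jitter: true },
  })
  .to(resolver)
```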
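The sketch below shows one way the three strategies could map an attempt number to a delay. It is only an illustration. The `BackoffOptions` fields follow the option in the snippet above (`strategy`, `baseMs`, `maxMs`, `jitter`). The helper name `computeBackoffMs`, the 1-based attempt counter, and the choice of "full jitter" (uniform between 0 and the capped delay) are all assumptions, not the library's confirmed implementation.

```typescript
// Hypothetical shape mirroring the `backoff` option in the PR snippet.
type BackoffStrategy = "fixed" | "linear" | "exponential";

interface BackoffOptions {
  strategy: BackoffStrategy;
  baseMs: number;
  maxMs?: number; // assumed optional cap on the computed delay
  jitter?: boolean;
}

// Hypothetical helper: delay in ms before retry number `attempt` (1-based).
// `random` is injectable so the jitter path can be checked deterministically.
function computeBackoffMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const n = Math.max(1, Math.floor(attempt));
  let delay: number;
  if (opts.strategy === "fixed") {
    delay = opts.baseMs; // same wait every time
  } else if (opts.strategy === "linear") {
    delay = opts.baseMs * n; // base, 2*base, 3*base, ...
  } else {
    delay = opts.baseMs * 2 ** (n - 1); // base, 2*base, 4*base, ...
  }
  const capped = Math.min(delay, opts.maxMs ?? Number.POSITIVE_INFINITY);
  // Assumed "full jitter": uniform in [0, capped), which spreads out retries
  // from streams that failed at the same moment.
  return opts.jitter ? Math.floor(random() * capped) : capped;
}
```

Take the PR's example values (`baseMs: 200`, `maxMs: 30_000`). Under this sketch's 1-based doubling, the delays would be 200, 400, 800, ... ms. The 30 s cap would only be reached on the ninth attempt, so with `maxRetries: 5` the longest wait would be 3.2 s. The PR does not say whether attempts are counted from 0 or 1.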

Design notes

- No DB schema change, no Store port change. The DrainController owns a `Map<stream, nextAttemptAt>` in process memory. Deferred streams hold their existing lease via a new claim-but-skip path in `runDrainCycle`; the lease itself is the per-worker pacing primitive. A `setTimeout` re-arms drain at the earliest pending expiry.
- Per-worker semantics, by design. With N competing workers, each paces only its own re-attempts. The shared `retry_count` on the watermark climbs across workers, so `blockOnError` fires up to N× sooner than configured. Transient per-worker faults recover faster, and poison messages are quarantined sooner. Documented in concepts/error-handling.md and in a one-line safety note in CLAUDE.md.
- `leaseMillis` as floor. Because the controller holds the lease during the backoff window, `effective_backoff = max(configured, leaseMillis)`. The effective delay is never shorter than configured.
- Omitting `backoff` preserves current behavior. Backwards-compatible by construction: existing reactions without `backoff` continue to retry as soon as the lease expires.
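The design above can be sketched as an in-memory deferral table that a simplified drain cycle consults. The names `BackoffMap`, `defer`, `clear`, `partition`, `earliest`, and `effectiveBackoffMs` are hypothetical. Only three things come from the PR: the `Map<stream, nextAttemptAt>`, the claim-but-skip idea, and the `max(configured, leaseMillis)` floor. In the sketch, a claimed stream whose `nextAttemptAt` is still in the future stays leased but is not processed. Every other claimed stream is handed to the handler.

```typescript
// Hypothetical in-memory deferral table, keyed by stream name.
// Values are epoch-millisecond timestamps (nextAttemptAt).
class BackoffMap {
  private readonly next = new Map<string, number>();

  // Record a failure: do not re-attempt `stream` before now + delayMs.
  defer(stream: string, delayMs: number, now: number = Date.now()): void {
    this.next.set(stream, now + delayMs);
  }

  // On success or block, drop the entry so the stream is no longer paced.
  clear(stream: string): void {
    this.next.delete(stream);
  }

  // Split the streams claimed this cycle into "process now" and
  // "claim but skip" (still leased by this worker, inside its window).
  partition(claimed: string[], now: number = Date.now()) {
    const ready: string[] = [];
    const skipped: string[] = [];
    for (const stream of claimed) {
      const at = this.next.get(stream);
      if (at !== undefined && at > now) skipped.push(stream);
      else ready.push(stream);
    }
    return { ready, skipped };
  }

  // Earliest pending expiry: the time at which drain should be re-armed.
  earliest(): number | undefined {
    let min: number | undefined = undefined;
    for (const at of this.next.values()) {
      if (min === undefined || at < min) min = at;
    }
    return min;
  }
}

// The floor stated in the PR: effective_backoff = max(configured, leaseMillis).
function effectiveBackoffMs(configuredMs: number, leaseMillis: number): number {
  return Math.max(configuredMs, leaseMillis);
}
```

The table lives in one process's memory, so each worker only paces the retries it performs itself. This matches the per-worker semantics described above.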

Why this isn't an "outbox" subsystem: drain already provides ordered, at-least-once delivery, retries, dead-lettering, and competing-consumer semantics via SKIP LOCKED. The only missing primitive was inter-attempt timing, which is one knob, not a parallel system.

Test plan

- 12 new tests in `libs/act/test/backoff.spec.ts` cover all 4 strategies, jitter bounds, deferral behavior, success-clears-entry, block-clears-entry, and the no-backoff default
- Full @rotorsoft/act suite passes (544 tests, no regressions)
- Broader suite passes across libs/act, libs/act-sqlite, libs/act-tck, and packages/wolfdesk (1172 tests total)
- Typecheck clean across the workspace
- Biome lint clean
- Wolfdesk MessageAdded reaction wired to exponential+jitter with a flaky-delivery stub; the pacing is observable in dev logs

Docs

- docs/docs/concepts/error-handling.md: new Backoff section with strategy table, per-worker semantics, and the `leaseMillis` floor
- CLAUDE.md: safety-critical one-liner for per-worker pacing
- Memory: project_book_backoff.md book notes for the error-handling and scaling chapters

🤖 Generated with Claude Code

Adds a `backoff` option on reaction handlers that paces inter-attempt timing: fixed, linear, or exponential with optional jitter. Closes the last gap in drain's retry semantics: today the framework re-claims a failed stream on the next cycle (typically within ms), turning transient outages into exhausted retry budgets.

The controller maintains the backoff window in process memory. Deferred streams hold their existing lease via `runDrainCycle`'s claim-but-skip path, so there is no Store contract change and no DB schema change. With N competing workers, retries escalate up to N× faster than configured. This is intentional: per-worker pacing speeds recovery from transient per-worker faults, and poison messages are quarantined sooner.

Wires wolfdesk's MessageAdded delivery reaction to exponential backoff with jitter, with a flaky-delivery stub to make the pacing observable in dev logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Drop the unreachable `dispose()` method and the now-redundant `size === 0` guard inside `scheduleBackoffWake` (the caller already enforces it).
- Inline `gcExpiredBackoff` into the timer callback. The separate method was only called from one place and made coverage harder to read.
- Drop the optional chain on `unref()`. Node's `setTimeout` always returns a `Timeout` with `unref()`, and the optional chain registered as an uncovered branch.
- Add a multi-stream test that puts two streams in the backoff map at different expiries. This forces the callback to iterate both entries and exercises the "delete expired / keep pending" branch.

drain-cycle.ts is now at 100% on statements, lines, functions, and branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
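The wake timer this commit describes can be sketched as a standalone Node.js function. The name `scheduleBackoffWake` comes from the commit. Its signature here, the `drain` callback, and the standalone shape are assumptions for illustration. The callback inlines the expired-entry cleanup: it deletes entries whose time has passed, keeps the pending ones, triggers a drain, and re-arms at the earliest remaining expiry. Calling `unref()` keeps a pending wake from holding the process open.

```typescript
// Hypothetical standalone version of the backoff wake timer (Node.js).
// `backoff` maps stream -> nextAttemptAt (epoch ms); `drain` re-runs a cycle.
function scheduleBackoffWake(
  backoff: Map<string, number>,
  drain: () => void,
  now: () => number = Date.now,
): ReturnType<typeof setTimeout> {
  // The caller guarantees backoff.size > 0, so there is no empty-map guard here.
  let earliest = Number.POSITIVE_INFINITY;
  for (const at of backoff.values()) earliest = Math.min(earliest, at);

  const timer = setTimeout(() => {
    const t = now();
    // Inlined cleanup: drop expired entries, keep the ones still pending.
    // Deleting from a Map while iterating it with for...of is well-defined.
    for (const [stream, at] of backoff) {
      if (at <= t) backoff.delete(stream);
    }
    drain(); // expired streams are eligible for processing again
    if (backoff.size > 0) scheduleBackoffWake(backoff, drain, now);
  }, Math.max(0, earliest - now()));

  // Node's Timeout always has unref(); don't keep the event loop alive for it.
  timer.unref();
  return timer;
}
```

This sketch re-arms from inside the callback and does not track an existing timer handle. A real controller would presumably keep a single handle and clear it before re-arming, so that deferrals added during a drain don't stack timers. That bookkeeping is not shown here.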

github-actions · 2026-05-14T22:15:42Z

🎉 This PR is included in version @rotorsoft/act-v0.41.0 🎉

The release is available on:

@rotorsoft/act-v0.41.0
GitHub release

Your semantic-release bot 📦🚀

rotorsoft and others added 2 commits May 14, 2026 17:46

Rotorsoft merged commit 622e74c into master May 14, 2026
6 checks passed

github-actions Bot added the released label May 14, 2026

Rotorsoft merged 2 commits into master from feat/act-601-backoff
