feat(act): NonRetryableError + app.unblock recovery primitive #736
Merged
…(ACT-604) Adds NonRetryableError class to core; the drain finalizer recognizes it and forces an immediate block when blockOnError is true, regardless of lease.retry. Closes the gap documented in ACT-602 between the helper's "this is permanent" knowledge and the drain pipeline's retry-only classifier.
Core changes (libs/act):
- New NonRetryableError class exported from @rotorsoft/act and @rotorsoft/act/types. The Errors registry gains ERR_NON_RETRYABLE.
- finalize() in internal/reactions.ts gains one branch: block = blockOnError && (nonRetryable || retry >= maxRetries). The operator's blockOnError: false still wins; non-retryable does not override "retry forever."
- 8 integration tests covering: class shape (name, cause, instanceof), first-attempt block with default options, no block when blockOnError: false, plain Error still consuming the retry budget, immediate block ignoring backoff, and the batch handler path.
Helper changes (libs/act-http), breaking at 0.1.0:
- WebhookError split into two classes. WebhookError extends Error for retryable cases (5xx, network, timeout); NonRetryableWebhookError extends NonRetryableError for 4xx. The "retryable: boolean" field is removed; the class itself is the signal.
- webhook() throws the appropriate subclass based on status. The drain finalizer auto-blocks on 4xx via the inherited NonRetryableError marker, so the original ACT-602 acceptance criterion (4xx blocks on first attempt) now holds.
- Webhook tests updated for the new class shape (instanceof checks instead of boolean field reads).
Docs:
- docs/docs/concepts/error-handling.md gains a "Non-retryable errors" section after the webhook one. The webhook section now describes the two-class split; the new section covers NonRetryableError as the general primitive, with the validation-error example.
- libs/act-http/README.md "Behavior" and "Retry & block semantics" tables updated to reflect the class-based signal.
- book/act-602-act-http.md "limitation" section rewritten to point at ACT-604 as the resolution.
- book/act-604-non-retryable.md: new essay on the design decisions (class vs. flag, the blockOnError-respect asymmetry, and why the pattern generalizes to user handlers and other integration helpers).
Total test count: 1521 passed (8 new). 100% statements / 100% branches / 100% functions / 100% lines.
BREAKING CHANGE: @rotorsoft/act-http WebhookError no longer carries a 'retryable' field. Callers checking err.retryable should switch to 'err instanceof NonRetryableWebhookError' (or 'instanceof NonRetryableError' for the framework-general check). The package is at 0.1.0 with no external consumers.
Closes ACT-604.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
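A minimal sketch of the marker class and the new finalizer branch, under simplifying assumptions: `decideBlock`, `BlockInput`, and `ValidationError` are hypothetical names for illustration, and the real `finalize()` in internal/reactions.ts does more than compute this one boolean.

```typescript
// Marker class: a handler throws this (or a subclass) to signal a permanent failure.
// Sketch only; the real class is exported from @rotorsoft/act.
class NonRetryableError extends Error {
  constructor(message: string, options?: { cause?: unknown }) {
    super(message, options);
    this.name = "NonRetryableError";
  }
}

// Hypothetical user error that opts into the marker by subclassing.
class ValidationError extends NonRetryableError {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
  }
}

// Hypothetical inputs to the finalizer's block decision.
type BlockInput = {
  error: unknown;
  retry: number; // lease.retry for the failed attempt
  maxRetries: number;
  blockOnError: boolean;
};

// The branch described above:
//   block = blockOnError && (nonRetryable || retry >= maxRetries)
// blockOnError: false short-circuits, so "retry forever" still wins.
function decideBlock({ error, retry, maxRetries, blockOnError }: BlockInput): boolean {
  const nonRetryable = error instanceof NonRetryableError;
  return blockOnError && (nonRetryable || retry >= maxRetries);
}

// A validation failure blocks on the first attempt (true)...
decideBlock({ error: new ValidationError("bad payload"), retry: 0, maxRetries: 3, blockOnError: true });
// ...while a plain Error keeps consuming the retry budget (false).
decideBlock({ error: new Error("timeout"), retry: 0, maxRetries: 3, blockOnError: true });
```

Checking `instanceof` on a class, rather than reading a boolean field, is what lets subclasses such as the webhook helper's 4xx error inherit the signal without any extra wiring.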
Closes the gap where the only way to clear a blocked stream's flag was
app.reset() — a rebuild-from-zero primitive suited for projection
rebuilds, wrong for "I fixed the bug, please retry from where you
stopped." The gap predates ACT-604 but becomes acute with non-retryable
signaling: streams now block on the first permanent failure, so the
recovery path can't require a full event replay.
Adds Store.unblock(streams) to the port contract (additive — no
breaking change for adapters that don't implement it yet; capability-
gated in the TCK). Implemented across all three in-tree adapters:
- InMemoryStore: new InMemoryStream.unblock() that flips _blocked and
returns whether the stream was actually flipped.
- PostgresStore: single UPDATE with WHERE blocked = true so rowCount
reflects only streams that flipped.
- SqliteStore: transactional UPDATE per stream, mirrors the PG semantics.
All three set retry = -1 (matching the InMemoryStore convention) so the
first post-unblock claim returns retry = 0 ("first attempt"). Storing 0
would make claim's post-bump return 1, mis-reporting the post-recovery
attempt as a continuation of the failed sequence.
Adds Act.unblock(streams) that wraps store().unblock() and arms the
orchestrator's drain flag so a settled app picks up the now-free streams
on the next cycle. Symmetric with the existing Act.reset() wrapper.
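A minimal sketch of the wrapper shape. It assumes the store's `unblock(input)` resolves to the number of streams actually flipped (the TCK's "mixed input counts only the actually-blocked streams" case suggests a count) and models the orchestrator's drain flag as a plain boolean. `MiniApp`, `UnblockStore`, and `needsDrain` are hypothetical names, and arming the flag only when something flipped is an assumption of this sketch.

```typescript
// Hypothetical subset of the filter shape; see StreamFilter in @rotorsoft/act.
type StreamFilter = { stream?: string; blocked?: boolean };

// Assumed port shape: resolves to the number of streams actually flipped.
interface UnblockStore {
  unblock(input: string[] | StreamFilter): Promise<number>;
}

// Hypothetical stand-in for the Act wrapper; the real class tracks more state.
class MiniApp {
  needsDrain = false; // stand-in for the orchestrator's drain flag
  private readonly store: UnblockStore;

  constructor(store: UnblockStore) {
    this.store = store;
  }

  async unblock(input: string[] | StreamFilter): Promise<number> {
    const flipped = await this.store.unblock(input);
    // Arm the drain flag so a settled app picks up the freed streams next cycle.
    if (flipped > 0) this.needsDrain = true;
    return flipped;
  }
}
```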
TCK: new "unblock" describe block with four cases — happy path
(blocked → unblock → claim resumes at preserved watermark, retry = 0),
no-op on unblocked stream, no-op on unknown/empty, mixed input counts
only the actually-blocked streams.
Integration test in non-retryable.spec.ts exercises the full
NonRetryableError → block → unblock → reprocess flow: handler throws
permanent error, drain blocks immediately, app.unblock(streams) clears
the flag, next drain succeeds at the SAME event (not replayed from
zero).
Docs:
- docs/concepts/error-handling.md gains an "unblock" subsection
contrasting it with reset.
- docs/architecture/concurrency-model.md's "block" exit description
updated to mention NonRetryableError and the unblock/reset choice.
- docs/guides/production-checklist.md changes the recovery instruction
from "Unblock with app.reset" to "recover with app.unblock; reset is
for rebuilds."
- libs/act-http/README.md adds a "Recovering a blocked stream"
subsection — important because 4xx blocks are now the common case
and reset would re-fire all historical webhooks.
- book/act-604-non-retryable.md gains a section on the recovery
primitive, including the retry = -1 convention rationale.
Tests: 1556 passed (3 new unblock tests in TCK, 2 new in non-retryable
spec). Coverage 99.95% branches globally — drops from 100% are in
defensive error paths (rowCount ?? 0, rollback) that mirror the
existing untested paths in reset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….blocked_streams
API audit on the back of ACT-604: only three Store-port methods select
streams without per-row data (reset, unblock, prioritize), but reset and
unblock were name-array-only while prioritize was filter-only. Aligns
them so all three accept the same filter shape, and surfaces the bulk
recovery use case that came up around poison-message storms (unblock
every blocked stream in a family in one call).
Signature changes:
- Store.reset(input: string[] | StreamFilter)
- Store.unblock(input: string[] | StreamFilter)
- Store.prioritize(filter: StreamFilter) — type rename only
- Act.reset / Act.unblock follow
New type `StreamFilter` is the canonical name; `PrioritizeFilter` stays
as a non-breaking alias. Identical shape — `Pick<QueryStreams, "stream"
| "stream_exact" | "source" | "source_exact" | "blocked">`.
Filter semantics:
- reset(filter): applies the filter as-is. `reset({ blocked: true })`
rebuilds only blocked streams; `reset({ stream: "^proj-" })` rebuilds
a projection family. Empty filter matches every registered stream
(documented footgun, no runtime block — operators use it sparingly).
- unblock(filter): always forces `blocked = true` regardless of what
the caller passes. There is no use case for "unblock unblocked
streams," so the framework removes that footgun at the boundary. An
explicit `blocked: false` matches nothing.
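The two filter rules can be sketched as predicates over an in-memory row. This assumes `stream` is a regular-expression pattern (as the `^proj-` example suggests) and that `stream_exact` is a boolean switching it to an exact-name match; `source`/`source_exact` are omitted for brevity, and `matches`, `selectForUnblock`, and `StreamRow` are hypothetical names.

```typescript
// Hypothetical subset of StreamFilter (source/source_exact omitted).
type StreamFilter = {
  stream?: string; // regex pattern, or exact name when stream_exact is true (assumed)
  stream_exact?: boolean;
  blocked?: boolean;
};

type StreamRow = { stream: string; blocked: boolean };

// Shared predicate, applied as-is by reset/prioritize. An empty filter matches every row.
function matches(row: StreamRow, filter: StreamFilter): boolean {
  if (filter.stream !== undefined) {
    const hit = filter.stream_exact
      ? row.stream === filter.stream
      : new RegExp(filter.stream).test(row.stream);
    if (!hit) return false;
  }
  if (filter.blocked !== undefined && row.blocked !== filter.blocked) return false;
  return true;
}

// unblock selection: names or filter, always ANDed with blocked === true.
// An explicit { blocked: false } therefore selects nothing.
function selectForUnblock(rows: StreamRow[], input: string[] | StreamFilter): string[] {
  let candidates: StreamRow[];
  if (Array.isArray(input)) {
    const names: string[] = input;
    candidates = rows.filter((r) => names.includes(r.stream));
  } else {
    const filter: StreamFilter = input;
    candidates = rows.filter((r) => matches(r, filter));
  }
  return candidates.filter((r) => r.blocked).map((r) => r.stream);
}
```

ANDing rather than overwriting is what makes an explicit `blocked: false` match nothing; overwriting it with `true` would silently turn the caller's request into its opposite.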
Adapter implementations:
- InMemoryStore: extracted _filterPredicate helper; reused across
reset, unblock, and prioritize.
- PostgresStore: extracted _filterClause helper that returns a WHERE
fragment + parameter values. UPDATE statements compose it with their
fixed set clauses; reset/unblock/prioritize all reuse it.
- SqliteStore: same shape, libSQL-positional placeholders.
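A sketch of the Postgres-side idea: build a WHERE fragment plus positional parameters once, then compose it with each UPDATE's fixed SET clause. `filterClause`, `unblockSql`, `flipped`, the `streams` table, and its column names are hypothetical (lease columns are left out because their names aren't given here); `~` is Postgres's case-sensitive regex operator, and treating `stream_exact` as a boolean is an assumption.

```typescript
// Hypothetical subset of StreamFilter.
type StreamFilter = { stream?: string; stream_exact?: boolean; blocked?: boolean };

type Clause = { where: string; values: unknown[] };

// WHERE fragment with $n placeholders numbered after `offset` existing params.
function filterClause(filter: StreamFilter, offset = 0): Clause {
  const parts: string[] = [];
  const values: unknown[] = [];
  const param = (v: unknown): string => {
    values.push(v);
    return `$${offset + values.length}`;
  };
  if (filter.stream !== undefined) {
    // Regex match by default; plain equality for exact names.
    parts.push(`stream ${filter.stream_exact ? "=" : "~"} ${param(filter.stream)}`);
  }
  if (filter.blocked !== undefined) parts.push(`blocked = ${param(filter.blocked)}`);
  // Empty filter: match every row.
  return { where: parts.length > 0 ? parts.join(" AND ") : "TRUE", values };
}

// unblock composes its fixed SET clause with the shared filter and the blocked guard.
function unblockSql(input: string[] | StreamFilter): { text: string; values: unknown[] } {
  const set = "UPDATE streams SET blocked = false, retry = -1, error = NULL";
  if (Array.isArray(input)) {
    return { text: `${set} WHERE stream = ANY($1) AND blocked = true`, values: [input] };
  }
  const { where, values } = filterClause(input);
  return { text: `${set} WHERE (${where}) AND blocked = true`, values };
}

// The driver's rowCount may be null; treat that as zero streams flipped.
const flipped = (result: { rowCount: number | null }): number => result.rowCount ?? 0;
```

Because the `blocked = true` guard sits in the WHERE clause, the row count only reflects streams that actually flipped. A libSQL variant would emit `?` placeholders instead of `$n`.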
New Act method:
- app.blocked_streams({ after?, limit? }): convenience wrapper around
store().query_streams(cb, { blocked: true, ... }). Returns an array
of StreamPosition for the discover → unblock workflow.
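A minimal sketch of the discovery wrapper, assuming `query_streams` invokes its callback once per matching stream and resolves when done. The `StreamPosition` fields, the `after` cursor type, `QueryStore`, and `blockedStreams` are assumptions for illustration; the real types live in @rotorsoft/act.

```typescript
// Assumed shapes for illustration only.
type StreamPosition = { stream: string; at: number; blocked: boolean };
type StreamQuery = { blocked?: boolean; after?: string; limit?: number };

interface QueryStore {
  // Assumed: invokes cb once per matching stream, resolves when done.
  query_streams(cb: (p: StreamPosition) => void, query: StreamQuery): Promise<unknown>;
}

// Hypothetical free-function version of app.blocked_streams().
async function blockedStreams(
  store: QueryStore,
  opts: { after?: string; limit?: number } = {},
): Promise<StreamPosition[]> {
  const found: StreamPosition[] = [];
  // The wrapper fixes blocked: true; callers only control paging.
  await store.query_streams((p) => {
    found.push(p);
  }, { ...opts, blocked: true });
  return found;
}
```

Feeding the `stream` names from the result into `app.unblock(names)` closes the discover → unblock loop.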
TCK additions:
- "unblock" describe block gains three filter cases (stream-pattern
match, empty-filter family scope, explicit blocked:false matches
nothing).
- New "reset filter form" describe block (pattern match preserves
unmatched watermarks; blocked-only filter restricts the rebuild
scope).
Integration tests in non-retryable.spec.ts add:
- app.unblock(filter) for bulk recovery across a family.
- app.blocked_streams() discovers blocked streams and confirms the
list goes empty after recovery.
Docs:
- docs/concepts/error-handling.md gains examples of all three forms
plus a discovery-first workflow snippet.
- docs/guides/production-checklist.md updated to mention
blocked_streams() as the discovery primitive.
- libs/act-http/README.md "Recovering a blocked stream" shows the
filter form for webhook families.
- book/act-604-non-retryable.md gains "Names or filter" and
"Discovering what's blocked" sections covering the design call.
Tests: 1573 passing (up from 1556). Coverage 99.87% branches globally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Missed in the prior commit due to a stale read; production-checklist.md still referenced 'Unblock with app.reset' which is now wrong (reset rebuilds from zero; unblock is the resume primitive). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…streams section
Three loose-end doc updates after the API audit:
1. CLAUDE.md gains two safety-critical one-liners:
- "Blocked-stream recovery — unblock resumes, reset rebuilds":
covers the load-bearing distinction (don't use reset to clear a
blocked webhook — it'd re-fire every historical event), the
string[] | StreamFilter shape, and pointers at app.blocked_streams
for discovery.
- "Non-retryable errors signal permanent failure": NonRetryableError
as the handler-side block signal, the blockOnError: false respect
asymmetry, and the act-http/webhook NonRetryableWebhookError.
2. docs/architecture/extension-points.md updates the Store interface
reference to include unblock and reflect the string[] | StreamFilter
shape for reset/unblock/prioritize. Adds a one-paragraph note on the
shared filter type and the reset-vs-unblock semantic split.
3. docs/concepts/error-handling.md "Blocked Streams" section rewritten
to (a) describe both paths streams can block (maxRetries exhausted
*or* NonRetryableError on first attempt), (b) point at the new
unblock / reset / blocked_streams recovery surface with anchor
links, instead of the stale "they need an explicit app.reset() (or
external unblock)" wording.
No code changes, no test changes. All existing tests still pass
(1573); lint clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctions-to-database
Three remaining docs referenced app.reset / app.unblock without mentioning that they also accept a StreamFilter. Adds the filter option callout in each:
- architecture/concurrency-model.md "block" exit: shows both forms with the bulk-recovery example and the post-incident "unblock everything blocked" sweep.
- concepts/event-sourcing.md "Projection Rebuild": new paragraph describing the StreamFilter shape (shared with unblock and prioritize) and a forward-link to error-handling.md for the rebuild-vs-recovery distinction.
- guides/projections-to-database.md "Batched replay": multi-projection family-rebuild example via the filter form.
Code examples in each doc keep the array form as the primary illustration; concrete one-name calls read more cleanly than filters in a quickstart context. The prose around them now documents the broader shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the coverage gaps surfaced after the filter-form additions:
- libs/act-pg: three new fault-injection tests in store.error.spec.ts cover the 'rowCount ?? 0' defensive branch on reset(filter), unblock(array), and unblock(filter). Mirrors the existing prioritize test pattern (vi.spyOn pg.Pool.prototype.query → null rowCount).
- libs/act-sqlite: two new rollback-path tests in store.error.spec.ts cover the transaction error handler on unblock (both array and filter forms) via the existing mockClientFailOn fixture.
- libs/act-tck: the 'unblock preserves watermark' test was asserting that 's' wasn't in a subsequent claim() result. When the fixture state left claim() empty (no other claimable streams), the find() callback never ran and registered as uncovered. Switched to a query_streams() check on the blocked flag, which is deterministic and doesn't depend on what else the fixture has lying around.
Coverage: 100% statements / 100% branches / 100% functions / 100% lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 16, 2026
🎉 This PR is included in version @rotorsoft/act-v0.43.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version @rotorsoft/act-http-v0.2.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version @rotorsoft/act-pg-v0.23.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version @rotorsoft/act-sqlite-v0.7.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version @rotorsoft/act-tck-v0.2.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Summary
Closes #735.
Implements ACT-604 plus the operational primitives that came up during implementation and review: the `unblock` recovery method, the `string[] | StreamFilter` union for `reset`/`unblock`, and the `app.blocked_streams()` discovery wrapper.
Three coupled additions, one coherent recovery loop:
- `NonRetryableError`: a handler-signaled "this is permanent, block now" class. The drain finalizer recognizes `error instanceof NonRetryableError` and forces `block = blockOnError` regardless of `lease.retry`. Closes the documented gap in ACT-602 between the webhook helper's "this is permanent" knowledge and the drain pipeline's retry-only classifier.
- `app.unblock(input)` / `Store.unblock(input)`: the operational recovery path. Clears the blocked flag (plus retry, error, and lease) without touching the `at` watermark, so the stream resumes from where it stopped, not from event 0. Required because ACT-604 makes streams block on the first permanent failure; falling back to `app.reset()` would mean re-firing every historical event for a webhook stream blocked on one bad payload.
- `app.blocked_streams({ after?, limit? })`: convenience wrapper around `store().query_streams(cb, { blocked: true })`. Closes the discovery half of the loop ("show me what's broken") before recovery.
Both `reset` and `unblock` accept either an explicit `string[]` of stream names or a `StreamFilter` for bulk operations (the same shape `prioritize` already used, now canonically named `StreamFilter`, with `PrioritizeFilter` retained as an alias). One filter shape across three Store methods; one `_filterPredicate` / `_filterClause` helper per adapter.
Why `unblock` and the filter union ended up in this PR
The Store port already had `Store.block()`, and the framework documented blocked streams as something you "manually unblock," but the only path to clear the flag was `app.reset()`: a rebuild-from-zero primitive suited for projection rebuilds, wrong for poison-message recovery. The gap went unnoticed while streams blocked only after exhausting `maxRetries`. Once non-retryable errors block on the first attempt, the gap becomes load-bearing: recovery from a one-off validation failure can't require replaying the entire stream.
The filter union came out of an API audit: three Store-port methods select streams without per-row data (`reset`, `unblock`, `prioritize`), but only `prioritize` used a filter. After the audit, they share a single shape.
Core changes (`libs/act`)
- `NonRetryableError` exported from `@rotorsoft/act` and `@rotorsoft/act/types`. The `Errors` registry gains `ERR_NON_RETRYABLE`.
- `finalize()` in `internal/reactions.ts` gains one branch: `block = blockOnError && (nonRetryable || retry >= maxRetries)`. The operator's `blockOnError: false` still wins; non-retryable doesn't override "retry forever."
- `Store.unblock(input: string[] | StreamFilter)` added to the port contract. Atomic single-statement UPDATE per adapter, always restricted to `blocked = true`.
- Unblock stores `retry = -1` (matching the InMemoryStore convention) so claim's post-bump returns `retry = 0` for the first post-unblock attempt.
- `Store.reset(input: string[] | StreamFilter)` widened.
- `StreamFilter` added to the type surface; `PrioritizeFilter` is now an alias (no breaking change).
- `Act.unblock(input)` wraps `store().unblock()` and arms the drain flag so settled apps pick up the now-free streams on the next cycle. Symmetric with `Act.reset()`.
- `Act.blocked_streams({ after?, limit? })` for discovery.
Adapter changes
- `InMemoryStore`: extracted `_filterPredicate` helper reused across `reset`, `unblock`, `prioritize`.
- `PostgresStore` (`@rotorsoft/act-pg`): extracted `_filterClause` helper returning a WHERE fragment + parameter values; reused across the same three methods.
- `SqliteStore` (`@rotorsoft/act-sqlite`): same pattern, libSQL-positional placeholders.
Helper changes (`libs/act-http`): breaking at 0.1.0
- `WebhookError` split. `WebhookError extends Error` for retryable cases (5xx, network, timeout); `NonRetryableWebhookError extends NonRetryableError` for 4xx. The `retryable: boolean` field is removed; the class itself is the signal.
- `webhook()` throws the appropriate subclass based on status. The drain finalizer auto-blocks on 4xx via the inherited `NonRetryableError` marker.
Pre-1.0 package; no external consumers.
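The class-based signal can be sketched as a status classifier. This assumes the boundaries stated above (4xx permanent; 5xx, network failures, and timeouts retryable); the `errorForStatus` helper and the `status` fields are hypothetical, and the real `webhook()` in @rotorsoft/act-http may structure this differently and attach more context.

```typescript
// Local copy of the marker so the sketch is self-contained.
class NonRetryableError extends Error {
  constructor(message: string, options?: { cause?: unknown }) {
    super(message, options);
    this.name = "NonRetryableError";
  }
}

// Retryable: 5xx, and (without a status) network failures and timeouts.
class WebhookError extends Error {
  readonly status: number | undefined;
  constructor(message: string, status?: number) {
    super(message);
    this.name = "WebhookError";
    this.status = status;
  }
}

// Permanent: 4xx. Inherits the marker the drain finalizer checks.
class NonRetryableWebhookError extends NonRetryableError {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "NonRetryableWebhookError";
    this.status = status;
  }
}

// Hypothetical helper: choose the error class from an HTTP status.
// Assumption: any non-2xx status outside 4xx is treated as retryable.
function errorForStatus(status: number): Error | undefined {
  if (status >= 200 && status < 300) return undefined; // success: nothing to throw
  if (status >= 400 && status < 500) {
    return new NonRetryableWebhookError(`webhook rejected with ${status}`, status);
  }
  return new WebhookError(`webhook failed with ${status}`, status);
}
```

Callers that previously read `err.retryable` check `err instanceof NonRetryableWebhookError` (or `instanceof NonRetryableError`) instead.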
TCK additions
- `unblock` describe block with happy-path, no-op, mixed-input, and filter-form (stream pattern, empty filter, explicit `blocked: false`) cases.
- `reset filter form` describe block with pattern-match and blocked-only filter cases.
Docs (final pass)
- `CLAUDE.md`: two new safety-critical one-liners covering the `unblock` vs `reset` distinction and `NonRetryableError` semantics with the `blockOnError: false` asymmetry.
- `docs/concepts/error-handling.md`: new "Non-retryable errors", "Recovering a blocked stream — `app.unblock`" (array + filter forms + comparison table), and "Discovering blocked streams — `app.blocked_streams()`" sections. The "Blocked Streams" section is rewritten to cover both block paths.
- `docs/architecture/extension-points.md`: Store interface listing updated to twelve methods; the shared `StreamFilter` type and the reset-vs-unblock split called out.
- `docs/architecture/concurrency-model.md`: "block" exit description mentions `NonRetryableError` and both forms (array + filter) of recovery.
- `docs/concepts/event-sourcing.md`: projection rebuild section mentions the filter form and forward-links to `unblock` for the rebuild-vs-recovery distinction.
- `docs/guides/projections-to-database.md`: bulk family-rebuild example via the filter form.
- `docs/guides/production-checklist.md`: `blocked_streams` → `unblock(filter)` workflow as the recovery prescription.
- `libs/act-http/README.md`: Behavior + Retry/block tables rewritten around the two-class split; "Recovering a blocked stream" with the family-unblock filter example and `blocked_streams` discovery snippet.
- `book/act-602-act-http.md`: "4xx limitation" rewritten to point at ACT-604 as the resolution.
- `book/act-604-non-retryable.md`: new essay (~200 lines) covering the class-vs-flag design, the `blockOnError` asymmetry, the `retry = -1` storage convention, the unblock recovery primitive, the names-or-filter API audit, and the three-primitive recovery loop.
Test coverage
1583 tests passing total (up from 1513).
- `NonRetryableError` tests in `libs/act/test/non-retryable.spec.ts` (class shape, drain integration, unblock recovery flow including filter-form bulk recovery and `blocked_streams` discovery).
- TCK: `unblock` and `reset` filter forms (run against InMemory, PG, SQLite).
- act-pg fault-injection tests for the `rowCount ?? 0` branches.
- act-sqlite rollback tests for `unblock` transaction error handling.
Coverage: 100% statements / 100% branches / 100% functions / 100% lines.
Stability charter impact
All additive to charter-covered surface:
- `NonRetryableError`: new exported class on `@rotorsoft/act`.
- `Store.unblock`: new method on the `Store` interface; capability-gated in the TCK so existing adapters keep passing.
- `StreamFilter`: new exported type; `PrioritizeFilter` retained as alias.
- `Act.unblock`, `Act.blocked_streams`: new public methods.
- `Store.reset` / `Act.reset` / `Store.prioritize` / `Act.prioritize`: signature widening (`string[]` → `string[] | StreamFilter`, `PrioritizeFilter` → `StreamFilter`), backwards-compatible at the call site.
No removals, no renames, no narrowed types. The `WebhookError` change in `act-http` is breaking, but the package is at `0.1.0` (pre-1.0) and one release old.
Test plan
- `pnpm test`: 1583 passed, 100% coverage on every metric
- `pnpm typecheck`: clean
- `pnpm lint`: clean (only pre-existing warnings)
- TCK `unblock` + `reset filter form` blocks pass against InMemory, act-pg, act-sqlite
- `NonRetryableError` → block → `unblock(filter)` → reprocess flow
- `rowCount ?? 0` defensive branches and SQLite rollback paths
- `@rotorsoft/act@X.Y.0` (minor, additive) and `@rotorsoft/act-http@0.2.0` (minor with breaking change, but pre-1.0 conventional-commits → minor bump per `.releaserc`)
Follow-ups (parked)
- `Retry-After` header parsing in `webhook` (parked in ACT-604 open questions).
- `shouldBlock(error): boolean` predicate (parked).
- `NonRetryableError` from action handlers: different code path, separate design.
🤖 Generated with Claude Code