Feat/durable workflows#589
Conversation
Step 1 of the durability roadmap. Adds the append-only step log surface
to the RunStore contract; the engine doesn't write to it yet — that lands
with the replay engine (step 2). Establishes the contract now so adapter
authors and the replay path can share the same types.
- types.ts: new StepKind, StepAttempt, StepRecord, LogConflictError. The
StepKind union is declared with the full v1 set ('agent', 'approval',
'nested-workflow', 'step', 'sleep', 'now', 'uuid', 'signal') even
though only the first three are produced by today's engine, so adapter
implementations don't need to evolve across the upcoming step commits.
- RunStore renamed get/set/delete -> getRunState/setRunState/deleteRun
and gained appendStep + getSteps. appendStep is optimistic-CAS: an
expectedNextIndex parameter the store must atomically check against
the current log length, throwing LogConflictError on mismatch.
- inMemoryRunStore migrated, stepLogs Map added, snapshots returned by
getSteps are defensive copies. Index field on appended records is
normalized to the actual position so callers can't desynchronize the
log by passing a stale value.
- Engine call sites renamed (no semantic change). Smoke test updated to
use the new method names.
- New tests/in-memory-store.test.ts (10 tests) pins the store contract:
round-trip, deletion sweeps log, ordered append + read, index
normalization, LogConflictError on mismatch, LogConflictError carries
the existing record (for engine-side dedup), defensive snapshots,
per-run log isolation.
Step 2 of the durability roadmap. Builds on the split state/log store
from step 1 to make runs recoverable from the persisted log alone, with
no in-memory generator handle required.
- runWorkflow's resume path branches:
* Fast path (single-node, no restart): if runStore.getLive(runId)
returns a LiveRun, drive it forward as before. seedValue (the
approval payload) is sent directly into the in-memory generator on
its first next() call. Behavior unchanged from prior implementation
for this case.
* Replay path (process restart, multi-node, expired LiveRun): load
RunState + step log from the store. Rebuild fresh state by calling
workflow.initialize() again (deterministic by contract) — the
persisted state snapshot is the materialized view but the log is
authoritative, so re-running user code against the log restores
state correctness. Construct a brand-new generator and drive it
through the log step-by-step, short-circuiting each yielded
descriptor with its recorded result without emitting client-facing
events. After replay catches up, consume seedValue as the result
for the descriptor that was awaiting at pause time (currently
approval-only; signal generalization lands in step 5).
- Live execution now appends a StepRecord to the log BEFORE emitting
STEP_FINISHED (Q6: at-most-once observable). Three step kinds today:
agent, nested-workflow, approval. Step kinds 'step', 'sleep', 'now',
'uuid', 'signal' are declared on the type but not yet emitted —
they land in steps 4 and 5.
- Failed agent steps still throw into user code, but the failure is
now persisted as a StepRecord with an `error` field. On replay the
engine re-runs user code which re-encounters the persisted error,
reproducing the original throw path; user-side try/catch logic
replays identically.
- Custom events emitted via the workflow's emit() during replay are
silently discarded (they were already on the wire in the original
run). Live phase drains them as before.
- State diffs are still computed every iteration so the local prevState
reference stays in sync, but only emitted to the wire in live mode.
- New tests/engine.durability.test.ts (5 tests) pins:
* Per-completed-agent log appends.
* Log contains all pre-pause agent results.
* Resume-after-restart replays + completes the run (live handle
stripped to simulate a fresh process).
* Replay short-circuits agent re-execution (echoCallCount stays at 1
despite a phase-1 run + a phase-2 replay-resume).
* run_lost when neither the live handle nor RunState exists.
Step 3 of the durability roadmap. Adds a stable fingerprint of the
workflow definition (run + initialize + each agent's run, walked
recursively through nested workflows) and persists it on RunState at
run start. On replay-from-store resume, the engine compares the
persisted fingerprint to the currently-loaded definition's; on mismatch
it emits RUN_ERROR { code: 'workflow_version_mismatch' } and refuses
to drive a generator against a log whose positional indices may no
longer match.
- New engine/fingerprint.ts: walks workflow.run, workflow.initialize,
and every entry of workflow.agents (recursively for nested
workflows). Sources come from Function.prototype.toString(), so the
fingerprint is sensitive to whitespace and minification — the
conservative choice that Temporal makes for the same reason. Hash is
a 64-bit FNV-1a rendered as base36; no crypto dep.
- RunState gains an optional `fingerprint?: string` field.
- runWorkflow's startRun computes and stores it. The replay branch of
resumeRun re-computes against the loaded workflow and compares;
fast-path (in-memory) resume skips the check because it's reusing
the same generator object that's tied to the same closure.
- Two new tests in engine.durability.test.ts pin the contract:
emits workflow_version_mismatch when source differs across phases,
resumes normally when source is unchanged.
Recovery story: drain-then-deploy. Operators refuse new runs until
in-flight ones complete, then deploy. A Temporal-style `patched()`
escape hatch for hot-fixing mid-flight runs is documented as a v2
follow-up; out of scope here.
Step 4 of the durability roadmap. Adds three new yieldable primitives for user code to safely express side effects, time, and randomness without breaking replay determinism. - `yield* step(name, fn)` — run a side-effecting fn once, persist the result, replay short-circuits to the recorded value. fn receives a StepContext with a deterministic `ctx.id` for use as an idempotency key with external systems (e.g., Stripe Idempotency-Key, GitHub X-GitHub-Delivery). Errors are persisted as log records with an `error` field; on replay the recorded error is reconstructed and re-thrown into the generator so user-side try/catch replays identically. - `yield* now()` — Date.now() once, recorded, replays return the same value. Replaces ad-hoc Date.now() inside workflow code, which would produce different values across replays. - `yield* uuid()` — globalThis.crypto.randomUUID() once, recorded. Same pattern for cross-system correlation IDs. Engine handling: - New dispatch branches in driveLoop for 'step', 'now', 'uuid'. STEP_STARTED/STEP_FINISHED are emitted for `step` (visible work), not for `now`/`uuid` (cheap deterministic values — emitting events for them would clutter the WorkflowTimeline UI). - ctx.id derives from runId + logLength (e.g., 'run-abc:step-3'), which is stable across replays because logLength on replay matches the position in the persisted log. - StepKind union (declared in step 1) already included 'step', 'now', 'uuid', so no type churn for adapter authors. Also fixes a pre-existing generator.throw double-advance bug in the agent error path: `generator.throw(err)` advances to the next yield, and the loop's `continue` was calling `.next()` again, advancing past it. Now the loop carries a `pendingResult` slot that the error paths populate, so the next iteration reuses the already-yielded descriptor instead of pulling a fresh one. Same fix applies to the new `step` error path and the replay error rethrow path. stepStartedEvent's stepType union widened to include 'step' and 'signal' (signal lands in step 5). New tests/engine.primitives.test.ts (6 tests): step runs fn once, ctx.id is deterministic, replay does NOT re-execute fn, replay re-throws persisted errors so try/catch replays identically, now() / uuid() record once and replay sees the same value.
Step 5 of the durability roadmap. Adds the signal primitive that makes
pause/resume open-ended: hosts can integrate arbitrary external systems
(webhooks, queue messages, manual ops triggers, scheduled timers) as
resume sources without the engine knowing what each one means. Sleep
is built on top as a typed wrapper.
User-facing primitives:
- `yield* waitForSignal<T>(name, options?)` — generic durable pause.
Returns the payload the host delivered for `name`. options carries
an optional `deadline` (UTC ms, surfaced on waitingFor for time-
indexed wake jobs) and `meta` (free-form, opaque to the engine — UI
rendering hint).
- `yield* sleepUntil(timestamp)` / `yield* sleep(ms)` — sugar over
waitForSignal with the reserved '__timer' signal name. Past-deadline
wakes resolve immediately on delivery (no "skipped sleep").
- TIMER_SIGNAL_NAME exported as the canonical reserved name for hosts
implementing timer schedulers.
Engine:
- New StepDescriptor variant `{ kind: 'signal', name, deadline?, meta? }`.
- driveLoop dispatch for 'signal' pauses the run identically to
approval — persists state with the new `RunState.waitingFor: {
signalName, deadline?, meta? }` field (Q5 iii pull-discovery), emits
a CUSTOM `run.paused` event on the SSE stream (Q5 iii push-
discovery), and ends the response. The pre-existing approval pause
is left untouched; approve() will fold into waitForSignal in a
separate refactor.
- runWorkflow accepts a new `signalDelivery: SignalResult` option
alongside the legacy `approval` option. resumeRun derives a shared
seedPayload from whichever is set; signalDelivery wins when both
appear.
- driveLoop's post-replay seed-consumption block handles both the
'approval' and 'signal' descriptors so a paused-on-signal run
resumes correctly through the replay path after a process restart.
- The in-memory fast-resume's "close the dangling STEP_STARTED" path
marshals the payload based on the persisted waitingFor.signalName,
so signal pauses don't pretend to be approvals.
- parseWorkflowRequest surfaces `body.signal` (the wire-format key)
onto WorkflowRequestParams.signalDelivery.
New tests/engine.signals.test.ts (5 tests): pause sets waitingFor +
emits run.paused + closes the SSE, in-memory resume delivers the
payload, replay resume after restart delivers the same payload from
the log, sleepUntil pauses on the __timer signal with deadline
populated, sleep(ms) resumes when the host delivers a __timer signal.
Step 6 of the durability roadmap. Adds the third entrypoint to
`runWorkflow` — `attach: true` — so fresh subscribers (browser tab
refresh, shared run links, mobile reconnect) can read a snapshot of
an in-flight run without driving it forward. Without this, durability
is half-built — the engine survives restart but the UI can't catch up.
Engine:
- New attachRun() in engine/run-workflow.ts. Emits a synthetic
RUN_STARTED + STATE_SNAPSHOT + 'steps-snapshot' CUSTOM event
(carrying every completed step record with {index, kind, name,
result, error, startedAt, finishedAt}) from the persisted log.
After the snapshot:
finished/error/aborted -> emit the terminal event and end
paused -> emit run.paused with the waitingFor descriptor and end
running -> emit run.current-status hint and end (tailing live
events on a different node is the publisher hook's job — step 7)
- runWorkflow accepts `attach?: boolean` on RunWorkflowOptions; the
top-level dispatcher routes runId+attach to attachRun(), runId+
approval/signalDelivery to resumeRun(), and input to startRun() as
before.
Client (ai-client/workflow-client.ts):
- WorkflowClientState gains `pendingSignal: WorkflowSignalWait | null`
for non-approval pauses (waitForSignal, sleep). WorkflowStep's
stepType union widened to include 'step' and 'signal'.
- New methods:
attach(runId) — reset local state, post {attach:true,
runId}, rebuild from snapshot
signal(name, payload, opts) — generic signal delivery
- handleChunk consumes the 'steps-snapshot' CUSTOM event (rebuilds the
steps array; synthetic stepId `snapshot:N` so subsequent
STEP_FINISHED events can match by stepId) and 'run.paused' (sets
pendingSignal for non-approval signals; approval signals continue to
flow through the existing approval-requested event for back-compat
with the demo UI).
React (ai-react/use-workflow.ts):
- useWorkflow / useOrchestration return gains `attach` and `signal`
methods alongside `approve`, `start`, `stop`. `pendingSignal` is on
the state via the spread.
New tests/engine.attach.test.ts (3 tests): paused run attach emits
state+steps snapshots and the pause descriptor without finishing;
finished runs return run_lost (the engine deletes hot state on finish
— hosts that want post-mortem attach should archive via the publisher
hook); unknown runId returns run_lost.
34 engine tests pass. Build + lint + typecheck clean across the three
touched packages and the example.
Step 7 of the durability roadmap. Adds the optional `publish(runId, event)` callback to `runWorkflow` — the host's seam for fanning out engine events to subscribers on other nodes (Redis pub/sub, NATS, EventBridge, etc.). Contract: - Every event the engine emits is passed to `publish` before being yielded to the local SSE consumer. The runId is plumbed through — for start paths it's captured from the first RUN_STARTED chunk so the publisher always sees a stable key. - Errors thrown by `publish` are caught and silently swallowed. A misbehaving publisher must never break the run; the SSE consumer still receives every event regardless. - The hook is optional. Single-node deployments ignore it and get the in-process behavior unchanged. Implementation: - `runWorkflow` reworked so the top-level dispatcher (start / resume / attach) is wrapped in an outer iterator that calls `publish` before each yield. The actual dispatch lives in an inner async generator; the wrapper is responsible only for runId tracking and publisher invocation. New tests/engine.publisher.test.ts (3 tests): publisher sees every lifecycle event with the right runId; throwing publisher does not prevent the run from finishing; the run.paused event reaches the publisher (so an out-of-process worker subscribed to the publisher can register the wake even when the SSE response has closed). 37 engine tests pass.
…signalId idempotency
Step 8 of the durability roadmap. Adds the entry-point idempotency
contract: callers pass stable IDs for `start` and `signalDelivery`;
the engine treats duplicates as retries instead of corrupting state.
Engine (ai-orchestration):
- startRun: if `options.runId` is provided AND a run already exists at
that id, two branches:
* fingerprint matches -> idempotent retry; engine serves the
existing run as an attach snapshot so the caller gets a
consistent event envelope regardless of whether they hit a
fresh start or a retry
* fingerprint differs -> RUN_ERROR { code: 'run_id_conflict' }
Auto-generated runIds (no `options.runId` passed) skip the check —
collision rate is negligible and one store read per fresh start
isn't worth paying.
- DriveLoopArgs gains `seedSignalId?: string`. resumeRun propagates
`options.signalDelivery.signalId` into the driveLoop seed args, and
driveLoop records it on the resulting approval/signal step record.
- Pre-loop in-memory fast-resume now ALSO appends the resolved
approval/signal to the log before emitting the closing STEP_FINISHED
(the original step-2 implementation skipped this, leaving a gap
where the log would re-enter the pause on next replay). Fixes the
multi-signal interim-state read pattern.
Client (ai-client):
- WorkflowClient.start gains an optional `{ runId? }` opts arg. When
omitted, the client lib auto-generates `run_${crypto.randomUUID()}`
before posting — so double-submits and network retries collapse
onto the same run on the server.
React (ai-react):
- useWorkflow's start type widened to accept the optional
`{ runId? }`.
New tests/engine.idempotency.test.ts (5 tests): client-provided runId
is used verbatim; duplicate start with same id + fingerprint serves
attach snapshot; duplicate start with different fingerprint returns
run_id_conflict; signalDelivery.signalId is persisted on the resulting
log record; multi-signal run shows signalId stamped on the interim
log entry between phases.
42 engine tests pass. ai-client + ai-react typecheck clean.
…gnal_lost
Step 9 of the durability roadmap. With client-provided signalIds in
place (step 8), the engine now classifies CAS conflicts on
appendStep:
- **idempotent** (same signalId at the same index) — second writer
observes the first writer's record, the engine treats the append
as a no-op, and the recorded result becomes the value sent into
the generator. Matches the retry contract: client lib's stable
signalId means double-submits collapse onto the same logical
delivery.
- **lost** (different signalId at the same index) — second writer
loses the race; engine emits `RUN_ERROR { code: 'signal_lost' }`
carrying the winner's signalId so the host can compensate or
retry with a different id.
- **appended** — uncontested; normal path.
Implementation:
- `tryAppendStep` returns a discriminated AppendOutcome. Non-
LogConflictError errors from the store still rethrow.
- Signal/approval append sites consume the outcome explicitly: lost
-> emit signal_lost and return; idempotent -> use existing record's
result as the seed value going into the generator; appended ->
continue normally.
- Non-signal append sites (agent, step, nested-workflow, now, uuid)
use a thin `appendStep` helper that throws on any non-appended
outcome. A non-signal conflict means two engines are driving the
same generator — programmer error under v1's single-writer-per-
run contract — and gets surfaced via the outer try/catch as
RUN_ERROR { code: 'error' }.
- The in-memory fast-resume path (where the engine closes a
dangling STEP_FINISHED on the same node) also uses the outcome
classification, so a same-node retry of an approval delivered
twice is idempotent rather than corrupting the log.
New tests/engine.cas.test.ts (3 tests): same-signalId retry through
the live-handle path is idempotent and the run finishes; same-
signalId retry through the replay path is idempotent; pre-populating
the log with a different-signalId 'winner' record causes a later
delivery to fall through the replay short-circuit (winner's payload
flows to user code).
45 engine tests pass.
Step 10 of the durability roadmap. Lets user code opt step()
invocations into a retry loop without rolling their own
try/catch/sleep ladder, and lets workflows set a coarse default.
API:
- StepRetryOptions: { maxAttempts, backoff?, baseMs?, shouldRetry? }
backoff: 'exponential' (default) | 'fixed' | (attempt) => ms
baseMs: 500ms default for built-in strategies
shouldRetry: predicate to abort retries early per error
- step(name, fn, { retry }) — per-call policy
- defineWorkflow({ defaultStepRetry }) — workflow-level fallback;
per-step config overrides
Engine:
- 'step' dispatch wraps fn() in a retry loop. Each attempt's
{startedAt, finishedAt, result|error} is captured on a local
`attempts` array. On terminal success the array is written to the
StepRecord only when there were 2+ attempts (no retry noise on the
happy path); on terminal failure the array is always written.
- StepContext gains `attempt: number` (1-indexed) so retry-aware
step fns can widen timeouts or adjust input on later tries.
- Backoff uses in-process setTimeout — durable across yields but
not across process restart, a documented v1 limitation. Long-tail
retries that need full durability should use `yield* sleep(...)`
in user code. The backoff timer respects the run's AbortController
so a Ctrl+C / abort mid-retry-wait terminates cleanly instead of
hanging out the full delay.
Types exported: StepRetryOptions, StepAttempt, StepOptions.
New tests/engine.retry.test.ts (6 tests): retries up to maxAttempts
with each attempt captured on the log record; first-attempt success
leaves attempts undefined (happy-path is unchanged); shouldRetry
predicate aborts retries early; exhausted retries throw the last
error into user code (try/catch sees it); workflow-level
defaultStepRetry applies when the step omits its own retry;
per-step retry overrides the workflow default.
51 engine tests pass.
…igration
Follow-up 1 of the durability roadmap. Adds Temporal-style mid-flight
workflow migration. Lets in-flight runs survive code-body changes by
branching user code on a recorded patch flag.
User-facing API:
defineWorkflow({
name: 'pipeline',
patches: ['add-auth-check', 'use-fastify'], // declarative list
run: async function* () {
if (yield* patched('add-auth-check')) {
// new behavior
} else {
// old behavior, kept for runs started before the patch
}
},
})
Semantics:
- `patched(name)` returns true iff name is in RunState.startingPatches.
startingPatches is captured at run-start time from workflow.patches.
- Old runs (started before the patch was added) get FALSE — their
startingPatches doesn't include the new name. New runs get TRUE.
- `patched()` is deterministic from RunState; no log entry needed.
Replay produces identical answers without engine bookkeeping.
Fingerprint switch:
- Workflows that declare `patches` opt INTO patch-versioned
fingerprint mode. The fingerprint covers only name + sorted patch
list — code-body changes no longer trigger
workflow_version_mismatch on resume.
- The integrity check becomes: run.startingPatches must be a SUBSET
of current workflow.patches. Adding patches across deploys is
fine; removing a patch while runs are in flight surfaces
RUN_ERROR { code: 'workflow_patches_removed' } (the runs that
gated their old path on that patch would lose the path).
- Workflows WITHOUT `patches` keep the strict source-hash fingerprint
(unchanged).
Other plumbing:
- WorkflowDefinition.version (optional, caller-supplied identifier).
Pairs with selectWorkflowVersion (follow-up 3) for hosts running
multiple versions side-by-side.
- RunState.workflowVersion / startingPatches persisted at start.
- Seed-consumption logic now falls through cleanly for non-pause
descriptors that appear post-replay (patched/now/uuid). Previously
it errored with resume_mismatch because deterministic primitives
don't write to the log and re-yield on replay even when a seed is
waiting.
New tests/engine.patched.test.ts (5 tests):
- patched returns true when the workflow declares the patch
- patched returns false when not declared
- old runs see false for newly-added patches (real migration
scenario across a deploy)
- removing a patch while runs are in flight surfaces
workflow_patches_removed
- code-body changes WITHOUT touching patches list resume cleanly
56 engine tests pass.
Follow-up 2 of the durability roadmap. Adds per-attempt timeout to
step() so user code can opt out of letting a slow upstream hang the
whole workflow.
API:
yield* step('charge', async (ctx) => {
const r = await fetch('/charges', { signal: ctx.signal })
return r.json()
}, { timeout: 30_000, retry: { maxAttempts: 3 } })
Semantics:
- StepContext gains `signal: AbortSignal`. The engine creates a
per-attempt AbortController that aborts when (a) the step's
timeout elapses, or (b) the run as a whole is aborted (Ctrl+C /
WorkflowClient.stop). User fns wire this into their fetch/axios/
db client for cooperative cancellation.
- Timeouts compose with retry: each attempt gets a fresh timer; the
retry policy's shouldRetry sees a `StepTimeoutError` and either
retries (default) or fails fast.
- The engine races the user fn against an abort-driven rejection,
so unresponsive code that ignores ctx.signal still surfaces as
StepTimeoutError rather than hanging — at the cost of leaving the
fn's microtask running until it naturally completes (a "live
zombie" — documented in the timeout option docblock as the
user-policed half of the contract).
- StepTimeoutError is exported alongside SchemaValidationError and
LogConflictError so retry predicates can do
`shouldRetry: err => !(err instanceof StepTimeoutError)`.
The retry loop's existing per-attempt structure (from step 10) does
the heavy lifting — this commit just adds the timeout race and signal
plumbing inside each attempt.
New tests/engine.timeout.test.ts (5 tests): timeout fires
StepTimeoutError when fn exceeds the budget; ctx.signal fires so
cooperative fns can bail; each retry attempt gets a fresh timeout
and exhausted retries surface the last error; fast-enough fns
proceed normally; the shouldRetry predicate can fail-fast on
StepTimeoutError to avoid retrying an overloaded upstream.
61 engine tests pass.
Follow-up 3 of the durability roadmap. Adds two helpers for hosts
running multiple versions of the same workflow side-by-side
(typically: in-flight runs on v1 while new starts use v2).
API:
// Free function — minimal surface.
const wf = await selectWorkflowVersion(
[pipelineV1, pipelineV2],
runId,
runStore,
) ?? pipelineV2 // host picks the fallback for fresh starts
// Or use the registry for a stateful collection.
const registry = createWorkflowRegistry({ default: pipelineV2 })
registry.add(pipelineV1)
registry.add(pipelineV2)
const wf = await registry.forRun(runId, runStore)
runWorkflow({ workflow: wf, runId, ... })
Routing rules:
- Exact match by (workflowName, workflowVersion) on the run's
persisted state.
- Legacy fallback for pre-versioning runs (no workflowVersion
persisted): match the first definition whose name matches AND
which declares no `version` itself.
- Otherwise undefined; the registry then returns its configured
`default`, or the free function returns undefined for the host to
decide.
The registry also rejects duplicate (name, version) registrations
to catch accidental double-adds during init.
Pairs naturally with the `patches` + `patched()` work from follow-up
1: a v2 workflow declares the new patches, old in-flight runs route
to v1 code via the registry, future runs route to v2 — fingerprint
guard passes because patch-versioned mode tolerates body changes.
New tests/registry.test.ts (7 tests): exact-match routing,
no-match returns undefined, legacy unversioned fallback, registry
rejects duplicate versions, registry routes runs correctly,
registry default kicks in when no version matches, full end-to-end
migration scenario (v1 in-flight, v2 deployed, registry routes
correctly so v1 code runs for v1 runs).
68 engine tests pass.
Round 1 cr-loop returned ~21 bucket-(a) findings across 7 agents on
the durability roadmap diff. Per user direction, this commit fixes
the 8 highest-impact items that produce silent corruption or wire-
format breakage; the rest are filed as a follow-up.
1. fingerprint: FNV-1a hash math was effectively 32-bit. Init used
16-bit halves of the offset basis instead of the canonical
32-bit halves; per-char mixing folded both bytes into hLo so the
high accumulator never directly absorbed input; the multiply
approximation dropped the carry term that diffuses bytes across
the halves. Now uses canonical 0xcbf29ce4 / 0x84222325 init,
TextEncoder for UTF-8 byte-at-a-time mixing, and an exact
carry-aware modular multiply via Math.imul on 16-bit halves of
hLo. Workflow fingerprints now actually have the dispersion the
design contract claims.
2. WorkflowClient.stop(): openStream returns an AsyncIterable whose
underlying request doesn't fire until something pulls from the
generator. The prior code threw the result away, so abort POSTs
never left the client. Now consumes the stream via consumeStream
(fire-and-forget, with a catch to swallow network errors — the
local state already flipped to 'aborted'). Run cancellation
actually reaches the server.
3. patched() replay desync: patched() didn't append a log entry,
but the replay short-circuit is purely positional. A workflow
yielding `yield* patched('x')` then `yield* agents.foo(...)`
would, on replay, pop the agent record and feed its result back
into the patched yield — corrupting the boolean and downstream
branching, then re-executing the agent live. Now patched
appends a tiny log entry so positional replay stays aligned.
No STEP_STARTED/STEP_FINISHED — still no timeline noise.
4. seedConsumed-with-undefined-payload: hasSeed was computed as
`seedValue !== undefined`, but a resume call WITH `payload:
undefined` (legitimate for sleep wakes, `waitForSignal<void>`)
was treated as "no seed". On the replay path the engine then
re-paused on the same signal indefinitely. Now hasSeed is
derived from whether the caller supplied signalDelivery or
approval, threaded through DriveLoopArgs as an explicit field.
Sleep wakes through the replay path work.
5. legacy-approval idempotency: tryAppendStep classified CAS
conflicts as 'idempotent' only when both records carried the
same explicit signalId. Approval pauses via the legacy
approve() primitive don't carry a signalId — so every retry
collapsed to 'lost', emitting signal_lost on what should be an
idempotent re-delivery. The classifier now also treats the
"both records lack signalId AND share kind='approval' + name"
case as idempotent, restoring the contract for the most common
pause kind.
6. start fingerprint-bypass on legacy runs: the duplicate-runId
check used `existing.fingerprint && existing.fingerprint !==
fingerprint`, which short-circuits to false when the persisted
record has no fingerprint (legacy runs, torn writes). That
silently routed any duplicate to the attach path, regardless of
whether the workflow had drifted. Now an absent persisted
fingerprint surfaces run_id_conflict with a clear "cannot
verify workflow identity, use attach:true explicitly" message.
7. signal_lost message format: `signalId=""` (empty string) when
the winning record was unsigned was an ugly leak of internal
state. Now reads `(unsigned)` when absent. The earlier review
finding suggested marking the run as errored on signal_lost,
but that's incorrect — `lost` means *this caller's delivery*
lost; the run itself is still alive via the winning delivery.
REFUTED at verification.
8. mergeStateDefaults silent failures: the function silently
returned unvalidated `initial` when the schema validated
asynchronously OR returned `issues`. Async schemas had their
defaults silently dropped; invalid state silently slipped past
the user's contract. Now throws with a clear error for async
schemas (out of scope for v1) and surfaces issue summaries with
path + message for sync-validation failures.
9. (bonus, no separate item) nested workflows didn't propagate
the parent's publish hook to their own runWorkflow call. Multi-
node attached subscribers never saw nested-run events on the
transport. Now plumbed through DriveLoopArgs.publish and onto
the nested call so the nested wrapper publishes under the
nested runId.
Deferred (round 2 / future CR):
- attach to paused-approval run never populates pendingApproval
- now/uuid leak into steps-snapshot (UI noise, not corruption)
- snapshot:N stepId vs live step_... ID mismatch
- replay throw reconstruction loses Error class
- inMemoryRunStore TTL doesn't respect sleep deadlines
- attachRun doesn't fingerprint-check
- step retry abort/timeout discrimination inverted
- broken tests (cas misnamed, idempotency no-op, timeout dead
callCount, durability no-op)
- replay→live STATE_DELTA duplication
- WorkflowDefinition.run type is AsyncGenerator but primitives are
Generator
- parseWorkflowRequest body shape validation
- useWorkflow connection captured at construction
68 engine tests still pass. Typecheck + lint clean across
ai-orchestration, ai-client.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 Changeset Version Preview22 package(s) bumped directly, 8 bumped as dependents. 🟩 Patch bumps
|
|
| Command | Status | Duration | Result |
|---|---|---|---|
nx run-many --targets=build --exclude=examples/... |
❌ Failed | 31s | View ↗ |
☁️ Nx Cloud last updated this comment at 2026-05-20 11:53:39 UTC
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-orchestration
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
commit: |
Add a getting-started guide walking through agents, state, approvals, durable primitives, server-side execution, and the React useWorkflow hook. Expand the package README with a runnable example and link to the guide. Register the new page in the docs site nav. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5486c02
into
worktree-cryptic-singing-wadler
PR #589 added new primitives (step, sleep, waitForSignal, now, uuid, patched) plus engine rewrites that hit the strictness regime added by #564: - Add eslint-disable lines on every legitimate `as unknown as <Type>` cast where the engine resumes a generator with a typed value via `gen.next(value)` — same pattern already used in approve.ts. Covers the new primitives and the two run-workflow.ts cast sites (RUN_STARTED runId extraction + LiveRun.generator placeholder). - Add `override` modifier to StepTimeoutError.name and LogConflictError.name so noImplicitOverride accepts the Error.name shadow.
Reconnaissance found three categories of violations across all 13 test files merged via #589: - 66 `workflow: wf as any` casts in `runWorkflow({...})` calls — defensive; type-checked clean without them. - 17 `(store as unknown as { getLive }).getLive = (...) => undefined` stubs to force the engine into replay-from-log mode. - 17 `events.find(...) as unknown as { output: {...} } | undefined` casts to read the typed output off RUN_FINISHED. Cleanup: - New tests/test-utils.ts with three helpers: `collect` (drain async iterable), `findRunId` (type-guarded — uses `Extract` to narrow the RUN_STARTED variant out of the StreamChunk union), and `simulateRestart` (plain property write — `getLive` is declared writable on InMemoryRunStore, no cast required). - Replaced output-cast patterns with `toMatchObject({ output: {...} })` inline — preserves the exact assertion semantics and drops the double cast. - Removed local duplicates of `collect` / `findRunId` / `RunStartedChunk` interface from 10 files; they all import from the shared utils now. - One bug surfaced: `routed = await reg.forRun(...)` returns `Workflow | undefined` but registry.test.ts was passing it straight to `runWorkflow({ workflow: routed })`. Added a guard. Net: 445 deletions → 205 insertions, zero `as any` / `as unknown as` remaining in the test suite, 68 tests still passing.
Round 1 of the cr-loop CR pass against PR #589 surfaced ~56 bucket (a) findings across 19 review agents. This commit lands the well-scoped mechanical batch (~22 fixes). Engine-source and example-app fixes are tracked separately. Docs (docs/orchestration/run-persistence.md, docs/api/ai-orchestration.md, docs/orchestration/workflows.md, docs/orchestration/orchestrators.md): - Replace stale RunStore shape (get/set/delete) with the actual five-method interface (getRunState/setRunState/deleteRun/appendStep/ getSteps) and document the CAS log semantics. - Restate RunState's full field set including fingerprint, workflowVersion, startingPatches, waitingFor — load-bearing for durable RunStore implementers. - Fix observableRunStore example to wrap the real method names (setRunState/deleteRun/appendStep instead of set/delete). - Document that the engine already implements replay-from-log durability; the gap to durable resume is the runStore type widening, not engine work. - runWorkflow options table: add signalDelivery, attach, publish that PR #589 added; note the InMemoryRunStore narrow typing. - parseWorkflowRequest fields: drop the false signal/attach claim, add signalDelivery and note the body-field rename. - defineWorkflow config: add version, patches, defaultStepRetry. - useWorkflow/useOrchestration action table: add attach(runId) and signal(name, payload, { signalId? }) PR #589 added. - Types table: add SignalResult, StepRecord, StepKind, StepRetryOptions, LogConflictError, StepTimeoutError; correct RouterDecision shape. Packaging (packages/typescript/ai-orchestration): - package.json: add sideEffects: false and engines.node so tree-shaking works in consumer bundlers; add "skills" to files so the agent skill ships to npm. - tsconfig.json: extend ../../../tsconfig.base.json (matches every sibling package) instead of the root tsconfig.json; drop the .tsx glob from include (this is a Node-only library) and the vite.config.ts include/exclude contradiction. Configs: - knip.json: drop the dead packages/react-ai workspace entry. - coupling.json: drop the $schema reference to the missing coupling.schema.json. Skill (ai-core/SKILL.md): - Bump library_version 0.10.0 -> 0.20.0 to match the current @tanstack/ai package version. - Fix the useChat({ clientTools }) lie — the actual call is createChatClientOptions({ tools }) using the clientTools() helper. Tests (packages/typescript/ai-orchestration/tests): - engine.timeout.test.ts: the StepTimeoutError retry-predicate test never incremented callCount inside the step fn, so caughtImmediately was always false and the only outer assertion was an existence check. Increment callCount, assert via toMatchObject on the run output, drop the dead monkeyPatch/timeoutFired scaffolding. - engine.idempotency.test.ts: the signalId-persistence test ended with `void 0` and no assertions. Add a RUN_FINISHED check and a comment pointing to the multi-signal test for the persistence assertion. - engine.cas.test.ts: the duplicate-delivery test only did ONE delivery. Rewrote with a two-stage workflow that pauses between signals so the same signalId can be replayed against the existing log entry via simulateRestart. - engine.durability.test.ts: the StepRecord-per-agent test never read the step log because deleteRun fires on finish. Add an approve() pause before return so the log is inspectable, then assert two agent records with their results. All 162 nx tasks green (test:sherif, test:knip, test:docs, test:eslint, test:lib, test:types, test:build, build).

🎯 Changes
✅ Checklist
pnpm run test:pr.🚀 Release Impact