feat(core): checkpoint/resume (1.R) + human-gate suspend/resume & timeout (1.Q) by cemililik · Pull Request #22 · HodeTech/Relavium

cemililik · 2026-06-14T20:36:56Z

Lands the two 1.m4 critical-path workstreams toward M2, both in @relavium/core (engine-only; zero platform imports). The whole diff is green on pnpm turbo run lint typecheck test build (622 core tests) and Leakwatch-clean, and was put through two adversarially-verified multi-agent review passes (round 1: 21 findings fixed incl. a real HIGH concurrency bug; round 2: 9 findings fixed, no blockers/highs).

1.R — `Checkpointer` + resume (critical path)

Derived checkpoint, no table (ADR-0003): reconstructCheckpointState(events) is a pure, total fold over the persisted run_events — run status, surrogate workflowId, per-node settled/paused states (a condition's branch from node:completed.selected, dimmed branches from the new node:skipped), pending + already-resolved gate ids, last sequenceNumber, token/cost tallies. The exact shape lives only in checkpoint.ts.
WorkflowEngine.resumeFromCheckpoint({runId, workflow, gateId, decision}) — rehydrates a fresh RunExecution from the reconstructed state (seeds node states / pending gates / tallies / the sequenceNumber so post-resume events stay gap-free; no run:started re-emit) and returns a RunHandle.
Reconstruction trap (b): a started-but-unfinished node is absent → seeded pending → re-run (bounded by the runId+nodeId+retryCount idempotency key).
Idempotent re-delivery (3 arms): already-terminal → closed handle (nothing re-emitted/re-persisted); already-resolved gate on a live run → drive the remaining work without re-applying; pending gate → apply. The residual concurrent-double-resolve window is closed by a Phase-2 store-level uniqueness constraint (documented).
Identity guard: the surrogate workflowId must match the workflow handed to resume, else a typed workflow_mismatch. The stronger same-slug-edited guard rides on the Phase-2 runs.workflow_definition_snapshot column (its canonical home).
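
As an illustration of the derived-checkpoint idea above, here is a minimal sketch of a pure fold over an ordered event list. The event shapes, the `SketchCheckpoint` fields, and the name `reconstructSketch` are simplified assumptions for this example only; the real field set lives in checkpoint.ts and the real `RunEvent` union in @relavium/shared.

```typescript
// Hypothetical, reduced event shapes (not the real RunEvent union).
type SketchEvent =
  | { type: 'run:started'; sequenceNumber: number; workflowId: string }
  | { type: 'node:started'; sequenceNumber: number; nodeId: string }
  | { type: 'node:completed'; sequenceNumber: number; nodeId: string }
  | { type: 'node:skipped'; sequenceNumber: number; nodeId: string }
  | { type: 'human_gate:paused'; sequenceNumber: number; nodeId: string; gateId: string }
  | { type: 'human_gate:resumed'; sequenceNumber: number; nodeId: string; gateId: string };

type SketchNodeStatus = 'completed' | 'skipped' | 'paused';

interface SketchCheckpoint {
  workflowId: string;
  nodeStates: Map<string, SketchNodeStatus>;
  pendingGates: Map<string, string>; // gateId -> nodeId
  resolvedGateIds: Set<string>;
  lastSequenceNumber: number;
}

// Pure fold: same ordered events in, same state out. Returns undefined
// when the stream holds no run:started event.
function reconstructSketch(events: readonly SketchEvent[]): SketchCheckpoint | undefined {
  let state: SketchCheckpoint | undefined;
  for (const event of events) {
    if (event.type === 'run:started') {
      state = {
        workflowId: event.workflowId,
        nodeStates: new Map(),
        pendingGates: new Map(),
        resolvedGateIds: new Set(),
        lastSequenceNumber: event.sequenceNumber,
      };
      continue;
    }
    if (state === undefined) continue; // nothing to fold before run:started
    state.lastSequenceNumber = event.sequenceNumber;
    switch (event.type) {
      case 'node:started':
        break; // trap (b): a started-but-unfinished node stays absent, so it re-runs
      case 'node:completed':
        state.nodeStates.set(event.nodeId, 'completed');
        break;
      case 'node:skipped':
        state.nodeStates.set(event.nodeId, 'skipped');
        break;
      case 'human_gate:paused':
        state.nodeStates.set(event.nodeId, 'paused');
        state.pendingGates.set(event.gateId, event.nodeId);
        break;
      case 'human_gate:resumed':
        state.nodeStates.set(event.nodeId, 'completed');
        state.pendingGates.delete(event.gateId);
        state.resolvedGateIds.add(event.gateId);
        break;
    }
  }
  return state;
}
```

In this sketch a node that only emitted `node:started` never appears in `nodeStates`, which is what lets a rehydrating engine seed it as pending.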

1.Q — Human-gate suspend/resume + timeout

human_in_the_loop handler (node-handlers/human-gate.ts): resolves message_template / assignee and returns { kind: 'paused', gate }; wired into createStandardNodeExecutor (the type no longer fails loud). Secrets are parse-gated (inputs/ctx) + runtime-masked (run.outputs).
One-shot timer port ExecutionHost.setTimer (injected — core never names the ambient setTimeout); createManualTimerController is the deterministic test timer.
Timeout lifecycle: arm on pause, disarm on resume / terminal settle. When the timer fires, timeout_action approve auto-resolves the gate as approved (decidedBy: 'timeout', run continues); reject (the safe default) fails the run with run_timeout (the AwaitingGate → Failed edge) and is never routed through resume(), which would wrongly complete the gate. A human decision (incl. rejected) continues the run; a human decision that beats the timer disarms it.
human_gate:paused carries timeoutAction (the effective policy) so a surface can show how a gate auto-resolves and a Phase-2 crash-resume can re-arm from the log.
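
To make the injected one-shot timer port concrete, here is a small sketch of a hand-fired test timer. The shape (a `setTimer` that returns a disarm function, plus `fireTimers` and `armedCount`) follows the names used in this PR, but the signatures and the name `createManualTimerSketch` are assumptions; the real `createManualTimerController` may differ.

```typescript
type DisarmSketch = () => void;
// Assumed signature for the timer port: arm once, get back a disarm function.
type SetTimerSketch = (delayMs: number, onFire: () => void) => DisarmSketch;

function createManualTimerSketch() {
  const armed = new Map<number, () => void>();
  let nextId = 0;

  const setTimer: SetTimerSketch = (_delayMs, onFire) => {
    const id = nextId++;
    armed.set(id, onFire);
    return () => {
      armed.delete(id); // disarming twice is harmless
    };
  };

  // Fires every timer that is armed at the start of the sweep, ignoring delays.
  const fireTimers = (): void => {
    for (const [id, onFire] of Array.from(armed.entries())) {
      // One-shot: skip a timer that an earlier callback in this sweep disarmed.
      if (!armed.delete(id)) continue;
      onFire();
    }
  };

  const armedCount = (): number => armed.size;

  return { setTimer, fireTimers, armedCount };
}
```

Because the sketch never touches the ambient `setTimeout`, a test decides exactly when a timeout "elapses", which is the property the deterministic test timer is described as providing.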

Contracts & docs (one canonical home)

@relavium/shared: node:skipped (+ NodeSkippedReason), node:completed.selected, human_gate:paused.timeoutAction — schemas, RUN_EVENT_TYPES, per-variant type exports, and sse-event-schema.md all updated.
execution-model.md §4 (decision-continues vs the two timeout outcomes) and shared-core-engine.md (the derived CheckpointState, the reconstruction trap, the two resume entries, the idempotency + identity boundaries) updated.

Review trail

Round 1 headline: resume() mutated the gate vertex state after the durable await, so a sibling gate's timeout firing mid-persist could mis-read the run as stalled → spurious run:failed{internal}. Fixed (synchronous pre-emit mutation, mirroring #settleCompleted) with a deterministic multi-gate regression test.
Round 2: consistency/test-fidelity tightening; confirmed no regressions.

Deferred (documented, intentional)

docs/roadmap/current.md "next workstream" pointer + marking 1.R/1.Q Done happen in the post-merge roadmap commit (project pattern; "done after merge" rule).
Cross-process gate-timer re-arm on rehydration → Phase-2 crash-reconciliation (the data it needs is now persisted on human_gate:paused; no backfill).
Content-hash workflow-snapshot identity guard → Phase-2 runs.workflow_definition_snapshot.

Refs: ADR-0003, ADR-0036

🤖 Generated with Claude Code

Summary by Sourcery

Add event-derived checkpoint/replay support with a cross-process resume API and implement human gate suspend/resume with one-shot timeouts in the core workflow engine.

New Features:

Introduce a checkpoint reconstruction module and checkpointer interface that derive run state from persisted events, plus a cross-process WorkflowEngine.resumeFromCheckpoint API.
Add a human_in_the_loop node handler and gate timeout policy that arm one-shot timers for human gates, including auto-approve and run-timeout behaviors with idempotent decision handling.

Enhancements:

Emit node:skipped events with explicit reasons and extend node:completed and human_gate:paused payloads to support accurate checkpoint reconstruction and observability.
Extend the in-memory execution host with a deterministic manual timer controller and in-memory checkpointer, and expose new core engine types and utilities from the public index.
Tighten the run loop to persist skip propagation, track resolved gates, seed sequence numbers on resume, and guard workflow identity and already-active runs during checkpoint-based resumption.

Documentation:

Update execution model, shared core engine architecture, and SSE event schema docs to describe checkpoint-derived state, node:skipped semantics, human gate decisions and timeout behavior, and the new resumeFromCheckpoint flow.

Tests:

Add comprehensive tests for gate timeout behavior, skip propagation, checkpoint reconstruction, in-memory checkpointer, manual timer controller, human gate handler behavior, and resumeFromCheckpoint idempotency and error cases.

Summary by CodeRabbit

New Features
- Added checkpoint-based run resumption via resumeFromCheckpoint, including cross-process continuation and deterministic checkpoint reconstruction exports.
- Added human_in_the_loop node support plus human-gate timeout policies (approve/reject), with resumed gate completion.
- Added node:skipped events for conditional branches not taken, including skip reasons.
Bug Fixes
- Improved idempotent gate timeout/resume behavior (one-shot timers, correct disarming, and safe handling for terminal runs).
Documentation
- Expanded human-gate decision lifecycle and checkpoint reconstruction/resume semantics; updated SSE/run-event contracts for skips, branch selection, and timeout metadata.

…uisite)

A skip-propagated vertex emitted NOTHING, so the persisted event stream could not record which nodes a condition dimmed. Checkpoint/resume (1.R) reconstructs run state by replaying that stream, so resume after a condition would mis-route. This adds `node:skipped` to make the log a complete, replayable record (and it closes a real observability gap: surfaces never saw a node get skipped before).

- shared: `NodeSkippedEventSchema` ({ nodeId, reason: 'branch_not_taken' | 'upstream_unreachable' }) + `NodeSkippedReason`; added to RUN_EVENT_TYPES + the RunEvent union; the contract-parity test now pins 19 names with a valid + reject fixture.
- engine: `#propagateSkips` collects the vertices it newly dims (+ a derived reason via `#skipReason`); `#step` emits a durable `node:skipped` for each BEFORE any terminal settle (persist-before-deliver, gap-free) so the log stays a complete record.
- docs: documented `node:skipped` in its canonical home (sse-event-schema.md).
- test: the 1.P condition e2e now asserts the dimmed branch emits node:skipped{branch_not_taken}.

Decided (per the 1.R Understand pass, maintainer-approved): a new `node:skipped` event over adding a `selected` field to node:completed. It persists the skip decisions directly (no selectedTargets needed on resume) and surfaces skips. Additive within ADR-0036; no new ADR.

pnpm turbo run lint typecheck test build format:check: green (579 core, 245 shared). Leakwatch: 0.

Refs: ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
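
The skip propagation described in this commit can be pictured with a small fixpoint over the graph's edges. This is only a sketch under stated assumptions: the commit does not spell out how `#skipReason` derives a reason, so the rule used here (a node is dimmed when every incoming edge is dead; it is `branch_not_taken` if one of those edges comes straight from the condition, otherwise `upstream_unreachable`) is a guess, and `EdgeSketch` and `propagateSkipsSketch` are hypothetical names.

```typescript
type SkipReasonSketch = 'branch_not_taken' | 'upstream_unreachable';

interface EdgeSketch {
  from: string;
  to: string;
}

// Returns the newly dimmed nodes with a derived reason, in discovery order.
function propagateSkipsSketch(
  edges: readonly EdgeSketch[],
  conditionId: string,
  selected: readonly string[],
): Map<string, SkipReasonSketch> {
  const chosen = new Set(selected);
  const skipped = new Map<string, SkipReasonSketch>();

  // An edge is dead if it is an unselected condition branch,
  // or if it leaves a node that is already skipped.
  const isDead = (edge: EdgeSketch): boolean =>
    (edge.from === conditionId && !chosen.has(edge.to)) || skipped.has(edge.from);

  let changed = true;
  while (changed) {
    changed = false;
    for (const { to: nodeId } of edges) {
      if (skipped.has(nodeId)) continue;
      const incoming = edges.filter((edge) => edge.to === nodeId);
      if (!incoming.every(isDead)) continue; // still reachable through a live edge
      const direct = incoming.some((edge) => edge.from === conditionId);
      skipped.set(nodeId, direct ? 'branch_not_taken' : 'upstream_unreachable');
      changed = true;
    }
  }
  return skipped;
}
```

Each entry of the returned map corresponds to one `node:skipped` event the engine would persist before any terminal settle.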

The read side that rebuilds a run's state from its persisted event stream so an interrupted run (crash, or suspended at a gate) can resume. There is no checkpoint table; the state is DERIVED from `run_events` (ADR-0003; execution-model.md §5). In-memory reference now; the SQLite/cloud store is Phase-2/CLI.

- checkpoint.ts: `Checkpointer { load(runId) }` + `CheckpointState` (schemaVersion, runStatus, nodeStates, completedNodeIds, pendingGates, lastSequenceNumber) + the pure `reconstructCheckpointState(events)`, a deterministic fold of the ordered stream. Trap (b) baked in: a node that emitted `node:started` but no terminal event is ABSENT from nodeStates, so the rehydrating engine seeds it `pending` and re-runs it (bounded by the idempotency key, not by skipping). A condition's `selectedTargets` is restored from `node:completed.selected`; dimmed branches from `node:skipped`; a gate-parked run yields `pendingGates` + a `paused` node; a resumed gate records the decision as the node output.
- run-event.ts: `node:completed.selected?`, the authoritative record of a condition's branch selection (the reconstruction needs it; `node:skipped` alone can't survive a crash between the condition's completion and the dimmed branches' skip-emission). engine `#settleCompleted` sets it for a branch outcome.
- execution-host.ts: `ExecutionHost.checkpointer` (a SEPARATE read port from the write `RunStore`) + `createInMemoryCheckpointer` reconstructing from an `InMemoryRunStore`; wired into `createInMemoryHost`.
- index.ts: export the checkpoint surface.

Tests: 11 reconstruction + in-memory-checkpointer cases.

pnpm turbo run lint typecheck test build format:check: green (590 core). Leakwatch: 0.

Refs: ADR-0036, ADR-0003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Complete the 1.R resume path on top of the Checkpointer read-side: a run suspended at a gate in a prior process is rehydrated from its reconstructed CheckpointState and driven to completion behind the one engine loop.

- WorkflowEngine.resumeFromCheckpoint({runId, workflow, gateId, decision}): the cross-process resume entry. Loads the checkpoint, rehydrates a fresh RunExecution (seeds per-node states, pending/resolved gates, token+cost tallies, and the bus sequence so post-resume events stay gap-free), applies the decision, and returns a RunHandle. No run:started is re-emitted.
- RunExecution: a checkpoint constructor arm (#seedFromCheckpoint), prepareResume (clock only), kick (drive without re-applying), and #resolvedGates so a re-delivered decision is an idempotent no-op rather than advancing the run twice.
- Idempotent re-delivery, three arms: an already-terminal checkpoint returns a closed handle (nothing re-emitted/re-persisted, createClosedRunHandle); an already-resolved gate on a live run drives remaining work without re-applying; a still-pending gate applies the decision. The residual concurrent TOCTOU (two processes loading the same pending gate before either persists) is closed by a Phase-2 store-level uniqueness constraint, documented in checkpoint.ts.
- Identity guard: the surrogate workflowId reconstructed from run:started must match the workflow handed to resume, else a typed EngineStateError 'workflow_mismatch'. The stronger same-slug-edited guard rides on the Phase-2 runs.workflow_definition_snapshot column (database-schema.md), not run:started.
- event-bus: seedSequence(key, next) seeds the per-run counter on rehydration, never lowering an advanced one.
- CheckpointState gains workflowId (from run:started) for the identity guard.
- Tests: 7 resume-from-checkpoint e2e cases (cross-process resume gap-free, idempotent re-delivery to a terminal run, workflow_mismatch, unknown_run, already-in-memory, invalid_decision); checkpoint workflowId capture.
- Docs: the canonical Checkpoint-and-resume section in shared-core-engine.md now describes the derived CheckpointState, the reconstruction trap (started-but-unfinished node re-runs), what is NOT checkpointed (the resolved ctx, with the structuredClone transport rule), the two resume entries, and the idempotency + identity boundaries, pointing to checkpoint.ts for the exact field set.

Refs: ADR-0003, ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
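
The "never lowering an advanced one" rule for `seedSequence(key, next)` can be sketched as a per-key counter that only ever moves forward. The factory name, the `allocate` method, and the choice to start unseeded keys at 0 are assumptions for this example; only the `seedSequence(key, next)` shape comes from the commit message.

```typescript
function createSequenceCounterSketch() {
  const nextByKey = new Map<string, number>();

  return {
    // Seed the next sequence number for a key, but never move it backwards.
    seedSequence(key: string, next: number): void {
      const current = nextByKey.get(key) ?? 0;
      if (next > current) nextByKey.set(key, next);
    },
    // Hand out the next sequence number and advance the counter.
    allocate(key: string): number {
      const value = nextByKey.get(key) ?? 0;
      nextByKey.set(key, value + 1);
      return value;
    },
  };
}
```

Seeding with `lastSequenceNumber + 1` on rehydration is what keeps post-resume events gap-free in this model, while a stale, lower seed arriving later changes nothing.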

Fill the `paused`/`GateRequest` arm 1.N/1.P reserved: the `human_in_the_loop` node handler plus the engine-side timeout lifecycle, on top of 1.R.

- node-handlers/human-gate.ts: the gate handler resolves `message_template` / `assignee` against inputs + run.outputs and returns `{ kind: 'paused', gate }`. Raw resolution is safe: a `secret` reference in either field is rejected at parse (secret-taint `node-text` category), mirroring the agent's prompt. It is thin and clock-free; deadlines are the engine's job. Wired into createStandardNodeExecutor (the type no longer fails loud).
- Timer port: ExecutionHost.setTimer (one-shot, returns disarm), injected so core never names the ambient setTimeout (purity lib). createInMemoryHost ships a manual, deterministic timer (createManualTimerController) fired by hand in tests (fireTimers/armedCount); a real surface injects a setTimeout-backed one.
- Engine timeout lifecycle: #settlePaused computes expiresAt from the host clock and arms the timer; a decision (human or timeout-approve) disarms it; a terminal settle disarms all. On fire, `approve` auto-resolves the gate as approved (decidedBy: 'timeout', run continues); `reject` (the safe default) fails the run with run_timeout (the AwaitingGate→Failed edge) and is never routed through resume(), which would wrongly complete the gate. A human decision that beats the timer disarms it (single resolution).
- GateRequest gains timeoutAction ('approve'|'reject'); the handler supplies it from the node's timeout_action (default reject). Re-arming a still-pending gate's timer on rehydration is deferred to Phase-2 crash-reconciliation (needs timeout_action persisted on human_gate:paused), documented in #seedFromCheckpoint.
- Tests: 7 handler unit tests (template resolution, default/explicit timeout_action, no-timeout, cancel, validation, wrong-node) + 4 engine timeout e2e (approve auto-resolve, reject→run_timeout, disarm-on-human-decision, no-timer-without-timeout) + the dispatcher gate-wiring assertion.
- Docs: execution-model.md §4 made precise on the decision-continues vs the two timeout outcomes.

Refs: ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold the confirmed findings from the adversarially-verified review of the 1.R + 1.Q diff (21/26 survived refutation).

Correctness:
- resume(): mark the gate vertex completed SYNCHRONOUSLY before the durable emit (mirroring #settleCompleted), closing a multi-gate stall race where a sibling gate's timeout firing during the persist saw the gate as deleted-but-paused and mis-read the run as stalled (spurious run:failed{internal}). [HIGH]
- #failGateOnTimeout now adds the gateId to #resolvedGates (symmetry with the approve/human path) so a late re-delivery of a reject-timed-out gate's decision is an idempotent no-op, not a run_already_terminal throw.

Clarity / contracts:
- New EngineStateError code `run_already_active` for resumeFromCheckpoint on a run already in memory (was the contradictory `unknown_run`); unknown_run comment fixed.
- human_gate:paused gains optional `timeoutAction` (reuses TimeoutActionSchema), populated by the engine: immediate observability, and it pre-captures the data a Phase-2 crash-resume needs to re-arm a gate timer (no future backfill).
- human-gate.ts header corrected: distinguishes parse-time taint (inputs/ctx) from the runtime masking that keeps run.outputs secret-free.

Docs (one canonical home):
- sse-event-schema.md: add NodeSkippedEvent to the RunEvent union + interface, node:completed.selected, and human_gate:paused.timeoutAction (interfaces + table).
- run-event.test.ts: fixture carries timeoutAction/expiresAt; stale "18" -> "19".

Tests (+13): the multi-gate stall-race regression (two timeout-approve gates settled in one timer sweep), the kick() path (gate already resolved in a prior process drives the remaining work without re-applying), reject-timeout re-delivery no-op, skip-before-fail ordering, expiresAt deadline value, post-terminal timer no-op, no-rearm-on-rehydration, token/cost tally restoration, and ManualTimerController unit tests.

Refs: ADR-0003, ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Second adversarially-verified review pass (9/20 findings survived; no blockers/highs; the round-1 fixes held). Fold the confirmed items:

- #settlePaused now emits the EFFECTIVE timeoutAction (default `reject`) used for both the armed timer and the persisted human_gate:paused event, so the log always reflects the exact policy the engine acts on, even when a handler set timeoutMs but left timeoutAction implicit (a Phase-2 crash-resume reads it back to re-arm).
- shared: add the missing `export type NodeSkippedEvent` (restores the per-variant type-export pattern alongside NodeCompletedEvent/NodeFailedEvent).
- resumeFromCheckpoint: a comment marking the single point a future engine guards/migrates an older checkpoint.schemaVersion (the field's purpose; inert at v1).
- docs: execution-model.md paragraph break before the cross-reference sentence.

Tests (+3, strengthened 2):
- a human `rejected` decision completes the gate and CONTINUES the run (the documented "rejection is not a failure" path), the decision reaching run.outputs.
- an armed gate timer is disarmed by #settle when the run terminates for an unrelated reason (cancel): the disarm-by-settle path (vs disarm-by-resume).
- the kick-path test now also asserts gap-free sequence continuation, and the no-rearm-on-rehydration test spies on setTimer to prove it is NEVER called (distinguishing "never armed" from "armed then disarmed").

Deferred (documented, not a code change): docs/roadmap/current.md still names 1.Q as the next workstream. The roadmap status page is updated in the post-merge commit (project pattern; ADR/roadmap "done after merge" rule), not pre-merge.

Refs: ADR-0003, ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sourcery-ai · 2026-06-14T20:37:03Z

Reviewer's Guide

Implements event-derived checkpoint/resume for the workflow engine and adds human-gate suspend/resume with one-shot timeouts, wiring them through the core engine, execution host, node handlers, and shared run-event contracts with comprehensive tests and documentation updates.

Sequence diagram for resumeFromCheckpoint cross-process gate resume

sequenceDiagram
  participant Caller
  participant WorkflowEngine
  participant ExecutionHost
  participant Checkpointer
  participant RunExecution
  participant RunEventBus

  Caller->>WorkflowEngine: resumeFromCheckpoint(input)
  WorkflowEngine->>WorkflowEngine: GateDecisionSchema.safeParse(input.decision)
  WorkflowEngine->>ExecutionHost: access checkpointer
  ExecutionHost->>Checkpointer: load(input.runId)
  Checkpointer-->>ExecutionHost: CheckpointState | undefined
  ExecutionHost-->>WorkflowEngine: CheckpointState | undefined
  alt no checkpoint
    WorkflowEngine-->>Caller: throw EngineStateError unknown_run
  else checkpoint exists
    WorkflowEngine->>ExecutionHost: store.resolveWorkflowId(input.workflow.workflow.id)
    ExecutionHost-->>WorkflowEngine: workflowId
    alt workflowId mismatch
      WorkflowEngine-->>Caller: throw EngineStateError workflow_mismatch
    else workflowId matches
      alt checkpoint.runStatus in TERMINAL_RUN_STATUSES
        WorkflowEngine->>WorkflowEngine: createClosedRunHandle(input.runId)
        WorkflowEngine-->>Caller: RunHandle(events completes immediately)
      else non-terminal checkpoint
        WorkflowEngine->>RunEventBus: new RunEventBus
        WorkflowEngine->>RunExecution: new RunExecution({checkpoint,...})
        RunExecution->>RunExecution: #seedFromCheckpoint(plan, checkpoint, bus, runId)
        RunExecution->>RunEventBus: seedSequence(runId, checkpoint.lastSequenceNumber + 1)
        RunExecution->>RunExecution: prepareResume()
        WorkflowEngine->>WorkflowEngine: #runs.set(runId, execution)
        alt input.gateId in checkpoint.resolvedGateIds
          WorkflowEngine->>RunExecution: kick()
        else gate still pending
          WorkflowEngine->>RunExecution: resume(input.gateId, decision)
        end
        WorkflowEngine-->>Caller: RunHandle(events from resumed run)
      end
    end
  end
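
The branching in the diagram above can be condensed into a pure decision function. This is a sketch, not the engine's code: `planResume`, the reduced checkpoint view, and the assumed terminal status set are hypothetical (the real `TERMINAL_RUN_STATUSES` may contain different members), and the `run_already_active` guard mentioned elsewhere in this PR is left out because the diagram does not show where it sits.

```typescript
// Hypothetical run statuses for illustration only.
type RunStatusSketch = 'running' | 'awaiting_gate' | 'completed' | 'failed' | 'cancelled';

interface CheckpointViewSketch {
  workflowId: string;
  runStatus: RunStatusSketch;
  resolvedGateIds: ReadonlySet<string>;
}

type ResumePlanSketch =
  | { kind: 'error'; code: 'unknown_run' | 'workflow_mismatch' }
  | { kind: 'closed_handle' } // terminal run: nothing re-emitted or re-persisted
  | { kind: 'kick' } // gate already resolved: drive remaining work only
  | { kind: 'apply_decision' }; // gate still pending: apply the decision

// Assumed terminal set; the real TERMINAL_RUN_STATUSES may differ.
const TERMINAL_SKETCH: ReadonlySet<RunStatusSketch> = new Set<RunStatusSketch>([
  'completed',
  'failed',
  'cancelled',
]);

function planResume(
  checkpoint: CheckpointViewSketch | undefined,
  expectedWorkflowId: string,
  gateId: string,
): ResumePlanSketch {
  if (checkpoint === undefined) return { kind: 'error', code: 'unknown_run' };
  if (checkpoint.workflowId !== expectedWorkflowId) {
    return { kind: 'error', code: 'workflow_mismatch' };
  }
  if (TERMINAL_SKETCH.has(checkpoint.runStatus)) return { kind: 'closed_handle' };
  if (checkpoint.resolvedGateIds.has(gateId)) return { kind: 'kick' };
  return { kind: 'apply_decision' };
}
```

Keeping the choice of arm separate from the side effects makes the three idempotent re-delivery arms easy to test in isolation.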

Sequence diagram for human_gate pause, timeout, and resume with one-shot timer

sequenceDiagram
  participant RunExecution
  participant ExecutionHost
  participant ManualTimerController as ManualTimer

  %% Gate pause path
  RunExecution->>RunExecution: #settlePaused(vertex, gate)
  RunExecution->>RunExecution: #states.set(vertex.id, {status paused})
  RunExecution->>RunExecution: #pendingGates.set(gateId, {vertexId})
  RunExecution->>RunExecution: compute effectiveAction, expiresAt
  alt gate.timeoutMs defined
    RunExecution->>ExecutionHost: setTimer(gate.timeoutMs, onGateTimeout)
    ExecutionHost-->>RunExecution: disarm()
    RunExecution->>RunExecution: #gateTimers.set(gateId, disarm)
  end
  RunExecution->>RunExecution: #emitDurable(human_gate:paused)

  %% Human decision arrives before timeout
  RunExecution->>RunExecution: resume(gateId, decision)
  RunExecution->>RunExecution: check #resolvedGates.has(gateId)
  RunExecution->>RunExecution: #resolvedGates.add(gateId)
  RunExecution->>RunExecution: #pendingGates.delete(gateId)
  RunExecution->>RunExecution: #disarmTimer(gateId)
  RunExecution->>RunExecution: update vertex.state to completed
  RunExecution->>RunExecution: #emitDurable(human_gate:resumed)
  RunExecution->>RunExecution: #schedule()

  %% Timer fires first
  ManualTimer->>RunExecution: #onGateTimeout(gateId, vertexId, action)
  RunExecution->>RunExecution: #disarmTimer(gateId)
  alt action == approve
    RunExecution->>RunExecution: resume(gateId, {decision approved, decidedBy timeout})
  else action == reject
    RunExecution->>RunExecution: #failGateOnTimeout(gateId, vertexId)
    RunExecution->>RunExecution: #pendingGates.delete(gateId)
    RunExecution->>RunExecution: #resolvedGates.add(gateId)
    RunExecution->>RunExecution: #settleFailed(vertex, run_timeout)
    RunExecution->>RunExecution: #schedule()
  end

  %% Terminal settle disarms any remaining timers
  RunExecution->>RunExecution: #settle(type)
  RunExecution->>RunExecution: for gateId in #gateTimers.keys()
  RunExecution->>RunExecution: #disarmTimer(gateId)
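
The pause / decide / time-out paths in the diagram above can be modelled in a few lines. The following is a sketch under assumptions: `createGateTimeoutSketch`, its outcome type, and the timer signature are hypothetical names, and the real engine emits durable events and reschedules work where this sketch only records an outcome.

```typescript
type TimeoutActionSketch = 'approve' | 'reject';
// Assumed timer port shape: arm once, get back a disarm function.
type SetTimerFn = (delayMs: number, onFire: () => void) => () => void;

type GateOutcomeSketch =
  | { kind: 'resolved'; decision: 'approved' | 'rejected'; decidedBy: string }
  | { kind: 'run_failed'; code: 'run_timeout' };

function createGateTimeoutSketch(setTimer: SetTimerFn) {
  const disarmers = new Map<string, () => void>();
  const resolved = new Set<string>();
  const outcomes = new Map<string, GateOutcomeSketch>();

  const disarm = (gateId: string): void => {
    const stop = disarmers.get(gateId);
    if (stop === undefined) return;
    disarmers.delete(gateId);
    stop();
  };

  const resume = (gateId: string, decision: 'approved' | 'rejected', decidedBy: string): void => {
    if (resolved.has(gateId)) return; // re-delivered decision is a no-op
    resolved.add(gateId);
    disarm(gateId); // a decision that beats the timer disarms it
    outcomes.set(gateId, { kind: 'resolved', decision, decidedBy });
  };

  const onTimeout = (gateId: string, action: TimeoutActionSketch): void => {
    if (resolved.has(gateId)) return;
    disarm(gateId);
    if (action === 'approve') {
      resume(gateId, 'approved', 'timeout');
    } else {
      // reject never goes through resume(): the run fails instead
      resolved.add(gateId);
      outcomes.set(gateId, { kind: 'run_failed', code: 'run_timeout' });
    }
  };

  // Returns the effective policy, or undefined when no timer is armed.
  const pause = (
    gateId: string,
    timeoutMs?: number,
    timeoutAction?: TimeoutActionSketch,
  ): TimeoutActionSketch | undefined => {
    if (timeoutMs === undefined) return undefined;
    const effective: TimeoutActionSketch = timeoutAction ?? 'reject'; // safe default
    disarmers.set(gateId, setTimer(timeoutMs, () => onTimeout(gateId, effective)));
    return effective;
  };

  return { pause, resume, outcomes };
}
```

Returning the effective action from `pause` mirrors the idea that the persisted `human_gate:paused` event should carry the policy the engine will actually act on.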

File-Level Changes

Change	Details	Files
Add derived checkpoint reconstruction and cross-process resumeFromCheckpoint entrypoint to WorkflowEngine, including idempotent re-delivery and workflow identity guarding.	Introduce CheckpointState model, reconstructCheckpointState() fold, and Checkpointer interface to derive run state from ordered RunEvent streams. Extend RunEventBus to support seeding sequence numbers so resumed runs continue with gap-free sequenceNumber values. Augment RunExecution to seed internal vertex state, pending/resolved gates, tallies, and sequence counters from a checkpoint, and add prepareResume() and kick() paths. Implement WorkflowEngine.resumeFromCheckpoint() to load a checkpoint via ExecutionHost.checkpointer, enforce workflow identity and active-run guards, no-op on terminal checkpoints via a closed RunHandle, and either apply the gate decision or just drive remaining work. Extend EngineStateError codes with run_already_active, workflow_mismatch, and reuse unknown_run/invalid_decision for the new resumeFromCheckpoint flow.	`packages/core/src/engine/checkpoint.ts` `packages/core/src/engine/checkpoint.test.ts` `packages/core/src/engine/event-bus.ts` `packages/core/src/engine/engine.ts` `packages/core/src/engine/errors.ts` `packages/core/src/engine/engine.test.ts` `packages/core/src/engine/run-handle.ts` `packages/core/src/index.ts` `docs/architecture/shared-core-engine.md`
Introduce human-in-the-loop gate handler and one-shot timeout lifecycle, integrating gate timeouts into the run loop with proper disarming and failure semantics.	Add createHumanGateNodeExecutor to resolve message_template and assignee with template interpolation, enforce secret-handling contracts, and surface GateRequest with timeoutMs/timeoutAction. Wire human_in_the_loop into createStandardNodeExecutor so human_gate nodes suspend instead of failing, and export the handler/deps from the core index. Extend RunExecution to track resolved gates and per-gate timeout timers, disarming timers on resume and terminal settle, and make resume() idempotent on already-resolved gates while synchronously updating gate vertex state before durable emit. Implement #settlePaused gate timeout wiring using injected ExecutionHost.setTimer, computing expiresAt, deriving effective timeoutAction, and emitting human_gate:paused that carries timeoutMs, timeoutAction, and expiresAt. Add #onGateTimeout and #failGateOnTimeout to auto-approve gates or fail runs with run_timeout on timeout_action: reject, ensuring late decisions become no-ops and that timers never fire after run termination.	`packages/core/src/engine/node-handlers/human-gate.ts` `packages/core/src/engine/node-handlers/human-gate.test.ts` `packages/core/src/engine/node-handlers/dispatcher.ts` `packages/core/src/engine/node-handlers/node-handlers.test.ts` `packages/core/src/engine/engine.ts` `packages/core/src/engine/engine.test.ts` `packages/core/src/index.ts` `docs/architecture/execution-model.md`
Extend ExecutionHost with a platform-free timer port and in-memory checkpointer/timer implementations to support timeouts and checkpoint-based resume in tests and the reference engine.	Define SetTimer type and add setTimer and checkpointer to ExecutionHost, keeping timer and checkpoint responsibilities distinct from RunStore. Implement createManualTimerController as a deterministic one-shot timer with fireTimers and armedCount helpers for tests, including race/edge-case coverage. Extend createInMemoryHost to provide a ManualTimerController-backed setTimer, expose fireTimers/armedCount for tests, and wire in an in-memory Checkpointer using reconstructCheckpointState over InMemoryRunStore. Add createInMemoryCheckpointer helper that only reconstructs checkpoints when the underlying RunStore is InMemoryRunStore, returning undefined for opaque/custom stores. Update engine tests to use the manual timer host helpers to exercise gate timeout behavior, multi-gate races, and ensure timers are disarmed on resume and terminal closure.	`packages/core/src/engine/execution-host.ts` `packages/core/src/engine/execution-host.test.ts` `packages/core/src/engine/checkpoint.ts` `packages/core/src/engine/checkpoint.test.ts` `packages/core/src/engine/engine.test.ts` `packages/core/src/index.ts`
Enrich shared run-event contracts with node:skipped, condition-branch selection, and timeoutAction metadata to make the event log fully replayable for checkpoint reconstruction and gate timeout UX.	Add NodeSkippedEvent and NodeSkippedReason (branch_not_taken/upstream_unreachable) to shared run-event schemas and constants, plus tests and SSE contract docs, and include it in RunEventUnion/RunEventType. Update NodeCompletedEvent to optionally carry selected target ids for condition nodes, and document this in the SSE schema as the authoritative branch record. Modify RunExecution skip propagation to compute a structured skip reason per vertex, return newly skipped nodes, and emit node:skipped events before terminals so reconstruction and UIs can see dimmed branches. Extend HumanGatePausedEvent to include timeoutAction alongside timeoutMs and expiresAt, and update tests and SSE docs accordingly. Update execution-model and architecture docs to describe gate decision semantics vs timeout outcomes, and how node:skipped and selected are used in checkpoint/resume.	`packages/shared/src/run-event.ts` `packages/shared/src/run-event.test.ts` `packages/shared/src/constants.ts` `packages/core/src/engine/engine.ts` `packages/core/src/engine/node-handlers/node-handlers.e2e.test.ts` `docs/reference/contracts/sse-event-schema.md` `docs/architecture/shared-core-engine.md` `docs/architecture/execution-model.md`


coderabbitai · 2026-06-14T20:37:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info


📥 Commits

Reviewing files that changed from the base of the PR and between 012b2bb and 8e8cd9c.

📒 Files selected for processing (7)

docs/architecture/shared-core-engine.md
docs/reference/contracts/sse-event-schema.md
packages/core/src/engine/checkpoint.test.ts
packages/core/src/engine/engine.test.ts
packages/core/src/engine/engine.ts
packages/core/src/engine/node-handlers/human-gate.test.ts
packages/core/src/index.ts

🚧 Files skipped from review as they are similar to previous changes (4)

packages/core/src/index.ts
docs/architecture/shared-core-engine.md
packages/core/src/engine/engine.test.ts
docs/reference/contracts/sse-event-schema.md

📝 Walkthrough


This PR adds a human_in_the_loop node handler with one-shot timeout support (approve/reject), a checkpoint read-side (reconstructCheckpointState) enabling cross-process gate resumption via a new WorkflowEngine.resumeFromCheckpoint method, durable node:skipped events with branch_not_taken/upstream_unreachable reasons, deterministic test timer infrastructure, and updates shared event schemas, public API surface, and architecture documentation throughout.

Changes

Human Gate, Checkpoint Resume & node:skipped

Layer / File(s)	Summary
Shared event schema extensions `packages/shared/src/constants.ts`, `packages/shared/src/run-event.ts`, `packages/shared/src/run-event.test.ts`, `docs/reference/contracts/sse-event-schema.md`	Adds `node:skipped` to `RUN_EVENT_TYPES` and `RunEventUnionSchema`. Extends `NodeCompletedEventSchema` with optional `selected` array (branch target ids). Adds optional `timeoutAction` to `HumanGatePausedEventSchema` with cross-field validation (requires `timeoutMs` when present). Introduces `NodeSkippedReasonSchema` and `NodeSkippedEventSchema` with `branch_not_taken` and `upstream_unreachable` reasons. Updates contract documentation and validates all changes via expanded test matrix.
`human_in_the_loop` node handler and dispatcher wiring `packages/core/src/engine/node-executor.ts`, `packages/core/src/engine/node-handlers/human-gate.ts`, `packages/core/src/engine/node-handlers/dispatcher.ts`, `packages/core/src/engine/node-handlers/human-gate.test.ts`, `packages/core/src/engine/node-handlers/node-handlers.test.ts`	Adds `timeoutAction?: 'approve' \| 'reject'` to `GateRequest` to control gate timeout behavior. Implements `createHumanGateNodeExecutor`: validates node kind, handles aborts (including during template resolution), resolves `message_template` and `assignee` via `RunScope` with inputs/outputs, maps interpolation failures to `validation` errors, constructs `GateRequest` with optional timeout fields (defaults `timeoutAction` to `'reject'` when `timeout_ms` is set). Wires handler into `createStandardNodeExecutor` via optional `humanGate` dependency in `StandardNodeExecutorDeps`. Comprehensive tests validate interpolation, timeout defaults, abort handling, non-gate errors, and integration with standard executor.
Checkpoint read-side: `reconstructCheckpointState` `packages/core/src/engine/checkpoint.ts`, `packages/core/src/engine/checkpoint.test.ts`	Introduces complete `checkpoint.ts` module: `CheckpointNodeState`, `CheckpointPendingGate`, `CheckpointState` types, `Checkpointer` interface, `CHECKPOINT_SCHEMA_VERSION` constant. Core `reconstructCheckpointState` deterministically folds persisted `RunEvent` stream (in order) into derived state: returns `undefined` if no `run:started`; reconstructs run identity, status, and per-node terminal/paused states (omitting nodes with only `node:started` so they re-run); restores branch selections via `selectedTargets`; reconstructs gate-parked runs with `pendingGates`; handles gate resume by completing gate node and moving to `resolvedGateIds`; accumulates token totals and cumulative cost from `cost:updated`. Tests cover all scenarios including completed runs, in-flight nodes, branch/skip restoration, gate pause/resume cycles, cost accounting, typed failure preservation, and `createInMemoryCheckpointer` integration with `InMemoryRunStore`.
`ExecutionHost`: timer seam, `ManualTimerController`, utilities `packages/core/src/engine/execution-host.ts`, `packages/core/src/engine/execution-host.test.ts`, `packages/core/src/engine/event-bus.ts`, `packages/core/src/engine/run-handle.ts`, `packages/core/src/engine/errors.ts`	Extends `ExecutionHost` contract with `checkpointer: Checkpointer` port and `setTimer: (ms, onFire) => disarm` one-shot timer port. Adds `ManualTimerController` interface and `createManualTimerController` for deterministic test-time timer control: `setTimer` arms timers, `fireTimers` fires all currently-armed timers exactly once (idempotent across sweeps), `armedCount` reports remaining timers. Expands `createInMemoryHost` to optionally accept injected `checkpointer`, wire manual timer controller as `setTimer`, and expose `fireTimers`/`armedCount` test controls. Adds `createInMemoryCheckpointer` (reconstructs checkpoint state from `InMemoryRunStore`, returns `undefined` for opaque stores). Adds `seedSequence` to `RunEventBus` for idempotent monotonic sequence counter advance during rehydration. Adds `createClosedRunHandle` for already-terminal runs (closed event stream, inert `cancel`/`subscribe`, resolved `whenConsumersReady`). Extends `EngineStateErrorCode` with `'run_already_active'` and `'workflow_mismatch'` discriminants.
Engine core: checkpoint seeding, skip propagation, gate timeouts, resumeFromCheckpoint `packages/core/src/engine/engine.ts`, `packages/core/src/engine/engine.test.ts`, `packages/core/src/engine/node-handlers/node-handlers.e2e.test.ts`	Adds `ResumeFromCheckpointInput` interface for cross-process checkpoint resumption. Introduces `WorkflowEngine.resumeFromCheckpoint(...)`: loads `CheckpointState` via checkpointer, enforces workflow identity guard (rejects mismatch), returns `createClosedRunHandle` for terminal checkpoints, otherwise creates checkpoint-seeded `RunExecution` and either kicks (gate already resolved) or applies decision via `resume`. Extends `RunExecution` with `#resolvedGates` for idempotent gate tracking and `#gateTimers` for timer disarm callbacks. Adds optional `checkpoint` constructor parameter and `#seedFromCheckpoint` method to rehydrate vertex states, pending/resolved gates, token/cost tallies, and sequence counter. Adds `kick()` to continue without re-applying a gate decision. Makes `resume` idempotent: skips re-application if gate already in `#resolvedGates`. During gate resume: marks gate resolved, disarms timer, clears pause state, synchronously completes gate vertex with stored output, emits `human_gate:resumed`, schedules continuation. Refactors skip propagation: `#propagateSkips` now returns newly skipped vertices with `NodeSkippedReason`; scheduler emits durable `node:skipped` events with reasons before checking terminal conditions. Persists branch outcomes: `node:completed` now includes `selected` targets when outcome is `branch`. Enhances gate pause: `#settlePaused` computes `timeoutAction` and `expiresAt`, arms one-shot timer via `setTimer`, stores disarm callback. Implements `#disarmTimer` (idempotent), `#onGateTimeout` (approve auto-resumes; reject fails run with `run_timeout`), `#failGateOnTimeout` (idempotent gate resolution with run failure). On terminal settlement, disarms all remaining gate timers and clears `#gateTimers`. 
Comprehensive test coverage validates timeout metadata, auto-resolve vs auto-fail, timer disarm on early decision, no timer when `timeoutMs` absent, rejection continuation, timer disarm on cancel, idempotent late re-delivery, correct skip event ordering, concurrent multi-gate timeout in single sweep, cross-process rehydration with gap-free sequencing, terminal run no-op, workflow mismatch guard, already-active rejection, invalid decision validation, kick-path regression, and no timer arm during rehydration. E2E tests assert `node:skipped` events with `branch_not_taken` reason for unselected branches.
Public API surface and documentation `packages/core/src/index.ts`, `docs/architecture/execution-model.md`, `docs/architecture/shared-core-engine.md`	Re-exports `ResumeFromCheckpointInput`, checkpoint types (`Checkpointer`, `CheckpointState`, `CheckpointNodeState`, `CheckpointPendingGate`), checkpoint functions (`reconstructCheckpointState`, `CHECKPOINT_SCHEMA_VERSION`), timer types (`SetTimer`, `ManualTimerController`) and function (`createManualTimerController`), checkpointer factory (`createInMemoryCheckpointer`), and human-gate executor (`createHumanGateNodeExecutor`, `HumanGateNodeExecutorDeps`). Updates `execution-model.md` to specify human-gate full decision lifecycle (emit `human_gate:resumed`, continue run, checkpoint-idempotent resolution, allow parallel pending gates), expand timeout behavior (one-shot timer from injected clock, `reject` vs `approve` differ in run-timeout failure vs auto-resolve, first-arriving decision disarms). Updates `shared-core-engine.md` with detailed deterministic `reconstructCheckpointState` event-fold model, clarify derived state contents and exclusions (`ctx.*` not reconstructed), specify `structuredClone` requirement for checkpoint boundaries, expand gate-resume semantics (in-process vs restart paths, workflow identity guard with definition snapshot, idempotent decision re-delivery, Phase-2 store uniqueness constraint for concurrency race closure).
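
To make the "fold over persisted events" idea in the checkpoint row concrete, here is a minimal sketch. The event and state shapes (`SketchEvent`, `SketchState`) and the function name `foldEvents` are simplified, hypothetical stand-ins and not the actual types in `checkpoint.ts`; the sketch only mirrors the behaviors named in the summary: `undefined` without `run:started`, started-only nodes omitted so they re-run, and a resumed gate moving from pending to resolved.

```typescript
// Simplified, hypothetical event shapes; the real RunEvent union has more members and fields.
type SketchEvent =
  | { type: 'run:started'; runId: string; sequenceNumber: number }
  | { type: 'node:started'; nodeId: string; sequenceNumber: number }
  | { type: 'node:completed'; nodeId: string; sequenceNumber: number }
  | { type: 'node:skipped'; nodeId: string; sequenceNumber: number }
  | { type: 'human_gate:paused'; nodeId: string; gateId: string; sequenceNumber: number }
  | { type: 'human_gate:resumed'; nodeId: string; gateId: string; sequenceNumber: number };

interface SketchState {
  runId: string;
  status: 'running' | 'paused';
  nodes: Map<string, 'completed' | 'skipped' | 'paused'>;
  pendingGates: Map<string, string>; // gateId -> nodeId
  resolvedGateIds: Set<string>;
  lastSequenceNumber: number;
}

function foldEvents(events: readonly SketchEvent[]): SketchState | undefined {
  let state: SketchState | undefined;
  for (const event of events) {
    if (event.type === 'run:started') {
      state = {
        runId: event.runId,
        status: 'running',
        nodes: new Map(),
        pendingGates: new Map(),
        resolvedGateIds: new Set(),
        lastSequenceNumber: event.sequenceNumber,
      };
      continue;
    }
    if (state === undefined) continue; // nothing to fold into before run:started
    state.lastSequenceNumber = event.sequenceNumber;
    switch (event.type) {
      case 'node:started':
        break; // deliberately not recorded, so an unfinished node is re-run on resume
      case 'node:completed':
        state.nodes.set(event.nodeId, 'completed');
        break;
      case 'node:skipped':
        state.nodes.set(event.nodeId, 'skipped');
        break;
      case 'human_gate:paused':
        state.nodes.set(event.nodeId, 'paused');
        state.pendingGates.set(event.gateId, event.nodeId);
        state.status = 'paused';
        break;
      case 'human_gate:resumed':
        state.nodes.set(event.nodeId, 'completed');
        state.pendingGates.delete(event.gateId);
        state.resolvedGateIds.add(event.gateId);
        state.status = state.pendingGates.size > 0 ? 'paused' : 'running';
        break;
    }
  }
  return state;
}
```

Because the function only reads its input and builds a fresh state, replaying the same event list should always yield the same result, which is the property the resume path appears to rely on.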

Sequence Diagram(s)

sequenceDiagram
  rect rgba(70, 130, 180, 0.5)
    note over Caller, RunEventBus: Cross-process gate resumption via checkpoint
  end
  participant Caller
  participant Engine as WorkflowEngine
  participant Checkpointer
  participant Store as RunStore
  participant Exec as RunExecution
  participant Bus as RunEventBus
  Caller->>Engine: resumeFromCheckpoint({runId, workflow, gateId, decision})
  Engine->>Checkpointer: load(runId)
  Checkpointer->>Store: getEvents(runId)
  Store-->>Checkpointer: RunEvent[]
  Checkpointer->>Checkpointer: reconstructCheckpointState(events)
  Checkpointer-->>Engine: CheckpointState
  Engine->>Engine: enforce workflow identity guard
  alt run is terminal
    Engine-->>Caller: createClosedRunHandle(runId)
  else run paused at gate
    Engine->>Exec: new RunExecution(checkpoint)
    Exec->>Exec: `#seedFromCheckpoint` (vertices, gates, tallies)
    Exec->>Bus: seedSequence(lastSequenceNumber + 1)
    alt gateId already resolved
      Engine->>Exec: kick()
    else gateId pending
      Engine->>Exec: resume(gateId, decision)
      Exec->>Exec: mark gate completed, disarm timer
      Exec->>Store: persist human_gate:resumed
      Exec->>Exec: schedule next step
    end
    Engine-->>Caller: RunHandle (active)
  end
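
The branching in the diagram above can be sketched as a small pure function. This is an illustrative reading of the diagram only: `CheckpointView`, `ResumeArm`, `chooseResumeArm`, and the status strings are hypothetical names, and the real engine also validates the decision and rejects unknown gates inside `resume`, which this sketch leaves out.

```typescript
// Hypothetical status values; the real CheckpointState may model status differently.
type RunStatus = 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';
type ResumeArm = 'closed_handle' | 'kick' | 'apply_decision';

interface CheckpointView {
  status: RunStatus;
  resolvedGateIds: readonly string[];
  lastSequenceNumber: number;
}

function chooseResumeArm(checkpoint: CheckpointView, gateId: string): ResumeArm {
  const terminal =
    checkpoint.status === 'completed' ||
    checkpoint.status === 'failed' ||
    checkpoint.status === 'cancelled';
  if (terminal) return 'closed_handle'; // nothing is re-emitted or re-persisted
  if (checkpoint.resolvedGateIds.includes(gateId)) return 'kick'; // re-delivery: do not re-apply
  return 'apply_decision'; // gate still pending: apply the decision via resume
}

// The event bus is seeded so the first post-resume event continues the sequence without a gap.
function nextSequenceNumber(checkpoint: CheckpointView): number {
  return checkpoint.lastSequenceNumber + 1;
}
```

Keeping the arm selection separate from the side effects makes the idempotency cases easy to test in isolation.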

sequenceDiagram
  rect rgba(200, 150, 50, 0.5)
    note over RunExecution, RunStore: Gate timeout lifecycle with one-shot timer
  end
  participant RunExecution
  participant SetTimer as setTimer/ManualTimerController
  participant RunStore
  participant Scheduler
  RunExecution->>RunExecution: `#settlePaused` (gate node)
  RunExecution->>RunExecution: compute expiresAt from clock.now() + timeoutMs
  RunExecution->>SetTimer: setTimer(timeoutMs, onFire)
  SetTimer-->>RunExecution: disarm callback → store in `#gateTimers`
  RunExecution->>RunStore: persist human_gate:paused {timeoutAction, expiresAt}
  par Human decision arrives first
    RunExecution->>RunExecution: resume(gateId, decision)
    RunExecution->>RunExecution: `#disarmTimer`(gateId)
  and Timer fires
    SetTimer->>RunExecution: `#onGateTimeout`(gateId, timeoutAction)
    alt timeoutAction = approve
      RunExecution->>RunExecution: resolve gate, mark approved
      RunExecution->>Scheduler: schedule next step
    else timeoutAction = reject
      RunExecution->>RunExecution: `#failGateOnTimeout`
      RunExecution->>RunStore: persist run:failed (run_timeout)
    end
  end
  RunExecution->>RunExecution: on terminal: disarm/clear all `#gateTimers`
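
The one-shot timer seam in this diagram can be approximated with a manually driven controller for tests. The sketch below is an assumption-laden stand-in for `createManualTimerController`: the name `createManualTimers` and the exact member shapes are hypothetical, and only the described behavior is modeled (arm, disarm, fire each armed timer at most once, report the armed count).

```typescript
type Disarm = () => void;

interface ManualTimers {
  setTimer(ms: number, onFire: () => void): Disarm;
  fireTimers(): void;
  armedCount(): number;
}

function createManualTimers(): ManualTimers {
  const armed = new Set<() => void>();
  return {
    setTimer(_ms, onFire) {
      // Wrap the callback so two timers sharing one callback stay distinct entries.
      const entry = () => onFire();
      armed.add(entry);
      return () => {
        armed.delete(entry); // idempotent: deleting twice is harmless
      };
    },
    fireTimers() {
      // Snapshot first: a fired callback may arm or disarm other timers.
      const snapshot = Array.from(armed);
      for (const entry of snapshot) {
        // delete() returns false if an earlier callback already disarmed this timer.
        if (armed.delete(entry)) entry();
      }
    },
    armedCount: () => armed.size,
  };
}
```

A test can arm a gate timer, deliver the human decision (which calls the disarm callback), and then confirm that a later `fireTimers()` sweep does nothing for that gate.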

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

HodeTech/Relavium#20: Both PRs modify the node-handler dispatch wiring to extend supported node types—main PR adds human_in_the_loop/humanGate while the retrieved PR introduced the standard per-type handler composition framework.
HodeTech/Relavium#17: Both PRs extend the core engine substrate around ExecutionHost contracts and run-loop control flow—main PR's checkpoint/resume and gate-timeout additions build on the run-loop foundation introduced in retrieved PR.

Poem

🐇 Hop, hop! The gate swings wide or closes tight,
A timer ticks—approve or reject by night.
From frozen events, the state is rebuilt anew,
node:skipped now echoes with branch_not_taken too.
Cross-process or in-mem, the run finds its way,
This bunny checkpointed every carrot today! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.59% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the two main features: checkpoint/resume (1.R) and human-gate suspend/resume with timeout (1.Q), aligning precisely with the PR objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

In resumeFromCheckpoint, the ability to pass new inputs/executionMode/planOptions for an already-started run could diverge the rehydrated execution from the original run:started state; consider either ignoring these in favor of checkpointed values (once available) or enforcing/clarifying the intended invariants so callers can’t accidentally change execution characteristics on resume.
The skip-propagation reason in #skipReason is determined by the first condition dependency encountered; if a node has multiple upstream conditions and mixed reasons, consider making the selection rule explicit (e.g. prefer branch_not_taken only when all relevant deps are conditions) or documenting this precedence to avoid surprising node:skipped.reason values.
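
One way to make the selection rule order-independent, as the second point suggests, is to compute the reason from the full set of dead upstream dependencies instead of the first one encountered. The sketch below is hypothetical: `pickSkipReason` is not an engine function, and it assumes each dead dependency contributes its own reason. It lets `branch_not_taken` win whenever present; the stricter "only when all relevant deps are conditions" variant would use `every` in place of `includes`.

```typescript
type NodeSkippedReason = 'branch_not_taken' | 'upstream_unreachable';

// Hypothetical helper: derive one reason from the reasons of all dead upstream dependencies.
function pickSkipReason(reasonsFromDeadDeps: readonly NodeSkippedReason[]): NodeSkippedReason {
  return reasonsFromDeadDeps.includes('branch_not_taken')
    ? 'branch_not_taken'
    : 'upstream_unreachable';
}
```

With a rule of this shape, the reported `node:skipped.reason` no longer depends on dependency iteration order.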

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="packages/core/src/engine/engine.ts" line_range="304-305" />
<code_context>
+
+  /** Prepare a checkpoint-seeded run to resume — set the lifecycle clock. State was seeded in the
+   *  constructor; NO `run:started` is re-emitted (it is already in the persisted log). */
+  prepareResume(): void {
+    this.#startEpochMs = Date.parse(this.#host.clock.now());
+  }
+
</code_context>
<issue_to_address>
**question (bug_risk):** Resumed runs lose pre-crash wall-clock duration in `run:completed.durationMs`.

In `prepareResume` you set `#startEpochMs` to `clock.now()`, so `durationMs` for terminal events only covers time after resume. If `durationMs` is expected to represent total wall-clock run time, this will under-report resumed runs. Consider deriving `#startEpochMs` from the original `run:started.timestamp` stored in the checkpoint, or, if the new behavior is intentional, verify that downstream consumers don’t assume `durationMs` is total duration across resumes.
</issue_to_address>

### Comment 2
<location path="packages/core/src/engine/engine.ts" line_range="1096-1097" />
<code_context>
+      executor: this.#executor,
+      bus,
+      capacity: this.#capacity,
+      onSettled: () => {
+        /* retained like a started run (see start) */
+      },
+      checkpoint,
</code_context>
<issue_to_address>
**issue (bug_risk):** `resumeFromCheckpoint` executions are never evicted from `#runs`, which can leak memory.

`start` wires `onSettled` to remove executions from `#runs`, but `resumeFromCheckpoint` uses a no-op instead. This leaves resumed runs in the map indefinitely in long‑lived processes. Unless there’s a specific need to retain them, consider reusing the same `onSettled` handler as `start` so resumed runs are also removed from `#runs` on completion.
</issue_to_address>

sourcery-ai · 2026-06-14T20:38:14Z

+      onSettled: () => {
+        /* retained like a started run (see start) */


issue (bug_risk): resumeFromCheckpoint executions are never evicted from #runs, which can leak memory.

start wires onSettled to remove executions from #runs, but resumeFromCheckpoint uses a no-op instead. This leaves resumed runs in the map indefinitely in long‑lived processes. Unless there’s a specific need to retain them, consider reusing the same onSettled handler as start so resumed runs are also removed from #runs on completion.

gemini-code-assist

Code Review

This pull request implements the Checkpoint/Resume (1.R) and Human Gate Timeout (1.Q) execution model features, allowing runs to resume from a prior process using event-log reconstruction and introducing one-shot timers for gate timeouts. It also adds a new node:skipped event to ensure a complete, replayable event log. The review feedback highlights three key areas for improvement: a potential memory state leak in resumeFromCheckpoint if the resume execution throws an error, a map mutation safety issue during event iteration in reconstructCheckpointState, and a bug in the human gate handler where an abort signal during template resolution could be incorrectly caught and reported as a fatal validation failure instead of a clean cancellation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T20:38:21Z

+    this.#runs.set(input.runId, execution);
+    if (checkpoint.resolvedGateIds.includes(input.gateId)) {
+      // The gate was already resolved in the prior process (double-delivery); do not re-apply the
+      // decision — just drive any unfinished downstream work (or re-pause on a remaining gate).
+      execution.kick();
+    } else {
+      // Apply the decision + drive the loop (events buffer on the handle for the returned consumer).
+      await execution.resume(input.gateId, parsed.data);
+    }
+    return execution.handle;


If execution.resume throws an error (e.g., due to an unknown_gate or invalid_decision error), the run remains registered in this.#runs. Because of this, any subsequent attempt to call resumeFromCheckpoint for this run will fail with a run_already_active error, leaving the run permanently stuck in memory in a broken state. Wrapping the execution in a try...catch block to clean up this.#runs on failure prevents this state leak.

Suggested change

     this.#runs.set(input.runId, execution);
-    if (checkpoint.resolvedGateIds.includes(input.gateId)) {
-      // The gate was already resolved in the prior process (double-delivery); do not re-apply the
-      // decision — just drive any unfinished downstream work (or re-pause on a remaining gate).
-      execution.kick();
-    } else {
-      // Apply the decision + drive the loop (events buffer on the handle for the returned consumer).
-      await execution.resume(input.gateId, parsed.data);
-    }
+    try {
+      if (checkpoint.resolvedGateIds.includes(input.gateId)) {
+        // The gate was already resolved in the prior process (double-delivery); do not re-apply the
+        // decision — just drive any unfinished downstream work (or re-pause on a remaining gate).
+        execution.kick();
+      } else {
+        // Apply the decision + drive the loop (events buffer on the handle for the returned consumer).
+        await execution.resume(input.gateId, parsed.data);
+      }
+    } catch (error) {
+      this.#runs.delete(input.runId);
+      throw error;
+    }
     return execution.handle;

gemini-code-assist · 2026-06-14T20:38:21Z

+        for (const [gateId, nodeId] of pendingGates) {
+          if (nodeId === event.nodeId) {
+            pendingGates.delete(gateId);
+            resolvedGateIds.add(gateId);
+          }
+        }


Mutating a Map (pendingGates) while iterating over it using a for...of loop can lead to subtle bugs or maintainability issues if the iteration logic is refactored. It is safer and more declarative to collect the keys to resolve first, and then perform the deletions.

Suggested change

-        for (const [gateId, nodeId] of pendingGates) {
-          if (nodeId === event.nodeId) {
-            pendingGates.delete(gateId);
-            resolvedGateIds.add(gateId);
-          }
-        }
+        const toResolve: string[] = [];
+        for (const [gateId, nodeId] of pendingGates) {
+          if (nodeId === event.nodeId) {
+            toResolve.push(gateId);
+          }
+        }
+        for (const gateId of toResolve) {
+          pendingGates.delete(gateId);
+          resolvedGateIds.add(gateId);
+        }

gemini-code-assist · 2026-06-14T20:38:22Z

+  } catch (err) {
+    // An interpolation failure is an authoring/data fault, not a transient one — fatal `validation`,
+    // matching the agent handler's prompt-resolution failure mapping (agent-runner.ts).
+    return failed(
+      'validation',
+      err instanceof Error ? err.message : 'gate template interpolation failed',
+      false,
+    );
+  }


If ctx.signal is aborted while resolveTemplate is executing (e.g., during a slow file read or network call), resolveTemplate will throw an error. The catch block will intercept this and return a fatal validation failure instead of a clean cancelled() outcome. Checking ctx.signal.aborted inside the catch block ensures cancellation is handled correctly.

  } catch (err) {
    if (ctx.signal.aborted) {
      return cancelled();
    }
    // An interpolation failure is an authoring/data fault, not a transient one — fatal `validation`,
    // matching the agent handler's prompt-resolution failure mapping (agent-runner.ts).
    return failed(
      'validation',
      err instanceof Error ? err.message : 'gate template interpolation failed',
      false,
    );
  }

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/shared/src/run-event.ts (1)

235-249: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce timeoutAction only when a timeout exists.

HumanGatePausedEventSchema currently accepts timeoutAction without timeoutMs, which creates an invalid persisted state for pause/timeout resume semantics.

Suggested schema guard

-export const HumanGatePausedEventSchema = z.object({
+export const HumanGatePausedEventSchema = z.object({
   type: z.literal('human_gate:paused'),
   ...runBase,
   nodeId: nonEmptyString,
   gateId: nonEmptyString,
   gateType: GateTypeSchema,
   message: z.string(),
   assignee: z.string().optional(),
   timeoutMs: nonNegativeInt.optional(),
   timeoutAction: TimeoutActionSchema.optional(),
   expiresAt: z.string().datetime({ offset: true }).optional(),
-});
+}).superRefine((event, ctx) => {
+  if (event.timeoutAction !== undefined && event.timeoutMs === undefined) {
+    ctx.addIssue({
+      code: z.ZodIssueCode.custom,
+      path: ['timeoutAction'],
+      message: 'timeoutAction requires timeoutMs',
+    });
+  }
+});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/shared/src/run-event.ts` around lines 235 - 249, The
HumanGatePausedEventSchema currently allows timeoutAction to be present without
timeoutMs, which violates the intended pause/timeout resume semantics where
timeoutAction should only exist when a timeout is configured. Add a conditional
validation constraint to the HumanGatePausedEventSchema object using Zod's
refine or superRefine method to ensure that if timeoutAction is provided,
timeoutMs must also be present, preventing invalid persisted states.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/engine/checkpoint.ts`:
- Around line 80-181: The `reconstructCheckpointState` function exceeds the
cognitive complexity threshold (19 > 15) due to its large switch statement
handling multiple event types. Extract the event-application logic into separate
helper functions organized by category: one for handling run events
(run:started, run:paused, run:completed, run:failed, run:cancelled), one for
node events (node:completed, node:failed, node:skipped), one for gate events
(human_gate:paused, human_gate:resumed), and one for accounting (cost:updated).
Call these helpers from the main loop instead of inline switch cases, preserving
all current behavior and state mutations.

In `@packages/core/src/engine/engine.ts`:
- Around line 302-306: The prepareResume() method currently reinitializes the
`#startEpochMs` field to the current host time, which causes the run duration to
be reset on resume rather than continuing from the original start. Modify the
checkpoint serialization to persist the original `#startEpochMs` value (or the
accumulated elapsed duration) when creating a checkpoint, and update the
prepareResume() method to restore that persisted value instead of calling
Date.parse(this.#host.clock.now()). This ensures that run:completed.durationMs
accurately reflects the total elapsed time across the entire run including both
the pre-resume and post-resume segments.
- Around line 1102-1110: The execution is registered in this.#runs at line 1102
before validation occurs in execution.resume() at line 1109. If the resume call
throws a validation error, the half-initialized execution remains in `#runs`,
causing subsequent retries to fail with run_already_active instead of the
original error. Move the this.#runs.set(input.runId, execution) registration to
after both the execution.kick() path (for already-resolved gates) and the
execution.resume() path have completed successfully, or alternatively wrap the
entire resume/kick logic in a try-catch that deletes the execution from `#runs`
before rethrowing any validation errors.

In `@packages/core/src/engine/node-handlers/human-gate.ts`:
- Around line 60-67: In the catch block handling errors from resolveTemplate in
the human-gate.ts file, add logic to distinguish abort errors from other
interpolation failures. First, import the InterpolationError class at the top of
the file. Then, in the catch block (around lines 60-67), add a conditional check
before the existing failed call: if the caught error is an instance of
InterpolationError and its code property equals 'aborted', return cancelled() to
properly indicate the abort status; otherwise, proceed with the existing
failed('validation') logic for other interpolation errors. Additionally, add a
test case that verifies abort signals during template resolution are correctly
handled by returning cancelled() instead of failed().

In `@packages/shared/src/run-event.ts`:
- Around line 205-208: The `selected` field in the `node:completed` schema
currently allows empty arrays via `z.array(nonEmptyString).optional()`, which
creates an ambiguous branch outcome state that should not be permitted. Modify
the schema validation for the `selected` field to ensure that when the array is
present, it must contain at least one element. Use the `.min(1)` method on the
array validation to enforce that empty arrays are rejected, or alternatively add
a `.refine()` constraint that validates the array is non-empty when it is
defined.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec0189e8-fe90-45ed-a3a2-cf9962a20d5a

📥 Commits

Reviewing files that changed from the base of the PR and between 0a0019b and f912ce0.

📒 Files selected for processing (22)

docs/architecture/execution-model.md
docs/architecture/shared-core-engine.md
docs/reference/contracts/sse-event-schema.md
packages/core/src/engine/checkpoint.test.ts
packages/core/src/engine/checkpoint.ts
packages/core/src/engine/engine.test.ts
packages/core/src/engine/engine.ts
packages/core/src/engine/errors.ts
packages/core/src/engine/event-bus.ts
packages/core/src/engine/execution-host.test.ts
packages/core/src/engine/execution-host.ts
packages/core/src/engine/node-executor.ts
packages/core/src/engine/node-handlers/dispatcher.ts
packages/core/src/engine/node-handlers/human-gate.test.ts
packages/core/src/engine/node-handlers/human-gate.ts
packages/core/src/engine/node-handlers/node-handlers.e2e.test.ts
packages/core/src/engine/node-handlers/node-handlers.test.ts
packages/core/src/engine/run-handle.ts
packages/core/src/index.ts
packages/shared/src/constants.ts
packages/shared/src/run-event.test.ts
packages/shared/src/run-event.ts

coderabbitai · 2026-06-14T20:45:42Z

+  // The immediate downstream ids a `condition` kept live (its branch selection). Present ONLY for a
+  // condition's branch outcome — it is the authoritative record checkpoint/resume (1.R) reconstructs
+  // `selectedTargets` from, so a selected branch that was mid-flight at a crash re-runs (not skipped).
+  selected: z.array(nonEmptyString).optional(),


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject empty selected arrays in node:completed.

The branch-selection field should be non-empty when present; allowing selected: [] admits an impossible/ambiguous branch outcome.

Suggested tightening

-  selected: z.array(nonEmptyString).optional(),
+  selected: z.array(nonEmptyString).min(1).optional(),

Verified each finding against current code; fixed the still-valid ones, reverted one that contradicts engine semantics, skipped two with reasons.

Fixed:
- checkpoint.ts: reduce reconstructCheckpointState cognitive complexity (19→under 15) by extracting per-category appliers (applyRunEvent / applyNodeEvent / applyGateEvent) over a shared accumulator; behavior is identical. The gate-resolve arm now collects gate ids first, then deletes (no mutate-while-iterating the Map).
- Resumed-run durationMs: the checkpoint now carries the original start epoch (`startedAtMs`, from run:started.timestamp); a rehydrated run measures durationMs from it (seeded in #seedFromCheckpoint), so a terminal event reports total wall-clock time across the pre- and post-resume segments, not just the post-resume segment. prepareResume removed.
- resumeFromCheckpoint: wrap resume()/kick() in a try/catch that deletes the run from #runs on a validation throw (unknown_gate / run_not_paused), so a retry isn't wrongly rejected with run_already_active and no broken run is stranded in memory.
- human-gate.ts: an abort DURING template resolution now returns cancelled() (a deliberate fatal reason) rather than failed('validation'); checked via ctx.signal.aborted in the catch; +unit test.
- run-event.ts: HumanGatePausedEvent: timeoutAction is now refused without timeoutMs (union-level superRefine; a discriminatedUnion member can't self-refine).
- engine.ts #settle: disarm gate timers via values()+clear() (no array spread); document #skipReason precedence (branch_not_taken wins over upstream_unreachable); document the ResumeFromCheckpointInput invariant (caller passes the original inputs/executionMode until the checkpoint persists them).
- execution-host.ts fireTimers: snapshot the armed set as a named array (keeps the required snapshot, since a fired callback may arm/disarm timers, without the inline spread Sonar flags).

Reverted / skipped (with reason):
- selected .min(1) (REVERTED): an empty `selected` is a VALID outcome: a condition that routes to no branch, which the engine skip-propagates downstream (engine.ts #hasLiveEdge); .min(1) would reject that legitimate node:completed.
- onSettled "#runs leak" on resume (SKIP): start() also retains settled runs via a no-op onSettled by design (for run_already_terminal reporting; TTL prune is future scope), so resumeFromCheckpoint is consistent, not divergent.
- execution-host for-of "unnecessary array" (addressed, not removed): the snapshot is load-bearing; restructured to a named array rather than dropped.

Refs: ADR-0003, ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
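For readers following the resumeFromCheckpoint fix in the commit message above, here is a minimal sketch of the register, validate, and clean-up-on-throw pattern it describes. The class `SketchEngine`, its method signature, and the `isTracked` helper are hypothetical and exist only for illustration; only the error codes (`unknown_gate`, `run_already_active`) come from the commit message, and the real engine.ts will differ.

```typescript
// Hypothetical stand-in for the engine's in-memory run registry (#runs in the PR).
type SketchRunStatus = 'paused' | 'running';

class SketchEngine {
  private readonly runs = new Map<string, SketchRunStatus>();

  // Assumed signature: the real method takes a richer input object.
  resumeFromCheckpoint(runId: string, pendingGateIds: readonly string[], gateId: string): void {
    if (this.runs.has(runId)) {
      throw new Error('run_already_active');
    }
    // Register the rehydrated run before validating, as the engine would.
    this.runs.set(runId, 'paused');
    try {
      // Validation step: the gate being resolved must be pending.
      if (pendingGateIds.indexOf(gateId) === -1) {
        throw new Error('unknown_gate');
      }
      this.runs.set(runId, 'running');
    } catch (err) {
      // Without this delete, the failed attempt would stay registered and a
      // corrected retry would be rejected with run_already_active.
      this.runs.delete(runId);
      throw err;
    }
  }

  isTracked(runId: string): boolean {
    return this.runs.has(runId);
  }
}
```

In this sketch a failed resume leaves no trace in the registry, so the caller can retry with a corrected gate id.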

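The fireTimers item in the commit message above calls the snapshot "load-bearing" because a fired callback may arm or disarm other timers. The sketch below illustrates why, under stated assumptions: `SketchTimers` is a hypothetical class, and the one-shot and skip-if-disarmed behaviors are assumptions for illustration rather than a description of execution-host.ts.

```typescript
type TimerCallback = () => void;

// Hypothetical manual timer controller; not the PR's createManualTimerController.
class SketchTimers {
  private readonly armed = new Set<TimerCallback>();

  arm(cb: TimerCallback): void {
    this.armed.add(cb);
  }

  disarm(cb: TimerCallback): void {
    this.armed.delete(cb);
  }

  get armedCount(): number {
    return this.armed.size;
  }

  // Fires every timer that was armed when the call began, and returns the count.
  fireTimers(): number {
    // Named snapshot: iterating the live Set would also visit timers that a
    // callback arms during this pass.
    const snapshot: TimerCallback[] = Array.from(this.armed);
    let fired = 0;
    for (const cb of snapshot) {
      // delete() returns false when an earlier callback already disarmed this
      // timer; otherwise it removes the timer (assumed one-shot) before firing.
      if (!this.armed.delete(cb)) continue;
      cb();
      fired += 1;
    }
    return fired;
  }
}
```

A timer armed from inside a callback waits for the next fireTimers call, and a timer disarmed from inside a callback is never fired in the current pass.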
cemililik · 2026-06-14T21:48:03Z

@coderabbitai review all

coderabbitai · 2026-06-14T21:48:13Z

@cemililik Sure! I'll kick off a full review of all the changes in this PR right away.

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/architecture/shared-core-engine.md (1)
167-179: ⚠️ Potential issue | 🔴 Critical

Remove step_executions from the checkpoint reconstruction description.

Lines 168–169 incorrectly list step_executions rows as an input to checkpoint reconstruction. The code is definitive: reconstructCheckpointState takes only run_events (see packages/core/src/engine/checkpoint.ts:176 and execution-host.ts:308). The header comment in checkpoint.ts explicitly states "There is no checkpoint table — the CheckpointState is derived from the ordered run_events." Rewrite lines 168–169 to name run_events alone as the source, and keep lines 172–173's "pure fold" statement. step_executions is separate run-history metadata, not part of checkpoint reconstruction.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/architecture/shared-core-engine.md` around lines 167 - 179, The
checkpoint reconstruction description incorrectly includes step_executions rows
as input data when in fact only run_events should be listed as the source.
Remove the references to step_executions (status, attempt_number, output_json,
error_json) from lines 168–169 and rewrite that sentence to state that the
Checkpointer reconstructs CheckpointState solely from the ordered, replayable
run_events log. Keep the description of the messages field and preserve the
subsequent explanation of reconstructCheckpointState as a pure fold operation
over the ordered event stream that derives CheckpointState fields. This aligns
the documentation with the actual code implementation where
reconstructCheckpointState takes only the events parameter, not step_executions
data.
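To make the "fold over run_events alone" point concrete, here is a minimal sketch of a checkpoint reconstruction that takes only an ordered event list. Everything in it is an assumption for illustration: the trimmed event shapes, the `gate:paused` / `gate:resumed` type names, the `SketchState` fields, and returning `undefined` for an empty stream. The actual shape lives in checkpoint.ts and is not reproduced here.

```typescript
// Hypothetical, trimmed event shapes; the real schema lives in run-event.ts.
type SketchEvent =
  | { type: 'run:started'; sequenceNumber: number; timestamp: number }
  | { type: 'node:completed'; sequenceNumber: number; nodeId: string; output: unknown }
  | { type: 'gate:paused'; sequenceNumber: number; nodeId: string; gateId: string }
  | { type: 'gate:resumed'; sequenceNumber: number; gateId: string; decision: string };

interface SketchState {
  status: 'running' | 'paused';
  startedAtMs: number | undefined;
  lastSequenceNumber: number;
  nodeOutputs: Map<string, unknown>;
  pendingGates: Map<string, string>; // gateId -> nodeId
  resolvedGateIds: Set<string>;
}

// One step of the fold: applies a single event to the accumulator.
function applyEvent(state: SketchState, event: SketchEvent): void {
  state.lastSequenceNumber = event.sequenceNumber;
  switch (event.type) {
    case 'run:started':
      state.startedAtMs = event.timestamp;
      break;
    case 'node:completed':
      state.nodeOutputs.set(event.nodeId, event.output);
      break;
    case 'gate:paused':
      state.pendingGates.set(event.gateId, event.nodeId);
      state.status = 'paused';
      break;
    case 'gate:resumed': {
      const nodeId = state.pendingGates.get(event.gateId);
      if (nodeId !== undefined) {
        // The gate stops being pending and its decision becomes the node output.
        state.pendingGates.delete(event.gateId);
        state.nodeOutputs.set(nodeId, event.decision);
      }
      // Remembering the id is what lets a re-delivered resolve be detected.
      state.resolvedGateIds.add(event.gateId);
      if (state.pendingGates.size === 0) state.status = 'running';
      break;
    }
  }
}

// Takes only the ordered events; no other table or store is consulted.
function reconstructSketch(events: readonly SketchEvent[]): SketchState | undefined {
  if (events.length === 0) return undefined;
  const state: SketchState = {
    status: 'running',
    startedAtMs: undefined,
    lastSequenceNumber: 0,
    nodeOutputs: new Map<string, unknown>(),
    pendingGates: new Map<string, string>(),
    resolvedGateIds: new Set<string>(),
  };
  for (const event of events) applyEvent(state, event);
  return state;
}
```

Because the function reads nothing but its argument and builds a fresh accumulator on every call, replaying the same events always yields the same state, which is the property the documentation fix is meant to preserve.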

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference/contracts/sse-event-schema.md`:
- Line 76: The `selected?` field documentation for the `node:completed` event at
line 76 in docs/reference/contracts/sse-event-schema.md currently implies it
always contains at least one target id, but a condition can route to no branch
making it an empty array. Update the description of the `selected?` field to
explicitly clarify that it can be an empty array. Apply the same clarification
to the other affected location at lines 122-123 in the same file where
`selected` is documented.

In `@packages/core/src/engine/checkpoint.test.ts`:
- Around line 103-127: The test for the resumed gate scenario in the `'a resumed
gate clears the pending gate + records the decision as the node output'` test
case is missing an assertion to verify that the gate id is correctly moved to
`resolvedGateIds` after the gate is resumed. Add an expect statement after the
existing assertions to verify that state?.resolvedGateIds includes the gate id
'g1' that was paused and then resumed, ensuring the gate tracking is correct for
idempotent re-delivery detection.

In `@packages/core/src/engine/node-handlers/human-gate.test.ts`:
- Line 1: The file human-gate.test.ts has code formatting violations detected by
Prettier. Run prettier --write on this file to automatically fix all formatting
issues according to the project's Prettier configuration.

In `@packages/core/src/index.ts`:
- Around line 93-116: The export statements in packages/core/src/index.ts are
not properly formatted according to prettier standards and are causing the CI
prettier --check to fail. Run prettier formatting on all export blocks in the
file at lines 93-116 (the StartInput, ResumeFromCheckpointInput, and related
engine exports), lines 124-126 (additional exports), and lines 169-170
(checkpoint-related exports) to ensure they all comply with the project's
formatting standards before merging.

---

Outside diff comments:
In `@docs/architecture/shared-core-engine.md`:
- Around line 167-179: The checkpoint reconstruction description incorrectly
includes step_executions rows as input data when in fact only run_events should
be listed as the source. Remove the references to step_executions (status,
attempt_number, output_json, error_json) from lines 168–169 and rewrite that
sentence to state that the Checkpointer reconstructs CheckpointState solely from
the ordered, replayable run_events log. Keep the description of the messages
field and preserve the subsequent explanation of reconstructCheckpointState as a
pure fold operation over the ordered event stream that derives CheckpointState
fields. This aligns the documentation with the actual code implementation where
reconstructCheckpointState takes only the events parameter, not step_executions
data.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf6bba9f-c688-44de-9840-91d2379b3df0

📥 Commits

Reviewing files that changed from the base of the PR and between 0a0019b and 012b2bb.

📒 Files selected for processing (22)

docs/architecture/execution-model.md
docs/architecture/shared-core-engine.md
docs/reference/contracts/sse-event-schema.md
packages/core/src/engine/checkpoint.test.ts
packages/core/src/engine/checkpoint.ts
packages/core/src/engine/engine.test.ts
packages/core/src/engine/engine.ts
packages/core/src/engine/errors.ts
packages/core/src/engine/event-bus.ts
packages/core/src/engine/execution-host.test.ts
packages/core/src/engine/execution-host.ts
packages/core/src/engine/node-executor.ts
packages/core/src/engine/node-handlers/dispatcher.ts
packages/core/src/engine/node-handlers/human-gate.test.ts
packages/core/src/engine/node-handlers/human-gate.ts
packages/core/src/engine/node-handlers/node-handlers.e2e.test.ts
packages/core/src/engine/node-handlers/node-handlers.test.ts
packages/core/src/engine/run-handle.ts
packages/core/src/index.ts
packages/shared/src/constants.ts
packages/shared/src/run-event.test.ts
packages/shared/src/run-event.ts

coderabbitai · 2026-06-14T21:58:57Z

+export type {
+  StartInput,
+  ResumeFromCheckpointInput,
+  WorkflowEngineDeps,
+} from './engine/engine.js';
 export { RunEventBus } from './engine/event-bus.js';
 export type { RunEventBusOptions, RunEventListener, RunEventDraft } from './engine/event-bus.js';
 export type { RunHandle } from './engine/run-handle.js';
 export {
  InMemoryRunStore,
  createInMemoryHost,
+  createInMemoryCheckpointer,
  createAbortController,
+  createManualTimerController,
 } from './engine/execution-host.js';
+// Checkpointer + resume (1.R) — reconstruct a run's state from its persisted event stream (no checkpoint
+// table; ADR-0003). The in-memory reference ships here; the SQLite/cloud one is Phase-2/CLI.
+export { reconstructCheckpointState, CHECKPOINT_SCHEMA_VERSION } from './engine/checkpoint.js';
+export type {
+  Checkpointer,
+  CheckpointState,
+  CheckpointNodeState,
+  CheckpointPendingGate,
+} from './engine/checkpoint.js';


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reformat the updated export surface.

CI is already failing prettier --check here, so the new export blocks need to be formatted before merge.

Also applies to: 124-126, 169-170

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/index.ts` around lines 93 - 116, The export statements in packages/core/src/index.ts are not properly formatted according to prettier standards and are causing the CI prettier --check to fail. Run prettier formatting on all export blocks in the file at lines 93-116 (the StartInput, ResumeFromCheckpointInput, and related engine exports), lines 124-126 (additional exports), and lines 169-170 (checkpoint-related exports) to ensure they all comply with the project's formatting standards before merging.

Source: Pipeline failures

…#22 review)

- Run Prettier on the four files the CI format:check flagged (engine.ts, engine.test.ts, index.ts, human-gate.test.ts); formatting only, no logic change.
- sse-event-schema.md: clarify node:completed.selected MAY be an empty array (a condition routing to no branch), matching the reverted .min(1) and the engine's skip-propagation, in both the event table and the interface block.
- shared-core-engine.md: align the checkpoint-reconstruction description with the implementation: CheckpointState is folded from the ordered run_events log alone (each node's output/error rides node:completed/node:failed); step_executions / messages are denormalized persistence for the run-trace UI, not inputs the fold requires (reconstructCheckpointState takes only events).
- checkpoint.test.ts: assert the resumed gate id moves into resolvedGateIds (the idempotent re-delivery guard), not just that pendingGates clears.

Refs: ADR-0003, ADR-0036
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
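The three possible values of `selected` are easy to confuse, so here is a small sketch of how a consumer might interpret them. The function `partitionTargets` and its return shape are hypothetical; treating an absent `selected` as "every outgoing edge is live" is an assumption about non-condition nodes, not something the commit message states.

```typescript
interface TargetPartition {
  live: string[];
  skipped: string[];
}

// Hypothetical helper: splits a node's outgoing targets by the `selected`
// field of its node:completed event.
function partitionTargets(
  outgoing: readonly string[],
  selected: readonly string[] | undefined,
): TargetPartition {
  // Absent: assumed to mean the node is not a condition, so nothing is dimmed.
  if (selected === undefined) {
    return { live: outgoing.slice(), skipped: [] };
  }
  // Present, possibly empty: only the chosen targets stay live. An empty
  // array is the valid "routes to no branch" outcome, so every target is skipped.
  const chosen = new Set<string>(selected);
  return {
    live: outgoing.filter((target) => chosen.has(target)),
    skipped: outgoing.filter((target) => !chosen.has(target)),
  };
}
```

Under this reading, `[]` and `undefined` carry opposite meanings, which is why a schema-level `.min(1)` would have rejected a legitimate event.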

sonarqubecloud · 2026-06-14T22:37:34Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Post-merge roadmap + status update now that checkpoint/resume (1.R) and the human gate (1.Q) have merged.

- phase-1-engine-and-llm.md: ✅ Done markers on §1.Q and §1.R; top status block records the PR #22 landing and the remaining 1.m4 lane (1.S, 1.AC).
- current.md: status narrative + next-workstream pointer advanced to node retry (1.S); last-updated 2026-06-15.
- CLAUDE.md: status paragraph + detailed status reflect 1.R/1.Q landed, 1.S next.
- deferred-tasks.md: re-point the now-landed-context items: the structuredClone `ctx`-transport obligation moves off 1.R (the checkpoint carries no resolved ctx) to the ctx-threading work; mid-tool-loop resume noted as Phase-2 (1.R resumes at gate boundaries only); the ctx-threading fold-into-1.Q/1.R window noted closed (now its own task). New "Checkpoint/resume + human gate (1.R/1.Q) follow-ups" section captures the three confirmed Phase-2 deferrals (gate-timer re-arm on rehydration, content-hash workflow-snapshot identity guard, cross-process gate-resolve TOCTOU → store-level uniqueness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cemililik and others added 6 commits June 14, 2026 20:41

sourcery-ai Bot reviewed Jun 14, 2026

gemini-code-assist Bot reviewed Jun 14, 2026

coderabbitai Bot reviewed Jun 14, 2026

cemililik merged commit 7013d49 into main Jun 14, 2026
7 checks passed

cemililik mentioned this pull request Jun 15, 2026

feat(core): thread workflow context (ctx.*) into node scope + post-merge roadmap #23

Merged

This was referenced Jun 15, 2026

feat(core): node-retry budget above the fallback chain (1.S, ADR-0040) #24

Merged

feat(llm,core,shared,db): 1.AG — media output generation (Phase D, ADR-0045/0046) #37

Merged
