feat(core): add workflow restart and crash recovery APIs by omeraplak · Pull Request #1098 · VoltAgent/voltagent

omeraplak · 2026-02-21T19:29:24Z

PR Checklist

Please check if your PR fulfills the following requirements:

The commit message follows our guidelines: https://voltagent.dev/docs/community/contributing/#commit-convention

Bugs / Features

Related issue(s) linked
Tests for the changes have been added
Docs have been added / updated
Changesets have been added https://voltagent.dev/docs/community/contributing/#creating-a-changeset

What is the current behavior?

Workflows support suspend/resume, but there is no workflow-level restart API for interrupted running executions after crashes/restarts. There is also no bulk restart helper for active runs.

What is the new behavior?

Adds restart/crash-recovery APIs and checkpoint persistence:

New workflow APIs

workflow.restart(executionId, options?)
workflow.restartAllActive(options?)
workflowChain.restart(executionId, options?)
workflowChain.restartAllActive(options?)

New registry helpers

WorkflowRegistry.restartWorkflowExecution(workflowId, executionId, options?)
WorkflowRegistry.restartAllActiveWorkflowRuns(options?)

Runtime changes

Persist running checkpoints during execution (step progress, stepData snapshots, workflowState, context, usage, event sequence) in workflow metadata.
Restore checkpoint data during restart to continue deterministically.

Docs

website/docs/workflows/overview.md
website/docs/workflows/suspend-resume.md

Validation

pnpm --filter @voltagent/core test:single src/workflow/core.spec.ts src/workflow/chain.spec.ts src/workflow/suspend-resume.spec.ts
pnpm --filter @voltagent/core typecheck
Smoke test via examples/with-workflow (programmatic):
- suspended execution
- checkpoint present
- workflow.restart(executionId) => completed

fixes (issue)

N/A

Notes for reviewers

This PR covers only the plan 02 implementation scope.
docs/workflow-parity-plans/ is intentionally not included.

Summary by cubic

Adds workflow restart and crash-recovery with periodic checkpoints. Checkpoints capture step outputs/status, stepData, usage, workflow state, context, and event sequence for deterministic restarts and bulk recovery.

New Features
- Restart one or all active executions via workflow.restart(...) and workflow.restartAllActive(...); also available on WorkflowChain.
- Registry helpers for cross-workflow recovery: restartWorkflowExecution(...) and restartAllActiveWorkflowRuns({ workflowId? }); returns restarted and failed summaries.
- Persist running checkpoints every N completed steps (checkpointInterval, default 1) or disable via disableCheckpointing; checkpoints include stepData snapshots, output/status, usage, and serialize/rehydrate step errors.
- Exported types: WorkflowRestartAllResult and WorkflowRestartCheckpoint.
Migration
- Restart is supported only for runs in "running" state.
- Prefer idempotent steps; external side effects may have already occurred before a crash.
- Persisted context is restored when tuple-serialized (Map-like entries); invalid shapes are ignored.

Written for commit 925c089. Summary will update on new commits.

Summary by CodeRabbit

New Features
- Added workflow restart APIs to recover interrupted runs from persisted running checkpoints (single-execution restart, restart-all-active per workflow, and registry-driven cross-workflow restarts). Checkpoints persist step progress, shared state, context and usage for deterministic recovery. Added runtime options to control checkpointing.
Documentation
- Added guides and examples for restarting interrupted runs and crash-recovery workflows.
Tests
- New tests covering restart flows, checkpoint restoration, and registry-driven restarts.

changeset-bot · 2026-02-21T19:29:29Z

🦋 Changeset detected

Latest commit: 925c089

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@voltagent/core	Minor

coderabbitai · 2026-02-21T19:29:46Z

Walkthrough

Adds deterministic restart and crash-recovery: new workflow.restart / restartAllActive APIs on Workflow, WorkflowChain, and WorkflowRegistry; persists running checkpoints (stepData, usage, step progress, event sequence, workflow state) and restores them to resume interrupted runs; exports new restart-related types and docs.

Changes

Cohort / File(s)	Summary
Public Types & Exports `packages/core/src/index.ts`, `packages/core/src/workflow/index.ts`, `packages/core/src/workflow/types.ts`	Adds `WorkflowRestartAllResult`, `WorkflowRestartCheckpoint`, `WorkflowStepData`, `WorkflowSerializedStepError`; exposes restart-related types and extends public Workflow interface with `restart` and `restartAllActive`.
Checkpoint Persistence Types `packages/core/src/memory/types.ts`, `packages/core/src/workflow/internal/state.ts`	Augments persisted suspension/checkpoint with `stepData` and `usage`; ensures `usage` is preserved in MutableWorkflowState transformations.
Core Restart Logic `packages/core/src/workflow/core.ts`	Implements checkpoint serialization/deserialization, persistRunningCheckpoint, resume restoration of usage/stepData/workflowState, and public `restart` / `restartAllActive` implementations; augments suspension/completion metadata and exports restart helpers.
Facade & Registry `packages/core/src/workflow/chain.ts`, `packages/core/src/workflow/registry.ts`	Adds `restart` / `restartAllActive` on WorkflowChain; adds `restartWorkflowExecution` and `restartAllActiveWorkflowRuns` on WorkflowRegistry with per-workflow aggregation and error reporting.
Internal Types Narrowing `packages/core/src/workflow/steps/types.ts`	InternalWorkflow now omits `restart` and `restartAllActive` from the overridable Workflow surface.
Tests `packages/core/src/workflow/core.spec.ts`, `packages/core/src/workflow/chain.spec.ts`, `packages/core/src/workflow/suspend-resume.spec.ts`	Adds suites validating restart from checkpoint, state/context/usage preservation, step re-execution, error rehydration, non-running restart failure, registry bulk restart behavior, and chain-level restart APIs.
Docs & Changelog `website/docs/workflows/overview.md`, `website/docs/workflows/suspend-resume.md`, `.changeset/old-geese-smell.md`	Documents restart/crash-recovery usage and examples; adds changelog entry and usage guidance.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant REG as WorkflowRegistry
    participant WF as Workflow
    participant MEM as MemoryStore

    Client->>REG: restartAllActiveWorkflowRuns()
    REG->>MEM: list active executions / metadata
    MEM-->>REG: checkpoints + workflow states
    REG->>WF: restart(executionId, options)
    WF->>MEM: get persisted checkpoint & state
    MEM-->>WF: return checkpoint (stepData, usage, indices)
    WF->>WF: restore state, apply usage & stepData
    WF->>WF: re-execute remaining steps
    WF->>MEM: persistRunningCheckpoint() (progress/update)
    WF-->>Client: WorkflowExecutionResult
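
The restore-then-continue part of the diagram can be sketched roughly as follows. This is a simplified, synchronous illustration, not the PR's implementation: `Step`, `Checkpoint`, `runFrom`, and the `persist` callback are hypothetical names, and real workflow steps are asynchronous.

```typescript
// Hypothetical, simplified shapes; not VoltAgent's real types.
type Step = { id: string; run: (input: number) => number };

interface Checkpoint {
  /** Number of steps that had completed when the checkpoint was written. */
  completedSteps: number;
  /** Output of the last completed step (or the original input if none ran). */
  lastOutput: number;
}

// Skip the steps the checkpoint marks as done, run the rest,
// and record a fresh checkpoint after each completed step.
function runFrom(
  allSteps: Step[],
  checkpoint: Checkpoint,
  persist: (cp: Checkpoint) => void,
): number {
  let value = checkpoint.lastOutput;
  allSteps.slice(checkpoint.completedSteps).forEach((step, offset) => {
    value = step.run(value);
    persist({ completedSteps: checkpoint.completedSteps + offset + 1, lastOutput: value });
  });
  return value;
}

const demoSteps: Step[] = [
  { id: "double", run: (n) => n * 2 },
  { id: "add-two", run: (n) => n + 2 },
];

// Simulate a crash after the first step: the stored checkpoint says one step is done,
// so only "add-two" runs on restart.
const savedCheckpoints: Checkpoint[] = [];
const restartOutput = runFrom(demoSteps, { completedSteps: 1, lastOutput: 10 }, (cp) => {
  savedCheckpoints.push(cp);
});
```

Because the first step is not re-run, any side effect it had before the crash is not repeated. A step that was in flight when the process died would run again, which is why the migration notes recommend idempotent steps.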

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

fix(core): persist workflow context mutations across steps #1078: Modifies workflow state persistence and context normalization—overlaps checkpoint/context restore logic added here.
fix(workflow): persist execution metadata and fix SQL metadata filters #1085: Changes workflow state persistence and related types—intersects with the new persisted stepData/usage fields and registry behavior.
fix(workflow): persist user context and harden resume state handling #1048: Adjusts suspend/resume persistence paths and tests that this PR extends with checkpointing and restart tests.

Suggested reviewers

lzj960515

Poem

🐇 I stored each hop in a silver thread,
When runs fell flat, I nudged them ahead,
Step by step, I stitch and mend,
Bounce back to finish, race to the end,
Hooray — checkpoints saved, and off we tread!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(core): add workflow restart and crash recovery APIs' is specific, concise, and directly reflects the main change—adding new workflow restart and crash-recovery APIs to the core package.
Description check	✅ Passed	The PR description comprehensively covers all template sections: checklist completed, new behavior clearly documented with API examples, testing and documentation additions confirmed, and changesets added. The description is well-structured and informative.

cubic-dev-ai

1 issue found across 15 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/workflow/core.ts">

<violation number="1" location="packages/core/src/workflow/core.ts:1155">
P2: `Error` objects do not survive JSON serialization — `JSON.stringify(new Error('x'))` produces `'{}'`. The checkpoint's `error` field should be serialized to a plain object (e.g., `{ message, stack }`) or `null` so that error context is preserved across restarts and the type contract is maintained after deserialization.</violation>
</file>
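
A minimal sketch of the serialization this finding asks for, assuming a plain `{ name, message, stack }` shape. `SerializedError`, `serializeError`, and `rehydrateError` are hypothetical names; the PR's own `WorkflowSerializedStepError` type may have different fields.

```typescript
// Hypothetical serialized shape, used here for illustration only.
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
}

// Convert an arbitrary thrown value into a JSON-safe object (or null).
function serializeError(error: unknown): SerializedError | null {
  if (error == null) return null;
  if (error instanceof Error) {
    const out: SerializedError = { name: error.name, message: error.message };
    if (error.stack !== undefined) out.stack = error.stack;
    return out;
  }
  return { name: "Error", message: String(error) };
}

// Rebuild an Error instance from the serialized shape after JSON.parse.
function rehydrateError(data: SerializedError | null): Error | null {
  if (data === null) return null;
  const error = new Error(data.message);
  error.name = data.name;
  if (data.stack !== undefined) error.stack = data.stack;
  return error;
}

// Error's message and stack are non-enumerable, so naive serialization drops them.
const naiveJson = JSON.stringify(new Error("boom"));

// Going through the plain-object shape keeps name and message across the round trip.
const roundTripped = rehydrateError(
  JSON.parse(JSON.stringify(serializeError(new TypeError("boom")))) as SerializedError | null,
);
```

The rehydrated value is a generic `Error` with its `name` restored, not an instance of the original subclass, so code that relies on `instanceof TypeError` would still need its own handling.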

cloudflare-workers-and-pages · 2026-02-21T19:35:23Z

Deploying voltagent with Cloudflare Pages

Latest commit:	`26bda80`
Status:	✅ Deploy successful!
Preview URL:	https://9ccdbb05.voltagent.pages.dev
Branch Preview URL:	https://feat-workflow-restart-crash.voltagent.pages.dev

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (10)

.changeset/old-geese-smell.md (1)
5-18: Consider surfacing newly exported public types in the changelog prose.

The AI summary notes that WorkflowRestartAllResult and WorkflowRestartCheckpoint are newly exported from the core package surface. Changelog consumers (library users) typically benefit from knowing about new types so they can adopt them in their own type annotations.
✏️ Suggested addition
 The workflow runtime now persists running checkpoints during execution, including step progress, shared workflow state, context, and usage snapshots, so interrupted runs in `running` state can be recovered deterministically.
+
+New public types exported: `WorkflowRestartAllResult`, `WorkflowRestartCheckpoint`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.changeset/old-geese-smell.md around lines 5 - 18, The changelog currently
lists new restart/crash-recovery APIs but omits mentioning newly exported public
types; update the changelog prose to explicitly surface the new exported types
WorkflowRestartAllResult and WorkflowRestartCheckpoint (and any other new public
exports from the core package) so users can adopt them in their type
annotations—add a short sentence in the changelog paragraph that names these
types and notes they are exported from the core package and available for
consumer use.
website/docs/workflows/overview.md (1)
561-576: Consider documenting that restart() is also available directly on WorkflowChain.

The snippet shows workflow.toWorkflow().restart(...), but WorkflowChain also exposes .restart() directly (as exercised in chain.spec.ts line 112). Mentioning both avoids confusion for users who work with chains.
📝 Suggested documentation note
+> **Note:** `restart()` and `restartAllActive()` are also available directly on a `WorkflowChain` returned by `createWorkflowChain`, not only on the result of `.toWorkflow()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/docs/workflows/overview.md` around lines 561 - 576, Add a short note
to this section explaining that restart() can be called directly on a
WorkflowChain as well as on a Workflow instance (so users don't think they must
call workflow.toWorkflow().restart()); mention the equivalent methods
(WorkflowChain.restart and workflow.toWorkflow().restart) and optionally include
a one-line example or pointer to
WorkflowRegistry.getInstance().restartAllActiveWorkflowRuns() for cross-workflow
recovery to make the available options explicit.
packages/core/src/memory/types.ts (1)
151-159: error?: unknown | null is redundant — null is already assignable to unknown.

unknown | null is semantically identical to unknown. The explicit | null doesn't add a type constraint and may mislead readers into thinking null and undefined are treated differently here.
🛠️ Suggested fix
-      error?: unknown | null;
+      error?: unknown;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/memory/types.ts` around lines 151 - 159, The error field in
the stepData type is declared as "error?: unknown | null", which is redundant
because null is already assignable to unknown; change the declaration on the
stepData entry so the error property is typed simply as "error?: unknown" (or
remove the explicit "| null") in the Record value type to avoid misleading
readers; update the definition around the stepData property and any related
types that reference this exact shape to keep consistency.
packages/core/src/workflow/core.spec.ts (1)
514-536: Extract __voltagent_restart_checkpoint into a named constant.

The magic string "__voltagent_restart_checkpoint" is spread across multiple test fixtures (lines 515 and 677). If the key name ever changes in the implementation, these fixtures will silently diverge. Extract it to a shared constant so both the implementation and the tests stay in sync.
#!/bin/bash
# Check if this key is already exported as a constant anywhere in the codebase
# -n prints line numbers; ripgrep searches recursively by default (-r would mean --replace)
rg -n '__voltagent_restart_checkpoint' --type ts
Also applies to: 677-699
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.spec.ts` around lines 514 - 536, Replace the
hard-coded magic string "__voltagent_restart_checkpoint" in the test fixtures
with a single exported constant (e.g., VOLTAGENT_RESTART_CHECKPOINT_KEY) defined
in the implementation module that owns the checkpoint logic; export that
constant and import it into packages/core/src/workflow/core.spec.ts, then update
all occurrences in the test (including the metadata keys at the shown blocks) to
use the imported constant so tests and implementation remain in sync if the key
changes.
packages/core/src/workflow/chain.spec.ts (1)
78-116: Consider adding a chain.restartAllActive() test.

The new describe.sequential("workflow.restart") only tests chain.restart(executionId). Per the PR description, workflowChain.restartAllActive(options?) is a new public API, but its direct invocation on a WorkflowChain instance isn't covered here — core.spec.ts tests restartAllActive() on a Workflow from createWorkflow, not on a WorkflowChain.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/chain.spec.ts` around lines 78 - 116, Add a test
in this describe.sequential block that exercises the new WorkflowChain API by
calling restartAllActive on the chain instance (e.g.,
workflow.restartAllActive(options?)) instead of only testing
restart(executionId); set up the same Memory/InMemoryStorageAdapter, register
the chain via WorkflowRegistry.registerWorkflow(workflow.toWorkflow()), seed a
running execution state (workflow id "chain-restart" like the existing test),
call workflow.restartAllActive() and assert the returned execution(s) have
status "completed" and the expected result { value: 12 } to mirror the existing
restart test but via restartAllActive on the WorkflowChain.
packages/core/src/workflow/registry.ts (1)
261-265: Return type allows null but workflow.restart() never returns null.

restartWorkflowExecution is typed as Promise<WorkflowExecutionResult<any, any> | null>, but it delegates to workflow.restart(...) which returns Promise<WorkflowExecutionResult<...>> (non-nullable) or throws. The | null is dead code and may mislead callers into adding unnecessary null checks.
Proposed fix
   public async restartWorkflowExecution(
     workflowId: string,
     executionId: string,
     options?: WorkflowRunOptions,
-  ): Promise<WorkflowExecutionResult<any, any> | null> {
+  ): Promise<WorkflowExecutionResult<any, any>> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/registry.ts` around lines 261 - 265, The return
type of restartWorkflowExecution incorrectly includes "| null" even though it
simply awaits and returns workflow.restart(...) which never returns null; update
the signature of restartWorkflowExecution to return
Promise<WorkflowExecutionResult<any, any>> (remove the nullable union) and
ensure any callers relying on a possible null are updated accordingly; locate
the method named restartWorkflowExecution and the delegation to
workflow.restart(...) to make this change.
packages/core/src/workflow/chain.ts (1)
974-1002: restart / restartAllActive recreate the workflow on each call — same caveat as run.

Both methods follow the existing pattern of calling createWorkflow() per invocation. This means each call gets a fresh in-memory store when no persistent memory is configured — so restart() won't find any prior state and will throw. This is the same limitation as run()/stream(), but it's worth noting that restart is inherently stateful and more sensitive to this.

Users who call chain.restart() instead of chain.toWorkflow().restart() with no persistent memory will always get "Workflow state not found". Consider adding a brief JSDoc note (like the toWorkflow() guidance for suspend/resume) warning that restart requires persistent memory or a registered workflow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/chain.ts` around lines 974 - 1002, Add a JSDoc
note to the restart and restartAllActive methods on the Chain class (the async
restart(...) and async restartAllActive(...) functions) warning that these
methods recreate a workflow via createWorkflow(...) on each call and therefore
require a persistent memory or a registered workflow to find prior state
(otherwise they will get "Workflow state not found"); mirror the existing
guidance used for toWorkflow() (suggest calling chain.toWorkflow().restart(...)
when using ephemeral in-memory stores) so users know to configure persistent
memory or register the workflow before invoking restart/restartAllActive.
packages/core/src/workflow/core.ts (2)
2222-2294: restartExecution implementation is thorough and well-guarded.

Good validation chain: checks state existence, workflow ID ownership, and "running" status before proceeding. The input fallback to workflow-start event is a nice compatibility touch for adapters that don't store input directly.
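
The validation chain described above might look roughly like this. It is a sketch under assumptions: `StoredState` and `assertRestartable` are hypothetical names, and apart from "Workflow state not found" (quoted elsewhere in this review) the error messages are invented for illustration.

```typescript
// Hypothetical minimal state shape for illustration.
interface StoredState {
  workflowId: string;
  status: "running" | "suspended" | "completed" | "error" | "cancelled";
}

// Throws unless the stored state exists, belongs to this workflow, and is still "running".
function assertRestartable(state: StoredState | undefined, workflowId: string): StoredState {
  if (!state) {
    throw new Error("Workflow state not found");
  }
  if (state.workflowId !== workflowId) {
    // Message text is an assumption, not the PR's wording.
    throw new Error(`Execution belongs to workflow "${state.workflowId}", not "${workflowId}"`);
  }
  if (state.status !== "running") {
    // Message text is an assumption, not the PR's wording.
    throw new Error(`Cannot restart execution in "${state.status}" state`);
  }
  return state;
}
```

Ordering the checks this way means a caller sees the most basic failure first: a missing state is reported before an ownership mismatch, and both before a status mismatch.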

One subtlety: the persistedState.context cast on line 2264 assumes it was stored as Array<[string | symbol, unknown]> (matching the serialization at line 989). This works for the current in-memory path, but external storage adapters that flatten or transform entries could break the new Map(...) reconstruction. Worth a defensive check.
Proposed defensive guard
     const persistedContext = persistedState.context
-      ? new Map(persistedState.context as Array<[string | symbol, unknown]>)
+      ? Array.isArray(persistedState.context)
+        ? new Map(persistedState.context as Array<[string | symbol, unknown]>)
+        : undefined
       : undefined;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.ts` around lines 2222 - 2294, The
persistedState.context reconstruction in restartExecution assumes
persistedState.context is an Array<[string|symbol, unknown]> and does new
Map(persistedState.context), which can throw or produce wrong maps for external
adapters that serialize/flatten context; guard this by validating
persistedState.context is an array of 2-length tuples (e.g., Array.isArray and
each item is an array of length 2 with a string/symbol key) before calling new
Map, otherwise set persistedContext to undefined (or reconstruct safely by
iterating and selectively adding valid entries); update the code around
persistedContext/new Map in restartExecution and ensure downstream use of
restartOptions.context tolerates undefined.
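
One way to write the stricter guard is a type predicate over each entry. This is a sketch, not the PR's code: `ContextEntry`, `isContextEntry`, and `restoreContext` are hypothetical names, and this variant rejects the whole value if any entry is malformed instead of keeping the valid entries.

```typescript
type ContextEntry = [string | symbol, unknown];

// True only for two-element arrays whose first element is a usable Map key.
function isContextEntry(value: unknown): value is ContextEntry {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    (typeof value[0] === "string" || typeof value[0] === "symbol")
  );
}

// Returns a Map when every entry is a valid [key, value] tuple; otherwise undefined,
// so callers fall back to "no persisted context" instead of throwing.
function restoreContext(persisted: unknown): Map<string | symbol, unknown> | undefined {
  if (!Array.isArray(persisted) || !persisted.every(isContextEntry)) {
    return undefined;
  }
  return new Map(persisted);
}
```

Note that symbol keys cannot survive a JSON round trip, so with a JSON-backed adapter only string keys would realistically reach this function.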
1160-1192: Checkpoint I/O on every step completion — consider the cost for long workflows.

persistRunningCheckpoint performs a read (mergeExecutionMetadata → getWorkflowState) plus a write (updateWorkflowState) after every step. For workflows with many small/fast steps or latency-sensitive storage backends, this doubles the memory-layer I/O per step.

This is a reasonable durability-vs-performance tradeoff for crash recovery, but you might want to offer a way to opt out or throttle (e.g., a checkpointInterval config on the workflow or per-execution options) for users who prefer speed over granular recovery.

Also applies to: 1875-1883
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.ts` around lines 1160 - 1192,
persistRunningCheckpoint currently does a metadata read via
mergeExecutionMetadata (which calls getWorkflowState) plus
executionMemory.updateWorkflowState on every step, causing double I/O per step;
add a configurable checkpointInterval or disableCheckpointing flag on the
workflow/execution options and use it in persistRunningCheckpoint (and the
similar checkpoint call elsewhere) to skip the merge+update on steps that are
not multiples of the interval (or when disabled), and when skipping avoid
calling mergeExecutionMetadata entirely; ensure you reference and read the new
option where persistRunningCheckpoint is invoked and before calling
mergeExecutionMetadata/updateWorkflowState so the behavior is gated by the new
config.
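
The gating suggested here could be as small as one predicate evaluated before any storage call. This is a sketch: `CheckpointOptions` and `shouldPersistCheckpoint` are hypothetical names, while the option names and the default interval of 1 follow the cubic summary earlier in this thread.

```typescript
// Hypothetical option bag; option names follow the PR summary.
interface CheckpointOptions {
  /** Persist a checkpoint every N completed steps (assumed default: 1). */
  checkpointInterval?: number;
  /** Skip running checkpoints entirely. */
  disableCheckpointing?: boolean;
}

// Decide whether to write a checkpoint after `completedSteps` steps have finished.
function shouldPersistCheckpoint(completedSteps: number, options: CheckpointOptions = {}): boolean {
  if (options.disableCheckpointing) return false;
  // Clamp to at least 1 so a zero or negative interval behaves like "every step".
  const interval = Math.max(1, Math.floor(options.checkpointInterval ?? 1));
  return completedSteps > 0 && completedSteps % interval === 0;
}

// With an interval of 3, only steps 3 and 6 of a six-step run trigger a write.
const persistedAt = [1, 2, 3, 4, 5, 6].filter((n) =>
  shouldPersistCheckpoint(n, { checkpointInterval: 3 }),
);
```

Checking the predicate first means skipped steps incur neither the metadata read nor the write. The tradeoff is that a crash can lose up to `checkpointInterval - 1` completed steps, which would then be re-executed on restart.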
packages/core/src/workflow/types.ts (1)
668-683: restartAllActive workflowId option is redundant on a single Workflow instance.

The Workflow interface already carries its own id. Passing { workflowId?: string } to restartAllActive at this level is only needed for the registry aggregation path. On the Workflow object itself the parameter is misleading — callers might pass a different workflow's ID and get an empty result without any error. Consider narrowing the public signature here to (options?: Omit<…, 'workflowId'>) and only expose the workflowId filter on the registry API, or at least document the behaviour when a mismatched ID is supplied.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/types.ts` around lines 668 - 683, The
restartAllActive signature on the Workflow interface exposes a redundant
workflowId filter; change the method on Workflow to remove the workflowId option
so it becomes restartAllActive(options?: {}) =>
Promise<WorkflowRestartAllResult> (or simply restartAllActive() =>
Promise<WorkflowRestartAllResult>), update the Workflow interface declaration in
types.ts accordingly, update any implementations of Workflow.restartAllActive to
stop expecting options.workflowId, and keep the original { workflowId?: string }
filter only on the registry-level API; also update the JSDoc comment to document
that this per-Workflow call operates on that specific Workflow's executions and
that cross-workflow filtering lives on the registry API.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 1147-1158: serializeStepDataSnapshot currently copies
stepData.error (Error | null) directly, which will be lost when persisted via
JSON; change serializeStepDataSnapshot to convert stepData.error into a
serializable shape (e.g., null or { message: string; stack?: string; name?:
string }) before returning, update the WorkflowStepData error type (or introduce
a separate serialized checkpoint type) to accept this shape, and ensure you
read/rehydrate that shape on restore from executionContext.stepData so the
message/stack are preserved across JSON round-trips.

In `@packages/core/src/workflow/registry.ts`:
- Around line 298-309: The catch block in the loop over this.workflows
incorrectly populates failed entries with executionId: workflowId; update the
failure shape to distinguish workflow-level failures by adding an optional
workflowId (or a boolean flag like isWorkflowFailure) to
WorkflowRestartAllResult.failed and push { workflowId, error: ...,
isWorkflowFailure: true } (instead of using executionId), or alternatively push
a distinct structure for workflow-level errors; update usages that consume
aggregate.failed to handle the new optional field/flag; touch the loop around
registeredWorkflow.workflow.restartAllActive and the
WorkflowRestartAllResult.failed type to implement this change.
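
A sketch of the failure shape proposed for the registry aggregation, using an optional `workflowId` to mark workflow-level failures. All names here (`RestartFailure`, `RestartAllResult`, `WorkflowOutcome`, `aggregateOutcomes`) are hypothetical; the real `WorkflowRestartAllResult` fields are not shown in this thread.

```typescript
// Hypothetical result shapes for illustration.
interface RestartFailure {
  executionId?: string; // set when a single execution failed to restart
  workflowId?: string; // set when a whole workflow's bulk restart call threw
  error: string;
}

interface RestartAllResult {
  restarted: string[];
  failed: RestartFailure[];
}

// Outcome of calling restartAllActive on one registered workflow.
type WorkflowOutcome =
  | { workflowId: string; ok: true; result: RestartAllResult }
  | { workflowId: string; ok: false; error: unknown };

// Merge per-workflow outcomes, keeping workflow-level failures distinguishable
// from execution-level ones instead of storing a workflow id in `executionId`.
function aggregateOutcomes(outcomes: WorkflowOutcome[]): RestartAllResult {
  const aggregate: RestartAllResult = { restarted: [], failed: [] };
  for (const outcome of outcomes) {
    if (outcome.ok) {
      aggregate.restarted.push(...outcome.result.restarted);
      aggregate.failed.push(...outcome.result.failed);
    } else {
      const message =
        outcome.error instanceof Error ? outcome.error.message : String(outcome.error);
      aggregate.failed.push({ workflowId: outcome.workflowId, error: message });
    }
  }
  return aggregate;
}
```

Consumers can then tell the two cases apart by checking which identifier is present on a failed entry.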

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/workflow/core.spec.ts`:
- Around line 819-826: The test is passing a plain object for the
WorkflowStateEntry.context (currently `context: { role: "admin" } as any`) which
violates the expected type Array<[string | symbol, unknown]>; update the call to
memory.setWorkflowState in the spec so the context is provided as the correct
array-of-tuples form (e.g., [['role', 'admin']]) or use a small typed helper
that returns Array<[string | symbol, unknown]> and reference that helper in the
test; ensure the change is applied where WorkflowStateEntry is constructed so
types align without using `as any`.
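
A small typed helper of the kind mentioned above might look like this. It is a sketch under assumptions: the tuple-array shape is the one quoted in the review for `WorkflowStateEntry.context`, while the alias `PersistedContext` and the helper `toPersistedContext` are hypothetical names, not part of the codebase.

```typescript
// Shape quoted in the review for WorkflowStateEntry.context; the alias name is illustrative.
type PersistedContext = Array<[string | symbol, unknown]>;

// Hypothetical test helper: converts a plain object into the tuple form,
// so the spec no longer needs an `as any` cast.
function toPersistedContext(values: Record<string, unknown>): PersistedContext {
  return Object.entries(values);
}

// The tuple form is also what the Map constructor accepts, so it round-trips cleanly.
const persisted = toPersistedContext({ role: "admin" });
const restored = new Map<string | symbol, unknown>(persisted);
```

In the spec this would replace the cast with something like `context: toPersistedContext({ role: "admin" })`.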

In `@packages/core/src/workflow/core.ts`:
- Around line 2392-2426: The restartOptions currently sets userId/conversationId
from persistedState but drops persistedState.metadata when there is no
checkpoint; update restartOptions to include merged metadata by taking
persistedState.metadata and shallow-merging options?.metadata (so caller
overrides take precedence), e.g. compute mergedMetadata = {
...(persistedState.metadata ?? {}), ...(options?.metadata ?? {}) } and assign
metadata: Object.keys(mergedMetadata).length ? mergedMetadata : undefined on
restartOptions; before assigning, remove the restart/checkpoint marker key from
mergedMetadata (the metadata key used to indicate restart state) so we don't
carry restart-specific data forward; adjust code around persistedState,
metadataUserId, metadataConversationId, and restartOptions accordingly.
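
The shallow merge described above can be sketched as follows. The function name `buildRestartMetadata` and the string value of the checkpoint key are assumptions: the thread refers to a constant `VOLTAGENT_RESTART_CHECKPOINT_KEY` but does not show its value, so a placeholder is used here.

```typescript
// Placeholder value: the real constant is VOLTAGENT_RESTART_CHECKPOINT_KEY,
// whose string value is not shown in this thread.
const RESTART_CHECKPOINT_KEY = "__restart_checkpoint__";

type Metadata = Record<string, unknown>;

// Hypothetical helper: persisted metadata first, caller overrides second,
// with the restart checkpoint entry stripped so it is not carried forward.
function buildRestartMetadata(
  persisted?: Metadata,
  override?: Metadata,
): Metadata | undefined {
  const merged: Metadata = { ...(persisted ?? {}), ...(override ?? {}) };
  delete merged[RESTART_CHECKPOINT_KEY];
  return Object.keys(merged).length > 0 ? merged : undefined;
}
```

Note that this merges key by key, so a caller who passes `metadata` still inherits any persisted keys it does not mention. Using `options?.metadata ?? persistedMetadata` instead would let a caller-supplied object replace the persisted metadata wholesale.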

coderabbitai · 2026-02-21T20:17:24Z

+    await memory.setWorkflowState(executionId, {
+      id: executionId,
+      workflowId: workflow.id,
+      workflowName: workflow.name,
+      status: "running",
+      input: { value: 9 },
+      context: { role: "admin" } as any,
+      createdAt: now,


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the specific code in question
sed -n '815,830p' packages/core/src/workflow/core.spec.ts

Repository: VoltAgent/voltagent

Length of output: 655

🏁 Script executed:

#!/bin/bash
# Inspect persisted workflow context type
rg -n "context" packages/core/src/memory/types.ts -C3

Repository: VoltAgent/voltagent

Length of output: 2034

🏁 Script executed:

#!/bin/bash
# Let's also look at the WorkflowState type definition
rg -n "WorkflowState\|interface.*State\|type.*State" packages/core/src/memory/types.ts -A5

Repository: VoltAgent/voltagent

Length of output: 45

🏁 Script executed:

#!/bin/bash
# Get exact lines 819-826
sed -n '819,826p' packages/core/src/workflow/core.spec.ts

Repository: VoltAgent/voltagent

Length of output: 455

🏁 Script executed:

#!/bin/bash
# Search for WorkflowState type definition
rg -n "interface WorkflowState\|type WorkflowState" packages/core/src -A10

Repository: VoltAgent/voltagent

Length of output: 45

🏁 Script executed:

#!/bin/bash
# Look for test utilities or helpers for creating workflow state
rg -n "setWorkflowState|WorkflowState" packages/core/src/workflow/core.spec.ts -B2 -A2 | head -40

Repository: VoltAgent/voltagent

Length of output: 1531

🏁 Script executed:

#!/bin/bash
# Get exact lines around 819-826 using cat
cat -n packages/core/src/workflow/core.spec.ts | sed -n '815,830p'

Repository: VoltAgent/voltagent

Length of output: 767

🏁 Script executed:

#!/bin/bash
# Search for the complete WorkflowState definition
rg -n "WorkflowState" packages/core/src/memory/types.ts

Repository: VoltAgent/voltagent

Length of output: 584

🏁 Script executed:

#!/bin/bash
# Look at the full test function starting around line 810
cat -n packages/core/src/workflow/core.spec.ts | sed -n '808,835p'

Repository: VoltAgent/voltagent

Length of output: 1232

🏁 Script executed:

#!/bin/bash
# Search for the actual code with context and as any
rg -n "context.*as any\|role.*admin" packages/core/src/workflow/core.spec.ts -B5 -A5

Repository: VoltAgent/voltagent

Length of output: 45

🏁 Script executed:

#!/bin/bash
# Also search more broadly for the pattern
rg -n "as any" packages/core/src/workflow/core.spec.ts

Repository: VoltAgent/voltagent

Length of output: 310

🏁 Script executed:

#!/bin/bash
# Get context around the actual line 1026
cat -n packages/core/src/workflow/core.spec.ts | sed -n '1020,1035p'

Repository: VoltAgent/voltagent

Length of output: 644

🏁 Script executed:

#!/bin/bash
# Get the complete WorkflowStateEntry interface definition
rg -n "export interface WorkflowStateEntry" packages/core/src/memory/types.ts -A30

Repository: VoltAgent/voltagent

Length of output: 1212

Correct the line number and use proper typing for the context parameter.
Line 1026 contains `context: { role: "admin" } as any`, but WorkflowStateEntry expects context to be Array<[string | symbol, unknown]>. The `as any` cast masks this type mismatch. Replace it with either the correct array-of-tuples form or a typed helper, so the test stays type-safe per the coding guidelines.

coderabbitai · 2026-02-21T20:17:24Z

+    const metadataUserId =
+      typeof persistedState.metadata?.userId === "string"
+        ? (persistedState.metadata.userId as string)
+        : undefined;
+    const metadataConversationId =
+      typeof persistedState.metadata?.conversationId === "string"
+        ? (persistedState.metadata.conversationId as string)
+        : undefined;
+    const persistedContext = toValidContextMap(persistedState.context);
+    const effectiveWorkflowState =
+      options?.workflowState ?? checkpoint?.workflowState ?? persistedState.workflowState ?? {};
+
+    const restartOptions: WorkflowRunOptions = {
+      ...options,
+      executionId,
+      userId: options?.userId ?? persistedState.userId ?? metadataUserId,
+      conversationId:
+        options?.conversationId ?? persistedState.conversationId ?? metadataConversationId,
+      context: options?.context ?? persistedContext,
+      workflowState: effectiveWorkflowState,
+      resumeFrom: checkpoint
+        ? {
+            executionId,
+            resumeStepIndex: checkpoint.resumeStepIndex,
+            lastEventSequence: checkpoint.eventSequence,
+            checkpoint: {
+              stepExecutionState: checkpoint.stepExecutionState,
+              completedStepsData: checkpoint.completedStepsData,
+              workflowState: checkpoint.workflowState ?? effectiveWorkflowState,
+              stepData: checkpoint.stepData,
+              usage: checkpoint.usage,
+            },
+          }
+        : undefined,
+    };


⚠️ Potential issue | 🟡 Minor

Preserve persisted metadata when restarting without a checkpoint.
Lines 2404-2412 build restart options without carrying stored metadata if no checkpoint exists, which can drop tenant/custom metadata for runs that crash before the first checkpoint (or when checkpointing is disabled). Consider carrying forward persisted metadata (minus the restart checkpoint key) unless the caller overrides it.

Suggested fix

+    const persistedMetadata = isObjectRecord(persistedState.metadata)
+      ? (persistedState.metadata as Record<string, unknown>)
+      : undefined;
+    const restartMetadata =
+      options?.metadata ??
+      (persistedMetadata
+        ? Object.fromEntries(
+            Object.entries(persistedMetadata).filter(
+              ([key]) => key !== VOLTAGENT_RESTART_CHECKPOINT_KEY,
+            ),
+          )
+        : undefined);
+
     const restartOptions: WorkflowRunOptions = {
       ...options,
       executionId,
       userId: options?.userId ?? persistedState.userId ?? metadataUserId,
       conversationId:
         options?.conversationId ?? persistedState.conversationId ?? metadataConversationId,
       context: options?.context ?? persistedContext,
       workflowState: effectiveWorkflowState,
+      metadata: restartMetadata,
       resumeFrom: checkpoint
         ? {
             executionId,
             resumeStepIndex: checkpoint.resumeStepIndex,

coderabbitai

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/core/src/workflow/core.spec.ts`:
- Around line 1020-1028: The test currently uses a casted loose object for
persisted context when calling memory.setWorkflowState (executionId, {...,
context: { role: "admin" } as any,...}), which bypasses the expected tuple/array
format and weakens type safety; update the context value to match the project's
persisted-context type (the tuple/array shape the system expects) instead of
using "as any"—locate the set in memory.setWorkflowState and replace the cast
with a properly typed context value conforming to the persisted tuple-array
format so the test enforces real types for executionId/context handling.

In `@packages/core/src/workflow/core.ts`:
- Around line 2420-2464: The restart flow drops persisted metadata; update the
creation of restartOptions so it includes metadata merged from
persistedState.metadata (excluding any checkpoint key) unless options.metadata
is provided; specifically, when building the WorkflowRunOptions object
(restartOptions) merge options?.metadata ?? persistedState.metadata
(sanitizing/removing any checkpoint field) so tenant/custom metadata is
preserved across restarts while still allowing the caller to override metadata.

feat(core): add workflow restart and crash recovery APIs

4a1bd64

cubic-dev-ai bot reviewed Feb 21, 2026

Comment thread packages/core/src/workflow/core.ts Outdated

coderabbitai bot reviewed Feb 21, 2026

Comment thread packages/core/src/workflow/core.ts Outdated

Comment thread packages/core/src/workflow/registry.ts

omeraplak added 2 commits February 21, 2026 12:07

fix(workflow): address restart and checkpoint review comments

26bda80

chore: resolve main merge conflicts in workflow modules

925c089

coderabbitai bot reviewed Feb 21, 2026

omeraplak merged commit b610ec6 into main Feb 21, 2026
20 of 22 checks passed

omeraplak deleted the feat/workflow-restart-crash-recovery branch February 21, 2026 20:22

coderabbitai bot reviewed Feb 21, 2026

voltagent-bot mentioned this pull request Feb 21, 2026

ci(changesets): version packages #1093

Merged

coderabbitai bot mentioned this pull request Feb 22, 2026

feat(core): add workflow time-travel deterministic replay APIs #1099

Merged

5 tasks
