feat(core): add workflow restart and crash recovery APIs #1098
🦋 Changeset detected
Latest commit: 925c089
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
📝 Walkthrough
Adds deterministic restart and crash-recovery: new workflow.restart / restartAllActive APIs on Workflow, WorkflowChain, and WorkflowRegistry; persists running checkpoints (stepData, usage, step progress, event sequence, workflow state) and restores them to resume interrupted runs; exports new restart-related types and docs.
Sequence Diagram
sequenceDiagram
participant Client as Client
participant REG as WorkflowRegistry
participant WF as Workflow
participant MEM as MemoryStore
Client->>REG: restartAllActiveWorkflowRuns()
REG->>MEM: list active executions / metadata
MEM-->>REG: checkpoints + workflow states
REG->>WF: restart(executionId, options)
WF->>MEM: get persisted checkpoint & state
MEM-->>WF: return checkpoint (stepData, usage, indices)
WF->>WF: restore state, apply usage & stepData
WF->>WF: re-execute remaining steps
WF->>MEM: persistRunningCheckpoint() (progress/update)
WF-->>Client: WorkflowExecutionResult
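The restart flow in the diagram can be illustrated with a toy model. This is a minimal sketch, not VoltAgent's implementation: `Step`, `Checkpoint`, and `runFrom` are hypothetical names. The only idea borrowed from the walkthrough is that a persisted checkpoint records which step to resume from, together with the data accumulated so far.

```typescript
// Hypothetical toy model of checkpoint-based resume; not the VoltAgent API.
type Step = (value: number) => number;

interface Checkpoint {
  resumeStepIndex: number; // index of the first step that has NOT completed
  value: number; // data produced by the steps completed so far
}

// Runs the remaining steps and reports a fresh checkpoint after each one.
function runFrom(
  steps: Step[],
  checkpoint: Checkpoint,
  persist: (next: Checkpoint) => void,
): number {
  let value = checkpoint.value;
  for (let i = checkpoint.resumeStepIndex; i < steps.length; i++) {
    value = steps[i](value);
    persist({ resumeStepIndex: i + 1, value });
  }
  return value;
}

const steps: Step[] = [(v) => v + 1, (v) => v * 2, (v) => v - 3];

// Simulate a crash after the first step: only its checkpoint was persisted.
const saved: Checkpoint = { resumeStepIndex: 1, value: 10 };
const persisted: Checkpoint[] = [];
const resumed = runFrom(steps, saved, (c) => {
  persisted.push(c);
});

// An uninterrupted run with input 9 should reach the same final value.
const clean = runFrom(steps, { resumeStepIndex: 0, value: 9 }, () => {});
```

The resumed run skips the completed step and only re-executes the remaining two, which is why restart is only deterministic when the checkpoint captures everything later steps depend on.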
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
1 issue found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/workflow/core.ts">
<violation number="1" location="packages/core/src/workflow/core.ts:1155">
P2: `Error` objects do not survive JSON serialization — `JSON.stringify(new Error('x'))` produces `'{}'`. The checkpoint's `error` field should be serialized to a plain object (e.g., `{ message, stack }`) or `null` so that error context is preserved across restarts and the type contract is maintained after deserialization.</violation>
</file>
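The serialization issue flagged above can be reproduced in isolation. The following is a minimal sketch under stated assumptions: `SerializedStepError`, `serializeStepError`, and `deserializeStepError` are hypothetical names, not identifiers from the PR, and the `{ name, message, stack }` shape is one possible choice among several.

```typescript
// Hypothetical JSON-safe shape for a step error stored in a checkpoint.
interface SerializedStepError {
  name: string;
  message: string;
  stack?: string;
}

// Converts an unknown thrown value into a JSON-safe object (or null).
function serializeStepError(error: unknown): SerializedStepError | null {
  if (error === null || error === undefined) return null;
  if (error instanceof Error) {
    return { name: error.name, message: error.message, stack: error.stack };
  }
  return { name: "NonErrorThrown", message: String(error) };
}

// Rebuilds an Error instance from the serialized shape after a restart.
function deserializeStepError(data: SerializedStepError | null): Error | null {
  if (data === null) return null;
  const error = new Error(data.message);
  error.name = data.name;
  if (data.stack !== undefined) error.stack = data.stack;
  return error;
}

const original = new TypeError("step failed");

// Error fields are not enumerable own properties, so they are dropped here.
const naive = JSON.stringify(original);

// Round trip through the explicit shape instead.
const safe = JSON.stringify(serializeStepError(original));
const restored = deserializeStepError(JSON.parse(safe) as SerializedStepError);
```

Note that the restored value is a plain `Error` carrying the original name, not an instance of the original subclass; code that relies on `instanceof TypeError` after a restart would need a different strategy.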
Deploying voltagent with
| Latest commit: | 26bda80 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://9ccdbb05.voltagent.pages.dev |
| Branch Preview URL: | https://feat-workflow-restart-crash.voltagent.pages.dev |
Actionable comments posted: 2
🧹 Nitpick comments (10)
.changeset/old-geese-smell.md (1)
5-18: Consider surfacing newly exported public types in the changelog prose.
The AI summary notes that `WorkflowRestartAllResult` and `WorkflowRestartCheckpoint` are newly exported from the core package surface. Changelog consumers (library users) typically benefit from knowing about new types so they can adopt them in their own type annotations.
✏️ Suggested addition
  The workflow runtime now persists running checkpoints during execution, including step progress, shared workflow state, context, and usage snapshots, so interrupted runs in `running` state can be recovered deterministically.
+
+New public types exported: `WorkflowRestartAllResult`, `WorkflowRestartCheckpoint`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.changeset/old-geese-smell.md around lines 5 - 18, The changelog currently lists new restart/crash-recovery APIs but omits mentioning newly exported public types; update the changelog prose to explicitly surface the new exported types WorkflowRestartAllResult and WorkflowRestartCheckpoint (and any other new public exports from the core package) so users can adopt them in their type annotations; add a short sentence in the changelog paragraph that names these types and notes they are exported from the core package and available for consumer use.
website/docs/workflows/overview.md (1)
561-576: Consider documenting that `restart()` is also available directly on `WorkflowChain`.
The snippet shows `workflow.toWorkflow().restart(...)`, but `WorkflowChain` also exposes `.restart()` directly (as exercised in `chain.spec.ts` line 112). Mentioning both avoids confusion for users who work with chains.
📝 Suggested documentation note
+> **Note:** `restart()` and `restartAllActive()` are also available directly on a `WorkflowChain` returned by `createWorkflowChain`, not only on the result of `.toWorkflow()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/docs/workflows/overview.md` around lines 561 - 576, Add a short note to this section explaining that restart() can be called directly on a WorkflowChain as well as on a Workflow instance (so users don't think they must call workflow.toWorkflow().restart()); mention the equivalent methods (WorkflowChain.restart and workflow.toWorkflow().restart) and optionally include a one-line example or pointer to WorkflowRegistry.getInstance().restartAllActiveWorkflowRuns() for cross-workflow recovery to make the available options explicit.
packages/core/src/memory/types.ts (1)
151-159: `error?: unknown | null` is redundant: `null` is already assignable to `unknown`.
`unknown | null` is semantically identical to `unknown`. The explicit `| null` doesn't add a type constraint and may mislead readers into thinking `null` and `undefined` are treated differently here.
🛠️ Suggested fix
- error?: unknown | null;
+ error?: unknown;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/memory/types.ts` around lines 151 - 159, The error field in the stepData type is declared as "error?: unknown | null", which is redundant because null is already assignable to unknown; change the declaration on the stepData entry so the error property is typed simply as "error?: unknown" (or remove the explicit "| null") in the Record value type to avoid misleading readers; update the definition around the stepData property and any related types that reference this exact shape to keep consistency.
packages/core/src/workflow/core.spec.ts (1)
514-536: Extract `__voltagent_restart_checkpoint` into a named constant.
The magic string `"__voltagent_restart_checkpoint"` is spread across multiple test fixtures (lines 515 and 677). If the key name ever changes in the implementation, these fixtures will silently diverge. Extract it to a shared constant so both the implementation and the tests stay in sync.
#!/bin/bash
# Check if this key is already exported as a constant anywhere in the codebase
rg -n '__voltagent_restart_checkpoint' --type ts
Also applies to: 677-699
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/core.spec.ts` around lines 514 - 536, Replace the hard-coded magic string "__voltagent_restart_checkpoint" in the test fixtures with a single exported constant (e.g., VOLTAGENT_RESTART_CHECKPOINT_KEY) defined in the implementation module that owns the checkpoint logic; export that constant and import it into packages/core/src/workflow/core.spec.ts, then update all occurrences in the test (including the metadata keys at the shown blocks) to use the imported constant so tests and implementation remain in sync if the key changes.
packages/core/src/workflow/chain.spec.ts (1)
78-116: Consider adding a `chain.restartAllActive()` test.
The new `describe.sequential("workflow.restart")` only tests `chain.restart(executionId)`. Per the PR description, `workflowChain.restartAllActive(options?)` is a new public API, but its direct invocation on a `WorkflowChain` instance isn't covered here; `core.spec.ts` tests `restartAllActive()` on a `Workflow` from `createWorkflow`, not on a `WorkflowChain`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/chain.spec.ts` around lines 78 - 116, Add a test in this describe.sequential block that exercises the new WorkflowChain API by calling restartAllActive on the chain instance (e.g., workflow.restartAllActive(options?)) instead of only testing restart(executionId); set up the same Memory/InMemoryStorageAdapter, register the chain via WorkflowRegistry.registerWorkflow(workflow.toWorkflow()), seed a running execution state (workflow id "chain-restart" like the existing test), call workflow.restartAllActive() and assert the returned execution(s) have status "completed" and the expected result { value: 12 } to mirror the existing restart test but via restartAllActive on the WorkflowChain.
packages/core/src/workflow/registry.ts (1)
261-265: Return type allows `null` but `workflow.restart()` never returns `null`.
`restartWorkflowExecution` is typed as `Promise<WorkflowExecutionResult<any, any> | null>`, but it delegates to `workflow.restart(...)` which returns `Promise<WorkflowExecutionResult<...>>` (non-nullable) or throws. The `| null` is dead code and may mislead callers into adding unnecessary null checks.
Proposed fix
  public async restartWorkflowExecution(
    workflowId: string,
    executionId: string,
    options?: WorkflowRunOptions,
- ): Promise<WorkflowExecutionResult<any, any> | null> {
+ ): Promise<WorkflowExecutionResult<any, any>> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/registry.ts` around lines 261 - 265, The return type of restartWorkflowExecution incorrectly includes "| null" even though it simply awaits and returns workflow.restart(...) which never returns null; update the signature of restartWorkflowExecution to return Promise<WorkflowExecutionResult<any, any>> (remove the nullable union) and ensure any callers relying on a possible null are updated accordingly; locate the method named restartWorkflowExecution and the delegation to workflow.restart(...) to make this change.
packages/core/src/workflow/chain.ts (1)
974-1002: `restart` / `restartAllActive` recreate the workflow on each call; same caveat as `run`.
Both methods follow the existing pattern of calling `createWorkflow()` per invocation. This means each call gets a fresh in-memory store when no persistent `memory` is configured, so `restart()` won't find any prior state and will throw. This is the same limitation as `run()` / `stream()`, but it's worth noting that `restart` is inherently stateful and more sensitive to this.
Users who call `chain.restart()` instead of `chain.toWorkflow().restart()` with no persistent memory will always get `"Workflow state not found"`. Consider adding a brief JSDoc note (like the `toWorkflow()` guidance for suspend/resume) warning that restart requires persistent memory or a registered workflow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/chain.ts` around lines 974 - 1002, Add a JSDoc note to the restart and restartAllActive methods on the Chain class (the async restart(...) and async restartAllActive(...) functions) warning that these methods recreate a workflow via createWorkflow(...) on each call and therefore require a persistent memory or a registered workflow to find prior state (otherwise they will get "Workflow state not found"); mirror the existing guidance used for toWorkflow() (suggest calling chain.toWorkflow().restart(...) when using ephemeral in-memory stores) so users know to configure persistent memory or register the workflow before invoking restart/restartAllActive.
packages/core/src/workflow/core.ts (2)
2222-2294: `restartExecution` implementation is thorough and well-guarded.
Good validation chain: checks state existence, workflow ID ownership, and "running" status before proceeding. The input fallback to the `workflow-start` event is a nice compatibility touch for adapters that don't store input directly.
One subtlety: the `persistedState.context` cast on line 2264 assumes it was stored as `Array<[string | symbol, unknown]>` (matching the serialization at line 989). This works for the current in-memory path, but external storage adapters that flatten or transform entries could break the `new Map(...)` reconstruction. Worth a defensive check.
Proposed defensive guard
  const persistedContext = persistedState.context
-   ? new Map(persistedState.context as Array<[string | symbol, unknown]>)
+   ? Array.isArray(persistedState.context)
+     ? new Map(persistedState.context as Array<[string | symbol, unknown]>)
+     : undefined
    : undefined;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/core.ts` around lines 2222 - 2294, The persistedState.context reconstruction in restartExecution assumes persistedState.context is an Array<[string|symbol, unknown]> and does new Map(persistedState.context), which can throw or produce wrong maps for external adapters that serialize/flatten context; guard this by validating persistedState.context is an array of 2-length tuples (e.g., Array.isArray and each item is an array of length 2 with a string/symbol key) before calling new Map, otherwise set persistedContext to undefined (or reconstruct safely by iterating and selectively adding valid entries); update the code around persistedContext/new Map in restartExecution and ensure downstream use of restartOptions.context tolerates undefined.
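The stricter validation described in the prompt above (array of 2-length tuples with string or symbol keys) could look roughly like the sketch below. A later excerpt in this conversation calls a helper named `toValidContextMap`, but its body is not shown anywhere here, so everything inside the function is an assumption rather than the PR's code.

```typescript
type ContextEntries = Array<[string | symbol, unknown]>;

// Returns a Map only when the persisted value is an array of [key, value] tuples.
// Assumed behavior: any malformed entry invalidates the whole context.
function toValidContextMap(value: unknown): Map<string | symbol, unknown> | undefined {
  if (!Array.isArray(value)) return undefined;
  const entries: ContextEntries = [];
  for (const item of value) {
    if (!Array.isArray(item) || item.length !== 2) return undefined;
    const [key, entryValue] = item;
    if (typeof key !== "string" && typeof key !== "symbol") return undefined;
    entries.push([key, entryValue]);
  }
  return new Map(entries);
}

// Well-formed entries, as the in-memory path is described to store them.
const valid = toValidContextMap([["role", "admin"]]);

// An adapter that flattened the context into a plain object.
const flattened = toValidContextMap({ role: "admin" });

// A tuple that lost its value during serialization.
const malformed = toValidContextMap([["role"]]);
```

Rejecting the whole context on the first bad entry is the conservative choice; the prompt also mentions the alternative of keeping only the valid entries, which trades strictness for partial recovery.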
1160-1192: Checkpoint I/O on every step completion; consider the cost for long workflows.
`persistRunningCheckpoint` performs a read (`mergeExecutionMetadata` → `getWorkflowState`) plus a write (`updateWorkflowState`) after every step. For workflows with many small/fast steps or latency-sensitive storage backends, this doubles the memory-layer I/O per step.
This is a reasonable durability-vs-performance tradeoff for crash recovery, but you might want to offer a way to opt out or throttle (e.g., a `checkpointInterval` config on the workflow or per-execution options) for users who prefer speed over granular recovery.
Also applies to: 1875-1883
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/core.ts` around lines 1160 - 1192, persistRunningCheckpoint currently does a metadata read via mergeExecutionMetadata (which calls getWorkflowState) plus executionMemory.updateWorkflowState on every step, causing double I/O per step; add a configurable checkpointInterval or disableCheckpointing flag on the workflow/execution options and use it in persistRunningCheckpoint (and the similar checkpoint call elsewhere) to skip the merge+update on steps that are not multiples of the interval (or when disabled), and when skipping avoid calling mergeExecutionMetadata entirely; ensure you reference and read the new option where persistRunningCheckpoint is invoked and before calling mergeExecutionMetadata/updateWorkflowState so the behavior is gated by the new config.
packages/core/src/workflow/types.ts (1)
668-683: `restartAllActive` `workflowId` option is redundant on a single `Workflow` instance.
The `Workflow` interface already carries its own `id`. Passing `{ workflowId?: string }` to `restartAllActive` at this level is only needed for the registry aggregation path. On the `Workflow` object itself the parameter is misleading: callers might pass a different workflow's ID and get an empty result without any error. Consider narrowing the public signature here to `(options?: Omit<…, 'workflowId'>)` and only expose the `workflowId` filter on the registry API, or at least document the behaviour when a mismatched ID is supplied.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/workflow/types.ts` around lines 668 - 683, The restartAllActive signature on the Workflow interface exposes a redundant workflowId filter; change the method on Workflow to remove the workflowId option so it becomes restartAllActive(options?: {}) => Promise<WorkflowRestartAllResult> (or simply restartAllActive() => Promise<WorkflowRestartAllResult>), update the Workflow interface declaration in types.ts accordingly, update any implementations of Workflow.restartAllActive to stop expecting options.workflowId, and keep the original { workflowId?: string } filter only on the registry-level API; also update the JSDoc comment to document that this per-Workflow call operates on that specific Workflow's executions and that cross-workflow filtering lives on the registry API.
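The `Omit`-based narrowing suggested above might be sketched as follows. All names here are hypothetical stand-ins (`RestartAllOptions`, `RestartAllResult`, `ToyWorkflow`), and the `dryRun` field is invented purely so that the narrowed options type is not empty; the real option and result types in `packages/core` are not shown in this conversation.

```typescript
// Hypothetical shapes for illustration only.
interface RestartAllOptions {
  workflowId?: string; // registry-level filter
  dryRun?: boolean; // invented second option so the narrowed type is not empty
}

interface RestartAllResult {
  restarted: string[];
}

// Per-workflow options: everything except the workflowId filter.
type WorkflowRestartAllOptions = Omit<RestartAllOptions, "workflowId">;

class ToyWorkflow {
  readonly id: string;
  private readonly activeExecutionIds: string[];

  constructor(id: string, activeExecutionIds: string[]) {
    this.id = id;
    this.activeExecutionIds = activeExecutionIds;
  }

  // Always operates on this workflow's own executions.
  async restartAllActive(options?: WorkflowRestartAllOptions): Promise<RestartAllResult> {
    return { restarted: options?.dryRun ? [] : [...this.activeExecutionIds] };
  }
}

const toy = new ToyWorkflow("orders", ["exec-1", "exec-2"]);
const outcome = toy.restartAllActive({ dryRun: false });

// With the narrowed type, an object literal such as { workflowId: "payments" }
// is flagged by the compiler as an unknown property instead of being silently accepted.
```

One caveat: if `workflowId` were the only option, `Omit` would leave an empty object type, for which TypeScript skips excess-property checks, so dropping the parameter entirely would be the clearer signature in that case.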
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 1147-1158: serializeStepDataSnapshot currently copies
stepData.error (Error | null) directly, which will be lost when persisted via
JSON; change serializeStepDataSnapshot to convert stepData.error into a
serializable shape (e.g., null or { message: string; stack?: string; name?:
string }) before returning, update the WorkflowStepData error type (or introduce
a separate serialized checkpoint type) to accept this shape, and ensure you
read/rehydrate that shape on restore from executionContext.stepData so the
message/stack are preserved across JSON round-trips.
In `@packages/core/src/workflow/registry.ts`:
- Around line 298-309: The catch block in the loop over this.workflows
incorrectly populates failed entries with executionId: workflowId; update the
failure shape to distinguish workflow-level failures by adding an optional
workflowId (or a boolean flag like isWorkflowFailure) to
WorkflowRestartAllResult.failed and push { workflowId, error: ...,
isWorkflowFailure: true } (instead of using executionId), or alternatively push
a distinct structure for workflow-level errors; update usages that consume
aggregate.failed to handle the new optional field/flag; touch the loop around
registeredWorkflow.workflow.restartAllActive and the
WorkflowRestartAllResult.failed type to implement this change.
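One way to make workflow-level failures distinguishable, as the comment above asks, is a discriminated union for the failure entries. This is a sketch under assumptions: `RestartFailure`, `RestartAllSummary`, and `restartAcrossWorkflows` are hypothetical names, and the `kind` discriminator is an alternative to the optional `workflowId` or boolean flag that the comment proposes.

```typescript
// Hypothetical failure entries: a discriminated union instead of reusing executionId.
type RestartFailure =
  | { kind: "execution"; executionId: string; error: string }
  | { kind: "workflow"; workflowId: string; error: string };

interface RestartAllSummary {
  restarted: string[];
  failed: RestartFailure[];
}

// Stand-in for a registered workflow exposing restartAllActive().
interface RegisteredWorkflowLike {
  id: string;
  restartAllActive: () => Promise<RestartAllSummary>;
}

async function restartAcrossWorkflows(
  workflows: RegisteredWorkflowLike[],
): Promise<RestartAllSummary> {
  const aggregate: RestartAllSummary = { restarted: [], failed: [] };
  for (const workflow of workflows) {
    try {
      const result = await workflow.restartAllActive();
      aggregate.restarted.push(...result.restarted);
      aggregate.failed.push(...result.failed);
    } catch (error) {
      // The whole workflow failed before any single execution could be attributed.
      aggregate.failed.push({
        kind: "workflow",
        workflowId: workflow.id,
        error: error instanceof Error ? error.message : String(error),
      });
    }
  }
  return aggregate;
}

const summary = restartAcrossWorkflows([
  { id: "orders", restartAllActive: async () => ({ restarted: ["exec-1"], failed: [] }) },
  {
    id: "billing",
    restartAllActive: async () => {
      throw new Error("store offline");
    },
  },
]);
```

Consumers can then switch on `kind` and never mistake a workflow ID for an execution ID.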
---
Nitpick comments:
In @.changeset/old-geese-smell.md:
- Around line 5-18: The changelog currently lists new restart/crash-recovery
APIs but omits mentioning newly exported public types; update the changelog
prose to explicitly surface the new exported types WorkflowRestartAllResult and
WorkflowRestartCheckpoint (and any other new public exports from the core
package) so users can adopt them in their type annotations—add a short sentence
in the changelog paragraph that names these types and notes they are exported
from the core package and available for consumer use.
In `@packages/core/src/memory/types.ts`:
- Around line 151-159: The error field in the stepData type is declared as
"error?: unknown | null", which is redundant because null is already assignable
to unknown; change the declaration on the stepData entry so the error property
is typed simply as "error?: unknown" (or remove the explicit "| null") in the
Record value type to avoid misleading readers; update the definition around the
stepData property and any related types that reference this exact shape to keep
consistency.
In `@packages/core/src/workflow/chain.spec.ts`:
- Around line 78-116: Add a test in this describe.sequential block that
exercises the new WorkflowChain API by calling restartAllActive on the chain
instance (e.g., workflow.restartAllActive(options?)) instead of only testing
restart(executionId); set up the same Memory/InMemoryStorageAdapter, register
the chain via WorkflowRegistry.registerWorkflow(workflow.toWorkflow()), seed a
running execution state (workflow id "chain-restart" like the existing test),
call workflow.restartAllActive() and assert the returned execution(s) have
status "completed" and the expected result { value: 12 } to mirror the existing
restart test but via restartAllActive on the WorkflowChain.
In `@packages/core/src/workflow/chain.ts`:
- Around line 974-1002: Add a JSDoc note to the restart and restartAllActive
methods on the Chain class (the async restart(...) and async
restartAllActive(...) functions) warning that these methods recreate a workflow
via createWorkflow(...) on each call and therefore require a persistent memory
or a registered workflow to find prior state (otherwise they will get "Workflow
state not found"); mirror the existing guidance used for toWorkflow() (suggest
calling chain.toWorkflow().restart(...) when using ephemeral in-memory stores)
so users know to configure persistent memory or register the workflow before
invoking restart/restartAllActive.
In `@packages/core/src/workflow/core.spec.ts`:
- Around line 514-536: Replace the hard-coded magic string
"__voltagent_restart_checkpoint" in the test fixtures with a single exported
constant (e.g., VOLTAGENT_RESTART_CHECKPOINT_KEY) defined in the implementation
module that owns the checkpoint logic; export that constant and import it into
packages/core/src/workflow/core.spec.ts, then update all occurrences in the test
(including the metadata keys at the shown blocks) to use the imported constant
so tests and implementation remain in sync if the key changes.
In `@packages/core/src/workflow/core.ts`:
- Around line 2222-2294: The persistedState.context reconstruction in
restartExecution assumes persistedState.context is an Array<[string|symbol,
unknown]> and does new Map(persistedState.context), which can throw or produce
wrong maps for external adapters that serialize/flatten context; guard this by
validating persistedState.context is an array of 2-length tuples (e.g.,
Array.isArray and each item is an array of length 2 with a string/symbol key)
before calling new Map, otherwise set persistedContext to undefined (or
reconstruct safely by iterating and selectively adding valid entries); update
the code around persistedContext/new Map in restartExecution and ensure
downstream use of restartOptions.context tolerates undefined.
- Around line 1160-1192: persistRunningCheckpoint currently does a metadata read
via mergeExecutionMetadata (which calls getWorkflowState) plus
executionMemory.updateWorkflowState on every step, causing double I/O per step;
add a configurable checkpointInterval or disableCheckpointing flag on the
workflow/execution options and use it in persistRunningCheckpoint (and the
similar checkpoint call elsewhere) to skip the merge+update on steps that are
not multiples of the interval (or when disabled), and when skipping avoid
calling mergeExecutionMetadata entirely; ensure you reference and read the new
option where persistRunningCheckpoint is invoked and before calling
mergeExecutionMetadata/updateWorkflowState so the behavior is gated by the new
config.
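The gating described above can be sketched as a small pure function. `checkpointInterval` and `disableCheckpointing` are only proposed in this review, so their semantics here are assumptions, as are the names `CheckpointPolicy` and `shouldPersistCheckpoint`. Persisting after the final step regardless of the interval is also a design choice of this sketch, not something the review specifies.

```typescript
// Hypothetical options; the review only suggests that something like them could exist.
interface CheckpointPolicy {
  checkpointInterval?: number; // persist every N completed steps (assumed default: 1)
  disableCheckpointing?: boolean;
}

// Decides whether to persist after the step at `completedStepIndex` (0-based).
function shouldPersistCheckpoint(
  completedStepIndex: number,
  totalSteps: number,
  policy: CheckpointPolicy = {},
): boolean {
  if (policy.disableCheckpointing) return false;
  const interval = Math.max(1, Math.floor(policy.checkpointInterval ?? 1));
  const completedCount = completedStepIndex + 1;
  // Assumption: always persist after the final step so the last state is never skipped.
  return completedCount % interval === 0 || completedCount === totalSteps;
}

// With 7 steps and an interval of 3, checkpoints land after steps 3, 6, and 7.
const persistedAfter: number[] = [];
for (let i = 0; i < 7; i++) {
  if (shouldPersistCheckpoint(i, 7, { checkpointInterval: 3 })) persistedAfter.push(i + 1);
}
```

Calling this before `mergeExecutionMetadata` would avoid both the read and the write on skipped steps. The cost is that a crash between checkpoints re-executes up to `interval - 1` steps on restart, so those steps would need to be safe to run twice.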
In `@packages/core/src/workflow/registry.ts`:
- Around line 261-265: The return type of restartWorkflowExecution incorrectly
includes "| null" even though it simply awaits and returns workflow.restart(...)
which never returns null; update the signature of restartWorkflowExecution to
return Promise<WorkflowExecutionResult<any, any>> (remove the nullable union)
and ensure any callers relying on a possible null are updated accordingly;
locate the method named restartWorkflowExecution and the delegation to
workflow.restart(...) to make this change.
In `@packages/core/src/workflow/types.ts`:
- Around line 668-683: The restartAllActive signature on the Workflow interface
exposes a redundant workflowId filter; change the method on Workflow to remove
the workflowId option so it becomes restartAllActive(options?: {}) =>
Promise<WorkflowRestartAllResult> (or simply restartAllActive() =>
Promise<WorkflowRestartAllResult>), update the Workflow interface declaration in
types.ts accordingly, update any implementations of Workflow.restartAllActive to
stop expecting options.workflowId, and keep the original { workflowId?: string }
filter only on the registry-level API; also update the JSDoc comment to document
that this per-Workflow call operates on that specific Workflow's executions and
that cross-workflow filtering lives on the registry API.
In `@website/docs/workflows/overview.md`:
- Around line 561-576: Add a short note to this section explaining that
restart() can be called directly on a WorkflowChain as well as on a Workflow
instance (so users don't think they must call workflow.toWorkflow().restart());
mention the equivalent methods (WorkflowChain.restart and
workflow.toWorkflow().restart) and optionally include a one-line example or
pointer to WorkflowRegistry.getInstance().restartAllActiveWorkflowRuns() for
cross-workflow recovery to make the available options explicit.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/workflow/core.spec.ts`:
- Around line 819-826: The test is passing a plain object for the
WorkflowStateEntry.context (currently `context: { role: "admin" } as any`) which
violates the expected type Array<[string | symbol, unknown]>; update the call to
memory.setWorkflowState in the spec so the context is provided as the correct
array-of-tuples form (e.g., [['role', 'admin']]) or use a small typed helper
that returns Array<[string | symbol, unknown]> and reference that helper in the
test; ensure the change is applied where WorkflowStateEntry is constructed so
types align without using `as any`.
In `@packages/core/src/workflow/core.ts`:
- Around line 2392-2426: The restartOptions currently sets userId/conversationId
from persistedState but drops persistedState.metadata when there is no
checkpoint; update restartOptions to include merged metadata by taking
persistedState.metadata and shallow-merging options?.metadata (so caller
overrides take precedence), e.g. compute mergedMetadata = {
...(persistedState.metadata ?? {}), ...(options?.metadata ?? {}) } and assign
metadata: Object.keys(mergedMetadata).length ? mergedMetadata : undefined on
restartOptions; before assigning, remove the restart/checkpoint marker key from
mergedMetadata (the metadata key used to indicate restart state) so we don't
carry restart-specific data forward; adjust code around persistedState,
metadataUserId, metadataConversationId, and restartOptions accordingly.
await memory.setWorkflowState(executionId, {
  id: executionId,
  workflowId: workflow.id,
  workflowName: workflow.name,
  status: "running",
  input: { value: 9 },
  context: { role: "admin" } as any,
  createdAt: now,
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's look at the specific code in question
sed -n '815,830p' packages/core/src/workflow/core.spec.ts
Repository: VoltAgent/voltagent
Length of output: 655
🏁 Script executed:
#!/bin/bash
# Inspect persisted workflow context type
rg -n "context" packages/core/src/memory/types.ts -C3
Repository: VoltAgent/voltagent
Length of output: 2034
🏁 Script executed:
#!/bin/bash
# Let's also look at the WorkflowState type definition
rg -n "WorkflowState\|interface.*State\|type.*State" packages/core/src/memory/types.ts -A5
Repository: VoltAgent/voltagent
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Get exact lines 819-826
sed -n '819,826p' packages/core/src/workflow/core.spec.ts
Repository: VoltAgent/voltagent
Length of output: 455
🏁 Script executed:
#!/bin/bash
# Search for WorkflowState type definition
rg -n "interface WorkflowState\|type WorkflowState" packages/core/src -A10
Repository: VoltAgent/voltagent
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Look for test utilities or helpers for creating workflow state
rg -n "setWorkflowState|WorkflowState" packages/core/src/workflow/core.spec.ts -B2 -A2 | head -40
Repository: VoltAgent/voltagent
Length of output: 1531
🏁 Script executed:
#!/bin/bash
# Get exact lines around 819-826 using cat
cat -n packages/core/src/workflow/core.spec.ts | sed -n '815,830p'
Repository: VoltAgent/voltagent
Length of output: 767
🏁 Script executed:
#!/bin/bash
# Search for the complete WorkflowState definition
rg -n "WorkflowState" packages/core/src/memory/types.ts
Repository: VoltAgent/voltagent
Length of output: 584
🏁 Script executed:
#!/bin/bash
# Look at the full test function starting around line 810
cat -n packages/core/src/workflow/core.spec.ts | sed -n '808,835p'
Repository: VoltAgent/voltagent
Length of output: 1232
🏁 Script executed:
#!/bin/bash
# Search for the actual code with context and as any
rg -n "context.*as any\|role.*admin" packages/core/src/workflow/core.spec.ts -B5 -A5
Repository: VoltAgent/voltagent
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Also search more broadly for the pattern
rg -n "as any" packages/core/src/workflow/core.spec.ts
Repository: VoltAgent/voltagent
Length of output: 310
🏁 Script executed:
#!/bin/bash
# Get context around the actual line 1026
cat -n packages/core/src/workflow/core.spec.ts | sed -n '1020,1035p'
Repository: VoltAgent/voltagent
Length of output: 644
🏁 Script executed:
#!/bin/bash
# Get the complete WorkflowStateEntry interface definition
rg -n "export interface WorkflowStateEntry" packages/core/src/memory/types.ts -A30
Repository: VoltAgent/voltagent
Length of output: 1212
Correct line number and use proper typing for context parameter.
Line 1026 contains `context: { role: "admin" } as any`, but `WorkflowStateEntry` expects `context` to be `Array<[string | symbol, unknown]>`. The `as any` cast masks this type mismatch. Replace it with either the correct array-of-tuples format or a typed helper to maintain type safety per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/workflow/core.spec.ts` around lines 819 - 826, The test is
passing a plain object for the WorkflowStateEntry.context (currently `context: {
role: "admin" } as any`) which violates the expected type Array<[string |
symbol, unknown]>; update the call to memory.setWorkflowState in the spec so the
context is provided as the correct array-of-tuples form (e.g., [['role',
'admin']]) or use a small typed helper that returns Array<[string | symbol,
unknown]> and reference that helper in the test; ensure the change is applied
where WorkflowStateEntry is constructed so types align without using `as any`.
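The "small typed helper" mentioned in the prompt above could be as simple as the following sketch. `contextEntries` is a hypothetical name, and the `ContextEntries` alias merely restates the `Array<[string | symbol, unknown]>` shape quoted in the finding.

```typescript
type ContextEntries = Array<[string | symbol, unknown]>;

// Typed fixture helper: builds context entries from a plain record,
// so tests do not need an `as any` cast.
function contextEntries(record: Record<string, unknown>): ContextEntries {
  return Object.keys(record).map((key): [string, unknown] => [key, record[key]]);
}

const context = contextEntries({ role: "admin" });

// The entries feed straight into the Map constructor used on restore.
const restoredContext = new Map(context);
```

A fixture would then pass `context: contextEntries({ role: "admin" })`, or the literal `[["role", "admin"]]`, and stay aligned with the declared type.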
const metadataUserId =
  typeof persistedState.metadata?.userId === "string"
    ? (persistedState.metadata.userId as string)
    : undefined;
const metadataConversationId =
  typeof persistedState.metadata?.conversationId === "string"
    ? (persistedState.metadata.conversationId as string)
    : undefined;
const persistedContext = toValidContextMap(persistedState.context);
const effectiveWorkflowState =
  options?.workflowState ?? checkpoint?.workflowState ?? persistedState.workflowState ?? {};

const restartOptions: WorkflowRunOptions = {
  ...options,
  executionId,
  userId: options?.userId ?? persistedState.userId ?? metadataUserId,
  conversationId:
    options?.conversationId ?? persistedState.conversationId ?? metadataConversationId,
  context: options?.context ?? persistedContext,
  workflowState: effectiveWorkflowState,
  resumeFrom: checkpoint
    ? {
        executionId,
        resumeStepIndex: checkpoint.resumeStepIndex,
        lastEventSequence: checkpoint.eventSequence,
        checkpoint: {
          stepExecutionState: checkpoint.stepExecutionState,
          completedStepsData: checkpoint.completedStepsData,
          workflowState: checkpoint.workflowState ?? effectiveWorkflowState,
          stepData: checkpoint.stepData,
          usage: checkpoint.usage,
        },
      }
    : undefined,
};
Preserve persisted metadata when restarting without a checkpoint.
Lines 2404-2412 build restart options without carrying stored metadata if no checkpoint exists, which can drop tenant/custom metadata for runs that crash before the first checkpoint (or when checkpointing is disabled). Consider carrying forward persisted metadata (minus the restart checkpoint key) unless the caller overrides it.
Suggested fix
+ const persistedMetadata = isObjectRecord(persistedState.metadata)
+ ? (persistedState.metadata as Record<string, unknown>)
+ : undefined;
+ const restartMetadata =
+ options?.metadata ??
+ (persistedMetadata
+ ? Object.fromEntries(
+ Object.entries(persistedMetadata).filter(
+ ([key]) => key !== VOLTAGENT_RESTART_CHECKPOINT_KEY,
+ ),
+ )
+ : undefined);
+
const restartOptions: WorkflowRunOptions = {
...options,
executionId,
userId: options?.userId ?? persistedState.userId ?? metadataUserId,
conversationId:
options?.conversationId ?? persistedState.conversationId ?? metadataConversationId,
context: options?.context ?? persistedContext,
workflowState: effectiveWorkflowState,
+ metadata: restartMetadata,
resumeFrom: checkpoint
? {
executionId,
resumeStepIndex: checkpoint.resumeStepIndex,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/workflow/core.ts` around lines 2392 - 2426, The
restartOptions currently sets userId/conversationId from persistedState but
drops persistedState.metadata when there is no checkpoint; update restartOptions
to include merged metadata by taking persistedState.metadata and shallow-merging
options?.metadata (so caller overrides take precedence), e.g. compute
mergedMetadata = { ...(persistedState.metadata ?? {}), ...(options?.metadata ??
{}) } and assign metadata: Object.keys(mergedMetadata).length ? mergedMetadata :
undefined on restartOptions; before assigning, remove the restart/checkpoint
marker key from mergedMetadata (the metadata key used to indicate restart state)
so we don't carry restart-specific data forward; adjust code around
persistedState, metadataUserId, metadataConversationId, and restartOptions
accordingly.
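The merge described in this prompt can be sketched as a standalone function. This is an illustration under assumptions: `buildRestartMetadata` is a hypothetical name, and the string used for `RESTART_CHECKPOINT_KEY` is a placeholder, since the actual value of `VOLTAGENT_RESTART_CHECKPOINT_KEY` is not shown in this thread.

```typescript
// Placeholder value (assumption): the real constant lives in the core package.
const RESTART_CHECKPOINT_KEY = "__restartCheckpoint";

// Hypothetical helper: shallow-merges persisted metadata with caller overrides,
// strips the restart marker, and returns undefined when nothing is left.
function buildRestartMetadata(
  persisted: Record<string, unknown> | undefined,
  overrides: Record<string, unknown> | undefined,
  checkpointKey: string = RESTART_CHECKPOINT_KEY,
): Record<string, unknown> | undefined {
  // Caller-supplied keys win over persisted ones.
  const merged: Record<string, unknown> = { ...(persisted ?? {}), ...(overrides ?? {}) };
  // Drop the restart marker so checkpoint data is not carried forward.
  delete merged[checkpointKey];
  return Object.keys(merged).length > 0 ? merged : undefined;
}

const metadata = buildRestartMetadata(
  { tenantId: "acme", region: "eu", [RESTART_CHECKPOINT_KEY]: { resumeStepIndex: 2 } },
  { region: "us" },
);
console.log(metadata); // prints: { tenantId: 'acme', region: 'us' }
```

Note that this follows the shallow-merge wording of the prompt. The "Suggested fix" diff above instead uses `options?.metadata ?? ...`, which replaces persisted metadata wholesale whenever the caller passes any metadata; which behavior is wanted is a design decision for the PR author.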
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/core/src/workflow/core.spec.ts`:
- Around line 1020-1028: The test currently uses a casted loose object for
persisted context when calling memory.setWorkflowState (executionId, {...,
context: { role: "admin" } as any,...}), which bypasses the expected tuple/array
format and weakens type safety; update the context value to match the project's
persisted-context type (the tuple/array shape the system expects) instead of
using "as any"—locate the set in memory.setWorkflowState and replace the cast
with a properly typed context value conforming to the persisted tuple-array
format so the test enforces real types for executionId/context handling.
In `@packages/core/src/workflow/core.ts`:
- Around line 2420-2464: The restart flow drops persisted metadata; update the
creation of restartOptions so it includes metadata merged from
persistedState.metadata (excluding any checkpoint key) unless options.metadata
is provided; specifically, when building the WorkflowRunOptions object
(restartOptions) merge options?.metadata ?? persistedState.metadata
(sanitizing/removing any checkpoint field) so tenant/custom metadata is
preserved across restarts while still allowing the caller to override metadata.
PR Checklist
Please check if your PR fulfills the following requirements:
Bugs / Features
What is the current behavior?
Workflows support suspend/resume, but there is no workflow-level restart API for interrupted `running` executions after crashes/restarts. There is also no bulk restart helper for active runs.
What is the new behavior?
Adds restart/crash-recovery APIs and checkpoint persistence:
New workflow APIs
- workflow.restart(executionId, options?)
- workflow.restartAllActive(options?)
- workflowChain.restart(executionId, options?)
- workflowChain.restartAllActive(options?)
New registry helpers
- WorkflowRegistry.restartWorkflowExecution(workflowId, executionId, options?)
- WorkflowRegistry.restartAllActiveWorkflowRuns(options?)
Runtime changes
Docs
- website/docs/workflows/overview.md
- website/docs/workflows/suspend-resume.md
Validation
- pnpm --filter @voltagent/core test:single src/workflow/core.spec.ts src/workflow/chain.spec.ts src/workflow/suspend-resume.spec.ts
- pnpm --filter @voltagent/core typecheck
- examples/with-workflow (programmatic): workflow.restart(executionId) => completed
fixes (issue)
N/A
Notes for reviewers
docs/workflow-parity-plans/ is intentionally not included.
Summary by cubic
Adds workflow restart and crash-recovery with periodic checkpoints. Checkpoints capture step outputs/status, stepData, usage, workflow state, context, and event sequence for deterministic restarts and bulk recovery.
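To make the checkpoint-then-restart idea concrete, here is a toy, library-free sketch. It does not use the `@voltagent/core` API: the `Checkpoint` shape, the `runFrom` function, and the in-memory `checkpoints` map are all assumptions for illustration, with field names only loosely borrowed from the summary above.

```typescript
// Illustrative checkpoint shape (assumption, not the library's type).
interface Checkpoint {
  resumeStepIndex: number;
  stepData: Record<string, number>;
  eventSequence: number;
}

type Step = { id: string; run: (input: number) => number };

// Hypothetical in-memory store keyed by executionId.
const checkpoints = new Map<string, Checkpoint>();

// Runs the steps, persisting a checkpoint after each one. If a checkpoint
// already exists for executionId, execution resumes after the last saved step.
function runFrom(executionId: string, steps: Step[], input: number, failAt?: number): number {
  const saved = checkpoints.get(executionId);
  let index = saved?.resumeStepIndex ?? 0;
  const stepData: Record<string, number> = { ...(saved?.stepData ?? {}) };
  let eventSequence = saved?.eventSequence ?? 0;
  // Resume from the previous step's recorded output, or from the original input.
  let value = index > 0 ? stepData[steps[index - 1].id] : input;

  for (; index < steps.length; index++) {
    if (index === failAt) throw new Error(`simulated crash before step ${steps[index].id}`);
    value = steps[index].run(value);
    stepData[steps[index].id] = value;
    eventSequence += 1;
    checkpoints.set(executionId, {
      resumeStepIndex: index + 1,
      stepData: { ...stepData },
      eventSequence,
    });
  }
  return value;
}

const steps: Step[] = [
  { id: "double", run: (n) => n * 2 },
  { id: "addOne", run: (n) => n + 1 },
  { id: "square", run: (n) => n * n },
];

let crashed = false;
try {
  runFrom("exec-1", steps, 3, 2); // crashes before "square"
} catch {
  crashed = true;
}
const result = runFrom("exec-1", steps, 3); // restart: only "square" runs
console.log(crashed, result); // prints: true 49
```

Because completed step outputs are restored from the checkpoint instead of being recomputed, the restart yields the same result as an uninterrupted run, which is the sense in which the restart is deterministic.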
New Features
Migration
Written for commit 925c089. Summary will update on new commits.
Summary by CodeRabbit
New Features
Documentation
Tests