feat(core): add workflow time-travel deterministic replay APIs#1099

Merged
omeraplak merged 7 commits into main from
feat/workflow-time-travel-deterministic-replay-clean
Feb 22, 2026

Conversation

@omeraplak
Member

@omeraplak omeraplak commented Feb 22, 2026

PR Checklist

Please check if your PR fulfills the following requirements:

Bugs / Features

What is the current behavior?

VoltAgent workflows support restart/crash recovery and async start, but do not expose a deterministic replay API for re-running historical executions from a selected step.

What is the new behavior?

Adds deterministic replay APIs for workflow and chain surfaces:

  • workflow.timeTravel(options)
  • workflow.timeTravelStream(options)
  • workflowChain.timeTravel(options)
  • workflowChain.timeTravelStream(options)

Also adds:

  • WorkflowTimeTravelOptions public export
  • Replay lineage fields persisted on workflow state:
    • replayedFromExecutionId
    • replayFromStepId
  • Documentation updates with usage examples in:
    • website/docs/workflows/overview.md
    • website/docs/workflows/suspend-resume.md
    • website/docs/workflows/streaming.md
  • Changeset for @voltagent/core minor release

Smoke tests run:

  • pnpm --filter @voltagent/core test:single src/workflow/time-travel.spec.ts
  • pnpm --filter @voltagent/core test:single src/workflow/chain.spec.ts
  • pnpm --filter @voltagent/core typecheck

fixes (issue)

N/A

Notes for reviewers

  • timeTravel rejects source executions that are still running; crash recovery remains in the scope of restart(...).
  • Replay writes a new execution state and preserves original run history.
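
The two reviewer notes above can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not VoltAgent's implementation: `ExecutionState`, `replayFromStep`, the plain-object store, and the status values are hypothetical, and only the `replayedFromExecutionId` and `replayFromStepId` field names come from this PR.

```typescript
// Simplified model of the replay rules described in the notes.
// Everything except the two lineage field names is hypothetical.
type ExecutionStatus = "running" | "completed" | "error" | "suspended"; // assumed values

interface ExecutionState {
  id: string;
  status: ExecutionStatus;
  replayedFromExecutionId?: string;
  replayFromStepId?: string;
}

type ExecutionStore = Record<string, ExecutionState>;

function replayFromStep(
  store: ExecutionStore,
  sourceExecutionId: string,
  stepId: string,
  newExecutionId: string,
): ExecutionState {
  const source = store[sourceExecutionId];
  if (source === undefined) {
    throw new Error("Source execution not found: " + sourceExecutionId);
  }
  // Running sources are rejected; crash recovery is left to restart(...).
  if (source.status === "running") {
    throw new Error("Source execution is still running: " + sourceExecutionId);
  }
  // The replay is stored under a new ID, so the source record is not mutated.
  const replay: ExecutionState = {
    id: newExecutionId,
    status: "running",
    replayedFromExecutionId: source.id,
    replayFromStepId: stepId,
  };
  store[replay.id] = replay;
  return replay;
}
```

In this model the original record keeps its status and carries no lineage fields, while the new record points back at the run and step it was replayed from.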

Summary by cubic

Adds deterministic time-travel replay for workflows and chains, letting you re-run a past execution from a specific step into a new execution while keeping the original run unchanged. Includes a replay REST endpoint and OpenTelemetry lineage linking for better debugging.

  • New Features

    • APIs: workflow.timeTravel/timeTravelStream and workflowChain.timeTravel/timeTravelStream; export WorkflowTimeTravelOptions.
    • Replay engine: validates workflow/step, blocks running sources; reconstructs prior steps from checkpoints and step-complete events (by id/index/name); infers selected-step input with optional inputData/resumeData/workflowStateOverride; writes a new execution with lineage fields.
    • Streaming: supports timeTravelStream with resume() and optional stepId override during replay; same stream interface as .stream().
    • Observability: persists replayedFromExecutionId/replayFromStepId on workflow state; adds OTEL replay links and attributes (skips resume attrs for replays).
    • REST: POST /workflows/:id/executions/:executionId/replay with typed request/response and error mapping (400/404/500); protected route wired in Elysia, Hono, and serverless-Hono.
    • Docs & tests: endpoint docs with cURL/JS, updated workflow guides (overview/streaming/suspend-resume), new time-travel and OTEL trace-context tests; registry reset utility for deterministic tests.
  • Bug Fixes

    • Streaming resumes now preserve the configured memory adapter across replay/resume calls.
    • Server maps replay preparation issues (for example, missing historical snapshots) to HTTP 400 for clearer errors.
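
The 400/404/500 mapping mentioned in this summary can be sketched as a small lookup. The failure categories below are hypothetical labels chosen for illustration; the summary lists the status codes but not the error types the server actually uses.

```typescript
// Hypothetical failure categories; only the status codes come from the summary.
type ReplayFailureKind = "invalid-request" | "not-found" | "internal";

function replayFailureToStatus(kind: ReplayFailureKind): 400 | 404 | 500 {
  if (kind === "invalid-request") {
    // For example an invalid stepId, a running source, or a missing historical snapshot.
    return 400;
  }
  if (kind === "not-found") {
    // Workflow or source execution does not exist.
    return 404;
  }
  // Anything else is reported as a server error.
  return 500;
}
```

Classifying preparation problems as 400 rather than 500 tells the caller that the request can be corrected, which is the behavior the bug-fix bullet describes.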

Written for commit 6829072. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Deterministic "time travel" replay for workflows (standard + streaming), with optional input/resume/state overrides and persisted replay lineage.
  • SDK

    • New workflow and workflow-chain methods to initiate deterministic replay and streaming replay; new public type to configure replays.
  • API

    • New REST endpoint to create deterministic replay executions; request/response schemas updated to include replay options and lineage fields.
  • Observability

    • Added replay-related trace/span attributes for provenance.
  • Documentation

    • Comprehensive docs and examples for time travel, streaming, and suspend/resume.
  • Tests

    • End-to-end tests covering replay flows, overrides, errors, and streaming events.
  • Chores

    • Registry reset utility added for deterministic test isolation.

@changeset-bot

changeset-bot bot commented Feb 22, 2026

🦋 Changeset detected

Latest commit: 6829072

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@voltagent/core Minor

@coderabbitai
Contributor

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds deterministic "time travel" replay APIs to Workflow and WorkflowChain, implements streaming and resume semantics, persists replay lineage metadata on workflow state, updates observability/tracing for replay provenance, exposes public types, adds server endpoints/handlers, tests, and docs.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Public types & memory | packages/core/src/workflow/types.ts, packages/core/src/workflow/index.ts, packages/core/src/memory/types.ts | Introduce WorkflowTimeTravelOptions, WorkflowReplayOptions, export the new type, and add replayedFromExecutionId / replayFromStepId to in-memory WorkflowStateEntry. |
| Core workflow logic | packages/core/src/workflow/core.ts, packages/core/src/workflow/types.ts | Implement timeTravel() and timeTravelStream() with preparation/validation, reconstruction of input/state, replay metadata, streaming/resume mechanics, and helper utilities; add internal preparation types and extend run options for replay lineage. |
| WorkflowChain integration | packages/core/src/workflow/chain.ts | Add timeTravel() and timeTravelStream() helpers that recreate a Workflow instance and delegate to workflow APIs. |
| Observability / tracing | packages/core/src/workflow/open-telemetry/trace-context.ts, packages/core/src/workflow/open-telemetry/trace-context.spec.ts, packages/core/src/observability/types.ts | Add replay lineage links/attributes and span attributes for replay provenance; update tests to assert replay attributes and increase retry timing. |
| Server API & handlers | packages/server-core/src/handlers/workflow.handlers.ts, packages/server-core/src/auth/defaults.ts, packages/server-core/src/routes/definitions.ts, packages/server-core/src/schemas/agent.schemas.ts | New handler handleReplayWorkflow, request/response schemas, route definition for POST /workflows/:id/executions/:executionId/replay, and protect the route in auth defaults. |
| Server frameworks (route wiring) | packages/server-elysia/src/routes/workflow.routes.ts, packages/server-elysia/src/schemas.ts, packages/server-hono/src/routes/agent.routes.ts, packages/server-hono/src/routes/index.ts, packages/serverless-hono/src/routes.ts | Register replay route across adapters and expose schemas/handlers; some files contain duplicate route blocks to review. |
| Tests | packages/core/src/workflow/time-travel.spec.ts | Add Vitest suite covering replay from steps, input/resume overrides, error cases, preservation of original history, and streaming event assertions. |
| Docs (user guides & API docs) | website/docs/workflows/overview.md, website/docs/workflows/streaming.md, website/docs/workflows/suspend-resume.md, website/docs/api/endpoints/workflows.md | Add Time Travel / Deterministic Replay guides, streaming examples, and REST API docs; some duplicated sections/blocks appear and should be deduplicated. |
| Misc / infra & registry | .changeset/four-cobras-yawn.md, packages/core/src/workflow/registry.ts, test retry tuning | Changelog entry, add WorkflowRegistry.reset() for test isolation, adjust test retry timing, and small manifest updates. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Workflow
    participant MemoryV2 as Memory (V2)
    participant Engine as ExecutionEngine

    Client->>Workflow: timeTravel({executionId, stepId, ...})
    activate Workflow
    Workflow->>MemoryV2: fetch source execution + per-step data
    activate MemoryV2
    MemoryV2-->>Workflow: execution record & step snapshots
    deactivate MemoryV2

    Workflow->>Workflow: validate, locate target step, reconstruct input/state, build lineage metadata
    Workflow->>Engine: start replay execution (new executionId) from target step
    activate Engine
    Engine->>MemoryV2: persist replay execution, steps, lineage metadata
    activate MemoryV2
    MemoryV2-->>Engine: persisted
    deactivate MemoryV2
    Engine-->>Workflow: emit replay events / suspension / completion
    deactivate Engine

    Workflow-->>Client: return WorkflowExecutionResult or stream events
    deactivate Workflow

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I hopped back through steps and tracked each tiny clue,

A new run took the carrot while the old one stayed true,
Inputs nudged like treats, state tucked in neat,
Lineage stamped and tidy, every hop a beat,
Streams chimed each bounce — replay, precise and new.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main feature being added: workflow time-travel deterministic replay APIs for the core module. |
| Description check | ✅ Passed | The PR description is comprehensive and complete, with all required sections properly filled and detailed information about the new feature. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 10 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/workflow/core.ts">

<violation number="1" location="packages/core/src/workflow/core.ts:126">
P2: Silently defaulting an unrecognized step status to `"success"` can mask data corruption during replay. Consider throwing an error (or at least logging a warning) when the status is invalid, rather than silently promoting it to success.</violation>

<violation number="2" location="packages/core/src/workflow/core.ts:2587">
P1: Using `.find()` returns the first matching step-complete event instead of the last. For retried steps, this retrieves data from the initial (failed) attempt rather than the final (successful) one. Use a reverse search to get the most recent event for the step.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/core/src/workflow/core.ts Outdated
continue;
}

const fallbackEvent = sourceStepCompleteEvents.find(
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 22, 2026

P1: Using .find() returns the first matching step-complete event instead of the last. For retried steps, this retrieves data from the initial (failed) attempt rather than the final (successful) one. Use a reverse search to get the most recent event for the step.
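
The difference the reviewer describes can be reproduced with a short, self-contained sketch. The event shape is an assumption: `from` appears in the quoted diff, while `output` and the helper name `findLastStepCompleteEvent` are hypothetical and not taken from the PR.

```typescript
// Minimal event shape: `from` appears in the quoted diff, `output` is assumed.
interface StepCompleteEvent {
  from: string;
  output: unknown;
}

// Walk the list backwards so the most recent event for the step wins.
function findLastStepCompleteEvent(
  events: readonly StepCompleteEvent[],
  stepId: string,
): StepCompleteEvent | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event !== undefined && event.from === stepId) {
      return event;
    }
  }
  return undefined;
}

// A step that was retried once: two events share the same `from`.
const retriedEvents: StepCompleteEvent[] = [
  { from: "step-1", output: "first attempt" },
  { from: "step-2", output: "other step" },
  { from: "step-1", output: "second attempt" },
];

const firstMatch = retriedEvents.find((event) => event.from === "step-1");
const lastMatch = findLastStepCompleteEvent(retriedEvents, "step-1");

console.log(firstMatch?.output); // "first attempt"
console.log(lastMatch?.output); // "second attempt"
```

A manual reverse loop is used here so the sketch does not depend on `Array.prototype.findLast` being available in the target runtime.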

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/src/workflow/core.ts, line 2587:

<comment>Using `.find()` returns the first matching step-complete event instead of the last. For retried steps, this retrieves data from the initial (failed) attempt rather than the final (successful) one. Use a reverse search to get the most recent event for the step.</comment>

<file context>
@@ -2493,6 +2518,193 @@ export function createWorkflow<
+        continue;
+      }
+
+      const fallbackEvent = sourceStepCompleteEvents.find(
+        (event) =>
+          event.from === step.id ||
</file context>

Comment thread packages/core/src/workflow/core.ts
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 22, 2026

Deploying voltagent with Cloudflare Pages

Latest commit: 6829072
Status: ✅  Deploy successful!
Preview URL: https://078a49e8.voltagent.pages.dev
Branch Preview URL: https://feat-workflow-time-travel-de.voltagent.pages.dev

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/core/src/workflow/chain.ts (1)

991-1023: Consider adding the same memory-requirement note as restart to both JSDoc comments.

timeTravel and timeTravelStream need access to the source execution state in persistent memory, just like restart. The existing restart and restartAllActive methods already document this explicitly (lines 1025–1030 and 1047–1050). Without the same caveat, callers using an ephemeral/default in-memory store may get a confusing runtime error.

📝 Proposed JSDoc additions
-  /**
-   * Replay a historical execution from the selected step
-   */
+  /**
+   * Replay a historical execution from the selected step.
+   * This recreates a workflow instance via `createWorkflow(...)` on each call.
+   * Use persistent/shared memory (or register the workflow) so prior execution state is discoverable.
+   * For ephemeral setup patterns, prefer `chain.toWorkflow().timeTravel(...)` and reuse that instance.
+   */
   async timeTravel(

-  /**
-   * Stream a historical replay from the selected step
-   */
+  /**
+   * Stream a historical replay from the selected step.
+   * This recreates a workflow instance via `createWorkflow(...)` on each call.
+   * Use persistent/shared memory (or register the workflow) so prior execution state is discoverable.
+   * For ephemeral setup patterns, prefer `chain.toWorkflow().timeTravelStream(...)` and reuse that instance.
+   */
   timeTravelStream(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/chain.ts` around lines 991 - 1023, Add the same
persistent-memory requirement note used on restart/restartAllActive to the JSDoc
for timeTravel and timeTravelStream: update the comments for the timeTravel and
timeTravelStream methods in chain.ts (methods named timeTravel and
timeTravelStream) to explicitly state that these operations require access to
the source execution state in a persistent store (not the ephemeral/default
in-memory store) so callers know to configure a persistent memory backend to
avoid runtime errors.
packages/core/src/workflow/time-travel.spec.ts (1)

9-13: Add a test for the explicitly documented running-state rejection.

The PR notes call out: "timeTravel rejects source executions in the running state (crash recovery continues to use restart(...))", and the overview docs repeat this under "Notes". This is a load-bearing invariant that currently has no test coverage.

A test for a non-existent executionId would also be worth adding as a companion edge case.

✅ Suggested test additions
it("should reject time travel on a running execution", async () => {
  const memory = new Memory({ storage: new InMemoryStorageAdapter() });

  const workflow = createWorkflow(
    {
      id: "time-travel-running-rejection",
      name: "Time Travel Running Rejection",
      input: z.object({ value: z.number() }),
      result: z.object({ value: z.number() }),
      memory,
    },
    andThen({ id: "step-1", execute: async ({ data }) => data }),
  );

  const registry = WorkflowRegistry.getInstance();
  registry.registerWorkflow(workflow);

  // Manually persist a "running" state to simulate an in-flight execution
  const runningId = "exec-running-test";
  await memory.setWorkflowState(runningId, {
    id: runningId,
    workflowId: "time-travel-running-rejection",
    workflowName: "Time Travel Running Rejection",
    status: "running",
    createdAt: new Date(),
    updatedAt: new Date(),
  });

  await expect(
    workflow.timeTravel({ executionId: runningId, stepId: "step-1" }),
  ).rejects.toThrow(/running/i);
});

it("should fail with actionable error when execution does not exist", async () => {
  const memory = new Memory({ storage: new InMemoryStorageAdapter() });

  const workflow = createWorkflow(
    {
      id: "time-travel-missing-exec",
      name: "Time Travel Missing Exec",
      input: z.object({ value: z.number() }),
      result: z.object({ value: z.number() }),
      memory,
    },
    andThen({ id: "step-1", execute: async ({ data }) => data }),
  );

  const registry = WorkflowRegistry.getInstance();
  registry.registerWorkflow(workflow);

  await expect(
    workflow.timeTravel({ executionId: "non-existent-id", stepId: "step-1" }),
  ).rejects.toThrow();
});

Want me to open a new issue to track these missing test cases?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/time-travel.spec.ts` around lines 9 - 13, Add two
unit tests to packages/core/src/workflow/time-travel.spec.ts: one that persists
a workflow state with status "running" (use Memory with InMemoryStorageAdapter,
createWorkflow and WorkflowRegistry.registerWorkflow) and asserts that
workflow.timeTravel({ executionId: runningId, stepId }) rejects with an error
mentioning "running"; and a second that calls workflow.timeTravel with a
non-existent executionId and asserts it rejects with an actionable error. Use
the existing createWorkflow/andThen helpers and Memory/WorkflowRegistry symbols
so the tests simulate the running/inexistent states without starting actual
executions.
packages/core/src/workflow/core.ts (1)

122-127: Silent default to "success" for unrecognized statuses.

This is used when reconstructing historical step data for replay. If a step had an unexpected status value (e.g., data corruption), it would silently be treated as "success". Consider logging a warning when the fallback is used, so replay anomalies are observable.
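
One way to make the fallback observable is sketched below. It mirrors the shape of the guard and converter named in this review but is not the code in core.ts: the status values, the `isStepStatus` and `toStepStatus` names, and the injectable `warn` callback are all assumptions made for illustration.

```typescript
// Assumed status values; the real WorkflowStepData["status"] union may differ.
const KNOWN_STEP_STATUSES = ["success", "error", "suspended"] as const;
type StepStatus = (typeof KNOWN_STEP_STATUSES)[number];

function isStepStatus(value: unknown): value is StepStatus {
  return (
    typeof value === "string" &&
    (KNOWN_STEP_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Keeps the "success" fallback but reports the raw value through `warn`.
function toStepStatus(
  value: unknown,
  warn: (message: string) => void = (message) => console.warn(message),
): StepStatus {
  if (isStepStatus(value)) {
    return value;
  }
  warn(
    "Unrecognized step status " + JSON.stringify(value) + '; defaulting to "success"',
  );
  return "success";
}
```

Passing the logger as a parameter keeps the sketch testable; a real change would more likely use the project's own logger, as the review suggests.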

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.ts` around lines 122 - 127, The
toWorkflowStepStatus function silently defaults unrecognized values to
"success", which can hide replay anomalies; update toWorkflowStepStatus to log a
warning (including the raw value) whenever isWorkflowStepStatus(value) is false
before returning "success" — use the project's preferred logger (or console.warn
if none available) and reference toWorkflowStepStatus, isWorkflowStepStatus, and
WorkflowStepData["status"] so the warning makes replay data issues observable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@website/docs/workflows/streaming.md`:
- Around line 45-50: The docs list of VoltAgent methods omits the new
.timeTravel() API; update the methods list in
website/docs/workflows/streaming.md to include `.timeTravel()` alongside
`.timeTravelStream()`, `.stream()`, `.run()`, and `.startAsync()` so the list
reflects all five public methods and their brief descriptions.


Comment thread website/docs/workflows/streaming.md Outdated
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
website/docs/api/endpoints/workflows.md (1)

623-647: ⚠️ Potential issue | 🟡 Minor

Replay lineage fields (replayedFromExecutionId, replayFromStepId) absent from state response example.

The changeset documents that workflow state records now carry replayedFromExecutionId and replayFromStepId. API consumers polling the state endpoint on a replay execution will receive these fields with no reference in the docs.
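
A consumer polling the state endpoint could detect a replay with a check along these lines. This is a sketch under assumptions: the `WorkflowStateData` interface is a hypothetical subset of the payload, and only the two lineage field names are taken from this PR.

```typescript
// Hypothetical subset of the state payload; only the lineage field names come from the PR.
interface WorkflowStateData {
  executionId: string;
  status: string;
  replayedFromExecutionId?: string;
  replayFromStepId?: string;
}

// Returns a short description for replay executions and null for ordinary runs.
function describeReplayLineage(state: WorkflowStateData): string | null {
  if (state.replayedFromExecutionId === undefined) {
    return null;
  }
  const step = state.replayFromStepId ?? "unknown step";
  return (
    state.executionId + " replays " + state.replayedFromExecutionId + " from " + step
  );
}
```

Treating both fields as optional keeps the check valid for executions that were not created by a replay.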

📝 Proposed addition to the state response example
     "context": {
       "userId": "user-123",
       "priority": "high"
-    }
+    },
+    "replayedFromExecutionId": "exec_1234567890_abc123",
+    "replayFromStepId": "approval-required"
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/docs/api/endpoints/workflows.md` around lines 623 - 647, Update the
state response example to include the new replay lineage fields by adding
"replayedFromExecutionId" and "replayFromStepId" into the top-level "data"
object of the JSON example (use realistic sample values like
"exec_0987654321_replay" and "step-approval-1"); ensure these keys appear
alongside existing fields such as "executionId", "workflowId", "status", and
"suspension" so API consumers polling a replay execution will see the replay
lineage in the documented response.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@website/docs/api/endpoints/workflows.md`:
- Line 566: Fix the grammar in the error description string "400 - Invalid
replay parameters (for example invalid `stepId` or source execution still
running)" by inserting a comma after "for example" so it reads "...(for example,
invalid `stepId` or source execution still running)"; update that exact line in
workflows.md accordingly.
- Around line 546-562: Add the missing "suspended" response variant to the
replay response docs for the Execute endpoint: after the existing completed
example include a JSON example where "status": "suspended", "endAt": null,
"result": null and a "suspension" object containing "suspendedAt", "reason", and
"suspendedStepIndex" (keep executionId/startAt values consistent with the
example); ensure field names match the existing schema (executionId, startAt,
endAt, status, result, suspension) so replay documentation covers both completed
and suspended outcomes.
- Around line 513-613: Remove the entire "Replay Workflow" documentation block
that declares the POST /workflows/:id/executions/:executionId/replay endpoint
because there is no server route for it; instead, update docs to reflect that
replay/time-travel functionality is SDK-only (referencing the timeTravelStream
API) and ensure any examples, request/response schemas, and error cases for that
endpoint are deleted so the docs only show implemented REST endpoints (execute,
stream, suspend, cancel, resume, state retrieval).


Comment on lines +513 to +613
## Replay Workflow

Create a deterministic replay execution from a historical run and selected step.

**Endpoint:** `POST /workflows/:id/executions/:executionId/replay`

**Request Body:**

```json
{
  "stepId": "approval-step",
  "inputData": {
    "amount": 2500
  },
  "resumeData": {
    "approved": true,
    "approvedBy": "ops-user-1"
  },
  "workflowStateOverride": {
    "replayReason": "incident-1234"
  }
}
```

**Parameters:**

| Field | Type | Description |
| ----------------------- | ------ | --------------------------------------- |
| `stepId` | string | Historical step ID to replay from |
| `inputData` | any | Optional selected-step input override |
| `resumeData` | any | Optional resume payload override |
| `workflowStateOverride` | object | Optional shared workflow state override |

**Response:**

```json
{
  "success": true,
  "data": {
    "executionId": "exec_replay_123",
    "startAt": "2024-01-15T11:00:00.000Z",
    "endAt": "2024-01-15T11:00:02.250Z",
    "status": "completed",
    "result": {
      "approved": true,
      "finalAmount": 2500
    }
  }
}
```

**Error Cases:**

- `400` - Invalid replay parameters (for example invalid `stepId` or source execution still running)
- `404` - Workflow or source execution not found
- `500` - Replay failed due to server error

**cURL Example (Default Replay):**

```bash
curl -X POST http://localhost:3141/workflows/order-approval/executions/exec_123/replay \
  -H "Content-Type: application/json" \
  -d '{
    "stepId": "approval-step"
  }'
```

**cURL Example (Replay With Overrides):**

```bash
curl -X POST http://localhost:3141/workflows/order-approval/executions/exec_123/replay \
  -H "Content-Type: application/json" \
  -d '{
    "stepId": "approval-step",
    "inputData": { "amount": 2500 },
    "resumeData": { "approved": true, "approvedBy": "ops-user-1" },
    "workflowStateOverride": { "replayReason": "incident-1234" }
  }'
```

**JavaScript Example:**

```javascript
const response = await fetch(
  "http://localhost:3141/workflows/order-approval/executions/exec_123/replay",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      stepId: "approval-step",
      inputData: { amount: 2500 },
      resumeData: { approved: true, approvedBy: "ops-user-1" },
      workflowStateOverride: { replayReason: "incident-1234" },
    }),
  }
);

const replay = await response.json();
console.log("Replay execution ID:", replay.data.executionId);
console.log("Replay status:", replay.data.status);
```
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n --type ts -C3 "replay.*stream|stream.*replay" --ignore-case -g '!*.spec.ts' -g '!*.md'

Repository: VoltAgent/voltagent

Length of output: 6140


🏁 Script executed:

# Also search for route/endpoint definitions related to replay
rg -n --type ts -C2 "POST.*replay|replay.*endpoint" --ignore-case -g '!*.spec.ts' -g '!*.md'

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Search for timeTravelStream method definitions
rg -n "timeTravelStream" --type ts -g '!*.spec.ts' -g '!*.md' -C3

Repository: VoltAgent/voltagent

Length of output: 2446


🏁 Script executed:

# Look for streaming endpoint definitions (like /stream endpoints) to understand the pattern
rg -n "\/stream|streaming.*endpoint" --type ts -C2 -g '!*.spec.ts' -g '!*.md' | head -100

Repository: VoltAgent/voltagent

Length of output: 157


🏁 Script executed:

# Look for all replay route definitions
rg -n "replay" --type ts packages/server-core/src/routes/ -C3

Repository: VoltAgent/voltagent

Length of output: 856


🏁 Script executed:

# Check if there are any timeTravelStream route handlers
rg -n "timeTravelStream" --type ts packages/server-elysia/src/routes/ packages/server-core/src/routes/ -C3

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Find all POST endpoint definitions in routes
rg -n "POST|method.*POST" --type ts packages/server-core/src/routes/definitions.ts -C2 | grep -A2 "replay"

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Find all route/path definitions in the routes files
rg -n "path.*replay|replay" --type ts packages/server-core/src/routes/definitions.ts packages/server-elysia/src/routes/ -C2

Repository: VoltAgent/voltagent

Length of output: 1196


🏁 Script executed:

# Check the actual route file to see all registered paths
head -150 packages/server-core/src/routes/definitions.ts | tail -100

Repository: VoltAgent/voltagent

Length of output: 3727


🏁 Script executed:

# List all routes in routes/definitions.ts to see the complete API
rg "path:" --type ts packages/server-core/src/routes/definitions.ts | head -20

Repository: VoltAgent/voltagent

Length of output: 737


🏁 Script executed:

# Get the complete list of paths to see all workflow-related routes
rg "path: \"/workflows" --type ts packages/server-core/src/routes/definitions.ts

Repository: VoltAgent/voltagent

Length of output: 515


🏁 Script executed:

# Search more broadly in the entire routes definitions file
wc -l packages/server-core/src/routes/definitions.ts

Repository: VoltAgent/voltagent

Length of output: 114


🏁 Script executed:

# Look for the replay endpoint in the entire file
cat packages/server-core/src/routes/definitions.ts | grep -n "replay" -i

Repository: VoltAgent/voltagent

Length of output: 282


🏁 Script executed:

# Check if there's a replay endpoint implementation elsewhere
rg -n "replay" --type ts packages/server-core/src/ packages/server-elysia/src/ -C2 | grep -i "post\|method\|handler\|route"

Repository: VoltAgent/voltagent

Length of output: 5370


🏁 Script executed:

# Check the handlers directory for replay-related code
ls -la packages/server-core/src/handlers/

Repository: VoltAgent/voltagent

Length of output: 1320


🏁 Script executed:

# Look at the workflow handlers to see what operations are exposed
rg -n "export.*function|export.*const" packages/server-core/src/handlers/workflow.handlers.ts | head -30

Repository: VoltAgent/voltagent

Length of output: 550


🏁 Script executed:

# Check if there's a replay handler defined somewhere
rg -n "handleReplay\|replay.*handler" --type ts packages/server-core/src/ packages/server-elysia/src/

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Get the actual list of all endpoints in routes/definitions.ts with method and path
rg "method:|path:" --type ts packages/server-core/src/routes/definitions.ts | grep -A1 "workflow" | head -50

Repository: VoltAgent/voltagent

Length of output: 800


🏁 Script executed:

# Search in the elysia routes file for any replay endpoint
cat packages/server-elysia/src/routes/workflow.routes.ts | grep -n "replay\|time.*travel" -i

Repository: VoltAgent/voltagent

Length of output: 265


🏁 Script executed:

# Double-check by searching the entire routes definitions for "replay" with full context
rg -n "replay" --type ts packages/server-core/src/routes/definitions.ts -B5 -A5

Repository: VoltAgent/voltagent

Length of output: 679


🏁 Script executed:

# Check the elysia routes file more comprehensively
cat packages/server-elysia/src/routes/workflow.routes.ts | grep -n "post\|get\|put\|delete" | head -40

Repository: VoltAgent/voltagent

Length of output: 423


🏁 Script executed:

# Check if the /replay endpoint exists in the route definitions at all
cat packages/server-core/src/routes/definitions.ts | grep -n "/replay" -i

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Search the entire codebase for any /replay route
rg "executions.*replay|/replay" --type ts -i

Repository: VoltAgent/voltagent

Length of output: 144


🏁 Script executed:

# Check if there's a handler for the replay endpoint
rg -n "handleReplay\|replayWorkflow" --type ts

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Check the elysia routes file for all POST endpoints to see if replay is there
cat packages/server-elysia/src/routes/workflow.routes.ts | grep -B2 "app.post"

Repository: VoltAgent/voltagent

Length of output: 505


🏁 Script executed:

# Check the actual documentation file to see what endpoints are documented
cat website/docs/api/endpoints/workflows.md | sed -n '513,613p'

Repository: VoltAgent/voltagent

Length of output: 2715


🏁 Script executed:

# Check if there are ANY replay endpoints documented further in the file
cat website/docs/api/endpoints/workflows.md | grep -n "replay\|time.*travel" -i

Repository: VoltAgent/voltagent

Length of output: 1213


🏁 Script executed:

# Verify that the /replay endpoint is actually NOT in the routes
wc -l packages/server-core/src/routes/definitions.ts && \
cat packages/server-core/src/routes/definitions.ts | grep -c "path:"

Repository: VoltAgent/voltagent

Length of output: 117


🏁 Script executed:

# Get a complete list of all endpoint definitions
rg "^\s+[a-zA-Z]+WorkflowOperations?:" --type ts packages/server-core/src/routes/definitions.ts | head -20

Repository: VoltAgent/voltagent

Length of output: 45


🏁 Script executed:

# Check if replay endpoint might be registered separately
find packages/server-elysia/src/routes -name "*.ts" -exec grep -l "replay" {} \;

Repository: VoltAgent/voltagent

Length of output: 115


🏁 Script executed:

# Search for any route handler that might implement replay
rg -n "replay" --type ts packages/server-elysia/src/routes/workflow.routes.ts

Repository: VoltAgent/voltagent

Length of output: 181


🏁 Script executed:

# Final verification: list all endpoint keys in definitions.ts
rg "^\s+[a-zA-Z]+Workflow[a-zA-Z]*:\s*\{" --type ts packages/server-core/src/routes/definitions.ts

Repository: VoltAgent/voltagent

Length of output: 270


🏁 Script executed:

# Check the documentation to confirm the endpoint claim is actually in the docs
grep -n "POST /workflows/:id/executions/:executionId/replay" website/docs/api/endpoints/workflows.md

Repository: VoltAgent/voltagent

Length of output: 133


Remove the entire "Replay Workflow" section (lines 513-613) — this endpoint does not exist in the server implementation.

The documented POST /workflows/:id/executions/:executionId/replay endpoint has no corresponding route handler in packages/server-core/src/routes/definitions.ts or packages/server-elysia/src/routes/workflow.routes.ts. The server implements only these workflow execution endpoints: execute, stream, suspend, cancel, resume, and state retrieval. The timeTravelStream method introduced in this PR exists as an SDK-only API (packages/core/src/workflow/core.ts), not as a REST endpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/docs/api/endpoints/workflows.md` around lines 513 - 613, Remove the
entire "Replay Workflow" documentation block that declares the POST
/workflows/:id/executions/:executionId/replay endpoint because there is no
server route for it; instead, update docs to reflect that replay/time-travel
functionality is SDK-only (referencing the timeTravelStream API) and ensure any
examples, request/response schemas, and error cases for that endpoint are
deleted so the docs only show implemented REST endpoints (execute, stream,
suspend, cancel, resume, state retrieval).

Comment thread website/docs/api/endpoints/workflows.md
Comment thread website/docs/api/endpoints/workflows.md Outdated
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
packages/server-core/src/schemas/agent.schemas.ts (1)

459-460: Consider adding .min(1) to the stepId schema field.

The handler rejects empty stepId with a 400 (line 841–847 of workflow.handlers.ts), but the schema allows it — so an empty stepId passes schema validation and is rejected only at the handler level. Adding .min(1) to the schema would surface the error earlier and produce a more informative validation message.

-  stepId: z.string().describe("Step ID to replay from"),
+  stepId: z.string().min(1).describe("Step ID to replay from"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server-core/src/schemas/agent.schemas.ts` around lines 459 - 460,
The WorkflowReplayRequestSchema currently allows empty stepId values; update the
stepId validator in WorkflowReplayRequestSchema (the z.string().describe("Step
ID to replay from") entry) to enforce non-empty input by adding .min(1) so
schema validation rejects empty strings earlier and returns a proper validation
error instead of relying on the handler-level 400 check.
packages/core/src/workflow/time-travel.spec.ts (1)

10-13: Test cleanup should also clear activeExecutions to prevent cross-test interference in the sequential suite.

The test's beforeEach only clears the workflows Map, but the registry has a second stateful property: public activeExecutions (line 49 of registry.ts), which tracks active workflow suspensions and is actively modified during execution. Since the suite uses describe.sequential(), tests run one after another, and stale entries in activeExecutions from a previous test could interfere with subsequent tests.

Add (registry as any).activeExecutions.clear() to the beforeEach, or expose a public reset method in WorkflowRegistry to clean both maps without type-casting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/time-travel.spec.ts` around lines 10 - 13, The
beforeEach in the test clears only the workflows Map on WorkflowRegistry, but
leaves the registry's public activeExecutions state intact which can leak
between sequential tests; update the beforeEach to also clear activeExecutions
by calling (registry as any).activeExecutions.clear(), or preferably add and
call a new public reset/clear method on WorkflowRegistry that clears both
workflows and activeExecutions so tests start from a clean registry state before
each test.
packages/server-core/src/handlers/workflow.handlers.ts (1)

858-872: Import and use the exported WorkflowTimeTravelOptions type instead of the inline definition.

The inline type at lines 859–864 is a simplified replica of the exported WorkflowTimeTravelOptions from @voltagent/core. The differences include:

  • inputData and resumeData should use the typed DangerouslyAllowAny instead of unknown
  • workflowStateOverride should reference WorkflowStateStore instead of a raw Record<string, unknown>

Additionally, the return type (lines 865–871) is overly generic and differs from the actual WorkflowExecutionResult<RESULT_SCHEMA, RESUME_SCHEMA> signature—it includes an undefined startAt field and loses type safety for status and result.

Import WorkflowTimeTravelOptions from @voltagent/core and apply it to the type cast to maintain consistency with the library's type definitions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server-core/src/handlers/workflow.handlers.ts` around lines 858 -
872, Replace the inline time-travel type on workflowWithReplay with the
library's exported types: import WorkflowTimeTravelOptions (and related types
like WorkflowExecutionResult, DangerouslyAllowAny, WorkflowStateStore) from
`@voltagent/core` and use WorkflowTimeTravelOptions as the parameter type for
timeTravel on registeredWorkflow.workflow; also update the timeTravel return
type to the proper WorkflowExecutionResult generic signature instead of the
ad-hoc object so the method signature on workflowWithReplay matches the core
definitions (refer to workflowWithReplay and registeredWorkflow.workflow to
locate the cast).
packages/core/src/workflow/core.ts (2)

2947-2960: The trailing .catch() is dead code.

.then(onFulfilled, onRejected) already handles both settlement cases — exactly one handler is called. Since neither the onFulfilled nor the onRejected branch here throws or returns a rejected promise, the promise returned by .then() always fulfills, and the trailing .catch(() => {}) never fires.

♻️ Remove the unreachable catch
  replayPromise
    .then(
      (result) => {
        if (result.status !== "suspended") {
          streamController.close();
        }
      },
      () => {
        streamController.close();
      },
-   )
-   .catch(() => {
-     // Error is surfaced through promise-backed fields on stream result.
-   });
+   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.ts` around lines 2947 - 2960, The trailing
.catch on replayPromise is dead code; remove the final .catch(() => {}) and keep
the existing .then(result => { if (result.status !== "suspended")
streamController.close(); }, () => { streamController.close(); }) so both
fulfillment and rejection cases close the streamController; locate the call by
the unique symbols replayPromise and streamController in the file and delete
only the trailing .catch handler (or replace the whole chain with a .finally
that closes streamController if you prefer a single lifecycle handler).

1038-1097: Minor observability gap: logs at lines 1061–1066 lack execution context.

The logger used here is the workflow-level LoggerProxy without executionId or traceId, because runLogger isn't created until line 1123. The warn on line 1063 ("No source trace IDs found in replay workflow state metadata") won't carry the replay execution ID, making it hard to correlate with a specific replay when debugging production issues.

The lookup logic and replayedFrom construction are otherwise correct.

♻️ Consider propagating executionId in the log call
-      logger.warn("No source trace IDs found in replay workflow state metadata");
+      logger.warn("No source trace IDs found in replay workflow state metadata", {
+        replayExecutionId: executionId,
+        sourceExecutionId: options.replayFrom.executionId,
+      });
-      logger.warn("Failed to get source trace IDs for replay:", { error });
+      logger.warn("Failed to get source trace IDs for replay:", {
+        error,
+        replayExecutionId: executionId,
+        sourceExecutionId: options.replayFrom.executionId,
+      });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/core.ts` around lines 1038 - 1097, The warning
logs when resolving replay/resume trace IDs (the logger.warn calls in the replay
branch around the handling of options.replayFrom.executionId and the
corresponding catch) lack execution context; update those logger.warn
invocations to include the replay executionId (options.replayFrom.executionId)
and any available execution identifiers (e.g., executionId) in the
metadata/object passed to the logger so the message can be correlated to the
specific replay (refer to replayedFrom, executionMemory.getWorkflowState, and
logger); do the same for the catch logger.warn that logs the error (include the
executionId value in the logged object) and mirror this small change in the
resume branch warnings so all trace-resolution warnings include executionId
context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 2629-2644: The fallback matching is ambiguous because
event.from/event.name can equal a non-unique step.name; update the matching
logic in the sourceStepCompleteEvents.find call to first attempt an unambiguous
match using a stable identifier (prefer event.stepIndex === step.index if
stepIndex is present on events, or event.stepId === step.id if you emit stepId)
before falling back to name-based checks; adjust the search to check
event.stepIndex/event.stepId first, then only use event.from === step.id or
event.name === step.name as last resorts, and ensure replayStepData assignment
(and the call to toWorkflowStepStatus) uses the matched event.
- Around line 3048-3054: The resume path for time-travel streams currently
delegates from resumedStreamResult.resume to streamResult.resume which rebuilds
resume options from the original replay's metadata
(replayResult.suspension.checkpoint) causing second suspensions to replay from
the wrong checkpoint; change resumedStreamResult.resume (and the analogous logic
in the stream method) to construct resumeOptions from the current suspension
metadata (nextResult.suspension) instead of relying on
replayPromise/replayResult, i.e., use nextResult.suspension.checkpoint and
nextResult.suspension.metadata when calling resume so that subsequent resumes
start from the most recent suspension rather than the original one.

In `@packages/server-core/src/handlers/workflow.handlers.ts`:
- Around line 905-912: The httpStatus decision incorrectly treats any error
message containing "step" as a 400; update the conditional logic around
normalizedMessage in the httpStatus assignment (the variable named httpStatus
using normalizedMessage.includes(...)) to remove the broad
normalizedMessage.includes("step") check and either delete that clause entirely
or replace it with a precise matcher for missing-step cases (e.g., only match
"step not found" or a regex like /step.*not found/ or /step.*'[^']*' not found/)
so only genuine "step not found" errors map to 404/400 as intended and other
"step" mentions continue to result in 500.

---

Nitpick comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 2947-2960: The trailing .catch on replayPromise is dead code;
remove the final .catch(() => {}) and keep the existing .then(result => { if
(result.status !== "suspended") streamController.close(); }, () => {
streamController.close(); }) so both fulfillment and rejection cases close the
streamController; locate the call by the unique symbols replayPromise and
streamController in the file and delete only the trailing .catch handler (or
replace the whole chain with a .finally that closes streamController if you
prefer a single lifecycle handler).
- Around line 1038-1097: The warning logs when resolving replay/resume trace IDs
(the logger.warn calls in the replay branch around the handling of
options.replayFrom.executionId and the corresponding catch) lack execution
context; update those logger.warn invocations to include the replay executionId
(options.replayFrom.executionId) and any available execution identifiers (e.g.,
executionId) in the metadata/object passed to the logger so the message can be
correlated to the specific replay (refer to replayedFrom,
executionMemory.getWorkflowState, and logger); do the same for the catch
logger.warn that logs the error (include the executionId value in the logged
object) and mirror this small change in the resume branch warnings so all
trace-resolution warnings include executionId context.

In `@packages/core/src/workflow/time-travel.spec.ts`:
- Around line 10-13: The beforeEach in the test clears only the workflows Map on
WorkflowRegistry, but leaves the registry's public activeExecutions state intact
which can leak between sequential tests; update the beforeEach to also clear
activeExecutions by calling (registry as any).activeExecutions.clear(), or
preferably add and call a new public reset/clear method on WorkflowRegistry that
clears both workflows and activeExecutions so tests start from a clean registry
state before each test.

In `@packages/server-core/src/handlers/workflow.handlers.ts`:
- Around line 858-872: Replace the inline time-travel type on workflowWithReplay
with the library's exported types: import WorkflowTimeTravelOptions (and related
types like WorkflowExecutionResult, DangerouslyAllowAny, WorkflowStateStore)
from `@voltagent/core` and use WorkflowTimeTravelOptions as the parameter type for
timeTravel on registeredWorkflow.workflow; also update the timeTravel return
type to the proper WorkflowExecutionResult generic signature instead of the
ad-hoc object so the method signature on workflowWithReplay matches the core
definitions (refer to workflowWithReplay and registeredWorkflow.workflow to
locate the cast).

In `@packages/server-core/src/schemas/agent.schemas.ts`:
- Around line 459-460: The WorkflowReplayRequestSchema currently allows empty
stepId values; update the stepId validator in WorkflowReplayRequestSchema (the
z.string().describe("Step ID to replay from") entry) to enforce non-empty input
by adding .min(1) so schema validation rejects empty strings earlier and returns
a proper validation error instead of relying on the handler-level 400 check.

Comment thread packages/core/src/workflow/core.ts Outdated
Comment thread packages/core/src/workflow/core.ts Outdated
Comment thread packages/server-core/src/handlers/workflow.handlers.ts
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 5 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="website/docs/api/endpoints/workflows.md">

<violation number="1" location="website/docs/api/endpoints/workflows.md:650">
P3: `replayedFromExecutionId` is documented elsewhere as the original execution ID; this example value is labeled like a replay execution ID, which inverts the lineage and can mislead users. Use an original/source execution ID in the example.</violation>
</file>

<file name="packages/core/src/workflow/core.ts">

<violation number="1" location="packages/core/src/workflow/core.ts:158">
P1: `Number("")` returns `0` which passes `Number.isInteger`, so an empty-string `metadata.stepIndex` is silently treated as step index 0. This can cause a false match against step 0 in the time-travel fallback logic, loading incorrect historical data into the replay. Add a guard for non-empty strings and non-negative values.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

return metadataStepIndex;
}

if (typeof metadataStepIndex === "string") {
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 22, 2026

P1: Number("") returns 0 which passes Number.isInteger, so an empty-string metadata.stepIndex is silently treated as step index 0. This can cause a false match against step 0 in the time-travel fallback logic, loading incorrect historical data into the replay. Add a guard for non-empty strings and non-negative values.
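
The guard described above can be sketched as a small standalone parser. This is a minimal illustration, not the code in core.ts: the function name `parseStepIndex` and its exact return shape are assumptions made for the example.

```typescript
// Hypothetical helper; the name and return shape are illustrative only.
function parseStepIndex(value: unknown): number | undefined {
  if (typeof value === "number") {
    // Accept only non-negative integers.
    return Number.isInteger(value) && value >= 0 ? value : undefined;
  }

  if (typeof value === "string") {
    const trimmed = value.trim();
    // Number("") and Number("   ") both evaluate to 0, so blank strings
    // must be rejected before the numeric conversion.
    if (trimmed.length === 0) {
      return undefined;
    }
    const parsed = Number(trimmed);
    return Number.isInteger(parsed) && parsed >= 0 ? parsed : undefined;
  }

  // Any other type (null, undefined, objects) carries no usable index.
  return undefined;
}
```

With this shape, a missing or blank `stepIndex` yields `undefined`, so the caller can fall through to its other matching strategies instead of silently selecting step 0.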

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/src/workflow/core.ts, line 158:

<comment>`Number("")` returns `0` which passes `Number.isInteger`, so an empty-string `metadata.stepIndex` is silently treated as step index 0. This can cause a false match against step 0 in the time-travel fallback logic, loading incorrect historical data into the replay. Add a guard for non-empty strings and non-negative values.</comment>

<file context>
@@ -139,6 +139,49 @@ const toWorkflowStepStatus = (
+    return metadataStepIndex;
+  }
+
+  if (typeof metadataStepIndex === "string") {
+    const parsed = Number(metadataStepIndex);
+    if (Number.isInteger(parsed)) {
</file context>

"executionId": "exec_1234567890_abc123",
"workflowId": "order-approval",
"status": "suspended",
"replayedFromExecutionId": "exec_0987654321_replay",
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 22, 2026

P3: replayedFromExecutionId is documented elsewhere as the original execution ID; this example value is labeled like a replay execution ID, which inverts the lineage and can mislead users. Use an original/source execution ID in the example.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At website/docs/api/endpoints/workflows.md, line 650:

<comment>`replayedFromExecutionId` is documented elsewhere as the original execution ID; this example value is labeled like a replay execution ID, which inverts the lineage and can mislead users. Use an original/source execution ID in the example.</comment>

<file context>
@@ -627,6 +647,8 @@ Retrieve the current state of a workflow execution.
     "executionId": "exec_1234567890_abc123",
     "workflowId": "order-approval",
     "status": "suspended",
+    "replayedFromExecutionId": "exec_0987654321_replay",
+    "replayFromStepId": "step-approval-1",
     "startAt": "2024-01-15T10:00:00.000Z",
</file context>
Suggested change
-    "replayedFromExecutionId": "exec_0987654321_replay",
+    "replayedFromExecutionId": "exec_0987654321_original",

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/core/src/workflow/time-travel.spec.ts (1)

10-14: Registry cleanup accesses private members via any cast.

This pattern ((registry as any).workflows.clear()) is fragile — if the internal property is renamed or restructured, the test will silently stop cleaning up. Consider whether WorkflowRegistry could expose a reset() or clear() method for test use.
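
One way the suggested `reset()` method could look is sketched below. `RegistrySketch` is a hypothetical stand-in for `WorkflowRegistry`: only the two maps named in this review (`workflows` and `activeExecutions`) are modeled, and `registerWorkflow` and `workflowCount` exist purely to make the example observable.

```typescript
// Hypothetical stand-in for WorkflowRegistry; not the real class.
class RegistrySketch {
  private static instance: RegistrySketch | undefined;

  private readonly workflows = new Map<string, unknown>();
  readonly activeExecutions = new Map<string, unknown>();

  static getInstance(): RegistrySketch {
    if (!RegistrySketch.instance) {
      RegistrySketch.instance = new RegistrySketch();
    }
    return RegistrySketch.instance;
  }

  registerWorkflow(id: string, workflow: unknown): void {
    this.workflows.set(id, workflow);
  }

  get workflowCount(): number {
    return this.workflows.size;
  }

  /** Clears all in-memory state in one call, intended for test setup. */
  reset(): void {
    this.workflows.clear();
    this.activeExecutions.clear();
  }
}
```

A spec could then call `RegistrySketch.getInstance().reset()` in `beforeEach` without any `any` cast, and a rename of the internal maps would surface as a compile error inside the class rather than a silent no-op in the test.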

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/workflow/time-travel.spec.ts` around lines 10 - 14, The
test is reaching into WorkflowRegistry internals by casting to any and calling
(registry as any).workflows.clear() and activeExecutions.clear(), which is
brittle; add a public test-only method on WorkflowRegistry such as reset() or
clear() that encapsulates clearing its internal maps, then update the spec to
call WorkflowRegistry.getInstance().reset() (or .clear()) instead of accessing
workflows/activeExecutions directly; reference the existing WorkflowRegistry
class and its getInstance method and the internal maps named workflows and
activeExecutions when implementing the new method.
packages/server-core/src/handlers/workflow.handlers.ts (1)

848-848: body parameter is typed as any — consider using the Zod-inferred type.

Since WorkflowReplayRequestSchema exists in the schemas package, you could type body as z.infer<typeof WorkflowReplayRequestSchema> for stronger compile-time guarantees. This matches how the schema is enforced at the route layer and would surface mismatches earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server-core/src/handlers/workflow.handlers.ts` at line 848, The
handler currently destructures properties from an untyped body; update the
handler parameter type to z.infer<typeof WorkflowReplayRequestSchema> (importing
WorkflowReplayRequestSchema from the schemas package and z from 'zod' if needed)
so that body is strongly typed, then keep the existing destructuring of stepId,
inputData, resumeData, workflowStateOverride from body; this will align the
handler with the route validation and surface type mismatches at compile time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 2627-2820: prepareTimeTravelExecution always uses defaultMemory
which breaks replays for executions stored in a custom memory adapter; add an
optional memory?: MemoryAdapter (or appropriate type) to
WorkflowTimeTravelOptions and update prepareTimeTravelExecution signature to
accept/use a memory parameter (falling back to defaultMemory) so calls to
getWorkflowState and setWorkflowState use that adapter; update callers
timeTravel and timeTravelStream to pass options.memory through (mirror how run()
and restartExecution() handle options?.memory) and ensure
restartCheckpoint/getRestartCheckpointFromMetadata logic still works with the
chosen adapter.
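
The fallback described in this comment can be sketched in isolation. Everything here is a simplified assumption: `StateStore`, `MapStore`, `TimeTravelOptionsSketch`, and `loadSourceState` are hypothetical names, and the store is synchronous only to keep the example short (a real memory adapter is likely asynchronous).

```typescript
// Hypothetical, simplified adapter interface.
interface StateStore {
  getWorkflowState(executionId: string): { status: string } | undefined;
}

// Minimal in-memory store used to stand in for both adapters.
class MapStore implements StateStore {
  private readonly states = new Map<string, { status: string }>();

  set(executionId: string, status: string): void {
    this.states.set(executionId, { status });
  }

  getWorkflowState(executionId: string): { status: string } | undefined {
    return this.states.get(executionId);
  }
}

// Hypothetical options shape with the optional memory field proposed above.
interface TimeTravelOptionsSketch {
  executionId: string;
  stepId: string;
  memory?: StateStore;
}

function loadSourceState(
  options: TimeTravelOptionsSketch,
  defaultMemory: StateStore,
): { status: string } {
  // Prefer the caller-supplied adapter and fall back to the default one.
  const memory = options.memory ?? defaultMemory;
  const state = memory.getWorkflowState(options.executionId);
  if (!state) {
    throw new Error(`Workflow execution not found: ${options.executionId}`);
  }
  return state;
}
```

If the source execution was persisted through a custom adapter, omitting `memory` makes the lookup run against the default store and fail, which is the failure mode this comment describes.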

---

Duplicate comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 2686-2712: The name-based fallback can produce false matches when
multiple steps share the same step.name; change the fallback to only perform
name-based matching when the name is unambiguous: before evaluating the
name-based branch in the fallbackEvent search (the code that references
getEventStepIndex, getEventStepId, event.from, event.name, step.id, step.name on
sourceStepCompleteEvents), check that no other workflow step shares step.name
(i.e., verify uniqueness among the steps collection) and only then allow
matching by name; if the name is not unique, skip the name checks so
fallbackEvent only matches by step index or step id, then continue populating
replayStepData[step.id] with input/output/status via toWorkflowStepStatus as
before.
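
A rough sketch of the matching order described here: stable identifiers first, and the name only when it is unique among the workflow's steps. The `StepSketch` and `StepEventSketch` shapes and the `findStepEvent` helper are hypothetical. Their field names follow the review text, but the actual event structure in core.ts may differ.

```typescript
// Hypothetical shapes; field names follow the review text only.
interface StepSketch {
  id: string;
  name?: string;
  index: number;
}

interface StepEventSketch {
  stepIndex?: number;
  stepId?: string;
  from?: string;
  name?: string;
}

function findStepEvent(
  step: StepSketch,
  steps: StepSketch[],
  events: StepEventSketch[],
): StepEventSketch | undefined {
  // 1. Unambiguous match on the step index.
  const byIndex = events.find((event) => event.stepIndex === step.index);
  if (byIndex) return byIndex;

  // 2. Match on the step id, either as an explicit field or via `from`.
  const byId = events.find(
    (event) => event.stepId === step.id || event.from === step.id,
  );
  if (byId) return byId;

  // 3. Name-based fallback, allowed only when no other step shares the name.
  if (step.name === undefined) return undefined;
  const nameIsUnique =
    steps.filter((candidate) => candidate.name === step.name).length === 1;
  if (!nameIsUnique) return undefined;

  return events.find(
    (event) => event.from === step.name || event.name === step.name,
  );
}
```

Returning `undefined` for an ambiguous name leaves the step without historical data rather than attaching another step's input and output to it.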

---

Nitpick comments:
In `@packages/core/src/workflow/time-travel.spec.ts`:
- Around line 10-14: The test is reaching into WorkflowRegistry internals by
casting to any and calling (registry as any).workflows.clear() and
activeExecutions.clear(), which is brittle; add a public test-only method on
WorkflowRegistry such as reset() or clear() that encapsulates clearing its
internal maps, then update the spec to call
WorkflowRegistry.getInstance().reset() (or .clear()) instead of accessing
workflows/activeExecutions directly; reference the existing WorkflowRegistry
class and its getInstance method and the internal maps named workflows and
activeExecutions when implementing the new method.

In `@packages/server-core/src/handlers/workflow.handlers.ts`:
- Line 848: The handler currently destructures properties from an untyped body;
update the handler parameter type to z.infer<typeof WorkflowReplayRequestSchema>
(importing WorkflowReplayRequestSchema from the schemas package and z from 'zod'
if needed) so that body is strongly typed, then keep the existing destructuring
of stepId, inputData, resumeData, workflowStateOverride from body; this will
align the handler with the route validation and surface type mismatches at
compile time.

Comment thread packages/core/src/workflow/core.ts
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/workflow/core.ts`:
- Around line 3070-3081: The resumeOptions object created for replay/stream
resumes (the one that sets executionId and resumeFrom) fails to include the
effective memory adapter, so resumes use defaultMemory and lose state; update
the resumeOptions construction in the resume path (where
resumedSuspendController is created) to include the same memory adapter used by
the original run (e.g., add memory: effectiveMemory or memoryAdapter/the
workflow run's memory property) so the resumed WorkflowRunOptions carries the
same memory instance for reads/writes during replay/stream resumes.

In `@packages/server-core/src/handlers/workflow.handlers.ts`:
- Around line 904-912: The current httpStatus classification using
message/normalizedMessage in workflow.handlers.ts assigns many replay-prep
errors a 500; update the logic that sets httpStatus (the normalizedMessage
checks that produce 404/400/500) to treat user-correctable replay-prep
messages—e.g., phrases like "missing historical snapshots", "missing snapshot",
"missing input", "no historical", or "missing history"—as 400 (bad request)
instead of 500 while keeping the existing "not found" -> 404 mapping; modify the
conditional that computes httpStatus (refer to the variables message,
normalizedMessage, and httpStatus) to include these additional
normalizedMessage.includes(...) checks so those errors return 400.
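
The classification requested in the last item could be sketched roughly as follows. The phrase list is taken from the comment above. The function name `classifyReplayError` and the ordering of the checks are assumptions, and real handler code may classify on structured error types rather than message text.

```typescript
// Phrases the comment above treats as user-correctable replay-prep errors.
const BAD_REQUEST_PHRASES: readonly string[] = [
  "missing historical snapshots",
  "missing snapshot",
  "missing input",
  "no historical",
  "missing history",
];

// Hypothetical classifier; not the handler's actual implementation.
function classifyReplayError(message: string): 400 | 404 | 500 {
  const normalized = message.toLowerCase();

  // Keep the existing mapping: anything "not found" is a 404.
  if (normalized.includes("not found")) {
    return 404;
  }

  // User-correctable replay preparation problems are bad requests.
  if (BAD_REQUEST_PHRASES.some((phrase) => normalized.includes(phrase))) {
    return 400;
  }

  // Everything else, including messages that merely mention "step",
  // is treated as a server-side failure.
  return 500;
}
```

Matching on explicit phrases instead of a broad substring such as "step" keeps unrelated internal failures from being reported as client errors.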

Comment on lines +3070 to +3081
const resumedSuspendController = createDefaultSuspendController();
const resumeOptions: WorkflowRunOptions = {
  executionId: suspendedResult.executionId,
  resumeFrom: {
    executionId: suspendedResult.executionId,
    checkpoint: suspendedResult.suspension.checkpoint,
    resumeStepIndex,
    resumeData: resumeInput,
  },
  suspendController: resumedSuspendController,
};

Contributor

⚠️ Potential issue | 🟠 Major

Propagate the effective memory adapter into replay/stream resumes.

Lines 3072–3079 and 3248–3254: the resume options omit memory, so a replay or stream that started with a custom memory adapter resumes against defaultMemory and fails to find its state. Carry the same memory adapter through resume to keep state reads and writes consistent.

🧩 Suggested fix
   timeTravelStream: (timeTravelOptions: WorkflowTimeTravelOptions) => {
     const streamController = new WorkflowStreamController();
     const executionId = randomUUID();
     const startAt = new Date();
+    const replayMemory = timeTravelOptions.memory ?? defaultMemory;
     const suspendController = createDefaultSuspendController();
@@
-        const replayExecutionOptions: WorkflowRunOptions = {
-          ...preparedReplay.executionOptions,
-          suspendController,
-        };
+        const replayExecutionOptions: WorkflowRunOptions = {
+          ...preparedReplay.executionOptions,
+          memory: replayMemory,
+          suspendController,
+        };
@@
-        const resumeOptions: WorkflowRunOptions = {
-          executionId: suspendedResult.executionId,
-          resumeFrom: {
-            executionId: suspendedResult.executionId,
-            checkpoint: suspendedResult.suspension.checkpoint,
-            resumeStepIndex,
-            resumeData: resumeInput,
-          },
-          suspendController: resumedSuspendController,
-        };
+        const resumeOptions: WorkflowRunOptions = {
+          executionId: suspendedResult.executionId,
+          memory: replayMemory,
+          resumeFrom: {
+            executionId: suspendedResult.executionId,
+            checkpoint: suspendedResult.suspension.checkpoint,
+            resumeStepIndex,
+            resumeData: resumeInput,
+          },
+          suspendController: resumedSuspendController,
+        };
@@
     stream: (input: WorkflowInput<INPUT_SCHEMA>, options?: WorkflowRunOptions) => {
       const streamController = new WorkflowStreamController();
       const executionId = options?.executionId || crypto.randomUUID();
+      const executionMemory = options?.memory ?? defaultMemory;
@@
       const executionOptions: WorkflowRunOptions = {
         ...options,
         executionId,
         suspendController,
+        memory: executionMemory,
       };
@@
-        const resumeOptions: WorkflowRunOptions = {
-          executionId: suspendedResult.executionId,
-          resumeFrom: {
-            executionId: suspendedResult.executionId,
-            checkpoint: suspendedResult.suspension.checkpoint,
-            resumeStepIndex,
-            resumeData: resumeInput,
-          },
-          suspendController: resumedSuspendController,
-        };
+        const resumeOptions: WorkflowRunOptions = {
+          executionId: suspendedResult.executionId,
+          memory: executionMemory,
+          resumeFrom: {
+            executionId: suspendedResult.executionId,
+            checkpoint: suspendedResult.suspension.checkpoint,
+            resumeStepIndex,
+            resumeData: resumeInput,
+          },
+          suspendController: resumedSuspendController,
+        };

Also applies to: 3246-3256
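
The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, assuming a simplified key-value memory adapter; `MemoryAdapter`, `InMemoryAdapter`, `RunOptions`, and `loadState` are hypothetical stand-ins for illustration, not VoltAgent's actual types or functions. It only shows why omitting `memory` on resume makes the lookup fall back to the default store.

```typescript
// Hypothetical stand-ins for the memory adapter and run options (not VoltAgent's real types).
interface MemoryAdapter {
  get(executionId: string): string | undefined;
  set(executionId: string, state: string): void;
}

class InMemoryAdapter implements MemoryAdapter {
  private readonly states = new Map<string, string>();

  get(executionId: string): string | undefined {
    return this.states.get(executionId);
  }

  set(executionId: string, state: string): void {
    this.states.set(executionId, state);
  }
}

interface RunOptions {
  executionId: string;
  memory?: MemoryAdapter;
}

const defaultMemory = new InMemoryAdapter();

// Mirrors the `options.memory ?? defaultMemory` fallback from the suggested fix.
function loadState(options: RunOptions): string | undefined {
  const memory = options.memory ?? defaultMemory;
  return memory.get(options.executionId);
}

// The original run persisted its suspended state in a custom adapter.
const customMemory = new InMemoryAdapter();
customMemory.set("exec-1", "suspended@step-2");

// Resume options built without `memory` look in defaultMemory and find nothing.
const withoutMemory = loadState({ executionId: "exec-1" });

// Resume options that carry the original adapter find the persisted state.
const withMemory = loadState({ executionId: "exec-1", memory: customMemory });
```

Under these assumptions, `withoutMemory` is `undefined` while `withMemory` holds the persisted state, which is the asymmetry the suggested fix removes by passing the same adapter into `resumeOptions`.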

Comment on lines +904 to +912
const message = error instanceof Error ? error.message : "Failed to replay workflow";
const normalizedMessage = message.toLowerCase();
const httpStatus = normalizedMessage.includes("not found")
  ? 404
  : normalizedMessage.includes("cannot time travel") ||
      normalizedMessage.includes("still running") ||
      normalizedMessage.includes("belongs to workflow")
    ? 400
    : 500;

⚠️ Potential issue | 🟡 Minor

Classify replay-prep errors as 400 instead of 500.

Lines 906–912: errors such as “missing historical snapshots/input” are user-correctable; returning 500 hides actionable feedback. Consider mapping them to 400.

🧩 Suggested update
-    const httpStatus = normalizedMessage.includes("not found")
+    const httpStatus = normalizedMessage.includes("not found")
       ? 404
-      : normalizedMessage.includes("cannot time travel") ||
-          normalizedMessage.includes("still running") ||
-          normalizedMessage.includes("belongs to workflow")
+      : normalizedMessage.includes("cannot time travel") ||
+          normalizedMessage.includes("still running") ||
+          normalizedMessage.includes("belongs to workflow") ||
+          normalizedMessage.includes("missing historical") ||
+          normalizedMessage.includes("missing persisted workflow input") ||
+          normalizedMessage.includes("missing historical input data")
         ? 400
         : 500;
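
The same classification can be written as a standalone function, which makes the mapping easier to unit test. This is a sketch only: `classifyReplayError` and `BAD_REQUEST_PHRASES` are hypothetical names, and the phrase list simply mirrors the substrings in the suggestion above rather than any confirmed set of error messages in the codebase.

```typescript
// Hypothetical phrase list; mirrors the substrings from the suggested update.
const BAD_REQUEST_PHRASES: readonly string[] = [
  "cannot time travel",
  "still running",
  "belongs to workflow",
  "missing historical",
  "missing persisted workflow input",
];

// Hypothetical helper: maps a thrown value to an HTTP status and message.
function classifyReplayError(error: unknown): { status: number; message: string } {
  const message = error instanceof Error ? error.message : "Failed to replay workflow";
  const normalized = message.toLowerCase();

  // "not found" keeps its existing 404 mapping and is checked first.
  if (normalized.includes("not found")) {
    return { status: 404, message };
  }
  // User-correctable replay-prep problems become 400.
  if (BAD_REQUEST_PHRASES.some((phrase) => normalized.includes(phrase))) {
    return { status: 400, message };
  }
  // Anything unrecognized is treated as a server error.
  return { status: 500, message };
}
```

Because "missing historical" is a substring match, it also covers longer messages such as "missing historical input data", so a separate entry for that phrase would be redundant in this sketch.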

@omeraplak omeraplak merged commit 160e60b into main Feb 22, 2026
23 checks passed
@omeraplak omeraplak deleted the feat/workflow-time-travel-deterministic-replay-clean branch February 22, 2026 15:27