Skip to content

Add .rollback() api for workflow step promises#6330

Open
vaishnav-mk wants to merge 16 commits intocloudflare:mainfrom
vaishnav-mk:vaish/feat/saga-rollback-api
Open

Add .rollback() api for workflow step promises#6330
vaishnav-mk wants to merge 16 commits intocloudflare:mainfrom
vaishnav-mk:vaish/feat/saga-rollback-api

Conversation

@vaishnav-mk
Copy link
Copy Markdown
Contributor

adds steppromise wrapper with .rollback() api for workflow saga rollbacks. step.do() and step.waitforevent() now return steppromise which lets users register a compensation function before the step rpc fires.

  • sleep/sleepuntil are unchanged.
  • includes type declarations and tests.

@vaishnav-mk vaishnav-mk requested review from a team as code owners March 16, 2026 10:01
@vaishnav-mk
Copy link
Copy Markdown
Contributor Author

/bigbonk review this pr?

@vaishnav-mk
Copy link
Copy Markdown
Contributor Author

/bonk review this pr

@vaishnav-mk
Copy link
Copy Markdown
Contributor Author

the overlord hates me

@jasnell
Copy link
Copy Markdown
Collaborator

jasnell commented Mar 18, 2026

/bigbonk review this pr

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of StepPromise rollback API.

This review was generated by an AI assistant and may contain inaccuracies.

Summary of findings:

  • StepPromise is not a real Promise — it's a thenable. The d.ts declares StepPromise<T> extends Promise<T> which is misleading and will cause instanceof Promise to return false.
  • The rollback function return type in rpc.d.ts says Promise<void> but the test returns a value from the rollback fn (and verifies it). One of these needs to match the other.
  • Constructor-time prototype patching runs on every instantiation, even though it only patches once. This works but adds overhead on every new call for the hasOwnProperty + typeof + symbol check.
  • The generated-snapshot/ types were not regenerated — CI will likely fail the diff check.
  • Copyright year on new files should be 2026 (current year), not 2024.

Comment on lines +328 to +336
export interface StepPromise<T> extends Promise<T> {
rollback(
fn: (ctx: RollbackContext<T>) => Promise<void>,
): Promise<T>;
rollback(
config: WorkflowStepConfig,
fn: (ctx: RollbackContext<T>) => Promise<void>,
): Promise<T>;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StepPromise<T> extends Promise<T> is inaccurate. The runtime StepPromise class is a thenable (it has .then(), .catch(), .finally()) but does NOT extend Promise — it's a plain class. This means stepPromise instanceof Promise will be false at runtime, which contradicts this declaration.

Two options:

  1. Make the runtime StepPromise a proper subclass of Promise (complex, but accurate).
  2. Change the d.ts to declare StepPromise<T> as a PromiseLike<T> with the extra .rollback() method — this more accurately describes the runtime behavior.

Option 2 is simpler and more honest:

Suggested change
export interface StepPromise<T> extends Promise<T> {
rollback(
fn: (ctx: RollbackContext<T>) => Promise<void>,
): Promise<T>;
rollback(
config: WorkflowStepConfig,
fn: (ctx: RollbackContext<T>) => Promise<void>,
): Promise<T>;
}
export interface StepPromise<T> extends PromiseLike<T> {
rollback(
fn: (ctx: RollbackContext<T>) => Promise<void>,
): StepPromise<T>;
rollback(
config: WorkflowStepConfig,
fn: (ctx: RollbackContext<T>) => Promise<void>,
): StepPromise<T>;
}

Note: I also changed the return type of .rollback() from Promise<T> to StepPromise<T> since the runtime returns this. Returning Promise<T> prevents TypeScript from seeing .rollback() on the return value, which is fine for the "only once" semantics but makes the chained .rollback().rollback() error test impossible to type-check.


export interface StepPromise<T> extends Promise<T> {
rollback(
fn: (ctx: RollbackContext<T>) => Promise<void>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rollback callback return type is Promise<void>, but the test RollbackCallableWorkflow returns a non-void value from its rollback fn:

.rollback(async ({ error, output, stepName }) => {
  return { receivedError: error, receivedOutput: output, receivedStepName: stepName };
});

And the mock verifies this returned value (mock.rollbackResult). Either the d.ts should allow a return value (Promise<unknown>) or the test is testing unsupported behavior.

Which is the intended contract? If the engine will use the return value, Promise<void> is wrong.

};
}

class WorkflowEntrypointWrapper extends BaseWorkflowEntrypoint {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prototype patching in the constructor runs the hasOwnProperty + typeof + symbol check on every new UserWorkflow() instantiation, even though it only patches once (the first time). This is fine for correctness (the kStepWrapped guard prevents double-patching), but it's worth noting that in a hot path with many instantiations, this is unnecessary repeated work.

An alternative would be to patch once per class (e.g., via a WeakSet of already-patched prototypes, or patching at class definition time instead of construction time). Not blocking, but worth considering.

export const ServiceStub = entrypoints.ServiceStub;
export const WorkflowEntrypoint = entrypoints.WorkflowEntrypoint;

type RollbackFn = ((...args: unknown[]) => Promise<void>) | null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RollbackFn type says it returns Promise<void>, matching the d.ts. But as noted in the rpc.d.ts comment, the test actually returns a value from the rollback fn. This type should be consistent with the actual contract.

If the engine can use the return value:

Suggested change
type RollbackFn = ((...args: unknown[]) => Promise<void>) | null;
type RollbackFn = ((...args: unknown[]) => Promise<unknown>) | null;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be addressed?

// to classify entrypoints (workflowClasses vs actorClasses vs statelessClasses). A Proxy breaks
// identity comparison: `Proxy !== target`. A JS subclass preserves the chain:
// UserWorkflow -> our JS class -> C++ WorkflowEntrypoint (matched by runtime)
const BaseWorkflowEntrypoint =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast as unknown as new (...args: unknown[]) => Record<string, unknown> discards all type information from the C++ entrypoint. This is understandable given the boundary between the internal module and TypeScript, but worth a comment explaining that this is intentional and that the type is deliberately widened because we only need the constructor chain for runtime classification.

Also: if the C++ WorkflowEntrypoint constructor signature ever changes (e.g., adds required parameters), this cast will silently hide the mismatch. Consider adding a brief note about this risk.

@@ -0,0 +1,100 @@
// Copyright (c) 2024 Cloudflare, Inc.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright year should be 2026, not 2024.

modules = [
(name = "worker", esModule = embed "workflows-step-promise-test.js"),
],
compatibilityFlags = ["nodejs_compat", "experimental"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the test worker and the workflows worker use "experimental" compat flag. Is there an intent to gate the rollback feature behind a specific compat flag? If this is a new API surface, it probably should be gated rather than always available. If it's intentionally not gated (because the wrapper is purely additive and doesn't change existing behavior), a comment explaining this decision would be helpful.

#promise: Promise<unknown> | null = null;
#launched = false;

get [Symbol.toStringTag](): string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbol.toStringTag returns 'StepPromise' which means Object.prototype.toString.call(sp) gives [object StepPromise]. This is a nice touch, but it further highlights that this is NOT a real Promise — Object.prototype.toString.call(new Promise(() => {})) gives [object Promise]. Users who check the string tag to distinguish promises from other objects will get unexpected results. This is probably fine for the workflow context, but worth documenting in the class comment.


// Wraps the step RPC stub so that step.do() and step.waitForEvent() return StepPromise instances.
// sleep() and sleepUntil() are passed through unchanged (they return Promise<void>, no rollback).
function wrapStep(jsStep: StepRpcStub): unknown {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type is unknown. Since this object is passed to user code as step in run(event, step), the user sees untyped methods. The type safety is provided by the d.ts declarations in rpc.d.ts, so this is fine at runtime, but it means the wrapper layer has zero internal type safety — any method name typo in this object literal (e.g., sleepuntil instead of sleepUntil) would be silently accepted by TypeScript. Consider typing the return value as matching the StepRpcStub interface (or a similar shape) to catch mistakes.

proto &&
Object.prototype.hasOwnProperty.call(proto, 'run') &&
typeof proto.run === 'function' &&
!proto[kStepWrapped]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: proto[kStepWrapped] is set to true (a boolean), but the check is !proto[kStepWrapped] which relies on truthiness. This is fine, but since the symbol is on a prototype shared by all instances, it's worth noting that this is visible to user code via Object.getOwnPropertySymbols(). Users could theoretically observe or tamper with it. Using a WeakSet<object> instead of a symbol property would hide this implementation detail from user introspection.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 18, 2026

Review posted successfully. Here's a summary of the key findings:

High priority:

  1. StepPromise is not a real Promise — the d.ts declares StepPromise<T> extends Promise<T> but the runtime class is a thenable (plain class with .then()/.catch()/.finally()). instanceof Promise will be false. Suggested changing to extends PromiseLike<T>.

  2. Rollback function return type mismatchrpc.d.ts declares rollback callbacks as returning Promise<void>, but the test RollbackCallableWorkflow returns a value and the mock verifies it. The types and tests need to agree on the contract.

  3. generated-snapshot/ not regenerated — CI validates that the checked-in snapshots match the generated output. Since rpc.d.ts was modified, the snapshots need to be regenerated (likely via just generate-types).

Medium priority:
4. Copyright year — new files use 2024 instead of 2026.
5. Compat flag gating — no discussion of whether this new API surface should be gated behind a compat flag.
6. wrapStep return type is unknown — the wrapper object has no internal type safety, so method name typos would be silently accepted.

Low priority:
7. Constructor-time prototype patching repeats checks on every instantiation.
8. kStepWrapped symbol is observable via Object.getOwnPropertySymbols().
9. Truthiness check if (rollbackFn) could be more precise as if (rollbackFn !== null).

github run

@vaishnav-mk
Copy link
Copy Markdown
Contributor Author

hey @anonrig, i made a couple of changes based on bonk's review that i missed. specifically, StepPromise now extends Promise with a Symbol.species override so instanceof Promise is true, finally() works correctly and catch()/finally() overrides are no longer needed

apart from that, i've gated behind workflows_step_rollback experimental compat flag

Comment on lines +200 to +201
originalRun: (event: unknown, step: unknown, ...rest: unknown[]) => unknown
): (event: unknown, step: unknown, ...rest: unknown[]) => unknown {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 arguments duplication can be avoided. Not a blocker but this is also applied on line 203. (so there is technically 3 duplicated tyeps here...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, done, extracted into a RunFn type alias, used in all 3 places!

Comment on lines +268 to +270
export const WorkflowEntrypoint = Cloudflare.compatibilityFlags[
'workflows_step_rollback'
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid the [...] access. This is not type safe. Let's add a global type for this compat flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, saw this in a couple of other places and decided to go with this. i've switched to dot access (Cloudflare.compatibilityFlags.workflows_step_rollback) and hopefully that's better

@anonrig anonrig requested a review from jasnell March 30, 2026 14:30

// Wraps the step RPC stub so that step.do() and step.waitForEvent() return StepPromise instances.
// sleep() and sleepUntil() are passed through unchanged (they return Promise<void>, no rollback).
function wrapStep(jsStep: StepRpcStub): WrappedStep {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to wrap it with an object instead of a Proxy? this will make it so that every new step.* will also need to be added here, wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, updated!

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 31, 2026

Merging this PR will improve performance by 25.73%

⚡ 1 improved benchmark
✅ 69 untouched benchmarks
⏩ 129 skipped benchmarks1

Performance Changes

Benchmark BASE HEAD Efficiency
bodyWithHeaders[Response] 38.7 µs 30.8 µs +25.73%

Comparing vaishnav-mk:vaish/feat/saga-rollback-api (5645187) with main (ecfaecd)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@vaishnav-mk vaishnav-mk force-pushed the vaish/feat/saga-rollback-api branch from 5645187 to 6bf7aa2 Compare April 2, 2026 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants