Fix Durable Object ordering when mixing RPC and fetch calls #6562
threepointone wants to merge 1 commit into main
Conversation
LGTM
When a caller fires interleaved `stub.rpc()` and `stub.fetch()` calls on the same DO stub without awaiting, all fetch calls were processed before all RPC calls, regardless of send order. This violated the expected E-order guarantee for actors.

Root cause: fetch reaches the InputGate in ~1 async hop (via `WorkerEntrypoint::request()` -> `context.run()` -> `InputGate::wait()`), while RPC takes ~4+ hops (customEvent -> `JsRpcSessionCustomEvent::run()` -> Cap'n Proto session setup -> capability fulfillment -> pipelined call dispatch -> `JsRpcTargetBase::call()` -> `kj::yield()` -> `ctx.run()` -> `InputGate::wait()`). Since all operations originate from the same synchronous JS execution, fetch calls always reached the FIFO queue before any RPC calls.

Fix: eagerly acquire the InputGate position in `JsRpcSessionCustomEvent::run()`, at the same level where `WorkerEntrypoint::request()` acquires it, then thread the lock through to the first `ctx.run()` call via the existing `IoContext::run(func, Maybe<InputGate::Lock>)` overload. The lock is consumed by the first RPC method invocation and is NOT held for the session lifetime. The `kj::yield()` in `JsRpcTargetBase::call()` and ExternalPusher ordering are unaffected.

Fixes #6561

Made-with: Cursor
Force-pushed from 42bf758 to 3cad02c
This is a clever solution and I mostly like it. One possible issue is, technically, this would allow a client (of the capnp RPC interface) to lock up a DO by sending the initial call to open an RPC session, and then failing to send the actual call. It basically gives clients the ability to take a lock on a DO, which obviously they shouldn't have.

This is an internal RPC interface, though, not exposed to malicious clients, so we don't really have to worry about this being used in some sort of a DoS attack. But I suppose it's technically possible that, due to a bug or a poorly-timed network hiccup, the "call" message may be significantly delayed after the "open RPC session" message. Ugh, I can't quite convince myself that this is fine, nor can I think of an easy way to make it safe.

A much deeper, more thorough solution would be to do what I've always wanted to do and make the KJ event loop ordering fully depth-first. I think that would just solve this problem inherently. But that's a pretty delicate change with impact across the whole runtime: it should be strictly good in theory, but may have unintended consequences (stuff unintentionally depending on the current ordering)... hmm.

Meanwhile, though, I wonder how you're using this. I'm not sure you can actually safely rely on this in practice, due to hibernation. If you are relying on an initial call to initialize some in-memory state, and need that in-memory state to be initialized for the second call, there's no guarantee that the DO doesn't hibernate in between. True, if you make these calls in rapid succession, it's extremely unlikely that hibernation would happen, but it's always theoretically possible if a packet gets lost somewhere resulting in a delay, or if the runtime suddenly decides it needs to shut the DO down for an update or whatnot.

The safe thing to do is to have the first call return an RpcTarget, and have the second call pipelined on that. That always guarantees the two calls land on the same instance (or, rarely, the second call fails). Could that approach work for your use case? I personally have been using this pattern a fair amount and feel like it turns out nicely.
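The suggested pattern can be sketched in plain JavaScript (class and method names here are invented for illustration, not taken from the PR): the first call returns a target object bound to the instance's in-memory state, and the second call is made on that target, so both calls are guaranteed to observe the same instance.

```javascript
// Stand-in for an object that would extend RpcTarget in real Workers RPC.
class CounterTarget {
  constructor(state) { this.state = state; }
  increment() { this.state.n += 1; return this.state.n; }
}

// Stand-in for the Durable Object / entrypoint stub.
class ActorStub {
  constructor() { this.state = { n: 0 }; }
  // Call 1: initialize in-memory state and return a target bound to it.
  openCounter() { return new CounterTarget(this.state); }
}

async function demo() {
  const stub = new ActorStub();
  const counter = stub.openCounter(); // call 1: returns the target
  return counter.increment();         // call 2: made on the returned target
}
```

In real Workers RPC, the second call can even be pipelined on the promise for the first without an intermediate await, since stubs support promise pipelining; if the instance goes away in between, the pipelined call fails rather than landing on a fresh instance.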
Summary
Fixes #6561.
When a caller fires interleaved `stub.rpc()` and `stub.fetch()` calls on the same Durable Object stub without awaiting, all fetch calls were processed before all RPC calls, regardless of send order. This violated the expected E-order guarantee. Same-type ordering (pure RPC or pure fetch) was already correct; this fix addresses cross-type ordering.
Root Cause
`stub.fetch()` and `stub.rpc()` both create a `WorkerInterface` via `ActorChannel::startRequest()`, but they call different methods on it: `request()` for fetch, `customEvent()` for RPC. These two paths reach the DO's `InputGate` through a very different number of async hops.

fetch path (~1 hop to InputGate): `WorkerEntrypoint::request()` -> `context.run()` -> `InputGate::wait()`

RPC path (~4+ hops to InputGate): `customEvent` -> `JsRpcSessionCustomEvent::run()` -> Cap'n Proto session setup -> capability fulfillment -> pipelined call dispatch -> `JsRpcTargetBase::call()` -> `kj::yield()` -> `ctx.run()` -> `InputGate::wait()`
Since all operations originate from the same synchronous JS execution, fetch calls (fewer hops) always enqueued at the InputGate before any RPC calls (more hops).
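The hop asymmetry can be reproduced with a toy microtask simulation (illustrative only, not workerd code): a path that takes one async hop to reach the gate beats a path that takes four, even when the slower path is started first.

```javascript
const arrivals = [];

function fetchPath(tag) {
  // ~1 hop: a single microtask between send and arrival at the gate.
  return Promise.resolve().then(() => arrivals.push(tag));
}

function rpcPath(tag) {
  // ~4 hops: stands in for session setup, capability fulfillment,
  // call dispatch, and the yield before ctx.run().
  let p = Promise.resolve();
  for (let i = 0; i < 4; i++) p = p.then(() => {});
  return p.then(() => arrivals.push(tag));
}

async function demo() {
  // Both sends happen in the same synchronous execution; RPC is sent first.
  await Promise.all([rpcPath("rpc"), fetchPath("fetch")]);
  return arrivals; // the fetch overtakes the rpc
}
```

Running `demo()` yields `["fetch", "rpc"]`: the fetch arrives first despite being sent second, which is the cross-type ordering violation.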
Fix
Eagerly acquire the InputGate position in `JsRpcSessionCustomEvent::run()`, right after `delivered()`, at the same level where `WorkerEntrypoint::request()` acquires it via `context.run()`. The lock is stored on `EntrypointJsRpcTarget` and consumed by the first `ctx.run()` call via the existing `IoContext::run(func, Maybe<InputGate::Lock>)` overload (the same pattern used by `ensureConstructedImpl`).

Key properties:
- `kj::yield()` in `JsRpcTargetBase::call()` still runs (the InputGate position is already reserved, so ExternalPusher ordering is preserved)
- The lock is only acquired when the target is an actor (`KJ_IF_SOME(a, ioctx.getActor())`)
- `TransientJsRpcTarget` (within-session stubs) is unaffected; only `EntrypointJsRpcTarget` uses the pre-acquired lock
- `ServerTopLevelMembrane` ensures exactly one call per `getClientForOneCall()` session, so the lock always matches the single call
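The shape of the fix can be sketched with a toy FIFO gate in JavaScript (all names here are invented; workerd's actual `InputGate` is C++ and more involved): the RPC path reserves its gate position eagerly at session-open time, and that reservation is consumed later by the first call, so the extra setup hops no longer cost it its place in line.

```javascript
// Toy FIFO gate: reserve() hands out positions in call order; each holder's
// successors wait until it releases.
class Gate {
  constructor() { this.tail = Promise.resolve(); }
  reserve() {
    let release;
    const held = new Promise(r => { release = r; });
    const wait = this.tail;             // our turn: after all predecessors
    this.tail = wait.then(() => held);  // successors wait for our release
    return { wait, release };
  }
}

const order = [];
const gate = new Gate();

async function rpcCall(tag) {
  const slot = gate.reserve();          // eager acquisition, BEFORE the hops
  let p = Promise.resolve();
  for (let i = 0; i < 4; i++) p = p.then(() => {}); // session-setup hops
  await p;
  await slot.wait;                      // reservation consumed by the call
  order.push(tag);
  slot.release();
}

async function fetchCall(tag) {
  const slot = gate.reserve();
  await Promise.resolve();              // single hop
  await slot.wait;
  order.push(tag);
  slot.release();
}

async function demo() {
  // RPC is sent first, fetch second, from the same synchronous execution.
  await Promise.all([rpcCall("rpc"), fetchCall("fetch")]);
  return order;
}
```

Here `demo()` yields `["rpc", "fetch"]`: send order is preserved. If `rpcCall` instead reserved its slot only after the setup hops, the fetch would overtake it, which is the bug this PR fixes.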
Performance

The InputGate lock is acquired a few event loop turns earlier than before: during the Cap'n Proto session setup and `kj::yield()` period (microseconds). ExternalPusher calls (`pushByteStream`, `pushAbortSignal`) don't need the InputGate, so they are unaffected. Other events wait one extra turn at most. For single-caller latency, the gate is free and `wait()` returns immediately.

Changes
- `worker-rpc.c++`: 27 lines; early InputGate acquisition in `JsRpcSessionCustomEvent::run()`, lock threading through `JsRpcTargetBase` to `ctx.run()`
- `js-rpc-test.js`: new `OrderingActor` DO class + 3 test cases (`mixedRpcFetchOrdering`, `pureRpcOrdering`, `pureFetchOrdering`)
- `js-rpc-test.wd-test`: register `OrderingActor` binding and namespace

Test plan
- `mixedRpcFetchOrdering`: interleaves 20 RPC and fetch calls, asserts send-order delivery (the primary bug)
- `pureRpcOrdering`: 20 fire-and-forget RPC calls preserve order (regression)
- `pureFetchOrdering`: 20 fire-and-forget fetch calls preserve order (regression)
- `eOrderTest` passes (ExternalPusher + e-order regression)
- `receiveStubOverRpc` passes (service stub RPC ordering regression)
- `namedActorBinding` passes (DO RPC regression)
- The `js-rpc-test@` suite passes locally

Reproduction
Repro repo: https://github.com/threepointone/test-do-rpc-fetch-ordering
Live demo:

`curl https://test-ordering.threepointone.workers.dev/test-all?n=20`

Root cause analysis, fix design, and implementation developed in collaboration with Cursor.
Made with Cursor