Refactor jsRpcSession span parenting and ownership #6704

Open
Ankcorn wants to merge 16 commits into main from tankcorn/jsrpc-span-parenting

Conversation

@Ankcorn
Member

@Ankcorn Ankcorn commented Apr 30, 2026

Prep for native jsRpcSession span enrichment.

Background

jsRpcSession is the user span for one RPC session. Today it's created by Fetcher::getClientForOneCall (HTTP) and kept alive via .attach() on the
WorkerInterface. AIG (AI Gateway) enrichment needs to write to this span from callImpl's response handler, but .attach() makes it unreachable, and having
Fetcher construct an RPC-layer event blurs HTTP/RPC layering.

Commit 1 — Test: jsRpcSession parented to enclosing enterSpan

RPC analog of fetchInsideEnterSpan. Locks in the contract before moving the span.

Commit 2 — Move span ownership into JsRpcSessionCustomEvent

  1. Lifetime alignment. The event's lifetime is the session. Was correct via .attach() ordering (side effect); now correct by construction.
  2. Reachability. .attach() hides the object; AIG needs a named path. event->jsRpcSessionSpan is that path.
  3. Layering (partial). "jsRpcSession" string moves out of generic HTTP plumbing into Fetcher::getJsRpcClient returning {worker, span}. OutgoingFactory
    variants pass SpanBuilder(nullptr) — they already make a durable_object_subrequest span.
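
Points 1 and 2 can be sketched in plain standard C++. This is not the workerd code: `Span`, `SessionEvent`, and `runSession` are hypothetical stand-ins, and only the member name `jsRpcSessionSpan` comes from this PR. The sketch assumes a span that reports "end" from its destructor and shows that, once the span is a named member, it ends exactly when the event is destroyed and a response handler can reach it.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a move-only span builder: logs "end" on destruction.
class Span {
public:
  Span(std::string name, std::vector<std::string>& log)
      : name_(std::move(name)), log_(&log) {}
  Span(Span&& other) noexcept
      : name_(std::move(other.name_)), log_(std::exchange(other.log_, nullptr)) {}
  Span(const Span&) = delete;
  Span& operator=(const Span&) = delete;
  ~Span() {
    if (log_ != nullptr) log_->push_back("end " + name_);
  }

  void setTag(const std::string& key, const std::string& value) {
    if (log_ != nullptr) log_->push_back("tag " + key + "=" + value);
  }

private:
  std::string name_;
  std::vector<std::string>* log_;  // null after being moved from
};

// Hypothetical event: the span is a named member, so its lifetime is the
// event's lifetime and callers holding the event can reach it.
struct SessionEvent {
  explicit SessionEvent(Span span): jsRpcSessionSpan(std::move(span)) {}
  Span jsRpcSessionSpan;
};

std::vector<std::string> runSession() {
  std::vector<std::string> log;
  {
    SessionEvent event(Span("jsRpcSession", log));
    // A response handler can enrich the span through the named member.
    event.jsRpcSessionSpan.setTag("provider", "example");
    log.push_back("session work done");
  }  // event destroyed here; the span ends with it
  return log;
}
```

Calling `runSession()` yields the tag entry, then the work entry, then the span's end entry, which is the ordering the "correct by construction" claim relies on.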

Commit 3 — Move event construction into callImpl

  1. Layering (completion). HTTP no longer instantiates RPC events. Fetcher produces {worker, span}; callImpl names and uses them.
  2. No cross-boundary pointer. AIG needs &event->jsRpcSessionSpan in callImpl's response lambda; constructing the event there keeps it in-scope. Otherwise
    the AIG PR would need a ClientForOneCall { client, SpanBuilder* } struct on the provider API.
  3. Honest interface. JsRpcClientProvider had one virtual hiding two shapes. Now two, each defaulted:
    • getClientForOneCall — cap-holders (JsRpcStub, JsRpcPromise).
    • tryGetJsRpcSessionClient — session-creators (Fetcher).

Note. Both virtuals append to a shared path vector. callImpl gives the session attempt a scratch vector and only commits on success, so forwarders can
append unconditionally. (Caught early — 4 pipelining tests failed before the fix.)
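
The scratch-vector idea in the note above, sketched in standard C++ under some assumptions: clients are modelled as strings, `std::optional` stands in for `kj::Maybe`, and `trySessionClient`, `capClient`, and `resolve` are hypothetical names, not functions from this PR.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Path = std::vector<std::string_view>;

// Hypothetical forwarder: appends its segment unconditionally, then may or
// may not produce a session client.
std::optional<std::string> trySessionClient(bool canCreateSession, Path& path) {
  path.push_back("foo");
  if (!canCreateSession) return std::nullopt;
  return "session-client";
}

// Hypothetical cap-holder path: also appends unconditionally.
std::string capClient(Path& path) {
  path.push_back("foo");
  return "cap-client";
}

struct Resolved {
  std::string client;
  Path path;
};

Resolved resolve(bool canCreateSession) {
  Path path;
  Path sessionPath;  // scratch: committed only if the session attempt succeeds
  if (auto client = trySessionClient(canCreateSession, sessionPath)) {
    path = std::move(sessionPath);
    return {std::move(*client), std::move(path)};
  }
  // Session attempt failed: sessionPath is dropped, so "foo" is not doubled.
  std::string client = capClient(path);
  return {std::move(client), std::move(path)};
}
```

Without the scratch vector, the failed session attempt would leave one "foo" in `path` and the fallback would add a second, which is presumably the kind of corruption the pipelining tests caught.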

Net effect

|                     | Before                           | After                             |
| Span owner          | .attach()-ed to WorkerInterface  | Member of JsRpcSessionCustomEvent |
| Event constructor   | Fetcher (HTTP)                   | callImpl (RPC)                    |
| JsRpcClientProvider | One virtual, two shapes          | Two virtuals, one each            |
| AIG coupling        | Needs struct + raw pointer       | Reads event->jsRpcSessionSpan     |

@Ankcorn Ankcorn requested review from a team as code owners April 30, 2026 15:15
@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

UnknownError: ProviderInitError

github run

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

@Ankcorn Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

Contributor

@ask-bonk ask-bonk Bot left a comment

I'm Bonk, and I've done a quick review of your PR.

This lifts the jsRpcSession span out of TraceContext and onto JsRpcSessionCustomEvent, so the span is reachable from the event and available for enrichment in the followup PR. The span lifetime and ownership model looks correct.

One minor note (low severity):

Comment thread src/workerd/api/http.c++ Outdated
@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary of my analysis:

Overall assessment: The PR is well-structured. The span lifetime model is correct — SpanBuilder is move-only with RAII semantics, and storing it on JsRpcSessionCustomEvent gives the span the right lifetime (session duration). The SpanParent::addRef() usage correctly creates independent ref-counted references. The test coverage is good, adding a new jsRpcInsideEnterSpan test case that validates the span parenting via the tail worker's hierarchy-aware collector.

One minor issue flagged: The new getJsRpcClient method constructs SubrequestMetadata without featureFlagsForFl, which the old code path through getSubrequestChannelImpl did set. This is unlikely to matter in practice for RPC sessions but is a behavioral difference worth noting. I posted a suggestion comment with the fix.

github run

@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch from 04e17f6 to b28449b Compare April 30, 2026 16:48
@Ankcorn
Member Author

Ankcorn commented Apr 30, 2026

@ask-bonk review this PR

Contributor

@ask-bonk ask-bonk Bot left a comment

The featureFlagsForFl issue from my previous review has been addressed — the uint channel case now populates it via the metadataExtra callback.

On re-review, the span ownership model is sound and the code is correct. One minor nit below on a comment that looks like it has placeholder text left over from editing.

This review was generated by an AI assistant (Bonk) and may contain mistakes.

Comment thread src/workerd/api/http.c++ Outdated
@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary of the re-review:

Previous feedback status: The featureFlagsForFl issue I flagged in the first review has been addressed — the uint channel case now populates it via the metadataExtra callback at src/workerd/api/http.c++:2440-2444.

Re-review findings: The code is correct. The span ownership model is sound:

  • SpanBuilder is move-only with RAII end() in its destructor — storing it on JsRpcSessionCustomEvent gives the correct session-scoped lifetime.
  • SpanParent::addRef() creates independent ref-counted references via kj::Refcounted, so the SubrequestMetadata gets its own reference to the observer.
  • No redundant span is created because getSubrequest is called without operationName, skipping the makeUserTraceSpan path in getSubrequestNoChecks.
  • All lambdas capture by reference ([&]) and are called synchronously, so no dangling references.
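
The first two bullets can be illustrated with a small standard-C++ model. This is a sketch under assumptions, not the KJ implementation: `std::shared_ptr` stands in for a `kj::Refcounted` observer, and the `SpanParent`, `SpanBuilder`, and `SubrequestMetadata` types below are simplified stand-ins that only borrow the names used in the review. `Observer`, `asParent`, and `parentOutlivesBuilder` are hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical observer shared between a span and anything parented to it.
struct Observer {
  bool ended = false;
};

// Stand-in parent handle: copying the shared_ptr plays the role of addRef().
class SpanParent {
public:
  explicit SpanParent(std::shared_ptr<Observer> observer)
      : observer_(std::move(observer)) {}
  SpanParent addRef() const { return SpanParent(observer_); }
  bool isObserved() const { return observer_ != nullptr; }

private:
  std::shared_ptr<Observer> observer_;
};

// Stand-in builder: move-only, ends the span in its destructor (RAII).
class SpanBuilder {
public:
  explicit SpanBuilder(std::shared_ptr<Observer> observer)
      : observer_(std::move(observer)) {}
  SpanBuilder(SpanBuilder&&) noexcept = default;
  SpanBuilder(const SpanBuilder&) = delete;
  ~SpanBuilder() {
    if (observer_) observer_->ended = true;
  }
  SpanParent asParent() const { return SpanParent(observer_); }

private:
  std::shared_ptr<Observer> observer_;
};

struct SubrequestMetadata {
  SpanParent userSpanParent;
};

// True if the metadata's parent handle stays valid after the builder is gone.
bool parentOutlivesBuilder() {
  auto observer = std::make_shared<Observer>();
  SubrequestMetadata metadata{SpanParent(nullptr)};
  {
    SpanBuilder sessionSpan(observer);
    metadata.userSpanParent = sessionSpan.asParent().addRef();
  }  // builder destroyed: span ended, observer still referenced by metadata
  return metadata.userSpanParent.isObserved() && observer->ended;
}
```

The metadata holds its own reference, so destroying the builder ends the span without invalidating the parent handle.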

One minor nit posted: A comment at line 2414 in http.c++ has placeholder text that appears to have been swallowed (angle brackets rendered as empty HTML tags).

github run

@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch 2 times, most recently from fdc5681 to 04aa1ca Compare April 30, 2026 21:19
@Ankcorn
Member Author

Ankcorn commented Apr 30, 2026

@ask-bonk review this PR

Contributor

@ask-bonk ask-bonk Bot left a comment

Both issues from prior reviews (missing featureFlagsForFl, truncated comment) are addressed. On re-review the refactoring is correct — no new issues found.

Summary of verification:

  • Span lifetime. SpanBuilder is move-only with RAII end() in its destructor. Storing it on JsRpcSessionCustomEvent gives it session-scoped lifetime, matching the logical span duration. The event is kept alive by .attach() on the customEvent() promise. Correct.
  • SpanParent ref-counting. SpanParent(SpanBuilder&) calls mapAddRef on the builder's observer (a kj::Refcounted). addRef() does the same. Both the sessionSpan builder and the SubrequestMetadata::userSpanParent hold independent kj::Own references to the same observer. Correct.
  • Capture-by-reference safety. All [&] lambdas in withSessionSpan / getSubrequest are called synchronously (kj::FunctionParam is non-owning by design). No dangling references.
  • Scratch path vector. callImpl uses a separate sessionPath vector for tryGetJsRpcSessionClient and only kj::mvs it into path on success. JsRpcProperty appending unconditionally to the scratch vector is harmless since it's discarded on the kj::none fallback. The StringPtrs point to JsRpcProperty::name members that outlive the synchronous methodPath construction. Correct.
  • getCap() before event move. getCap() moves the promise-side cap out of the event; the cap then lives independently via capnp's refcounting. Moving the event into customEvent() afterward is safe — same pattern as the original code.
  • OutgoingFactory variants. Returning SpanBuilder(nullptr) for DO/cross-process actor factories correctly skips the jsRpcSession span. These factories produce their own outer span (e.g. durable_object_subrequest), so no redundant span is created. Matches prior behavior.
  • Test. The new jsRpcInsideEnterSpan test validates that getCurrentUserTraceSpan().newChild() picks up the enterSpan-pushed AsyncContextFrame, mirroring the existing fetchInsideEnterSpan case for HTTP. The .wd-test correctly adds the rpc compat flag and the entrypoint binding.

This review was generated by an AI assistant (Bonk) and may contain mistakes.

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted successfully. Here's a summary:

Both issues from my prior reviews have been addressed (featureFlagsForFl now populated in the uint channel path, comment text restored). On this re-review, the refactoring is correct with no new issues found.

I verified seven aspects of correctness:

  • Span lifetime — RAII SpanBuilder on JsRpcSessionCustomEvent gives session-scoped lifetime
  • SpanParent ref-counting: addRef() creates independent refcounted references, no use-after-free
  • Capture-by-reference safety — all [&] lambdas in withSessionSpan/getSubrequest are called synchronously via kj::FunctionParam
  • Scratch path vector — discarded on kj::none fallback, StringPtrs outlive their use
  • getCap() before event move — cap lives independently via capnp refcounting
  • OutgoingFactory variants — correctly skip jsRpcSession span (matching prior behavior)
  • Test coverage — new jsRpcInsideEnterSpan properly validates span parenting through AsyncContextFrame

github run

@Ankcorn Ankcorn changed the title from "Tankcorn/jsrpc span parenting" to "Refactor jsRpcSession span parenting and ownership" Apr 30, 2026
@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch from 04aa1ca to 685b7fa Compare April 30, 2026 22:08
Ankcorn added 3 commits April 30, 2026 23:14
Adds Case 7 to tracing-hierarchy-instrumentation-test: an RPC call made
inside enterSpan() must produce a jsRpcSession user span whose parent is
the enterSpan, not the top-level onset span. This is the RPC analog of
the existing fetchInsideEnterSpan case.

The jsRpcSession user span is now created in Fetcher::getJsRpcClient() —
a new helper that returns the WorkerInterface plus the span — and
transferred to the JsRpcSessionCustomEvent where it lives as a member
SpanBuilder until the event is destroyed (i.e. session end).

Previously the span was created inside getClientWithTracing() and
attached to the WorkerInterface via .attach(). The event itself had no
visibility into the span, which made it impossible for callImpl() to
reach it for runtime enrichment (e.g. AI Gateway binding span tags).

Behavioural notes:
- Span created on the caller side before startRequest() so its ID is
  available for USER_SPAN_CONTEXT_PROPAGATION (the callee's onset event
  reports this span as its parent).
- Direct channel variants (uint, IoOwn<SubrequestChannel>) get a
  jsRpcSession span. OutgoingFactory variants (DurableObject stubs,
  cross-process actors) already create their own outer span
  (e.g. durable_object_subrequest), so jsRpcSession is omitted to avoid
  redundancy. This matches pre-existing behaviour for those variants.
- ioContext.now() (I/O time) is used for the explicit start time so we
  remain Spectre-safe and deterministic in test mode.

Splits JsRpcClientProvider's single virtual into two paths that match
the two real shapes of providers:

  - getClientForOneCall(js, path) — for cap-holders (JsRpcStub,
    JsRpcPromise) that already hold a live cap. Defaulted to
    KJ_UNIMPLEMENTED so session-creating providers don't have to
    pretend to implement it.

  - tryGetJsRpcSessionClient(ioContext, path) — for session-creating
    providers (Fetcher). Returns the WorkerInterface and jsRpcSession
    SpanBuilder so callImpl can construct the JsRpcSessionCustomEvent
    itself. Default returns kj::none.

callImpl checks the session path first and falls back to the cap path.
Fetcher no longer constructs JsRpcSessionCustomEvent; it just exposes
the building blocks.

Why: the previous shape made Fetcher (an HTTP-layer concept) responsible
for instantiating an RPC-layer event class. Worse, callImpl needed a
raw SpanBuilder* returned from getClientForOneCall to apply enrichment
to a span owned by an event it didn't construct. Moving event
construction into callImpl puts session lifecycle next to the only code
that depends on it and removes the cross-boundary pointer.
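
A rough standard-C++ sketch of the two-virtual shape described in this commit message. It assumes clients can be modelled as strings, uses a thrown exception where the real code uses KJ_UNIMPLEMENTED, and uses `std::optional` in place of `kj::Maybe`. `ClientProvider`, `StubLike`, `FetcherLike`, and `tryGetSessionClient` are hypothetical names; the real signatures also take a lock or IoContext and a path.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the provider interface; each virtual has a default.
class ClientProvider {
public:
  virtual ~ClientProvider() = default;

  // For cap-holders. The default throws, standing in for KJ_UNIMPLEMENTED.
  virtual std::string getClientForOneCall() {
    throw std::logic_error("getClientForOneCall not implemented");
  }

  // For session-creators. The default means "this provider creates no session".
  virtual std::optional<std::string> tryGetSessionClient() { return std::nullopt; }
};

class StubLike final: public ClientProvider {
public:
  std::string getClientForOneCall() override { return "existing-cap"; }
};

class FetcherLike final: public ClientProvider {
public:
  std::optional<std::string> tryGetSessionClient() override { return "new-session"; }
};

// Dispatch order from the commit message: session path first, then cap path.
std::string callImpl(ClientProvider& provider) {
  if (auto session = provider.tryGetSessionClient()) {
    // The caller, not the provider, wraps the session in an event.
    return "event(" + *session + ")";
  }
  return provider.getClientForOneCall();
}
```

Each concrete provider overrides only the virtual that matches its shape, so neither has to pretend to implement the other.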
@Ankcorn Ankcorn force-pushed the tankcorn/jsrpc-span-parenting branch from 685b7fa to b367ed8 Compare April 30, 2026 23:28
Ankcorn added 2 commits May 1, 2026 08:21
The refactor that moved jsRpcSession span ownership into
JsRpcSessionCustomEvent only created the user-facing span via
ioContext.getCurrentUserTraceSpan().newChild(...), but the original
Fetcher::getClient("jsRpcSession") path used IoContext::makeUserTraceSpan
which produces a TraceContext containing BOTH an internal trace span and
a user span.

Without the internal span, the SubrequestMetadata.parentSpan handed to
the callee was an unobserved/empty SpanParent (the default-constructed
TraceContext local in IoContext::getSubrequestNoChecks), so trace
context never propagated to downstream subrequests the callee made
(e.g. the QuickSilver Read.get the dynamic-worker ban check issues).

This regressed edgeworker's DynamicWorkerBan."Dynamic worker loads when
not banned" test, which asserts traceIdLow on the QS request matches
the injected trace ID. Reproduced locally and verified that the fix passes all
six previously failing edgeworker tests.
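
A simplified model of the regression described above, in standard C++. It assumes a parent handle is either observed (carries a trace ID) or empty, and models that with `std::optional<std::string>`. The struct names echo the commit message, but the fields other than `parentSpan` and `userSpanParent`, and all of the functions, are hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical: a parent handle is either observed (has a value) or empty.
using SpanParent = std::optional<std::string>;

struct SubrequestMetadata {
  SpanParent parentSpan;      // internal trace context handed to the callee
  SpanParent userSpanParent;  // user-facing span context
};

struct TraceContext {
  SpanParent internalSpan;
  SpanParent userSpan;
};

// Regressed shape: only the user span is created; the internal one is empty.
TraceContext makeUserSpanOnly(const std::string& traceId) {
  return TraceContext{std::nullopt, traceId + "/user"};
}

// Fixed shape: both spans are created together.
TraceContext makeBothSpans(const std::string& traceId) {
  return TraceContext{traceId + "/internal", traceId + "/user"};
}

SubrequestMetadata toMetadata(const TraceContext& trace) {
  return SubrequestMetadata{trace.internalSpan, trace.userSpan};
}

// The callee can forward trace context downstream only if parentSpan is observed.
bool calleeCanPropagate(const SubrequestMetadata& metadata) {
  return metadata.parentSpan.has_value();
}
```

In this model the user span alone looks fine from the caller's side, which is why the missing internal span only showed up in a downstream assertion on the trace ID.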
@codspeed-hq

codspeed-hq Bot commented May 1, 2026

Merging this PR will not alter performance

✅ 72 untouched benchmarks
⏩ 129 skipped benchmarks [1]


Comparing tankcorn/jsrpc-span-parenting (a1dafc5) with main (d092243)

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Ankcorn added 2 commits May 1, 2026 12:22
Previously JsRpcClientProvider::getClientForOneCall and
tryGetJsRpcSessionClient took kj::Vector<kj::StringPtr>& path as an
out-parameter. Each override had to know whether to append (only
JsRpcProperty does, because it's the only non-root provider) and not
to double-append, and callImpl needed a scratch path vector to handle
the kj::none-from-Fetcher case without leaking partial state.

Move path into the return value: getClientForOneCall returns
OneCall { client, path }, and tryGetJsRpcSessionClient's
JsRpcSessionClient now carries its own path. JsRpcProperty's overrides
forward to the parent and append to the returned struct's path.

Benefits:
- Forgetting to handle kj::none before touching path is now a
  compile error (you have to KJ_IF_SOME the Maybe first), not a
  silent runtime corruption.
- Each frame owns its own path vector — no shared state, no
  double-append risk.
- Drops the (void)path; / scratch-vector / commit-on-success
  ceremony in callImpl.
- Drops several explanatory comments that were warning readers about
  the shared-mutation footgun -- the type system now enforces the
  invariant, so the comments are unnecessary.

Behaviour is unchanged. All workerd tests pass; edgeworker
DynamicWorkerBan still passes.
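
The path-by-value shape can be sketched as follows in standard C++. Assumptions: the client is a string rather than a capability, `std::optional` stands in for `kj::Maybe`, and `Provider`, `Root`, `Property`, and `pathLength` are hypothetical. Note that `std::optional` is weaker than the KJ_IF_SOME pattern the commit message relies on: dereferencing an empty optional compiles, so the compile-time guarantee is not reproduced here.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical return type: the path travels with the client.
struct OneCall {
  std::string client;
  std::vector<std::string_view> path;
};

class Provider {
public:
  virtual ~Provider() = default;
  virtual std::optional<OneCall> tryGetClient() = 0;
};

// Root provider: returns an empty path (the "{client, {}}" leaf).
class Root final: public Provider {
public:
  explicit Root(bool available): available_(available) {}
  std::optional<OneCall> tryGetClient() override {
    if (!available_) return std::nullopt;
    return OneCall{"root-client", {}};
  }

private:
  bool available_;
};

// Forwarder: asks its parent, then appends its own name to the returned path.
class Property final: public Provider {
public:
  Property(Provider& parent, std::string name)
      : parent_(parent), name_(std::move(name)) {}
  std::optional<OneCall> tryGetClient() override {
    auto result = parent_.tryGetClient();
    if (!result) return std::nullopt;  // no path exists to append to
    result->path.push_back(name_);     // view into name_, valid while *this lives
    return result;
  }

private:
  Provider& parent_;
  std::string name_;
};

std::size_t pathLength(Provider& provider) {
  auto result = provider.tryGetClient();
  return result ? result->path.size() : 0;
}
```

Each call frame owns the vector it returns, so there is no shared state for a failed attempt to leave half-written.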
Ankcorn added 3 commits May 1, 2026 17:07
Pure relocation -- the function stays a member of Fetcher (declared in
http.h, accessing its private channelOrClientFactory and isInHouse),
just defined in a different .c++. Builds in the same translation-unit
graph; no header or BUILD changes besides one #include.

The whole jsRpcSession lifecycle now lives in one file:
  - tryGetJsRpcSessionClient: creates the user + internal spans and
    builds the WorkerInterface
  - callImpl: takes the user span out of the result, hands it to a
    new JsRpcSessionCustomEvent, dispatches the worker
  - JsRpcSessionCustomEvent::run: callee side
  (and on the AIG branch, the response lambda that applies enrichment
  to the user span sits in callImpl right next to where the span was
  handed in)

Reviewers can follow the lifecycle without jumping between files.
Trade-off: Fetcher's implementation now spans http.c++ and
worker-rpc.c++, but only one method is split out -- the jsRpc-specific
one.

The base class JsRpcClientProvider::tryGetJsRpcSessionClient already
documents the contract; the override doesn't change semantics so
copy-pasting the doc adds no information.

Reverts the structural part of 8de09c1 (path-by-value via OneCall +
JsRpcSessionClient.path). On reflection the refactor was net-negative
for this small, stable hierarchy (4 providers, 1 forwarder):

- Every leaf return now spells out an empty path: `{client, {}}`,
  `{worker, span, {}}`. Three leaves, in perpetuity.
- The footgun the refactor was supposed to prevent ("forget to handle
  kj::none before mutating path") was already prevented by callImpl's
  scratch-vector pattern -- the failure mode was theoretical, not real.

Restored layout:
- getClientForOneCall(jsg::Lock&, kj::Vector<kj::StringPtr>& path)
- tryGetJsRpcSessionClient(IoContext&, kj::Vector<kj::StringPtr>& path)
- callImpl uses a scratch `sessionPath` for the session attempt,
  commits on success.
- JsRpcProperty appends unconditionally (the convention).
- Leaves don't touch `path`.

Three rules across four places, all simple, no return-value boilerplate
at the leaves. The structural improvements that *did* land earlier
(span-pair separation, return-value-only struct for JsRpcSessionClient)
are preserved.

tryGetJsRpcSessionClient used to create the user-facing jsRpcSession
span and the internal trace span itself, then return them inside
JsRpcSessionClient. That puts span policy in a Fetcher accessor that
shouldn't know about jsRpc semantics, only about channel variants.

Hoist span ownership into callImpl, which is the function that knows
it's about to start a jsRpc session (creating the membrane via
JsRpcSessionCustomEvent). The move maps the code structure onto the future
span hierarchy: callImpl will own jsRpcSession (here, today) and
jsRpcCall / jsRpcTargetCall (followups). Each span lives in the branch
that owns its lifecycle.

To preserve trace context propagation to the callee (the parent IDs
must be present in SubrequestMetadata when getSubrequest runs inside
tryGet), span creation has to happen *before* tryGet is called and
the parents passed in. Two virtuals:

  * wouldCreateJsRpcSessionSpan() lets callImpl peek at policy
    (true for service bindings / direct channels, false for
    OutgoingFactory variants that already produce
    durable_object_subrequest etc. and would only get redundant
    nesting from a jsRpcSession on top).

  * tryGetJsRpcSessionClient() now takes SpanParent internalSpanParent
    and SpanParent userSpanParent as inputs, plumbed straight into
    SubrequestMetadata. Both may be unobserved on the negative path.

JsRpcSessionClient shrinks to just { worker } -- the WorkerInterface.
callImpl then does:
  - peek wouldCreateJsRpcSessionSpan(), conditionally make spans
  - call tryGet with the SpanParents
  - attach internal span to the worker (lives = session lifetime)
  - move user span into JsRpcSessionCustomEvent (also session-scoped,
    and reachable from callImpl's response lambda for binding span
    enrichment in the AIG followup)

JsRpcProperty's forwarder grows a wouldCreateJsRpcSessionSpan
override that delegates to its parent, same shape as the existing
tryGet forwarder.
@codecov-commenter

codecov-commenter commented May 1, 2026

Codecov Report

❌ Patch coverage is 87.09677% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.54%. Comparing base (37ff673) to head (a1dafc5).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines       | Patch % | Lines                       |
| src/workerd/api/worker-rpc.c++ | 87.09%  | 3 Missing and 5 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6704      +/-   ##
==========================================
- Coverage   66.57%   66.54%   -0.03%     
==========================================
  Files         402      402              
  Lines      115911   115902       -9     
  Branches    19407    19412       +5     
==========================================
- Hits        77163    77131      -32     
- Misses      27172    27188      +16     
- Partials    11576    11583       +7     

Ankcorn added 5 commits May 1, 2026 20:24
The previous shape had two virtuals (`wouldCreateJsRpcSessionSpan` and
`tryGetJsRpcSessionClient`) plus a peek-then-act dance in callImpl, all
to keep span construction visually inside callImpl. The cost was two
parallel switches over Fetcher::channelOrClientFactory that had to stay
in sync -- a real maintenance smell.

Collapse to one virtual that returns both the WorkerInterface and the
user-facing jsRpcSession span. For OutgoingFactory variants that don't
want a session span (DurableObject stubs etc. produce their own
durable_object_subrequest), return SpanBuilder(nullptr); moving an
unobserved SpanBuilder into the event is a no-op so callImpl handles
both cases uniformly without any branch.

Result:
  - Drops `wouldCreateJsRpcSessionSpan` (base + Fetcher + JsRpcProperty).
  - Single switch in Fetcher::tryGetJsRpcSessionClient.
  - JsRpcProperty has one forwarder instead of two.
  - callImpl no longer peeks or constructs spans -- it just moves what
    tryGet returns into the event.
  - ~50 lines net.

Dan's principle ("instrumentation in the function with knowledge of
internal semantics") still holds: tryGetJsRpcSessionClient lives in
worker-rpc.c++ and its job is literally to start a jsRpc session.
Constructing the session span at that point is the right place. The
followup PRs (jsRpcCall, jsRpcTargetCall) will still slot cleanly into
callImpl's existing two-branch structure -- those are per-call spans
and naturally belong in callImpl, not here.
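
A sketch of the collapsed shape in standard C++, assuming a span whose operations are no-ops when it has no sink (the analogue of SpanBuilder(nullptr)). `Span`, `SessionClient`, `Channel`, and the string-valued worker are hypothetical simplifications of the types named in this commit message.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical span: with a null sink every operation is a no-op.
class Span {
public:
  explicit Span(std::vector<std::string>* sink): sink_(sink) {}
  Span(Span&& other) noexcept: sink_(std::exchange(other.sink_, nullptr)) {}
  Span(const Span&) = delete;
  ~Span() {
    if (sink_ != nullptr) sink_->push_back("close");
  }
  void setTag(const std::string& tag) {
    if (sink_ != nullptr) sink_->push_back("tag:" + tag);
  }

private:
  std::vector<std::string>* sink_;
};

struct SessionClient {
  std::string worker;
  Span span;
};

enum class Channel { Direct, OutgoingFactory };

// One switch decides both the worker and whether a session span exists.
SessionClient tryGetSessionClient(Channel channel, std::vector<std::string>& sink) {
  switch (channel) {
    case Channel::Direct:
      return {"direct-worker", Span(&sink)};
    case Channel::OutgoingFactory:
      // The factory emits its own outer span, so no session span here.
      return {"factory-worker", Span(nullptr)};
  }
  return {"unknown", Span(nullptr)};
}

// The caller handles both cases uniformly: no branch on "is there a span?".
std::vector<std::string> callImpl(Channel channel) {
  std::vector<std::string> sink;
  {
    SessionClient client = tryGetSessionClient(channel, sink);
    Span eventSpan = std::move(client.span);  // hand the span to the "event"
    eventSpan.setTag("enriched");
  }
  return sink;
}
```

For the direct channel the sink records the tag and the close; for the factory variant it stays empty, with the same caller code in both cases.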
Per Dan's vision for the future jsRpc span hierarchy:

  jsRpcSession         (server-side, membrane lifetime; future)
    jsRpcCall          (top-level method call; this is what we emit today)
      jsRpcTargetCall  (call on a returned RpcTarget; future)

The user-facing span we emit today on the client side from
tryGetJsRpcSessionClient lives for the duration of one method call,
not the membrane lifetime, so it's a jsRpcCall, not a jsRpcSession.
Rename it now so:

  - users reading their tail stream see the right name from day one,
  - the followup PRs that add jsRpcSession (server-side) and
    jsRpcTargetCall don't need a confusing rename + rewire.

Internal trace span renamed alongside for consistency. Field on
JsRpcSessionCustomEvent renamed jsRpcSessionSpan -> jsRpcCallSpan.

The class name JsRpcSessionCustomEvent is unchanged because the event
itself genuinely is the membrane / session -- only the span we attach
to it (which represents the single top-level call) gets renamed.

Tests in tracing-hierarchy and tail-worker assertions updated.

Per Dan's full span hierarchy:

  jsRpcSession (server)        <-- this commit; emitted by run()
    jsRpcCall:server (future)  <-- followup; emitted by EntrypointJsRpcTarget
      jsRpcTargetCall:server   <-- followup; emitted by TransientJsRpcTarget

  caller (already emits):
    jsRpcCall:client           <-- emitted by tryGetJsRpcSessionClient
      jsRpcTargetCall:client   <-- followup; emitted by callImpl cap-holder branch

The server-side jsRpcSession represents the membrane lifetime: it opens
when the callee starts serving the session and closes when donePromise
resolves (= no caps remain across the membrane). Distinct from the
per-call jsRpcCall:client on the caller side, which only wraps a single
top-level method call. One jsRpcSession on the server can serve many
client-side calls when the user holds onto a returned RpcTarget stub.

Parented to ioctx.getCurrentUserTraceSpan(), which (post
USER_SPAN_CONTEXT_PROPAGATION) is the caller's jsRpcCall:client span.

Tests:
- tail-worker-test fixtures updated: each callee invocation now emits
  spanOpen/spanClose for jsRpcSession around its events.
- expectedWithPropagation tree adjusted: under USER_SPAN_CONTEXT_PROPAGATION,
  sibling RPC calls on the same callee worker now appear nested due to
  span ID collision in the test's tree-builder heuristic. The collision
  is a test-infrastructure artefact (numeric span IDs reset per
  invocation; both the caller's jsRpcCall:client and each callee's
  jsRpcSession land at span ID 1). Comment added in the fixture
  explaining this.
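
The kind of collision described in the last test note can be reproduced with a toy index in standard C++. This is not the test's actual tree builder, whose code is not shown here. `SpanOpen` and both index functions are hypothetical, and the second function is only one possible alternative keying, not something this commit does.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tail event: span IDs restart at 1 inside every invocation.
struct SpanOpen {
  std::string invocation;
  std::uint64_t spanId;
  std::string name;
};

// Naive index keyed by span ID alone: later spans overwrite earlier ones.
std::map<std::uint64_t, std::string> indexBySpanId(const std::vector<SpanOpen>& events) {
  std::map<std::uint64_t, std::string> index;
  for (const auto& e: events) index[e.spanId] = e.name;
  return index;
}

// Index keyed by (invocation, span ID): spans from different invocations
// remain distinct.
std::map<std::pair<std::string, std::uint64_t>, std::string> indexByInvocationAndSpanId(
    const std::vector<SpanOpen>& events) {
  std::map<std::pair<std::string, std::uint64_t>, std::string> index;
  for (const auto& e: events) index[{e.invocation, e.spanId}] = e.name;
  return index;
}

// Three spans that all land on span ID 1, as in the commit message.
std::vector<SpanOpen> sampleEvents() {
  return {
      {"caller", 1, "jsRpcCall:client"},
      {"callee-1", 1, "jsRpcSession"},
      {"callee-2", 1, "jsRpcSession"},
  };
}
```

With the sample events, the ID-only index keeps a single entry while the composite index keeps all three, which is why a heuristic keyed on bare IDs can make sibling calls look nested.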
…endRpc

Per Dan's hierarchy: jsRpcSession represents the membrane lifetime, both
client and server side. The server-side span landed last commit; this
adds the client-side counterpart in sendRpc.

sendRpc only fires for cross-process dispatch (workerd in-process
service bindings go straight from WorkerEntrypoint::customEvent to
event->run() without wire serialization). Span wraps the full coroutine
body, observing the wire round-trip from sending the jsRpcSessionRequest
through to the session completing (membrane drained).

Parented to the current user trace span (the caller's jsRpcCall) when
an IoContext is in scope; constructs as unobserved otherwise so capnp
dispatch contexts that lack an IoContext don't crash.

For in-process callers (the AIG case in workerd tests) this span is
absent, which is correct: there is no wire round-trip to observe. The
caller's jsRpcCall on the channel side and the callee's jsRpcSession in
run() both still exist; they meet directly without a transport layer
between them.
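
The parenting rule for the sendRpc span ("parent to the current user span if an IoContext is in scope, otherwise construct unobserved") reduces to a null check. A minimal sketch with hypothetical `Context`, `Span`, and `makeSessionSpan`, where a null pointer models "no IoContext in scope":

```cpp
#include <optional>
#include <string>

// Hypothetical context exposing the current user trace span by name.
struct Context {
  std::string currentSpan;
};

// Hypothetical span: an empty parent means the span is unobserved.
struct Span {
  std::optional<std::string> parent;
  bool isObserved() const { return parent.has_value(); }
};

// Parent to the current span when a context exists; otherwise build an
// unobserved span instead of dereferencing a missing context.
Span makeSessionSpan(const Context* context) {
  if (context == nullptr) return Span{std::nullopt};
  return Span{context->currentSpan};
}
```

Usage: `makeSessionSpan(&ctx)` with `ctx.currentSpan == "jsRpcCall"` gives a span parented to that call, while `makeSessionSpan(nullptr)` gives an unobserved span.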
Per Dan's literal mapping, callImpl owns the client-side per-call spans.
Previously jsRpcCall was constructed in Fetcher::tryGetJsRpcSessionClient
(which lives in worker-rpc.c++ but is conceptually a channel accessor),
not in callImpl itself.

Hoist span construction into callImpl. tryGetJsRpcSessionClient becomes
a pure WorkerInterface accessor: it takes SpanParent arguments callImpl
already constructed and plumbs them into SubrequestMetadata.
JsRpcSessionClient struct is gone; tryGet just returns
kj::Maybe<kj::Own<WorkerInterface>>.

callImpl unconditionally constructs the jsRpcCall user span and its
internal trace counterpart. For OutgoingFactory variants (DurableObject
stubs, cross-process actors) the channel layer ignores the span parents
and emits durable_object_subrequest as a sibling of jsRpcCall rather
than a child. Visible behaviour change for DO calls: the trace tree
gains an extra jsRpcCall span at the per-call observation layer. This
is correct under Dan's model (every method call gets its per-call
client span, regardless of channel variant) and has the side benefit
that enrichBindingSpan now applies to DO calls too -- previously the
sessionSpan field was unobserved for DO paths, making AIG-style
enrichment a silent no-op there.

Tests:
- tail-worker-test fixture for jsrpcDoSubrequest now expects the
  additional jsRpcCall spans the DO path emits. Spans visible in the
  trace are: outermost jsRpcCall (from the caller's invocation
  dispatching the DO method call), inner jsRpcCall (from the DO worker
  itself wrapping the dispatch), durable_object_subrequest (from the
  factory), nested jsRpcCall (server-side dispatch into the actor),
  plus extra jsRpcCalls for the subsequent service-binding calls
  (getCounter, nonFunctionProperty) made in the same invocation.