Skip to content

fix(app-router): extend OTel tracer provider for Cache Component span context#1868

Merged
james-elicx merged 3 commits into
mainfrom
fix/issue-1495-cache-component-otel-spans
Jun 9, 2026
Merged

fix(app-router): extend OTel tracer provider for Cache Component span context#1868
james-elicx merged 3 commits into
mainfrom
fix/issue-1495-cache-component-otel-spans

Conversation

@james-elicx

Copy link
Copy Markdown
Member

What failed

The test cache-components OTEL spans > should allow creating Spans during resuming a fallback - inside a Cache Component (in test/e2e/app-dir/cache-components-allow-otel-spans/) expects that t7 and t8 span IDs are:

  1. Non-zero (spans are actually created during fallback resume)
  2. Different on each reload (each request produces fresh span IDs, not cached ones)

With cacheComponents: true, a [slug]/fallback route prerenderers a static shell and resumes the dynamic content on every request. The test calls tracer.startActiveSpan('span-active-span', fn) during the resume render and checks that the returned span has a real, per-request span ID.

Root cause

During Cache Component renders (warmup → fallback-resume phases), the Next.js workUnitAsyncStorage carries a prerender or cache store. OpenTelemetry span creation (tracer.startSpan, tracer.startActiveSpan) runs inside this ALS context.

In Next.js, after instrumentation.register() returns, it calls extendInstrumentationAfterRegistration() from instrumentation-node-extensions.ts. This function monkey-patches the registered OTel tracer provider so that:

  • tracer.startSpan(...) calls workUnitAsyncStorage.exit(...) before creating the span — the span is created outside the work unit context, getting a fresh span ID.
  • tracer.startActiveSpan(name, fn) also exits the work unit context for span creation, but re-enters the store for the fn callback so that headers(), cookies(), etc. still work inside the span body.

Without this extension, the prerender/cache work unit store leaks into span creation, causing span IDs to be either reused across requests (frozen context from the prerender phase) or not generated at all (the OTel SDK may short-circuit under certain ALS states).

Vinext called register() in ensureInstrumentationRegistered() but did not call the equivalent extendTracerProviderForCacheComponents() afterward.

Fix

Added packages/vinext/src/server/otel-tracer-extension.ts, a direct port of Next.js's instrumentation-node-extensions.ts. It exports extendTracerProviderForCacheComponents() which:

  1. Wraps provider.getTracer(...) so every returned tracer has patched startSpan / startActiveSpan.
  2. startSpan uses workUnitAsyncStorage.exit(...) to create the span in a clean context.
  3. startActiveSpan uses workUnitAsyncStorage.exit(...) for span creation and wraps the user callback with workUnitAsyncStorage.run(workUnitStore, fn) to restore the work unit context inside the body.

ensureInstrumentationRegistered() in instrumentation-runtime.ts now calls extendTracerProviderForCacheComponents() immediately after register() resolves, matching Next.js's lifecycle.

The extension is a no-op when @opentelemetry/api is not installed or when no store is active.

Mapping to the test

The failing test navigates to /novel/fallback (a path NOT in generateStaticParams, so the fallback shell is used and content is resumed at request time). The CachedTracedComponentActiveSpan (t7) and TracedComponentActiveSpan (t8) components call tracer.startActiveSpan(...) during the resume render. With the fix, spans exit the work unit context and get fresh IDs on every request, satisfying both not.toEqual(0) and not.toEqual(t7value) assertions.

Verification

  • vp check: 22 errors, all pre-existing env TS errors (magic-string, image-size, etc.) — zero new errors.
  • vp test run tests/ppr-fallback-shell.test.ts tests/app-fallback-renderer.test.ts tests/request-pipeline.test.ts: all pass.
  • The E2E test cache-components-allow-otel-spans requires the full Playwright suite and is left to CI.

Fixes #1495

…gister() for Cache Component spans

During Cache Component fallback resume, `workUnitAsyncStorage` carries a
prerender/cache store. Without the tracer extension, calls to
`tracer.startActiveSpan()` / `tracer.startSpan()` inside user RSC code
inherited that frozen context, causing span IDs to be reused across requests
or not created at all.

Mirrors Next.js's `extendInstrumentationAfterRegistration()` in
`instrumentation-node-extensions.ts`: wraps the OTel tracer provider's
`startSpan` and `startActiveSpan` to exit `workUnitAsyncStorage` when
creating spans, then re-enters the store for the `startActiveSpan` callback.

Fixes #1495
@pkg-pr-new

pkg-pr-new Bot commented Jun 9, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/@vinext/cloudflare@1868
npm i https://pkg.pr.new/vinext@1868

commit: 0045710

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: extend OTel tracer provider for Cache Component span context

Reviewed the port against Next.js's instrumentation-node-extensions.ts (canary) and against existing vinext conventions. The overall approach — porting afterRegistration()extendTracerProviderForCacheComponents() and calling it after register() resolves — is sound and matches Next.js's lifecycle. A few issues, one of which is likely a runtime bug on the primary deployment target (Cloudflare Workers).

1. (Blocking) Bare require("@opentelemetry/api") will not work in the Workers bundle

otel-tracer-extension.ts:64 uses a bare require(...):

api = require("@opentelemetry/api");

This code runs at request time inside the bundled server (app-rsc-entryensureInstrumentationRegisteredextendTracerProviderForCacheComponents). For the Cloudflare Workers target the SSR build bundles everything (index.ts:1560 — "Workers bundle everything, can't have Node.js externals") and emits ESM, where a bare require identifier is not defined at runtime. The existing optional-OTel access pattern in this exact codebase deliberately avoids bare require for this reason — see server/client-trace-metadata.ts:55-61:

// Use require() at runtime so `@opentelemetry/api` is an optional peer.
const req = (globalThis as { require?: (id: string) => unknown }).require;
if (typeof req === "function") {
  api = req("@opentelemetry/api") as OpenTelemetryApi;
}

The other bare require() calls in the repo (utils/vite-version.ts, config/next-config.ts) run at build/config time in Node, not in the bundled runtime, so they aren't comparable. Please mirror the client-trace-metadata.ts approach (globalThis.require guarded by typeof req === "function") so the no-op degradation actually holds on Workers instead of throwing ReferenceError: require is not defined. As written, the catch {} only catches a module-not-found, not the missing-require case — and in dev (Node ESM) bare require is likewise not in scope inside an ESM module.

2. (Should fix) Missing use cache callback warning from the Next.js source

The upstream startActiveSpan patch warns when a use cache function is passed as the span callback, because it would receive a per-invocation random Span ID and cause cache misses (instrumentation-node-extensions.ts:90-96, via isUseCacheFunction). This port drops that guard silently. Since the whole PR is about Cache Component span-ID correctness, this is exactly the footgun the warning protects against. Either port the isUseCacheFunction check or add a code comment explaining the deliberate omission and link the upstream lines.

3. (Should fix) No unit test for the new module

AGENTS.md is explicit that behavior moved into server/* helpers should get focused unit tests, and that entry/runtime changes shouldn't rely on the full Playwright suite. extendTracerProviderForCacheComponents() is directly unit-testable by stubbing @opentelemetry/api (or globalThis.require) with a fake provider + tracer and asserting:

  • startSpan runs inside workUnitAsyncStorage.exit (store is undefined at creation),
  • startActiveSpan creates the span outside the store but re-enters it for the callback (getStore() inside the callback returns the original store),
  • arg-index detection for the (name, fn) / (name, opts, fn) / (name, opts, ctx, fn) overloads,
  • double-wrap protection via the WeakSet,
  • no-op when getTracer is absent / api missing.

tests/instrumentation.test.ts already has the vi.resetModules() + fake-runner scaffolding to model after. Note the tracerProviderExtended module-level flag means a test must vi.resetModules() between cases.

4. (Minor) tracerProviderExtended guard semantics differ from upstream

Next.js does not use a module-level "already extended" flag — it relies solely on the per-tracer WeakSet. Here tracerProviderExtended short-circuits the whole function after the first call. That's fine for a single provider, but if register() swaps the global tracer provider (some SDK setups call trace.setGlobalTracerProvider after async exporter init), a second extendTracerProviderForCacheComponents() call would no-op and leave the new provider's getTracer unwrapped. Since RSC and SSR are separate module graphs (separate copies of this module + the flag), this is unlikely to bite in practice, but consider keying off the provider instance (e.g. a WeakSet<provider>) rather than a global boolean to match upstream robustness. At minimum, confirm the ALS instance shared with prerender-work-unit-setup.ts is the same in the environment where spans are created (both import vinext/shims/internal/work-unit-async-storage, so this should hold — worth a comment).

5. (Nit) Comment references the wrong upstream file

instrumentation-runtime.ts:74-75 says it mirrors instrumentation-globals.external.ts, but the actual mirrored function (afterRegistration / extendTracerProviderForCacheComponents) lives in instrumentation-node-extensions.ts (as the new file's own header correctly states). Align the two references.

What's correct

  • Lifecycle placement (after await register(), before onRequestError wiring) matches Next.js.
  • startActiveSpan overload arg-index detection is a faithful port.
  • Re-entering the store via workUnitAsyncStorage.run(workUnitStore, fn, ...args) for the callback is correct and preserves headers()/cookies() inside the span body.
  • Using the shared vinext/shims/internal/work-unit-async-storage specifier (same instance prerender-work-unit-setup.ts uses) is the right call for ALS state sharing.

The #1 item should be fixed before merge — as written I'd expect this to throw on Workers (the primary target) the first time it's exercised under a real OTel setup, defeating the no-op-when-not-installed contract and potentially breaking instrumented apps that previously worked.

@ask-bonk

ask-bonk Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review posted on PR #1868. Summary of findings:

Blocking (1): Bare require("@opentelemetry/api") in otel-tracer-extension.ts:64 runs at request time in the bundled server. On Cloudflare Workers (the primary target), the SSR build bundles everything and emits ESM where bare require is undefined — it will throw ReferenceError instead of gracefully no-op'ing. The existing client-trace-metadata.ts:55-61 deliberately uses globalThis.require for this same optional dependency; the new file should follow that pattern.

Should fix (3):

  • Dropped the upstream isUseCacheFunction warning that guards against passing a use cache fn to startActiveSpan (the exact cache-miss footgun this PR targets).
  • No unit test for the new helper, despite AGENTS.md requiring focused unit tests for server/* behavior and the scaffolding already existing in tests/instrumentation.test.ts.
  • tracerProviderExtended global boolean diverges from upstream's per-tracer WeakSet-only approach; a swapped global provider would be left unwrapped.

Nit (1): instrumentation-runtime.ts:74-75 cites the wrong upstream filename.

I also confirmed what the port gets right: lifecycle placement, overload arg-index detection, store re-entry for the callback, and the shared ALS module specifier.

The blocking item is the key one — I'd expect it to break on Workers the first time it's exercised under a real OTel setup.

github run

BLOCKING: replace bare require("@opentelemetry/api") with globalThis.require
guard so the call is safe in ESM Worker bundles where bare require is
undefined; matches the pattern in client-trace-metadata.ts.

SHOULD-FIX #1: restore the isUseCacheFunction warning when a "use cache"
function is passed to startActiveSpan — mirrors upstream
instrumentation-node-extensions.ts. Adds USE_CACHE_FUNCTION_SYMBOL tag in
registerCachedFunction and the inline isUseCacheFn check in the extension
(avoids importing the full cache-runtime dep graph from this lightweight file).

SHOULD-FIX #2: add focused unit tests in tests/otel-tracer-extension.test.ts
covering no-op paths, startSpan ALS exit, startActiveSpan ALS re-entry,
use-cache warning, provider/tracer double-wrap guards, and provider-swap
re-wrapping. Isolated from the full vinext plugin import graph so they run
without optional build-time packages.

SHOULD-FIX #3: replace module-global boolean tracerProviderExtended with a
WeakSet<object> (extendedProviders) so that a swapped provider gets wrapped
rather than being silently skipped forever.

NIT: correct the upstream filename in the instrumentation-runtime.ts comment
from instrumentation-globals.external.ts to instrumentation-node-extensions.ts
(where afterRegistration() is actually defined).
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor
Example Preview Production Original
app-router-cloudflare preview production
pages-router-cloudflare preview production
app-router-playground preview production original
realworld-api-rest preview production
nextra-docs-template preview production
benchmarks preview production
hackernews preview production original
workers-cache preview production
web preview production

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: extend OTel tracer provider for Cache Component span context

Re-reviewed the second commit (5522733) against the prior round and against Next.js canary instrumentation-node-extensions.ts. The earlier feedback has been addressed well:

  • (was blocking) bare require → now uses the globalThis.require guard, mirroring client-trace-metadata.ts:52-71. ✅
  • use cache callback warning → ported (otel-tracer-extension.ts:153-157), with a passing test. ✅
  • unit testtests/otel-tracer-extension.test.ts added (11 tests, all green locally via vp test run). ✅
  • provider-extended global flag → replaced with a per-provider WeakSet (extendedProviders), and there's an explicit "provider swap" test (tests/otel-tracer-extension.test.ts:247). ✅
  • wrong upstream filename in comment → fixed (instrumentation-runtime.ts:74-75 now points at instrumentation-node-extensions.ts). ✅

Verified locally: vp check on the four changed files is clean (no new format/lint/type errors), tests/otel-tracer-extension.test.ts (11) and tests/instrumentation.test.ts (33) pass. ALS sharing holds — both otel-tracer-extension.ts:32 and prerender-work-unit-setup.ts:13 import the identical bare specifier vinext/shims/internal/work-unit-async-storage, so they resolve to the same instance within an environment.

One real issue and a couple of minor parity notes remain.

1. (Should fix) isUseCacheFunction in cache-runtime.ts is dead code; the use-cache check is duplicated

This PR adds an exported isUseCacheFunction() (cache-runtime.ts:648-651) and the tagging symbol (cache-runtime.ts:639), but nothing imports or calls itotel-tracer-extension.ts re-declares its own USE_CACHE_FUNCTION_SYMBOL and isUseCacheFn() inline (otel-tracer-extension.ts:45-50) instead of using the exported helper.

$ grep -rn "isUseCacheFunction" packages/vinext/src
packages/vinext/src/shims/cache-runtime.ts:648:export function isUseCacheFunction(fn: unknown): boolean {   # only definition, never imported

Net result: two copies of the same symbol-based logic that can silently drift, plus an unused public export. Pick one:

  • Import isUseCacheFunction from cache-runtime.ts in the extension (single source of truth), or
  • Drop the export from cache-runtime.ts and keep only the inline copy.

The "avoid importing heavy deps" comment (otel-tracer-extension.ts:42-44) justifies not importing — but then the new export in cache-runtime.ts serves no purpose and should be removed rather than left as dead, untested API. Either way, only one of the two copies should survive. Since the symbols are both Symbol.for("vinext.useCacheFunction") (global registry) they're interchangeable, so the tagging at cache-runtime.ts:633 is what matters and the inline check works; this is purely a maintainability/dead-code cleanup.

2. (Nit) No Edge-runtime guard, unlike upstream

Upstream afterRegistration() throws an InvariantError when NEXT_RUNTIME === 'edge' (this module is Node-only). The port's docstring says "Must only be called in Node.js environments (not Edge runtime)" (otel-tracer-extension.ts:61) but doesn't enforce it. In practice this is benign on Workers because the missing globalThis.require already short-circuits it to a no-op, so I'd treat this as optional — worth a one-line comment noting the no-op is what enforces Node-only here, rather than an explicit throw.

3. (Nit, pre-existing) globalThis.require availability

The globalThis.require pattern relies on require being present on the global. In plain Node ESM globalThis.require is undefined (confirmed: node --input-type=module -e "typeof globalThis.require"undefined), and on Workers it's also absent. This is not introduced by this PR — it's the established convention copied verbatim from client-trace-metadata.ts — so it's not blocking here. Flagging only so it's on the radar: if client-trace-metadata's OTel path is known to work in the dev SSR runtime, this inherits the same behavior; if it silently no-ops in dev, the E2E fix would too. The E2E run in CI is the real signal.

What's correct

  • Lifecycle placement: extendTracerProviderForCacheComponents() runs right after await register() and before onRequestError wiring (instrumentation-runtime.ts:64-86), matching Next.js.
  • Runs unconditionally (not gated on cacheComponents), matching the upstream "enable universally, no-op when disabled" comment.
  • startActiveSpan overload arg-index detection (otel-tracer-extension.ts:134-144) is a faithful port.
  • Re-entering the store via workUnitAsyncStorage.run(workUnitStore, originalFn, ...cbArgs) for the callback (otel-tracer-extension.ts:161-162) preserves headers()/cookies() inside the span body; return value propagates correctly through the exit() wrapper.
  • WeakSet dedup for both providers and individual tracers, with tests covering double-wrap, provider-swap, and per-tracer cases.

Only item #1 is worth changing before merge, and it's a cleanup rather than a correctness bug. The functional behavior matches upstream.

@ask-bonk

ask-bonk Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review posted on PR #1868.

Summary

The second commit (5522733) addressed all five items from the prior review round, which I verified:

  • (was blocking) bare require → now uses the globalThis.require guard matching client-trace-metadata.ts
  • use cache warning → ported with a passing test
  • unit test → added (11 tests, all green)
  • per-provider WeakSet → replaced the global boolean, with a provider-swap test
  • wrong upstream filename → fixed

I confirmed locally: vp check clean on all 4 changed files; otel-tracer-extension.test.ts (11) and instrumentation.test.ts (33) pass; ALS instance is correctly shared between the extension and prerender setup (identical bare specifier); lifecycle placement and the overload/store-reentry logic faithfully match Next.js canary.

Findings this round:

  1. (Should fix) isUseCacheFunction exported from cache-runtime.ts is dead code — nothing imports it; otel-tracer-extension.ts re-declares its own inline copy. Pick one source of truth or drop the unused export. Purely a maintainability/dead-code cleanup, not a correctness bug.
  2. (Nit) No Edge-runtime guard unlike upstream (benign — the missing globalThis.require already no-ops it on Workers).
  3. (Nit, pre-existing) globalThis.require is undefined in plain Node ESM; inherited from the existing client-trace-metadata.ts convention, not introduced here.

No blocking issues remain. Item #1 is the only thing worth changing before merge.

github run

Nothing imported `isUseCacheFunction` — `otel-tracer-extension.ts`
already keeps its own inline copy (`isUseCacheFn`) with a comment
explaining why (avoids pulling heavy cache-runtime deps into the
lightweight server helper).  Remove the unused export to eliminate the
dead code flagged in review.
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (commit 0045710): extend OTel tracer provider for Cache Component span context

The third commit (0045710, "drop dead isUseCacheFunction export") resolves the only outstanding item from the prior round. I re-verified the full PR against Next.js canary instrumentation-node-extensions.ts and the local code.

Prior findings — all resolved

  • (was should-fix #1) dead isUseCacheFunction export → removed. The export and its standalone definition are gone from cache-runtime.ts; the only thing the shim now does is tag the wrapper with the shared global symbol (cache-runtime.ts:633, Symbol.for("vinext.useCacheFunction")), and otel-tracer-extension.ts:45-50 reads that same global-registry symbol inline. Confirmed there is exactly one source of truth now:
    $ git grep -n "vinext.useCacheFunction\|isUseCacheFunction" packages/vinext/src tests/
    otel-tracer-extension.ts:45  const USE_CACHE_FUNCTION_SYMBOL = Symbol.for("vinext.useCacheFunction")
    otel-tracer-extension.ts:49  ...[USE_CACHE_FUNCTION_SYMBOL] === true
    cache-runtime.ts:633         (cachedFn as any)[USE_CACHE_FUNCTION_SYMBOL] = true
    cache-runtime.ts:639         const USE_CACHE_FUNCTION_SYMBOL = Symbol.for("vinext.useCacheFunction")
    
    Because both sides use Symbol.for(...) (global registry), they resolve to the identical symbol across module graphs, so the inline check correctly detects functions tagged by the shim. No dead code, no drift risk.

Verification

  • vp check on all 4 changed files: clean (no new format/lint/type errors).
  • vp test run tests/otel-tracer-extension.test.ts tests/instrumentation.test.ts: 44 passed (11 + 33).
  • ALS sharing holds: otel-tracer-extension.ts:32 and prerender-work-unit-setup.ts import the identical bare specifier vinext/shims/internal/work-unit-async-storage, so they share one instance per environment.
  • Port faithfulness vs. upstream instrumentation-node-extensions.ts (canary): lifecycle placement (after await register(), before onRequestError wiring), unconditional enablement (no-op when disabled), startSpan unconditional exit(), startActiveSpan store-check + overload arg-index detection (length===2/3/>3) + store re-entry for the callback + exit() for span creation — all match.
  • The extendedProviders WeakSet (a vinext addition over upstream's tracer-only WeakTracers) correctly handles provider swaps; covered by a dedicated test (otel-tracer-extension.test.ts:247).

Minor notes (non-blocking, unchanged from last round)

  1. (Nit) No explicit Edge-runtime InvariantError throw like upstream afterRegistration(). Benign on Workers — the missing globalThis.require already short-circuits to a no-op. The docstring at otel-tracer-extension.ts:61 documents the Node-only expectation.
  2. (Nit, pre-existing) globalThis.require is undefined in plain Node ESM and absent on Workers; inherited from the established client-trace-metadata.ts convention, not introduced here. The CI E2E run (cache-components-allow-otel-spans) is the real signal for the dev/SSR runtime path.

No blocking issues. The PR is in good shape to merge once CI (including the targeted E2E) is green.

@ask-bonk

ask-bonk Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review posted on PR #1868.

Summary

The third commit (0045710) resolves the only outstanding item from the prior review round — the dead isUseCacheFunction export.

Prior findings — all resolved:

  • The dead isUseCacheFunction export and its standalone definition are removed from cache-runtime.ts. The shim now only tags the use cache wrapper with the shared global symbol (Symbol.for("vinext.useCacheFunction")), and otel-tracer-extension.ts reads that identical global-registry symbol inline. Single source of truth, no drift risk.

Verification:

  • vp check clean on all 4 changed files.
  • 44 tests pass (otel-tracer-extension.test.ts 11 + instrumentation.test.ts 33).
  • Port faithfully matches Next.js canary instrumentation-node-extensions.ts: lifecycle placement, unconditional enablement, startSpan/startActiveSpan overload detection, store exit/re-entry semantics.
  • ALS sharing holds via the identical bare specifier.
  • The vinext-added extendedProviders WeakSet (provider-swap robustness over upstream) is correct and tested.

Findings this round: Only two non-blocking nits, both unchanged from the prior round (no Edge-runtime throw — benign no-op on Workers; pre-existing globalThis.require convention inherited from client-trace-metadata.ts).

No blocking issues remain. The PR is ready to merge once CI — including the targeted E2E cache-components-allow-otel-spans — is green.

github run

@james-elicx james-elicx marked this pull request as ready for review June 9, 2026 14:29
@james-elicx james-elicx merged commit fd919d4 into main Jun 9, 2026
51 checks passed
@james-elicx james-elicx deleted the fix/issue-1495-cache-component-otel-spans branch June 9, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

App Router: OTel spans during cache component fallback resume reuse the trace id

1 participant