fix(app-router): extend OTel tracer provider for Cache Component span context#1868
Conversation
…gister() for Cache Component spans During Cache Component fallback resume, `workUnitAsyncStorage` carries a prerender/cache store. Without the tracer extension, calls to `tracer.startActiveSpan()` / `tracer.startSpan()` inside user RSC code inherited that frozen context, causing span IDs to be reused across requests or not created at all. Mirrors Next.js's `extendInstrumentationAfterRegistration()` in `instrumentation-node-extensions.ts`: wraps the OTel tracer provider's `startSpan` and `startActiveSpan` to exit `workUnitAsyncStorage` when creating spans, then re-enters the store for the `startActiveSpan` callback. Fixes #1495
commit: |
|
/bigbonk review for issues |
There was a problem hiding this comment.
Review: extend OTel tracer provider for Cache Component span context
Reviewed the port against Next.js's instrumentation-node-extensions.ts (canary) and against existing vinext conventions. The overall approach — porting afterRegistration() → extendTracerProviderForCacheComponents() and calling it after register() resolves — is sound and matches Next.js's lifecycle. A few issues, one of which is likely a runtime bug on the primary deployment target (Cloudflare Workers).
1. (Blocking) Bare require("@opentelemetry/api") will not work in the Workers bundle
otel-tracer-extension.ts:64 uses a bare require(...):
api = require("@opentelemetry/api");This code runs at request time inside the bundled server (app-rsc-entry → ensureInstrumentationRegistered → extendTracerProviderForCacheComponents). For the Cloudflare Workers target the SSR build bundles everything (index.ts:1560 — "Workers bundle everything, can't have Node.js externals") and emits ESM, where a bare require identifier is not defined at runtime. The existing optional-OTel access pattern in this exact codebase deliberately avoids bare require for this reason — see server/client-trace-metadata.ts:55-61:
// Use require() at runtime so `@opentelemetry/api` is an optional peer.
const req = (globalThis as { require?: (id: string) => unknown }).require;
if (typeof req === "function") {
api = req("@opentelemetry/api") as OpenTelemetryApi;
}The other bare require() calls in the repo (utils/vite-version.ts, config/next-config.ts) run at build/config time in Node, not in the bundled runtime, so they aren't comparable. Please mirror the client-trace-metadata.ts approach (globalThis.require guarded by typeof req === "function") so the no-op degradation actually holds on Workers instead of throwing ReferenceError: require is not defined. As written, the catch {} only catches a module-not-found, not the missing-require case — and in dev (Node ESM) bare require is likewise not in scope inside an ESM module.
2. (Should fix) Missing use cache callback warning from the Next.js source
The upstream startActiveSpan patch warns when a use cache function is passed as the span callback, because it would receive a per-invocation random Span ID and cause cache misses (instrumentation-node-extensions.ts:90-96, via isUseCacheFunction). This port drops that guard silently. Since the whole PR is about Cache Component span-ID correctness, this is exactly the footgun the warning protects against. Either port the isUseCacheFunction check or add a code comment explaining the deliberate omission and link the upstream lines.
3. (Should fix) No unit test for the new module
AGENTS.md is explicit that behavior moved into server/* helpers should get focused unit tests, and that entry/runtime changes shouldn't rely on the full Playwright suite. extendTracerProviderForCacheComponents() is directly unit-testable by stubbing @opentelemetry/api (or globalThis.require) with a fake provider + tracer and asserting:
startSpanruns insideworkUnitAsyncStorage.exit(store isundefinedat creation),startActiveSpancreates the span outside the store but re-enters it for the callback (getStore()inside the callback returns the original store),- arg-index detection for the
(name, fn)/(name, opts, fn)/(name, opts, ctx, fn)overloads, - double-wrap protection via the
WeakSet, - no-op when
getTraceris absent / api missing.
tests/instrumentation.test.ts already has the vi.resetModules() + fake-runner scaffolding to model after. Note the tracerProviderExtended module-level flag means a test must vi.resetModules() between cases.
4. (Minor) tracerProviderExtended guard semantics differ from upstream
Next.js does not use a module-level "already extended" flag — it relies solely on the per-tracer WeakSet. Here tracerProviderExtended short-circuits the whole function after the first call. That's fine for a single provider, but if register() swaps the global tracer provider (some SDK setups call trace.setGlobalTracerProvider after async exporter init), a second extendTracerProviderForCacheComponents() call would no-op and leave the new provider's getTracer unwrapped. Since RSC and SSR are separate module graphs (separate copies of this module + the flag), this is unlikely to bite in practice, but consider keying off the provider instance (e.g. a WeakSet<provider>) rather than a global boolean to match upstream robustness. At minimum, confirm the ALS instance shared with prerender-work-unit-setup.ts is the same in the environment where spans are created (both import vinext/shims/internal/work-unit-async-storage, so this should hold — worth a comment).
5. (Nit) Comment references the wrong upstream file
instrumentation-runtime.ts:74-75 says it mirrors instrumentation-globals.external.ts, but the actual mirrored function (afterRegistration / extendTracerProviderForCacheComponents) lives in instrumentation-node-extensions.ts (as the new file's own header correctly states). Align the two references.
What's correct
- Lifecycle placement (after
await register(), beforeonRequestErrorwiring) matches Next.js. startActiveSpanoverload arg-index detection is a faithful port.- Re-entering the store via
workUnitAsyncStorage.run(workUnitStore, fn, ...args)for the callback is correct and preservesheaders()/cookies()inside the span body. - Using the shared
vinext/shims/internal/work-unit-async-storagespecifier (same instanceprerender-work-unit-setup.tsuses) is the right call for ALS state sharing.
The #1 item should be fixed before merge — as written I'd expect this to throw on Workers (the primary target) the first time it's exercised under a real OTel setup, defeating the no-op-when-not-installed contract and potentially breaking instrumented apps that previously worked.
|
Review posted on PR #1868. Summary of findings: Blocking (1): Bare Should fix (3):
Nit (1): I also confirmed what the port gets right: lifecycle placement, overload arg-index detection, store re-entry for the callback, and the shared ALS module specifier. The blocking item is the key one — I'd expect it to break on Workers the first time it's exercised under a real OTel setup. |
BLOCKING: replace bare require("@opentelemetry/api") with globalThis.require
guard so the call is safe in ESM Worker bundles where bare require is
undefined; matches the pattern in client-trace-metadata.ts.
SHOULD-FIX #1: restore the isUseCacheFunction warning when a "use cache"
function is passed to startActiveSpan — mirrors upstream
instrumentation-node-extensions.ts. Adds USE_CACHE_FUNCTION_SYMBOL tag in
registerCachedFunction and the inline isUseCacheFn check in the extension
(avoids importing the full cache-runtime dep graph from this lightweight file).
SHOULD-FIX #2: add focused unit tests in tests/otel-tracer-extension.test.ts
covering no-op paths, startSpan ALS exit, startActiveSpan ALS re-entry,
use-cache warning, provider/tracer double-wrap guards, and provider-swap
re-wrapping. Isolated from the full vinext plugin import graph so they run
without optional build-time packages.
SHOULD-FIX #3: replace module-global boolean tracerProviderExtended with a
WeakSet<object> (extendedProviders) so that a swapped provider gets wrapped
rather than being silently skipped forever.
NIT: correct the upstream filename in the instrumentation-runtime.ts comment
from instrumentation-globals.external.ts to instrumentation-node-extensions.ts
(where afterRegistration() is actually defined).
|
/bigbonk review for issues |
|
There was a problem hiding this comment.
Re-review: extend OTel tracer provider for Cache Component span context
Re-reviewed the second commit (5522733) against the prior round and against Next.js canary instrumentation-node-extensions.ts. The earlier feedback has been addressed well:
- (was blocking) bare
require→ now uses theglobalThis.requireguard, mirroringclient-trace-metadata.ts:52-71. ✅ use cachecallback warning → ported (otel-tracer-extension.ts:153-157), with a passing test. ✅- unit test →
tests/otel-tracer-extension.test.tsadded (11 tests, all green locally viavp test run). ✅ - provider-extended global flag → replaced with a per-provider
WeakSet(extendedProviders), and there's an explicit "provider swap" test (tests/otel-tracer-extension.test.ts:247). ✅ - wrong upstream filename in comment → fixed (
instrumentation-runtime.ts:74-75now points atinstrumentation-node-extensions.ts). ✅
Verified locally: vp check on the four changed files is clean (no new format/lint/type errors), tests/otel-tracer-extension.test.ts (11) and tests/instrumentation.test.ts (33) pass. ALS sharing holds — both otel-tracer-extension.ts:32 and prerender-work-unit-setup.ts:13 import the identical bare specifier vinext/shims/internal/work-unit-async-storage, so they resolve to the same instance within an environment.
One real issue and a couple of minor parity notes remain.
1. (Should fix) isUseCacheFunction in cache-runtime.ts is dead code; the use-cache check is duplicated
This PR adds an exported isUseCacheFunction() (cache-runtime.ts:648-651) and the tagging symbol (cache-runtime.ts:639), but nothing imports or calls it — otel-tracer-extension.ts re-declares its own USE_CACHE_FUNCTION_SYMBOL and isUseCacheFn() inline (otel-tracer-extension.ts:45-50) instead of using the exported helper.
$ grep -rn "isUseCacheFunction" packages/vinext/src
packages/vinext/src/shims/cache-runtime.ts:648:export function isUseCacheFunction(fn: unknown): boolean { # only definition, never imported
Net result: two copies of the same symbol-based logic that can silently drift, plus an unused public export. Pick one:
- Import
isUseCacheFunctionfromcache-runtime.tsin the extension (single source of truth), or - Drop the export from
cache-runtime.tsand keep only the inline copy.
The "avoid importing heavy deps" comment (otel-tracer-extension.ts:42-44) justifies not importing — but then the new export in cache-runtime.ts serves no purpose and should be removed rather than left as dead, untested API. Either way, only one of the two copies should survive. Since the symbols are both Symbol.for("vinext.useCacheFunction") (global registry) they're interchangeable, so the tagging at cache-runtime.ts:633 is what matters and the inline check works; this is purely a maintainability/dead-code cleanup.
2. (Nit) No Edge-runtime guard, unlike upstream
Upstream afterRegistration() throws an InvariantError when NEXT_RUNTIME === 'edge' (this module is Node-only). The port's docstring says "Must only be called in Node.js environments (not Edge runtime)" (otel-tracer-extension.ts:61) but doesn't enforce it. In practice this is benign on Workers because the missing globalThis.require already short-circuits it to a no-op, so I'd treat this as optional — worth a one-line comment noting the no-op is what enforces Node-only here, rather than an explicit throw.
3. (Nit, pre-existing) globalThis.require availability
The globalThis.require pattern relies on require being present on the global. In plain Node ESM globalThis.require is undefined (confirmed: node --input-type=module -e "typeof globalThis.require" → undefined), and on Workers it's also absent. This is not introduced by this PR — it's the established convention copied verbatim from client-trace-metadata.ts — so it's not blocking here. Flagging only so it's on the radar: if client-trace-metadata's OTel path is known to work in the dev SSR runtime, this inherits the same behavior; if it silently no-ops in dev, the E2E fix would too. The E2E run in CI is the real signal.
What's correct
- Lifecycle placement:
extendTracerProviderForCacheComponents()runs right afterawait register()and beforeonRequestErrorwiring (instrumentation-runtime.ts:64-86), matching Next.js. - Runs unconditionally (not gated on
cacheComponents), matching the upstream "enable universally, no-op when disabled" comment. startActiveSpanoverload arg-index detection (otel-tracer-extension.ts:134-144) is a faithful port.- Re-entering the store via
workUnitAsyncStorage.run(workUnitStore, originalFn, ...cbArgs)for the callback (otel-tracer-extension.ts:161-162) preservesheaders()/cookies()inside the span body; return value propagates correctly through theexit()wrapper. WeakSetdedup for both providers and individual tracers, with tests covering double-wrap, provider-swap, and per-tracer cases.
Only item #1 is worth changing before merge, and it's a cleanup rather than a correctness bug. The functional behavior matches upstream.
|
Review posted on PR #1868. SummaryThe second commit (
I confirmed locally: Findings this round:
No blocking issues remain. Item #1 is the only thing worth changing before merge. |
Nothing imported `isUseCacheFunction` — `otel-tracer-extension.ts` already keeps its own inline copy (`isUseCacheFn`) with a comment explaining why (avoids pulling heavy cache-runtime deps into the lightweight server helper). Remove the unused export to eliminate the dead code flagged in review.
|
/bigbonk review for issues |
There was a problem hiding this comment.
Re-review (commit 0045710): extend OTel tracer provider for Cache Component span context
The third commit (0045710, "drop dead isUseCacheFunction export") resolves the only outstanding item from the prior round. I re-verified the full PR against Next.js canary instrumentation-node-extensions.ts and the local code.
Prior findings — all resolved
- (was should-fix #1) dead
isUseCacheFunctionexport → removed. The export and its standalone definition are gone fromcache-runtime.ts; the only thing the shim now does is tag the wrapper with the shared global symbol (cache-runtime.ts:633,Symbol.for("vinext.useCacheFunction")), andotel-tracer-extension.ts:45-50reads that same global-registry symbol inline. Confirmed there is exactly one source of truth now:Because both sides use$ git grep -n "vinext.useCacheFunction\|isUseCacheFunction" packages/vinext/src tests/ otel-tracer-extension.ts:45 const USE_CACHE_FUNCTION_SYMBOL = Symbol.for("vinext.useCacheFunction") otel-tracer-extension.ts:49 ...[USE_CACHE_FUNCTION_SYMBOL] === true cache-runtime.ts:633 (cachedFn as any)[USE_CACHE_FUNCTION_SYMBOL] = true cache-runtime.ts:639 const USE_CACHE_FUNCTION_SYMBOL = Symbol.for("vinext.useCacheFunction")Symbol.for(...)(global registry), they resolve to the identical symbol across module graphs, so the inline check correctly detects functions tagged by the shim. No dead code, no drift risk.
Verification
vp checkon all 4 changed files: clean (no new format/lint/type errors).vp test run tests/otel-tracer-extension.test.ts tests/instrumentation.test.ts: 44 passed (11 + 33).- ALS sharing holds:
otel-tracer-extension.ts:32andprerender-work-unit-setup.tsimport the identical bare specifiervinext/shims/internal/work-unit-async-storage, so they share one instance per environment. - Port faithfulness vs. upstream
instrumentation-node-extensions.ts(canary): lifecycle placement (afterawait register(), beforeonRequestErrorwiring), unconditional enablement (no-op when disabled),startSpanunconditionalexit(),startActiveSpanstore-check + overload arg-index detection (length===2/3/>3) + store re-entry for the callback +exit()for span creation — all match. - The
extendedProvidersWeakSet (a vinext addition over upstream's tracer-onlyWeakTracers) correctly handles provider swaps; covered by a dedicated test (otel-tracer-extension.test.ts:247).
Minor notes (non-blocking, unchanged from last round)
- (Nit) No explicit Edge-runtime
InvariantErrorthrow like upstreamafterRegistration(). Benign on Workers — the missingglobalThis.requirealready short-circuits to a no-op. The docstring atotel-tracer-extension.ts:61documents the Node-only expectation. - (Nit, pre-existing)
globalThis.requireisundefinedin plain Node ESM and absent on Workers; inherited from the establishedclient-trace-metadata.tsconvention, not introduced here. The CI E2E run (cache-components-allow-otel-spans) is the real signal for the dev/SSR runtime path.
No blocking issues. The PR is in good shape to merge once CI (including the targeted E2E) is green.
|
Review posted on PR #1868. SummaryThe third commit ( Prior findings — all resolved:
Verification:
Findings this round: Only two non-blocking nits, both unchanged from the prior round (no Edge-runtime throw — benign no-op on Workers; pre-existing No blocking issues remain. The PR is ready to merge once CI — including the targeted E2E |
What failed
The test
cache-components OTEL spans > should allow creating Spans during resuming a fallback - inside a Cache Component(intest/e2e/app-dir/cache-components-allow-otel-spans/) expects thatt7andt8span IDs are:With
cacheComponents: true, a[slug]/fallbackroute prerenderers a static shell and resumes the dynamic content on every request. The test callstracer.startActiveSpan('span-active-span', fn)during the resume render and checks that the returned span has a real, per-request span ID.Root cause
During Cache Component renders (warmup → fallback-resume phases), the Next.js
workUnitAsyncStoragecarries a prerender or cache store. OpenTelemetry span creation (tracer.startSpan,tracer.startActiveSpan) runs inside this ALS context.In Next.js, after
instrumentation.register()returns, it callsextendInstrumentationAfterRegistration()frominstrumentation-node-extensions.ts. This function monkey-patches the registered OTel tracer provider so that:tracer.startSpan(...)callsworkUnitAsyncStorage.exit(...)before creating the span — the span is created outside the work unit context, getting a fresh span ID.tracer.startActiveSpan(name, fn)also exits the work unit context for span creation, but re-enters the store for thefncallback so thatheaders(),cookies(), etc. still work inside the span body.Without this extension, the prerender/cache work unit store leaks into span creation, causing span IDs to be either reused across requests (frozen context from the prerender phase) or not generated at all (the OTel SDK may short-circuit under certain ALS states).
Vinext called
register()inensureInstrumentationRegistered()but did not call the equivalentextendTracerProviderForCacheComponents()afterward.Fix
Added
packages/vinext/src/server/otel-tracer-extension.ts, a direct port of Next.js'sinstrumentation-node-extensions.ts. It exportsextendTracerProviderForCacheComponents()which:provider.getTracer(...)so every returned tracer has patchedstartSpan/startActiveSpan.startSpanusesworkUnitAsyncStorage.exit(...)to create the span in a clean context.startActiveSpanusesworkUnitAsyncStorage.exit(...)for span creation and wraps the user callback withworkUnitAsyncStorage.run(workUnitStore, fn)to restore the work unit context inside the body.ensureInstrumentationRegistered()ininstrumentation-runtime.tsnow callsextendTracerProviderForCacheComponents()immediately afterregister()resolves, matching Next.js's lifecycle.The extension is a no-op when
@opentelemetry/apiis not installed or when no store is active.Mapping to the test
The failing test navigates to
/novel/fallback(a path NOT ingenerateStaticParams, so the fallback shell is used and content is resumed at request time). TheCachedTracedComponentActiveSpan(t7) andTracedComponentActiveSpan(t8) components calltracer.startActiveSpan(...)during the resume render. With the fix, spans exit the work unit context and get fresh IDs on every request, satisfying bothnot.toEqual(0)andnot.toEqual(t7value)assertions.Verification
vp check: 22 errors, all pre-existing env TS errors (magic-string, image-size, etc.) — zero new errors.vp test run tests/ppr-fallback-shell.test.ts tests/app-fallback-renderer.test.ts tests/request-pipeline.test.ts: all pass.cache-components-allow-otel-spansrequires the full Playwright suite and is left to CI.Fixes #1495