Enable publishing process context by default #11288
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into
**What does this PR do?**

This PR changes the default of `PROFILING_PROCESS_CONTEXT_ENABLED` to `true`, thus enabling publishing of the [OpenTelemetry process context](open-telemetry/opentelemetry-specification#4719) by default.

**Motivation:**

When we added this feature, the specification linked above was still a work in progress, which is why we chose not to enable it by default. As of version 1.62.0 of dd-trace-java we already ship with the latest version of the spec (the one that got accepted by OpenTelemetry).

Publishing this information will enable the Datadog Full Host Profiler (which itself is based on the OTel eBPF Profiler) to provide a better experience when profiling apps using dd-trace-java. (And there are plans for more features built atop it -- e.g. in the Datadog agent.)

This feature is already turned on by default in libdatadog (thus dd-trace-rb, dd-trace-py, dd-trace-dotnet) + dd-trace-rs + dd-trace-go, and we're working to make sure every lib has it enabled.

**Additional Notes:**

Note that the process context is only supported on Linux, so you won't see anything on macOS/Windows (outside of docker). The code already turns those into a no-op, so there's no problem there; just calling it out for folks interested in testing.

**How to test the change?**

Other than the unit tests, the `otel_process_ctx_dump.sh` script from [this repo](https://github.com/open-telemetry/sig-profiling/tree/main/process-context/c-and-cpp) can be used to see the feature working.
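To make the "on by default, no-op outside Linux" behavior concrete, here is a minimal sketch of such a gate. It is not the dd-trace-java implementation: the class `ProcessContextGate` and its method are hypothetical, and it assumes the flag is read from the `dd.profiling.experimental.process_context.enabled` system property (the property used in the test command in this thread), with an unset value now meaning `true`.

```java
import java.util.Locale;

// Hypothetical illustration only; not code from dd-trace-java.
public final class ProcessContextGate {

    // Returns true when the process context should be published:
    // the flag is unset (new default: enabled) or explicitly "true",
    // and the JVM is running on Linux. Everything else is a no-op.
    static boolean shouldPublish(String osName, String flagValue) {
        boolean enabled = flagValue == null || Boolean.parseBoolean(flagValue);
        boolean linux = osName != null
            && osName.toLowerCase(Locale.ROOT).contains("linux");
        return enabled && linux;
    }

    public static void main(String[] args) {
        // Assumed property name, taken from the test command in this thread.
        String flag = System.getProperty("dd.profiling.experimental.process_context.enabled");
        String os = System.getProperty("os.name");
        System.out.println("publish process context: " + shouldPublish(os, flag));
    }
}
```

With this shape, users who want the previous behavior would set the flag to `false` explicitly, and macOS or Windows hosts skip publishing without any error.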
Pushed a rebase on top of master to also fix the missing signature (I had pushed from my workspace, where commit signing isn't propagating correctly -- I still need to fix that).
/merge
**What does this PR do?**

In PR #347 we changed the per-thread context kept by the profiler to follow the OpenTelemetry format from open-telemetry/opentelemetry-specification#4947.

But a crucial part was missing -- calling `ContextApi::registerAttributeKeys`. Because this was not wired up, we did not announce the availability of the thread context, and thus external readers could not pick it up.

This PR fixes this gap by automatically calling `registerAttributeKeys` during profiler initialization.

**Motivation:**

Make external readers (such as the OTel eBPF Profiler) able to read the per-thread context information.

**Additional Notes:**

Right now the call to `registerAttributeKeys` relies on a previous call to `OTelContext::setProcessContext`. I believe that should be fine as:

1. dd-trace-java does call that before starting the profiler
2. It calls it by default as of DataDog/dd-trace-java#11288

A separate note is that now that `ContextApi::registerAttributeKeys` is implicitly/automatically called from `Profiler::start`, we _could_ get rid of the Java-level APIs. Yet they seem kinda useful for testing? So I didn't remove them -- thoughts welcome on this.

**How to test the change?**

With these changes, I was able to start an example app and use https://github.com/scottgerring/ctx-sharing-demo/tree/main/context-reader to check that the trace id/span id/local root span id/custom attribute could be read and were correct.
Here's my test app:

```java
package com.example;

import datadog.trace.api.CorrelationIdentifier;
import datadog.trace.api.Trace;
import datadog.trace.api.profiling.Profiling;

public class App {
  @Trace(operationName = "my.traced.method")
  public void tracedMethod(boolean sleepForever) {
    try (var scope = Profiling.get().newScope()) {
      scope.setContextValue("customer_name", "test customer name");
      System.out.println("Process PID: " + ProcessHandle.current().pid());
      System.out.println("Trace ID: " + CorrelationIdentifier.getTraceId());
      System.out.println("Span ID: " + CorrelationIdentifier.getSpanId());
      if (sleepForever) {
        System.out.println("Sleeping forever...");
        while (true) {
          try {
            Thread.sleep(Long.MAX_VALUE);
          } catch (InterruptedException e) {
            // continue sleeping
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    boolean sleep = args.length > 0 && args[0].equals("-s");
    App app = new App();
    app.tracedMethod(sleep);
    System.out.println("Done.");
  }
}
```

and I ran it with:

```bash
java -javaagent:dd-java-agent.jar \
  -Ddd.service=my-test-app \
  -Ddd.profiling.enabled=true \
  -Ddd.profiling.experimental.process_context.enabled=true \
  -Ddd.profiling.context.attributes=customer_name \
  -cp out:dd-trace-api.jar \
  com.example.App "$@"
```

and here's what I saw:

```
// From Java app:
Process PID: 366422
Trace ID: 69fca1f2000000000fbf0550eb5ad8ee
Span ID: 6518782658524195367
Sleeping forever...

// From reader:
2026-05-07T14:30:25.128376Z INFO tail: Monitoring process 366422 (java)
2026-05-07T14:30:25.201870Z INFO custom_labels::process_context::reader: Found named OTEL_CTX mapping addr="0x7d8ced3a0000"
2026-05-07T14:30:25.201893Z INFO custom_labels::process_context::reader: Read process-context header monotonic_published_at_ns=25813536699856 version=2 payload_size=449
2026-05-07T14:30:25.201907Z INFO tail: V1 reader not available: No binary found with v1 custom labels symbols
2026-05-07T14:30:25.201912Z INFO context_reader::v2_reader: V2 reader: parsed key table from process-context num_keys=3
2026-05-07T14:30:25.201915Z INFO context_reader::v2_reader: V2 reader: found otel_thread_ctx_v1 in /tmp/ddprof_ivo_anjo/pid_366422/scratch/libjavaProfiler-dd-tmp13251873446977520577.so
2026-05-07T14:30:25.202895Z INFO context_reader::tls_symbols::process: TLS for otel_thread_ctx_v1: DTV only [module_id=4, tls_offset=0xffffffffffffffff]
2026-05-07T14:30:25.202952Z INFO tail: V2 reader initialized successfully
2026-05-07T14:30:25.202955Z INFO tail: Initialized 1 TLS reader(s)
2026-05-07T14:30:25.208772Z INFO context_reader::output: [v2] iteration = 1, thread = 366423, labels = [trace_id=69fca1f2000000000fbf0550eb5ad8ee, span_id=5a775e4a394faa27, datadog.local_root_span_id=5a775e4a394faa27, _dd.trace.operation=my.traced.method, customer_name=test customer name]
```
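One detail that is easy to miss when comparing the two outputs: the Java app prints the span ID in decimal, while the reader prints it in hexadecimal. A small sketch for checking that they match is below. `SpanIdHex` is a hypothetical helper (not part of dd-trace-java or the reader), and zero-padding to 16 hex digits is an assumption about the reader's formatting.

```java
// Hypothetical helper for cross-checking IDs; not part of dd-trace-java.
public final class SpanIdHex {

    // Parses a span ID printed as an unsigned decimal string and
    // renders it as 16 lowercase hex digits (zero-padded; assumed format).
    static String toHex(String decimalSpanId) {
        long value = Long.parseUnsignedLong(decimalSpanId);
        return String.format("%016x", value);
    }

    public static void main(String[] args) {
        // Span ID printed by the Java app in the output above.
        System.out.println(toHex("6518782658524195367")); // prints 5a775e4a394faa27
    }
}
```

The result equals both `span_id` and `datadog.local_root_span_id` in the reader's last log line, which is what you'd expect when the traced method is the local root span.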