feat: otel thread ctx FFI #1915

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 30 commits into main from yannham/otel-thread-ctx-ffi
May 12, 2026
Conversation

@yannham
Contributor

@yannham yannham commented Apr 23, 2026

What does this PR do?

This PR adds a basic FFI for the OTel thread-level context feature: create a new context, attach, detach, and update in place.

We also make ThreadContextRecord public, or at least exposed in the FFI. The rationale is that:

  1. it's imposed by the spec, so it should not be a liability regarding breaking changes: we can't really touch it anyway.
  2. as mentioned in the doc of the FFI, SDKs may end up updating the contexts themselves after publication, without going through libdatadog at all. In this usage mode, exporting the C struct ThreadContextRecord is a way to document its expected memory layout.
Generated C header
// Copyright 2026-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0


#ifndef DDOG_OTEL_THREAD_CTX_H
#define DDOG_OTEL_THREAD_CTX_H

#pragma once

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/**
 * In-memory layout of a thread-level context.
 *
 * **CAUTION**: The structure MUST match exactly the OTel thread-level context specification.
 * It is read by external, out-of-process code. Do not re-order fields or modify in any way,
 * unless you know exactly what you're doing.
 *
 * # Synchronization
 *
 * Readers are async-signal handlers. The writer is always stopped while a reader runs.
 * Sharing memory with a signal handler still requires some form of synchronization, which is
 * achieved through atomics and compiler fence, using `valid` and/or the TLS slot as
 * synchronization points.
 *
 * - The writer stores `valid = 0` *before* modifying fields in-place, guarded by a fence.
 * - The writer stores `valid = 1` *after* all fields are populated, guarded by a fence.
 * - `valid` starts at `1` on construction and is never set to `0` except during an in-place
 *   update.
 */
typedef struct ddog_ThreadContextRecord {
  /**
   * Trace identifier; all-zeroes means "no trace".
   */
  uint8_t trace_id[16];
  /**
   * Span identifier.
   */
  uint8_t span_id[8];
  /**
   * Whether the record is ready/consistent. Always set to `1` except during in-place update
   * of the current record.
   */
  uint8_t valid;
  uint8_t _reserved;
  /**
   * Number of populated bytes in `attrs_data`.
   */
  uint16_t attrs_data_size;
  /**
   * Packed variable-length key-value records.
   *
   * It's a contiguous list of blocks with layout:
   *
   * 1. 1-byte `key_index`
   * 2. 1-byte `val_len`
   * 3. `val_len` bytes of a string value.
   *
   * # Size
   *
   * Currently, we always allocate the max recommended size. This potentially wastes a few
   * hundred bytes per thread, but it guarantees that we can modify the context in-place
   * without (re)allocation in the hot path. Having a hybrid scheme (starting smaller and
   * resizing up a few times) is not out of the question.
   */
  uint8_t attrs_data[ddog_MAX_ATTRS_DATA_SIZE];
} ddog_ThreadContextRecord;

#ifdef __cplusplus
extern "C" {
#endif // __cplusplus

/**
 * Allocate and initialise a new thread context.
 *
 * Returns a non-null owned handle that must eventually be released with
 * `ddog_otel_thread_ctx_free`.
 */
struct ddog_ThreadContextRecord *ddog_otel_thread_ctx_new(const uint8_t (*trace_id)[16],
                                                          const uint8_t (*span_id)[8],
                                                          const uint8_t (*local_root_span_id)[8]);

/**
 * Free an owned thread context.
 *
 * # Safety
 *
 * `ctx` must be a valid non-null pointer obtained from `ddog_otel_thread_ctx_new` or
 * `ddog_otel_thread_ctx_detach`, and must not be used after this call. In particular, `ctx`
 * must not be currently attached to a thread.
 */
void ddog_otel_thread_ctx_free(struct ddog_ThreadContextRecord *ctx);

/**
 * Attach `ctx` to the current thread. Returns the previously attached context if any, or null
 * otherwise.
 *
 * # Safety
 *
 * `ctx` must be a valid non-null pointer obtained from this API. Ownership of `ctx` is
 * transferred to the TLS slot: the caller must not drop `ctx` while it is still actively
 * attached.
 *
 * ## In-place update
 *
 * The preferred method to update the thread context in place is [ddog_otel_thread_ctx_update].
 *
 * If calling into native code is too costly, it is possible to update an attached context
 * directly in-memory without going through libdatadog (contexts are guaranteed to have a
 * stable address through their lifetime). **HOWEVER, IF DOING SO, PLEASE BE VERY CAUTIOUS OF
 * THE FOLLOWING POINTS**:
 *
 * 1. The update process requires a [seqlock](https://en.wikipedia.org/wiki/Seqlock)-like
 *    pattern: [ThreadContextRecord::valid] must be first set to `0` before the update and set
 *    to `1` again at the end. Additionally, depending on your language's memory model, you
 *    might need specific synchronization primitives (compiler fences, atomics, etc.), since
 *    the context can be read by an asynchronous signal handler at any point in time. See the
 *    [Otel thread context
 *    specification](https://github.com/open-telemetry/opentelemetry-specification/pull/4947)
 *    for more details.
 * 2. Only update the context from the thread it's attached to. Contexts are designed to be
 *    attached, written to and read from on the same thread (whether from signal code or
 *    program code). Thus, they are NOT thread-safe. Given the current specification, I don't
 *    think it's possible to safely update an attached context from a different thread, since
 *    the signal handler doesn't assume the context can be written to concurrently from another
 *    thread.
 */
struct ddog_ThreadContextRecord *ddog_otel_thread_ctx_attach(struct ddog_ThreadContextRecord *ctx);

/**
 * Remove the currently attached context from the TLS slot.
 *
 * Returns the detached context (caller now owns it and must release it with
 * `ddog_otel_thread_ctx_free`), or null if the slot was empty.
 */
struct ddog_ThreadContextRecord *ddog_otel_thread_ctx_detach(void);

/**
 * Update the currently attached context in-place.
 *
 * If no context is currently attached, one is created and attached, equivalent to calling
 * `ddog_otel_thread_ctx_new` followed by `ddog_otel_thread_ctx_attach`.
 */
void ddog_otel_thread_ctx_update(const uint8_t (*trace_id)[16],
                                 const uint8_t (*span_id)[8],
                                 const uint8_t (*local_root_span_id)[8]);

#ifdef __cplusplus
}  // extern "C"
#endif  // __cplusplus

#endif  /* DDOG_OTEL_THREAD_CTX_H */

Motivation

OTel thread-level context has been implemented in #1791 in order to provide better interop with the OTel eBPF profiler. The first user is supposed to be dd-trace-rs, but it turns out the dotnet SDK people are interested in using it as well (and eventually other non-Rust SDKs will use it and thus require an FFI).

Additional Notes

N/A

How to test the change?

There's a test to check that the TLS symbol is properly handled. For real usage, we plan to check when integrating in dotnet (or whichever is the first SDK to use it).

@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/yannham/otel-thread-ctx-ffi

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

| Crate | Base Branch | PR Branch | Change |
|---|---|---|---|
| clippy-annotation-reporter | 5 | 5 | No change (0%) |
| datadog-ffe-ffi | 1 | 1 | No change (0%) |
| datadog-ipc | 21 | 21 | No change (0%) |
| datadog-live-debugger | 6 | 6 | No change (0%) |
| datadog-live-debugger-ffi | 10 | 10 | No change (0%) |
| datadog-profiling-replayer | 4 | 4 | No change (0%) |
| datadog-remote-config | 3 | 3 | No change (0%) |
| datadog-sidecar | 57 | 57 | No change (0%) |
| libdd-common | 13 | 13 | No change (0%) |
| libdd-common-ffi | 12 | 12 | No change (0%) |
| libdd-data-pipeline | 5 | 5 | No change (0%) |
| libdd-ddsketch | 2 | 2 | No change (0%) |
| libdd-dogstatsd-client | 1 | 1 | No change (0%) |
| libdd-profiling | 13 | 13 | No change (0%) |
| libdd-telemetry | 20 | 20 | No change (0%) |
| libdd-tinybytes | 4 | 4 | No change (0%) |
| libdd-trace-normalization | 2 | 2 | No change (0%) |
| libdd-trace-obfuscation | 8 | 8 | No change (0%) |
| libdd-trace-stats | 1 | 1 | No change (0%) |
| libdd-trace-utils | 15 | 15 | No change (0%) |
| Total | 203 | 203 | No change (0%) |

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Apr 23, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 23.91%
Overall Coverage: 72.61% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 35f80d0

@yannham yannham marked this pull request as ready for review April 23, 2026 14:48
@yannham yannham requested review from a team as code owners April 23, 2026 14:48

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05868a50b9

Comment thread libdd-otel-thread-ctx-ffi/tests/elf_properties.rs Outdated
@yannham yannham requested a review from ivoanjo April 23, 2026 15:42
@yannham yannham force-pushed the yannham/otel-thread-ctx-ffi branch 2 times, most recently from e98d81f to 75add55 Compare April 23, 2026 16:36
@codecov-commenter

codecov-commenter commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 23.91304% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.61%. Comparing base (91fd13c) to head (35f80d0).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1915      +/-   ##
==========================================
- Coverage   72.63%   72.61%   -0.02%     
==========================================
  Files         448      451       +3     
  Lines       73582    74060     +478     
==========================================
+ Hits        53444    53778     +334     
- Misses      20138    20282     +144     
Components Coverage Δ
libdd-crashtracker 65.31% <ø> (+0.22%) ⬆️
libdd-crashtracker-ffi 37.68% <ø> (+0.85%) ⬆️
libdd-alloc 98.77% <ø> (ø)
libdd-data-pipeline 85.97% <ø> (-0.62%) ⬇️
libdd-data-pipeline-ffi 71.04% <ø> (-4.60%) ⬇️
libdd-common 79.81% <ø> (ø)
libdd-common-ffi 74.41% <ø> (ø)
libdd-telemetry 69.86% <ø> (+0.49%) ⬆️
libdd-telemetry-ffi 19.37% <ø> (ø)
libdd-dogstatsd-client 82.64% <ø> (ø)
datadog-ipc 76.22% <ø> (ø)
libdd-profiling 81.58% <ø> (+0.01%) ⬆️
libdd-profiling-ffi 64.51% <ø> (ø)
libdd-sampling 97.25% <ø> (ø)
datadog-sidecar 29.09% <ø> (-0.75%) ⬇️
datdog-sidecar-ffi 9.67% <ø> (-3.55%) ⬇️
spawn-worker 54.69% <ø> (ø)
libdd-tinybytes 93.16% <ø> (ø)
libdd-trace-normalization 81.71% <ø> (ø)
libdd-trace-obfuscation 87.39% <ø> (+0.13%) ⬆️
libdd-trace-protobuf 68.25% <ø> (ø)
libdd-trace-utils 89.31% <ø> (+0.04%) ⬆️
libdd-tracer-flare 86.88% <ø> (ø)
libdd-log 74.83% <ø> (ø)

@yannham yannham requested review from gleocadie April 24, 2026 09:52
Member

@ivoanjo ivoanjo left a comment

This looks quite reasonable. Some notes:

  1. I would strongly recommend getting one of the dotnet folks (who I believe are the intended first users of this API) to give a pass on the PR

  2. I think we're missing an end-to-end example in C or C++. In particular, one that sets both the process and thread context, and thus can be picked up correctly by the external reader.

    Right now we're kinda assuming that callers will "just know" how to put this together, and I suspect that will cause a bunch of confusion. Thus, it would be very helpful to provide some docs + a fully working example that could be used as a model.

  3. As I mentioned, I'm not convinced about the whole "expose the full structure", see my comment on why.

Comment thread libdd-otel-thread-ctx-ffi/build.rs Outdated
Comment thread libdd-otel-thread-ctx/src/lib.rs Outdated
  // struct has the right total size.
  #[repr(C)]
- struct ThreadContextRecord {
+ pub struct ThreadContextRecord {
Member

I'm not quite sure about this part. In particular, I don't think we're giving callers the tools that they need to directly use this without getting "burned" right now: no docs, no validation. :hurtrealbad:

My "gut feeling" is: if callers haven't asked for this yet, let's not give it to them. 🤔

If they want to bang at the bits directly, I think in that situation they might as well prefer to reimplement it from scratch, e.g. why half-go through libdatadog if you want maximum control anyway?

Contributor Author

@yannham yannham Apr 27, 2026

I'm not quite sure about this part. In particular, I don't think we're giving callers the tools that they need to directly use this without getting "burned" right now: no docs, no validation. :hurtrealbad:

I would say that there is some documentation (and warnings!) in the documentation of attach. When talking with dotnet people, we thought it might make sense for performance to try to do everything on the SDK side for context switching, without going through native. I agree that then using libdatadog is questionable, but I think you'll still have to manage the whole TLSDESC/elf business. It might be reasonable to use libdatadog to attach the first context and then update it in place.

Also the structure is defined per the spec anyway, so it's not going to change and is "public" (as is, defined somewhere on the internet).

All of that being said, I agree that making it visible here is stronger, as we guarantee our FFI functions always return pointers to contexts as per the spec. Additionally, the in-place-update-from-the-sdk was really mentioned as "we can try that later in the future if mandated", so I would be totally fine reverting the changes to otel-thread-ctx, removing the bit about raw update, making the record type an entirely opaque pointer, and only discussing this again after there's a need for it, or at least some benchmarks.

What do you dotnet guys think, @gleocadie?

Member

I think you'll still have to manage the whole TLSDESC/elf business.

I was thinking that if we're building a .so from C/C++, it's even easier to not use libdatadog for this part -- no need to fight linker, just declare a public thread-local symbol.

But yeah, as I mentioned above -- my suggestion is "gut feeling we're not gonna need it" -- if y'all want to try this path, we can, but I would suggest avoiding any "let's expose this now just-in-case-for-later" things ;)
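
For readers following the "just declare a public thread-local symbol" point, here is a minimal C sketch of what that could look like. The symbol name `custom_labels_current_set_v2` is the one mentioned in this PR's commit messages; the `sketch_record` type, the `sketch_attach`/`sketch_detach` helpers and the pointer type of the slot are assumptions made for illustration, not libdatadog's implementation. A real build would probably also need the TLSDESC dialect selected at compile time (for example GCC's `-mtls-dialect=gnu2` on x86-64), which is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the record; the real layout is fixed by the spec. */
typedef struct sketch_record {
    uint8_t trace_id[16];
    uint8_t span_id[8];
    uint8_t valid;
} sketch_record;

/* Exported thread-local slot that an external reader looks up by name.
 * Assumption: the slot holds a pointer to the current record, or NULL. */
__attribute__((visibility("default")))
_Thread_local sketch_record *custom_labels_current_set_v2 = NULL;

/* Publish `ctx` on the calling thread; returns the previous record, if any. */
sketch_record *sketch_attach(sketch_record *ctx) {
    assert(ctx != NULL);
    sketch_record *prev = custom_labels_current_set_v2;
    /* Compiler fence: the record's fields must be written before the pointer. */
    atomic_signal_fence(memory_order_seq_cst);
    custom_labels_current_set_v2 = ctx;
    return prev;
}

/* Clear the slot; the caller owns the returned record again. */
sketch_record *sketch_detach(void) {
    sketch_record *prev = custom_labels_current_set_v2;
    custom_labels_current_set_v2 = NULL;
    atomic_signal_fence(memory_order_seq_cst);
    return prev;
}
```

This only covers swapping the TLS slot; the process context part discussed in this thread is out of scope for the sketch.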

Contributor Author

I was thinking that if we're building a .so from C/C++, it's even easier to not use libdatadog for this part -- no need to fight linker, just declare a public thread-local symbol.

That's true... although you still need the process context part, and re-implementing this in C/C++ starts to be a tad less trivial. But I agree that if you only need to swap the TLS slot, you're probably better off with a thin C/C++ shared lib than messing around with the Rust build 👍

Member

Yeap, I agree you'd still probably want to use the libdatadog process context impl, assuming whatever downstream consumer we're talking about is pulling libdatadog already as a dependency anyway.

Member

I guess you meant would be impossible?

I suspect it would be impossible; but I may be missing some detail.

Since dotnet is the first motivation here, I guess it's @chrisnas' question to answer: even in the case of doing the update directly from dotnet, do you still see value in libdatadog for setting up the process context and attach the first thread context?

To be clear I think process context is very worth using the libdatadog one. The thread context is the one where I think the options are 100% libdatadog or 0% libdatadog -- I don't think 10% libdatadog is going to be very useful and I suspect it'll be more confusing/error-prone than not ;)

I would like to move forward with an FFI anyway, because I have some small benchmarks that rely on its existence. However, if it's [edit: NOT so] useful for dotnet in the end, I can trim the PR to make the context private again, which would be minimal first version.

Idk, I'm still in the "I don't think it's very useful to add stuff if we don't think we'll need them", but also we can and should evolve the API with what we learn from adding this to dotnet, so from me not a blocker at all.

Contributor Author

@yannham yannham May 7, 2026

I don't think 10% libdatadog is going to be very useful and I suspect it'll be more confusing/error-prone than not ;)

I'm not sure I get why. Once again you need to do some stuff in native anyway, so if you can reuse/offload and pull libdatadog's parts for the process context, it's strictly better to re-use what you can for thread context than re-implementing it yourself in unsafe C/C++ on the side...?

At any rate, I want to insist that the whole debate here is purely on the contract/visibility part of the API. There's absolutely no code or logic involved in making the thread context updatable purely on the .NET runtime side. It's mostly about making some C struct exported or not in the header, but this struct's layout is available in the spec anyway. My personal view is that it doesn't cost much to expose the possibility as long as we have that big CAUTION in the code comments (like: you can do it, but at your own risk). I think @ivoanjo is wary of making this part of the official public API because it's full of traps, which I can also understand.

So here is a proposal: for now, let's get the record struct out of the API and make returned values opaque pointers. Still, the code and the memory layout will be exactly the same as if we didn't. So it's not exposed in the API, but if @chrisnas wants to experiment with in-place update, they can and have everything they need to do it (I don't see any reason why we would change the layout or introduce indirection on the libdatadog side in the foreseeable future). Once we see how it turns out, and if .NET thinks it's worth keeping libdatadog in that setting, we can update the FFI and make it more "official", or do nothing otherwise and keep the opaque version. That would unblock this PR and be forward-compatible with changing our mind.

What do you guys think?

Member

it's strictly better to re-use what you can for thread context than re-implementing it yourself in unsafe C/C++ on the side...?

If you go 100% libdatadog thread context, then yes, you get all the benefits of the rust safety.

But to me this argument falls apart once you need to implement the logic to bang on the bytes directly. In that situation, you might as well bang on the bytes always, not "half the time is libdatadog thread context, half the time is my own code" -- I think that'll be more error prone and hard to follow. It's even harder to test ;)

I think @ivoanjo is wary of making this part of the official public API because it's full of traps, which I can also understand.

My argument is that if you're going to do things directly, you need to really understand the spec (it's the "play in hard mode" choice). So at that point, you can look at libdatadog thread context as a reference for code, but I'm not sure having the struct definition or some other small tidbits helps very much -- you really do need to get into the spec to understand how to use it, no shortcuts anymore.

Or, you can play in easy mode and use libdatadog thread context -- to me that's the choice here. Easy or hard, 100% libdatadog thread context or 0%.

So here is a proposal

Ack from my side!

Contributor Author

@yannham yannham May 7, 2026

But to me this argument falls apart once you need to implement the logic to bang on the bytes directly. In that situation, you might as well bang on the bytes always, not "half the time is libdatadog thread context, half the time is my own code" -- I think that'll be more error prone and hard to follow. It's even harder to test ;)

The big difference to me is that the goal here is to bang the bytes from the .NET runtime, since the point is to not pay for the FFI. So this byte-banging part is done in C# or whatnot. However, to install the thread context, it's very likely that you need a native shared library anyway, because of the whole TLSDESC and linker business. Hence, unless, like Java, you want to do custom/specific stuff in the native byte-banging part, the trade-off is using ready-built libdatadog for installing the context versus creating a separate C/C++/Rust project on the side just for that, figuring out the TLS dialect, the linking business, etc. That is, redoing mostly what's already done in otel-thread-ctx. Of course, if you're going to bang the bytes from native, then sure.

Member

[...] because of the whole TLSDESC and linker business [...]

Right, but the tlsdesc is very easy to do from C/C++ and really hard to do from rust, so again, I'm not seeing a lot of benefits :P

The dotnet runtime isn't directly calling into rust, there's already a bunch of C++ glue code ;)

@yannham yannham force-pushed the yannham/otel-thread-ctx-ffi branch from dcf5548 to e1af933 Compare April 28, 2026 17:00
Comment thread libdd-otel-thread-ctx/src/lib.rs Outdated
  // struct has the right total size.
  #[repr(C)]
- struct ThreadContextRecord {
+ pub struct ThreadContextRecord {

I would be totally fine reverting the changes to otel-thread-ctx, removing the bit about raw update, making the record type an entirely opaque pointer, and only discuss this again after there's a need for it, or at least some benchmarks.

We would definitely want to use in-place value changes for .NET, for performance reasons, due to the high number of spans that can be created over a trace.

This is EXACTLY how it works today: the tracer asks the profiler (the equivalent of the libdatadog call) ONCE per thread where to set the span id value and then directly updates the value in memory when the span changes for the thread.
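
To make the in-place update described here concrete, below is a minimal C sketch of the `valid = 0` / write / `valid = 1` protocol from the generated header's comments. It assumes the writer and the reader (a signal handler) run on the same thread, so compiler fences via `atomic_signal_fence` are taken to be sufficient; whether that holds for a given runtime and reader is something to check against the spec. The `sketch_record` type only mirrors the fixed-size prefix of the record, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the record's fixed-size prefix; the field order follows
 * the generated header quoted in this PR, and attrs_data is omitted for brevity. */
typedef struct sketch_record {
    uint8_t trace_id[16];
    uint8_t span_id[8];
    uint8_t valid;
    uint8_t reserved;
    uint16_t attrs_data_size;
} sketch_record;

/* Writer side: must run on the thread the record is attached to. */
void sketch_update_in_place(sketch_record *rec,
                            const uint8_t trace_id[16],
                            const uint8_t span_id[8]) {
    assert(rec != NULL);
    volatile uint8_t *valid = &rec->valid;
    *valid = 0;                                /* mark the record inconsistent */
    atomic_signal_fence(memory_order_seq_cst); /* compiler fence, no CPU barrier */
    memcpy(rec->trace_id, trace_id, 16);
    memcpy(rec->span_id, span_id, 8);
    atomic_signal_fence(memory_order_seq_cst);
    *valid = 1;                                /* record is consistent again */
}

/* Reader side, roughly what a same-thread signal handler could do:
 * skip the record entirely when it is mid-update. */
bool sketch_read_span_id(const sketch_record *rec, uint8_t out[8]) {
    const volatile uint8_t *valid = &rec->valid;
    if (*valid != 1) {
        return false;
    }
    atomic_signal_fence(memory_order_seq_cst);
    memcpy(out, rec->span_id, 8);
    return true;
}
```

A managed runtime would have to reproduce the same ordering with its own primitives; the sketch says nothing about how that maps to C#.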

/// hundred bytes per thread, but it guarantees that we can modify the context in-place
/// without (re)allocation in the hot path. Having a hybrid scheme (starting smaller and
/// resizing up a few times) is not out of the question.
attrs_data: [u8; MAX_ATTRS_DATA_SIZE],

Is this the place to store the endpoint?
If this is the case, how do we know the string ID corresponding to the EndPoint attribute? Since the value also seems to be an interned string ID, each different endpoint must be interned in the libdatadog string table, which would grow over time since the endpoints could be very different.

Don't forget that the endpoint is usually known close to the end of the request. It means that any client would have to reconcile the real endpoint for the samples already created for this thread. More importantly, there is no notification for the full host to know WHEN a thread gets its endpoint, so it looks impossible to be sure of the right endpoint to use if not already set; i.e. the same thread could have switched requests since the last time samples were created.

Also, how do we reset the current endpoint? By setting an "empty string" ID?

Contributor

What interning API should we use?

Contributor Author

@yannham yannham May 5, 2026

Since the value seems to also be an interned string ID

They are not, only the keys are interned string IDs. I think that what we drafted in our first discussion, that is having a small set of known keys with fixed IDs as a convention (root_span_id is 0, endpoint or whatever is the end point attribute is 1, etc.) sounds like the way to go, ideally? So there would be no need for dynamic string interning business at all.

For the endpoint question, I'm not sure to be honest. Even beyond the current spec, given the setup and the constraints of an external eBPF profiler, I don't see an easy way out of having the profiler do some additional work and book-keeping to correlate past samples once the endpoint becomes known. Maybe @ivoanjo has some thoughts on this.
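
As an illustration of the "keys are small fixed IDs, values are plain bytes" point, here is a minimal C sketch that appends one attribute to a buffer using the block layout documented for `attrs_data` (1-byte `key_index`, 1-byte `val_len`, then `val_len` bytes). The key IDs below follow the convention floated in this comment and are not a settled assignment; `SKETCH_MAX_ATTRS` is a made-up capacity standing in for `ddog_MAX_ATTRS_DATA_SIZE`, whose value is not shown in this PR, and `sketch_attrs_push` is a hypothetical helper, not part of the FFI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity; the real constant is ddog_MAX_ATTRS_DATA_SIZE. */
#define SKETCH_MAX_ATTRS 64

/* Hypothetical fixed key IDs, following the convention discussed above. */
enum { SKETCH_KEY_ROOT_SPAN_ID = 0, SKETCH_KEY_ENDPOINT = 1 };

typedef struct sketch_attrs {
    uint16_t size;                  /* plays the role of attrs_data_size */
    uint8_t data[SKETCH_MAX_ATTRS]; /* plays the role of attrs_data */
} sketch_attrs;

/* Append one [key_index][val_len][value bytes] block.
 * Returns false, leaving the buffer untouched, if the value does not fit. */
bool sketch_attrs_push(sketch_attrs *a, uint8_t key_index,
                       const char *val, size_t val_len) {
    assert(a != NULL && val != NULL);
    if (val_len > UINT8_MAX) {
        return false; /* val_len is stored in a single byte */
    }
    if ((size_t)a->size + 2 + val_len > SKETCH_MAX_ATTRS) {
        return false; /* not enough room left */
    }
    a->data[a->size] = key_index;
    a->data[a->size + 1] = (uint8_t)val_len;
    memcpy(&a->data[a->size + 2], val, val_len);
    a->size = (uint16_t)(a->size + 2 + val_len);
    return true;
}
```

With this layout, no string table is involved on the value side: the endpoint bytes are copied straight into the record.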

Member

OK so there's two versions of this answer for the "endpoint shows up late" problem.

The in-process datadog version of this answer: While we want to adopt this mechanism, we don't have to adopt it exclusively. E.g. like we did for Java, if it's the datadog tracer talking to the datadog in-process profiler, we can have additional channels beyond just this one to provide extra information if needed. So for sure we don't want to regress in terms of feature-set, and the Java PR is a great example of how we both implement the otel spec and still keep the entire feature-set.

The out-of-process and pure-otel version of this answer to me is: I think we'll have to live with best-effort for now. That is -- we set the endpoint name in the context when we get it, and if the eBPF Profiler misses it, such is life. If an otel backend wants an accurate picture, the otel backend will probably need to look at traces/spans for this information.

The whole basis of the thread context sharing mechanism is to be a "loose coordination" kinda thing; I'm hoping to avoid expanding the scope for now because it opens a lot of hard questions that would make the implementation a lot more complex IMO.

Copy link
Copy Markdown

So there would be no need for dynamic string interning business at all

Good news :^)
So, how are we supposed to set the string value for the interned attribute that corresponds to the endpoint?

@ivoanjo: are there any other attributes that would make sense to "attach"?

Member

I've been keeping a list here.

I'd say definitely the datadog.local_root_span_id and the (best-effort, with limitations) datadog.trace_endpoint, as well as the datadog.thread_name and datadog.thread_id (in cases where thread id doesn't match what the OS can see).

Beyond that, I see in prof-dotnet pprofs that we have appdomain name, appdomain process id -- if you think those are useful, maybe throw them in there too? If not, don't ;) (If you do -- I suggest calling them e.g. datadog.appdomain_name and updating the doc I linked so we can try to keep everyone in-sync and avoid the backend needing to know 3 datadog names for the same thing)

@chrisnas chrisnas requested a review from andrewlock May 5, 2026 12:21
Comment thread libdd-otel-thread-ctx-ffi/src/lib.rs Outdated
/// `ddog_otel_thread_ctx_detach`, and must not be used after this call. In particular, `ctx`
/// must not be currently attached to a thread.
#[no_mangle]
pub unsafe extern "C" fn ddog_otel_thread_ctx_free(ctx: *mut ThreadContextRecord) {

Just to confirm the expected usage:

the first time a thread needs to set the context:

pCurrentThreadContext = ddog_otel_thread_ctx_new(...);
ddog_otel_thread_ctx_attach(pCurrentThreadContext);

when a span is processed by a thread:

// get current thread's context
...

// directly set the value of span id/trace id (in case of reset)
...

However, on shutdown, I don't see a way to "detach" all contexts, only the current thread's. I'm not sure it would be safe to call ddog_otel_thread_ctx_free() on each context without detaching first.

Contributor Author

@yannham yannham May 5, 2026

One possible answer is that you don't do anything, if you think this can be acceptable: if we're shutting down just before the process exit for example, the OS will reclaim the memory anyway, so not sure it's worth doing it manually just before.

Otherwise I admit the thread clean-up story might be non-trivial. If you need to free contexts either all at once, or during the app lifecycle as threads are created and freed, I can see two ways:

  • either do it as some kind of exit callback/cleanup phase on the thread before it exits (if doable, easiest)
  • or store aliases to the contexts in a separate data structure (say a map threadid -> contexts, or a queue of to-be-cleaned, or even a pool of to-be-reused contexts), and free or reuse them after the corresponding thread has exited. Once the original thread is dead, the TLS has been de-allocated anyway so the profiler can't reach it anymore. It's then safe to call ddog_otel_thread_ctx_free() from another thread or to repurpose the context.
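
The first option (an exit callback on the thread) could be sketched in C with a POSIX thread-specific key whose destructor runs as the thread exits. Everything named `sketch_*` is hypothetical: `sketch_ctx` stands in for a context handle and `sketch_ctx_free` for the detach-then-free step that real code would perform with `ddog_otel_thread_ctx_detach` and `ddog_otel_thread_ctx_free`. Whether the library's TLS slot is still usable from inside such a destructor is an assumption to verify per platform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread context handle. */
typedef struct sketch_ctx { int dummy; } sketch_ctx;

/* Counts released contexts so the behaviour can be observed. */
static atomic_int sketch_freed = 0;

/* Stand-in for "detach, then free"; runs on the exiting thread. */
void sketch_ctx_free(void *ctx) {
    free(ctx);
    atomic_fetch_add(&sketch_freed, 1);
}

static pthread_key_t sketch_key;
static pthread_once_t sketch_once = PTHREAD_ONCE_INIT;

static void sketch_make_key(void) {
    /* The destructor is called at thread exit for every non-NULL value. */
    int rc = pthread_key_create(&sketch_key, sketch_ctx_free);
    assert(rc == 0);
    (void)rc;
}

/* Call once per thread, right after creating and attaching its context. */
void sketch_register_current_thread(sketch_ctx *ctx) {
    pthread_once(&sketch_once, sketch_make_key);
    pthread_setspecific(sketch_key, ctx);
}

static void *sketch_worker(void *arg) {
    (void)arg;
    sketch_ctx *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        sketch_register_current_thread(ctx);
    }
    return NULL; /* the key destructor releases ctx as the thread exits */
}

/* Spawns one short-lived thread and returns how many contexts were freed. */
int sketch_run_one_thread(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, sketch_worker, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    return atomic_load(&sketch_freed);
}
```

The second option (a map or pool owned by another thread) trades this per-thread hook for explicit bookkeeping, which may suit runtimes that cannot register exit callbacks.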

The only risk I see comes from short-lived threads, for which it would be better to be able to release the memory. In .NET, we might be notified, from the exiting thread, that it is about to die, and call the free API then, but I don't know about the other runtimes.
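For a runtime written in Rust, one way to get such an exit notification is a thread-local value with a destructor. The sketch below assumes that approach; `Ctx`, `detach_and_free`, and `ExitGuard` are hypothetical names, and the counter exists only so the behavior can be checked. Thread-local destructors are best effort on some platforms, so treat this as an illustration rather than a guarantee.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts released contexts so the sketch can be checked.
static FREED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the handle returned by `ddog_otel_thread_ctx_new`.
struct Ctx;

/// Hypothetical stand-in for detaching the context and then freeing it.
/// Detach comes first because a context must not be attached when freed.
fn detach_and_free(ctx: Ctx) {
    drop(ctx);
    FREED.fetch_add(1, Ordering::SeqCst);
}

/// Owns the current thread's context and releases it when the thread exits.
struct ExitGuard(Option<Ctx>);

impl Drop for ExitGuard {
    fn drop(&mut self) {
        if let Some(ctx) = self.0.take() {
            detach_and_free(ctx);
        }
    }
}

thread_local! {
    static CURRENT: RefCell<ExitGuard> = RefCell::new(ExitGuard(None));
}

/// Lazily creates the calling thread's context on first use.
fn ensure_context() {
    CURRENT.with(|slot| {
        let mut guard = slot.borrow_mut();
        if guard.0.is_none() {
            guard.0 = Some(Ctx);
        }
    });
}

/// Spawns `n` short-lived threads and returns how many contexts were
/// released by the time all of them have been joined.
fn run_exit_guard_demo(n: usize) -> usize {
    let before = FREED.load(Ordering::SeqCst);
    let workers: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(|| {
                ensure_context();
                ensure_context(); // second call reuses the existing context
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    FREED.load(Ordering::SeqCst) - before
}
```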

Contributor Author

OK, good for .NET then! For other runtimes, as I said, there are other options (a global map of active contexts, a queue, or a pool). If you can test whether a thread is still alive, you can run the cleanup in the background or at regular intervals.
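A periodic sweep of that kind might be sketched as follows, assuming the cleanup code holds a join handle for each thread. `Ctx`, `free_ctx`, `Entry`, and `sweep` are hypothetical names, and for brevity the contexts here are created by the spawner rather than by the owning thread.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the handle returned by `ddog_otel_thread_ctx_new`.
struct Ctx(u64);

/// Hypothetical stand-in for `ddog_otel_thread_ctx_free`.
fn free_ctx(ctx: Ctx) {
    drop(ctx);
}

/// A context paired with the handle of the thread that owns it.
struct Entry {
    owner: JoinHandle<()>,
    ctx: Ctx,
}

/// Frees the contexts whose owner has finished; returns how many were freed.
/// Intended to be called in the background or at regular intervals.
fn sweep(entries: &mut Vec<Entry>) -> usize {
    let (done, alive): (Vec<Entry>, Vec<Entry>) = std::mem::take(entries)
        .into_iter()
        .partition(|entry| entry.owner.is_finished());
    *entries = alive;
    let freed = done.len();
    for entry in done {
        // `is_finished` can turn true slightly before the thread is fully
        // gone, so join first to be sure its TLS has been torn down.
        let _ = entry.owner.join();
        free_ctx(entry.ctx);
    }
    freed
}

/// Runs two sweeps: one while a long-lived thread is still blocked, and one
/// after it has been released. Returns the number freed by each sweep.
fn run_sweep_demo() -> (usize, usize) {
    let (release, wait) = mpsc::channel::<()>();
    let mut entries = vec![
        Entry { owner: thread::spawn(|| {}), ctx: Ctx(1) },
        Entry {
            owner: thread::spawn(move || {
                let _ = wait.recv();
            }),
            ctx: Ctx(2),
        },
    ];
    // Wait until the short-lived thread has finished.
    while !entries[0].owner.is_finished() {
        thread::yield_now();
    }
    let first = sweep(&mut entries);
    // Only the long-lived thread remains; let it exit and sweep again.
    release.send(()).unwrap();
    while !entries[0].owner.is_finished() {
        thread::yield_now();
    }
    let second = sweep(&mut entries);
    (first, second)
}
```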

@yannham yannham requested a review from ivoanjo May 7, 2026 14:41
Member

@ivoanjo ivoanjo left a comment

👍 LGTM

yannham and others added 10 commits May 11, 2026 13:57
Rust's cdylib linker emits a version script with `local: *` that hides
all non-Rust symbols, preventing `custom_labels_current_set_v2` from
appearing in the dynamic symbol table. Without a dynsym entry, external
readers (e.g. the eBPF profiler) cannot locate the thread-local slot.

Add a supplementary version script with an explicit `global:` entry for
the symbol, which takes precedence over the `local: *` wildcard. Also
force lld explicitly, since merging multiple version scripts is not
supported by GNU ld.

Also adds a temporary dummy FFI wrapper around `ThreadContext::attach`
to keep the TLSDESC access live during verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mbol

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yannham and others added 17 commits May 11, 2026 13:57
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The FFI now exposes an opaque OtelThreadCtx handle instead of
ThreadContextRecord. The core crate is unchanged — the FFI casts
between NonNull<ThreadContextRecord> and NonNull<OtelThreadCtx>
at the boundary. The in-place update documentation has been removed
since direct memory access from outside libdatadog is not officially
supported.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
Environments with an old system lld (e.g. lld 7 from llvm-toolset-7.0
on CentOS) fail to link the cdylib because they cannot handle
R_X86_64_GOTPC32_TLSDESC relocations in shared libraries. GNU ld is
not an option either, as it cannot merge multiple version scripts.

Discover the toolchain's bundled rust-lld (LLD 19+ since the MSRV) via
`rustc --print sysroot` and pass its gcc-ld shim directory with `-B` so
the C compiler driver finds it before any system-wide lld. Fall back to
the system lld when rust-lld is not available.
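The discovery step described above might look roughly like this in a build script. This is a sketch under assumptions, not the code from this commit: the function names are made up, the `lib/rustlib/<host>/bin/gcc-ld` layout and the `-fuse-ld=lld` flag are assumed, and error handling is reduced to falling back to the system lld.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Assumed location of the `gcc-ld` shim directory inside a Rust sysroot.
fn gcc_ld_dir(sysroot: &Path, host: &str) -> PathBuf {
    sysroot
        .join("lib")
        .join("rustlib")
        .join(host)
        .join("bin")
        .join("gcc-ld")
}

/// Asks the active toolchain for its sysroot via `rustc --print sysroot`.
fn rustc_sysroot() -> Option<PathBuf> {
    let rustc = std::env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let output = Command::new(rustc).args(["--print", "sysroot"]).output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8(output.stdout).ok()?;
    Some(PathBuf::from(text.trim()))
}

/// Builds the linker arguments: `-B<dir>` makes the C compiler driver look in
/// `dir` before any system-wide lld; without it, the system lld is used.
fn linker_args(gcc_ld: Option<&Path>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(dir) = gcc_ld {
        args.push(format!("-B{}", dir.display()));
    }
    args.push("-fuse-ld=lld".to_string());
    args
}

/// What a `build.rs` entry point could do with the pieces above.
fn configure_linker() {
    let host = std::env::var("HOST").unwrap_or_default();
    let shim = rustc_sysroot()
        .map(|sysroot| gcc_ld_dir(&sysroot, &host))
        .filter(|dir| dir.is_dir());
    for arg in linker_args(shim.as_deref()) {
        println!("cargo:rustc-cdylib-link-arg={arg}");
    }
}
```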

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yannham yannham force-pushed the yannham/otel-thread-ctx-ffi branch from 13128d5 to 0464997 Compare May 11, 2026 11:57
dd-octo-sts Bot commented May 11, 2026

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 81.66 MB | 81.66 MB | 0% (0 B) 👌 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.57 MB | 7.57 MB | 0% (0 B) 👌 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 97.84 MB | 97.84 MB | 0% (0 B) 👌 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.01 MB | 10.01 MB | 0% (0 B) 👌 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 24.40 MB | 24.40 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 79.87 KB | 79.87 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 179.62 MB | 179.66 MB | +.02% (+40.00 KB) 🔍 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 910.67 MB | 910.67 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 7.71 MB | 7.71 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 79.87 KB | 79.87 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 23.11 MB | 23.11 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 45.25 MB | 45.25 MB | 0% (0 B) 👌 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.02 MB | 21.02 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 81.11 KB | 81.11 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 183.76 MB | 183.78 MB | +.01% (+24.00 KB) 🔍 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 896.93 MB | 896.93 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 5.98 MB | 5.98 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 81.11 KB | 81.11 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 24.74 MB | 24.74 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 42.75 MB | 42.75 MB | 0% (0 B) 👌 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 72.78 MB | 72.78 MB | 0% (0 B) 👌 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.41 MB | 8.41 MB | 0% (0 B) 👌 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
| --- | --- | --- | --- |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 90.53 MB | 90.53 MB | 0% (0 B) 👌 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.03 MB | 10.03 MB | 0% (0 B) 👌 |

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 866d7c8 into main May 12, 2026
92 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the yannham/otel-thread-ctx-ffi branch May 12, 2026 13:53

5 participants