-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Span tags for "unified naming conventions" #20
Conversation
- glibc does not cache the process ID in user space. So, as a (premature?) optimization, we cache the process ID and recalculate it whenever the process forks. - The existing behavior of tagging each span with _dd.origin wasn't tested, so in testing this process ID change I also added a test for origin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As usual, some comments for consideration.
src/datadog/random.cpp
Outdated
|
||
// Set "0100" for the most significant bits of the | ||
// second-to-least-significant byte of `high`. | ||
high[12] = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some ways this would be easier to reason about in reverse order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because we write numbers big endian (e.g. the comment just above the code). I'll consider changing it.
src/datadog/random.cpp
Outdated
// This means that we generate IDs with non-negative values that will always | ||
// fit into an `int64_t`, which is a polite thing to do when you work with | ||
// people who write Java. | ||
std::uniform_int_distribution<std::int64_t> distribution_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the biggest deal (there's still plenty of random bits involved) but doesn't this imply we're generating 1x or 2x 63-bit random values instead of 64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Also, I think that the "produce 63 bits instead of 64 because uint64 is hard" idea is no longer relevant. We could just use 64 bits everywhere. What do you think?
src/datadog/random.h
Outdated
std::uint64_t random_uint64(); | ||
|
||
// Return a pseudo-random UUID in canonical string form as described in RFC | ||
// 4122. The result does not include the "urn:uuid:" prefix. Example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the wild, I've never seen a urn
prefix on a uuid. And hope that remains the same. I think its mention in the RFC is simply as an example of a specific usage rather than what is typical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. Perhaps I'll remove that sentence to avoid distraction. The example says it all anyway.
@@ -30,6 +30,9 @@ const std::string span_sampling_rule_rate = "_dd.span_sampling.rule_rate"; | |||
const std::string span_sampling_limit = "_dd.span_sampling.max_per_second"; | |||
const std::string w3c_extraction_error = "_dd.w3c_extraction_error"; | |||
const std::string trace_id_high = "_dd.p.tid"; | |||
const std::string process_id = "process_id"; | |||
const std::string language = "language"; | |||
const std::string runtime_id = "runtime-id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if there were consistency between using hyphens and underscores
I know it's not something you chose but 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the exceptionalism of runtime-id
is even addressed in the internal documents that describe it. Basically "it's old."
src/datadog/trace_segment.cpp
Outdated
// "atfork" handler that reinitializes `process_id` in child processes. The | ||
// `at_fork_in_child` callback must be a function pointer, and so refers to | ||
// `cached_process_id` by name rather than keeping `&process_id` in a closure. | ||
static int process_id = []() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 7 different concepts intertwined.
I can see how it works, but it's a journey to read and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mosquito cried out in pain:
"A chemist has poisoned my brain!"
The cause of his sorrow
was para-dichlorodiphenyltrichloroethane
src/datadog/trace_segment.cpp
Outdated
span.tags[tags::internal::origin] = *origin_; | ||
} | ||
span.numeric_tags[tags::internal::process_id] = cached_process_id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have doubts about the tags / numeric tags separation for essentially random values.
IMO it'd be reasonable for things that could either be treated as counters, gauges or categories (ie: enumerated values) but a PID is none of those things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is somebody thought "it's a number, put it in the number thing."
Also worth remembering that numeric tags are floating point, not that it matters here.
This revision adds the following tags to every span:
language
is hard-coded to "cpp".process_id
is the current PID. It's cached and recalculated on fork.runtime-id
is a pseudo-random UUID. It's cached and recalculated on fork.From the long internal list of span tags, these are the ones that we weren't yet setting and that are not integration-specific.
Getting the process ID is platform-specific (if we continue to pretend to support Windows). The other platform-specific pieces were
gethostname
andpthread_atfork
. I moved all of these into a new componentplatform_util.{h,cpp}
.Then, because of the UUID for
runtime-id
, I moved the pseudo-random facilities fromid_generator.cpp
into a newrandom.{h,cpp}
.With these changes, all platform-specific code (i.e.
#ifdef _MSC_VER
) is inplatform_util.cpp
, and all thread-local randomness stuff is inrandom.cpp
.