Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span tags for "unified naming conventions" #20

Merged
merged 22 commits into from
Jan 26, 2023

Conversation

dgoffredo
Copy link
Contributor

This revision adds the following tags to every span:

  • language is hard-coded to "cpp".
  • process_id is the current PID. It's cached and recalculated on fork.
  • runtime-id is a pseudo-random UUID. It's cached and recalculated on fork.

From the long internal list of span tags, these are the ones that we weren't yet setting and that are not integration-specific.

Getting the process ID is platform-specific (if we continue to pretend to support Windows). The other platform-specific pieces were gethostname and pthread_atfork. I moved all of these into a new component platform_util.{h,cpp}.

Then, because of the UUID for runtime-id, I moved the pseudo-random facilities from id_generator.cpp into a new random.{h,cpp}.

With these changes, all platform-specific code (i.e. #ifdef _MSC_VER) is in platform_util.cpp, and all thread-local randomness stuff is in random.cpp.

- glibc does not cache the process ID in user space.  So, as a
  (premature?) optimization, we cache the process ID and recalculate it
  whenever the process forks.
- The existing behavior of tagging each span with _dd.origin wasn't
  tested, so in testing this process ID change I also added a test for
  origin.
Copy link
Contributor

@cgilmour cgilmour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As usual, some comments for consideration.


// Set "0100" for the most significant bits of the
// second-to-least-significant byte of `high`.
high[12] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some ways this would be easier to reason about in reverse order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we write numbers big endian (e.g. the comment just above the code). I'll consider changing it.

// This means that we generate IDs with non-negative values that will always
// fit into an `int64_t`, which is a polite thing to do when you work with
// people who write Java.
std::uniform_int_distribution<std::int64_t> distribution_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the biggest deal (there's still plenty of random bits involved) but doesn't this imply we're generating 1x or 2x 63-bit random values instead of 64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Also, I think that the "produce 63 bits instead of 64 because uint64 is hard" idea is no longer relevant. We could just use 64 bits everywhere. What do you think?

std::uint64_t random_uint64();

// Return a pseudo-random UUID in canonical string form as described in RFC
// 4122. The result does not include the "urn:uuid:" prefix. Example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the wild, I've never seen a urn prefix on a uuid. And hope that remains the same. I think its mention in the RFC is simply as an example of a specific usage rather than what is typical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. Perhaps I'll remove that sentence to avoid distraction. The example says it all anyway.

@@ -30,6 +30,9 @@ const std::string span_sampling_rule_rate = "_dd.span_sampling.rule_rate";
const std::string span_sampling_limit = "_dd.span_sampling.max_per_second";
const std::string w3c_extraction_error = "_dd.w3c_extraction_error";
const std::string trace_id_high = "_dd.p.tid";
const std::string process_id = "process_id";
const std::string language = "language";
const std::string runtime_id = "runtime-id";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if there were consistency between using hyphens and underscores
I know it's not something you chose but 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the exceptionalism of runtime-id is even addressed in the internal documents that describe it. Basically "it's old."

// "atfork" handler that reinitializes `process_id` in child processes. The
// `at_fork_in_child` callback must be a function pointer, and so refers to
// `cached_process_id` by name rather than keeping `&process_id` in a closure.
static int process_id = []() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 7 different concepts intertwined.
I can see how it works, but it's a journey to read and understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mosquito cried out in pain:
"A chemist has poisoned my brain!"
The cause of his sorrow
was para-dichlorodiphenyltrichloroethane

span.tags[tags::internal::origin] = *origin_;
}
span.numeric_tags[tags::internal::process_id] = cached_process_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have doubts about the tags / numeric tags separation for essentially random values.
IMO it'd be reasonable for things that could either be treated as counters, gauges or categories (ie: enumerated values) but a PID is none of those things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is somebody thought "it's a number, put it in the number thing."

Also worth remembering that numeric tags are floating point, not that it matters here.

@dgoffredo dgoffredo merged commit a0f9a74 into main Jan 26, 2023
@dgoffredo dgoffredo deleted the david.goffredo/unified-naming-conventions branch January 26, 2023 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants