Span tags for "unified naming conventions" #20

dgoffredo · 2023-01-19T21:31:50Z

This revision adds the following tags to every span:

- language is hard-coded to "cpp".
- process_id is the current PID. It's cached and recalculated on fork.
- runtime-id is a pseudo-random UUID. It's cached and recalculated on fork.

From the long internal list of span tags, these are the ones that we weren't yet setting and that are not integration-specific.

Getting the process ID is platform-specific (if we continue to pretend to support Windows). The other platform-specific pieces were gethostname and pthread_atfork. I moved all of these into a new component platform_util.{h,cpp}.

Then, because of the UUID for runtime-id, I moved the pseudo-random facilities from id_generator.cpp into a new random.{h,cpp}.

With these changes, all platform-specific code (i.e. #ifdef _MSC_VER) is in platform_util.cpp, and all thread-local randomness stuff is in random.cpp.
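To illustrate the kind of isolation described above, here is a minimal sketch of a platform-specific process ID lookup. It is not the code from platform_util.cpp: only the name get_process_id (from the commit list) and the `_MSC_VER` guard come from this PR, while the function body, return type, and header choices are assumptions.

```cpp
#ifdef _MSC_VER
#include <windows.h>  // GetCurrentProcessId
#else
#include <unistd.h>  // getpid
#endif

// Hypothetical sketch: return the current process ID as an int.
// All platform differences stay inside this one function.
int get_process_id() {
#ifdef _MSC_VER
  return static_cast<int>(GetCurrentProcessId());
#else
  return static_cast<int>(::getpid());
#endif
}
```

Callers then never need their own `#ifdef`; they just call `get_process_id()`.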

- glibc does not cache the process ID in user space. So, as a (premature?) optimization, we cache the process ID and recalculate it whenever the process forks.
- The existing behavior of tagging each span with _dd.origin wasn't tested, so in testing this process ID change I also added a test for origin.
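The caching scheme in the first note can be sketched roughly as follows, for POSIX systems only. This is an illustration under assumptions, not the PR's implementation: cached_process_id is the name used in the diff, but `process_id_cache`, `refresh_process_id_in_child`, and `initialize_process_id_cache` are hypothetical names introduced here.

```cpp
#include <pthread.h>  // pthread_atfork
#include <unistd.h>   // getpid

namespace {

// Cached PID: written once at initialization and again in each forked child.
int process_id_cache = 0;

// Runs in the child after fork(). pthread_atfork takes a plain function
// pointer, so the handler reaches the cache by name, not through a closure.
void refresh_process_id_in_child() {
  process_id_cache = static_cast<int>(::getpid());
}

// Fill the cache and register the child handler. Returns the initial PID.
int initialize_process_id_cache() {
  process_id_cache = static_cast<int>(::getpid());
  // A registration failure is ignored in this sketch.
  (void)pthread_atfork(nullptr, nullptr, &refresh_process_id_in_child);
  return process_id_cache;
}

}  // namespace

int cached_process_id() {
  // A function-local static is initialized exactly once, thread-safely.
  static const int initialized = initialize_process_id_cache();
  (void)initialized;
  return process_id_cache;
}
```

After the first call, each lookup is a plain memory read, and a forked child sees its own PID because the handler runs before fork() returns in the child.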

cgilmour

As usual, some comments for consideration.

cgilmour · 2023-01-25T10:45:30Z

src/datadog/random.cpp

+
+  // Set "0100" for the most significant bits of the
+  // second-to-least-significant byte of `high`.
+  high[12] = 0;


In some ways this would be easier to reason about in reverse order.

dgoffredo · 2023-01-25

Yes, because we write numbers big endian (e.g. the comment just above the code). I'll consider changing it.
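For readers following the bit manipulation, here is a minimal sketch of building a version 4 UUID string from two 64-bit values. It follows the RFC 4122 layout and uses masks where the diff indexes individual bits (`high[12] = 0`). The names `uuid_from_bits` and `random_uuid`, and the choice of `std::mt19937_64`, are assumptions for illustration and may not match random.cpp.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Hypothetical helper: format `high` and `low` as a version 4 UUID.
std::string uuid_from_bits(std::uint64_t high, std::uint64_t low) {
  // Version: bits 15..12 of `high` become 0100, i.e. the most significant
  // bits of its second-to-least-significant byte.
  high = (high & ~std::uint64_t{0xF000}) | std::uint64_t{0x4000};
  // Variant: the two most significant bits of `low` become 10.
  low = (low & ~(std::uint64_t{3} << 62)) | (std::uint64_t{2} << 62);

  char buffer[37];  // 36 characters plus the terminating null
  std::snprintf(buffer, sizeof buffer, "%08x-%04x-%04x-%04x-%012llx",
                static_cast<unsigned>(high >> 32),
                static_cast<unsigned>((high >> 16) & 0xFFFF),
                static_cast<unsigned>(high & 0xFFFF),
                static_cast<unsigned>(low >> 48),
                static_cast<unsigned long long>(low & 0xFFFFFFFFFFFFULL));
  return buffer;
}

// Hypothetical helper: draw 128 pseudo-random bits and format them.
std::string random_uuid() {
  thread_local std::mt19937_64 engine{std::random_device{}()};
  const std::uint64_t high = engine();
  const std::uint64_t low = engine();
  return uuid_from_bits(high, low);
}
```

With all input bits zero, `uuid_from_bits(0, 0)` yields "00000000-0000-4000-8000-000000000000", which shows where the fixed version and variant bits land.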

cgilmour · 2023-01-25T10:45:39Z

src/datadog/random.cpp

+  // This means that we generate IDs with non-negative values that will always
+  // fit into an `int64_t`, which is a polite thing to do when you work with
+  // people who write Java.
+  std::uniform_int_distribution<std::int64_t> distribution_;


Not the biggest deal (there's still plenty of random bits involved) but doesn't this imply we're generating 1x or 2x 63-bit random values instead of 64?

dgoffredo · 2023-01-25

That's a good point. Also, I think that the "produce 63 bits instead of 64 because uint64 is hard" idea is no longer relevant. We could just use 64 bits everywhere. What do you think?
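The difference under discussion can be sketched as follows. A default-constructed `std::uniform_int_distribution<std::int64_t>` covers only non-negative values, so it yields 63 random bits, while the `std::uint64_t` version yields 64. The helper `clear_top_bit` is a hypothetical name showing one way to keep IDs within `int64_t` after generating all 64 bits, in the spirit of the later commits "generate 64 bits of randomness in a go, not 63" and "zero most significant bit of span IDs". Whether the merged code looks like this is an assumption.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Default-constructed, this covers [0, INT64_MAX]: 63 random bits.
using Dist63 = std::uniform_int_distribution<std::int64_t>;
// Default-constructed, this covers [0, UINT64_MAX]: all 64 bits.
using Dist64 = std::uniform_int_distribution<std::uint64_t>;

// Hypothetical helper: clear the most significant bit so that the value
// fits in a non-negative int64_t.
constexpr std::uint64_t clear_top_bit(std::uint64_t value) {
  return value & (std::numeric_limits<std::uint64_t>::max() >> 1);
}

// Produce 64 pseudo-random bits using a per-thread generator.
std::uint64_t random_uint64() {
  thread_local std::mt19937_64 engine{std::random_device{}()};
  thread_local Dist64 distribution;
  return distribution(engine);
}
```

A UUID can then use two full `random_uint64()` values, while a span ID can use `clear_top_bit(random_uint64())`.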

cgilmour · 2023-01-25T10:48:43Z

src/datadog/random.h

+std::uint64_t random_uint64();
+
+// Return a pseudo-random UUID in canonical string form as described in RFC
+// 4122. The result does not include the "urn:uuid:" prefix.  Example:


In the wild, I've never seen a urn prefix on a uuid. And hope that remains the same. I think its mention in the RFC is simply as an example of a specific usage rather than what is typical.

dgoffredo · 2023-01-25

Noted. Perhaps I'll remove that sentence to avoid distraction. The example says it all anyway.

cgilmour · 2023-01-25T10:52:45Z

src/datadog/tags.cpp

@@ -30,6 +30,9 @@ const std::string span_sampling_rule_rate = "_dd.span_sampling.rule_rate";
 const std::string span_sampling_limit = "_dd.span_sampling.max_per_second";
 const std::string w3c_extraction_error = "_dd.w3c_extraction_error";
 const std::string trace_id_high = "_dd.p.tid";
+const std::string process_id = "process_id";
+const std::string language = "language";
+const std::string runtime_id = "runtime-id";


It'd be nice if there were consistency between using hyphens and underscores.
I know it's not something you chose, but 🤷

dgoffredo · 2023-01-25

Yes, the exceptionalism of runtime-id is even addressed in the internal documents that describe it. Basically "it's old."

cgilmour · 2023-01-25T10:56:36Z

src/datadog/trace_segment.cpp

+  // "atfork" handler that reinitializes `process_id` in child processes. The
+  // `at_fork_in_child` callback must be a function pointer, and so refers to
+  // `cached_process_id` by name rather than keeping `&process_id` in a closure.
+  static int process_id = []() {


This is 7 different concepts intertwined.
I can see how it works, but it's a journey to read and understand.

dgoffredo · 2023-01-25

A mosquito cried out in pain:
"A chemist has poisoned my brain!"
The cause of his sorrow
was para-dichlorodiphenyltrichloroethane

cgilmour · 2023-01-25T11:00:22Z

src/datadog/trace_segment.cpp

      span.tags[tags::internal::origin] = *origin_;
    }
+    span.numeric_tags[tags::internal::process_id] = cached_process_id();


I still have doubts about the tags / numeric tags separation for essentially random values.
IMO it'd be reasonable for things that could be treated as counters, gauges, or categories (i.e. enumerated values), but a PID is none of those things.

dgoffredo · 2023-01-25

My guess is somebody thought "it's a number, put it in the number thing."

Also worth remembering that numeric tags are floating point, not that it matters here.
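To make the floating-point remark concrete, here is a small sketch, assuming numeric tags are stored as `double` (the thread says only "floating point"). `NumericTags`, `tag_process_id`, and `process_id_from` are hypothetical names, not the library's API.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a span's numeric tags, assuming `double` values.
using NumericTags = std::unordered_map<std::string, double>;

// Store a PID as a numeric tag. Every 32-bit int is exactly representable
// as a double, so no precision is lost here.
void tag_process_id(NumericTags& tags, int pid) {
  tags["process_id"] = static_cast<double>(pid);
}

// Read the PID back. Throws std::out_of_range if the tag is absent.
int process_id_from(const NumericTags& tags) {
  return static_cast<int>(tags.at("process_id"));
}
```

The round trip is lossless for PIDs, but it would not be for arbitrary 64-bit IDs: a `double` has a 53-bit significand, so 2^53 + 1 rounds to 2^53.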

dgoffredo added 7 commits January 19, 2023 13:37

net_util -> platform_util

52c3dd5

get_process_id()

ba67373

move pthread_atfork into platform_util.cpp

4d8e85a

Tag every span with the language "cpp".

09d88ce

move random number generation into its own component

2041618

Tag every span with a resource-id UUID.

a30e2b9

dgoffredo requested a review from cgilmour January 19, 2023 21:31

dgoffredo and others added 9 commits January 19, 2023 16:32

update includes graph

c197d84

missed a spot

f92f877

fix typo

b2b8bcf

be consistent with the order

cadea37

wrong again!

3ede6cf

inline some at_fork handlers

4e93a1b

remove unnecessary includes

2fb8704

comment the cached_* functions

d2ef654

fix tag name: error.msg -> error.message

ee6a565

cgilmour approved these changes Jan 25, 2023

dgoffredo added 6 commits January 25, 2023 19:02

remove unnecessary caveat

86fa38a

delambdify

7622b92

generate 64 bits of randomness in a go, not 63

e7600a1

modify bits high-to-low

d99a37b

zero most significant bit of 64-bit trace IDs

f9650f3

zero most significant bit of span IDs

bd1037e

dgoffredo merged commit a0f9a74 into main Jan 26, 2023

dgoffredo deleted the david.goffredo/unified-naming-conventions branch January 26, 2023 23:48
