Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

w3c: Adds p datadog member to tracestate #3488

Merged
merged 12 commits into from
Mar 20, 2024
Merged

w3c: Adds p datadog member to tracestate #3488

merged 12 commits into from
Mar 20, 2024

Conversation

mabdinur
Copy link
Contributor

@mabdinur mabdinur commented Feb 26, 2024

W3C Phase 2 support

The following operations will allow us to reconstruct traces that contain non datadog spans and use w3c tracecontext headers:

  • Injects last seen datadog span id in tracestate headers
  • Extracts last seen datadog span id and uses this value to set the _dd.parent_id distributed tracing tag

Tests: DataDog/system-tests#2181

2.0 Upgrade Guide notes

What does this PR do?

Motivation:

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@mabdinur mabdinur requested a review from a team as a code owner February 26, 2024 21:08
@mabdinur mabdinur marked this pull request as draft February 26, 2024 21:08
@mabdinur mabdinur force-pushed the munir/add-p branch 2 times, most recently from 6d0c425 to 9249d34 Compare February 27, 2024 21:34
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.23%. Comparing base (81e6060) to head (c9530f6).
Report is 51 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3488      +/-   ##
==========================================
- Coverage   98.23%   98.23%   -0.01%     
==========================================
  Files        1275     1275              
  Lines       75167    75175       +8     
  Branches     3548     3549       +1     
==========================================
+ Hits        73842    73848       +6     
- Misses       1325     1327       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mabdinur mabdinur marked this pull request as ready for review February 28, 2024 15:00
@marcotc
Copy link
Member

marcotc commented Feb 28, 2024

It looks like the span_id value in the traceparent:


will always be equal to the value of the new tag:
tracestate << "p:#{format('%016x', digest.span_id)};"

Doesn't this mean the new tagged introduced in this PR is actually redundant, given the same span_id is already captured in the main traceparent?

In other words, when can the traceparent span_id and the new tag span_id diverge?

@marcotc
Copy link
Member

marcotc commented Feb 28, 2024

I think I found my answer.

It looks like when both values diverge, when extracting remote distributed headers, we should use the value of the tag instead of the value from the traceparent:
https://github.com/DataDog/system-tests/blob/f2c546afe09dce0ad7319c0b6108908c378638da/tests/parametric/test_headers_tracecontext.py#L780-L781

We should add a unit test to cover this specific behaviour in spec/datadog/tracing/distributed/trace_context_spec.rb.

@mabdinur
Copy link
Contributor Author

mabdinur commented Feb 28, 2024

I think I found my answer.

It looks like when both values diverge, when extracting remote distributed headers, we should use the value of the tag instead of the value from the traceparent: https://github.com/DataDog/system-tests/blob/f2c546afe09dce0ad7319c0b6108908c378638da/tests/parametric/test_headers_tracecontext.py#L780-L781

We should add a unit test to cover this specific behaviour in spec/datadog/tracing/distributed/trace_context_spec.rb.

Ideally both values should never diverge. When we inject distributed tracing headers, traceparent and x-datadog headers SHOULD always be consistent (if they are not then this is a bug we need to fix).

We are adding this new dd member to detect the following scenario:

  • Host A: Injects tracecontext headers via ddtrace -> Host B: Injects/Extracts tracecontext headers via another vendor (ex: Dynatrace) -> Host C: Extracts tracecontext headers via ddtrace

In the scenario above Datadog Trace Intake services receive spans from Host A and Host C (host B uses Dynatrace). Although Host A and Host C generate spans in the same trace the relationship between both trace chunks are lost. By updating the tracestate to include the span_id we can ensure third party vendors do not "break" trace propagation for Datadog spans. The span_id in the tracestate is extracted as a distributed tracing tag and sent to the backend/agent. This will allow Datadog UI to better visualize the trace.

This big trade off here is that we are adding 20 new characters to every tracestate header.

tldr: trace parent will always contain 00-current_trace_id-last_span_id-sampling. We will use the tracestate to track the last datadog span id ( last_span_id does not always equal to last_datadog_span_id)

@marcotc
Copy link
Member

marcotc commented Feb 29, 2024

@mabdinur makes sense, thank you!

I think my comment is still correct and should be implemented:

It looks like when both values diverge, when extracting remote distributed headers, we should use the value of the tag instead of the value from the traceparent: DataDog/system-tests@f2c546a/tests/parametric/test_headers_tracecontext.py#L780-L781

Copy link

@bouwkast bouwkast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, I just have a question on when exactly this would be adding the _dd.parent_id tag, I couldn't tell for sure if it was always being added even if datadog headers would have been used.

lib/datadog/tracing/distributed/trace_context.rb Outdated Show resolved Hide resolved
@mabdinur mabdinur changed the base branch from master to munir/add-is-remote March 15, 2024 14:08
@mabdinur mabdinur marked this pull request as draft March 18, 2024 16:56
Base automatically changed from munir/add-is-remote to 2.0 March 18, 2024 22:28
@mabdinur mabdinur marked this pull request as ready for review March 19, 2024 18:59
@mabdinur mabdinur requested a review from marcotc March 19, 2024 18:59
@mabdinur mabdinur merged commit e2504ea into 2.0 Mar 20, 2024
153 checks passed
@mabdinur mabdinur deleted the munir/add-p branch March 20, 2024 19:00
@TonyCTHsu TonyCTHsu modified the milestones: 2.0, 2.0.0.beta1 Mar 20, 2024
@TonyCTHsu TonyCTHsu added the 2.0 label Apr 12, 2024
@TonyCTHsu TonyCTHsu mentioned this pull request Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants