-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use USER_KEEP/USER_REJECT for RuleSampler decisions #1769
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it seems good... but I think I don't have enough context to evaluate if this change makes sense.
Is there any design doc I could reference to confirm that this is the common behavior we want? E.g., it's a bit unclear to me why we're changing this now.
lib/ddtrace/sampler.rb
Outdated
if span.sampled | ||
# If priority sampling has already been applied upstream, use that, otherwise... | ||
unless priority_assigned_upstream?(span) | ||
# Roll the dice and determine whether how we set the priority. | ||
priority = priority_sample!(span) ? Datadog::Ext::Priority::AUTO_KEEP : Datadog::Ext::Priority::AUTO_REJECT | ||
# If priority sampling has already been applied upstream, use that value. | ||
return span.sampled if priority_assigned?(span) | ||
|
||
assign_priority!(span, priority) | ||
end | ||
# Check with post sampler how we set the priority. | ||
sample = priority_sample!(span) | ||
|
||
# Check if post sampler has already assigned a priority. | ||
return span.sampled if priority_assigned?(span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Can the value of span.sampled
be mutated as this method is being executed? If not, could we simply the early returns with return true if priority_assigned?(...)
to clarify what we expect from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot, I simplified it was suggested. 👍
Oops just realized that you DID had already dropped a link to the design doc in slack. Thanks. What I did also notice is that I believe our docs still need to be updated, e.g. this section. |
Codecov Report
@@ Coverage Diff @@
## master #1769 +/- ##
=======================================
Coverage 98.21% 98.21%
=======================================
Files 931 931
Lines 44870 44904 +34
=======================================
+ Hits 44069 44103 +34
Misses 801 801
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the clarifications. Here's my 👍 but I think this is one of those where it makes sense for @delner to doublecheck if we missed something, since there may be unknown unknowns hiding.
# NOTE: We do not advise using a pre-sampler. It can save resources, | ||
# but pre-sampling at rates < 100% may result in partial traces, unless | ||
# the pre-sampler knows exactly how to drop a span without dropping its ancestors. | ||
# | ||
# Additionally, as service metrics are calculated in the Datadog Agent, | ||
# the service's throughput will be underestimated. | ||
attr_reader :pre_sampler, :priority_sampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually curious, are there a lot of customers using this feature? Since it ends up being a very sharp edge...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's used in Ruby, but it is used by other language clients.
Because pre-smapler is the only way for the user to reduce traffic between ddtrace
and the agent, it's used by clients with super high throughput, as their infra wouldn't handle the trace traffic otherwise (mostly due to high load on the trace agent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall code changes seem fine, although it will inevitably have merge conflicts with 1.0 changes. Thank you!
This reverts commit dad5da9.
This PR changes changes how the sampling decisions yielded by the
Datadog::Sampling::RuleSampler
affect the span's_sampling_priority_v1
tag.Before,
0
(for rejecting) and1
(for keeping) were used forRuleSampler
decisions. The issue with this approach is that0
and1
are default sampling values that allow for the sampling decision to be altered downstream, either by the Agent or by the Datadog backend.0
and1
communicate intent, not a strong decision. These works great when the sampling rate is not strictly important: for example, it is normally desirable to keep an error span, even if it's marked with a sampling decision of0
, as that span will have valuable debugging information.The issue with
0
and1
is that they don't allow for strict control of sampling rates. When limiting the sampling rate is very important (e.g. network traffic constrains, strictly limiting billable costs), such sampling was not possible with theRuleSampler
, custom code snippets were required (e.g.Datadog::ForcedTracing.drop(span)
).After the changes in this PR, decisions that are originated from any user configuration to the
RuleSampler
use priority values-1
(for forced rejection) and2
(for forced keeping). This means thatRuleSampling
sampling configurations will always be strictly respected by the whole Datadog processing pipeline.It is possible for the
RuleSampler
to analyze a span and not make any user-configured decision (e.g. when no rules match, or not sampling rate applies). In this case, the priority is not altered.