-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rule sampler #854
Rule sampler #854
Conversation
class SimpleMatcher < Matcher | ||
# Returns `true` for case equality (===) with any object | ||
MATCH_ALL = Class.new do | ||
# DEV: A class that implements `#===` is ~20% faster than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have a link? this is neat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it locally while developing this class, I can upload a gist a benchmark snippet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea; I could benchmark it and post results.
lib/ddtrace/sampling/rule_sampler.rb
Outdated
|
||
attr_reader :rules, :rate_limiter, :priority_sampler | ||
|
||
def initialize(rules, rate_limiter, priority_sampler = Datadog::RateByServiceSampler.new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would need to make sure this priority sampler gets it's rates updated from responses from the agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's taken take by delegating :update
to that sampler, like so further down this class:
def_delegators :@priority_sampler, :update
This is similar to how Datadog::PrioritySampler
accomplishes the same goal.
draft.rb
Outdated
Datadog::PrioritySampler.new( | ||
post_sampler: Datadog::RateByServiceSampler.new( | ||
1.0, | ||
env: proc { Datadog.tracer.tags[:env] } # TODO: how do I provide `tracer.tags`? Seems like a circular reference here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initializing a fallback PrioritySampler
is not trivial.
Maybe we could use an interface in our tracer that allows us to append or prepend a sampler to a sampling chain, so that we could prepend our RuleSampler
before the existing PrioritySampler
, and avoid having all this complex initialization information outside of the tracer.
lib/ddtrace/sampling/token_bucket.rb
Outdated
|
||
# TODO: This class name is so bad, yet so good. | ||
# [Class documentation] | ||
class UnlimitedLimiter < RateLimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, great name. If you want a different name my suggestions are just as quirky: NonLimiter
, NoLimiter
, NeverLimiter
. But honestly, UnlimitedLimiter
works just fine.
I might suggest though that this behavior for a limiter shouldn't be a Limiter
but a composable module for a Limiter
(e.g. RateLimiiter.new.extend(UnlimitedRate)
) because an unlimited limiter isn't a distinct species of limiter, just one that behaves a particular way.
In the same line of thinking, your default initializer for RuleSampler
could be rate_limiter = default_rate_limiter
, then you could define:
def default_rate_limiter
Datadog::Sampling::RateLimiter.new.tap do |limiter|
limiter.extend(Datadog::Sampling::RateLimiter::UnlimitedRate)
end
end
Maybe there's a more elegant way of expressing the same thing, but just a general thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see the point you are making, specially if we had something like a DebugRateLimiter
that prints all operations to stdout: we would be able to add this functionality to an existing rate limiter as a mixin.
But for the case of UnlimitedLimiter
, I think that being subclass of RateLimiter
makes more sense, as it pretty much overrides all behaviour of RateLimiter
and replaces with its own. I see it less as a composable feature-set and more as a whole type of limiter in itself. When overlaying it with the token bucket implementation, for example, it removes all token bucket functionality and replaces with its own implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, its probably fine using subclasses here, too. Just wanted to consider our options, but I'm cool with this.
lib/ddtrace/sampling/rule_sampler.rb
Outdated
private | ||
|
||
def sample_span(span) | ||
sampled, sample_rate = @rules.find do |rule| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop is a little awkward because you have to find and retrieve two different values from the rule. It may be possible to eliminate Rule#sample
entirely, and simplify things. If we were to change Rule
to delegate to Matcher#match?
, Sampler#sample?
, Sampler#sample_rate
, then you could rewrite this function as:
def sample_span(span)
rule = @rules.find do |rule|
rule.matches?(span)
end
yield(span) unless rule
sampled = rule.sample?(span)
sample_rate = rule.sample_rate(span)
[sampled && rate_limiter.allow?(1), sample_rate]
end
As a general rule of thumb, it might be better for components like Rule
to have very simple methods that do as little as possible and have more complex, compositional objects such as this RuleSampler
orchestrate and drive those smaller components.
There might be even better ways of doing this than what I've suggested, but just some food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one trade-off with this approach which is that custom rules might want to provide a coupled result between the sampling decision (sample?
) and sample_rate. In this case we are making it harder to implement such rules:
class CustomRule < Rule
def sample?(span)
case span.service
when 'service-1'
span.name != 'ignore.op'
when 'service-2'
span.start_time.hour > 7
end
end
def sample_rate(span)
case span.service
when 'service-1'
0.8
when 'service-2'
0.9
end
end
end
Instead of returning a "response" payload with both together:
Rule.new do |span|
case span.service
when 'service-1'
[span.name != 'ignore.op', 0.8]
when 'service-2'
[span.start_time.hour > 7, 0.9]
end
end
After reviewing this, it seems like an ergonomic decision that we wouldn't prioritize in detriment of the maintainability of RuleSampler
. Thus I made changes that mostly follow your suggestion.
sampled = sample_span(span) { |s| @fallback_sampler.sample!(s) } | ||
|
||
sampled.tap do | ||
span.sampled = sampled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for here, we always want span.sampled = true
, we want to do:
span.sampled = true
span.context.sampling_priority = sampled ? Datadog::Ext::Priority::AUTO_KEEP : Datadog::Ext::Priority::AUTO_REJECT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our samplers can be used as stand-alone samplers, as well as composed together.
When using RuleSampler
as a stand-alone sampler, it will make hard decisions.
When using it alongside the PrioritySampler
, the sampling_priority and final sampling decision are taken care of by the priority sampler, which will enforce the conditions you mentioned.
PrioritySampler
is currently the sampler in our code base that is aware of sampling_priority
. All other samplers only make a boolean decision and are not concerned with details around what flags to set in the span
.
lib/ddtrace/sampling/rule_sampler.rb
Outdated
end | ||
end | ||
|
||
def_delegators :@fallback_sampler, :update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can no-op
:update
, we don't need to consider agent returned sample rates at all here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is effectively a no-op, as most sampler implementations we delegate this call to are no-ops.
I think we should keep this delegate as we currently support RateByServiceSampler
in our app. We could remove update
it if we drop support for it in the future.
lib/ddtrace/sampler.rb
Outdated
def priority_sample(span) | ||
@priority_sampler.sample?(span) | ||
def priority_sample!(span) | ||
@priority_sampler.sample!(span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We change this to use sample!
instead of sample?
to allow for @priority_sampler
to add tags to the span.
In our case, we want to set '_dd.rule_psr' and '_dd.limit_psr' when sampling happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, makes sense.
lib/ddtrace/sampler.rb
Outdated
span.set_metric(SAMPLE_RATE_METRIC_KEY, pre_sample_rate_metric) # Restore true sampling metric | ||
else | ||
span.clear_metric(SAMPLE_RATE_METRIC_KEY) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is growing in complexity in order to support the RuleSampler
as a post_sampler
.
It seems nice to keep all priority sampling concerns here, and not have any other sampler worry about it, but I can see an argument for moving away from having the current PrioritySampler
try to do it all.
I'm open to suggestions for different approaches to simplify this approach.
The current implementation in this PR works, but trying to maintain compatibility with all existing samplers, alongside the new rule sampler, has created an arms-race for the PrioritySampler
class. I believe this implementation will be much simpler if we move away from having an agent one day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of time, we might want to keep what we have here if its working, and refactor in the future. I would just make sure you have sufficient tests to verify all these edge cases and conditions introduced by this complexity such that when we do refactor later, we won't introduce any bugs to the sampling logic. If we can do that, then I think its okay to let this be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @marcotc! The logic for all these layered sampling rules is a bit complicated, but I think it was expressed fairly cleanly and minimally in your implementation given what the rules are.
Only other suggestion is making sure we have enough tests that verify all the different scenarios/rules we can reasonably anticipate such that if we do need to modify this code again in the near future, it makes it easier to refactor and makes it less likely introduce any bugs/incorrect behavior. Some basic integration/feature tests might be good to such an end.
This is probably the most important point, but I leave it to your judgment if we've met that bar here. Once you feel like you have, feel free to merge this one. Nice job overall!
lib/ddtrace/sampler.rb
Outdated
span.set_metric(SAMPLE_RATE_METRIC_KEY, pre_sample_rate_metric) # Restore true sampling metric | ||
else | ||
span.clear_metric(SAMPLE_RATE_METRIC_KEY) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of time, we might want to keep what we have here if its working, and refactor in the future. I would just make sure you have sufficient tests to verify all these edge cases and conditions introduced by this complexity such that when we do refactor later, we won't introduce any bugs to the sampling logic. If we can do that, then I think its okay to let this be.
lib/ddtrace/sampler.rb
Outdated
def priority_sample(span) | ||
@priority_sampler.sample?(span) | ||
def priority_sample!(span) | ||
@priority_sampler.sample!(span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, makes sense.
|
||
attr_reader :matcher, :sampler | ||
|
||
# @param [Matcher] matcher A matcher to verify span conformity against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm loving these comments you added to these functions! 💯
lib/ddtrace/sampling/token_bucket.rb
Outdated
|
||
# TODO: This class name is so bad, yet so good. | ||
# [Class documentation] | ||
class UnlimitedLimiter < RateLimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, its probably fine using subclasses here, too. Just wanted to consider our options, but I'm cool with this.
@delner I added integration tests recently to this PR, they are under spec/ddtrace/integration_spec.rb. |
@delner added test coverage for the I added a few extra tests to cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Thank you also for describing the API. I think this is a good start. If we need to improve the API to make it simpler (in the future), we have all the building blocks to hide the complexity.
Thank you very much @delner @marcotc @brettlangdon
elsif rate_limit | ||
Datadog::Sampling::TokenBucket.new(rate_limit) | ||
else | ||
Datadog::Sampling::TokenBucket.new(100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default here should be no rate limiter.
span.sampled = true | ||
if pre_sample_rate_metric | ||
# Restore true sampling metric, as only the @pre_sampler can reject traces | ||
span.set_metric(SAMPLE_RATE_METRIC_KEY, pre_sample_rate_metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be unset by any sampler right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also shouldn't be set by any other sampler either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only needed for samplers which cause traces to not be send to the agent span.sampled = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default "post_sampler", RateByServiceSampler
:
dd-trace-rb/lib/ddtrace/sampler.rb
Line 192 in 94b47a6
@priority_sampler = opts[:post_sampler] || RateByServiceSampler.new |
ultimately delegate the sampling decision to an instance of
RateSampler
. That sampler sets this metric whenever it successfully samples: dd-trace-rb/lib/ddtrace/sampler.rb
Lines 71 to 73 in 94b47a6
(span.sampled = sample?(span)).tap do |sampled| | |
span.set_metric(SAMPLE_RATE_METRIC_KEY, @sample_rate) if sampled | |
end |
In case case we restore it to the true sampling metric value, store under pre_sample_rate_metric
.
No sampler today is unsetting this metric, but code is put in place for completeness.
Backstory
Ultimately, the current contract interface we have for samplers (sample?
for no side-effects, and sample!
for true sampling decision with side-effects) does not correctly allow the samplers to be used as both:
- Top-level sampler: which ultimately makes the sampling decision, possibly setting metrics and tags.
- Chained sampler: provide a decision, but without side-effects. One example is the "post_sampler" in
PrioritySampler
above.
We have cases where we want "some" side-effects, like in the RuleSampler
when used in a sampler chain: it needs to set rule-specific metrics (e.g. rule sampling ratio metric), but not actually set the sampling decision. It's a "middle-ground" between the sample!
and sample?
of today.
I chose to go with an implementation with reduce changes to the existing samplers because:
- I did not have a clear vision on how to have a clean contract to represent all these requirements I mentioned above in a concise package.
- I did not want to perform a major refactoring on the existing samplers, introduction risk to users of the existing sampler.
This PR adds support for rule-based sampling of traces.
In its most complete form, here's an example configuration:
Or for 100% sampling with 100 traces/sec limit: