Rule sampler #854

Merged: 32 commits merged into master from feat/new-sampler, Nov 29, 2019

Conversation

@marcotc marcotc (Member) commented Nov 7, 2019

This PR adds support for rule-based sampling of traces.

In its most complete form, here's an example configuration:

Datadog.configure do |c|
  c.tracer sampler: Datadog::PrioritySampler.new(
    post_sampler: Datadog::Sampling::RuleSampler.new(
      [
        Datadog::Sampling::SimpleRule.new(name: 'operation.name', sample_rate: 0.9),
        Datadog::Sampling::SimpleRule.new(service: 'service-1', sample_rate: 0.5),
        Datadog::Sampling::SimpleRule.new(service: /service-.*/, name: 'db.select', sample_rate: 0.7),
      ],
      default_sampler: Datadog::RateSampler.new(1.0),
      rate_limiter: Datadog::Sampling::TokenBucket.new(1000),
    )
  )
end

Or for 100% sampling with 100 traces/sec limit:

Datadog.configure do |c|
  c.tracer sampler: Datadog::PrioritySampler.new(
    post_sampler: Datadog::Sampling::RuleSampler.new
  )
end

@marcotc marcotc added the core (Involves Datadog core libraries) and feature (Involves a product feature) labels Nov 7, 2019
@marcotc marcotc self-assigned this Nov 7, 2019
@delner delner added this to In progress in Active work Nov 21, 2019
class SimpleMatcher < Matcher
  # Returns `true` for case equality (===) with any object
  MATCH_ALL = Class.new do
    # DEV: A class that implements `#===` is ~20% faster than
Member:
have a link? this is neat.

@marcotc marcotc (Member Author), Nov 25, 2019:
I tested it locally while developing this class; I can upload a gist with a benchmark snippet.

Contributor:
Interesting idea; I could benchmark it and post results.
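
For reference, a minimal benchmark sketch of the idea discussed here, assuming the comparison is against a proc-based matcher (names and numbers are illustrative, not taken from this PR):

require 'benchmark'

# An object that matches anything via case equality (#===),
# similar in spirit to the MATCH_ALL constant above.
match_all_class = Class.new do
  def ===(_other)
    true
  end
end.new

# A proc also responds to #===, invoking itself with the argument.
match_all_proc = proc { |_other| true }

n = 1_000_000
Benchmark.bm(12) do |x|
  x.report('class #===') { n.times { match_all_class === 'operation.name' } }
  x.report('proc #===')  { n.times { match_all_proc === 'operation.name' } }
end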

lib/ddtrace/sampling/rule.rb (resolved conversation, outdated)

attr_reader :rules, :rate_limiter, :priority_sampler

def initialize(rules, rate_limiter, priority_sampler = Datadog::RateByServiceSampler.new)
Member:
we would need to make sure this priority sampler gets its rates updated from responses from the agent.

@marcotc marcotc (Member Author), Nov 25, 2019:
That's taken care of by delegating :update to that sampler, like so, further down this class:

def_delegators :@priority_sampler, :update

This is similar to how Datadog::PrioritySampler accomplishes the same goal.
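
For illustration, a minimal sketch of that delegation pattern, assuming Forwardable (simplified, not the full class from this PR):

require 'forwardable'

class RuleSampler
  extend Forwardable

  def initialize(rules, rate_limiter, priority_sampler)
    @rules = rules
    @rate_limiter = rate_limiter
    @priority_sampler = priority_sampler
  end

  # Agent-provided rate updates are forwarded to the underlying
  # priority sampler, which knows how to apply them.
  def_delegators :@priority_sampler, :update
end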

lib/ddtrace/sampling/rule_sampler.rb (resolved conversation, outdated)
draft.rb (outdated):
Datadog::PrioritySampler.new(
  post_sampler: Datadog::RateByServiceSampler.new(
    1.0,
    env: proc { Datadog.tracer.tags[:env] } # TODO: how do I provide `tracer.tags`? Seems like a circular reference here.
@marcotc marcotc (Member Author):
Initializing a fallback PrioritySampler is not trivial.
Maybe we could use an interface in our tracer that allows us to append or prepend a sampler to a sampling chain, so that we could prepend our RuleSampler before the existing PrioritySampler, and avoid having all this complex initialization information outside of the tracer.
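
Something like the following hypothetical interface is the idea being floated; prepend_sampler does not exist in the tracer today, it is purely a sketch:

# Hypothetical sketch only: this API does not exist in ddtrace.
class Tracer
  def initialize(sampler: Datadog::PrioritySampler.new)
    @sampler_chain = [sampler]
  end

  # Prepend a sampler so it runs before the existing chain,
  # keeping the complex fallback initialization inside the tracer.
  def prepend_sampler(sampler)
    @sampler_chain.unshift(sampler)
  end
end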


# TODO: This class name is so bad, yet so good.
# [Class documentation]
class UnlimitedLimiter < RateLimiter
@delner delner (Contributor), Nov 25, 2019:
Haha, great name. If you want a different name my suggestions are just as quirky: NonLimiter, NoLimiter, NeverLimiter. But honestly, UnlimitedLimiter works just fine.

I might suggest though that this behavior for a limiter shouldn't be a Limiter but a composable module for a Limiter (e.g. RateLimiter.new.extend(UnlimitedRate)), because an unlimited limiter isn't a distinct species of limiter, just one that behaves a particular way.

In the same line of thinking, your default initializer for RuleSampler could be rate_limiter = default_rate_limiter, then you could define:

def default_rate_limiter
  Datadog::Sampling::RateLimiter.new.tap do |limiter|
    limiter.extend(Datadog::Sampling::RateLimiter::UnlimitedRate)
  end
end

Maybe there's a more elegant way of expressing the same thing, but just a general thought.

@marcotc marcotc (Member Author):
I do see the point you are making, especially if we had something like a DebugRateLimiter that prints all operations to stdout: we would be able to add this functionality to an existing rate limiter as a mixin.

But for the case of UnlimitedLimiter, I think that being a subclass of RateLimiter makes more sense, as it pretty much overrides all behaviour of RateLimiter and replaces it with its own. I see it less as a composable feature-set and more as a whole type of limiter in itself. When overlaying it with the token bucket implementation, for example, it removes all token bucket functionality and replaces it with its own implementation.
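
A minimal sketch of the subclass approach being argued for here (the method names below are assumptions for illustration, not necessarily the ones in this PR):

class RateLimiter
  # Checks whether a message of the given size conforms to the rate limit.
  def allow?(size)
    raise NotImplementedError
  end

  # The effective rate at which messages are currently allowed through.
  def effective_rate
    raise NotImplementedError
  end
end

# A limiter that never limits: every message is allowed and the
# effective rate is always 100%.
class UnlimitedLimiter < RateLimiter
  def allow?(_size)
    true
  end

  def effective_rate
    1.0
  end
end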

Contributor:
Alright, it's probably fine using subclasses here, too. Just wanted to consider our options, but I'm cool with this.

private

def sample_span(span)
  sampled, sample_rate = @rules.find do |rule|
Contributor:
This loop is a little awkward because you have to find and retrieve two different values from the rule. It may be possible to eliminate Rule#sample entirely, and simplify things. If we were to change Rule to delegate to Matcher#match?, Sampler#sample?, Sampler#sample_rate, then you could rewrite this function as:

def sample_span(span)
  rule = @rules.find do |rule|
    rule.matches?(span)
  end

  return yield(span) unless rule

  sampled = rule.sample?(span)
  sample_rate = rule.sample_rate(span)
  [sampled && rate_limiter.allow?(1), sample_rate]
end

As a general rule of thumb, it might be better for components like Rule to have very simple methods that do as little as possible and have more complex, compositional objects such as this RuleSampler orchestrate and drive those smaller components.

There might be even better ways of doing this than what I've suggested, but just some food for thought.

@marcotc marcotc (Member Author):
There is one trade-off with this approach which is that custom rules might want to provide a coupled result between the sampling decision (sample?) and sample_rate. In this case we are making it harder to implement such rules:

class CustomRule < Rule
  def sample?(span)
    case span.service
    when 'service-1'
      span.name != 'ignore.op'
    when 'service-2'
      span.start_time.hour > 7
    end
  end

  def sample_rate(span)
    case span.service
    when 'service-1'
      0.8
    when 'service-2'
      0.9
    end
  end
end

Instead of returning a "response" payload with both together:

Rule.new do |span|
  case span.service
  when 'service-1'
    [span.name != 'ignore.op', 0.8]
  when 'service-2'
    [span.start_time.hour > 7, 0.9]
  end
end

After reviewing this, it seems like an ergonomic decision that we wouldn't prioritize to the detriment of the maintainability of RuleSampler. Thus I made changes that mostly follow your suggestion.

lib/ddtrace/sampler.rb (resolved conversation, outdated)
lib/ddtrace/sampler.rb (resolved conversation, outdated)
lib/ddtrace/sampling/rule_sampler.rb (resolved conversation, outdated)
lib/ddtrace/sampling/rule_sampler.rb (resolved conversation)
sampled = sample_span(span) { |s| @fallback_sampler.sample!(s) }

sampled.tap do
  span.sampled = sampled
Member:
for here, we always want span.sampled = true; we want to do:

span.sampled = true
span.context.sampling_priority = sampled ? Datadog::Ext::Priority::AUTO_KEEP : Datadog::Ext::Priority::AUTO_REJECT

@marcotc marcotc (Member Author):
Our samplers can be used as stand-alone samplers, as well as composed together.
When using RuleSampler as a stand-alone sampler, it will make hard decisions.

When using it alongside the PrioritySampler, the sampling_priority and final sampling decision are taken care of by the priority sampler, which will enforce the conditions you mentioned.

PrioritySampler is currently the only sampler in our code base that is aware of sampling_priority. All other samplers only make a boolean decision and are not concerned with details around what flags to set in the span.
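
As a concrete illustration of the composed usage described here, mirroring the configuration example at the top of this PR:

# RuleSampler makes the boolean decision; PrioritySampler translates it
# into a sampling_priority on the trace context.
Datadog.configure do |c|
  c.tracer sampler: Datadog::PrioritySampler.new(
    post_sampler: Datadog::Sampling::RuleSampler.new(
      [Datadog::Sampling::SimpleRule.new(service: 'service-1', sample_rate: 0.5)]
    )
  )
end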

end
end

def_delegators :@fallback_sampler, :update
Member:
we can no-op :update; we don't need to consider agent-returned sample rates at all here.

@marcotc marcotc (Member Author):
This is effectively a no-op, as most sampler implementations we delegate this call to are no-ops.

I think we should keep this delegation, as we currently support RateByServiceSampler in our app. We could remove it if we drop support for that sampler in the future.

lib/ddtrace/sampling/token_bucket.rb (resolved conversation, outdated)
-def priority_sample(span)
-  @priority_sampler.sample?(span)
+def priority_sample!(span)
+  @priority_sampler.sample!(span)
@marcotc marcotc (Member Author), Nov 26, 2019:
We change this to use sample! instead of sample? to allow @priority_sampler to add tags to the span.
In our case, we want to set '_dd.rule_psr' and '_dd.limit_psr' when sampling happens.

Contributor:
Gotcha, makes sense.

@marcotc marcotc changed the title from "[WIP] New sampler" to "Rule sampler" Nov 26, 2019
@marcotc marcotc marked this pull request as ready for review November 27, 2019 00:08
@marcotc marcotc requested a review from a team November 27, 2019 00:08
@marcotc marcotc requested a review from delner November 27, 2019 17:06
  span.set_metric(SAMPLE_RATE_METRIC_KEY, pre_sample_rate_metric) # Restore true sampling metric
else
  span.clear_metric(SAMPLE_RATE_METRIC_KEY)
end
@marcotc marcotc (Member Author):
This class is growing in complexity in order to support the RuleSampler as a post_sampler.
It seems nice to keep all priority sampling concerns here, and not have any other sampler worry about it, but I can see an argument for moving away from having the current PrioritySampler try to do it all.

I'm open to suggestions for different approaches that could simplify this.

The current implementation in this PR works, but trying to maintain compatibility with all existing samplers, alongside the new rule sampler, has created an arms-race for the PrioritySampler class. I believe this implementation will be much simpler if we move away from having an agent one day.

Contributor:
In the interest of time, we might want to keep what we have here if it's working, and refactor in the future. I would just make sure you have sufficient tests to verify all these edge cases and conditions introduced by this complexity such that when we do refactor later, we won't introduce any bugs to the sampling logic. If we can do that, then I think it's okay to let this be.

delner previously approved these changes Nov 28, 2019

@delner delner (Contributor) left a comment:
Looks good @marcotc! The logic for all these layered sampling rules is a bit complicated, but I think it was expressed fairly cleanly and minimally in your implementation given what the rules are.

My only other suggestion is making sure we have enough tests that verify all the different scenarios/rules we can reasonably anticipate, such that if we do need to modify this code again in the near future, it is easier to refactor and less likely to introduce any bugs/incorrect behavior. Some basic integration/feature tests might be good to that end.

This is probably the most important point, but I leave it to your judgment if we've met that bar here. Once you feel like you have, feel free to merge this one. Nice job overall!

lib/ddtrace/sampling/matcher.rb (resolved conversation, outdated)

attr_reader :matcher, :sampler

# @param [Matcher] matcher A matcher to verify span conformity against
Contributor:
I'm loving these comments you added to these functions! 💯


@delner delner moved this from In progress to In review in Active work Nov 28, 2019
@marcotc marcotc (Member Author) commented Nov 28, 2019:
@delner I added integration tests to this PR recently; they are under spec/ddtrace/integration_spec.rb.
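
For a sense of shape only, a rough sketch of what such an integration test could look like (hypothetical example, not the actual contents of integration_spec.rb):

require 'spec_helper'

RSpec.describe 'rule-based sampling' do
  it 'keeps traces that match a rule with a 100% sample rate' do
    Datadog.configure do |c|
      c.tracer sampler: Datadog::PrioritySampler.new(
        post_sampler: Datadog::Sampling::RuleSampler.new(
          [Datadog::Sampling::SimpleRule.new(name: 'test.op', sample_rate: 1.0)]
        )
      )
    end

    span = Datadog.tracer.trace('test.op')
    span.finish

    expect(span.context.sampling_priority).to eq(Datadog::Ext::Priority::AUTO_KEEP)
  end
end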

@marcotc marcotc (Member Author) commented Nov 28, 2019:
@delner, I added test coverage for the sampler.rb changes. Everything introduced in this PR should be covered now.

I added a few extra tests to cover Sampler#sampling_rate that were missing for existing functionality.

@palazzem palazzem (Contributor) left a comment:
Great job! Thank you also for describing the API. I think this is a good start. If we need to improve the API to make it simpler (in the future), we have all the building blocks to hide the complexity.

Thank you very much @delner @marcotc @brettlangdon

@marcotc marcotc merged commit 94b47a6 into master Nov 29, 2019
Active work automation moved this from In review to Merged & awaiting release Nov 29, 2019
@marcotc marcotc deleted the feat/new-sampler branch November 29, 2019 15:47
elsif rate_limit
  Datadog::Sampling::TokenBucket.new(rate_limit)
else
  Datadog::Sampling::TokenBucket.new(100)
Member:
The default here should be no rate limiter.

span.sampled = true
if pre_sample_rate_metric
  # Restore true sampling metric, as only the @pre_sampler can reject traces
  span.set_metric(SAMPLE_RATE_METRIC_KEY, pre_sample_rate_metric)
Member:
This shouldn't be unset by any sampler, right?

Member:
It also shouldn't be set by any other sampler either

Member:
This is only needed for samplers which cause traces to not be sent to the agent (span.sampled = false).

@marcotc marcotc (Member Author):
The default "post_sampler", RateByServiceSampler:

@priority_sampler = opts[:post_sampler] || RateByServiceSampler.new

ultimately delegates the sampling decision to an instance of RateSampler. That sampler sets this metric whenever it successfully samples:
(span.sampled = sample?(span)).tap do |sampled|
  span.set_metric(SAMPLE_RATE_METRIC_KEY, @sample_rate) if sampled
end

In this case we restore it to the true sampling metric value, stored under pre_sample_rate_metric.
No sampler today unsets this metric, but the code is in place for completeness.

Backstory

Ultimately, the current contract interface we have for samplers (sample? for no side-effects, and sample! for true sampling decision with side-effects) does not correctly allow the samplers to be used as both:

  1. Top-level sampler: which ultimately makes the sampling decision, possibly setting metrics and tags.
  2. Chained sampler: provide a decision, but without side-effects. One example is the "post_sampler" in PrioritySampler above.

We have cases where we want "some" side-effects, like in the RuleSampler when used in a sampler chain: it needs to set rule-specific metrics (e.g. rule sampling ratio metric), but not actually set the sampling decision. It's a "middle-ground" between the sample! and sample? of today.
I chose to go with an implementation that reduces changes to the existing samplers because:

  • I did not have a clear vision on how to have a clean contract to represent all these requirements I mentioned above in a concise package.
  • I did not want to perform a major refactoring on the existing samplers, introducing risk to users of the existing samplers.
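
For illustration, the sample?/sample! contract described above as a minimal sketch (the metric key and rate here are placeholders, not values from this PR):

class ExampleSampler
  # sample?: pure decision, no side effects on the span.
  def sample?(span)
    rand < 0.5
  end

  # sample!: makes the decision and applies side effects
  # (flags, metrics) to the span.
  def sample!(span)
    sampled = sample?(span)
    span.sampled = sampled
    span.set_metric('_dd.example_psr', 0.5) if sampled
    sampled
  end
end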

@marcotc marcotc added this to the 0.30.0 milestone Dec 4, 2019
@delner delner moved this from Merged & awaiting release to Released in Active work Mar 5, 2020