
Single Span Sampling #2128

Merged: 51 commits merged into master from feature-single-span-sampling on Sep 6, 2022

Conversation

Member

@marcotc marcotc commented Jul 5, 2022

This PR adds support for single span sampling to the tracer.

Single Span Sampling allows you to configure sampling rules that keep individual spans even when their respective traces are dropped by trace-level sampling.

It is configured through the documented environment variables: DD_SPAN_SAMPLING_RULES, DD_SPAN_SAMPLING_RULES_FILE.
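
As an example (illustrative values only; the field names follow the documented span sampling rule format, while the service and operation names are placeholders), a single rule could look like:

DD_SPAN_SAMPLING_RULES='[{"service": "my-service", "name": "http.request", "sample_rate": 1.0, "max_per_second": 50}]'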

All changes in this feature branch have been individually reviewed.

@marcotc marcotc added the feature Involves a product feature label Jul 5, 2022
@marcotc marcotc self-assigned this Jul 5, 2022
@marcotc marcotc marked this pull request as ready for review July 29, 2022 21:09
@marcotc marcotc requested a review from a team July 29, 2022 21:09
Contributor

@delner delner left a comment

Overall looking good. Some questions.

Only hard ask here is to do some performance testing. I'm particularly interested in:

100+ spans per trace, all traces dropped, rule configured to keep one span

vs

100+ spans per trace, all traces kept, no rules evaluated

This should give us the relative cost increase in a worst case.
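
Something along these lines would work as a rough skeleton for that comparison (a sketch only, using benchmark-ips; build_trace and flush_trace are stand-ins for the tracer's real trace construction and flush path, not part of this PR):

require 'benchmark/ips'

# Stand-in helpers: the real benchmark should build and flush traces through
# the tracer itself; these exist only to make the skeleton runnable.
def build_trace(span_count, keep:)
  { spans: Array.new(span_count) { |i| "span.#{i}" }, keep: keep }
end

def flush_trace(trace)
  trace[:spans].each { |_span| } # sampling + flush work would happen here
end

Benchmark.ips do |x|
  x.report('100 spans, trace dropped, rule keeps one span') do
    flush_trace(build_trace(100, keep: false))
  end

  x.report('100 spans, trace kept, no rules configured') do
    flush_trace(build_trace(100, keep: true))
  end

  x.compare!
end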

@@ -75,6 +77,7 @@ def build_tracer(settings, agent_settings)
enabled: settings.tracing.enabled,
trace_flush: trace_flush,
sampler: sampler,
span_sampler: build_span_sampler(settings),
Contributor

Don't love that span_sampler is its own component within the tracer, but it seems better to have smaller, simpler responsibilities than to have one sampler that's complex.

# These rules allow a span to be kept when its encompassing trace is dropped.
#
# The syntax for single span sampling rules can be found here:
# TODO: <Single Span Sampling documentation URL here>
Contributor

Good call out: let's update this when we can.

  def get_trace(trace_op)
-   trace_op.flush!
+   trace_op.flush! do |spans|
+     spans.select! { |span| single_sampled?(span) } unless trace_op.sampled?
Contributor

Seems like this will add some cost: having to iterate over each span in each dropped trace. Off the top of my head, I'm not sure how we avoid this, but it's worth noting with regard to possible performance impact.

Fine for now; let's just measure the performance of this before merging, if possible.

Member Author

Benchmarks show it's not measurably slower unless single span sampling actually matches a span, which triggers 3 set_tag operations; those are a bit slow.

Benchmarks have been added, and the results are attached to this PR as a comment.
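
For reference, the matching path boils down to something like the sketch below. The tag names follow the span sampling conventions used across Datadog tracers; the class and accessor names here (SpanSampler, rule.sample_rate, rule.rate_limit) are illustrative rather than the exact API in this PR:

class SpanSampler # illustrative name, not necessarily the class in this PR
  def initialize(rules)
    @rules = rules
  end

  # Find the first matching rule; if it samples the span, tag it as kept.
  def sample!(span)
    rule = @rules.find { |r| r.match?(span) }
    return false unless rule && rule.sample?(span)

    # The "3 set_tag operations" mentioned above (mechanism 8 = single span sampling).
    span.set_tag('_dd.span_sampling.mechanism', 8)
    span.set_tag('_dd.span_sampling.rule_rate', rule.sample_rate)
    span.set_tag('_dd.span_sampling.max_per_second', rule.rate_limit)
    true
  end
end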

@@ -20,6 +20,9 @@ class RateSampler < Sampler
# * +sample_rate+: the sample rate as a {Float} between 0.0 and 1.0. 0.0
# means that no trace will be sampled; 1.0 means that all traces will be
# sampled.
#
# DEV-2.0: Allow a `sample_rate` of zero (drop all). This eases
# DEV-2.0: usage for many consumers of the {RateSampler} class.
Contributor

Interesting. Can you clarify?

Member Author

I expanded the comment in the PR, but here's the gist:
All internal users of RateSampler (RuleSampler and now Single Span Sampling) want sample_rate == 0 to mean "drop all", but they can't do that because of the validation that happens in the RateSampler initializer.

The way they get around it is to not set the value in the initializer, but call:

sampler = RateSampler.new
sampler.sample_rate = sample_rate # There's no validation here

This bypasses the validation. Ideally, the RateSampler would respect any rate between 0.0 and 1.0.
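
For context, the shape of the workaround is roughly this (a simplified sketch of the behaviour described above, not a verbatim copy of the class):

class RateSampler
  attr_reader :sample_rate

  # The initializer rejects rates outside (0.0, 1.0] and falls back to 1.0,
  # so a sample_rate of 0.0 ("drop all") cannot be passed here.
  def initialize(sample_rate = 1.0)
    sample_rate = 1.0 unless sample_rate > 0.0 && sample_rate <= 1.0
    self.sample_rate = sample_rate
  end

  # The attribute writer performs no validation, which is the loophole
  # RuleSampler and Single Span Sampling rely on.
  def sample_rate=(rate)
    @sample_rate = rate
  end
end

sampler = RateSampler.new
sampler.sample_rate = 0.0 # accepted: the writer skips validation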

@@ -54,6 +56,13 @@ def match?(span)
end
end

def ==(other)
Contributor

Is a span really the same as another span if their service and name are the same? For sampling purposes?

Can you explain the logic behind this a little more?

Member Author

@marcotc marcotc Aug 9, 2022

This equality method refers to the Matcher class, not to spans.

If two matchers have exactly the same instance variables, which is the only state they can hold, they are the same.
Currently, name and service are the only instance variables a Matcher has, so two matchers with the same name and service are effectively the same.
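
In other words, something to this effect (a simplified sketch of the matcher, not the exact code in this PR):

class Matcher
  attr_reader :name, :service

  def initialize(name:, service:)
    @name = name
    @service = service
  end

  # Matchers are value objects: equal when their entire state
  # (currently only name and service) is equal.
  def ==(other)
    other.is_a?(Matcher) && name == other.name && service == other.service
  end
end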

Member Author

marcotc commented Aug 8, 2022

Here are the benchmark results.

Memory usage does not change significantly. The following benchmarks measure execution time.

The numbers 1:, 10:, and 100: are the trace size under test, in number of spans.

TraceOperation is kept by trace-level sampling:

1. And no single span sampling is configured (baseline):

This code path does not consult the single span sampler.

                   1:    17532.0 i/s
                  10:     3640.9 i/s - 4.82x  (± 0.00) slower
                 100:      443.4 i/s - 39.54x  (± 0.00) slower

TraceOperation is rejected by trace-level sampling:

2. And no single span sampling is configured:

This code path does not consult the single span sampler.

                   1:    20786.5 i/s
                  10:     4217.3 i/s - 4.93x  (± 0.00) slower
                 100:      474.6 i/s - 43.80x  (± 0.00) slower

3. Single span sampling is configured and all spans are rejected:

The difference between this benchmark and the previous one is the cost to consult single span rules.

                   1:    19565.4 i/s
                  10:     3926.7 i/s - 4.98x  (± 0.00) slower
                 100:      436.3 i/s - 44.85x  (± 0.00) slower

4. Single span sampling is configured and all spans are kept:

One side effect of single span sampling is that 3 tags are added to each span that is successfully single-sampled, so more overhead is expected.

                   1:    15104.6 i/s
                  10:     3080.9 i/s - 4.90x  (± 0.00) slower
                 100:      365.4 i/s - 41.34x  (± 0.00) slower

Conclusions

The only code path with meaningful performance impact is 4, which can be attributed to the extra tags added to each single-sampled span, as well as the time it takes to match each span in the trace against the configured rules.

The rules by themselves are not very expensive: the difference between 3 and 2 is effectively the cost to consult single span rules for all spans in a trace.
In fact, the performance of the baseline (1) and 3 is very closely matched: this means that "keeping all spans" is just as expensive as "dropping the trace plus consulting single span sampling rules".

Maybe not surprising, but dropped traces (2) are cheaper than sampled traces (1), likely because the PrioritySampler and RuleSampler don't have to be consulted. Neither 1 nor 2 has Single Span Sampling configured, so this is trace-level sampling overhead that existed beforehand.

Contributor

@delner delner left a comment

Performance looks acceptable for a worst case. Tests are passing (minus one annoying macOS test), so I think this should be good to merge as soon as it's rebased.

Nice work!

@marcotc marcotc merged commit eaa1469 into master Sep 6, 2022
@marcotc marcotc deleted the feature-single-span-sampling branch September 6, 2022 18:46
@github-actions github-actions bot added this to the 1.5.0 milestone Sep 6, 2022