Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parent-based tracing sampling has incorrect logic -- should behave differently on flag=0 vs Parent Span doesn't exist #7574

Closed
andrejpk opened this issue Feb 29, 2024 · 0 comments · Fixed by #7573
Assignees
Labels
kind/bug Something isn't working
Milestone

Comments

@andrejpk
Copy link
Contributor

The current trace sampling behavior is not a good default. It conflates the parent trace flag and Trace ID-based sampling. With no config to configure something more specific, the default behavior should match the (OTel parent sampler)[https://github.com/open-telemetry/opentelemetry-go/blob/e6e186bfa485f679e35bb775cba63ca24029590d/sdk/trace/sampling.go#L161C1-L170C2].

In what area(s)?

/area runtime

What version of Dapr?

1.1.x

Expected Behavior

OTel ParentSampler behavior:

Parent sampling flag on -> sample
Parent sampling flag off -> do not sample
No/invalid parent: -> fall back to a root sampler (often traceIdRatioSampler)

Actual Behavior

Dapr Trace Sampler behavior:

Parent sampling flag on -> sample
Parent sampling flag off -> fall back to traceIdRatioSampler
No/invalid parent: -> fall back to traceIdRatioSampler

The current implementation combines the second two cases and calls traceIdRatioSampler. This leads to orphaned traces in observability systems, where a correct sampling decision is made for a trace in an upstream service but Dapr does not respect that decision and will start orphaned traces SamplingRate% of the time.

At a minimum, the default behavior should be changed to correctly implement the above logic (and use the ParentSampler in the Go Otel library instead of reimplementing this logic).

A more complete solution would be to expand the Sampling config to provide a richer configuration. For example, it is impossible right now to configure Dapr to only trace when a parent asks for tracing, because setting SamplingRate to 0 completely disables sampling, even though this a very common-sense configuration to use.

Steps to Reproduce the Problem

Release Note

RELEASE NOTE:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
Status: Needs Owner
Development

Successfully merging a pull request may close this issue.

2 participants