Skip to content
This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

trace: allow access to spans that are not sampled #210

Open
gbbr opened this issue Nov 7, 2018 · 10 comments
Open

trace: allow access to spans that are not sampled #210

gbbr opened this issue Nov 7, 2018 · 10 comments

Comments

@gbbr
Copy link

gbbr commented Nov 7, 2018

In the current implementation and specs, a span is only sent to the exporter when sampled. This is causing problems in some APM systems. For example, at Datadog we compute stats and other events based on traces (such as percentile statistics). Without having access to all traces, it is not possible to make these computations accurately.

This is a specific example, but in my opinion, exporters should have access to everything. Perhaps a secondary argument (the SamplingDecision) or a new field on the exported span could solve this? Possibly the former suggestion would work best to give more importance to this decision.

Related: DataDog/opencensus-go-exporter-datadog#20

@basvanbeek
Copy link
Member

What if people using DataDog exporter would just set the OpenCensus sampler to always sample?
And then if needed have the DataDog exporter decide on what can be discarded and what not?

@gbbr
Copy link
Author

gbbr commented Nov 7, 2018

I guess you're right, that is a good workaround. It is nevertheless a workaround and will prevent people from using the OpenCensus sampling if they desire. But if this is the best we can do then it's fine by me. I guess you could leave the issue open as a proposal if that's ok?

@codefromthecrypt
Copy link

codefromthecrypt commented Nov 7, 2018 via email

@basvanbeek
Copy link
Member

The big question is... do you need additional items like annotations and attributes as well or is just an empty span with only SpanContext sufficient?

As a reference Zipkin has been working on the concept of "firehose" mode which deals with this kind of problem and some others. See: https://cwiki.apache.org/confluence/display/ZIPKIN/Firehose+mode

@gbbr
Copy link
Author

gbbr commented Nov 7, 2018

The big question is... do you need additional items like annotations and attributes as well or is just an empty span with only SpanContext sufficient?

Yes, we would need the data too. I see what you're hinting at though; looking at the opencensus-go implementation I can see that spans aren't recording data when they aren't sampled, so this is clearly not an option given the current state. It would be a bigger change.

@arguile-
Copy link

arguile- commented Nov 7, 2018

Using AlwaysSample has a couple issues:

  • expensive
  • if multiple exporters are used it can overload the other exporter that is expecting sampled traces

Do those calculated metrics (for Datadog) really require more than just the outer span (without attributes)? The resource name, start, duration, status, etc. are all determined by the outermost span for the service under trace.

If OpenCensus could optionally export all spans, could Span.IsSampled() or Span.IsRecordingEvents() could be used to map the DataDog agent sampling priority to PriorityAutoKeep and PriorityAutoReject?

@gbbr
Copy link
Author

gbbr commented Dec 28, 2018

Using AlwaysSample has a couple issues:

  • expensive
  • if multiple exporters are used it can overload the other exporter that is expecting sampled traces

This is true, but is currently the only way the exporter can become aware of all traces. It will be less expensive once DataDog/opencensus-go-exporter-datadog#27 is merged because that will allow Datadog to do its own sampling as long as OC lets everything through.

I'd say that's a reasonable solution for now, and unless there is interest from other parties too into letting all spans, with data included, reach the exporters even when they aren't sampled, I'd say it's fine to close this issue.

@gbbr
Copy link
Author

gbbr commented Dec 28, 2018

As a last point and line of thought, it could also be interesting to consider different sampling rules on a per-exporter basis, but I'm not sure how much of a real use case that is.

@codefromthecrypt
Copy link

Here's a more detailed brief on "firehose mode" in Brave (zipkin java tracer);

we have a collaborative sampledLocal decision. If anything needs the data, then it is recorded. There are no fine tuning knobs, like only record tags or whatever.. what we did was cheapen the intermediate object holding pending data so that it isn't eagerly parsed. If any "exporter" needs data either for every request or a specific one, then that sampledLocal flag becomes true and recording occurs irrespective of the remote sampling flag.

@gbbr
Copy link
Author

gbbr commented Jan 4, 2019

Another idea would be to allow different sampling rules on a per exporter basis. That way you could choose exporters which receive all traces programmatically and let them handle them as they please. This change would still require keeping span data though, if any exporter has an all permissive sampling rule.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants