Skip to content
This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

Add support for https://github.com/w3c/distributed-tracing #70

Closed
bogdandrutu opened this issue Mar 15, 2018 · 11 comments
Closed

Add support for https://github.com/w3c/distributed-tracing #70

bogdandrutu opened this issue Mar 15, 2018 · 11 comments

Comments

@bogdandrutu
Copy link
Contributor

Once https://github.com/w3c/distributed-tracing is marked as alpha the OpenCensus library should start supporting this generic format.

This is for the moment a tracking issue for this support until we have the confirmation that the standard is at least in alpha.

@rakyll
Copy link
Contributor

rakyll commented Mar 15, 2018

OpenCensus Go already has support for this format. Feel free to use it as reference. https://github.com/census-instrumentation/opencensus-go/tree/master/plugin/ochttp/propagation/tracecontext

@bogdandrutu
Copy link
Contributor Author

@rakyll we probably need to propagate the Trace-State as well. For the moment no need to look into it but we need to propagate that.

@lmolkova
Copy link

lmolkova commented Jul 5, 2018

@bogdandrutu @adriancole @rakyll,

we've been looking into the changes needed to support the W3C protocol and wanted to share our thoughts with you.
Here is a high level description of what the design might look like and which scenarios should be addressed. There are some open questions, mostly on how to implement it in the most efficient way.

Would be great to hear your thoughts on it and discuss overall plan and the details.

traceparent

Now the header that carries span context is called traceparent. Go already have a propagation component for it, but uses a different name.

Supporting traceparent seems to be straightforward in any language as it is the same as X-Cloud-Trace-Context.

The main point here: we should agree on intention that W3C becomes default propagation component.

E.g. in python flask collector X-Cloud-Trace-Context is hardcoded.

tracestate

Tracestate carries tracing-system specific context in a list of key value pairs. tracestate allows different vendors propagate additional information and interoperate with their legacy Id formats (and migrate from legacy format to the traceparent).

One of the possible scenarios for tracestate is:

  • app instrumented with OpenCensus (and configured to export data to ApplicaitonInsights), receives an incoming http request
  • it contains some traceparent and tracestate: ms-id=|123.,other-id=abc headers
  • propagation component extracts traceparent and tracestate
  • when new child span is created, a vendor plug-in is notified that a new span is created
  • ApplicationInsights reads ms-id on the parent context and recognizes it, it assigns new ms-id=|123.xyz value to the child span context
  • no one recognizes 'other-id', so it is copied as is to the child span context
  • propagation component injects child span context (including tracestate)
  • when span is exported, ApplicaitonInsights exporter adds ms-id into the payload it sends to ApplicationInsights ingestion backend and ignores 'other-id'

So the key requirement here is: 'unknown' pairs should be propagated blindly (in and out of process), 'known' pairs could be modified by pluggable vendor-specific component for each span.

From the OpenCensus side, supporting tracestate would mean

  1. Creating a new (or updating existing in case of Go) propagation component
  2. Arguably, extend SpanContext to carry tracestate:
  • tracestate is something that is extracted and injected from/into the headers/metadata similarly to trace id and span id
  • while some languages could allow storing tracestate implicitly, we should have a consistent API everywhere, so tracestate should be explicit
  1. Synchronously notify registered vendor plugins that new span is created/starting so it can update tracestate on it's context.

General implementation considerations

Tracestate

Tracestate should be represented by the class that allows to add, get, update and delete pairs. It should handle overflows of key, value and overall length.
According to W3C spec the last updated pairs should appear first, i.e. order matters and prepend and delete should be cheap. So internally it could store the actual pairs in the linked list.

It is likely that tracestate is empty or contains just a few pairs. It's also likely that only the first pair will ever be updated (all apps in the cluster are instrumented by the same vendor and request leaves cluster boundary rarely).

Callback

Callback should be fired when new span is created/started. OpenCensus should pass child tracestate to the vendor plugins and must synchronously wait for all plugins to complete updating a child state.
The new Span should start with the child state and parent state should not be modified.

OpenCensus should implement default behavior: copy parent tracestate to child tracestate (open question: can we do it efficiently, without copy?).
After than vendor plugins may run on the child state and update it.

If there are no plugins registered, callback may be skipped (as well as copying the state).
If tracestate is empty, callback should still be fired.

Instead of pub-sub callback, we could use inheritance: have an interface that vendor can implement and register. It will be invoked to update the context. If there is no registered implementation - tracestate is propagated without modification. However, this will prevent certain scenarios like multiple vendor plugins.

Open questions:

  1. How efficiently store tracestate to avoid copy parent state. Tracestate is likely empty or contains just a few items, so may be optimization does not worth it.
  2. Multiple vendor plugins: is it a common case? We already have a scenario when we want to have an extensibility point to support different internal propagation formats.

@wu-sheng
Copy link

wu-sheng commented Jul 6, 2018

Thanks for @reyang to bring me to here. I am familiar with what W3C trace context do.

I think @reyang 's proposal in #131 (comment) is very useful.

@codefromthecrypt
Copy link

small notes:

sometimes a span is resumed (not just new or started) so for example brave decorate is hook whenever a span context is to be used or used again https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/propagation/Propagation.java

also and similarly true that we should focus on trace state though there may be other consumers of hooks like this (ex arbitrary or not yet defined field propagation). even if we do subclass approach for things here it might not be viable for multi party. this is why brave propagation hook is composition (of extra things) not inheritance even if a sub hook (like tracestate) could have inheritance.

anyway just offering 2p as did something similar last year and it works for different trace contexts.. might be helpful background even if only java.

@bogdandrutu
Copy link
Contributor Author

Great feedback @lmolkova.

Not 100% I understand the discussion about sub-class approach vs composition. Can @lmolkova or @adriancole write an example.

Here are things that we should do to support tracestate (based on my current understanding):

  • Add an immutable class called TraceState with a Builder class that allows modification only when creating a new instance (or similar mechanism based on the language patterns). It is important that once the TraceState is created for a Span to be immutable that way a Span will have the same TraceState at any moment.
  • Extend SpanContext to contain a TraceState object (default an empty TraceState).
  • Modify the current propagation handlers to support the TraceState. Some languages need to add a new implementation for the w3c standard some may have to modify the current implementation.
  • Add a hook mechanism that allows vendor specific code to be called when creating a new Span object.

@wu-sheng
Copy link

wu-sheng commented Jul 9, 2018

Add a hook mechanism that allows vendor specific code to be called when creating a new Span object.

I think maybe just hook on creating maybe not enough. At the stage of creating, only can get a span name, right? Some propagation context is in annotation or attribute. So, may need hook on that too.

@lmolkova
Copy link

lmolkova commented Jul 9, 2018

@bogdandrutu welcome back!

Not 100% I understand the discussion about sub-class approach vs composition.

This is the discussion around hook implementation: how does OpenCensus notify vendor about new span and how does vendor subscribe.

It could be done through sub-class approach where vendor implements something like ITraceStatePropagation interface and in runtime OC finds the implementation.

However this approach does not provide enough flexibility to support multiple vendors/extensions, so some kind of observer pattern looks more suitable.

@lmolkova
Copy link

lmolkova commented Jul 9, 2018

@wu-sheng

I think maybe just hook on creating maybe not enough. At the stage of creating, only can get a span name, right? Some propagation context is in annotation or attribute. So, may need hook on that too.

Right, when the hook is called it should pass parent tracestate. Perhaps if we're going to support Correlation-Context at some point, it should also pass Tags.

Also, the tracestate is not particularly about propagation across the wire: it is also about in-proc auto-propagation. So adding arbitrary context through the annotations/attributes seems to be quite useful, but tracestate should be propagated automatically, where possible.

@wu-sheng
Copy link

wu-sheng commented Jul 9, 2018

Also, the tracestate is not particularly about propagation across the wire: it is also about in-proc auto-propagation. So adding arbitrary context through the annotations/attributes seems to be quite useful, but tracestate should be propagated automatically, where possible.

Yes. I also want to say tracestate in W3C and auto-propagation are not totally the same thing. We have some similar usage. So, I hope the OC-team could think about providing auto-propagation context with the extendable mechanism, then consider tracestate in W3C is an extendable implementation.

@songy23
Copy link
Contributor

songy23 commented Apr 25, 2019

Already fixed.

@songy23 songy23 closed this as completed Apr 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants