
[PROPOSAL] First-class distributed tracing #650

Open
mthmulders opened this issue Nov 9, 2021 · 10 comments
Labels
kind/enhancement New feature or request

Comments

@mthmulders
Contributor

Over the past weeks, I've been struggling to get distributed tracing to work.

My observation is that a Java application that uses the Dapr SDK for Java does not propagate the traceparent HTTP header from a call it receives from the sidecar to calls it makes back to the sidecar (e.g. invokeMethod or publishEvent).

Looking at the code, I think that if the sidecar used gRPC to invoke the Java application, it wouldn't work either - the grpc-trace-bin, traceparent and tracestate headers are propagated, but they appear to be read from an empty Context. However, I haven't been able to make the sidecar talk gRPC to my Java apps, so I'm not 100% sure about this.

Nevertheless, in a platform that wants to simplify distributed systems, I think tracing is an essential thing. I'd love to see the Dapr SDK for Java make it as easy as possible to leverage what Dapr has to offer in this field.

Describe the proposal

Provide the Dapr client code (at least DaprClientBuilder, DaprHttpBuilder, DaprClientHttp and DaprClientGrpc classes, maybe more) with a "strategy" to get a traceparent header value. Use that strategy to enrich calls from the application to the sidecar.

The default implementation should return an empty value. A user can supply their own implementation based on the frameworks they are using. For instance, I could envision an implementation that uses OpenTelemetry's Span class to find the correct value for the traceparent header.
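As a self-contained illustration of what such a strategy could look like (all names here are illustrative, not actual SDK types), including a helper that formats a W3C trace-context traceparent value:

```java
import java.util.Optional;

// Hypothetical sketch of the proposed "strategy". The default implementation
// (NONE) returns an empty value; users would plug in their own, e.g. one that
// derives the values from OpenTelemetry's current Span.
public interface TraceParentSupplier {

    /** Returns the W3C traceparent header value, if a trace is active. */
    Optional<String> get();

    /** Default strategy: no tracing information available. */
    TraceParentSupplier NONE = Optional::empty;

    /** Formats a W3C trace-context traceparent value (version 00). */
    static String format(String traceId, String spanId, boolean sampled) {
        return String.format("00-%s-%s-%s", traceId, spanId, sampled ? "01" : "00");
    }
}
```

An OpenTelemetry-backed implementation could, for instance, obtain the trace id, span id, and sampling flag from `Span.current().getSpanContext()` and feed them to `format`.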

@artursouza
Member

I am open to making this integration as easy as possible. Have you checked the current example showing how to accomplish this with the current version of the SDK? https://github.com/dapr/java-sdk/tree/master/examples/src/main/java/io/dapr/examples/tracing

I would love to get more details on what "easy" would look like. I have a feeling it will fall into the Spring Boot part of the SDK, since that is where we can inject more opinions on how apps should use Dapr.

@artursouza artursouza added the question Further information is requested label Nov 12, 2021
@lullen

lullen commented Nov 15, 2021

I do not think we should put too much into the Spring Boot part, as many people use alternatives to Spring Boot, or even plain gRPC. I think the Spring Boot integration would just supply the last bit of glue code; we should make it easy to do this manually for everyone.

What would be easier? (without having thought much about the consequences)
One option would be to add helper methods to DaprClientBuilder, like new DaprClientBuilder().withSpan("Method name").build() to start a new trace, and new DaprClientBuilder().appendSpan(traceParams).build() to continue an existing one.
The DaprClient shouldn't need to call client.subscriberContext(getReactorContext()) as the client then already knows about the tracing.
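To make the suggested builder helpers concrete, here is a hedged sketch of the shape they might take (none of these methods exist in the SDK today; a toy stand-in class is used so the sketch compiles on its own):

```java
// Toy stand-in for the suggested DaprClientBuilder additions. The two fields
// capture the two modes: start a new span, or continue an existing trace.
final class TracingClientBuilder {
    private String spanName;     // start a new span with this name
    private String traceParent;  // or continue the trace identified by this value

    TracingClientBuilder withSpan(String name) {
        this.spanName = name;
        return this;
    }

    TracingClientBuilder appendSpan(String traceParent) {
        this.traceParent = traceParent;
        return this;
    }

    String build() {
        // A real implementation would configure the client; here we just
        // report which tracing mode was selected.
        if (traceParent != null) {
            return "continue:" + traceParent;
        }
        return spanName != null ? "new:" + spanName : "none";
    }
}
```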

@mthmulders
Contributor Author

I do not think we should put too much into the Spring Boot part, as many people use alternatives to Spring Boot, or even plain gRPC. I think the Spring Boot integration would just supply the last bit of glue code; we should make it easy to do this manually for everyone.

Totally agree! In the Java world, Spring is one of many choices, and I don't think Dapr should "push" developers to use one framework or another.

The DaprClient shouldn't need to call client.subscriberContext(getReactorContext()) as the client then already knows about the tracing.

That sounds like a good thing. Manually creating and passing on tracing contexts is error-prone and easily overlooked. I was thinking maybe we could even have a "tracing context enricher" or something similar: a strategy that the Dapr client invokes to enrich the Reactor context with the necessary tracing data. Ideally, we could pass it when we create the Dapr client. This approach would be somewhat similar to the DaprObjectSerializer, which provides the Dapr client with a strategy for serialising application objects.

The only thing I am not sure of yet is whether a single object could fulfil that role. Maybe the tracing context depends on the situation in which you want to use the DaprClient.

Another option would be to add an extra overload to all the methods in the DaprClient, but I feel that's not the right way to go. Every method already has quite a few overloads, making it a bit of a puzzle for users to figure out which one they need. Adding an overload for passing a tracing context would make that even harder.

@lullen

lullen commented Nov 18, 2021

I don't think Dapr should "push" developers to use one framework or another.

Just to clarify: I am perfectly fine with Dapr pushing people to use Spring, as it is the most common framework, but it is important that Spring isn't the only framework available.

Having a tracing context enricher sounds like a good way forward. Adding an extra overload would make the API harder to use, as you said, so I am not in favour of doing that.

the tracing context depends on the situation in which you want to use the DaprClient

Not sure I follow here. When wouldn't you want the full trace? Or isn't that what you are suggesting?

@mthmulders
Contributor Author

Not sure I follow here. When wouldn't you want the full trace? Or isn't that what you are suggesting?

I was thinking about a situation where you invoke the DaprClient while starting a new span within the current trace. In that case, a global "tracing context enricher" would not be sufficient, as that particular invocation of the DaprClient is supposed to start a new span.

However, I think we shouldn't allow for that. Starting a span is specific to the tracing framework you use (e.g. OpenTelemetry or Spring Sleuth), and just like we don't want to "push" people towards Spring, I don't think we should push them towards a particular tracing framework either.

Hopefully, that would allow us to treat the "tracing context enricher" as a singleton. And as a consequence, we could then configure the DaprClient with it, rather than configure each call with it.

I hope to experiment with Spring Sleuth & Dapr soon. If I get that to work in a similar way as I did with OpenTelemetry, I want to revisit the ideas so far and see if this approach is still feasible.

@mthmulders
Contributor Author

@artursouza, this issue was supposed to be a proposal, not a question. The question is in #647 (which is now answered). Could you please label this issue as "proposal" instead?

@mthmulders
Contributor Author

Good news - I was able to get distributed tracing to work with Dapr & Spring Sleuth, too. Even better, I can make that fit in the same structure as the one that is used with OpenTelemetry: a Function<reactor.util.context.Context, reactor.util.context.Context>.

Since Sleuth auto-populates the Reactor Context with the Sleuth TraceContext, the "tracing context enricher" could be a singleton - its logic is implemented entirely without instance variables.

For OpenTracing, the "enricher" does need an instance variable (see this example), but I don't think that's much of a problem.

Aside

The Sleuth integration works when the app communicates with the sidecar over gRPC, but not when they use HTTP. Sleuth puts a few more entries in the Reactor Context, and some of their names contain a space, e.g. "interface org.springframework.cloud.sleuth.Tracer". Even when you remove those entries in a contextWrite(...) call, they pop back up when the Reactor Context is used to write HTTP headers; and HTTP header names do not allow spaces. I think the fact that you can't remove something from the context is a bug, but I need to dive into that further.

So, to get back to the original proposal:

Describe the proposal

Provide the Dapr client with a "strategy" to enrich the Reactor Context with tracing information. The DaprClientGrpc and DaprClientHttp classes will invoke that strategy to enrich calls from the application to the sidecar with a tracing context.

The contract would at the very minimum look like this:

public interface DaprTelemetryInjector
    extends java.util.function.Function<reactor.util.context.Context, reactor.util.context.Context> {
}

To push implementors to provide useful values, we could make it a bit more expressive by providing a default implementation that delegates to unimplemented methods:

public interface DaprTelemetryInjector
    extends java.util.function.Function<reactor.util.context.Context, reactor.util.context.Context> {

    @Override
    default reactor.util.context.Context apply(final reactor.util.context.Context context) {
        // Enrich the incoming context with the results of calculateTraceParent
        // and calculateTraceState, provided they do not return null.
        reactor.util.context.Context enriched = context;
        String traceParent = calculateTraceParent(context);
        if (traceParent != null) {
            enriched = enriched.put("traceparent", traceParent);
        }
        String traceState = calculateTraceState(context);
        if (traceState != null) {
            enriched = enriched.put("tracestate", traceState);
        }
        return enriched;
    }

    String calculateTraceState(final reactor.util.context.Context context);
    String calculateTraceParent(final reactor.util.context.Context context);
}

The default implementation could return an empty value. The Dapr SDK for Java could provide out-of-the-box implementations for OpenTracing and Sleuth, two popular tracing libraries. A user could of course also supply their own implementation based on the frameworks they are using.

Because those implementations depend on external libraries, we would introduce "optional" dependencies on both the OpenTracing and the Sleuth APIs.
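To illustrate the delegation pattern in a runnable form, here is a sketch in which a plain Map stands in for reactor.util.context.Context, so it works without Reactor on the classpath (all names are illustrative; the real injector would follow the same shape against the Reactor API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Map stands in for reactor.util.context.Context; the copy-and-enrich step
// mirrors the immutable put semantics of Reactor's Context.
interface TelemetryInjectorSketch extends Function<Map<String, String>, Map<String, String>> {

    @Override
    default Map<String, String> apply(Map<String, String> context) {
        Map<String, String> enriched = new HashMap<>(context);
        String traceParent = calculateTraceParent(context);
        if (traceParent != null) {
            enriched.put("traceparent", traceParent);
        }
        String traceState = calculateTraceState(context);
        if (traceState != null) {
            enriched.put("tracestate", traceState);
        }
        return enriched;
    }

    String calculateTraceState(Map<String, String> context);
    String calculateTraceParent(Map<String, String> context);
}

// A fixed-value implementation, e.g. for tests: always reports the same trace.
final class FixedInjector implements TelemetryInjectorSketch {
    @Override
    public String calculateTraceParent(Map<String, String> context) {
        return "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    }

    @Override
    public String calculateTraceState(Map<String, String> context) {
        return null; // no vendor-specific state to propagate
    }
}
```

An implementor only supplies the two calculate methods; the default apply method takes care of writing the values into the context, and skips entries whose value is null.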

@artursouza artursouza added kind/enhancement New feature or request and removed question Further information is requested labels Dec 20, 2021
@artursouza artursouza changed the title First-class distributed tracing [PROPOSAL] First-class distributed tracing Dec 20, 2021
@artursouza
Member


This is an interesting proposal. Will any of the implementations have any Dapr-specific logic? We might be OK with having the Dapr SDK offer the interface but no implementation - except in examples. This was the original intent for the DaprObjectSerializer interface, but we ended up adding a default implementation - which I regret at this point, because people tend to depend on it and our code has special cases for the default implementation.

We also want to minimize dependencies, so adding implementations means we need to create new artifacts to publish to Maven, one per framework. That way, users can opt in to our implementation for each one - but I would like to avoid going down that path initially, simply because of the maintenance effort to keep each one up to date and deal with breaking changes in dependencies.

@wmeints

wmeints commented Apr 19, 2022

I had a chat with @mthmulders about the state of distributed tracing, and he referred me to this thread. Looking at the proposal, I think it's nice to have the interface. We can provide the implementation as an example; I'd personally prefer a nice blog post or a repo on GitHub, but I'm open to other suggestions in this regard.

@mthmulders
Contributor Author

Will any of the implementations have any Dapr-specific logic?

I don't think so. My expectation is that it's mostly a matter of wiring the particular tracing library (Sleuth, OpenTracing, etc.) into the Dapr SDK.

We also want to minimize dependencies, so adding implementations means we need to create new artifacts to publish to Maven, one per framework. That way, users can opt in to our implementation for each one - but I would like to avoid going down that path initially, simply because of the maintenance effort to keep each one up to date and deal with breaking changes in dependencies.

Minimising dependencies absolutely makes sense to me, and unless there's a lot of demand from the community, I don't think we should provide our own implementations at this point. We could deliver sample implementations or, as @wmeints suggests, a blog post that explains how to do it.

My main doubt (still!) is whether this will actually work. I simply don't know whether we can guarantee that the Reactor Context we pass to the interface implementation is the correct one; I lack the in-depth Reactor knowledge to be sure about it. The implementations will need to be completely thread-safe, that's for sure. And if we were to pass the wrong Reactor Context, we'd be worse off than in the current situation, which simply doesn't pass any Context at all.
