Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed tracing story with Confluent Kafka client #1269

Closed
CodeBlanch opened this issue May 1, 2020 · 24 comments
Closed

Distributed tracing story with Confluent Kafka client #1269

CodeBlanch opened this issue May 1, 2020 · 24 comments

Comments

@CodeBlanch
Copy link

Greetings!

I'm using OpenTelemetry to instrument an application, primarily to watch the performance of all its dependencies and find bottlenecks. OpenTelemetry is automatically capturing Http requests coming into my system and then tracing outgoing Http, SQL, & Redis calls made while doing the processing. All of that data is exported into one of the many tools OpenTelemetry supports for visualization. What I would like to do is mod the Confluent Kafka client to also participate in that process.

The easiest way to do that would be to use DiagnosticSource to allow other code in the process to subscribe to events that are fired as messages are produced or consumed. DiagnosticSource only fires when there is a subscriber so the cost to existing users not using the events would be ~0.

If we are open to this, I can take a stab at a PR, documentation, etc., but I wanted to check first.

/cc @SergeyKanzhelev @MikeGoldsmith @cijothomas

@SergeyKanzhelev
Copy link

CC: @davidfowl and @tarekgh

@mhowlett
Copy link
Contributor

mhowlett commented May 5, 2020

librdkafka, the native library on which Confluent.Kafka is based, implements KIP-42, which is an interceptor API for the producer and consumer. This is the natural place to add this I think - and you can do it completely independently of this project (but let us know!), but you'd need to write it in C. As an added bonus, this interceptor would work with all librdkafka bindings, including the confluent python and go clients.

Supporting DiagnosticSource may be an option, I haven't thought about it. If the change is small and has no performance impact I'm open to looking at it.

@SergeyKanzhelev
Copy link

One problem of implementing logic in native library is the need to exchange some information from managed code to the native library. Starting with TraceId and SpanId of a currently running Span, following the currently configured sampling rate, exporter and span processors. From what I seen before it's often much harder to set up this managed to native and back communication than implement a language specific solutions.

What level of details will be lost if instrumentation will be based on pure managed code using DiagnosticSource?

@mhowlett
Copy link
Contributor

mhowlett commented May 6, 2020

ahh right, yes, completely agree.

I don't see any details that would be lost, based on information presented in the kip.

@CodeBlanch
Copy link
Author

I'll take a stab at what DiagnosticSource integration might look like if no one objects. But I have a bit of a backlog so it might not be for a few days. If anyone else wants to jump on it, by all means :)

@mhowlett
Copy link
Contributor

mhowlett commented May 6, 2020

thanks! yes, backlog at this end too. generally, straightforward changes with no controversy are easy to get merged, but changes beyond that are competing with prioritized tasks for time.

@CodeBlanch
Copy link
Author

@mhowlett The one big challenge I can foresee is the trace context. When producing a message, we would spin up an Activity. When consuming a message, we need to spin up an Activity with the same context that the producer had. In order to pull that off, we need to persist some data on the message. W3C Trace Context which basically comes down to 2 strings: traceparent & tracestate. So I'm thinking we would have to write those on the message header. Only if someone is subscribed to the DiagnosticSource, which is hinting they want the telemetry. How do you feel about that?

@SergeyKanzhelev please correct me if this is the wrong approach to take.

@mhowlett
Copy link
Contributor

mhowlett commented May 6, 2020

it sounds like something that requires some thought. my instinct is that we don't want to hard code the message header decisions in the client - we probably want a general plugin API for it, and probably configuration via a builder class method (+ a W3C Trace Context impl. provided in-the-box). this capability could also be built in a class that wraps the producer / consumer of course. but it does seem generally useful enough that we'd like it in the client.

note: there is some chance we'll have some bandwidth to think about this in September, we have some related work planned I believe.

@SergeyKanzhelev
Copy link

@CodeBlanch yes, just store two strings in metadata would be enough. We started working on AMQP in W3C group: https://github.com/w3c/trace-context-amqp but it is only in proposal state now. I will try to push it forward and perhaps we can validate how it works with kafka. (I believe kafka is almost AMQP from the message structure view, right?)

@CodeBlanch
Copy link
Author

@mhowlett I threw up PR #1278 with the Producer side of things so you could check it out. DiagnosticSource generally implements very cleanly, meaning it doesn't add a whole lot of ceremony on top of 'real work' code.

Right now I have the traceparent & tracestate headers hard-coded. They are only added/sent when there is someone subscribed to the DiagnosticSource. But it seems perfectly reasonable to make them configurable, happy to do that if you want. I was thinking maybe a prefix for "system" properties might be nice? I know on the Azure ServiceBus there are two "headers"-esque props on the message, SystemProperties & UserProperties, which lend themselves nicely to something like this IMO.

@EricStG
Copy link

EricStG commented Mar 25, 2021

FYI, there's an experimental page in the OpenTelemetry specification for conventions to use with messaging
There's also some useful bits in the OpenTelemetry .Net library for attaching and propagating tracing context and baggage across activities.

@ElectricVampire
Copy link

@CodeBlanch Is your PR still under consideration for merging.

@CodeBlanch
Copy link
Author

@ElectricVampire I just closed it. Better to do it using ActivitySource now. That was released with .NET 5, but is actually supported all the way back to .NET 4.5 using the NuGet: https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/.

@eshepelyuk
Copy link

eshepelyuk commented May 11, 2021

Hi all,
So, will it be there any efforts to provide instrumentation for Confluent.Kafka ?
In fact it's relatively simple to implement it manually using OpenTelementry Tracer class, but it would be definately better to have it out of the box.

Also, maybe it worth to contribute the code to OpenTelemetry project ?
i.e. https://github.com/open-telemetry/opentelemetry-dotnet-contrib

@MoienTajik
Copy link

Are there any plans to implement this feature?

@chertby
Copy link

chertby commented Nov 11, 2021

Maybe there are assumptions about when this feature can be implemented?

@mhowlett

@wajika
Copy link

wajika commented Apr 1, 2022

I am also interested in this piece of content, if anyone has submitted any PR, please submit a reply.

@vpalukuri05
Copy link

Please let me know once this feature is implemented.

@fedeoliv
Copy link

Hey everyone, JFYI I created a PR #1838 with a proposal for the Producer instrumentation.

@inech
Copy link

inech commented Jul 14, 2023

Hello,
It seems that @fedeoliv's PR has been abandoned. Are there plans to add this feature? In our company, different teams are creating their own extensions, but it would be much better to use the standard approach.

Thanks in advance!

@anchitj
Copy link
Member

anchitj commented May 28, 2024

We are working on KIP-714, which can be used to export various client metrics with Opentelemetry. confluentinc/librdkafka#4721

@anchitj anchitj closed this as completed May 28, 2024
@swythan
Copy link

swythan commented May 29, 2024

We are working on KIP-714, which can be used to export various client metrics with Opentelemetry. confluentinc/librdkafka#4721

KIP-714 looks interesting, but explicitly doesn't cover tracing. Will the implementation here have tracing as well?

@emasab
Copy link
Contributor

emasab commented May 29, 2024

@swythan tracing isn't covered by the KIP, only metrics. There is no effort to support tracing at the moment.

@swythan
Copy link

swythan commented May 29, 2024

Isn't tracing what this ticket is for, though? Shouldn't it be left open?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests