Instrumentation support #54

JHK · 2018-11-14T09:51:23Z

To get deeper introspection into what happens when receiving or publishing messages, it would be useful to have an instrumentation interface compatible with Active Support Instrumentation. The default could be a NullInstrumenter that simply discards the information. For an idea of what might actually be useful to instrument, take inspiration from ruby-kafka:

message producing
message delivery
message polling
join/leave consumer group
(re-)assign partitions within consumer group
offset changes
consumer heartbeat
connection updates
probably more...
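
A minimal sketch of what such an interface could look like, assuming the same `instrument(name, payload) { ... }` shape that Active Support uses. All class names and the event name `produce_message.rdkafka` are hypothetical and are not part of rdkafka-ruby:

```ruby
# Hypothetical sketch: none of these classes exist in rdkafka-ruby.
# The method shape mirrors ActiveSupport::Notifications.instrument(name, payload) { ... }.

# Default instrumenter: discards all information and only runs the block.
class NullInstrumenter
  def instrument(_name, payload = {})
    yield(payload) if block_given?
  end
end

# Alternative backend: records every event in memory together with its duration.
class RecordingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block_given? ? yield(payload) : nil
  ensure
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << { name: name, payload: payload, duration: duration }
  end
end

# Hypothetical client that reports through whatever instrumenter it is given.
class InstrumentedProducer
  def initialize(instrumenter: NullInstrumenter.new)
    @instrumenter = instrumenter
  end

  def produce(topic:, payload:)
    event = { topic: topic, bytesize: payload.bytesize }
    @instrumenter.instrument("produce_message.rdkafka", event) do
      # The real produce call would go here.
      :enqueued
    end
  end
end
```

Because the client only depends on an object responding to `instrument`, Active Support or any other engine with the same method shape could be passed in instead of the recorder.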

thijsc · 2018-11-14T10:08:12Z

Thanks for asking. And also for using this gem in racecar! :-)

I have considered integrating AS instrumentation, but given the nature of the underlying C lib I don't see a way in which that approach works well. Did you see the statistics callback we added? #40

I just noticed the docs on rubydocs are not properly regenerated for some reason, so you might have missed that.
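
For reference, a rough sketch of a handler for that statistics callback. It assumes the callback receives librdkafka's statistics as a Hash parsed from JSON and that it is registered via `Rdkafka::Config.statistics_callback=`; the constant names and the sample payload are made up for illustration:

```ruby
require "json"

# Assumption: the handler receives librdkafka's statistics as a Hash parsed from JSON.
# In an application it would be registered roughly like this:
#   Rdkafka::Config.statistics_callback = STATS_HANDLER
# librdkafka only emits statistics when `statistics.interval.ms` is set in the client config.

# A few top-level fields from the statistics payload that are handy for dashboards.
SUMMARY_KEYS = %w[name type txmsgs rxmsgs msg_cnt].freeze

# Reduces the (large) statistics Hash to a small summary and prints it as JSON.
STATS_HANDLER = lambda do |stats|
  summary = stats.slice(*SUMMARY_KEYS)
  puts summary.to_json
  summary
end

# Stand-in payload so the sketch runs without a broker; the values are invented.
sample = {
  "name" => "rdkafka#producer-1",
  "type" => "producer",
  "txmsgs" => 42,
  "rxmsgs" => 0,
  "msg_cnt" => 3,
  "brokers" => {}
}
STATS_HANDLER.call(sample)
```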

thijsc · 2018-11-14T10:11:23Z

Also, some callbacks would definitely make sense to add, especially for partition assignment changes.

mensfeld · 2018-11-14T10:27:32Z

It would be really good if the instrumentation engine were not AS Notifications-based but rather AS Notifications-compatible, so other engines can be plugged in (like dry-monitor, which we use in Karafka).

JHK · 2018-11-14T15:11:20Z

@mensfeld I updated the ticket description to make it clearer that this should not rely on ActiveSupport, but rather use the same interface for instrumentation.

JHK · 2018-11-16T08:25:09Z

The statistics endpoint goes in the right direction, but it is not what I meant with this issue. This is about being able to connect the instrumentation to, e.g., the datadog agent, to introspect what happened on each and every request (that got recorded). There it is quite handy to know which branch the code took, how often, and how long it took.

thijsc · 2018-11-16T14:11:43Z

I've been thinking about this quite a bit, especially since I work on a monitoring product all day.

The thing is that I'm not sure there actually is something to measure. Librdkafka does a lot of buffering in the background. Actually consuming a message from Ruby pops something off an internal buffer, which is always super fast. I think what you're talking about mainly happens inside librdkafka. The stats for that are present in the statistics callback.

Can you give an example of where you'd like to see hooks? What would these hooks really allow you to measure?

JHK · 2018-11-19T08:46:08Z

ruby-kafka's instrumentation provides a notification one can subscribe to whenever a message produce gets called. It provides some meta information (code).

This can then be used, for example, in the datadog-agent or (as in my case) in time_bandits to determine the call frequency per request or similar metrics.

thijsc · 2018-11-19T10:14:14Z

Right, I think I understand the use case better. You're not so much interested in the performance of the produce call. But you do want to get hooks and see the volume?

mensfeld · 2018-11-19T10:16:27Z

@thijsc I am interested in the produce performance. Having instrumentation for it would also cover the volume, at least for DD, by incrementing a counter for the messages sent to a particular topic.

thijsc · 2018-11-19T10:20:39Z

> I am interested in the produce performance.

What do you see yourself measuring exactly?

mensfeld · 2018-11-19T10:21:49Z

> What do you see yourself measuring exactly?

How many messages I can send per second depending on the ack level, plus where they go (to which topic).
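
As a rough illustration of that measurement, a produce hook could feed a small per-topic counter like the one below. `ProduceRateTracker` and its methods are hypothetical and not part of any gem; the clock is injectable only to make the sketch easy to check:

```ruby
# Hypothetical subscriber: counts produce events per topic and derives a rate.
class ProduceRateTracker
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @started_at = clock.call
    @counts = Hash.new(0)
  end

  # Called once per produced message, e.g. from an instrumentation hook.
  def record(topic)
    @counts[topic] += 1
  end

  # Messages per second for each topic since the tracker was created.
  def rates
    elapsed = @clock.call - @started_at
    return {} unless elapsed.positive?

    @counts.transform_values { |count| count / elapsed }
  end
end
```

Running one tracker per ack level (or adding the ack level to the key) would give the comparison across ack settings.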

mensfeld · 2019-08-08T13:11:43Z

@thijsc any reason for the statistics_callback to be global? What if I would want to have different callback handling in various consumers/producers?
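
One possible workaround while the callback is global, sketched under two assumptions: that the statistics Hash carries librdkafka's `name` field identifying the client instance, and that the setter accepts any object responding to `call`. `StatsRouter` is a hypothetical helper:

```ruby
# Hypothetical dispatcher: one global callback that forwards to per-client handlers.
class StatsRouter
  def initialize
    @handlers = {}
    @fallback = ->(_stats) {} # ignore statistics from unregistered clients
  end

  # Registers a handler for the client whose statistics carry this instance name.
  def register(client_name, &handler)
    @handlers[client_name] = handler
  end

  # Entry point for the global callback; assumes stats["name"] identifies the client.
  def call(stats)
    @handlers.fetch(stats["name"], @fallback).call(stats)
  end
end

# Usage sketch (assumed wiring, not verified against a specific rdkafka-ruby version):
#   router = StatsRouter.new
#   router.register("rdkafka#producer-1") { |stats| puts stats["txmsgs"] }
#   Rdkafka::Config.statistics_callback = router
```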

thijsc · 2019-08-15T20:46:53Z

> @thijsc any reason for the statistics_callback to be global? What if I would want to have different callback handling in various consumers/producers?

#82 was opened for this question.

thijsc · 2019-08-15T21:06:23Z

I'm trying to get this done, but not making a lot of progress because I don't have a clear picture in my mind what this looks like. I can see how events for assignment changes and so forth can work.

I can also see how emitting an event for producing a message could work. I don't see how emitting an event for a delivered message would be useful. AS Notifications assumes that things happen in sync, and that's not going to be the case here. I think you're going to get a lot of out-of-order events.

I also don't see how we can do hooks for message delivery. The C lib pops them off a buffer, so the moment they arrive on the Ruby side says little about how the network is doing, for example. The stats in the statistics callback do tell us that. Maybe I'm missing a useful use case here?

I think we need to spend some time coming up with a spec of which events should be emitted and write up some use cases on how one would benefit from them. That'll make it a more manageable project to get this done.

@JHK and @mensfeld which events do you think should be emitted and could you write up a short description of when they would trigger and which information they would emit?

JHK · 2019-08-19T14:18:27Z

I cannot say exactly what needs to be in such a message; rather, have a look at what racecar already provides:

Producing a message: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/producer.rb#L220-L228
Message delivery: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/producer.rb#L246-L257
Error on a Topic: https://github.com/zendesk/ruby-kafka/blob/master/lib/kafka/producer.rb#L467-L470

Those are instrumentations built from the need to measure details within racecar. The statistics callback already provides much of that information, but not the hook itself. So I'd suggest including what makes sense to you in that hook. If someone needs more, we can still extend it in individual PRs. By then the general idea of hooks is in place, and the parameters can be discussed on a case-by-case basis.

dasch · 2019-08-23T09:10:27Z

We have a pretty clear need to measure the number of successful / failed message deliveries per producer process.
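
A minimal sketch of such a counter, written as a callable that could be used wherever a delivery callback is expected. `DeliveryReport` here is a stand-in Struct for illustration; the real report object in rdkafka-ruby may expose different fields:

```ruby
# Stand-in for a delivery report; the real object may expose different fields.
DeliveryReport = Struct.new(:partition, :offset, :error)

# Hypothetical per-process counter of successful and failed deliveries.
class DeliveryCounter
  attr_reader :succeeded, :failed

  def initialize
    @succeeded = 0
    @failed = 0
    @lock = Mutex.new # callbacks may fire from a background thread
  end

  # Counts a report as failed when it carries an error, otherwise as succeeded.
  def call(report)
    @lock.synchronize do
      if report.error.nil?
        @succeeded += 1
      else
        @failed += 1
      end
    end
  end
end
```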

mensfeld · 2019-08-23T11:05:36Z

@dasch but you can do that yourself now: https://github.com/karafka/waterdrop/pull/106/files#diff-d179c7dee2064c1622d2d3da2b03c44dR32

thijsc · 2019-09-18T06:53:02Z

Thanks all for the input! I'm going to work on it.

emersonpriceiv · 2021-06-04T20:47:20Z

Hello! I'm curious what became of this work. We're currently going through the process of updating Racecar and we've been leveraging the consumer heartbeat instrumentation for monitoring our consumer health. Are there any plans to implement something similar? If not we would love to see it!

mensfeld · 2021-06-05T08:07:37Z

@emersonpriceiv the current API allows you to do that. Please see the PR above for waterdrop, where there's full instrumentation support.

thijsc · 2023-04-24T19:26:03Z

Closing this one. I think it's not clear how we can improve on rdkafka's internal capabilities.

JHK mentioned this issue Nov 14, 2018

Change ruby-kafka to rdkafka-ruby zendesk/racecar#97

Merged

4 tasks

This was referenced Nov 30, 2018

Consumer rebalance callback #63

Closed

Message delivery callback #64

Closed

mensfeld mentioned this issue Aug 8, 2019

Migrate from ruby-kafka to rdkafka-ruby karafka/waterdrop#75

Closed

thijsc added the enhancement label Aug 15, 2019

thijsc added this to the Feature complete milestone Aug 15, 2019

thijsc self-assigned this Aug 15, 2019

fallwith mentioned this issue Feb 3, 2023

Instrument Kafka client calls made with rdkafka-ruby newrelic/newrelic-ruby-agent#1758

Open

thijsc closed this as completed Apr 24, 2023
