rebalance.timeout.ms support (KIP-62) #1039

jeffwidman · 2017-02-07T19:45:44Z

Does librdkafka support heartbeats in a background thread? (KIP-62)

Trying to minimize risk of a spinning consumer group if a message unexpectedly takes too long to process.

This landed in Kafka 0.10.1.0 as it required a protocol change to pass the rebalance timeout around.

edenhill · 2017-02-07T20:04:45Z

Yes, librdkafka does all control plane stuff in the background and the application doesn't need to worry.
But the app should try to keep its per-message processing time below session.timeout.ms; otherwise, if a rebalance happens while processing, another consumer might pick up the same message (depending on the commit policy).

jeffwidman · 2017-02-07T21:57:20Z

> Yes, librdkafka does all control plane stuff in the background and the application does need to worry.

Did you mean "doesn't" need to worry?

> But the app should try to limit its per-message processing time under session.timeout.ms, otherwise if a rebalance happens while processing, another consumer might pick up the same message

Hmm... According to KIP-62, it looks like the rebalance timeout is actually the new limit for per-message processing time. session.timeout.ms can be set much lower in the new design because it's a background heartbeat for catching crashed consumers, and it's fine if per-message processing takes longer than session.timeout.ms.
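A minimal C model of the distinction described above may help. It puts the two KIP-62 checks side by side: a liveness check against heartbeats (session.timeout.ms) and a progress check against the last poll (max.poll.interval.ms, which is the value sent as the rebalance timeout). Roughly speaking, the group coordinator enforces the session timeout while the client tracks the poll interval. The types and the member_check() function are hypothetical illustrations, not librdkafka or broker code.

```c
#include <stdint.h>

/* Hypothetical model of the two KIP-62 timeouts. Names are illustrative,
 * not librdkafka or broker internals. All times are in milliseconds. */
typedef enum {
    MEMBER_OK = 0,
    MEMBER_SESSION_EXPIRED, /* no heartbeat within session.timeout.ms */
    MEMBER_POLL_EXCEEDED    /* app has not polled within max.poll.interval.ms */
} member_state_t;

typedef struct {
    int64_t session_timeout_ms;   /* liveness: checked against heartbeats */
    int64_t max_poll_interval_ms; /* progress: also sent as rebalanceTimeoutMs */
} group_timeouts_t;

member_state_t member_check(const group_timeouts_t *t, int64_t now_ms,
                            int64_t last_heartbeat_ms, int64_t last_poll_ms) {
    /* Heartbeats come from a background thread, so they stay fresh while the
     * app is busy processing; only a crashed or hung process trips this. */
    if (now_ms - last_heartbeat_ms > t->session_timeout_ms)
        return MEMBER_SESSION_EXPIRED;
    /* Slow processing is bounded separately, by the poll interval. */
    if (now_ms - last_poll_ms > t->max_poll_interval_ms)
        return MEMBER_POLL_EXCEEDED;
    return MEMBER_OK;
}
```

With session.timeout.ms = 10000 and max.poll.interval.ms = 300000, a member that spends 60 seconds on one message but keeps heartbeating stays in the group, which is the behavior KIP-62 is meant to enable.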

Am I misreading the KIP?

edenhill · 2017-02-21T18:42:26Z

> Did you mean "doesn't" need to worry?

Yes :)

You are right about KIP-62. While librdkafka performs heartbeats in the background, which solves the initial problem, it does not yet support the KIP-62 protocol changes (the rebalance timeout / max processing time).
So people with long per-message processing times will still need to use a high, and therefore slow-to-react, session.timeout.ms.

jeffwidman · 2017-02-21T20:55:26Z

Thanks for the update. Looking forward to when support for KIP-62 / rebalance timeout is added.

pablasso · 2017-10-02T23:59:26Z

@edenhill I'm interested in tackling the implementation of KIP-62 but it will be a bit of a challenge without context.

Could you give me some pointers on what/where needs to be changed? Any tips on how to test this would be greatly appreciated.

edenhill · 2017-11-09T16:48:17Z

Since librdkafka already has a background thread (or a bunch) that takes care of all the actual broker communication, including heartbeats, there are only a couple of things that need to be done to support KIP-62:

* Add the max.poll.interval.ms config property. Trivial.
* Send rebalanceTimeoutMs in JoinGroupRequest v1, using the value of max.poll.interval.ms. Trivial.
* Enforce max.poll.interval.ms. This is not as straightforward as in Java, which only has a single poll() call. librdkafka has multiple APIs to poll for messages (for different use cases), and they can be used simultaneously from different threads, so it is not really clear whether the max poll interval is satisfied by a poll from any user thread or only by polls from all of them. Also, for bindings like confluent-kafka-go that pull messages from librdkafka and put them in a buffered Go channel (where they may sit for some time before the app processes them), should we use the time the messages were pulled from librdkafka, or the time they were handed to the application? (This is analogous to the auto offset store problem.)
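The third item is the tricky one. As a sketch of just one possible answer to the "any thread or all threads" question, the tracker below (hypothetical names, not librdkafka's actual implementation) treats a poll from any thread as progress: every consume entry point records a timestamp, and a periodic background check reports when the interval has been exceeded. Timestamps are passed in by the caller so the logic can be exercised without a real clock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for one policy: ANY poll from ANY application
 * thread resets the max.poll.interval.ms timer. Times are in ms. */
typedef struct {
    _Atomic int64_t last_poll_ms;
    int64_t max_poll_interval_ms;
} poll_tracker_t;

void poll_tracker_init(poll_tracker_t *pt, int64_t max_poll_interval_ms,
                       int64_t now_ms) {
    atomic_init(&pt->last_poll_ms, now_ms);
    pt->max_poll_interval_ms = max_poll_interval_ms;
}

/* Called from every consume/poll entry point, on whichever thread. */
void poll_tracker_on_poll(poll_tracker_t *pt, int64_t now_ms) {
    int64_t prev = atomic_load(&pt->last_poll_ms);
    /* Only move forward, so a late store from a slower thread cannot
     * rewind the timer. On CAS failure, prev is reloaded and rechecked. */
    while (prev < now_ms &&
           !atomic_compare_exchange_weak(&pt->last_poll_ms, &prev, now_ms))
        ;
}

/* Called periodically from the background thread; true means the member
 * should leave the group so its partitions can be reassigned. */
bool poll_tracker_exceeded(poll_tracker_t *pt, int64_t now_ms) {
    return now_ms - atomic_load(&pt->last_poll_ms) > pt->max_poll_interval_ms;
}
```

Under this policy, a binding such as confluent-kafka-go that prefetches into a channel would reset the timer when it pulls from librdkafka, not when the application reads from the channel, which is exactly the ambiguity raised above. An "all threads" policy would instead need one timestamp per polling thread and would check the oldest one.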

edenhill · 2018-10-13T23:43:02Z

This is scheduled for v1.0.0

Changed defaults:
* session.timeout.ms = 10000

edenhill · 2018-11-29T15:47:28Z

Now on master

edenhill added the question label Feb 7, 2017

edenhill added conformance enhancement and removed question labels Feb 21, 2017

edenhill changed the title ~~Does librdkafka support heartbeats in a background thread? (KIP-62)~~ rebalance.timeout.ms support (KIP-62) Mar 8, 2017

edenhill added this to the 0.9.5 milestone Mar 8, 2017

edenhill removed this from the next feature milestone May 18, 2017

edenhill mentioned this issue Aug 17, 2017

What's mean about "1 request(s) timed out: disconnect (average rtt 0.207ms)"? #1374

Closed

edenhill mentioned this issue May 23, 2018

Question: Consumer group behavior related to session.timeout.ms and offset store API #1817

Closed


edenhill added a commit that referenced this issue Oct 13, 2018

Added max.poll.interval.ms (KIP-62, #1039)

026bef0

Changed defaults:
* session.timeout.ms = 10000

edenhill added a commit that referenced this issue Oct 22, 2018

Added max.poll.interval.ms (KIP-62, #1039)

0a981c5

Changed defaults:
* session.timeout.ms = 10000

edenhill added a commit that referenced this issue Oct 22, 2018

Added max.poll.interval.ms (KIP-62, #1039)

2c5c620

Changed defaults:
* session.timeout.ms = 10000

edenhill closed this as completed Nov 29, 2018

nick-zh mentioned this issue Dec 11, 2020

Does php-rdkafka support heartbeats in a background thread? (KIP-62) arnaud-lb/php-rdkafka#408

Closed
