Initiate Coordinator Reconnect w/ Backoff from Heartbeat Thread #2695

dpkp · 2025-11-20T19:33:48Z

Attempts to fix the issue raised in #2672, and possibly also #2667.

If the coordinator node becomes unresponsive, the consumer can busy-loop until the next attempt to reconnect to the coordinator. This happens when there is no pending request to the coordinator node itself (either a heartbeat or a commit) at the moment the broker connection fails. If the consumer is configured with auto-commits, the next commit attempt will see the coordinator failure and mark the coordinator dead, and the consumer loop will revert to waiting for the coordinator to return. But if auto-commits are disabled, the heartbeat thread is left in a limbo state where it thinks the coordinator is connecting and pauses heartbeat requests. With neither a failed connection attempt nor a failed heartbeat request, the coordinator is never marked dead and the consumer busy-loops.

To fix this, the patch adds an explicit connection attempt to the heartbeat thread loop whenever the coordinator is known but disconnected. If the connection fails, the client marks the node as disconnected and adds a reconnect backoff delay. The non-zero backoff delay causes the consumer to mark the coordinator dead, which prevents the busy loop.
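The control flow described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the code from this patch: `ToyClient`, `ToyCoordinator`, and `heartbeat_step` are hypothetical names invented for illustration, connection attempts are assumed to succeed or fail immediately, and time is passed in explicitly so the behavior is deterministic.

```python
class ToyClient:
    """Hypothetical stand-in for a network client (not the library's real class)."""

    def __init__(self, reconnect_backoff_s=0.05):
        self.reconnect_backoff_s = reconnect_backoff_s
        self.reachable = set()    # nodes that accept connections in this toy model
        self.connected = set()    # nodes with an established connection
        self._next_attempt = {}   # node_id -> earliest time of the next attempt

    def is_disconnected(self, node_id):
        return node_id not in self.connected

    def connection_delay(self, node_id, now):
        # Seconds remaining before another connection attempt is allowed.
        return max(0.0, self._next_attempt.get(node_id, 0.0) - now)

    def maybe_connect(self, node_id, now):
        if node_id in self.connected:
            return True
        if self.connection_delay(node_id, now) > 0:
            return False  # still backing off
        if node_id in self.reachable:
            self.connected.add(node_id)
            return True
        # Failed attempt: record a reconnect backoff delay for this node.
        self._next_attempt[node_id] = now + self.reconnect_backoff_s
        return False


class ToyCoordinator:
    """Hypothetical coordinator wrapper; coordinator_id=None means unknown/dead."""

    def __init__(self, client, coordinator_id):
        self.client = client
        self.coordinator_id = coordinator_id

    def coordinator_unknown(self):
        return self.coordinator_id is None

    def heartbeat_step(self, now):
        """One iteration of a simplified heartbeat loop; returns a status label."""
        if self.coordinator_unknown():
            return "unknown"
        node = self.coordinator_id
        if self.client.is_disconnected(node):
            # Explicit connection attempt: coordinator is known but disconnected.
            self.client.maybe_connect(node, now)
            if self.client.connection_delay(node, now) > 0:
                # A non-zero backoff delay means the attempt failed: mark the
                # coordinator dead so the consumer waits instead of busy-looping.
                self.coordinator_id = None
                return "marked_dead"
        return "ready"


if __name__ == "__main__":
    client = ToyClient()
    coordinator = ToyCoordinator(client, coordinator_id=1)
    print(coordinator.heartbeat_step(now=0.0))  # node 1 is unreachable here
```

In this model an unreachable coordinator is marked dead on the first heartbeat iteration, regardless of whether any commit request is ever sent. That mirrors the auto-commit-disabled case the description is concerned with.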

dpkp added 2 commits November 20, 2025 10:32

Only use coordinator time_to_next_poll if group_id is set

cb5ed1d

Connect coordinator from heartbeat thread if needed

061659b

dpkp mentioned this pull request Nov 20, 2025

Add a check to ensure broker connection is ready during poll if auto commit is disabled #2672

Closed

dpkp merged commit 8d38aa7 into master Nov 20, 2025
18 checks passed

dpkp deleted the dpkp/coordinator-reconnect branch November 20, 2025 19:55

dpkp mentioned this pull request Nov 20, 2025

Consumer randomly stucks and timeouts when communicating with confluent cloud #2667

Closed

dpkp added a commit that referenced this pull request Nov 20, 2025

Initiate Coordinator Reconnect w/ Backoff from Heartbeat Thread (#2695)

3250395
