feat(consumer): warn on sustained partition retries #3535

Merged
dnwe merged 1 commit into main from dnwe/stuck-consumer-warning
May 14, 2026

Collaborator

@dnwe dnwe commented May 13, 2026

After a broker disconnect a partition consumer can sit retrying indefinitely. Each attempt's error is logged but nothing tells the operator the same partition is still failing ten retries later, so it's hard to tell a transient blip from a stuck partition.

Bump child.retries on every backoff (it was only incremented when a BackoffFunc was configured) and log every 10 consecutive failures.
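
The counting behaviour described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `retryTracker`, `recordFailure`, `reset`, and `warnEvery` are hypothetical names, and resetting the counter once consumption resumes is an assumption drawn from the word "consecutive".

```go
package main

import "fmt"

// warnEvery mirrors the "every 10 consecutive failures" cadence from the
// description. The constant name is hypothetical.
const warnEvery = 10

// retryTracker is a hypothetical stand-in for per-partition retry state.
// It is not Sarama's actual type.
type retryTracker struct {
	retries int
}

// recordFailure counts one more consecutive failure, unconditionally, and
// reports whether a warning is due on this attempt.
func (t *retryTracker) recordFailure() bool {
	t.retries++
	return t.retries%warnEvery == 0
}

// reset clears the count. Assumption: this happens once the partition is
// consuming again, so only consecutive failures are counted.
func (t *retryTracker) reset() {
	t.retries = 0
}

func main() {
	t := &retryTracker{}
	for attempt := 1; attempt <= 25; attempt++ {
		if t.recordFailure() {
			fmt.Printf("partition still failing after %d consecutive retries\n", t.retries)
		}
	}
}
```

With 25 simulated failures the sketch warns on the 10th and 20th attempts. The point of incrementing unconditionally is that the warning no longer depends on whether a BackoffFunc is configured.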

Also note on ConsumeClaim that handlers should return when the session context is done; reading Messages() alone hangs while a partition keeps retrying.
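
A minimal, standard-library-only sketch of the handler shape this note recommends. It deliberately avoids importing Sarama so it runs on its own; `consume` and `runDemo` are hypothetical helpers. In a real ConsumeClaim the context would presumably be `session.Context()` and the channel `claim.Messages()`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// consume handles messages until the channel closes or ctx is cancelled,
// and returns how many messages were handled.
func consume(ctx context.Context, messages <-chan string, handle func(string)) int {
	handled := 0
	for {
		select {
		case msg, ok := <-messages:
			if !ok {
				return handled // channel closed: the claim is over
			}
			handle(msg)
			handled++
		case <-ctx.Done():
			return handled // session ended: return promptly
		}
	}
}

// runDemo simulates a partition that delivers two messages and then goes
// silent (the channel stays open), while the session ends shortly after.
func runDemo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	messages := make(chan string, 2)
	messages <- "a"
	messages <- "b"
	// Never closed on purpose: a bare `for msg := range messages` would
	// block forever at this point.

	return consume(ctx, messages, func(string) {})
}

func main() {
	fmt.Println("handled before session end:", runDemo())
}
```

The `select` is what lets the handler exit while the message channel is open but idle, which is the situation a retrying partition creates.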


We've had a few users complain about scenarios like this: "I get EOF errors and then my consumers go silent and stop consuming any message until I restart my whole app". This change tries to make it clearer when that is happening.

Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>
@dnwe dnwe added the feat label May 14, 2026
Collaborator

@hindessm hindessm left a comment

LGTM

@dnwe dnwe merged commit e76f927 into main May 14, 2026
23 of 24 checks passed
@dnwe dnwe deleted the dnwe/stuck-consumer-warning branch May 14, 2026 08:36