Consumer overruns its session timeout, drops out of the group and never rejoins--stuck in zombie state. #1435
Comments
Can you enable debug logs and see if anything sticks out?
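For anyone following along, here is a minimal sketch of one way to turn on debug output using Python's standard `logging` module. It assumes kafka-python emits its records under the `kafka` logger namespace; the helper name `enable_kafka_debug_logging` and the format string are illustrative choices, not part of the library.

```python
import logging


def enable_kafka_debug_logging():
    """Route DEBUG-level records from the 'kafka' logger hierarchy to stderr.

    Assumption: the client library logs under names starting with 'kafka'.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    kafka_logger = logging.getLogger("kafka")
    kafka_logger.setLevel(logging.DEBUG)
    kafka_logger.addHandler(handler)
    return kafka_logger


kafka_logger = enable_kafka_debug_logging()
```

Scoping the level to the `kafka` logger rather than the root logger keeps the application's own log volume unchanged while the issue is being investigated.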
Working on getting that turned on fully now. At the same time, trying to replicate in a development environment.
@dpkp Looking into whether I have something misconfigured here. Looking at the kafka-python source now, but it appears that I am reaching [...]. I don't have full depth on this check yet, though, so I am just speculating.
^ Forgot to mention that the remaining logs are just some metadata requests and then idle connection cleanup.
Trying to separate our own implementation details from kafka-python. We built a wrapper around kafka-python to standardize how developers interact with the library. Historically we ran consumers by running [...]. I am going to try to sanitize the logs before the snippet I provided. So far I am seeing proper fetching and committing of offsets until the very end, where I see no new work, but I am not seeing a reason why yet.
This log [...]. It is also possible that there is a bug in the kafka-python implementation that is triggering a heartbeat poll expiration too early, but I haven't noticed anything yet myself.
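To make the suspected mechanism concrete, here is a simplified sketch of a poll-interval expiration check. This is not kafka-python's code: `PollTimer`, its methods, and the `max_poll_interval_ms` argument are hypothetical names used for illustration. The only assumption is that the client compares the time elapsed since the last `poll()` call against a configured maximum interval and treats the consumer as stalled once that interval is exceeded.

```python
import time


class PollTimer:
    """Hypothetical helper: tracks how long it has been since the last poll."""

    def __init__(self, max_poll_interval_ms, clock=time.monotonic):
        self.max_poll_interval_ms = max_poll_interval_ms
        self._clock = clock          # injectable so the check can be tested
        self._last_poll = clock()

    def record_poll(self):
        """Call this each time the application polls for records."""
        self._last_poll = self._clock()

    def poll_timeout_expired(self):
        """Return True once the gap since the last poll exceeds the limit."""
        elapsed_ms = (self._clock() - self._last_poll) * 1000
        return elapsed_ms > self.max_poll_interval_ms
```

If a check of this kind fired too early, or if nothing triggered a rejoin after it fired, the result would look like what is described in this issue: a consumer that silently leaves the group and then sits idle.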
I set up a test outside of our production use for the past week and was not able to replicate the problem. I am going to try to get proper logging running in our production environment, and I might be able to at least narrow this down to either a code issue in our wrapper or something happening in kafka-python. @dpkp Even if we are not keeping up with [...]
This ticket was getting difficult to follow with people adding comments about issues that appeared unrelated (including several that were clearly user error), so I deleted some of those comments. Please file a new bug unless you are quite sure you are experiencing this particular bug. Other known sources of stuckness in [...]. As best I can tell, this particular bug is different than both of those.
Try setting [...]
We just upgraded to 1.4.1 and I am noticing new behavior. It looks like a consumer is dropping out of the group and not automatically rejoining.
In our logs the only similarity I have noticed is the following logs. Could be unrelated.
After that message the only other logs I see are idle connections getting closed.
The service we run via supervisor just sits there and the consumers are no longer part of the group.
I am still digging through the behavior of kafka-python to understand why we are dropping out of the consumer group. I am going to poke around on the assumption that the worker is leaving the group and not rejoining. Will update when I see something.
This may be related to #1418, but I am not sure based on the user's input.