Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix high cpu sasl loop after disconnect #2032

Conversation

stefanseufert
Copy link
Contributor

Should fix #1937

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! Needs some minor tweaks

src/rdkafka_transport.c Outdated Show resolved Hide resolved
src/rdkafka_transport.c Outdated Show resolved Hide resolved
src/rdkafka_transport.c Outdated Show resolved Hide resolved
…L response

Currently disconnects went unnoticed and forced the client into a tight loop while waiting for an answer on a closed socket. This resulted in high CPU usage in the client and the inability to reconnect.
@stefanseufert stefanseufert force-pushed the FixHighCpuSaslLoopAfterDisconnect branch from 308b560 to 11ba647 Compare October 1, 2018 20:45
@stefanseufert
Copy link
Contributor Author

Rebased and squashed fixes into original commit as per contributing guidelines

@stefanseufert
Copy link
Contributor Author

stefanseufert commented Oct 2, 2018

@edenhill anything else you want me to do here? According to your comments in various threads, 11.6 seems to be close and I really hope to see this included.

@edenhill
Copy link
Contributor

edenhill commented Oct 2, 2018

0.11.6 would be a stretch, it is way past its deadline and needs to get out the door.
But we're doing 1.0.0 release candidates very soon, which would be a better fit for this.

@stefanseufert
Copy link
Contributor Author

stefanseufert commented Oct 2, 2018

This is a really nasty bug for everyone using SASL auth. A restart of any broker node will immediately make the clients go 100% CPU per thread. I've seen boxes logging CPU loads in excess of 350 due to this issue, making them completely inaccessible. Since this also affects producers it will result in message loss (on the consumer side it "only" prevents the affected threads from receiving messages). There are no known workarounds and there is no way to detect the problem in the calling code. As far as I can see all platforms are affected. Not sure how much worse a bug could be?

The only reason, why this hasn't caught much more attention seems to be the fact that auth isn't used that commonly.

The fix on the other hand seems trivial ans will only affect those people using auth.

... but it certainly is your call.

@edenhill
Copy link
Contributor

edenhill commented Oct 2, 2018

@stefanseufert Valid points, I'll get back to you on this tomorrow.

@edenhill edenhill merged commit ab071f3 into confluentinc:master Oct 3, 2018
@edenhill
Copy link
Contributor

edenhill commented Oct 3, 2018

Thanks alot for this!
(yes, it will go into 0.11.6 ;) )

@stefanseufert stefanseufert deleted the FixHighCpuSaslLoopAfterDisconnect branch October 3, 2018 07:37
edenhill pushed a commit that referenced this pull request Oct 4, 2018
…L response (#2032)

Currently disconnects went unnoticed and forced the client into a tight loop while waiting for an answer on a closed socket. This resulted in high CPU usage in the client and the inability to reconnect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SASL IO loop does not detect disconnects on Windows
2 participants