Fixed race condition on client reconnection logic #221
@saandrews @rdhabalia We should consider backporting this to the 1.16 branch as well. There is no need to release 1.16.1 immediately, but the fix should be there, ready.
👍
cleanupConnection(address, connectionKey, cnxFuture);
});

// We are connected to broker, but need to wait until the connect/connected handshake is
// complete
final ClientCnx cnx = (ClientCnx) future.channel().pipeline().get("handler");
if (!future.channel().isActive() || cnx == null) {
future.isSuccess() == true implies that future.channel() is active. That's guaranteed, right? It should be, but I just wanted to confirm.
The problem is that the connection was successfully opened, so the future itself is completed, but the broker then closed it immediately (that's what the test reproduces).
So the future stays set to success (a future cannot change its status once it has completed), but isActive() now returns false and the "handler" has also been removed from the channel pipeline, causing the NPE when accessing cnx.
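The situation described above can be reproduced in isolation with a minimal sketch. It uses `java.util.concurrent.CompletableFuture` and a hypothetical `FakeChannel` class as stand-ins for the Netty future and channel, so none of the names below are the actual client code; the point is only that a completed future keeps reporting success after the underlying channel has been closed.

```java
import java.util.concurrent.CompletableFuture;

public class StaleFutureDemo {
    /** Hypothetical stand-in for a network channel; not the real Netty API. */
    public static final class FakeChannel {
        private volatile boolean active = true;
        private volatile Object handler = new Object();

        public boolean isActive() { return active; }
        public Object getHandler() { return handler; }

        /** Simulates the broker closing the connection right after it opened. */
        public void closeByPeer() {
            active = false;
            handler = null; // assumption: the handler is dropped from the pipeline on close
        }
    }

    /** Returns true only if the channel is still usable at the time of the call. */
    public static boolean isUsable(CompletableFuture<FakeChannel> connectFuture) {
        if (!connectFuture.isDone() || connectFuture.isCompletedExceptionally()) {
            return false;
        }
        FakeChannel channel = connectFuture.join();
        // Double check: a successful future says nothing about the current channel state.
        return channel.isActive() && channel.getHandler() != null;
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel();
        CompletableFuture<FakeChannel> connectFuture = CompletableFuture.completedFuture(channel);
        System.out.println("usable before close: " + isUsable(connectFuture));

        channel.closeByPeer();
        // The future is still completed normally, yet the channel is no longer usable.
        System.out.println("future done: " + connectFuture.isDone());
        System.out.println("usable after close: " + isUsable(connectFuture));
    }
}
```

Checking only the future's status would report the connection as healthy in the second case, which is why the patched condition also looks at the channel and the handler.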
👍
Yes, we will create a branch and add this patch to it. We can wait for some time before releasing if we want to include other patches as well.
Merged in branch-1.16 at 956430d
…che#221)
* Auto update the client to handle changes in number of partitions
* Fixed linter stuff
* Fixed locking of producer/consumer list when updating partitions
* Keep the lock during the partition update operation
* Removed empty line
* Fixed locking in producer.send()
Motivation
There is a race condition in the reconnection logic that can throw an uncaught NullPointerException if the connection is closed and cleaned up at a very specific point in the connection phase.
Fixes #207
Modifications
Double-check the connection state when reacting to a new connection being opened in the connection pool.
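As a rough illustration of that modification, the sketch below shows a pool that re-checks the connection state inside the "connection opened" callback and evicts the entry when the connection is already gone. The class `PoolDoubleCheckSketch`, the `Cnx` type, and the methods `register`, `onChannelOpened`, and `isPooled` are all hypothetical names chosen for this example; they are not taken from the patch, which works against the Netty channel and `ClientCnx` directly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PoolDoubleCheckSketch {
    /** Hypothetical connection object standing in for the client connection handler. */
    public static final class Cnx {
        private volatile boolean active = true;
        public boolean isActive() { return active; }
        public void close() { active = false; }
    }

    private final ConcurrentHashMap<String, CompletableFuture<Cnx>> pool = new ConcurrentHashMap<>();

    /** Registers (or reuses) the pending connection future for an address. */
    public CompletableFuture<Cnx> register(String address) {
        return pool.computeIfAbsent(address, key -> new CompletableFuture<>());
    }

    /** Callback invoked once the transport reports that the channel was opened. */
    public void onChannelOpened(String address, Cnx cnx) {
        CompletableFuture<Cnx> cnxFuture = pool.get(address);
        if (cnxFuture == null) {
            return; // entry was already cleaned up
        }
        // Double check: the channel may have been closed between "opened" and now.
        if (cnx == null || !cnx.isActive()) {
            cnxFuture.completeExceptionally(new IllegalStateException("Connection already closed"));
            pool.remove(address, cnxFuture); // evict so the next caller reconnects
            return;
        }
        cnxFuture.complete(cnx);
    }

    public boolean isPooled(String address) {
        return pool.containsKey(address);
    }

    public static void main(String[] args) {
        PoolDoubleCheckSketch pool = new PoolDoubleCheckSketch();
        CompletableFuture<Cnx> pending = pool.register("broker-1");
        Cnx stale = new Cnx();
        stale.close(); // simulate the broker closing the connection immediately
        pool.onChannelOpened("broker-1", stale);
        System.out.println("failed: " + pending.isCompletedExceptionally());
        System.out.println("still pooled: " + pool.isPooled("broker-1"));
    }
}
```

Failing the future explicitly, instead of dereferencing a missing handler, lets callers observe an ordinary connection error and retry.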