Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed race condition on client reconnection logic #221

Merged
merged 1 commit into from
Feb 17, 2017

Conversation

merlimat
Copy link
Contributor

@merlimat merlimat commented Feb 17, 2017

Motivation

There is a race condition in the reconnection logic that would make throwing an uncatched NullPointerException if the connection is being closed and cleaned up at a very specific point in the connection phase.

Fixes #207

Modifications

Double check the connection state when reacting on new connection being opened in the connection pool.

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Feb 17, 2017
@merlimat merlimat added this to the 1.17 milestone Feb 17, 2017
@merlimat merlimat self-assigned this Feb 17, 2017
@merlimat
Copy link
Contributor Author

@saandrews @rdhabalia We should consider to backport this to 1.16 branch as well. No need to release immediately 1.16.1, but to keep it there ready.

@saandrews
Copy link
Contributor

👍

cleanupConnection(address, connectionKey, cnxFuture);
});

// We are connected to broker, but need to wait until the connect/connected handshake is
// complete
final ClientCnx cnx = (ClientCnx) future.channel().pipeline().get("handler");
if (!future.channel().isActive() || cnx == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future.isSuccess() = true implies that future.channel() is active. That's guaranteed right? It should be but just wanted to confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the connection was actually successfully open, thus the future itself is completed, but then immediately closed by the broker (that's what the test is reproducing).

So the future still stays set to success (future cannot change their status once they flip), but the isActive() now returns false, and the "handler" is also removed from the channel pipeline, causing the NPE when accessing cnx.

Copy link
Contributor

@rdhabalia rdhabalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rdhabalia
Copy link
Contributor

We should consider to backport this to 1.16 branch as well. No need to release immediately 1.16.1, but to keep it there ready

Yes, we will create a branch and will add this patch and we can wait for sometime before releasing if we want to include other patches as well.

@merlimat merlimat merged commit 7284ae8 into apache:master Feb 17, 2017
@merlimat
Copy link
Contributor Author

Merged in branch-1.16 at 956430d

sijie pushed a commit to sijie/pulsar that referenced this pull request Mar 4, 2018
hrsakai pushed a commit to hrsakai/pulsar that referenced this pull request Dec 10, 2020
…che#221)

* Auto update the client to handle changes in number of partitions

* Fixed linter stuff

* Fixed locking of producer/consumer list when updating partitions

* Keep the lock during the partition update operation

* Removed empty line

* Fixed locking in producer.send()
hangc0276 pushed a commit to hangc0276/pulsar that referenced this pull request May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent test failures in ClientErrorsTest.testConsumerReconnect
3 participants