Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connect concurrency, can cause connection nodes to close #6964

Closed

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented Jul 22, 2014

Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.

try {
nodeChannels.close();
} finally {
logger.debug("disconnected from [{}]", node);
logger.debug("disconnected from [{}] due to explicit disconnect call", node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this above the nodeChannels.close() so it will be logged before an eventual consequence.

@bleskes
Copy link
Contributor

bleskes commented Jul 22, 2014

Left two minor comments. LGTM.

Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes elastic#6964
@kimchy kimchy closed this in 88f3afe Jul 22, 2014
kimchy added a commit that referenced this pull request Jul 22, 2014
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964
@jpountz jpountz removed the review label Jul 24, 2014
@kimchy kimchy deleted the transport_connect_can_cause_disconnect branch August 18, 2014 22:18
@clintongormley clintongormley changed the title Fix connect concurrency, can cause connection nodes to close Internal: Fix connect concurrency, can cause connection nodes to close Sep 8, 2014
@andreasch
Copy link

Would this issue cause intermittent exceptions with the following messages?

org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes were available
or
org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available

We are running ES 1.3.2 and seeing intermittent cases where we get these 2 exceptions on the JVMs that run the transport client even though there is no load on the cluster or any sort of networking issues between the JVM running the transport client and cluster.

@bleskes
Copy link
Contributor

bleskes commented Mar 24, 2015

@andreasch I think it might, if you only have one node configured in your transport client? also if you turn on debugging logging it should see some complaints about node not connected.

@andreasch
Copy link

@bleskes
We do only have one node in our transport client (points to a VIP/load balancer).

"Node not connected" seems to be the root exception message for all of our exceptions with the message "org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available"

@bleskes
Copy link
Contributor

bleskes commented Mar 24, 2015

Ok. sounds plausible then. But as these things go - it's hard to be sure..

@clintongormley clintongormley changed the title Internal: Fix connect concurrency, can cause connection nodes to close Fix connect concurrency, can cause connection nodes to close Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants