Only retry join when other node is not (yet) a master #8972
Closed
bleskes wants to merge 2 commits into elastic:master from bleskes:zen_disco_only_retry_on_remote_exception
When a node tries to join a master, the master may not yet be ready to accept the join request. In such cases we retry sending the join request up to 3 times before going back to ping. To detect this, the current logic uses `ExceptionsHelper.unwrapCause(t)` to unwrap the incoming `RemoteTransportException` and inspect its source, looking for `ElasticsearchIllegalStateException`. However, a local `ElasticsearchIllegalStateException` can also be thrown when the join process should be cancelled (e.g., on node shutdown). In this case we shouldn't retry.
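To make the problem concrete, here is a minimal Java sketch of the pre-fix decision. The types below are simplified stand-ins, not the real Elasticsearch classes: `WrapperException` is assumed to play the role of the wrapper marker that `ExceptionsHelper.unwrapCause` strips, and `OldJoinRetryCheck` is a hypothetical name. Because the check only looks at the unwrapped cause, it cannot tell a remote "not ready" failure from a local cancellation.

```java
// Simplified stand-ins for illustration only; not the real Elasticsearch classes.
interface WrapperException {}

class RemoteTransportException extends RuntimeException implements WrapperException {
    RemoteTransportException(String msg, Throwable cause) { super(msg, cause); }
}

class ElasticsearchIllegalStateException extends RuntimeException {
    ElasticsearchIllegalStateException(String msg) { super(msg); }
}

class OldJoinRetryCheck {
    // Mirrors the idea of ExceptionsHelper.unwrapCause: strip wrapper exceptions
    // (bounded, to guard against cause cycles).
    static Throwable unwrapCause(Throwable t) {
        Throwable result = t;
        for (int depth = 0; depth < 10 && result instanceof WrapperException && result.getCause() != null; depth++) {
            result = result.getCause();
        }
        return result;
    }

    // Pre-fix rule: retry whenever the unwrapped cause is an illegal-state exception,
    // regardless of whether it was thrown by the remote node or locally.
    static boolean shouldRetry(Throwable t) {
        return unwrapCause(t) instanceof ElasticsearchIllegalStateException;
    }
}
```

With this rule, `shouldRetry` returns `true` both for a `RemoteTransportException` wrapping an `ElasticsearchIllegalStateException` and for a bare, locally thrown `ElasticsearchIllegalStateException`, which is the unwanted retry described above.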
bleskes force-pushed the zen_disco_only_retry_on_remote_exception branch from 4af41e7 to d7fad69 on December 16, 2014 21:50
bleskes force-pushed the zen_disco_only_retry_on_remote_exception branch from d7fad69 to de2bb69 on December 16, 2014 21:52
@kimchy I updated the PR to use an explicit exception and also cleaned up some unused exceptions. I'll update the PR description and commit message once the review is done.
LGTM
bleskes changed the title from "Discovery: only retry join for remote exceptions" to "Discovery: only retry join when other node is not (yet) a master" on Dec 16, 2014
bleskes added a commit to bleskes/elasticsearch that referenced this pull request on Dec 16, 2014:
When a node tries to join a master, the master may not yet be ready to accept the join request. In such cases we retry sending the join request up to 3 times before going back to ping. To detect this, the current logic uses `ExceptionsHelper.unwrapCause(t)` to unwrap the incoming `RemoteTransportException` and inspect its source, looking for `ElasticsearchIllegalStateException`. However, a local `ElasticsearchIllegalStateException` can also be thrown when the join process should be cancelled (e.g., on node shutdown). In this case we shouldn't retry. Since we can't introduce new exceptions in a backwards-compatible (BWC) manner, we are forced to check the message of the exception. Relates to elastic#8972
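The message-based check can be sketched roughly as follows, again with simplified stand-in classes. The marker string `"not master"` and the class name `BwcJoinRetryCheck` are hypothetical; the exact message text the commit matches is not shown here. The sketch also assumes the failure must arrive wrapped in a `RemoteTransportException`, so a locally thrown exception (such as one raised on shutdown) is never retried.

```java
// Simplified stand-ins for illustration only; not the real Elasticsearch classes.
interface WrapperException {}

class RemoteTransportException extends RuntimeException implements WrapperException {
    RemoteTransportException(String msg, Throwable cause) { super(msg, cause); }
}

class ElasticsearchIllegalStateException extends RuntimeException {
    ElasticsearchIllegalStateException(String msg) { super(msg); }
}

class BwcJoinRetryCheck {
    // Hypothetical marker text; the real message used by the commit is not shown in this PR.
    static final String NOT_MASTER_MARKER = "not master";

    static boolean shouldRetry(Throwable t) {
        // Only failures reported by the remote node qualify; local ones (e.g. shutdown) do not.
        if (!(t instanceof RemoteTransportException)) {
            return false;
        }
        // Strip wrapper exceptions (bounded, to guard against cause cycles).
        Throwable cause = t;
        for (int depth = 0; depth < 10 && cause instanceof WrapperException && cause.getCause() != null; depth++) {
            cause = cause.getCause();
        }
        // Without a dedicated exception type, the message is the only distinguishing signal.
        String msg = cause.getMessage();
        return cause instanceof ElasticsearchIllegalStateException
                && msg != null
                && msg.contains(NOT_MASTER_MARKER);
    }
}
```

Matching on message text is brittle: any rewording of the message silently disables the retry. That is the trade-off accepted in exchange for not introducing a new exception type.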
bleskes added a commit that referenced this pull request on Dec 16, 2014:
When a node tries to join a master, the master may not yet be ready to accept the join request. In such cases we retry sending the join request up to 3 times before going back to ping. To detect this, the current logic uses `ExceptionsHelper.unwrapCause(t)` to unwrap the incoming `RemoteTransportException` and inspect its source, looking for `ElasticsearchIllegalStateException`. However, a local `ElasticsearchIllegalStateException` can also be thrown when the join process should be cancelled (e.g., on node shutdown). In this case we shouldn't retry. Since we can't introduce new exceptions in a backwards-compatible (BWC) manner, we are forced to check the message of the exception. Relates to #8972. Closes #8979
clintongormley added the :Distributed/Discovery-Plugins label on Jun 7, 2015
clintongormley changed the title from "Discovery: only retry join when other node is not (yet) a master" to "Only retry join when other node is not (yet) a master" on Jun 7, 2015
Labels
>bug
:Distributed/Discovery-Plugins (Anything related to our integration plugins with EC2, GCP and Azure)
resiliency
v2.0.0-beta1
When a node tries to join a master, the master may not yet be ready to accept the join request. In such cases we retry sending the join request up to 3 times before going back to ping. To detect this, the current logic uses `ExceptionsHelper.unwrapCause(t)` to unwrap the incoming `RemoteTransportException` and inspect its source, looking for `ElasticsearchIllegalStateException`. However, a local `ElasticsearchIllegalStateException` can also be thrown when the join process should be cancelled (e.g., on node shutdown). In this case we shouldn't retry.

The PR adds an explicit `NotMasterException` to indicate that the remote node is not a master. A similarly named exception in the master fault detection code, which means something else, was given a better name. The PR also cleans up some other exceptions while at it.

See http://build-us-00.elasticsearch.org/job/es_g1gc_master_metal/499/testReport/junit/org.elasticsearch.discovery.zen/ZenDiscoveryTests/testNodeFailuresAreProcessedOnce/ for a test that gets confused by the extra join attempt.
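With a dedicated exception type, the retry decision no longer depends on message text. The sketch below uses a stand-in `NotMasterException` and a hypothetical `joinWithRetry` helper that takes the send operation as a `Runnable`. It reads "up to 3 times" as three attempts in total, which is one possible interpretation, and it omits any delay a real implementation would likely insert between attempts.

```java
// Simplified stand-ins for illustration only; not the real Elasticsearch classes.
interface WrapperException {}

class RemoteTransportException extends RuntimeException implements WrapperException {
    RemoteTransportException(String msg, Throwable cause) { super(msg, cause); }
}

class NotMasterException extends RuntimeException {
    NotMasterException(String msg) { super(msg); }
}

class JoinRetrySketch {
    // Assumption: "up to 3 times" is read as three attempts in total.
    static final int MAX_JOIN_ATTEMPTS = 3;

    // Strips wrapper exceptions (bounded, to guard against cause cycles).
    static Throwable unwrap(Throwable t) {
        Throwable result = t;
        for (int depth = 0; depth < 10 && result instanceof WrapperException && result.getCause() != null; depth++) {
            result = result.getCause();
        }
        return result;
    }

    /** Returns true if the join succeeded, false if the caller should go back to pinging. */
    static boolean joinWithRetry(Runnable sendJoinRequest) {
        for (int attempt = 1; ; attempt++) {
            try {
                sendJoinRequest.run();
                return true;
            } catch (RuntimeException e) {
                // Only an explicit "not (yet) a master" answer justifies another attempt.
                boolean notMasterYet = unwrap(e) instanceof NotMasterException;
                if (notMasterYet && attempt < MAX_JOIN_ATTEMPTS) {
                    continue;
                }
                // Any other failure (e.g. a local cancellation), or attempts exhausted.
                return false;
            }
        }
    }
}
```

In this sketch a local `IllegalStateException` (for example, from a shutdown) ends the loop after the first attempt, while a remote `NotMasterException` is retried until the attempt limit is reached and the caller falls back to pinging.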