GEODE-8473: Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause #5491
Merged
bschuchardt merged 1 commit into apache:develop on Sep 16, 2020
…establish a cancellation cause

Ensure that the cache is informed of a forced-disconnect in the DisconnectThread. This is a follow-on commit to GEODE-8467, which ensured that the DisconnectThread is launched in the presence of cache XML generation failure. This commit adds a try/catch in GMSMembership.uncleanShutdown() to ensure that the upstream ClusterDistributionManager is informed of the failure so it can set the "rootCause" in its CancelCriterion. ReplyProcessor21 and other objects that poll for this "rootCause" will then be released from waiting for responses to messages sent to other members of the cluster.
echobravopapa approved these changes Sep 2, 2020
Bill approved these changes Sep 10, 2020
        .isEqualTo(expectedException);
    verify(listener).membershipFailure(isA(String.class), isA(Throwable.class));
  }

      listener.membershipFailure(reason, e);
    } catch (RuntimeException re) {
      logger.warn("Exception caught while shutting down", re);
    }
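The fragments above show only the tail of the change, so the following is a minimal sketch of the pattern they suggest, not the actual GMSMembership code. It assumes that an earlier shutdown step can throw and that each step is guarded separately so the listener is still notified. The class name, the FailureListener interface, and the cleanup Runnable are hypothetical; only the membershipFailure(reason, e) call shape and the log message come from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class UncleanShutdownSketch {
  private static final Logger logger = Logger.getLogger("UncleanShutdownSketch");

  /** Hypothetical listener; only the membershipFailure(reason, cause) shape is taken from the diff. */
  interface FailureListener {
    void membershipFailure(String reason, Throwable cause);
  }

  /** Runs a cleanup step that may throw, then notifies the listener regardless of the outcome. */
  static void uncleanShutdown(Runnable cleanup, FailureListener listener,
      String reason, Exception e) {
    try {
      cleanup.run(); // assumption: stands in for whatever shutdown work precedes the notification
    } catch (RuntimeException re) {
      logger.log(Level.WARNING, "Exception caught while shutting down", re);
    }
    try {
      listener.membershipFailure(reason, e); // the upstream component learns the cause here
    } catch (RuntimeException re) {
      logger.log(Level.WARNING, "Exception caught while shutting down", re);
    }
  }

  /** Returns the causes the listener received when the cleanup step fails. */
  static List<Throwable> notifyDespiteFailingCleanup(Exception cause) {
    List<Throwable> received = new ArrayList<>();
    uncleanShutdown(
        () -> {
          throw new IllegalStateException("simulated cleanup failure");
        },
        (reason, c) -> received.add(c),
        "forced disconnect",
        cause);
    return received;
  }

  public static void main(String[] args) {
    List<Throwable> received = notifyDespiteFailingCleanup(new Exception("membership failure"));
    System.out.println("listener notified " + received.size() + " time(s)");
  }
}
```

Guarding the steps independently means a failure in one cannot prevent the notification, which is the property the unit test fragment checks with verify(listener).membershipFailure(...).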
mkevo pushed a commit to Nordix/geode that referenced this pull request Mar 19, 2021
ReplyProcessor21 will not stop waiting for responses to a message during a forced disconnect unless ClusterDistributionManager is informed of the disconnect. Once informed, ClusterDistributionManager sets a rootCause in its CancelCriterion, which ReplyProcessor21's StoppableCountDownLatch polls while it waits.
This commit ensures that ClusterDistributionManager is notified of the disconnect so that it can perform this action.
This is a follow-up PR to GEODE-8467, which ensures that a DisconnectThread is launched to execute the GMSMembership.uncleanShutdown() method.
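To illustrate why the missing notification causes a hang, here is a minimal sketch of a latch that polls a cancellation cause while waiting. It is not Geode's implementation: SimpleCancelCriterion and PollingLatch are hypothetical stand-ins for CancelCriterion and StoppableCountDownLatch, and the poll interval and exception type are assumptions chosen for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CancelPollingSketch {

  /** Hypothetical stand-in for a cancel criterion: holds a root cause once one is set. */
  static final class SimpleCancelCriterion {
    private final AtomicReference<Throwable> rootCause = new AtomicReference<>();

    void setRootCause(Throwable cause) {
      rootCause.compareAndSet(null, cause); // the first cause wins
    }

    void checkCancelInProgress() {
      Throwable cause = rootCause.get();
      if (cause != null) {
        throw new IllegalStateException("Operation cancelled", cause);
      }
    }
  }

  /** Hypothetical stand-in for a stoppable latch: waits in short slices and polls the criterion. */
  static final class PollingLatch {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final SimpleCancelCriterion stopper;

    PollingLatch(SimpleCancelCriterion stopper) {
      this.stopper = stopper;
    }

    void countDown() {
      latch.countDown();
    }

    void await(long pollMillis) throws InterruptedException {
      do {
        stopper.checkCancelInProgress(); // without a root cause this never throws
      } while (!latch.await(pollMillis, TimeUnit.MILLISECONDS));
    }
  }

  /** Returns true if a waiter with no reply coming is released once a root cause is set. */
  static boolean waiterReleasedByCancellation() {
    SimpleCancelCriterion stopper = new SimpleCancelCriterion();
    PollingLatch replyLatch = new PollingLatch(stopper);
    AtomicReference<Throwable> seen = new AtomicReference<>();

    Thread waiter = new Thread(() -> {
      try {
        replyLatch.await(10); // no reply will ever arrive
      } catch (IllegalStateException | InterruptedException e) {
        seen.set(e);
      }
    });
    waiter.start();
    stopper.setRootCause(new RuntimeException("forced disconnect"));
    try {
      waiter.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !waiter.isAlive() && seen.get() instanceof IllegalStateException;
  }

  public static void main(String[] args) {
    System.out.println("released by cancellation: " + waiterReleasedByCancellation());
  }
}
```

If setRootCause is never called, the waiter in this sketch loops indefinitely, which mirrors the hang described above.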
@kamilla1201
Thank you for submitting a contribution to Apache Geode.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
Has your PR been rebased against the latest commit within the target branch (typically develop)?
Is your initial contribution a single, squashed commit?
Does gradlew build run cleanly?
Have you written or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Note:
Once the PR is submitted, please check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.