quarantining can break akka cluster #25632

Closed
BertRobben opened this Issue Sep 17, 2018 · 1 comment

BertRobben commented Sep 17, 2018

When a node is quarantined, it is automatically downed and removed from the cluster. If this happens while other nodes are unreachable, it can lead to a split-brain situation. The problem is that the downing provider is neither consulted nor made aware of the quarantine, so it can't prevent the split-brain from happening.
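To make the failure mode concrete, here is a minimal toy model. It is not Akka code: the five node names, the `View` class, and the keep-majority rule are all assumptions made up for illustration. It only shows how removing quarantined nodes from one side's membership view, without asking the downing strategy, can let both sides of a partition conclude that they should stay up.

```java
import java.util.HashSet;
import java.util.Set;

public class QuarantineSketch {

    /** One side's local view of the cluster (hypothetical, not an Akka type). */
    static final class View {
        final Set<String> members;
        final Set<String> unreachable;

        View(Set<String> members, Set<String> unreachable) {
            this.members = new HashSet<>(members);
            this.unreachable = new HashSet<>(unreachable);
        }

        /** Models the automatic downing: the node leaves this view's
         *  membership without the downing strategy being consulted. */
        void autoDownQuarantined(String node) {
            members.remove(node);
            unreachable.remove(node);
        }
    }

    /** Assumed keep-majority rule: a side stays up only if it can reach
     *  strictly more than half of the members it currently knows about. */
    static boolean keepMajoritySurvives(View v) {
        int reachable = v.members.size() - v.unreachable.size();
        return 2 * reachable > v.members.size();
    }

    /** Returns true if both sides of the partition decide to stay up. */
    static boolean splitBrain(boolean autoDownOnQuarantine) {
        Set<String> all = Set.of("A", "B", "C", "D", "E");
        View sideAB = new View(all, Set.of("C", "D", "E"));  // what A and B see
        View sideCDE = new View(all, Set.of("A", "B"));      // what C, D and E see

        if (autoDownOnQuarantine) {
            // Assumption: side A/B quarantines the nodes it cannot reach.
            for (String node : Set.of("C", "D", "E")) {
                sideAB.autoDownQuarantined(node);
            }
        }
        return keepMajoritySurvives(sideAB) && keepMajoritySurvives(sideCDE);
    }

    public static void main(String[] args) {
        System.out.println("automatic downing:    split-brain = " + splitBrain(true));
        System.out.println("no automatic downing: split-brain = " + splitBrain(false));
    }
}
```

In this sketch, side A/B ends up with a two-member view in which everything is reachable, so the majority check passes there as well as on side C/D/E. When the quarantined nodes merely stay unreachable, side A/B still counts five members and the check fails for it.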

I explained this in more detail in
https://discuss.lightbend.com/t/quarantine-breaks-cluster-abstraction/2141

and provided a detailed example in

https://discuss.lightbend.com/t/erroneous-split-brain-situation-in-cluster-with-properly-working-sbr/2058

As far as I understand, this issue is present in both 2.4 and 2.5.

Member

patriknw commented Sep 17, 2018

I agree that we should remove that downing. I see no theoretical downsides. The node will remain Unreachable (Terminated, which means that it can't go back to Reachable), and the ordinary downing provider then has to act on that.

@patriknw patriknw self-assigned this Sep 24, 2018

patriknw added a commit that referenced this issue Sep 24, 2018

patriknw added a commit that referenced this issue Sep 26, 2018

Merge pull request #25678 from akka/wip-25632-down-Terminated-patriknw
Don't automatically down quarantined node, #25632

@patriknw patriknw added this to the 2.5.17 milestone Sep 26, 2018

@patriknw patriknw closed this Sep 26, 2018
