=cluster,handshake #724 immediately initialize membership #778
| // always update the snapshot before emitting events | ||
| context.system.cluster.updateMembershipSnapshot(state.membership) | ||
| self.clusterEvents.publish(.membershipChange(change)) | ||
| } |
the fix™
Sending the change to self (to avoid having another spot where we have to write "updateMembershipSnapshot and publish event") opened a window of opportunity for another message to be handled first. That message would then hit us while the membership had NO members at all, not even "us".
This would then cause the nonsensical handshake rejection because "there is no local member".
Yet there always is a local member! It's impossible to not know the local member, it is us.
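To make the window concrete, here is a minimal sketch of the race in a toy FIFO mailbox. Everything in it (`Node`, `Message`, `Shell`, `simulate`) is hypothetical and made up for illustration; it is not the library's actual shell or membership API, only a model of "self member added via a message to self" versus "self member added during initialization".

```swift
// Hypothetical, simplified model; none of these types are the library's real API.
struct Node: Hashable {
    let name: String
}

enum Message {
    case addSelf                      // the "loop through self" message
    case handshakeOffer(from: Node)   // a remote node extending a handshake
}

enum HandshakeResult: Equatable {
    case accepted
    case rejected(reason: String)
}

struct Shell {
    let selfNode: Node
    var members: Set<Node> = []
    var mailbox: [Message] = []
    var results: [HandshakeResult] = []

    /// `immediate == true` models the fix: the local member exists
    /// before any message can be processed.
    init(selfNode: Node, immediate: Bool) {
        self.selfNode = selfNode
        if immediate {
            self.members.insert(selfNode)
        }
    }

    /// Drains the mailbox in FIFO order.
    mutating func processAll() {
        while !mailbox.isEmpty {
            let message = mailbox.removeFirst()
            switch message {
            case .addSelf:
                members.insert(selfNode)
            case .handshakeOffer(let remote):
                if members.contains(selfNode) {
                    members.insert(remote)
                    results.append(.accepted)
                } else {
                    results.append(.rejected(reason: "no local member"))
                }
            }
        }
    }
}

/// Runs one scenario in which the remote handshake wins the race
/// and lands in the mailbox ahead of the self-sent message.
func simulate(immediate: Bool) -> HandshakeResult {
    var shell = Shell(selfNode: Node(name: "local"), immediate: immediate)
    shell.mailbox.append(.handshakeOffer(from: Node(name: "remote")))
    if !immediate {
        shell.mailbox.append(.addSelf)
    }
    shell.processAll()
    return shell.results[0]
}
```

In this model, `simulate(immediate: false)` yields the bogus rejection, while `simulate(immediate: true)` accepts the handshake because no message ordering can observe an empty membership.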
| /// If, and only if, the current node is a leader it performs a set of tasks, such as moving nodes to `.up` etc. | ||
| func collectLeaderActions() -> [LeaderAction] { | ||
| guard self.membership.isLeader(self.localNode) else { | ||
| guard self.membership.isLeader(self.selfNode) else { |
naming consistency
| reason: "Node cannot be part of cluster, no member available.", | ||
| whenHandshakeReplySent: nil | ||
| ) | ||
| } |
This rejection reason is nonsensical and was sporadically triggered in some of the tests. It is now made impossible for good.
"loop" through shell as this may cause a race condition: a node may extend a handshake to this node before the "loop through self adding the myself Member" has a chance to run. This manifested in tests as handshakes rejected with "no local member", which is nonsense; there always is a known local member, after all.
Co-authored-by: Yim Lee <yim_lee@apple.com>
Weird failure #779 :/
@swift-server-bot test this please
Immediately initialize membership, without a "loop" through the shell, as that
loop may cause a race condition: a node may extend a handshake to this node
before the "loop through self adding the myself Member" has a chance to run.
This manifested in tests as handshakes rejected with "no local member", which
is nonsense; there always is a known local member, after all.
resolves #724, an actual bug 😱