Avoid the brain split phenomenon in the symmetric network partition scenario #13221
Quality Gate passed
LGTM
LGTM! Nice work for improving the overall robustness
LGTM~
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13221      +/-   ##
============================================
- Coverage     41.00%   40.84%   -0.16%
  Complexity       71       71
============================================
  Files          3783     3825      +42
  Lines        234996   236041    +1045
  Branches      28140    28441     +301
============================================
+ Hits          96362    96422      +60
- Misses       138634   139619     +985
LGTM
…cenario #13221 Signed-off-by: OneSizeFitQuorum <tanxinyu@apache.org>
In the current Ratis module, a Leader may step down voluntarily without knowing who the new Leader is, and in that case the state machine's notifyLeaderChange callback is not triggered. Modules that rely on this callback to learn that the current node is no longer the Leader may therefore release their resources late, which can lead to split-brain with multiple Leaders.
For example, in a 3-node ConfigNode cluster, if a symmetric network partition is injected on the Leader node, the other two nodes elect a new Leader. However, leader-only services on the old Leader (such as heartbeat and procedure) are not stopped, so the old and new Leader are active at the same time, which can cause unexpected behavior.
After this PR, Ratis still calls the notifyNotReady function even when the new Leader is unknown, so the old Leader releases its leader-only resources and the split-brain issue is avoided.
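
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use the Ratis or IoTDB APIs: `LeaderOnlyService`, `SimpleService`, `LeaderServiceGuard`, `onLeaderChanged`, and `onNotLeader` are hypothetical names chosen for illustration. The sketch assumes that leader-only services are stopped on either of two signals: a leader-change notification that names another node, or a step-down notification that arrives while the new Leader is still unknown.

```java
import java.util.List;

// Hypothetical sketch: none of these types are part of Ratis or IoTDB.
interface LeaderOnlyService {
    void start();
    void stop();
    boolean isRunning();
}

// Trivial stand-in for a service such as heartbeat or procedure scheduling.
class SimpleService implements LeaderOnlyService {
    private boolean running;

    @Override public void start() { running = true; }
    @Override public void stop() { running = false; }
    @Override public boolean isRunning() { return running; }
}

// Starts and stops leader-only services in response to leadership events.
class LeaderServiceGuard {
    private final String selfId;
    private final List<LeaderOnlyService> services;
    private boolean leader;

    LeaderServiceGuard(String selfId, List<LeaderOnlyService> services) {
        this.selfId = selfId;
        this.services = services;
    }

    // Signal 1: a new Leader is known (it may be this node).
    synchronized void onLeaderChanged(String newLeaderId) {
        if (selfId.equals(newLeaderId)) {
            becomeLeader();
        } else {
            stepDown();
        }
    }

    // Signal 2: this node is no longer Leader, and the new Leader may be unknown.
    // Without this path, a partitioned old Leader would keep its services running.
    synchronized void onNotLeader() {
        stepDown();
    }

    synchronized boolean isLeader() {
        return leader;
    }

    private void becomeLeader() {
        if (!leader) {
            leader = true;
            services.forEach(LeaderOnlyService::start);
        }
    }

    // Idempotent: calling it again after stepping down does nothing.
    private void stepDown() {
        if (leader) {
            leader = false;
            services.forEach(LeaderOnlyService::stop);
        }
    }
}
```

In the partition scenario above, the old Leader would receive only the second signal, because it cannot learn the new Leader's identity; handling that signal is what lets it stop its services promptly.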