Avoid the brain split phenomenon in the symmetric network partition scenario #13221

Merged
merged 1 commit into from
Aug 19, 2024

@OneSizeFitsQuorum OneSizeFitsQuorum commented Aug 19, 2024

In the current Ratis module, a Leader may step down voluntarily without knowing who the new Leader is. In that case the state machine's notifyLeaderChange callback is not triggered. Modules that rely on this callback to learn that the current node is no longer the Leader may therefore delay releasing their resources, which can cause split-brain issues with multiple Leaders.

For example, in a 3-node ConfigNode cluster, if a symmetric network partition fault is injected on the Leader node, the other two nodes elect a new Leader. However, some services on the old Leader (such as heartbeat and procedure) are not cleared, which leads to a split-brain scenario and can cause unexpected behavior.
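
The partition arithmetic behind this example can be sketched as follows. This is only an illustration of the majority rule in a Raft group; `QuorumSketch` and `hasMajority` are hypothetical names and are not part of Ratis or IoTDB.

```java
/** Hypothetical helper (not a Ratis or IoTDB class): majority check for a Raft group. */
final class QuorumSketch {
    private QuorumSketch() {}

    /** Returns true when the reachable members, counting the node itself, form a strict majority. */
    static boolean hasMajority(int reachableIncludingSelf, int groupSize) {
        return reachableIncludingSelf > groupSize / 2;
    }

    public static void main(String[] args) {
        int groupSize = 3; // the 3-node ConfigNode cluster from the example

        // The partitioned old Leader can reach only itself.
        System.out.println("isolated old Leader has majority: " + hasMajority(1, groupSize));

        // The other two ConfigNodes can still reach each other.
        System.out.println("remaining pair has majority: " + hasMajority(2, groupSize));
    }
}
```

The isolated node cannot keep a majority, while the other two can elect a new Leader. Any leader-only service still running on the isolated node is what produces the second "Leader".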

After this PR, Ratis still calls the notifyNotReady function even when the new Leader is unknown, so the old Leader releases its leader-only services and the split-brain issue is avoided.
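
A minimal sketch of the idea, assuming a simplified state machine. The classes `LeaderOnlyServices` and `SketchStateMachine` are hypothetical, and the method signatures are assumptions for illustration; only the callback names come from the description above, and the real code in ConfigRegionStateMachine may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for leader-only services such as heartbeat and procedure. */
final class LeaderOnlyServices {
    private final AtomicBoolean running = new AtomicBoolean(false);

    void start() { running.set(true); }

    void stop() { running.set(false); }

    boolean isRunning() { return running.get(); }
}

/** Hypothetical state machine; callback signatures are assumed, not taken from Ratis. */
final class SketchStateMachine {
    private final int selfId;
    private final LeaderOnlyServices services;

    SketchStateMachine(int selfId, LeaderOnlyServices services) {
        this.selfId = selfId;
        this.services = services;
    }

    /** Invoked only when the identity of the new Leader is known. */
    void notifyLeaderChanged(int newLeaderId) {
        if (newLeaderId == selfId) {
            services.start();
        } else {
            services.stop();
        }
    }

    /** Invoked when this node stops being Leader, even if the new Leader is unknown. */
    void notifyNotReady() {
        services.stop();
    }

    public static void main(String[] args) {
        LeaderOnlyServices services = new LeaderOnlyServices();
        SketchStateMachine stateMachine = new SketchStateMachine(1, services);

        stateMachine.notifyLeaderChanged(1); // node 1 becomes Leader
        System.out.println("running after election: " + services.isRunning());

        stateMachine.notifyNotReady(); // node 1 steps down without knowing the new Leader
        System.out.println("running after step-down: " + services.isRunning());
    }
}
```

Without the second callback, the step-down in this sketch would leave the services running until some later leader-change notification arrived, which is the window in which two nodes act as Leader.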

@OneSizeFitsQuorum OneSizeFitsQuorum force-pushed the enhance_symmetric_network_partition_issue branch 2 times, most recently from 56b5ce4 to c098f31 Compare August 19, 2024 07:19
Signed-off-by: OneSizeFitQuorum <tanxinyu@apache.org>
@OneSizeFitsQuorum OneSizeFitsQuorum force-pushed the enhance_symmetric_network_partition_issue branch from c098f31 to cb8e9fb Compare August 19, 2024 07:20

@liyuheng55555 liyuheng55555 left a comment

LGTM

@CRZbulabula CRZbulabula left a comment

LGTM! Nice work on improving the overall robustness.

@HxpSerein HxpSerein left a comment

LGTM~

codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 41.93548% with 18 lines in your changes missing coverage. Please review.

Project coverage is 40.84%. Comparing base (b93348d) to head (cb8e9fb).
Report is 18 commits behind head on master.

Files                                                   | Patch % | Lines
...nsensus/statemachine/ConfigRegionStateMachine.java   | 0.00%   | 18 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13221      +/-   ##
============================================
- Coverage     41.00%   40.84%   -0.16%     
  Complexity       71       71              
============================================
  Files          3783     3825      +42     
  Lines        234996   236041    +1045     
  Branches      28140    28441     +301     
============================================
+ Hits          96362    96422      +60     
- Misses       138634   139619     +985     

@133tosakarin

LGTM

@OneSizeFitsQuorum OneSizeFitsQuorum merged commit b88c82b into master Aug 19, 2024
29 of 30 checks passed
@OneSizeFitsQuorum OneSizeFitsQuorum deleted the enhance_symmetric_network_partition_issue branch August 19, 2024 10:08
OneSizeFitsQuorum added a commit that referenced this pull request Aug 19, 2024
…cenario #13221

Signed-off-by: OneSizeFitQuorum <tanxinyu@apache.org>
OneSizeFitsQuorum added a commit that referenced this pull request Aug 19, 2024
…cenario #13221 (#13226)

Signed-off-by: OneSizeFitQuorum <tanxinyu@apache.org>