test: ensure no leader before restarting outdated follower #18203

Merged: 1 commit, May 6, 2024

Conversation

deepthidevaki
Contributor

Description

The test was flaky because it looked for a leader among m1 and m2, when it should have looked among m1 and m2Restarted. In the flaky runs, m2Restarted was the leader. That was not the intention of the test: node 2 is supposed to be outdated at this point and therefore unable to become leader. In these runs, however, the shutdowns and restarts happened so quickly that the leader m1 did not get a chance to step down. When m2 was restarted, it immediately received the latest log entry and could become leader as soon as the force configure completed. To prevent this, the test now waits until there is no leader before restarting node 2.
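
For readers without the diff at hand, the "wait until there is no leader" step can be sketched roughly as below. This is a plain-JDK illustration, not the code from this PR: `NoLeaderWait`, `awaitNoLeader`, and the `BooleanSupplier` leader checks are hypothetical names, and the actual test most likely relies on the project's own test helpers instead.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the Zeebe/Atomix code base.
final class NoLeaderWait {

  private NoLeaderWait() {}

  /**
   * Polls the given leader checks until none of them reports "leader",
   * or until the timeout elapses. Returns true if a leaderless state was observed.
   */
  static boolean awaitNoLeader(List<BooleanSupplier> leaderChecks, Duration timeout) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      boolean anyLeader = false;
      for (BooleanSupplier isLeader : leaderChecks) {
        if (isLeader.getAsBoolean()) {
          anyLeader = true;
          break;
        }
      }
      if (!anyLeader) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out while some node still claimed leadership
      }
      try {
        Thread.sleep(10); // short poll interval; the value is arbitrary
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for "m1 is still leader"; a real test would query the Raft server's role.
    AtomicBoolean m1IsLeader = new AtomicBoolean(true);

    // Simulate m1 stepping down shortly after its peers were shut down.
    Thread stepDown = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      m1IsLeader.set(false);
    });
    stepDown.start();

    List<BooleanSupplier> checks = List.of(m1IsLeader::get);
    boolean leaderless = awaitNoLeader(checks, Duration.ofSeconds(5));
    System.out.println("no leader observed before restarting node 2: " + leaderless);
  }
}
```

The point of the sketch is only the ordering: the restart of node 2 is gated on observing a leaderless cluster, so the restarted node cannot catch up from a still-active leader before the force configure.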

Related issues

closes #17670

@deepthidevaki deepthidevaki added the backport stable/8.5 Backport a pull request to stable/8.5 label May 3, 2024
@github-actions github-actions bot added the component/zeebe Related to the Zeebe component/team label May 3, 2024
Member

@lenaschoenburg lenaschoenburg left a comment

Nice fix 👍

@deepthidevaki
Contributor Author

The PR is blocked at the step "SDK test summary", which doesn't exist in this branch. Rebasing on main.

@lenaschoenburg lenaschoenburg added this pull request to the merge queue May 6, 2024
Merged via the queue into main with commit d471acd May 6, 2024
39 checks passed
@lenaschoenburg lenaschoenburg deleted the dd-17670-flaky-reconfigure branch May 6, 2024 08:32
@backport-action
Collaborator

Successfully created backport PR for stable/8.5:

github-merge-queue bot pushed a commit that referenced this pull request May 6, 2024
…ed follower (#18259)

# Description
Backport of #18203 to `stable/8.5`.

relates to #17670
original author: @deepthidevaki
@Zelldon Zelldon added the version:8.5.1 Marks an issue as being completely or in parts released in 8.5.1 label May 7, 2024
Development

Successfully merging this pull request may close these issues.

Flaky test shouldReconfigureViaAnOutDatedFollower : io.atomix.raft.ReconfigurationTest$ForceConfigureTest