RATIS-1305. Leader stuck in infinite install snapshot cycle when logs have been purged #420
Adding followers in the test still does not pass, but that could be a test-specific issue.
Adding peers to cluster still fails.
cc @szetszwo @runzhiwang for review and any ideas on why the leader enters a closed state and adding peers to the configuration fails in the integration test.
The added integration test When it tries to add two new nodes to the cluster later in the test,
@errose28, I have just tried the test. The two new servers, s3 and s4, failed to start due to "Unexpected gap in segments" as shown below. It seems the changes might have a bug.
The new integration test is passing. Ready for review. The current CI failure looks unrelated.
szetszwo left a comment
+1 the change looks good. Thanks for the update.
I have just reopened this to trigger a new build. Will wait for it.
… have been purged (apache#420). Contributed by Ethan Rose
What changes were proposed in this pull request?
After RATIS-1241, a leader with no logs will send install snapshot notifications to followers on every heartbeat, regardless of whether or not they are behind. Fix this so that followers only get install snapshot notifications when they are actually behind.
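The intended behavior can be sketched roughly as follows. This is a hypothetical illustration, not the code in this patch: the class, the method `shouldNotifyInstallSnapshot`, its parameters, and the `EMPTY_LOG` sentinel are all assumed names. It assumes the leader can compare the follower's next index against its own first retained log index and latest snapshot index.

```java
// Hypothetical sketch; names are illustrative and not taken from the Ratis code base.
final class SnapshotNotificationSketch {
    /** Assumed sentinel meaning the leader's log holds no entries (all were purged). */
    static final long EMPTY_LOG = -1L;

    /**
     * Decides whether the leader should tell a follower to install a snapshot.
     *
     * @param followerNextIndex   next log index the leader would send to the follower
     * @param leaderFirstLogIndex first index still present in the leader's log, or EMPTY_LOG
     * @param leaderSnapshotIndex last index covered by the leader's latest snapshot
     */
    static boolean shouldNotifyInstallSnapshot(long followerNextIndex,
                                               long leaderFirstLogIndex,
                                               long leaderSnapshotIndex) {
        if (leaderFirstLogIndex == EMPTY_LOG) {
            // No entries left to replicate: the follower is behind only if it
            // still needs an index that is covered by the snapshot.
            return followerNextIndex <= leaderSnapshotIndex;
        }
        // Some entries remain: a snapshot is needed only if the follower
        // needs an entry that has already been purged.
        return followerNextIndex < leaderFirstLogIndex;
    }
}
```

Under these assumptions, a follower whose next index is already past the snapshot index gets an ordinary heartbeat instead of a repeated install snapshot notification, even when the leader's log is empty.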
What is the link to the Apache JIRA?
RATIS-1305
How was this patch tested?
Integration test added. The test still has an issue that is marked with a TODO. Leaving this PR as a draft while I fix this.