
RATIS-1411. Alleviate slow follower issue #508

Merged
merged 4 commits into apache:master on Sep 30, 2021

Conversation


@ChenSammi
Contributor Author

This patch is a mixed patch. I will upload a clean patch later.

@ChenSammi ChenSammi changed the title Ratis 1411- Alleviate slow follower issue Ratis-1411. Alleviate slow follower issue Sep 28, 2021
@ChenSammi ChenSammi changed the title Ratis-1411. Alleviate slow follower issue RATIS-1411. Alleviate slow follower issue Sep 28, 2021
Contributor

@szetszwo left a comment


This is a clever idea to limit the slowest follower. Thanks.

This feature should be disabled by default. Note that setting it to 1 does not disable the feature, since the leader will still wait for the slowest follower before committing new requests. In other words, when a server is dead, no more transactions can be committed.
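A minimal sketch of the semantics described above, with hypothetical names (canAcceptNewRequest, the non-positive-disables convention, and the gap derivation are illustrations, not the actual patch):

    // Hypothetical sketch of the gap check; names and the disable convention
    // are assumptions, not the actual patch.
    boolean canAcceptNewRequest(long leaderIndex, long slowestFollowerIndex,
        double followerGapRatio, int maxPendingRequests) {
      if (followerGapRatio <= 0) {
        return true; // feature disabled: never throttle on the slowest follower
      }
      final long maxGap = (long) (followerGapRatio * maxPendingRequests);
      // A ratio of 1 does NOT disable the check; it only widens the allowed gap,
      // so a dead follower would still block new requests once the gap fills up.
      return leaderIndex - slowestFollowerIndex <= maxGap;
    }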

@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) {
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) {
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L));
}

String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio";
Contributor


Let's rename it to ".follower.gap.ratio.max".

@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) {
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) {
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L));
}

String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio";
float FOLLOWER_MAX_GAP_RATIO_DEFAULT = 1;
Contributor


Let's use double instead of float. We should avoid using float, especially on 64-bit machines.
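Putting both review suggestions together, the declarations might look like this (a sketch; the disabled-by-default value is an assumption):

    String FOLLOWER_GAP_RATIO_MAX_KEY = PREFIX + ".follower.gap.ratio.max";
    double FOLLOWER_GAP_RATIO_MAX_DEFAULT = -1; // assumed: a non-positive value disables the feature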

Contributor Author


Sure. Aside from double being more precise than float, are there any major drawbacks to using float?

Contributor


Precision is the main reason.

Also, double is the default in Java. For example, the code below does not compile.

      float a = 1.2;  // does not compile: 1.2 is a double literal

Using float makes it easier to make mistakes without noticing.

Unless space is an issue, we should just use double.
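For comparison, these variants do compile; the f suffix marks a float literal:

      float a = 1.2f;        // OK: explicit float literal
      double b = 1.2;        // OK: 1.2 is a double literal by default
      float c = (float) 1.2; // compiles, but the cast is easy to forget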

Contributor Author


@szetszwo , thanks for the detailed info.

@@ -269,6 +272,13 @@ boolean removeAll(Collection<LogAppender> c) {
this.pendingRequests = new PendingRequests(server.getMemberId(), properties, raftServerMetrics);
this.watchRequests = new WatchRequests(server.getMemberId(), properties);
this.messageStreamRequests = new MessageStreamRequests(server.getMemberId());
this.maxPendingRequests = RaftServerConfigKeys.Write.elementLimit(properties);
Contributor


Change maxPendingRequests to a local variable.
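A sketch of the suggested refactor, assuming the limit is only needed once while constructing the leader state (the threshold field and ratio variable are illustrative assumptions):

    // Sketch: keep maxPendingRequests local and use it once to derive the gap
    // threshold; followerGapRatio and followerMaxGapThreshold are assumed names.
    final int maxPendingRequests = RaftServerConfigKeys.Write.elementLimit(properties);
    this.followerMaxGapThreshold = (long) (followerGapRatio * maxPendingRequests);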

@bshashikant
Contributor

bshashikant commented Sep 28, 2021

Thanks @ChenSammi for the patch. This should be disabled by default in Ratis, as @szetszwo suggested.

The only problem I think we will eventually run into is:

If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions that have already been appended to its own log. In such cases it should probably be able to remove the node and apply the pending transactions which have been appended to the raft log.

Also, a degenerate case will be where we update the majority index not to the max but to the min of the nodes, yet the leader itself keeps accepting transactions until the pending request limit is reached.

The other approach can be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. The entry should stay in the data cache as long as the gap between the majority index and the min index is within the threshold; once it exceeds the threshold, the pipeline can be closed.
The state machine data will stay in the cache until all the followers get it (majority index == min index). As long as all the nodes have the data, or the gap between the majority and min index is within the threshold, the entry remains in the cache. Otherwise, remove the entry from the cache, mark the node as a slow follower, and handle the slow follower case in Ozone by closing down the pipeline.
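A sketch of the retention rule just described, with hypothetical names:

    // Hypothetical sketch: keep an applied entry cached while some follower may
    // still need it and the majority-to-min gap is within the threshold.
    boolean keepInCache(long entryIndex, long majorityIndex, long minIndex, long gapThreshold) {
      if (entryIndex < minIndex) {
        return false; // every follower already has this entry
      }
      return majorityIndex - minIndex <= gapThreshold;
    }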

@ChenSammi
Contributor Author

Thanks @szetszwo and @bshashikant for the feedback. Adding a switch to turn this feature on/off is a very good point.

For Ratis clients that don't need WATCH for ALL_COMMITTED and don't have large state machine data to write/read, there will be no severe slow follower issue, for example the current OM and SCM HA.
I will add a new property for that.

@ChenSammi
Contributor Author

> Thanks @ChenSammi for the patch. This should be disabled by default in Ratis, as @szetszwo suggested.
>
> The only problem I think we will eventually run into is:
>
> If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions that have already been appended to its own log. In such cases it should probably be able to remove the node and apply the pending transactions which have been appended to the raft log.

If the slow follower is alive and functioning well, it will eventually catch up while the leader and the other follower wait for it. If the slow follower doesn't respond, it will eventually trigger the pipeline close after the "raft.server.rpc.slowness.timeout" timeout (300s in Ozone).
As for removing a slow follower node: do we leave the remaining two members to keep going, or shall we add another new member to the raft group?
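For reference, a sketch of setting the timeout mentioned above; the 300s value mirrors the Ozone default cited in this comment:

    // org.apache.ratis.conf.RaftProperties
    RaftProperties properties = new RaftProperties();
    properties.set("raft.server.rpc.slowness.timeout", "300s");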

> Also, a degenerate case will be where we update the majority index not to the max but to the min of the nodes, yet the leader itself keeps accepting transactions until the pending request limit is reached.

> The other approach can be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. The entry should stay in the data cache as long as the gap between the majority index and the min index is within the threshold; once it exceeds the threshold, the pipeline can be closed. The state machine data will stay in the cache until all the followers get it (majority index == min index). As long as all the nodes have the data, or the gap between the majority and min index is within the threshold, the entry remains in the cache. Otherwise, remove the entry from the cache, mark the node as a slow follower, and handle the slow follower case in Ozone by closing down the pipeline.

I opened an Ozone ticket for the cache improvement: HDDS-5791.

Contributor

@szetszwo left a comment


+1, the change looks good.

@szetszwo
Contributor

@bshashikant , do you want to take a look at the new change?

Contributor

@bshashikant left a comment


Thanks @ChenSammi for the patch and @szetszwo for the review.

@bshashikant bshashikant merged commit 837b063 into apache:master Sep 30, 2021
@lokeshj1703
Contributor

This PR would also affect LeaderStateImpl#commitIndexChanged. I feel we do not need the follower gap limitation for updating the watch requests.
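A sketch of this suggestion with assumed helper names (not actual Ratis APIs): notify watch requests from the raw majority index while keeping new-request admission gap-limited.

    // Hypothetical sketch; both method names are assumptions.
    static long watchNotifyIndex(long majorityIndex) {
      return majorityIndex; // watch requests: no gap limitation
    }

    static long admissionIndex(long majorityIndex, long slowestFollowerIndex, long gapThreshold) {
      // new write requests stay capped relative to the slowest follower
      return Math.min(majorityIndex, slowestFollowerIndex + gapThreshold);
    }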

@kaijchen
Contributor

kaijchen commented Oct 7, 2021

Good catch.
However, I don't feel this is a perfect solution in the general case, because it may stall the leader (that's why we need the switch).

Do you think keeping some cache for the slowest follower would work?
We can adjust the ratio of cache for the slowest follower vs. cache for the latest entries to control the flow rates.
If the slowest follower is making progress, we increase the ratio to let it catch up.
If the slowest follower is not responding, we decrease the ratio so there is little impact on the leader's progress.
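A sketch of the adaptive policy described above; the step size and bounds are assumptions:

    // Hypothetical sketch: "ratio" is the share of cache reserved for the
    // slowest follower versus the latest entries; 0.05 is an assumed step.
    double adjustSlowFollowerCacheRatio(double ratio, boolean followerMadeProgress) {
      final double step = 0.05;
      final double next = followerMadeProgress ? ratio + step : ratio - step;
      return Math.max(0.0, Math.min(1.0, next)); // clamp to [0, 1]
    }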

@ChenSammi
Contributor Author

Thanks @szetszwo and @bshashikant for the code review.

@lokeshj1703
Contributor

I think there are a couple of problems with having two caches.

  1. There might be some entries which are shared between the two caches.
  2. Consider a scenario where the lag is around 10k entries and the slow follower cache can hold 1k entries. In this case the remaining 9k entries would still be read from disk.

@kaijchen
Contributor

kaijchen commented Oct 9, 2021

@lokeshj1703 Yes, I was aware of the same problems. It seems we can't set the priorities just with the cache.

> This PR would also affect LeaderStateImpl#commitIndexChanged. I feel we do not need the follower gap limitation for updating the watch requests.

+1 on this.

symious pushed a commit to symious/ratis that referenced this pull request Feb 21, 2024