
RATIS-1411. Alleviate slow follower issue #508

Merged
merged 4 commits into apache:master on Sep 30, 2021

Conversation


@ChenSammi
Contributor Author

This patch is a mixed patch. I will upload a clean patch later.

@ChenSammi ChenSammi changed the title Ratis 1411- Alleviate slow follower issue Ratis-1411. Alleviate slow follower issue Sep 28, 2021
@ChenSammi ChenSammi changed the title Ratis-1411. Alleviate slow follower issue RATIS-1411. Alleviate slow follower issue Sep 28, 2021
Contributor

@szetszwo left a comment


This is a clever idea to limit the slowest follower. Thanks.

This feature should be disabled by default. Note that setting it to 1 does not disable the feature, since the leader will still wait for the slowest follower before committing new requests. In other words, when a server is dead, no more transactions can be committed.
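A minimal sketch of the semantics described above, with hypothetical names (canAcceptNewRequest, the non-positive-disables convention, and the gap derivation are illustrations, not the actual patch):

    // Hypothetical sketch of the gap check; names and the disable convention
    // are assumptions, not the actual patch.
    boolean canAcceptNewRequest(long leaderIndex, long slowestFollowerIndex,
        double followerGapRatio, int maxPendingRequests) {
      if (followerGapRatio <= 0) {
        return true; // feature disabled: never throttle on the slowest follower
      }
      final long maxGap = (long) (followerGapRatio * maxPendingRequests);
      // A ratio of 1 does NOT disable the check; it only widens the allowed gap,
      // so a dead follower would still block new requests once the gap fills up.
      return leaderIndex - slowestFollowerIndex <= maxGap;
    }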

@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) {
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) {
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L));
}

String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio";
Contributor


Let's rename it to ".follower.gap.ratio.max".

@@ -119,6 +119,17 @@ static SizeInBytes byteLimit(RaftProperties properties) {
static void setByteLimit(RaftProperties properties, SizeInBytes byteLimit) {
setSizeInBytes(properties::set, BYTE_LIMIT_KEY, byteLimit, requireMin(1L));
}

String FOLLOWER_MAX_GAP_RATIO_KEY = PREFIX + ".follower-max-gap-ratio";
float FOLLOWER_MAX_GAP_RATIO_DEFAULT = 1;
Contributor


Let's use double instead of float. We should avoid using float, especially on 64-bit machines.
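Putting both review suggestions together, the declarations might look like this (a sketch; the disabled-by-default value is an assumption):

    String FOLLOWER_GAP_RATIO_MAX_KEY = PREFIX + ".follower.gap.ratio.max";
    double FOLLOWER_GAP_RATIO_MAX_DEFAULT = -1; // assumed: a non-positive value disables the feature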

Contributor Author


Sure. Aside from double being more precise than float, are there any major drawbacks to using float?

Contributor


Precision is the main reason.

Also, double is the default in Java. For example, the code below does not compile.

      float a = 1.2;  // does not compile: 1.2 is a double literal

Using float makes it easier to make mistakes without noticing.

Unless space is an issue, we should just use double.
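For comparison, these variants do compile; the f suffix marks a float literal:

      float a = 1.2f;        // OK: explicit float literal
      double b = 1.2;        // OK: 1.2 is a double literal by default
      float c = (float) 1.2; // compiles, but the cast is easy to forget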

Contributor Author


@szetszwo , thanks for the detailed info.

@@ -269,6 +272,13 @@ boolean removeAll(Collection<LogAppender> c) {
this.pendingRequests = new PendingRequests(server.getMemberId(), properties, raftServerMetrics);
this.watchRequests = new WatchRequests(server.getMemberId(), properties);
this.messageStreamRequests = new MessageStreamRequests(server.getMemberId());
this.maxPendingRequests = RaftServerConfigKeys.Write.elementLimit(properties);
Contributor


Change maxPendingRequests to a local variable.
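A sketch of the suggested refactor, assuming the limit is only needed once while constructing the leader state (the threshold field and ratio variable are illustrative assumptions):

    // Sketch: keep maxPendingRequests local and use it once to derive the gap
    // threshold; followerGapRatio and followerMaxGapThreshold are assumed names.
    final int maxPendingRequests = RaftServerConfigKeys.Write.elementLimit(properties);
    this.followerMaxGapThreshold = (long) (followerGapRatio * maxPendingRequests);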

@bshashikant
Contributor

bshashikant commented Sep 28, 2021

Thanks @ChenSammi for the patch. This should be disabled by default in Ratis, as @szetszwo suggested.

The only problem I think we will eventually run into is:

If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions that have already been appended to its own log. In such cases it should probably be able to remove the node and apply the pending transactions which have been appended to the raft log.

Also, a degenerate case will be where we update the majority index not to the max but to the min of the nodes, yet the leader itself keeps accepting transactions until the pending request limit is reached.

The other approach can be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. The entry should stay in the data cache as long as the gap between the majority index and the min index is within the threshold; once it exceeds the threshold, the pipeline can be closed.
The state machine data will stay in the cache until all the followers get it (majority index == min index). As long as all the nodes have the data, or the gap between the majority and min index is within the threshold, the entry remains in the cache. Otherwise, remove the entry from the cache, mark the node as a slow follower, and handle the slow follower case in Ozone by closing down the pipeline.
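A sketch of the retention rule just described, with hypothetical names:

    // Hypothetical sketch: keep an applied entry cached while some follower may
    // still need it and the majority-to-min gap is within the threshold.
    boolean keepInCache(long entryIndex, long majorityIndex, long minIndex, long gapThreshold) {
      if (entryIndex < minIndex) {
        return false; // every follower already has this entry
      }
      return majorityIndex - minIndex <= gapThreshold;
    }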

@ChenSammi
Contributor Author

Thanks @szetszwo and @bshashikant for the feedback. Adding a switch to turn this feature on/off is a very good point.

For Ratis clients that don't need WATCH for ALL_COMMITTED and don't have large state machine data to write/read, there will be no severe slow follower issue, for example the current OM and SCM HA.
I will add a new property for that.

@ChenSammi
Contributor Author

> Thanks @ChenSammi for the patch. This should be disabled by default in Ratis, as @szetszwo suggested.
>
> The only problem I think we will eventually run into is:
>
> If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions that have already been appended to its own log. In such cases it should probably be able to remove the node and apply the pending transactions which have been appended to the raft log.

If the slow follower is alive and functioning well, it will eventually catch up while the leader and the other follower wait for it. If the slow follower doesn't respond, it will eventually trigger the pipeline close after the "raft.server.rpc.slowness.timeout" timeout (300s in Ozone).
As for removing a slow follower node: do we leave the remaining two members to keep going, or shall we add another new member to the raft group?
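For reference, a sketch of setting the timeout mentioned above; the 300s value mirrors the Ozone default cited in this comment:

    // org.apache.ratis.conf.RaftProperties
    RaftProperties properties = new RaftProperties();
    properties.set("raft.server.rpc.slowness.timeout", "300s");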

> Also, a degenerate case will be where we update the majority index not to the max but to the min of the nodes, yet the leader itself keeps accepting transactions until the pending request limit is reached.

> The other approach can be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. The entry should stay in the data cache as long as the gap between the majority index and the min index is within the threshold; once it exceeds the threshold, the pipeline can be closed. The state machine data will stay in the cache until all the followers get it (majority index == min index). As long as all the nodes have the data, or the gap between the majority and min index is within the threshold, the entry remains in the cache. Otherwise, remove the entry from the cache, mark the node as a slow follower, and handle the slow follower case in Ozone by closing down the pipeline.

I opened an Ozone ticket for the cache improvement: HDDS-5791.

Contributor

@szetszwo left a comment


+1, the change looks good.

@szetszwo
Contributor

@bshashikant , do you want to take a look at the new change?

Contributor

@bshashikant left a comment


Thanks @ChenSammi for the patch and @szetszwo for the review.

@bshashikant bshashikant merged commit 837b063 into apache:master Sep 30, 2021
@lokeshj1703
Contributor

This PR would also affect LeaderStateImpl#commitIndexChanged. I feel we do not need the follower gap limitation for updating the watch requests.
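A sketch of this suggestion with assumed helper names (not actual Ratis APIs): notify watch requests from the raw majority index while keeping new-request admission gap-limited.

    // Hypothetical sketch; both method names are assumptions.
    static long watchNotifyIndex(long majorityIndex) {
      return majorityIndex; // watch requests: no gap limitation
    }

    static long admissionIndex(long majorityIndex, long slowestFollowerIndex, long gapThreshold) {
      // new write requests stay capped relative to the slowest follower
      return Math.min(majorityIndex, slowestFollowerIndex + gapThreshold);
    }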

@kaijchen
Contributor

kaijchen commented Oct 7, 2021

Good catch.
However, I don't feel this is a perfect solution in the general case, because it may stall the leader (that's why we need the switch).

Do you think keeping some cache for the slowest follower would work?
We can adjust the ratio of cache for the slowest follower vs. cache for the latest entries to control the flow rates.
If the slowest follower is making progress, we increase the ratio to let it catch up.
If the slowest follower is not responding, we decrease the ratio so there is little impact on the leader's progress.
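A sketch of the adaptive policy described above; the step size and bounds are assumptions:

    // Hypothetical sketch: "ratio" is the share of cache reserved for the
    // slowest follower versus the latest entries; 0.05 is an assumed step.
    double adjustSlowFollowerCacheRatio(double ratio, boolean followerMadeProgress) {
      final double step = 0.05;
      final double next = followerMadeProgress ? ratio + step : ratio - step;
      return Math.max(0.0, Math.min(1.0, next)); // clamp to [0, 1]
    }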

@ChenSammi
Contributor Author

Thanks @szetszwo and @bshashikant for the code review.

@lokeshj1703
Contributor

I think there are a couple of problems with having two caches.

  1. There might be some entries which are shared between the two caches.
  2. Consider a scenario where the lag is around 10k entries and the slow follower cache can hold 1k entries. In this case the remaining 9k entries would still be read from disk.

@kaijchen
Contributor

kaijchen commented Oct 9, 2021

@lokeshj1703 Yes, I was aware of the same problems. It seems we can't set the priorities just with the cache.

> This PR would also affect LeaderStateImpl#commitIndexChanged. I feel we do not need the follower gap limitation for updating the watch requests.

+1 on this.

symious pushed a commit to symious/ratis that referenced this pull request Feb 21, 2024