Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RATIS-2089. Add CommitInfoProto in NotReplicatedException #1105

Merged
merged 4 commits into from
May 28, 2024

Conversation

ivandika3
Copy link
Contributor

@ivandika3 ivandika3 commented May 27, 2024

What changes were proposed in this pull request?

In Ozone's XceiverClientRatis#watchForCommit, there are two watch commits request with different ReplicationLevel

  1. Watch for ALL_COMMITTED
  2. Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)

Based on the second watch request, the client will remove some failed datanode UUID from the commitInfoMap.

The second watch might not be necessary since the entries in AbstractCommitWatcher.commitIndexMap implies that the PutBlock request has been committed to the majority of the servers. Therefore, another MAJORITY_COMMITTED watch might not be necessary. From my understanding, the second MAJORITY_COMMITTED only serves to gain information to remove entries from commitInfoMap.

If the first watch failed with NotReplicatedException, we might be able to remove the need to a second watch request. Since NotReplicatedException is a Raft server exception, we can include the CommitInfoProtos in the NotReplicatedException. The client can use this CommitInfoProtos to remove the entry from commitInfoMap without sending another WATCH request.

This CommitInfoProto is returned for every RaftClientReply (RaftClientReply.commitInfos), but if there is an exception, it seems the RaftClientReply is not accessible to the client.

However, if the exception is a client exception (e.g. due to Raft client watch timeout configuration), the client might have no choice but to send another watch request.

So in this patch, I propose to include CommitInfoProto into NotReplicatedException.

This only introduces a client-side change by reusing the commitInfos in RaftClientReply.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2089

How was this patch tested?

Add a unit test to ensure commitInfoProto is not null during NotReplicateException.

@ivandika3
Copy link
Contributor Author

cc: @smengcl @szetszwo

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo szetszwo merged commit dd75ffb into apache:master May 28, 2024
12 checks passed
@ivandika3
Copy link
Contributor Author

@szetszwo Thank you for the review and merge. Created HDDS-10932 to remove the need for the second watch commit in case of NotReplicatedException, will do after Ratis 3.1.0 release.

@ivandika3 ivandika3 deleted the RATIS-2089 branch May 29, 2024 01:54
@smengcl
Copy link
Contributor

smengcl commented May 29, 2024

Thanks a lot @ivandika3 for working on this one!

SzyWilliam pushed a commit to SzyWilliam/ratis that referenced this pull request Jun 12, 2024
szetszwo pushed a commit to szetszwo/ratis that referenced this pull request Jun 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants