RATIS-2089. Add CommitInfoProto in NotReplicatedException #1105
What changes were proposed in this pull request?
Ozone's XceiverClientRatis#watchForCommit sends two watch requests with different ReplicationLevel values.
Based on the reply to the second watch request, the client removes the UUIDs of failed datanodes from the commitInfoMap.
The second watch might not be necessary: the entries in AbstractCommitWatcher.commitIndexMap imply that the PutBlock request has already been committed to a majority of the servers. From my understanding, the second MAJORITY_COMMITTED watch only serves to gather the information needed to remove entries from commitInfoMap.
If the first watch fails with NotReplicatedException, we might be able to remove the need for a second watch request. Since NotReplicatedException is a Raft server exception, we can include the CommitInfoProtos in it. The client can then use these CommitInfoProtos to remove entries from commitInfoMap without sending another WATCH request.
This CommitInfoProto is returned for every RaftClientReply (RaftClientReply.commitInfos), but if there is an exception, it seems the RaftClientReply is not accessible to the client.
However, if the exception is a client exception (e.g. due to Raft client watch timeout configuration), the client might have no choice but to send another watch request.
So in this patch, I propose to include the CommitInfoProtos in NotReplicatedException.
This only introduces a client-side change by reusing the commitInfos in RaftClientReply.
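To illustrate the intended client-side use, here is a minimal sketch of how a caller could prune commitInfoMap from the commit infos attached to the exception instead of issuing a second watch. It does not use the real Ratis or Ozone classes: `CommitInfo`, `NotReplicatedSketch`, and `pruneLaggingServers` are hypothetical stand-ins, the map is keyed by a plain string rather than a datanode UUID, and the rule "remove servers whose commit index is below the watched index" is my assumption about what the client wants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WatchFallbackSketch {

  /** Hypothetical stand-in for CommitInfoProto: a server id and its commit index. */
  public static final class CommitInfo {
    final String serverId;
    final long commitIndex;

    public CommitInfo(String serverId, long commitIndex) {
      this.serverId = serverId;
      this.commitIndex = commitIndex;
    }
  }

  /** Hypothetical stand-in for a NotReplicatedException that carries commit infos. */
  public static final class NotReplicatedSketch extends Exception {
    final long logIndex;
    final List<CommitInfo> commitInfos;

    public NotReplicatedSketch(long logIndex, List<CommitInfo> commitInfos) {
      super("Log index " + logIndex + " is not yet replicated to all servers");
      this.logIndex = logIndex;
      this.commitInfos = commitInfos;
    }
  }

  /**
   * Removes every server whose commit index is below the watched log index
   * (assumed rule) and returns the ids that were actually removed.
   */
  public static List<String> pruneLaggingServers(
      Map<String, Long> commitInfoMap, NotReplicatedSketch e) {
    List<String> removed = new ArrayList<>();
    for (CommitInfo info : e.commitInfos) {
      boolean lagging = info.commitIndex < e.logIndex;
      if (lagging && commitInfoMap.remove(info.serverId) != null) {
        removed.add(info.serverId);
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Map<String, Long> commitInfoMap = new ConcurrentHashMap<>();
    commitInfoMap.put("dn1", 10L);
    commitInfoMap.put("dn2", 10L);
    commitInfoMap.put("dn3", 7L);

    // Simulates the first watch failing because dn3 has not reached index 10.
    NotReplicatedSketch e = new NotReplicatedSketch(10L, Arrays.asList(
        new CommitInfo("dn1", 10L),
        new CommitInfo("dn2", 10L),
        new CommitInfo("dn3", 7L)));

    List<String> removed = pruneLaggingServers(commitInfoMap, e);
    System.out.println("removed=" + removed + ", remaining=" + commitInfoMap.keySet());
  }
}
```

A client-side timeout would still need the second watch, because no server-provided commit infos are available in that case.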
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/RATIS-2089
How was this patch tested?
Added a unit test to ensure that the CommitInfoProtos are not null when a NotReplicatedException is thrown.