HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes #4453
What changes were proposed in this pull request?
Right now, "hdds.scm.replication.command.deadline.factor" is a fraction. For long durations such as 60 minutes, with a factor of 0.9, the difference between the SCM deadline and the Datanode deadline is 60 - 60 * 0.9 = 6 minutes. This is a significant difference, so this configuration should be a fixed duration instead, such as 30 seconds.
Currently the APIs provided by RM expose the DN deadline as a parameter. We could remove this and just let the RM decide a deadline for commands in the DN.
This PR removes the parameter hdds.scm.replication.command.deadline.factor and stops passing datanodeDeadline through the various methods. Instead, we now have hdds.scm.replication.event.timeout.datanode.offset, defaulting to 30s, which is the amount of time subtracted from event.timeout (the SCM deadline) to get the datanode deadline.
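For illustration, the difference between the two approaches can be sketched as below. This is a minimal standalone example, not the code in this PR: the class name DeadlineSketch and both method names are hypothetical, and the factor-based formula is inferred from the 60 * 0.9 arithmetic above.

```java
import java.time.Duration;

// Hypothetical sketch; names are not taken from the ReplicationManager code.
public class DeadlineSketch {

  // Old approach (assumed): datanode deadline is a fraction of the SCM event timeout.
  public static Duration factorBased(Duration eventTimeout, double factor) {
    return Duration.ofMillis(Math.round(eventTimeout.toMillis() * factor));
  }

  // New approach: datanode deadline is the SCM event timeout minus a fixed offset.
  public static Duration offsetBased(Duration eventTimeout, Duration offset) {
    return eventTimeout.minus(offset);
  }

  public static void main(String[] args) {
    Duration eventTimeout = Duration.ofMinutes(60);

    // Gap between the SCM deadline and the datanode deadline under each scheme.
    Duration factorGap = eventTimeout.minus(factorBased(eventTimeout, 0.9));
    Duration offsetGap =
        eventTimeout.minus(offsetBased(eventTimeout, Duration.ofSeconds(30)));

    System.out.println("gap with factor 0.9: " + factorGap.toSeconds() + "s");
    System.out.println("gap with 30s offset: " + offsetGap.toSeconds() + "s");
  }
}
```

With a fixed offset the gap stays at 30 seconds regardless of how long the event timeout is, whereas with a factor it grows in proportion to the timeout.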
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8230
How was this patch tested?
Existing tests cover this.