HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes #4453

sodonnel · 2023-03-22T20:10:51Z

What changes were proposed in this pull request?

Right now, "hdds.scm.replication.command.deadline.factor" is a fraction. For long durations such as 60 minutes, the difference between the SCM deadline and the Datanode deadline will be 60 - 60 * 0.9, which is 6 minutes. That gap is significant, so this configuration would be better expressed as a duration, such as 30 seconds.

Currently the APIs provided by RM expose the DN deadline as a parameter. We could remove this and just let the RM decide a deadline for commands in the DN.

This PR removes the parameter hdds.scm.replication.command.deadline.factor and stops passing datanodeDeadline through the various methods. Instead, we now have hdds.scm.replication.event.timeout.datanode.offset, defaulting to 30s, which is the amount of time subtracted from event.timeout (the SCM deadline) to get the datanode deadline.
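For illustration only, here is a minimal sketch of the two calculations described above. The class and method names (`DeadlineSketch`, `datanodeTimeoutByFactor`, `datanodeTimeoutByOffset`) are hypothetical and are not the actual ReplicationManager code; the 60 minute timeout, the 0.9 factor and the 30s offset are the values quoted in this description.

```java
import java.time.Duration;

// Hypothetical sketch, not the ReplicationManager implementation.
class DeadlineSketch {

  // Old approach: the datanode timeout is a fraction of the SCM event timeout,
  // so the gap between the two grows with the length of the timeout.
  static Duration datanodeTimeoutByFactor(Duration eventTimeout, double factor) {
    return Duration.ofMillis(Math.round(eventTimeout.toMillis() * factor));
  }

  // New approach: the datanode timeout is the SCM event timeout minus a fixed
  // offset, so the gap stays the same whatever the timeout is.
  static Duration datanodeTimeoutByOffset(Duration eventTimeout, Duration offset) {
    return eventTimeout.minus(offset);
  }

  public static void main(String[] args) {
    Duration eventTimeout = Duration.ofMinutes(60);

    Duration byFactor = datanodeTimeoutByFactor(eventTimeout, 0.9);
    Duration byOffset = datanodeTimeoutByOffset(eventTimeout, Duration.ofSeconds(30));

    // Gap between the SCM deadline and the datanode deadline in each scheme.
    System.out.println("factor gap: " + eventTimeout.minus(byFactor));
    System.out.println("offset gap: " + eventTimeout.minus(byOffset));
  }
}
```

With a 60 minute event timeout, the factor-based gap is 6 minutes while the offset-based gap is the configured 30 seconds.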

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8230

How was this patch tested?

Existing tests cover this.

siddhantsangwan

Looks good! I just have one comment about the deadline calculation.

...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java

S O'Donnell added 2 commits March 22, 2023 18:21

HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes

e9f1c43

HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes

e27d530

sodonnel changed the title ~~Hdds 8230. Let ReplicationManager decide the timeout for commands in Datanodes~~ HDDS-8230. Let ReplicationManager decide the timeout for commands in Datanodes Mar 22, 2023

sodonnel requested a review from siddhantsangwan March 22, 2023 20:11

siddhantsangwan reviewed Mar 23, 2023

...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java

adoroszlai reviewed Mar 23, 2023

...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java

...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java

S O'Donnell added 3 commits March 23, 2023 12:50

Restore postconstruct validator

d4f284f

Set offset to zero for failing tests

087bc44

Fix style

68fd648

adoroszlai approved these changes Mar 23, 2023

adoroszlai merged commit 181558a into apache:master Mar 23, 2023
26 checks passed
