
HDDS-8168. Make deadlines inside MoveManager for move commands configurable #4415

Merged
merged 8 commits into apache:master on Mar 21, 2023

Conversation

siddhantsangwan
Contributor

What changes were proposed in this pull request?

MoveManager currently has a hardcoded deadline of 60 minutes. We should make this configurable, and that configuration should be present in ContainerBalancerConfiguration.
A move is replication plus deletion. Replication involves copying a container between two datanodes, so it's generally more expensive than deletion. We therefore need an additional configuration for deciding how much time out of the total move deadline is allowed for replication; the rest is allowed for deletion. The total move deadline is configured by "hdds.container.balancer.move.timeout".

Changes introduced:

  1. Added a new configuration "hdds.container.balancer.move.replication.timeout". Made MoveManager use this configuration.
  2. Added validations for this configuration in ContainerBalancer.
  3. Added this configuration to ContainerBalancerConfigurationProto.
  4. Other small changes.
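
Below is a minimal, hypothetical Java sketch (not the actual MoveManager or ContainerBalancer code) of how the two timeouts are meant to split the move deadline, including the validation described in point 2 above. All names are illustrative; the 65/50 minute values mirror the defaults in the diff further down.

// Illustrative only: splitting the move deadline between replication and deletion.
import java.time.Duration;

public final class MoveDeadlineSketch {
  public static void main(String[] args) {
    // Hypothetical values for hdds.container.balancer.move.timeout and
    // hdds.container.balancer.move.replication.timeout.
    Duration moveTimeout = Duration.ofMinutes(65);
    Duration replicationTimeout = Duration.ofMinutes(50);

    // Validation along the lines of point 2 above: the replication timeout
    // must be less than the overall move timeout.
    if (replicationTimeout.compareTo(moveTimeout) >= 0) {
      throw new IllegalArgumentException(
          "hdds.container.balancer.move.replication.timeout must be less than "
              + "hdds.container.balancer.move.timeout");
    }

    // Whatever is left after the replication budget is the budget for deletion.
    Duration deletionBudget = moveTimeout.minus(replicationTimeout);
    System.out.println("Replication budget: " + replicationTimeout.toMinutes() + " min");
    System.out.println("Deletion budget: " + deletionBudget.toMinutes() + " min");
  }
}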

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8168

How was this patch tested?

Added unit tests

@adoroszlai adoroszlai left a comment

LGTM, with one minor comment.

@siddhantsangwan
Contributor Author

DU is run in a DN every 60 minutes by default (hdds.datanode.du.refresh.period). That means SCM gets updated information on a DN's free space every 60 minutes. This information is used by Container Balancer to compare DNs based on their space utilisation (free space divided by total space). Container Balancer is stateless between iterations - all the space utilisation information is recalculated every iteration. So, it's good to have a default Container Balancer iteration interval that's greater than the DU interval. It prevents the balancer from making moves with the same stale information that was used in the previous iteration.

If we increase moveTimeout to 90 minutes, then at the latest we expect moves to complete close to the 90th minute. In the worst case, DU will run before moves have completed. This means if our Container Balancer iteration interval is close to (and greater than) 90 minutes, it'll start a new iteration that does not account for moves made by the previous iteration because DU hasn't calculated the latest space yet. That's why I think the iteration interval should be greater than 90 + 60 minutes. 160 minutes seems like a good default to me. If anyone wants to make it more aggressive, they can enable trigger.du.before.move.enable and reduce the iteration interval.
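
As a rough, self-contained sketch of the arithmetic in this comment (hypothetical names, not Ozone code): the iteration interval should exceed the move timeout plus the DU refresh period, so that a new iteration sees post-move space usage.

import java.time.Duration;

public final class IterationIntervalSketch {
  // Lower bound on a safe iteration interval: moves can take up to moveTimeout,
  // and DU needs up to one refresh period after that to report the new usage.
  static Duration minSafeIterationInterval(Duration moveTimeout, Duration duRefreshPeriod) {
    return moveTimeout.plus(duRefreshPeriod);
  }

  public static void main(String[] args) {
    Duration moveTimeout = Duration.ofMinutes(90);   // hypothetical 90-minute move deadline
    Duration duRefresh = Duration.ofMinutes(60);     // hdds.datanode.du.refresh.period default
    // Prints 150; a 160-minute iteration interval leaves some slack above this bound.
    System.out.println(minSafeIterationInterval(moveTimeout, duRefresh).toMinutes() + " min");
  }
}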

@kerneltime
Contributor

@SaketaChalamchala can you please take a look?

@siddhantsangwan
Contributor Author

Considering improvements made in HDDS-6577, it's probably best to have the iteration interval default to 70 minutes, same as before. We can revisit these values later if needed.

@@ -451,6 +451,8 @@ message ContainerBalancerConfigurationProto {

required bool shouldRun = 18;
optional int32 nextIterationIndex = 19;

Contributor

There is an extra blank line here - probably best to remove it.

if (conf.getMoveReplicationTimeout().toMillis() >=
conf.getMoveTimeout().toMillis()) {
LOG.warn("hdds.container.balancer.move.replication.timeout {} should " +
"be lesser than hdds.container.balancer.move.timeout {}.",

Contributor

typo: lesser -> less

conf.getMoveTimeout().toMinutes());
throw new InvalidContainerBalancerConfigurationException(
"hdds.container.balancer.move.replication.timeout should " +
"be lesser than hdds.container.balancer.move.timeout.");

Contributor

typo: lesser -> less

moveTimeout and replicationTimeout are set by ContainerBalancer.
*/
private long moveTimeout = 1000 * 65 * 60;
private long replicationTimeout = 1000 * 50 * 60;
private static final double MOVE_DEADLINE_FACTOR = 0.95;

Contributor

I wonder if using a factor like this makes sense for these longer duration timeouts.

The idea I had is that the datanode should time out before SCM does, so that when SCM abandons the command, we know the DN has given up on it too.

If we have a replication timeout of 50 mins, then 95% of that is 47.5, so the DN will give up 2.5 mins before SCM does. Feels like the DN is then giving up too early. If we have a 10 minute timeout or a 60 minute timeout, the DN doesn't need to give up earlier for the longer timeout.

Perhaps we could take the factor away from here completely, and then let RM decide what the DN timeout should be. That would simplify the API into RM slightly, as we would only need to pass a single timeout.

Perhaps use 30 seconds less than the SCM timeout for all values, rather than a percentage like we have now. It could be a single configuration in RM, rather than having the factor defined here and also in RM.
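
A small, hypothetical sketch (not the real MoveManager or ReplicationManager API) contrasting the two options discussed here, using the 0.95 factor and 50-minute replication timeout from the diff above; the 30-second margin is only the suggested example value.

import java.time.Duration;

public final class DatanodeDeadlineSketch {
  private static final double MOVE_DEADLINE_FACTOR = 0.95;
  private static final Duration FIXED_MARGIN = Duration.ofSeconds(30);

  // Current approach: the datanode deadline is a fraction of the SCM deadline,
  // so the DN's head start grows with the timeout (2.5 min for a 50 min timeout).
  static Duration factorBased(Duration scmTimeout) {
    return Duration.ofMillis((long) (scmTimeout.toMillis() * MOVE_DEADLINE_FACTOR));
  }

  // Suggested alternative: a fixed margin, so the DN always gives up the same
  // small amount of time before SCM, regardless of how long the timeout is.
  static Duration fixedOffset(Duration scmTimeout) {
    return scmTimeout.minus(FIXED_MARGIN);
  }

  public static void main(String[] args) {
    Duration replicationTimeout = Duration.ofMinutes(50);
    System.out.println("Factor based: "
        + factorBased(replicationTimeout).toMillis() / 60000.0 + " min"); // 47.5
    System.out.println("Fixed margin: "
        + fixedOffset(replicationTimeout).toMillis() / 60000.0 + " min"); // 49.5
  }
}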

Contributor

We could have another small PR to change the factor if you think it will complicate this PR too much. I am happy either way.

Contributor Author

If we have a replication timeout of 50 mins, then 95% of that is 47.5, so the DN will give up 2.5 mins before SCM does. Feels like the DN is then giving up too early. If we have a 10 minute timeout or a 60 minute timeout, the DN doesn't need to give up earlier for the longer timeout.

That's a good point. Created HDDS-8230 for fixing this.

@sodonnel sodonnel left a comment

LGTM

@adoroszlai adoroszlai merged commit 4a43f34 into apache:master Mar 21, 2023
25 checks passed
@adoroszlai
Contributor

Thanks @siddhantsangwan for the patch, @sodonnel for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Mar 23, 2023
* master: (43 commits)
  HDDS-8148. Improve log for Pipeline creation failure (apache#4385)
  HDDS-7853. Add support for RemoveSCM in SCMRatisServer. (apache#4358)
  HDDS-8042. Display certificate issuer in cert list command. (apache#4429)
  HDDS-8189. [Snapshot] renamedKeyTable should only track keys in buckets that has at least one active snapshot. (apache#4436)
  HDDS-8154. Perf: Reuse Mac instances in S3 token validation (apache#4433)
  HDDS-8245. Info log for keyDeletingService when nonzero number of keys are deleted. (apache#4451)
  HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers (apache#4447)
  HDDS-8220. [Ozone-Streaming] Trigger volume check on IOException in StreamDataChannelBase (apache#4428)
  HDDS-8173. Fix to remove enrties from RocksDB after container gets deleted. (apache#4445)
  HDDS-7975. Rebalance acceptance tests (apache#4437)
  HDDS-8152. Reduce S3 acceptance test setup time (apache#4393)
  HDDS-8172. ECUnderReplicationHandler should consider commands already sent when processing the container (apache#4435)
  HDDS-7883. [Snapshot] Accommodate FSO, key renames and implement OMSnapshotPurgeRequest for SnapshotDeletingService (apache#4407)
  HDDS-8168. Make deadlines inside MoveManager for move commands configurable (apache#4415)
  HDDS-7918. EC: ECBlockReconstructedStripeInputStream should check for spare replicas before failing an index (apache#4441)
  HDDS-8222. EndpointBase#getBucket should handle BUCKET_NOT_FOUND (apache#4431)
  HDDS-8068. Fix Exception: JMXJsonServlet, getting attribute RatisRoles of Hadoop:service=OzoneManager. (apache#4352)
  HDDS-8139. Datanodes should not drop block delete transactions based on transaction ID (apache#4384)
  HDDS-8216. EC: OzoneClientConfig is overwritten in ECKeyOutputStream (apache#4425)
  HDDS-8054. Fix NPE in metrics for failed volume (apache#4340)
  ...