HDDS-5712. make it configurable to trigger refresh datanode usage info before start a new balance iteration by JacksonYao287 · Pull Request #2944 · apache/ozone

JacksonYao287 · 2021-12-23T09:42:50Z

What changes were proposed in this pull request?

Make it configurable whether to trigger a refresh of datanode usage info before starting a new balance iteration.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5712

How was this patch tested?

Unit tests (UT).

JacksonYao287 · 2021-12-24T02:59:29Z

@lokeshj1703 @siddhantsangwan can you please take a look? thanks

lokeshj1703

@JacksonYao287 Thanks for working on this! The changes look good to me.
Could we also add a UT in SCM to check the functionality of the refresh command?

JacksonYao287 · 2022-01-21T08:48:03Z

thanks @lokeshj1703 for the review! will add UT soon

siddhantsangwan

@JacksonYao287 Looks good overall! I've added some review comments.

siddhantsangwan · 2022-02-14T04:57:22Z

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java

-   * @param containerManager   ContainerManager
-   * @param replicationManager ReplicationManager
-   * @param ozoneConfiguration OzoneConfiguration
+   * @param scm        the storage container manager


NIT: extra whitespace

siddhantsangwan · 2022-02-14T04:59:23Z

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java

+    this.containerManager = scm.getContainerManager();
+    this.replicationManager = scm.getReplicationManager();
+    this.ozoneConfiguration = scm.getConfiguration();
+    this.config = new ContainerBalancerConfiguration();


This should probably remain:

this.config = ozoneConfiguration.getObject(ContainerBalancerConfiguration.class);

thanks, will fix this
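The concern in this thread can be illustrated with a minimal sketch. It does not use the Ozone API: `BalancerSettings`, `fromConfiguration`, and the key `example.balancer.trigger.refresh` are hypothetical stand-ins, and a plain `Map` plays the role of the loaded configuration. It only shows why constructing the settings object directly would ignore values the operator has set, while loading it from the configuration would not.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigLoadingSketch {
  // Hypothetical key name, used only for this illustration.
  static final String TRIGGER_REFRESH_KEY = "example.balancer.trigger.refresh";

  // Hypothetical stand-in for a balancer configuration object.
  static final class BalancerSettings {
    // Compiled-in default, used when nothing is configured.
    boolean triggerRefresh = false;

    // Builds the settings from a key/value configuration, overriding defaults.
    static BalancerSettings fromConfiguration(Map<String, String> conf) {
      BalancerSettings settings = new BalancerSettings();
      String value = conf.get(TRIGGER_REFRESH_KEY);
      if (value != null) {
        settings.triggerRefresh = Boolean.parseBoolean(value);
      }
      return settings;
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(TRIGGER_REFRESH_KEY, "true");

    // Direct construction only ever sees the default.
    System.out.println(new BalancerSettings().triggerRefresh);
    // Loading from the configuration picks up the operator's value.
    System.out.println(BalancerSettings.fromConfiguration(conf).triggerRefresh);
  }
}
```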

siddhantsangwan · 2022-02-14T05:00:25Z

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java

+        // this is helpful for container balancer to make more appropriate
+        // decisions. this will increase the disk io load of data nodes, so
+        // please enable it with caution.
+        sendRefreshUsageCommandToAllDNs();


Since balancer will only use healthy, in-service DNs, do we need to trigger DU in all the DNs?

good point, will fix this, thanks
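As a rough sketch of the behaviour discussed here, the following models an iteration loop that optionally asks datanodes to refresh their usage info before each iteration. Everything in it is hypothetical: the class name, the `triggerRefresh` flag, and the string events stand in for the real balancer, its configuration, and the commands it sends. The node list passed in is assumed to be already restricted to healthy, in-service datanodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RefreshBeforeIterationSketch {
  // Hypothetical flag mirroring the "trigger refresh before iteration" option.
  private final boolean triggerRefresh;
  // Records what would have happened, in order, so the flow can be inspected.
  private final List<String> events = new ArrayList<>();

  RefreshBeforeIterationSketch(boolean triggerRefresh) {
    this.triggerRefresh = triggerRefresh;
  }

  // Runs the given number of iterations over an already-filtered node list.
  List<String> run(int iterations, List<String> healthyInServiceNodes) {
    for (int i = 0; i < iterations; i++) {
      if (triggerRefresh) {
        // Only the nodes the balancer will actually use are asked to refresh.
        for (String dn : healthyInServiceNodes) {
          events.add("refresh:" + dn);
        }
      }
      events.add("iterate:" + i);
    }
    return events;
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("dn1", "dn2");
    System.out.println(new RefreshBeforeIterationSketch(true).run(1, nodes));
    System.out.println(new RefreshBeforeIterationSketch(false).run(1, nodes));
  }
}
```

With the flag off, no refresh events are recorded, which reflects the extra disk I/O being opt-in.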

siddhantsangwan · 2022-02-15T12:16:07Z

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java

+            // reporting back make it like this for now, a more suitable
+            // value. can be set in the future if needed
+            wait(3 * nodeReportInterval);
+          } catch (InterruptedException e) {


Forgot to mention, we'll also need to ensure the interrupted state of the thread isn't lost. Could add Thread.currentThread().interrupt();

good point, will add this!
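The pattern suggested in this comment can be sketched in isolation. This is not the balancer code: the class, the lock object, and the `waitForReports` method are hypothetical, and the timeout is an arbitrary value rather than a multiple of the node report interval. `Object.wait` clears the interrupt status when it throws `InterruptedException`, so the catch block sets it again for callers further up the stack.

```java
final class InterruptAwareWait {
  private final Object lock = new Object();

  /**
   * Waits on the lock for up to timeoutMillis (must be greater than zero,
   * since wait(0) would block indefinitely).
   * Returns true if the wait ended without interruption, false otherwise.
   */
  boolean waitForReports(long timeoutMillis) {
    synchronized (lock) {
      try {
        lock.wait(timeoutMillis);
        return true;
      } catch (InterruptedException e) {
        // wait() cleared the interrupt status when it threw; restore it
        // so that callers can still observe the interruption.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    InterruptAwareWait waiter = new InterruptAwareWait();
    // Simulate an interrupt arriving before the wait starts.
    Thread.currentThread().interrupt();
    boolean completed = waiter.waitForReports(50L);
    System.out.println("completed=" + completed
        + ", interrupted=" + Thread.currentThread().isInterrupted());
  }
}
```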

lokeshj1703

@JacksonYao287 Thanks for updating the PR! I have a few minor comments.

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java

...hadoop/ozone/container/common/statemachine/commandhandler/TestRefreshVolumeUsageHandler.java

JacksonYao287 · 2022-02-17T12:25:21Z

@lokeshj1703 thanks for the review, I have updated this patch, please take a look

siddhantsangwan

@JacksonYao287 Thanks for updating! I have a few minor comments.

siddhantsangwan · 2022-02-22T06:50:26Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java

+    getAllNodes().stream().filter(dn -> {
+      boolean isHealthy = false;
+      try {
+        isHealthy = getNodeStatus(dn).isHealthy();
+      } catch (NodeNotFoundException nnfe) {
+        LOG.warn("datanode {} is not found", dn.getIpAddress());
+      }
+      return isHealthy;


Can we use the getNodes(NodeOperationalState opState, NodeState health) method here? Is there a reason for not filtering DNs by NodeOperationalState as well as NodeState?

thanks for the comment, will fix it
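To make the suggestion concrete, here is a small sketch of filtering on both dimensions at once. It is not the SCM code: the `OpState` and `Health` enums are simplified stand-ins for NodeOperationalState and NodeState, and `Node` and `selectNodes` are hypothetical. The point is only that a node must match on operational state and on health to be selected, whereas the snippet under review checked health alone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class NodeFilterSketch {
  // Simplified stand-ins for the real operational-state and health enums.
  enum OpState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }
  enum Health { HEALTHY, STALE, DEAD }

  // Hypothetical, minimal view of a datanode.
  static final class Node {
    final String name;
    final OpState opState;
    final Health health;

    Node(String name, OpState opState, Health health) {
      this.name = name;
      this.opState = opState;
      this.health = health;
    }
  }

  // Returns the names of nodes matching both the operational state and health.
  static List<String> selectNodes(List<Node> all, OpState opState, Health health) {
    return all.stream()
        .filter(n -> n.opState == opState && n.health == health)
        .map(n -> n.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("dn1", OpState.IN_SERVICE, Health.HEALTHY),
        new Node("dn2", OpState.DECOMMISSIONING, Health.HEALTHY),
        new Node("dn3", OpState.IN_SERVICE, Health.STALE));
    // dn2 is healthy but not in service, so a health-only filter would keep it.
    System.out.println(selectNodes(nodes, OpState.IN_SERVICE, Health.HEALTHY));
  }
}
```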

siddhantsangwan · 2022-02-22T07:28:57Z

.../main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerConfiguration.java

+      description = "whether to send command to all the data nodes to run du " +
+          "immediately before starting a balance iteration. note that " +
+          "running du is very time consuming , especially when the disk " +
+          "usage rate of a data node is very high")


We can update the description to say "all healthy, in-service datanodes" or something similar.

sure, will fix it!

lokeshj1703

@JacksonYao287 Thanks for updating the PR! The changes look good to me. +1
Will merge once pending comments are addressed.

JacksonYao287 · 2022-02-25T10:03:30Z

@siddhantsangwan thanks for the review, I have updated this patch according to your comments, please take a look!

siddhantsangwan

@JacksonYao287 Thanks for updating. Looks good to me!

lokeshj1703 · 2022-03-03T06:46:51Z

@JacksonYao287 Thanks for the contribution! @siddhantsangwan Thanks for the reviews! I have committed the PR to the master branch.

JacksonYao287 force-pushed the HDDS-5712 branch from f0fe2e3 to df2b9af Compare December 28, 2021 04:39

lokeshj1703 reviewed Jan 10, 2022

HDDS-5712. trigger refresh datanode usage info before start a new balance iteration

2d56fdd

JacksonYao287 force-pushed the HDDS-5712 branch from df124e7 to 2d56fdd Compare January 21, 2022 08:44

siddhantsangwan reviewed Feb 14, 2022

Jackson Yao added 2 commits February 15, 2022 17:51

Merge remote-tracking branch 'origin/master' into HDDS-5712

74f573e

fix comments

cc75e47

JacksonYao287 force-pushed the HDDS-5712 branch from 2442a5d to cc75e47 Compare February 15, 2022 11:45

siddhantsangwan reviewed Feb 15, 2022

Jackson Yao added 8 commits February 15, 2022 20:28

make sure the interrupted state is not lost

236adbe

add integration test for refresh volume usage

8381bf3

add check to show than we can not get the latest usageinfo before sending refresh command

3e83300

fix checkstyle

4cbfdca

remove sleep

9e41cae

fix error

c45b038

triger CI

97b4991

fix error

d1c72e9

lokeshj1703 reviewed Feb 17, 2022

...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java (outdated)

...hadoop/ozone/container/common/statemachine/commandhandler/TestRefreshVolumeUsageHandler.java (outdated)

update according to comments

556d585

Jackson Yao added 4 commits February 18, 2022 00:20

trigger CI

536cdce

remove log

748afe2

update

45a27d9

fix checkstyle

8f5120f

JacksonYao287 requested a review from lokeshj1703 February 18, 2022 06:35

add double check of usage info

c16efb0

siddhantsangwan reviewed Feb 22, 2022

lokeshj1703 reviewed Feb 25, 2022

Jackson Yao added 2 commits February 25, 2022 17:51

Merge remote-tracking branch 'origin/master' into HDDS-5712

9fc5901

fix comments

be812bf

siddhantsangwan approved these changes Feb 25, 2022

lokeshj1703 merged commit d0cde3a into apache:master Mar 3, 2022

JacksonYao287 deleted the HDDS-5712 branch March 4, 2022 02:05
