Skip to content

HDDS-5712. make it configurable to trigger refresh datanode usage info before start a new balance iteration#2944

Merged
lokeshj1703 merged 19 commits intoapache:masterfrom
JacksonYao287:HDDS-5712
Mar 3, 2022
Merged

HDDS-5712. make it configurable to trigger refresh datanode usage info before start a new balance iteration#2944
lokeshj1703 merged 19 commits intoapache:masterfrom
JacksonYao287:HDDS-5712

Conversation

@JacksonYao287
Copy link
Contributor

What changes were proposed in this pull request?

make it configurable to trigger refresh datanode usage info before start a new balance iteration

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5712

How was this patch tested?

ut

@JacksonYao287
Copy link
Contributor Author

@lokeshj1703 @siddhantsangwan can you please take a look? thanks

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for working on this! The changes look good to me.
Could we also add a UT in SCM to check the functionality of refresh command?

@JacksonYao287
Copy link
Contributor Author

thanks @lokeshj1703 for the review! will add UT soon

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Looks good overall! I've added some review comments.

* @param containerManager ContainerManager
* @param replicationManager ReplicationManager
* @param ozoneConfiguration OzoneConfiguration
* @param scm the storage container manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: extra whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

this.containerManager = scm.getContainerManager();
this.replicationManager = scm.getReplicationManager();
this.ozoneConfiguration = scm.getConfiguration();
this.config = new ContainerBalancerConfiguration();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably remain:

this.config = ozoneConfiguration.getObject(ContainerBalancerConfiguration.class);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks , will fix this

// this is helpful for container balancer to make more appropriate
// decisions. this will increase the disk io load of data nodes, so
// please enable it with caution.
sendRefreshUsageCommandToAllDNs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since balancer will only use healthy, in-service DNs, do we need to trigger DU in all the DNs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will fix this, thanks

// reporting back make it like this for now, a more suitable
// value. can be set in the future if needed
wait(3 * nodeReportInterval);
} catch (InterruptedException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention, we'll also need to ensure the interrupted state of the thread isn't lost. Could add Thread.currentThread().interrupt();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point , will add this!

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating the PR! I have few minor comments.

@JacksonYao287
Copy link
Contributor Author

@lokeshj1703 thanks for the review, i have updated this patch , please take a look

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating! I have a few minor comments.

Comment on lines 1027 to 1034
getAllNodes().stream().filter(dn -> {
boolean isHealthy = false;
try {
isHealthy = getNodeStatus(dn).isHealthy();
} catch (NodeNotFoundException nnfe) {
LOG.warn("datanode {} is not found", dn.getIpAddress());
}
return isHealthy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the getNodes( NodeOperationalState opState, NodeState health) method here? Is there a reason for not filtering DNs with NodeOperationalState as well as NodeState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the comment , will fix it

Comment on lines 128 to 131
description = "whether to send command to all the data nodes to run du " +
"immediately before starting a balance iteration. note that " +
"running du is very time consuming , especially when the disk " +
"usage rate of a data node is very high")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can update the description to all healthy, in-service datanodes or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure , will fix it!

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating the PR! The changes look good to me. +1
Will merge once pending comments are addressed.

@JacksonYao287
Copy link
Contributor Author

@siddhantsangwan thanks for the review, i have updated this patch according to you comments, please take a look !

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonYao287 Thanks for updating. Looks good to me!

@lokeshj1703 lokeshj1703 merged commit d0cde3a into apache:master Mar 3, 2022
@lokeshj1703
Copy link
Contributor

@JacksonYao287 Thanks for the contribution! @siddhantsangwan Thanks for the reviews! I have committed the PR to master branch.

@JacksonYao287 JacksonYao287 deleted the HDDS-5712 branch March 4, 2022 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants