Skip to content

HDDS-5733. Incorrect calculation of iteration related metrics in ContainerBalancer#2631

Merged
lokeshj1703 merged 9 commits intoapache:masterfrom
siddhantsangwan:HDDS-5733
Oct 8, 2021
Merged

HDDS-5733. Incorrect calculation of iteration related metrics in ContainerBalancer#2631
lokeshj1703 merged 9 commits intoapache:masterfrom
siddhantsangwan:HDDS-5733

Conversation

@siddhantsangwan
Copy link
Contributor

What changes were proposed in this pull request?

ContainerBalancer incorrectly calculates dataSizeBalancedGB and countDatanodesInvolvedPerIteration. Datanodes involved are counted twice if they had been involved earlier.

This Jira fixes these bugs.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5733

How was this patch tested?

TestContainerBalancer UT

@siddhantsangwan
Copy link
Contributor Author

@lokeshj1703 @JacksonYao287 please take a look!

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @siddhantsangwan for the work!

Comment on lines 597 to 606
countDatanodesInvolvedPerIteration += 2;
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration -= 1;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration -= 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
countDatanodesInvolvedPerIteration += 2;
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration -= 1;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration -= 1;
}
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration++;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration++;
}

NIT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion looks wrong semantically since we want to exclude re-counting a node that we have already counted earlier. For example in the suggested code, if both the if conditions are satisfied, the result will equal 2. But the correct result should be 0.

Copy link
Contributor

@JacksonYao287 JacksonYao287 Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i made a mistake here

Suggested change
countDatanodesInvolvedPerIteration += 2;
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration -= 1;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration -= 1;
}
// don't count a source that has been involved in move earlier
if (!sourceToTargetMap.containsKey(source) &&
!selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration++;
}
// don't count a target that has been involved in move earlier
if (!selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration++;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @siddhantsangwan for this work! LGTM +1

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for working on this! I have a few comments inline.

this.sizeMovedPerIteration += container.getUsedBytes();
this.countDatanodesInvolvedPerIteration += 2;
metrics.incrementMovedContainersNum(1);
LOG.info("Move completed for container {} to target {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also log the source dn here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would involve a linear search to find the source dn for this ContainerMoveSelection every time. Should I implement it?

@Metric(about = "The total amount of used space in GigaBytes that needs to " +
"be balanced.")
private LongMetric dataSizeToBalanceGB;
private double dataSizeToBalanceGB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to create a DoubleMetric. JsonAutoDetect seems to be used for visibility in LongMetric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check if double value is supported. We are using org.apache.hadoop.metrics2.annotation.Metric.

Comment on lines +597 to +605
// count source if it has not been involved in move earlier
if (!sourceToTargetMap.containsKey(source) &&
!selectedTargets.contains(source)) {
countDatanodesInvolvedPerIteration += 1;
}
// count target if it has not been involved in move earlier
if (!selectedTargets.contains(moveSelection.getTargetNode())) {
countDatanodesInvolvedPerIteration += 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this since we are already setting this metric above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Counting is performed both during an iteration (to check max datanodes and size per iteration limits) and at the end of an iteration (to get correct values once moves have actually been performed).

# Conflicts:
#	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java
Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for updating the PR! The changes look good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Divide by GB.

@lokeshj1703 lokeshj1703 merged commit db97a32 into apache:master Oct 8, 2021
@lokeshj1703
Copy link
Contributor

@siddhantsangwan Thanks for the contribution! @JacksonYao287 Thanks for the review! I have committed the PR to master branch.

@siddhantsangwan siddhantsangwan deleted the HDDS-5733 branch October 8, 2021 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants