HDDS-5733. Incorrect calculation of iteration related metrics in ContainerBalancer #2631
@lokeshj1703 @JacksonYao287 please take a look!
JacksonYao287
left a comment
Thanks @siddhantsangwan for the work!
countDatanodesInvolvedPerIteration += 2;
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
    selectedTargets.contains(source)) {
  countDatanodesInvolvedPerIteration -= 1;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
  countDatanodesInvolvedPerIteration -= 1;
}
Suggested change (replacing the original snippet):
// don't count a source that has been involved in move earlier
if (sourceToTargetMap.containsKey(source) ||
    selectedTargets.contains(source)) {
  countDatanodesInvolvedPerIteration++;
}
// don't count a target that has been involved in move earlier
if (selectedTargets.contains(moveSelection.getTargetNode())) {
  countDatanodesInvolvedPerIteration++;
}
NIT
The suggestion looks semantically wrong, since we want to avoid re-counting a node that we have already counted. For example, in the suggested code, if both conditions are satisfied, the result is 2, but the correct result is 0.
Sorry, I made a mistake here.
Suggested change (replacing the original snippet):
// don't count a source that has been involved in move earlier
if (!sourceToTargetMap.containsKey(source) &&
    !selectedTargets.contains(source)) {
  countDatanodesInvolvedPerIteration++;
}
// don't count a target that has been involved in move earlier
if (!selectedTargets.contains(moveSelection.getTargetNode())) {
  countDatanodesInvolvedPerIteration++;
}
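To make the objection above concrete, here is a small, hypothetical Java sketch that evaluates the three variants for every combination of "source already involved" and "target already involved". The booleans sourceSeen and targetSeen are stand-ins for the lookups in the snippet (sourceSeen corresponds to sourceToTargetMap.containsKey(source) || selectedTargets.contains(source)); the class and method names are illustrative only, not code from the PR.

```java
/**
 * Hypothetical comparison of three ways to compute how many new datanodes a
 * single move adds to the per-iteration count.
 */
final class CountDeltaComparison {

  // Original form: assume both nodes are new, then subtract 1 for each node seen before.
  static int subtractForm(boolean sourceSeen, boolean targetSeen) {
    int delta = 2;
    if (sourceSeen) {
      delta -= 1;
    }
    if (targetSeen) {
      delta -= 1;
    }
    return delta;
  }

  // First suggestion: increments when a node HAS been seen (inverted logic).
  static int firstSuggestion(boolean sourceSeen, boolean targetSeen) {
    int delta = 0;
    if (sourceSeen) {
      delta++;
    }
    if (targetSeen) {
      delta++;
    }
    return delta;
  }

  // Corrected suggestion: increments only when a node has NOT been seen.
  static int correctedSuggestion(boolean sourceSeen, boolean targetSeen) {
    int delta = 0;
    if (!sourceSeen) {
      delta++;
    }
    if (!targetSeen) {
      delta++;
    }
    return delta;
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    for (boolean s : values) {
      for (boolean t : values) {
        System.out.printf("sourceSeen=%b targetSeen=%b subtract=%d first=%d corrected=%d%n",
            s, t, subtractForm(s, t), firstSuggestion(s, t), correctedSuggestion(s, t));
      }
    }
  }
}
```

By De Morgan's law, negating sourceSeen gives exactly the !containsKey(...) && !contains(...) condition of the corrected suggestion, so correctedSuggestion and subtractForm agree in all four cases, while firstSuggestion returns 2 when both nodes were already involved.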
JacksonYao287
left a comment
Thanks @siddhantsangwan for this work! LGTM +1
lokeshj1703
left a comment
@siddhantsangwan Thanks for working on this! I have a few comments inline.
this.sizeMovedPerIteration += container.getUsedBytes();
this.countDatanodesInvolvedPerIteration += 2;
metrics.incrementMovedContainersNum(1);
LOG.info("Move completed for container {} to target {}",
Maybe we should also log the source dn here.
This would involve a linear search to find the source dn for this ContainerMoveSelection every time. Should I implement it?
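One common way to avoid that linear search, sketched here under assumptions, is to keep a reverse index from each move selection to its source alongside sourceToTargetMap. In this hypothetical Java example, datanodes are plain strings, MoveSelection is a simplified stand-in for ContainerMoveSelection (target node plus container ID), and selectionToSource is an invented field name; this is not how the PR resolved the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch: finding the source of a move by scanning the forward
 * map versus looking it up in a reverse index kept alongside it.
 */
final class SourceLookupSketch {

  /** Simplified stand-in for ContainerMoveSelection: target node + container ID. */
  static final class MoveSelection {
    final String targetNode;
    final long containerId;

    MoveSelection(String targetNode, long containerId) {
      this.targetNode = targetNode;
      this.containerId = containerId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof MoveSelection)) {
        return false;
      }
      MoveSelection other = (MoveSelection) o;
      return containerId == other.containerId
          && targetNode.equals(other.targetNode);
    }

    @Override
    public int hashCode() {
      return Objects.hash(targetNode, containerId);
    }
  }

  // Forward map, analogous to sourceToTargetMap: source -> selected move.
  private final Map<String, MoveSelection> sourceToTarget = new HashMap<>();
  // Hypothetical reverse index: selected move -> source.
  private final Map<MoveSelection, String> selectionToSource = new HashMap<>();

  void recordMove(String source, MoveSelection selection) {
    MoveSelection previous = sourceToTarget.put(source, selection);
    if (previous != null) {
      selectionToSource.remove(previous); // drop the stale reverse entry
    }
    selectionToSource.put(selection, source);
  }

  // O(n): scans every entry of the forward map.
  String findSourceLinear(MoveSelection selection) {
    for (Map.Entry<String, MoveSelection> entry : sourceToTarget.entrySet()) {
      if (entry.getValue().equals(selection)) {
        return entry.getKey();
      }
    }
    return null;
  }

  // Expected O(1): a single hash lookup in the reverse index.
  String findSourceIndexed(MoveSelection selection) {
    return selectionToSource.get(selection);
  }
}
```

The trade-off is extra memory and the need to keep both maps in sync on every update; the sketch handles the case where a source's selection is overwritten.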
@Metric(about = "The total amount of used space in GigaBytes that needs to " +
    "be balanced.")
private LongMetric dataSizeToBalanceGB; // removed
private double dataSizeToBalanceGB; // added
We might need to create a DoubleMetric. JsonAutoDetect seems to be used for visibility in LongMetric.
Let's also check whether double values are supported, since we are using org.apache.hadoop.metrics2.annotation.Metric.
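The thread does not spell out the root cause of the wrong dataSizeBalancedGB value, but a general reason to prefer a double here is that storing gigabytes as a whole number drops fractions. The hypothetical Java sketch below (assuming 1 GB = 1024^3 bytes; class and method names are illustrative) shows how converting each container's size to whole GB before summing can under-report the total. It does not implement the proposed DoubleMetric or answer whether metrics2 supports double-valued fields.

```java
/**
 * Hypothetical illustration of precision loss when a size in bytes is stored
 * as whole gigabytes. Assumes 1 GB = 1024^3 bytes.
 */
final class GigabyteConversion {

  static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

  // Integer division drops any fraction of a gigabyte.
  static long toGbAsLong(long bytes) {
    return bytes / BYTES_PER_GB;
  }

  // Floating-point division keeps the fractional part.
  static double toGbAsDouble(long bytes) {
    return (double) bytes / BYTES_PER_GB;
  }

  // Truncating each container separately before summing compounds the loss.
  static long sumOfTruncatedGb(long[] containerSizes) {
    long total = 0;
    for (long size : containerSizes) {
      total += toGbAsLong(size);
    }
    return total;
  }

  // Summing bytes first and converting once keeps the fraction.
  static double sumAsDoubleGb(long[] containerSizes) {
    long totalBytes = 0;
    for (long size : containerSizes) {
      totalBytes += size;
    }
    return toGbAsDouble(totalBytes);
  }

  public static void main(String[] args) {
    long[] sizes = {768L * 1024 * 1024, 768L * 1024 * 1024}; // two 0.75 GB containers
    System.out.println(sumOfTruncatedGb(sizes)); // 0
    System.out.println(sumAsDoubleGb(sizes));    // 1.5
  }
}
```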
// count source if it has not been involved in move earlier
if (!sourceToTargetMap.containsKey(source) &&
    !selectedTargets.contains(source)) {
  countDatanodesInvolvedPerIteration += 1;
}
// count target if it has not been involved in move earlier
if (!selectedTargets.contains(moveSelection.getTargetNode())) {
  countDatanodesInvolvedPerIteration += 1;
}
Do we need this since we are already setting this metric above?
Yes. Counting is performed both during an iteration (to check max datanodes and size per iteration limits) and at the end of an iteration (to get correct values once moves have actually been performed).
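A minimal sketch of the two-phase counting described in the reply, under simplifying assumptions: datanodes are plain strings, a single Set replaces the sourceToTargetMap/selectedTargets pair, a move's source and target always differ, and maxDatanodes is a hypothetical stand-in for the per-iteration datanode limit. The completed flag is set up front only to keep the example short; in the balancer, completion is known only after the move finishes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical two-phase datanode counting for one balancing iteration. */
final class IterationCounting {

  /** Hypothetical move record: source, target, and whether the move completed. */
  static final class Move {
    final String source;
    final String target;
    final boolean completed;

    Move(String source, String target, boolean completed) {
      this.source = source;
      this.target = target;
      this.completed = completed;
    }
  }

  // Phase 1 (during the iteration): count distinct nodes among selected moves
  // so the per-iteration limit can be enforced before any move runs.
  static int selectWithinLimit(List<Move> candidates, int maxDatanodes,
      List<Move> selected) {
    Set<String> involved = new HashSet<>();
    for (Move m : candidates) {
      int added = (involved.contains(m.source) ? 0 : 1)
          + (involved.contains(m.target) ? 0 : 1);
      if (involved.size() + added > maxDatanodes) {
        continue; // selecting this move would exceed the limit
      }
      involved.add(m.source);
      involved.add(m.target);
      selected.add(m);
    }
    return involved.size();
  }

  // Phase 2 (end of the iteration): recount from completed moves only, so the
  // reported metric reflects moves that actually happened.
  static int countCompleted(List<Move> selected) {
    Set<String> involved = new HashSet<>();
    for (Move m : selected) {
      if (m.completed) {
        involved.add(m.source);
        involved.add(m.target);
      }
    }
    return involved.size();
  }
}
```

For example, with candidates A->B, C->B and D->E and a limit of 3, phase 1 selects the first two moves (3 datanodes); if C->B then fails, phase 2 reports 2.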
# Conflicts: # hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java
lokeshj1703
left a comment
@siddhantsangwan Thanks for updating the PR! The changes look good to me.
@siddhantsangwan Thanks for the contribution! @JacksonYao287 Thanks for the review! I have committed the PR to the master branch.
What changes were proposed in this pull request?
ContainerBalancer incorrectly calculates dataSizeBalancedGB and countDatanodesInvolvedPerIteration. In particular, a datanode that was already involved in an earlier move is counted again. This Jira fixes these bugs.
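The double counting can be reproduced with a small, hypothetical Java example: three moves that share the same target. Adding 2 per move, as the original += 2 on move completion did, reports 6 datanodes, while only 4 distinct datanodes are involved. Datanode names and the {source, target} array encoding are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical demo of double counting; each move is a {source, target} pair. */
final class DatanodeCountBug {

  // Adds 2 per move regardless of history, like the original
  // "countDatanodesInvolvedPerIteration += 2" on move completion.
  static int naiveCount(List<String[]> moves) {
    return 2 * moves.size();
  }

  // Counts each datanode once, however many moves it takes part in.
  static int distinctCount(List<String[]> moves) {
    Set<String> involved = new HashSet<>();
    for (String[] move : moves) {
      involved.add(move[0]);
      involved.add(move[1]);
    }
    return involved.size();
  }

  public static void main(String[] args) {
    // Three sources each move a container to the same target "dn-B".
    List<String[]> moves = Arrays.asList(
        new String[] {"dn-A", "dn-B"},
        new String[] {"dn-C", "dn-B"},
        new String[] {"dn-D", "dn-B"});
    System.out.println(naiveCount(moves));    // 6
    System.out.println(distinctCount(moves)); // 4
  }
}
```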
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5733
How was this patch tested?
TestContainerBalancerUT