HDFS-17052. Improve BlockPlacementPolicyRackFaultTolerant to avoid choose nodes failed when no enough Rack. #5759
Conversation
💔 -1 overall
This message was automatically generated.
```java
// BlockPlacementPolicyRackFaultTolerant#chooseEvenlyFromRemainingRacks
int bestEffortMaxNodesPerRack = maxNodesPerRack;
while (results.size() != totalReplicaExpected &&
    numResultsOflastChoose != results.size()) {
  // Exclude the chosen nodes
  final Set<Node> newExcludeNodes = new HashSet<>();
  for (DatanodeStorageInfo resultStorage : results) {
    addToExcludedNodes(resultStorage.getDatanodeDescriptor(),
        newExcludeNodes);
  }
  LOG.trace("Chosen nodes: {}", results);
  LOG.trace("Excluded nodes: {}", excludedNodes);
  LOG.trace("New Excluded nodes: {}", newExcludeNodes);
  final int numOfReplicas = totalReplicaExpected - results.size();
  numResultsOflastChoose = results.size();
  try {
    chooseOnce(numOfReplicas, writer, newExcludeNodes, blocksize,
        ++bestEffortMaxNodesPerRack, results, avoidStaleNodes,
        storageTypes);
  } catch (NotEnoughReplicasException nere) {
    lastException = nere;
  } finally {
    excludedNodes.addAll(newExcludeNodes);
  }
}
```

Since `totalNumOfReplicas < numOfRacks`, `maxNodesPerRack` is initialized to 1. If, after incrementing `bestEffortMaxNodesPerRack`, `BlockPlacementPolicyRackFaultTolerant#chooseOnce` still cannot select a new node to add to `results`, the while loop exits because `numResultsOflastChoose != results.size()` is false.
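The no-progress exit condition can be exercised in isolation. Below is a minimal, self-contained sketch of the retry loop, with hypothetical rack/node names and a stand-in `chooseOnce` (none of this is the actual HDFS code): the loop keeps raising the per-rack cap, but gives up as soon as one iteration adds nothing to `results`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BestEffortLoopSketch {
  // Hypothetical stand-in for chooseOnce: pick at most maxPerRack nodes
  // from each rack, skipping nodes that were already chosen.
  static void chooseOnce(Map<String, List<String>> racks, int maxPerRack,
                         List<String> results) {
    for (List<String> rackNodes : racks.values()) {
      int inRack = 0;
      for (String node : rackNodes) {
        if (results.contains(node)) {
          inRack++;
        }
      }
      for (String node : rackNodes) {
        if (inRack >= maxPerRack) {
          break;
        }
        if (!results.contains(node)) {
          results.add(node);
          inRack++;
        }
      }
    }
  }

  static List<String> run() {
    // Two racks, three datanodes total, but four replicas are wanted.
    Map<String, List<String>> racks = new HashMap<>();
    racks.put("/rack1", List.of("dn1", "dn2"));
    racks.put("/rack2", List.of("dn3"));

    int totalReplicaExpected = 4;
    List<String> results = new ArrayList<>();
    int bestEffortMaxNodesPerRack = 1;
    int numResultsOfLastChoose = -1;

    // Mirrors the while condition above: stop when full, or when an
    // iteration adds nothing even after raising the per-rack cap.
    while (results.size() != totalReplicaExpected
        && numResultsOfLastChoose != results.size()) {
      numResultsOfLastChoose = results.size();
      chooseOnce(racks, ++bestEffortMaxNodesPerRack, results);
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(run().size()); // prints 3: the loop gives up short of 4
  }
}
```

Here the exit is the desired best-effort behavior (only three nodes exist); the bug this PR addresses is the same condition firing prematurely when placeable nodes still remain.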
🎊 +1 overall
This message was automatically generated.
Great catch here! It makes sense to me. I have some thoughts to discuss with you. The solution here may involve multiple calls to …
Thank you for your review. We appreciate your suggestion. However, in situations where the number of available racks is insufficient to meet the requirements of the Erasure Coding storage type, each write operation would trigger the invocation of the …
@ayushtkn @Hexiaoqiao @zhangshuyan0 Would you be willing to discuss this issue together and provide me with some advice?
It makes sense. We should try to minimize the impact on …
@zhtttylz How about combining the above two solutions like this? May avoid the drawbacks of both options.
@zhangshuyan0 @zhtttylz Thank you very much for your contribution! I am not very familiar with EC, but this flow chart helps me understand the scenario described by this PR. I think @zhangshuyan0 and @zhtttylz are co-authors of this PR.
```java
  String rackName = dsInfo.getDatanodeDescriptor().getNetworkLocation();
  nodesPerRack.merge(rackName, 1, Integer::sum);
}
bestEffortMaxNodesPerRack = Collections.max(nodesPerRack.values());
```
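The `Map.merge` counting idiom in the excerpt above can be tried on its own. A small, self-contained example with made-up rack names (not the HDFS types):

```java
import java.util.HashMap;
import java.util.Map;

public class RackCountSketch {
  // Count chosen nodes per rack, mirroring the merge idiom above.
  static Map<String, Integer> countNodesPerRack(String[] rackOfEachChosenNode) {
    Map<String, Integer> nodesPerRack = new HashMap<>();
    for (String rackName : rackOfEachChosenNode) {
      // merge: store 1 for a new rack, otherwise add 1 to the old count.
      nodesPerRack.merge(rackName, 1, Integer::sum);
    }
    return nodesPerRack;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts =
        countNodesPerRack(new String[] {"/rack1", "/rack2", "/rack1"});
    System.out.println(counts); // e.g. {/rack1=2, /rack2=1}
  }
}
```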
Is it possible to introduce an infinite loop here? If each rack already has one chosen node and `bestEffortMaxNodesPerRack` is 2, and no datanode can be chosen now, then `bestEffortMaxNodesPerRack` will change to 1 after line 201, which may cause an infinite loop. So the calculation of the maximum value should take the old value of `bestEffortMaxNodesPerRack` into account, to ensure it only increases.
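To make the concern concrete, here is a hedged sketch (made-up data, not the real placement code) showing that recomputing the cap from the per-rack counts alone can let it regress, while taking `Math.max` against the previous value keeps it non-decreasing:

```java
import java.util.Collections;
import java.util.Map;

public class MonotonicCapSketch {
  // The suggested fix: never let the relaxed per-rack cap shrink.
  static int nextCap(int previousCap, Map<String, Integer> nodesPerRack) {
    return Math.max(previousCap, Collections.max(nodesPerRack.values()));
  }

  public static void main(String[] args) {
    // Each rack holds exactly one chosen node, but the cap was already
    // raised to 2 by a previous retry that made no progress.
    Map<String, Integer> nodesPerRack = Map.of("/rack1", 1, "/rack2", 1);

    // Recomputing from the counts alone drops the cap back to 1, letting
    // the next retry bump it to 2 again, and so on forever.
    int naive = Collections.max(nodesPerRack.values());

    System.out.println(naive);                    // 1: cap regressed
    System.out.println(nextCap(2, nodesPerRack)); // 2: cap stays put
  }
}
```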
Thank you for your feedback. I will make the necessary code changes as soon as possible.
Thanks for involving me here. Skimming the PR (the logic seems quite common), I wonder whether this issue is only related to the EC feature, or whether it also impacts the replication scenario?
@Hexiaoqiao The problem solved by this PR is only related to …
@zhangshuyan0's explanation is excellent, and I have greatly benefited from it. I would like to provide a brief supplement.
Great catch and patch here. I was deeply impressed by this PR's description and comments. Thanks both @zhtttylz and @zhangshuyan0.
Totally LGTM; I just left some nit comments inline, JFYI.
Separately, I think this issue does not only impact the EC feature, so I suggest we change the title to a more general one, such as 'Improve BlockPlacementPolicyRackFaultTolerant to avoid choose nodes failed when no enough Rack.'?
```java
final HdfsConfiguration conf = new HdfsConfiguration();
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);

// nine disk node eleven archive node
```
Please use a capital letter at the beginning of sentences and a period at the end of each comment. (L525, L533, L556, L563)
```diff
@@ -192,11 +192,23 @@ private void chooseEvenlyFromRemainingRacks(Node writer,
       } finally {
         excludedNodes.addAll(newExcludeNodes);
       }
+      if (numResultsOflastChoose == results.size()) {
```
It would be better to add a comment here.
Thanks for your suggestion, I'll make the required code changes promptly.
```java
for (int numNodes : nodesPerRack.values()) {
  if (numNodes > bestEffortMaxNodesPerRack) {
    bestEffortMaxNodesPerRack = numNodes;
  }
```
Lines 201~203 just compute a maximum value; maybe we can use the following code to make it clearer and more concise.

```java
bestEffortMaxNodesPerRack =
    Math.max(bestEffortMaxNodesPerRack, Collections.max(nodesPerRack.values()));
```
Thank you for your feedback. I'll prioritize implementing the required code changes promptly.
LGTM. +1.
Committed to trunk. Thanks both @zhtttylz, @zhangshuyan0, @slfan1989 and @whbing for your work.
@Hexiaoqiao @zhangshuyan0 @slfan1989 @whbing Thank you for your assistance in reviewing the code!
…oose nodes failed when no enough Rack. (apache#5759). Contributed by Hualong Zhang and Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
JIRA: HDFS-17052. Improve BlockPlacementPolicyRackFaultTolerant to avoid choose nodes failed when no enough Rack.