-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16287. Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable #3596
Conversation
… dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
💔 -1 overall
This message was automatically generated. |
Fix the problems encountered above and update later. |
Update PR, Make the following changes:
|
💔 -1 overall
This message was automatically generated. |
Fix code style and remove useless code |
💔 -1 overall
This message was automatically generated. |
...hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
Show resolved
Hide resolved
@haiyang1987 Thanks for contribution, some comments: |
this.maxSlowPeerReportNodes = conf.getInt( | ||
DFSConfigKeys.DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY, | ||
DFSConfigKeys.DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_DEFAULT); | ||
this.slowPeerCollectionInterval = conf.getTimeDuration( | ||
DFSConfigKeys.DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_KEY, | ||
DFSConfigKeys.DFS_NAMENODE_SLOWPEER_COLLECT_INTERVAL_DEFAULT, | ||
TimeUnit.MILLISECONDS); | ||
if (slowPeerTracker != null && excludeSlowNodesEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this change is made, the SlowPeerCollector thread will be started regardless of whether we enable this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomscut Thank you for your review.
1.Current parameter 'dataNodePeerStatsEnabled' and 'excludeSlowNodesEnabled' decision SlowPeerCollector thread whether to start ,But it didn't take into account avoid SlowDataNodesForRead logic
2.So think about two phases:
a.The first is to start SlowPeerCollector thread
b.Second, you can control whether to enable read/write avoid slow datanode according to dynamic parameters
@@ -511,7 +505,16 @@ private boolean isInactive(DatanodeInfo datanode) { | |||
private boolean isSlowNode(String dnUuid) { | |||
return avoidSlowDataNodesForRead && slowNodesUuidSet.contains(dnUuid); | |||
} | |||
|
|||
|
|||
public void setAvoidSlowDataNodesForReadEnabled(boolean enable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to check whether the slowPeerTracker is started or we might not get the slow peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider slowNodesUuidSet is generated when the SlowPeerCollector thread is started,therefore it is logical to judge whether the dnUuid exists in the slowNodesUuidSet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haiyang1987 for your comment. I think the logic right now is no problem.
I mean that only when excludeSlowNodesEnabled
is set to true
we startSlowPeerCollector
, and stopSlowPeerCollector
when excludeSlowNodesEnabled
is set to false
. There is no extra overhead. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment. You have a point there.
But there is a problem,current parameter 'dataNodePeerStatsEnabled' and 'excludeSlowNodesEnabled' decision SlowPeerCollector thread whether to start ,But it didn't take into account avoid SlowDataNodesForRead logic.
And from the perspective of stability, the production environment of this function online can be consider divided into two stages:
The first is to start SlowPeerCollector thread, we can check whether the reported datanode is a slow node based on monitoring indicators.
Second, you can control whether to enable read/write avoid slow datanode according to dynamic parameters.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment. You have a point there.
But there is a problem,current parameter 'dataNodePeerStatsEnabled' and 'excludeSlowNodesEnabled' decision SlowPeerCollector thread whether to start ,But it didn't take into account avoid SlowDataNodesForRead logic.
And from the perspective of stability, the production environment of this function online can be consider divided into two stages: The first is to start SlowPeerCollector thread, we can check whether the reported datanode is a slow node based on monitoring indicators. Second, you can control whether to enable read/write avoid slow datanode according to dynamic parameters.
What do you think?
Thanks @haiyang1987 for your explanation. It is OK from my side. Can we also make excludeSlowNodesEnabled
reconfigurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment.
HDFS-15120 already support BlockPlacementPolicyDefault#dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable. But the current way I think is heavy. Do we need to support alone logic make excludeSlowNodesEnabled reconfigurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @haiyang1987 for your reminding, I didn't see the logic here before. I think we don't need to implement it separately if it's already supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment.
Is there anything else that needs to be improved in the current code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment. HDFS-15120 already support BlockPlacementPolicyDefault#dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable. But the current way I think is heavy. Do we need to support alone logic make excludeSlowNodesEnabled reconfigurable?
Hi @tomscut take a deep look at code discovery HDFS-15120, if we don't update 'dfs.block.replicator.classname', 'dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled' can't support reconfigurable.
so we need to support alone logic make excludeSlowNodesEnabled reconfigurable.
Consider create a new issue to solve this problem.
What do you think?
@ferhui Thank you for your review. |
💔 -1 overall
This message was automatically generated. |
final DatanodeManager datanodeManager = nameNode.namesystem | ||
.getBlockManager().getDatanodeManager(); | ||
|
||
// By default, avoidSlowDataNodesForRead is false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the standard of previous review, a period should be added after the comment.
nameNode.reconfigureProperty( | ||
DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY, Boolean.toString(true)); | ||
|
||
// After reconfigured, avoidSlowDataNodesForRead is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a period after the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomscut for your comment. PR update
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@haiyang1987 Thanks for contribution. @tomscut Thanks for review! |
Description of PR
Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable
Details: HDFS-16287
For code changes: