[YARN-11421] Graceful Decommission ignores launched containers and gets deactivated before timeout #5905
Conversation
…ts deactivated before timeout During Graceful Decommission, a node gets deactivated before the timeout even though there are launched containers on that node. We have observed cases where the graceful decommission signal is sent to a node while containers are being launched at the NodeManager at the same time; in such cases the ResourceManager moves the node from the DECOMMISSIONING to the DECOMMISSIONED state because launched containers are not checked in DeactivateNodeTransition. We suggest waiting for yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms to elapse before marking the node ready to be decommissioned (no delay if set to 0). The expire interval should not be configured to more than RM_AM_EXPIRY_INTERVAL_MS. Open Source JIRA: https://issues.apache.org/jira/browse/YARN-11421
💔 -1 overall
This message was automatically generated.
...hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java — review comment resolved (outdated)
...hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java — review comment resolved (outdated)
...src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java — review comment resolved (outdated)
🎊 +1 overall
This message was automatically generated.
@abhishekd0907 We need to fix the checkstyle issue.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@slfan1989 The styling issues are fixed. Can you please check the PR again?
@abhishekd0907 Thank you for your contribution! If there are no other comments, I will merge this PR into the trunk branch after 3 days. |
// expire interval should not be configured more than RM_AM_EXPIRY_INTERVAL_MS
this.expireIntvl = Math.min(conf.getLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
    YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS),
    conf.getInt(YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_DELAY_MS,
This code is a little hard to read; maybe extract it into a method?
Added it in a separate method.
@@ -126,6 +127,11 @@ public void init(Configuration conf) {
    YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_POLL_INTERVAL,
    YarnConfiguration
        .DEFAULT_RM_DECOMMISSIONING_NODES_WATCHER_POLL_INTERVAL);
// expire interval should not be configured more than RM_AM_EXPIRY_INTERVAL_MS
this.expireIntvl = Math.min(conf.getLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
Can we use getTimeDuration()?
Other time-related confs such as RM_AM_EXPIRY_INTERVAL_MS in YarnConfiguration are added as milliseconds (long/int) rather than as strings converted to a duration, so I am keeping this similar for consistency. Let me know if changing to a string duration is a must for going forward with this PR.
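For context on the reviewer's suggestion: Hadoop's Configuration.getTimeDuration accepts values with a unit suffix (for example "600s" or "10m") as well as bare numbers, and converts them to a requested TimeUnit. The class below is a toy, self-contained illustration of that suffix handling, not Hadoop's actual implementation:

```java
import java.util.concurrent.TimeUnit;

public class DurationSketch {
    // Toy suffix-aware parsing in the spirit of Configuration.getTimeDuration;
    // NOT Hadoop's real code, just an illustration of why it is convenient.
    static long parseToMillis(String value) {
        if (value.endsWith("ms")) {
            return Long.parseLong(value.substring(0, value.length() - 2));
        } else if (value.endsWith("s")) {
            return TimeUnit.SECONDS.toMillis(
                Long.parseLong(value.substring(0, value.length() - 1)));
        } else if (value.endsWith("m")) {
            return TimeUnit.MINUTES.toMillis(
                Long.parseLong(value.substring(0, value.length() - 1)));
        }
        // Bare numbers are taken as milliseconds, matching the PR's current style.
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("600s")); // 600000
        System.out.println(parseToMillis("10m"));  // 600000
        System.out.println(parseToMillis("600"));  // 600
    }
}
```

With getTimeDuration, callers can write the config either way while the code always receives a value in one unit.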
...src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java — review comment resolved (outdated)
...test/java/org/apache/hadoop/yarn/server/resourcemanager/TestDecommissioningNodesWatcher.java — review comment resolved (outdated)
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@goiri I have addressed your comments. Please review again.
@goiri @slfan1989 Please review again.
    .getSchedulerNode(rmNode.getNodeID())
    .getCopiedListOfRunningContainers()
    .stream().anyMatch(RMContainer::isAMContainer);
if (hasScheduledAMContainers) {
  LOG.info("Node " + rmNode.nodeId + " has AM containers scheduled on it."
Use {} logger format.
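The reviewer is asking for SLF4J's parameterized style, e.g. LOG.info("Node {} has AM containers scheduled on it.", rmNode.nodeId), which avoids building the message string when the log level is disabled. Since SLF4J isn't on a standalone classpath, the class below is a minimal stand-in showing what the {} substitution does (illustrative only; real code would just call the logger):

```java
public class LoggerFormatSketch {
    // Stand-in for SLF4J's {} placeholder substitution. In real SLF4J the
    // message is only formatted if the level is enabled, which is the point
    // of preferring placeholders over string concatenation.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }

    public static void main(String[] args) {
        // Equivalent message to the concatenation in the PR's snippet:
        System.out.println(
            format("Node {} has AM containers scheduled on it.", "host1:1234"));
    }
}
```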
// decommissioning nodes, but delay should not be more than RM_AM_EXPIRY_INTERVAL_MS
private long setExpireInterval(Configuration conf) {
  return Math.min(
      conf.getInt(YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_DELAY_MS,
Could we use getTimeDuration()?
    throws Exception {
  Configuration conf = new Configuration();
  // decommission timeout is 10 min
  conf.set(YarnConfiguration.RM_NODE_GRACEFUL_DECOMMISSION_TIMEOUT, "600");
setInt or ideally setTimeDuration?
// we should still get WAIT_SCHEDULED_APPS as expiry time is not over
NodeHealthStatus status = NodeHealthStatus.newInstance(true, "",
    System.currentTimeMillis() - 1000);
Indentation looks funny.
MockRM.finishAMAndVerifyAppState(app, rm, nm1, am);
rm.waitForState(app.getApplicationId(), RMAppState.FINISHED);
Assert.assertEquals(0, node1.getRunningApps().size());
watcher.update(node1, nodeStatus);
What do we update this for if we don't assert later?
rm = new MockRM(conf);
rm.start();

MockNM nm1 = rm.registerNode("host1:1234", 10240);
Make it 10*1024.
Open Source JIRA: https://issues.apache.org/jira/browse/YARN-11421
Description of PR
During Graceful Decommission, a node gets deactivated before the timeout even though there are launched containers on that node.
We have observed cases where the graceful decommission signal is sent to a node while containers are being launched at the NodeManager at the same time; in such cases the ResourceManager moves the node from the DECOMMISSIONING to the DECOMMISSIONED state because launched containers are not checked in DecommissioningNodesWatcher.
We suggest waiting for yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms to elapse before marking the node ready to be decommissioned. There is no delay if it is set to 0. The expire interval should not be configured to more than RM_AM_EXPIRY_INTERVAL_MS.
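The capping rule from the description (honor the watcher delay, but never exceed the AM expiry interval) can be sketched in plain Java. The helper and constant below are illustrative stand-ins for the Hadoop Configuration lookups in the patch, not the PR's actual code:

```java
public class ExpireIntervalSketch {
    // Illustrative default mirroring RM_AM_EXPIRY_INTERVAL_MS (10 minutes).
    static final long AM_EXPIRY_MS = 600_000L;

    // The watcher delay must never exceed the AM expiry interval; otherwise
    // a decommissioning node could be held past the AM liveness window.
    static long effectiveExpireInterval(long watcherDelayMs, long amExpiryMs) {
        return Math.min(watcherDelayMs, amExpiryMs);
    }

    public static void main(String[] args) {
        // A delay of 0 means no waiting before decommissioning.
        System.out.println(effectiveExpireInterval(0L, AM_EXPIRY_MS));       // 0
        // An over-long delay is capped at the AM expiry interval.
        System.out.println(effectiveExpireInterval(900_000L, AM_EXPIRY_MS)); // 600000
    }
}
```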
How was this patch tested?
Unit Tests Added
For code changes:
Have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?