[YARN-11421] Graceful Decommission ignores launched containers and gets deactivated before timeout #5905
Conversation
…ts deactivated before timeout During Graceful Decommission, a node gets deactivated before the timeout even though there are launched containers on that node. We have observed cases where the graceful decommission signal is sent to a node while containers are being launched at the NodeManager at the same time; in such cases the ResourceManager moves the node from the DECOMMISSIONING to the DECOMMISSIONED state because launched containers are not checked in DeactivateNodeTransition. We suggest waiting for yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms to elapse before marking the node ready to be decommissioned (no delay if set to 0). The expire interval should not be configured to more than RM_AM_EXPIRY_INTERVAL_MS. Open Source JIRA: https://issues.apache.org/jira/browse/YARN-11421
💔 -1 overall
This message was automatically generated.
...hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java — review comment resolved (outdated)
...hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java — review comment resolved (outdated)
...src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java — review comment resolved (outdated)
🎊 +1 overall
This message was automatically generated.
@abhishekd0907 We need to fix the checkstyle issue.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@slfan1989 The styling issues are fixed. Can you please check the PR again?
@abhishekd0907 Thank you for your contribution! If there are no other comments, I will merge this PR into the trunk branch after 3 days. |
// expire interval should not be configured more than RM_AM_EXPIRY_INTERVAL_MS
this.expireIntvl = Math.min(conf.getLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
    YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS),
    conf.getInt(YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_DELAY_MS,
This code is a little hard to read; maybe extract it into a method?
Added it in a separate method.
@@ -126,6 +127,11 @@ public void init(Configuration conf) {
    YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_POLL_INTERVAL,
    YarnConfiguration
        .DEFAULT_RM_DECOMMISSIONING_NODES_WATCHER_POLL_INTERVAL);
// expire interval should not be configured more than RM_AM_EXPIRY_INTERVAL_MS
this.expireIntvl = Math.min(conf.getLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS,
Can we use getTimeDuration()?
Other time-related confs such as RM_AM_EXPIRY_INTERVAL_MS in YarnConfiguration are added as milliseconds (long/int) rather than as strings converted to a duration, so I am keeping this similar for consistency. Let me know if changing to a string duration is a must for going forward with this PR.
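For context on the reviewer's suggestion: Hadoop's Configuration.getTimeDuration accepts values with a unit suffix (for example "600s" or "10m") as well as bare numbers, and converts them to a requested TimeUnit. The class below is a toy, self-contained illustration of that suffix handling, not Hadoop's actual implementation:

```java
import java.util.concurrent.TimeUnit;

public class DurationSketch {
    // Toy suffix-aware parsing in the spirit of Configuration.getTimeDuration;
    // NOT Hadoop's real code, just an illustration of why it is convenient.
    static long parseToMillis(String value) {
        if (value.endsWith("ms")) {
            return Long.parseLong(value.substring(0, value.length() - 2));
        } else if (value.endsWith("s")) {
            return TimeUnit.SECONDS.toMillis(
                Long.parseLong(value.substring(0, value.length() - 1)));
        } else if (value.endsWith("m")) {
            return TimeUnit.MINUTES.toMillis(
                Long.parseLong(value.substring(0, value.length() - 1)));
        }
        // Bare numbers are taken as milliseconds, matching the PR's current style.
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("600s")); // 600000
        System.out.println(parseToMillis("10m"));  // 600000
        System.out.println(parseToMillis("600"));  // 600
    }
}
```

With getTimeDuration, callers can write the config either way while the code always receives a value in one unit.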
...src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java — review comment resolved (outdated)
...test/java/org/apache/hadoop/yarn/server/resourcemanager/TestDecommissioningNodesWatcher.java — review comment resolved (outdated)
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
@goiri I have addressed your comments. Please review again.
@goiri @slfan1989 Please review again.
    .getSchedulerNode(rmNode.getNodeID())
    .getCopiedListOfRunningContainers()
    .stream().anyMatch(RMContainer::isAMContainer);
if (hasScheduledAMContainers) {
  LOG.info("Node " + rmNode.nodeId + " has AM containers scheduled on it."
Use {} logger format.
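The reviewer is asking for SLF4J's parameterized style, e.g. LOG.info("Node {} has AM containers scheduled on it.", rmNode.nodeId), which avoids building the message string when the log level is disabled. Since SLF4J isn't on a standalone classpath, the class below is a minimal stand-in showing what the {} substitution does (illustrative only; real code would just call the logger):

```java
public class LoggerFormatSketch {
    // Stand-in for SLF4J's {} placeholder substitution. In real SLF4J the
    // message is only formatted if the level is enabled, which is the point
    // of preferring placeholders over string concatenation.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }

    public static void main(String[] args) {
        // Equivalent message to the concatenation in the PR's snippet:
        System.out.println(
            format("Node {} has AM containers scheduled on it.", "host1:1234"));
    }
}
```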
// decommissioning nodes, but delay should not be more than RM_AM_EXPIRY_INTERVAL_MS
private long setExpireInterval(Configuration conf) {
  return Math.min(
      conf.getInt(YarnConfiguration.RM_DECOMMISSIONING_NODES_WATCHER_DELAY_MS,
Could we use getTimeDuration()?
    throws Exception {
  Configuration conf = new Configuration();
  // decommission timeout is 10 min
  conf.set(YarnConfiguration.RM_NODE_GRACEFUL_DECOMMISSION_TIMEOUT, "600");
setInt or ideally setTimeDuration?
// we should still get WAIT_SCHEDULED_APPS as expiry time is not over
NodeHealthStatus status = NodeHealthStatus.newInstance(true, "",
    System.currentTimeMillis() - 1000);
Indentation looks funny.
MockRM.finishAMAndVerifyAppState(app, rm, nm1, am);
rm.waitForState(app.getApplicationId(), RMAppState.FINISHED);
Assert.assertEquals(0, node1.getRunningApps().size());
watcher.update(node1, nodeStatus);
What do we update this for if we don't assert later?
rm = new MockRM(conf);
rm.start();

MockNM nm1 = rm.registerNode("host1:1234", 10240);
Make it 10*1024.
Open Source JIRA: https://issues.apache.org/jira/browse/YARN-11421
Description of PR
During Graceful Decommission, a node gets deactivated before the timeout even though there are launched containers on that node.
We have observed cases where the graceful decommission signal is sent to a node while containers are being launched at the NodeManager at the same time; in such cases the ResourceManager moves the node from the DECOMMISSIONING to the DECOMMISSIONED state because launched containers are not checked in DecommissioningNodesWatcher.
We suggest waiting for yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms to elapse before marking the node ready to be decommissioned. There is no delay if it is set to 0. The expire interval should not be configured to more than RM_AM_EXPIRY_INTERVAL_MS.
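The capping rule from the description (honor the watcher delay, but never exceed the AM expiry interval) can be sketched in plain Java. The helper and constant below are illustrative stand-ins for the Hadoop Configuration lookups in the patch, not the PR's actual code:

```java
public class ExpireIntervalSketch {
    // Illustrative default mirroring RM_AM_EXPIRY_INTERVAL_MS (10 minutes).
    static final long AM_EXPIRY_MS = 600_000L;

    // The watcher delay must never exceed the AM expiry interval; otherwise
    // a decommissioning node could be held past the AM liveness window.
    static long effectiveExpireInterval(long watcherDelayMs, long amExpiryMs) {
        return Math.min(watcherDelayMs, amExpiryMs);
    }

    public static void main(String[] args) {
        // A delay of 0 means no waiting before decommissioning.
        System.out.println(effectiveExpireInterval(0L, AM_EXPIRY_MS));       // 0
        // An over-long delay is capped at the AM expiry interval.
        System.out.println(effectiveExpireInterval(900_000L, AM_EXPIRY_MS)); // 600000
    }
}
```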
How was this patch tested?
Unit Tests Added
For code changes:
Have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?