YARN-11360: Add number of decommissioning/shutdown nodes to YARN cluster metrics. #5060

Merged: 1 commit, Oct 28, 2022

@cnauroth (Contributor) commented Oct 21, 2022

Description of PR

YARN cluster metrics expose counts of NodeManagers in various states, including active and decommissioned. However, these metrics don't expose NodeManagers that are currently in the process of decommissioning. This can look a little spooky to a consumer of these metrics. First, the node drops out of the active count, so it seems like a node just vanished. Then, later (possibly hours later if graceful decommission is in use), it reappears in the decommissioned count.

This issue tracks adding the decommissioning count to the cluster metrics returned by the ResourceManager RPC. We're also adding the shutdown node count. This also makes it possible to show both counts in the yarn top output. These metrics are already visible through the REST API, so no change is required there.
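The accounting gap can be illustrated with a minimal sketch. ClusterNodeCounts and its NodeState enum below are hypothetical, simplified stand-ins written for illustration; they are not YARN's real NodeState enum or YarnClusterMetrics class. The sketch only shows the arithmetic: once a node moves from running to decommissioning, active plus decommissioned no longer adds up to the node total unless a decommissioning counter is also reported.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: NodeState and ClusterNodeCounts are simplified stand-ins,
// not YARN's real NodeState enum or YarnClusterMetrics class.
public class ClusterNodeCounts {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED, SHUTDOWN }

    private final Map<NodeState, Integer> counts = new EnumMap<>(NodeState.class);

    public ClusterNodeCounts() {
        // Start every state at zero so get() never returns null.
        for (NodeState s : NodeState.values()) {
            counts.put(s, 0);
        }
    }

    // Register a newly joined node in the given state.
    public void add(NodeState state) {
        counts.merge(state, 1, Integer::sum);
    }

    // Move one node between states; the total stays constant.
    public void transition(NodeState from, NodeState to) {
        if (counts.get(from) == 0) {
            throw new IllegalStateException("no node in state " + from);
        }
        counts.merge(from, -1, Integer::sum);
        counts.merge(to, 1, Integer::sum);
    }

    public int get(NodeState state) {
        return counts.get(state);
    }

    // Sum over all tracked states.
    public int total() {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        ClusterNodeCounts c = new ClusterNodeCounts();
        for (int i = 0; i < 3; i++) {
            c.add(NodeState.RUNNING);
        }
        c.transition(NodeState.RUNNING, NodeState.DECOMMISSIONING);
        // Active + decommissioned alone would account for only 2 of 3 nodes;
        // the decommissioning counter accounts for the third.
        System.out.println(c.get(NodeState.RUNNING) + " active, "
            + c.get(NodeState.DECOMMISSIONING) + " decommissioning, "
            + c.get(NodeState.DECOMMISSIONED) + " decommissioned");
        // prints "2 active, 1 decommissioning, 0 decommissioned"
    }
}
```

The shutdown counter follows the same pattern: one more state whose count would otherwise be invisible to a consumer of the metrics.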

How was this patch tested?

The patch adds new unit tests for the ResourceManager RPC, for correct merging of the metrics through the Router service, and for yarn top. I also tested successfully in a live cluster.
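One way to picture what correct merging through the Router involves: assuming the federation Router combines the cluster metrics of each sub-cluster by summing every count, a new field such as the decommissioning or shutdown count has to be added to that sum, or the federated view silently reports 0. The sketch below uses a hypothetical NodeCounts value class, not a Hadoop type, and is not the Router's actual merge code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NodeCounts is an illustrative value class, not a Hadoop type.
// It assumes sub-cluster metrics are combined by summing each field.
public final class NodeCounts {
    final int active;
    final int decommissioning;
    final int decommissioned;
    final int shutdown;

    NodeCounts(int active, int decommissioning, int decommissioned, int shutdown) {
        this.active = active;
        this.decommissioning = decommissioning;
        this.decommissioned = decommissioned;
        this.shutdown = shutdown;
    }

    // Every field must be summed; a field left out here would
    // report 0 for the whole federation.
    static NodeCounts merge(List<NodeCounts> subClusters) {
        int active = 0;
        int decommissioning = 0;
        int decommissioned = 0;
        int shutdown = 0;
        for (NodeCounts m : subClusters) {
            active += m.active;
            decommissioning += m.decommissioning;
            decommissioned += m.decommissioned;
            shutdown += m.shutdown;
        }
        return new NodeCounts(active, decommissioning, decommissioned, shutdown);
    }

    @Override
    public String toString() {
        return active + " " + decommissioning + " " + decommissioned + " " + shutdown;
    }

    public static void main(String[] args) {
        NodeCounts merged = merge(Arrays.asList(
            new NodeCounts(10, 1, 2, 0),
            new NodeCounts(5, 0, 1, 3)));
        System.out.println(merged); // prints "15 1 3 3"
    }
}
```

A unit test for the merge step can then assert each summed field individually, so a newly added metric that was never wired into the merge shows up as a failure rather than a silent zero.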

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 buf 0m 1s buf was not available.
+0 🆗 buf 0m 1s buf was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 44s Maven dependency ordering for branch
+1 💚 mvninstall 26m 42s trunk passed
+1 💚 compile 10m 34s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 9m 15s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 2m 9s trunk passed
+1 💚 mvnsite 6m 10s trunk passed
+1 💚 javadoc 5m 45s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 32s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 10m 4s trunk passed
+1 💚 shadedclient 23m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 3m 33s the patch passed
+1 💚 compile 10m 2s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 cc 10m 2s the patch passed
+1 💚 javac 10m 2s the patch passed
+1 💚 compile 9m 31s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 cc 9m 31s the patch passed
-1 ❌ javac 9m 31s /results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07.txt hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 generated 4 new + 642 unchanged - 0 fixed = 646 total (was 642)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 56s /results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 176 unchanged - 0 fixed = 179 total (was 176)
+1 💚 mvnsite 5m 36s the patch passed
+1 💚 javadoc 4m 58s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 0s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 10m 21s the patch passed
+1 💚 shadedclient 23m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 34s hadoop-yarn-api in the patch passed.
+1 💚 unit 5m 25s hadoop-yarn-common in the patch passed.
+1 💚 unit 100m 26s hadoop-yarn-server-resourcemanager in the patch passed.
+1 💚 unit 28m 55s hadoop-yarn-client in the patch passed.
-1 ❌ unit 5m 58s /patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt hadoop-yarn-server-router in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
338m 43s
Reason Tests
Failed junit tests hadoop.yarn.server.router.clientrm.TestFederationClientInterceptorRetry
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/1/artifact/out/Dockerfile
GITHUB PR #5060
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint bufcompat
uname Linux c36a346bf48e 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a201215
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/1/testReport/
Max. process+thread count 974 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989 (Contributor) commented Oct 22, 2022

@cnauroth this JUnit test failure is caused by YARN-11342 (#5005). I submitted a fix, YARN-11357 (#5055). Can you help review this PR (#5055)? Thank you very much!

@cnauroth (Contributor, Author)

> Can you help review this pr(#5055)?

@slfan1989, thanks for notifying me of the test failure! I gave +1 on your patch, but I'd also like to see one more code review from someone who has spent more time than I have in the federation code.

@slfan1989 (Contributor)

@cnauroth Thank you very much for your help reviewing the code!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 buf 0m 1s buf was not available.
+0 🆗 buf 0m 1s buf was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 31s Maven dependency ordering for branch
+1 💚 mvninstall 28m 26s trunk passed
+1 💚 compile 11m 10s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 9m 51s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 2m 18s trunk passed
+1 💚 mvnsite 6m 27s trunk passed
+1 💚 javadoc 5m 48s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 32s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 10m 24s trunk passed
+1 💚 shadedclient 24m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 3m 38s the patch passed
+1 💚 compile 10m 24s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 cc 10m 24s the patch passed
+1 💚 javac 10m 24s the patch passed
+1 💚 compile 12m 35s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 cc 12m 35s the patch passed
-1 ❌ javac 12m 35s /results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07.txt hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 generated 4 new + 642 unchanged - 0 fixed = 646 total (was 642)
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 59s /results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 176 unchanged - 0 fixed = 177 total (was 176)
+1 💚 mvnsite 5m 42s the patch passed
+1 💚 javadoc 5m 32s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 10s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 10m 48s the patch passed
+1 💚 shadedclient 24m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 33s hadoop-yarn-api in the patch passed.
+1 💚 unit 5m 29s hadoop-yarn-common in the patch passed.
+1 💚 unit 102m 9s hadoop-yarn-server-resourcemanager in the patch passed.
+1 💚 unit 29m 7s hadoop-yarn-client in the patch passed.
+1 💚 unit 6m 17s hadoop-yarn-server-router in the patch passed.
+1 💚 asflicense 1m 19s The patch does not generate ASF License warnings.
351m 58s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/2/artifact/out/Dockerfile
GITHUB PR #5060
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint bufcompat
uname Linux 0472c4e96392 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 767ddb9
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/2/testReport/
Max. process+thread count 977 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5060/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cnauroth (Contributor, Author)

Regarding the Checkstyle -0: TopCLI.java has 134 pre-existing Checkstyle violations. For consistency, my patch continues one of the class's existing patterns, which triggers a Checkstyle violation. I'm not planning to address this right now. We could do a big style cleanup later, but I don't want to mix it with real logic changes here.

Pre-commit tests also reported new compiler warnings, pasted below. I don't understand why this is happening. The warnings look unrelated to anything I'm changing in this patch, and I can't reproduce them locally.

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestNoHaRMFailoverProxyProvider.java:153:9:[unchecked] unchecked conversion
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestNoHaRMFailoverProxyProvider.java:258:11:[unchecked] unchecked conversion
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailoverProxyProvider.java:255:11:[unchecked] unchecked conversion
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailoverProxyProvider.java:283:11:[unchecked] unchecked conversion
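For context on this warning category: javac reports [unchecked] unchecked conversion when a value of a raw type is assigned where a parameterized type is expected, so the compiler cannot check the element type. The snippet below is an illustrative reproduction of the category only; it is not the code at the reported lines of TestNoHaRMFailoverProxyProvider or TestRMFailoverProxyProvider, and the class and method names are made up. Note that without -Xlint:unchecked, javac prints only a summary note instead of each warning, which is one possible (unconfirmed) reason a local build may not show them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reproduces the javac warning category "[unchecked] unchecked conversion".
// Compile with: javac -Xlint:unchecked UncheckedConversionDemo.java
public class UncheckedConversionDemo {

    // Returns a raw List, so the element type is invisible to the compiler.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawHosts() {
        List hosts = new ArrayList();
        hosts.add("rm1");
        return hosts;
    }

    // Same data with a parameterized return type: no warning at the call site.
    static List<String> typedHosts() {
        List<String> hosts = new ArrayList<>();
        hosts.add("rm1");
        return hosts;
    }

    static String firstRaw() {
        // Raw List assigned to List<String>: javac reports an unchecked conversion here.
        List<String> hosts = rawHosts();
        return hosts.get(0);
    }

    static String firstTyped() {
        List<String> hosts = typedHosts();
        return hosts.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstRaw() + " " + firstTyped()); // prints "rm1 rm1"
    }
}
```

Both methods behave the same at runtime; the warning only flags that the raw version's type safety is not compiler-checked.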

@cnauroth (Contributor, Author)

@mikaylakonst, as you suggested, I've updated the patch so that it also includes the number of shutdown nodes.

@abmodi, thank you for the review. FYI, I pushed up a change to add one more property after you approved.

@cnauroth changed the title from "YARN-11360: Add number of decommissioning nodes to YARN cluster metrics." to "YARN-11360: Add number of decommissioning/shutdown nodes to YARN cluster metrics." on Oct 25, 2022

@ashutoshcipher (Contributor) left a comment

LGTM +1 (non-binding)

@ashutoshcipher (Contributor)

[nit]

@cnauroth, maybe you want to uncheck the items below in the PR description:

* Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
* If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
* If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@cnauroth (Contributor, Author)

Thanks for the suggestion. I made the update.

Thanks also for the code review and LGTM!

@cnauroth merged commit bfb84cd into apache:trunk Oct 28, 2022
@cnauroth deleted the YARN-11360 branch October 28, 2022 18:11
asfgit pushed a commit that referenced this pull request Oct 28, 2022
asfgit pushed a commit that referenced this pull request Oct 28, 2022
…ter metrics. (#5060)

(cherry picked from commit bfb84cd)
(cherry picked from commit 33293d4)
@cnauroth (Contributor, Author)

I have committed this to trunk, branch-3.3 and branch-3.2 (after resolving a minor merge conflict). @mikaylakonst, @ashutoshcipher and @abmodi, thank you for the code reviews.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
6 participants