Skip to content

Conversation

@Daniel-009497
Copy link
Contributor

Tens of thousands of meaningless logs are frequently printed during ResourceManager startup and recover container.
As we know, ResourceManager will always keep 10k application information by default. In our very big scale cluster, it is very usual that resourcemanager try to recover the containers which already finished and does not exist in ResourceManager but still reported by nodemanager.
Under this case, below logs will be frequently printed, more importantly, this log is meaningless, in real production setups, the maintainers actually more care about which containers are properly recovered or killed not the ones are skipped.
The related code are as follows,
image

So we move the log into function killOrphanContainerOnNode().
image

Only the containers to be killed need to be loged which is vital for trouble shooting to distinguish whether the containers are kill by hadoop inner mechanism or by users themselves.

private void killOrphanContainerOnNode(RMNode node,
NMContainerStatus container) {
if (!container.getContainerState().equals(ContainerState.COMPLETE)) {
LOG.warn("Killing container " + container + " for unknown application");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this change result in more log output? Does this mean that all apps that enter this judgment must be printed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution, but I think different people may have different understandings of the log output. This does not seem to be a general change, and I have not seen much value from this change.

Copy link
Contributor Author

@Daniel-009497 Daniel-009497 Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this change result in more log output? Does this mean that all apps that enter this judgment must be printed?

@slfan1989 Not really, Only the orphan container which status is unfinished will be killed and logged here. What is really matters is the minority ones be handled not the majority ones be skipped.

Copy link
Contributor

@slfan1989 slfan1989 Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look at it yesterday, and today I'll take a closer look at the code.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 17s trunk passed
+1 💚 compile 1m 4s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 59s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 55s trunk passed
+1 💚 mvnsite 1m 5s trunk passed
-1 ❌ javadoc 0m 58s /branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 0m 45s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 7s trunk passed
+1 💚 shadedclient 21m 7s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 55s the patch passed
+1 💚 compile 0m 59s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 59s the patch passed
+1 💚 compile 0m 53s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 53s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 39s the patch passed
+1 💚 mvnsite 0m 55s the patch passed
-1 ❌ javadoc 0m 39s /patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 0m 37s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 58s the patch passed
+1 💚 shadedclient 20m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 99m 12s hadoop-yarn-server-resourcemanager in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
195m 44s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/artifact/out/Dockerfile
GITHUB PR #5247
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 690a3d44d790 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1464474
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/testReport/
Max. process+thread count 969 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5247/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@github-actions
Copy link
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 29, 2025
@github-actions github-actions bot closed this Oct 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants