@Daniel-009497
Contributor

The ResourceManager DelegationTokenRenewer timeout feature may cause high CPU utilization and an object leak.
1-If the YARN cluster is idle, that is, almost no token renewer events are triggered, the DelegationTokenRenewerPoolTracker thread does nothing but spin in a busy loop, which causes high CPU utilization.

2-Renewer events are held in a map named futures that has no removal logic, so the map keeps growing over time.
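
The second point can be illustrated with a minimal sketch, assuming a map of event ids to `Future` objects similar in spirit to the `futures` map described above. The class `FuturePruneSketch`, its methods, and the string keys are hypothetical and are not the Hadoop implementation; the sketch only shows one way completed entries could be dropped so the map stops growing.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the tracker's bookkeeping; not the Hadoop class. */
public class FuturePruneSketch {
  // Plays the role of the "futures" map; the String keys are illustrative event ids.
  private final Map<String, Future<?>> futures = new ConcurrentHashMap<>();

  /** Registers a future for a (hypothetical) renewer event. */
  public void track(String eventId, Future<?> future) {
    futures.put(eventId, future);
  }

  /**
   * Removes every entry whose future has finished and returns how many were removed.
   * The count is only exact when no other thread modifies the map concurrently.
   */
  public int pruneCompleted() {
    int before = futures.size();
    futures.values().removeIf(Future::isDone);
    return before - futures.size();
  }

  public int size() {
    return futures.size();
  }

  public static void main(String[] args) {
    FuturePruneSketch sketch = new FuturePruneSketch();
    sketch.track("done", CompletableFuture.completedFuture(null));
    sketch.track("pending", new CompletableFuture<Void>());
    int removed = sketch.pruneCompleted();
    // prints "removed=1 remaining=1"
    System.out.println("removed=" + removed + " remaining=" + sketch.size());
  }
}
```

Without a step like `pruneCompleted()`, every tracked event would stay reachable from the map for the lifetime of the process, which matches the leak described in the report.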

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
-1 ❌ mvninstall 41m 1s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 1m 3s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 55s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 52s trunk passed
+1 💚 mvnsite 1m 1s trunk passed
-1 ❌ javadoc 0m 53s /branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 0m 41s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 10s trunk passed
+1 💚 shadedclient 24m 18s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 52s the patch passed
+1 💚 compile 0m 59s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 59s the patch passed
+1 💚 compile 0m 49s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 49s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 39s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 33 unchanged - 0 fixed = 34 total (was 33)
+1 💚 mvnsite 0m 54s the patch passed
-1 ❌ javadoc 0m 37s /patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 0m 35s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 2m 6s the patch passed
+1 💚 shadedclient 24m 9s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 102m 10s hadoop-yarn-server-resourcemanager in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
207m 4s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5233/1/artifact/out/Dockerfile
GITHUB PR #5233
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b379aecae3f7 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4ed67fe
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5233/1/testReport/
Max. process+thread count 917 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5233/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

// If the cluster is idle for some time, futures map is empty or no event handler found which may still cause high CPU utilization
// Therefore a short nap should be added here.
try {
Thread.sleep(1000);
Contributor

Set sleep 1000ms, what is the effect of this?

Contributor Author

> Set sleep 1000ms, what is the effect of this?

It yields the CPU, so the tracker thread no longer busy-loops.

Contributor

@slfan1989 slfan1989 Dec 19, 2022

Can you provide data, such as the CPU utilization before and after the modification? Also, why was 1000 ms chosen? There seems to be no data supporting this value.

Contributor Author

@slfan1989
Thanks for the review.
The CPU utilization statistics are as follows:
Before optimization:
image

After optimization:
image

Contributor

Handling this with a thread sleep does not seem like a good approach.

Member

I'm not convinced by the sleep either, as @slfan1989 mentioned. The value looks very arbitrary to me: it worked for you but might cause issues for others.

@Daniel-009497
Contributor Author

@ayushtkn Could you please help review this PR?

@szilard-nemeth changed the title from "YARN-11398 DelegationTokenRenewer timeout feautre may cause high utilization of CPU and memory leak" to "YARN-11398 DelegationTokenRenewer timeout feature may cause high utilization of CPU and memory leak" on Jan 11, 2023
@bitterfox
Contributor

bitterfox commented Aug 7, 2023

Hi, is there any update on this PR? We're also suffering from high CPU usage with YARN in low-spec clusters such as our alpha and beta environments.

If sleeping is undesirable, what do you think about waiting for a signal in the run() while loop when futures is empty?
We can fire signal from
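
The signal-based alternative suggested in this comment could look roughly like the sketch below. It assumes a plain monitor (`wait`/`notifyAll`) guarding a map of futures. The class `SignalWaitSketch` and the methods `submit` and `awaitWork` are hypothetical names, and since the comment does not say where the signal would be fired from, firing it in `submit` is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

/** Hypothetical sketch of "wait for a signal while futures is empty"; not Hadoop code. */
public class SignalWaitSketch {
  private final Object lock = new Object();
  // Stand-in for the "futures" map; guarded by "lock".
  private final Map<String, Future<?>> futures = new HashMap<>();

  /** Producer side (assumed): registers a future and wakes any waiting tracker. */
  public void submit(String eventId, Future<?> future) {
    synchronized (lock) {
      futures.put(eventId, future);
      lock.notifyAll();
    }
  }

  /**
   * Tracker side: blocks without spinning until at least one future is tracked,
   * then returns the number of tracked futures.
   */
  public int awaitWork() throws InterruptedException {
    synchronized (lock) {
      // Loop guards against spurious wakeups.
      while (futures.isEmpty()) {
        lock.wait();
      }
      return futures.size();
    }
  }

  public static void main(String[] args) throws Exception {
    SignalWaitSketch sketch = new SignalWaitSketch();
    Thread tracker = new Thread(() -> {
      try {
        System.out.println("woke with " + sketch.awaitWork() + " future(s)");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    tracker.start();
    sketch.submit("event-1", CompletableFuture.completedFuture(null));
    tracker.join();
  }
}
```

Compared with a fixed sleep, a blocked tracker consumes no CPU while idle and reacts as soon as work arrives, so no interval has to be chosen.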

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Oct 29, 2025
@github-actions github-actions bot closed this Oct 30, 2025

5 participants