-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High CPU load when PulsarClient (Java) is idle #6414
Comments
Thanks a lot for @racorn 's detailed analysis. your suggestions sounds great. Would you like to provide a fix for this? |
Yes, I can try to implement what I suggested, by mid next week. Will that be OK with you? |
I wish I had seen this before: netty/netty#9710, and the fix netty/netty#9714 As the Netty version is now bumped to 4.1.45 on master, the CPU load is no longer an issue on Windows. Anyway, @jiazhai, I made the suggested implementation in my forked repository, you can have a look to see if it is of any value. I could be an improvement in the |
Thanks @racorn for your great help. Agree with you, we could leave it as it is. And would like to close this issue. |
Describe the bug
Running a simple consumer on Windows using the
PulsarClient
, I notice that there is considerable CPU usage when the consumer is just idling. Profiling showed that the 'pulsar-timer-x-y' thread is heavy on the CPU. Digging into the source code, we can see that theHashedWheelTimer
is set up with 1 ms tick time:timer = new HashedWheelTimer(getThreadFactory("pulsar-timer"), 1, TimeUnit.MILLISECONDS);
A simple test with similar values show that a running HashedWheelTimer can consume CPU equal to 100% of one hyperthread:
My understanding of the
HashedWheelTimer
is that it is indented to be used for a large number of approximated timeouts and not for millisecond precision scheduling, but I may be mistaken.To Reproduce
Run a simple producer subscribing to a topic with no messages, or run the above simple JUnit test. Observe CPU usage.
Expected behavior
An idle consumer should not incur high CPU load. For example, when running 8 consumers on a i7-8700 (4 cores 2 threads per core), TaskManager reported 100% CPU usage - when there was no traffic, no messages on the topics.
Discussion
High CPU load is a high price to pay for immediate batch dispatching (which may be the primary reason for setting the 1 ms tick time). It should be possible to lax on immediate batch dispatching in favor of more compute resources available to other tasks.
An alternative is to use a plain old
ScheduledExecutorService
for batch dispatching/reception, tests suggest it will also be lighter on the CPU when millisecond precision is required and the number of task is not too high. For example, I could schedule 2000 simple tasks every millisecond (scheduleAtFixedRate
) with more compute available on a single thread. When scheduling 10k tasks every millisecond, the thread was working 100%, but scheduling 10k tasks every 10 millisecond was OK. So if a client is used to create a huge number of producers/consumers, it might be better to lax the batch timeout so the timeout thread has a chance to actually perform its work. Another observation is that theHashedWheelTimer
used with 1 millisecond tick time is not able to run a single simple task every millisecond, party due to theHashedWheelTimer
approximate nature, and maybe party because spinning the wheel competes with task expiration.Note that when using
ScheduledExecutorService
each task might not get exactly the same number of executions, for example when scheduling 1000 tasks run execute every millisecond for 30 seconds, task were executed between 29 986 and 30 012 times.Some suggestions:
ScheduledExecutorService
inPulsarClientImpl
that is exposed the same way as theTimer
object for other classes to use. TheProducerImpl
andConsumerBase
could use theScheduledExecutorService
to schedule batch dispatching/reception.Additional context
Observed under Windows 10 64-bit, running Java 11.
The text was updated successfully, but these errors were encountered: