
100% CPU when using SleepingWaitStrategy on 32 bit Linux #162

Closed
TanyaGaleyev opened this issue Jul 7, 2016 · 19 comments

@TanyaGaleyev

TanyaGaleyev commented Jul 7, 2016

Environment

Guest: Debian 7.11 kernel 3.2.0-4-486.
Host: Mac OS X x64.
Virtualization: VirtualBox.
Oracle JDK 1.8.0_92.
Disruptor 3.3.4.

Description

Check the sample program. It seems that the (in)famous LockSupport.parkNanos(1L) behaves differently on 64 and 32 bit kernels. On 32 bit Linux the sleeping wait strategy behaves almost like a busy spin.
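To quantify the claim, a minimal probe (my own sketch, not part of the original report) can measure how long each parkNanos(1L) call actually takes. If the average is well under a microsecond, the call is effectively spinning rather than sleeping:

```java
import java.util.concurrent.locks.LockSupport;

// Minimal probe: average wall-clock cost of a single parkNanos(1L) call.
// Sub-microsecond averages indicate parkNanos is effectively busy-spinning.
public class ParkNanosProbe {
    public static void main(String[] args) {
        final int iterations = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L);
        }
        long avgNanos = (System.nanoTime() - start) / iterations;
        System.out.println("average parkNanos(1L) cost: " + avgNanos + " ns");
    }
}
```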

@craigday

craigday commented Jul 8, 2016

How many CPUs have you given your VM?

@mikeb01
Contributor

mikeb01 commented Jul 8, 2016

How many threads/event handlers do you have running in your application? CPU usage can get quite high if the system is over contended. Also what does your Disruptor setup look like? It is possible to see high CPU usage if you have an event handler gating on another event handler that takes quite a long time to complete.

@TanyaGaleyev
Author

@craigday VM has only one CPU.
@mikeb01 in the test application I have one thread producing events (one per second) and one Disruptor thread consuming them. There is only one event handler. I run the test on an idle VM. You can see the setup in the following gist. No additional configs or system properties are used.
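For context, the back-off ladder inside SleepingWaitStrategy looks roughly like this (a simplified sketch from memory of the library source, not the actual class): it spins, then yields, and finally falls through to the parkNanos(1L) call under discussion.

```java
import java.util.concurrent.locks.LockSupport;

// Simplified sketch of the SleepingWaitStrategy back-off ladder
// (assumed behavior; the real class lives in com.lmax.disruptor).
public class SleepingBackoffSketch {
    static final int RETRIES = 200;

    // Applies one back-off step and returns the next counter value.
    static int applyWaitMethod(int counter) {
        if (counter > 100) {
            --counter;                  // phase 1: busy spin
        } else if (counter > 0) {
            --counter;
            Thread.yield();             // phase 2: yield to other threads
        } else {
            LockSupport.parkNanos(1L);  // phase 3: park "1 ns" (the problem call)
        }
        return counter;
    }

    public static void main(String[] args) {
        int counter = RETRIES;
        for (int i = 0; i < 500; i++) {
            counter = applyWaitMethod(counter);
        }
        // After the first 200 idle iterations the loop parks on every pass.
        System.out.println("final counter: " + counter);
    }
}
```

If parkNanos(1L) returns almost immediately, phase 3 behaves like phase 1, which matches the 100% CPU observation.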

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Do you see the same behaviour when running with the BlockingWaitStrategy?

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Have you tried with a 2 CPU VM?

@TanyaGaleyev
Author

@mikeb01 yes, with 2 CPUs top in Irix mode reports 100% load, which means one CPU is fully loaded and the second is free.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

It would also be useful if you could run a strace on the Java process to see what system call it is using.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I notice that you are running a Linux guest on a Mac OS X host. Have you tried hosting the VM on a Linux server, e.g. on Amazon EC2? Also, what virtualisation tool are you using (e.g. VirtualBox)?

@TanyaGaleyev
Author

VirtualBox was used. The same symptoms were observed for guests running on a VMWare ESXi hypervisor. I have not tried EC2, but I will try to check other environments too.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Were you still running with the Mac OS X host when using VMWare?

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I've tested with a 64 bit guest on KVM on Linux and I don't see the same issue. A spinning LockSupport.parkNanos tends to use 10% CPU.

BTW, could you try just running the following code instead of the full Disruptor test? This will help confirm that the problem is isolated to the LockSupport.parkNanos call.

public class Spin
{
    public static void main(String[] args)
    {
        while (true)
        {
            java.util.concurrent.locks.LockSupport.parkNanos(1);
        }
    }
}

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I'll test tomorrow with a 32 bit guest on KVM. This will help determine if it is specifically an issue with being a 32 bit system.

@TanyaGaleyev
Author

I have checked both the Disruptor example and the parkNanos snippet on a laptop with 32 bit CentOS Linux installed, and I confirm that one CPU is fully loaded.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I've replicated the same issue with a 32 bit Linux guest on a 64 bit Linux host via KVM. Unfortunately there isn't anything that we can really do about it. If deploying onto a 32 bit guest is your only option, then I would recommend the BlockingWaitStrategy if you need to conserve CPU, Yield/BusySpin if you need performance, or a custom solution that may need to back off to a Thread.sleep(1) if the other options are not viable.

The better recommendation would be to move to a 64 bit guest, which doesn't seem to exhibit the same issue. I'm not sure if this is an issue on 32 bit native hardware, but the last physical 32 bit machine in our environment was decommissioned about 5 years ago.

I'm going to close this as a known issue; there is not much that we can do about it.
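The "custom solution" mentioned above could look roughly like this hypothetical back-off helper (names and thresholds are illustrative, not Disruptor API), which degrades to Thread.sleep(1) instead of parkNanos(1L):

```java
// Hypothetical back-off helper for 32-bit hosts where parkNanos(1L) spins hot.
// Names and thresholds are illustrative only, not part of the Disruptor API.
public class SleepBackoff {
    private static final int SPIN_TRIES = 100;
    private static final int YIELD_TRIES = 100;

    private int counter = SPIN_TRIES + YIELD_TRIES;

    // Call once per empty poll; after the spin and yield budgets are
    // exhausted, it sleeps 1 ms instead of parking for "1 ns".
    void idle() throws InterruptedException {
        if (counter > YIELD_TRIES) {
            --counter;          // spin
        } else if (counter > 0) {
            --counter;
            Thread.yield();     // yield to other runnable threads
        } else {
            Thread.sleep(1);    // cheap on kernels where parkNanos is not
        }
    }

    // Call when an event arrives so the next idle period spins again first.
    void reset() {
        counter = SPIN_TRIES + YIELD_TRIES;
    }

    public static void main(String[] args) throws InterruptedException {
        SleepBackoff backoff = new SleepBackoff();
        for (int i = 0; i < 210; i++) {
            backoff.idle();     // the final iterations sleep ~1 ms each
        }
        System.out.println("backed off to Thread.sleep(1)");
    }
}
```

The trade-off is latency: Thread.sleep(1) adds up to a millisecond of wake-up delay, which is why this is only suggested when the other strategies are not viable.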

@mikeb01 mikeb01 closed this as completed Jul 10, 2016
@TanyaGaleyev
Author

@mikeb01 where can one find a list of known issues?
About 32 bit native hardware: as I said earlier, I reproduced the issue on a laptop with 32 bit CentOS, and that laptop has native 32 bit hardware. So I still recommend mentioning all 32 bit Linux systems, whether running on 32 or 64 bit hardware, physical or virtual.
Also, perhaps the curious can find answers in the parkNanos native JRE code.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I'll add a section to the Wiki.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I did some digging through the JDK source while investigating this issue. The LockSupport.parkNanos call consists of two main library calls: it does a gettimeofday to get the current clock time, then calls pthread_cond_wait. If I run strace on both the 32 bit and 64 bit systems, there are two subtle differences. On 32 bit it calls clock_gettime and futex(..., FUTEX_WAIT_PRIVATE, ...); on 64 bit it does just futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...).

So there are two things happening here. On 64 bit it is able to get the current time without making an OS syscall (likely a property of libc), while on 32 bit it has to make a syscall to get the current time. Also, on 64 bit the FUTEX_WAIT_BITSET_PRIVATE option allows for filtering of matching bitset values on wait and wake. I suspect it is the former (the clock_gettime syscall) that induces the extra overhead.
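One way to sanity-check the clock-read cost from Java (an illustrative probe of mine, not from the thread): System.nanoTime() maps to clock_gettime on Linux, so its per-call cost hints at whether the time source is a cheap vDSO read or a full syscall.

```java
// Illustrative probe: per-call cost of System.nanoTime(), which maps to
// clock_gettime on Linux. A vDSO-backed clock reads in tens of nanoseconds;
// a real syscall costs considerably more per call.
public class ClockCost {
    public static void main(String[] args) {
        final int iterations = 1_000_000;
        long sink = 0; // accumulate results so the loop is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long avgNanos = (System.nanoTime() - start) / iterations;
        System.out.println("average clock read: " + avgNanos
                + " ns (sink=" + sink + ")");
    }
}
```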

@TanyaGaleyev
Author

Thanks for sharing!

@themass

themass commented Dec 18, 2017

Hello, I see the same problem:
Thread 8746: (state = IN_JAVA)

  • com.lmax.disruptor.BlockingWaitStrategy.waitFor(long, com.lmax.disruptor.Sequence, com.lmax.disruptor.Sequence, com.lmax.disruptor.SequenceBarrier) @bci=92, line=56 (Compiled frame; information may be imprecise)
  • com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(long) @bci=18, line=56 (Interpreted frame)
  • com.lmax.disruptor.BatchEventProcessor.run() @bci=52, line=124 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Thread 8745: (state = IN_JAVA)

  • java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8, line=152 (Compiled frame; information may be imprecise)
  • java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node) @bci=80, line=662 (Compiled frame)
  • java.util.concurrent.locks.AbstractQueuedSynchronizer.release(int) @bci=26, line=1263 (Compiled frame)
  • java.util.concurrent.locks.ReentrantLock.unlock() @bci=5, line=460 (Compiled frame)
  • com.lmax.disruptor.BlockingWaitStrategy.waitFor(long, com.lmax.disruptor.Sequence, com.lmax.disruptor.Sequence, com.lmax.disruptor.SequenceBarrier) @bci=50, line=50 (Compiled frame)
  • com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(long) @bci=18, line=56 (Interpreted frame)
  • com.lmax.disruptor.BatchEventProcessor.run() @bci=52, line=124 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

8745 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.39 java
8747 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.36 java
8748 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.39 java
8744 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:30.83 java
8746 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.38 java
8749 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8750 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8751 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8887 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 79:29.79 java

26 cores, 64 bit Linux server.
All Disruptor threads use 100% CPU.
