
100% CPU when using SleepingWaitStrategy on 32 bit Linux #162

Closed
TanyaGaleyev opened this issue Jul 7, 2016 · 19 comments

@TanyaGaleyev

TanyaGaleyev commented Jul 7, 2016

Environment

Guest: Debian 7.11 kernel 3.2.0-4-486.
Host: Mac OS X x64.
Virtualization: VirtualBox.
Oracle JDK 1.8.0_92.
Disruptor 3.3.4.

Description

Check the sample program. It seems that the (in)famous LockSupport.parkNanos(1L) behaves differently on 64 and 32 bit kernels. On 32 bit Linux the sleeping wait strategy behaves almost like a busy spin.
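To quantify the claim, a minimal probe (my own sketch, not part of the original report) can measure how long each parkNanos(1L) call actually takes. If the average is well under a microsecond, the call is effectively spinning rather than sleeping:

```java
import java.util.concurrent.locks.LockSupport;

// Minimal probe: average wall-clock cost of a single parkNanos(1L) call.
// Sub-microsecond averages indicate parkNanos is effectively busy-spinning.
public class ParkNanosProbe {
    public static void main(String[] args) {
        final int iterations = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L);
        }
        long avgNanos = (System.nanoTime() - start) / iterations;
        System.out.println("average parkNanos(1L) cost: " + avgNanos + " ns");
    }
}
```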

@craigday

craigday commented Jul 8, 2016

How many CPUs have you given your VM?

@mikeb01
Contributor

mikeb01 commented Jul 8, 2016

How many threads/event handlers do you have running in your application? CPU usage can get quite high if the system is over contended. Also what does your Disruptor setup look like? It is possible to see high CPU usage if you have an event handler gating on another event handler that takes quite a long time to complete.

@TanyaGaleyev
Author

@craigday VM has only one CPU.
@mikeb01 in the test application I have one thread producing events (one per second) and one Disruptor thread consuming them. There is only one event handler. I run the test on an idle VM. You can see the setup in the following gist. No additional configs or system properties are used.
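For context, the back-off ladder inside SleepingWaitStrategy looks roughly like this (a simplified sketch from memory of the library source, not the actual class): it spins, then yields, and finally falls through to the parkNanos(1L) call under discussion.

```java
import java.util.concurrent.locks.LockSupport;

// Simplified sketch of the SleepingWaitStrategy back-off ladder
// (assumed behavior; the real class lives in com.lmax.disruptor).
public class SleepingBackoffSketch {
    static final int RETRIES = 200;

    // Applies one back-off step and returns the next counter value.
    static int applyWaitMethod(int counter) {
        if (counter > 100) {
            --counter;                  // phase 1: busy spin
        } else if (counter > 0) {
            --counter;
            Thread.yield();             // phase 2: yield to other threads
        } else {
            LockSupport.parkNanos(1L);  // phase 3: park "1 ns" (the problem call)
        }
        return counter;
    }

    public static void main(String[] args) {
        int counter = RETRIES;
        for (int i = 0; i < 500; i++) {
            counter = applyWaitMethod(counter);
        }
        // After the first 200 idle iterations the loop parks on every pass.
        System.out.println("final counter: " + counter);
    }
}
```

If parkNanos(1L) returns almost immediately, phase 3 behaves like phase 1, which matches the 100% CPU observation.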

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Do you see the same behaviour when running with the BlockingWaitStrategy?

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Have you tried with a 2 CPU VM?

@TanyaGaleyev
Author

@mikeb01 yes, with 2 CPUs top in Irix mode reports 100% load, which means one CPU is fully loaded and the second is free.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

It would also be useful if you could run a strace on the Java process to see what system call it is using.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I notice that you are running a Linux guest on a Mac OS X host. Have you tried hosting the VM on a Linux server, e.g. on Amazon EC2? Also, what virtualisation tool are you using (e.g. VirtualBox)?

@TanyaGaleyev
Author

VirtualBox was used. The same symptoms were observed for guests running on a VMWare ESXi hypervisor. I have not tried EC2, but I will try to check other environments too.

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

Were you still running with the Mac OS X host when using VMWare?

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I've tested with a 64 bit guest on KVM on Linux and I don't see the same issue. A spinning LockSupport.parkNanos tends to use 10% CPU.

BTW, could you try just running the following code instead of the full Disruptor test? This will help confirm that the problem is isolated to the LockSupport.parkNanos call.

public class Spin
{
    public static void main(String[] args)
    {
        while (true)
        {
            java.util.concurrent.locks.LockSupport.parkNanos(1);
        }
    }
}

@mikeb01
Contributor

mikeb01 commented Jul 9, 2016

I'll test tomorrow with a 32 bit guest on KVM. This will help determine if it is specifically an issue with being a 32 bit system.

@TanyaGaleyev
Author

I have checked both the Disruptor example and the parkNanos snippet on a laptop with 32 bit CentOS Linux installed, and I confirm that one CPU is fully loaded.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I've replicated the same issue with a 32 bit Linux guest on a 64 bit Linux host via KVM. Unfortunately there isn't anything that we can really do about it. If deploying onto a 32 bit guest is your only option, then I would recommend the BlockingWaitStrategy if you need to conserve CPU, Yield/BusySpin if you need performance, or a custom solution that may need to back off to a Thread.sleep(1) if the other options are not viable.

The better recommendation would be to move to a 64 bit guest, which doesn't seem to exhibit the same issue. I'm not sure if this is an issue on 32 bit native hardware, but the last physical 32 bit machine in our environment was decommissioned about 5 years ago.

I'm going to close this as a known issue; there is not much that we can do about it.
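The "custom solution" mentioned above could look roughly like this hypothetical back-off helper (names and thresholds are illustrative, not Disruptor API), which degrades to Thread.sleep(1) instead of parkNanos(1L):

```java
// Hypothetical back-off helper for 32-bit hosts where parkNanos(1L) spins hot.
// Names and thresholds are illustrative only, not part of the Disruptor API.
public class SleepBackoff {
    private static final int SPIN_TRIES = 100;
    private static final int YIELD_TRIES = 100;

    private int counter = SPIN_TRIES + YIELD_TRIES;

    // Call once per empty poll; after the spin and yield budgets are
    // exhausted, it sleeps 1 ms instead of parking for "1 ns".
    void idle() throws InterruptedException {
        if (counter > YIELD_TRIES) {
            --counter;          // spin
        } else if (counter > 0) {
            --counter;
            Thread.yield();     // yield to other runnable threads
        } else {
            Thread.sleep(1);    // cheap on kernels where parkNanos is not
        }
    }

    // Call when an event arrives so the next idle period spins again first.
    void reset() {
        counter = SPIN_TRIES + YIELD_TRIES;
    }

    public static void main(String[] args) throws InterruptedException {
        SleepBackoff backoff = new SleepBackoff();
        for (int i = 0; i < 210; i++) {
            backoff.idle();     // the final iterations sleep ~1 ms each
        }
        System.out.println("backed off to Thread.sleep(1)");
    }
}
```

The trade-off is latency: Thread.sleep(1) adds up to a millisecond of wake-up delay, which is why this is only suggested when the other strategies are not viable.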

@mikeb01 mikeb01 closed this as completed Jul 10, 2016
@TanyaGaleyev
Author

@mikeb01 where can one find a list of known issues?
About 32 bit native hardware: as I said earlier, I reproduced the issue on a laptop with 32 bit CentOS, and that laptop has native 32 bit hardware. So I still recommend mentioning all 32 bit Linux systems, whether running on 32 or 64 bit hardware, physical or virtual.
Also, perhaps the curious can find answers in the parkNanos native JRE code.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I'll add a section to the Wiki.

@mikeb01
Contributor

mikeb01 commented Jul 10, 2016

I did some digging through the JDK source while investigating this issue. The LockSupport.parkNanos call consists of two main library calls: it does a gettimeofday to get the current clock time, then calls pthread_cond_wait. If I run strace on both the 32 bit and 64 bit systems, there are two subtle differences. On 32 bit it calls clock_gettime and futex(..., FUTEX_WAIT_PRIVATE, ...); on 64 bit it does just futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...).

So there are two things happening here. On 64 bit it is able to get the current time without making an OS syscall (likely a property of libc), while on 32 bit it has to make a syscall to get the current time. Also, on 64 bit the FUTEX_WAIT_BITSET_PRIVATE option allows for filtering of matching bitset values on wait and wake. I suspect it is the former (the clock_gettime syscall) that induces the extra overhead.
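One way to sanity-check the clock-read cost from Java (an illustrative probe of mine, not from the thread): System.nanoTime() maps to clock_gettime on Linux, so its per-call cost hints at whether the time source is a cheap vDSO read or a full syscall.

```java
// Illustrative probe: per-call cost of System.nanoTime(), which maps to
// clock_gettime on Linux. A vDSO-backed clock reads in tens of nanoseconds;
// a real syscall costs considerably more per call.
public class ClockCost {
    public static void main(String[] args) {
        final int iterations = 1_000_000;
        long sink = 0; // accumulate results so the loop is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long avgNanos = (System.nanoTime() - start) / iterations;
        System.out.println("average clock read: " + avgNanos
                + " ns (sink=" + sink + ")");
    }
}
```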

@TanyaGaleyev
Author

Thanks for sharing!

@themass

themass commented Dec 18, 2017

Hello, I see the same problem:
Thread 8746: (state = IN_JAVA)

  • com.lmax.disruptor.BlockingWaitStrategy.waitFor(long, com.lmax.disruptor.Sequence, com.lmax.disruptor.Sequence, com.lmax.disruptor.SequenceBarrier) @bci=92, line=56 (Compiled frame; information may be imprecise)
  • com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(long) @bci=18, line=56 (Interpreted frame)
  • com.lmax.disruptor.BatchEventProcessor.run() @bci=52, line=124 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

Thread 8745: (state = IN_JAVA)

  • java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8, line=152 (Compiled frame; information may be imprecise)
  • java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node) @bci=80, line=662 (Compiled frame)
  • java.util.concurrent.locks.AbstractQueuedSynchronizer.release(int) @bci=26, line=1263 (Compiled frame)
  • java.util.concurrent.locks.ReentrantLock.unlock() @bci=5, line=460 (Compiled frame)
  • com.lmax.disruptor.BlockingWaitStrategy.waitFor(long, com.lmax.disruptor.Sequence, com.lmax.disruptor.Sequence, com.lmax.disruptor.SequenceBarrier) @bci=50, line=50 (Compiled frame)
  • com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(long) @bci=18, line=56 (Interpreted frame)
  • com.lmax.disruptor.BatchEventProcessor.run() @bci=52, line=124 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

8745 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.39 java
8747 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.36 java
8748 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.39 java
8744 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:30.83 java
8746 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.38 java
8749 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8750 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8751 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 21:31.35 java
8887 www 20 0 23.914g 4.441g 23468 R 99.9 7.1 79:29.79 java

26 cores, 64 bit Linux server.
All Disruptor threads use 100% CPU.
