
OpenJ9 super-slow with libasan (address sanitizer) versus HotSpot #6228

Open
avermeer opened this issue Jun 21, 2019 · 11 comments


@avermeer

commented Jun 21, 2019

Hello,

We tried to move our C++ test tools, which are based on ASan (see https://en.wikipedia.org/wiki/AddressSanitizer for background on this tool) and detect memory leaks, use of freed memory pointers, and so on, from HotSpot Java to OpenJ9 Java.

We have JNI code, therefore we need to provide a Java Runtime to our test factory.

When we switched this Java Runtime dependency from AdoptOpenJDK 11 based on HotSpot to AdoptOpenJDK 11 based on OpenJ9, it became so slow that we had to roll back.

We have isolated a trivial test case to reproduce this issue, showing that with HotSpot it takes less than 1 second to run 'java -version' with libasan, while the same test takes more than 4 minutes with OpenJ9.

Prerequisites:

  • Operating system: CentOS 7.5
  • We install AdoptOpenJDK 11.0.3 with HotSpot in ./hotspot/jdk-11.0.3+7
  • We install AdoptOpenJDK 11.0.3 with OpenJ9 in ./openj9/jdk-11.0.3+7
  • We use package libasan5-8.2.1-3.el7.x86_64

We run the following test:

export LD_PRELOAD=/usr/lib64/libasan.so.5
time ./hotspot/jdk-11.0.3+7/bin/java -version
time ./openj9/jdk-11.0.3+7/bin/java -version

HotSpot result:

...
SUMMARY: AddressSanitizer: 23284 byte(s) leaked in 158 allocation(s).

real    0m0.50s
user    0m0.17s
sys     0m0.34s

OpenJ9 result:

...
SUMMARY: AddressSanitizer: 947030 byte(s) leaked in 337 allocation(s).

real    4m2.17s
user    0m20.25s
sys     3m41.96s

This huge execution-time difference clearly rules out using OpenJ9 to check the memory errors of our C++ code that depends on Java through JNI, which is too bad.

Is there a way to get this execution-time difference fixed? Is there any JVM option I could use to make this OpenJ9-based Java runtime actually usable for this purpose?

Kind regards,
Alexandre Vermeerbergen

@pshipton

Contributor

commented Jun 21, 2019

Please try

I suspect the slowness is due to the code that searches for an appropriate place to put the compressedrefs object heap.
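
A minimal sketch (not OpenJ9's actual search code, and the floor/ceiling constants are assumptions) of what such a placement search is up against: parse /proc-style map lines and measure the largest unmapped hole below the 64 GiB compressed-references ceiling. With libasan preloaded, its huge shadow reservations shrink that hole to well under the default heap size.

```python
import os

CEILING = 64 << 30  # compressed references need the heap below 64 GiB

def largest_gap_below_ceiling(maps_lines, floor=0x10000):
    """Return (start, size) of the largest hole between floor and CEILING.

    maps_lines are lines in /proc/[pid]/maps format, e.g.
    "8fff7000-2008fff7000 rw-p 00000000 00:00 0".
    """
    regions = sorted(
        tuple(int(x, 16) for x in line.split()[0].split("-"))
        for line in maps_lines
    )
    best, prev_end = (floor, 0), floor
    for lo, hi in regions:
        if lo >= CEILING:
            break
        if lo - prev_end > best[1]:
            best = (prev_end, lo - prev_end)
        prev_end = max(prev_end, min(hi, CEILING))
    if CEILING - prev_end > best[1]:
        best = (prev_end, CEILING - prev_end)
    return best

if os.path.exists("/proc/self/maps"):  # Linux only
    with open("/proc/self/maps") as f:
        start, size = largest_gap_below_ceiling(f)
    print(f"largest hole below 64 GiB: {size >> 20} MiB at {start:#x}")
```

Running this in a process with libasan preloaded (versus without) makes the shrunken gap visible directly.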


@mpirvu

Contributor

commented Jun 21, 2019

I can reproduce the issue even with -Xint, and a Linux profiler shows the following distribution of ticks:

  47.80%  java     [kernel.kallsyms]  [k] unmap_page_range
  15.66%  java     [kernel.kallsyms]  [k] free_pgd_range
   8.48%  java     [kernel.kallsyms]  [k] _cond_resched
   5.68%  java     [kernel.kallsyms]  [k] system_call_after_swapgs
   4.57%  java     [kernel.kallsyms]  [k] nmi

This confirms Peter's suspicion.
On my machine I have 16 GB of RAM, and with no -Xmx option the JVM wants to allocate a 4 GB heap. If I specify a smaller heap with -Xmx1G, the problem does not occur:

[root@ivybridgedocker IBM]# time /home/mpirvu/sdks/OpenJDK11U-jdk_x64_linux_openj9_11.0.2_9_openj9-0.12.1_openj9-0.12/bin/java -Xmx1g  -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.12.1, JRE 11 Linux amd64-64-Bit Compressed References 20190204_127 (JIT enabled, AOT enabled)
OpenJ9   - 90dd8cb40
OMR      - d2f4534b
JCL      - 289c70b6844 based on jdk-11.0.2+9)

real    0m0.205s
user    0m0.180s
sys     0m0.073s

Not exporting LD_PRELOAD=/usr/lib64/libasan.so.5 also makes the problem go away.

@amicic

@pshipton

Contributor

commented Jun 21, 2019

Using -Xgc:preferredHeapBase=0x10000000 is also much faster.

@charliegracie

Contributor

commented Jun 25, 2019

@pshipton and @mpirvu you can reproduce this? Only on CentOS machines?

@pshipton

Contributor

commented Jun 25, 2019

Yes, I was able to reproduce it. I created a CentOS 7.5 machine and used yum to install the libasan5 package, then exported LD_PRELOAD=/usr/lib64/libasan.so.5 and ran java.

@dmitripivkine

Contributor

commented Jun 25, 2019

@charliegracie has a theory: OMRPORT_VMEM_ALLOC_QUICK (searching the process map file for a suitable memory range) somehow does not work in this case. We are going to investigate.

@pshipton

Contributor

commented Jun 25, 2019

I thought I'd give it a try on Ubuntu 16 as well, but while libasan reported leaks, the same slowdown didn't occur, although this could be related to the amount of memory I have available (6 GB).

apt install libasan5
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.5

@pshipton pshipton added the comp:gc label Jun 25, 2019

@pshipton pshipton added this to the Release 0.16 (Java 13) milestone Jun 25, 2019

@avermeer

Author

commented Jun 25, 2019

Hello,

Just to let you know that for the moment, we plan to run our automated C++/JNI tests with the -Xmx1G option.

I'm glad to read that this incident has triggered serious investigation, which may eventually solve this issue without the need for a workaround.

Kind regards,
Alexandre Vermeerbergen

@cedrichansen

Contributor

commented Jul 10, 2019

I looked into the issue, and the problem essentially boils down to the following points.

  1. The biggest issue is that libasan5 reserves ~15 TB of virtual memory (https://github.com/google/sanitizers/wiki/AddressSanitizerIncompatiblity), which was confirmed by looking at /proc/[pid]/maps while preloading libasan. libasan always occupies a very early address range (roughly 1.8 GB to 15 TB) on Ubuntu and CentOS systems.
  2. OpenJ9 with compressed refs tries to place the heap in the first 64 GB of the address space, which is problematic because libasan occupies everything from about 1.8 GB to far past 64 GB. This leaves only about 1.8 GB for the heap.
  3. By default in Java 11, the heap size is 1/4 of physical memory, so with 8 GB of RAM the VM tries to create a 2 GB heap (which does not fit here). The VM realizes that no 2 GB hole exists, shrinks the heap, and tries again until it has shrunk enough to fit. This shrinking loop is what takes so long during startup.

This issue is essentially a conflict of who is occupying the early address ranges.
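
The shrink-and-retry behaviour described above can be sketched as a toy model (an assumption-laden sketch, not OpenJ9 source; the ~1.8 GiB floor and the 10%-per-step shrink factor are illustrative guesses): the VM requests the default heap, finds no hole that large below 64 GiB, shrinks the request, and tries again. Every failed attempt is expensive under libasan, which is where the minutes of startup time go.

```python
GIB = 1 << 30
ASAN_FLOOR = int(1.8 * GIB)  # approx. space left below libasan's reservations

def attempts_until_fit(requested, available=ASAN_FLOOR, shrink=0.9):
    """Count the attempts a naive 10%-per-step shrink loop would need
    before the requested heap fits in the available space."""
    attempts = 1
    while requested > available:
        requested = int(requested * shrink)  # attempt failed: shrink and retry
        attempts += 1
    return attempts, requested
```

With a 4 GiB default request (1/4 of 16 GiB RAM), this loop needs several shrink steps before the request drops under ~1.8 GiB; with -Xmx1G it fits on the first attempt, matching the fast startup seen earlier in the thread.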

There are two workarounds for using OpenJ9 with libasan:

  1. Specify a heap smaller than 1.8 GB by explicitly setting -Xmx1800M (as a few people have mentioned). This number may vary by a few hundred MB.
  2. Use the large heap build (as @pshipton mentioned). This avoids the issue because the heap is no longer forced to be under 64 GB.

@dmitripivkine

Contributor

commented Jul 23, 2019

Thank you @cedrichansen.
So the root cause of this problem is that preloading libasan on CentOS occupies the virtual address range 0x8fff7000-0x2008fff7000 and leaves very limited memory for heap allocation in compressed-references OpenJ9.
An attempt to allocate more memory than is available sends the JVM down a failure path, and unfortunately it takes a long time before the JVM realizes the default heap size is not available and reduces it for the next attempt. This is a side effect of my change eclipse/omr#3411. That change was supposed to be temporary until we figured out the real problem, but we were never able to reproduce it or get debug information from the customer.
We are going to address the slow startup issue.
It is interesting that this behaviour created by preloading libasan seems to exist on CentOS and Ubuntu, while preloading on RHEL, for example, places it at a much higher virtual address range that does not interfere with JVM allocations.
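
Quick arithmetic on the range quoted above (a sanity check, not new data): the libasan reservation at 0x8fff7000-0x2008fff7000 spans exactly 2 TiB, and only the ~2.25 GiB below its lower bound remains available for a compressed-references heap.

```python
# Bounds of the libasan reservation reported in this comment.
lo, hi = 0x8fff7000, 0x2008fff7000

span_tib = (hi - lo) / (1 << 40)   # size of the reserved range in TiB
left_gib = lo / (1 << 30)          # address space left below it, in GiB

print(span_tib, round(left_gib, 2))
```

This matches the earlier observation that a heap request much above ~1.8-2.2 GiB cannot fit and forces the slow shrink-and-retry path.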
