Crash in compiled code on Linux PPC when running with disableTOC option #10939

Closed
ashu-mehra opened this issue Oct 20, 2020 · 17 comments · Fixed by #10942
Assignees
Labels
arch:power comp:jit segfault Issues that describe segfaults / JVM crashes

Comments

@ashu-mehra
Contributor

I came across this issue while working on something else. Running with -Xjit:disableTOC crashes in compiled code.
The issue is easily reproducible with the latest nightly builds from Adopt.
I used the following version:

$ /home/ashu/builds/jdk-11.0.9+10-jre/bin/java -version
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+10-202010200042)
Eclipse OpenJ9 VM AdoptOpenJDK (build master-e9d73ae0f, JRE 11 Linux ppc64le-64-Bit Compressed References 20201020_754 (JIT enabled, AOT enabled)
OpenJ9   - e9d73ae0f
OMR      - b7e8a5d78
JCL      - 5e617ab0c8 based on jdk-11.0.9+10)

Command to reproduce:
$ ~/builds/jdk-11.0.9+10-jre/bin/java -Xshareclasses:none "-Xjit:disableTOC,verbose={compilePerformance},vlog=jit.log" -version

In my case the crash was in the compiled code of java/lang/ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;.
gdb shows the following backtrace:

#13 <signal handler called>
#14 0x7fff00009113dd70 in ?? ()
#15 0x00007fff74c026b0 in ?? ()
#16 0x00007fff92a7658c in sendLoadClass () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#17 0x00007fff92a87138 in internalFindClassInModule () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#18 0x00007fff92a8870c in internalFindClassUTF8 () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#19 0x00007fff92adf258 in resolveClassRef () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#20 0x00007fff92b185d4 in bytecodeLoopCompressed () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#21 0x00007fff92b6375c in c_cInterpreter () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#22 0x00007fff92a77568 in initializeAttachedThreadImpl () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#23 0x00007fff92a7e4d0 in initializeAttachedThread () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libj9vm29.so
#24 0x00007fff906616f8 in standardInit () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libjclse29.so
#25 0x00007fff90672d1c in scarInit () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libjclse29.so
#26 0x00007fff90673ab8 in J9VMDllMain () from /home/ashu/builds/jdk-11.0.9+10-jre/lib/compressedrefs/libjclse29.so

From jit verbose log:

(warm) java/lang/ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; @ 00007FFF74C02478-00007FFF74C02718 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000000589B8 bcsz=43 GCR time=1754us mem=[region=1984 system=16384]KB compThreadID=0 CpuLoad=4%(0%avg) JvmCpu=0%

Frame 15 (0x00007fff74c026b0) falls inside that address range, so it belongs to the jitted code for ClassLoader.loadClass.

The last passing build with -Xjit:disableTOC option is:

$ ./java -version
openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10-202008102341)
Eclipse OpenJ9 VM AdoptOpenJDK (build master-d3ca26b52, JRE 11 Linux ppc64le-64-Bit Compressed References 20200810_678 (JIT enabled, AOT enabled)
OpenJ9   - d3ca26b52
OMR      - e27fe4682
JCL      - 22c634c401 based on jdk-11.0.8+10)

First failing build with -Xjit:disableTOC option is:

$ ./java -version
openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10-202008102341)
Eclipse OpenJ9 VM AdoptOpenJDK (build master-d3ca26b52, JRE 11 Linux ppc64le-64-Bit Compressed References 20200810_678 (JIT enabled, AOT enabled)
OpenJ9   - d3ca26b52
OMR      - e27fe4682
JCL      - 22c634c401 based on jdk-11.0.8+10)

The following commits went into OMR between the two builds:

$ git log --oneline e27fe4682..a636c8f56
a636c8f56 Merge pull request #5420 from jason-hall/heapExpansionHeuristics
542bd619b Merge pull request #5434 from gza060625/dispatch_d2p_pr2_Version2
8e718c25d Merge pull request #4092 from wbh123456/load_opcode
d9d370c98 Merging Dispatcher to ParallelDispatcher (2/2) https://github.com/eclipse/omr/issues/4611
76165638e Add heap free ratio heuristics for optthruput heap expansion
07a3a610a Remove references of IL opcode TR::luRegLoad
989882a4e Remove references of IL opcode TR::luload and TR::luloadi
114c3c09a Remove references of IL opcode TR::iuRegLoad
bc39df4d4 Remove references of IL opcode TR::iuloadi
30c007abd Remove reference of IL opcodes TR::iuload
7ecea9367 Remove references of Il opcodes TR::cload and TR::cloadi
3b07457b1 Remove references of IL opcodes: TR::buload and TR::buloadi

and in OpenJ9:

$ git log --oneline d3ca26b52..f5b189430
f5b189430 Merge pull request #10332 from olvap377/master
d6f336815 Merge pull request #9014 from BeverlyXu/bevPerfScript
d786d0f2d Merge pull request #10349 from aviansie-ben/p10-trampolines
c6bcdb85c A script to parse the perf tool result
772d8cce6 Merge pull request #10302 from jdekonin/master
c739bf191 Merge pull request #10353 from rpshukla/vs2019
c67a8f54c macOS: Updated buildenv variables
9da7337de Fix missing libraries for VS2019 linker
086094d58 Disable initTOC call when disableTOC option is enabled
3b66ee95c Rework trampolines for POWER10
756b5d2de Move CPU detection before code cache initialization
abcffca74 Use VS2019 on Windows JDK15+
638564e79 Clean up some unused declarations in Trampoline.cpp
@ashu-mehra
Contributor Author

@aviansie-ben @zl-wang - fyi

@dsouzai dsouzai added arch:power comp:jit segfault Issues that describe segfaults / JVM crashes labels Oct 20, 2020
@aviansie-ben
Contributor

Just tested this myself and can confirm the problem. It looks to me like the second-highest-order 16 bits of the target address of a call are somehow ending up in the highest-order 16 bits instead. When I place those bits in the correct location in my testing, it seems to be attempting to call jitStackOverflow. This probably has to do with the rework of how trampolines are generated. Looking into it now.

aviansie-ben added a commit to aviansie-ben/openj9 that referenced this issue Oct 20, 2020
During a previous rework of how trampolines are generated, support was
added for helper trampolines to be generated when the pTOC is explicitly
disabled. However, this implementation accidentally used an oris
instruction where it should have used an ori instruction, which results
in the branch going to the wrong address. This bug has been fixed.

Fixes: eclipse-openj9#10939
Signed-off-by: Ben Thomas <ben@benthomas.ca>
@aviansie-ben
Contributor

Tracked the problem down to an incorrect instruction when generating helper trampolines with disableTOC set on pre-P10 systems. It should have been using an ori instruction but was instead erroneously using an oris instruction, which was resulting in the branch target being loaded with the second-highest-order 16 bits of the real address or'ed into the highest-order 16 bits. I've opened #10942 to fix this.
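To illustrate the effect described above, here is a minimal Python sketch that models the five-instruction 64-bit constant load seen in the trampolines later in this thread (lis, ori, rldicr, oris, ori). It assumes the erroneous oris sat in the slot that ORs in bits 32..47 of the address; the "intended" address is only inferred by moving the misplaced bits of the crash PC from frame #14 back down, and the helper names are made up for this sketch.

```python
MASK64 = (1 << 64) - 1

def lis(imm16):
    # lis rD, imm: place imm in bits 16..31 and sign-extend to 64 bits
    value = (imm16 & 0xFFFF) << 16
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def ori(reg, imm16):
    # ori: OR the immediate into the low 16 bits
    return (reg | (imm16 & 0xFFFF)) & MASK64

def oris(reg, imm16):
    # oris: OR the immediate, shifted left by 16, into bits 16..31
    return (reg | ((imm16 & 0xFFFF) << 16)) & MASK64

def rldicr_32_31(reg):
    # rldicr rD, rS, 32, 31 behaves like a 32-bit left shift
    return (reg << 32) & MASK64

def load_address(target, buggy=False):
    h3, h2 = (target >> 48) & 0xFFFF, (target >> 32) & 0xFFFF
    h1, h0 = (target >> 16) & 0xFFFF, target & 0xFFFF
    reg = lis(h3)
    # Assumed location of the bug: oris used where ori was intended
    reg = oris(reg, h2) if buggy else ori(reg, h2)
    reg = rldicr_32_31(reg)
    reg = oris(reg, h1)
    reg = ori(reg, h0)
    return reg

# Inferred from the crash PC 0x7fff00009113dd70 (an assumption, not a logged value)
INTENDED = 0x00007FFF9113DD70

print(hex(load_address(INTENDED)))              # 0x7fff9113dd70
print(hex(load_address(INTENDED, buggy=True)))  # 0x7fff00009113dd70
```

The buggy result matches the shape of the bad PC in frame #14 of the original backtrace: the 0x7fff that belongs in bits 32..47 lands in bits 48..63.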

@aviansie-ben aviansie-ben self-assigned this Oct 20, 2020
@ashu-mehra
Contributor Author

@aviansie-ben the changes in #10942 resolve the crash when the shared class cache is disabled, but I still see a crash with the disableTOC option when shared classes are enabled.
java -version works fine, but a more complicated test crashes.
This is the test case I used:
testcase.zip

To reproduce, unzip the file testcase.zip and then run the test as:
java "-Xjit:verbose={compilePerformance},vlog=jit.log,disableTOC" -Xshareclasses:name=c1,cacheDir=/tmp,reset -cp . SimpleSortObject

gdb shows this info:

(gdb) bt
...
#12 <signal handler called>
#13 0x00007fff5c9c37f8 in ?? ()
#14 0x00007fff5c9c3624 in ?? ()
#15 0x00007fff7eb6a59c in runCallInMethod (env=<error reading variable: Cannot access memory at address 0xc2209b25fc53463c>,
    receiver=<error reading variable: Cannot access memory at address 0xc2209b25fc534634>, clazz=<error reading variable: Cannot access memory at address 0xc2209b25fc53462c>,
    methodID=<error reading variable: Cannot access memory at address 0xc2209b25fc534624>, args=<error reading variable: Cannot access memory at address 0xc2209b25fc53461c>) at callin.cpp:1083
Backtrace stopped: Cannot access memory at address 0xc2209b25fc534704

(gdb) x/10i 0x00007fff5c9c37e8
   0x7fff5c9c37e8:      b       0x7fff5c9c36fc
   0x7fff5c9c37ec:      ori     r31,r3,0
   0x7fff5c9c37f0:      ori     r3,r15,0
   0x7fff5c9c37f4:      ori     r4,r2,0
   0x7fff5c9c37f8:      ld      r12,720(r16)       <<< crash point
   0x7fff5c9c37fc:      bl      0x7fff5cbbed60
   0x7fff5c9c3800:      ori     r0,r3,0
   0x7fff5c9c3804:      ori     r3,r31,0
   0x7fff5c9c3808:      ori     r8,r0,0
   0x7fff5c9c380c:      b       0x7fff5c9c368c

At the crash point r16 is 0x0 (from the javacore), so the load from 720(r16) reads from a near-null address.

Dumping the instructions for the call just after the crash point:

(gdb) x/10i 0x7fff5cbbed60
   0x7fff5cbbed60:      lis     r11,0
   0x7fff5cbbed64:      ori     r11,r11,32767
   0x7fff5cbbed68:      rldicr  r11,r11,32,31
   0x7fff5cbbed6c:      oris    r11,r11,30589
   0x7fff5cbbed70:      ori     r11,r11,51176
   0x7fff5cbbed74:      mtctr   r11
   0x7fff5cbbed78:      bctr

This sequence results in a call to 0x7fff777dc7e8, which is jitNewObject.
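For anyone who wants to check the arithmetic, the target can be recomputed from the decimal immediates gdb prints. This is a small Python sketch; the function name is made up, and it skips sign extension because every immediate in this listing fits in 15 bits or is masked to 16 by the OR instructions.

```python
MASK64 = (1 << 64) - 1

def trampoline_target(lis_imm, ori_hi, oris_imm, ori_lo):
    """Fold the immediates of lis / ori / rldicr 32,31 / oris / ori into one address."""
    reg = (lis_imm & 0xFFFF) << 16          # lis   r11, lis_imm
    reg |= ori_hi & 0xFFFF                  # ori   r11, r11, ori_hi
    reg = (reg << 32) & MASK64              # rldicr r11, r11, 32, 31
    reg |= (oris_imm & 0xFFFF) << 16        # oris  r11, r11, oris_imm
    reg |= ori_lo & 0xFFFF                  # ori   r11, r11, ori_lo
    return reg

# Immediates copied from the x/10i dump at 0x7fff5cbbed60
target = trampoline_target(0, 32767, 30589, 51176)
print(hex(target))  # 0x7fff777dc7e8
```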

Another observation: if I also pass -Xaot:disableTOC, I get a crash at a different location. For example, with the following command:
java "-Xjit:verbose={compilePerformance},vlog=jit.log,disableTOC" "-Xaot:disableTOC" -Xshareclasses:name=c1,cacheDir=/tmp,reset -cp . SimpleSortObject
the crash happened at:

#12 <signal handler called>
#13 0xcc6c9128b1234564 in ?? ()
#14 0x00007fff7b7c9600 in fast_jitCheckIfFinalizeObject (currentThread=0x1e700, object=0xffefd080) at cnathelp.cpp:3213
#15 0x00007fff60bc04e8 in ?? ()
#16 0x00007fff82df9868 in sendResolveMethodHandle (currentThread=0x0, cpIndex=0, ramCP=0x0, definingClass=0x0, nameAndSig=0x0) at callin.cpp:919
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Dumping instructions for the jitted method in frame 15:

(gdb) x/10i 0x00007fff60bc04c8
   0x7fff60bc04c8:      beq     0x7fff60bc04e8
   0x7fff60bc04cc:      ori     r3,r15,0
   0x7fff60bc04d0:      lis     r12,0
   0x7fff60bc04d4:      ori     r12,r12,32767
   0x7fff60bc04d8:      rldicr  r12,r12,32,31
   0x7fff60bc04dc:      oris    r12,r12,31644
   0x7fff60bc04e0:      ori     r12,r12,37432
   0x7fff60bc04e4:      bl      0x7fff60dbeb20    << call to fast_jitCheckIfFinalizeObject (?)

Note that the instruction sequence in this case differs from the previous case, where -Xaot:disableTOC was not used: this sequence does not read r16 to set r12. Not sure if this is expected.

Dumping the instructions at the target address of the last branch instruction:

(gdb) x/10i 0x7fff60dbeb20
   0x7fff60dbeb20:      lis     r11,0
   0x7fff60dbeb24:      ori     r11,r11,32767
   0x7fff60dbeb28:      rldicr  r11,r11,32,31
   0x7fff60dbeb2c:      oris    r11,r11,31612
   0x7fff60dbeb30:      ori     r11,r11,38352
   0x7fff60dbeb34:      mtctr   r11
   0x7fff60dbeb38:      bctr

The instruction sequence at 0x7fff60dbeb20 corresponds to a call to 0x00007fff7b7c95d0 which is fast_jitCheckIfFinalizeObject.

(gdb) disassemble fast_jitCheckIfFinalizeObject
Dump of assembler code for function fast_jitCheckIfFinalizeObject(J9VMThread*, j9object_t):
   0x00007fff7b7c95d0 <+0>:     addis   r2,r12,111
   0x00007fff7b7c95d4 <+4>:     addi    r2,r2,31536
   0x00007fff7b7c95d8 <+8>:     mflr    r0
   0x00007fff7b7c95dc <+12>:    std     r0,16(r1)
   0x00007fff7b7c95e0 <+16>:    std     r31,-8(r1)
   0x00007fff7b7c95e4 <+20>:    stdu    r1,-64(r1)
   0x00007fff7b7c95e8 <+24>:    mr      r31,r1
   0x00007fff7b7c95ec <+28>:    std     r3,40(r31)
   0x00007fff7b7c95f0 <+32>:    std     r4,32(r31)
   0x00007fff7b7c95f4 <+36>:    ld      r4,32(r31)
   0x00007fff7b7c95f8 <+40>:    ld      r3,40(r31)
   0x00007fff7b7c95fc <+44>:    bl      0x7fff7a839380 <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object>
   0x00007fff7b7c9600 <+48>:    ld      r2,24(r1)
   0x00007fff7b7c9604 <+52>:    nop
   0x00007fff7b7c9608 <+56>:    addi    r1,r31,64
   0x00007fff7b7c960c <+60>:    ld      r0,16(r1)
   0x00007fff7b7c9610 <+64>:    mtlr    r0
   0x00007fff7b7c9614 <+68>:    ld      r31,-8(r1)
   0x00007fff7b7c9618 <+72>:    blr
   0x00007fff7b7c961c <+76>:    .long 0x0
   0x00007fff7b7c9620 <+80>:    .long 0x1000900
   0x00007fff7b7c9624 <+84>:    .long 0x1000180

The call at 0x00007fff7b7c95fc is supposed to go to VM_VMHelpers::checkIfFinalizeObject:

(gdb) x/10i 0x7fff7a839380
   0x7fff7a839380 <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object>: std     r2,24(r1)
   0x7fff7a839384 <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object+4>:       addis   r12,r2,4
   0x7fff7a839388 <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object+8>:       ld      r12,11624(r12)
   0x7fff7a83938c <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object+12>:      mtctr   r12
   0x7fff7a839390 <0008cd25.plt_call._ZN12VM_VMHelpers21checkIfFinalizeObjectEP10J9VMThreadP8J9Object+16>:      bctr

At the start of this sequence r2=00007FFF7C0C0D68, so the target address for the branch is [0x7FFF7C103AD0] which is:

(gdb) x/g 0x7FFF7C103AD0
0x7fff7c103ad0: 0xcc6c9128b1234567
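The slot address quoted above can be re-derived from the PLT stub with a couple of additions. A minimal Python sketch, assuming the immediates are non-negative as they are in this dump (so sign extension is skipped); the function name is only illustrative.

```python
MASK64 = (1 << 64) - 1

def plt_slot_address(r2, addis_imm, ld_disp):
    """Address read by: addis r12,r2,addis_imm ; ld r12,ld_disp(r12)."""
    r12 = (r2 + (addis_imm << 16)) & MASK64   # addis adds imm << 16
    return (r12 + ld_disp) & MASK64           # ld adds the displacement

# r2 as observed at the start of the stub; immediates from the x/10i dump
slot = plt_slot_address(0x00007FFF7C0C0D68, 4, 11624)
print(hex(slot))  # 0x7fff7c103ad0
```

So the stub is reading exactly the slot that holds the garbage value, which points at r2 being wrong rather than at the stub itself.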

This seems to indicate that r2 is not loaded correctly. r2 is set in the first two instructions of fast_jitCheckIfFinalizeObject:

   0x00007fff7b7c95d0 <+0>:     addis   r2,r12,111
   0x00007fff7b7c95d4 <+4>:     addi    r2,r2,31536

So r2 is in turn derived from r12, which is set in the jitted method before the call to the C helper.
It looks like something is wrong in the instructions generated for calls to C helpers, whether we use -Xaot:disableTOC or not.
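The numbers in the dumps above are consistent with this. Under the ELFv2 ABI the callee's global entry point expects r12 to hold its own entry address and derives r2 from it. The Python sketch below (helper names are made up) decodes the value the jitted code puts in r12 in the -Xaot:disableTOC run and feeds it through the addis/addi pair; it reproduces the r2 seen at the PLT stub, whereas the real function entry would have produced a different r2.

```python
MASK64 = (1 << 64) - 1

def sign_extend16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def r2_from_r12(r12, addis_imm, addi_imm):
    """Model: addis r2,r12,addis_imm ; addi r2,r2,addi_imm."""
    r2 = (r12 + (sign_extend16(addis_imm) << 16)) & MASK64
    return (r2 + sign_extend16(addi_imm)) & MASK64

# r12 as built by the jitted code at 0x7fff60bc04d0..0x7fff60bc04e0
# (lis 0 ; ori 32767 ; rldicr 32,31 ; oris 31644 ; ori 37432)
r12_loaded = (32767 << 32) | (31644 << 16) | 37432

# Entry point of fast_jitCheckIfFinalizeObject from the disassembly
function_entry = 0x00007FFF7B7C95D0

r2_observed = r2_from_r12(r12_loaded, 111, 31536)
r2_expected = r2_from_r12(function_entry, 111, 31536)

print(hex(r12_loaded))   # 0x7fff7b9c9238
print(hex(r2_observed))  # 0x7fff7c0c0d68
print(hex(r2_expected))  # 0x7fff7bec1100
```

In other words, r12 holds 0x7fff7b9c9238 rather than the entry address 0x7fff7b7c95d0 that the trampoline branches to, and that is enough to explain the bad r2.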

@aviansie-ben
Contributor

Running with AOT on and with -Xjit:disableTOC but without -Xaot:disableTOC is definitely not a supported configuration and I wouldn't expect it to work. The AOT and JIT options need to agree on whether or not the pTOC is disabled or the compiler may attempt to use the pTOC without properly initializing it.

The other issue looks completely different and seems to be related to the linkage not correctly loading the address of the target method into gr12. I'm not terribly familiar with that code and I'm definitely not sure how it's supposed to work under AOT.

@zl-wang Any ideas?

@zl-wang
Contributor

zl-wang commented Nov 3, 2020

When disabling pTOC, gr16 is still used: that means you have an out-of-date code base (from before the p10 work items were completed).

On the other hand, @AlenBadel and @gita-omr are handling the case of AOT helper addresses not being properly relocated (when pTOC is disabled). This looks like a dup of that.

@ashu-mehra
Contributor Author

@zl-wang I tried with the latest level, and it still fails the same as before.

@zl-wang
Contributor

zl-wang commented Nov 3, 2020

Did you destroy your SCC AOT cache before re-running the test case?

@ashu-mehra
Contributor Author

Yes, I am using the -Xshareclasses reset option, which destroys the cache before starting up.

@zl-wang
Contributor

zl-wang commented Nov 4, 2020

Does it still crash on ld r12, 720(r16) from AOT code?
AOT automatically sets disableTOC as far as I know. If your code base includes the last batch of p10 work (merged a couple of months ago), it appears impossible to me for AOT to generate that code. It was explicitly directed to use loadAddressConstant.

@aviansie-ben
Contributor

No, AOT does not imply that the pTOC is disabled even though AOT does not use the pTOC much. The only reason AOT doesn't use the pTOC is because the relocation infrastructure doesn't currently support updating the pTOC and relocating pTOC load sequences. IIRC AOT code will still use the pTOC for the helper addresses since they're at known offsets within the pTOC and thus do not require emitting any relocations.

All of the problems when not specifying -Xaot:disableTOC are (as I've already said) not real issues and are caused by an unsupported configuration. When -Xjit:disableTOC is specified, the JIT will not initialize the pTOC as it believes it will not be using it, but when -Xaot:disableTOC is missing, AOT code will end up trying to load from the pTOC in these specific cases regardless. The only real issue here AFAICT is the failure to load the correct value into r12 when -Xaot:disableTOC is specified.
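As a rough way to catch the mismatched configuration before launching a test, a wrapper script could check that -Xjit and -Xaot agree on disableTOC. This is only a sketch of such a check in Python: the helper names are hypothetical, nothing like this exists in OpenJ9 itself, and it only looks at the option strings shown in this thread. It is also only meaningful when AOT is actually in use (for example, not with -Xshareclasses:none).

```python
def lists_disable_toc(args, prefix):
    """Return True if any argument starting with prefix lists disableTOC."""
    for arg in args:
        if arg.startswith(prefix):
            # Sub-options are comma-separated; a plain token match is enough here
            if "disableTOC" in arg[len(prefix):].split(","):
                return True
    return False

def toc_options_agree(args):
    """True when -Xjit and -Xaot both disable the pTOC, or neither does."""
    return lists_disable_toc(args, "-Xjit:") == lists_disable_toc(args, "-Xaot:")

# The two command lines used earlier in this thread
jit_only = [
    "-Xjit:verbose={compilePerformance},vlog=jit.log,disableTOC",
    "-Xshareclasses:name=c1,cacheDir=/tmp,reset",
    "-cp", ".", "SimpleSortObject",
]
jit_and_aot = jit_only[:1] + ["-Xaot:disableTOC"] + jit_only[1:]

print(toc_options_agree(jit_only))     # False
print(toc_options_agree(jit_and_aot))  # True
```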

@zl-wang
Contributor

zl-wang commented Nov 4, 2020

Maybe I was not clear:
disableTOC is not on globally pre-p10, but AOT per-method compilation sets disableTOC automatically. Previously, AOT code could still use pTOC for the C helper or JNI addresses, but it cannot now (it is directed to use loadAddressConstant, as mentioned above).

@aviansie-ben
Contributor

I cannot find any code that's supposed to set that option in AOT for the case where -Xjit:disableTOC is set manually. My searches for setOption(TR_DisableTOC) turn up only [1] (which is for P10 and is not applicable here) and [2] (which is for JITServer and is thus also not applicable here).

[1] https://github.com/eclipse/omr/blob/8f0354071ad30192d665808207b353214b8b2b3a/compiler/control/OMROptions.cpp#L1966-L1969
[2] https://github.com/eclipse/openj9/blob/8d50944d4769cf17f2bddf00ced8d029d760ef6c/runtime/compiler/control/CompilationThread.cpp#L8079-L8092

@zl-wang
Contributor

zl-wang commented Nov 4, 2020

Then you are right: a global disableTOC is not expected to work with AOT without disableTOC.
They are really self-contradictory ...

@ashu-mehra
Contributor Author

I understand that -Xjit:disableTOC is not expected to work without -Xaot:disableTOC, but even with -Xaot:disableTOC I see a crash with shared classes on. I think this is the issue that needs to be addressed now.

@zl-wang
Contributor

zl-wang commented Nov 4, 2020

As mentioned previously, I expect it to be the dup that @AlenBadel is trying to fix.

@gita-omr
Contributor

gita-omr commented Nov 4, 2020

Yes, @AlenBadel and I are looking into it.

zl-wang pushed a commit that referenced this issue Nov 9, 2020
During a previous rework of how trampolines are generated, support was
added for helper trampolines to be generated when the pTOC is explicitly
disabled. However, this implementation accidentally used an oris
instruction where it should have used an ori instruction, which results
in the branch going to the wrong address. This bug has been fixed.

Fixes: #10939
Signed-off-by: Ben Thomas <ben@benthomas.ca>

5 participants