Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation error vmState=0x00053cff #15730

Closed
mateuszrzeszutek opened this issue Aug 17, 2022 · 17 comments
Closed

Segmentation error vmState=0x00053cff #15730

mateuszrzeszutek opened this issue Aug 17, 2022 · 17 comments
Assignees
Labels
comp:jit segfault Issues that describe segfaults / JVM crashes userRaised

Comments

@mateuszrzeszutek
Copy link

Java -version output

openjdk version "18.0.2" 2022-07-19
IBM Semeru Runtime Open Edition 18.0.2.0 (build 18.0.2+9)
Eclipse OpenJ9 VM 18.0.2.0 (build openj9-0.33.0, JRE 18 Linux amd64-64-Bit Compressed References 20220804_30 (JIT enabled, AOT enabled)
OpenJ9   - 04a55b45b
OMR      - b58aa2[7](https://github.com/open-telemetry/opentelemetry-java-instrumentation/runs/7858608971?check_suite_focus=true#step:4:8)0[8](https://github.com/open-telemetry/opentelemetry-java-instrumentation/runs/7858608971?check_suite_focus=true#step:4:9)
JCL      - dff[9](https://github.com/open-telemetry/opentelemetry-java-instrumentation/runs/7858608971?check_suite_focus=true#step:4:10)c97e1c7 based on jdk-18.0.2+9)

Summary of problem

The OpenTelemetry Java Instrumentation builds have recently started randomly failing on JIT compilation in the Java 18 openj9 build.

Diagnostic files

You can find the diagnostic files attached to the GHA build: https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2867509938 in the javacore-test-18 artifact.

In case you need more examples, this has happened several times in our daily build job: https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2865361522, https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2858576759

Unhandled exception
Type=Segmentation error vmState=0x00053cff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FB99670F9C0 Handler2=00007FB99666D5C0 InaccessibleAddress=0000000000000008
RDI=00007FB99424A2B0 RSI=0000000000000000 RAX=0000000000000000 RBX=0000000000000000
RCX=00007FB99424A680 RDX=0000000000000000 R8=0000000000000000 R9=0000000000000004
R10=00000000000A5800 R11=00007FB974D26A00 R12=0000000000000000 R13=00007FB995F91100
R14=00007FB99424A270 R15=00000000AAAAAAAB
RIP=00007FB9958B9BD0 GS=0000 FS=0000 RSP=00007FB99424A238
EFlags=0000000000010202 CS=0033 RBP=00007FB99424A270 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000004 CR2=0000000000000008
xmm0 00007fb99424f780 (f: 2485450752.000000, d: 6.938412e-310)
xmm1 00007fb9900d6c90 (f: 2416798976.000000, d: 6.938409e-310)
xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm3 00007fb93a9d9240 (f: 983405120.000000, d: 6.938338e-310)
xmm4 00007fb93a9bc7d0 (f: 983287744.000000, d: 6.938338e-310)
xmm5 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 436e776f6e4b6c6c (f: 1850436736.000000, d: 6.860456e+16)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 1802090825430005 (f: 625147904.000000, d: 4.941237e-193)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/opt/hostedtoolcache/Java_Adopt-OpenJ9_jdk/18.0.2-9/x64/lib/default/libj9jit29.so
Module_base_address=00007FB9952F3000

Method_being_compiled=java/util/Collections.unmodifiableCollection(Ljava/util/Collection;)Ljava/util/Collection;
Target=2_90_20220804_30 (Linux 5.15.0-1017-azure)
CPU=amd64 (2 logical CPUs) (0x1b1ffe000 RAM)
----------- Stack Backtrace -----------
_ZNK2TR13NodeChecklist8containsEPNS_4NodeE+0x0 (0x00007FB9958B9BD0 [libj9jit29.so+0x5c6bd0])
_ZN2TR20PreorderNodeIterator11stepForwardEv+0x43 (0x00007FB9958BB203 [libj9jit29.so+0x5c8203])
_ZN18TR_ColdBlockMarker11isBlockColdEPN2TR5BlockE+0xb6 (0x00007FB995A60766 [libj9jit29.so+0x76d766])
_ZN18TR_ColdBlockMarker18identifyColdBlocksEv+0x9a (0x00007FB995A60ACA [libj9jit29.so+0x76daca])
_ZN18TR_ColdBlockMarker7performEv+0x3a (0x00007FB995A60CFA [libj9jit29.so+0x76dcfa])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x875 (0x00007FB995A89AA5 [libj9jit29.so+0x796aa5])
_ZN3OMR9Optimizer8optimizeEv+0x1bb (0x00007FB995A8B83B [libj9jit29.so+0x79883b])
_ZN3OMR20ResolvedMethodSymbol5genILEP11TR_FrontEndPN2TR11CompilationEPNS3_20SymbolReferenceTableERNS3_12IlGenRequestE+0x523 (0x00007FB99588D3C3 [libj9jit29.so+0x59a3c3])
_ZN3OMR11Compilation7compileEv+0x24f (0x00007FB995872CBF [libj9jit29.so+0x57fcbf])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x49b (0x00007FB99546AADB [libj9jit29.so+0x177adb])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x362 (0x00007FB99546BAB2 [libj9jit29.so+0x178ab2])
omrsig_protect+0x2b1 (0x00007FB99666E2F1 [libj9prt29.so+0x2a2f1])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x360 (0x00007FB995469650 [libj9jit29.so+0x176650])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x128 (0x00007FB995469998 [libj9jit29.so+0x176998])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x368 (0x00007FB9954688C8 [libj9jit29.so+0x1758c8])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007FB995468C32 [libj9jit29.so+0x175c32])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007FB995468CE2 [libj9jit29.so+0x175ce2])
omrsig_protect+0x2b1 (0x00007FB99666E2F1 [libj9prt29.so+0x2a2f1])
_Z21compilationThreadProcPv+0x177 (0x00007FB9954690A7 [libj9jit29.so+0x1760a7])
thread_wrapper+0x163 (0x00007FB996636333 [libj9thr29.so+0xb333])
start_thread+0xd9 (0x00007FB996EF9609 [libpthread.so.0+0x8609])
clone+0x43 (0x00007FB996E18133 [libc.so.6+0x11f133])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/08/16 13:37:44 - please wait.
JVMDUMP032I JVM requested System dump using '/home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/core.20220816.133744.23602.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.25315.

JVMDUMP012E Error in System dump: The core file created by child process with pid = 25315 was not found. Expected to find core file with name "/home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/core"
JVMDUMP032I JVM requested Java dump using '/home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/javacore.20220816.133744.23602.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/javacore.20220816.133744.23602.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/Snap.20220816.133744.23602.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/Snap.20220816.133744.23602.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/jitdump.20220816.133744.23602.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000018F00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling java/util/Collections.unmodifiableCollection(Ljava/util/Collection;)Ljava/util/Collection;
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP010I JIT dump written to /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/instrumentation/spring/spring-boot-autoconfigure/jitdump.20220816.133744.23602.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
@pshipton
Copy link
Member

@0xdaryl fyi

@pshipton
Copy link
Member

java -Xjit:vmState=0x00053cff
vmState [0x53cff]: {J9VMSTATE_JIT} {coldBlockMarker}

@hzongaro
Copy link
Member

@mateuszrzeszutek, is the test system able to capture a core file? Failing that, is there any way I could reproduce the problem in a stand-alone way? And if not, I'm wondering if you might be able to capture the bytecode for java/util/Collections.unmodifiableCollection, which I'm guessing is being instrumented in the test environment.

@hzongaro
Copy link
Member

I'm wondering if you might be able to capture the bytecode for java/util/Collections.unmodifiableCollection, which I'm guessing is being instrumented in the test environment

Ignore that - I guess it's not being instrumented

@hzongaro
Copy link
Member

@mateuszrzeszutek, if capturing a core file is not easy to do, are you easily able to rerun the affected test with the following additional JIT compiler options? That might help shed more light on the problem.

-Xjit:{java/util/Collections.unmodifiableCollection*}(traceBC,traceILGen,log=issue15730.log),verbose,vlog=issue15730.vlog

@0xdaryl
Copy link
Contributor

0xdaryl commented Sep 20, 2022

We won't get to the bottom of this for 0.35. Moving to 0.36.

@hzongaro
Copy link
Member

hzongaro commented Oct 4, 2022

Hello @mateuszrzeszutek. I was wondering whether you have had any success in attempting to capture any trace logs or core files. They would be a great help in trying to resolve this problem.

@mateuszrzeszutek
Copy link
Author

Hey,
Sorry for being quiet, I was on vacation/sick leave for some time and this flew under my radar completely. I'll try to compile the files you requested tomorrow.

@hzongaro
Copy link
Member

Hi @mateuszrzeszutek. Have you had any luck producing core files or log files?

@hzongaro
Copy link
Member

Hi, @mateuszrzeszutek. I just wanted to follow up to see whether you were able to produce any core files or log files for this problem. It will be difficult to figure out what might be going wrong without them.

@0xdaryl
Copy link
Contributor

0xdaryl commented Nov 15, 2022

Moving to 0.38 release.

@hzongaro
Copy link
Member

Hi, @mateuszrzeszutek. I wanted to follow up again to see whether you're still seeing this problem and whether you might be able to gather any core or log files to help us investigate the cause of the crash.

@hzongaro
Copy link
Member

hzongaro commented Mar 1, 2023

Since I haven't heard from the originator of this problem recently, I spent a little bit of time digging into the jitdump file again, jitdump.20220816.133744.23602.0004.dmp.txt

In the version of IBM Semeru that's involved, the bytecode for java/util/Collections.unmodifiableCollection looks like this:

  public static <T> java.util.Collection<T> unmodifiableCollection(java.util.Collection<? extends T>);
    Code:
       0: aload_0
       1: invokeinterface #157,  1          // InterfaceMethod java/util/Collection.getClass:()Ljava/lang/Class;
       6: ldc           #161                // class java/util/Collections$UnmodifiableCollection
       8: if_acmpne     13
      11: aload_0
      12: areturn
      13: new           #161                // class java/util/Collections$UnmodifiableCollection
      16: dup
      17: aload_0
      18: invokespecial #163                // Method java/util/Collections$UnmodifiableCollection."<init>":(Ljava/util/Collection;)V
      21: areturn

The failure seems to occur in an AOT compilation. In the IL that's shown in the jitdump, we see this.

n3n       BBStart <block_2>                                                                   [0x7fb9399bc4b0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n11n      NULLCHK on n5n [#32]                                                                [0x7fb9399bc730] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n10n        PassThrough                                                                       [0x7fb9399bc6e0] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n5n           aload  c<parm 0 Ljava/util/Collection;>[#356  Parm] [flags 0x40000107 0x0 ]     [0x7fb9399bc550] bci=[-1,0,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n18n      ZEROCHK [#54]                                                                       [0x7fb9399bc960] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n16n        iload  <temp slot 4>[#364  Auto] [flags 0x3 0x0 ]                                 [0x7fb9399bc8c0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n63n      istore  <temp slot 3>[#363  Auto] [flags 0x3 0x0 ]                                  [0x7fb939a38370] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n16n        ==>iload
n7n       treetop                                                                             [0x7fb9399bc5f0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n6n         aloadi  <vft-symbol>[#285  Shadow] [flags 0x18607 0x0 ]                           [0x7fb9399bc5a0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n5n           ==>aload
n15n      treetop                                                                             [0x7fb9399bc870] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n9n         aloadi  <javaLangClassFromClass>[#275  Shadow +48] [flags 0x607 0x0 ]             [0x7fb9399bc690] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n6n           ==>aloadi
n62n      astore  <temp slot 2>[#362  Auto] [flags 0x7 0x0 ]                                  [0x7fb939a38320] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n9n         ==>aloadi
n59n      treetop                                                                             [0x7fb939a38230] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n5n         ==>aload
n71n      ifacmpeq --> block_10 BBStart at n68n ()                                            [0x7fb939a385f0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n5n         ==>aload
n70n        aconst NULL                                                                       [0x7fb939a385a0] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=0
n61n      BBEnd </block_2> =====                                                              [0x7fb939a382d0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n60n      BBStart <block_8>                                                                   [0x7fb939a38280] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n78n      ResolveCHK [#290]                                                                   [0x7fb939a38820] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1

I believe the ZEROCHK at n18n is supposed to test the result of instanceof, and throw an IncompatibleClassChangeError if the object is not an instance of java.util.Collection (though getClass is actually a final method on java.lang.Object). However, it looks like the ZEROCHK is operating on the value of a variable -- #364 -- that is never set.

Putting that aside for now, IL generation is generating inline code for the call to getClass, but a ResolveCHK still appears, presumably with a NULL child in place of the missing call to getClass.

I spent some time trying to reproduce the failure, but I'm hitting this AOT error:

storeValidationRecordIfNecessary:
        constantPool 00000000000A4C50 cpIndex 0
        reloKind 50 isStatic 0
        method 000000000009D070 from class 000000000009D800 java/util/Collections
        definingClass 000000000003D800
        definingClass name java/lang/Object
        Created new AOT class info 00007FEF285B7330
Compilation Failed Because: Method symbol reference is final in object

In the original jitdump, this AOT message appears for the recompilation:

AOT support of annotations temporarily disabled
ImproperInterfaceMethodFromCPRecord
        _method=0x000000000003BD48
        _beholder=0x00000000000A3F00
        className=java/util/Collections
        _cpIndex=60
        kind=97
        id=29

I will spend some more time trying to piece together how we might come to generate that IL.

@0xdaryl
Copy link
Contributor

0xdaryl commented Mar 2, 2023

Moving to 0.40.

@hzongaro
Copy link
Member

I'm going to move this to the Backlog. I strongly suspect the problem still exists, but we haven't been able to reproduce it, and the originator hasn't reported any further occurrences of the problem, which hampers investigation.

@eminence
Copy link

Fixed by #19604 I think

@hzongaro
Copy link
Member

As @eminence mentioned, this was likely fixed by pull request #19604. Closing.

@hzongaro hzongaro closed this as not planned Won't fix, can't repro, duplicate, stale Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jit segfault Issues that describe segfaults / JVM crashes userRaised
Projects
None yet
Development

No branches or pull requests

5 participants