JDK11 Segmentation error vmState=0x00020019 #15474

Closed
connglli opened this issue Jul 1, 2022 · 9 comments · Fixed by #15870
Labels
comp:jit segfault Issues that describe segfaults / JVM crashes userRaised

Comments

@connglli

connglli commented Jul 1, 2022

Java -version output

openjdk version "11.0.16-internal" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16-internal+0-adhoc..openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-4ca209b54, JRE 11 Linux amd64-64-Bit Compressed References 20220615_000000 (JIT enabled, AOT enabled)
OpenJ9   - 4ca209b54
OMR      - 26b89f9f9
JCL      - 231dcc9eeb based on jdk-11.0.16+6)

Summary of problem

The following Test.java, which we reduced, crashes OpenJ9's JIT compiler. Even though the stack trace shows the crash inside libj9gc29.so, the bug disappears if you add -Xint, and the last entry of the stack trace is a call in the JIT compiler (apparently an uncommon NPE trap reached through a slow path). We therefore believe this is a JIT bug.

class Test {
  long instanceCount;
  float fFld;

  void vMeth1(long l, int i1, long l1) {
    switch (19) {
      case 19:
        int ax$15 = 555550;
        byte[] ax$14 = new byte[ax$15];
        int ax$17 = ax$14.length;
        for (; ax$17 > 0; ax$17--) {
          int ax$16 = ax$14.length - ax$17;
          ax$14[ax$16] = (byte) 0xff;
        }
    }
  }

  void vMeth(float f) {
    int i = 11;
    vMeth1(instanceCount, i, 309183205857848145L);
  }

  void mainTest(String[] strArr1) {
    vMeth(fFld);
  }

  public static void main(String[] strArr) {
    Test _instance = new Test();
    for (int i; ; ) _instance.mainTest(strArr);
  }
}

Diagnostic files

Running

$ java Test

produces the following crash log:

Unhandled exception
Type=Segmentation error vmState=0x00020019
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000080
Handler1=00007F0855E2EFD0 Handler2=00007F0855C16EF0 InaccessibleAddress=0000000000000000
RDI=00007F084FE74BAF RSI=00007F0856085870 RAX=00007F0856085980 RBX=00007F084FE74BAF
RCX=00007F084FF7C628 RDX=0000000000000100 R8=00007F0856085858 R9=00007F0856085B18
R10=00007F0850187610 R11=0000000000000001 R12=00007F08160442E5 R13=0000000000000001
R14=00000007FFF8AEA0 R15=00007F0850181CE0
RIP=00007F084FE62C92 GS=0000 FS=0000 RSP=00007F0856085850
EFlags=0000000000010246 CS=0033 RBP=00007F0856085870 ERR=0000000000000000
TRAPNO=000000000000000D OLDMASK=0000000000000000 CR2=0000000800000000
xmm0 40462e424373506b (f: 1131630720.000000, d: 4.436140e+01)
xmm1 000000003e3ab283 (f: 1044034176.000000, d: 5.158214e-315)
xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3 bfdffffef20a4123 (f: 4060758272.000000, d: -4.999997e-01)
xmm4 000000003f800000 (f: 1065353216.000000, d: 5.263544e-315)
xmm5 bff0000000000000 (f: 0.000000, d: -1.000000e+00)
xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7 4122387a6e756238 (f: 1853186560.000000, d: 5.970532e+05)
xmm8 6332313578766100 (f: 2021024000.000000, d: 6.865676e+169)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/zdata/congli/OpenJ9/jdk11/lib/default/libj9gc29.so
Module_base_address=00007F084FD33000
Target=2_90_20220615_000000 (Linux 5.4.0-117-generic)
CPU=amd64 (128 logical CPUs) (0x3ee84d8000 RAM)
----------- Stack Backtrace -----------
omrGcDebugAssertionOutput+0xc2 (0x00007F084FE62C92 [libj9gc29.so+0x12fc92])
_ZN27MM_LargeObjectAllocateStats17getSizeClassIndexEm+0x1fa (0x00007F084FE74D8A [libj9gc29.so+0x141d8a])
_ZN27MM_LargeObjectAllocateStats32decrementFreeEntrySizeClassStatsEmP26MM_FreeEntrySizeClassStatsm+0x1b (0x00007F084FE7927B [libj9gc29.so+0x14627b])
_ZN31MM_MemoryPoolAddressOrderedList11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionmRPvS5_+0x92 (0x00007F084FE567F2 [libj9gc29.so+0x1237f2])
_ZN23MM_TLHAllocationSupport11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP17MM_MemorySubSpaceP13MM_MemoryPool+0x3e (0x00007F084FE6B4BE [libj9gc29.so+0x1384be])
_ZN24MM_MemorySubSpaceGeneric11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP28MM_ObjectAllocationInterfaceP17MM_MemorySubSpaceS7_b+0x1ae (0x00007F084FEF61EE [libj9gc29.so+0x1c31ee])
_ZN23MM_TLHAllocationSupport7refreshEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x414 (0x00007F084FE6BEC4 [libj9gc29.so+0x138ec4])
_ZN23MM_TLHAllocationSupport15allocateFromTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0xde (0x00007F084FE6C0AE [libj9gc29.so+0x1390ae])
_ZN25MM_TLHAllocationInterface14allocateObjectEP18MM_EnvironmentBaseP22MM_AllocateDescriptionP14MM_MemorySpaceb+0x10a (0x00007F084FE6AA0A [libj9gc29.so+0x137a0a])
_Z21OMR_GC_AllocateObjectP12OMR_VMThreadP25MM_AllocateInitialization+0x20b (0x00007F084FE70DDB [libj9gc29.so+0x13dddb])
J9AllocateObject+0x226 (0x00007F084FD7BA56 [libj9gc29.so+0x48a56])
internalSetCurrentExceptionWithCause+0x2e9 (0x00007F0855E23629 [libj9vm29.so+0x34629])
setCurrentExceptionWithUtfCause+0x55 (0x00007F0855E23CC5 [libj9vm29.so+0x34cc5])
old_slow_jitHandleNullPointerExceptionTrap+0x93 (0x00007F0854BE7503 [libj9jit29.so+0x96c503])
 (0x00007F0854C08405 [libj9jit29.so+0x98d405])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/07/01 14:46:54 - please wait.
JVMDUMP032I JVM requested System dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/core.20220701.144654.2810124.0001.dmp' in response to an event
JVMDUMP010I System dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/core.20220701.144654.2810124.0001.dmp
JVMDUMP032I JVM requested Java dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/javacore.20220701.144654.2810124.0002.txt' in response to an event
JVMDUMP010I Java dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/javacore.20220701.144654.2810124.0002.txt
JVMDUMP032I JVM requested Snap dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/Snap.20220701.144654.2810124.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/Snap.20220701.144654.2810124.0003.trc
JVMDUMP032I JVM requested JIT dump using '/zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/jitdump.20220701.144654.2810124.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'main' thread 0x000000000004DD00
JVMDUMP053I JIT dump is recompiling Test.vMeth1(JIJ)V
JVMDUMP010I JIT dump written to /zdata/congli/ax-exp/ax-eval/2-ax-only/87.openj9/mutant/red/jitdump.20220701.144654.2810124.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".

Please also check openj9-bug-87.tar.gz for all the logs (jitdump, snap, etc.), the test (Test.java, Test.class), and the unreduced test (Test.java.orig).

Notice

  1. The reduced Test.java above reproduces the crash every time for us. If it does not reproduce for you, please use Test.java.orig from the archive linked above.
  2. Sometimes it crashes without a usable stack trace; the backtrace section then shows only 0x000000000100000 [<unknown>+0x0].
@dmitripivkine
Contributor

I am going to triage the crash from a GC point of view.

@pshipton pshipton added this to the Release 0.34 (Java 19) milestone Jul 5, 2022
@pshipton pshipton added the segfault Issues that describe segfaults / JVM crashes label Jul 5, 2022
@dmitripivkine
Contributor

The reason for the crash is heap memory corruption at 0x7FFF8AEA0, where !mm_memorypooladdressorderedlist 0x00007F0850181CE0 expects to find a linked free header (_heapFreeList = !mm_heaplinkedfreeheader 0x00000007FFF8AEA0). Alternatively, _heapFreeList might somehow have been set to a bogus value.

0x7FFF8AE80 :  00000000 00000000 00000000 00000000 [ ................ ]
0x7FFF8AE90 :  00000000 00000000 00000000 ffffff00 [ ................ ]
0x7FFF8AEA0 :  ffffffff ffffffff ffffffff ffffffff [ ................ ] <-------
0x7FFF8AEB0 :  ffffffff ffffffff ffffffff ffffffff [ ................ ]
0x7FFF8AEC0 :  ffffffff ffffffff ffffffff ffffffff [ ................ ]

@dmitripivkine
Contributor

Is it possible that the heap corruption occurs during an out-of-bounds initialization of a large byte array?
The object immediately before the corrupted address, !j9indexableobject 0x00000007FFF03470 0 555550, is filled with 0x00.
However, the object before that one, !j9indexableobject 0x00000007FFE7BA40 0 555550, is filled with 0xFF.

@0xdaryl FYI
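
The addresses quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the 16-byte array header and the 8-byte object alignment are assumptions, not values stated in this comment, and the class and method names are made up for the example. Under those assumptions the two byte arrays sit back to back, and the corrupted address is exactly where the second array ends.

```java
final class HeapLayoutCheck {
    // Addresses and array length copied from the comments above.
    static final long FIRST_ARRAY = 0x7FFE7BA40L;   // array filled with 0xFF
    static final long SECOND_ARRAY = 0x7FFF03470L;  // array filled with 0x00
    static final long CORRUPTED = 0x7FFF8AEA0L;     // where the free-list header was expected
    static final long LENGTH = 555550L;

    // Assumption: 16-byte array header, total size rounded up to a multiple of 8.
    static long objectSize(long length) {
        long raw = 16 + length;
        return (raw + 7) & ~7L;
    }

    public static void main(String[] args) {
        long spacing = SECOND_ARRAY - FIRST_ARRAY;
        long endOfSecond = SECOND_ARRAY + objectSize(LENGTH);
        System.out.printf("spacing between arrays: 0x%X%n", spacing);
        System.out.printf("assumed object size:    0x%X%n", objectSize(LENGTH));
        System.out.printf("end of second array:    0x%X%n", endOfSecond);
        System.out.println("end matches corrupted address: " + (endOfSecond == CORRUPTED));
    }
}
```

If the assumptions hold, a write that runs past the end of the second array lands directly on the memory the free list points at, which fits the out-of-bounds initialization suspected above.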

@hzongaro
Member

I can reproduce this easily with 0.33 but not with 0.32, so it could be a new problem. I'll take a closer look.

@hzongaro
Member

It looks like this problem is related to Idiom Recognition. Running the test with this option avoids the problem.

-Xjit:{Test.vMeth*}(disableIdiomRecognition)

I believe that issue #15592 and issue #15575 are duplicates of this issue

@hzongaro
Member

I mentioned that I was unable to reproduce this problem with 0.32. That's because at some point Idiom Recognition was disabled outside of jitserver compilations. It was turned on again in 0.33, re-exposing what was likely an existing problem.

@hzongaro
Member

I should have come back to summarize what Idiom Recognition is doing in this case. Running Test with this option

-Xjit:limit={Test.vMeth1*},{Test.vMeth1*}\(log=test.vmeth1.log,traceIdiomRecognition\)

the loop in vMeth1 is replaced with an arrayset that looks like this:

n24n      istore  <auto slot 8>[#423  Auto] [flags 0x3 0x0 ]                                  [0x7f35a6becfb0] bci=[-1,33,10] rc=0 vc=482 vn=6 li=3 udi=2 nc=1
n16n        ==>iconst 0x87a1e
  ...
n161n     treetop                                                                             [0x7f35a6c6a210] bci=[-1,53,13] rc=0 vc=0 vn=50 li=- udi=- nc=1
n160n       arrayset  <arrayset>[#219  helper Method] [flags 0x400 0x0 ] ()                   [0x7f35a6c6a1c0] bci=[-1,53,13] rc=1 vc=0 vn=49 li=- udi=- nc=3 flg=0x20
n144n         aladd (X>=0 internalPtr sharedMemory )                                          [0x7f35a6c69cc0] bci=[-1,53,13] rc=1 vc=484 vn=33 li=- udi=- nc=2 flg=0x8100
n145n           aload  <auto slot 7>[#422  Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory )        [0x7f35a6c69d10] bci=[-1,48,13] rc=1 vc=484 vn=34 li=- udi=6 nc=0 flg=0x4
n146n           lsub (highWordZero X>=0 cannotOverflow )                                      [0x7f35a6c69d60] bci=[-1,53,13] rc=1 vc=484 vn=35 li=- udi=- nc=2 flg=0x5100
n147n             i2l (highWordZero X>=0 )                                                    [0x7f35a6c69db0] bci=[-1,53,13] rc=1 vc=484 vn=36 li=- udi=- nc=1 flg=0x4100
n148n               isub (X>=0 cannotOverflow )                                               [0x7f35a6c69e00] bci=[-1,45,12] rc=1 vc=484 vn=37 li=- udi=- nc=2 flg=0x1100
n149n                 iconst 0x87a1e (X!=0 X>=0 cannotOverflow )                              [0x7f35a6c69e50] bci=[-1,42,12] rc=1 vc=484 vn=38 li=- udi=- nc=0 flg=0x1104
n155n                 iconst 1                                                                [0x7f35a6c6a030] bci=[-1,37,11] rc=1 vc=0 vn=44 li=- udi=- nc=0
n151n             lconst -16 (X!=0 X<=0 )                                                     [0x7f35a6c69ef0] bci=[-1,53,13] rc=1 vc=484 vn=40 li=- udi=- nc=0 flg=0x204
n152n         bconst  -1 (X!=0 X<=0 )                                                         [0x7f35a6c69f40] bci=[-1,53,13] rc=1 vc=484 vn=41 li=- udi=- nc=0 flg=0x204
n159n         i2l                                                                             [0x7f35a6c6a170] bci=[-1,43,12] rc=1 vc=0 vn=48 li=- udi=- nc=1
n143n           iload  <auto slot 8>[#423  Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow )      [0x7f35a6c69c70] bci=[-1,43,12] rc=1 vc=484 vn=32 li=- udi=5 nc=0 flg=0x1100

That is, it begins the arrayset at element ax$14[0x87a1e-1] (i.e., the last element) and sets 0x87a1e elements beginning there. Note that 555550 == 0x87a1e.
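
To make the difference concrete, here is a small simulation of the source loop next to the arrayset described above. It is a sketch, not JIT output: memory is modelled as a plain byte array, the 16-byte header size is taken from the lconst -16 in the tree, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class ArraysetOverrun {
    static final int HEADER = 16; // assumption: array header size, from "lconst -16"

    // Semantics of the loop in vMeth1: the index runs from 0 up to length - 1.
    static void sourceLoop(byte[] memory, int length) {
        for (int i = length; i > 0; i--) {
            int index = length - i;
            memory[HEADER + index] = (byte) 0xff;
        }
    }

    // The arrayset as described: start at the last element, set `length` bytes.
    static void miscompiledArrayset(byte[] memory, int length) {
        int start = HEADER + (length - 1);
        Arrays.fill(memory, start, start + length, (byte) 0xff);
    }

    // Counts non-zero bytes beyond the end of the modelled array.
    static int bytesWrittenPastArray(byte[] memory, int length) {
        int count = 0;
        for (int i = HEADER + length; i < memory.length; i++) {
            if (memory[i] != 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int length = 555550;
        byte[] expected = new byte[HEADER + 2 * length];
        byte[] actual = new byte[HEADER + 2 * length];
        sourceLoop(expected, length);
        miscompiledArrayset(actual, length);
        System.out.println("source loop overrun: " + bytesWrittenPastArray(expected, length));
        System.out.println("arrayset overrun:    " + bytesWrittenPastArray(actual, length));
    }
}
```

In this model the source loop writes nothing outside the array, while the arrayset writes only the last element inside the array and length - 1 bytes past its end.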

Looking at Idiom Recognition's pattern graph for memset:

 ptr id dagId(L=Loop) succ children (chains) (dest) (hintChildren) (flags) (TRNodeInfo)
[00007F35A4E03920]   0 11  Var 0       [] []
[00007F35A4E03A30]   1 10  Var 1       [] []
[00007F35A4E03B40]   2  9  arraybase 0 [] []
[00007F35A4E03C50]   3  8  quasiConst2 [] []
[00007F35A4E03D30]   4  7  variableORconst [] []
[00007F35A4E03E10]   5  6  constall    [] []
[00007F35A4E03EF0]   6  5  arrayindex 0 [] []
[00007F35A4E04000]   7  4  ahconst 0   [] []
[00007F35A4E04110]   8  3  iconst -1   [] []
[00007F35A4E04220]   9  2  entrynode   [10] []
[00007F35A4E04350]  10  1L i2l         [11] [6]
[00007F35A4E04470]  11  1L lmul        [12] [10 5]
[00007F35A4E045B0]  12  1L lsub        [13] [11 7]
[00007F35A4E04750]  13  1L aladd       [14] [2 12]
[00007F35A4E048B0]  14  1L conversion  [15] [4] (Optional)
[00007F35A4E049F0]  15  1L indstore    [16] [13 14]
[00007F35A4E04B50]  16  1L iaddORisub  [17] [0 8]
[00007F35A4E04C70]  17  1L istore      [18] [16 0]
[00007F35A4E04E10]  18  1L iaddORisub  [19] [1 8]
[00007F35A4E04F30]  19  1L istore      [20] [18 1]
[00007F35A4E050D0]  20  1L ifcmpall    [10 21] [0 3]
[00007F35A4E05230]  21  0  exitnode    [] []

and the target graph (with some annotations)

 ptr id dagId(L=Loop) succ children (chains) (dest) (hintChildren) (flags) (TRNodeInfo)
[00007F35A6D6CED0]   2 11  iconst 555550 [] []  TR::Node:[0x7f35a6bed1e0,]
[00007F35A6D6D000]   3 10  Var 423     [] []    TR::Node:[0x7f35a6bed280,]
[00007F35A6D6D3C0]   6  9  Var 424     [] []    TR::Node:[0x7f35a6bed320,]
[00007F35A6D6D640]   8  8  Var 422     [] []    TR::Node:[0x7f35a6bed370,]
[00007F35A6D6DA00]  11  7  lconst 1    [] []
[00007F35A6D6DBF0]  13  6  lconst -16  [] []    TR::Node:[0x7f35a6bed5a0,]
[00007F35A6D6DFE0]  16  5  bconst -1   [] []    TR::Node:[0x7f35a6bed460,]
[00007F35A6D6E270]  18  4  iconst -1   [] []    TR::Node:[0x7f35a6bed780,]
[00007F35A6D6E7A0]  22  3  iconst 0    [] []    TR::Node:[0x7f35a6c68e60,]
[00007F35A6D6CCB0]   0  2  entrynode   [1] [] chains[4 9 ]
[00007F35A6D6CDA0]   1  1L BBStart 5   [4] []   TR::Node:[0x7f35a6bed0a0,]
[00007F35A6D6D110]   4  1L iload       [5] [3] chains[20 0 ]    TR::Node:[0x7f35a6bed280,]         <<< #423
[00007F35A6D6D260]   5  1L isub        [7] [2 4] dest=6         TR::Node:[0x7f35a6bed2d0,]         <<< 555550 - #423
[00007F35A6D6D4E0]   7  1L istore      [9] [5 6] chains[10 ]    TR::Node:[0x7f35a6bed320,]         <<< #424 = 555550 - #423
[00007F35A6D6D750]   9  1L aload       [10] [8] chains[0 ]      TR::Node:[0x7f35a6bed370,]         <<< #422
[00007F35A6D6D8A0]  10  1L i2l         [12] [6] chains[7 ] hint=5       TR::Node:[0x7f35a6bed5f0,] <<< i2l #424
[00007F35A6D6DAE0]  12  1L lmul        [14] [10 11]                                                <<< #424 * 1
[00007F35A6D6DD20]  14  1L lsub        [15] [12 13]     TR::Node:[0x7f35a6bed640,]                 <<< #424 * 1 - (-16)
[00007F35A6D6DE80]  15  1L aladd       [17] [9 14]      TR::Node:[0x7f35a6bed690,]                 <<< #422 + (#424 * 1 - (-16))
[00007F35A6D6E110]  17  1L bstorei     [19] [15 16]     TR::Node:[0x7f35a6bed6e0,]                 <<< *(#422+(#424*1-(-16))) = -1
[00007F35A6D6E3A0]  19  1L iadd        [20] [4 18] dest=3       TR::Node:[0x7f35a6bed7d0,]         <<< #423 + (-1)
[00007F35A6D6E510]  20  1L istore      [21] [19 3] chains[4 23 ]        TR::Node:[0x7f35a6bed820,] <<< #423 = #423+(-1)
[00007F35A6D6E670]  21  1L asynccheck  [23] []  TR::Node:[0x7f35a6c68000,]
[00007F35A6D6E8D0]  23  1L ificmple    [1 25] [3 22] chains[20 ] hint=19        TR::Node:[0x7f35a6c68dc0,]
[00007F35A6D6EA50]  24  1L BBEnd 5     [25] []  TR::Node:[0x7f35a6c68d70,]
[00007F35A6D6EB90]  25  0  exitnode    [] []

and finally the T2P (with some annotations):

  2: negligible
  3:  0  1 SPBC (negligible)    <<< Var 423 (3) in T is both Var 0 (0) and Var 1 (1) in P
  6:  6 SPBC    (negligible)    <<< Var 424 (6) in T is arrayindex (6) in P
  8:  2 SPBC    (negligible)    <<< Var 422 (8) in T is arraybase (2) in P
 11:  5 SPBC    (negligible)
 13:  7 SPBC    (negligible)
 16:  4 SPBC    (negligible)
 18:  8 SPBC    (negligible)
 22:  3 SPBC    (negligible)
  0:  9 SPBC
  1: negligible
  4: negligible
  5: negligible
  7: negligible
  9: negligible
 10: 10 SPBC
 12: 11 SPBC
 14: 12 SPBC
 15: 13 SPBC
 17: 15 SPBC
 19: 16 18 SPBC
 20: 17 19 SPBC (negligible)
 21: negligible
 23: 20 SPBC
 24: negligible
 25: 21 SPBC    (negligible)

Ultimately, the JIT arrives at CISCTransform2ArraySet where it calls countGoodArrayIndex for Var 0 and Var 1, both of which are #423 above.

countGoodArrayIndex calls analyzeOneArrayIndex, which returns true if the opcode for the arrayindex is TR_variable. That is the case here (the arrayindex is #424), but #424 is not the induction variable, which is #423. When the arrayindex is a sum, on the other hand, analyzeOneArrayIndex checks that one of the operands is an induction variable.

As #423 begins the loop with the value 555550, CISCTransform2ArraySet decides that's where the arrayset must begin.

I believe that analyzeOneArrayIndex should check whether an arrayindex that is just a TR_variable is also an induction variable, just as it does for an arrayindex that is a TR_variable (or iload) plus a value.
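
The proposed check can be modelled roughly as follows. This is a simplified Java sketch with hypothetical types and method names; the real analysis lives in the JIT's Idiom Recognition code and is not reproduced here. It only contrasts the behaviour described above (a bare variable is always accepted) with the suggested behaviour (a bare variable must also be an induction variable).

```java
import java.util.Set;

final class ArrayIndexCheck {
    /** Hypothetical index shape: "var" when constant is null, otherwise "var + const". */
    static final class Index {
        final int variable;   // symbol number, e.g. 423 or 424
        final Long constant;  // null for a bare variable

        Index(int variable, Long constant) {
            this.variable = variable;
            this.constant = constant;
        }
    }

    // Behaviour described in the issue: a bare variable is accepted unconditionally.
    static boolean acceptedBeforeFix(Index index, Set<Integer> inductionVariables) {
        if (index.constant == null) {
            return true;
        }
        return inductionVariables.contains(index.variable);
    }

    // Suggested behaviour: both shapes require the variable to be an induction variable.
    static boolean acceptedAfterFix(Index index, Set<Integer> inductionVariables) {
        return inductionVariables.contains(index.variable);
    }

    public static void main(String[] args) {
        Set<Integer> inductionVariables = Set.of(423); // #423 is the loop's induction variable
        Index arrayIndex = new Index(424, null);       // #424 is the array index in the target graph
        System.out.println("before fix: " + acceptedBeforeFix(arrayIndex, inductionVariables));
        System.out.println("after fix:  " + acceptedAfterFix(arrayIndex, inductionVariables));
    }
}
```

With the stricter check, the index #424 is rejected in this model, so the loop in vMeth1 would no longer be matched as an arrayset candidate.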

hzongaro added a commit to hzongaro/openj9 that referenced this issue Sep 12, 2022
In examining an array index for Idiom Recognition, analyzeOneArrayIndex
checks whether the variable operand in a 'var + const' or 'const + var'
expression is an induction variable.  On the other hand, if the
expression is a simple variable, the analysis assumes the variable
reference is acceptable.  However, it can happen that the variable used
as the array index is not an induction variable which can result in an
incorrect transformation.

Fixed this by adding a check that an arrayindex that is a variable is
also an induction variable.

Fixes eclipse-openj9#15474

Signed-off-by:  Henry Zongaro <zongaro@ca.ibm.com>
@hzongaro hzongaro reopened this Sep 15, 2022
@hzongaro
Member

Reopening this temporarily, as I had hoped to include a fix for the 0.35 milestone build.

hzongaro added a commit to hzongaro/openj9 that referenced this issue Sep 15, 2022
@pshipton
Member

It's merged to 0.35 now.
