Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jdk17 Jep389Tests_testClinkerFfi_UpCall Object neither in heap nor stack-allocated in thread main #16410

Closed
pshipton opened this issue Dec 2, 2022 · 7 comments · Fixed by #16427

Comments

@pshipton
Copy link
Member

pshipton commented Dec 2, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0/245
Jep389Tests_testClinkerFfi_UpCall_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0/245/functional_test_output.tar.gz

09:22:49  TESTING:
09:22:50  WARNING: Using incubator modules: jdk.incubator.foreign
09:23:27  ...
09:23:27  ... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
09:23:27  ...
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:27  add2IntsReturnVoid: intSum = 222852
09:23:27  
09:23:28  add2IntsReturnVoid: intSum = 377852
09:23:28  
09:23:29  000000000001AA00: Object neither in heap nor stack-allocated in thread main
09:23:29  000000000001AA00:	O-Slot=0000000000118648
09:23:29  000000000001AA00:	O-Slot value=00007FFA00000010
09:23:29  000000000001AA00:	PC=00007FFAF699DC00
09:23:29  000000000001AA00:	framesWalked=0
09:23:29  000000000001AA00:	arg0EA=0000000000118678
09:23:29  000000000001AA00:	walkSP=0000000000118648
09:23:29  000000000001AA00:	literals=0000000000000010
09:23:29  000000000001AA00:	jitInfo=0000000000000000
09:23:29  000000000001AA00:	stack=0000000000111A38-0000000000119250
09:23:29  14:23:28.783 0x1ce100    j9mm.479    *   ** ASSERTION FAILED ** at f:\users\jenkins\workspace\build_jdk17_x86-64_windows_omr\openj9\runtime\gc_glue_java\ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
09:23:29  JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/12/02 09:23:28 - please wait.
09:23:29  JVMDUMP032I JVM requested System dump using 'F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\core.20221202.092328.6816.0001.dmp' in response to an event
09:23:31  JVMDUMP010I System dump written to F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\core.20221202.092328.6816.0001.dmp
09:23:31  JVMDUMP032I JVM requested Java dump using 'F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\javacore.20221202.092328.6816.0002.txt' in response to an event
09:23:31  JVMDUMP010I Java dump written to F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\javacore.20221202.092328.6816.0002.txt
09:23:31  JVMDUMP032I JVM requested Snap dump using 'F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\Snap.20221202.092328.6816.0003.trc' in response to an event
09:23:31  JVMDUMP010I Snap dump written to F:\Users\jenkins\workspace\Test_openjdk17_j9_sanity.functional_x86-64_windows_OMR_testList_0\aqa-tests\TKG\output_1669979372969\Jep389Tests_testClinkerFfi_UpCall_0\Snap.20221202.092328.6816.0003.trc
09:23:31  JVMDUMP013I Processed dump event "traceassert", detail "".

This is like #15251 but for jdk17.
@dmitripivkine @ChengJin01 fyi

@ChengJin01
Copy link
Contributor

ChengJin01 commented Dec 2, 2022

I didn't see this before in upcall tests on JDK17 & 19. It doesn't seem likely related to FFI based on the results of dumps.

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "GC Worker" J9VMThread:0x00000000001CE100, omrthread_t:0x000002BB3308F408, java/lang/Thread:0x000
00007001E2A58, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0xF, isDaemon:true)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x000000070016DC38)
3XMTHREADINFO1            (native thread ID:0x1AD0, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread
 flags:0x00000081)
3XMCPUTIME               CPU usage total: 0.250000000 secs, user: 0.125000000 secs, system: 0.125000000 secs, curren
t category="GC"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           No Java callstack associated with this thread
3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               j9port_isCompatible+0x15e5a (0x00007FFAF6798A0A [j9prt29+0x18a0a])
4XENATIVESTACK               j9port_isCompatible+0x1564c (0x00007FFAF67981FC [j9prt29+0x181fc])
4XENATIVESTACK               j9port_isCompatible+0x18926 (0x00007FFAF679B4D6 [j9prt29+0x1b4d6])
4XENATIVESTACK               j9port_isCompatible+0x1a050 (0x00007FFAF679CC00 [j9prt29+0x1cc00])
4XENATIVESTACK               JVM_OnUnload+0x11761 (0x00007FFAF67491D1 [j9dmp29+0x191d1])
4XENATIVESTACK               JVM_OnUnload+0x7ebd (0x00007FFAF673F92D [j9dmp29+0xf92d])
4XENATIVESTACK               j9port_isCompatible+0x18926 (0x00007FFAF679B4D6 [j9prt29+0x1b4d6])
4XENATIVESTACK               j9port_isCompatible+0x1a050 (0x00007FFAF679CC00 [j9prt29+0x1cc00])
4XENATIVESTACK               JVM_OnUnload+0xab7b (0x00007FFAF67425EB [j9dmp29+0x125eb])
4XENATIVESTACK               JVM_OnUnload+0x742a (0x00007FFAF673EE9A [j9dmp29+0xee9a])
4XENATIVESTACK               j9port_isCompatible+0x18926 (0x00007FFAF679B4D6 [j9prt29+0x1b4d6])
...
4XENATIVESTACK               (0x00007FFAF6733081 [j9dmp29+0x3081])
4XENATIVESTACK               JVM_OnUnload+0x13d6d (0x00007FFAF674B7DD [j9dmp29+0x1b7dd])
4XENATIVESTACK               JVM_OnUnload+0x11bf8 (0x00007FFAF670AFA8 [j9trc29+0x1afa8])
4XENATIVESTACK               JVM_OnUnload+0x5677 (0x00007FFAF66FEA27 [j9trc29+0xea27])
4XENATIVESTACK               JVM_OnUnload+0x7d4d (0x00007FFAF67010FD [j9trc29+0x110fd])
4XENATIVESTACK               JVM_OnUnload+0x3589 (0x00007FFAF66FC939 [j9trc29+0xc939])
4XENATIVESTACK               JVM_OnUnload+0x1561 (0x00007FFAF66FA911 [j9trc29+0xa911])
4XENATIVESTACK               J9VMDllMain+0x130700 (0x00007FFAF3071710 [j9gc29+0x131710])
4XENATIVESTACK               J9VMDllMain+0x191aa (0x00007FFAF2F5A1BA [j9gc29+0x1a1ba])
4XENATIVESTACK               J9_GetInterface+0x3ab16 (0x00007FFAF6901456 [j9vm29+0x111456])
4XENATIVESTACK               J9_GetInterface+0x39b4f (0x00007FFAF690048F [j9vm29+0x11048f])
4XENATIVESTACK               J9VMDllMain+0x19155 (0x00007FFAF2F5A165 [j9gc29+0x1a165])
4XENATIVESTACK               J9VMDllMain+0x15b51 (0x00007FFAF2F56B61 [j9gc29+0x16b61])
4XENATIVESTACK               J9VMDllMain+0x15e53 (0x00007FFAF2F56E63 [j9gc29+0x16e63])
4XENATIVESTACK               J9VMDllMain+0x14d1d (0x00007FFAF2F55D2D [j9gc29+0x15d2d])
4XENATIVESTACK               J9VMDllMain+0x126b2d (0x00007FFAF3067B3D [j9gc29+0x127b3d])
4XENATIVESTACK               J9VMDllMain+0x14fdb8 (0x00007FFAF3090DC8 [j9gc29+0x150dc8])
4XENATIVESTACK               J9VMDllMain+0x150e51 (0x00007FFAF3091E61 [j9gc29+0x151e61])
4XENATIVESTACK               j9port_isCompatible+0x1a08b (0x00007FFAF679CC3B [j9prt29+0x1cc3b])
4XENATIVESTACK               J9VMDllMain+0x150fab (0x00007FFAF3091FBB [j9gc29+0x151fbb])
4XENATIVESTACK               omrthread_get_category+0xa42 (0x00007FFAF7A64232 [j9thr29+0x4232])
4XENATIVESTACK               _o_exp+0x5a (0x00007FFB0C4A268A [ucrtbase+0x2268a])
4XENATIVESTACK               0x00007FFB0F027974
4XENATIVESTACK               0x00007FFB0F54A2F1

@dmitripivkine
Copy link
Contributor

<1aa00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x000000000001AA00 ***
<1aa00> 	ITERATE_O_SLOTS
<1aa00> 	RECORD_BYTECODE_PC_OFFSET
<1aa00> Initial values: walkSP = 0x0000000000118640, PC = 0x00007FFAF699DC00, literals = 0x0000000000000010, A0 = 0x0000000000118678, j2iFrame = 0x0000000000118A40, ELS = 0x000000C3C4C7E970, decomp = 0x0000000000000000
<1aa00> JNI call-in frame: bp = 0x0000000000118678, sp = 0x0000000000118640, pc = 0x00007FFAF699DC00, cp = 0x0000000000000000, arg0EA = 0x0000000000118678, flags = 0x0000000000000000
<1aa00> Error: PC should have been advanced in order to push return value, pc = 0x00007FFAF699DC00, cipc = 0x00007FFAF699DC00 !!! <-------- Error!
<1aa00> 	Call-in return value (non-object) takes UDATA(0x0000000000000008) slots at 0x0000000000118640
<1aa00> 	Object pushes starting at 0x0000000000118648 for 2 slots
<1aa00> 		Push[0x0000000000118648] = 0x00007FFA00000010 <---- bad o-slot
<1aa00> 		Push[0x0000000000118650] = 0x000000070030B5E8
<1aa00> 	New ELS = 0x000000C3C4C7F180
<1aa00> JNI native method frame: bp = 0x00000000001186A0, sp = 0x0000000000118680, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x00000000001186E0, flags = 0x0000000000000000
<1aa00> 	Method: openj9/internal/foreign/abi/InternalDowncallHandler.invokeNative(JJJ[J)J !j9method 0x000000000044E028
<1aa00> 	Using signature mapper
<1aa00> 	Arguments starting at 0x00000000001186E0 for UDATA(0x0000000000000008) slots
<1aa00> 		O-Slot: a0[0x00000000001186E0] = 0x00000007FFFADF78
<1aa00> 		I-Slot: a1[0x00000000001186D8] = 0x00000007FFFAFF48
<1aa00> 		I-Slot: a2[0x00000000001186D0] = 0x0000000000000000
<1aa00> 		I-Slot: a3[0x00000000001186C8] = 0x0000000000000000
<1aa00> 		I-Slot: a4[0x00000000001186C0] = 0x00007FFAF63F2730
<1aa00> 		I-Slot: a5[0x00000000001186B8] = 0x000002BB3CC7BF7D
<1aa00> 		I-Slot: a6[0x00000000001186B0] = 0x000002BB50307EA8
<1aa00> 		O-Slot: a7[0x00000000001186A8] = 0x00000007FFFAFF20

@dmitripivkine
Copy link
Contributor

<1aa00> 	                        JNI call-in frame
<1aa00> 	!j9method 0x000000000044E028   openj9/internal/foreign/abi/InternalDowncallHandler.invokeNative(JJJ[J)J
<1aa00> 	!j9method 0x000000000044E3C8   openj9/internal/foreign/abi/InternalDowncallHandler.runNativeMethod(Ljdk/incubator/foreign/Addressable;Ljdk/incubator/foreign/SegmentAllocator;[J)Ljava/lang/Object;
<1aa00> 	!j9method 0x0000000000457CC8   java/lang/invoke/LambdaForm$DMH/0x0000000050491b40.invokeSpecial(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<1aa00> 	!j9method 0x00000000004BB478   java/lang/invoke/LambdaForm$MH/0x0000000050483120.invoke(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;)I
<1aa00> 	!j9method 0x00000000004BBED8   java/lang/invoke/LambdaForm$MH/0x00000000504837f0.invokeExact_MT(Ljava/lang/Object;ILjava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)I
<1aa00> 	!j9method 0x0000000000381158   org/openj9/test/jep389/upcall/UpcallMHWithStructTests.test_addByteAndBytesFromStructWithNestedStructArrayByUpcallMH()V
<1aa00> 	                        JNI call-in frame
<1aa00> 	!j9method 0x0000000000153248   jdk/internal/reflect/NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
.....

@dmitripivkine
Copy link
Contributor

I didn't see this before in upcall tests on JDK17 & 19. It doesn't seem likely related to FFI based on the results of dumps.

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "GC Worker" J9VMThread:0x00000000001CE100, omrthread_t:0x000002BB3308F408, java/lang/Thread:0x000

@ChengJin01 You are looking to wrong thread stack. This is GC Worker thread scans java stack of thread

	!stack 0x0001aa00	!j9vmthread 0x0001aa00	!j9thread 0x2bb31d49700	tid 0xa1c (2588) // (main)

@ChengJin01
Copy link
Contributor

ChengJin01 commented Dec 2, 2022

We do have a callin frame built at

if (buildCallInStackFrameHelper(currentThread, &newELS, returnsObject)) {
to call into the interpreter to invoke the target MH with all arguments placed on the java stack:

if (buildCallInStackFrameHelper(currentThread, &newELS, returnsObject)) {
...
		j9object_t calleeHandle = J9VMOPENJ9INTERNALFOREIGNABIUPCALLMHMETADATA_CALLEEMH(currentThread, J9_JNI_UNWRAP_REFERENCE(data->mhMetaData));
		PUSH_OBJECT_IN_SPECIAL_FRAME(currentThread, calleeHandle);
...
		currentThread->returnValue = J9_BCLOOP_N2I_TRANSITION;
		currentThread->returnValue2 = (UDATA)data;
		c_cInterpreter(currentThread); <-------
...

against the callin frame in the stacktrace above:

> !stackslots  0x0001aa00
<1aa00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x000000000001AA00 ***
<1aa00>         ITERATE_O_SLOTS
<1aa00>         RECORD_BYTECODE_PC_OFFSET
<1aa00> Initial values: walkSP = 0x0000000000118640, PC = 0x00007FFAF699DC00, literals = 0x0000000000000010, A0 = 0x0000000000118678, j2iFrame = 0x0000000000118A40, ELS = 0x000000C3C4C7E970, decomp = 0x0000000000000000
<1aa00> JNI call-in frame: bp = 0x0000000000118678, sp = 0x0000000000118640, pc = 0x00007FFAF699DC00, cp = 0x0000000000000000, arg0EA = 0x0000000000118678, flags = 0x0000000000000000
<1aa00> Error: PC should have been advanced in order to push return value, pc = 0x00007FFAF699DC00, cipc = 0x00007FFAF699DC00 !!!
<1aa00>         Call-in return value (non-object) takes UDATA(0x0000000000000008) slots at 0x0000000000118640
<1aa00>         Object pushes starting at 0x0000000000118648 for 2 slots
<1aa00>                 Push[0x0000000000118648] = 0x00007FFA00000010
<1aa00>                 Push[0x0000000000118650] = 0x000000070030B5E8 <------- the DMH (target MH)
<1aa00>         New ELS = 0x000000C3C4C7F180
<1aa00> JNI native method frame: bp = 0x00000000001186A0, sp = 0x0000000000118680, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x00000000001186E0, flags = 0x0000000000000000
<1aa00>         Method: openj9/internal/foreign/abi/InternalDowncallHandler.invokeNative(JJJ[J)J !j9method 0x000000000044E028
<1aa00>         Using signature mapper

> !j9object 0x000000070030B5E8
!J9Object 0x000000070030B5E8 {
        struct J9Class* clazz = !j9class 0x15A600 // java/lang/invoke/DirectMethodHandle
        Object flags = 0x00000000;
        Ljava/lang/invoke/MethodType; type = !fj9object 0xe006155a (offset = 0) (java/lang/invoke/MethodHandle)
        Ljava/lang/invoke/LambdaForm; form = !fj9object 0xe0061926 (offset = 4) (java/lang/invoke/MethodHandle)
        Ljava/lang/invoke/MethodHandle; asTypeCache = !fj9object 0x0 (offset = 8) (java/lang/invoke/MethodHandle)
        B customizationCount = 0x00000000 (offset = 16) (java/lang/invoke/MethodHandle)
        Z updateInProgress = 0x00000000 (offset = 20) (java/lang/invoke/MethodHandle)
        Ljava/lang/invoke/MemberName; jitVMEntryKeepAlive = !fj9object 0x0 (offset = 12) (java/lang/invoke/MethodHandle) <hidden>
        Ljava/lang/invoke/MemberName; member = !fj9object 0xe0085f0e (offset = 28) (java/lang/invoke/DirectMethodHandle)
        Z crackable = 0x00000001 (offset = 24) (java/lang/invoke/DirectMethodHandle)
        I lockword = 0x00000000 (offset = 32) (java/lang/invoke/DirectMethodHandle) <hidden>

When setting arguments (including primitive types, struct and pointer) on the java stack, we need to push the allocated struct/pointer objects to the special frame before allocating another struct/pointer object for the next argument if exists in the argument list to avoid updating by GC.

According to the test case in native at

char byteSum = (*upcallMH)(arg1, arg2);

char
addByteAndBytesFromStructWithNestedStructArrayByUpcallMH(char arg1, stru_NestedStruArray_Byte arg2, char (*upcallMH)(char, stru_NestedStruArray_Byte))
{
	char byteSum = (*upcallMH)(arg1, arg2); <------
	return byteSum;
}

there are two arguments: arg1(char) and arg2(stru_NestedStruArray_Byte which is a struct that is allocated in the dispatcher and pushed to the special frame) which were placed on the java stack.

Now the question what should be or was stored in Push[0x0000000000118648] (arg1 or arg2?) and why 0x00007FFA00000010 was messed up in there as we don't touch them any more once all arguments for upcall are pushed on there.

@pshipton
Copy link
Member Author

pshipton commented Dec 6, 2022

There was a crash on jdk19 as well in PR testing, although it may be a different issue.
#16422

@ChengJin01
Copy link
Contributor

ChengJin01 commented Dec 6, 2022

As talked to Dmitri, GAC & Jack offline as to how PUSH_OBJECT_IN_SPECIAL_FRAME works to affect GC in upcall, I realized that PUSH_OBJECT_IN_SPECIAL_FRAME was misused in our upcall dispatcher. The initial intention was to leverage PUSH_OBJECT_IN_SPECIAL_FRAME to stop GC from updating the previous allocated memory (struct/pointer) arguments every time when triggering GC in allocating the next memory argument, in which case the GC already recorded the count of argument when iterating the java stack after building the callin frame and mistakenly treated the second argument as an object (which was supposed to be a byte) in the failing test.

The problem can be easily captured before call-in in the the dispatcher on Linux platform by triggering dumps in the debugger:

<c8200>         RECORD_BYTECODE_PC_OFFSET
<c8200> Initial values: walkSP = 0x00000000001ADE78, PC = 0x00007FFFF793A290, literals = 0x0000000000000010, A0 = 0x00000000001ADEB0, j2iFrame = 0x0000000000000000, ELS = 0x00007FFFF7A9C9B0, decomp = 0x0000000000000000
<c8200> JNI call-in frame: bp = 0x00000000001ADEB0, sp = 0x00000000001ADE78, pc = 0x00007FFFF793A290, cp = 0x0000000000000000, arg0EA = 0x00000000001ADEB0, flags = 0x0000000000000000
<c8200> Error: PC should have been advanced in order to push return value, pc = 0x00007FFFF793A290, cipc = 0x00007FFFF793A290 !!!
<c8200>         Call-in return value (non-object) takes UDATA(0x0000000000000008) slots at 0x00000000001ADE78
<c8200>         Object pushes starting at 0x00000000001ADE80 for 2 slots
<c8200>                 Push[0x00000000001ADE80] = 0x0000000700000010 <----- a byte (0x10) (treated as an object in GC)
<c8200>                 Push[0x00000000001ADE88] = 0x00000007FFEC19D0
<c8200>         New ELS = 0x00007FFFF7A9DF80

The fundamental reason is that PUSH_OBJECT_IN_SPECIAL_FRAME is not supposed to use in our complicated situation to deal with multiple memory argument allocations which might trigger GC in setting up arguments on the java stack in upcall. To avoid this issue based on the existing implementation in the dispatcher, the ideal solution should be avoid using PUSH_OBJECT_IN_SPECIAL_FRAME in building the call-in frame in upcall:

  1. move the shared session/scope to the java code instead of allocating it in native.
  2. create an argument array in java which is used in native to store the allocated memory argument in native without PUSH_OBJECT_IN_SPECIAL_FRAME involved (given the java argument array won't be bothered by GC) before creating the call-in frame.
  3. load the memory argument out of the java argument array and place it on the java stack before calling into the interpreter.

ChengJin01 added a commit to ChengJin01/openj9 that referenced this issue Dec 8, 2022
The idea is to store the argument objects in native to the java
array rather to avoid misusing PUSH_OBJECT_IN_SPECIAL_FRAME so
as to stop GC from updating the references before calling into
the interpreter in upcall.

Fixes: eclipse-openj9#16410, eclipse-openj9#16422

Signed-off-by: ChengJin01 <jincheng@ca.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants