Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jdk11 plinux compiled with gcc 7.3 doesn't work with NativeImageBuffer #2788

Closed
pshipton opened this issue Sep 6, 2018 · 12 comments
Closed

Comments

@pshipton
Copy link
Member

pshipton commented Sep 6, 2018

As per #2718, jdk11 plinux compiled with gcc 7.3 cannot load classes from the jimage file, unless -Djdk.image.use.jvm.map=false is specified.

10:53:17 java.lang.ExceptionInInitializerError
10:53:17 	at java.lang.J9VMInternals.ensureError(java.base@11-internal/J9VMInternals.java:191)
10:53:17 	at java.lang.J9VMInternals.recordInitializationFailure(java.base@11-internal/J9VMInternals.java:180)
10:53:17 	at jdk.internal.module.SystemModuleFinders$SystemModuleReader.findImageLocation(java.base@11-internal/SystemModuleFinders.java:426)
10:53:17 	at jdk.internal.module.SystemModuleFinders$SystemModuleReader.find(java.base@11-internal/SystemModuleFinders.java:437)
10:53:17 	at jdk.internal.loader.BuiltinClassLoader$2.run(java.base@11-internal/BuiltinClassLoader.java:576)
10:53:17 	at jdk.internal.loader.BuiltinClassLoader$2.run(java.base@11-internal/BuiltinClassLoader.java:571)
10:53:17 	at java.security.AccessController.doPrivileged(java.base@11-internal/AccessController.java:696)
10:53:17 	at jdk.internal.loader.BuiltinClassLoader.findMiscResource(java.base@11-internal/BuiltinClassLoader.java:570)
10:53:17 	at jdk.internal.loader.BuiltinClassLoader.findResource(java.base@11-internal/BuiltinClassLoader.java:457)
10:53:17 	at jdk.internal.loader.BootLoader.findResource(java.base@11-internal/BootLoader.java:154)
10:53:17 	at java.lang.ClassLoader.getResource(java.base@11-internal/ClassLoader.java:692)
10:53:17 	at java.lang.ClassLoader.getResource(java.base@11-internal/ClassLoader.java:694)
10:53:17 	at org.apache.tools.ant.launch.Locator.getResourceSource(Locator.java:140)
10:53:17 	at org.apache.tools.ant.launch.Locator.getClassSource(Locator.java:118)
10:53:17 	at org.apache.tools.ant.launch.Launcher.run(Launcher.java:180)
10:53:17 	at org.apache.tools.ant.launch.Launcher.main(Launcher.java:112)
10:53:17 Caused by: java.io.UncheckedIOException: java.io.IOException: "/home/jenkins/jenkins-agent/workspace/Test-extended.functional-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/lib/modules" is not an image file
10:53:17 	at jdk.internal.jimage.ImageReaderFactory.getImageReader(java.base@11-internal/ImageReaderFactory.java:87)
10:53:17 	at jdk.internal.module.SystemModuleFinders$SystemImage.<clinit>(java.base@11-internal/SystemModuleFinders.java:383)
10:53:17 	... 14 more
10:53:17 Caused by: java.io.IOException: "/home/jenkins/jenkins-agent/workspace/Test-extended.functional-JDK11-linux_ppc-64_cmprssptrs_le/openjdkbinary/j2sdk-image/lib/modules" is not an image file
10:53:17 	at jdk.internal.jimage.BasicImageReader.readHeader(java.base@11-internal/BasicImageReader.java:192)
10:53:17 	at jdk.internal.jimage.BasicImageReader.<init>(java.base@11-internal/BasicImageReader.java:152)
10:53:17 	at jdk.internal.jimage.ImageReader$SharedImageReader.<init>(java.base@11-internal/ImageReader.java:224)
10:53:17 	at jdk.internal.jimage.ImageReader$SharedImageReader.open(java.base@11-internal/ImageReader.java:238)
10:53:17 	at jdk.internal.jimage.ImageReader.open(java.base@11-internal/ImageReader.java:67)
10:53:17 	at jdk.internal.jimage.ImageReader.open(java.base@11-internal/ImageReader.java:71)
10:53:17 	at jdk.internal.jimage.ImageReaderFactory$1.apply(java.base@11-internal/ImageReaderFactory.java:70)
10:53:17 	at jdk.internal.jimage.ImageReaderFactory$1.apply(java.base@11-internal/ImageReaderFactory.java:67)
10:53:17 	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(java.base@11-internal/ConcurrentHashMap.java:1705)
10:53:17 	at jdk.internal.jimage.ImageReaderFactory.get(java.base@11-internal/ImageReaderFactory.java:61)
10:53:17 	at jdk.internal.jimage.ImageReaderFactory.getImageReader(java.base@11-internal/ImageReaderFactory.java:85)
10:53:17 	... 15 more
@pshipton
Copy link
Member Author

pshipton commented Sep 6, 2018

Notes from an old investigation of a similar problem that ended up being a compiler mis-match.

5. Bain, Peter (P.D.), May 16, 2017, 2:12 PM
The header is messed up: 0000000 da da fe ca 00 00 01 00 00 00 00 00 31 8c 00 00
Should be:
if (result.getMagic() != ImageHeader.MAGIC) {
           throw new IOException("\"" + name + "\" is not an image file");
       }

public static final int MAGIC = 0xCAFEDADA;

6. Bain, Peter (P.D.), May 16, 2017, 2:13 PM
But dumping word-by-word:
0000000 cafedada 00010000 00000000 00008c31

7. Bain, Peter (P.D.), May 16, 2017, 2:47 PM
jdk.internal.misc.Unsafe.getUnsafe().isBigEndian() == false.

8. Bain, Peter (P.D.), May 17, 2017, 2:34 PM
BasicImageReader is mis-reading the module file.
Opening the file using FileInputStream and MappedByteBuffer, and reading the first int,
I get the correct magic number (0xcafedada).  However, the magic number returned from ImageHeader.readFrom()
is a totally unrelated value 0xe8410018.  The image file doesn't even contain that value.

9. Bain, Peter (P.D.), May 17, 2017, 2:46 PM
Forcing BasicImageReader to use the Java file mapper gives the correct result. NativeImageBuffer.getNativeMap() may be broken.

10. Bain, Peter (P.D.), May 17, 2017, 4:11 PM
The native method creates a DirectByteBuffer on the memory-mapped image file. Verified that the memory itself has
the correct data.  However, reading from the resulting ByteBuffer gives the wrong result.

11. Bain, Peter (P.D.), May 18, 2017, 11:10 AM
The address passed to DirectByteBuffer's constructor was wrong.
Added a printf on the address to jnicsup.cpp:newDirectByteBuffer
and it works.

12. Bain, Peter (P.D.), May 18, 2017, 1:14 PM
Weirdness: instrumented the native call to create a direct byte buffer, and the DBB constructor:
jnicsup.cpp 775 PDB_DEBUG newDirectByteBuffer address=0x3ffea0580000
jnicsup.cpp 785 PDB_DEBUG newDirectByteBuffer call java_nio_Buffer
PDB_DEBUG DirectByteBuffer 165 addr=3fffb22c7d38 #### DBB constructor:
jnicsup.cpp 787 PDB_DEBUG newDirectByteBuffer return from java_nio_Buffer

13. Bain, Peter (P.D.), May 18, 2017, 1:36 PM
/team/pdbain/defects/130167/openj9/jdk/src/java.base/share/native/libjimage/NativeImageBuffer.cpp 55 PDB_DEBUG Java_jdk_internal_jimage_NativeImageBuffer_getNativeMap address=0x3ffe75890000 *addr=cafedada
jnicsup.cpp 775 PDB_DEBUG newDirectByteBuffer address=0x3ffe75890000
jnicsup.cpp 785 PDB_DEBUG newDirectByteBuffer call java_nio_Buffer javaVM->java_nio_DirectByteBuffer=0x3fff840b4478 javaVM->java_nio_DirectByteBuffer_init=0x3fff845b85c8
PDB_DEBUG DirectByteBuffer 165 addr=3fff82917d38
jnicsup.cpp 787 PDB_DEBUG newDirectByteBuffer return from java_nio_Buffer

gdb:
(gdb) x 0x3ffe75890000
0x3ffe75890000:	0xcafedada
(gdb) x/100 0x3ffe75890000
0x3ffe75890000:	0xcafedada	0x00010000	0x00000000	0x00008c2c  ### as expected
0x3ffe75890010:	0x00008c2c	0x000b5ac1	0x000b8e01	0xfffffffd
0x3ffe75890020:	0x00000000	0x00000000	0x00000000	0x00000000


(gdb) x/100 0x3fff82917d38
0x3fff82917d38 <j9gc_objaccess_recentlyAllocatedObject+88>:	0xe8410018	0x38210020	0xe8010010	0x7c0803a6
0x3fff82917d48 <j9gc_objaccess_recentlyAllocatedObject+104>:	0x4e800020	0x00000000	0x01000900	0x00000080
0x3fff82917d58:	0x60000000	0x60420000	0x3c4c002e	0x3842dca0
0x3fff82917d68 <j9gc_objaccess_postStoreClassToClassLoader+8>:	0xe9430008	0x60000000	0xe9228158	0xe94a4590
0x3fff82917d78 <j9gc_objaccess_postStoreClassToClassLoader+24>:	0xe94a0060	0xe90a5678	0xe9480000	0xe94a0378
0x3fff82917d88 <j9gc_objaccess_postStoreClassToClassLoader+40>:	0x7faa4800	0x4d9e0020	0x7c0802a6	0x7ca62b78
0x3fff82917d98 <j9gc_objaccess_postStoreClassToClassLoader+56>:	0x7d4903a6	0x7c852378	0x7d4c5378	0x7c641b78

18. Bain, Peter (P.D.), May 19, 2017, 11:06 AM
I instrumented jnicsup.cpp:newObject().  It isn't being called by env->NewObject().
Looks like env->NewObject() is not jnicsup.cpp:newObject().  If I force newDBB to call newObject directly, things work fine.

@pdbain-ibm
Copy link
Contributor

Investigating. Downloaded and built Java 11 and looked at the compilation log per the previous investigation:
/usr/bin/g++ -DOPENJ9_BUILD -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -fno-exceptions -fno-rtti -fno-threadsafe-statics -g -DLINUX -D_REENTRANT -D_FILE_OFFSET_BITS=64 -fpic -DIPv6_FUNCTION_SUPPORT -DLINUXPPC -m64 -DLINUXPPC64 -DPPC64 -Wreturn-type -Werror -I. -I../include -I../oti -I../util -I../gc_include -I../omr/gc/include -I../shared_common/include -I../gc_glue_java -I../nls -I../omr/include_core -DUT_DIRECT_TRACE_REGISTRATION -DTR_HOST_POWER -c jnicsup.cpp

@pdbain-ibm
Copy link
Contributor

Further to above comment:

  1. Bain, Peter (P.D.), May 19, 2017, 11:06 AM
    I instrumented jnicsup.cpp:newObject(). It isn't being called by env->NewObject().
    Looks like env->NewObject() is not jnicsup.cpp:newObject(). If I force newDBB to call newObject directly, things work fine.

env->NewObject() in C++ calls newObjectV():
jobject NewObject(jclass clazz, jmethodID methodID, ...) { jobject retval; va_list parms; va_start(parms, methodID); retval = functions->NewObjectV(this, clazz, methodID, parms); va_end(parms); return retval; }

@pdbain-ibm
Copy link
Contributor

pdbain-ibm commented Sep 12, 2018

Bug in gcc version 7.3.0.

I instrumented openj9/runtime/vm/jnicsup.cpp:

static jobject JNICALL newDirectByteBuffer(JNIEnv *env, void *address, jlong capacity)
{
...
        if (actualCapacity != capacity) {
                actualCapacity = -1;
        }
    fprintf(stderr, "%s %d OPENJ9_DEBUG %s address=%p *address=%lx\n", __FILE__, __LINE__, __FUNCTION__, address, *((long*)address));
        result = newObject(env, javaVM->java_nio_DirectByteBuffer, javaVM->java_nio_DirectByteBuffer_init, (jlong)(UDATA)address, actualCapacity);

and

UDATA JNICALL   pushArguments(J9VMThread *vmThread, J9Method* method, void *args) {
        jvalue* jvalues;
...
                       case 'J':
                                /* long type */
                                lng = ARG(jlong, j);
                                lngOrDblPtr = (UDATA *) &lng;
pushLongOrDouble:
#ifdef J9VM_ENV_DATA64
                                --sp;
                                *--sp = *(lngOrDblPtr);
if (0) {
        fprintf(stderr, "%s %d OPENJ9_DEBUG %s *lngOrDblPtr=%lx **lngOrDblPtr=%lx\n", __FILE__, __LINE__, __FUNCTION__, (unsigned long) *lngOrDblPtr, *((unsigned long*)*lngOrDblPtr));
        fprintf(stderr, "%s %d OPENJ9_DEBUG %s *sp=%lx **sp=%lx\n", __FILE__, __LINE__, __FUNCTION__, (unsigned long) *sp, *((unsigned long*)*sp));
}       
#else   
                                *--sp = *(lngOrDblPtr + 1);
                                *--sp = *(lngOrDblPtr);
#endif  
                                break;

Running with the printf disabled (as above):

java  testgras
getSystemResourceAsStream during live phase
jnicsup.cpp 771 OPENJ9_DEBUG newDirectByteBuffer address=0x3fff9ca50000 *address=10000cafedada
Exception in thread "main" java.lang.NoClassDefFoundError: jdk.internal.module.SystemModuleFinders$SystemImage (initialization failure)
	at java.lang.J9VMInternals.initializationAlreadyFailed(java.base@11-internal/J9VMInternals.java:141)
	at jdk.internal.module.SystemModuleFinders$SystemModuleReader.findImageLocation(java.base@11-internal/SystemModuleFinders.java:426)

Running with the printf enabled (change to if(1)) makes it pass
issue_2788.zip
:

getSystemResourceAsStream during live phase
jnicsup.cpp 771 OPENJ9_DEBUG newDirectByteBuffer address=0x3fff7e8e0000 *address=10000cafedada
jnicsup.cpp 523 OPENJ9_DEBUG pushArguments *lngOrDblPtr=3fff7e8e0000 **lngOrDblPtr=10000cafedada
jnicsup.cpp 524 OPENJ9_DEBUG pushArguments *sp=3fff7e8e0000 **sp=10000cafedada

Relevant files attached.

@pdbain-ibm
Copy link
Contributor

Opened a Gnu bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87289

@pdbain-ibm
Copy link
Contributor

The compiler tripped over a goto from one arm of a switch statement to another in pushArguments():

				/* double type */
				dbl = ARG(jdouble, d);
				lngOrDblPtr = (UDATA *) &dbl;
				goto pushLongOrDouble;
#endif
			case 'J':
				/* long type */
				lng = ARG(jlong, j);
				lngOrDblPtr = (UDATA *) &lng;
pushLongOrDouble:
#ifdef J9VM_ENV_DATA64
				--sp;
				*--sp = *(lngOrDblPtr);
#else
				*--sp = *(lngOrDblPtr + 1);
				*--sp = *(lngOrDblPtr);
#endif
				break;

Replaced the goto with a copy of the "gone to" code and it works fine.

This code is dubious too, though I don't plan to fix it in this changeset:

			case '[':
				/* skip the rest of the signature */
				while (*sigChar == '[') sigChar++;
				if (*sigChar++ != 'L') goto dontSkipClassName;
			case 'L':
				/* skip the rest of the signature */
				while (*sigChar++ != ';');
dontSkipClassName:
				objArg = ARG(jobject, l);
				*--sp = objArg ? (UDATA)*(j9object_t*)objArg : 0;
				break;

@pdbain-ibm
Copy link
Contributor

pdbain-ibm commented Sep 13, 2018

Comment on bug:

I was able to create a small reproducer program and it shows you have a -fstrict-aliasing bug in your source code. You take the address of the double var "dbl" and cast its address to a UDATA * pointer and then use it. That violates aliasing rules. I was able to get my small reproducer program to work using the -fno-strict-aliasing option as well as using -O0 and -O1 where -fstrict-aliasing is not enabled by default.

I'll note that a quick scan through pushArguments() shows other code that looks to violate the aliasing rules too. You can either use -fno-strict-aliasing as a work around or you will need to fix your code.

@DanHeidinga
Copy link
Member

-fno-strict-aliasing

Pretty sure we build with that option on other platforms. Can you confirm that and then update the makefiles to include the option?

We'll need to do some perf testing on JDK8 (as it has the best perf test coverage) to validate that the adding the option doesn't negatively effect startup, footprint, throughput.

@pdbain-ibm
Copy link
Contributor

LinuxX86 does have -fno-strict-aliasing. Will check other platforms.

/usr/bin/g++ -DOPENJ9_BUILD -O3 -fno-strict-aliasing -fno-exceptions -fno-rtti -fno-threadsafe-statics -g -MMD -DLINUX -D_REENTRANT -D_FILE_OFFSET_BITS=64 -fPIC -DIPv6_FUNCTION_SUPPORT -DJ9HAMMER -m64 -Wreturn-type -Werror -Wall -Wno-non-virtual-dtor -I. -I../include -I../oti -I../util -I../gc_include -I../omr/gc/include -I../shared_common/include -I../gc_glue_java -I../nls -I../omr/include_core -DUT_DIRECT_TRACE_REGISTRATION -DTR_HOST_X86 -c jnicsup.cpp

I have a pull request
#2844 to address the aliasing errors in pushArguments(). Those changes fix the issue.

@pshipton
Copy link
Member Author

pshipton commented Sep 13, 2018

#2844 to address the aliasing errors in pushArguments(). Those changes fix the issue.

It might fix this particular issue, which doesn't mean we won't find more issues afterwards. How much testing have you done on a ppc le JVM, compiled with gcc 7.3, with the fix applied?

@pdbain-ibm
Copy link
Contributor

pdbain-ibm commented Sep 13, 2018

I ran a build with sanity and extended testing enabled on all platforms, but the PPCLE builds failed to run due to machine issues. GAC started an X86 build, but we should run a plinux test build.

@pshipton
Copy link
Member Author

ppc le is the platform which isn't working with gcc 7.3. There may be more aliasing problems lurking before its working. "Running a build" needs to be done with gcc 7.3 enabled in order to test this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants