JIT compiler crashes during inlining while compiling XML11Configuration.setFeature #9453

Closed
MNeuling opened this issue May 5, 2020 · 38 comments
Labels
comp:jit, segfault (Issues that describe segfaults / JVM crashes), userRaised

Comments

@MNeuling

MNeuling commented May 5, 2020

Java -version output

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
Eclipse OpenJ9 VM (build openj9-0.20.0, JRE 1.8.0 Windows 10 amd64-64-Bit Compressed References 20200422_667 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d3
OMR - d4365f371
JCL - 5e623848e9 based on jdk8u252-b09)

Summary of problem

On one of our PCs, an application crashes when running on the new JRE 8u252 with Eclipse OpenJ9 VM 0.20.0. The same application runs without problems on JRE 8u232 (VM 0.17.0).
No other PC appears to be affected.

The VM crashes while compiling a method. When the application is run again, the crash occurs while compiling a different method.

Diagnostic files

run1-dumps.zip
run2-dumps.zip

Please let me know if you need the core dumps. I will then upload them as a multi-part zip file.

@DanHeidinga DanHeidinga added the segfault label May 5, 2020
@DanHeidinga
Member

From the javacore in run1-dumps.zip:

1XHEXCPMODULE  Compiling method: com/sun/org/apache/xerces/internal/parsers/XML11Configuration.setFeature(Ljava/lang/String;Z)V
NULL           
1XHFLAGS       VM flags:00000000000501FF

@MNeuling I see that -Xverify:none is specified. Can you remove that option and see if the issue still occurs?

@DanHeidinga
Member

fyi @andrewcraik

@MNeuling
Author

MNeuling commented May 6, 2020

Yes, I removed -Xverify:none and now it works.

@DanHeidinga
Member

@MNeuling we don't support -Xverify:none and have a policy to not investigate crashes if they only occur when -Xverify:none is enabled.

Is that an option you can remove from your command line?

OpenJ9 offers a number of ways to get faster startup, including -Xshareclasses (enables class data and AOT code to be shared and reused between runs) and -Xquickstart (tunes the VM for fast startup at the cost of a minor throughput loss). These can provide even faster startup.

Additionally, we have a "safe" replacement for -Xverify:none called -XX:+ClassRelationshipVerifier that still verifies the bytecode but avoids the class loading triggered by the verifier.

@DanHeidinga
Member

I'm going to close this now. Please comment if you disagree and we can discuss further

@DanHeidinga DanHeidinga self-assigned this May 6, 2020
@MNeuling
Author

MNeuling commented May 7, 2020

@DanHeidinga sorry
I had to wait for a colleague of mine. He is the user of the computer where the VM crashes.
Today the VM crashed again without the -Xverify:none option.
run3.zip
run4.zip
run5.zip

@MNeuling
Author

MNeuling commented May 7, 2020

@DanHeidinga Please reopen this issue.

@DanHeidinga DanHeidinga reopened this May 7, 2020
@DanHeidinga
Member

The run3.zip shows a crash in the same phase of the JIT but with a different method:

1XHEXCPMODULE  Compiling method: java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireQueued(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;I)Z
NULL           
1XHFLAGS       VM flags:00000000000501FF

@andrewcraik Can someone on your team pick this up?
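
To compare which method was being compiled across several javacores, the `1XHEXCPMODULE` line can be extracted mechanically. The following is only a minimal sketch: the helper name `compilingMethod` is hypothetical, and it assumes the line layout shown in the excerpts in this thread (the `1XHEXCPMODULE` tag followed by `Compiling method: `).

```cpp
#include <string>

// Hypothetical helper, not part of OpenJ9.
// Returns the method signature from a javacore line of the form
// "1XHEXCPMODULE  Compiling method: <signature>", or an empty string
// when the line does not have that form.
std::string compilingMethod(const std::string &line) {
    static const std::string tag = "1XHEXCPMODULE";
    static const std::string marker = "Compiling method: ";

    // The line must start with the javacore tag.
    if (line.compare(0, tag.size(), tag) != 0) {
        return "";
    }
    // The signature is everything after the marker text.
    const std::string::size_type pos = line.find(marker);
    if (pos == std::string::npos) {
        return "";
    }
    return line.substr(pos + marker.size());
}
```

Running this over the `1XHEXCPMODULE` lines of each dump makes it easy to see that the method differs from run to run.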

@andrewcraik
Contributor

@liqunl if you could take this for now?

@liqunl
Contributor

liqunl commented May 13, 2020

@MNeuling Could you upload the core file?

@fjeremic fjeremic changed the title from "VM crashes while method compilation." to "JIT compiler crashes during inlining while compiling XML11Configuration.setFeature" May 13, 2020
@fjeremic
Contributor

Crash in inliner. @liqunl let me know if the jitdump ends up being useful or not.

bin/java -Xjit:vmstate=0x0000501FF -version
vmState [0x501ff]: {J9VMSTATE_JIT_CODEGEN} {inlining}

@liqunl
Contributor

liqunl commented May 13, 2020

@fjeremic I checked the jit dumps from three crashes; their vmstates show crashes in three different places: the inliner, instruction selection, and the loop versioner. The jit dumps are for completed replay compilations, so they're not very helpful.

@fjeremic
Contributor

@fjeremic I checked the jit dumps from three crashes; their vmstates show crashes in three different places: the inliner, instruction selection, and the loop versioner. The jit dumps are for completed replay compilations, so they're not very helpful.

Thanks. Once we have the core file I can take a look. Even better if we could get steps to reproduce. When a jitdump compilation succeeds without reproducing the failure, the cause is usually #9137.

@MNeuling
Author

@liqunl Please let me know if you need a core dump of another run.

@andrewcraik
Contributor

FYI @fjeremic since I think you were offering to look?

@liqunl
Contributor

liqunl commented May 20, 2020

FYI @fjeremic since I think you were offering to look?

I wonder if @fjeremic was talking about help in getting a useful jitdump.

I took a look at the core file. The stack trace is

00 j9vm29!classHashFn+0x4c3 [vm\vm\keyhashtable.c @ 241] 
01 j9vm29!hashTableFind+0x22 [vm\omr\util\hashtable\hashtable.c @ 444] 
02 j9vm29!hashClassTableAt+0x25 [vm\vm\keyhashtable.c @ 286] 
03 j9vm29!internalFindClassInModule+0x3f7 [vm\vm\classsupport.c @ 1130] 
04 j9vm29!internalFindClassUTF8+0x20 [vm\vm\classsupport.c @ 1108] 
05 j9jit29!jitGetClassInClassloaderFromUTF8+0x34 [vm\jit_vm\ctsupport.cpp @ 71] 
06 j9jit29!TR_J9VMBase::matchRAMclassFromROMclass+0xaf [vm\compiler\env\vmj9.cpp @ 525] 
07 j9jit29!TR_IPBCDataCallGraph::loadFromPersistentCopy+0x118 [vm\compiler\runtime\iprofiler.cpp @ 3153] 
08 j9jit29!TR_IProfiler::profilingSample+0x378 [vm\compiler\runtime\iprofiler.cpp @ 1514] 
09 j9jit29!TR_IProfiler::getCallCount+0x4a [vm\compiler\runtime\iprofiler.cpp @ 3311] 
0a j9jit29!TR_MultipleCallTargetInliner::exceedsSizeThreshold+0x6e6 [vm\compiler\optimizer\inlinertempforj9.cpp @ 4253] 

In frame 06 (TR_J9VMBase::matchRAMclassFromROMclass), the ROM class is 0x0000000010880110, which is invalid: its className field points at the ROM class itself.

> !j9romclass 0x0000000010880110
J9ROMClass at 0x10880110 {
  Fields for J9ROMClass:
        0x0: U32 romSize = 0x00003BB3 (15283)
        0x4: U32 singleScalarStaticCount = 0x00000000 (0)
        0x8: J9SRP(J9UTF8) className = !j9utf8 0x0000000010880110

The ROM class address is also outside the ROMClass range reported in the javacore:

2SCLTEXTRCS            ROMClass start address                    = 0x00000000108D9000
2SCLTEXTRCE            ROMClass end address                      = 0x000000001125A148
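
The range comparison above can be written out as a small check. This is a minimal sketch, not OpenJ9 code: `AddressRange` and `contains` are hypothetical names, and treating the reported end address as exclusive is an assumption. The constants are the values quoted in this comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a half-open address range [start, end).
// Treating the javacore end address as exclusive is an assumption.
struct AddressRange {
    std::uint64_t start;
    std::uint64_t end;
};

// Returns true when addr lies inside [range.start, range.end).
inline bool contains(const AddressRange &range, std::uint64_t addr) {
    assert(range.start <= range.end);
    return addr >= range.start && addr < range.end;
}

// Values copied from the javacore lines 2SCLTEXTRCS / 2SCLTEXTRCE.
const AddressRange kRomClassArea = {0x00000000108D9000ULL, 0x000000001125A148ULL};

// ROM class address seen in frame 06 of the crash.
const std::uint64_t kSuspectRomClass = 0x0000000010880110ULL;
```

With these values, `contains(kRomClassArea, kSuspectRomClass)` is false: the suspect address sits 0x58EF0 bytes below the reported start address.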

TR_IPBCDataCallGraph::loadFromPersistentCopy looks up the profiled class in the SCC. @dsouzai Is this the same issue as #7890?

@fjeremic
Contributor

I wonder if @fjeremic was talking about help in getting a useful jitdump.

That's right. I was referring to checking out why the jitdump generated was not useful so I can gather feedback and improve it for future issues.

@fjeremic
Contributor

@liqunl it could also be #7684 which @cathyzhyi worked on. That one was an inliner bug where the symptom was an invalid class name in the constant pool, so kind of sort of what you are seeing here. Perhaps they could be related.

@dsouzai
Contributor

dsouzai commented May 20, 2020

Is this the same issue as #7890?

Yeah it does look like the same issue. Kind of surprising given that #7890 was supposed to have fixed this.

@MNeuling, does the issue still happen if you delete the existing SCC?

@dsouzai
Contributor

dsouzai commented May 20, 2020

FWIW, a workaround for this is to run with -Xjit:dontUsePersistentIprofiler -Xaot:dontUsePersistentIprofiler

@liqunl
Contributor

liqunl commented May 20, 2020

@dsouzai profilingSample uses TR_DisableAOTWarmRunThroughputImprovement to determine whether to search in the SCC. Is TR_DisableAOTWarmRunThroughputImprovement set to true when an incompatible SCC is detected?

@dsouzai
Contributor

dsouzai commented May 20, 2020

@andrewcraik
Contributor

What's next on this? @liqunl or @dsouzai ? we need to get this one resolved or deferred... would be best to resolve...

@liqunl
Contributor

liqunl commented May 27, 2020

Sorry, I was sidetracked by other work. @dsouzai In this crash, we're calling loadFromPersistentCopy from profilingSample. profilingSample doesn't check TR::Options::sharedClassCache() before calling loadFromPersistentCopy. Maybe we should add that check?

@dsouzai
Contributor

dsouzai commented May 27, 2020

I talked to @liqunl offline, and I guess there is no guard for loading from the SCC in the call here https://github.com/eclipse/openj9/blob/05fa2d3611f757a1ca7bd45d7312f99dd60403cc/runtime/compiler/runtime/IProfiler.cpp#L1514

We think the solution might be to add the following in this if statement https://github.com/eclipse/openj9/blob/05fa2d3611f757a1ca7bd45d7312f99dd60403cc/runtime/compiler/control/rossa.cpp#L1895-L1896

TR::Options::getCmdLineOptions()->setOption(TR_DoNotUsePersistentIprofiler);
TR::Options::getAOTCmdLineOptions()->setOption(TR_DoNotUsePersistentIprofiler);
TR::Options::getCmdLineOptions()->setOption(TR_DisablePersistIProfile);
TR::Options::getAOTCmdLineOptions()->setOption(TR_DisablePersistIProfile);
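
The idea behind this proposal can be illustrated with a self-contained model. This is only a sketch under stated assumptions: `Option`, `Options`, `onIncompatibleCache`, and `lookupPersistedSample` are hypothetical stand-ins and do not reproduce the real OpenJ9 classes or the code in rossa.cpp or IProfiler.cpp. It shows only the shape of the guard: once the cache is judged unusable, both option sets are flagged, and the profiling lookup refuses to read persisted data.

```cpp
#include <set>

// Hypothetical stand-ins for the two option flags named in the proposal.
enum class Option { DoNotUsePersistentIprofiler, DisablePersistIProfile };

// Hypothetical, simplified option set; not the real TR::Options.
class Options {
public:
    void setOption(Option o) { _set.insert(o); }
    bool getOption(Option o) const { return _set.count(o) != 0; }
private:
    std::set<Option> _set;
};

// Models the proposed change: when the shared class cache cannot be used,
// disable both reading and writing of persisted profiling data for the
// JIT and AOT option sets.
void onIncompatibleCache(Options &jit, Options &aot) {
    jit.setOption(Option::DoNotUsePersistentIprofiler);
    aot.setOption(Option::DoNotUsePersistentIprofiler);
    jit.setOption(Option::DisablePersistIProfile);
    aot.setOption(Option::DisablePersistIProfile);
}

// Models the guarded lookup: returns false without touching the persisted
// data when the option forbids it; otherwise copies the value to out.
bool lookupPersistedSample(const Options &opts, int persistedValue, int &out) {
    if (opts.getOption(Option::DoNotUsePersistentIprofiler)) {
        return false;
    }
    out = persistedValue;
    return true;
}
```

In this model the lookup succeeds before `onIncompatibleCache` is called and is refused afterwards, which is the behavior the proposed option settings are meant to enforce.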

@liqunl
Contributor

liqunl commented May 27, 2020

Will open a PR once my build/test passes.

@dsouzai
Contributor

dsouzai commented May 28, 2020

@liqunl pointed out that the validation currently done at startup would in fact prevent us from going down the code path that calls loadFromPersistentCopy. Therefore, I'm kind of at a loss as to how this problem happens.

@MNeuling does the problem happen if you delete the SCC? If so, could you run with -Xshareclasses:verbose -Xjit:verbose={compilePerformance},vlog=vlog, paste what gets printed to stderr, and attach the generated vlog file?

@andrewcraik
Contributor

@dsouzai / @liqunl do we expect a fix to be delivered for 0.21 or should we move this out to 0.22?

@dsouzai
Contributor

dsouzai commented Jun 10, 2020

We should move this to 0.22; I don't think we have enough information to know what the problem is.

@pshipton
Member

Moved to 0.22

@MNeuling
Author

@liqunl pointed out that the validation currently done at startup would in fact prevent us from going down the code path that calls loadFromPersistentCopy. Therefore, I'm kind of at a loss as to how this problem happens.

@MNeuling does the problem happen if you delete the SCC? If so, could you run with -Xshareclasses:verbose -Xjit:verbose={compilePerformance},vlog=vlog, paste what gets printed to stderr, and attach the generated vlog file?

@dsouzai I'm sorry for the late reply. My colleague was out of the office for a long time, so I couldn't have him test a new run with the SCC deleted until today. Today he deleted the SCC and ran the application again without failure.

@andrewcraik
Contributor

@dsouzai is this a close, fix pending, or defer to 0.23?

@dsouzai
Contributor

dsouzai commented Aug 5, 2020

Defer to 0.23; there's been no new diagnostic information regarding this.

@andrewcraik
Contributor

Moved to 0.23 due to lack of information - if we still lack information then this may go to the backlog.

@andrewcraik
Contributor

@dsouzai this one has been deferred once. Since there has been no update and deleting the SCC cleared the failure, is this a close, backlog, or do we expect some kind of change for 0.23?

@dsouzai
Contributor

dsouzai commented Sep 22, 2020

No change expected for 0.23; I'd say this is a close until we get more diagnostic information.

@andrewcraik
Contributor

I am going to close this as no further diagnostics have been provided - we would love to get to a root cause so if the requested diagnostics can be provided please re-open.

7 participants