Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM build failure - VMState: 0x00050080 or 0x000501ff #9416

Closed
M-Davies opened this issue Apr 30, 2020 · 63 comments · Fixed by #12259
Closed

VM build failure - VMState: 0x00050080 or 0x000501ff #9416

M-Davies opened this issue Apr 30, 2020 · 63 comments · Fixed by #12259
Labels
blocker comp:jit segfault Issues that describe segfaults / JVM crashes

Comments

@M-Davies
Copy link

Got this on https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-mac-x64-openj9/573 (build-macstadium-macos1014-1)

* For target support_jmods_java.compiler.jmod:
Assertion failed at /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/build/macosx-x86_64-normal-server-release/vm/compiler/../compiler/compile/J9SymbolReferenceTable.cpp:811: containingClass != NULL
VMState: 0x00050080
	failed to get defining class of field ref cpIndex=35 in owning method J9Method=0xc5e3660
compiling jdk/internal/org/objectweb/asm/ClassWriter.<init>(I)V at level: warm

JVMDUMP039I Processing dump event "abort", detail "" at 2020/04/30 00:59:05 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/core.20200430.005905.52000.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 52079 was not found. Expected to find core file with name "/cores/core.52079"
JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/javacore.20200430.005905.52000.0002.txt' in response to an event
JVMDUMP010I Java dump written to /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/javacore.20200430.005905.52000.0002.txt
JVMDUMP032I JVM requested Snap dump using '/Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/Snap.20200430.005905.52000.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/Snap.20200430.005905.52000.0003.trc
JVMDUMP007I JVM Requesting JIT dump using '/Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/jitdump.20200430.005905.52000.0004.dmp'
JVMDUMP010I JIT dump written to /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/make/jitdump.20200430.005905.52000.0004.dmp
JVMDUMP013I Processed dump event "abort", detail "".

* All command lines available in /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9/workspace/build/src/build/macosx-x86_64-normal-server-release/make-support/failure-logs.

Access to the machine can be granted by opening an issue at https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues

@pshipton
Copy link
Member

Also seen here #9291

@DanHeidinga DanHeidinga added the segfault Issues that describe segfaults / JVM crashes label Apr 30, 2020
@DanHeidinga
Copy link
Member

@andrewcraik @gita-omr Can someone take a look at this? It seems to be a newish assert failure. This one occurred on osx, #9291 on aix ~10 days ago.

@andrewcraik
Copy link
Contributor

@cathyzhyi might this be related to the AOT issues we have been looking at?

@cathyzhyi
Copy link
Contributor

This failure seems very likely a owning method and cpIndex mismatch issue but we would need a backtrace to tell the root cause.

@cathyzhyi
Copy link
Contributor

assigned to @Leonardo2718

@DanHeidinga
Copy link
Member

DanHeidinga commented Jun 2, 2020

@Leonardo2718 any update on this? We're coming up on the v0.21.0 code split on June 7

@Leonardo2718
Copy link
Contributor

So, I wasn't able to get very far because I couldn't get the build that the assert was triggered in. There's really not a lot that can be done with the available information, so the best thing to do is to try to capture more information the next time this failure is seen. I've been trying to think of a way of improving the assert message but haven't found one yet.

@DanHeidinga
Copy link
Member

#9291 seems very similar. Is that easier to reproduce to debug?

@Leonardo2718
Copy link
Contributor

No, #9291 has the same problem, unfortunately.

@DanHeidinga
Copy link
Member

The same problem = can't get the build?

Are you able to run the test locally or on a cloud machine to reproduce the issue? We have the nightly builds and debug images that can be downloaded from Adopt, especially for recent builds so reproducing this - locally or on a build farm - seems like the next step to me.

@Leonardo2718
Copy link
Contributor

So, the problem is that the failure is happening during the build, which makes reproducing the issue and extracting useful information very time-consuming. I'm going to try requesting access to the machine at Adopt to see if I have better luck reproducing the failure. It's probably still going to take a lot of time to figure this out, though.

@DanHeidinga
Copy link
Member

Any update on this @Leonardo2718? Do you expect it to make the 0.22.0 code split on Aug 9?

@Leonardo2718
Copy link
Contributor

@DanHeidinga

Any update on this @Leonardo2718? Do you expect it to make the 0.22.0 code split on Aug 9?

Unfortunately, I haven't been able to reproduce this so there's no chance of it making the 0.22 code split.

@DanHeidinga
Copy link
Member

I've moved it to the 0.23 milestone

@Leonardo2718
Copy link
Contributor

From #10313 (comment)

Unfortunately, it appears that core files can't be produced on MacOS (see adoptium/temurin-build#1778). Without a core file, I don't really have a way of investigating this issue.

@andrewcraik
Copy link
Contributor

I think this has to move to the backlog until the adopt issue is fixed

@andrewcraik
Copy link
Contributor

@M-Davies do we have any corefiles to debug as yet? If we don't have them then this will need to move to backlog until we can get diagnostic files since attempts at reproduction have not been successful to the best of my understaning.

@M-Davies
Copy link
Author

M-Davies commented Sep 3, 2020

@M-Davies do we have any corefiles to debug as yet? If we don't have them then this will need to move to backlog until we can get diagnostic files since attempts at reproduction have not been successful to the best of my understaning.

It's hard to pick up diagnostic files from nightly builds at the best of times and with adoptium/temurin-build#1778, it makes it near impossible. I'll chase up that issue and see what the situation is and post back here once it's resolved Looks like it's something we'll be awaiting apple to fix. See the linked issue for the full problem

@andrewcraik
Copy link
Contributor

@DanHeidinga FYI this one is missing diagnostics and so cannot be progressed - should we backlog it or close it pending diagnostics?

Leonardo2718 added a commit to Leonardo2718/openj9 that referenced this issue Feb 16, 2021
`findOrCreateShadowSymbol()` tries to reuse symrefs produced by
`findOrFabricateShadowSymbol()` when the shadow corresponds to the same
field. Detecting that the fields are in fact the same requires checking
that the defining class is the same. Normally, the expectation is that
if a cp field ref is resolved, then the cp ref pointing to the defining
class must also be resolved. However, in eclipse-openj9#9416, it appears as though
this may not always be the case, which triggers an assert. As a
temporary work around, the assert is changed to an abort compilation to
avoid taking the JVM. Because the assert is useful for detecting such
cases, this patch is only submitted for the release branch so that
further investigation can continue on the main development branch.

The mainline code uses a different assert when compiling relocatable
code to add extra debug information to the assert message in those
cases. Since this commit effectively hides any occurrences of the issue
reported in eclipse-openj9#9416, I don't believe there is any value in keeping the
extra debug information. So, I've simplified the two asserts back into a
single check that throws an exception of failure.

Signed-off-by: Leonardo Banderali <leonardo2718@protonmail.com>
@0xdaryl
Copy link
Contributor

0xdaryl commented Feb 18, 2021

Moving this out to 0.26. A failure has not been seen since adding the new instrumentation to help us narrow this down, and the workaround has been committed to the 0.25 branch already.

@dsouzai
Copy link
Contributor

dsouzai commented Feb 25, 2021

Saw this on a PR build:

[2021-02-25T22:39:22.527Z] 0x1dcdaa00: relo::fieldAttrib, 10, 0x1ddcc510, 0x1ddcc6f0
[2021-02-25T22:39:22.527Z] 0x1dcdaa00: relo::storeValRecordIfNec, Found in local list, nothing to do (0x1ddcc510, 10, 0x1ddcc6f0)
[2021-02-25T22:39:22.527Z] 0x1dcdaa00: relo::storeValRecordIfNec, classChain NULL, 0x1ddcc300, 0x1ddcc510, 10, 0x1ddcc6f0
[2021-02-25T22:39:22.527Z] 0x1dcdaa00: relo::definingClassFromCPFieldRef !valid, 0x1ddcc510, 10, 0x1ddcc6f0
[2021-02-25T22:39:22.527Z] End (9728)
[2021-02-25T22:39:22.527Z] Assertion failed at /Users/jenkins/workspace/Build_JDK11_x86-64_mac_Personal/openj9/runtime/compiler/compile/J9SymbolReferenceTable.cpp:945: containingClass != NULL
[2021-02-25T22:39:22.527Z] VMState: 0x000501ff
[2021-02-25T22:39:22.527Z] 	failed to get defining class of field ref cpIndex=10 in owning method J9Method=0x1ddcc6f0
[2021-02-25T22:39:22.527Z] 	TR_ResolvedJ9Method::definingClassFromCPFieldRef=0x1ddcc300, TR_ResolvedRelocatableJ9Method::definingClassFromCPFieldRef=0x1ddcc300
[2021-02-25T22:39:22.527Z] 	fromResolvedJ9Method=0x1ddcc300
[2021-02-25T22:39:22.527Z] 	TR_UseSymbolValidationManager=0
[2021-02-25T22:39:22.527Z] compiling sun/nio/ch/FileChannelImpl.<init>(Ljava/io/FileDescriptor;Ljava/lang/String;ZZZLjava/lang/Object;)V at level: warm
[2021-02-25T22:39:22.527Z] 

Unfortunately, the PR that added this tracing missed a change that meant we didn't trace anything in rememberClass.... sigh.

The tracing output that did show tells us is that we got the class chain the first time, and failed to get the class chain the second time. Therefore, either the ROMClass stopped being in the the SCC or findChainForClass returned NULL. The ROMClass not being in the SCC would be kind of crazy since this is is happening during the build and not in an HCR test, though this is a legitimate reason for why we could get NULL in one query and non-NULL in another. So it seems like

   sharedCacheConfig()->findSharedData(vmThread,
                                       key,
                                       keyLength,
                                       J9SHR_DATA_TYPE_AOTCLASSCHAIN,
                                       FALSE,
                                       &dataDescriptor,
                                       NULL);

is returning NULL in one instance and non-NULL in another. Will have to enable the tracing to know for sure. I'll open a PR to ensure tracing happens in rememberClass Opened #12058.

That said... as I mentioned above, with HCR it is actually possible for us to get non-NULL the first time and NULL the second time. It's possible for the class of the method we're compiling to get redefined and the compilation wouldn't be interrupted till we explicitly checked, which would not happen between the two queries. So, we're legitimately going to need to handle the possibility a null class chain.

Leonardo2718 added a commit to Leonardo2718/openj9 that referenced this issue Mar 9, 2021
`findOrCreateShadowSymbol()` tries to reuse symrefs produced by
`findOrFabricateShadowSymbol()` when the shadow corresponds to the same
field. Detecting that the fields are in fact the same requires checking
that the defining class is the same. Normally, the expectation is that
if a cp field ref is resolved, then the cp ref pointing to the defining
class must also be resolved. However, in eclipse-openj9#9416, it appears as though
this may not always be the case, which triggers an assert. As a
temporary work around, the assert is changed to an abort compilation to
avoid taking the JVM. Because the assert is useful for detecting such
cases, this patch is only submitted for the release branch so that
further investigation can continue on the main development branch.

The mainline code uses a different assert when compiling relocatable
code to add extra debug information to the assert message in those
cases. Since this commit effectively hides any occurrences of the issue
reported in eclipse-openj9#9416, I don't believe there is any value in keeping the
extra debug information. So, I've simplified the two asserts back into a
single check that throws an exception of failure.

Signed-off-by: Leonardo Banderali <leonardo2718@protonmail.com>
@pshipton
Copy link
Member

pshipton commented Mar 12, 2021

I found a failure in an internal build. No diagnostic files though.
Created adoptium/temurin-build#2528
job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-mac-x64-openj9-IBM/7

22:10:39  0x148f3a00: relo::fieldAttrib, 11, 0x149ccf70, 0x149cdcf8
22:10:39  0x148f3a00: findChainForClass 0x148f3a00, 0x149ce200, 0x3125b3a0, 0x424c9288
22:10:39  0x148f3a00: rememberClass: chain exists (0x424c9288) so nothing to store, 0x149ce200, 0x3125b3a0
22:10:39  0x148f3a00: relo::storeValRecordIfNec, Created new AOT class info 0xd3b4f80
22:10:39  0x148f3a00: findChainForClass 0x148f3a00, 0x149ce200, 0x3125b3a0, 0x0
22:10:39  0x148f3a00: rememberClass: unable to store chain 0x148f3a00, 0x3125b3a0
22:10:39  0x148f3a00: relo::storeValRecordIfNec, classChain NULL, 0x149ce200, 0x149ccf70, 11, 0x149cdcf8
22:10:39  0x148f3a00: relo::definingClassFromCPFieldRef !valid, 0x149ccf70, 11, 0x149cdcf8
22:10:39  End (9216)
22:10:39  Assertion failed at /Users/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-mac-x64-openj9-IBM/workspace/build/src/openj9/runtime/compiler/compile/J9SymbolReferenceTable.cpp:929: containingClass != NULL
22:10:39  VMState: 0x000501ff
22:10:39  	failed to get defining class of field ref cpIndex=11 in owning method J9Method=0x149cdcf8
22:10:39  	TR_ResolvedJ9Method::definingClassFromCPFieldRef=0x149ce200, TR_ResolvedRelocatableJ9Method::definingClassFromCPFieldRef=0x149ce200
22:10:39  	fromResolvedJ9Method=0x149ce200
22:10:39  	TR_UseSymbolValidationManager=0
22:10:39  compiling sun/nio/fs/UnixChannelFactory.newFileChannel(ILsun/nio/fs/UnixPath;Ljava/lang/String;Ljava/util/Set;I)Ljava/nio/channels/FileChannel; at level: warm
22:10:39  
22:10:39  Unhandled exception
22:10:39  Type=Unhandled trap vmState=0x000501ff
22:10:39  J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=00000001
22:10:39  Handler1=0000000006436290 Handler2=0000000004B7AAE0
22:10:39  RDI=000000000000A403 RSI=0000000000000005 RAX=0000000000000000 RBX=0000700005D79000
22:10:39  RCX=0000700005D6EED8 RDX=0000000000000000 R8=0000000000000017 R9=0000000000000017
22:10:39  R10=0000000000000000 R11=0000000000000287 R12=000000000000A403 R13=000000000D270980
22:10:39  R14=0000000000000005 R15=000000000000002D
22:10:39  RIP=00007FFF618E02C6 GS=0000 FS=0000 RSP=0000700005D6EED8
22:10:39  RFlags=0000000000000286 CS=0007 RBP=0000700005D6EF10 ERR=06EA481002000148
22:10:39  TRAPNO=0200014800000085 CPU=4810020001480000 FAULTVADDR=0000000006EA4810
22:10:39  XMM0 ffff000000000000 (f: 0.000000, d: nan)
22:10:39  XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:10:39  Module=/usr/lib/system/libsystem_kernel.dylib
22:10:39  Module_base_address=00007FFF618D9000 Symbol=__pthread_kill
22:10:39  Symbol_address=00007FFF618E02BC
22:10:39  
22:10:39  Method_being_compiled=sun/nio/fs/UnixChannelFactory.newFileChannel(ILsun/nio/fs/UnixPath;Ljava/lang/String;Ljava/util/Set;I)Ljava/nio/channels/FileChannel;
22:10:39  Target=2_90_20210311_7 (Mac OS X 10.14.6)
22:10:39  CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
22:10:39  ----------- Stack Backtrace -----------
22:10:39  ---------------------------------------

@dsouzai
Copy link
Contributor

dsouzai commented Mar 12, 2021

The place where findChainForClass is called is:

https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/compiler/env/J9SharedCache.cpp#L884-L901

The key that's passed into findChainForClass comes from romClass (which is also passed in as a param). The first time it's called:

22:10:39  0x148f3a00: findChainForClass 0x148f3a00, 0x149ce200, 0x3125b3a0, 0x424c9288

where the last value 0x424c9288 is the class chain returned by the findSharedData API, and second-last value 0x3125b3a0 is the romClass. This message gets printed out after the call to findSharedData:
https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/compiler/env/J9SharedCache.cpp#L1135-L1150

The second time findChainForClass is called:

22:10:39  0x148f3a00: findChainForClass 0x148f3a00, 0x149ce200, 0x3125b3a0, 0x0

notice that the last value is 0x0 even though we're passing in the same romClass 0x3125b3a0. Because we're passing in the same romClass, the key that gets generated will be the same. Therefore, findSharedData is returning different values.

As such, I don't think this is a JIT issue.

fyi @hangshao0

@hangshao0
Copy link
Contributor

hangshao0 commented Mar 12, 2021

The only possible cases that findSharedData() could return 0 for classChian are:

  1. Failed to startup the ByteDataManager.
    https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/shared_common/CacheMap.cpp#L4050-L4052

    As the first time classChain is found, it is already started up. Won't be caused by this.

  2. Lockhash Table failed:
    https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/shared_common/Manager.cpp#L414-L418

    It returns false after failed ON MONITOR_ENTER_RETRY_TIMES (10) attempts.
    https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/shared_common/Manager.cpp#L472-L484

    I'm not sure if there is anything we can do in this case.

@hangshao0
Copy link
Contributor

The documentation of j9shr_findSharedData() is:
https://github.com/eclipse/openj9/blob/269003cc717d8fb7191426c71bcbafa86b296c15/runtime/shared_common/shrinit.cpp#L1792-L1795

It won't be able to return the record in an error case.

@dsouzai
Copy link
Contributor

dsouzai commented Mar 12, 2021

Thanks @hangshao0

@0xdaryl @Leonardo2718 I think the assert should just be removed:

  1. In AOT it's possible to get different values based on the semantics of the findSharedData API.
  2. HCR means that even without AOT, it's possible to get different values (specifically non-NULL one time and NULL every time after) if classes get redefined; because J9Classes are redefined in place, we can end up with a different J9ROMClass that's not necessarily going to be in the SCC.

@0xdaryl
Copy link
Contributor

0xdaryl commented Mar 17, 2021

If the assert is removed (implying it is invalid), will the subsequent code work as expected assuming that containingClass is NULL?

https://github.com/eclipse/openj9/blob/a9838e74bba0f5935103bedcda6fc84c881804ca/runtime/compiler/compile/J9SymbolReferenceTable.cpp#L940

@Leonardo2718
Copy link
Contributor

@0xdaryl

If the assert is removed (implying it is invalid), will the subsequent code work as expected assuming that containingClass is NULL?

I don't think simply removing the assert is valid. If containingClass can be null, it can cause the JIT to think two fields from different classes are actually the same field. At the very least, I think we should abort compilation as we've been doing in the release branches.

However, I believe there's a deeper issue as well.

  1. In AOT it's possible to get different values based on the semantics of the findSharedData API.

So, to me, it seems pretty reasonable to assume that once something is resolved it stays resolved (ignoring the HCR issue for a minute). If findSharedData can cause us to violate that assumption (or make it "look like" we did), then similar issues as this could impact other parts of the JIT that make a similar assumption, perhaps in even more subtle ways if there are no asserts.

  1. HCR means that even without AOT, it's possible to get different values (specifically non-NULL one time and NULL every time after) if classes get redefined; because J9Classes are redefined in place, we can end up with a different J9ROMClass that's not necessarily going to be in the SCC.

Redefining VM structures like this from under the JIT's nose seems rather dangerous. I would think that we would probably want to abort all current compilations if a class gets redefined (at least all the compilations that might try to look at the class that got redefined). Otherwise, for similar reasons as above, assumptions made in the JIT about the stability of the data structures could be violated and lead to a crash (even if the compiled body is thrown away in the end, the JIT could crash before compilation completes).

@dsouzai
Copy link
Contributor

dsouzai commented Mar 17, 2021

A way to deal with the HCR problem, in this specific case anyway, is to check whether the comp should be interrupted before the assert - If it does then we abort the compilation; if it doesn't then the assert will only fire when we hit an inconsistent state.

One way for us to deal with the fact that the SCC API could return NULL even if it returned non-NULL previously is to cache the class chain for each J9Class. There exists some (currently inactive) infrastructure to cache the validation result of a class chain for each J9Class - this is only used in the AOT load run. I believe we can piggy-back off this infra to also just cache the class chain per J9Class.

That said, the infra uses the CH Table; this is because we already have a whole bunch of code that ensures that the CH Table maintains coherence through all of the runtime events (such as class loading/unloading, redefinition, etc.). Therefore, if we were to piggy back off the aforementioned infra, we would depend on the existence of the CHTable. Currently we disable the maintenance of the table if we disable CHOpts for whatever reason. However, there's no reason why we can't maintain the table even if we're not going to make use of it.

Therefore, if we can ensure that the CHTable will always be coherent, then we can cache the classchain and not have to worry about whether or not we're going to incorrectly fail an AOT validation we passed earlier. However, I don't have the time to do this right now; I can open an issue to describe exactly what has to be done, but it's very unlikely I'll get to it in the next month or so.

@Leonardo2718
Copy link
Contributor

So, given there doesn't appear to be an easy way to fix the underlying issue, I've opened #12259 to change the assert into a failCompilation() call and remove the diagnostic instrumentation (this is the same fix we've been putting in release branches) and #12264 and #12265 to track the work of resolving the deeper problem.

Leonardo2718 added a commit to Leonardo2718/openj9 that referenced this issue Mar 23, 2021
`findOrCreateShadowSymbol()` tries to reuse symrefs produced by
`findOrFabricateShadowSymbol()` when the shadow corresponds to the same
field. Detecting that the fields are in fact the same requires checking
that the defining class is the same. Normally, the expectation is that
if a cp field ref is resolved, then the cp ref pointing to the defining
class must also be resolved. However, in eclipse-openj9#9416, it was discovered that
this assumption does not hold in some cases. Under AOT, loading class
data from the SCC using `findSharedData()` may fail, which is propagated
to the JIT as a NULL defining class. In addition, if a class is
redefined between the point where a field is looked up and where the
defining class is looked, the latter may become NULL.

Since the code in question will not work correctly if the defining class
is null, this commit tactically replaces the assert with a call to
`failCompilation()`.

Fixes eclipse-openj9#9416

Signed-off-by: Leonardo Banderali <leonardo2718@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker comp:jit segfault Issues that describe segfaults / JVM crashes
Projects
None yet
Development

Successfully merging a pull request may close this issue.