Unhandled exception Type=Segmentation error vmState=0x00020011 #10541
Comments
We need a system core dump to investigate and will be happy to do so. However, meanwhile this failure reminds me of a known Class Loading Verification problem: the JVM detects an incompatible class and leaves it partially loaded with init state "failed", and the GC crashes because the class object pointer is left NULL. (I know this is not an adequate reaction from the JVM, but the problem really is in the class libraries.) @tajila Would you please point to the related issue? Do you have any suggestions on how to find the incompatible class, if that is the case? |
Hi @dmitripivkine, I am still looking into this, sadly (of course!) it only fails on our production server that I don't have access to, which means modifying the build to not throw everything away, get that promoted to prod… you know the drill ;-) Hopefully I should be able to get this sorted out in the next day or so. |
@dmitripivkine, so I have a dump file, however it has embedded passwords (passed in via the environment), so I am uncomfortable uploading this tmp file, at least publicly. Is there any easy way to "sanitise" the |
I can see two ways we can handle this problem.
You should see something like:
and use command
You can try a speculative shortcut: get the registers from the crash, like:
The goal of this exercise is to find the problematic partially initialized class. The first slot of
If you can see eye-catcher
If it is a problematic case you should see |
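The concrete commands from the walkthrough above did not survive in this copy of the thread. As a rough, assumed sketch of the same kind of jdmpview session (the core file name and addresses are placeholders, not values from this issue), it could look like:

```
# open the system core produced at the crash (file name is a placeholder)
jdmpview -core core.20200914.094455.65828.0001.dmp

# look at the failing thread and its registers
> info thread

# then walk the suspect object and its class; the addresses here are hypothetical
> !j9object 0x00000000fff9c148
> !j9class 0x0000000001a93600
```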
Let's begin with the easy step first then 👍 Snap.20200914.094455.65828.0003.trc.gz Following along with your walkthrough:
Looks like a j9class… so:
Right, so that means that The second is in a precompiled JAR, which has been working perfectly under Java 8 (OpenJ9) for quite a while. However, it was modified in the past with ProGuard; maybe the stricter checking in Java 11 means that this JAR file is now detected as invalid? |
It looks like the problem is
|
I took a look at the snap trace. There is not enough data in there to determine why the class was not able to be initialized. If the problem is easily reproducible can you re-run with |
Thanks @tajila, sadly this is all too reproducible for me ;-) I'll see if I can run this tomorrow with these arguments. |
Hmm, right, well, I've just kicked off a build; that'll take a few hours to fail, so I'll try to update this ticket tomorrow. However, I just noticed that this is failing during the compilation of one of our targets. It has managed to compile this target four times, and fails on the fifth attempt. Each time it is compiling (sans timestamps) the same thing. |
Here is the requested I have double checked: as far as I can tell, Java dies while the Java compiler is being called (ecj via ant), and the Edit: Note that the offsets are slightly different, however it is the same class that appears to be causing the issue. The offset in the latest dump was 0x1A93600. |
@tajila, did this help to determine why this is failing? I am at a loss as to how this could happen, and why we see it at random points in our build cycle. Is there anything that I can do to help move this issue forward? |
@pwagland The last time we encountered an issue similar to this, it was because the class failed to load properly due to a classloading constraint error. The reason the crash occurs at random times is that the bad class may be scanned at any time by the GC (assuming that is the case). We are currently working on a solution to fix the cause of the crash, but that may take some time. In the short term I suggest first determining whether the scenario I described is indeed the problem in this case, and if so, fixing the root cause (finding the classload error). I was not able to get many more details from the last snaptrace you sent. The time at which the class fails to load and the time at which the crash occurs may be far enough apart that the bad class is removed from the trace buffers. The following trace option targets the specific cases which I think may be causing the error.
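The actual trace option from this comment is missing in this copy of the thread. Purely as an illustration of the general -Xtrace shape (the component and file name here are guesses, not the option that was originally suggested), an invocation might look like:

```
# illustrative only: trace one component at maximal detail and write it to a file
java -Xtrace:maximal=j9vm,output=build-trace.trc <usual build invocation>
```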
While we wait for those results, in the core files that you already have can you try using jdmpview:
If there are multiple versions of this class then this may indicate a classloading constraint problem that I am suspecting. If you see multiple classes try |
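The suggested jdmpview commands are also missing from this copy of the thread. Assuming the !classforname DDR extension is available in your build, one plausible way to look for multiple versions of the class is the following (the class name is illustrative):

```
# list every loaded J9Class with this name; more than one hit can mean
# that several loaders each defined their own version of the class
> !classforname com/example/LoggerFactory

# then inspect each hit individually
> !j9class <address-from-the-list-above>
```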
Hi @tajila, right, I will kick off a new build with the Xtrace that you asked for, you should see results from that tomorrow. As to the
Some of these have a classObject, and some don't:
These are from the second trace, where we died processing 0x1a93600, which indeed isn't initialised:
|
Thinking about this some more, and figuring that too much information is probably better than not enough, I will also add that there are 38 classes loaded, and also 38 different class loaders, at least based on the 27 of the classes were
|
Thanks for the info. In the core file above, what is the output of:
Also if you compare |
All of the
|
Right, digging around a little further I see that (probably?) all of the This ClassLoader works perfectly when compiling under Java 8, but somehow appears to be causing some form of failure when run with OpenJ9 on our server; again, I cannot reproduce this locally at all, neither under Docker, nor on my Mac. :-( |
Okay, the fact that the vtable is empty for the bad class suggests a classloading constraints issue. To confirm, try:
The romClasses need to be identical or else class loading constraints are not met. |
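As a sketch of that comparison (the good-class address below is hypothetical; the bad-class address is the one mentioned elsewhere in the thread), the romClass field reported by !j9class is the thing to compare:

```
# good class (hypothetical address) -- note the romClass field in the output
> !j9class 0x0000000000f3a900

# bad class (address mentioned elsewhere in the thread)
> !j9class 0x00000000008b3b00

# if the romClass pointers differ, dump each one to compare the actual ROM class data
> !j9romclass <romClass-address-from-the-output-above>
```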
They are not identical:
Is there any way to see from the dump which file a class was loaded from? I think that these two classes are probably being loaded from different files, so they might be different. I think (but this is why I would like to figure out which JAR files the respective class files were loaded from) that files loaded by Ant are actually executed, while the files loaded by the other classLoader are only used for compilation, and are not actually run. |
In the log file, it does indicate that there is a constraint violation:
I'm not sure that I 100% understand this message; this is saying that the This might be possible, since |
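As background for why the two loaders matter here, a minimal sketch (jar paths and class name are hypothetical placeholders): the same fully-qualified name defined by two different loaders produces two distinct classes, and a loading constraint violation is the VM refusing to treat them as one when a shared method signature requires it.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderConstraintDemo {
    public static void main(String[] args) throws Exception {
        // Two loaders, each with its own copy of the "same" class.
        // The jar paths are hypothetical placeholders.
        URLClassLoader a = new URLClassLoader(new URL[] { new URL("file:/tmp/libA.jar") }, null);
        URLClassLoader b = new URLClassLoader(new URL[] { new URL("file:/tmp/libB.jar") }, null);

        Class<?> fromA = Class.forName("com.example.LoggerFactory", false, a);
        Class<?> fromB = Class.forName("com.example.LoggerFactory", false, b);

        // Same name, different defining loader => two distinct runtime types.
        System.out.println(fromA == fromB);            // false
        System.out.println(fromA.getClassLoader());    // loader a
        System.out.println(fromB.getClassLoader());    // loader b
    }
}
```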
Based on the responses in this issue, I have managed to create a workaround to the JVM crashing. However, I am not sure whether I am just working around the issue, or whether I am actually fixing an issue with our code. If this didn't crash the JVM, what exception would/should I expect to see? For reference, the source:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class FallbackClassLoader
    extends URLClassLoader
{
    private final ClassLoader fallback;

    public FallbackClassLoader(URL[] urls, ClassLoader newFallBack)
    {
        super(urls, null);
        fallback = newFallBack;
    }

    @Override
    protected Class<?> findClass(String name)
        throws ClassNotFoundException
    {
        try
        {
            if (!"<…>.LoggerFactory".equals(name))
            {
                return super.findClass(name);
            }
        }
        catch (LinkageError e)
        {
            // Could not load the class, so fall back to the one from the alternative loader.
            if (!LinkageError.class.getName().equals(e.getClass().getName()))
            {
                throw e;
            }
        }
        return fallback.loadClass(name);
    }
}
```

Adding However, what constraint is being violated? The running code never sees a |
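Purely for illustration (the jar URL, the parent loader, and the task class name below are placeholders, not taken from the actual build), the loader might be wired up something like this:

```java
import java.net.URL;

public class FallbackLoaderUsage {
    public static void main(String[] args) throws Exception {
        // Hypothetical wiring: classes are looked up in the listed jar first,
        // and only the excluded LoggerFactory class (or anything that hits a
        // LinkageError) falls back to the surrounding loader, e.g. Ant's loader.
        URL[] compileClasspath = { new URL("file:/build/lib/precompiled.jar") };
        ClassLoader antLoader = Thread.currentThread().getContextClassLoader();

        FallbackClassLoader loader = new FallbackClassLoader(compileClasspath, antLoader);
        Class<?> cls = loader.loadClass("com.example.SomeTask");
        System.out.println(cls + " loaded by " + cls.getClassLoader());
    }
}
```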
Yes, and I believe this is the next step. So we will be able to find the original file for the good class, but I'm not sure we can do the same for the bad class. Note: By "good" I simply mean the first class that was loaded, and "bad" is the second version. Let's start with the good class: Then dump protectionDomain: Then dump codesource: Then dump location: Then dump path: Dump file to confirm: So that's where the good class came from. Perhaps this info may help you find where the bad class is coming from. Since we don't have the classObject for the bad class, it will be more difficult, as that data is thrown away after the loading constraint violation. The only way I can think of finding it right now is by creating a patched build for you to try. If you like I can create that for you. |
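For completeness, the same protectionDomain → codeSource → location chain can also be read at runtime, which may help when trying to reproduce outside the dump; the class name below is a placeholder:

```java
public class WhereIsThisClassFrom {
    public static void main(String[] args) throws Exception {
        // Runtime equivalent of the protectionDomain -> codeSource -> location
        // chain walked in the dump; the class name is a placeholder.
        Class<?> cls = Class.forName("com.example.LoggerFactory");
        java.security.CodeSource cs = cls.getProtectionDomain().getCodeSource();
        System.out.println(cs != null ? cs.getLocation() : "no code source (bootstrap or generated class)");
    }
}
```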
What it is saying is that the loading of the good class was initiated by Now, the bad class classload is being initiated by a different loader; it may have the same name, but it will be a different instance of the loader. You can check by doing: !j9class 0x8b3b00 (bad class), then dump the classloader object with You run |
If the classloaders of the good and bad classes are of different types, that may help us identify where the problem is coming from. |
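A rough sketch of that check (only the bad-class address comes from the thread; the other addresses have to be read from the output of the previous command):

```
# bad class (address from the thread) -- note its classLoader field
> !j9class 0x00000000008b3b00

# follow that pointer to the J9ClassLoader, then to its java/lang/ClassLoader object
> !j9classloader <classLoader-address-from-above>
> !j9object <classLoaderObject-address-from-above>

# repeat for the good class and compare the types of the two loader objects
```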
@pwagland The fix is currently being reviewed, then it has to be tested. I'll try to get it in for the 0.23 milestone (deadline Oct 1), which should be released sometime in late October. Otherwise, you can get a binary from the AdoptOpenJDK nightlies as soon as the PR is merged. |
When a class fails loading due to a classloading constraint error, the ramclass remains in the class memory segments in an unverified state. This causes a crash in the GC because the GC expects all ramclasses to have classObjects assigned. Delaying the linking of the class into the memory segment list and subclass traversal list until after the class passes loading constraint checks will ensure that the GC or any other component cannot iterate an invalid class. Fixes: eclipse-openj9#10541 Signed-off-by: Tobi Ajila <atobia@ca.ibm.com>
Java -version output
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d3
OMR - d4365f371
JCL - 838028fc9d based on jdk-11.0.7+10)
Summary of problem
We are in the process of migrating from using Java 8 to using Java 11 to compile our code. When compiling our code using Java 11, it fails, seemingly reproducibly in the same place.
Diagnostic files
Our other crash looks almost identical except for a few registers, and the last two lines of the Stack Backtrace are:
Sadly, since this is a build server, the generated dump files are being automatically cleaned up. I will try to see if our build plans can be modified to make these available.
OutOfMemoryError: Java Heap Space
This doesn't appear to be related to OutOfMemory; however, it might be. How can we tell?