Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenJDK java/util/WeakHashMap/GCDuringIteration.java failed to do a "full" gc #10256

Closed
pshipton opened this issue Jul 27, 2020 · 25 comments
Closed
Labels
comp:test perm excluded The test is unsuitable and permanently excluded. test failure

Comments

@pshipton
Copy link
Member

pshipton commented Jul 27, 2020

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly/121
java/util/WeakHashMap/GCDuringIteration.java
ub16-ppcle-3

java.lang.AssertionError: failed to do a "full" gc
	at GCDuringIteration.forceFullGc(GCDuringIteration.java:74)
	at GCDuringIteration.gcAwait(GCDuringIteration.java:81)
	at GCDuringIteration.test(GCDuringIteration.java:241)
	at GCDuringIteration.instanceMain(GCDuringIteration.java:263)
	at GCDuringIteration.main(GCDuringIteration.java:261)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.base/java.lang.Thread.run(Thread.java:836)
java.lang.AssertionError: Some tests failed
	at GCDuringIteration.instanceMain(GCDuringIteration.java:265)
	at GCDuringIteration.main(GCDuringIteration.java:261)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.base/java.lang.Thread.run(Thread.java:836)
@pshipton pshipton added this to To do in JTReg failures via automation Jul 27, 2020
@pshipton
Copy link
Member Author

@dmitripivkine fyi

@pshipton
Copy link
Member Author

pshipton commented Jul 27, 2020

30x grinder using latest Adopt build https://ci.eclipse.org/openj9/job/Grinder/992

@dmitripivkine
Copy link
Contributor

@0xdaryl @andrewcraik
As @cedrichansen found out #9043 (with reference to #8897) is similar to this problem (just use to be failure for ARM).
Would you please advice possible course of actions? Is it a JIT related test?

@andrewcraik
Copy link
Contributor

I don't know anything about the test so I am not sure

@0xdaryl
Copy link
Contributor

0xdaryl commented Aug 5, 2020

I wrote up a description of the problem on AArch64 here -> #8897 (comment)

I don't recall the exact name of the method, but the problem occurred when that method was compiled with optLevel cold which does not run an aggressive liveness optimization (the reason why this is a problem is described in that link above).

To see if this is the same issue you can try grinding by adding -Xjit:enableAggressiveLiveness to the options.

Any test that expects precise behaviour from the GC to occur at certain (un-speced) points is not really a valid test in my opinion.

@dmitripivkine
Copy link
Contributor

dmitripivkine commented Aug 5, 2020

@0xdaryl Thank you very much!
I found source for the test https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/openj9/test/jdk/java/util/WeakHashMap/GCDuringIteration.java

There is a misleading message from assertion. The real problem is:
Test creates Weak Ref and run System GC in a loop expecting reference to be Finalized but it never happen.
I observed this problem earlier in #9651 , #8872 and a few more cases - object can not be sent to Finalization because it kept alive by O-slot in JIT frame.
This test does not generate system core to examine but this is exactly the same pattern in the test. I believe there is no reason to waste time for extra investigation. This failure is quite intermittent, so how many times we need to iterate with -Xjit:enableAggressiveLiveness to confirm it works?
I am not sure should test be excluded again.

@0xdaryl
Copy link
Contributor

0xdaryl commented Aug 6, 2020

In the AArch64 case, the failure rate was about 1/25. The test didn't take too long to run as I recall so I was able to let it run hundreds of times in a script with -Xjit:enableAggressliveLiveness to demonstrate no failures. Your mileage may vary though.

@dmitripivkine
Copy link
Contributor

@cedrichansen Would you please run grinders "as is" to determine failure rate and with provided option?

@cedrichansen
Copy link
Contributor

Yes, setting up grinders now 👍🏻

@cedrichansen
Copy link
Contributor

Ran GCDuringIteration 200x through grinder (with no jvm options provided) and no failures were recorded. After speaking with @dmitripivkine , this may be a slight difference in test environment (grinder tests running only GCDuringIteration vs nightly eclipse builds which run a larger set of tests) which might cause a different level of jit optimization to be used as the tests are run, and this resulted in different fail rate for the test in question.

@JasonFengJ9
Copy link
Member

Another occurrence at https://ci.eclipse.org/openj9/job/Test_openjdk15_j9_sanity.openjdk_s390x_linux_Nightly/51/console

01:55:26  STDERR:
01:55:26  java.lang.AssertionError: failed to do a "full" gc
01:55:26  	at GCDuringIteration.forceFullGc(GCDuringIteration.java:74)
01:55:26  	at GCDuringIteration.gcAwait(GCDuringIteration.java:81)
01:55:26  	at GCDuringIteration.test(GCDuringIteration.java:232)
01:55:26  	at GCDuringIteration.instanceMain(GCDuringIteration.java:263)
01:55:26  	at GCDuringIteration.main(GCDuringIteration.java:261)
01:55:26  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
01:55:26  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
01:55:26  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
01:55:26  	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
01:55:26  	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
01:55:26  	at java.base/java.lang.Thread.run(Thread.java:845)
01:55:26  java.lang.AssertionError: Some tests failed
01:55:26  	at GCDuringIteration.instanceMain(GCDuringIteration.java:265)
01:55:26  	at GCDuringIteration.main(GCDuringIteration.java:261)
01:55:26  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
01:55:26  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
01:55:26  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
01:55:26  	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
01:55:26  	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
01:55:26  	at java.base/java.lang.Thread.run(Thread.java:845)

@pshipton
Copy link
Member Author

@pshipton
Copy link
Member Author

pshipton commented Oct 1, 2020

@pshipton
Copy link
Member Author

@andrew-m-leonard
Copy link
Contributor

@0xdaryl
Copy link
Contributor

0xdaryl commented Oct 23, 2020

If the ultimate issue here is that the test is relying on un-spec'ed behaviour to succeed and that success only happens when the test is run at a "high enough" optimization level then is there any value in continuing to run this arguably invalid test as part of our standard suite of tests?

I haven't been following this issue, but what is the next course of action expected here? It certainly can't be to run the aggressive liveness optimization at lower optimization levels. Perhaps the next action should be to exclude the test so that it doesn't continue to consume triaging resources.

@pshipton
Copy link
Member Author

@pshipton
Copy link
Member Author

pshipton commented Nov 6, 2020

@pshipton
Copy link
Member Author

@pshipton
Copy link
Member Author

@pshipton
Copy link
Member Author

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 3, 2020

I am reclassifying this failure as a test issue because based on our investigation the JIT is working as designed. As mentioned above, it would seem we should either fix the test case or stop running it.

@0xdaryl 0xdaryl added comp:test and removed comp:jit labels Dec 3, 2020
pshipton added a commit to pshipton/openjdk-tests that referenced this issue Dec 3, 2020
Issue eclipse-openj9/openj9#10256

Signed-off-by: Peter Shipton <Peter_Shipton@ca.ibm.com>
@pshipton
Copy link
Member Author

pshipton commented Dec 3, 2020

Created adoptium/aqa-tests#2083 to exclude the test for jdk11, 15, 16.

@pshipton pshipton closed this as completed Dec 3, 2020
JTReg failures automation moved this from To do to Done Dec 3, 2020
pshipton added a commit to pshipton/openjdk-tests that referenced this issue Dec 3, 2020
Issue eclipse-openj9/openj9#10256

Signed-off-by: Peter Shipton <Peter_Shipton@ca.ibm.com>
smlambert pushed a commit to adoptium/aqa-tests that referenced this issue Dec 3, 2020
Issue eclipse-openj9/openj9#10256

Signed-off-by: Peter Shipton <Peter_Shipton@ca.ibm.com>
@pshipton pshipton added perm excluded The test is unsuitable and permanently excluded. and removed test excluded labels Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:test perm excluded The test is unsuitable and permanently excluded. test failure
Projects
Development

No branches or pull requests

7 participants