java/lang/ref/FinalizeOverride timeout #9651
https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_x86-64_linux_cm_Nightly/19 https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_ppc64le_linux_Nightly/1 |
https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/9 |
https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_x86-64_linux_xl_Nightly/58 https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/12 |
Ran a grinder to get a successful result. https://ci.eclipse.org/openj9/job/Grinder/907
|
@dmitripivkine can you pls take a look. |
Unfortunately, although the test contains printlns which might help diagnose the problem, the failure result doesn't show any System.out or System.err output even though System.out would contain some output. |
@smlambert @llxia to proceed to diagnose this, we need to capture a core file when the test times out. Do you know how we could do that, or do we have the source to the test harness used to run the openjdk tests that we could modify to send a |
@sophia-guo do you happen to know anything that could help us diagnose in this case? |
References for openjdk regression test harness (jtreg): You could potentially set a vm.opts or java.opts that would create a core at end of test run (which is at the point the test times out)
|
There is also sometimes additional info in the .jtr files in the test_output artifact, have you looked at those? |
Yes, we've analyzed the available information and determined the next step is to get a core file to proceed further.
The test fails intermittently. We need a solution that can be enabled for all testing until we get the core file we need. The test harness already has some handling for timeouts, i.e. running jstack, but it's not enough to figure out the cause of the problem. Probably in general it would be good to capture core files on timeouts to help diagnose the problems. I'm surprised the openjdk tests aren't already trying to do something like that. Although perhaps they are but it doesn't apply to OpenJ9. |
The source for jtreg lives here: https://hg.openjdk.java.net/code-tools/jtreg |
What is the core file name format? We are set up to collect files with the following name formats when tests fail or error. |
.dmp should cover it, the cores are named
I'll take a look and see if I can make the required mods. I don't expect openjdk is going to take mods to capture OpenJ9 core files. I'm thinking we'll need to mirror this to git and modify it. |
We allow for passing in custom jtreg binaries (needed to do this for zOS), so if you build it, there is a way to use a specialized one, though I would caution against this as a 'long-term' plan, but it may help to resolve this particular instance. |
Although I don't see it in the doc, looking at the code it seems there is support for a custom timeout handler, using |
It's working.
Although it shows |
If I can get a few pointers for where to put the code and whether we need a build.xml or a jenkins job, we can modify the OpenJ9 sanity.openjdk to use it. |
Look at the internal server: Test_openjdk11_j9_sanity.openjdk_s390x_zos_Personal/32/consoleFull. If you set the environment variable JTREG_URL to a URL where your custom jtreg build is hosted, it will be used instead of the default location we pick it up from (at AdoptOpenJDK). For example: JTREG_URL=https://artifactoryServer/artifactory/sys-rt-generic-local/UploadFile/buildId/yourCustomJtreg.tar.gz You can use the UploadFile job to push your custom jtreg build to artifactory for use. |
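The JTREG_URL override described above amounts to an environment lookup with a fallback. A minimal Java sketch of that logic follows, assuming a hypothetical `JtregLocation` helper and a placeholder default URL (the real default download location is not given in this thread):

```java
import java.util.Map;

/** Hypothetical helper; not part of the test pipeline discussed here. */
final class JtregLocation {
    // Placeholder only: the real default location is not stated in this issue.
    static final String DEFAULT_URL = "https://example.invalid/jtreg.tar.gz";

    private JtregLocation() {}

    /** Returns the JTREG_URL value when it is set and non-blank, else the default. */
    static String resolve(Map<String, String> env) {
        String custom = env.get("JTREG_URL");
        if (custom == null || custom.trim().isEmpty()) {
            return DEFAULT_URL;
        }
        return custom.trim();
    }
}
```

In a real job this would be called as `JtregLocation.resolve(System.getenv())`; taking the map as a parameter just makes the fallback easy to check in isolation.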
I don't have a custom jtreg build. The timeout handler works with the existing jtreg.jar, and is a separate jar file. I'd like to deliver the code somewhere (Oracle copyright and GPL license, with my modifications) and build the new jar, and have it available when running openjdk tests, so we can add command line arguments to use it (permanently). |
I get it now... so I guess you need to figure out where you are going to land your code first. I will stop replying in git issues now, as I am clearly too tired to read them clearly. :) |
I was hoping you would have a spot for it, but if not I could put it into https://github.com/ibmruntimes/openj9-openjdk-jdk, or at least start a discussion about doing that. |
@smlambert if the previous commit looks ok to you, I'll go ahead and create the PR and ask for a review. Would we then create a jenkins build to create the jar file? |
Yes, that looks good. We can create a Jenkins job at ci.eclipse.org/openj9 (https://ci.eclipse.org/openj9/view/Infrastructure/job/Build_JDK_Timeout_Handler/), then fetch the jar in a target in the openjdk-tests/openjdk/build.xml file. |
https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_sanity.openjdk_s390x_linux_Nightly_testList_0/158
|
Either we got unlucky, or this seems to be failing much more often all of a sudden. |
@dmitripivkine can you pls take a look since the recent failures seem to be stuck in the gc. |
Threads in the system:
For reference - related Public flags:
"AgentVMThread" is waiting on attempt to acquire Exclusive access for GC:
However currently Exclusive is owned by "Attachment portNumber: 42301":
This thread is in
@pshipton I am not sure where or why thread "Attachment portNumber: 42301" is waiting. I am also not sure whether this is an actual test problem or whether the thread is in the process of generating a javacore after the failure. |
"Attachment portNumber: 42301" is the thread used to trigger the diagnostic dumps, it's in the process of creating the diagnostics. Whereas AgentVMThread seems to be stuck in Runtime.gc(), presumably it was stuck there long enough to timeout the test, which caused the timeout handler to use AttachAPI to connect via "Attachment portNumber: 42301" and trigger dumps. |
Ok, thank you for the clarification. AgentVMThread seems to be attempting to get Exclusive and is waiting on an omr thread monitor. |
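For reference, the timeout-to-dump sequence described above (the test times out, the handler connects to the target JVM, diagnostics are triggered) can be sketched roughly as follows. This is not the handler used here: that one goes through the Attach API, whereas this sketch shells out to `jcmd`, assuming `jcmd` is on the PATH and that the target OpenJ9 JVM accepts the `Dump.system` command. `DumpOnTimeout` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch; not the timeout handler delivered for this issue. */
final class DumpOnTimeout {
    private DumpOnTimeout() {}

    /** Builds the command line that asks the target JVM for a system (core) dump. */
    static List<String> systemDumpCommand(long pid) {
        if (pid <= 0) {
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        // Assumption: jcmd is on PATH and the target JVM supports Dump.system.
        return List.of("jcmd", Long.toString(pid), "Dump.system");
    }

    /** Runs the command, forwarding its output to this process, and returns the exit code. */
    static int requestSystemDump(long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(systemDumpCommand(pid)).inheritIO().start();
        return process.waitFor();
    }
}
```

Note that, as the thread dump in this issue shows, a dump request made while another thread is waiting for exclusive VM access adds its own threads to the picture, so the dump thread should not be mistaken for the cause of the hang.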
Looking to the source of |
I wonder whether this might not be a hang in AgentVMThread, but rather a large number of back-to-back System GCs that this thread suddenly initiated. And now we have two Global collections under a single call umbrella, which makes the situation worse. |
Yes, the test is looping waiting for finalization.
|
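For context, a wait-for-finalization loop generally has the shape below. This is a sketch and not the FinalizeOverride source; `FinalizationWait` and its parameters are made up for illustration. Bounding the number of attempts keeps one object that never gets finalized from producing an unbounded run of back-to-back System GCs.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not taken from FinalizeOverride.java. */
final class FinalizationWait {
    private FinalizationWait() {}

    /**
     * Requests a GC and pauses, repeatedly, until {@code done} reports true
     * or {@code maxAttempts} requests have been made.
     *
     * @return true if the condition was observed, false on give-up or interrupt
     */
    static boolean await(BooleanSupplier done, int maxAttempts, long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (done.getAsBoolean()) {
                return true;
            }
            System.gc(); // only a request; each call may start a global collection
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        // One last check after the final pause.
        return done.getAsBoolean();
    }
}
```

A test would pass a supplier that reads a flag set by the object's `finalize()` method, and fail with a clear message when `await` returns false instead of looping until the harness timeout fires.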
Unfortunately, DDR cannot walk the thread's Java stack. |
Found better DDR tool:
Object
|
@hzongaro I guess this is another case where |
Looking at the weekly Semeru build results, it seems the FinalizeOverride test has recently been failing most of the time on jdk11+, even with -Xjit:enableAggressiveLiveness. We weren't running the special.openjdk suite that contains this test across all platforms until this past week (ibmruntimes/ci-jenkins-pipelines#206). Looking at one variant on xmac, the failure history is FAILED 2024-02-17 |
Also observed on JDK22 aarch64_linux
|
Temporarily disabled jdk_lang_ref_FinalizeOverride_j9 via adoptium/aqa-tests#5124 |
It is still the same picture: the object that the test is waiting on to be finalized has a hard root from an o-slot despite
|
There is one more failure for the same test in this build. Again, the same:
|
For the record: this test doesn't have |
https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly/61
java/lang/ref/FinalizeOverride.java