Improvements related to rerun test jobs #5016

Rerun test jobs were recently enabled in Adoptium, and they have definitely helped in the latest releases. Here are some thoughts and issues we encountered during the releases.

Comments
Regarding 3 versus 1: I think we will adjust to 1 at ci.adoptium.net, since the failures we see are often machine-related, so there is no value in rerunning 3x on the same machine. Since this was our trial use of this feature, we set it to 3 to see how it would work.
After doing triage for the Jan 2024 CPU, there are several updates I intend to add to TRSS, including tracking the list of failed openjdk test cases in the TRSS database (just as we add them to TAP files). I have to investigate, but this could be done by changing how we configure jtreg, or by actively printing the TAP file contents to the console and grabbing them at the end of the job.
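As a rough sketch of the "print the TAP contents and grab them" idea, something like the following could pull the failed test case names out of a TAP file (hypothetical code, not TRSS or aqa-tests code; the TAP line format and file name are assumptions):

```python
# Minimal sketch: extract failed test names from a TAP file so they could
# be echoed to the console (and later picked up by a tool such as TRSS).
# Assumes standard TAP failure lines like: "not ok 12 - java/lang/Foo.java"
import re
from pathlib import Path

def failed_testcases(tap_file: str) -> list[str]:
    """Return the names of tests reported as 'not ok' in a TAP file."""
    failures = []
    pattern = re.compile(r"^not ok\s+\d+\s*-?\s*(.+)$")
    for line in Path(tap_file).read_text().splitlines():
        match = pattern.match(line.strip())
        if match:
            failures.append(match.group(1).strip())
    return failures

if __name__ == "__main__":
    # "SDK.tap" is a placeholder file name, not the actual job artifact name.
    for name in failed_testcases("SDK.tap"):
        print(name)  # echo to console so the CI job log captures it
```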
The rerun feature is ideally suited to environments that are more stable than ci.adoptium.net, but at the same time, if we wait for stability we may never get to try any new features.
I have been seeing extended test durations with recent builds, and have done a bit of digging into an example.
The issue is even worse for https://ci.adoptium.net/job/Test_openjdk22_hs_extended.openjdk_x86-64_alpine-linux/, which typically takes 2 days if it gets that far. As it currently stands, I'm not sure that amount of extra test run time is effective. @sophia-guo @smlambert Thoughts? Can we just rerun the "testcases"? Should we do a blanket exclude of the failing tests? The problem seems most pronounced on Alpine Linux.
Do we understand what the failures are and whether they are system-specific? That would seem to be the important thing to do the root-cause analysis on. @Haroon-Khel, are these on your radar? If they're taking longer than expected (and since it's happening on both sanity and extended, that seems likely), then it could be another example of the concurrency detection issues we've been seeing in containers.
If we let it run to completion, then we know we have a complete picture of the situation, which should assist debugging. Also, since we're only running one build a week, it shouldn't cause as much of a problem as it did when we were running things nightly 🤷 But it does need to be understood, probably as quite a high priority.
The failing tests are quite clear from the first 2 reruns; there is no need to wait for the subsequent 2 reruns! I think it does effectively highlight the problem :-) which is a bonus. I'm going to examine the failures and raise an exclude for the rogue tests.
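For reference, excludes of this kind are typically jtreg ProblemList entries of the form `<test> <issue> <platforms>`; an illustrative entry (all values are placeholders, not the actual rogue tests) might look like:

```text
# Hedged illustration of a jtreg exclude entry; test name, issue URL and
# platform are placeholders:
some/path/ExampleTest.java https://github.com/adoptium/aqa-tests/issues/XXXX linux-all
```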
For the record, I have also dropped rerunIterations from 3 to 1 in our build pipeline code (via adoptium/ci-jenkins-pipelines#929).
For openjdk tests, it should be possible to rerun individual test cases when the number of failing test cases is not large.
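A minimal sketch of that idea, assuming a failure-count threshold and an aqa-tests-style `_jdk_custom` make target (both the threshold and the exact command shapes are assumptions here, not existing rerun logic):

```python
# Hedged sketch (hypothetical helper, not aqa-tests code): only rerun
# individual failed test cases when the list is small; otherwise fall
# back to rerunning the whole test target.
MAX_INDIVIDUAL_RERUNS = 10  # illustrative threshold, not a real setting

def build_rerun_command(target: str, failed_testcases: list[str]) -> str:
    """Build a rerun command: individual test cases when the failure
    list is small, the whole target otherwise."""
    if 0 < len(failed_testcases) <= MAX_INDIVIDUAL_RERUNS:
        # Rerun only the failing test cases via a custom target.
        tests = " ".join(failed_testcases)
        return f'make _jdk_custom JDK_CUSTOM_TARGET="{tests}"'
    # Too many failures (or none recorded): rerun the whole target.
    return f"make _{target}"

print(build_rerun_command("sanity.openjdk",
                          ["java/lang/Thread/StopTest.java"]))
```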
Also related, as another suggested improvement to automatic reruns, is #4874 (use of the EXIT_SUCCESS flag).
Also related, as another suggested improvement to automatic reruns, is #4379 (acknowledge and skip test targets tagged as notRerun in the playlist).
Example:
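As an illustration (a hypothetical sketch, not the example referenced above): aqa-tests playlist entries are XML `<test>` elements, so a notRerun-style tag might look like the following, where the `<rerun>` element and the target name are assumptions rather than existing playlist fields:

```xml
<!-- Hypothetical sketch of the #4379 idea: a playlist <test> entry marked
     so that automatic reruns skip it. The <rerun> element and the target
     name are illustrative assumptions, not existing aqa-tests fields. -->
<test>
	<testCaseName>example_notRerun_target</testCaseName>
	<command>$(JAVA_COMMAND) $(JVM_OPTIONS) ExampleTest</command>
	<rerun>false</rerun>
	<levels>
		<level>sanity</level>
	</levels>
	<groups>
		<group>openjdk</group>
	</groups>
</test>
```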
Closing this, as most of the concerns have been resolved. The only remaining one no longer has any useful information; if it recurs, a separate, specific issue can be opened.