Inconsistent run times of sanity.openjdk on xLinux #1165
test-godaddy-ubuntu1604-x64-1, on which the job takes 9hrs, does so because certain tests under
This was from build 147. Build 153, which also uses test-godaddy-ubuntu1604-x64-1, has similar results
Build 151, ran on
The machines
In recent runs test-godaddy-centos7-x64-1 was running the suite slower than the other machines, although test-scaleway-ubuntu1604-x64-1 has not been unduly slow (3h4m, although that run had failures). We should keep an eye on this on a weekly basis to ensure there are no significant issues.
Just had a quick look through the run times. All are around 1hr to 1hr 15m or under, except for builds 277 and 275, both of which ran on test-scaleway-ubuntu1604-x64-1. These builds also had many
@smlambert @adam-thorpe are you aware of these failures happening on one of our machines? While I'm somewhat tempted to just decommission this machine at some point, if it's exposing a problem it would be useful to track it.
A search for jdi in the openjdk-tests repo comes up with a list of issues (though mainly for the jdi tests that are .sh scripts, not the tests that you link to above). I guess no one is triaging the sanity.openjdk suite for hotspot runs at the moment (as in trying to figure out root cause), just reporting failures in the build repo (an example where some of these test failures were reported: adoptium/temurin-build#1634 (comment)). It is somewhat telling if they only fail on certain machines; that should give a triager a place to start in terms of finding root cause.
Looking more closely at the jdi failures, they appear to be caused by `ERROR: transport error 202: bind failed: Address already in use`, i.e. some previously started process is still using the socket, so these tests are unable to set up and use it because it's already in use. Related: adoptium/TKG#45 will eventually list what processes are still present on machines (and, if possible, what resources they still have a hold on: sockets, file handles, etc.). I wonder if it's possible to get more fixes versus more reports via openjdk-build issue 1634?
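As an aside, a minimal sketch of how a pre-run check could detect this condition: attempt to bind the port and treat `EADDRINUSE` as "a stale process is still holding it". Port 8000 is purely illustrative here; whatever port the jdi tests actually use is not shown in this thread.

```python
import errno
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; EADDRINUSE means some process still holds it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            # Same underlying condition as "transport error 202: bind failed"
            return False
        raise
    finally:
        sock.close()

# 8000 is an assumed placeholder for the debug port the tests bind to.
if not port_is_free(8000):
    print("Port 8000 already in use -- a leftover process is likely holding it")
```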
No sign of processes being left on the machine (although if they were, sxaProcessCheck would have cleared it up by now) so |
Seems to be running consistently in under an hour now, but I'm running https://ci.adoptopenjdk.net/view/Build%20and%20Test%20Pipeline%20Calendar/job/Test_openjdk11_hs_sanity.openjdk_x86-64_linux/431/ on test-godaddy-ubuntu1604-x64-1 as a final check before closing this |
This may have been down to leftover processes on the machine. We've done a lot of work to resolve such situations recently, including a run of SXA-playbookCheck with the new
The above job has completed in 47 minutes, so the original issue is definitely resolved one way or another.
While looking at the status of some of the pipelines last night it became clear that we have some quite considerable differences in the run times of some of the `sanity.openjdk` jobs. We should look at whether this is a machine-specific issue and how to optimise the pipelines if there is an underlying reason. Data from https://ci.adoptopenjdk.net/view/Build%20and%20Test%20Pipeline%20Calendar/job/Test_openjdk11_hs_sanity.openjdk_x86-64_linux/buildTimeTrend:
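A minimal sketch of how the per-machine run times behind that buildTimeTrend page could be pulled programmatically via the standard Jenkins JSON API. It assumes each build exposes a `builtOn` field with the agent name, which may not hold for pipeline builds; that field, and the averaging, are illustrative assumptions, not part of this issue's data.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

JOB = ("https://ci.adoptopenjdk.net/view/Build%20and%20Test%20Pipeline%20Calendar"
       "/job/Test_openjdk11_hs_sanity.openjdk_x86-64_linux")
url = JOB + "/api/json?tree=builds[number,duration,builtOn]"

with urlopen(url) as resp:
    builds = json.load(resp)["builds"]

# Group build durations (Jenkins reports milliseconds) by the agent they ran on.
hours_by_node = defaultdict(list)
for b in builds:
    node = b.get("builtOn") or "unknown"  # builtOn may be absent for pipeline builds
    hours_by_node[node].append(b["duration"] / 3_600_000)

for node, hours in sorted(hours_by_node.items()):
    print(f"{node}: avg {sum(hours) / len(hours):.2f}h over {len(hours)} builds")
```

Something like this would make slow outliers such as test-godaddy-ubuntu1604-x64-1 stand out without reading the trend page by eye.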