perf-macstadium-macos1015-x64-1 & 2 unable to download dacapo.jar #1363
Comments
reference original issue: adoptium/temurin-build#1808 |
@andrew-m-leonard If you need to do this in the future please let me know as I can move the issues between repos if required instead of opening a new one |
I was wondering why openjdk-infrastructure does not appear in the GitHub "transfer issue" dropdown? Seemed a bit weird! |
yep it is:
|
Right ... That suggests you probably don't have commit rights in this repo then :-) |
Failed on zLinux this morning! |
You could take the approach of grabbing a copy of that binary and putting it on Jenkins master. We do this for freetype and some other libs. |
Agree, this can be staged on Jenkins master, we do this for core test dependencies, can have a job that pulls perf dependencies. (It is odd that it is only 2 machines that have issues with downloading from the location... so it does make me wonder if they will still have the issue fetching from the Adopt Jenkins server). |
ah I see, presume you're talking about this sort of thing: |
I'll try a fix to perf/build.xml that checks whether dacapo.jar already exists (in this case, downloaded from Jenkins master) |
maybe add a check for DACAPO_URL |
Ya, recognizing this is a suboptimal way to resolve it (if it is indeed a fix for a problem seen on only 2 machines, it seems we are not solving the core issue). We would be changing the location of the dacapo URL; the implication is that anyone who runs AQA perf testing anywhere would now fetch it from Adopt Jenkins, which had better always be 'up' (and where we would now need a mechanism for keeping it up to date, which we got for free by pulling the latest dacapo from sourceforge). Suppose we could update perf/dacapo/build.xml to try sourceforge and, if that fails, fall back to Adopt Jenkins. |
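For illustration, the "try sourceforge, fall back to a cached copy" idea could look roughly like the shell sketch below. This is not the change that went into perf/dacapo/build.xml; the function names, the `DOWNLOADER` override and `CACHED_DACAPO_URL` are hypothetical, and the `-f` flag is an addition to the `-Lks` options seen in the build log so that an HTTP error makes curl exit non-zero.

```shell
# Default downloader: the curl options from the build log (-L follow redirects,
# -k skip certificate verification, -s silent) plus -f, so an HTTP error status
# yields a non-zero exit code instead of a saved error page.
curl_fetch() {
    curl -fLks "$1" -o "$2"
}

# fetch_with_fallback DEST URL [URL...]
# Tries each URL in order and stops at the first one that produces a
# non-empty file. DOWNLOADER is a hypothetical hook for swapping the fetcher.
fetch_with_fallback() {
    fb_dest=$1
    shift
    for fb_url in "$@"; do
        if "${DOWNLOADER:-curl_fetch}" "$fb_url" "$fb_dest" && [ -s "$fb_dest" ]; then
            echo "fetched $fb_dest from $fb_url"
            return 0
        fi
        echo "download from $fb_url failed, trying next source" >&2
        rm -f "$fb_dest"   # do not leave a partial file behind
    done
    return 1
}

# Example call (not executed here); CACHED_DACAPO_URL is a placeholder for
# wherever a cached copy of the jar is staged:
# fetch_with_fallback dacapo.jar \
#     "https://sourceforge.net/projects/dacapobench/files/latest/download" \
#     "$CACHED_DACAPO_URL"
```

Keeping the public location first means the cached copy is only used when the primary download fails, so the cache does not need to be perfectly current for normal runs.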
The problem occurred on a zLinux box as well this morning. I think it's possibly an issue with the host server: https://sourceforge.net/projects/dacapobench/files/latest/download |
here's my suggestion:
|
The dacapo download link is not direct, wondering if we use their suggested direct link: |
They have an async javascript download timer, wondering if something "glitches" in that...? |
Ya, maybe it's flaky because of the redirect mechanism (we are using curl options for this; presume the same version of curl is on all machines...). Vague recollection that we had to request an update a year or two ago, as an older version of curl did not have a particular option that was needed. |
We use the redirect link, so when a newer version of dacapo is uploaded, we just 'get it', but we could hard-code to v9.12 and see how that looks when Grinderized on the machines in question. |
curl seems up to date:
|
@smlambert which way would you like to try first:
or
|
Likely the first option, so we go to the benchmarks public location to get it. Option 3 is use the current approach, and if it fails to redirect and find latest file, pull it from a cached version at Adopt server. In either case, do we have Grinder stats for how frequently this occurs on one of these machines? That way we can grind to see if we ever do hit the same type of problem with either of these approaches in xx number of runs. For 2nd option, I can upload the jar file with the UploadFile job and we can Grind to see how it fares. (https://ci.adoptopenjdk.net/view/Test_grinder/job/UploadFile/22/artifact/upload/dacapo-9.12-MR1-bach.jar) |
I suspect the failure occurs when the network route to the sourceforge.net download is slow or bad in some way... I searched back through the Test.perf jobs and found the problem is not confined to certain slaves. It has happened on the perf mac machines, a zLinux machine, an aarch64 machine and also several xLinux machines
Found an aarch64 failure: https://ci.adoptopenjdk.net/view/Test_perf/job/Test_openjdk11_j9_sanity.perf_aarch64_linux/111/console
Another zLinux: https://ci.adoptopenjdk.net/view/Test_perf/job/Test_openjdk14_hs_sanity.perf_s390x_linux/92/console |
Wondering whether we ought to have a generic "curl-with-retry" ant download task that all fetches could use...? A retry would ride out transient network issues and glitches and save a lot of failed builds. |
This sounds ideal? https://ant.apache.org/manual/Tasks/retry.html
|
I was thinking the same thing re: a generic retry task in ant that we eventually convert all test fetches to use. I had not yet looked to see whether one exists. Shall we try that approach and see the outcome? |
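As a rough sketch of the retry idea, here is the same pattern as a plain shell wrapper. It is not the Ant `retry` task linked above and not the change that was merged; the function name and its arguments are made up for illustration.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
# On giving up, returns the exit code of the last failed attempt.
retry() {
    rt_max=$1
    rt_delay=$2
    shift 2
    rt_attempt=1
    while :; do
        "$@" && return 0
        rt_rc=$?
        if [ "$rt_attempt" -ge "$rt_max" ]; then
            echo "giving up after $rt_attempt attempts (last rc=$rt_rc)" >&2
            return "$rt_rc"
        fi
        echo "attempt $rt_attempt failed (rc=$rt_rc), retrying in ${rt_delay}s" >&2
        sleep "$rt_delay"
        rt_attempt=$((rt_attempt + 1))
    done
}

# Example call (not executed here), reusing the curl command from the build log:
# retry 3 5 curl -Lks https://sourceforge.net/projects/dacapobench/files/latest/download -o dacapo.jar
```

Returning the last exit code keeps values such as 7 or 35 visible to whatever calls the wrapper, so a persistent failure can still be diagnosed.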
I've tested the ant retry task on a couple of dozen grinders, no issues at all, no network failures either unfortunately (Sod's law! It will happen on release day!) |
With the aforementioned PR merged, it looks like the job is now passing, so I shall close this |
Platform:
Mac
https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.perf_x86-64_mac_xl/174/console
18:48:13 getDacapoSuite:
18:48:13 [echo] curl -Lks https://sourceforge.net/projects/dacapobench/files/latest/download -o dacapo.jar
18:49:36
18:49:36 BUILD FAILED
18:49:36 /Users/jenkins/workspace/Test_openjdk8_j9_sanity.perf_x86-64_mac_xl/openjdk-tests/TKG/scripts/build_test.xml:58: The following error occurred while executing this line:
18:49:36 /Users/jenkins/workspace/Test_openjdk8_j9_sanity.perf_x86-64_mac_xl/openjdk-tests/perf/build.xml:31: The following error occurred while executing this line:
18:49:36 /Users/jenkins/workspace/Test_openjdk8_j9_sanity.perf_x86-64_mac_xl/openjdk-tests/perf/dacapo/build.xml:44: The following error occurred while executing this line:
18:49:36 /Users/jenkins/workspace/Test_openjdk8_j9_sanity.perf_x86-64_mac_xl/openjdk-tests/perf/dacapo/build.xml:32: exec returned: 7
Tried rebuilding: same issue, although sometimes rc=35
rc 7 : CURLE_COULDNT_CONNECT (7) Failed to connect() to host or proxy.
rc 35 : CURLE_SSL_CONNECT_ERROR (35) A problem occurred somewhere in the SSL/TLS handshake. You really want the error buffer and read the message there as it pinpoints the problem slightly more. Could be certificates (file formats, paths, permissions), passwords, and others.
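Because the job runs curl with `-s`, the log only shows `exec returned: 7` with no message; adding `-S` (`--show-error`) alongside `-s` would make curl print the error text. Separately, a small helper along these lines could translate the exit codes for the log. This is a sketch only: the function name is hypothetical, and code 28 is an assumption added for the case where a timeout option such as `--max-time` is set (it does not appear in the logs above).

```shell
# describe_curl_rc RC
# Prints a short hint for a curl exit code. Codes 7 and 35 are the ones seen
# in this issue; 28 is included as an assumption for runs with a timeout set.
describe_curl_rc() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "CURLE_COULDNT_CONNECT: failed to connect to host or proxy" ;;
        28) echo "CURLE_OPERATION_TIMEDOUT: the transfer timed out" ;;
        35) echo "CURLE_SSL_CONNECT_ERROR: problem during the SSL/TLS handshake" ;;
        *)  echo "curl exit code $1: see the EXIT CODES section of the curl man page" ;;
    esac
}

# Example use after a download attempt (not executed here):
# curl -LksS "$url" -o dacapo.jar || describe_curl_rc "$?" >&2
```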
Testing on other mac slaves it seems fine...
Running a simple test via the Scripting console:
shows it seems to be downloading but VERY VERY slowly, and seemingly sometimes fails as a result