Inconsistent run times of sanity.openjdk on xLinux #1165

Closed
sxa opened this issue Feb 21, 2020 · 11 comments

@sxa
Member

sxa commented Feb 21, 2020

While looking at the status of some of the pipelines last night it became clear that we have quite considerable differences in the run times of some of the sanity.openjdk jobs. We should look at whether this is a machine-specific issue, and how to optimise the pipelines if there is an underlying reason.

Data from https://ci.adoptopenjdk.net/view/Build%20and%20Test%20Pipeline%20Calendar/job/Test_openjdk11_hs_sanity.openjdk_x86-64_linux/buildTimeTrend:

| Build | Duration | Agent |
| --- | --- | --- |
| 153 | 9 hr 3 min | test-godaddy-ubuntu1604-x64-1 |
| 152 | 2 hr 17 min | test-packet-ubuntu1604-x64-1 |
| 151 | 4 hr 30 min | test-scaleway-ubuntu1604-x64-1 |
| 150 | 1 hr 28 min | test-godaddy-centos7-x64-1 |
| 149 | 2 hr 38 min | test-softlayer-ubuntu1604-x64-1 |
| 148 | 2 hr 4 min | test-godaddy-ubuntu1604-x64-3 |
| 147 | 9 hr 0 min | test-godaddy-ubuntu1604-x64-1 |
| 146 | 2 hr 6 min | test-godaddy-debian8-x64-2 |
| 145 | 2 hr 45 min | test-godaddy-debian8-x64-3 |
| 144 | 2 hr 24 min | test-packet-ubuntu1604-x64-1 |
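A minimal sketch of pulling the same trend data and grouping it by agent, which makes machine-specific slowness easier to spot. It assumes the standard Jenkins JSON API and that these Test_* builds expose `duration` and `builtOn` fields (true for freestyle builds; other job types may not report `builtOn`):

```python
import json
import urllib.request
from collections import defaultdict

JOB = ("https://ci.adoptopenjdk.net/job/"
       "Test_openjdk11_hs_sanity.openjdk_x86-64_linux")

# Ask Jenkins only for the fields we need: build number, duration (ms), agent.
url = JOB + "/api/json?tree=builds[number,duration,builtOn]"
with urllib.request.urlopen(url) as resp:
    builds = json.load(resp)["builds"]

by_agent = defaultdict(list)
for b in builds:
    if b.get("duration"):  # skip in-progress builds (duration 0)
        by_agent[b.get("builtOn") or "?"].append(b["duration"] / 3.6e6)  # hours

for agent, hours in sorted(by_agent.items()):
    print(f"{agent:40} runs={len(hours):3} "
          f"avg={sum(hours) / len(hours):.1f}h max={max(hours):.1f}h")
```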
@Haroon-Khel
Contributor

Haroon-Khel commented Feb 25, 2020

test-godaddy-ubuntu1604-x64-1, on which the job takes 9 hrs, does so because certain tests under net, nio and rmi fail. These failures usually involve test cases which return 'Connection timed out' errors.

| Test | Duration | Status | Skip | Todo |
| --- | --- | --- | --- | --- |
| jdk_io_0 | 1 min 39 sec | OK | No | No |
| jdk_lang_0 | 14 min | OK | No | No |
| jdk_math_0 | 1 min 55 sec | OK | No | No |
| jdk_net_0 | 1 hr 40 min | NOT OK | No | No |
| jdk_nio_0 | 2 hr 47 min | NOT OK | No | No |
| jdk_security1_0 | 2 min 38 sec | OK | No | No |
| jdk_util_0 | 12 min | OK | No | No |
| jdk_rmi_0 | 3 hr 55 min | NOT OK | No | No |
| jdk_native_sanity_0 | 13 sec | OK | No | No |

This was from build 147. Build 153, which also ran on test-godaddy-ubuntu1604-x64-1, has similar results.
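Those 'Connection timed out' failures are what stretch a ~2 hour run towards 9 hours: each affected test case blocks for the full TCP timeout before failing. A hypothetical sketch of the kind of connectivity probe one could run on a suspect machine (hosts, ports and the timeout are illustrative only, not what the tests actually use):

```python
import socket
import time

def probe(host: str, port: int, timeout: float = 10.0) -> None:
    """Attempt a TCP connect and report how long it took to succeed or fail."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print(f"{host}:{port} reachable in {time.monotonic() - start:.2f}s")
    except OSError as exc:
        print(f"{host}:{port} failed after {time.monotonic() - start:.2f}s: {exc}")

# Illustrative targets; the jdk_net/nio/rmi tests mostly talk to the loopback
# interface and the machine's own hostname, so misconfiguration there is the
# usual suspect.
for target in [("localhost", 80), (socket.gethostname(), 22)]:
    probe(*target)
```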

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 2, 2020

Build 151, which ran on test-scaleway-ubuntu1604-x64-1, was a bit odd. Only two tests failed, java/net/Inet6Address/B6206527.java.B6206527 and java/net/ipv6tests/B6521014.java.B6521014, which took 0.4 and 0.18 seconds respectively, yet the build took 4.5 hours.

@Haroon-Khel
Contributor

Haroon-Khel commented Mar 2, 2020

The machines test-scaleway-ubuntu1604-x64-1 and test-godaddy-ubuntu1604-x64-1 seem to be the only machines which give job times outside of the norm. The typical job time seems to be 1.5 to 2.5 hours, so anything outside of this range should be considered odd.
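Treating that 1.5-2.5 hour band as a rough heuristic (an observation from this issue, not a formal threshold), flagging the odd runs from the trend data is trivial:

```python
def is_odd(duration_hours: float, low: float = 1.5, high: float = 2.5) -> bool:
    """Flag a sanity.openjdk run whose duration falls outside the usual band."""
    return not (low <= duration_hours <= high)

# e.g. builds 153 (9h 3m) and 151 (4h 30m) are flagged, build 148 (2h 4m) is not
assert is_odd(9.05) and is_odd(4.5) and not is_odd(2.07)
```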

@karianna karianna moved this from TODO to In Progress in infrastructure Mar 15, 2020
@sxa
Member Author

sxa commented Apr 2, 2020

In recent runs test-godaddy-centos7-x64-1 was running the suite more slowly than the other machines, although test-scaleway-ubuntu1604-x64-1 has not been unduly slow (3h 4m, although that run had failures). We should keep an eye on this on a weekly basis to ensure there are no significant issues.

@Haroon-Khel
Contributor

Haroon-Khel commented Jun 23, 2020

Just had a quick look through the run times. All are around 1 hr - 1 hr 15 min or under, except for builds 277 and 275, both of which ran on test-scaleway-ubuntu1604-x64-1. These builds also had many com/sun/jdi test failures.

@sxa
Member Author

sxa commented Jun 23, 2020

@smlambert @adam-thorpe are you aware of those failures happening on one of our machines? While I'm somewhat tempted to just decommission this machine at some point, if it's exposing a problem it would be useful to track it.

@smlambert
Contributor

smlambert commented Jun 23, 2020

Searching jdi in the openjdk-tests repo comes up with a list of issues (though mainly for the jdi tests that are .sh scripts and not the tests linked to above).

I guess no one is triaging the sanity.openjdk suite for hotspot runs at the moment (as in trying to figure out root cause), just reporting failures in the build repo (an example where some of these test failures were reported is adoptium/temurin-build#1634 (comment)). It is somewhat telling if they only fail on certain machines; that should give a triager a place to start in terms of finding root cause.

@smlambert
Contributor

smlambert commented Jun 23, 2020

Looking more closely at the jdi failures, they look to be caused by `ERROR: transport error 202: bind failed: Address already in use`, as in some previously started process is still using the socket, so these tests are unable to set up and use the socket because it's already in use.
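A minimal sketch of what that error means at the socket level: if an earlier process is still bound to the port the debugger transport wants, a fresh bind on the same port fails in exactly the way reported above (the port number below is illustrative, not the one JDWP picks):

```python
import errno
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; EADDRINUSE means another process still holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise

print(port_is_free(8000))  # illustrative port only
```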

Related: adoptium/TKG#45 will eventually list what processes are still present on machines (and, if possible, what resources they still have a hold on: sockets, file handles, etc.).

I wonder if it's possible to get more fixes versus more reports via openjdk-build issue 1634?

@sxa
Member Author

sxa commented Jun 24, 2020

No sign of processes being left on the machine (although if they were, sxaProcessCheck would have cleared them up by now), so I'm running https://ci.adoptopenjdk.net/job/Grinder/3441 and https://ci.adoptopenjdk.net/job/Grinder/3443 (3441 failed to copy artifacts as the upstream build job had been cleaned) and will look at the machine afterwards.

@sxa
Member Author

sxa commented Nov 18, 2020

Seems to be running consistently in under an hour now, but I'm running https://ci.adoptopenjdk.net/view/Build%20and%20Test%20Pipeline%20Calendar/job/Test_openjdk11_hs_sanity.openjdk_x86-64_linux/431/ on test-godaddy-ubuntu1604-x64-1 as a final check before closing this.

@sxa
Member Author

sxa commented Nov 18, 2020

This may have been down to leftover processes on the machine. We've done a lot of work recently to resolve such situations, including a run of SXA-platybookCheck with the new kill -KILL option, which cleared up three jobs from extended.system runs at the start of November (so they wouldn't have been around when the initial analysis for this issue was done), but it may have been a cause.
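For illustration only, a hypothetical sketch of that kind of cleanup: find stale test processes by command-line pattern and send them SIGKILL (the equivalent of `kill -KILL`). The pattern, filtering and dry-run default here are assumptions; the real cleanup job applies much more careful checks before killing anything:

```python
import os
import signal
import subprocess

def kill_stale(pattern: str = "jtreg", dry_run: bool = True) -> None:
    """SIGKILL processes whose command line matches `pattern`.

    Illustrative only -- a real cleanup would also filter on process age,
    owning user, and so on before killing anything.
    """
    found = subprocess.run(["pgrep", "-f", pattern],
                           capture_output=True, text=True)
    for line in found.stdout.split():
        pid = int(line)
        if pid == os.getpid():
            continue                      # never kill ourselves
        print(("would kill" if dry_run else "killing"), pid)
        if not dry_run:
            os.kill(pid, signal.SIGKILL)  # same signal as `kill -KILL <pid>`

kill_stale("jtreg", dry_run=True)
```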

The above job completed in 47 minutes, so the original issue is definitely resolved one way or another.

@sxa sxa closed this as completed Nov 18, 2020
infrastructure automation moved this from In Progress to Done Nov 18, 2020
@karianna karianna added this to the November 2020 milestone Nov 19, 2020