
[NETBEANS-5478]: DownloadBinaries task: 5 secs connect time is not enough #2826

Merged · 3 commits · Apr 1, 2021

Conversation

@lbruun (Contributor) commented Mar 22, 2021

This is an improvement to the NetBeans build system, specifically the task which downloads binary prerequisites from Maven Central or https://netbeans.osuosl.org/binaries.

As per the explanation in NETBEANS-5478, a hard-coded connect timeout of 5 seconds will not work in some build environments because of the "network first-touch penalty" (I'm looking at you, GitHub Actions (= Azure)). This PR makes the connect timeout configurable, so that it is possible to do:

ant -DdownloadBinaries.connectTimeoutMs=30000 ....

(which will set the connect timeout to 30 seconds)

Current behavior is not changed, i.e. the default is still 5000 ms, which I personally find low-ish, but which has admittedly worked well for most users over the years. It just doesn't work in environments that need a warm-up.

Note: this was very hard to track down, because DownloadBinaries.java swallows exceptions; see line 212 and line 309. This PR does not fix that, in order to keep the PR to a single topic.

The current hard-coded 5-second value simply isn't enough on some platforms.
The value can now be overridden from the Ant command line, such as:

ant -DdownloadBinaries.connectTimeoutMs=30000 ....

(which will set the connect timeout to 30 seconds)
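The property-override pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the actual DownloadBinaries code: the class name, constant names, and the fallback-on-malformed-value behavior are assumptions; only the property name `downloadBinaries.connectTimeoutMs` and the 5000 ms default come from the PR.

```java
// Hypothetical sketch: read the connect timeout from a system/Ant property,
// falling back to a default when the property is unset or unparsable.
public class TimeoutSketch {
    static final String PROP_NAME = "downloadBinaries.connectTimeoutMs";
    static final int DEFAULT_TIMEOUT_MS = 5_000;  // pre-PR default; later raised to 15s

    static int connectTimeoutMs() {
        String v = System.getProperty(PROP_NAME);
        if (v == null) {
            return DEFAULT_TIMEOUT_MS;
        }
        try {
            return Integer.parseInt(v.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MS;  // ignore malformed values
        }
    }

    public static void main(String[] args) {
        System.clearProperty(PROP_NAME);
        System.out.println(connectTimeoutMs());  // prints 5000
        // Simulates: ant -DdownloadBinaries.connectTimeoutMs=30000 ...
        System.setProperty(PROP_NAME, "30000");
        System.out.println(connectTimeoutMs());  // prints 30000
    }
}
```

Ant passes `-D` definitions through as JVM system properties, which is why `System.getProperty` sees the command-line value.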
@matthiasblaesing (Contributor) commented:

I'm mildly surprised that in the current cloud world a connect time of 5 s is not enough. I'm wondering, though, whether it would make sense to modify the CI tasks to take advantage of this. Apart from that, the change looks sane to me.

@neilcsmith-net (Member) commented:

If this is the cause of the various CI errors we get with downloading, what's the downside of making the default much longer (e.g. 30 s)? Otherwise we'd have to modify tasks in at least 3 different places.

I should have stopped it swallowing that exception when I edited that loop recently. Caching and logging the last exception would make sense.
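The "caching and logging the last exception" idea above could look roughly like this. Everything here is hypothetical (the `Downloader` interface, method names, and structure are invented for illustration); it is not the actual DownloadBinaries retry loop, only a sketch of the suggested fix:

```java
// Sketch: try each source in turn, remember the last failure instead of
// swallowing it, and surface it if every attempt fails.
import java.io.IOException;
import java.util.List;

public class RetryWithLastException {

    interface Downloader {
        byte[] download() throws IOException;
    }

    static byte[] downloadFromAny(List<Downloader> sources) throws IOException {
        IOException last = null;
        for (Downloader d : sources) {
            try {
                return d.download();
            } catch (IOException e) {
                last = e;  // cache the failure rather than discarding it
            }
        }
        // All attempts failed: report the last cause instead of failing silently.
        throw (last != null) ? last : new IOException("no sources configured");
    }

    public static void main(String[] args) throws IOException {
        Downloader failing = () -> { throw new IOException("mirror down"); };
        Downloader working = () -> new byte[] {1, 2, 3};
        byte[] data = downloadFromAny(List.of(failing, working));
        System.out.println(data.length);  // prints 3
    }
}
```

The point is that a caller who sees all sources fail gets the real `IOException` (e.g. a connect timeout) rather than a generic "could not download" with no cause, which is exactly what made this issue hard to track down.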

@lbruun (Contributor, Author) commented Mar 22, 2021

> I'm mildly surprised that in the current cloud world a connect time of 5 s is not enough

Me too. However, cloud PaaS platforms (I suspect GitHub Actions uses some Azure PaaS voodoo) are known to use serverless components as part of their infrastructure; for example, an outbound firewall may be based on Azure Functions and thus carry a cold-start penalty. To make matters worse, this infrastructure may be shared between ephemeral environments, so one run may experience the problem while another doesn't; it will seem random. Just speculation, though.

For me the problem has escalated over the past 1-2 months; before that it was a very, very rare occurrence. So something may have changed on GitHub's side, I don't know. Quite a few people report something similar in the GitHub issue tracker.

It is also worth mentioning why the existing NetBeans GitHub Actions jobs do not - to my knowledge - experience a problem in this respect: they use the GitHub Actions cache component. As long as this cache has been built successfully once, no downloads will be attempted by the Ant script (the DownloadBinaries task will see that the artifact is already on disk). The cache is reused forever: it is used at least every 7 days and will therefore never fall victim to GitHub's deletion strategy. There's no problem per se in the build jobs using this cache, as long as external artifact versions are immutable. But pray we never have to rebuild it.

@lbruun (Contributor, Author) commented Mar 22, 2021

> If this is the cause of the various CI errors we get with downloading, what's the downside of making the default much longer (e.g. 30 s)? Given we'd have to modify tasks in at least 3 different places otherwise.

Nothing, really. I just didn't want to touch the default in case someone was relying on the current fail-fast behavior.

5 seconds is indeed very short. For example, a URLConnection on Windows will use 21 seconds as its default connect timeout (URLConnection delegates to the OS if the timeout is not set explicitly). On CentOS 7 it is roughly 15 seconds.
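The OS-delegation behavior described above is visible in the API: a minimal demo, assuming only standard `java.net` classes (the Maven Central URL is just an example; constructing the connection object does not touch the network). `getConnectTimeout()` returns 0 until a timeout is set explicitly, and 0 means "no Java-level timeout", at which point the OS-level TCP connect timeout applies.

```java
// Demo: a URLConnection reports a connect timeout of 0 ("not set",
// so the OS default applies) until one is set explicitly.
import java.net.URL;
import java.net.URLConnection;

public class ConnectTimeoutDemo {
    public static void main(String[] args) throws Exception {
        URLConnection conn = new URL("https://repo.maven.apache.org/").openConnection();
        System.out.println(conn.getConnectTimeout());  // prints 0 (OS default applies)
        conn.setConnectTimeout(15_000);                // explicit 15-second timeout
        System.out.println(conn.getConnectTimeout());  // prints 15000
    }
}
```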

@jlahoda (Contributor) commented Mar 22, 2021

Thanks for doing this! I ran into this problem several times but never got around to submitting a PR, so this is very welcome. I'd be +1 on making the timeout longer; 15 s would seem fine to me, but it's not a really strong opinion.

@lbruun (Contributor, Author) commented Mar 22, 2021

I've now updated the PR: the default connect timeout is now 15 seconds, up from 5. I think this is more than sufficient, so existing CI build jobs do not have to change in order to benefit from this.

@lbruun (Contributor, Author) commented Apr 1, 2021

Merging this one. There seems to be agreement.

@lbruun lbruun merged commit 3ea8571 into apache:master Apr 1, 2021
@lbruun lbruun deleted the NETBEANS-5478-DownloadBinaries-improvement branch April 1, 2021 20:09