-
Notifications
You must be signed in to change notification settings - Fork 823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NETBEANS-5478]: DownloadBinaries task: 5 secs connect time is not enough #2826
[NETBEANS-5478]: DownloadBinaries task: 5 secs connect time is not enough #2826
Conversation
The current harcoded 5 seconds value simply isn't enough on some platforms. The value can now be overridden from the Ant command line, such as: ant -DdownloadBinaries.connectTimeoutMs=30000 .... (which will set the connect timeout to 30 secs)
I'm mildly surprised, that in the current cloud world a connect time of 5s is not enough, I'm wondering though, whether it would make sense to make use of this by modifying the CI tasks to take advantage of this. The change looks sane to me apart from that. |
If this is the cause of the various CI errors we get with downloading, what's the downside of making the default much longer (eg. 30s)? Given we'd have to modify tasks in at least 3 different places otherwise. I should have stopped it swallowing that exception when I edited that loop recently. Caching and logging the last exception would make sense. |
Me too. However Cloud PaaS platforms (I suspect that GitHub Actions use some Azure PaaS wodoo) are known to use serverless as part of their infra, for example an outbound FW may be based on Azure Functions and thus have a cold start penalty. Now, to make matters worse, this infra may be shared between ephemeral environments, so one run may experience the problem while another doesn't. It will seem random. Just speculation, though. For me the problem has escalated over the past 1-2 months, before that it was a very, very rare occurrence. So something may have changed on the GitHub side, dunno. There are quite a few people reporting something similar in the GitHub issue tracker. It is also worth mentioning why the existing NetBeans GitHub Actions jobs do not - to my knowledge - experience a problem in this respect: they use the GitHub Actions cache component. As long as this cache has been build successfully once, no downloads will be attempted by the Ant script (the |
Nothing, really. I just didn't want to touch the default in case someone was relying on the current fail-fast behavior. 5 seconds is indeed very small. For example, an |
Thanks for doing this! I ran into this problem a several times, but never really got to submitting a PR, this is very welcome. I'd be +1 on making the timeout longer - 15s would seem fine to me, but not a really strong opinion. |
I've now updated the PR: the default connect timeout is now 15 secs, up from 5 secs. I think this is more than sufficient that we do not have to change existing CI build jobs in order to benefit from the change. |
Merging this one. There seems to be agreement. |
This is an improvement on the NetBeans build system, specifically the task which downloads binary prerequisites from Central Maven or https://netbeans.osuosl.org/binaries.
As per the explanation in NETBEANS-5478 a hard-coded connect timeout of 5 seconds will not work in some build environments because of the "network first touch penalty" (I'm looking at you GitHub Actions (= Azure)). This PR makes the connect timeout configurable so that it will be possible to do:
ant -DdownloadBinaries.connectTimeoutMs=30000 ....
(which will set the connect timeout to 30 seconds)
Current behavior is not changed, i.e the default is 5000 mseconds ... which I personally find low'ish, but admittedly has worked well for most users over the years. It just doesn't work in environments which need "warmup".
Note: This was very hard to track down. The reason is that
DownloadBinaries.java
swallows exceptions, see line 212 and line 309. This PR does not fix that in order to stay on a single topic for the PR.