Increase the number of retries (1->3) #444
Conversation
Thanks for your pull request, @Geod24!
@Geod24 I hope this won't cause a retry if a test suite failure is actually caused by a bug and not a networking failure?
@WalterBright : That's the downside of it - it will.
The obvious question - can we get a proper fix?
Given that it can be difficult even for a human to decide whether a bug is a Heisenbug or not, what would be the algorithm to determine that automatically?
Fine with me, the bill for those runners is fairly small. I did lower it to 1 in the past since many PR problems were not intermittent, but some are, and human time is quite valuable.
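For context, a minimal sketch of what the blunt retry amounts to, assuming the limit is simply the number of times a failed job gets re-run regardless of why it failed; the `run_job` callable and its exit-status convention are illustrative, not the actual CI plumbing:

```python
# A minimal sketch of a blunt retry: after a failed job, simply re-run it up
# to `retries` more times, no matter why it failed. `run_job` is a
# hypothetical callable returning the job's exit status (0 = success).
def run_with_retries(run_job, retries=3):
    status = run_job()                 # initial attempt
    for attempt in range(retries):
        if status == 0:
            return 0                   # success, stop retrying
        print(f"retry {attempt + 1}/{retries} after exit status {status}")
        status = run_job()
    return status                      # final status once retries are exhausted
```

With the old limit of 1 an intermittent failure had a single extra chance; bumping it to 3 just gives a flaky job a couple more attempts before the build is marked red, at the cost of also re-running genuinely broken test suites (the downside mentioned above).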
@MartinNowak : Perhaps you could take a look at https://github.com/dlang/ci/blob/master/buildkite/Dockerfile so contributors could run an agent as well? I have a few servers that I would gladly use as permanent runners.
All networking errors would be a great first approximation.
Obviously, yes. The question is how to determine if a failure is networking-related. For example, IIRC some (all?)
Over here dlang/dmd#12409 (comment) the failure is:
Surely that's detectable.
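As a rough illustration of the "smart retry" being discussed, here is a sketch that only retries when the captured job log matches common network-failure patterns; both the pattern list and the `run_job` callable are assumptions for illustration, not anything the CI currently does:

```python
import re

# Hypothetical patterns suggesting a transient networking failure rather than
# a genuine test failure; illustrative only, not exhaustive.
NETWORK_ERROR_PATTERNS = [
    r"connection (?:refused|reset|timed out)",
    r"could not resolve host",
    r"tls handshake (?:failed|timeout)",
    r"curl: \(\d+\)",
    r"fatal: unable to access",        # git fetch/clone failures
]

def looks_like_network_failure(log: str) -> bool:
    """Return True if the job log matches a known transient-failure pattern."""
    return any(re.search(p, log, re.IGNORECASE) for p in NETWORK_ERROR_PATTERNS)

def smart_retry(run_job, retries=3):
    # `run_job` is a hypothetical callable returning (exit_status, log_text).
    status, log = run_job()
    for _ in range(retries):
        if status == 0 or not looks_like_network_failure(log):
            break                      # success, or a real failure: don't retry
        status, log = run_job()
    return status
```

The catch is that such a pattern list has to be maintained and will always miss some transient failure modes, which is essentially the argument for the blunt retry made below.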
There seems to be almost zero benefit to a smart retry over a 3x blunt retry; it won't even be noticeably faster.
IIRC there is a 5 min. wait-time for running jobs when downscaling agents.
If the problem occurs often, we could bump that a bit if there are many long-running jobs.
What's the benefit of someone else running servers? Sounds nice in theory, but reliability on a heterogeneous infrastructure run by an uncoordinated group is likely to suffer.
I guess a simpler dependency file might indeed help us to update the machines. Is this a real problem?
@MartinNowak thanks for the evaluation. I'll defer to your expertise in the matter!
Any opinion on whether this is an actual problem @Geod24?
@MartinNowak : The lack of machines has definitely hit us in the past. Sometimes there are no agents running for a noticeable amount of time, although I don't recall it ever being more than an hour. I wasn't overly bothered by it because I just hit the retry button, but @WalterBright was.
Something that is a bit more lacking is the ability for projects to control their dependencies. With the changes we're seeing in the CI ecosystem (Travis disappearing, GitHub CI rising), I was hoping we could leverage the GitHub runners to simplify our current pipeline. That could theoretically make it easier for core contributors to run agents, too.
Indeed, we could rebuild the service in GitHub Actions 👍. It might be more accessible for everyone, though it would require some additional setup time (hopefully fine). Not sure how long their free open-source CI will last; I'd guess a while with MSFT's current strategy.
CC @WalterBright