What should be the action for build timeouts? #11072

Closed
2 tasks
jkotas opened this issue Sep 29, 2022 · 7 comments · Fixed by dotnet/runtime#76453

Comments

@jkotas
Member

jkotas commented Sep 29, 2022

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=34958

Build leg reported

Build windows x86 release Runtime_Debug

Pull Request

dotnet/runtime#76386

Action required for the engineering services team

To triage this issue (First Responder / @dotnet/dnceng):

  • Open the failing build above and investigate
  • Add a comment explaining your findings

If this issue is causing build breaks across multiple builds and would benefit from being listed on the build analysis check, follow these steps:

  1. Add the label "Known Build Error"
  2. Edit this issue and add an error string to the JSON below that can help us match this issue with future build breaks. You should use the known issues documentation.
{
   "ErrorMessage" : "",
   "BuildRetry": false
}

Additional information about the issue reported

Build timeouts are a relatively common reason for red PRs. I understand that there are a number of factors outside our control that can lead to build timeouts. Still, we need clarity on what one should do about these timeouts. They typically go away with a manual retry.

Should we have a "Known Build Error" issue that auto-retries? Or should we have a "Known Build Error" issue that does not auto-retry and just keeps track of how often we are seeing these timeouts?
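
To make the two options concrete, here is a minimal sketch of how a known-issue record shaped like the JSON template above might drive the retry decision. It assumes matching is a plain substring test against log lines; the actual matching rules live in the known issues documentation. The `triage` function, the `ErrorMessage` value, and the sample log lines are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical known-issue record in the shape of the JSON template above.
# The ErrorMessage value is an invented placeholder, not a real log string.
KNOWN_ISSUE = json.loads("""
{
   "ErrorMessage": "job exceeded its time limit",
   "BuildRetry": false
}
""")


def triage(log_lines, known_issue):
    """Return (matched, retry) for a build log against one known-issue record.

    Assumption: a match is a plain substring test on each log line.
    """
    message = known_issue["ErrorMessage"]
    if not message:
        # An empty ErrorMessage (the unfilled template) matches nothing.
        return (False, False)
    matched = any(message in line for line in log_lines)
    # Retry only when the issue matched and the record opts in to retries.
    return (matched, matched and known_issue["BuildRetry"])


# Invented log lines for illustration only.
sample_log = [
    "Build started",
    "##[error] job exceeded its time limit of 90 minutes",
]

print(triage(sample_log, KNOWN_ISSUE))
```

With `"BuildRetry": false` the match is only recorded, which corresponds to the "keep track of how often" option; setting it to `true` corresponds to the auto-retry option.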

Report

Summary

24-Hour Hit Count | 7-Day Hit Count | 1-Month Count
0 | 0 | 0

@markwilkie
Member

@ulisesh / @AlitzelMendez / @missymessa - are we able to auto-retry the build for timeouts?

@missymessa
Member

I'd imagine that should work, provided we know the timeout error message to look for. Ali can correct me if I'm wrong.

@ulisesh
Contributor

ulisesh commented Sep 30, 2022

I might be wrong, but it looks like the pipeline had a timeout of 90 minutes for the job that timed out. The timeout should be increased.

@MattGal
Member

MattGal commented Sep 30, 2022

A build which times out, except in very unusual circumstances, is pretty likely to time out on attempts 2..N. Seems like something we'd explicitly not auto-retry given the alternative workaround (increase the timeout)

@jkotas
Member Author

jkotas commented Sep 30, 2022

A build which times out, except in very unusual circumstances, is pretty likely to time out on attempts 2..N.

That is not what I am seeing. There are a lot of intermittent "slow machine" problems. They affect macOS the most (see #10794), but they can be seen on other OSes too (Windows x86 in this case). A build that times out is very likely to pass on rerun.

What would you recommend as a margin for slow machines? If we have a build that typically finishes in 1 hour, what should the timeout be set to in the yaml to account for intermittent slow machines?

jkotas added a commit to jkotas/runtime that referenced this issue Sep 30, 2022
Increase timeouts for runtime-dev-innerloop legs to compensate for intermittently slow build machines.

Fixes dotnet/arcade#11072
@MattGal
Member

MattGal commented Sep 30, 2022

I'd say at least double, i.e. 2 hours for a 1 hour typical case. I am keenly aware of the "slow macOS" problem but it's hopelessly entangled in the "Helix runs take as long as they take because there's only one pool of machines" problem; perhaps you can point to some of these and we can discuss more specifically?
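
As a rough illustration of the "at least double" rule of thumb, the sketch below turns a typical build duration into a suggested timeout. It assumes the result would be used as a job-level `timeoutInMinutes` value in the pipeline YAML; the function name `suggested_timeout_minutes` and its parameters are hypothetical, and the factor of 2 is simply the margin suggested in this thread, not a measured value.

```python
import math


def suggested_timeout_minutes(typical_minutes, margin_factor=2.0):
    """Scale a typical build duration by a safety margin for slow machines.

    Returns whole minutes, rounded up. A margin_factor below 1 would set the
    timeout under the typical duration, so it is rejected.
    """
    if typical_minutes <= 0:
        raise ValueError("typical_minutes must be positive")
    if margin_factor < 1:
        raise ValueError("margin_factor must be at least 1")
    return math.ceil(typical_minutes * margin_factor)


# The case from the thread: a build that typically finishes in 1 hour.
print(suggested_timeout_minutes(60))
```

For the one-hour case discussed above this gives 120 minutes, the two-hour figure in the recommendation.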

@jkotas
Member Author

jkotas commented Sep 30, 2022

I'd say at least double, i.e. 2 hours for a 1 hour typical case.

Sounds good. That is what I have done in dotnet/runtime#76453.

you can point to some of these and we can discuss more specifically?

I am going to open a "Known Build Error" issue in dotnet/runtime that will accumulate the timeouts, and we can then see what to do about them.

@jkotas jkotas closed this as completed Sep 30, 2022
jkotas added a commit to dotnet/runtime that referenced this issue Sep 30, 2022
Increase timeouts for runtime-dev-innerloop legs to compensate for intermittently slow build machines.

Fixes dotnet/arcade#11072