Runtime jobs being abandoned due to infra #50746

Closed
runfoapp bot opened this issue Apr 5, 2021 · 18 comments · Fixed by #51103
Labels
area-Infrastructure
blocking-clean-ci: Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
tracking-external-issue: The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly

Comments

@runfoapp

runfoapp bot commented Apr 5, 2021

Runfo Tracking Issue: Runtime jobs being abandoned due to infra

| Definition | Build | Kind | Job Name |
| --- | --- | --- | --- |
| runtime | 1079569 | Rolling | Mono Product Build windows x86 debug |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows x64 checked |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows x86 checked |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows x64 release PGO |
| runtime | 1079569 | Rolling | Libraries Build windows x86 Release |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows x86 release |
| runtime | 1079569 | Rolling | Libraries Build windows net48 x64 Release |
| runtime | 1079569 | Rolling | Libraries Build windows allConfigurations x64 Release |
| runtime | 1079569 | Rolling | Libraries Build windows x64 Release |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows x64 release |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows arm64 checked |
| runtime | 1079569 | Rolling | Build windows x64 Release SingleFile |
| runtime | 1079569 | Rolling | Libraries Build windows arm64 Release |
| runtime | 1079569 | Rolling | Mono Product Build windows x64 debug |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows arm checked |
| runtime | 1079569 | Rolling | Libraries Build windows net48 x86 Release |
| runtime | 1079569 | Rolling | Mono Product Build windows x64 release |
| runtime | 1079569 | Rolling | Libraries Build windows arm Release |
| runtime | 1079569 | Rolling | Mono Product Build windows x86 release |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows arm release |
| runtime | 1079569 | Rolling | CoreCLR Product Build windows arm64 release |
| runtime | 1079550 | PR 50489 | CoreCLR Product Build windows arm release |
| runtime | 1079550 | PR 50489 | CoreCLR Product Build windows x64 release |
| runtime | 1079550 | PR 50489 | CoreCLR Product Build windows x86 release |
| runtime | 1079550 | PR 50489 | CoreCLR Product Build windows x64 release PGO |
| runtime | 1079550 | PR 50489 | CoreCLR Product Build windows arm64 release |
| runtime | 1079545 | PR 50894 | Mono Product Build windows x64 release |
| runtime | 1079545 | PR 50894 | Libraries Build windows net48 x86 Release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows x86 checked |
| runtime | 1079545 | PR 50894 | Libraries Build windows allConfigurations x64 Debug |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows x64 release PGO |
| runtime | 1079545 | PR 50894 | Libraries Build windows x86 Release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows x86 release |
| runtime | 1079545 | PR 50894 | Libraries Build windows x86 Debug |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows x64 release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows arm release |
| runtime | 1079545 | PR 50894 | Libraries Build windows x64 Debug |
| runtime | 1079545 | PR 50894 | Mono Product Build windows x86 release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows arm64 release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows x64 checked |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows arm64 checked |
| runtime | 1079545 | PR 50894 | Mono Product Build windows x86 debug |
| runtime | 1079545 | PR 50894 | Build windows x64 Release SingleFile |
| runtime | 1079545 | PR 50894 | Libraries Build windows arm64 Release |
| runtime | 1079545 | PR 50894 | Mono Product Build windows x64 debug |
| runtime | 1079545 | PR 50894 | Libraries Build windows arm Release |
| runtime | 1079545 | PR 50894 | CoreCLR Product Build windows arm checked |
| runtime | 1079530 | PR 50986 | Mono Product Build windows x64 release |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows x86 release |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows arm release |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows x64 release |
| runtime | 1079530 | PR 50986 | Libraries Build windows x86 Debug |
| runtime | 1079530 | PR 50986 | Libraries Build windows x86 Release |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows x64 release PGO |
| runtime | 1079530 | PR 50986 | Libraries Build windows allConfigurations x64 Debug |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows x86 checked |
| runtime | 1079530 | PR 50986 | Build windows x64 Release SingleFile |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows arm checked |
| runtime | 1079530 | PR 50986 | Mono Product Build windows x64 debug |
| runtime | 1079530 | PR 50986 | Libraries Build windows arm64 Release |
| runtime | 1079530 | PR 50986 | Mono Product Build windows x86 debug |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows arm64 checked |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows x64 checked |
| runtime | 1079530 | PR 50986 | CoreCLR Product Build windows arm64 release |
| runtime | 1079530 | PR 50986 | Mono Product Build windows x86 release |
| runtime | 1079530 | PR 50986 | Libraries Build windows x64 Debug |
| runtime | 1079530 | PR 50986 | Libraries Build windows net48 x86 Release |
| runtime | 1079530 | PR 50986 | Libraries Build windows arm Release |
| runtime | 1079523 | PR 50817 | Libraries Build windows x64 Debug |
| runtime | 1079523 | PR 50817 | Libraries Build windows x86 Debug |
| runtime | 1079523 | PR 50817 | Libraries Build windows x86 Release |
| runtime | 1079523 | PR 50817 | Libraries Build windows allConfigurations x64 Debug |
| runtime | 1079385 | PR 50954 | Installer Build and Test coreclr windows_x86 Debug |
| runtime | 1079385 | PR 50954 | CoreCLR Pri0 Runtime Tests Run windows x86 checked |
| runtime | 1079324 | Rolling | Mono Product Build windows x86 release |
| runtime | 1079324 | Rolling | Libraries Build windows arm Release |
| runtime | 1079324 | Rolling | Mono Product Build windows x64 release |
| runtime | 1079324 | Rolling | Libraries Build windows net48 x86 Release |
| runtime | 1079324 | Rolling | Mono Product Build windows x64 debug |
| runtime | 1079324 | Rolling | Libraries Build windows arm64 Release |
| runtime | 1079324 | Rolling | Build windows x64 Release SingleFile |
| runtime | 1079324 | Rolling | Libraries Build windows x64 Release |
| runtime | 1079324 | Rolling | Libraries Build windows allConfigurations x64 Release |
| runtime | 1079324 | Rolling | Libraries Build windows net48 x64 Release |
| runtime | 1079324 | Rolling | Libraries Build windows x86 Release |
| runtime | 1079324 | Rolling | Mono Product Build windows x86 debug |
| runtime | 1079065 | PR 50569 | Mono Product Build windows x86 release |
| runtime | 1078839 | PR 50489 | Mono Product Build windows x64 debug |
| runtime | 1078839 | PR 50489 | Libraries Build windows arm64 Release |
| runtime | 1078839 | PR 50489 | Libraries Build windows allConfigurations x64 Debug |
| runtime | 1078839 | PR 50489 | CoreCLR Product Build windows x64 release PGO |
| runtime | 1078839 | PR 50489 | Libraries Build windows x86 Release |
| runtime | 1078839 | PR 50489 | CoreCLR Product Build windows x86 release |
| runtime | 1078839 | PR 50489 | Libraries Build windows x86 Debug |
| runtime | 1078839 | PR 50489 | CoreCLR Product Build windows x64 release |
| runtime | 1078839 | PR 50489 | Libraries Build windows net48 x86 Release |
| runtime | 1078839 | PR 50489 | Mono Product Build windows x64 release |
| runtime | 1078839 | PR 50489 | Libraries Build windows arm Release |
| runtime | 1078839 | PR 50489 | Libraries Build windows x64 Debug |
| runtime | 1078839 | PR 50489 | Mono Product Build windows x86 release |

Build Result Summary

| Day Hit Count | Week Hit Count | Month Hit Count |
| --- | --- | --- |
| 7 | 9 | 9 |

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler bot added the untriaged label (New issue has not been triaged by the area owner) Apr 5, 2021
@jkoritzinsky added the blocking-clean-ci and area-Infrastructure labels Apr 5, 2021
@ghost

ghost commented Apr 5, 2021

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Runfo Tracking Issue: Runtime jobs being abandoned due to infra

| Definition | Build | Kind | Job Name |
| --- | --- | --- | --- |
| runtime | 1072700 | PR 50622 | Mono Product Build windows x64 debug |
| runtime | 1072700 | PR 50622 | Libraries Build windows arm64 Release |
| runtime | 1072700 | PR 50622 | Libraries Build windows allConfigurations x64 Debug |
| runtime | 1072700 | PR 50622 | CoreCLR Product Build windows x64 release PGO |
| runtime | 1072700 | PR 50622 | Libraries Build windows x86 Release |
| runtime | 1072700 | PR 50622 | CoreCLR Product Build windows x86 release |
| runtime | 1072700 | PR 50622 | Libraries Build windows x86 Debug |
| runtime | 1072700 | PR 50622 | CoreCLR Product Build windows x64 release |
| runtime | 1072700 | PR 50622 | Libraries Build windows net48 x86 Release |
| runtime | 1072700 | PR 50622 | Mono Product Build windows x64 release |
| runtime | 1072700 | PR 50622 | Libraries Build windows arm Release |
| runtime | 1072700 | PR 50622 | Libraries Build windows x64 Debug |
| runtime | 1072700 | PR 50622 | Mono Product Build windows x86 release |
| runtime | 1072700 | PR 50622 | CoreCLR Product Build windows arm64 release |
| runtime | 1072700 | PR 50622 | Mono Product Build windows x86 debug |
| runtime | 1072700 | PR 50622 | CoreCLR Product Build windows arm release |
| runtime | 1072700 | PR 50622 | Build windows x64 Release SingleFile |

Build Result Summary

| Day Hit Count | Week Hit Count | Month Hit Count |
| --- | --- | --- |
| 1 | 1 | 1 |

Author: runfoapp[bot]
Assignees: -
Labels:

area-Infrastructure, blocking-clean-ci, untriaged

Milestone: -

@ghost ghost added this to Untriaged in Infrastructure Backlog Apr 5, 2021
@jkoritzinsky
Member

This has started popping up again. @dotnet/dnceng

@safern
Member

safern commented Apr 5, 2021

cc: @adiaaida this is the same issue that I shared on the FR channel. It is happening not only on Windows.

@michellemcdaniel
Contributor

Got it. Thanks.

@dotnet/dnceng I have opened https://github.com/dotnet/core-eng/issues/12732 to track this on our side

@michellemcdaniel
Contributor

@safern These all appear to be running on windows. Is that incorrect?

@safern
Member

safern commented Apr 6, 2021

Sorry, I misread somehow; yes, they are all Windows 🤦

@jakubstilec

adding @lukas-lansky

@lukas-lansky
Contributor

lukas-lansky commented Apr 6, 2021

Let's look!

@michellemcdaniel
Contributor

It was actually Ulises who suspected the scaler. When we saw this was happening, he immediately manually scaled up the queue, and that's why you see that drop off in wait time.

@trylek
Member

trylek commented Apr 6, 2021

Hmm, so is the issue supposed to be mitigated? It seems to me that all Windows legs in PR / CI runs are stuck right now. Is that just a backlog caused by the previous slowdown, or has the problem reappeared even with the upscaled queue?

@ilyas1974

It appears there was an issue with the underlying Service Fabric framework that has been mitigated. We are trying to manually scale this queue.

@trylek
Member

trylek commented Apr 6, 2021

Thanks, Ilya, for the clarification. I have also realized that my previous wording was rather selfish: what I meant to say was that "all Windows legs in my PR / CI runs are stuck right now", and that continues to be the case. Given your comment, for now I will just hope that your manual adjustments eventually clear the backlog, unless you advise me to take some proactive measures such as abandoning my currently running tests and triggering new ones.

@ulisesh
Contributor

ulisesh commented Apr 6, 2021

I don't think the HelixProd scalesets have been fixed; I keep getting errors when trying to scale them up. @ilyas1974 @adiaaida we should create an IcM.

The affected scalesets are buildpool.windows.10.amd64.vs2017.open-a-scaleset and buildpool.windows.10.amd64.vs2019.open-a-scaleset

@ilyas1974

I've created Azure support ticket TrackingID#2104060010003005 for this issue.

@ulisesh
Contributor

ulisesh commented Apr 7, 2021

Created a second ticket for the vs2019 queue, ID#2104070010000075.

@jakubstilec

The issue is still active, with no response from Azure support. I chased 2104070010000075.

@jakubstilec

Because there is no update, I also created an IcM ticket: https://portal.microsofticm.com/imp/v3/incidents/details/235578196/home

@ViktorHofer added the tracking-external-issue label and removed the untriaged label Apr 7, 2021
@ViktorHofer ViktorHofer added this to the 6.0.0 milestone Apr 7, 2021
@ghost ghost moved this from Untriaged to 6.0.0 in Infrastructure Backlog Apr 7, 2021
akoeplinger added a commit to akoeplinger/runtime that referenced this issue Apr 9, 2021
akoeplinger added a commit that referenced this issue Apr 9, 2021
* Switch to VS preview pool for public builds

Should help mitigate #50746

* Run init-vs-env.cmd for Browser wasm Windows build

The BuildPool.Windows.10.Amd64.VS2019.Pre.Open queue doesn't have ninja installed outside of VS so it's only available in PATH if you run the init-vs-env.cmd script.
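
The commit note above says that ninja only appears in PATH on the preview queue after init-vs-env.cmd has run. A minimal sketch of a pre-build check for that condition follows. The helper name `tool_on_path` and the message text are hypothetical and not part of the runtime repository; only the standard-library `shutil.which` call is relied on.

```python
import shutil
from typing import Optional


def tool_on_path(tool: str, path: Optional[str] = None) -> bool:
    """Return True if `tool` resolves to an executable on PATH.

    `path` overrides the PATH environment variable when given.
    """
    return shutil.which(tool, path=path) is not None


# Hypothetical pre-build check; the message text is illustrative only.
if not tool_on_path("ninja"):
    print("ninja not found on PATH; run init-vs-env.cmd before building")
```

A check like this only reports the problem; on the queue described above, the fix is still to run init-vs-env.cmd so that the Visual Studio copy of ninja is added to PATH.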
akoeplinger added a commit to akoeplinger/runtime that referenced this issue Apr 12, 2021
@ghost added the in-pr label (There is an active PR which will close this issue when it is merged) Apr 12, 2021
Infrastructure Backlog automation moved this from 6.0.0 to Done Apr 12, 2021
@ghost removed the in-pr label Apr 12, 2021
@runfoapp runfoapp bot removed this from the 6.0.0 milestone Apr 14, 2021
@ghost ghost moved this from Done to Untriaged in Infrastructure Backlog Apr 14, 2021
@akoeplinger akoeplinger moved this from Untriaged to Done in Infrastructure Backlog Apr 21, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 14, 2021
9 participants