[build] Shutdown/kill any build servers at the end of the build.#21315
rolfbjarne merged 1 commit into main
This is a log from our bots; note the 14-minute gap just before the timing results are printed:

```
[...]
2024-09-27T07:34:00.3958920Z Making install in dotnet
2024-09-27T07:34:01.7633820Z Validated file permissions for Xamarin.Mac.
2024-09-27T07:34:01.7800150Z Validated file permissions for Xamarin.iOS.
2024-09-27T07:34:01.7825300Z
2024-09-27T07:34:01.7872490Z Xamarin.iOS has not been installed into your system by 'make install'
2024-09-27T07:34:01.7918570Z In order to set the currently built Xamarin.iOS as your system version,
2024-09-27T07:34:01.7965090Z execute 'make install-system'.
2024-09-27T07:34:01.7987920Z
2024-09-27T07:34:01.8034290Z Xamarin.Mac has not been installed into your system by 'make install'
2024-09-27T07:34:01.8080260Z In order to set the currently built Xamarin.Mac as your system version,
2024-09-27T07:34:01.8126200Z execute 'make install-system'.
2024-09-27T07:34:01.8148530Z
2024-09-27T07:48:22.3100850Z
2024-09-27T07:48:22.3102130Z real    15m26.160s
2024-09-27T07:48:22.3102800Z user    1m4.044s
2024-09-27T07:48:22.3103270Z sys     0m18.379s
```
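A stall like this is easy to miss in a long bot log. A minimal sketch of a helper that finds the longest silence between consecutive lines, assuming (as in the log above) that every non-empty line starts with a timestamp like `2024-09-27T07:34:01.8148530Z`; the function names are hypothetical and not part of any existing tooling:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def parse_stamp(token: str) -> datetime:
    # e.g. "2024-09-27T07:34:01.8148530Z": 7 fractional digits plus a trailing Z.
    # The fraction is parsed by hand so this doesn't depend on how a given
    # Python version's datetime.fromisoformat treats more than 6 digits.
    whole, frac = token.rstrip("Z").split(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return base + timedelta(seconds=float("0." + frac))


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[timedelta, str]]:
    """Return (gap, line) for the line that followed the longest silence."""
    best: Optional[Tuple[timedelta, str]] = None
    previous: Optional[datetime] = None
    for line in lines:
        if not line.strip():
            continue  # skip lines with no timestamp at all
        stamp = parse_stamp(line.split(maxsplit=1)[0])
        if previous is not None:
            gap = stamp - previous
            if best is None or gap > best[0]:
                best = (gap, line)
        previous = stamp
    return best
```

Fed the log above, it points at the empty `07:48:22.3100850Z` line, about 14 minutes 20 seconds after the previous one.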
📚 [CI Build] Artifacts 📚 Artifacts were not provided.
💻 [CI Build] Tests on macOS X64 - Mac Sonoma (14) passed 💻 ✅ All tests on macOS X64 - Mac Sonoma (14) passed.
💻 [CI Build] Tests on macOS M1 - Mac Big Sur (11) passed 💻 ✅ All tests on macOS M1 - Mac Big Sur (11) passed.
💻 [CI Build] Tests on macOS M1 - Mac Monterey (12) passed 💻 ✅ All tests on macOS M1 - Mac Monterey (12) passed.
💻 [CI Build] Tests on macOS M1 - Mac Ventura (13) passed 💻 ✅ All tests on macOS M1 - Mac Ventura (13) passed.
✅ API diff for current PR / commit: .NET (empty diffs).
✅ API diff vs stable: .NET (no breaking changes). ℹ️ Generator diff: vsdrops (html), vsdrops (raw diff), gist (raw diff). Please review changes.
💻 [CI Build] Windows Integration Tests passed 💻 ✅ All Windows Integration Tests passed.
🚀 [CI Build] Test results 🚀 ✅ All tests passed on VSTS. 🎉 All 97 tests passed 🎉 Test counts: ✅ cecil: 1 test passed.
This is a log from our bots; note the 14-minute gap just before the timing results are printed:

```
[...]
2024-09-27T07:34:00.3958920Z Making install in dotnet
2024-09-27T07:34:01.7633820Z Validated file permissions for Xamarin.Mac.
2024-09-27T07:34:01.7800150Z Validated file permissions for Xamarin.iOS.
2024-09-27T07:34:01.7825300Z
2024-09-27T07:34:01.7872490Z Xamarin.iOS has not been installed into your system by 'make install'
2024-09-27T07:34:01.7918570Z In order to set the currently built Xamarin.iOS as your system version,
2024-09-27T07:34:01.7965090Z execute 'make install-system'.
2024-09-27T07:34:01.7987920Z
2024-09-27T07:34:01.8034290Z Xamarin.Mac has not been installed into your system by 'make install'
2024-09-27T07:34:01.8080260Z In order to set the currently built Xamarin.Mac as your system version,
2024-09-27T07:34:01.8126200Z execute 'make install-system'.
2024-09-27T07:34:01.8148530Z
2024-09-27T07:48:22.3100850Z
2024-09-27T07:48:22.3102130Z real    15m26.160s
2024-09-27T07:48:22.3102800Z user    1m4.044s
2024-09-27T07:48:22.3103270Z sys     0m18.379s
```

What happens is this:

* We're using parallel make, and parallel make starts a jobserver managed by file descriptors. These file descriptors must be closed in all subprocesses for make to realize it's done.
* Any 'dotnet build' might start a build server.
* The build server does not close any file descriptors it may have inherited when daemonizing itself.
* Thus the build server (which will still be alive after we're done building here) might keep open a file descriptor that make is waiting for.
* The proper fix is to make the build server close its file descriptors.
* The intermediate workaround is to shut down the build server instead.

This will save 10-15 minutes at the end of every build in the bots.
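The mechanism above rests on a basic POSIX rule: a reader sees EOF on a pipe only once every copy of the write end is closed, in every process. A minimal POSIX-only Python sketch of that rule (it illustrates generic pipe semantics, not GNU make's jobserver code): the parent stands in for make, and a sleeping child stands in for a build server that inherited the write end. With `server_closes_fds=True` the child closes inherited descriptors at startup, which corresponds to the "proper fix" described above. The function name and the `linger` parameter are hypothetical.

```python
import os
import subprocess
import sys
import time


def wait_for_eof(server_closes_fds: bool, linger: float = 2.0) -> float:
    """Return how many seconds a reader waits for EOF on a pipe whose write
    end has leaked into a long-lived child process (the "build server")."""
    read_fd, write_fd = os.pipe()
    # Child program: optionally close every inherited descriptor above
    # stderr (what a well-behaved daemon would do), then linger like an idle server.
    server_code = (
        "import os, sys, time\n"
        "if sys.argv[1] == 'close':\n"
        "    os.closerange(3, 1024)\n"
        f"time.sleep({linger})\n"
    )
    server = subprocess.Popen(
        [sys.executable, "-c", server_code, "close" if server_closes_fds else "keep"],
        pass_fds=(write_fd,),  # the child inherits the write end
    )
    os.close(write_fd)  # the parent itself is done writing
    start = time.monotonic()
    # os.read returns b"" only once *every* write end has been closed.
    while os.read(read_fd, 4096):
        pass
    waited = time.monotonic() - start
    os.close(read_fd)
    server.wait()
    return waited
```

With `server_closes_fds=False` the reader is stuck for the child's whole lifetime, which is the same shape as the 14-minute gap; with `True` it returns as soon as the child has started and closed its copy.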
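The intermediate workaround in this PR, stopping build servers once the build is done, can be sketched as a polite `dotnet build-server shutdown` followed by a force-kill of anything left over. This is a Python sketch, not the PR's Makefile change: the function name is hypothetical, and the process pattern is an assumption (the regex the PR actually used isn't shown here).

```python
import subprocess
from typing import Callable, List

# Hypothetical pattern for leftover server processes; an assumption, not the PR's regex.
SERVER_PATTERN = r"MSBuild\.dll|VBCSCompiler\.dll"


def shutdown_build_servers(
    dotnet: str = "dotnet",
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Ask the dotnet CLI to stop its build servers, then force-kill stragglers.

    Returns the commands it attempted, so it can be tested with a fake `run`.
    """
    commands = [
        [dotnet, "build-server", "shutdown"],   # polite shutdown via the CLI
        ["pkill", "-9", "-f", SERVER_PATTERN],  # force-kill anything still running
    ]
    for cmd in commands:
        try:
            # check=False: pkill exits with status 1 when nothing matched, which is fine.
            run(cmd, check=False)
        except FileNotFoundError:
            pass  # tool not installed, so there is nothing to shut down with it
    return commands
```

A cleanup step like this only helps if it runs after every step that can start a server; with `-j`, a hook rule that has no prerequisites gives no such ordering guarantee.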
Parallel make (e.g. 'make all -j8', 'make world') has been hanging indefinitely at the end of the build. This is a long-standing issue (#13355) that has been patched three times (#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with pipe-based file descriptors to coordinate sub-makes. The dotnet CLI can start background build servers (MSBuild server, Roslyn/VBCSCompiler) that inherit these file descriptors but never close them. Make then waits indefinitely for those file descriptors to close, thinking there are still active jobs.

The previous workaround attempted to shut down and force-kill dotnet processes after the build via a 'shutdown-build-server' target. This approach was unreliable because:

- The shutdown ran from a double-colon all-hook:: rule with no prerequisites, so with -j it could execute in parallel with (or before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

The fix: disable build servers entirely via environment variables in Make.config:

- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
- UseSharedCompilation=false: prevents the Roslyn compiler server
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse

This eliminates the root cause: no background servers means no inherited file descriptors, which means no hang. The shutdown-build-server target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the end of the build for visibility.

Build without changes:
make world  2149.57s user 258.32s system 107% cpu 37:30.19 total

Build with changes:
make world  2242.74s user 286.38s system 354% cpu 11:52.55 total

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
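The commit sets these three variables in Make.config (that diff isn't shown here). For a build driven from a script rather than from make, the same settings can be passed through the child environment. A minimal Python sketch under that assumption; `build_env` and `dotnet_build` are hypothetical helpers, while the variable names and values are the ones listed above:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Values as listed in the commit message.
NO_BUILD_SERVER_ENV: Dict[str, str] = {
    "DOTNET_CLI_USE_MSBUILD_SERVER": "0",  # no MSBuild server
    "UseSharedCompilation": "false",       # no Roslyn/VBCSCompiler server (read as an MSBuild property)
    "MSBUILDDISABLENODEREUSE": "1",        # MSBuild worker nodes exit with the build
}


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and disable build servers."""
    env = dict(os.environ if base is None else base)
    env.update(NO_BUILD_SERVER_ENV)
    return env


def dotnet_build(project: str, run=subprocess.run):
    # Every process this build spawns sees the same settings, so nothing
    # outlives it holding an inherited descriptor.
    return run(["dotnet", "build", project], env=build_env(), check=True)
```

Setting the variables once at the top (as Make.config does) rather than per invocation matters: a single `dotnet build` without them is enough to leave a server behind.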
Parallel make (e.g. 'make all -j8', 'make world') has been hanging indefinitely at the end of the build. This is a long-standing issue (#13355) that has been patched three times (#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with pipe-based file descriptors to coordinate sub-makes. The dotnet CLI can start background build servers (MSBuild server, Roslyn/VBCSCompiler) that inherit these file descriptors but never close them. Make then waits indefinitely for those file descriptors to close, thinking there are still active jobs.

The previous workaround attempted to shut down and force-kill dotnet processes after the build via a 'shutdown-build-server' target. This approach was unreliable because:

- The shutdown ran from a double-colon all-hook:: rule with no prerequisites, so with -j it could execute in parallel with (or before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

The fix: disable build servers entirely via environment variables in Make.config:

- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
  https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-server
  https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md
- UseSharedCompilation=false: prevents the Roslyn compiler server (VBCSCompiler)
  dotnet/roslyn#27975
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse
  https://github.com/dotnet/msbuild/wiki/MSBuild-Tips-&-Tricks

This eliminates the root cause: no background servers means no inherited file descriptors, which means no hang. The shutdown-build-server target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the end of the build for visibility.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parallel make (e.g. 'make all -j8', 'make world') has been hanging for a while at the end of the build. This is a long-standing issue (#13355) that has been patched three times (#15407, #21315, #22300) without fully fixing the root cause.

The problem: when using parallel make, GNU Make uses a jobserver with pipe-based file descriptors to coordinate sub-makes. The dotnet CLI can start background build servers (MSBuild server, Roslyn/VBCSCompiler) that inherit these file descriptors but never close them. Make then waits for those file descriptors to close, thinking there are still active jobs. That won't happen until the servers exit, which they typically do after about 10 minutes of inactivity.

The previous workaround attempted to shut down and force-kill dotnet processes after the build via a 'shutdown-build-server' target. This approach was unreliable because:

- The shutdown ran from a double-colon all-hook:: rule with no prerequisites, so with -j it could execute in parallel with (or before) the actual build, killing nothing.
- Build servers started by later subdirectories (e.g. tests/) after the dotnet/ shutdown were never killed.
- The process-matching regex pattern might not match all server processes.

Ideally this would be fixed when launching the build servers, by making them not inherit handles. Unfortunately, this is currently not possible (dotnet/runtime#13943), although that might change in the not-so-distant future (dotnet/runtime#123959).

The workaround: disable build servers entirely via environment variables in Make.config:

- DOTNET_CLI_USE_MSBUILD_SERVER=0: prevents the MSBuild server
  https://learn.microsoft.com/en-us/visualstudio/msbuild/msbuild-server
  https://github.com/dotnet/msbuild/blob/main/documentation/MSBuild-Server.md
- UseSharedCompilation=false: prevents the Roslyn compiler server (VBCSCompiler)
  dotnet/roslyn#27975
- MSBUILDDISABLENODEREUSE=1: prevents MSBuild node reuse
  https://github.com/dotnet/msbuild/wiki/MSBuild-Tips-&-Tricks

This eliminates the root cause: no background servers means no inherited file descriptors, which means no hang. The shutdown-build-server target and its invocations are removed as they are no longer needed.

Additionally, 'make world' now prints the installed workloads at the end of the build for visibility.

Build without changes:

> make world  2149.57s user 258.32s system 107% cpu 37:30.19 total

Build with changes:

> make world  2242.74s user 286.38s system 354% cpu 11:52.55 total

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
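The two timings can be checked with a little arithmetic. In zsh's `time` output the `cpu` column is (user + system) / wall-clock, so a build that spends a long stretch idle at the end shows a figure close to 100% even though it runs with `-j`. A small Python sketch of the calculation; the helper names are hypothetical:

```python
def clock_to_seconds(clock: str) -> float:
    """Convert a zsh-style 'MM:SS.ss' wall-clock total (no hours field) to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def summarize(user: float, system: float, total: str):
    wall = clock_to_seconds(total)
    cpu_percent = 100 * (user + system) / wall  # time's "cpu" column
    return wall, cpu_percent


before_wall, before_cpu = summarize(2149.57, 258.32, "37:30.19")
after_wall, after_cpu = summarize(2242.74, 286.38, "11:52.55")

print(f"cpu: {before_cpu:.1f}% -> {after_cpu:.1f}%")           # cpu: 107.0% -> 354.9%
print(f"wall-clock saved: {before_wall - after_wall:.0f} s")  # wall-clock saved: 1538 s
print(f"speedup: {before_wall / after_wall:.2f}x")            # speedup: 3.16x
```

The recomputed 354.9% matches the reported 354% up to truncation, and the roughly 25.6 minutes saved is wall-clock time that was previously spent waiting rather than building: total CPU time is about the same in both runs.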