Area: Area-dotnet test (MTP)
Type: Bug / Test Debt (flaky CI hang)
Severity: High — hangs a dotnet test run for 60+ min, times out the whole macOS CI leg (150-min cap)
Summary
On macOS CI, the SDK integration test
Microsoft.DotNet.Cli.Test.Tests.GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode
hangs for the full 60-minute blame-hang inactivity window. The test invokes the product
dotnet test (Microsoft.Testing.Platform / MTP) against a console app that does nothing and never
performs the MTP handshake. The product is supposed to detect the missing handshake and exit with a
failure code — instead dotnet test hangs indefinitely, so the test never returns.
This appears to be causing macOS Helix jobs to time out. The hang is either intermittent or specific to macOS x64.
Expected behavior
dotnet test against a console app that does not handshake with MTP should fail fast with a meaningful
error and return ExitCodes.GenericFailure (this is exactly what the test asserts):
result.ExitCode.Should().Be(ExitCodes.GenericFailure,
"dotnet test should fail with a meaningful error when run against console app without MTP handshake");
Actual behavior
dotnet test never returns. The child dotnet process(es) stay alive waiting for a handshake / exit
that never comes, so the xunit test method blocks in Process.WaitForExit, the test session goes idle,
and after 60 minutes the blame collector force-dumps and kills the process tree.
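As a possible test-side mitigation (not the product fix), the harness could bound its wait on the child process, so that a hung dotnet test fails the test within minutes instead of idling until the blame collector fires. The sketch below is an assumption about how that might look: BoundedProcessRunner and its Run method are hypothetical names, not existing SDK test utilities. It uses only Process.WaitForExit(int) and Process.Kill(bool) from System.Diagnostics.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the SDK test harness): runs a child
// process and gives up after a caller-supplied timeout.
public static class BoundedProcessRunner
{
    // Returns the exit code, or null if the process had to be killed
    // because it was still running when the timeout elapsed.
    public static int? Run(string fileName, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
        };

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Failed to start '{fileName}'.");

        // WaitForExit(int) returns false if the process is still alive
        // after the given number of milliseconds.
        if (process.WaitForExit((int)timeout.TotalMilliseconds))
        {
            return process.ExitCode;
        }

        // Kill the whole tree so leftover children (such as the no-op app)
        // cannot keep the test session alive.
        process.Kill(entireProcessTree: true);
        process.WaitForExit();
        return null;
    }
}
```

A call such as BoundedProcessRunner.Run("dotnet", "test -c Release", TimeSpan.FromMinutes(5)) returning null would then surface as an ordinary test failure. The 5-minute budget is an arbitrary illustrative value.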
Evidence
PR / build: dotnet/sdk #54410 (codeflow [main] Source code updates from dotnet/dotnet), merged head
build 1447367, leg TestBuild: macOS (x64).
Helix job: c2dccf78-b122-463c-908a-c4055562985b (queue osx.15.amd64.open).
Helix summary: 205 passed, 0 real failures, 16 work items killed at job cancellation (exit -4),
2 still running. The 16 items with exit code -4 are collateral from the job timeout, not genuine failures.
Affected dotnet.Tests work items (all hung, all produced hang dumps):
| Work item | Hung test | Elapsed before dump |
|---|---|---|
| dotnet.Tests.dll.19 | GivenDotnetTestBuildsAndRunsTests.RunOnProjectWithClassLibrary_ShouldReturnExitCodeSuccess | 1h 02m+ |
| dotnet.Tests.dll.20 | GivenDotnetTestBuildsAndRunsTestsForMultipleTFMs.RunProjectWithMultipleTFMs_ParallelizationTest_RunInParallelShouldFail | 1h 21m |
| dotnet.Tests.dll.21 | GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode | 1h 22m |
Timeline (work item .21, from dotnetTestLog.host.*.log and dotnetTestLog.datacollector.*.log):
06:39:16.343 RecordStart: GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode
06:39:16.352 [OUTPUT] Executing 'dotnet test -c Release': ← test spawns child `dotnet test`
06:39:16.xxx (immediately prior test `MTPHelpSnapshotTests.VerifyMTPHelpOutput` ran `dotnet test --help` and PASSED in 9s — harness healthy)
06:39:49.954 [Long Running Test] ... Elapsed: 00:00:33 ← already flagged long-running
‹total silence — no test activity for a full hour›
07:39:16.590 "The specified inactivity time of 60 minutes has elapsed. Collecting hang dumps from testhost and its child processes"
07:39:16.xxx Process tree dumped: dotnet.Tests (testhost) + 2 live child `dotnet` processes
07:42:03 collector kills testhost + both child `dotnet` processes
Blame config in effect: <CollectDumpOnTestSessionHang TestTimeout="60m" HangDumpType="Full" />.
The fact that two child dotnet processes were still alive and had to be killed confirms the child
dotnet test orchestrator (and the spawned no-op app) never exited — dotnet test sat waiting for a
handshake/exit indefinitely.
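For illustration only, the expected fail-fast behavior amounts to putting a deadline on the handshake wait. The sketch below is not the MTP implementation: HandshakeWait, WaitForHandshakeAsync, and the idea of representing the handshake as a Task are all assumptions made for the example. It relies on Task.WaitAsync(TimeSpan), available in .NET 6 and later.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: bounds the wait for a handshake that may never arrive.
public static class HandshakeWait
{
    // Returns true if the handshake task completed within the deadline,
    // false if the deadline elapsed first.
    public static async Task<bool> WaitForHandshakeAsync(Task handshake, TimeSpan deadline)
    {
        try
        {
            // WaitAsync throws TimeoutException when the deadline elapses
            // before the awaited task completes.
            await handshake.WaitAsync(deadline);
            return true;
        }
        catch (TimeoutException)
        {
            return false;
        }
    }
}
```

On a false result, an orchestrator following this pattern could terminate the child process and return a failure exit code rather than waiting indefinitely.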
Test source: test/dotnet.Tests/CommandTests/Test/GivenDotnetTestRunsConsoleAppWithoutHandshake.cs:19
Test asset: ConsoleAppDoesNothing
Hang dumps are available but would need to be analyzed on a macOS machine.
Repro artifacts
- Helix job: c2dccf78-b122-463c-908a-c4055562985b
- Work items: dotnet.Tests.dll.19, dotnet.Tests.dll.20, dotnet.Tests.dll.21
- Files per work item: *_hangdump.dmp (Mach-O), dotnetTestLog.host.*.log, dotnetTestLog.datacollector.*.log, dotnetTestLog.log, *-msbuild-dotnet-test.binlog
/cc @dotnet/dotnet-testing-admin @Youssef1313