
dotnet test (MTP) hangs instead of failing when the test app never handshakes — macOS CI timeout #54580

@dsplaisted

Description


Area: Area-dotnet test (MTP)
Type: Bug / Test Debt (flaky CI hang)
Severity: High — hangs a dotnet test run for 60+ min, times out the whole macOS CI leg (150-min cap)

Summary

On macOS CI, the SDK integration test
Microsoft.DotNet.Cli.Test.Tests.GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode
hangs for the full 60-minute blame-hang inactivity window. The test invokes the product
dotnet test (Microsoft.Testing.Platform / MTP) against a console app that does nothing and never
performs the MTP handshake. The product is supposed to detect the missing handshake and exit with a
failure code; instead, dotnet test hangs indefinitely, so the test never returns.

This appears to cause macOS Helix jobs to time out. The hang is intermittent, or possibly specific to macOS x64.

Expected behavior

dotnet test against a console app that does not handshake with MTP should fail fast with a meaningful
error and return ExitCodes.GenericFailure (this is exactly what the test asserts):

result.ExitCode.Should().Be(ExitCodes.GenericFailure,
    "dotnet test should fail with a meaningful error when run against console app without MTP handshake");
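The fail-fast behavior asserted above amounts to bounding the wait for the handshake. A minimal sketch, assuming the orchestrator can observe the handshake as a Task; `WaitForHandshakeAsync`, `RunWithHandshakeTimeoutAsync`, the timeout value and the exit-code value 1 for GenericFailure are hypothetical illustrations, not taken from the SDK source:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK API): true only if `handshake` completes
// successfully before `timeout` elapses; never waits longer than `timeout`.
static async Task<bool> WaitForHandshakeAsync(Task handshake, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource();
    Task delay = Task.Delay(timeout, cts.Token);
    Task finished = await Task.WhenAny(handshake, delay);
    cts.Cancel(); // stop the timer if the handshake won the race
    return finished == handshake && handshake.IsCompletedSuccessfully;
}

// Hypothetical orchestration step: turn a missing handshake into a failure
// exit code instead of blocking forever.
static async Task<int> RunWithHandshakeTimeoutAsync(Task handshake, TimeSpan timeout)
{
    const int Success = 0;
    const int GenericFailure = 1; // assumed to match ExitCodes.GenericFailure

    if (!await WaitForHandshakeAsync(handshake, timeout))
    {
        Console.Error.WriteLine(
            $"The test application did not perform the MTP handshake within {timeout.TotalSeconds:0.##}s.");
        return GenericFailure;
    }
    return Success;
}

// A console app that "does nothing" never signals the handshake.
Task neverSignaled = new TaskCompletionSource<bool>().Task;
int exitCode = await RunWithHandshakeTimeoutAsync(neverSignaled, TimeSpan.FromMilliseconds(200));
Console.WriteLine($"exit code: {exitCode}"); // prints "exit code: 1"
```

Task.WhenAny races the handshake against a timer, so neither side can stall the caller; a faulted handshake is treated the same as a missing one.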

Actual behavior

dotnet test never returns. The child dotnet process(es) stay alive waiting for a handshake / exit
that never comes, so the xunit test method blocks in Process.WaitForExit, the test session goes idle,
and after 60 minutes the blame collector force-dumps and kills the process tree.
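Independently of the product fix, the test blocks in an unbounded Process.WaitForExit, so a product hang only surfaces as a 60-minute blame dump. A minimal harness-side sketch of a bounded wait: `Process.WaitForExit(int)` and `Process.Kill(bool entireProcessTree)` are real .NET APIs (the latter since .NET Core 3.0), while `RunWithTimeout` and the use of `sleep` as a stand-in for a hung `dotnet test` are illustrative assumptions (POSIX only):

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness helper: returns the exit code, or null if the process
// did not exit within `timeout` (in which case the whole process tree is killed).
static int? RunWithTimeout(ProcessStartInfo startInfo, TimeSpan timeout)
{
    using Process process = Process.Start(startInfo)
        ?? throw new InvalidOperationException($"Could not start '{startInfo.FileName}'.");

    if (!process.WaitForExit((int)timeout.TotalMilliseconds))
    {
        // Kill the child and any grandchildren it spawned (e.g. the no-op app).
        process.Kill(entireProcessTree: true);
        process.WaitForExit(); // reap the killed process
        return null;
    }

    return process.ExitCode;
}

// Illustration on macOS/Linux: `sleep 30` stands in for a hung `dotnet test`.
int? result = RunWithTimeout(new ProcessStartInfo("sleep", "30"), TimeSpan.FromSeconds(1));
Console.WriteLine(result is null ? "timed out; process tree killed" : $"exited with {result}");
```

Returning null instead of throwing lets the caller report a clear "dotnet test did not exit in time" failure; the timeout would need to sit well below the blame collector's 60 minutes to help.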

Evidence

PR / build: dotnet/sdk #54410 (codeflow [main] Source code updates from dotnet/dotnet), merged head
build 1447367, leg TestBuild: macOS (x64).
Helix job: c2dccf78-b122-463c-908a-c4055562985b (queue osx.15.amd64.open).
Helix summary: 205 passed, 0 real failures, 16 work items killed at job cancellation (exit -4),
2 still running. The 16 items that exited with -4 are collateral damage from the job timeout, not genuine failures.

Affected dotnet.Tests work items (all hung, all produced hang dumps):

  • dotnet.Tests.dll.19: GivenDotnetTestBuildsAndRunsTests.RunOnProjectWithClassLibrary_ShouldReturnExitCodeSuccess (1h 02m+ elapsed before dump)
  • dotnet.Tests.dll.20: GivenDotnetTestBuildsAndRunsTestsForMultipleTFMs.RunProjectWithMultipleTFMs_ParallelizationTest_RunInParallelShouldFail (1h 21m elapsed before dump)
  • dotnet.Tests.dll.21: GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode (1h 22m elapsed before dump)

Timeline (work item .21, from dotnetTestLog.host.*.log and dotnetTestLog.datacollector.*.log):

06:39:16.343  RecordStart: GivenDotnetTestRunsConsoleAppWithoutHandshake.RunConsoleAppDoesNothing_ShouldReturnCorrectExitCode
06:39:16.352  [OUTPUT] Executing 'dotnet test -c Release':            ← test spawns child `dotnet test`
06:39:16.xxx  (immediately prior test `MTPHelpSnapshotTests.VerifyMTPHelpOutput` ran `dotnet test --help` and PASSED in 9s — harness healthy)
06:39:49.954  [Long Running Test] ... Elapsed: 00:00:33               ← already flagged long-running
   ‹total silence — no test activity for a full hour›
07:39:16.590  "The specified inactivity time of 60 minutes has elapsed. Collecting hang dumps from testhost and its child processes"
07:39:16.xxx  Process tree dumped:  dotnet.Tests (testhost) + 2 live child `dotnet` processes
07:42:03      collector kills testhost + both child `dotnet` processes

Blame config in effect: <CollectDumpOnTestSessionHang TestTimeout="60m" HangDumpType="Full" />.
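The timestamps line up with that 60-minute TestTimeout: the hang dump fired 60m 0.247s after RecordStart, but only 59m 26.636s after the last log line, which is consistent with the timer being measured from the start of this test. A small sketch of the arithmetic, using only the time-of-day values from the timeline and assuming all of them fall on the same day:

```csharp
using System;
using System.Globalization;

// Time-of-day values copied from the work item .21 timeline.
TimeSpan recordStart = TimeSpan.Parse("06:39:16.343", CultureInfo.InvariantCulture);
TimeSpan lastActivity = TimeSpan.Parse("06:39:49.954", CultureInfo.InvariantCulture);
TimeSpan hangDetected = TimeSpan.Parse("07:39:16.590", CultureInfo.InvariantCulture);
TimeSpan configuredTimeout = TimeSpan.FromMinutes(60); // TestTimeout="60m"

TimeSpan sinceStart = hangDetected - recordStart;
TimeSpan sinceLastLog = hangDetected - lastActivity;

Console.WriteLine($"since RecordStart:   {sinceStart}");   // 01:00:00.2470000
Console.WriteLine($"since last log line: {sinceLastLog}"); // 00:59:26.6360000
Console.WriteLine($"over the timeout by: {sinceStart - configuredTimeout}"); // 00:00:00.2470000
```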

The fact that two child dotnet processes were still alive and had to be killed confirms that the child
dotnet test orchestrator (and the spawned no-op app) never exited: dotnet test sat waiting indefinitely
for a handshake or exit.

Test source: test/dotnet.Tests/CommandTests/Test/GivenDotnetTestRunsConsoleAppWithoutHandshake.cs:19
Test asset: ConsoleAppDoesNothing

Hang dumps are available but would need to be analyzed on a macOS machine.

Repro artifacts

  • Helix job: c2dccf78-b122-463c-908a-c4055562985b
  • Work items: dotnet.Tests.dll.19, dotnet.Tests.dll.20, dotnet.Tests.dll.21
  • Files per work item: *_hangdump.dmp (Mach-O), dotnetTestLog.host.*.log,
    dotnetTestLog.datacollector.*.log, dotnetTestLog.log, *-msbuild-dotnet-test.binlog

/cc @dotnet/dotnet-testing-admin @Youssef1313
