Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler crashes when failing to release Mutex #53420

Open
ViktorHofer opened this issue May 28, 2021 · 37 comments
Open

Compiler crashes when failing to release Mutex #53420

ViktorHofer opened this issue May 28, 2021 · 37 comments
Labels
area-PAL-coreclr bug Known Build Error Use this to report build issues in the .NET Helix tab os-linux Linux OS (any supported distro)
Milestone

Comments

@ViktorHofer
Copy link
Member

ViktorHofer commented May 28, 2021

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "ReleaseMutex failed. WaitOne Id:",
  "BuildRetry": true,
  "ErrorPattern": "",
  "ExcludeConsoleLog": true
}

Example error:

/__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: Unexpected exception:  [/__w/1/s/src/tasks/installer.tasks/installer.tasks.csproj]
/__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14) [/__w/1/s/src/tasks/installer.tasks/installer.tasks.csproj]

Build: https://dev.azure.com/dnceng/public/_build/results?buildId=1161146&view=logs&j=1a1e7658-955a-5bfe-8410-74fa4a1a1fbf&t=9528cbd6-a8dd-5006-9c48-38938c12a3af

Configuration: CoreCLR Product Build Linux arm64 release

cc @jkotas @jaredpar @stephentoub

Known issue validation

Build: 🔎
Result validation: ⚠️ Provided build not found. Provide a valid build in the "Build: 🔎" line.
Validation performed at: 9/15/2023 5:21:46 PM UTC

Report

Build Definition Step Name Console log Pull Request
730977 dotnet/runtime Build product Log
2486515 dotnet-runtime Build product Log
717803 dotnet/runtime Build product Log #103804
716419 dotnet/runtime Build product Log #103846
711889 dotnet/runtime Build product Log #103617
707785 dotnet/runtime Build product Log #99818
707074 dotnet/runtime Build product Log #103439
704514 dotnet/runtime Build product Log #103322
703069 dotnet/runtime Build product Log

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 2 9
@ViktorHofer ViktorHofer added this to the 6.0.0 milestone May 28, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 28, 2021
@ViktorHofer ViktorHofer changed the title Compiler crashes with failing to release Mutex Compiler crashes when failing to release Mutex May 28, 2021
@jkotas
Copy link
Member

jkotas commented May 28, 2021

Is the exception that it failed with written in some log?

@ghost
Copy link

ghost commented May 28, 2021

Tagging subscribers to this area: @cston
See info in area-owners.md if you want to be subscribed.

Issue Details
/__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: Unexpected exception:  [/__w/1/s/src/tasks/installer.tasks/installer.tasks.csproj]
/__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14) [/__w/1/s/src/tasks/installer.tasks/installer.tasks.csproj]

Build: https://dev.azure.com/dnceng/public/_build/results?buildId=1161146&view=logs&j=1a1e7658-955a-5bfe-8410-74fa4a1a1fbf&t=9528cbd6-a8dd-5006-9c48-38938c12a3af

Configuration: CoreCLR Product Build Linux arm64 release

cc @jkotas @jaredpar @stephentoub

Author: ViktorHofer
Assignees: -
Labels:

area-Microsoft.CSharp, bug, untriaged

Milestone: 6.0.0

@hoyosjs
Copy link
Member

hoyosjs commented May 28, 2021

@jaredpar
Copy link
Member

That crash is coming from this line

https://sourceroslyn.io/#Microsoft.Build.Tasks.CodeAnalysis/BuildServerConnection.cs,239

When communicating between MSBuild nodes and the Compiler server we use a Mutex as a part of the initial handshake. The code here is crashing because when we release the Mutex via Dispose we're getting an ApplicationException.

Our code considers this fatal because the docs say that ApplicationException will only be thrown when the thread calling Dispose does not own the Mutex. Basically coding error on our part that is significant. That is also why we log the thread IDs in the message here. You'll notice though in this case that the thread IDs are the same though which means the same thread both opened and released the Mutex which should be safe (based on our understanding).

What other circumstances can cause an ApplicationException to happen here?

@RikkiGibson, @chsienki, @cston

@cston
Copy link
Member

cston commented May 31, 2021

in this case that the thread IDs are the same though which means the same thread both opened and released the Mutex which should be safe

Could the Mutex have been disposed on another thread?

@jaredpar
Copy link
Member

jaredpar commented Jun 1, 2021

In our case unlikely. This Mutex instance is created and managed on a per-thread basis. It's a local that isn't shared amongst threads and the message confirms that there isn't any tricky await code involved that causes us to inadvertently switch threads.

The one issue to keep in mind is that we have an abstraction over Mutex here to deal with the difference implementation between the server on the Mono and .NET runtime. Hence the Dispose call is really a wrapper over:

        public void Dispose()
        {
            if (IsDisposed)
                return;
            IsDisposed = true;
 
            try
            {
                if (IsLocked)
                    Mutex.ReleaseMutex();
            }
            finally
            {
                Mutex.Dispose();
                IsLocked = false;
            }
        }

One possibility is we are incorrectly setting IsLocked which would lead to this exception being thrown.

@jkotas
Copy link
Member

jkotas commented Jun 2, 2021

Failed again in #53590 . I think it is most likely Linux arm64 specific bug in CoreCLR PAL.

@jkotas jkotas unassigned cston Jun 2, 2021
@jkotas jkotas added arch-arm64 os-linux Linux OS (any supported distro) and removed area-Microsoft.CSharp labels Jun 2, 2021
@jkotas
Copy link
Member

jkotas commented Jun 2, 2021

Current runfo query shows a hit on x64 too, so it is not arm64 specific.

@danmoseley
Copy link
Member

Again, but in Mono Android Arm64 release build, using 3.1.

2021-06-25T16:42:48.0342786Z  /__w/1/s/build.sh -ci -arch x64 -os Android -s mono+libs+host+packs+mono.mscordbi -c Release /p:OfficialBuildId=20210625.7 /p:CrossBuild=false  
2021-06-25T16:42:48.0343494Z ========================== Starting Command Output ===========================
2021-06-25T16:42:48.0364929Z [command]/bin/bash --noprofile --norc /__w/_temp/455af3b1-60d8-49d1-88ab-abbd6629e02b.sh
2021-06-25T16:42:48.0702381Z __DistroRid: android-x64
2021-06-25T16:42:48.2260529Z /__w/1/s/.dotnet/sdk/6.0.100-preview.4.21255.9/MSBuild.dll /nologo -logger:/__w/1/s/.packages/microsoft.dotnet.arcade.sdk/6.0.0-beta.21311.3/tools/netcoreapp3.1/Microsoft.DotNet.ArcadeLogging.dll -maxcpucount /m -verbosity:m /v:minimal /bl:/__w/1/s/artifacts/log/Release/Build.binlog /clp:Summary /nr:false /p:TreatWarningsAsErrors=true /p:ContinuousIntegrationBuild=true /p:Configuration=Release /p:RepoRoot=/__w/1/s/ /p:Restore=true /p:Build=true /p:Rebuild=false /p:Test=false /p:Pack=false /p:IntegrationTest=false /p:PerformanceTest=false /p:Sign=false /p:Publish=false /p:TargetOS=Android /p:Subset=mono+libs+host+packs+mono.mscordbi /p:TargetArchitecture=x64 /p:BuildArchitecture=x64 /p:CMakeArgs= /p:OfficialBuildId=20210625.7 /p:CrossBuild=false /warnaserror /__w/1/s/.packages/microsoft.dotnet.arcade.sdk/6.0.0-beta.21311.3/tools/Build.proj
...
2021-06-25T16:45:35.4955064Z /__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: Unexpected exception:  [/__w/1/s/src/tasks/Crossgen2Tasks/Crossgen2Tasks.csproj]
2021-06-25T16:45:35.5037748Z ##[error].packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: (NETCORE_ENGINEERING_TELEMETRY=Build) Unexpected exception: 
2021-06-25T16:45:35.5051214Z /__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14) [/__w/1/s/src/tasks/Crossgen2Tasks/Crossgen2Tasks.csproj]
2021-06-25T16:45:35.5056497Z ##[error].packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : (NETCORE_ENGINEERING_TELEMETRY=Build) One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14)
2021-06-25T16:45:35.5059398Z /__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: Unexpected exception:  [/__w/1/s/src/tasks/AotCompilerTask/MonoAOTCompiler.csproj]
2021-06-25T16:45:35.5063602Z ##[error].packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error MSB3883: (NETCORE_ENGINEERING_TELEMETRY=Build) Unexpected exception: 
2021-06-25T16:45:35.5089855Z /__w/1/s/.packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14) [/__w/1/s/src/tasks/AotCompilerTask/MonoAOTCompiler.csproj]
2021-06-25T16:45:35.5094289Z ##[error].packages/microsoft.net.compilers.toolset/3.10.0-4.21273.6/tasks/netcoreapp3.1/Microsoft.CSharp.Core.targets(71,5): error : (NETCORE_ENGINEERING_TELEMETRY=Build) One or more errors occurred. (ReleaseMutex failed. WaitOne Id: 14 Release Id: 14)

@janvorli
Copy link
Member

@jaredpar I would like to look into this issue - is there a reasonable way to repro it locally?

@am11
Copy link
Member

am11 commented Dec 13, 2021

Looks like this condition is failing:

if (IsSignaled || _ownershipInfo!.Thread != Thread.CurrentThread)

If current thread was the owning thread, then the only other explanation is waitableobject was already signaled.

@kouvel
Copy link
Member

kouvel commented Dec 14, 2021

@am11 I don't think that would be it since that implementation does not support named mutexes.

I didn't see anything obvious that would cause a ReleaseMutex to fail with ApplicationException. Another reason for throwing that exception is if the handle does not represent a mutex for some reason, though that would be unlikely.

Maybe it would be useful to add some more info to the exception message, like maybe:

  • Whether the mutex was created, and whether the mutex was abandoned in TryLock
  • The handle value seen after creation and before release
  • Include the exception stack trace in the message

Not sure it would help much but may at least rule something out.

Noticed something else, though it probably wouldn't lead to this issue. This line should probably catch AbandonedMutexException and set IsLocked to true in that case, instead of catching it in the caller, otherwise IsLocked would remain false upon exception and the thread would abandon the mutex again.

@am11
Copy link
Member

am11 commented Dec 14, 2021

to fail with ApplicationException

Do we know where is this coming from? (if not WaitSubsystem.WaitableObject.Unix.cs#714)

@kouvel
Copy link
Member

kouvel commented Dec 14, 2021

Do we know where is this coming from? (if not WaitSubsystem.WaitableObject.Unix.cs#714)

It's probably from here:

throw new ApplicationException(SR.Arg_SynchronizationLockException);

After getting an exception here:

catch (SharedMemoryException ex)

@am11
Copy link
Member

am11 commented Dec 15, 2021

Hm, original report was for Linux arm64, so I was tracing Unix partials (which calls into the waitsubsystem).

@kouvel
Copy link
Member

kouvel commented Dec 15, 2021

Mutex.Unix.cs is only used with the WaitSubsystem implementation, on CoreCLR the Unix implementation (confusingly) uses Mutex.Windows.cs, the p/invoke goes into the PAL, which implements Windows-like APIs such as ReleaseMutex.

@mangod9
Copy link
Member

mangod9 commented Jul 3, 2023

this is an old issue, is this something which needs to be looked at ?

@kouvel
Copy link
Member

kouvel commented Jul 18, 2023

Closing for now, as there haven't been recent reports and there isn't sufficient info available to diagnose the issue. Issue #89090 covers adding more info about system call errors to named mutex exceptions.

@kouvel kouvel closed this as completed Jul 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 17, 2023
@ericstj
Copy link
Member

ericstj commented Sep 15, 2023

@ericstj ericstj reopened this Sep 15, 2023
@ericstj ericstj added the Known Build Error Use this to report build issues in the .NET Helix tab label Sep 15, 2023
@mangod9 mangod9 modified the milestones: 8.0.0, 9.0.0 Sep 20, 2023
@build-analysis build-analysis bot removed this from the 9.0.0 milestone Nov 15, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2023
@akoeplinger akoeplinger added this to the 9.0.0 milestone Nov 24, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 24, 2023
@dotnet dotnet unlocked this conversation Nov 24, 2023
@Somfic
Copy link

Somfic commented Feb 2, 2024

Hey, this just hit on my ci/cd, (macos) https://github.com/Somfic/vla/actions/runs/7751958359/job/21140666679?pr=74

@jaredpar
Copy link
Member

jaredpar commented Feb 2, 2024

From @Somfic log

ReleaseMutex failed. WaitOne Id: 16 Release Id: 16

Confirms that the calls are indeed happening on the same thread id.

@AaronRobinsonMSFT
Copy link
Member

I'm seeing this while building on my macOS M2.

@kouvel
Copy link
Member

kouvel commented Apr 4, 2024

I'm seeing this while building on my macOS M2.

There should be better diagnostic info in .NET 9 including info from system call errors (#92603). Can you share more details about the failure?

@AaronRobinsonMSFT
Copy link
Member

@kouvel Not really. The error message was basically the same as this issue. Let me see if I can reproduce it.

@kouvel
Copy link
Member

kouvel commented Apr 4, 2024

Ah right I forgot the extra info is not tracked for ReleaseMutex. I'll also try to repro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-PAL-coreclr bug Known Build Error Use this to report build issues in the .NET Helix tab os-linux Linux OS (any supported distro)
Projects
None yet
Development

No branches or pull requests