
mono_os_mutex_destroy: pthread_mutex_destroy failed #97565

Closed
lewing opened this issue Jan 26, 2024 · 9 comments · Fixed by #99332
Labels
area-Diagnostics-mono, blocking-clean-ci, Known Build Error

Comments

@lewing
Member

lewing commented Jan 26, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=542041
Build error leg or test failing: tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd
Pull request: #97553

Error Message


{
  "ErrorMessage": "mono_os_mutex_destroy: pthread_mutex_destroy failed with \"Resource busy\" (16)",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
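The message above comes from a wrapper that treats any non-zero return from pthread_mutex_destroy as fatal. The sketch below shows that pattern with plain POSIX threads. It is a hedged illustration: `destroy_mutex_checked` and `destroy_idle_mutex` are hypothetical names, not Mono's actual `mono_os_mutex_destroy`, and the sketch returns the error instead of aborting so it can be observed. The "(16)" in the message is the error number EBUSY, which is 16 on Linux and macOS; the text comes from `strerror`, so its exact wording varies by platform.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modeled on the error message in this issue.
 * pthread_mutex_destroy returns an error number directly; it does not set errno. */
static int destroy_mutex_checked(pthread_mutex_t *mutex)
{
    int res = pthread_mutex_destroy(mutex);
    if (res != 0) {
        /* The runtime aborts at this point; this sketch only reports and returns. */
        fprintf(stderr, "pthread_mutex_destroy failed with \"%s\" (%d)\n",
                strerror(res), res);
    }
    return res;
}

/* Destroying an initialized mutex that nobody holds or waits on succeeds. */
static int destroy_idle_mutex(void)
{
    pthread_mutex_t mutex;
    int res = pthread_mutex_init(&mutex, NULL);
    if (res != 0)
        return res;
    return destroy_mutex_checked(&mutex);
}
```

POSIX leaves destroying a locked mutex undefined; glibc and macOS typically report EBUSY in that case, which matches the failure in this crash.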

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=542041
Error message validated: mono_os_mutex_destroy: pthread_mutex_destroy failed with "Resource busy" (16)
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 1/26/2024 5:28:29 PM UTC

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 588137 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #99251 |
| 586983 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #99040 |
| 584536 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98495 |
| 584243 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #99112 |
| 579704 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98946 |
| 577037 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #96440 |
| 575061 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98469 |
| 574152 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98623 |
| 570024 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | |
| 570057 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #97135 |
| 568701 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98129 |
| 561807 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | |
| 558019 | dotnet/runtime | tracing/runtimeeventsource/nativeruntimeeventsource/nativeruntimeeventsource.cmd | #98138 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 4 | 13 |
lewing added the blocking-clean-ci and Known Build Error labels Jan 26, 2024
dotnet-issue-labeler bot added the needs-area-label label Jan 26, 2024
ghost added the untriaged label Jan 26, 2024
@lambdageek
Member

Stack trace:

mono_os_mutex_destroy: pthread_mutex_destroy failed with "Resource busy" (16)

=================================================================
	Native Crash Reporting
=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

=================================================================
	Native stacktrace:
=================================================================
	0x1074d0785 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mono_dump_native_crash_info
	0x10746e2be - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mono_handle_native_crash
	0x1076c6858 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : sigabrt_signal_handler.cold.1
	0x1074d0100 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mono_runtime_setup_stat_profiler
	0x7ff8047cadfd - /usr/lib/system/libsystem_platform.dylib : _sigtramp
	0x0 - Unknown
	0x7ff804700d14 - /usr/lib/system/libsystem_c.dylib : abort
	0x10757d238 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : monoeg_assert_abort
	0x10758b7ca - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mono_log_write_logfile
	0x10757d6a8 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : monoeg_g_logv
	0x10757d842 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : monoeg_g_log
	0x10755247b - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : ep_rt_mono_fini
	0x1073c32ac - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mini_cleanup
	0x1074269f8 - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : mono_main
	0x1074ab29d - /private/tmp/helix/working/C2010A91/p/libcoreclr.dylib : monovm_execute_assembly
	0x106e4e702 - /private/tmp/helix/working/C2010A91/p/corerun : _ZL3runRK13configuration
	0x106e4aa72 - /private/tmp/helix/working/C2010A91/p/corerun : main
	0x10c3f352e - Unknown

So it's EventPipe cleanup. I think we saw something similar recently...

lambdageek added the area-Diagnostics-mono label and removed the needs-area-label label Jan 26, 2024
@lambdageek
Member

Might be related to #85960 (comment)

@tommcdon
Member

Possibly addressed with #96936
@davmason

@davmason
Member

The callstack pasted above is not related to #96936; this is crashing in ep_rt_mono_fini, which is Mono-specific cleanup. It's likely a similar issue, though, just with a different resource.

@tommcdon
Copy link
Member

> The callstack pasted above is not related to #96936; this is crashing in ep_rt_mono_fini, which is Mono-specific cleanup. It's likely a similar issue, though, just with a different resource.

@lambdageek would you mind taking a second look and/or provide pointers to @davmason for next steps?

@lambdageek
Member

Next steps are to do less cleanup in ep_rt_mono_fini. As Johan mentioned in #85960 (comment):

> ep_rt_mono_fini assumes that all threads that might run EventPipe code have been stopped, so if there are still threads that can call into EventPipe at that point, it will race with the shutdown logic.
>
> On CoreCLR/NativeAOT we don't do any cleanup in ep_rt_shutdown, and those runtimes leak resources, but on Mono we do clean up runtime resources. We probably need to detect whether shutdown is triggered in a way where other managed threads might still be running when ep_rt_shutdown is called; if so, we would probably need to leak these resources.

I'm not sure what Johan had in mind, but one possibility is to call mono_runtime_is_shutting_down and, if it returns FALSE, make ep_rt_mono_fini exit early without cleaning up. But I'm not 100% certain that this will account for all situations where EventPipe might be shutting down while managed threads are still running.
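The early-exit idea can be sketched in C as follows. This is a hedged illustration, not the actual fix: `runtime_is_shutting_down`, `ep_fini_sketch`, `ep_lock`, and the `shutting_down` flag are hypothetical stand-ins for mono_runtime_is_shutting_down, ep_rt_mono_fini, and whatever EventPipe resources it frees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for the runtime's shutdown state. */
static atomic_bool shutting_down = false;

/* Hypothetical EventPipe-owned resource. */
static pthread_mutex_t ep_lock = PTHREAD_MUTEX_INITIALIZER;
static bool ep_lock_destroyed = false;

/* Hypothetical stand-in for mono_runtime_is_shutting_down(). */
static bool runtime_is_shutting_down(void)
{
    return atomic_load(&shutting_down);
}

/* Best-effort fini: if orderly shutdown never started, other threads may
 * still use ep_lock, so skip cleanup and deliberately leak it. */
static void ep_fini_sketch(void)
{
    if (!runtime_is_shutting_down())
        return;
    if (!ep_lock_destroyed && pthread_mutex_destroy(&ep_lock) == 0)
        ep_lock_destroyed = true;
}
```

As the comment above notes, this only covers the case where shutdown never started. Once the flag is set, another thread could still hold `ep_lock` when it is destroyed, so the guard narrows the race rather than removing it.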

tommcdon added this to the 9.0.0 milestone Feb 26, 2024
ghost removed the untriaged label Feb 26, 2024
@davmason
Member

@lambdageek, I spent a little time looking at the Mono code, and I can't convince myself that mono_runtime_is_shutting_down () guarantees that we are not running managed code. Looking at mono_runtime_try_shutdown (), it looks like we just stop creating new threads and have no guarantee that current threads are gone. Am I missing something?
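The behavior described here (a shutdown flag that stops new threads from starting but does not wait for running ones) can be modeled in a few lines. The names below (`try_shutdown`, `try_register_thread`, `running_threads`) are hypothetical; this is not Mono's implementation of mono_runtime_try_shutdown, only a sketch of why such a flag says nothing about threads that are already running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: starting shutdown flips a flag, nothing waits on threads. */
static atomic_bool shutdown_started = false;
static atomic_int running_threads = 0;

/* Returns true only for the first caller, like a one-shot shutdown gate. */
static bool try_shutdown(void)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&shutdown_started, &expected, true);
}

/* Refuses new threads once shutdown has begun. The load and the increment
 * are separate steps, so this model does not handle a thread registering
 * concurrently with try_shutdown. */
static bool try_register_thread(void)
{
    if (atomic_load(&shutdown_started))
        return false;
    atomic_fetch_add(&running_threads, 1);
    return true;
}
```

After `try_shutdown` succeeds, `running_threads` can still be non-zero: new threads are refused, but the one registered earlier is still counted as running, which is the gap described in the comment.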

@lambdageek
Member

@davmason, you're probably not missing anything. I don't think we have a way to know when all the existing managed threads are really stopped or gone (we used to have mono_thread_suspend_all_other_threads in "classic" Mono, but in modern .NET we no longer try to stop other threads before exiting; that wouldn't work on platforms like WASM anyway, where we don't have signals).

My suggestion was more of a "best effort": if we know we haven't started shutting down at all, don't even try to do any cleanup in ep_rt_mono_fini. If, on the other hand, shutdown has started, we may try to clean up, but we could still hit the deadlock.
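For contrast, the guarantee discussed above as missing is the one an explicit join provides: a lock shared with worker threads can be destroyed safely only after every thread that might take it has exited. A minimal pthreads sketch follows; `worker`, `shared_lock`, `shared_counter`, and `run_and_cleanup` are hypothetical stand-ins for EventPipe state, not Mono code. The runtime cannot do the join step in general, since, as noted above, modern .NET does not stop other managed threads before exiting.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4
#define ITERATIONS 1000

/* Hypothetical shared state standing in for EventPipe runtime resources. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&shared_lock);
        shared_counter++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Starts the workers, joins every one that was created, and only then
 * destroys the lock. Returns -1 if a thread could not be created, otherwise
 * the result of pthread_mutex_destroy (0 on success). Call at most once. */
static int run_and_cleanup(void)
{
    pthread_t threads[NUM_WORKERS];
    int created = 0;
    int res = 0;

    for (; created < NUM_WORKERS; created++) {
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0) {
            res = -1;
            break;
        }
    }
    /* The joins make cleanup safe: once every worker has returned, no thread
     * can still hold or wait on shared_lock. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    if (res != 0)
        return res;
    return pthread_mutex_destroy(&shared_lock);
}
```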

ghost added the in-pr label Mar 5, 2024
ghost removed the in-pr label Mar 6, 2024
github-actions bot locked and limited conversation to collaborators Apr 7, 2024
@pavelsavara
Member

> it wouldn't work on platforms like WASM, anyway, where we don't have signals).

Probably off-topic here, but we could kill all threads on Emscripten via JavaScript if we are on the UI thread. I'm working on it here, @lambdageek.

5 participants