Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge slowdowns on threaded operations when debugger attached (macOS) #10605

Closed
peppy opened this issue Jun 29, 2018 · 32 comments
Closed

Huge slowdowns on threaded operations when debugger attached (macOS) #10605

peppy opened this issue Jun 29, 2018 · 32 comments
Assignees
Milestone

Comments

@peppy
Copy link

peppy commented Jun 29, 2018

When debugging our application (with attached debugger, no breakpoints), performance can drop to a point it becomes frustrating to do anything. On occasion we are also seeing OS level hard locking for seconds to minutes, which may be related.

Reproducible in both VSCode and Jetbrains Rider. This is exclusive to netcore (2.0 and 2.1) – does not occur under mono or net471 runtime environments. It also seems limited to macOS as I have not been able to reproduce on windows.

This can easily be reproduced on our game framework project: https://github.com/ppy/osu-framework (building should require not extra steps beyond checking it out).

  • Start in VisualTests configuration
  • Switch to DelayedLoad in the left menu
  • Observe severe frame drops when threaded load events occur

Testing with debugger attached should drop to less than 1fps while it is easy to maintain hundreds without a debugger attached.

It seems to be directly related to creation of threads, specifically with the TaskCreationOptions.LongRunning flag. On removing this flag from hot paths (#1 #2), performance will return to normal.

I've been trying to reproduce this with a more isolated test case but have not succeeded yet. Some pointers on moving forward in diagnosing this issue would be appreciated!

@RussKeldorph
Copy link
Contributor

@dotnet/dotnet-diag

@peppy
Copy link
Author

peppy commented Jul 5, 2018

Here's a sample taken in our application while artificially reproducing the slowdown by triggering a

Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning);

100 times a second:

Sample of dotnet.txt

Here's the same exact execution run under net471 (mono) for comparison:

Sample of mono.txt

@peppy
Copy link
Author

peppy commented Jul 18, 2018

Is there any further details I can provide, or avenues for getting this investigated? It's a pretty critical issue for us and is likely affecting other projects on macOS targets. Currently the workaround is to never attach a debugger or use a VM to debug under windows.

@mikem8361
Copy link
Member

Sorry, we haven't had time to investigate this yet. I'll keep you informed of any status.

/cc: @tommcdon

@peppy
Copy link
Author

peppy commented Aug 27, 2018

We've removed all LongRunning usage and are still running into slowdowns. It has actually gotten worse with increasing use of async-await in our project to the point we can no longer use the debugger outside of a windows VM.

@smoogipoo
Copy link

Hiya! We've been looking into resolving this issue since it's very critical for us.

Just recently we've noticed that Debugger.NotifyOfCrossThreadDependency() is taking more than an order of magnitude longer on macOS than on Windows.

Here's a simple test:

using System;
using System.Diagnostics;

namespace DebuggerTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
                Debugger.NotifyOfCrossThreadDependency();
            sw.Stop();

            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

On Windows this takes ~0.1ms per call (in a VM), and on macOS it takes ~2.6ms per call.

Since this function is called very readily through iterations of the async state machine, I believe this may provide some insight into why debugging performance is pretty poor.

@mikem8361
Copy link
Member

@stephentoub do you have any ideas on this problem?

@jkotas
Copy link
Member

jkotas commented Aug 28, 2018

Debugger.NotifyOfCrossThreadDependency is implemented by calling Debugger::SendCustomDebuggerNotification. This suggests that the handshake done by Debugger::SendCustomDebuggerNotification is slow on macOS.

@mikem8361
Copy link
Member

As far as I can tell there is nothing different about this notification from all notifications. It does stop/suspend all the threads and sends a message via the named pipes to the debugger side.

@mikem8361
Copy link
Member

On the debugger side it does look up the class token of the notification object to check if the notification is enabled. Maybe @noahfalk when he is back in the office can help.

@noahfalk
Copy link
Member

@mikem8361 - I think the next step would be to do some native profiling of the debugger and determine where it spends its time. You could also do a comparison with something like:

for(int i = 0; i < 10000; i++)
{
     Debugger.Log("message");
}

If I had to make a guess, I wouldn't be surprised if all debugger events are slow on macOS, not only the custom notifications. Custom Notification just happens to be one of the debugger scenarios that can generate a high rate of debugger notifications which would expose the poor performance.

In terms of fixing it one approach is probably to address whatever the performance bottleneck is once it is identified via profiling. A second option is that we could implement a debuggee-side cache for the custom notification filter. Most of the time VS doesn't enable the custom notifications but rather than detecting this immediately in the debuggee, instead the runtime suspends, sends the notification to the debugger, then DBI determines nothing is listening to the event and resumes without having done any work.

@smoogipoo
Copy link

smoogipoo commented Aug 29, 2018

Haven't had time to go into a deeper investigation yet (will probably do so on the weekend), but a quick sampling reveals the following (using Debugger.Log()):

2664 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 609  [0x104505c81]
! 2648 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 486  [0x104484406]
! : 2643 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 343  [0x1046cd107]
! : | 2641 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 248  [0x1046cdad8]
! : | + 2640 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 188  [0x1046cf1bc]
! : | + ! 2639 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 176  [0x1046c6e40]
! : | + ! : 2639 WaitForSingleObjectEx  (in libcoreclr.dylib) + 77  [0x10442776d]
! : | + ! :   2639 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 1891  [0x1044275a3]
! : | + ! :     2639 CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 375  [0x104422c27]
! : | + ! :       2639 CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 338  [0x104423012]
! : | + ! :         2632 _pthread_cond_wait  (in libsystem_pthread.dylib) + 732  [0x7fff6efeb589]
! : | + ! :         | 2632 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff6ee22a16]
! : | + ! :         7 _pthread_cond_wait  (in libsystem_pthread.dylib) + 750,759  [0x7fff6efeb59b,0x7fff6efeb5a4]
! : | + ! 1 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 153  [0x1046c6e29]
! : | + !   1 NativeExceptionHolderBase::Push()  (in libcoreclr.dylib) + 16  [0x1043f41a0]
! : | + !     1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : | + 1 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 209  [0x1046cf1d1]
! : | +   1 CLRException::HandlerState::CleanupTry()  (in libcoreclr.dylib) + 37  [0x1045ef195]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 207  [0x1046cdaaf]
! : | + 1 SetEvent  (in libcoreclr.dylib) + 120  [0x10441e288]
! : | +   1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 55  [0x104410627]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 202  [0x1046cdaaa]
! : 5 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 563  [0x1046cd1e3]
! :   5 SwitchToThread  (in libcoreclr.dylib) + 9  [0x10442ce99]
! :     4 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! :     + 4 swtch_pri  (in libsystem_kernel.dylib) + 10  [0x7fff6ee19306]
! :     1 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! 10 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 426  [0x1044843ca]
! : 5 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 282  [0x10446ed9a]
! : | 4 SetEvent  (in libcoreclr.dylib) + 158  [0x10441e2ae]
! : | + 4 CorUnix::CSynchControllerBase::Release()  (in libcoreclr.dylib) + 130  [0x104420592]
! : | +   3 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 148  [0x1044267e4]
! : | +   ! 3 pthread_cond_signal  (in libsystem_pthread.dylib) + 505  [0x7fff6efec78c]
! : | +   !   3 __psynch_cvsignal  (in libsystem_kernel.dylib) + 10  [0x7fff6ee229fe]
! : | +   1 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 158  [0x1044267ee]
! : | +     1 pthread_mutex_unlock  (in libsystem_pthread.dylib) + 0  [0x7fff6efe86cf]
! : | 1 DebuggerRCThread::WatchForStragglers()  (in libcoreclr.dylib) + 11  [0x1044925ab]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 113  [0x10446ecf1]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 11  [0x10445896b]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 205  [0x10446ed4d]
! : | 1 ThreadSuspend::LockThreadStore(ThreadSuspend::SUSPEND_REASON)  (in libcoreclr.dylib) + 112  [0x1046cce30]
! : |   1 Thread::SetDebugCantStop(bool)  (in libcoreclr.dylib) + 24  [0x1045abe58]
! : |     1 CExecutionEngine::GetTlsData()  (in libcoreclr.dylib) + 13  [0x10450316d]
! : |       1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 270  [0x10446ed8e]
! : | 1 Thread::SysStartSuspendForDebug(AppDomain*)  (in libcoreclr.dylib) + 228  [0x1046ce994]
! : |   1 ResetEvent  (in libcoreclr.dylib) + 120  [0x10441e408]
! : |     1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 71  [0x104410637]
! : |       1 CorUnix::CPalSynchronizationManager::CreateSynchStateController(CorUnix::CPalThread*, CorUnix::CObjectType*, void*, CorUnix::ObjectDomain, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 1  [0x104424171]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 297  [0x10446eda9]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 0  [0x104458960]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 336  [0x10446edd0]
! :   1 DebuggerRCThread::IsRCThreadReady()  (in libcoreclr.dylib) + 36  [0x104493984]
! :     1 WaitForSingleObject  (in libcoreclr.dylib) + 72  [0x104426e38]
! :       1 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 788  [0x104427154]
! 4 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 417  [0x1044843c1]
! : 4 DebuggerRCThread::SendIPCEvent()  (in libcoreclr.dylib) + 219  [0x10449394b]
! :   4 Debugger::SendRawEvent(DebuggerIPCEvent const*)  (in libcoreclr.dylib) + 25  [0x10446e689]
! :     4 DbgTransportSession::SendDebugEvent(DebuggerIPCEvent*)  (in libcoreclr.dylib) + 278  [0x104495056]
! :       2 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 467  [0x104494bf3]
! :       | 2 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :       |   2 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 298  [0x104494b4a]
! :       | 1 _platform_memmove$VARIANT$Haswell  (in libsystem_platform.dylib) + 240  [0x7fff6efe4010]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 499  [0x104494c13]
! :         1 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :           1 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 244  [0x104484314]
! : 1 EEDbgInterfaceImpl::EnablePreemptiveGC()  (in libcoreclr.dylib) + 0  [0x10451a860]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 586  [0x10448446a]
!   1 EventPipeWriteEventDebugIPCEventEnd  (in libcoreclr.dylib) + 16  [0x104832f00]
!     1 EventPipeEvent::IsEnabled() const  (in libcoreclr.dylib) + 7  [0x104625237]
2 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 280  [0x104505b38]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 69  [0x1043f3795]
! : 1 malloc  (in libsystem_malloc.dylib) + 24  [0x7fff6ee7a4c7]
! :   1 malloc_zone_malloc  (in libsystem_malloc.dylib) + 103  [0x7fff6ee7b1bd]
! :     1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 422  [0x7fff6ee7b3bf]
! :       1 tiny_malloc_from_free_list  (in libsystem_malloc.dylib) + 379  [0x7fff6ee7c30e]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 104  [0x1043f37b8]
!   1 WideCharToMultiByte  (in libcoreclr.dylib) + 141  [0x104403f3d]
!     1 UnicodeToUTF8  (in libcoreclr.dylib) + 191  [0x1044051ef]
!       1 UTF8Encoding::GetBytes(char16_t*, int, unsigned char*, int)  (in libcoreclr.dylib) + 772  [0x1044059e4]
1 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 616  [0x104505c88]

And yes, I can confirm that Debugger.Log() is just the same in terms of slowdown.

@pieter-venter
Copy link

I would like to add that we've seen the same behaviour on Linux. Debugging our application is extremely slow on Linux and fast on Windows.

Using your small repo, I got the following results:

On Windows: 465 ms (0.0465 ms per call)
On Linux: 14730 ms (1.473 ms per call)

Tried on Ubuntu 1804 and 1604 with .Net SDK 2.1.401, Runtime Version: 2.1.3 Commit: 124038c13e

The application makes extensive use of async/await and tasks. This is often on the call stack if you randomly break:

image

mikem8361 referenced this issue in mikem8361/coreclr Sep 12, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.
mikem8361 referenced this issue in mikem8361/coreclr Sep 13, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
mikem8361 referenced this issue in dotnet/coreclr Sep 13, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
@mikem8361
Copy link
Member

This has been fixed in master.

@mikem8361 mikem8361 reopened this Sep 13, 2018
@mikem8361
Copy link
Member

Need to check if it meets the bar for 2.2 or even 2.1.x.

@peppy
Copy link
Author

peppy commented Sep 14, 2018

Thanks a lot for the fix! This is going to immensely improve our QoL.

@pieter-venter
Copy link

Thank you @mikem8361
Is there a way I can test it for you? I have a docker image that can build the CLR locally for Linux. I can patch your changes into a local build easily. Which assemblies/so files would I need to swap out - or is there more to it than that?

@mikem8361
Copy link
Member

mikem8361 commented Sep 17, 2018 via email

@peppy
Copy link
Author

peppy commented Sep 17, 2018

I have tested this and confirm it has resolved the issues at my end (macOS). Debug performance is on par with my windows VM now 👍

@pieter-venter
Copy link

I have tested this on the same Ubuntu 1804 machine. The repo above went from 14730 ms to 514 ms, on par with Windows.

mikem8361 referenced this issue in mikem8361/coreclr Sep 19, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
mikem8361 referenced this issue in mikem8361/coreclr Sep 26, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
mikem8361 referenced this issue in mikem8361/coreclr Oct 2, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
@danmoseley
Copy link
Member

@peppy @pieter-venter @smoogipoo we are considering porting this into 2.2, which will release later this year. Is it acceptable for you to upgrade to that to receive the fix?

mikem8361 referenced this issue in mikem8361/coreclr Oct 2, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
@peppy
Copy link
Author

peppy commented Oct 3, 2018

At our end we are tracking the latest releases so are eager to see this love as soon as possible!

In fact we are even willing to use nightly builds for this fix, but unfortunately they do not play well with Rider currently.

@pieter-venter
Copy link

2.2 would be great. We also move to the latest release as soon as possible.

With your guidance, I'd also be open to building a 2.1 release from source and applying the patch locally. My understanding is this fix requires rebuilding the SDK, not just coreclr.

@danmoseley
Copy link
Member

@pieter-venter the 2.1 change would be dotnet/coreclr#20239 - this isn't currently approved for 2.1 but you could build it.

@pieter-venter
Copy link

Thank you for pointing me to the correct pull request. @danmosemsft . This is in the coreclr repo, does that mean I just need to rebuild System.Private.CoreLib.dll or does this change affect other binaries in the SDK?

danmoseley referenced this issue in dotnet/coreclr Oct 4, 2018
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
@danmoseley
Copy link
Member

You would presumably first build CoreCLR from the root, for your target platform. Once that is done @mikem8361 is the best person to say which binaries you need to patch.

@mikem8361
Copy link
Member

You need all the coreclr binaries to make sure everything matches, but it should only affect libcoreclr.dylib libcoredaccore.dylib and libcoredbi.dylib.

@pieter-venter
Copy link

pieter-venter commented Oct 9, 2018

@mikem8361 Thanks for the info. I'm running Ubuntu. I've checked out tags/v.2.1.5 in coreclr repo (that is the version I have installed) and applied the changes in your pull request and rebuilt it. I replaced the equivalent .so files in usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5 you mentioned above (I think the dylib files are for Mac?) and tried the issue reproduction but it was still slow. The debugger still works. This kind of monkey patching not ideal - maybe there are other binaries that I'm missing.
At this point I'll just wait for 2.2 release.

@mikem8361
Copy link
Member

You should copy all the binaries built in the coreclr/bin/Product/Linux.x64.Release dir to /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5.

The fixed has made it in the "release/2.2" coreclr branch so if you checkout that out, do a "clean" build (git clean -xdf) and do the above copy, it should work.

@peppy
Copy link
Author

peppy commented Nov 21, 2018

Just checking back on this – should we be seeing the fix in the latest 2.2 nightlies available here under Release/2.2.1XX (2.2.x Runtime)? It doesn't seem to be applied yet.

@mikem8361
Copy link
Member

mikem8361 commented Nov 23, 2018 via email

@danmoseley
Copy link
Member

@peppy if you install/restore this version you can use ildasm on system.privare.corelib.dll and in the metadata is the coreclr hash that was built from .

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.2.x milestone Jan 31, 2020
@msftbot msftbot bot locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants