New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge slowdowns on threaded operations when debugger attached (macOS) #18705

Open
peppy opened this Issue Jun 29, 2018 · 20 comments

Comments

Projects
None yet
7 participants
@peppy

peppy commented Jun 29, 2018

When debugging our application (with attached debugger, no breakpoints), performance can drop to a point it becomes frustrating to do anything. On occasion we are also seeing OS level hard locking for seconds to minutes, which may be related.

Reproducible in both VSCode and Jetbrains Rider. This is exclusive to netcore (2.0 and 2.1) – does not occur under mono or net471 runtime environments. It also seems limited to macOS as I have not been able to reproduce on windows.

This can easily be reproduced on our game framework project: https://github.com/ppy/osu-framework (building should require not extra steps beyond checking it out).

  • Start in VisualTests configuration
  • Switch to DelayedLoad in the left menu
  • Observe severe frame drops when threaded load events occur

Testing with debugger attached should drop to less than 1fps while it is easy to maintain hundreds without a debugger attached.

It seems to be directly related to creation of threads, specifically with the TaskCreationOptions.LongRunning flag. On removing this flag from hot paths (#1 #2), performance will return to normal.

I've been trying to reproduce this with a more isolated test case but have not succeeded yet. Some pointers on moving forward in diagnosing this issue would be appreciated!

@RussKeldorph

This comment has been minimized.

Show comment
Hide comment
@RussKeldorph
Member

RussKeldorph commented Jul 2, 2018

@peppy

This comment has been minimized.

Show comment
Hide comment
@peppy

peppy Jul 5, 2018

Here's a sample taken in our application while artificially reproducing the slowdown by triggering a

Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning);

100 times a second:

Sample of dotnet.txt

Here's the same exact execution run under net471 (mono) for comparison:

Sample of mono.txt

peppy commented Jul 5, 2018

Here's a sample taken in our application while artificially reproducing the slowdown by triggering a

Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning);

100 times a second:

Sample of dotnet.txt

Here's the same exact execution run under net471 (mono) for comparison:

Sample of mono.txt

@peppy

This comment has been minimized.

Show comment
Hide comment
@peppy

peppy Jul 18, 2018

Is there any further details I can provide, or avenues for getting this investigated? It's a pretty critical issue for us and is likely affecting other projects on macOS targets. Currently the workaround is to never attach a debugger or use a VM to debug under windows.

peppy commented Jul 18, 2018

Is there any further details I can provide, or avenues for getting this investigated? It's a pretty critical issue for us and is likely affecting other projects on macOS targets. Currently the workaround is to never attach a debugger or use a VM to debug under windows.

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Jul 18, 2018

Member

Sorry, we haven't had time to investigate this yet. I'll keep you informed of any status.

/cc: @tommcdon

Member

mikem8361 commented Jul 18, 2018

Sorry, we haven't had time to investigate this yet. I'll keep you informed of any status.

/cc: @tommcdon

@peppy

This comment has been minimized.

Show comment
Hide comment
@peppy

peppy Aug 27, 2018

We've removed all LongRunning usage and are still running into slowdowns. It has actually gotten worse with increasing use of async-await in our project to the point we can no longer use the debugger outside of a windows VM.

peppy commented Aug 27, 2018

We've removed all LongRunning usage and are still running into slowdowns. It has actually gotten worse with increasing use of async-await in our project to the point we can no longer use the debugger outside of a windows VM.

@smoogipoo

This comment has been minimized.

Show comment
Hide comment
@smoogipoo

smoogipoo Aug 28, 2018

Hiya! We've been looking into resolving this issue since it's very critical for us.

Just recently we've noticed that Debugger.NotifyOfCrossThreadDependency() is taking more than an order of magnitude longer on macOS than on Windows.

Here's a simple test:

using System;
using System.Diagnostics;

namespace DebuggerTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
                Debugger.NotifyOfCrossThreadDependency();
            sw.Stop();

            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

On Windows this takes ~0.1ms per call (in a VM), and on macOS it takes ~2.6ms per call.

Since this function is called very readily through iterations of the async state machine, I believe this may provide some insight into why debugging performance is pretty poor.

smoogipoo commented Aug 28, 2018

Hiya! We've been looking into resolving this issue since it's very critical for us.

Just recently we've noticed that Debugger.NotifyOfCrossThreadDependency() is taking more than an order of magnitude longer on macOS than on Windows.

Here's a simple test:

using System;
using System.Diagnostics;

namespace DebuggerTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
                Debugger.NotifyOfCrossThreadDependency();
            sw.Stop();

            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

On Windows this takes ~0.1ms per call (in a VM), and on macOS it takes ~2.6ms per call.

Since this function is called very readily through iterations of the async state machine, I believe this may provide some insight into why debugging performance is pretty poor.

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Aug 28, 2018

Member

@stephentoub do you have any ideas on this problem?

Member

mikem8361 commented Aug 28, 2018

@stephentoub do you have any ideas on this problem?

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Aug 28, 2018

Member

Debugger.NotifyOfCrossThreadDependency is implemented by calling Debugger::SendCustomDebuggerNotification. This suggests that the handshake done by Debugger::SendCustomDebuggerNotification is slow on macOS.

Member

jkotas commented Aug 28, 2018

Debugger.NotifyOfCrossThreadDependency is implemented by calling Debugger::SendCustomDebuggerNotification. This suggests that the handshake done by Debugger::SendCustomDebuggerNotification is slow on macOS.

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Aug 28, 2018

Member

As far as I can tell there is nothing different about this notification from all notifications. It does stop/suspend all the threads and sends a message via the named pipes to the debugger side.

Member

mikem8361 commented Aug 28, 2018

As far as I can tell there is nothing different about this notification from all notifications. It does stop/suspend all the threads and sends a message via the named pipes to the debugger side.

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Aug 28, 2018

Member

On the debugger side it does look up the class token of the notification object to check if the notification is enabled. Maybe @noahfalk when he is back in the office can help.

Member

mikem8361 commented Aug 28, 2018

On the debugger side it does look up the class token of the notification object to check if the notification is enabled. Maybe @noahfalk when he is back in the office can help.

@noahfalk

This comment has been minimized.

Show comment
Hide comment
@noahfalk

noahfalk Aug 28, 2018

Member

@mikem8361 - I think the next step would be to do some native profiling of the debugger and determine where it spends its time. You could also do a comparison with something like:

for(int i = 0; i < 10000; i++)
{
     Debugger.Log("message");
}

If I had to make a guess, I wouldn't be surprised if all debugger events are slow on macOS, not only the custom notifications. Custom Notification just happens to be one of the debugger scenarios that can generate a high rate of debugger notifications which would expose the poor performance.

In terms of fixing it one approach is probably to address whatever the performance bottleneck is once it is identified via profiling. A second option is that we could implement a debuggee-side cache for the custom notification filter. Most of the time VS doesn't enable the custom notifications but rather than detecting this immediately in the debuggee, instead the runtime suspends, sends the notification to the debugger, then DBI determines nothing is listening to the event and resumes without having done any work.

Member

noahfalk commented Aug 28, 2018

@mikem8361 - I think the next step would be to do some native profiling of the debugger and determine where it spends its time. You could also do a comparison with something like:

for(int i = 0; i < 10000; i++)
{
     Debugger.Log("message");
}

If I had to make a guess, I wouldn't be surprised if all debugger events are slow on macOS, not only the custom notifications. Custom Notification just happens to be one of the debugger scenarios that can generate a high rate of debugger notifications which would expose the poor performance.

In terms of fixing it one approach is probably to address whatever the performance bottleneck is once it is identified via profiling. A second option is that we could implement a debuggee-side cache for the custom notification filter. Most of the time VS doesn't enable the custom notifications but rather than detecting this immediately in the debuggee, instead the runtime suspends, sends the notification to the debugger, then DBI determines nothing is listening to the event and resumes without having done any work.

@smoogipoo

This comment has been minimized.

Show comment
Hide comment
@smoogipoo

smoogipoo Aug 29, 2018

Haven't had time to go into a deeper investigation yet (will probably do so on the weekend), but a quick sampling reveals the following (using Debugger.Log()):

2664 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 609  [0x104505c81]
! 2648 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 486  [0x104484406]
! : 2643 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 343  [0x1046cd107]
! : | 2641 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 248  [0x1046cdad8]
! : | + 2640 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 188  [0x1046cf1bc]
! : | + ! 2639 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 176  [0x1046c6e40]
! : | + ! : 2639 WaitForSingleObjectEx  (in libcoreclr.dylib) + 77  [0x10442776d]
! : | + ! :   2639 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 1891  [0x1044275a3]
! : | + ! :     2639 CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 375  [0x104422c27]
! : | + ! :       2639 CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 338  [0x104423012]
! : | + ! :         2632 _pthread_cond_wait  (in libsystem_pthread.dylib) + 732  [0x7fff6efeb589]
! : | + ! :         | 2632 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff6ee22a16]
! : | + ! :         7 _pthread_cond_wait  (in libsystem_pthread.dylib) + 750,759  [0x7fff6efeb59b,0x7fff6efeb5a4]
! : | + ! 1 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 153  [0x1046c6e29]
! : | + !   1 NativeExceptionHolderBase::Push()  (in libcoreclr.dylib) + 16  [0x1043f41a0]
! : | + !     1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : | + 1 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 209  [0x1046cf1d1]
! : | +   1 CLRException::HandlerState::CleanupTry()  (in libcoreclr.dylib) + 37  [0x1045ef195]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 207  [0x1046cdaaf]
! : | + 1 SetEvent  (in libcoreclr.dylib) + 120  [0x10441e288]
! : | +   1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 55  [0x104410627]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 202  [0x1046cdaaa]
! : 5 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 563  [0x1046cd1e3]
! :   5 SwitchToThread  (in libcoreclr.dylib) + 9  [0x10442ce99]
! :     4 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! :     + 4 swtch_pri  (in libsystem_kernel.dylib) + 10  [0x7fff6ee19306]
! :     1 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! 10 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 426  [0x1044843ca]
! : 5 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 282  [0x10446ed9a]
! : | 4 SetEvent  (in libcoreclr.dylib) + 158  [0x10441e2ae]
! : | + 4 CorUnix::CSynchControllerBase::Release()  (in libcoreclr.dylib) + 130  [0x104420592]
! : | +   3 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 148  [0x1044267e4]
! : | +   ! 3 pthread_cond_signal  (in libsystem_pthread.dylib) + 505  [0x7fff6efec78c]
! : | +   !   3 __psynch_cvsignal  (in libsystem_kernel.dylib) + 10  [0x7fff6ee229fe]
! : | +   1 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 158  [0x1044267ee]
! : | +     1 pthread_mutex_unlock  (in libsystem_pthread.dylib) + 0  [0x7fff6efe86cf]
! : | 1 DebuggerRCThread::WatchForStragglers()  (in libcoreclr.dylib) + 11  [0x1044925ab]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 113  [0x10446ecf1]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 11  [0x10445896b]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 205  [0x10446ed4d]
! : | 1 ThreadSuspend::LockThreadStore(ThreadSuspend::SUSPEND_REASON)  (in libcoreclr.dylib) + 112  [0x1046cce30]
! : |   1 Thread::SetDebugCantStop(bool)  (in libcoreclr.dylib) + 24  [0x1045abe58]
! : |     1 CExecutionEngine::GetTlsData()  (in libcoreclr.dylib) + 13  [0x10450316d]
! : |       1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 270  [0x10446ed8e]
! : | 1 Thread::SysStartSuspendForDebug(AppDomain*)  (in libcoreclr.dylib) + 228  [0x1046ce994]
! : |   1 ResetEvent  (in libcoreclr.dylib) + 120  [0x10441e408]
! : |     1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 71  [0x104410637]
! : |       1 CorUnix::CPalSynchronizationManager::CreateSynchStateController(CorUnix::CPalThread*, CorUnix::CObjectType*, void*, CorUnix::ObjectDomain, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 1  [0x104424171]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 297  [0x10446eda9]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 0  [0x104458960]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 336  [0x10446edd0]
! :   1 DebuggerRCThread::IsRCThreadReady()  (in libcoreclr.dylib) + 36  [0x104493984]
! :     1 WaitForSingleObject  (in libcoreclr.dylib) + 72  [0x104426e38]
! :       1 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 788  [0x104427154]
! 4 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 417  [0x1044843c1]
! : 4 DebuggerRCThread::SendIPCEvent()  (in libcoreclr.dylib) + 219  [0x10449394b]
! :   4 Debugger::SendRawEvent(DebuggerIPCEvent const*)  (in libcoreclr.dylib) + 25  [0x10446e689]
! :     4 DbgTransportSession::SendDebugEvent(DebuggerIPCEvent*)  (in libcoreclr.dylib) + 278  [0x104495056]
! :       2 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 467  [0x104494bf3]
! :       | 2 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :       |   2 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 298  [0x104494b4a]
! :       | 1 _platform_memmove$VARIANT$Haswell  (in libsystem_platform.dylib) + 240  [0x7fff6efe4010]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 499  [0x104494c13]
! :         1 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :           1 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 244  [0x104484314]
! : 1 EEDbgInterfaceImpl::EnablePreemptiveGC()  (in libcoreclr.dylib) + 0  [0x10451a860]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 586  [0x10448446a]
!   1 EventPipeWriteEventDebugIPCEventEnd  (in libcoreclr.dylib) + 16  [0x104832f00]
!     1 EventPipeEvent::IsEnabled() const  (in libcoreclr.dylib) + 7  [0x104625237]
2 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 280  [0x104505b38]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 69  [0x1043f3795]
! : 1 malloc  (in libsystem_malloc.dylib) + 24  [0x7fff6ee7a4c7]
! :   1 malloc_zone_malloc  (in libsystem_malloc.dylib) + 103  [0x7fff6ee7b1bd]
! :     1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 422  [0x7fff6ee7b3bf]
! :       1 tiny_malloc_from_free_list  (in libsystem_malloc.dylib) + 379  [0x7fff6ee7c30e]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 104  [0x1043f37b8]
!   1 WideCharToMultiByte  (in libcoreclr.dylib) + 141  [0x104403f3d]
!     1 UnicodeToUTF8  (in libcoreclr.dylib) + 191  [0x1044051ef]
!       1 UTF8Encoding::GetBytes(char16_t*, int, unsigned char*, int)  (in libcoreclr.dylib) + 772  [0x1044059e4]
1 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 616  [0x104505c88]

And yes, I can confirm that Debugger.Log() is just the same in terms of slowdown.

smoogipoo commented Aug 29, 2018

Haven't had time to go into a deeper investigation yet (will probably do so on the weekend), but a quick sampling reveals the following (using Debugger.Log()):

2664 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 609  [0x104505c81]
! 2648 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 486  [0x104484406]
! : 2643 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 343  [0x1046cd107]
! : | 2641 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 248  [0x1046cdad8]
! : | + 2640 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 188  [0x1046cf1bc]
! : | + ! 2639 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 176  [0x1046c6e40]
! : | + ! : 2639 WaitForSingleObjectEx  (in libcoreclr.dylib) + 77  [0x10442776d]
! : | + ! :   2639 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 1891  [0x1044275a3]
! : | + ! :     2639 CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 375  [0x104422c27]
! : | + ! :       2639 CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*)  (in libcoreclr.dylib) + 338  [0x104423012]
! : | + ! :         2632 _pthread_cond_wait  (in libsystem_pthread.dylib) + 732  [0x7fff6efeb589]
! : | + ! :         | 2632 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff6ee22a16]
! : | + ! :         7 _pthread_cond_wait  (in libsystem_pthread.dylib) + 750,759  [0x7fff6efeb59b,0x7fff6efeb5a4]
! : | + ! 1 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*)  (in libcoreclr.dylib) + 153  [0x1046c6e29]
! : | + !   1 NativeExceptionHolderBase::Push()  (in libcoreclr.dylib) + 16  [0x1043f41a0]
! : | + !     1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : | + 1 Thread::WaitSuspendEventsHelper()  (in libcoreclr.dylib) + 209  [0x1046cf1d1]
! : | +   1 CLRException::HandlerState::CleanupTry()  (in libcoreclr.dylib) + 37  [0x1045ef195]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 207  [0x1046cdaaf]
! : | + 1 SetEvent  (in libcoreclr.dylib) + 120  [0x10441e288]
! : | +   1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 55  [0x104410627]
! : | 1 Thread::RareEnablePreemptiveGC()  (in libcoreclr.dylib) + 202  [0x1046cdaaa]
! : 5 Thread::RareDisablePreemptiveGC()  (in libcoreclr.dylib) + 563  [0x1046cd1e3]
! :   5 SwitchToThread  (in libcoreclr.dylib) + 9  [0x10442ce99]
! :     4 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! :     + 4 swtch_pri  (in libsystem_kernel.dylib) + 10  [0x7fff6ee19306]
! :     1 sched_yield  (in libsystem_pthread.dylib) + 11  [0x7fff6efeb22e]
! 10 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 426  [0x1044843ca]
! : 5 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 282  [0x10446ed9a]
! : | 4 SetEvent  (in libcoreclr.dylib) + 158  [0x10441e2ae]
! : | + 4 CorUnix::CSynchControllerBase::Release()  (in libcoreclr.dylib) + 130  [0x104420592]
! : | +   3 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 148  [0x1044267e4]
! : | +   ! 3 pthread_cond_signal  (in libsystem_pthread.dylib) + 505  [0x7fff6efec78c]
! : | +   !   3 __psynch_cvsignal  (in libsystem_kernel.dylib) + 10  [0x7fff6ee229fe]
! : | +   1 CorUnix::CThreadSynchronizationInfo::RunDeferredThreadConditionSignalings()  (in libcoreclr.dylib) + 158  [0x1044267ee]
! : | +     1 pthread_mutex_unlock  (in libsystem_pthread.dylib) + 0  [0x7fff6efe86cf]
! : | 1 DebuggerRCThread::WatchForStragglers()  (in libcoreclr.dylib) + 11  [0x1044925ab]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 113  [0x10446ecf1]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 11  [0x10445896b]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 205  [0x10446ed4d]
! : | 1 ThreadSuspend::LockThreadStore(ThreadSuspend::SUSPEND_REASON)  (in libcoreclr.dylib) + 112  [0x1046cce30]
! : |   1 Thread::SetDebugCantStop(bool)  (in libcoreclr.dylib) + 24  [0x1045abe58]
! : |     1 CExecutionEngine::GetTlsData()  (in libcoreclr.dylib) + 13  [0x10450316d]
! : |       1 tlv_get_addr  (in libdyld.dylib) + 0  [0x7fff6ecd4fc4]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 270  [0x10446ed8e]
! : | 1 Thread::SysStartSuspendForDebug(AppDomain*)  (in libcoreclr.dylib) + 228  [0x1046ce994]
! : |   1 ResetEvent  (in libcoreclr.dylib) + 120  [0x10441e408]
! : |     1 CorUnix::CSharedMemoryWaitableObject::GetSynchStateController(CorUnix::CPalThread*, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 71  [0x104410637]
! : |       1 CorUnix::CPalSynchronizationManager::CreateSynchStateController(CorUnix::CPalThread*, CorUnix::CObjectType*, void*, CorUnix::ObjectDomain, CorUnix::ISynchStateController**)  (in libcoreclr.dylib) + 1  [0x104424171]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 297  [0x10446eda9]
! : | 1 StressLog::LogOn(unsigned int, unsigned int)  (in libcoreclr.dylib) + 0  [0x104458960]
! : 1 Debugger::TrapAllRuntimeThreads()  (in libcoreclr.dylib) + 336  [0x10446edd0]
! :   1 DebuggerRCThread::IsRCThreadReady()  (in libcoreclr.dylib) + 36  [0x104493984]
! :     1 WaitForSingleObject  (in libcoreclr.dylib) + 72  [0x104426e38]
! :       1 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int)  (in libcoreclr.dylib) + 788  [0x104427154]
! 4 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 417  [0x1044843c1]
! : 4 DebuggerRCThread::SendIPCEvent()  (in libcoreclr.dylib) + 219  [0x10449394b]
! :   4 Debugger::SendRawEvent(DebuggerIPCEvent const*)  (in libcoreclr.dylib) + 25  [0x10446e689]
! :     4 DbgTransportSession::SendDebugEvent(DebuggerIPCEvent*)  (in libcoreclr.dylib) + 278  [0x104495056]
! :       2 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 467  [0x104494bf3]
! :       | 2 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :       |   2 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 298  [0x104494b4a]
! :       | 1 _platform_memmove$VARIANT$Haswell  (in libsystem_platform.dylib) + 240  [0x7fff6efe4010]
! :       1 DbgTransportSession::SendMessage(DbgTransportSession::Message*, bool)  (in libcoreclr.dylib) + 499  [0x104494c13]
! :         1 TwoWayPipe::Write(void const*, unsigned int)  (in libcoreclr.dylib) + 36  [0x10449e5d4]
! :           1 write  (in libsystem_kernel.dylib) + 10  [0x7fff6ee246fa]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 244  [0x104484314]
! : 1 EEDbgInterfaceImpl::EnablePreemptiveGC()  (in libcoreclr.dylib) + 0  [0x10451a860]
! 1 Debugger::SendLogMessage(int, SString*, SString*)  (in libcoreclr.dylib) + 586  [0x10448446a]
!   1 EventPipeWriteEventDebugIPCEventEnd  (in libcoreclr.dylib) + 16  [0x104832f00]
!     1 EventPipeEvent::IsEnabled() const  (in libcoreclr.dylib) + 7  [0x104625237]
2 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 280  [0x104505b38]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 69  [0x1043f3795]
! : 1 malloc  (in libsystem_malloc.dylib) + 24  [0x7fff6ee7a4c7]
! :   1 malloc_zone_malloc  (in libsystem_malloc.dylib) + 103  [0x7fff6ee7b1bd]
! :     1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 422  [0x7fff6ee7b3bf]
! :       1 tiny_malloc_from_free_list  (in libsystem_malloc.dylib) + 379  [0x7fff6ee7c30e]
! 1 OutputDebugStringW  (in libcoreclr.dylib) + 104  [0x1043f37b8]
!   1 WideCharToMultiByte  (in libcoreclr.dylib) + 141  [0x104403f3d]
!     1 UnicodeToUTF8  (in libcoreclr.dylib) + 191  [0x1044051ef]
!       1 UTF8Encoding::GetBytes(char16_t*, int, unsigned char*, int)  (in libcoreclr.dylib) + 772  [0x1044059e4]
1 DebugDebugger::Log(int, StringObject*, StringObject*)  (in libcoreclr.dylib) + 616  [0x104505c88]

And yes, I can confirm that Debugger.Log() is just the same in terms of slowdown.

@pieter-venter

This comment has been minimized.

Show comment
Hide comment
@pieter-venter

pieter-venter Sep 6, 2018

I would like to add that we've seen the same behaviour on Linux. Debugging our application is extremely slow on Linux and fast on Windows.

Using your small repo, I got the following results:

On Windows: 465 ms (0.0465 ms per call)
On Linux: 14730 ms (1.473 ms per call)

Tried on Ubuntu 1804 and 1604 with .Net SDK 2.1.401, Runtime Version: 2.1.3 Commit: 124038c13e

The application makes extensive use of async/await and tasks. This is often on the call stack if you randomly break:

image

pieter-venter commented Sep 6, 2018

I would like to add that we've seen the same behaviour on Linux. Debugging our application is extremely slow on Linux and fast on Windows.

Using your small repo, I got the following results:

On Windows: 465 ms (0.0465 ms per call)
On Linux: 14730 ms (1.473 ms per call)

Tried on Ubuntu 1804 and 1604 with .Net SDK 2.1.401, Runtime Version: 2.1.3 Commit: 124038c13e

The application makes extensive use of async/await and tasks. This is often on the call stack if you randomly break:

image

mikem8361 added a commit to mikem8361/coreclr that referenced this issue Sep 12, 2018

Fix xplat debugging perf problem.
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

mikem8361 added a commit to mikem8361/coreclr that referenced this issue Sep 13, 2018

Fix xplat debugging perf problem.
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.

mikem8361 added a commit that referenced this issue Sep 13, 2018

Fix xplat debugging perf problem. (#19911)
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Sep 13, 2018

Member

This has been fixed in master.

Member

mikem8361 commented Sep 13, 2018

This has been fixed in master.

@mikem8361 mikem8361 closed this Sep 13, 2018

@mikem8361 mikem8361 reopened this Sep 13, 2018

@mikem8361 mikem8361 added this to the 2.2 milestone Sep 13, 2018

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Sep 13, 2018

Member

Need to check if it meets the bar for 2.2 or even 2.1.x.

Member

mikem8361 commented Sep 13, 2018

Need to check if it meets the bar for 2.2 or even 2.1.x.

@peppy

This comment has been minimized.

Show comment
Hide comment
@peppy

peppy Sep 14, 2018

Thanks a lot for the fix! This is going to immensely improve our QoL.

peppy commented Sep 14, 2018

Thanks a lot for the fix! This is going to immensely improve our QoL.

@pieter-venter

This comment has been minimized.

Show comment
Hide comment
@pieter-venter

pieter-venter Sep 17, 2018

Thank you @mikem8361
Is there a way I can test it for you? I have a docker image that can build the CLR locally for Linux. I can patch your changes into a local build easily. Which assemblies/so files would I need to swap out - or is there more to it than that?

pieter-venter commented Sep 17, 2018

Thank you @mikem8361
Is there a way I can test it for you? I have a docker image that can build the CLR locally for Linux. I can patch your changes into a local build easily. Which assemblies/so files would I need to swap out - or is there more to it than that?

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 Sep 17, 2018

Member
Member

mikem8361 commented Sep 17, 2018

@peppy

This comment has been minimized.

Show comment
Hide comment
@peppy

peppy Sep 17, 2018

I have tested this and confirm it has resolved the issues at my end (macOS). Debug performance is on par with my windows VM now 👍

peppy commented Sep 17, 2018

I have tested this and confirm it has resolved the issues at my end (macOS). Debug performance is on par with my windows VM now 👍

@pieter-venter

This comment has been minimized.

Show comment
Hide comment
@pieter-venter

pieter-venter Sep 17, 2018

I have tested this on the same Ubuntu 1804 machine. The repo above went from 14730 ms to 514 ms, on par with Windows.

pieter-venter commented Sep 17, 2018

I have tested this on the same Ubuntu 1804 machine. The repo above went from 14730 ms to 514 ms, on par with Windows.

mikem8361 added a commit to mikem8361/coreclr that referenced this issue Sep 19, 2018

Fix xplat debugging perf problem. (#19911)
Issue #18705

Add threadId to DebuggerIPCEvent so we don't need to use the
slow DAC functions (because of extra memory reads) to get it.

Fixed CorDBIPC_BUFFER_SIZE on arm builds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment