Huge slowdowns on threaded operations when debugger attached (macOS) #10605
@dotnet/dotnet-diag
Here's a sample taken in our application while artificially reproducing the slowdown by triggering Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning); 100 times a second, alongside the exact same execution run under net471 (mono) for comparison (the profiler captures are not reproduced here).
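As a rough illustration of that artificial trigger, here is a minimal sketch; the actual in-app harness isn't shown in the thread, so the timer-based driver below is an assumption:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LongRunningRepro
{
    static void Main()
    {
        // Fire a no-op LongRunning task roughly 100 times a second. Each one forces
        // the runtime to create and tear down a dedicated thread, which is what
        // appears to interact badly with an attached debugger on macOS.
        using (var timer = new Timer(
            _ => Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning),
            null, TimeSpan.Zero, TimeSpan.FromMilliseconds(10)))
        {
            Console.WriteLine("Spawning LongRunning tasks; press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```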
Are there any further details I can provide, or avenues for getting this investigated? It's a pretty critical issue for us and is likely affecting other projects targeting macOS. Currently the workaround is to never attach a debugger, or to debug under Windows in a VM.
Sorry, we haven't had time to investigate this yet. I'll keep you informed of any status. /cc: @tommcdon
We've removed all ...
Hiya! We've been looking into resolving this issue since it's very critical for us. Just recently we've noticed that ... Here's a simple test:
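A minimal sketch of a test along these lines, assuming (based on the notification discussion further down) that the call being timed is Debugger.NotifyOfCrossThreadDependency:

```csharp
using System;
using System.Diagnostics;

class NotifyBenchmark
{
    static void Main()
    {
        const int iterations = 10_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // With a debugger attached, this raises a custom debugger notification,
            // which suspends the runtime and round-trips to the debugger process.
            Debugger.NotifyOfCrossThreadDependency();
        }
        sw.Stop();

        Console.WriteLine($"{sw.ElapsedMilliseconds} ms total, " +
                          $"{sw.Elapsed.TotalMilliseconds / iterations:F4} ms per call");
    }
}
```

Without a debugger attached the call is nearly free; with one attached, every call pays for the suspend and the round trip, which is where the per-platform difference shows up.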
On Windows this takes ~0.1 ms per call (in a VM), and on macOS it takes ~2.6 ms per call. Since this function is called repeatedly as the async state machine iterates, I believe this may provide some insight into why debugging performance is so poor.
@stephentoub do you have any ideas on this problem?
As far as I can tell there is nothing different about this notification compared to any other notification. It does stop/suspend all the threads and send a message via the named pipes to the debugger side.
On the debugger side it does look up the class token of the notification object to check whether the notification is enabled. Maybe @noahfalk can help when he is back in the office.
@mikem8361 - I think the next step would be to do some native profiling of the debugger and determine where it spends its time. You could also do a comparison with something like:
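One hypothetical comparison along those lines (not necessarily the one intended) is to time a different event that also round-trips to an attached debugger at a high rate, such as first-chance exceptions:

```csharp
using System;
using System.Diagnostics;

class ExceptionEventBenchmark
{
    static void Main()
    {
        const int iterations = 10_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            try
            {
                // Each throw generates a first-chance exception debugger event when
                // a debugger is attached, so the per-call cost can be compared with
                // the custom-notification numbers above.
                throw new InvalidOperationException();
            }
            catch (InvalidOperationException)
            {
            }
        }
        sw.Stop();

        Console.WriteLine($"{sw.Elapsed.TotalMilliseconds / iterations:F4} ms per exception");
    }
}
```

If this shows a similar per-call penalty on macOS, it would support the guess below that all debugger events are slow there, not just custom notifications.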
If I had to make a guess, I wouldn't be surprised if all debugger events are slow on macOS, not only the custom notifications. Custom notifications just happen to be one of the debugger scenarios that can generate a high rate of debugger events, which exposes the poor performance. In terms of fixing it, one approach is to address whatever the performance bottleneck is once it has been identified via profiling. A second option is to implement a debuggee-side cache for the custom notification filter. Most of the time VS doesn't enable custom notifications, but rather than detecting this immediately in the debuggee, the runtime suspends, sends the notification to the debugger, and then DBI determines nothing is listening to the event and resumes without having done any useful work.
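A conceptual sketch of that second option, in C# rather than the runtime's actual native code and with made-up names: the idea is simply to keep a debuggee-side flag so the common "nothing is listening" case never pays for the suspend and round trip.

```csharp
using System.Threading;

// Conceptual illustration only: the real change would live inside the native
// runtime, and these type/member names are invented for the sketch.
static class CustomNotificationFilterCache
{
    // 0 = debugger has not enabled any custom notification, 1 = at least one enabled.
    // The debugger side would flip this flag when it enables/disables notifications.
    private static int s_anyNotificationEnabled;

    public static void SetEnabled(bool enabled) =>
        Interlocked.Exchange(ref s_anyNotificationEnabled, enabled ? 1 : 0);

    public static void Notify(object notificationObject)
    {
        // Fast path: if the debugger never enabled custom notifications, skip the
        // suspend-all-threads + named-pipe round trip entirely.
        if (Volatile.Read(ref s_anyNotificationEnabled) == 0)
            return;

        SendNotificationToDebugger(notificationObject); // the existing slow path
    }

    private static void SendNotificationToDebugger(object notificationObject)
    {
        // Placeholder for the existing runtime machinery that suspends the process
        // and sends the notification event over the debugger transport.
    }
}
```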
Haven't had time to go into a deeper investigation yet (will probably do so on the weekend), but a quick sampling reveals the following (using ...):
And yes, I can confirm that ...
I would like to add that we've seen the same behaviour on Linux. Debugging our application is extremely slow on Linux and fast on Windows. Using your small repro, I got the following results: on Windows, 465 ms (0.0465 ms per call). Tried on Ubuntu 18.04 and 16.04 with .NET SDK 2.1.401, Runtime Version 2.1.3, commit 124038c13e. The application makes extensive use of async/await and tasks. This is often on the call stack if you randomly break: ...
Issue #18705 Add threadId to DebuggerIPCEvent so we don't need to use the slow DAC functions (because of extra memory reads) to get it.
Issue #18705 Add threadId to DebuggerIPCEvent so we don't need to use the slow DAC functions (because of extra memory reads) to get it. Fixed CorDBIPC_BUFFER_SIZE on arm builds.
This has been fixed in master.
Need to check if it meets the bar for 2.2 or even 2.1.x.
Thanks a lot for the fix! This is going to immensely improve our QoL.
Thank you @mikem8361.
I have tested this and confirm it has resolved the issues at my end (macOS). Debug performance is on par with my Windows VM now.
I have tested this on the same Ubuntu 18.04 machine. The repro above went from 14730 ms to 514 ms, on par with Windows.
@peppy @pieter-venter @smoogipoo we are considering porting this into 2.2, which will be released later this year. Is it acceptable for you to upgrade to that to receive the fix?
At our end we are tracking the latest releases, so we are eager to see this live as soon as possible! In fact, we are even willing to use nightly builds for this fix, but unfortunately they do not currently play well with Rider.
2.2 would be great. We also move to the latest release as soon as possible. With your guidance, I'd also be open to building a 2.1 release from source and applying the patch locally. My understanding is that this fix requires rebuilding the SDK, not just coreclr.
@pieter-venter the 2.1 change would be dotnet/coreclr#20239 - this isn't currently approved for 2.1, but you could build it.
Thank you for pointing me to the correct pull request, @danmosemsft. This is in the coreclr repo; does that mean I just need to rebuild System.Private.CoreLib.dll, or does this change affect other binaries in the SDK?
You would presumably first build CoreCLR from the root, for your target platform. Once that is done, @mikem8361 is the best person to say which binaries you need to patch.
You need all the coreclr binaries to make sure everything matches, but it should only affect libcoreclr.dylib, libmscordaccore.dylib, and libmscordbi.dylib.
@mikem8361 Thanks for the info. I'm running Ubuntu. I've checked out tags/v2.1.5 in the coreclr repo (that is the version I have installed), applied the changes in your pull request, and rebuilt it. I replaced the equivalent .so files in ...
You should copy all the binaries built in the ... output directory. The fix has made it into the "release/2.2" coreclr branch, so if you check that out, do a "clean" build (git clean -xdf), and do the above copy, it should work.
Just checking back on this – should we be seeing the fix in the latest 2.2 nightlies available here under ...?
The fix is in the release/2.2 branch so it should be in a recent nightly.
@peppy if you install/restore this version, you can use ildasm on System.Private.CoreLib.dll; the metadata contains the coreclr commit hash it was built from.
When debugging our application (with attached debugger, no breakpoints), performance can drop to a point where it becomes frustrating to do anything. On occasion we are also seeing OS-level hard locking for seconds to minutes, which may be related.
Reproducible in both VSCode and JetBrains Rider. This is exclusive to netcore (2.0 and 2.1) – it does not occur under the mono or net471 runtime environments. It also seems limited to macOS, as I have not been able to reproduce it on Windows.
This can easily be reproduced on our game framework project: https://github.com/ppy/osu-framework (building should not require extra steps beyond checking it out).
1. Run the VisualTests configuration.
2. Select DelayedLoad in the left menu.
3. Testing with the debugger attached should drop to less than 1 fps, while it is easy to maintain hundreds of fps without a debugger attached.
It seems to be directly related to the creation of threads, specifically with the TaskCreationOptions.LongRunning flag. On removing this flag from hot paths (#1, #2), performance returns to normal. I've been trying to reproduce this with a more isolated test case but have not succeeded yet. Some pointers on moving forward in diagnosing this issue would be appreciated!
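One possible starting point for such an isolated test case (a guess, not something confirmed to reproduce the issue) is to time a burst of LongRunning tasks and compare the numbers with and without a debugger attached:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class IsolatedRepro
{
    static void Main()
    {
        const int taskCount = 200;

        // Spawn a burst of LongRunning tasks (each gets its own dedicated thread)
        // and time how long it takes for all of them to start and complete.
        var sw = Stopwatch.StartNew();
        var tasks = new Task[taskCount];
        for (int i = 0; i < taskCount; i++)
            tasks[i] = Task.Factory.StartNew(() => { }, TaskCreationOptions.LongRunning);
        Task.WaitAll(tasks);
        sw.Stop();

        Console.WriteLine($"Debugger attached: {Debugger.IsAttached}");
        Console.WriteLine($"{taskCount} LongRunning tasks in {sw.ElapsedMilliseconds} ms " +
                          $"({sw.Elapsed.TotalMilliseconds / taskCount:F2} ms per task)");
    }
}
```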