User Story: Enable diagnosing common Async problems #90

Open
1 of 4 tasks
tommcdon opened this issue Nov 30, 2018 · 13 comments
Labels
enhancement New feature or request
Milestone

Comments

@tommcdon
Member

tommcdon commented Nov 30, 2018

  • As a developer of a .NET Core app, I can use SOS to get a list of all async thread stacks, so that I can understand what async operations are in progress and diagnose my app

  • As a developer of a .NET Core app, I can use Visual Studio to view a list of all async thread stacks in the parallel stacks window, so that I can understand what async operations are in progress and diagnose my app

  • There needs to be documentation that is easily discovered on docs.microsoft.com that guides you in the diagnosis procedure. Place the URL here for that guidance when it exists. It is expected that you will want guidance for the case you are debugging with Visual Studio, and other guidance when you are not. Note that we expect to have the dotnet analyze tool that will host SOS, and that can be a useful entryway into the tool needed.

  • We need to identify a set (let's say 5) of what we would suggest are representative dumps that show the typical problems (in particular deadlock and thread-pool starvation; others?) at what we would consider large but still reasonable scale (e.g. > 2GB dump size, with a lot of async/threadpool activity), so that we can validate (by hand) that following the guidance identifies the problem.

@tommcdon tommcdon added this to the v3.0 milestone Nov 30, 2018
@tommcdon tommcdon added this to Backlog in .NET Core Diagnostics Nov 30, 2018
@shirhatti
Contributor

@noahfalk will drive this

@noahfalk
Member

I think the next thing that needs to be done is to define what success looks like. I'm out for December, so either I'll make a proposal in January, or someone from our group of cohorts can drive it in my absence :)

@vancem @davidfowl @stephentoub

@stephentoub
Member

I think the next thing that needs to be done is to define what success looks like. I'm out for December so either I'll make a proposal in January

That's great, thanks.

As a developer of a .NET Core app, I can use SOS to get a list of all async thread stacks, so that I can understand what async operations are in progress and diagnose my app

With the DumpAsync command I added for 3.0, I hope we're really close here. But few other than me have actually played with the command to my knowledge, so it'll be good to get someone else's take on whether there are remaining gaps in the functionality, in how the information is conveyed, etc. The main remaining thing I'd hoped to add but haven't yet is a parallel-stacks-like consolidation, where the command would optionally collapse all of the "stacks" in common, either entirely or in a partial fashion as parallel stacks does.

@vancem
Contributor

vancem commented Dec 4, 2018


Here are the current docs for the DumpAsync command:

!DumpAsync traverses the garbage collected heap, looking for objects representing
async state machines as created when an async method's state is transferred to the
heap.  This command recognizes async state machines defined as "async void", "async Task",
"async Task<T>", "async ValueTask", and "async ValueTask<T>".  It also optionally supports
any other tasks.

Usage: DumpAsync [-addr ObjectAddr] [-mt MethodTableAddr] [-type TypeName] [-tasks] [-completed] [-fields] [-stacks] [-roots]
  [-addr ObjectAddr]    => Only display the async object at the specified address.
  [-mt MethodTableAddr] => Only display top-level async objects with the specified method table address.
  [-type TypeName]      => Only display top-level async objects whose type name includes the specified substring.
  [-tasks]              => Include Task and Task-derived objects, in addition to any state machine objects found.
  [-completed]          => Include async objects that represent completed operations but that are still on the heap.
  [-fields]             => Show the fields of state machines.
  [-stacks]             => Gather, output, and consolidate based on continuation chains / async stacks for discovered async objects.
  [-roots]              => Perform a gcroot on each rendered async object.

@vancem
Contributor

vancem commented Dec 5, 2018

Possibly relevant to this scenario is the SOS work item command

threadpool -wi

which Stephen added in dotnet/coreclr#20872.

It may be useful in determining whether the threadpool is starved (not enough threads to run the queued work items). It is not clear that this is how we would guide people to diagnose this, however...
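
As a rough illustration of how the two commands discussed so far might be run together against a dump, the sketch below assembles a non-interactive lldb invocation that loads the SOS plugin and issues each SOS command in turn. The function name `sos_batch_cmd`, its argument order, and all paths are hypothetical. The `-O`, `-o`, and `--core` options are the ones used in the debugging script later in this thread, and `--batch` is assumed to be available in the lldb version in use. The function only prints the command line, so nothing is executed.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) an lldb command line that loads the SOS
# plugin and then issues each SOS command given after the first four arguments.
# The function name and argument order are illustrative, not part of SOS.
sos_batch_cmd() {
    local runtime=$1 plugin=$2 host=$3 core=$4
    shift 4
    # Point lldb at the runtime directory before the target is created.
    local cmd="lldb --batch -O \"settings set target.exec-search-paths $runtime\""
    cmd="$cmd -o \"plugin load $plugin\""
    # Append one -o option per SOS command, in the order given.
    local sos
    for sos in "$@"; do
        cmd="$cmd -o \"$sos\""
    done
    printf '%s --core %s %s\n' "$cmd" "$core" "$host"
}

# Example with placeholder paths: async stacks, then queued thread pool work items.
sos_batch_cmd /path/to/runtime/ /path/to/libsosplugin.so \
    /usr/share/dotnet/dotnet /tmp/core \
    "dumpasync -stacks" "threadpool -wi"
```

Printing the command instead of running it makes it easy to inspect the quoting before pointing it at a large dump.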

@vancem
Contributor

vancem commented Dec 5, 2018

Just so we don't lose it: Stephen wrote up a slide deck and meeting notes on async (Async Working Group Notes). We should review them when things are mostly done to see if there are any other ideas that we should be following up on. In particular, we identified the problem of diagnosing TaskCompletionSource issues, but did not think about it much.

@patricksuo

patricksuo commented Dec 21, 2018

update

I can create a minidump with the latest createdump utility.
And I can use the DumpAsync command with the latest SOS plugin in this repository.
I will use these tools to diagnose some tricky bugs next week.
Thank you.


Can I use the latest SOS (dotnet/coreclr@b1e2c66) on 2.1?

I created a dump with the latest createdump.

I want to debug the dump with the following script (https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#debugging-core-dumps-with-lldb):

#!/usr/bin/env bash

# Require exactly one argument: the path to the core file.
if [ "$#" -ne 1 ]; then
    echo 'USAGE: debugcore $CORE_FILE_PATH' >&2
    exit 1
fi

COREFILE=$1
# Runtime directory that lldb searches for the modules referenced by the dump.
RUNTIME_PATH='/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/'
#RUNTIME_PATH='/home/supei/workspace/coreclr/bin/Product/Linux.x64.Debug/'
#PATH_TO_LIBSOSPLUGIN='/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsosplugin.so'
PATH_TO_LIBSOSPLUGIN='/home/supei/workspace/coreclr/bin/Product/Linux.x64.Debug/libsosplugin.so'
HOST_PATH='/usr/share/dotnet/dotnet'

echo "$COREFILE"

# Set the search path before the target is created (-O), then load the SOS plugin (-o).
lldb-3.9 -O "settings set target.exec-search-paths $RUNTIME_PATH" -o "plugin load $PATH_TO_LIBSOSPLUGIN" --core "$COREFILE" "$HOST_PATH"

I got an "undefined symbol: DumpAsync" error:

/tmp/coredump.27570
(lldb) settings set target.exec-search-paths /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/
(lldb) target create "/usr/share/dotnet/dotnet" --core "/tmp/coredump.27570"
Core file '/tmp/coredump.27570' (x86_64) was loaded.
(lldb) plugin load /home/supei/workspace/coreclr/bin/Product/Linux.x64.Debug/libsosplugin.so
(lldb) dumpasync
SOS command 'DumpAsync' not found /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsos.so: undefined symbol: DumpAsync
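
The error above names the libsos.so under the 2.1.5 runtime directory, which suggests the newer plugin ended up calling into the older SOS library that does not have the command. One way to check this, sketched below, is to list a library's exported symbols with `nm -D` and look for `DumpAsync`. The helper name `exports_symbol` is hypothetical, and it reads the symbol listing from standard input so that it can be tried without a real library; the listing in the demonstration is made up.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if the symbol listing on stdin (in `nm -D` format)
# contains a defined code symbol with the given name.
exports_symbol() {
    local symbol=$1
    # nm prints "<address> <type> <name>"; type T marks a defined code symbol.
    awk -v sym="$symbol" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

# Usage against a real library (path taken from the error message):
#   nm -D /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.5/libsos.so | exports_symbol DumpAsync

# Self-contained demonstration with a made-up listing.
listing='0000000000012340 T DumpObj
0000000000012560 T DumpHeap'
if printf '%s\n' "$listing" | exports_symbol DumpAsync; then
    echo "DumpAsync exported"
else
    echo "DumpAsync missing"
fi
```

Running the same check against the libsos.so built alongside the newer plugin would show whether that build exports the command.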

@mikem8361
Member

mikem8361 commented Dec 21, 2018 via email

@noahfalk
Member

@vancem - before my big vacation I tagged myself to write up some goalposts, but it looks like you took care of that for me (thanks!). Are you happy with the goalposts as they exist now or was there more you planned to add?

We need to identify a set (let's say 5) of what we would suggest are representative dumps

I've got some, except most of them are on desktop. I'm not sure we've got a wealth of customer dumps that both have the size you were hoping for and are running .NET Core 2.1+ (which is where our tools are most likely to work). @davidfowl might have some better examples from his experiments analyzing test apps at scale, or @stephentoub might have examples he has received?

From what I've learned on the topic so far, these are the issues that are likely to show up in the .NET Core space:

  1. The app isn't making progress on something and the dev believes it should be (it might be waiting, deadlocked, livelocked, or running but slow). The dev needs to understand what async threads exist, where they are in the progress of doing their unit of work, and what, if anything, is preventing forward progress.
  2. Thread-pool starvation (probably from using a sync API like Task.Wait()/.Result in an async method).

Deadlocks trying to enter a synchronization context are a major issue for desktop, but I'm hoping that since ASP.NET Core eliminated its synchronization context this is substantially less of an issue on Core right now. If other libraries make it common again (say UI libraries with an STA thread) then it could easily shoot back to the top.

@davidfowl
Member

We need to figure out the TaskCompletionSource issue that @vancem mentions here (#90 (comment))

@stephentoub
Member

stephentoub commented Jan 17, 2019

I keep bumping up against dotnet/roslyn#22428 as well; it'd be great to have a fix for that as part of this effort. Example: dotnet/roslyn#22428 (comment)

@noahfalk noahfalk moved this from Backlog to Post 3.0 User-Story Backlog in .NET Core Diagnostics Apr 26, 2019
@tommcdon tommcdon moved this from Post 3.0 User-Story Backlog to Done in .NET Core Diagnostics May 15, 2019
@tommcdon tommcdon moved this from Done to Post 3.0 User-Story Backlog in .NET Core Diagnostics May 15, 2019
@noahfalk noahfalk moved this from Post 3.0 User-Story Backlog to Backlog in .NET Core Diagnostics Nov 6, 2019
@tommcdon tommcdon modified the milestones: 3.0, 5.0 Nov 20, 2019
@noahfalk noahfalk added the enhancement New feature or request label Nov 6, 2020
@noahfalk noahfalk removed their assignment Nov 6, 2020
@tommcdon tommcdon modified the milestones: 5.0, 6.0 Dec 18, 2020
@davidfowl
Member

I was going to file a new issue but I see we still have this one, so I'll pile onto it. I think the parallel tasks view in VS has set the bar for features (and I sent the team some updates they can do to make it nicer), and dumpasync should follow suit. I think the first order of business would be porting dumpasync to a ClrMD command so we can use managed code.

Then there are a bunch of improvements that can be made to it. I'll file those as separate issues.

@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 21, 2021
@tommcdon tommcdon removed this from the 7.0.0 milestone Sep 12, 2022
@tommcdon tommcdon added this to the 8.0.0 milestone Sep 12, 2022