New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sos DumpAsync command #18160

Merged
merged 1 commit into from May 30, 2018

Conversation

Projects
None yet
7 participants
@stephentoub
Member

stephentoub commented May 29, 2018

Debugging async methods with sos can be time consuming and relies on knowing how to use dumpheap, dumpvc, gcroot, and other commands to get the desired information, while often digging through a myriad of objects on the heap to find the desired info.

This commit adds a new DumpAsync command, which finds the async state machine objects on the GC heap and outputs relevant information about each, including the fields of its state machine, any registered continuation, and GC roots for the state machine object (as they often serve as a valid substitute for call stacks).

Example program used as a test:

using System.Threading.Tasks;

class Program
{
    static async Task Main() => await MethodA();
    static async Task MethodA() => await MethodB();
    static async Task MethodB() => await MethodC();
    static async Task MethodC() => await MethodD();
    static async Task MethodD() => await Task.Delay(int.MaxValue);
}

and example command output:

0:011> !DumpAsync -type MethodD
         Address               MT     Size Name
#0
000001989f413de0 00007ff88c506ba8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]
StateMachine: Program+<MethodD>d__4 (struct)
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8d3df4b80  400000d        0         System.Int32  1 instance                0 <>1__state
00007ff8d3e082c0  400000e        8 ...TaskMethodBuilder  1 instance 000001989f413e38 <>t__builder
00007ff8d3dfea90  400000f       10 ...vices.TaskAwaiter  1 instance 000001989f413e40 <>u__1
Continuation: 000001989f413e50 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]])
GC roots:
    Thread 2936c:
        000000071a37e050 00007ff8d3ac1657 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [d:\repos\coreclr\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2977]
            rbp+10: 000000071a37e0c0
                ->  000001989f413fa0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
                ->  000001989f413f30 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
                ->  000001989f413ec0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
                ->  000001989f413e50 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]]
                ->  000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

    HandleTable:
        000001989d8415f8 (pinned handle)
        -> 00000198af3e1038 System.Object[]
        -> 000001989f413410 System.Threading.TimerQueue[]
        -> 000001989f413468 System.Threading.TimerQueue
        -> 000001989f413330 System.Threading.TimerQueueTimer
        -> 000001989f412e40 System.Threading.Tasks.Task+DelayPromise
        -> 000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

Found 1 state machines.

cc: @jkotas, @janvorli, @mikem8361, @noahfalk, @danmosemsft

@danmosemsft

This comment has been minimized.

Show comment
Hide comment
@danmosemsft
Member

danmosemsft commented May 29, 2018

@jaredpar fyi

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 29, 2018

Member

@jaredpar FYI

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that. So, for example, it's not parsing decorated names or expecting certain compiler-generated fields. Rather, it's only looking at data structures defined in System.Private.CoreLib, which seems like very fair game. This was all made easier by the changes in #13105, which changed how the boxing was done, putting the state machine struct into a strongly-typed field.

(Actually, as I wrote the above, I realize this isn't entirely true: it currently depends on the state machine type being a struct, which already isn't true in C# debug builds. I'll fix that.)

Member

stephentoub commented May 29, 2018

@jaredpar FYI

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that. So, for example, it's not parsing decorated names or expecting certain compiler-generated fields. Rather, it's only looking at data structures defined in System.Private.CoreLib, which seems like very fair game. This was all made easier by the changes in #13105, which changed how the boxing was done, putting the state machine struct into a strongly-typed field.

(Actually, as I wrote the above, I realize this isn't entirely true: it currently depends on the state machine type being a struct, which already isn't true in C# debug builds. I'll fix that.)

@jaredpar

This comment has been minimized.

Show comment
Hide comment
@jaredpar

jaredpar May 29, 2018

Member

@stephentoub

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that

I appreciate that but also part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency. That's really useful for items like figuring out why a unit test is hanging. Not sure if that's a good idea or not though.

Member

jaredpar commented May 29, 2018

@stephentoub

Note that this doesn't take any dependencies on what the C# compiler does; I purposefully stayed away from that

I appreciate that but also part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency. That's really useful for items like figuring out why a unit test is hanging. Not sure if that's a good idea or not though.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 29, 2018

Member

part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

Member

stephentoub commented May 29, 2018

part of me really really wants to say: please add an option where I can restrict to only tasks that are currently "waiting" which would require that dependency

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 May 29, 2018

Member

I'm currently moving SOS (along with the lldb plugin and the managed part of SOS) to a new "diagnostics" repo and attempting to make SOS runtime version independent (hopefully even compatible going back to 1.1). Some of this will be documenting and asserting the runtime data structures SOS that haven't changed much over the years.

It sounds like your new command (which is great by the way) is dependent on the latest 2.2 (master branch) runtime. We may consider adding the version dependent part the command as a new interface to the DAC making it easy to determine. At the very least there should be a runtime version check in the command (currently easy on Windows via GetEEVersion(), but I need to do something for xplat).

I'll port any changes from coreclr to the new repo and (with your help) do the work to create the new DAC interface.

Member

mikem8361 commented May 29, 2018

I'm currently moving SOS (along with the lldb plugin and the managed part of SOS) to a new "diagnostics" repo and attempting to make SOS runtime version independent (hopefully even compatible going back to 1.1). Some of this will be documenting and asserting the runtime data structures SOS that haven't changed much over the years.

It sounds like your new command (which is great by the way) is dependent on the latest 2.2 (master branch) runtime. We may consider adding the version dependent part the command as a new interface to the DAC making it easy to determine. At the very least there should be a runtime version check in the command (currently easy on Windows via GetEEVersion(), but I need to do something for xplat).

I'll port any changes from coreclr to the new repo and (with your help) do the work to create the new DAC interface.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 29, 2018

Member

It sounds like your new command (which is great by the way)

Thanks!

is dependent on the latest 2.2 (master branch)

It should work fine with 2.1, as it doesn't depend on anything that's changed since 2.1. But it won't work earlier than 2.1 (it won't find any state machine objects).

I'll port any changes from coreclr to the new repo

Thanks.

Member

stephentoub commented May 29, 2018

It sounds like your new command (which is great by the way)

Thanks!

is dependent on the latest 2.2 (master branch)

It should work fine with 2.1, as it doesn't depend on anything that's changed since 2.1. But it won't work earlier than 2.1 (it won't find any state machine objects).

I'll port any changes from coreclr to the new repo

Thanks.

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas May 29, 2018

Member

At the very least there should be a runtime version check in the command

I think it is better to just check for the presence of structures that the command depends on and fail without crashing if they are not present. The implementation does that already.

Member

jkotas commented May 29, 2018

At the very least there should be a runtime version check in the command

I think it is better to just check for the presence of structures that the command depends on and fail without crashing if they are not present. The implementation does that already.

@mikem8361

This comment has been minimized.

Show comment
Hide comment
@mikem8361

mikem8361 May 29, 2018

Member
Member

mikem8361 commented May 29, 2018

@noahfalk

This comment has been minimized.

Show comment
Hide comment
@noahfalk

noahfalk May 29, 2018

Member

@kouvel

Thanks for putting this together Stephen! Above you mentioned:

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

In your opinion what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?) I'm hoping to push the diagnostics in this area further and it sounded like there might be obstacles here that needed to be addressed.

Thanks!

Member

noahfalk commented May 29, 2018

@kouvel

Thanks for putting this together Stephen! Above you mentioned:

Yeah... I had that as an internal debate all weekend while thinking about adding this command. It would require interpreting <>1__state (-2 == done, -1 == running, 0+ at the corresponding numbered await)...

In your opinion what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?) I'm hoping to push the diagnostics in this area further and it sounded like there might be obstacles here that needed to be addressed.

Thanks!

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 29, 2018

Member

what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?)

Yup, the downside is just a dependency on the implementation-level detail that the compiler uses for this: how and where the state for the state machine is stored, and how to interpret it. To my knowledge this has been fairly stable (@jaredpar?), it just makes me a bit nervous, as the compiler could easily change this and then the dependency would be broken. If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

Member

stephentoub commented May 29, 2018

what is the downside of adding this interpretation? (concern about taking a dependency on the compiler naming scheme "<>1__"?)

Yup, the downside is just a dependency on the implementation-level detail that the compiler uses for this: how and where the state for the state machine is stored, and how to interpret it. To my knowledge this has been fairly stable (@jaredpar?), it just makes me a bit nervous, as the compiler could easily change this and then the dependency would be broken. If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

@jaredpar

This comment has been minimized.

Show comment
Hide comment
@jaredpar

jaredpar May 29, 2018

Member

@stephentoub

To my knowledge this has been fairly stable (@jaredpar?),

It's been this way since the feature was authored.

If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

Our advice for production tools is generally: never, ever, depend on the shape of our emitted code. We do change it regularly and if we break you we won't really feel bad about it (well I do feel bad, but still going to take the change).

The one exception is debugging tools. In order to function at times they need to understand the shape of our generated code. It's the only way they can sensible unwind closures into the original locals for example.

At the same time though, we don't have much qualm about breaking our debuggers either. But we own them and tend to change them in concert with making the emit change.

Member

jaredpar commented May 29, 2018

@stephentoub

To my knowledge this has been fairly stable (@jaredpar?),

It's been this way since the feature was authored.

If you and Jared don't think it's a problem, though, I'll go ahead and add the filtering, as I agree it's valuable.

Our advice for production tools is generally: never, ever, depend on the shape of our emitted code. We do change it regularly and if we break you we won't really feel bad about it (well I do feel bad, but still going to take the change).

The one exception is debugging tools. In order to function at times they need to understand the shape of our generated code. It's the only way they can sensible unwind closures into the original locals for example.

At the same time though, we don't have much qualm about breaking our debuggers either. But we own them and tend to change them in concert with making the emit change.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 29, 2018

Member

Hmmm. Ok, thanks. I'll leave it out for now, but we can potentially add it later.

Member

stephentoub commented May 29, 2018

Hmmm. Ok, thanks. I'll leave it out for now, but we can potentially add it later.

Add sos DumpAsync command
Debugging async methods with sos can be time consuming and relies on knowing how to use dumpheap, dumpvc, gcroot, and other commands to get the desired information, while often digging through a myriad of objects on the heap to find the desired info.

This commit adds a new DumpAsync command, which finds the async state machine objects on the GC heap and outputs relevant information about each, including the fields of its state machine, any registered continuation, and GC roots for the state machine object (as they often serve as a valid substitute for call stacks).

Example program used as a test:
```C#
using System.Threading.Tasks;

class Program
{
    static async Task Main() => await MethodA();
    static async Task MethodA() => await MethodB();
    static async Task MethodB() => await MethodC();
    static async Task MethodC() => await MethodD();
    static async Task MethodD() => await Task.Delay(int.MaxValue);
}
```
and example command output:
```
0:011> !DumpAsync -type MethodD
         Address               MT     Size Name
000001989f413de0 00007ff88c506ba8      112  System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]
StateMachine: Program+<MethodD>d__4
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff8d3df4b80  400000d        0         System.Int32  1 instance                0 <>1__state
00007ff8d3e082c0  400000e        8 ...TaskMethodBuilder  1 instance 000001989f413e38 <>t__builder
00007ff8d3dfea90  400000f       10 ...vices.TaskAwaiter  1 instance 000001989f413e40 <>u__1
Continuation: 000001989f413e50 (System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]])
GC roots:
    Thread 2936c:
        000000071a37e050 00007ff8d3ac1657 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [d:\repos\coreclr\src\System.Private.CoreLib\src\System\Threading\Tasks\Task.cs @ 2977]
            rbp+10: 000000071a37e0c0
                ->  000001989f413fa0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<Main>d__0, test]]
                ->  000001989f413f30 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodA>d__1, test]]
                ->  000001989f413ec0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodB>d__2, test]]
                ->  000001989f413e50 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodC>d__3, test]]
                ->  000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

    HandleTable:
        000001989d8415f8 (pinned handle)
        -> 00000198af3e1038 System.Object[]
        -> 000001989f413410 System.Threading.TimerQueue[]
        -> 000001989f413468 System.Threading.TimerQueue
        -> 000001989f413330 System.Threading.TimerQueueTimer
        -> 000001989f412e40 System.Threading.Tasks.Task+DelayPromise
        -> 000001989f413de0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Program+<MethodD>d__4, test]]

Found 1 state machines.
```

@stephentoub stephentoub merged commit a9c56e2 into dotnet:master May 30, 2018

19 checks passed

CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
Linux-musl x64 Debug Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Tizen armel Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm64 Cross Debug Innerloop Build Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
WIP ready for review
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
license/cla All CLA requirements met.
Details

@stephentoub stephentoub deleted the stephentoub:dumpasync branch May 30, 2018

stephentoub added a commit to dotnet/buildtools that referenced this pull request Jun 1, 2018

Include dumpasync command when evaluating core dumps
dotnet/coreclr#18160 and dotnet/coreclr#18213 added a new `dumpasync` command to sos that will dump out data on async methods currently in use.  Including this in the core dump analysis has similar value to dumping out all call stacks, in that it'll give us more details into what was going on in the process.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment