[API Proposal]: new System.Diagnostics.StackTrace(System.Threading.Thread) #79463

Open
jhudsoncedaron opened this issue Dec 9, 2022 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Diagnostics enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@jhudsoncedaron

Background and motivation

Enable applications to provide self diagnostics:

This has come up a few times where we've had a runaway thread in production (sometimes in our hosted environment, sometimes in the customer's environment). There doesn't seem to be a better way to get the running stacks than attaching a debugger, taking a dump, and walking the dump. We did look into doing exactly that, but are unhappy with the half a gigabyte of temp space it allocates and with the performance of the operation. (At the moment performance is secondary, but if this becomes a standard debugging technique it won't be. Think about it: if there's a diagnostics button, support is going to push that button a lot whether or not it makes sense to push it.)

I looked into how to do it, found StackFrameHelper, and discovered that it takes a Thread in its constructor. This looked ideal, so I tried it and found that it asserts that it is passed either the current thread or a thread that isn't running. The comments suggest that a suspended thread should be usable, but the actual code at the point of the assertion doesn't check for a suspended thread.

Now at this point somebody's going to jump in and say that SuspendThread was banished for a reason, and they'd be right. I definitely don't want SuspendThread back as it was, either. However, consider this: the GC is able to suspend a thread, and it doesn't cause the issues that SuspendThread normally causes. I ran out of puff attempting to determine how this works; however, we know the GC isn't troubled by threads currently being in native code when it walks their stacks, nor is it troubled by deadlocks when it suspends them for a full mark/sweep GC. The stack walk itself is in native code and would not be troubled by a managed suspend, so we should be able to do this.

API Proposal

namespace System.Diagnostics;

public partial class StackTrace
{
    public StackTrace(System.Threading.Thread thread, int numFramesToSkip = 0);
}
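
For contrast, here is a minimal sketch of what the existing API can already do. It uses only the StackTrace(int skipFrames) constructor and GetFrames(), which exist today, and it can only see the calling thread's own stack, which is exactly the limitation the proposed constructor would lift. The class and method names (CurrentThreadStack, Capture) are hypothetical and not part of the proposal.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;

static class CurrentThreadStack
{
    // Hypothetical helper: returns the method names on the *calling* thread's stack.
    // NoInlining keeps this method's own frame on the stack so the skip count below stays valid.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string[] Capture(int skipFrames = 0)
    {
        // Skip one extra frame so Capture itself does not appear in the result.
        var trace = new StackTrace(skipFrames + 1);

        // GetFrames() may return null on older runtimes, so fall back to an empty array.
        StackFrame[] frames = trace.GetFrames() ?? Array.Empty<StackFrame>();

        return frames
            .Select(f => f.GetMethod()?.Name ?? "<unknown>")
            .ToArray();
    }
}
```

A worker thread could call CurrentThreadStack.Capture() on itself, but a runaway thread never reaches such a call, which is why capturing another thread's stack is what the proposal asks for.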

API Usage

    // Relies on the proposed StackTrace(Thread) constructor; one entry per worker thread.
    IActionResult DebugGetWorkerStacks() =>
        Content(string.Join("\r\n", WorkerThreads.Select(w => w.Name + ":\r\n" + new StackTrace(w))));

Alternative Designs

I wouldn't have bothered with numFramesToSkip, except that the source code already has it: all the work goes into unlocking the ability in native code, and the argument is already passed to it.

Risks

Unless I'm very much mistaken there is no risk that comes into play unless somebody actually calls the function. I can see a bad enough bug in the implementation causing sporadic deadlocks, but such a bug would still only be triggered if somebody calls the function.

@jhudsoncedaron jhudsoncedaron added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 9, 2022
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 9, 2022
@ghost

ghost commented Dec 9, 2022

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@jander-msft
Member

If you are open to using out-of-process tools rather than requiring an API, you can use:

  • dotnet-stack tool; this runs completely out-of-process and only requires the diagnostic event pipe to be available
  • dotnet-monitor tool's /stack route (currently experimental; this loads a profiler into the process to get the information)

@jhudsoncedaron
Author

@jander-msft : I'm fine with out of process tools; the problem I ran into was taking the dump of the entire process to do so. It's quite overweight.

@jander-msft
Member

Neither of these tools captures a dump of the process. They collect stack information directly from the runtime. However, you won't get as much data fidelity as with a dump, since they only report the stack frames (modules, method names, argument types) for each thread.

If you use dotnet-stack and have feedback, feel free to log issues at https://github.com/dotnet/diagnostics/issues
If you use dotnet-monitor and have feedback, feel free to log issues at https://github.com/dotnet/dotnet-monitor/issues

@jhudsoncedaron
Author

jhudsoncedaron commented Dec 9, 2022

@jander-msft : Apparently these don't exist as libraries. (dotnet tool install isn't something that can be packaged up.)

If they were libraries I'd just do Process.Start(...) to a bundled binary that takes care of the serialization to standard output.
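
A rough sketch of that Process.Start approach, assuming a dotnet-stack executable is reachable at a path you supply (how it gets bundled is left open here). It invokes the tool's `report --process-id <pid>` command and reads standard output. StackDumper, BuildStartInfo, and DumpStacks are hypothetical names used only for illustration.

```csharp
using System;
using System.Diagnostics;

static class StackDumper
{
    // Hypothetical helper: builds the start info for `dotnet-stack report --process-id <pid>`.
    // toolPath is an assumption: wherever the bundled dotnet-stack executable lives.
    public static ProcessStartInfo BuildStartInfo(string toolPath, int pid) =>
        new ProcessStartInfo(toolPath, $"report --process-id {pid}")
        {
            RedirectStandardOutput = true, // capture the report instead of printing it
            UseShellExecute = false,       // required for output redirection
            CreateNoWindow = true,
        };

    // Runs the tool and returns everything it wrote to standard output.
    public static string DumpStacks(string toolPath, int pid)
    {
        using var process = Process.Start(BuildStartInfo(toolPath, pid))
            ?? throw new InvalidOperationException("Could not start " + toolPath);

        // Read before waiting so a large report cannot fill the pipe and block the child.
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return output;
    }
}
```

A diagnostics endpoint could then call something like StackDumper.DumpStacks(pathToTool, Environment.ProcessId) and return the text. This still depends on the tool being deployable next to the application, which is the open question in this thread.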

@jander-msft
Member

dotnet-stack has a direct download option (find "Direct download" in the Install section) that allows you to run a self-extracting framework-dependent executable.

At this time, dotnet-monitor is not offered as such a package (I'll look into how this tool can provide a similar acquisition experience in the future) and requires the .NET SDK to install it. There's probably a way to install it on one machine using the .NET SDK, zip up the bits, and unzip them on the target machine, but that's a bit unnatural to do.

@jhudsoncedaron
Author

"framework-dependent executable" Can't use. :( I'll gladly withdraw this for a nuget package I can link against though.

@epeshk
Contributor

epeshk commented Dec 11, 2022

For example, Java has Thread.getAllStackTraces(). It is a quite useful API that can save a lot of time when ThreadPool starvation or deadlock issues appear.

Yes, external tools can be used, but these tools must be deployed on the server. And for self-contained apps, that means deploying not only the additional tools but also an entire runtime. Maybe an option to include dotnet-stack in the application bundle (as is done for createdump by default) would be useful?

Or it could be done manually once #53834 is done.

@jhudsoncedaron
Author

@epeshk : I have a local solution to #53834 that depends on 1) all exe targets using the exact same runtimeframework version and target RID, 2) all exe targets being built framework independent, and 3) *.deps.json files being generated. I can handle references to different mutually incompatible versions of the same nuget package.

@tommcdon tommcdon added this to the 8.0.0 milestone Dec 13, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Dec 13, 2022
@hoyosjs hoyosjs added the enhancement Product code improvement that does NOT require public API changes/additions label Jan 7, 2023
@tommcdon tommcdon modified the milestones: 8.0.0, 9.0.0 Jul 17, 2023
6 participants