Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Non-cooperative abortion of code execution #69622

Closed
AntonLapounov opened this issue May 20, 2022 · 17 comments
Closed

[API Proposal]: Non-cooperative abortion of code execution #69622

AntonLapounov opened this issue May 20, 2022 · 17 comments
Labels
api-approved API was approved in API review, it can be implemented area-VM-coreclr
Milestone

Comments

@AntonLapounov
Copy link
Member

AntonLapounov commented May 20, 2022

Background and motivation

Update: Replaced TryThrowException with TryAbort as was in the original proposal.

There is a number of execution environments for .NET code that may benefit from having an option to perform non-cooperative abortion of code execution, e.g., C# and F# interactive compilers (csi and fsi) or .NET Interactive Notebooks (see #41291). While the legacy Framework provided the Thread.Abort method, it has never been supported by .NET Core, because aborting code execution at a random point is inherently unsafe and may leave the process in an inconsistent or a corrupted state. Thus, the only available option to interrupt stuck code execution is to terminate the process and lose its state, which is undesirable for interactive scenarios.

Below is the proposed API to allow controlled execution of given code. The Run method starts execution of the delegate and TryAbort method may be used to attempt aborting its execution by throwing an asynchronous ThreadAbortException on that thread. Aborting may be successful only when the code is executing managed code, e.g., running a CPU-intensive loop, on the original thread. This API would not help cases where the thread is stuck in unmanaged code or waiting for some other thread to finish its work.

The implementation will generally use the same mechanism as the one utilized by the debugger to abort threads doing function evaluation (Thread::UserAbort). That includes marking the thread for abortion, suspending it, checking whether it is running managed code and safe to redirect, redirecting the thread to the ThrowControlForThread runtime helper, walking the stack to verify the tread is in a safe spot, and raising the exception for thread abort.

A possible option to make the implementation simpler is to rely only on GC polls / polls for thread abort instead of redirecting the thread on Windows, similarly to what is done on Unixes.

This design was discussed in #66480.

API Proposal

namespace System.Runtime.CompilerServices;

public class ControlledExecution
{
    public ControlledExecution(Action action);

    // Runs the specified action.
    // Throws an InvalidOperationException if the action is already running.
    // Throws a ThreadAbortException if execution was aborted.
    public void Run();

    // Attempts to abort the execution of the action by injecting an asynchronous exception.
    // Returns true on success or false otherwise.
    // Throws an InvalidOperationException if the action is not running.
    public bool TryAbort(TimeSpan timeout);
}

API Usage

s_executor = new ControlledExecution(SomeLengthyAction);
s_executor.Run();

// On another thread:
s_executor.TryAbort(TimeSpan.FromMilliseconds(500));

Alternative Designs

No response

Risks

No response

@AntonLapounov AntonLapounov added the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 20, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 20, 2022
@teo-tsirpanis
Copy link
Contributor

Because this API can unpredictably corrupt the application's state by design and to avoid misuse, it might need to be gated by an opt-in runtime configuration setting. This means we also need a static bool IsSupported { get; } property to the ControlledExecution class.

My understanding is that it's impossible to implement this API safely, and this would strongly discourage using it as a replacement of Thread.Abort.

@JanEggers
Copy link

whats the differnce of this vs a method that takes a cancellation token?

@teo-tsirpanis
Copy link
Contributor

A method with a cancellation token has to periodically check for it and pass it down the call stack. This API will allow terminating methods forcefully and immediately, even if they don't accept a cancellation token.

@stephentoub
Copy link
Member

it might need to be gated by an opt-in runtime configuration setting. This means we also need a static bool IsSupported { get; } property to the ControlledExecution class.

This is a dangerous API, one very few folks should use, but I don't see how the suggestion helps; it just forces extra boilerplate that someone who needs to call this has to go through, and is on the path towards reintroducing Code Access Security. How are the dangers here different from a library, say, P/Invoking to TerminateThread and similarly leaving things in a corrupted state? That doesn't require additional opt-in today. Opting-in is the act of calling the method or using a library that does.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented May 26, 2022

This API is marked ready for review. Before we approve it, I'd like to get confirmation from the runtime + libraries architects that we're not committing to changing any BCL code to be resilient in the face of asynchronous exceptions being thrown. The original post from Don Syme said that the reliability for this API has to be "good but not perfect" - but I never saw a definition of how we would measure success here.

I really don't want this to be a pretext for reintroducing CERs back into our code after we spent so much effort removing them. See dotnet/docs#29624 (comment) for some additional discussion on how newer coding constructs (like the array pool) could result in additional pits of failure with regard to asynchronous exceptions.

@stephentoub
Copy link
Member

stephentoub commented May 26, 2022

I'd like to get confirmation from the runtime + libraries architects that we're not committing to changing any BCL code to be resilient in the face of asynchronous exceptions being thrown.

We're not going to add back CERs. As far as library code is concerned, thread aborts do not exist.

@GrabYourPitchforks
Copy link
Member

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place. They could check exceptions coming from the Run method, and if this Boolean property also returns true, they could normalize it back to an OperationCanceledException (or whatever) for reporting purposes.

In the extreme, the caller might want the answers to the following questions:

  • Was this operation aborted?
    • If so, what was the original exception? (It could have been lost if the delegate threw a different exception internally.)
    • What was the call stack of the thread which called TryThrowException?
    • What was the call stack of the Run method at the time the exception was thrown?
  • Was any operation ever aborted? (This would indicate the process is potentially unhealthy, and a REPL service might take action such as spinning up a new service instance in the background.)

@jkotas
Copy link
Member

jkotas commented May 27, 2022

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place.

Is it different from the bool returned by TryThrowException?

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented May 27, 2022

Is it different from the bool returned by TryThrowException?

My understanding is that TryThrowException is invoked by a thread other than the thread which called Run. The "was this operation aborted?" API would be for use by the thread which called Run.

ControlledExecution cex;
try
{
   cex.Run(some_action);
}
catch (Exception ex) when (cex.WasCanceled)
{
    throw new OperationCanceledException(); // normalize
}

@jkotas
Copy link
Member

jkotas commented May 27, 2022

Ah ok, so this would be just for convenience. I am not sure whether it is worth it. The interactive window is going to have wrapper around this that can keep track of what is going on.

@GrabYourPitchforks
Copy link
Member

Ah ok, so this would be just for convenience. I am not sure whether it is worth it. The interactive window is going to have wrapper around this that can keep track of what is going on.

Sure. So the suggestion is that since the code which kicks off the work and the code which aborts the work are assumed to have been written by the same author, they should already have some synchronization mechanism between themselves to communicate this information? I'm fine with that.

@AntonLapounov AntonLapounov added this to the 7.0.0 milestone May 31, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 31, 2022
@AntonLapounov
Copy link
Member Author

I wonder if it could be useful to expose a "was aborted?" API from this to let the original caller know if an interrupt indeed took place.

Maybe the Run method should return bool indicating a successful execution and not throw an exception? That would save on try/catch boilerplate code (as in #69622 (comment)).

@AntonLapounov AntonLapounov added the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 21, 2022
@deeprobin
Copy link
Contributor

Because this API can unpredictably corrupt the application's state by design and to avoid mis

@teo-tsirpanis

How about marking this class unsafe?

@teo-tsirpanis
Copy link
Contributor

@deeprobin it won't work, we need #31354. And it wouldn't solve the problem of a dependency using it. The goal of my proposal is to entirely disable it from any library in the app, like the BinaryFormatter switch.

@bartonjs
Copy link
Member

bartonjs commented Jul 7, 2022

Video

  • Is System.Runtime.CompilerServices the right namespace?
    • Changed to just System.Runtime
  • We changed the instance ctor/Run()/Abort to a single static using CancellationToken
  • To discourage people from calling it, we should declare it Obsolete out of the box, with a custom diagnostic ID (and message)
  • ThreadAbortException should not leak out of Run, it should get normalized to OperationCancelledException.
    • Other exceptions can go through without normalization
namespace System.Runtime;

[Obsolete(some diag code)]
public static class ControlledExecution
{
    public static void Run(Action action, CancellationToken cancellationToken);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 7, 2022
@AntonLapounov
Copy link
Member Author

Implemented in #71661.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-VM-coreclr
Projects
None yet
Development

No branches or pull requests

8 participants