Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't hide exceptions on the async cancellation-path #3257

Closed
wants to merge 9 commits into from

Conversation

matthid
Copy link
Contributor

@matthid matthid commented Jun 24, 2017

Short description

Don't hide exceptions on the cancellation-path but collect them in the InnerException of the OperationCancelledException. Use this information to improve AwaitTask behavior.

See discussion in #3219 (comment)

I guess this is the most controversial PR. It builds on top of #3256 and #3255 because we need those fixes for the tests to work properly.

/cc @eiriktsarpalis

Examples and detailed description

The main problem right now this PR tries to fix is that user-exceptions are swallowed by the Async-module in various occasions. This means that in the cancellation-path a lot of programming errors will go unnoticed. While fixing/debugging these code-paths several other changes in behaviors were uncovered and fixed.

Example - Bug in shutdown/finally

        let cts = new CancellationTokenSource()
        let ewh = new ManualResetEvent(false)
        let msg = "Cleanup failure"
        let a = async {
            try ewh.Set() |> ignore
                do! Async.Sleep 10000
            finally raise <| Exception msg}
        async {
            ewh.WaitOne() |> ignore
            cts.Cancel()
        } |> Async.Start
        Async.RunSynchronously(a, cancellationToken = cts.Token)

Consider in the finally block we have a programming error which throws some exception.
This programming error will be swallowed.

So the general behavior is: Once the cancellation-token has been cancelled all APIs will return an OperationCanceledException. There is however an exception to this rule:

Example - Cancellation in 3rd Party APIs

Some Task-based APIs will not honor cancellation correctly (or will just fail by accident of because of the token). They will throw some custom exception. In this scenario there is a behavior change depending on how your code is structured:

        let cts = new CancellationTokenSource()
        let tcs = TaskCompletionSource<unit>()
        let a = async {
            cts.CancelAfter (100)
            do! tcs.Task |> Async.AwaitTask }
        use t : Task<unit> =
            Async.StartAsTask(a, cancellationToken = cts.Token)

        // Should not finish
        try
            let result = t.Wait(300)
            Assert.IsFalse (result)
        with :? AggregateException -> Assert.Fail "Task should not finish, yet"

        let msg = "Custom non-conforming 3rd-Party-Api throws"
        tcs.SetException(Exception msg)

In the above code t (ie StartAsTask) will eventually fail with the 3rd party exception. However if you change your code-structure within the async:

        let a = async {
            try cts.CancelAfter (100)
                do! tcs.Task |> Async.AwaitTask
            with _ when false -> printfn "test" }

You get the OperationCanceledException with the swallowed user-exception as above.

This solution

We will always continue with a TaskCanceledException, but we will use the InnerException property to forward user-exceptions. This will properly flow over nested Async.AwaitTask |> Async.StartAsTask |> Async.AwaitTask ... boundaries.
We also change the observable behavior to be more independent on the used control-flow structures.

@matthid matthid changed the title don't hide potentially important exceptions when cancelling an async [WIP] don't hide potentially important exceptions when cancelling an async Jun 24, 2017
@matthid
Copy link
Contributor Author

matthid commented Jun 24, 2017

Seems like not even the .net framework itself is doing the correct thing because of backwards compat:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Threading/Tasks/Task.cs#L1926-L1932

interestingly though it improves C# interop, because when used in an C#-async-method it will throw the correct exception ...

Question is can we have the correct thing within F# :/

let exn = edi.GetAssociatedSourceException()
let collected =
match cexn.InnerException with
| null -> [|exn|]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we throwing an AggregateException as the inner exception when there's only one exception? Feels a bit superfluous.

Copy link
Contributor Author

@matthid matthid Jun 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saul It is one of the details I have not completely figured out yet. On one hand you are correct on the other hand it feels wrong that you need to change your exception handling code when another exception is added...

Also I feel like people should not write too much code depending on those inner exceptions. So making things awkward is probably a good thing. Maybe we should add our own now exception type, because then we can add some additional warnings/documentation on its members...

At least I now know why @eiriktsarpalis is so skeptical about this. Even the bcl guys saw that this is hard to do afterwards and maintain compat.

But I still feel like hiding exceptions is not an option (again, bcl guys felt the same and added it in the c# await feature)

@matthid matthid changed the title [WIP] don't hide potentially important exceptions when cancelling an async [Discuss] don't hide potentially important exceptions when cancelling an async Jun 25, 2017
@matthid
Copy link
Contributor Author

matthid commented Jun 25, 2017

OK I think this is ready to discuss. These are my notes on this:

  • I consider this as an important fix. Hiding exceptions of user-code is probably never a good idea.
    -> The only alternative would be a logging mechanism in FSharp.Core or some GetLastError API ...
  • We need to capture and forward the info, just consider nesting a bunch of t |> AwaitTask |> StartAsTask |> AwaitTask |> StartAsTask, I think people expect the final Task to have similar behavior as the one you started with.
  • We probably need to be a bit careful regarding compatibility
    • Not breaking old code (IMHO adding an 'InnerException' can be considered non-breaking)
    • I'm not sure if this is the correct way to expose this new information. Maybe our own (internal) sub-types of OperationCanceledException and TaskCanceledException make sense? This way people are able to see it in the debugger (or even logging if we overwrite ToString) but cannot depend on it programmatically. (People could still get the raw exceptions via |> StartAsTask |> GetAwaiter |> GetResult)

@KevinRansom
Copy link
Member

@dsyme, @matthid, @saul

Where are we on this PR? The code looks fine, expose the underlying exception as inner exception seems okay to me.

conflicts need resolving though.

Kevin

@abelbraaksma
Copy link
Contributor

I'm curious, as this is marked 'ready to discuss', what's the biggest risk here? Does @MattiD mean we introduce a different behaviour wth c# or are we making certain existing scenarios backwards incompatible? If I read your comment correctly it looks like with this PR we're improving by propagating the exceptions, as opposed to hiding them, right? I'd say, go for it ;).

@abelbraaksma
Copy link
Contributor

Sorry, looks like editing comments in github on mobile is not possible anymore. I meant @matthid of course.

@cartermp
Copy link
Contributor

cartermp commented Feb 5, 2018

Now that #3256 and #3255 are merged, it would be good to fully review this - @KevinRansom you seem on board. @dsyme?

Also @matthid can you resolve conflicts?

@matthid
Copy link
Contributor Author

matthid commented Feb 5, 2018

Yes give me a couple of days (also to respond to the comments in depth).

@@ -1304,38 +1332,38 @@ namespace Microsoft.FSharp.Control
member __.Proceed = not isStopped
member __.Stop() = isStopped <- true

let StartAsTask (token:CancellationToken, computation : Async<_>,taskCreationOptions) : Task<_> =
let StartAsTask (token:CancellationToken, computation : Async<'a>,taskCreationOptions) : Task<'a> =
#if !FX_NO_ASYNCTASKMETHODBUILDER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please switch the #if and #else branches here

Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthid Could you adjust the description at the top of the PR to be a complete description of the observable change in this PR? I can try to divine it from your test cases and code, and the various conversation history, but it would be better to have a complete, independent description that I can check against. We also need to decide if this is worthy of an RFC or if it is just a feature improvement - a description will help me decide that

thanks

else
edi.SourceException
tcs.SetException wrapper |> fake)
#if !FX_NO_ASYNCTASKMETHODBUILDER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise switch the branches here

@dsyme
Copy link
Contributor

dsyme commented Feb 9, 2018

Please resolve the conflicts too, thanks!

@matthid
Copy link
Contributor Author

matthid commented Feb 9, 2018

Ok I managed to rebase this and included the latest suggestions from @dsyme.

All my points from #3257 (comment) still apply.

Where are we on this PR? The code looks fine, expose the underlying exception as inner exception seems okay to me.

I'm curious, as this is marked 'ready to discuss', what's the biggest risk here?

Yes but it is not an ideal solution for various reasons:

  • The implementation - while it works - is arguably a bit "hacky". This is due to the lack of low-level APIs (on the TaskHelpers side).
  • We change the exception type on AwaitTask from OperationCanceledException to TaskCanceledException (but that one is a subtype) on some occasions.
  • Once people start to depend on the InnerExceptions we cannot change it anymore (as it will be a breaking change).
  • In a "perfect" world I feel like we would just properly propagate the source exception instead of having a OperationCanceledException at all. Especially when the Exception happens after the cancellation (just like a Exception within a finally block hides the underlying exception).

On the other hand hiding exceptions is a really really bad thing, especially on the "cancellation"-path as that is usually where people make mistakes. However, I'm not sure this change provides the required visibility to those exceptions.

@matthid matthid changed the title [Discuss] don't hide potentially important exceptions when cancelling an async [Async] Don't hide exceptions on the cancellation-path but collect them in the InnerException of the OperationCancelledException. Use this information to improve AwaitTask behavior. Feb 9, 2018
@matthid
Copy link
Contributor Author

matthid commented Feb 9, 2018

@dsyme is this new description better?

@dsyme
Copy link
Contributor

dsyme commented Feb 13, 2018

@dsyme is this new description better?

Yes, thanks. But I wasn't really meaning the title - I was meaning the description of the PR at the top of this PR, the one which currently says "See discussion in #3219 (comment)"

Basically I want a written, independent speclet of what you're changing here - like a mini-RFC. So we know what the problem is, what the ramifications are, what the workarounds are etc. Use the general outline of the RFC form if it helps.

@dsyme dsyme changed the title [Async] Don't hide exceptions on the cancellation-path but collect them in the InnerException of the OperationCancelledException. Use this information to improve AwaitTask behavior. Don't hide exceptions on the asyc cancellation-path Feb 13, 2018
@dsyme dsyme changed the title Don't hide exceptions on the asyc cancellation-path Don't hide exceptions on the async cancellation-path Feb 13, 2018
Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthid Your update of the title helped me a lot in helping to grok the code. I've left some initial questions in the code. I think it's basically a good idea - we should be populating InnerExceptions where possible and the initial implementation of async didn't consider this at all.

I'll do a full review after you've considered this initial feedback

// protect an exception in an cancellation workflow and adds it to the given OperationCanceledException
// ie creates a new instance, where the new information is added,
// if cexn is a TaskCanceledException the type is preserved as well.
let protectExn (cexn:OperationCanceledException) (edi:ExceptionDispatchInfo) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name might be augmentOperationCancelledException. In this file protect normally means protect a function with a try/with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes makes sense

(fun edi -> resultCell.RegisterResult(Error(edi),reuseThread=true))
(fun edi ->
let result =
if token.IsCancellationRequested then Canceled(protectExn (new OperationCanceledException()) edi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this token.IsCancellationRequested check so late. There is always a race here - we've reached the end of the computation and found an Error result. Why another check for cancellation so late?

I mean, it's certainly valid to check for cancellation so late, but is it necessary for the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is to have a consistent result. If I remember currently we can have different results depending of whether our Async contained a task or not. Or even worse how the code is formatted.
For example adding a try - with could lead to a different exception (even if the with block has nothing to do with the actual exception thrown). I think one of the tests actually checks this.

So basically this is to give a more consistent result when there is an interop with tasks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't yet see how this achieves consistency. IsCancellationRequested could become true the moment after we check it. And I don't think there are cases where we know a-priori that it was true coming into the exception continuation (we should never be taking the exception continuation once the cancellation continuation is taken)

I do understand that there is a case related to tasks where you want to make things consistent

For example adding a try - with could lead to a different exception (even if the with block has nothing to do with the actual exception thrown).

Could you give me an example?

Copy link
Contributor Author

@matthid matthid Feb 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the following test will fail:

    [<Test>]
    member this.StartAsTaskCancellationViaException () =
        let cts = new CancellationTokenSource()
        let tcs = TaskCompletionSource<unit>()
        let a = async {
            cts.CancelAfter (100)
            do! tcs.Task |> Async.AwaitTask }
        use t : Task<unit> =
            Async.StartAsTask(a, cancellationToken = cts.Token)

        // Should not finish
        try
            let result = t.Wait(300)
            Assert.IsFalse (result)
        with :? AggregateException -> Assert.Fail "Task should not finish, yet"

        let msg = "Custom non-conforming 3rd-Party-Api throws"
        tcs.SetException(Exception msg)

        try
            this.WaitASec t
        with :? TaskCanceledException as t ->
            match t.InnerException with
            | :? AggregateException as a ->
                Assert.AreEqual(1, a.InnerExceptions.Count)
                Assert.AreEqual(msg, a.InnerException.Message)
            | _ -> reraise()
        Assert.IsTrue (t.IsCompleted, "Task is not completed")
        Assert.IsTrue (t.IsCanceled, "Task is not cancelled")

Basically without checking the cancellation-token check above we will yield no TaskCanceledException but an Exception. This is inconsistent because if you change the async to

        let a = async {
            try
                cts.CancelAfter (100)
                do! tcs.Task |> Async.AwaitTask
            with _ when false -> printfn "test" }

You will in-fact get the TaskCanceledException as you would expect. This is surprising behavior at the very least (ie we get a different exception only by adding a try ... with _ when false while everything else stays the same).

Therefore I added the additional checks on all the constructs. Is there a better option?

(fun edi -> resultCell.RegisterResult(Error(edi),reuseThread=true))
(fun edi ->
let result =
if token.IsCancellationRequested then Canceled(protectExn (new OperationCanceledException()) edi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above. We have to do this for every relevant element

(fun edi -> tcs.SetException edi.SourceException |> fake)
(fun edi ->
let wrapper =
if token.IsCancellationRequested then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and again.

@@ -1178,14 +1222,29 @@ namespace Microsoft.FSharp.Control
// Contains helpers that will attach continuation to the given task.
// Should be invoked as a part of protectedPrimitive(withResync) call
module TaskHelpers =
// This uses a trick to get the underlying OperationCanceledException
let inline getCancelledException (completedTask:Task) (waitWithAwaiter) =
let fallback = new TaskCanceledException(completedTask) :> OperationCanceledException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allocate the fallback here when it isn't needed on all paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I guess I make this a function instead.

if useCcontForTaskCancellation
then args.aux.ccont (new OperationCanceledException(args.aux.token))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happened to the args.aux.token argument? Previously we were passing it in but now getCancelledException is being used and AFAICS it isn't setting the token

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is one of the changes where we changed to an TaskCanceledException exception if possible (or leave an existing exception in place). The TaskCanceledException improves interop when used from C# (ie when this is Started as Task on an higher level) but

  • The TaskCanceledException has no parameter to set a token, we associate with the Task instead
  • If we re-use an OperationCanceledException we assume it is correct
  • The third case shouldn't actually happen, but we could use an OperationCanceledException with the token there I guess (as long as we set the InnerException it should be fine). I'd prefer the TaskCanceledException as that's what actually happend (completedTask.IsCancelled is true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll play around with changing this and there is probably one of the new tests that will fail. And I'll try to give some examples.
Because this might as well be a merge artifact (maybe it has been changed in master in the mean time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok currently this doesn't have any observable impact, because useCcontForTaskCancellation is only true when FSCORE_PORTABLE_NEW is defined and only for the Sleep implementation (so this change would have changed the observable exception a bit in this scenario).
I changed this part by renaming the useCcontForTaskCancellation to useSimpleCcontForTaskCancellationAndLooseException to make it more clear that setting this to true has quite an impact. I also changed the code that useSimpleCcontForTaskCancellationAndLooseException has the old behavior (which I think is fine because Task.Sleep will probably never throw user-exceptions).

if useCcontForTaskCancellation
then args.aux.ccont (new OperationCanceledException(args.aux.token))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here

@dsyme
Copy link
Contributor

dsyme commented Feb 13, 2018

We change the exception type on AwaitTask from OperationCanceledException to TaskCanceledException (but that one is a subtype) on some occasions.

I'm ok with this

@@ -969,9 +996,15 @@ namespace Microsoft.FSharp.Control
member __.Proceed = not isStopped
member __.Stop() = isStopped <- true

let StartAsTask (token:CancellationToken, computation : Async<_>,taskCreationOptions) : Task<_> =
let StartAsTask (token:CancellationToken, computation : Async<'a>,taskCreationOptions) : Task<'a> =
#if FX_NO_ASYNCTASKMETHODBUILDER
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an artifact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean: Does this define still exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't exist any more

let a = async {
cts.CancelAfter (100)
do! tcs.Task |> Async.AwaitTask }
#if FSCORE_PORTABLE_NEW || coreclr
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking a look through these various old flags now, will send a PR to flush this

@matthid
Copy link
Contributor Author

matthid commented Feb 18, 2018

@dsyme I tried to motivate this with some examples in the description and to extend it a bit.

@matthid
Copy link
Contributor Author

matthid commented Feb 22, 2018

So, I'll just remove the #if stuff here? What about the CI error?

Also (@dsyme), just let you know, a lot of discussion was hidden by the updates. I think the open points are answered.

@dsyme
Copy link
Contributor

dsyme commented Feb 22, 2018

@dotnet-bot test Ubuntu14.04 Release_default Build please

@dsyme
Copy link
Contributor

dsyme commented Feb 22, 2018

You can remove the FX_NO_ASYNCTASKMETHODBUILDER branch

@matthid
Copy link
Contributor Author

matthid commented Apr 12, 2018

@dotnet-bot test Ubuntu16.04 Release_fcs Build please

I think the failure was unrelated at the time.

@dsyme
Copy link
Contributor

dsyme commented May 30, 2018

@matthid Can you see if you want to keep this? Otherwise I will close it

@matthid Also, if you have time please review #4867

@matthid
Copy link
Contributor Author

matthid commented May 30, 2018

Question is if you want to take it?
My feeling around this have changed a bit over time. What hasn't changed is the feeling that we do some things "wrong" around this area. Especially considering that we "swallow" bugs in the cancellation code-path.

While the changes or the implementation in this PR can be considered "hacky", I think they improve the behavior (and the new tests show how).

The point is: I have no better idea on how to tackle the above problem.

@KevinRansom
Copy link
Member

@matthid, @dsyme

This is a super old PR (more than 1 year), and it is not clear that we are agreed on taking this.

Can we decide if we want to keep this, or try something else, or just stay with what we have?

Thanks

Kevin

@matthid
Copy link
Contributor Author

matthid commented Sep 11, 2018

I'm probably happy either way. Please decide and close if you don't think the ideas in my initial post are worth it.

@KevinRansom
Copy link
Member

@matthid

the pressure !!!!!!

Please reopen if you feel you want to continue to adress this issue.

@KevinRansom
Copy link
Member

@matthid
please note, the closure is not a comment on your original ideas, which I think are indeed valuable. I just wish to get our open PR's back to a manageable level.

@matthid
Copy link
Contributor Author

matthid commented Sep 11, 2018

@KevinRansom Was there something for me to address?

@KevinRansom
Copy link
Member

I was concerned by this:

While the changes or the implementation in this PR can be considered "hacky", I think they improve the behavior (and the new tests show how).

The point is: I have no better idea on how to tackle the above problem.

Usually that is an indication, that a redesign is needed. If you are satisfied that this is ready, then reopen it and we will run it through the process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants