Transparently propagate exceptions to callers without modification #1378

jdom · 2016-02-05T16:17:47Z

Based on a set of tests created by #1356, it's clear that if a grain throws a user-unhandled exception, then Orleans forwards it to the caller, but after it tampers a little bit with it.
It basically wraps it into an AggregateException and also flattens it (so it removes nested AggregateExceptions, messing with the stack trace and so on).

This entails a breaking change that would not be caught by compilers, but it seems that all who responded in #1356 agree that this should be fixed regardless.

The code to fix it is non entirely straightforward, as there are several levels of TaskContinuationSources in the middle (see #875 for example). It would be simpler to fix for now doing a one-off fix, but at some point there's a bigger follow up to avoid the unnecessary use of TaskCompletionSources (paying extra care of in what scheduler the continuations run). I will open another issue for it, nevertheless that might be easier to clean up with .NET 4.6

For now, the tests in ExceptionPropagationTests are being skipped. When we implement this we'll reenable them.

jthelin · 2016-02-05T19:00:48Z

Bravo @jdom. 👍
This is a very worthwhile change, although i fear it may mushroom into a lifetime's worth of work!

Any thoughts on whether it could be broken down into some smaller units, to allow "piloting" the changes on small scale first before we break for everybody / everything?

Any way to just focus initially on (say) fixing exception thrown from storage providers [for example] or some other small-ish API surface used directly by grain code?

gabikliot · 2016-02-06T01:47:14Z

Sorry for joining in late. It is anyway good that we checked those tests, to document some expectations.

I think there are 2 questions here:
a) what are the right expectations, given a standard (non-Orleans) usage of await in .NET and the distributed multi hop nature of the calls.
b) how Orleans messes up with those expectations.

I think the tests encode your expectations (as if Orleans was not messing with it). The expectation are actually a bit non-intuitive, as for example evident from @ReubenBond question. Most of the cases throw AggregateException, since there is await in the stack call. Only in one case, the 1st one, we expect direct exception. Also, consider a case where A grain is calling grain B and B throws, A just awaits. The caller will get AggregateException, unrelated to Orleans, just since there was an intermediate await. That actually breaks the concept of encapsulation - you may change your implementation, by adding a shim grain in front of a grain, and suddenly the exception type changed. The impl. detail (that is it multi hop or multi await call) leaked.

So first we need to agree that those are the expectation. There are not very good expectations BTW, I would prefer await not to wrap it in AggregateException, but that how async await works (I think there is also a way to avoid it, with some GetAwaiter code, I think you showed that to me). That is actually one of the critics of async await I heard - the "broken exception propagation" semantics. But that is how it is, and we are not discussing here how to change it. We simply need to adhere to the "standard" .NET async await behavior.

On top of that, definitely, Orleans should not flatten or mess with exceptions at all.
There is a separate question now about awaits that Orleans adds - lets say on the call path we have 2 app awaits and 3 Orleans internal awaits. You will now probably get an exception with 5 inner exceptions, right? Should we strip the other 3?
Same for this TaskContinuationSources. We should remove one of the double TCSs and use RunContinuationsAsynchronously, but I would wait until 4.6, like you said, until the deadlock bug is fixed.

jdom · 2016-02-06T02:25:47Z

Hmmm, I think you might have gotten confused, or I am miss-understanding you now.

I would prefer await not to wrap it in AggregateException, but that how async await works (I think there is also a way to avoid it, with some GetAwaiter code, I think you showed that to me).

A faulted Task always has an AggregateException in its task.Exception (that's it's type actually) that is thrown as-is when getting the task.Result property (or when calling task.Wait() in case of a non-generic Task). But calling await task does not ever surface this aggregate exception, no matter how many awaits happened in between when the original exception was thrown and when you are handling it. I think you have this expectation wrong.
So if Orleans is in the middle, it should just forward the exception as is without adding its own AggregateExceptions nor modifying its stack traces... The reason this is not currently the case and why the tests are failing, is because Orleans does NOT use the await keyword AND instead it forwards the exceptions from one task cancellation source to the next poorly (as there is of course ways to fix these without major refactoring nor having to fall back to await everywhere).

The trick to use GetAwaiter() that you mentioned is to avoid the AggregateException when you compare it to calling .Result or .Wait(), so that it acts similar to when using the await keyword.

In my test code I actually called .Wait() in some of those cases to actually force an AggregateException in user code and then assert that Orleans doesn't mess with them either since they are a little bit special (and especially because I know the Orleans implementation uses .Flatten() at some point).

If you don't want to take my word for granted just in case, you can easily modify the tests to avoid using Orleans entirely and new up the grains objects manually and call those same methods. You can also decorate the grain so that there is code in the middle that just calls await and forwards the exception to the caller, so that you test that multiple awaits in the middle do not show up as additional AggregateExceptions

gabikliot · 2016-02-06T03:00:11Z

OK, I think this is great. This is exactly what I wanted us to setup - to specify the expectations very clearly. I was obviously confused (or maybe I just forgot C# in the last couple of months). But I am sure there will be other people who will be confused as well.

So maybe you can modify the tests, to specify the expected behavior, in all the different cases, with nested awaits, with multi-hop calls, with accessing .Task directly, as if there was no Orleans. You probably got the majority of them already, but missing nested awaits and multi-hop calls.

Once we have that sorted, we can look at where Orleans violates it and see how we can fix it. It is probably a couple of places only (one with this double TCS) so we should be able to fix it easily.

jdom · 2016-02-09T18:55:39Z

Hi @gabikliot, good that we are in agreement that the current expectations are right.
To sum up, the expectations are that whenever an exception bubbles up from a grain, then Orleans will forward that exception unmodified to the caller, whether that's grain client or another grain.
This is exactly as if Orleans was NOT proxying not in the middle, and they were just normal method invocations.
The only distinction (to which I'll add a test case) is in the case that the grain throws an exception synchronously (instead of returning a faulted task). In this case Orleans will have to undoubtedly transform that into a faulted task. As a side note, based on TAP design guidelines, developers shouldn't throw synchronously unless it's a usage error (denoting a bug that should be fixed in code). Because of this, it might be useful to log a warning in this case to let know users they have a usage errors. Having said that, not a lot of people are very strict with this rule given that using await let's you catch exceptions thrown synchronously and as a faulted task, so it's a separate story that we might not need to address.

Regarding the nested awaits or asserting task.Exception directly, I'm a little hesitant to add those tests, since they would just be asserting how Tasks or c# work, and not how Orleans work with it, so I think it will not add any additional coverage.

I'm not sure what you meant with multi-hop calls. If it's having multiple awaits, then I answered above. If it's related to having calls to other grains, then I already had tests for those. See GrainForwardingExceptionPropagation and GrainForwardingExceptionPropagationDoesNotUnwrapAggregateExceptions

jdom · 2016-02-09T19:41:02Z

Added test in PR #1418

gabikliot · 2016-02-10T08:06:21Z

I indeed meant GrainForwardingExceptionPropagation.

You are clearly on top of it. Ignore everything I wrote above, I should have looked at yours tests more carefully before giving advice.

jdom · 2016-02-10T17:40:07Z

Cool, although you left me thinking about edge cases, and I just created a new PR to make sure AggregateExceptions are propagated with even more fidelity, as I believe some naive implementations might brake them, because we sometimes assume the default await semantics that just unwrap the first inner exception and do not carry all the other exceptions that might have occurred. This is typically more than fine when handling exceptions in app code, but Orleans should propagate with the most fidelity possible to let the user decide what to do with it.

gabikliot · 2016-02-11T05:06:27Z

So what do you think about #1415 fix?

jdom · 2016-02-11T06:03:03Z

I think it's probably OK. Not sure if it will pass 1 of the last tests I added, but we'll see once he updates de PR. I believe @dVakulen was waiting for the tests to be merged to do it. One of them is not extremely important that it passes, as it's an edge condition, but it might be easy to fix too

…change (dotnet#1378)

ReubenBond · 2021-09-02T23:39:27Z

Exceptions are propagated transparently now (at least as of #7070)

jdom added a commit to jdom/orleans that referenced this issue Feb 5, 2016

Skip tests until dotnet#1378 is implemented

c77e1ed

jthelin added the enhancement label Feb 5, 2016

This was referenced Feb 6, 2016

Fixed string.Format arguments number error #1407

Merged

Made exceptions to propagate transparently to callers #1415

Merged

jdom mentioned this issue Feb 10, 2016

More AggregateException propagation tests #1427

Merged

jdom mentioned this issue Feb 11, 2016

1.2.0 release #1433

Closed

12 tasks

sergeybykov added this to the 1.2.0 milestone Feb 12, 2016

jdom added a commit to jdom/orleans that referenced this issue Mar 16, 2016

Fixes to Functionals related to the exception propagation behavioral …

16f9a56

…change (dotnet#1378)

jdom mentioned this issue May 9, 2016

Stack trace in exceptions #1739

Closed

ReubenBond removed this from the 1.2.0 milestone Apr 6, 2021

ReubenBond self-assigned this Sep 2, 2021

ReubenBond closed this as completed Sep 2, 2021

ghost locked as resolved and limited conversation to collaborators Oct 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transparently propagate exceptions to callers without modification #1378

Transparently propagate exceptions to callers without modification #1378

jdom commented Feb 5, 2016

jthelin commented Feb 5, 2016

gabikliot commented Feb 6, 2016

jdom commented Feb 6, 2016

gabikliot commented Feb 6, 2016

jdom commented Feb 9, 2016

jdom commented Feb 9, 2016

gabikliot commented Feb 10, 2016

jdom commented Feb 10, 2016

gabikliot commented Feb 11, 2016

jdom commented Feb 11, 2016

ReubenBond commented Sep 2, 2021

Transparently propagate exceptions to callers without modification #1378

Transparently propagate exceptions to callers without modification #1378

Comments

jdom commented Feb 5, 2016

jthelin commented Feb 5, 2016

gabikliot commented Feb 6, 2016

jdom commented Feb 6, 2016

gabikliot commented Feb 6, 2016

jdom commented Feb 9, 2016

jdom commented Feb 9, 2016

gabikliot commented Feb 10, 2016

jdom commented Feb 10, 2016

gabikliot commented Feb 11, 2016

jdom commented Feb 11, 2016

ReubenBond commented Sep 2, 2021