Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transparently propagate exceptions to callers without modification #1378

Closed
jdom opened this issue Feb 5, 2016 · 11 comments
Closed

Transparently propagate exceptions to callers without modification #1378

jdom opened this issue Feb 5, 2016 · 11 comments
Assignees

Comments

@jdom
Copy link
Member

jdom commented Feb 5, 2016

Based on a set of tests created by #1356, it's clear that if a grain throws a user-unhandled exception, then Orleans forwards it to the caller, but after it tampers a little bit with it.
It basically wraps it into an AggregateException and also flattens it (so it removes nested AggregateExceptions, messing with the stack trace and so on).

This entails a breaking change that would not be caught by compilers, but it seems that all who responded in #1356 agree that this should be fixed regardless.

The code to fix it is non entirely straightforward, as there are several levels of TaskContinuationSources in the middle (see #875 for example). It would be simpler to fix for now doing a one-off fix, but at some point there's a bigger follow up to avoid the unnecessary use of TaskCompletionSources (paying extra care of in what scheduler the continuations run). I will open another issue for it, nevertheless that might be easier to clean up with .NET 4.6

For now, the tests in ExceptionPropagationTests are being skipped. When we implement this we'll reenable them.

jdom added a commit to jdom/orleans that referenced this issue Feb 5, 2016
@jthelin
Copy link
Contributor

jthelin commented Feb 5, 2016

Bravo @jdom. 👍
This is a very worthwhile change, although i fear it may mushroom into a lifetime's worth of work!

Any thoughts on whether it could be broken down into some smaller units, to allow "piloting" the changes on small scale first before we break for everybody / everything?

Any way to just focus initially on (say) fixing exception thrown from storage providers [for example] or some other small-ish API surface used directly by grain code?

@gabikliot
Copy link
Contributor

Sorry for joining in late. It is anyway good that we checked those tests, to document some expectations.

I think there are 2 questions here:
a) what are the right expectations, given a standard (non-Orleans) usage of await in .NET and the distributed multi hop nature of the calls.
b) how Orleans messes up with those expectations.

I think the tests encode your expectations (as if Orleans was not messing with it). The expectation are actually a bit non-intuitive, as for example evident from @ReubenBond question. Most of the cases throw AggregateException, since there is await in the stack call. Only in one case, the 1st one, we expect direct exception. Also, consider a case where A grain is calling grain B and B throws, A just awaits. The caller will get AggregateException, unrelated to Orleans, just since there was an intermediate await. That actually breaks the concept of encapsulation - you may change your implementation, by adding a shim grain in front of a grain, and suddenly the exception type changed. The impl. detail (that is it multi hop or multi await call) leaked.

So first we need to agree that those are the expectation. There are not very good expectations BTW, I would prefer await not to wrap it in AggregateException, but that how async await works (I think there is also a way to avoid it, with some GetAwaiter code, I think you showed that to me). That is actually one of the critics of async await I heard - the "broken exception propagation" semantics. But that is how it is, and we are not discussing here how to change it. We simply need to adhere to the "standard" .NET async await behavior.

On top of that, definitely, Orleans should not flatten or mess with exceptions at all.
There is a separate question now about awaits that Orleans adds - lets say on the call path we have 2 app awaits and 3 Orleans internal awaits. You will now probably get an exception with 5 inner exceptions, right? Should we strip the other 3?
Same for this TaskContinuationSources. We should remove one of the double TCSs and use RunContinuationsAsynchronously, but I would wait until 4.6, like you said, until the deadlock bug is fixed.

@jdom
Copy link
Member Author

jdom commented Feb 6, 2016

Hmmm, I think you might have gotten confused, or I am miss-understanding you now.

I would prefer await not to wrap it in AggregateException, but that how async await works (I think there is also a way to avoid it, with some GetAwaiter code, I think you showed that to me).

A faulted Task always has an AggregateException in its task.Exception (that's it's type actually) that is thrown as-is when getting the task.Result property (or when calling task.Wait() in case of a non-generic Task). But calling await task does not ever surface this aggregate exception, no matter how many awaits happened in between when the original exception was thrown and when you are handling it. I think you have this expectation wrong.
So if Orleans is in the middle, it should just forward the exception as is without adding its own AggregateExceptions nor modifying its stack traces... The reason this is not currently the case and why the tests are failing, is because Orleans does NOT use the await keyword AND instead it forwards the exceptions from one task cancellation source to the next poorly (as there is of course ways to fix these without major refactoring nor having to fall back to await everywhere).

The trick to use GetAwaiter() that you mentioned is to avoid the AggregateException when you compare it to calling .Result or .Wait(), so that it acts similar to when using the await keyword.

In my test code I actually called .Wait() in some of those cases to actually force an AggregateException in user code and then assert that Orleans doesn't mess with them either since they are a little bit special (and especially because I know the Orleans implementation uses .Flatten() at some point).

If you don't want to take my word for granted just in case, you can easily modify the tests to avoid using Orleans entirely and new up the grains objects manually and call those same methods. You can also decorate the grain so that there is code in the middle that just calls await and forwards the exception to the caller, so that you test that multiple awaits in the middle do not show up as additional AggregateExceptions

@gabikliot
Copy link
Contributor

OK, I think this is great. This is exactly what I wanted us to setup - to specify the expectations very clearly. I was obviously confused (or maybe I just forgot C# in the last couple of months). But I am sure there will be other people who will be confused as well.

So maybe you can modify the tests, to specify the expected behavior, in all the different cases, with nested awaits, with multi-hop calls, with accessing .Task directly, as if there was no Orleans. You probably got the majority of them already, but missing nested awaits and multi-hop calls.

Once we have that sorted, we can look at where Orleans violates it and see how we can fix it. It is probably a couple of places only (one with this double TCS) so we should be able to fix it easily.

@jdom
Copy link
Member Author

jdom commented Feb 9, 2016

Hi @gabikliot, good that we are in agreement that the current expectations are right.
To sum up, the expectations are that whenever an exception bubbles up from a grain, then Orleans will forward that exception unmodified to the caller, whether that's grain client or another grain.
This is exactly as if Orleans was NOT proxying not in the middle, and they were just normal method invocations.
The only distinction (to which I'll add a test case) is in the case that the grain throws an exception synchronously (instead of returning a faulted task). In this case Orleans will have to undoubtedly transform that into a faulted task. As a side note, based on TAP design guidelines, developers shouldn't throw synchronously unless it's a usage error (denoting a bug that should be fixed in code). Because of this, it might be useful to log a warning in this case to let know users they have a usage errors. Having said that, not a lot of people are very strict with this rule given that using await let's you catch exceptions thrown synchronously and as a faulted task, so it's a separate story that we might not need to address.

Regarding the nested awaits or asserting task.Exception directly, I'm a little hesitant to add those tests, since they would just be asserting how Tasks or c# work, and not how Orleans work with it, so I think it will not add any additional coverage.

I'm not sure what you meant with multi-hop calls. If it's having multiple awaits, then I answered above. If it's related to having calls to other grains, then I already had tests for those. See GrainForwardingExceptionPropagation and GrainForwardingExceptionPropagationDoesNotUnwrapAggregateExceptions

@jdom
Copy link
Member Author

jdom commented Feb 9, 2016

Added test in PR #1418

@gabikliot
Copy link
Contributor

I indeed meant GrainForwardingExceptionPropagation.

You are clearly on top of it. Ignore everything I wrote above, I should have looked at yours tests more carefully before giving advice.

@jdom
Copy link
Member Author

jdom commented Feb 10, 2016

Cool, although you left me thinking about edge cases, and I just created a new PR to make sure AggregateExceptions are propagated with even more fidelity, as I believe some naive implementations might brake them, because we sometimes assume the default await semantics that just unwrap the first inner exception and do not carry all the other exceptions that might have occurred. This is typically more than fine when handling exceptions in app code, but Orleans should propagate with the most fidelity possible to let the user decide what to do with it.

@gabikliot
Copy link
Contributor

So what do you think about #1415 fix?

@jdom
Copy link
Member Author

jdom commented Feb 11, 2016

I think it's probably OK. Not sure if it will pass 1 of the last tests I added, but we'll see once he updates de PR. I believe @dVakulen was waiting for the tests to be merged to do it. One of them is not extremely important that it passes, as it's an edge condition, but it might be easy to fix too

@jdom jdom mentioned this issue Feb 11, 2016
12 tasks
@sergeybykov sergeybykov added this to the 1.2.0 milestone Feb 12, 2016
jdom added a commit to jdom/orleans that referenced this issue Mar 16, 2016
@ReubenBond ReubenBond removed this from the 1.2.0 milestone Apr 6, 2021
@ReubenBond ReubenBond self-assigned this Sep 2, 2021
@ReubenBond
Copy link
Member

Exceptions are propagated transparently now (at least as of #7070)

@ghost ghost locked as resolved and limited conversation to collaborators Oct 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants