Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider original extensible design #118

Closed
RossTate opened this issue Jun 15, 2020 · 50 comments
Closed

Reconsider original extensible design #118

RossTate opened this issue Jun 15, 2020 · 50 comments

Comments

@RossTate
Copy link
Contributor

I understand that it's frustrating to change to another design and then be advised to change back. But the discussion that led to that decision did not consider a number of important points, nor did it consider modifications to the original design that would have addressed its hypothetical weaknesses. Unfortunately the points that were not considered are the ones that most affect long-term possibilities for WebAssembly, and that leaves us with no real path for improvement/extension to accommodate more tooling or languages. As such, I encourage us to reconsider the original design strategy, albeit with some modifications, which does have a clear path for improvement/extension. The following is a summary of the issues that have been identified, a concrete suggestion based on these discussions, and an illustration of how that suggestion is extensible in ways the current proposal is not.

exnref has no exception-specific use

Firstly, I get the impression that many impose more purpose on exnref than it really has, and part of its appeal in the first place was people imagining it to be more than it is. So I want to start by tackling what exnref really is. To simplify this discussion, let's suppose anyref is back.

Suppose there were some way to declare "tags", like tag $mytag : [i32 externref], and we had operations new_tagged $tag : [t*] -> [anyref] (where tag $tag : [t*]), and extract_tagged $tag : [anyref] -> [t*] or br_on_tagged $tag $l : [anyref] -> [anyref] (where $l is a label of type [t*]). This is something that could be generally useful outside of exception handling. The point is that it gives us a way to create, test for, and branch on tagged references, i.e. open case types.

Then throw could take an arbitrary anyref, and catch could catch an arbitrary anyref, and rethrow has no purpose and could be removed. br_on_exn would simply be br_on_tagged.

This transformation loses no exception-handling functionality, which demonstrates that the current proposal has an extremely simple notion of exception handling, and that exnref has no real functionality specific to exception handling that a general-purpose open case type couldn't provide.

Stack traces

The counterargument I expect to hear is that exnref implicitly has a stack trace. But this stack trace is not visible within WebAssembly; it is not visible in the JS API; and it is not visible in C++. It's only visible as debug information. But we could recover that stack trace without exnref if we left the stack in tact when an exception is uncaught.

In some discussions, JavaScript's exception-handling design has been lamented for not associating a stack trace with the exception, which was one of the rationale's behind the design of exnref. But not having a stack trace with all throwable values is really not the problem with JS's exception handling. The problem is that JS's design makes it so hard to tell when an exception will not be caught that in the common case most of the stack has been unwound by the time the exception reaches the root of control. If not for that, a browser could print out the stack trace from typically its original throw after determining the exception was uncaught. The current proposal recreates that exact problem because it is essentially the same exception-handling design as JS. So the hidden stack traces of exnref are just a bandage over a much bigger problem with both JS and the current proposal.

Filtering exceptions

This issue of having unwound the stack by the time an exception is caught is much less prevalent in other languages because they have filtering mechanisms for determining whether a given catcher is really meant for a given exception. Unfortunately, it is very difficult to do this filtering in a general-purpose manner for WebAssembly without explicitly requiring two-phase exception handling (in which the search for the handler is done in one pass, potentially followed by unwinding of the stack in a second pass). Making throw and catch specify a tag (or event) does some filtering, but it is far too course-grained to be useful for most languages (including C++, due to subtyping). (Using tags is still helpful for determining that modules are throwing/catching completely unrelated exceptions, e.g. because they are compiled from different languages, and because tags make it so that we do not need anyref or memory management to do exception handling.)

A nice quality of the current design is that it's compatible with single-phase exception handling, which might be useful for simpler engines. But, unlike the original design before exnref, the current proposal produces code that is incompatible with two-phase exception handling due to bundling unwinding code with searching and catching code (JS, on the other hand, at least has finally clauses to separate these concerns).

So although it's reasonable, and in fact useful, to not have filtering in the MVP, it's concerning that it's already been established that an entirely new exception-handling design will be necessary to support more features like exception filtering and resumable exceptions.

Unwinding

It seems the main reason that the exnref design was chosen over the original was to avoid hypothetical duplication of destructor code (though no one noted that it doesn't actually do this in the common case where only one type of exception is caught). But this can be resolved by having a try/unwind instruction that executes the body of unwind when an exception is thrown from within the try block. The unwind block always has type [] -> [], meaning it does not get to examine the exception and by default returns control to the "unwinder" at the end. (Because of this, try/unwind actually doesn't need any type annotation, unlike try/catch.)

In addition to reducing code-duplication of destructors, try/unwind has the advantage of separating unwinding code from catching code, making it possible to add features for filtering and resuming exceptions without having to come up with an entire new exception-handling design. It also can be used to actually reduce code-duplication of destructors, because we could add (though probably not in the MVP) instructions like return_unwinding that would execute all unwind blocks the statement is nested within before returning from the function.

Rethrowing

There seems to be a misconception about rethrow. This instruction does not correspond to rethrowing in surface languages. In C++, try {...} catch (int x) {throw;} is semantically equivalent to try {...} catch (int x) {throw x;} (though this is not true for all exception types, though for reasons related to copy constructors rather than to rethrow). Python has a specific stack-trace semantics for rethrowing that is not served by rethrow. A rethrown exception in C# captures a new stack trace (actually, the stack trace is often collected during unwinding if it is needed at all).

The main purpose of rethrow is to propagate exceptions that the catcher does not understand after executing unwinding code. This value is obviated by making catchers only catch exceptions they understand (via tags) and by separating out unwinding code that does not care about the specific exception at all. In other words, rethrow only solves problems that the current proposal creates.

Putting it all together

Since I understand it helps to have concrete suggestions, especially given the late phase, here's what I would do to make exception-handling still simple and compatible with single-phase exception handling but also forwards-compatible with two-phase exception handling and the various exception-handling extensions that require that (as well as interactive debugging).

  1. Rename events to tags (or something of the like) so that we have declarations like tag $tag : [t*]. They have utility beyond exception handling (though going into them here would be a digression) and so in the long run might be served by a more general name. And it's already been established that one of the reasons for naming them events and switching to exnref—namely supporting algebraic effects or resumable exceptions in the future—doesn't pan out.
  2. Have an instruction throw $tag : [t*] -> unreachable, where tag $tag : [t*], that looks for the first matching handler up the stack, trapping if none exists.
  3. Have an instruction try (do instr*) (catch $tag $l) : [ti*] -> [to*], where instr* : [ti*] -> [to*] and tag $tag : [t*] and label $l : [t*], that is a matching handler for throw $tag while control is within do, executing unwinders on the stack until control reaches $l.
  4. Have an instruction try (do instr1*) (unwind instr2*) : [ti*] -> [to*], where instr1* : [ti*] -> [to*] and instr2* : [] -> [], that is an unwinder while control is within do. (If control exits the unwind clause other than reaching its end, that ends the unwinding process.)

An important point of flexibility is that handlers are not limited to catch, and unwinders are not limited to unwind. In particular, the host can have their own handlers and unwinders. For example, JS finally clauses can be considered unwinders. Also, to support single-phase exception handling a host just needs to have a (conceptual) universal handler at the root of the stack just to guarantee one always exists somewhere; these particular instructions cannot observe the difference between unwinding as you look for the matching handler and unwinding after you find the matching handler, so long as you know a matching handler exists.

The JS API would, as suggested above, treat finally clauses as unwinders. It would also treat JS throws and catches as using a special $jstag, which wasm modules could import with tag type [externref]. That's it; no need for wrapping and unwrapping. JS code doesn't (currently) have a way to catch wasm exceptions directly, but if a wasm throw has no handler, it'll trap, and JS can catch that trap. Of course, JS code always has the option of creating a wasm module to catch wasm exceptions, and we could make a JS API shorthand for that functionality if it's pressing.

Extensions

The main advantage of this change is that we can extend this variant much more easily. To illustrate that point, here are a few exceptions that come to mind.

  1. Add an unwinding instruction modifier. That is, it's an instruction that modifies the behavior of the following (appropriate) instruction. In this case, the following instruction must be one that redirects control up the stack, specifically return and br-and-variants. Its effect is to execute all unwinders on the stack between that instruction and the targeted point of control. This in particular would be useful for implementing finally clauses in surface languages, as well as C++ destructors in more recent C++ semantics, instead of duplicating the code across the "successful" and "exceptional" paths.
  2. Add try (do instr1*) (handle $tag instr2*) end : [ti*] -> [to*], where [instr1*] : [ti*] -> [to*] and tag $tag : [t*] and instr2* : [t*] -> []. This is a matching handler for throw $tag while control is within do, but unlike catch it doesn't necessarily cause the stack to unwind. Instead, the handle body is executed during the search phase, and if control reaches the end of the body then the search phase is continued. So you can do handle (if some-filter-expression then (unwinding (br $catcher))) to implement exception filtering. Or you can implement resumable exceptions by doing try (throw $exn_tag) (catch $resume_tag $resumer) around the exception throw, and then doing handle (... (throw $resume_tag)) in the handler to end the search and transfer control to $resumer with appropriate values. catch $tag $l itself is just shorthand for handle $tag (unwinding (br $l)).
  3. Add catch_all and handle_all, which take no inputs and are universal handlers. At this point, it might be useful to let a try have multiple handlers, all for distinct tags, and then a universal handler that applies to all but those tags. Alternatively, catch_all/handle_all could take a list of tags they don't handle.

The list goes on, but the point is not to suggest that these be added now. The point is to illustrate that they can be added in a way that composes naturally with the suggested changes above (but which does not compose naturally with the current proposal).

Summary

As frustrating and painful as it is, I believe the above considerations strongly suggest that we should take exception handling back to its original, more extensible, design. Not doing so now will lead to even more frustration and pain down the line, since we'll want to create a two-phase-compatible exception-handling design at some point to support more languages and (debugging) tools, and then we'll be in the horrible spot of having two exception-handling designs in the same system, with one forcing single-phase exception handling and the other forcing two-phase exception handling, each jumping through hoops to interop with the other as best as they can manage.

@aheejin
Copy link
Member

aheejin commented Jun 15, 2020

  1. Have an instruction throw $tag : [t*] -> unreachable, where tag $tag : [t*], that looks for the first matching handler up the stack, trapping if none exists.

This effectively means "add two-phase stack unwinding", right? You made it sound really simple but this is not, and I think you very well know that. As I said before, this involves the VM calling some filtering function (in case of C++, the personality function) each time it unwinds the stack, which requires a separate stack to run these things, and the structure of the filtering functions or mechanisms differs greatly for each language, which makes an language-agnostic design very hard.

While I think two-phase unwinding is useful to have for tools like debuggers, it's not impossible to add one on top of the current proposal, albeit that won't be zero-cost. We can add the "traverse the stack and exit the program without unwinding the stack" functionality as library. This needs save some info to do this and this incurs costs even when exceptions are not thrown, but given that debuggers that examine stack traces will be used mostly with debug builds, adding this behavior only to debug build can make easy debugging possible while keeping the release build zero-cost. This library-based stack unwinding was also used in PNaCl. (It did this to both builds, and it didn't have zero-cost EH, but the mechanism is similar.) Adding two-phase EH in the spec level makes the interactions between VM and the code complicated and it can be tied too closely to some languages.

  1. Have an instruction try (do instr*) (catch $tag $l) : [ti*] -> [to*], where instr* : [ti*] -> [to*] and tag $tag : [t*] and label $l : [t*], that is a matching handler for throw $tag while control is within do, executing unwinders on the stack until control reaches $l.

Not sure how this is different from the current try-catch. And I'm not sure if I understand your notations well. Please define terms before using.

  1. Have an instruction try (do instr1*) (unwind instr2*) : [ti*] -> [to*], where instr1* : [ti*] -> [to*] and instr2* : [] -> [], that is an unwinder while control is within do. (If control exits the unwind clause other than reaching its end, that ends the unwinding process.)

We had many previous discussion history, but at the current stage exnref is not really for destructor code reuse. But it makes a lot of code transformations easier.

What you suggested here sounds like divide catch into two kinds, one for real catch and the other for destructors, right? While this is not impossible, I don't get the reasoning behind that.

@RossTate
Copy link
Contributor Author

This effectively means "add two-phase stack unwinding", right?

No it does not. I tried to design this to be compatible with both single-phase and two-phase exception handling. As part of that goal, there is no custom filtering in this version.

Something's come up and I have to run at the moment (sorry for the bad timing), but you have good questions and I'll try to get back to them later today.

aheejin pushed a commit to aheejin/exception-handling that referenced this issue Jun 15, 2020
@RossTate
Copy link
Contributor Author

  1. Have an instruction throw $tag : [t*] -> unreachable, where tag $tag : [t*], that looks for the first matching handler up the stack, trapping if none exists.

This phrasing is suggestive of two-phase exception handling, but on it's own it does not require two-phase exception handling. That depends on the kinds of handlers that might be present. In particular, extensions aside, the only handler required to be possible is catch $tag $l. This kind of handler has the property that whether or not it causes stack unwinding can be determined in a pure manner. So there's no way to observe the difference between finding the handler and then unwinding (two-phase) versus unwinding and then finding the handler (single-phase).

But there is one important corner case which is when there is no matching handler at all, in which case you should trap before executing the unwinders. One way to avoid this is for an engine to have a universal handler at the root of the stack (the host is free to define it's own kinds of handlers in addition to the ones built into wasm), thereby guaranteeing such a handler will always exist and a throw will never trap. What's best for JS is not immediately obvious, but this design leaves room for multiple options due to host's being able to define their own kinds of handlers. Obviously we have to pick one before releasing, but that can be a separate focused discussion.

Not sure how this is different from the current try-catch.

The suggestion here has a $tag that tells you what kind of exceptions it's intended to catch (without running any custom code) and what the payload of that exception looks like. The current proposal requires running custom code to find out if the exception is meant to be caught (and that code might be in the same block of instructions as unwinding code).

And I'm not sure if I understand your notations well. Please define terms before using.

I tried my best to balance precision and concision. I'm happy to explain something in more detail if you let me know what particular thing(s) are unclear.

But it makes a lot of code transformations easier.

One other difference between catch here versus in the current proposal is that catch here does not have a body; it just specifies a label to redirect control to upon finding a match. I thought this might make it easier to still accommodate the transformations you had shown me earlier despite not having exnref.

What you suggested here sounds like divide catch into two kinds, one for real catch and the other for destructors, right? While this is not impossible, I don't get the reasoning behind that.

The idea is to separate the various components—search, filtering, unwinding, catching—so that they can be more easily combined in other ways. So if we do add a way to custom-filter exceptions or to resume exceptions, then we can do just the search components first, and later just unwinders in the appropriate phase (if at all). Right now the search code, catching code, and unwinding code is all in one block, so we can't search for a handler of a potentially-resumable exception without also unwinding the stack along the way and thereby making it impossible to resume the exception.

New item

I think we're both in agreement that it's generally better if we can manage to do more handler-filtering before necessarily unwinding the stack, but that for the MVP we also cannot use custom filtering code so that we're compatible with single-phase exception handling. One thing we could do with the design here is let throw take a (nonempty) list of tags (all of the same type)—a handler matches so long as it matches one of the tags in the list.

Then for C++ (and various other languages), one could have tags for coarse approximations of types. A C++ throw e; could translate to a wasm throw with the list of tags coarsely approximating the type of e and all its supertypes. A catch would specify the tag coarsely approximating the type it can handle. The more fine-grained the approximation is, the more filtering-before-unwinding this scheme will provide you. Probably not an MVP feature, but it's an example of an extension that would be compatible with both single- and two-phase exception handling.

@aheejin
Copy link
Member

aheejin commented Jun 16, 2020

OK basically what you want is going back to the first proposal, with tagged catch and no first-class exnref, right? We spent so much time discussing this years ago, so I really don't want to repeat all that.

I'm not necessarily against making catch capable of taking optional tags. And I think having this option will leave the door open to the behavior you described: when no tags match, throw should trap without unwinding. But I still want to keep the current catch (which is catch_all) and first-class exnref. Code transformation is currently severely restricted without them. And because we have the optional tag capability of catch, we can switch to that version of catch later when we have time for it and if it turns out to be not detrimental to other factors (such as code size).

One other difference between catch here versus in the current proposal is that catch here does not have a body; it just specifies a label to redirect control to upon finding a match. I thought this might make it easier to still accommodate the transformations you had shown me earlier despite not having exnref.

Whether catch has a body or has a label is I think a choice of flavor, and not essential to the points you are suggesting. And actually if catch does not have a body, not having first-class exnref will become even bigger pain.

Also I'm not sure what you mean by "the transformations you had shown me earlier". I don't recall showing any transformation that will be made easier with labeled catches. I said (in reply to your direct email) there are many code transformations in LLVM backend (and also Binaryen, I should mention) that will be made very hard if we don't have first-class exnref. That's all I said.

One thing we could do with the design here is let throw take a (nonempty) list of tags (all of the same type)—a handler matches so long as it matches one of the tags in the list.

Then for C++ (and various other languages), one could have tags for coarse approximations of types.

This is not even feasible. There's no way to statically know a type of thrown C++ class type at the point of throwing. Also types in C++ class hierarchy and the actual type gets thrown in wasm (i32) are not even relevant.


What's makes it hard and frustrating with discuss on your argument is, you present many of your flavor-based choices as something essential to the core of your argument. (This long article is, nearly, "list of everything you suggested so far".) I think the core of your argument is that we should make the proposal more open to future possibilities of two-phase unwinding. And other than making catch capable of taking tags (which I suggest should be optional), I'm not sure why all other your suggestion of changes are essential to prevent "future frustration" you mentioned.

Also the other thing I consistently said and was ignored from you is, if some functionality may not be essential to make working code snippet from the spec's perspective but makes toolchain development much easier, it is a perfectly valid argument to keep it, such as first-class exnref. When we changed to the first to the current proposal, no one in the meeting thought it was impossible to generate working code without it. We decided based on the argument that it was convenient to have. You keep throwing some random feature of your proposal saying "if you have this new feature, your code transformation problem will go away even without exnref!" No. they don't.

@RossTate
Copy link
Contributor Author

Also I'm not sure what you mean by "the transformations you had shown me earlier". I don't recall showing any transformation that will be made easier with labeled catches. I said (in reply to your direct email) there are many code transformations in LLVM backend (and also Binaryen, I should mention) that will be made very hard if we don't have first-class exnref. That's all I said.

You sent me this link. I read it to understand your concern. That took quite a while.

Here's how I thought my suggestion might address this concern. Notice that try/catch $tag $l is quite lightweight. In particular, because my catch is not a block, neither try nor catch needs a type annotation. This makes it much easier to insert my try/catch at very precise locations within the code. For example, I can place a try/catch $tag $l around each call instruction if I want. This has an effect similar to (though admittedly not identical to) the unwind specifier in LLVM. That is, it indicates where to redirect control to if the function call throws a matching exception. So, if you want, you can have block instructions at a coarse outer level, each providing a label to a particular exception catcher, and you can have fine-grained try/catch __cpp_exception $ls around each function call to indicate which handler that function call should go to.

But maybe it turns out that doesn't work. I can't know without you engaging in this discussion and sharing your expertise rather than lambasting me for trying to understand and suggest solutions to a significant extensibility issue that the current proposal seems to have.

@aheejin
Copy link
Member

aheejin commented Jun 16, 2020

This makes it much easier to insert my try/catch at very precise locations within the code. For example, I can place a try/catch $tag $l around each call instruction if I want. This has an effect similar to (though admittedly not identical to) the unwind specifier in LLVM. That is, it indicates where to redirect control to if the function call throws a matching exception. So, if you want, you can have block instructions at a coarse outer level, each providing a label to a particular exception catcher, and you can have fine-grained try/catch __cpp_exception $ls around each function call to indicate which handler that function call should go to.

This will turn every single call instruction into 3 instructions, so I think the code size will suffer. We spent nontrivial amount of time brainstorming on things like this, such as #29. (This is not exactly the same as this, and this was when the proposal was the first version so not related to exnref, but it was suggested to tackle that mismatch of unwind destination issue. I'm not asking you to read the whole issue at all; just looking at a couple code snippets will be enough to get the feel.)

And while I understand you suggested this to solve the unwind destination mismatch problem in CFGStackify, I don't get how this is related to removing exnref or the "serious extensibility issue" you implied. I think all three are different things, so conflating them or presenting all of them as a essential steps to solve a single big problem doesn't help discussion.

This might (with significant code size increase) solve the mismatch problem, but in this scenario catch does not have blocks, then we need first-class exnref more than ever. exnref was omitted in the first proposal because catch was a block there was only one implicit exception we can refer to within the block. And while these two (unwind mismatch and first-class exnref) are separate issue, I can't see why either of this is related to the "extensibility concern" you mentioned.

But maybe it turns out that doesn't work. I can't know without you engaging in this discussion and sharing your expertise rather than lambasting me for trying to understand and suggest solutions to a significant extensibility issue that the current proposal seems to have.

I'm not sure what is that 'significant extensibility issue'. As I said, removing exnref is a separate thing that is unrelated to two-phase unwinding. If what you mean by the extensible issue here is that catch doesn't have tags, I said I'm open to add optional tags to them, but still want to keep catch_all and exnref. Does that not solve your "extensibility concern"?

@RossTate
Copy link
Contributor Author

Rather than continue carrying two conversations at once, I'm going to focus on the bigger-picture one.

If what you mean by the extensible issue here is that catch doesn't have tags, I said I'm open to add optional tags to them, but still want to keep catch_all and exnref. Does that not solve your "extensibility concern"?

No. You still have the current design that bundles searching code, catching code, and unwinding code.

It would help me understand your perspective on the extensibility if you would illustrate how you expect to extend the proposal to accommodate these features. Here's a concrete case to consider:

Suppose I have a module compiled from the following C# code (where middle is some imported function):

bool flag;

int outer(bool b) {
    flag = b;
    try {
        return middle();
    } catch (Exception e) when (flag) {
        return 2;
    }
}

void inner() {
    try {
        thrower();
    } finally {
        flag = !flag;
    }
}

void thrower() { throw new Exception(); }

Next suppose middle comes from the following module compiled from C++ using this proposal (where inner is imported from the module above, although maybe via a funcref provided after instantiation to resolve the mutually recursive linking):

int middle() {
   Resource res; // some constructor; destructor crashes if ran twice
    try {
        inner();
        return 1;
    } catch (int x) {
        return x;
    }
}

I see two key challenges here. First, C#'s when clause is required to be evaluated before unwinding. So a call outer(true) is supposed to return 2, whereas a call outer(false) should leave the stack in tact so that a debugger can kick in. Second, we need to ensure the destruct or res only gets called once (and only if the inner exception is caught).

How are you expecting to extend this proposal to support this functionality, or more generally functionality that requires two-phase exception handling?

@aheejin
Copy link
Member

aheejin commented Jun 16, 2020

First, please include only the suggestions that pertain to your extensibility example. Apparently not all your suggestions in the original post, which is a grab bag of everything you suggested so far in various issues, are related to this two phase thing. As I said above, conflating issues and presenting all of them as necessary steps don't help.

Also, it's hard to understand the purpose of that particular example.

So a call outer(true) is supposed to return 2, whereas a call outer(false) should leave the stack in tact so that a debugger can kick in.

How are you expecting to extend this proposal to support this functionality, or more generally functionality that requires two-phase exception handling?

The current proposal does not support the two-phase unwinding. This was not designed for it. But I'm not sure why going back to the first proposal will solve it either.

Second, we need to ensure the destruct or res only gets called once (and only if the inner exception is caught).

What makes you think the destructor for res will be called twice in the current proposal?

No. You still have the current design that bundles searching code, catching code, and unwinding code.

First, I don't understand why 'catching code' and 'unwinding code' should be separate. When we have two-phase unwinding, both of these should not run in the first searching phase. And both of those should run only when we find a matching EH pad.

Whether we should extend the current proposal to include two phase unwinding is another problem. As I said, I think it can be achieved in library level w/ the current proposal. But even if we extend the current proposal to include two-phase at some point, only searching code should be separated. And I think that searching code can be extracted into a function or something, whose address should be passed to the VM to be called. Probably catch should take one more argument, which represents the filter function. This will be more complicated in the VM level I think, because we need a separate stack to run it.

@RossTate
Copy link
Contributor Author

I think you misunderstood the purpose of my last comment; sorry for not making that purpose clear. I started this thread expressing a concern about the extensibility of the current proposal. I also gave some constructive suggestions as to how to revise it in order to make it more extensible (and these revisions look more like the original design). You have been focusing on these suggestions rather than the extensibility concern. I am wondering if you do not share those concerns and, if so, if you could illustrate how you would extend the current proposal to accommodate features that use two-phase exception handling (like in the concrete example above). I am not suggesting adding those features now; I just want to know that they can be added at some point in the future. So, feeling free to ignore all of the suggestions I have made, could you illustrate how you would extend this proposal to accommodate the example program I gave and thereby help alleviate the concerns about forwards-compatibility with two-phase exception handling?

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

I don't understand what exactly you mean by "extensibility". The current proposal doesn't support two-phase unwinding. Not sure if there is a grey area in which "we don't support it but we can be extended for it". If we make the next version of the proposal that supports more functionalities, its binary will not likely to be compatible with this version anyway. (Whether we should make the next version is a completely separate question. I'm being hypothetical.)

I also don't understand why all those things you said (first-class exnref removal, separating catch and unwind, and all other things you suggested here) help "extending" the current proposal. I think that's something you should prove, given that you come back to this point endlessly for months, before questioning why the current proposal cannot handle the two-phase unwinding, which we don't even claim to support. Also I don't get the point why I should "prove" that the current proposal can support two phase unwinding on your "specific example".

I also gave some constructive suggestions as to how to revise it in order to make it more extensible (and these revisions look more like the original design).

I don't even get how your numerous suggestions are related to whatever extensibility you claim. Again, that's something you should prove first. And I'm not really sure all those random suggestions which I can't even count anymore you threw in this repo have been constructive. I once asked you that if you are really concerned about a specific aspect of the proposal, please make each issue very specific to something you want to claim on that issue, rather than posting a big wall of text that is a grab bag of "all things you want to change for some reason". I don't think I was heard.

@RossTate
Copy link
Contributor Author

I don't think I was heard.

You were heard. Unfortunately, the problem is fundamental to the current design and so can't be fixed by some small change like making tags optional. I have tried to illustrate that in a variety of ways. I'll try again using the above example.

void inner() {
    try {
        thrower();
    } finally {
        flag = !flag;
    }
}

First, when the exception is thrown by thrower(), it is important that the unwinder for the finally clause not be run because that will change the state in a way that affects the when clause in outer.

int middle() {
   Resource res; // some constructor; destructor crashes if ran twice
    try {
        inner();
        return 1;
    } catch (int x) {
        return x;
    }
}

Similarly, the destructors in the unwinder in middle should not be run. More generally this is because the exception could be resumed, but more specific to C# it's because the state affecting a when clause could also be modified by these destructors. (This is even part of .NET's C#/C++ intereporability semantics.)

It's not clear how to do this in the current proposal because the wasm catch needs to inspect the exception to determine if it's a C++ exception. This is where a tag on catch, rather than using exnref and br_on_exn, would help with forwards compatibility. So unfortunately this means that any design we make for exception handling with filtering will have to bypass all the catch clauses of the existing proposal. This means we'd effectively have two completely distinct exception-handling designs. It also means that catch in the current proposal would not serve as a catch-all.

But let's suppose we go forward with adding an entirely new exception-handling design. Even this still doesn't solve the problem, as I'll show next.

int outer(bool b) {
   flag = b;
   try {
       return middle();
   } catch (Exception e) when (flag) {
       return 2;
   }
}

Now when the search phase gets to outer, it checks the value of flag to determine whether or not the exception should be caught here.

If flag is false, the search goes on. If that search reaches the top, the stack is still in tact if a debugger wants to kick in or if simply the console wants to print out its stack trace.

If flag is true, then we need to unwind the stack up to this catch clause. That's where things get tricky.

With my design above, this is completely trivial. You just treat the relevant unwind blocks like little functions. Reaching the end of the block corresponds to that little function returning, granting control back to the unwinding operation which continues to the next unwind block until it reaches the target destination.

But with the current proposal, control is instead redirected dynamically using exnref. So the unwinding operation has to give each catch clause (which you skipped in the search phase) an exnref. But that exnref can outlive the lifetime of the catch clause and can be rethrown at any point, including even after the stack frame for outer has been popped off the stack. It sounds like the current plan for addressing this is to use first-class-stacks/continuations and store the continuation in the exnref.

So in order to extend the current proposal with just exception filtering, the plan seems to be/necessitate adding entirely new throwing and catching constructs that operate orthogonally to the existing ones (and yet supersede the existing ones in functionality) and then to use first-class-stacks/continuations in order to execute the unwinding code in programs written in the existing proposal. I would hardly call that an "extension". To me that's a complete redesign plus a heavy-weight backwards-compatibility patch for legacy code using the current proposal.

On the other hand, my design above is compatible with single-phase exception handling but can be extended with features like exception filtering by just adding a handle clause that can be easily implemented with two-phases exception handling (no first-class-stacks/continuations required). Critical to this extensibility is not using rethrow on exnref to redirect control, which is why the problem can't be fixed by something simple like optional tags.

Hope that clarifies the issue. I am trying very hard to explain a complex problem with just ASCII.

@tlively
Copy link
Member

tlively commented Jun 17, 2020

@aheejin and @RossTate, it looks like you both agree that the current proposal does not support two-phase unwinding and that it would require adding new primitives on top of those this proposal introduces in order to support two-phase unwinding in the future.

@aheejin thinks that this is fine because it is not a design goal of this proposal to consider two phase unwinding at all, so it makes perfect sense for that to be addressed in follow-on proposal. If that follow-on proposal cannot reuse most of the machinery from this proposal, then that's fine because it is a separate proposal anyway. As a result, @aheejin does not want to think about two-phase unwinding examples right now.

@RossTate would prefer that all kinds of exception handling, both one-phase and two-phase, use the same primitives, which would require this initial proposal to use different primitives. He thinks that having shared primitives in the long run is important enough to warrant design changes to this proposal. Illustrating the necessity of using different primitives to support two-phase unwinding requires looking at two-phase unwinding examples.

Let's settle this fundamental disagreement about design goals before diving into detailed explanations of any concrete problems or solutions.

@RossTate
Copy link
Contributor Author

Thanks, @tlively, for the summary of concerns. I'll keep the discussion at a high level.

There's unfortunately a critical item missing from your summary (though I otherwise agree with your summary of my perspective).

Even if we were to develop an entirely new set of exception-handling instructions for two-phase exception handling, these programs will have to interoperate with programs written using the current design. Unfortunately, that will require adding not just two-phase exception handling, but also full-blown first-class stacks. That is, every time C# or Java—or even C++ programs wanting standard interactive debugging for uncaught exceptions—enters a surface-level try clause, they will have to allocate a new stack just saw that they can soundly run the unwinders buried in the catch clauses of the current proposal. The example above was an attempt to illustrate why this is the case. But it also seems that this same conclusion had already been previously independently arrived at according to here.

So it is not just a matter of taste. The current proposal significantly limits what features we can add to WebAssembly, or at least how those features can be implemented, with significant consequences in complexity and performance.

My preference is that we amend this proposal so that it is possible to add functionality associated with two-phase exception handling without also requiring first-class stacks/continuations. Based on the analysis above, that amendment seems to require that, at the least, unwinding code (such as destructors) be put in its own unwind block that does not use exnref to return control to unwinding, but instead returns control by reaching the end of the block. My suspicion is that, after doing this, catch blocks will very frequently follow a pattern that would be better served by using tags, but that revision is not strictly necessary to address my main concern.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

@RossTate

  1. Tagged catches
    I think I understand where you are coming from a little better. But adding a filter function, which both does the tag check and runs an actual filter function to decide whether it should be caught, also can solve the problem. To have two-phase unwinding, catch has to take a filter function anyway.

  2. Splitting catch and unwind
    You want this because unwind shouldn't run in the first phase? I'm not terribly opposed to this, but note that this also can be simulated catch with a filter function. (The filter function should be able to tell us one of these three scnarios: 1. the current exception should be caught 2. the current exception should not be caught 3. this EH pad is cleanup.)

  3. Removal of first-class exnref
    I don't get this part at all. You mentioned something like first-class stack or continuation which this proposal is not even considering to embed in exnref.

But with the current proposal, control is instead redirected dynamically using exnref.

At least in the current proposal, exnref has nothing to do with control flow. It is a collection of value plus some auxiliary info, that's all.

So the unwinding operation has to give each catch clause (which you skipped in the search phase) an exnref. But that exnref can outlive the lifetime of the catch clause and can be rethrown at any point, including even after the stack frame for outer has been popped off the stack.

It is the toolchain's role not to generate incorrect code. We can generate incorrect code for any proposal if the toolchain tries to. We don't do that (unless there is a bug).

It sounds like the current plan for addressing this is to use first-class-stacks/continuations and store the continuation in the exnref.

No. This proposal does not associate exnref with first class stack or continuation or whatsoever.

So in order to extend the current proposal with just exception filtering, the plan seems to be/necessitate adding entirely new throwing and catching constructs that operate orthogonally to the existing ones (and yet supersede the existing ones in functionality) and then to use first-class-stacks/continuations in order to execute the unwinding code in programs written in the existing proposal.

Again, don't have a clue what you mean by relationship between exrnef and first class stack / continuations.

And also, I think I said this like million times now, we need exnref for code transformations such as what you've seen in CFGStackify. Could you please value what toolchain needs as legitimate causes?

@tlively
Copy link
Member

tlively commented Jun 17, 2020

Thanks, @RossTate, that's an important consideration. As a result of not considering two-phase unwinding in the current proposal, there will be an ABI break between modules using the current proposal and modules using a follow-up two-phase unwinding proposal, at least until we have full-blown stack switching, at which point a clunky kind of interop will become possible.

@RossTate, I suspect we might disagree about how bad it is to require an ABI break. From my point of view, it's not a big deal and it won't be the first ABI break we have. Even just switching from the current Emscripten exceptions to the current exception proposal is an ABI break, and that's just a cost of making forward progress. Considering the pressing business need for native exception support, we don't see avoiding another ABI break as a very strong reason to pump the brakes and redesign this proposal.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

@tlively

@aheejin and @RossTate, it looks like you both agree that the current proposal does not support two-phase unwinding and that it would require adding new primitives on top of those this proposal introduces in order to support two-phase unwinding in the future.

Just to note, I mentioned a possibility that we can add two-phase unwinding without extending the current spec above. (I'm not saying I'm necessarily in favor of this approach than the spec approach.)

@aheejin thinks that this is fine because it is not a design goal of this proposal to consider two phase unwinding at all, so it makes perfect sense for that to be addressed in follow-on proposal. If that follow-on proposal cannot reuse most of the machinery from this proposal, then that's fine because it is a separate proposal anyway. As a result, @aheejin does not want to think about two-phase unwinding examples right now.

Umm... not quite. Whether we should have two-phase unwinding at some point, and if we decide to have it in the spec, whether it should be this one or the follow-on one, are I think separate questions, which we should discuss in a separate issue. (What I said in #118 (comment) was mostly a reply to Ross demanding me to show how the current proposal solves two-phase in his specific example. I didn't have anything to say other than "sorry, this proposal does not even claim to do that yet.")

All the points I countered Ross was, to me, 1. More than half of his suggestions in this long text didn't even seem relevant to two-phase unwinding 2. For some more relevant others I don't think they are the only ways to achieve two-phase. Also I don't even follow some of his arguments.

Thanks, @RossTate, that's an important consideration. As a result of not considering two-phase unwinding in the current proposal, there will be an ABI break between modules using the current proposal and modules using a follow-up two-phase unwinding proposal, at least until we have full-blown stack switching, at which point a clunky kind of interop will become possible.

I'm not sure my definition of ABI break is the same as yours. Even a small change in a single instruction will cause an ABI break, no? So if we have any kinds of follow-on proposal, however small addition it is, it will cause an ABI break. So Ross's version of changes (which he claims is smaller) will cause an ABI break too.

@RossTate
Copy link
Contributor Author

Umm... not quite. Whether we should have two-phase unwinding at some point, and if we decide to have it in the spec, whether it should be this one or the follow-on one, are I think separate questions, which we should discuss in a separate issue.

The primary purpose of this issue is to illustrate that the current proposal is not easy to add two-phase unwinding to (regardless of whether you want to add new instructions or reuse existing instructions). I tried to illustrate above how this is in large part due to unwinding code using rethrow (of an exnref) to hand off control. @aheejin, I cannot tell yet if you agree or disagree with this assessment of forwards compatibility.

Could you please value what toolchain needs as legitimate causes?

I do value this. I thought I demonstrated that by illustrating how to address the concerns you raised regarding CFGStackify in a manner I thought might generalize to other situations. I also have answers to the issues you raised about code size, but I deferred that to focus on the bigger picture rather than get caught up in even lower-level details. I am not sure how else to demonstrate that I am trying to take toolchain concerns into account.

I suspect we might disagree about how bad it is to require an ABI break.

Right now WebAssembly has a pretty small ecosystem. I suspect it is uncommon for a stack to have multiple wasm modules on it at a time, whereas I suspect that there are many times a stack has frames from multiple JS libraries on it at the same time. That makes it easy to break conventions now. But as more code moves to WebAssembly, it will get more and more difficult to break conventions. This at least has been my experience on other language teams—the conventions eventually become nearly as important to maintain compatibility with as the semantics.

Also, this would be more than an ABI break. Modules using the new design would be conventionally incompatible with the current proposal. So the "fix" would be for all modules to switch to the new design. @aheejin, given the amount of work you put into this, would you be happy with everyone abandoning this design for a new one at some point?

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

Umm... not quite. Whether we should have two-phase unwinding at some point, and if we decide to have it in the spec, whether it should be this one or the follow-on one, are I think separate questions, which we should discuss in a separate issue.

The primary purpose of this issue is to illustrate that the current proposal is not easy to add two-phase unwinding to (regardless of whether you want to add new instructions or reuse existing instructions). I tried to illustrate above how this is in large part due to unwinding code using rethrow (of an exnref) to hand off control. @aheejin, I cannot tell yet if you agree or disagree with this assessment of forwards compatibility.

I said in #118 (comment) that I don't understand your argument on exnref and first class stack. This proposal does not associate anything about exnref and continuations. Also I gave reasons on why your other suggestions are not strictly necessary.

Could you please value what toolchain needs as legitimate causes?

I do value this. I thought I demonstrated that by illustrating how to address the concerns you raised regarding CFGStackify in a manner I thought might generalize to other situations. I also have answers to the issues you raised about code size, but I deferred that to focus on the bigger picture rather than get caught up in even lower-level details. I am not sure how else to demonstrate that I am trying to take toolchain concerns into account.

Toolchain stuff is not lower level details. And even without code size concerns, first of all, your solution does not solve exnref problem in the first place. exnref was able to be omitted in the first proposal only because there was a catch block. When you rethrow without exnref, how do you know which exception you are referring to, if there's not a catch block?

I repeat. Why do we have to remove exnref? I don't get any of your reasoning about continuation or whatever. You just really wanted to remove it for 5 months now, and the reasons you gave were different each time. Now you insist on removing that once again, saying we should even change the fundamental try-catch structure (which does not solve exnref problem anyway). Also I don't think exnref is related to two-phase unwinding support anyway.

With this and #113, what you are claiming not even an "amendment" anymore. You are trying to change every single instruction and every single concept.

@aheejin, given the amount of work you put into this, would you be happy with everyone abandoning this design for a new one at some point?

This is a really bold claim. I spent hours explaining that why many of your suggestions are not necessary to support two phase unwinding. You somehow ignored all of them, and now claim your design is the best and with the current proposal everyone has to throw out their work.

@RossTate
Copy link
Contributor Author

Even the (I believe) creator of exnref has stated that it turned out to not be extensible in the way that was originally expected and intends to use continuations to patch the problem.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

FYI, exnref was not added in this proposal in order to be extended to resumable exception. That was added for toolchain expressiveness.

First-class exnref was first discussed to support toolchain. And the primary author of the proposal (both first and second) was @KarlSchimpf, who sadly passed away in 2018. Of course, @rossberg did a lot of design work there, and I joined with toolchain side of requirements too. (Also there were other people who participated too.) What I'm trying to say here is, it was more of a collective work, not something @rossberg single-handedly added for his new proposal.

@RossTate
Copy link
Contributor Author

That's useful historical context. Sorry for incorrectly interpreting the conversations I had read, and for the resulting misattributions.

The point unfortunately still stands that everyone who has tried to extend this with two-phase functionality has found it would need continuations. Are you on board with that assessment? Because it's critical to the tradeoffs of the decision to be made.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

The point unfortunately still stands that everyone who has tried to extend this with two-phase functionality has found it would need continuations. Are you on board with that assessment? Because it's critical to the tradeoffs of the decision to be made.

No. Why? I asked this more than three times now above.

For two-phase unwinding, you need a separate stack to run the filter function, whether you have exnref or not. Not sure if this is what you are referring to.

Other than that, in this proposal, we don't associate exnref with continuations in any way, and I don't see why that has to change even if we add two-phase unwinding.

@tlively
Copy link
Member

tlively commented Jun 17, 2020

I'm not sure my definition of ABI break is the same as yours. Even a small change in a single instruction will cause an ABI break, no? So if we have any kinds of follow-on proposal, however small addition it is, it will cause an ABI break. So Ross's version of changes (which he claims is smaller) will cause an ABI break too.

@aheejin, if I understand correctly, Ross's proposal does not cause an ABI break because a module using his proposed modified version of the current proposal would be able to seamlessly interop with another module using his hypothetical two-phase unwinding extension without either needing to be recompiled.

Right now WebAssembly has a pretty small ecosystem. I suspect it is uncommon for a stack to have multiple wasm modules on it at a time, whereas I suspect that there are many times a stack has frames from multiple JS libraries on it at the same time. That makes it easy to break conventions now. But as more code moves to WebAssembly, it will get more and more difficult to break conventions. This at least has been my experience on other language teams—the conventions eventually become nearly as important to maintain compatibility with as the semantics.

@RossTate, I agree with your assessment of the current state of the ecosystem, but the difference moving forward is that we shouldn't need cross-language usage conventions because cross-language communication in general will be mediated by interface types. A single toolchain should be able to change its internal conventions without affecting how it interacts with other toolchains. I think there is a good point to be made about allowing toolchains to make stronger backwards-compat and cross-compat guarantees, but that's not a design goal we have been operating under so far.

@RossTate
Copy link
Contributor Author

For two-phase unwinding, you need a separate stack to run the filter function, whether you have exnref or not. Not sure if this is what you are referring to.

This is not true. You can execute the filter function at the leaf of the current stack.

No. Why? I asked this more than three times now above.

I tried to explain above. I will try answering another way now using the proposal on continuations presented at the in-person meeting.

A filterable or resumable exception might at some point decide to unwind the stack. The issue is that WebAssembly needs to ensure that only the portion of the stack that isn't needed any more is unwound. This isn't something that can rely on proper tooling; it's a security problem if something goes wrong here. So the plan given here is for the try block of a filterable/resumable exception to be run on a new stack, and for the catch to be handed that stack. At some point the stack might be deemed unnecessary and an abort will be attempted by creating an exnref not matching any exception tag so that it hopefully propagates up through all the unwinding code until it hopefully reaches the cite of the abort, which can then catch that exnref and continue onwards knowing that the portion of the stack it needs is still valid.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

@tlively

@aheejin, if I understand correctly, Ross's proposal does not cause an ABI break because a module using his proposed modified version of the current proposal would be able to seamlessly interop with another module using his hypothetical two-phase unwinding extension without either needing to be recompiled.

I thought, to add two-phase unwinding we need catch's with some pointer to a filter function (which will be similar to the personality function in C++), so its instruction arguments will be different anyway. So even this single change will cause binaries incompatible. I can be mistaken about what you mean.

@tlively
Copy link
Member

tlively commented Jun 17, 2020

A filterable or resumable exception might at some point decide to unwind the stack. The issue is that WebAssembly needs to ensure that only the portion of the stack that isn't needed any more is unwound. This isn't something that can rely on proper tooling; it's a security problem if something goes wrong here.

@RossTate What is the security property that would be violated? Or do you mean that WebAssembly would be unsound in this case? I think we can all assume that WebAssembly will remain sound, so if some suggestion would violate soundness, it would be good to point that out directly.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

@RossTate

A filterable or resumable exception might at some point decide to unwind the stack. The issue is that WebAssembly needs to ensure that only the portion of the stack that isn't needed any more is unwound. This isn't something that can rely on proper tooling; it's a security problem if something goes wrong here. So the plan given here is for the try block of a filterable/resumable exception to be run on a new stack, and for the catch to be handed that stack. At some point the stack might be deemed unnecessary and an abort will be attempted by creating an exnref not matching any exception tag so that it hopefully propagates up through all the unwinding code until it hopefully reaches the cite of the abort, which can then catch that exnref and continue onwards knowing that the portion of the stack it needs is still valid.

I'm not very familiar with the status quo of the resumable EH proposal, but I don't understand why VM need a new stack every time we enter a try for the current EH proposal, whether we support two-phase unwinding or not. I'm not sure what your argument will be this time, but I think we shouldn't. That won't be even zero-cost anymore. What we need in two-phase is the ability to walking up and examining the stack without modifying it and not resumable continuations. And we are not planning to embed continuations in exnref in this proposal either. That's irrelevant to what resumable EH will do.

@RossTate
Copy link
Contributor Author

That won't be even zero-cost anymore.

I know. That's why I'm trying to avoid it being necessary.

I'm not very familiar with the status quo of the resumable EH proposal

There is no resumable EH proposal. It was abandoned for continuations instead for the reasons I'm trying to explain.

What is the security property that would be violated? Or do you mean that WebAssembly would be unsound in this case?

I mean to say it would be unsound. You need to make sure you do not unwind the stack the portion of the stack you are using. That's easy to ensure with try/unwind, but it's not clear how to ensure it with the current proposal, though allocating a new stack to run the body of the try block in—thereby clearly separating it from the stack of the handler—is one way that might work.

@aheejin
Copy link
Member

aheejin commented Jun 17, 2020

I still don't see why the current proposal has to allocate a new stack every time it enters a try. I don't understand what your unsoundness argument is either. You once suggested that having uncatchable exception was unsafe. This time again I'm not sure your assumption of unsafeness is something people agreed on.

The reason the continuation proposal needed a new stack every time it entered a try was that it had to do stack switching between the stack after throw and after resuming. Two-phase unwinding does not have that complex requirements. As I said, what we do is just to examine the stack without unwinding first. It is hard to discuss if you base your argument on your own random assumption which was not discussed or agreed on.

I don't see why whether catch returns extracted values or a packaged exnref makes any difference from the stack perspective, at least in the scope of this proposal.

@RossTate
Copy link
Contributor Author

Two-phase unwinding does not have that complex requirements. As I said, what we do is just to examine the stack without unwinding first.

Every system that I know of that has two-phase exception handling has code relevant to the various stages cleanly separated. Furthermore, unwinding segments are generally not given the exception, and they return control to the two-phase unwinding process simply by reaching the end of the unwinding code. The current proposal differs on all of these points.

This means that we cannot by default assume the current proposal is forwards compatible with two-phase exception handling since it is not similar to any system that has that property. So I understand that two-phase exception handling on its own can work by searching and then unwinding; the question is how to do that with this proposal whose design is very different from other two-phase systems.

@tlively
Copy link
Member

tlively commented Jun 17, 2020

@RossTate, do you agree that it would be possible to add a separate two-phase unwinding mechanism later, even if that meant an ABI break, and have it work soundly with the current proposal, or would there necessarily be a soundness problem there? I'm pretty sure that having the two kinds of exceptions unwind through each other's handlers undisturbed would be sound, if undesirable.


@tlively

@aheejin, if I understand correctly, Ross's proposal does not cause an ABI break because a module using his proposed modified version of the current proposal would be able to seamlessly interop with another module using his hypothetical two-phase unwinding extension without either needing to be recompiled.

I thought, to add two-phase unwinding we need catch's with some pointer to a filter function (which will be similar to the personality function in C++), so its instruction arguments will be different anyway. So even this single change will cause binaries incompatible. I can be mistaken about what you mean.

The key here is this paragraph from this comment.

This phrasing is suggestive of two-phase exception handling, but on it's own it does not require two-phase exception handling. That depends on the kinds of handlers that might be present. In particular, extensions aside, the only handler required to be possible is catch $tag $l. This kind of handler has the property that whether or not it causes stack unwinding can be determined in a pure manner. So there's no way to observe the difference between finding the handler and then unwinding (two-phase) versus unwinding and then finding the handler (single-phase).

Since there is no observable difference between the single-phase and two-phase unwinding, engines could implement the single-phase unwinding now and in the future when they are extended to use two-phase unwinding, the old code would still work and exceptions it throws could be caught correctly by both old and new code.


Separately, @aheejin, could you describe how you see two-phase unwinding be supported as a library by the current proposal, as you mentioned? If the current proposal does already support two-phase unwinding that way, perhaps we wouldn't need to modify or extend it at all.

@RossTate
Copy link
Contributor Author

@RossTate, do you agree that it would be possible to add a separate two-phase unwinding mechanism later, even if that meant an ABI break, and have it work soundly with the current proposal, or would there necessarily be a soundness problem there?

I don't believe there would be unsoundness there. My concern is that it would mean two incompatible ecosystems, but I believe you already understand that concern.


Separately, I think I came up with a better way to illustrate the problem. I'm going to reuse my outer example:

bool flag;

int outer(bool b) {
    flag = b;
    try {
        return middle();
    } catch (Exception e) when (flag) {
        return 2;
    }
}

So here's how I would compile this using a set of two-phase instructions (all ending with 2):

(func $outer (param $b i32) (result i32)
    (global.set $flag (local.get $b))
    (block $catcher (result i32)
        (try2
            (do
                (return (call $middle))
            end)
            (handle2 $csharp_exception
                (if (global.get $flag) then (unwinding2 (br $catcher)) end)
            end)
        end)
    end)
    (return (i32.const 2))
)

If that's too hard to follow, I think one can get away with just the following: The question here is how to implement the (unwinding2 (br $catcher)) instruction that unwinds the stack up to the $catcher label and then transfers control to $catcher, i.e. to (return (i32.const 2)). In particular, if there is a try/catch (of the current proposal) on the stack, in order to be interoperable I need to run its unwinding code. To do so, I need to give the catch block an exnref that when rethrown will continue unwinding the stack up to the $catcher label.

What I would like to do is give an exnref with an "unwinding" tag that doesn't match any event. In addition to that unwinding tag, the exnref has the stack pointer and the code pointer associated with $catcher. This won't match any br_on_exn, and the appropriate unwinding code in the catch block will execute as planned. Then when it's done, it'll rethrow that exception. I'd have rethrow check for the "unwinding" tag, in which case it continues the unwinding process up to the location in the stack designated by the stack pointer in the "unwinding" exnref and, upon reaching that point, redirects control to the code pointer in the "unwinding" exnref.

That works perfectly fine. Unfortunately though, this implicitly assumes that the "unwinding" exnref I gave the catch block is rethrown before the catch block ends. But if instead the end of the catch block is reached, control simply goes to the instruction after the end instruction, and the exnref can still be accessed. What can happen, then, is that a bunch of functions on the stack can return, including outer, and then a bunch of calls can be made pushing new frames onto the stack, and then someone can execute rethrow with my "unwinding" exnref. Only now, even though the stack pointer for $catcher still points into the stack, it no longer necessarily points to a stack frame for outer, making this unsound.

Hopefully that better illustrates the unfortunate interplay of the features at hand.

@rossberg
Copy link
Member

FYI, exnref was not added in this proposal in order to be extended to resumable exception. That was added for toolchain expressiveness.

Indeed. FWIW, in an earlier design, rethrow actually was lexically scoped and merely took an index immediate referencing a surrounding catch block whose exception to rethrow. It was the implementation experience in LLVM by @aheejin and others that showed that more flexibility was important to producers and convinced some of us sceptics to go with exnref.

@aheejin
Copy link
Member

aheejin commented Jun 18, 2020

@RossTate, do you agree that it would be possible to add a separate two-phase unwinding mechanism later, even if that meant an ABI break, and have it work soundly with the current proposal, or would there necessarily be a soundness problem there?

I don't believe there would be unsoundness there. My concern is that it would mean two incompatible ecosystems, but I believe you already understand that concern.

Separately, I think I came up with a better way to illustrate the problem. I'm going to reuse my outer example:

bool flag;

int outer(bool b) {
    flag = b;
    try {
        return middle();
    } catch (Exception e) when (flag) {
        return 2;
    }
}

So here's how I would compile this using a set of two-phase instructions (all ending with 2):

(func $outer (param $b i32) (result i32)
    (global.set $flag (local.get $b))
    (block $catcher (result i32)
        (try2
            (do
                (return (call $middle))
            end)
            (handle2 $csharp_exception
                (if (global.get $flag) then (unwinding2 (br $catcher)) end)
            end)
        end)
    end)
    (return (i32.const 2))
)

If that's too hard to follow, I think one can get away with just the following: The question here is how to implement the (unwinding2 (br $catcher)) instruction that unwinds the stack up to the $catcher label and then transfers control to $catcher, i.e. to (return (i32.const 2)). In particular, if there is a try/catch (of the current proposal) on the stack, in order to be interoperable I need to run its unwinding code. To do so, I need to give the catch block an exnref that when rethrown will continue unwinding the stack up to the $catcher label.

What I would like to do is give an exnref with an "unwinding" tag that doesn't match any event. In addition to that unwinding tag, the exnref has the stack pointer and the code pointer associated with $catcher. This won't match any br_on_exn, and the appropriate unwinding code in the catch block will execute as planned. Then when it's done, it'll rethrow that exception. I'd have rethrow check for the "unwinding" tag, in which case it continues the unwinding process up to the location in the stack designated by the stack pointer in the "unwinding" exnref and, upon reaching that point, redirects control to the code pointer in the "unwinding" exnref.

That works perfectly fine. Unfortunately though, this implicitly assumes that the "unwinding" exnref I gave the catch block is rethrown before the catch block ends. But if instead the end of the catch block is reached, control simply goes to the instruction after the end instruction, and the exnref can still be accessed. What can happen, then, is that a bunch of functions on the stack can return, including outer, and then a bunch of calls can be made pushing new frames onto the stack, and then someone can execute rethrow with my "unwinding" exnref. Only now, even though the stack pointer for $catcher still points into the stack, it no longer necessarily points to a stack frame for outer, making this unsound.

Hopefully that better illustrates the unfortunate interplay of the features at hand.

I have only one thing to say to this giant wall of text. I think I said this million times in this issue. exnref does not contain continuation or stack pointer, from which execution can be resumed. Please don't base your argument on some random assumption you made, which I even denied multiple times.

@aheejin
Copy link
Member

aheejin commented Jun 18, 2020

Two-phase unwinding does not have that complex requirements. As I said, what we do is just to examine the stack without unwinding first.

Every system that I know of that has two-phase exception handling has code relevant to the various stages cleanly separated. Furthermore, unwinding segments are generally not given the exception, and they return control to the two-phase unwinding process simply by reaching the end of the unwinding code. The current proposal differs on all of these points.

This means that we cannot by default assume the current proposal is forwards compatible with two-phase exception handling since it is not similar to any system that has that property. So I understand that two-phase exception handling on its own can work by searching and then unwinding; the question is how to do that with this proposal whose design is very different from other two-phase systems.

I asked why on earth we need a new stack every time a try is entered, because that was your reason for removing exnref. Now you deflected that question and said "this scheme is not similar to others". I don't think I can continue this kind of discussion for a long time.

Not sure what are those "every system that you know", but for one, Itanium ABI does not distinguish catch and unwind and gives the unwinding part of the block the full access to an exception pointer. (This doesn't mean our model follows Itanium ABI, so don't please conflate that)

Also, not sure why wasm has to be similar to others. (Of course, those "others" are chosen arbitrarily by you, as I just noted above). Those "others" don't have the same code structure or problem as us. If the argument is "being similar to (arbitrary chosen) others" vs. "making code generation convenient", I don't know why we should be spending endless time debating this.

@aheejin
Copy link
Member

aheejin commented Jun 18, 2020

@rossberg By the way, do you have any idea why @RossTate is claiming why we need a new stack every time a try is entered in this proposal, to support two-phase unwinding?

@aheejin
Copy link
Member

aheejin commented Jun 18, 2020

I want to be clear: I am not against adding two-phase unwinding to EH proposal. Whether we should do that in this version or the next version is something we should discuss separately. I'm even open to adding one in the current version if other people, especially VM people, agree, and our shipping schedule can be adjusted.

All things I countered @RossTate were that his suggestions were either 1. not even relevant to two-phase or 2. not strictly necessary. Among them, he really insisted strongly on exnref removal, which was added in 2018 in-person CG meeting after discussions, to support toolchain expressiveness. He wanted to remove exnref for a long time for various reasons from February, and his reason changed every time, and I don't remember all of them. In this issue only, he said

  • Using exnref is unsafe, because it has continuations/stack pointers and it can transfer control flow. (I denied this multiple times, saying in this proposal exnref does not contain such things, but he didn't get it.)
  • Using exnref means unwinding exactly to the point we need becomes hard and unsound (why???)
  • Using exnref means we need a new stack every time we enter a try (why??? I asked him multiple times but didn't get an answer. He said resumable EH does it. This is not resumable EH.)
  • It is not easily extensible to resumable EH proposal (We need this regardless of resumable EH.)

@rossberg
Copy link
Member

@aheejin:

@rossberg By the way, do you have any idea why @RossTate is claiming why we need a new stack every time a try is entered in this proposal, to support two-phase unwinding?

I'm afraid not, but admittedly I haven't been able to follow most of the discussion, since it's just too time-consuming overall.

@RossTate
Copy link
Contributor Author

@rossberg, in that case, can you describe how you expect to add the requisite functionality in an interoperable manner? Elsewhere I believe you suggested continuations, but maybe you I misunderstood you or the context.

@rossberg
Copy link
Member

@RossTate, I'm afraid I haven't been able to catch enough context from the discussion for a qualified answer.

@tlively
Copy link
Member

tlively commented Jun 18, 2020

@RossTate, @aheejin, @rossberg, It seems to me that forward progress on this issue has come to a halt. Part of the problem is that the discussion thread has become so long that it is difficult for the participants to page in all the context that has been discussed, much less newcomers to the discussion. Further clarification of previous points actually adds to the problem as the thread grows linearly with the amount of back-and-forth.

I suggest we move this discussion to a draft PR that adds a markdown document laying out the problem Ross has identified, the costs of not addressing the problem in this proposal, the solution Ross is suggesting, and the cost of implementing that solution. By using a PR, we will be able to ask for and receive clarifications on each other's points without linearly increasing the amount of text one has to read to get up to speed.

If this sounds like a reasonable experiment to everyone, @RossTate, would you mind taking your opening post, editing it to reflect the subsequent conversation in this thread, and turning it into a draft PR?

@aheejin
Copy link
Member

aheejin commented Jun 18, 2020

I suggest we move this discussion to a draft PR that adds a markdown document laying out the problem Ross has identified, the costs of not addressing the problem in this proposal, the solution Ross is suggesting, and the cost of implementing that solution. By using a PR, we will be able to ask for and receive clarifications on each other's points without linearly increasing the amount of text one has to read to get up to speed.

I doubt a draft PR will be any more clarifying than or even different from this issue post. He basically wants to change every single instruction and every type, so that will not be viewed as a diff against the current proposal. That will just be a new wall of text exactly like this one. I pushed for him for clarification on several points here, but didn't get answers that were not based on some random assumptions. I don't think they will magically appear with that draft PR.

And more seriously, I don't really know why we should repeat all these fruitless discussion on the same thing again on a new PR.

@dschuff
Copy link
Member

dschuff commented Jun 19, 2020

A new PR might more clearly illustrate what a separate approach might be, but I'm not sure it will help address the issue of whether we should abandon the current form of the proposal (in particular first-class exnref), adjust it, or neither.

Regarding that, I think we are still talking past each other.

  1. I don't think Ross is claiming that the current proposal requires stack-switching or first-class stacks/continuations to implement.
  2. I do think that Ross is claiming that any future proposal (in particular filtering or resuming) that used first-class exnref would require continuations to implement.
  3. I think Ross is claiming that any such future exnref-using proposal cannot be made "compatible" with existing modules (i.e. those compiled with the current proposal), in the sense that either the new or old-style module will not work as intended when they are mixed.

I'm still working to understand the exact nature of those incompatibilities and what might happen when they are mixed, and whether it might be possible to e.g. place some restriction on exnrefs that escape their catch block that might enable the transformations toolchains need without nuking the whole thing from orbit. (even a compromise solution that, for example had some suboptimal dynamic behavior or relied on the toolchain to have some convention in order to be forward-compatible would be potentially on the table in my view). Given where we are in this process, we should try as hard as we can do do that, and I would think there is a strong onus on someone proposing a wholesale redesign to show that there really is no other way in the existing framework.

@aheejin
Copy link
Member

aheejin commented Jun 19, 2020

Regarding that, I think we are still talking past each other.

  1. I don't think Ross is claiming that the current proposal requires stack-switching or first-class stacks/continuations to implement.
  2. I do think that Ross is claiming that any future proposal (in particular filtering or resuming) that used first-class exnref would require continuations to implement.
  3. I think Ross is claiming that any such future exnref-using proposal cannot be made "compatible" with existing modules (i.e. those compiled with the current proposal), in the sense that either the new or old-style module will not work as intended when they are mixed.

I was understanding these points. I meant exnref does not contain or need to contain continuations in order to support two-phase unwinding within the scope of current + possible follow-on proposal. (which does not include resumable EH) What I asked Ross repeatedly was also "why do we need to remove exnref to support two-phase unwinding?"

I'm still working to understand the exact nature of those incompatibilities and what might happen when they are mixed, and whether it might be possible to e.g. place some restriction on exnrefs that escape their catch block that might enable the transformations toolchains need without nuking the whole thing from orbit. (even a compromise solution that, for example had some suboptimal dynamic behavior or relied on the toolchain to have some convention in order to be forward-compatible would be potentially on the table in my view).

I'd like to avoid phrasing our solution as a compromise, before we figure out why we need to include continuations to support two-phase. But yeah, even if it contains one hypothetically, the responsibility of toolchain is to make sure to generate correct code.

@RossTate
Copy link
Contributor Author

before we figure out why we need to include continuations to support two-phase

Based on the feedback above, I've created a new issue (#122) focused on just figuring this out. There are other items we could discuss, but it seems getting consensus on this item is the first order of business, and I'm sorry for having tried to cover too much ground in this issue. I apologize that the opening post is long, but I wanted to make it self contained so that people would not need to read this thread (or other threads) to get at least the critical relevant context and contribute to the discussion. I've also tried to incorporate the feedback on what did and did not help people understand the problem better. So I'm hoping that 1) giving appropriate context, 2) keeping the topic focused, and 3) starting with a better opening explanation will together reduce/eliminate the need for subsequent large comments from me and hopefully save everyone time.

Also, I should clarify that I am trying to respond to comments quickly in order to help come to a resolution quickly, since I understand y'all have pressure to get exception handling developed quickly. I don't want to raise an issue and then hold things up by leaving y'all hanging. I realize now, though, that that unintentionally imposed pressure to respond to my comments quickly. I greatly appreciate the sense of duty, but I know that y'all (like me) have other responsibilities you are balancing, so even if I respond quickly I hope y'all will feel free to progress the discussion at a pace that properly balances your other responsibilities with whatever timeline y'all need to meet for resolving this issue.

@aheejin
Copy link
Member

aheejin commented Jun 23, 2020

Now I think better understand why you suggested a catch should take a label instead of a body. I guess you suggested it as a way to fix CFGStackify problem, right? But this isn't gonna work, because we would need the same mechanism for your unwind. You said you read fixUnwindMismatches in CFGStackify. That mismatch can happen not only for catch but also for unwind (which means, it can happen for any EH pad). But making unwind take a label too is not very different from having exnref, because control flow can escape the unwinder.

As I said, that fixUnwindMismatches was the main reason we added exnref. So basically when a mismatch occurs, we need to redirect control from one catch (or unwind) to another catch (or unwind). We use br and exnref to do that in the current proposal. If we don't have exnref, we need another way to do that. One thing I can think of is rethrowing it. I once suggested to make rethrow have an immediate argument to achieve the same thing (#74), but I don't think this will solve the two-phase problem either, because without following rethrow we don't know where the destination catch is. We need a way to encode the next destination within catch to solve this. Not sure what we should do.


About splitting catch and unwind: I don't have a lot of problems with this in the current toolchain, because a modified version of Windows EH frontend we are currently using in LLVM is well suited for this split. But I'm concerned whether it makes the spec more inflexible than necessary, because I don't think all EH-generating frontends split catching and unwinding code. For one, Itanium EH does not do that. Their EH pads can contain both catching and unwinding code at the same time.

@RossTate
Copy link
Contributor Author

That mismatch can happen not only for catch but also for unwind (which means, it can happen for any EH pad).

Ah, I had wondered about this. Thanks for the insight.

But this isn't gonna work, because we would need the same mechanism for your unwind.

So I've been wondering about this. As you say, my unwind is essentially doing two things: specifying unwinding code, and registering an unwinder. To mirror the change I made to catch, we'd need to separate these two parts. Unfortunately, WebAssembly right now doesn't have something analogous to the label we can use for catch; that is, it's important that the unwinder has an end at which point it returns control to the current exception-handling phase that it was called from.

We could add a general-purpose analog to label for this, but for the sake of simplicity/speediness, I'll suggest something specific to the current needs (that could later be desugared to something more general):

  1. Introduce unwinder $unwinder_name instr1* within instr2* end. This just declares unwinding code instr1* that is in scope (referenceable by $unwinder_name) during the execution of instr2*. Execution jumps straight into instr2*. The sequence instr1* must have type [] -> [], in which case the overall unwinder $unwinder_name instr1* within instr2* end inherits the type of instr2*.
  2. Change try/unwind to be of the form try instr* unwind $unwinder_name. This executes instr* but with the unwinding code referenced by $unwinder_name registered so that, if an exception is thrown from within instr*, then that unwinding code will be run during the appropriate phase of exception handling.

With this, my old try instr1* unwind instr2* end would translate to the new unwinder $name instr2* within (try instr1* unwind $name) end. This regrettably takes up more bytes (which can be mitigated, though I suggest we not focus on binary size right now), but it's much more flexible.

Would that change address the needs of fixUnwindMismatches? It does unfortunately still require catch and unwind to be split, but my sense is that that splitting is key to compatibility with two-phase exception handling.

@aheejin
Copy link
Member

aheejin commented Jun 24, 2020

I think, how we should structure unwind and whether we should split catch and unwind are two different problems, and we should discuss it separately. Also, this is even separate from whether we should have exnref or not. This is the problems occurring when mixing many different problems within a big post. It is also hard to follow or join for many people. I also want to hear about VM people too on this, this discussion got unnecessarily too long and other people don't even try to join anymore, which is unfortunate.

First, on whether we should split catch and unwind:
I still don't understand why this is "essential" to support two-phase unwinding as you suggest. I countered you on this several times in this post, but I didn't get any answers for them. Usually, when I counter you on some points, you don't answer them, but you bring some other reasons and add other discussion issues, which has made discussion very confusing for me.

I'll summarize my points again here.

  • To me, the difference between catch and unwind is only one thing: unwind does not need a rethrow at the end. I don't see why this distinction is crucial for two-phase, because, unwinder blocks that use catch always can have rethrow at the end.
  • Both catching and unwinding code run not in the first phase but only in the second phase. Their behaviors are the same in both phase. Why are they need to be split to support two-phase?
  • Also, as I said in Reconsider original extensible design #118 (comment), I don't want to make the spec needlessly inflexible. Your design requires all frontends that target wasm EH to split unwinding code and catching code, which is not easy for some EH schemes.

Second, on your new syntax:
First of all, not sure what your syntax exactly is. try instr* unwind $unwinder_name sounds like while running instr*, if something is thrown, you go to the unwinder specified by $unwinder_name. But you also proposed unwinder $unwinder_name instr1* within instr2* end, which sounds like, while running instr2* if something is thrown, run instr1*. (I think the order of instr1* and instr2* should be reversed, by the way) Why do we have both try's instr* and unwind's instr2*? So what I mean is, why does unwinder need to have two parts, and not only the unwinding code?

Long ago, we also brainstormed on whether we can split try and catch's location so that multiple try can target a single catch. (For example, "Catchless try block" in #29). But it didn't get enough traction, and we got exnref, so it was abandoned. If we ever get to remove exnref, I'm in favor of reexploring this path. But I also don't want to change the fundamental structure of current code structure too much, because it can be unnecessary churns for toolchain and VM implementors.

I and @dschuff recently talked about an alternative syntax too, and while we haven't written it somewhere in a doc or issue, it preserves the current try-catch structure, adds a label to catch (or reuse a label for try), and adds a new instruction catch_br that does not have a body but takes a label. So this is possible:

try $label
  ...
  try
   ...
  catch_br $label
catch
  handler code
end

(I attached $label to try, but you can think of it as attached to catch. They don't really matter, because they are all gonna be immediates in wasm code.)
This does not touch the current fundamental structure too much, but adds a new instruction catch_br, which is necessary for code transformations without exnref.

@aheejin
Copy link
Member

aheejin commented Jun 24, 2020

Also, I should clarify that I am trying to respond to comments quickly in order to help come to a resolution quickly, since I understand y'all have pressure to get exception handling developed quickly. I don't want to raise an issue and then hold things up by leaving y'all hanging. I realize now, though, that that unintentionally imposed pressure to respond to my comments quickly. I greatly appreciate the sense of duty, but I know that y'all (like me) have other responsibilities you are balancing, so even if I respond quickly I hope y'all will feel free to progress the discussion at a pace that properly balances your other responsibilities with whatever timeline y'all need to meet for resolving this issue.

Different people have different thoughts and feedbacks, but I for one, I don't want you to be pressured to slow-roll your replies or anything. Thanks.

@aheejin
Copy link
Member

aheejin commented Oct 12, 2020

Many of concerns here were focused on exnref, which was removed in Sep 15 CG meeting. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants