Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficiency Due to Eager Stack Tracing #102

Open
RossTate opened this issue Mar 11, 2020 · 32 comments
Open

Inefficiency Due to Eager Stack Tracing #102

RossTate opened this issue Mar 11, 2020 · 32 comments

Comments

@RossTate
Copy link
Contributor

Preface: Although stack tracing is not explicitly part of the semantics of the proposal, it certainly is part of the rationale of the design and there are many implicit expectations about how the proposal supports stack tracing, so here I'm discussing the consequences of implicit expectations.

When I sought advice on designing for exceptions, the first and strongly stated piece of advice I received was to not trace the stack eagerly. I was given three main reasons for this: 1) tracing the stack is slow; 2) tracing the stack is proportionate to the size of the stack, so eagerly tracing it makes the performance of throw-catch depend on the stack size; and 3) in the common case the exception is caught and handled, making this expensive effort wasted. At a higher level, the advice was that good exception design makes the performance of throw-catch proportionate to just the number of stack frames that were unwound, i.e. the distance the exception travelled. (In case it turns out to be relevant, I was also reminded that some languages distinguish between exceptions and errors and ensure that exceptions get caught, making stack traces useless for exceptions.)

Unfortunately, the current design seems to necessitate eager stack tracing. Even just to run destructors and finallys, the exception is reified into an exnref, which can be stored or otherwise escape arbitrarily, and which can be rethrown from anywhere with the expectation that its stack trace is preserved. This problem can be mitigated through escape analysis, but my understanding is that WebAssembly is intended to not rely on things like escape analysis for good performance.

While I know that it's important to support this stack-trace functionality, exception handling is a cross-language feature, and it seems problematic to bake inefficient stack-trace functionality into the exception-handling primitives of WebAssembly. If it helps to have a comparison point, .NET is able to support this functionality without requiring eager stack collection in the common case. Its exception-handling primitives support (at least as best as I can tell) throw-catch performance proportionate to the distance travelled, as suggested by the advice I was given.

@RossTate
Copy link
Contributor Author

For some background, there seem to be two predominant alternatives to eager stack tracing.

  1. First walk the stack to see if the exception is caught. If not, report the trace. If caught, unwind the stack up that point. This doesn't seem like a good fit for WebAssembly because it relies on a way to check when exceptions are caught without mangling the stack, which seems difficult to do in a multi-language setting where languages would need to provide custom filters to determine whether the exception should be caught and these filters would in enough cases likely need to mangle the stack.
  2. Build the tail of the trace as you unwind the stack. This still wastes time if the trace is caught, but at least that time is still just proportionate to the distance traveled. If a handler needs a full stack trace for an exception, it can append this tail to the trace of the current stack. This seems to be what .NET does, whereas the current proposal is what the JVM does (which often is not a good indicator).

@lukewagner
Copy link
Member

Speaking just about the JS API: my working assumption has been that an uncaught wasm exception would not turn into a NativeError and would thus not implicitly have the stack property that ends up forcing an eager stack trace. Rather, I assume (I can't find this written up anywhere) we'll define some new WebAssembly.Exception object and it should also not be given a stack property; if a developer wants to capture a stack, they can always call out to JS, create an Error object, and stuff it in the exception's payload via anyref.

Is there anywhere else where eager stack tracing is implied by the current design?

@RossTate
Copy link
Contributor Author

Hmm, maybe there's a miscommunication somewhere then. Over e-mail, @aheejin has been helping fill me on the details of this proposal. I had noticed that exnrefs weren't actually being used to support C++'s rethrow functionality, but according to the following that is just because that functionality has not been implemented:

yes, current '__cxa_rethrow' is not using exnref now, because that part of implementation is currently missing. Currently we don't preserve exnrefs and just throw a fresh i32 from the wasm standpoint. To make this actually preserve [the trace] when rethrowing, we need to keep a stack of exnrefs in a table. ... I'm planning to do that part of the implementation later.

I interpreted this to mean that the expectation is that exnref and rethrow is to be used to preserve stack traces, and in many points during our discussion of the rationale this functionality was emphasized. In line with that, the current payload for the __cpp_exception event is just an i32—no anyref.

Is there anywhere else where eager stack tracing is implied by the current design?

Unsure. But the current design also makes as-you-go stack tracing (i.e. option (2) above) impossible. So people compiling to WebAssembly will have to decide between stack traces and efficient exceptions, which is not exactly a great choice to be forced to make.

@aheejin
Copy link
Member

aheejin commented Mar 15, 2020

I don't think the spec forces eager stack tracing, and the embedder may or may not embed a stack trace based on its preference. But I acknowledge that if the embedder chooses to embed a stack trace, it should do that eagerly, because the spec says

To do this, WebAssembly exceptions are immutable once created, to avoid cyclic data structures that cannot easily be reference-counted. It also means that exceptions can't be stored into linear memory.

If it were not for this, stack traces would be able to be appended as stack unwinding goes. I agree it would be better to allow stack traces be generated incrementally if possible, but I'm not sure if we should loosen the immutability requirement. I think I should note that the proposal was mainly designed to ensure zero-cost for the code pass in which exceptions don't occur, and optimizing the exception handling capability itself was not its main goal. It'd be better of course if we can optimize that part though.

@RossTate
Copy link
Contributor Author

If it were not for [immutability], stack traces would be able to be appended as stack unwinding goes.

This is not true. Because the exnref can be stored, it can be rethrown from a place with a different stack. So mutating the exnref' would give you a stack trace that's a mix of where it was rethrown from and where it was thrown and later reified.

That is, the problem stems from the proposal having too much functionality to be efficiently implementable (assuming the engine/language wants stack traces). And that means the problem cannot be solved by adding more functionality. Plus, we know from .NET that the functionality this proposal provides really is more than is necessary to support its intended language features, like C++ destructors and exception rethrowing.

@aheejin
Copy link
Member

aheejin commented Mar 16, 2020

Because the exnref can be stored, it can be rethrown from a place with a different stack. So mutating the exnref' would give you a stack trace that's a mix of where it was rethrown from and where it was thrown and later reified.

Mixup of stack traces is possible in this case as you said, but this is not a recommended use case and stack trace itself is not a mandatory part of the spec anyway. Stack traces are after all some auxiliary info mainly useful for debugging that can be embedded into exnref if the embedder chooses to do that, and I don't think the spec has to guard against some bizarre use case like this. If a toolchain (= wasm generator/compiler) wants to support reliable debugging experience, it simply should not generate such code.

That is, the problem stems from the proposal having too much functionality to be efficiently implementable (assuming the engine/language wants stack traces).

Not sure what you mean here. Do you mean there is a functionality we can get rid of while we have stack traces?

Plus, we know from .NET that the functionality this proposal provides really is more than is necessary to support its intended language features, like C++ destructors and exception rethrowing.

Again, I'm not sure what you'd like to imply here. What functionality do you want to get rid of from the current spec? What do you suggest as an alternative? Also I don't know much about .NET's primitive instructions on exception handling, so I'm not sure if what they're doing can be applied to us. I'd appreciate if you give some introduction on what their primitives are like and how their environment compares with us.

@RossTate
Copy link
Contributor Author

RossTate commented Mar 16, 2020

Also I don't know much about .NET's primitive instructions on exception handling, so I'm not sure if what they're doing can be applied to us. I'd appreciate if you give some introduction on what their primitives are like and how their environment compares with us.

Sure. .NET has a try block with a single handler of 4 types: finally, fault, catch, and filter. finally is used for languages with finally clauses; it is executed regardless of how control flow exits. fault is used just for C++ destructors; it is executed only when an exception is being thrown but does not touch the exception. catch specifies an exception type; it is only executed when such an exception is thrown and only then provides access to the exception. filter specifies an exception-accepting block returning true/false; it is only executed if the exception-accepting block returned true.

As for throwing exceptions, throw takes an exception and semantically associates it with a stack trace at that point (but does not necessarily generate it then); any previously-associated stack trace is forgotten. rethrow does not take an exception and is only permitted inside catch blocks; it rethrows the exception being handled by the associated catch block. This restriction on rethrow ensures that the stack is still the same.

If a toolchain (= wasm generator/compiler) wants to support reliable debugging experience

This gets to another problem. The current proposal does not seem to support good debugging. Many debuggers want the stack to remain in tact when an uncaught exception is thrown. That means the process of determining whether the exception is or is not caught should preserve the stack. The reason why .NET has filter rather than just relying on rethrow is so that the filtering block can be run on a separate stack and the answer to whether the exception is caught can be determined without mangling the stack. But the current proposal bundles all of this functionality together, making it impossible to preserve the stack while determining whether an exception is caught. And yes, even .NET's restricted rethrow still makes it impossible to always preserve the stack, .NET's design keeps rethrow rare (only occurring where present in the surface language) whereas the current proposal makes rethrow extremely common.

P.S. In case it helps, this is the reference I am using: http://www.ecma-international.org/publications/standards/Ecma-335.htm

@aheejin
Copy link
Member

aheejin commented Mar 16, 2020

Also I don't know much about .NET's primitive instructions on exception handling, so I'm not sure if what they're doing can be applied to us. I'd appreciate if you give some introduction on what their primitives are like and how their environment compares with us.

Sure. .NET has a try block with a single handler of 4 types: finally, fault, catch, and filter. finally is used for languages with finally clauses; it is executed regardless of how control flow exits. fault is used just for C++ destructors; it is executed only when an exception is being thrown but does not touch the exception. catch specifies an exception type; it is only executed when such an exception is thrown and only then provides access to the exception. filter specifies an exception-accepting block returning true/false; it is only executed if the exception-accepting block returned true.

As for throwing exceptions, throw takes an exception and semantically associates it with a stack trace at that point (but does not necessarily generate it then); any previously-associated stack trace is forgotten. rethrow does not take an exception and is only permitted inside catch blocks; it rethrows the exception being handled by the associated catch block. This restriction on rethrow ensures that the stack is still the same.

I think here you are again suggesting that we go back to the first proposal..? I'm not sure if we should repeat all discussions happened in #101 and over our emails (which I prefer we do here in the repo next time, if we should really do that). And your reason for that is possible mixup of stack traces, which I still think is a very contrived use case. As I said, if a toolchain wants to provide reliable stack traces, it should not generate such code.

If a toolchain (= wasm generator/compiler) wants to support reliable debugging experience

This gets to another problem. The current proposal does not seem to support good debugging. Many debuggers want the stack to remain in tact when an uncaught exception is thrown. That means the process of determining whether the exception is or is not caught should preserve the stack.

I'm little confused by what you'd like to suggest in this paragraph (and the next). Are you suggesting that we should have two-phase stack unwinding, in which the first phase only tries to determine whether the current exception should be caught or not and the second phase actually unwinds the stack? I think this is a good thing to have in general, and C++ EH in x86/ARM has it, but they can have it because they actually unwind the stack within their libunwind library that communicates with the personality function in libc++abi. Can we make our language-neutral proposal have that kind of functionality, which probably should involve a callback-like function that the VM calls to check in every stack frame or something..? Can a wasm VM call a callback??

The reason why .NET has filter rather than just relying on rethrow is so that the filtering block can be run on a separate stack and the answer to whether the exception is caught can be determined without mangling the stack. But the current proposal bundles all of this functionality together, making it impossible to preserve the stack while determining whether an exception is caught. And yes, even .NET's restricted rethrow still makes it impossible to always preserve the stack, .NET's design keeps rethrow rare (only occurring where present in the surface language) whereas the current proposal makes rethrow extremely common.

I'm not very sure how the separate stack unwinding thing you mentioned work, but it will certainly require a very special VM support. (Can it be done by coroutines?) Do you suggest that we require all VMs that support the EH proposal should have that kind of capability? If so, I think we'd have to have a better reason than "There might be a toolchain that tries to mixup stack traces".

@RossTate
Copy link
Contributor Author

I think here you are again suggesting that we go back to the first proposal..?

No. The first and second proposals are not the only two options. There are many options. I am pointing out that .NET, a system that supports C++ and many other languages, provides another option, and I am illustrating the rationale behind its design, something that does not appear to have happened in any prior discussion.

Are you suggesting that we should have two-phase stack unwinding, in which the first phase only tries to determine whether the current exception should be caught or not and the second phase actually unwinds the stack?

I am pointing out that this is important for good debugging support, and I am illustrating how the current proposal does not support this. I am also not making it necessary to support; I am trying to be compatible with multiple implementation strategies.

Can we make our language-neutral proposal have that kind of functionality...?

I mentioned .NET's filter design because it is a language-neutral way to support such functionality. However, as with any comparable system, some adaptations would be appropriate if we were to use such a design for WebAssembly.

Do you suggest that we require all VMs that support the EH proposal should have that kind of capability?

No. I am pointing out the implementation complexity such a design seems to require so that we can decide if it is appropriate. That said, I also believe there is a clean way to make "filterable" exceptions an additional feature, rather than part of core exception handling, so that not every engine has to support it. In fact, it might even be that there's a way to make this support both double-walking and single-walking strategies, with the former providing better debugging support, so that there's no implementation constraint at all.

@aheejin
Copy link
Member

aheejin commented Mar 17, 2020

I briefly checked how .NET supports general C++ throw;, which can occur anywhere including function calls, with the rethrow primitive only permitted within catch blocks as you said, and it seems what you described is not the full picture. This document clarifies there are two kinds of rethrows: one is the "managed rethrow", which can only occur within catch blocks. I think this is what you described. But this cannot handle arbitrary throw; in C++ within function calls, so rethrows of this type can only be caught as an exception of type SEHException, which does not support stack traces. I'm not very sure if we should mimic this two kinds of rethrow model.

About your suggestions on possible two-phase stack unwinding or exception filtering, please note that we have two levels of testing or filtering. The first level, what br_on_exn checks, is whether this is a C++ exception or not (in case of C++). This means every C++ exception is gonna be caught, even if it gets rethrown afterwards. The second-level check is whether this exception matches some actual C++ language clause (such as catch (int)), and this is done in application code with help from LSDA data and personality function. (This is an illustration how we transform code to call the personality function. In x86/ARM, the personality function is called from libunwind during stack unwinding process. We can't do that because the unwinding is done by the VM). So, if we talk about the first-level check, two-phase unwinding is not very useful anyway, because every exception is gonna be caught (and some are gonna be rethrown). If we mean the second level, the language level check (such as if this is an int? or a std::exception&?), this level of check is buried in the application and library code and I'm not sure the VM can query this while unwinding the stack.

@RossTate
Copy link
Contributor Author

Nice find! And thanks to the link to EHScheme.md; it's so far been easy to follow (though I can't say I've managed to read it all yet).

I'm not very sure if we should mimic this two kinds of rethrow model.

If we provide the restricted throw and rethrow pattern, you can simulate C++'s functionality by using @lukewagner's suggestion of importing a "get_trace" function returning an anyref and making the anyref trace be part of the __cpp_exception event's payload. After the MVP we could add more functionality to optimize this pattern through direct support, but this seems to be enough functionality to at least support this pattern reasonably well for now. It would also work for other languages whose semantics effectively force eager stack tracing.

So, if we talk about the first-level check, two-phase unwinding is not very useful anyway, because every exception is gonna be caught (and some are gonna be rethrown). If we mean the second level, the language level check (such as if this is an int? or a std::exception&?), this level of check is buried in the application and library code and I'm not sure the VM can query this while unwinding the stack.

Right. Any approach that effectively bundles "filtering" and "unwinding" together makes two-phase unwinding impossible. If one separates these two, then the big question tends to be how to do "filtering". If the filtering process is "nearly pure" (put aside details for now), then the semantics supports both two-phase and single-phase unwinding. If the filtering process is built-in, then it's faster to implement in general and easier to implement while still supporting two-phase (and generally trivial to guarantee is nearly pure). The challenge is that usually the built-in process is specialized for a given language, and WebAssembly needs a language-agnostic filter process, which might force our hand into customizable filters.

@aheejin
Copy link
Member

aheejin commented Mar 17, 2020

I haven't managed to read all of your last comment yet, and I have to go now for today (I'm in a different timezone now), but I forgot to mention that the EHScheme doc has some out-of-date parts. It was first written in 2017, when we tried to use Itanium IR, so I think this part does not hold anymore, and I didn't mean to ask you to read all of that anyway. Mostly, what I wanted to refer to in that doc is why we need to insert calls to the personality function in the user code, which is because we don't have the two-phase stack unwinding.

@RossTate
Copy link
Contributor Author

Ah, thanks for the clarification. I do worry that such an involved scheme won't scale well to other languages or to stacks crossing between multiple modules.

@aheejin
Copy link
Member

aheejin commented Mar 19, 2020

Nice find! And thanks to the link to EHScheme.md; it's so far been easy to follow (though I can't say I've managed to read it all yet).

I'm not very sure if we should mimic this two kinds of rethrow model.

If we provide the restricted throw and rethrow pattern, you can simulate C++'s functionality by using @lukewagner's suggestion of importing a "get_trace" function returning an anyref and making the anyref trace be part of the __cpp_exception event's payload. After the MVP we could add more functionality to optimize this pattern through direct support, but this seems to be enough functionality to at least support this pattern reasonably well for now. It would also work for other languages whose semantics effectively force eager stack tracing.

I don't understand what you mean by "the restricted throw and rethrow pattern". As I pointed out there are two kinds of rethrows in .NET C++ support, and while I'm not sure which version you are referring to as the restricted rethrow, if we are gonna mimic them, we also need both rethrows to fully support C++'s rethrow because it can occur anywhere including function calls. I also don't follow very well what you mean by get_trace function and stuff. (I thought I understood Luke's comment, but I'm not sure if I follow this one)

So, if we talk about the first-level check, two-phase unwinding is not very useful anyway, because every exception is gonna be caught (and some are gonna be rethrown). If we mean the second level, the language level check (such as if this is an int? or a std::exception&?), this level of check is buried in the application and library code and I'm not sure the VM can query this while unwinding the stack.

Right. Any approach that effectively bundles "filtering" and "unwinding" together makes two-phase unwinding impossible. If one separates these two, then the big question tends to be how to do "filtering". If the filtering process is "nearly pure" (put aside details for now), then the semantics supports both two-phase and single-phase unwinding. If the filtering process is built-in, then it's faster to implement in general and easier to implement while still supporting two-phase (and generally trivial to guarantee is nearly pure). The challenge is that usually the built-in process is specialized for a given language, and WebAssembly needs a language-agnostic filter process, which might force our hand into customizable filters.

Yes, the very nature of stack unwinding done by VM in wasm makes two-phase unwinding very hard. The idea of a VM calling into a filter function, which depends on the data structures within the application, is hard. Making it language-agnostic is harder, if not infeasible.

Ah, thanks for the clarification. I do worry that such an involved scheme won't scale well to other languages or to stacks crossing between multiple modules.

This infeasibility of VM calling into filter functions is why the scheme was designed that way. I don't necessarily agree that the scheme is 'complex'; the doc is long, but the part I'm referring to can be summarized in a sentence: "Because the unwinder cannot call a filter function, compiler inserts filter function calls to landingpads." This is at least language agnostic. I'm not sure what you suggest as a better language-agnostic alternative. Also please note that the EH scheme doc I linked above is not a part of the spec; it is an example of illustration on how a C++ toolchain can work with the spec. Other C++ toolchains and toolchains for other languages can choose other schemes as long as they provide semantically correct functionalities.

Overall, I have little idea what you are suggesting as a whole. I'm not even sure if all these things you mentioned pertain to preventing eager stack tracing, let alone if that is a very serious problem that warrants a whole redesign.

@rossberg
Copy link
Member

Here is one small concrete suggestion to support at least first-level filtering: add an optional tag list to a catch clause. The clause would only be invoked for a thrown exception if its tag is in the list. That would make the difference between catch-all clauses and more specific handlers explicit.

While that can't express filtering on more than the outermost tag (such as needed for C++ exceptions, or ML-style exception pattern matches), it still allows more efficient handler dispatch in many cases, depending on how user code makes use of the tags. C++ likely would benefit least, because of its ubiquitous destructors mapping to catch-all handlers dominating most unwinding, but other languages might be much better off.

@RossTate
Copy link
Contributor Author

RossTate commented Mar 19, 2020

Here is one small concrete suggestion to support at least first-level filtering: add an optional tag list to a catch clause. The clause would only be invoked for a thrown exception if its tag is in the list. That would make the difference between catch-all clauses and more specific handlers explicit.

This won't help. catch-all clauses are more frequent than filtered-catch clauses in the current design because they are used to implement destructors.

Overall, I have little idea what you are suggesting as a whole.

I am suggesting that this is a bad tradeoff [revised below]:

Pros for this design over something closer to .NET:

  • Built-in support for specifically C++'s rethrow construct preserving stack traces

Cons for this design over something closer to .NET:

  • Permanently inefficient exceptions (due to stack tracing, which most systems will want, needing to be done eagerly)
  • Permanent lack of built-in debugging support, such as what two-phase unwinding would provide

And those are only the cons it sounds like we all agree on. And the only pro can likely be addressed by adding more features later, making it only a short-term advantage. And it's not even clear to me that that's properly handled by the current design, since its implementation assumes all C++ programs on the stack are using the same exnref stack, which won't be true once we have separate C++ modules calling into each other.

@rossberg
Copy link
Member

This won't help. catch-all clauses are more frequent than filtered-catch clauses in the current design because they are used to implement destructors.

In C++, yes, as mentioned in the 2nd paragraph. However, other langs would benefit.

  • Built-in support for specifically C++'s rethrow construct preserving stack traces

This isn't specific to C++. Other language implementations also want to preserve stack traces when rethrowing an exception value previously caught, even if they don't use explicit syntax for it. JS and Ocaml are examples I know off the top of my head.

@RossTate
Copy link
Contributor Author

This isn't specific to C++.

Oops, yeah, I got mixed up with the other complication that C++'s throw; does not directly reference the exception to rethrow. Regardless, eager stack tracing could easily be provided without built-in support, say be importing an get_stack_trace function and making the resulting anyref part of the event's payload. On the other hand, debugging support is not nearly so easy to provide without built-in support.

In C++, yes, as mentioned in the 2nd paragraph. However, other langs would benefit.

Other languages still have finally, using, and filtered catch clauses. Without custom filtering, the optional tag list would be such a coarse filter that it'll catch most exceptions anyways (assuming most exceptions pass through catch clauses in the same module).

Revised trade-off summary:

Pros for this design over something closer to .NET:

  • Built-in support for trace-preserving rethrow (though this is easily implementable without built-in support)

Cons for this design over something closer to .NET:

  • Permanently inefficient exceptions (due to stack tracing, which most systems will want, needing to be done eagerly)
  • Permanent lack of built-in debugging support, such as what two-phase unwinding would provide

@aheejin
Copy link
Member

aheejin commented Mar 19, 2020

I don't think we have any clear idea on what you mean by "something closer to .NET" in the first place. As I said, .NET has a kind of weird model, with two kinds of rethrows, for a reason.

@RossTate
Copy link
Contributor Author

Here are the three most important steps:

  1. Exceptions are only reified when caught (e.g. no exnref provided when just executing destructors or finally clauses)
  2. Rethrow is restricted to within its associate catch clause
    • Languages whose semantics force eager stack tracing, like C++, would not use rethrow but would instead make the stack trace part of the payload. This addresses .NET's oddity.
  3. Filter before catching as much as possible

For custom filters, I realized how .NET does this, and it's much simpler than I had thought. (One of those "duh" moments.) You just run the filter on the tail of the stack being walked while giving it the pointer to its relevant stack frame, similar to how generators are implemented.

@aheejin
Copy link
Member

aheejin commented Mar 19, 2020

Here are the three most important steps:

  1. Exceptions are only reified when caught (e.g. no exnref provided when just executing destructors or finally clauses)

What's the benefit of this? As long as exnref can be reified, we need reference types.

  1. Rethrow is restricted to within its associate catch clause

    • Languages whose semantics force eager stack tracing, like C++, would not use rethrow but would instead make the stack trace part of the payload. This addresses .NET's oddity.
  • Making C++ library dependent on VM or requiring VM a function like get_trace() doesn't sound very plausible.
  • If we do this, we can't use wasm rethrows for C++ ever, even for destructor frames within catches. This seems hacky, and C++ is currently our most important target.
  • And this is eager tracing anyway, isn't it? What are we improving?
  1. Filter before catching as much as possible

For custom filters, I realized how .NET does this, and it's much simpler than I had thought. (One of those "duh" moments.) You just run the filter on the tail of the stack being walked while giving it the pointer to its relevant stack frame, similar to how generators are implemented.

I don't understand what this part means. What is 'the tail of the stack'? Are you talking about the first-level check (C++ or not) or the second-level check (Is it std::exception& or int)? If it's the latter, we should run some part of application code (in C++ that includes application code + personality function in libc++abi) from the VM, which I don't know how to achieve.

At this point I'm not even sure the series of things you are suggesting is relevant to preventing eager stack tracing anymore, which I'm still not convinced as a critical problem that warrants a whole redesign and reimplementation in the first place, also which does not improve anything (and rather make things hacky) for C++, currently the target language we have most client requests for.

@RossTate
Copy link
Contributor Author

RossTate commented Mar 19, 2020

What's the benefit of this? As long as exnref can be reified, we need reference types.

The stack trace is no longer implicitly associated with the exnref. Since the exception has been caught, the stack trace isn't necessary. And the stack is still in tact, so if you rethrow (which you can only do while still in the catch clause), then the stack will still be preserved.

Making C++ library dependent on VM or requiring VM a function like get_trace() doesn't sound very plausible.

Then add a get_trace instruction. Traces are useful for more than just exceptions after all.

If we do this, we can't use wasm rethrows for C++ ever, even for destructor frames within catches. This seems hacky, and C++ is currently our most important target.

No. If you're still in the catch clause, you can still use rethrow.

And this is eager tracing anyway, isn't it? What are we improving?

  1. C++ exceptions are not particularly well designed for efficiency. All those hurdles you are jumping through is because of this. We are making it so that only languages with inefficient exceptions have inefficient exceptions.
  2. What I mentioned was the simplest solution. A slightly fancier one has a get_full_trace instruction executable only within a catch clause that gets the full stack trace of the currently caught exception. C++ code would compile to use this only when the handler has the possibility of indirectly rethrowing in a nested call via throw;, a possibility that's often easy to rule out. So even C++'s inefficient exceptions would be made more efficient with this redesign.
  3. Also, since this redesign more directly supports the more common patterns, it makes for smaller binaries. As an experiment, I took the code below and compiled it with the current design (-fwasm-exceptions -O3) versus a revised design, and the revised design reduced the body of foo from 119 instructions to 59 instructions.
class Resource {
public:
  Resource(int);
  ~Resource();
};

void func(Resource& rec1, Resource& rec2);

int foo(bool b) {
  Resource rec1(1);
  try {
    Resource rec2(2);
    func(rec1, rec2);
  } catch (int val) {
    if (!val)
        throw;
    return val;
  }
  return 6;
}

So C++ stands to benefit as well from more direct debugging support, from faster exceptions due to less frequent stack tracing, and from smaller binaries.

@binji
Copy link
Member

binji commented Mar 19, 2020

As an experiment, I took the code below and compiled it with the current design (-fwasm-exceptions -O3) versus a revised design, and the revised design reduced the body of foo from 119 instructions to 59 instructions.

Can you share this as a gist?

@RossTate
Copy link
Contributor Author

Sure. Here's the diff: https://linediff.com/?id=5e741e53687f4bf8498b4567

Changes:

  • Removed personality code for manually supporting two-phase unwinding since now such support is built in
  • Removed library calls for rethrowing because now directly supported (in this case—more on this below)
  • Used finally to consolidate destructor code and reduce manual control flow (for older C++ code, you would need a separate fault clause for destructors because they have different semantics during stack unwinding due to exceptions)
  • Since catch now specifies an event and implicitly stores the stack trace, changed it to simply put the event's payload on the stack (made the payload be the pair of the rtti and the value)
  • The above change to catch, along with the change to the design of rethrow, made exnref unnecessary, so removed all the code for storing/fetching the exnref
  • Used a finally clause to restore the global stack pointer before executing the function (no matter how), which also made it reasonable to assume that all called functions conformed to the same convention

@RossTate
Copy link
Contributor Author

Oh, forgot to talk about C++'s rethrow more. Suppose code in module A catches some C++ exception. While handling that exception, it calls some function imported from module B. That imported function than executes the compilation of throw;. The question is, what exception will be rethrown?

With the current compilation strategy, I suspect the answer will be the wrong exception. Module A and Module B, both being compiled from C++, will each be maintaining their own stacks and, in particular, their own stacks of exceptions currently being handled. The compilation of throw; looks at this stack and throws the exception on the top of it. The problem is that the exception currently being handled was caught by Module A, and so is in A's exception stack, whereas the compilation of throw; was executed by Module B, and so will inspect B's exception stack. This mismatch will lead to an error, either by throwing the wrong an exception that happens to be on B's stack or by trapping because there is no such exception.

Because of this, I suspect the ideal long-term solution to throw; is to use something like resumable exceptions. Let's add an event __cpp_rethrow that has no parameters and instead has both the rtti and the value as a result, i.e. the payload of a __cpp_exception.

We then compile throw; to throw __cpp_rethrow; throw __cpp_exception. The first throw __cpp_rethrow will walk up the stack and eventually return with the exception payload to rethrow, which the second throw __cpp_exception then throws.

As for compiling a C++ catch, we translate it to the following pattern:

catch __cpp_exception
  if rtt does not match expected type: rethrow
  try
    execute handler
  catch __cpp_rethrow
    resume the __cpp_rethrow with payload of the __cpp_exception
  end_try
end_try

This pattern will work even across C++ module boundaries. It also makes it unnecessary to maintain exception stacks.

If you want to maintain stack traces, then we just tweak the strategy above. We add anyref to the payload for __cpp_exception (and to the result of __cpp_rethrow). We use the null reference to indicate that no stack has been associated with the exception yet, i.e. because we are collecting the stack trace lazily. When we catch the __cpp_rethrow, if that anyref is null then we execute an instruction to get the full stack trace of the __cpp_exception (which is still in tact) and resume with that as the associated stack trace.

This pattern lets us stack trace lazily in the common case, even for C++, only using eager stack tracing when a compilation of throw; is executed.

@aheejin
Copy link
Member

aheejin commented Mar 20, 2020

No. If you're still in the catch clause, you can still use rethrow.

Isn't rethrow supposed to generate stack traces as it goes on? Are we gonna tell rethrow not to generate stack trace only in C++, because it is in payload?

  1. C++ exceptions are not particularly well designed for efficiency. All those hurdles you are jumping through is because of this. We are making it so that only languages with inefficient exceptions have inefficient exceptions.

OK now it is not even about eager tracing anymore.
What's C++'s inefficiency in design compared to other languages?

  1. What I mentioned was the simplest solution.

That's... such a bold claim without much justification.

A slightly fancier one has a get_full_trace instruction executable only within a catch clause that gets the full stack trace of the currently caught exception. C++ code would compile to use this only when the handler has the possibility of indirectly rethrowing in a nested call via throw;, a possibility that's often easy to rule out. So even C++'s inefficient exceptions would be made more efficient with this redesign.

This condition is equivalent to when there's not a single function call within a whole catch body, which I think is basically none. (There are at least some library function calls.)

Sure. Here's the diff: https://linediff.com/?id=5e741e53687f4bf8498b4567

Changes:

  • Removed personality code for manually supporting two-phase unwinding since now such support is built in

We discussed the infeasibilty of making VM call some random application code. You just assume it's solved automagically.

You also removed not only a personality function call but also all the necessary code after that. What the personality function returns is the exception pointer and a selector. Given the selector, we need to run a series of instructions to find the right code (= C++ catch clause) to run, which compares the selector to other values. After we arrive at the right code to run, we call library functions like __cxa_begin_catch and __cxa_end_catch to handle it. You just deleted all these and say you reduced 40% code size.

  • Removed library calls for rethrowing because now directly supported (in this case—more on this below)

You cannot replace a call to __cxa_rethrow with a single wasm rethrow instruction. They are not equivalent. There are many things __cxa_rethrow does other than just rethrowing. Here's the library code. Within the library code we use wasm rethrow builtin, which is lowered to wasm rethrow instruction. (This code is LLVM central repo and this does not contain this wasm changes yet. And this part of the work is in progress)

  • Used finally to consolidate destructor code and reduce manual control flow (for older C++ code, you would need a separate fault clause for destructors because they have different semantics during stack unwinding due to exceptions)

I think I addressed why introducing finally in clang does not necessarily reduce code size in #101 (comment). Clang compiles away finally in some languages like obj-c as well.

  • The above change to catch, along with the change to the design of rethrow, made exnref unnecessary, so removed all the code for storing/fetching the exnref

As I said, __cxa_rethrow call is different from a wasm single rethrow. We need exnref in order to pass it to __cxa_rethrow, unless we are gonna inline the whole body of __cxa_rethrow every time.

@aheejin
Copy link
Member

aheejin commented Mar 20, 2020

And while these discussion and brainstorming can continue for a long time, I'd like to paste what I wrote to you in one of the replies to your email:

Thank you for your interests and suggestions. While it is always interesting and learning to do many iterations of design and thought experiments, we had to settle down to something at some point, and now the toolchain and V8 have more or less working implementation for this, which we put significant amount of efforts into. So while we very much appreciate your suggestions and insights, we may not be able to easily overhaul the whole proposal, especially when all the reasons that made us switch from the first to the current proposal still stand. Anyway, if you still want to discuss this, I think Github is a better place.

@RossTate
Copy link
Contributor Author

Isn't rethrow supposed to generate stack traces as it goes on? Are we gonna tell rethrow not to generate stack trace only in C++, because it is in payload?

In most cases, the anyref in the payload will be null, with the stack trace generated only after it has escaped uncaught. In cases with nested throw;, the anyref in the payload will be the stack trace of the original exception, whereas the lazily generated stack trace will be that of the nested throw;. I figured that was a reasonable compromise. This, too, can be improved by adding more stack-walking functionality.

This condition is equivalent to when there's not a single function call within a whole catch body, which I think is basically none. (There are at least some library function calls.)

No it isn't. There's a noexcept specifier in C++ that functions can use to declare that they do not throw exceptions. There's even a built in noexcept operator than uses that specifier to tell you whether an expression might throw an exception. You don't even have to do an interprocedural analysis if you don't want to, though it might be wise since really all we're worried about is ruling out throw;, which is even less common than throw e;. (The library functions you are referring to are implementation artifacts, not part of the C++ semantics. Any exceptions they throw are not part of the C++ semantics and only occur when undefined behavior has happened.)

We discussed the infeasibilty of making VM call some random application code. You just assume it's solved automagically.

No, I mentioned that it's solvable using a standard technique, say by adapting the technique on slide 8 of this lecture from our undergrad compilers class.

You also removed not only a personality function call but also all the necessary code after that. What the personality function returns is the exception pointer and a selector. Given the selector, we need to run a series of instructions to find the right code (= C++ catch clause) to run, which compares the selector to other values.

Right. That's why I changed the payload to be an rtti and a value (i.e. the exception pointer). The first thing my code does is check the rtti of the exception with the rtti of int (which I didn't know the magic constant for, so I used 1). It rethrows if that doesn't match. If there were more cases, it would check with them successively. If one of the cases had non-trivial subtypes, then rather than an equality check it would call an internal (so guaranteed to be error free except when undefined behavior has happened) function to do the subtyping check. So in fact, all this functionality of the personality function is addressed.

After we arrive at the right code to run, we call library functions like __cxa_begin_catch and __cxa_end_catch to handle it.

Right. These are used to keep track of the current "catch" stack so that we know which exception to rethrow when a nested throw; occurs. In this particular case, I know that can't happen, so I can remove them. More generally, these wouldn't necessary if one were to use the __cpp_rethrow design I mentioned.

You just deleted all these and say you reduced 40% code size.

40% of the code you are generating is just dealing with the fact that WebAssembly has poor control-flow/stack-walking primitives. Isn't that reason enough to improve WebAssembly? Not to mention that your code assumes that every C++ wasm module on the stack shares the same "catch" stack.

You cannot replace a call to __cxa_rethrow with a single wasm rethrow instruction. They are not equivalent. There are many things __cxa_rethrow does other than just rethrowing. Here's the library code.

I've already read that code. I know what it does. It isn't necessary for this example, or more generally for whenever you can guarantee throw; does not occur within a nested function call in the C++ catch clause, so long as I remove the matching surrounding __cxa_begin_catch and __cxa_end_catch calls, which I did.

So, we know my code is different, but the real question here is, is there a bug in my code? (Supposing we put in the correct magic constant for the rtti for int.) Because if it's correct, it's also half the size, more efficient, enables built-in debugging support, and—if we add resumable exceptions or generators—works correctly across module boundaries, all of which seem like substantial improvements.

@dschuff
Copy link
Member

dschuff commented Mar 21, 2020

I've already read that code. I know what it does. It isn't necessary for this example,

I’d like to repeat the objection that it’s not fair to compare the current output of the compiler (which is currently in the state of “let’s get it working first, for the general case, using something close to the existing C++ ABI”) against hand-written assembly, written without any regard for the existing ABI and optimized for this specific simple case.
Most of the optimizations you made here could be done independently of any suggested changes to the primitives. Conflating them is not helping your core argument.

We discussed the infeasibilty of making VM call some random application code. You just assume it's solved automagically.

No, I mentioned that it's solvable using a standard technique, say by adapting the technique on slide 8 of this lecture from our undergrad compilers class.

Before we start accusing each other of falling asleep in undergrad compilers class.... Just because something can be implemented using a "standard" technique, doesn't mean it can be done anywhere. Web VM folks have historically been very resistant to this kind of tight mixing of trusted and untrusted code (a concrete example that comes to mind: issues with calling untrusted code during GC and exposing GC details mean JS has never had the anything like the kind of finalizers that Java and .NET have).

WRT dynamic exception filters, running them on a separate stack or stack segment or imposing some other kind of restriction might address some of those concerns (I don't recall the details of that discussion if that we had before, I'd be interested if VM folks have thoughts on that).

But let's step back to the original issue.

If we keep the requirement that we should at least allow for disassociated rethrow to work (which the .NET scheme does not) then we need to allow exnref objects to escape catch clauses.
Ideally we could have something that would allow stack trace construction to be incremental (and therefore have a cost proportional to throw distance), or allow the cost to be explicitly incurred or avoided (as having an explicit stack trace capture would).

In the current scheme, can this be accomplished by only adding frames to the trace on rethrow (or on pass-through, for frames that do not have any catches)?
The thrown exnref can still escape but it can't actually be inspected within wasm, so therefore the trace doesn't have to be complete until the exception propagates all the way out to the environment or some final outer catch-all that wants to reify it explicitly.
So as @lukewagner suggests, we maybe can't use the "standard" JS way of capturing a string at throw time, but could we propagate the exception out to JS as some other kind of object, with a method to get the current trace?

Ideally we would also allow also user-controlled 2-phase unwinding. But your assertion here seems to be that the current scheme forever precludes it. I don't think I agree with that; e.g. I can imagine making that work well with resumable exceptions.

@RossTate
Copy link
Contributor Author

RossTate commented Mar 21, 2020

Before we start accusing each other of falling asleep in undergrad compilers class....

Oops, I totally see how that sentence reads like that. Just meant to illustrate that I wasn't relying on something magical, and that was the first reference that came to mind. I am sorry for the harmfully sloppy writing.

Web VM folks have historically been very resistant to this kind of tight mixing of trusted and untrusted code

I get this principle, but I don't see how it applies here. The stack is the program's stack, and the code being executed is the program's code. There aren't different levels of trust being mixed together here. If you can explain, I'd greatly appreciate it.

But let's step back to the original issue.

Okay. I have some updates on this.

  1. A major premise of this conversation (which seems to have been inherited over years, so I'm not sure where this premise came from originally), is that C++'s throw; needs to preserve the stack trace. This isn't part of the spec, but I figured it was a standard convention (my C++ days were primarily as a console video-game developer, where conventions are rather different for obvious reasons). But I just tested this with g++ and gdb, and the convention isn't true there either. This is great for @aheejin, because it means the __cxa_rethrow-and-family functions do not need to be modified to store/fetch an exnref on the "catch" stack; you just need to store the payload (comprised of just i32s) and __cxa_rethrow can just throw a new __cpp_exception with that payload. This also means there's no longer a good reason for an exnref to escape its catch clause, which means it can be removed entirely. This is great for WebAssembly because that alone removes 8% of the instructions (which were just for locally storing/getting the exnref), and means that exceptions are no longer dependent on reference types (which embedded devices might particularly like).

  2. This conversation has been very focused on a specific language, a specific ABI, and a specific compiler. But WebAssembly is supposed to be language/ABI/compiler agnostic, so it's worth looking at what other languages (with real conventions for stack traces) do with rethrowing exceptions. As already discussed, C# generates a new stack trace upon rethrowing. Java keeps the original stack trace. Python actually mixes the stack trace of the original and the rethrow together. So if we want to really support stack tracing, WebAssembly probably needs to provide more basic stack primitives with which languages can implement their own stack-tracing semantics.

  3. On that note, it's worth going into another detail of C#/.NET. The C#/.NET filters we talked about are executed before any stack unwinding is done. This isn't an implementation detail; it's visible in the semantics. In particular, the custom filter can be stateful, meaning any changes to the state that would be made by finallys and destructors could affect the filter if they were executed before the filter. So in terms of stack primitives, there's stack walking (finding a handler) and there's stack unwinding, and C#/.NET keeps these primitves separate. On the other hand, the current proposal bundles them together: the stack unwinding is already executed and internally visible by the time catch is executed before it even has a chance to consider whether it wants to actually catch the exception. Sure, if an engine really wants it can "preserve" the stack in a sense, but by the time we can figure out that the exception gets to the top all the destructors will have been run, the resources and memory would have been freed, and many of the stack frames would have been mangled, making for a pretty unhelpful stack. For the same reason, it'd be hard to ever add (potentially-)resumable exceptions because, in the process of walking the stack to find out if it might be resumed, any code that didn't recognize that exception would have ran its destructors and so can no longer be resumed. For resumable exceptions, you need to separate stack-walking code from stack-unwinding code.

@aheejin
Copy link
Member

aheejin commented Mar 23, 2020

You made a bunch of different suggestions within the last several weeks.

... maybe more. I don't remember.

At this point I'm honestly not sure if there is any specific part of the proposal you want to change. It looks like you want to remake all things from scratch.

As I said, we are not in the initial design and brainstorming stage anymore, and while some of what you suggest might be good to have, I don't think that necessarily means we should overhaul the proposal from scratch. Some of them might be not trivial to implement in Web VM settings. Some of them might not be critical enough to nullify all the engineering work done so far. Some of them can be implemented as a future complementary proposal, like a resumable exception. I'm not sure if "My proposal is the best and the simplest so you should have it right now in this proposal" kind of attitude really helps.

And I suggest, if you really want to discuss all of those N things, it'd be better to make a single issue per suggestion. In this issue alone you made N different suggestions which are not about eager tracing anymore and it's really hard to track them. I can't say we are able to take all those suggestions, but some of them might invoke good discussions so we can either make some small adjustments or it can be carried over to the next proposal so it's not lost, if discussions are done in a more organized and trackable way.

A bit of comments on your last comment:

  1. A major premise of this conversation (which seems to have been inherited over years, so I'm not sure where this premise came from originally), is that C++'s throw; needs to preserve the stack trace. This isn't part of the spec, but I figured it was a standard convention (my C++ days were primarily as a console video-game developer, where conventions are rather different for obvious reasons). But I just tested this with g++ and gdb, and the convention isn't true there either. This is great for @aheejin, because it means the __cxa_rethrow-and-family functions do not need to be modified to store/fetch an exnref on the "catch" stack; you just need to store the payload (comprised of just i32s) and __cxa_rethrow can just throw a new __cpp_exception with that payload. This also means there's no longer a good reason for an exnref to escape its catch clause, which means it can be removed entirely. This is great for WebAssembly because that alone removes 8% of the instructions (which were just for locally storing/getting the exnref), and means that exceptions are no longer dependent on reference types (which embedded devices might particularly like).
  2. This conversation has been very focused on a specific language, a specific ABI, and a specific compiler. But WebAssembly is supposed to be language/ABI/compiler agnostic, so it's worth looking at what other languages (with real conventions for stack traces) do with rethrowing exceptions. As already discussed, C# generates a new stack trace upon rethrowing. Java keeps the original stack trace. Python actually mixes the stack trace of the original and the rethrow together. So if we want to really support stack tracing, WebAssembly probably needs to provide more basic stack primitives with which languages can implement their own stack-tracing semantics.

Thanks for letting me know that __cxa_rethrow does not preserve the stack trace. It's good to know that I don't need to implement it in C++. But as you said, there are still other languages that do preserve it. I also checked myself and Python and Java seem to preserve the stack traces, and they also allow rethrowing within nested function calls. (I didn't check C# because I didn't have the environment installed.) So I'm not sure why that we don't need to do this in C++ means we don't need exnref.

And exnref was not only added for __cxa_rethrow. It was added for general expressiveness. Without that various code transformation in toolchain suffers a lot.

  1. On that note, it's worth going into another detail of C#/.NET. The C#/.NET filters we talked about are executed before any stack unwinding is done. This isn't an implementation detail; it's visible in the semantics. In particular, the custom filter can be stateful, meaning any changes to the state that would be made by finallys and destructors could affect the filter if they were executed before the filter. So in terms of stack primitives, there's stack walking (finding a handler) and there's stack unwinding, and C#/.NET keeps these primitves separate. On the other hand, the current proposal bundles them together: the stack unwinding is already executed and internally visible by the time catch is executed before it even has a chance to consider whether it wants to actually catch the exception. Sure, if an engine really wants it can "preserve" the stack in a sense, but by the time we can figure out that the exception gets to the top all the destructors will have been run, the resources and memory would have been freed, and many of the stack frames would have been mangled, making for a pretty unhelpful stack. For the same reason, it'd be hard to ever add (potentially-)resumable exceptions because, in the process of walking the stack to find out if it might be resumed, any code that didn't recognize that exception would have ran its destructors and so can no longer be resumed. For resumable exceptions, you need to separate stack-walking code from stack-unwinding code.

I think this paragraph basically means "This proposal, without two-phase stack unwinding, does not support resumable exceptions", right? I don't know why this proposal has to support resumable exceptions in the first place. It will be a separate proposal, and it may have two-phase stack unwinding with filters if it needs to. It will be likely to have a separate set of primitives anyway.

@RossTate
Copy link
Contributor Author

I think this paragraph basically means "This proposal, without two-phase stack unwinding, does not support resumable exceptions", right? I don't know why this proposal has to support resumable exceptions in the first place. It will be a separate proposal, and it may have two-phase stack unwinding with filters if it needs to. It will be likely to have a separate set of primitives anyway.

No, this paragraph explains why you can't add resumable exceptions. Per your request, I filed a separate issue going into this topic in more depth (see #104).

I don't want to make a bunch of simultaneous interrelated issues because then we get cross-referencing conversations that are hard to review or jump into. So for possible future reference I'll just make notes on other unresolved issues that came up in this thread (but not expected to be resolved in this thread, except possibly the first):

  1. Forces (inefficient) eager stack tracing (if exnref conventionally has the stack trace of the original throw)
  2. No debugging support for pausing execution at a point where an uncaught exception is thrown while keeping the stack intact (related to Forwards-incompatible with resumable exceptions #104)
  3. Support for Java stack-tracing semantics but not for other stack-tracing semantics such as Python's (which I mentioned is not the same—try throwing the exception in a nested function call and reraising it via raise; in another nested function call), which contradicts WebAssembly's language-agnostic philosophy
  4. __cbx_rethrow works incorrectly across linked C++ modules unless those modules are both linked to a common module responsible for maintaining the "catch" stack
  5. Given that the common case is to rethrow an exception from within its catch clause, binaries could be made smaller by optimizing for this common case. Same goes for adding finally to consolidate clean up code (both destructors and unwinding of the global stack pointer)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants