-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More rationale on the choice of pre-order and control structures in the binary format #295
Comments
Side note: I believe @LouisLaf had concerns in #239 (comment), the surrounding discussion provides good context. Agreed that we should clarify. |
On Thu, Aug 13, 2015 at 3:10 AM, Wouter van Oortmerssen <
if (x) TS else FS If we want to write a one-pass JIT, then we want to: 0.) step 0.) is essentially the pre-order visitation point. Steps 2) and 4) are
One important point is that most engines (V8 included) don't properly
|
@titzer: thanks for the detailed reply. I guess it comes down to what use cases are deemed most important. A statement of the form "while pre order and post order are similar from a binary encoding perspective, pre-order works best for A B and C, while post-order would have advantages for X Y and Z. because of this, we chose pre-order." would be good. I see your point about the control structures. Slightly clumsier to generate, but easier to work with for the wasm implementation, I agree that that's a win. I'm curious what effect this will have on future programming language support. Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated? How will this "error" percolate up to the programmer? |
@gwvo "Allows context-dependent index spaces (described above)" It's clever, but I can't comment on whether it saves a substantial amount of space. "Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated?" Emscripten apparently translates arbitrary control flow into a weird loop+switch construct. It's awkward, but it shows that you can translate arbitrary control flow to JavaScript. (see http://davideglintine-new.googlecode.com/hg/docs/paper.pdf) var label = 0;
while(1) switch(label) {
case 0: label = 1; break;
case 1: label = 2; break;
case 2: return;
} I think WebAssembly can expose the same functionality in a way that can be directly translated to JavaScript, but can also just as easily generate more efficient code for VMs that support arbitrary control flow. You just need a node that means "var label = 0; while(1) switch(label) {}", and a node that means "label = literal int; break;" |
FWIW, while I initially proposed (and implemented, in polyfill-prototype-1), context-dependent index spaces as a way to reduce contention for single-byte opcode encodings when there are many ops, more thinking about what was achievable through macros and module-local index spaces caused me to change my mind and remove the section describing it (but not the reference, as you've pointed out, which I can nix now) from BinaryEncoding.md. |
@AndrewScheidecker : thanks for the explanation.. though I personally think remapping opcode assignments is a bit overly flexible, I can see the use of it, and I'm fine with it. It could be clearer though. As for the loop + switch construct, I presume that gets used only in "desperate" cases, because clearly that takes up more space than the equivalent gotos. |
The loop+switch construct would indeed be larger than the equivalent gotos, The benefit that a such specialized construct provides is to enable a On Wed, Aug 19, 2015 at 5:51 PM, Wouter van Oortmerssen <
|
Also note that this is an area where WebAssembly is likely to grow beyond the initial features set in the MVP, so loop+switch is not necessarily the end of the story. FutureFeatures has a brief section mentioning some options. |
@lukewagner Noting your comment based on experience with polyfill-prototype-1 and support for macros, has it been decided that there will no be 'context-dependent index spaces'? I would support such a position as I think it will make it much more difficult to make tools that work with the binary code if the index are context-dependent. I think that the use of what I would call 'vectors' in polyfill-prototype-1, where the number of elements in vectors (or lists) is emitted before each of these elements are emitted, would also complicate tools that manipulate the binary code. For example, adding or removing elements could not be streamed, rather the entire vector would need to be re-generated and the new length determined before any of it could be emitted. I see that the binary design precludes tagging, but if this did not increase the size too much then it might help solve a range of challenges. There appear to be the following objects in polyfill-prototype-1: uint32, int32, null terminated string, f32, f64, nodes. If someone could come up with a relatively efficient way to tag these then it might still deserve consideration. A lot of effort seems to have been put into the encoding efficiency in polyfill-prototype-1. In a number of places I was wondering if it was really worth the extra complexity. It might be cleaner if some of the encoding compression were layered and shared. If it helped then using variable bit length fields might be less complex if layered and might do better. I see that a polyfill is inbound on emscripten, but it does not appear to reflect the current binary design, and are there any plans to re-do this, or is the design being worked out in the spec? Will the polyfill support all existing asm.js code, even style not supported in wasm such as globals, so that it can be used on existing asm.js code, or would this be better done with a separate polyfill? |
I think there are still split opinions on this, but I am currently in favor of removing. In #287 and IRL we've agreed to postpone these kinds of purely binary encoding choices until after we have "finished" the definition of the AST and semantics and are able to generate realistic-sized ASTs (using the S-Expr temporary text format) which we can then use to do apples-to-apples A/B tests for these types of questions.
Agreed that layering can reduce the complexity in the base layer; this is reflected in the current binary encoding design (where only the "raw" layer is implemented by a browser). As noted in the README.md and name, the polyfill-prototype was only intended as a proof of concept to demonstrate the point that: (1) a polyfill was possible/effective, (2) a binary format could show significant size wins over text, even after gzip/lzma (both points had doubt before). I should add that the design started with a pure/simple preorder encoding and each subsequent complication was only added after showing significant (>1%) wins after gzip. But the prototype doesn't consider the use of a user-land "layer 1" which allows similar wins without the native decoder complexity. (Indeed I think a polyfill could actually be folded into the layer 1 decoding, blurring the line between the two and allowing us much greater freedom to decouple asm.js-polyfill concerns from wasm.)
That feature will not be advertised as "WebAssembly", but just a "size optimization" flag that addresses real user requests we are getting today. Likely we'd want to also throw in user-mode lzham compression on top (for another ~25% win) which also avoids the need to configure the server for HTTP Content-Encoding. |
As we've moved to a post-order format and other finegrain items subsume most of this, closing. |
This section:
https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#serialized-ast
could use some more rationale as to why a pre-order encoding is being used.
It also says "Allows context-dependent index spaces (described above)", which is not obvious what that refers to, nor do I see how that would depend on it being pre-order.
In fact, I can see advantages to post-order representation:
an otherwise very linear parser. Pre-order can be equally efficient if you want, but it fits recursive
parsing more naturally.
Related to that, some more rationale on requiring backends to use a relooper-like algorithm to produce a limited set of control structures. I understand it results in the absolutely smallest binary, but the algorithm is not entirely non-trivial, and it feels.. less elegant to require this when most backends will have a branch/graph based representation they're working from, and V8 etc presumably also needing to convert this back to a more general representation.
What alternatives have been considered? If the top level of a block is a list of "statements", could a (conditional) branch node that refers to a statement index work also, at a small cost in binary size, but be more generic in what kind of structures it represents, being easier to generate and decode etc.?
The text was updated successfully, but these errors were encountered: