More rationale on the choice of pre-order and control structures in the binary format #295

ghost · 2015-08-13T01:09:44Z

This section:
https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#serialized-ast
could use some more rationale as to why a pre-order encoding is being used.
It also says "Allows context-dependent index spaces (described above)", which is not obvious what that refers to, nor do I see how that would depend on it being pre-order.

In fact, I can see advantages to post-order representation:

Allows things like validation or building up a tree by using nothing but a temporary node stack, and
an otherwise very linear parser. Pre-order can be equally efficient if you want, but it fits recursive
parsing more naturally.
Closer fit to backends that use stack-machines in some way.

Related to that, some more rationale on requiring backends to use a relooper-like algorithm to produce a limited set of control structures. I understand it results in the absolutely smallest binary, but the algorithm is not entirely non-trivial, and it feels.. less elegant to require this when most backends will have a branch/graph based representation they're working from, and V8 etc presumably also needing to convert this back to a more general representation.

What alternatives have been considered? If the top level of a block is a list of "statements", could a (conditional) branch node that refers to a statement index work also, at a small cost in binary size, but be more generic in what kind of structures it represents, being easier to generate and decode etc.?

jfbastien · 2015-08-13T01:25:57Z

Side note: I believe @LouisLaf had concerns in #239 (comment), the surrounding discussion provides good context.

Agreed that we should clarify.

titzer · 2015-08-13T04:36:45Z

On Thu, Aug 13, 2015 at 3:10 AM, Wouter van Oortmerssen <
notifications@github.com> wrote:

This section:

https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#serialized-ast
could use some more rationale as to why a pre-order encoding is being used.
It also says "Allows context-dependent index spaces (described above)",
which is not obvious what that refers to, nor do I see how that would
depend on it being pre-order.

In fact, I can see advantages to post-order representation:

Allows things like validation or building up a tree by using nothing
but a temporary node stack, and an otherwise very linear parser. Pre-order
can be equally efficient if you want, but it fits recursive parsing more
naturally.

Closer fit to backends that use stack-machines in some way.

This only makes sense if your parser for the bytecode only does
interesting things at post-order time, i.e. when a syntax tree node is
completed by having its sub-trees completed. If your parser wants to do
interesting things at pre-order or in-order points, e.g. splitting an
SSA-renaming environment, or generating machine code, then a postorder
encoding will always imply a second pass. Concretely, suppose we were
writing a baseline JIT and we had:

if (x) TS else FS

If we want to write a one-pass JIT, then we want to:

0.)
1.) gen code for x
2.) gen a branch_if_false to a label for FS (unresolved)
3.) gen code for TS and then a jump to the end of the if (unresolved)
4.) bind the label for FS, patching the branch generated in step 2
5.) gen code for FS and fallthru
6.) patch the label generated in step 3

step 0.) is essentially the pre-order visitation point. Steps 2) and 4) are
in-order, and step 6) is the post-order point. If we have a post-order
encoding of the tree, there is no way to make a one-pass baseline JIT work,
since it would have to first scan pass the bytecode for x, TS, and FS
before reaching the if, keeping a stack of them (or recursing). But only
when you reach the if does it know in which order to generate code and that
there should be branches in between. But worse, it's still too early, since
this if could clearly be nested in some other structure which is on the
stack. You essentially have to wait until you construct a full AST and then
visit it again.

Related to that, some more rationale on requiring backends to use a
relooper-like algorithm to produce a limited set of control structures. I
understand it results in the absolutely smallest binary, but the algorithm
is not entirely non-trivial, and it feels.. less elegant to require this
when most backends will have a branch/graph based representation they're
working from, and V8 etc presumably also needing to convert this back to a
more general representation.

What alternatives have been considered? If the top level of a block is a
list of "statements", could a (conditional) branch node that refers to a
statement index work also, at a small cost in binary size, but be more
generic in what kind of structures it represents, being easier to generate
and decode etc.?

There was actually a lot of discussion about this and the benefits of
structured control flow seem to be highly worth preserving. It's not
primarily about encoding size. The encoding size might be more rightly
considered a happy accident.

One important point is that most engines (V8 included) don't properly
support irreducible control flow graphs, so unrestricted gotos would have
to be either policed or internally processed before making it to the
optimizing compiler. Or, optimizing compilers in the major browser engines
would have to be upgraded to handle irreducible control flow specifically
for wasm. Side note: JVMs generally don't optimize bytecode methods with
irreducible graphs either; they are stuck in the interpreter.

—
Reply to this email directly or view it on GitHub
#295.

ghost · 2015-08-14T18:12:10Z

@titzer: thanks for the detailed reply. I guess it comes down to what use cases are deemed most important. A statement of the form "while pre order and post order are similar from a binary encoding perspective, pre-order works best for A B and C, while post-order would have advantages for X Y and Z. because of this, we chose pre-order." would be good.

I see your point about the control structures. Slightly clumsier to generate, but easier to work with for the wasm implementation, I agree that that's a win. I'm curious what effect this will have on future programming language support. Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated? How will this "error" percolate up to the programmer?

AndrewScheidecker · 2015-08-17T21:35:03Z

@gwvo "Allows context-dependent index spaces (described above)"
I think what that is saying is that with pre-order serialization, the encoding of opcodes etc can be dependent on the context. For example, the polyfill prototype partitions opcodes by result type, and requires knowledge of the result type to decode a subtree. That knowledge is nearly always implicit, so it rarely needs to store it.

It's clever, but I can't comment on whether it saves a substantial amount of space.

"Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated?"

Emscripten apparently translates arbitrary control flow into a weird loop+switch construct. It's awkward, but it shows that you can translate arbitrary control flow to JavaScript. (see http://davideglintine-new.googlecode.com/hg/docs/paper.pdf)

var label = 0;
while(1) switch(label) {
    case 0: label = 1; break;
    case 1: label = 2; break;
    case 2: return;
}

I think WebAssembly can expose the same functionality in a way that can be directly translated to JavaScript, but can also just as easily generate more efficient code for VMs that support arbitrary control flow. You just need a node that means "var label = 0; while(1) switch(label) {}", and a node that means "label = literal int; break;"

lukewagner · 2015-08-17T21:52:22Z

FWIW, while I initially proposed (and implemented, in polyfill-prototype-1), context-dependent index spaces as a way to reduce contention for single-byte opcode encodings when there are many ops, more thinking about what was achievable through macros and module-local index spaces caused me to change my mind and remove the section describing it (but not the reference, as you've pointed out, which I can nix now) from BinaryEncoding.md.

ghost · 2015-08-19T21:51:40Z

@AndrewScheidecker : thanks for the explanation.. though I personally think remapping opcode assignments is a bit overly flexible, I can see the use of it, and I'm fine with it. It could be clearer though.

As for the loop + switch construct, I presume that gets used only in "desperate" cases, because clearly that takes up more space than the equivalent gotos.

AndrewScheidecker · 2015-08-19T22:41:27Z

The loop+switch construct would indeed be larger than the equivalent gotos,
but I believe it's rarely needed. I just looked at a small Emscripten
compiled program I was using for testing and 9/679 "do"s used this do
switch pattern.

The benefit that a such specialized construct provides is to enable a
purely local transform of that construct into your choice of JavaScript, a
CFG, or bytecode with gotos. IMO it's worth frontloading such analysis at
compile-time instead of making the runtime do it, even if it makes the
binaries a little larger.

On Wed, Aug 19, 2015 at 5:51 PM, Wouter van Oortmerssen <
notifications@github.com> wrote:

@AndrewScheidecker https://github.com/AndrewScheidecker : thanks for
the explanation.. though I personally think remapping opcode assignments is
a bit overly flexible, I can see the use of it, and I'm fine with it. It
could be clearer though.

As for the loop + switch construct, I presume that gets used only in
"desperate" cases, because clearly that takes up more space than the
equivalent gotos.

—
Reply to this email directly or view it on GitHub
#295 (comment).

sunfishcode · 2015-08-19T22:53:47Z

Also note that this is an area where WebAssembly is likely to grow beyond the initial features set in the MVP, so loop+switch is not necessarily the end of the story. FutureFeatures has a brief section mentioning some options.

ghost · 2015-09-17T13:53:01Z

@lukewagner Noting your comment based on experience with polyfill-prototype-1 and support for macros, has it been decided that there will no be 'context-dependent index spaces'? I would support such a position as I think it will make it much more difficult to make tools that work with the binary code if the index are context-dependent.

I think that the use of what I would call 'vectors' in polyfill-prototype-1, where the number of elements in vectors (or lists) is emitted before each of these elements are emitted, would also complicate tools that manipulate the binary code. For example, adding or removing elements could not be streamed, rather the entire vector would need to be re-generated and the new length determined before any of it could be emitted.

I see that the binary design precludes tagging, but if this did not increase the size too much then it might help solve a range of challenges. There appear to be the following objects in polyfill-prototype-1: uint32, int32, null terminated string, f32, f64, nodes. If someone could come up with a relatively efficient way to tag these then it might still deserve consideration.

A lot of effort seems to have been put into the encoding efficiency in polyfill-prototype-1. In a number of places I was wondering if it was really worth the extra complexity. It might be cleaner if some of the encoding compression were layered and shared. If it helped then using variable bit length fields might be less complex if layered and might do better.

I see that a polyfill is inbound on emscripten, but it does not appear to reflect the current binary design, and are there any plans to re-do this, or is the design being worked out in the spec?

Will the polyfill support all existing asm.js code, even style not supported in wasm such as globals, so that it can be used on existing asm.js code, or would this be better done with a separate polyfill?

lukewagner · 2015-09-17T15:49:57Z

Noting your comment based on experience with polyfill-prototype-1 and support for macros, has it
been decided that there will no be 'context-dependent index spaces'?

I think there are still split opinions on this, but I am currently in favor of removing. In #287 and IRL we've agreed to postpone these kinds of purely binary encoding choices until after we have "finished" the definition of the AST and semantics and are able to generate realistic-sized ASTs (using the S-Expr temporary text format) which we can then use to do apples-to-apples A/B tests for these types of questions.

A lot of effort seems to have been put into the encoding efficiency in polyfill-prototype-1. In a
number of places I was wondering if it was really worth the extra complexity. It might be cleaner if
some of the encoding compression were layered and shared.

Agreed that layering can reduce the complexity in the base layer; this is reflected in the current binary encoding design (where only the "raw" layer is implemented by a browser). As noted in the README.md and name, the polyfill-prototype was only intended as a proof of concept to demonstrate the point that: (1) a polyfill was possible/effective, (2) a binary format could show significant size wins over text, even after gzip/lzma (both points had doubt before). I should add that the design started with a pure/simple preorder encoding and each subsequent complication was only added after showing significant (>1%) wins after gzip. But the prototype doesn't consider the use of a user-land "layer 1" which allows similar wins without the native decoder complexity. (Indeed I think a polyfill could actually be folded into the layer 1 decoding, blurring the line between the two and allowing us much greater freedom to decouple asm.js-polyfill concerns from wasm.)

I see that a polyfill is inbound on emscripten, but it does not appear to reflect the current binary
design, and are there any plans to re-do this, or is the design being worked out in the spec?

That feature will not be advertised as "WebAssembly", but just a "size optimization" flag that addresses real user requests we are getting today. Likely we'd want to also throw in user-mode lzham compression on top (for another ~25% win) which also avoids the need to configure the server for HTTP Content-Encoding.

flagxor · 2016-07-27T13:21:38Z

As we've moved to a post-order format and other finegrain items subsume most of this, closing.

lukewagner mentioned this issue Aug 17, 2015

Make WebAssembly more like assembly #299

Closed

sunfishcode added the binary format label Sep 1, 2015

sunfishcode added this to the MVP milestone Sep 1, 2015

jfbastien assigned titzer Feb 24, 2016

sunfishcode added the clarification label Jul 25, 2016

flagxor closed this as completed Jul 27, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More rationale on the choice of pre-order and control structures in the binary format #295

More rationale on the choice of pre-order and control structures in the binary format #295

ghost commented Aug 13, 2015

jfbastien commented Aug 13, 2015

titzer commented Aug 13, 2015

ghost commented Aug 14, 2015

AndrewScheidecker commented Aug 17, 2015

lukewagner commented Aug 17, 2015

ghost commented Aug 19, 2015

AndrewScheidecker commented Aug 19, 2015

sunfishcode commented Aug 19, 2015

ghost commented Sep 17, 2015

lukewagner commented Sep 17, 2015

flagxor commented Jul 27, 2016

More rationale on the choice of pre-order and control structures in the binary format #295

More rationale on the choice of pre-order and control structures in the binary format #295

Comments

ghost commented Aug 13, 2015

jfbastien commented Aug 13, 2015

titzer commented Aug 13, 2015

ghost commented Aug 14, 2015

AndrewScheidecker commented Aug 17, 2015

lukewagner commented Aug 17, 2015

ghost commented Aug 19, 2015

AndrewScheidecker commented Aug 19, 2015

sunfishcode commented Aug 19, 2015

ghost commented Sep 17, 2015

lukewagner commented Sep 17, 2015

flagxor commented Jul 27, 2016