Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nop in branch values when the target is not expecting a value. #518

Closed
ghost opened this issue Jan 19, 2016 · 15 comments
Closed

Nop in branch values when the target is not expecting a value. #518

ghost opened this issue Jan 19, 2016 · 15 comments
Milestone

Comments

@ghost
Copy link

ghost commented Jan 19, 2016

The V8 encoding uses a nop as filling in br operations see WebAssembly/wabt#20 Can the spec be modified to accept this usage?

The V8 encoding used the following pattern.

$ cat test4.wast
(module
  (func
    (block $exit
      (br $exit (nop)))))
$ ./wasm  test4.wast
test4.wast:4.7-4.23: arity mismatch
@AndrewScheidecker
Copy link

It seems like an inefficiency in the V8 encoding if it requires an explicit nop there when the context implies that the branch target doesn't expect a value.

@rossberg
Copy link
Member

This used to be allowed by the spec but is no longer. I would rather view the use of the nop opcode to encode an absent optional parameter a detail of the binary encoding, not part of the AST semantics.

@lukewagner
Copy link
Member

Actually, I just ran into this yesterday and I was appreciating the way it is currently spec'd since, if the expected type of a br or return is None, you know you don't have to recursively decode children. Otherwise, we'd need the binary format to distinguish (br $exit (nop)) from (br $exit) which means either two opcodes or one opcode and always adding a byte to a pretty common operator.

@titzer
Copy link

titzer commented Jan 19, 2016

It's really just a consequence of doing bottom-up verification in the V8
implementation. One nice property of bottom up is that checking an
expression is always context independent by definition. E.g. it doesn't
require outer context to tell that a br or br_if is expecting children
nodes. In fact, all operators have their arity fixed by their first few
bytes.

On Tue, Jan 19, 2016 at 4:28 PM, Luke Wagner notifications@github.com
wrote:

Actually, I just ran into this yesterday and I was appreciating the way it
is currently spec'd since, if the expected type of a br or return is None,
you know you don't have to recursively decode children. Otherwise, we'd
need the binary format to distinguish (br $exit (nop)) from (br $exit)
which means either two opcodes or one opcode and always adding a byte to a
pretty common operator.


Reply to this email directly or view it on GitHub
#518 (comment).

@lukewagner
Copy link
Member

Yes, that makes sense. I haven't really considered the ramifications of top-down (which is what's in ml-proto/spec/check.ml) vs. bottom-up. Thinking about this a bit more right now, if we want to move to a postorder serialization (which I do), then we wouldn't have parent context so we'd need a bottom-up algorithm.

Assuming bottom-up, for an op as common as br, it seems like a good idea to have an explicit nullary br/br_if AST node and opcode.

@titzer
Copy link

titzer commented Jan 19, 2016

On Tue, Jan 19, 2016 at 5:23 PM, Luke Wagner notifications@github.com
wrote:

Yes, that makes sense. I haven't really considered the ramifications of
top-down (which is what's in ml-proto/spec/check.ml) vs. bottom-up.
Thinking about this a bit more right now, if we want to move to a postorder
serialization (which I do), then we wouldn't have parent context so we'd
need a bottom-up algorithm.

Assuming bottom-up, for an op as common as br, it seems like a good idea
to have an explicit nullary br/br_if AST node and opcode.

Agree. That would also save a byte.


Reply to this email directly or view it on GitHub
#518 (comment).

@kg
Copy link
Contributor

kg commented Jan 19, 2016

Saving individual bytes (when most of them are 0 anyway) is way less important than ensuring the structure of the file is highly compressible. It does seem like requiring top-down validation (in order to know whether to read a child expression for the br/br_if) would be an unnecessary bit of complication, though. It's too magical, so we probably shouldn't do it.

I'd prefer that we just defined separate opcodes for br/br_if that accept a value, since the case where they accept a value is going to be incredibly rare compared to the no-value case. Then the shape of the opcode is always predictable and you don't have to do any inference. So +1 on Luke's nullary opcode suggestion.

@rossberg
Copy link
Member

As I pointed out elsewhere, introducing separate operators on the AST level would become extremely odd in the presence of multiple values (if we ever add them), where there would more than just 0 or 1 -- in the same way that it would be odd to have a separate operator for nullary calls.

I'd much prefer to consider optional operands, like lists of operands, an encoding level problem, not something that shows through in the AST semantics. As for the current V8 format, representing nullary by a pair of br opcode and nop opcode should already be perfectly compressible, I reckon?

FWIW, either way I don't believe this affects top-down vs bottom-up typing considerations at all.

@ghost
Copy link
Author

ghost commented Jan 20, 2016

Thank you for all the feedback. It's still not clear to me that this should fail. The expected type is void or empty-values, and the expression type is the same? It seems just a bug to assume that the expression returns values without checking it.

@rossberg-chromium I think we see multi-value support differently as I don't see a problem with a separate convenience opcode for the case in which there is no expression (br label). The multi-value case would be (br_exp label (values 1 2 3)) or (br_exp label (call $fn_returning_multiple_values ..)) etc and would not be (br_n label v1 v2 v3 ... vn).

@lukewagner Using the parent expected type to determine the number of expressions required by a child expression sounds very unfortunate. However it would be fine for a compressor to make coding choices based on context. Please let us know when you have some sketch of an alternative binary encoding as it might take some time to evaluate and it might be unnecessary to have all the details to form some opinions - for example we can run a bottom up compression predictor and compare it to a top down predictor.

@kg It might be too early to assume that 'the case where they accept a value is going to be incredibly rare compared to the no-value case' because the emitted code is just not yet optimize to take advantage of expressions at the block level. With multiple-value support the opportunities also increase.

@lukewagner
Copy link
Member

@rossberg-chromium Ah, good point: this is an expr option type, not simply an expr so we have some flexibility for how to encode option types. So I have this rough idea, already mentioned elsewhere, to extend opcode tables to be able to define not just the string operator name of each opcode but to also optionally fix any of the immediates. This would allow you to, e.g., have a 1-byte set_local 0, set_local 1, etc without having to explicitly provision these like Java does. Assuming one could fix None for optional fields in the opcode table, this would also allow a 1-byte (br 0 None).

@ghost
Copy link
Author

ghost commented Jan 20, 2016

@lukewagner Yes, I see some tints that an opcode table that can specify immediate arguments could be useful. Tried a table of extended opcodes that do this, but giving each a separate name. For example: set_local/ll for a set_local encoding variant that accepts two immediate labels and becomes an effective copy_local operator, and br/l for an immediate label, br/c for an immediate constant exp, br/n for no exp. Plus many combinations for the binary operators, immediate local and constant arguments. Similarly for the memory operators, plus allowing an immediate addition expression which again could have many combinations of immediate locals and constants. It improve the pre-compressed size and did not destroy the compression ratio but need to explore this further - some seem to help more than others. But this might all be abstracted in the planned opcode table, giving the producer flexibility to choose and optimize.

@kg
Copy link
Contributor

kg commented Jan 21, 2016

@kg It might be too early to assume that 'the case where they accept a value is going to be incredibly rare compared to the no-value case' because the emitted code is just not yet optimize to take advantage of expressions at the block level. With multiple-value support the opportunities also increase.

In the short-to-mid-term it seems unlikely that any code generator will actually use this feature. Maybe someone in the CG can correct me, but the only time I've seen break-with-value is in the .NET expression tree representation. It's not in llvm IR, the x86 or arm instruction sets, or in any language I've ever seen as a flow control primitive.

I agree that it could be compelling once compilers are able to utilize the feature, but until then we'd be paying the cost for it with every file shipped. To be fair, if I remember correctly the cost of an unneeded stream of Mostly Zeroes was on the order of 3% in my tests, so maybe we don't care about the difference.

This would allow you to, e.g., have a 1-byte set_local 0, set_local 1, etc without having to explicitly provision these like Java does. Assuming one could fix None for optional fields in the opcode table, this would also allow a 1-byte (br 0 None).

We've previously referred to this (in a more general sense) as non-nullary macros. It's a good idea to consider doing it in a more constrained fashion and just implement support for an opcode+immediate pair directly in the opcode table. It would be much easier to specify and implement.

@ghost
Copy link
Author

ghost commented Jan 21, 2016

@kg Fwiw here are the counts and percentages for br with no value, with a value from a label, and with a value from a general expression, for the zlib benchmark converted view asm.js, and yes 10% is not that significant, but I seem to recall it helped compression a little.

br/n                             825     3.12
br/l                              88     0.33
br/s                              19     0.07

Yes, I am developing an optimization pass that tries to better use the operations in this regard but these are also expected to reduce the use of br too. It's not necessary for 'compilers' to hold this back, post-processing code can further optimize the code in the interim. I don't think we can draw sound conclusions based on poor code generation. For example, it's not sound to focus on a compiler that only emits br_if and not if or if_else and conclude that it is optimal to remove if and if_else.

There seems to be some support now for exploring the 'opcode+immediate pair directly in the opcode table'. Let's see how this goes. Perhaps while exploring this people can stretch this to also handle the load/store-add patterns etc. Who else has an interest in working on thIs?

@sunfishcode sunfishcode added this to the MVP milestone Feb 12, 2016
@sunfishcode
Copy link
Member

An idea discussed recently is to change branches and other instructions have an arity immediate which would indicate the number of result values. It could be initially limited to 0 or 1, but would leave room for extending to multiple results in the future. This would also eliminate the need for nop opcodes before branches, because a branch with no results would just have a 0 arity.

@ghost
Copy link
Author

ghost commented Feb 27, 2016

@sunfishcode Thank you for looking into this issue. Depending on the goals of the post-order encoding some of the mv issues might be affected, but here are some points:

  1. There seems to be agreement that mv support would include a tuple/values operator and that this would support returning multiple values.
  2. There seems to be agreement that br would return all the values of a single expression. For example (br $l (call $fn_returning_three_values)) would return three values from the block.

This suggests that it would only be unnecessary for br to accept one expression and when a mv result was need the following pattern could be used: (br $l (tuple 1 2 3)). Under this interpretation (br $l (nop)) == (br $l (tuple)) and the v8 encoding seem consistent.

  1. There has been a br0/br1 proposal that might give a more compact encoding, and I would support this. Where (br0 $l) == (br1 $l (nop)).
  2. Note in these examples that the arity (one) does not equal the number of values returned (three), just as in the issue here the arity of (br $l (nop)) (one) does not equal the number of return values (zero). While not very practical, the following should arguably work at present too (br $l (call $fn_returning_zero_values)).
  3. If patterns such as (br $l (tuple 1 2 3)) are very common then a new operator could be added along with the mv support. For example (mv_br $l 1 2 3) == (br $l (tuple 1 2 3)) which could encoding the arity of the operation and could be defined to return one value for an arity of one.
  4. A move to a post-order encoding might change how expression values need to be handled if there is an expectation that the code can be interpreted literally and is not just an encoding of an AS. So I would like to see the port-order proposal to take another look at these issues.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants