Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move arities from branches to blocks #741

Closed
wants to merge 3 commits into from
Closed

Move arities from branches to blocks #741

wants to merge 3 commits into from

Conversation

rossberg
Copy link
Member

@rossberg rossberg commented Aug 2, 2016

No description provided.

| `block` | `0x01` | | begin a sequence of expressions, the last of which yields a value |
| `loop` | `0x02` | | begin a block which can also form control flow loops |
| `if` | `0x03` | | begin if expression |
| `block` | `0x01` | arity : `varuint1` | begin a sequence of expressions, the last of which yields a value |
Copy link

@ghost ghost Aug 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a bit lost here on the 'stack machine' proposal. It's not 'expressions' now is it, and it's not the 'the last' that yields a value, rather the block returns the first arity elements from the stack? Does it discard the rest or is it a requirement that stack be empty after that? Does the stack need to be empty after a break operator too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked the text to not be misleading. However, the docs will need a substantial makeover to reflect the move to a stack machine that is outside the scope of this PR.

| `loop` | `0x02` | | begin a block which can also form control flow loops |
| `if` | `0x03` | | begin if expression |
| `block` | `0x01` | arity : `varuint1` | begin a sequence of expressions, yielding 0 or 1 values |
| `loop` | `0x02` | arity : `varuint1` | begin a block which can form control flow loops |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop having an arity is awkward, because branches targeting a loop label are effectively required to have arity 0. A common validation strategy would record an arity of 0 in the control-flow stack entry for a loop, and then need to separately record the loop arity somewhere else.

Restricting loop to arity 0 would simplify it, and obviate its arity immediate. loops are less frequent than blocks, so the code size win from using loop's return value(s) directly are likely to be small, especially after accounting for the arity immediates it would add to every loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's worth discussing, but should probably be a separate PR. Restricting the semantics of loops that way would be a design change, while this PR is only meant to change the binary encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand: branches to a loop must have arity 0. A loop's arity can be entirely determined by the stack depth at fallthrough. @MikeHolman for the regalloc scheme you guys were talking about, is there any issue for loops in a multi-value setting? I'd assume not since there are no forward branches, so no join point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukewagner The loop arity would describe loop's own return value, and not to branches to the loop top. This is non-obvious, so I created #742 to remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but my point is that, unlike branches where the end is a join from fallthrough and branches, the loop's end cannot be the target of branches, so it seems fine to simply define the arity as the stack depth at end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I still like the symmetry of giving branch target immediates a direction bit; in that case the arity of loop would be meaningful in the same way as block.

@rossberg
Copy link
Member Author

rossberg commented Aug 2, 2016

One more downside I just realised about moving arities from branches to blocks is that it requires changing the S-expr format: the arity now has to be explicit everywhere, and the same would probably be true for any other text format. For example, it requires changing all our tests (even if we allow omitting arity 0 as a shorthand).

The nice thing about arity on branches was that e.g. in the S-expr form, you could just derive it from the syntax, and it was only explicitly needed in the flat op form.

Together with the size increase of if's this reduces the attractiveness of the change even further, and I'm almost tempted to suggest backing out of this part of the proposal...

@kripken
Copy link
Member

kripken commented Aug 2, 2016

Those do seem like good reasons to back out of this. What is the other side of the tradeoff?

@rossberg
Copy link
Member Author

rossberg commented Aug 2, 2016

@kripken, the main advantage of the change is that it simplifies the validation algorithm slightly (and perhaps 1-pass codegen), because you always know all arities upfront.

@titzer
Copy link

titzer commented Aug 2, 2016

Changing all the tests does seem like a lot of work, unless it can be automated somehow, or sugar can be introduced for the (assumedly) common case of 0-arity blocks. I can try implementing this proposal to see for sure if the validator gets simpler. If it doesn't really, then I'd support backing out this proposal as well.

@rossberg
Copy link
Member Author

rossberg commented Aug 2, 2016

@titzer, I already introduced that sugar, but there are many uses of blocks in our tests that aren't 0. The change isn't trivial to automate, because you need to derive the type of each block to adapt it. Of course, such a tool could be written, but that probably is more work than doing it manually.

I'll give it another go tomorrow and see how bad it is. But I wonder more about the long-term trade-off.

@titzer
Copy link

titzer commented Aug 4, 2016

Given we got a little bit of cold feet for the block/if arity change, can we split this PR into two and land the uncontroversial removal of call/return arities first?

@rossberg
Copy link
Member Author

rossberg commented Aug 5, 2016

@titzer, done: #748

@rossberg rossberg changed the title Remove arities for call/return; move arities from branches to blocks Move arities from branches to blocks Aug 5, 2016
@rossberg
Copy link
Member Author

rossberg commented Aug 5, 2016

I pondered about the br/block arities some more. As the one who proposed this change, I would now like to withdraw it, because there are several disadvantages I did not anticipate:

  • We are keeping if-labels. That vastly increases the cost of the change, probably pushing the balance way into the negative in many cases (e.g. 163K uses of if in AngryBots).
  • Worse, we are likely imposing the same cost on any block construct we gonna add in the future.
  • The move may necessitate additional design changes, like Remove loop's result value(s). #742.
  • It is an intrusive change to the (any) text format.

The only thing that remains on the plus side is a slight simplification in validation and one-pass code generation. But given that it is small, and everybody has already implemented the current format anyway, that does not seem worth it.

@MikeHolman
Copy link
Member

For the first point, that is not a concern if we add if0, which may or may not be premature optimization right now. The second point would also apply to any new branch constructs under the current scheme.

My concern is that this will be more complicated once we have multivalue brs. Knowing the arity (or better, signature) of a block up front allows for a more informed codegen.

I'm sure we can deal with the current semantics, but it is one of the few places where it feels like we are fighting against the language.

@sunfishcode sunfishcode modified the milestone: MVP Aug 6, 2016
@rossberg
Copy link
Member Author

@MikeHolman, we can equally add br0 and friends, so that is not pushing it in any particular direction (though in the past we decided against prematurely adding such shortcuts, which are best seen as opcode table / layer 1 problems).

The broader observation is that moving the arity to blocks is imposing the cost (in terms of either binary size or opcode duplication) on all block-like constructs we're ever going to add. And that is a much more open-ended set than the br-constructs. Moreover, it is only the br-constructs that actually need this information, so it makes more sense to associate it with them.

I don't think you need to worry about implementation complexity: Ben has already implemented full multi-value support for the current design, and it is straightforward.

@ghost
Copy link

ghost commented Aug 11, 2016

One advantage is that the block fall-though would then know the number of values to retain and could discard the rest. This might be a requirement for handling multiple values as values will remain on the stack (unless there are swap/drop sequence to remove them which I would not recommend). This requirement could also be achieve by the producer using a break to end a block.

It is also not clear that this would be 'an intrusive change to the text format.'. Would it really require blocks to include the arity, or could this just be implicit in the context, from the number of values require by the consumer of the block.

Would be interested to see Ben's 'full multi-value support for the current design'?

@lukewagner
Copy link
Member

@rossberg-chromium But the implementation concern is having the a priori knowledge of the arity of a block exit before entering a block and encountering any of its forward branches. I can see this being useful in a number single-pass compilation techniques in addition to the valid problems Chakra pointed out for their impl. Given the hundreds of opcodes yet to add, I'm not especially worried about a few *0 additions. Alternatively, depending on how "layer 1" works out; if it's like a macro layer and gives us 1-byte encodings of get_local 0 or i32.load offset=0 align=4 etc (generalizing the "opcode table with fixed immediates" idea), it could also give us *0 for all the block-y opcodes so we don't even need to actually duplicate them in layer 0. We've already forgone major size wins by not having Java-style get_local 0...get_local 3/set_local 0...set_local 3, so it feels like, one way or another, layer 1 is going to give us something like this.

@rossberg
Copy link
Member Author

@lukewagner, yeah, the size concern may be a wash either way, we won't know without a lot of measurements. But that was just one of the arguments.

I am a bit surprised about the implementation concerns, and would like to understand them better. After all, I am simply arguing for sticking with what has been the status quo for a long time, and I haven't heard such arguments coming up before. What has changed?

@lukewagner
Copy link
Member

With the stack machine, we have a clear path to multi-value (blocks, returns, everything). With 0xb (and even explicit-drop), one could simply assume every single block returned a single (possibly "void") value.

@ghost
Copy link

ghost commented Aug 14, 2016

@rossberg-chromium I presume your reference to Ben's 'full multi-value support for the current design' is https://codereview.chromium.org/2176653002/ If so then it is interesting and great to see support for calls returning multiple values. However:

  1. The design seems to depend on the block fall-through returning all the remaining values on the stack which clashes with multiple value support where it might not be possible to discard some values up the stack.
  2. The design seems to have no provision for accessing the values up the stack. Some code can be expressed, such as returning two values and then subtracting them, but if they need to be subtracted in the reverse order then this does not appear possible.

Thus it seems to need more development.

I see that branches still unwind, so (block-old ... ) == (block-new ... (br 0 ...)). If the blocks have an 'arity' then the fall-though can be equivalent to a trailing branch and the differences between these block semantics are gone.

At the same time the requirement to explicitly drop is also gone as values can just be left on the stack. The drop operator can remain and producers might be wise to use it where it fits to both minimize the values stack depth for the runtime and to reduce the live set, but it might not be a hard requirement.

It looks like it might be best if all values not returned at the end of a block are required to be void which would dictate the use of drop and other operators, but this might be deemed too much of a burden on producers.

@qwertie
Copy link

qwertie commented Aug 14, 2016

Some code can be expressed, such as returning two values and then subtracting them, but if they need to be subtracted in the reverse order then this does not appear possible.

I think the most common case will be passing structures. For example, if a function returns a point (X, Y) which is then consumed by another function, there will be no need to reverse the order of the fields.

Of course, sometimes it is necessary to do more elaborate operations, but no one has presented data showing that operators that you've proposed like pick would provide major space savings to justify their cost. I'm not just talking about the implementation cost of having to support both local variables and stack manipulation operators. It also seems like the stack is harder to reason about, at least as long as stack locations are expressed as "distance from top of stack". It makes me uncomfortable that the same location on the stack, the same value, must be addressed by a different integer every time any value is pushed or popped.

@ghost
Copy link

ghost commented Aug 14, 2016

@qwertie Structures are generally accessed, and efficient access to the slots is needed, even functions using structures get inlined. I have some uses in mind for multiple value results that would be high frequency such as returning the number of dynamic values on a stack in linear memory plus one of more values that could be returned in values for efficiency in common cases.

At least multiple result values can be stored to locals with Ben's changes as the store-local operator does not push a void element.

The pick operator alone appears to be a trivial operator to implement, both for the SSA decoder and an interpreter. It could be emulated by storing to a local variable and reloading as needed, and clearly it is just less overhead for the runtime decoder to short-circuit this. It's just as easy for the SSA decoder to reason about.

The pick_and_void aka take operator might be a different matter as it mutates the stack. Perhaps the SSA decoder would have to check that the void elements were consistent at merge points, but it might be done efficiently by comparing bitmaps. Needs to be explored. It might also have significant benefits for the SSA decoding to know the live range, so should be a nett plus. For example, it could be a validation error for one branch of an if operator to void a stack element but not the other branch, and a stack element bitmap could be used to track this when validating.

Some example conversions of wasm code to exploit the pick operator might give results to help choose the encoding of the index to the values. I expect that an index relative to the top-of-stack would work well for producers inserting code or inlining etc, but it's not so nice for accessing elements relative to the start of a function.

Given that adding a pick operator enables the function arguments to be usefully moved onto the values stack I think it needs to be done before the MVP, as changing the calling conventions is a significant change.

@rossberg
Copy link
Member Author

On 14 August 2016 at 06:42, JSStats notifications@github.com wrote:

@rossberg-chromium https://github.com/rossberg-chromium I presume your
reference to Ben's 'full multi-value support for the current design' is
https://codereview.chromium.org/2176653002/ If so then it is interesting
and great to see support for calls returning multiple values. However:

The design seems to depend on the block fall-through returning all the
remaining values on the stack which clashes with multiple value support
where it might not be possible to discard some values up the stack.

That is intentional and consistent with the explicit-drop semantics; the
idea is that a block as such should not have any operational affect.

The design seems to have no provision for accessing the values up the
stack. Some code can be expressed, such as returning two values and then
subtracting them, but if they need to be subtracted in the reverse order
then this does not appear possible.

In terms of operator set, this is certainly not a full stack machine design
yet -- I don't think I claimed that. But adding additional operators like
pick is fully compatible with that implementation. (FWIW, you can already
"destructure" the top of the stack arbitrarily through a call to an
auxiliary function.)

I see that branches still unwind, so (block-old ... ) == (block-new ...
(br 0 ...)).

All control transfer (branch, return, unreachable, a future throw...)
unwinds the stack up to its target, because that's the only sensible thing
it can do. It's orthogonal to what the target is.

@sunfishcode sunfishcode mentioned this pull request Aug 16, 2016
@titzer
Copy link

titzer commented Sep 6, 2016

After a lot of discussion, closing this in favor of #765

@titzer titzer closed this Sep 6, 2016
@titzer titzer deleted the arities branch September 29, 2016 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants