
Block signatures #765

Merged: 3 commits merged into binary_0xc from block-sigs on Sep 14, 2016

Conversation

sunfishcode
Member

#741 proposed moving the arity immediates from branches to block/if, which has the advantage of giving decoders more information up front; however, the arities alone aren't enough for some use cases. This PR implements @MikeHolman's suggestion to provide full type information up front.

This proposes adding a new "block signature" type to the Type section, so block/if only need a single index immediate to specify their full type, even if they are extended to return multiple values post-MVP.

This gives us a greater benefit than #741 did at approximately the same code size cost -- one immediate per block and if. And if code size of the immediates becomes a concern, we can introduce an if0 (and similar opcodes for future block-like opcodes).

In my experiments, the decode time cost of indirecting into the type table to obtain the signature data was not significant.
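For illustration, a rough sketch of how this could look; the `block_sig` form and the `(type ...)` immediate are hypothetical syntax for this sketch, not a settled encoding:

```wast
;; hypothetical sketch only -- "block_sig" and "(type ...)" are illustrative syntax
(type $sig0 (block_sig (result i32)))   ;; Type-section entry describing a block's results
(func $f (result i32)
  block (type $sig0)   ;; the block's only immediate is an index into the Type section
    i32.const 1
  end)
```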

@lukewagner
Member

What I like about this change is:

  • Removes the only bit of "inference" from type checking, better matching the rest of wasm.
  • Removes the need for an internal "unknown" type state for blocks during validation.
  • For the same reason that a single-pass compiler would want to know the arity a priori, the type can be useful, e.g., to determine the register class of the block result.

@MikeHolman
Member

lgtm, this is exactly what I wanted!

@ghost

ghost commented Aug 16, 2016

Could some of the common high-frequency block signatures be given specified indexes? For example, block signature index 0 could be zero result values, and there could be specified indexes for the single-value core types, so index 1 would be (i32), index 2 (i64), etc. Further, could it be made invalid to have a duplicate signature in this table, to force a canonical encoding?

This might help the text format a lot, because there would be canonical encodings for the common cases; the text format might then not even need to encode the signature index in most cases and could still derive the signature from the text syntax.

@ghost

ghost commented Aug 17, 2016

If the signatures could also be required to be in a sorted order this might be even better. Then there is only one canonical table for a set of signatures and the validator can check this by simply checking that they are in order.
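To make the suggestion concrete, a hedged sketch of what such a canonical, duplicate-free, sorted table could look like (the syntax and the exact ordering are hypothetical):

```wast
;; hypothetical canonical block-signature table with conventional indexes
(type (block_sig))                ;; index 0: no result values
(type (block_sig (result i32)))   ;; index 1: (i32)
(type (block_sig (result i64)))   ;; index 2: (i64)
(type (block_sig (result f32)))   ;; index 3: (f32)
(type (block_sig (result f64)))   ;; index 4: (f64)
;; any multi-value signatures would follow in a defined sort order
```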

@titzer

titzer commented Aug 17, 2016

Maybe there is an encoding trick that we could use to obviate the need for signature entries until the multi-value era. E.g. what if the immediate is an inline value type (i.e. void, i32, i64, f32, f64), with one encoding being an index into the type table to be used when we have multi-values?
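Roughly, the immediate would name a value type directly in the common cases, with one reserved encoding deferring to the type table; a hedged sketch in hypothetical text syntax:

```wast
;; common cases: the block's immediate is an inline value type
block          ;; void result
  nop
end
block i32      ;; one i32 result
  i32.const 1
end
;; post-MVP, one reserved encoding could instead carry a Type-section index
;; for a multi-value signature, e.g. something like `block (type 5)`
```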

@ghost

ghost commented Aug 17, 2016

If indexes are predefined and specified for the block types used in the MVP, then the table will not be needed for the MVP, and there will be a canonical index for them. Perhaps consider sorting these in an order that will scale to multiple values.

@rossberg
Member

Ben and I have discussed the design space more extensively. There are 3 main proposals on the table now. Here are the pros and cons as we see them:

  1. Status quo: arity annotations on branches.
    Pros:

    • No change to text format or tests required.
    • No change to implementations required.
    • Avoids forcing annotations on all future blockish constructs.
    • Avoids redundant information (e.g. on blockish constructs not using branches).
    • Probably the most size-efficient solution without additional measures.

    Cons:

    • Knowing the type of a block is deferred until the first exit point.
    • With multi-values, potential deferred allocation in decoder.
  2. Proposed change #741 (Move arities from branches to blocks): arity annotations on blocks
    Pros:

    • With multi-values, no deferred allocation in decoder.

    Cons:

    • Requires changes to implementations.
    • Requires changing text format and many tests.
    • Forces annotations on all future blockish constructs.
  3. This proposal, #765 (Block signatures): type annotations on blocks
    Pros:

    • All types are known upfront.
    • For multi-values, no deferred allocation in decoder.

    Cons:

    • Requires changing implementations.
    • Requires extending type section?
    • Requires changing text format and many tests.
    • Forces annotations on all future blockish constructs.
    • Forces any code transformation tool to derive types if it wants to insert a blockish.
    • Least size-efficient without additional measures.

Overall, Ben and I still think option (1) remains the most attractive. We have implemented it, understand its pros and cons best, and don't need to worry about unforeseen implications late in the game. The trade-offs of the others are less obvious, and their advantages appear small (and biased towards consumers).

@rossberg
Member

I'd also like to point out one further advantage of the status quo that I just ran into: for the linear text format (like what will be visible in a debugger), having arities on branches is somewhat more user-friendly, because it is directly apparent how many values a branch operator consumes -- no need to search for the target (or rely on tooling). The information is where it belongs, that is, where it is used.

@lukewagner
Member

Requires changing implementations.

If the changes are quite simple, I don't think the bullet "Requires changing implementations" should be considered a con given that we are in the middle of changing this all anyway for 0xc.

Requires extending type section?

@titzer's idea above seems fine as a way to avoid the indirection, but again the cost here is low, especially since we know we're adding more forms to the type section anyway (that's the point of the form field).

Requires changing text format and many tests.

If there were a sixth, lowest constituency, I think this would be it.

Forces annotations on all future blockish constructs.

I can't see any reason why we wouldn't want that for the same reason we want them for the current set of block constructs. Also, other than, it sounds like, a try/finally block for which the type annotations seem fine, are we really anticipating a bunch more blocks?

Forces any code transformation tool to derive types if it wants to insert a blockish.

Are there any concrete examples of such a transformation that is valuable and can't be bothered to understand types? We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.

Least size-efficient without additional measures.

How is this any bigger than adding arity?

@lukewagner
Member

@rossberg-chromium Branch-with-value / block-with-signature is no different than call / function-signature. For both, devtools could provide tooltips, highlighting, etc. to make the information more immediate.

@qwertie

qwertie commented Aug 17, 2016

For the text format, there may be as much advantage as disadvantage: to the reader there may be some value in seeing up front what type a block will produce, even when there are no branches out. And type inference can be supported for anyone writing wasm, just as we're looking at supporting infix +.

@ghost

ghost commented Aug 17, 2016

@rossberg-chromium One key point, and perhaps even a show stopper, seems to be missing in the list of pros and cons:

We still need arity annotations on the fall-through for multi-value support. The strategy of just returning all the values remaining on the stack at the end of the block will significantly frustrate code using multiple values, or, with pick, code simply making multiple uses of values on the stack. With the status quo it would require writing values to locals and reloading just those needed, to clean up the stack for the fall-through, or using a branch at the end of the block. I think this is a show-stopper, so I strongly support annotating blocks with at least the number of values.
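A hedged sketch of the workaround being described, in hypothetical syntax where the fall-through yields every value still on the stack: an operand kept around for reuse has to be spilled to a local before `end`, rather than staying on the stack for pick-style reuse.

```wast
;; status quo, hypothetical multi-value future: the fall-through would yield
;; everything left on the stack, so the reused operand is spilled to a local
block
  i32.const 8      ;; operand we want to reuse inside the block
  set_local $t     ;; spilled; leaving it on the stack would make it part of
                   ;; the fall-through result
  get_local $t
  get_local $t
  i32.add          ;; the one value this block is meant to yield
end
```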

@ghost

ghost commented Aug 17, 2016

@qwertie I agree that some text formats might want to annotate blocks with their return type, but it would frustrate the text format if they were also required to annotate them all with the signature index in order to be lossless. Hence my suggestion that the signatures be required to be in sorted order, so there is always a canonical order and the text format does not need to annotate blocks with the signature index even if it annotates them with the type.

@ghost

ghost commented Aug 17, 2016

@rossberg-chromium I agree that a text format will be more familiar and readable if the number of result values is on the branches, and even when using values on the fall-through, but I still fully expect that a text format can do this after the change to putting the type on the block, as long as it is a lossless transform. Text format tools will need to be type-aware in order to deal with the stack machine encoding, and it seems consistent with the wasm approach to move some burden onto tools and off the runtime.

@rossberg
Member

On 17 August 2016 at 18:09, Luke Wagner notifications@github.com wrote:

Requires changing implementations.

If the changes are quite simple, I don't think the bullet "Requires changing implementations" should be considered a con given that we are in the middle of changing this all anyway for 0xc.

Hm, because there is other work to do it doesn't matter to produce more?

Requires extending type section?

@titzer's idea above seems fine as a way to avoid the indirection, but again the cost here is low, especially since we know we're adding more forms to the type section anyway (that's the point of the form field).

It's not a proper type, though, so it would represent an anomaly in the type section.

Requires changing text format and many tests.

If there were a sixth, lowest constituency (http://dev.w3.org/html5/html-design-principles/#priority-of-constituencies), I think this would be it.

Since you cite that hierarchy it's worth noting that it also puts "users" over "implementers". In our case, "users" = "producers", while we keep arguing for design changes motivated the other way round, including with the current proposal. ;)

Also, the text format actually is user-facing.

Forces annotations on all future blockish constructs.

I can't see any reason why we wouldn't want that for the same reason we want them for the current set of block constructs. Also, other than, it sounds like, a try/finally block for which the type annotations seem fine, are we really anticipating a bunch more blocks?

It's extra redundancy, and it puts the cost on the wrong side: the annotation is only needed for branches, but now you have to include it everywhere even if there isn't a single branch around. That seems silly for constructs like if and try in particular, for which not using their label is most likely the 99% use case.

Forces any code transformation tool to derive types if it wants to insert a blockish.

Are there any concrete examples of such a transformation that is valuable and can't be bothered to understand types? We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.

No, that's not right. We don't have type annotations on any generic operator right now. Even monomorphic operators only explicate their type as a textual naming convention which has no actual representation in the binary format. The only type annotations we currently have are on function boundaries.

There are e.g. many kinds of peephole optimisations that don't require understanding types. Or various instances of partial evaluation. I can't judge how relevant they are, but stuff like that is done for other bytecode formats.

Least size-efficient without additional measures.

How is this any bigger than adding arity?

Depends on the exact solution, but you either need extra entries in the type section, or inline type vectors. In both cases, the size is generally 1 without extra encoding hacks, and obviously linear with multi-values.

@lukewagner
Member

Hm, because there is other work to do it doesn't matter to produce more?

I'm saying it's in the area that is being changed anyway, so it's a change one way or another. Anyhow, it's very little work, mostly involves simplifying existing code, so this shouldn't be a deterrent regardless.

It's not a proper type, though, so would represent an anomaly in the type section.

Not every type in the types section will be able to show up in every context where an index into the types section is specified, so I don't think it makes sense to say it's not a "proper type". It's not like we're representing some Cartesian closed category here ;)

Forces annotations on all future blockish constructs.

Agreed and desired, for the same reason as adding block signatures in the first place.

It's extra redundancy, and it puts the cost on the wrong side: the annotation is only needed for branches, but now you have to include it everywhere even if there isn't a single branch around. That seems silly for constructs like if and try in particular, for which not using their label is most likely the 99% use case.

If size is an actual concern, we can have the specialized opcodes. But I expect layer 1 compression would obviate the need.

We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.

No, that's not right. We don't have type annotations on any generic operator right now.

What I mean is that the vast majority of ops are not generic.

There are e.g. many kinds of peephole optimisations that don't require understanding types. Or various instances of partial evaluation. I can't judge how relevant they are, but stuff like that is done for other bytecode formats.

Possibly, but I don't think we should constrain ourselves based on this level of speculation.

Also, the text format actually is user-facing.

Since we're aiming at low-level clarity with the current linear text format, not brevity or ease of writing by hand, I think it's actually useful to have the types returned by a block listed up front (just like having local types listed up front). For the experimental sugared syntaxes, it seems natural to drop these annotations when they are implied by the types of branches/fallthrough.

Depends on the exact solution, but you either need extra entries in the type section, or inline type vectors. In both cases, the size is generally 1 without extra encoding hacks, and obviously linear with multi-values.

Let's not consider inlining the entire type vector into each block. A few bytes of type section entries are negligible (and avoidable in the MVP). That leaves the size effectively equivalent to arities-on-blocks. We can measure the overall delta, though.

@titzer

titzer commented Aug 18, 2016

On Thu, Aug 18, 2016 at 5:28 PM, Luke Wagner notifications@github.com wrote:

Hm, because there is other work to do it doesn't matter to produce more?

I'm saying it's in the area that is being changed anyway, so it's a change one way or another. Anyhow, it's very little work, mostly involves simplifying existing code, so this shouldn't be a deterrent regardless.

After giving this more thought, I think there is no real advantage to having arities only on blockish things versus having type annotations on blockish things.

So one alternative, arities-only, seems to be out.

That leaves type-annotated blockish things (our set is now block, if, try_finally, try_catch, and try_catch_finally--since we are prototyping these), or backing off this change and keeping arities on branches.

I'm in the middle of implementing block type annotations throughout V8 on top of the stack machine branch, and the changes aren't trivial. Overall I haven't seen any code savings yet, either. Instead it makes decoding of blocks more complex and there are several gotcha special cases. The simplification of inference wasn't dramatic. I'm implementing the full multi-value semantics (on top of the multi-value semantics I already implemented). I'll report more when I am finished.


Also, the text format actually is user-facing.

Since we're aiming at low-level clarity, not brevity, nor optimizing ease
of writing by hand, with the current linear text format, I think it's
actually useful to have the types returned by a block listed up front (just
like having local types listed up front). For the experimental sugared
syntaxes, it seems natural to drop these annotations when they are implied
by the types of branches/fallthrough.

Actually I am not sure what we are aiming at here. I admit that part of me was enticed by the idea of all blocks being typed, but I am not experiencing the simplification in the decoder yet. I'm in the middle of changing over all of our internal tests for these experiments, and that's an annoying amount of work. It will probably be the same for the spec tests. I'd like to finish my implementation to say for sure, since it seems we've bitten ourselves a couple times by making changes before they're fully thought out.

Depends on the exact solution, but you either need extra entries in the type section, or inline type vectors. In both cases, the size is generally 1 without extra encoding hacks, and obviously linear with multi-values.

Let's not consider inlining the entire type vector into each block. A few bytes of type section entries are negligible (and avoidable in the MVP). That leaves the size effectively equivalent to arities-on-blocks. We can measure the overall delta, though.

I think requiring a types section just to use typed blocks (even if it's only for multi-values) is a no-go.

There are a couple of negatives that I already ran into, e.g. that one doesn't know how many values a br* will pop without looking at the block; so the control stack in the interpreter has to maintain that arity, too.

Other weird things are that with multi-values, br_table is actually polymorphic; it would pop a different number of values depending on which case was targeted. We'd have to disallow some (weird) legal cases and require that the arity of all blocks targeted in a br_table match.

Before there was a nice bottleneck in the decoder where all merge-y operations went through, and that was the natural place for both the inference and checking step. Now that bottleneck must serve two masters; fallthroughs have to be an exact match, while branches can have extra stuff in the middle of the stack.
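A hedged sketch (a fragment, in hypothetical syntax with result types on blocks) of the kind of case that would have to be disallowed: a single br_table whose targets have different arities.

```wast
;; hypothetical: br_table targets with mismatched result arities
block              ;; depth 1: yields nothing
  block i32        ;; depth 0: yields one i32
    i32.const 7
    get_local $i   ;; selector
    br_table 0 1 0 ;; a branch to depth 0 carries one value, to depth 1 none
  end
  drop
end
```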



@lukewagner
Member

I'll report more when I am finished.

Great, and thanks for the experimentation. We're also experimenting with how this works so we'll be able to compare notes in a bit.

I think requiring a types section just to use typed blocks (even if it's only for multi-values) is a no-go.

It's not a new types section; it'd just be a new form of entry in the existing types section. Especially with GC types, there will be quite a few new forms, so if the objection is the annoyance of going from 1->2 (which does turn the associated internal array from homogeneous to heterogeneous), then that will happen regardless with the current design and trajectory.

@ghost

ghost commented Aug 18, 2016

Re: 'br_table is actually polymorphic', it seems appropriate to 'require that the arity of all blocks targeted in a br_table match.'

Perhaps it is a little extra work for the interpreter to look at the target block's number of required values rather than have this be an immediate argument, but this would have been necessary anyway for the fall-through, or the end opcode would have needed a count too.

I see problems if the index space of the block types is shared with all other types: it would make having a canonical order even more problematic, and it might have a negative impact on code compression, so a separate table seems needed.

@Cellule

Cellule commented Aug 19, 2016

For the type section, we could share the same form as function types, but with no params. That way we could share an entry for blocks and functions if they happen to have the same signature.
The only reason I see to have a different form is to avoid the param_count field, but then again, if a function already uses a signature with no params then we could simply reuse it.

I'm in the middle of implementing block type annotations throughout V8 on top of the stack machine branch, and the changes aren't trivial. Overall I haven't seen any code savings yet, either. Instead it makes decoding of blocks more complex and there are several gotcha special cases.

I don't really understand how having the signature ahead of time would make decoding more complicated.
Right now, in Chakra, we already have to keep track of the type of the block for br yielding values. Plus, when we enter the block, because we don't know the type nor how many (0 or 1 right now) values might be yielded, we have to pre-reserve a space for that possible value.
I don't even know, with our current model, how we'll handle multi-value yields.
Having the signature would definitely make that easier.

For the binary space, one could argue that if we keep arity on br, then 1 block with multiple branches would have to spend 1 byte on every branch, whereas having that byte on the block would cost 1 byte for all the branches. Plus, if we really want to save more space, we could have the *0 opcodes to mean there are no values.
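A hedged sketch of that sharing (the block-level `(type ...)` syntax is hypothetical): one no-param signature entry in the Type section serving both a function and a block.

```wast
(module
  (type $ret_i32 (func (result i32)))   ;; a single no-param signature entry
  (func $f (type $ret_i32)              ;; used by the function...
    block (type $ret_i32)               ;; ...and, hypothetically, reused by a block
      i32.const 1
    end))
```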

@titzer

titzer commented Aug 19, 2016

On Thu, Aug 18, 2016 at 7:12 PM, Luke Wagner notifications@github.com wrote:

I'll report more when I am finished.

Great, and thanks for the experimentation. We're also experimenting with how this works so we'll be able to compare notes in a bit.

I think requiring a types section just to use typed blocks (even if it's only for multi-values) is a no-go.

It's not a new types section; it'd just be a new form of entry in the existing types section. Especially with GC types, there will be quite a few new forms, so if the objection is the annoyance of going from 1->2 (which does turn the associated internal array from homogeneous to heterogeneous), then that will happen regardless with the current design and trajectory.

No, I understand that part. I just don't think it makes sense to require a types section entry for doing local control flow, even if it would just be for multi-value blocks. It makes more sense to have the block type inline in all cases, thus not requiring a non-local section for something completely local.



@lukewagner
Member

@titzer Ok; it'd just be a size optimization, similar to not including the signature of a call_indirect inline (esp. now that they are structurally-matched and thus the particular index has no meaning). But if we did what you're suggesting for the MVP, we could make that decision post-MVP.

@rossberg
Member

rossberg commented Aug 23, 2016

@Cellule, the size argument you are making is exactly the one I was making when I originally proposed this. However, it is invalidated by two things: (1) br_table which is a single branch to many blocks (IIRC, we have some with 100K targets), and more importantly, (2) if, which typically has no branches. When I proposed the change I still assumed we would remove labels from if, so that it wouldn't affect the equation (IIRC, AngryBots has like 140K uses of if).

However, if we want to go this way (I still prefer we don't) I like the suggestion to reuse function signatures. That may in fact be more future-proof, because we might want to allow block operators to also consume values from the stack eventually.

@rossberg
Member

@lukewagner, getting a bit OT, but ever since we moved to structural types, and after looking at the design space for managed types, I have wondered about the distinction we are making between inline types and defined types. Once you add more type structure, it becomes increasingly ad-hoc.

Is there any particular reason left why we cannot make it uniform? That is, you can specify all types inline, but you can also name all types in the type section? Then the type section would just be a means of abbreviating common type expressions to save space. We could use the positive/negative index space for ids vs constructors that we discussed at some point to keep the encoding maximally compact (and fast).

@lukewagner
Member

@rossberg-chromium Actually, function signatures may be the right thing here: in the MVP, input arity would be constrained to be 0, but post-MVP, we were realizing (as were you, in a recent email :) that there is no reason that blocks can't be given arguments in a consistent way (representing values on the stack that are popped by the block). Arguments even make sense for loop (the values of a branch would be the loop's arguments), which has a nice sort of symmetry.

@titzer

titzer commented Aug 23, 2016

Just some numbers for context from the AngryBots.wasm and BananaBread. Removing the arity from branches and introducing type signatures (typically one byte inline as I described in a previous comment), we end up with:
|           | AngryBots      | BananaBread   |
|-----------|----------------|---------------|
| before    | 12072057 bytes | 2444163 bytes |
| br        | -40097         | -10313        |
| br_if     | -16267         | -6066         |
| br_table  | -2093          | -852          |
| block     | +148224        | +36292        |
| if        | +163429        | +41173        |
| loop      | +17490         | +5662         |
| net       | +270686        | +65896        |
|           | 2.2%           | 2.7%          |

@AndrewScheidecker

I'm in favor of declaring a signature for blocks up front for the simplification in validator state.

Just some numbers for context from the AngryBots.wasm and BananaBread.

There may be a bunch of superfluous blocks in these test cases, since if this is just moving arity from br to block, you would expect the size (excluding if) to only increase if there are blocks without corresponding branches to them. I can't quantify it, but I do see these superfluous blocks in binaryen output.

Actually, function signatures may be the right thing here: in the MVP, input arity would be constrained to be 0, but post-MVP, we were realizing (as were you, in a recent email :) that there is no reason that blocks can't be given arguments in a consistent way (representing values on the stack that are popped by the block). Arguments even make sense for loop (the values of a branch would be the loop's arguments), which has a nice sort of symmetry.

I like this idea (I would use it for loops), but it makes me wonder: is there a good reason for function arguments to remain local variables, or should they just be implicitly pushed onto the operand stack on function entry?

@titzer

titzer commented Aug 30, 2016

After having implemented this, I've come around to this idea. If we can move the block types inline for the common cases of void, i32, i64, f32, and f64, then this LGTM.

@AndrewScheidecker

Thinking about #778, I realized that with block signatures, end can have the same effect on the stack as br 0, which means you can save the drops to clean up values that aren't yielded by a block.
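A hedged sketch (hypothetical syntax) of what that would allow: values left beneath the block's declared result are discarded by end, just as br 0 would discard them, without explicit drops.

```wast
block i32          ;; declared to yield one i32
  i32.const 100    ;; leftover scratch value, not part of the result
  i32.const 1      ;; the value actually yielded
end                ;; acts like `br 0`: yields 1 and discards the 100 beneath it
```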

@ghost

ghost commented Aug 30, 2016

@AndrewScheidecker Right, so we are back to the old (current) block semantics: the fall-through unwinds the stack, and some values can remain on the block value stack, which might avoid some uses of drop and also enables pick-style block stack constants.

@ghost

ghost commented Aug 30, 2016

@rossberg-chromium We can't both allow block starts to be anywhere and have breaks unwind. Having blocks unwind matches the restriction inherent in the text format's lexical blocks, so at this point I would strongly object to removing the 'unwind' semantics of blocks; it seems a key feature of the wasm design, something we are doing 'better' (differently) than CIL and many other VMs. But it does mean that the following are very different:

block<1>
  i32.const 1
  i32.neg
end

versus

i32.const 1
block<1>
  i32.neg  ; Can't pop its argument.
end

Even with the suggested drop restrictions, the block start position is important.

@sunfishcode Yeh, the drop plan was a dead end, drop it.

@rossberg
Member

@JSStats, I didn't say that you can insert blocks anywhere. That was never the case, regardless of drop.

That being said, there is the idea that block annotations should be function types, in which case they could later be allowed to consume stack, like e.g. in your example:

i32.const 1
block (param i32) (result i32)
  i32.neg
end

This may be useful for macrofication and other things.

Moving arities to blocks has the nice property of giving implementations
useful information up front, however some anticipated uses of this
information would really want to know the types up front too.

This patch proposes replacing block arities with function signature indices,
which would provide full type information about a block up front.
@sunfishcode
Member Author

Per feedback, this PR is now updated to use inline signature immediates for the common cases of void, i32, i64, f32, and f64.

| `block` | `0x01` | | begin a sequence of expressions, the last of which yields a value |
| `loop` | `0x02` | | begin a block which can also form control flow loops |
| `if` | `0x03` | | begin if expression |
| `block` | `0x01` | sig : `inline_signature_type` | begin a sequence of expressions, yielding 0 or 1 values |

There does not appear to be any compelling reason to restrict blocks to returning only `0 or 1 values` in the MVP. It is probably a small matter for the runtimes to be able to return multiple values here? But don't let that hold this up; we can always revisit how it is going.

Also, there is no mention of whether a function signature with arguments is usable in the MVP. That also seems a small matter; just not sure if it is really necessary yet.

@rossberg
Member

rossberg commented Sep 9, 2016

Yeah, it's still not clear to me how we're going to future-proof this encoding for multiple values etc. But I guess we can still rectify that in 0xd. So to move forward for now: lgtm modulo the br_table oversight.

@sunfishcode
Member Author

Fixed the br_table oversight.

kisg pushed a commit to paul99/webkit-mips that referenced this pull request Sep 9, 2016
https://bugs.webkit.org/show_bug.cgi?id=161778

Reviewed by Michael Saboff.

This patch makes some major changes to the way that the WASM
function parser works. First, the control stack has been moved
from the parser's context to the parser itself. This simplifies
the way that the parser works and allows us to make the decoder
iterative rather than recursive. Since the control stack has been
moved to the parser, any context operation that refers to some
block now receives that block by reference.

For any if block, regardless of whether or not it is an
if-then-else, we will allocate the entire control flow
diamond. This is not a major issue in the if-then case since B3
will immediately clean up these blocks. In order to support if-then
and if-then-else we needed to be able to distinguish what the type
of the top block on the control stack is. This will be necessary
when validating the else opcode in the future. In the B3 IR
generator we decide the type of the block strictly by the
shape.

Currently, if blocks don't handle passed and returned stack values
correctly. I plan to fix this when I add support for the block
signatures. See: WebAssembly/design#765

* testWASM.cpp:
(runWASMTests):
* wasm/WASMB3IRGenerator.cpp:
(dumpProcedure):
(JSC::WASM::parseAndCompile):
* wasm/WASMB3IRGenerator.h:
* wasm/WASMFunctionParser.h:
(JSC::WASM::FunctionParser<Context>::parseBlock):
(JSC::WASM::FunctionParser<Context>::parseExpression):
(JSC::WASM::FunctionParser<Context>::parseUnreachableExpression):
* wasm/WASMOps.h:

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@205769 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@titzer

titzer commented Sep 12, 2016

The actual change here looks good to me, but can we simplify the text to leave out the discussion of arguments? We can always add that later when it becomes relevant.

@@ -56,6 +56,14 @@ A single-byte unsigned integer indicating a [value type](AstSemantics.md#types).
* `3` indicating type `f32`
* `4` indicating type `f64`

### `inline_signature_type`
A single-byte unsigned integer indicating a signature. These types are encoded as:

This will be simpler to explain in terms of a single result type, since the signatures won't be relevant until some future post-MVP version.

@titzer

titzer commented Sep 12, 2016

lgtm

@sunfishcode
Member Author

Merging with lgtms above.

@sunfishcode sunfishcode merged commit 477a2a2 into binary_0xc Sep 14, 2016
@sunfishcode sunfishcode deleted the block-sigs branch September 14, 2016 19:09
rossberg added a commit to WebAssembly/spec that referenced this pull request Sep 15, 2016
Implements WebAssembly/design#765; specifically:

- Adds block signatures (syntax: (block i32 ...) etc)
- Removes arities from branches
- Also simplifies if syntax: the label is now on if instead of the children, in order to be consistent with the signature
- Adjusts typing
- Adapts all tests (phew...)
titzer pushed a commit that referenced this pull request Sep 29, 2016
* Clarify that wasm may be viewed as either an AST or a stack machine. (#686)

* Clarify that wasm may be viewed as either an AST or a stack machine.

* Reword the introductory paragraph.

* Add parens, remove "typed".

* Make opcode 0x00 `unreachable`. (#684)

Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode.

All-zeros is one of the more common patterns of corrupted data. This
change makes it more likely that code that is accidentally zeroed, in
whole or in part, will be noticed when executed rather than silently
running through a nop slide.

Obviously, this doesn't matter when an opcode table is present, but
if there is a default opcode table, it would presumably use the
opcodes defined here.

* BinaryEncoding.md changes implied by #682

* Fix thinko in import section

* Rename definition_kind to external_kind for precision

* Rename resizable_definition to resizable_limits

* Add  opcode delimiter to init_expr

* Add Elem section to ToC and move it before Data section to reflect Table going before Memory

* Add missing init_expr to global variables and undo the grouped representation of globals

* Note that only immutable globals can be exported

* Change the other 'mutability' flag to 'varuint1'

* Give 'anyfunc' its own opcode

* Add note about immutable global import requirement

* Remove explicit 'default' flag; make memory/table default by default

* Change (get|set)_global opcodes

* Add end opcode to functions

* Use section codes instead of section names

(rebasing onto 0xC instead of master)

This PR proposes using section codes for known sections, which is more compact and easier to check in a decoder.
It allows for user-defined sections that have string names to be encoded in the same manner as before.
The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being.

* Use LEB for br_table (#738)

* Describe operand order of call_indirect (#758)

* Remove arities from call/return (#748)

* Limit varint sizes in Binary Encoding. (#764)

* Global section (#771)

global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections.

* Make name section a user-string section.

* Update BinaryEncoding.md

* Update BinaryEncoding.md

* Use positive section code byte

* Remove specification of name strings for unknown sections

* Update BinaryEncoding.md

* Remove repetition in definition of var(u)int types (#768)

* Fix typo (#781)

* Move the element section before the code section (#779)

* Binary format identifier is out of date (#785)

* Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800)

* Add string back

* Block signatures (#765)

* Replace branch arities with block and if signatures.

Moving arities to blocks has the nice property of giving implementations
useful information up front, however some anticipated uses of this
information would really want to know the types up front too.

This patch proposes replacing block arities with function signature indices,
which would provide full type information about a block up front.

* Remove the arity operand from br_table too.

* Remove mentions of "arguments".

* Make string part of the payload

* Remove references to post-order AST in BinaryEncoding.md (#801)

* Simplify loop by removing its exit label.

This removes loop's bottom label.

* Move description of `return` to correct column (#804)

* type correction and missing close quote (#805)

* Remove more references to AST (#806)

* Remove reference to AST in JS.md

Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now.

* Use "instruction" instead of "AST operator"

* Update rationale for stack machine

* Update Rationale.md

* Update discussion of expression trees

* Update MVP.md

* Update Rationale.md

* Update Rationale.md

* Remove references to expressions

* Update Rationale.md

* Update Rationale.md

* Address review comments

* Address review comments

* Address review comments

* Delete h
ryanhaddad pushed a commit to WebKit/WebKit that referenced this pull request Dec 22, 2020