Block signatures #765

sunfishcode · 2016-08-16T15:36:02Z

#741 proposed moving the arity immediates from branches to block/if, which has the advantage of giving decoders more information up front. However, the arities alone aren't enough for some use cases. This PR implements @MikeHolman's suggestion to provide full type information up front.

This proposes adding a new "block signature" type to the Type section, so block/if need only a single index immediate to specify their full type, even if they are extended to return multiple values post-MVP.

This gives us a greater benefit than #741 did at approximately the same code size cost -- one immediate per block and if. And if code size of the immediates becomes a concern, we can introduce an if0 (and similar opcodes for future block-like opcodes).

In my experiments, the decode time cost of indirecting into the type table to obtain the signature data was not significant.

lukewagner · 2016-08-16T16:08:50Z

What I like about this change is:

Removes the only bit of "inference" from type checking, better matching the rest of wasm.
Removes the need for an internal "unknown" type state for blocks during validation.
For the same reason that a single-pass compiler would want to know the arity a priori, the type can be useful, e.g., to determine the register class of the block result.

MikeHolman · 2016-08-16T21:04:40Z

lgtm, this is exactly what I wanted!

ghost · 2016-08-16T23:21:31Z

Could some of the common high-frequency block signatures be given specified indexes? For example, could block signature index 0 be zero result values, and could there be specified indexes for the single-value core types, so that index 1 is (i32), index 2 is (i64), etc.? Further, could it be invalid to have a duplicate signature in this table, to force a canonical encoding?

This might help the text format a lot because there would be canonical encodings for the common cases, and the text format might then not even need to encode the signature index in most cases and might still derive the signature from the text syntax.

ghost · 2016-08-17T01:46:00Z

If the signatures could also be required to be in a sorted order this might be even better. Then there is only one canonical table for a set of signatures and the validator can check this by simply checking that they are in order.
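
A minimal sketch of the canonical-order check suggested above, in Python. The ordering rule (shorter signatures first, then by an assumed ranking of value types) and the names `VALUE_TYPE_ORDER`, `sort_key`, and `is_canonical` are illustrative assumptions, not part of the proposal.

```python
# Hypothetical model: a block signature is a tuple of result value types.
# The ranking of value types below is an assumption made for illustration.
VALUE_TYPE_ORDER = {"i32": 0, "i64": 1, "f32": 2, "f64": 3}


def sort_key(sig):
    # Shorter signatures sort first, then lexicographically by type rank.
    return (len(sig), [VALUE_TYPE_ORDER[t] for t in sig])


def is_canonical(table):
    """Return True if the table is strictly increasing under sort_key.

    Strictly increasing implies both "sorted" and "no duplicates",
    so a single linear pass is enough for the validator.
    """
    keys = [sort_key(sig) for sig in table]
    return all(a < b for a, b in zip(keys, keys[1:]))
```

Under this assumed ordering, a table that contains the empty signature and all single-value signatures would start with `()`, `(i32)`, `(i64)`, which lines up with the predefined indexes suggested earlier.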

titzer · 2016-08-17T08:13:40Z

Maybe there is an encoding trick that we could use to obviate the need for signature entries until the multi-value era. E.g. what if the immediate is an inline value type (i.e. void, i32, i64, f32, f64), with one encoding being an index into the type table to be used when we have multi-values?
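
As a rough illustration of this encoding trick, here is a Python sketch of a decoder that reads either an inline value type or an escape followed by a type-table index. The byte values in `INLINE_TYPES` and `TYPE_INDEX_ESCAPE` are made up for the example, as are the function names; nothing in this thread fixes them.

```python
# Hypothetical tag bytes, chosen only for this sketch.
INLINE_TYPES = {
    0x00: (),          # void
    0x01: ("i32",),
    0x02: ("i64",),
    0x03: ("f32",),
    0x04: ("f64",),
}
TYPE_INDEX_ESCAPE = 0x05  # assumed: "an index into the type table follows"


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_block_type(data, pos, type_table):
    """Return (result_types, next_pos) for a block/if immediate."""
    tag = data[pos]
    pos += 1
    if tag in INLINE_TYPES:
        # Common case: no type table lookup needed.
        return INLINE_TYPES[tag], pos
    if tag == TYPE_INDEX_ESCAPE:
        # Multi-value case: indirect through the type table.
        index, pos = read_uleb128(data, pos)
        return type_table[index], pos
    raise ValueError(f"unknown block type tag {tag:#x}")
```

With such a scheme, a module that never uses multi-value blocks would not need any block entries in the type table.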

ghost · 2016-08-17T09:22:14Z

If indexes are predefined and specified for the block types used in the MVP then the table will not be needed for the MVP, and there will be a canonical index for them. Perhaps consider sorting these in an order that will scale to multiple-values.

rossberg · 2016-08-17T15:25:00Z

Ben and I have discussed the design space more extensively. There are 3 main proposals on the table now. Here are the pros and cons as we see them:

1. Status quo: arity annotations on branches.
Pros:
- No change to text format or tests required.
- No change to implementations required.
- Avoids forcing annotations on all future blockish constructs.
- Avoids redundant information (e.g. on blockish constructs not using branches).
- Probably the most size-efficient solution without additional measures.
Cons:
- Knowing the type of a block is deferred until the first exit point.
- With multi-values, potential deferred allocation in decoder.
2. Proposed change #741 (Move arities from branches to blocks): arity annotations on blocks.
Pros:
- With multi-values, no deferred allocation in decoder.
Cons:
- Requires changes to implementations.
- Requires changing text format and many tests.
- Forces annotations on all future blockish constructs.
3. This proposal #765 (Block signatures): type annotations on blocks.
Pros:
- All types are known upfront.
- For multi-values, no deferred allocation in decoder.
Cons:
- Requires changing implementations.
- Requires extending type section?
- Requires changing text format and many tests.
- Forces annotations on all future blockish constructs.
- Forces any code transformation tool to derive types if it wants to insert a blockish.
- Least size-efficient without additional measures.

Overall, Ben and I still think option (1) remains the most attractive. We have implemented it, understand its pros and cons best, and don't need to worry about unforeseen implications late in the game. The trade-offs of the others are less obvious, and their advantages appear small (and biased towards consumers).

rossberg · 2016-08-17T15:27:44Z

I'd also like to point out one further advantage of the status quo that I just ran into: for the linear text format (like will be visible in a debugger) having arities on branches is somewhat more user-friendly, because it is directly apparent how many values a branch operator consumes -- no need to search for the target (or rely on tooling). The information is where it belongs, that is, where it is used.

lukewagner · 2016-08-17T16:09:41Z

Requires changing implementations.

If the changes are quite simple, I don't think the bullet "Requires changing implementations" should be considered a con given that we are in the middle of changing this all anyway for 0xc.

Requires extending type section?

@titzer's idea above seems fine as a way to avoid the indirection, but again the cost here is low, especially since we know we're adding more forms to the type section anyway (that's the point of the form field).

Requires changing text format and many tests.

If there was a sixth, lowest constituency, I think this would be it.

Forces annotations on all future blockish constructs.

I can't see any reason why we wouldn't want that for the same reason we want them for the current set of block constructs. Also, other than, it sounds like, a try/finally block for which the type annotations seem fine, are we really anticipating a bunch more blocks?

Forces any code transformation tool to derive types if it wants to insert a blockish.

Are there any concrete examples of such a transformation that is valuable and can't be bothered to understand types? We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.

Least size-efficient without additional measures.

How is this any bigger than adding arity?

lukewagner · 2016-08-17T16:12:12Z

@rossberg-chromium Branch-with-value / block-with-signature is no different than call / function-signature. For both, devtools could provide tooltips, highlighting, etc. to make the information more immediate.

qwertie · 2016-08-17T19:47:01Z

For the text format, there may be as much advantage as disadvantage: to the reader there may be some value in seeing up front what type a block will produce, even when there are no branches out. And type inference can be supported for anyone writing wasm, just as we're looking at supporting infix +.

ghost · 2016-08-17T21:51:37Z

@rossberg-chromium One key point, and perhaps even a show stopper, seems to be missing in the list of pros and cons:

We still need arity annotations on the fall-through for multi-value support. The strategy of just returning all the values remaining on the block at the end of the block will significantly frustrate code using multiple values, or with pick just making multiple use of values on the stack. With the status quo it would require writing values to locals and reloading just those needed to clean up the stack for the fall-through, or using a branch at the end of the block. I think this is a show-stopper, so I strongly support annotating blocks with at least the number of values.

ghost · 2016-08-17T21:55:46Z

@qwertie I agree that some text formats might want to annotate the block with their return type, but it would only frustrate the text format if they were required to annotate them all with the index too which would be required to be lossless. Thus my suggesting that they be required to be in sorted order so there is always a canonical order and so the text format does not need to annotate them with the signature index too even if it annotates them with the type.

ghost · 2016-08-17T22:11:40Z

@rossberg-chromium I agree that a text format will be more familiar and readable if the number of result values is on the branches and even using values on the fall-through, but I still fully expect that a text format can do this with the change to putting the type on the block if it can be a lossless transform. Text format tools will need to be type-aware in order to deal with the stack machine encoding, and it seems consistent with the wasm approach to move some burden to tools and off the runtime.

rossberg · 2016-08-18T13:28:10Z

On 17 August 2016 at 18:09, Luke Wagner notifications@github.com wrote:

Requires changing implementations.

If the changes are quite simple, I don't think the bullet "Requires
changing implementations" should be considered a con given that we are in
the middle of changing this all anyway for 0xc.

Hm, because there is other work to do it doesn't matter to produce more?

Requires extending type section?

@titzer's idea above seems fine as a way to
avoid the indirection, but again the cost here is low, especially since we
know we're adding more forms to the type section anyway (that's the point
of the form field).

It's not a proper type, though, so would represent an anomaly in the type
section.

Requires changing text format and many tests.

If there was a sixth, lowest constituency
http://dev.w3.org/html5/html-design-principles/#priority-of-constituencies,
I think this would be it.

Since you cite that hierarchy it's worth noting that it also puts "users"
over "implementers". In our case, "users" = "producers", while we keep
arguing for design changes motivated the other way round, including with
the current proposal. ;)

Also, the text format actually is user-facing.

Forces annotations on all future blockish constructs.

I can't see any reason why we wouldn't want that for the same reason we
want them for the current set of block constructs. Also, other than, it
sounds like, a try/finally block for which the type annotations seem fine,
are we really anticipating a bunch more blocks?

It's extra redundancy, and it puts the cost on the wrong side: the
annotation is only needed for branches, but now you have to include it
everywhere even if there isn't a single branch around. That seems silly for
constructs like if and try in particular, for which not using their
label is most likely the 99% use case.

Forces any code transformation tool to derive types if it wants to insert
a blockish.

Are there any concrete examples of such a transformation that is valuable
and can't be bothered to understand types? We put types on practically
everything else, it seems hard to get away with being type-oblivious for
anything non-trivial.

No, that's not right. We don't have type annotations on any generic
operator right now. Even monomorphic operators only explicate their type as
a textual naming convention which has no actual representation in the
binary format. The only type annotations we currently have are on function
boundaries.

There are e.g. many kinds of peephole optimisations that don't require
understanding types. Or various instances of partial evaluation. I can't
judge how relevant they are, but stuff like that is done for other bytecode
formats.

Least size-efficient without additional measures.

How is this any bigger than adding arity?

Depends on the exact solution, but you either need extra entries in the
type section, or inline type vectors. In both cases, the size is generally
>1 without extra encoding hacks, and obviously linear with multi-values.

lukewagner · 2016-08-18T15:28:25Z

Hm, because there is other work to do it doesn't matter to produce more?

I'm saying it's in the area that is being changed anyway, so it's a change one way or another. Anyhow, it's very little work, mostly involves simplifying existing code, so this shouldn't be a deterrent regardless.

It's not a proper type, though, so would represent an anomaly in the type section.

Not every type in the types section will be able to show up in every context where an index into the types section is specified so I don't think it makes sense to say it's not a "proper type". It's not like we're representing some cartesian closed category here ;)

Forces annotations on all future blockish constructs.

Agreed and desired, for the same reason as adding block signatures in the first place.

It's extra redundancy, and it puts the cost on the wrong side: the annotation is only needed for branches,
but now you have to include it everywhere even if there isn't a single branch around. That seems silly for
constructs like if and try in particular, for which not using their label is most likely the 99% use case.

If size is an actual concern, we can have the specialized opcodes. But I expect layer 1 compression would obviate the need.

We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.

No, that's not right. We don't have type annotations on any generic operator right now.

What I mean is that the vast majority of ops are not generic.

There are e.g. many kinds of peephole optimisations that don't require understanding types. Or various
instances of partial evaluation. I can't judge how relevant they are, but stuff like that is done for other
bytecode formats.

Possibly, but I don't think we should constrain ourselves based on this level of speculation.

Also, the text format actually is user-facing.

Since we're aiming at low-level clarity, not brevity, nor optimizing ease of writing by hand, with the current linear text format, I think it's actually useful to have the types returned by a block listed up front (just like having local types listed up front). For the experimental sugared syntaxes, it seems natural to drop these annotations when they are implied by the types of branches/fallthrough.

Depends on the exact solution, but you either need extra entries in the
type section, or inline type vectors. In both cases, the size is generally
>1 without extra encoding hacks, and obviously linear with multi-values.

Let's not consider inlining the entire type vector into each block. A few bytes of type section entries are negligible (and avoidable in the MVP). That leaves the size effectively equivalent to arities-on-blocks. We can measure the overall delta, though.

titzer · 2016-08-18T15:47:39Z

On Thu, Aug 18, 2016 at 5:28 PM, Luke Wagner notifications@github.com
wrote:

Hm, because there is other work to do it doesn't matter to produce more?

I'm saying it's in the area that is being changed anyway, so it's a change
one way or another. Anyhow, it's very little work, mostly involves
simplifying existing code, so this shouldn't be a deterrent regardless.

After giving this more thought, I think there is no real advantage to
having arities only on blockish things versus having type annotations on
blockish things.

So one alternative, arities-only, seems to be out.

That leaves type-annotated blockish things (our set is now block, if,
try_finally, try_catch, and try_catch_finally--since we are prototyping
these), or backing off this change and keeping arities on branches.

I'm in the middle of implementing block type annotations throughout V8 on
top of the stack machine branch, and the changes aren't trivial. Overall I
haven't seen any code savings yet, either. Instead it makes decoding of
blocks more complex and there are several gotcha special cases. The
simplification of inference wasn't dramatic. I'm implementing the full
multi-value semantics (on top of the multi-value semantics I already
implemented). I'll report more when I am finished.

It's not a proper type, though, so would represent an anomaly in the type
section.

Not every type in the types section will be able to show up in every
context where an index into the types section is specified so I don't think
it makes sense to say it's not a "proper type". It's not like we're
representing some cartesian closed category here ;)

Forces annotations on all future blockish constructs.

Agreed and desired, for the same reason as adding block signatures in the
first place.

It's extra redundancy, and it puts the cost on the wrong side: the
annotation is only needed for branches,
but now you have to include it everywhere even if there isn't a single
branch around. That seems silly for
constructs like if and try in particular, for which not using their label
is most likely the 99% use case.

If size is an actual concern, we can have the specialized opcodes. But I
expect layer 1 compression would obviate the need.

We put types on practically everything else, it seems hard to get away
with being type-oblivious for anything non-trivial.

No, that's not right. We don't have type annotations on any generic
operator right now.

What I mean is that the vast majority of ops are not generic.

There are e.g. many kinds of peephole optimisations that don't require
understanding types. Or various
instances of partial evaluation. I can't judge how relevant they are, but
stuff like that is done for other
bytecode formats.

Possibly, but I don't think we should constrain ourselves based on this
level of speculation.

Also, the text format actually is user-facing.

Since we're aiming at low-level clarity, not brevity, nor optimizing ease
of writing by hand, with the current linear text format, I think it's
actually useful to have the types returned by a block listed up front (just
like having local types listed up front). For the experimental sugared
syntaxes, it seems natural to drop these annotations when they are implied
by the types of branches/fallthrough.

Actually I am not sure what we are aiming at here. I admit that part of me
was enticed by the idea of all blocks being typed, but I am not
experiencing the simplification in the decoder yet. I'm in the middle of
changing over all of our internal tests for these experiments, and that's
an annoying amount of work. It will probably be the same for the spec
tests. I'd like to finish my implementation to say for sure, since it seems
we've bitten ourselves a couple times by making changes before they're
fully thought out.

Depends on the exact solution, but you either need extra entries in the
type section, or inline type vectors. In both cases, the size is generally
>1 without extra encoding hacks, and obviously linear with multi-values.

Let's not consider inlining the entire type vector into each block. A few
bytes of type section entries are negligible (and avoidable in the MVP).
That leaves the size effectively equivalent to arities-on-blocks. We can
measure the overall delta, though.

I think requiring a types section just to use typed blocks (even if it's
only for multi-values) is a no-go.

There are a couple of negatives that I already ran into, e.g. that one
doesn't know how many values a br* will pop without looking at the block;
so the control stack in the interpreter has to maintain that arity, too.

Other weird things are that with multi-values, br_table is actually
polymorphic; it would pop a different number of values depending on which
case was targeted. We'd have to disallow some (weird) legal cases and
require that the arity of all blocks targeted in a br_table match.

Before there was a nice bottleneck in the decoder where all merge-y
operations went through, and that was the natural place for both the
inference and checking step. Now that bottleneck must serve two masters;
fallthroughs have to be an exact match, while branches can have extra stuff
in the middle of the stack.

lukewagner · 2016-08-18T17:11:56Z

I'll report more when I am finished.

Great, and thanks for the experimentation. We're also experimenting with how this works so we'll be able to compare notes in a bit.

I think requiring a types section just to use typed blocks (even if it's only for multi-values) is a no-go.

It's not a types section, it'd just be a new form of entry in the existing types section. Especially with GC types, there will be quite a few new forms so if the objection is the annoyance of going from 1->2 (which does turn the associated internal array from homogeneous to heterogeneous), then that will happen regardless with the current design and trajectory.

ghost · 2016-08-18T23:27:10Z

Re: 'br_table is actually polymorphic', it seems appropriate to 'require that the arity of all blocks targeted in a br_table match.'
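
A minimal sketch of that restriction, assuming the validator keeps a control stack holding each enclosing block's declared result arity. The function name `check_br_table` and the representation of the control stack are assumptions made for this example.

```python
def check_br_table(control_stack, targets, default):
    """Check that all blocks targeted by a br_table share one arity.

    control_stack: list of result arities, innermost block last (assumed model).
    targets, default: relative branch depths, where 0 is the innermost block.
    Returns the common arity, i.e. how many values the br_table pops.
    """
    arities = set()
    for depth in list(targets) + [default]:
        if depth >= len(control_stack):
            raise ValueError(f"branch depth {depth} out of range")
        arities.add(control_stack[-1 - depth])
    if len(arities) != 1:
        raise ValueError("br_table targets disagree on arity")
    return arities.pop()
```

Because the arity is read from the target blocks, the branch itself needs no arity immediate; the cost is the control stack lookup mentioned in the thread.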

Perhaps it is a little extra work for the interpreter to look at the target block's number of required values rather than for this to be an immediate argument, but this would have been necessary anyway for the fall-through, or the end opcode would have needed a count too.

I see problems if the index space of the block types is shared with all other types, it would make having a canonical order even more problematic, and it might have a negative impact on code compression, so a separate table seems needed.

Cellule · 2016-08-19T00:48:10Z

For the type section, we could share the same type as Function types, but with no params. That way we could share an entry for blocks and functions if they happen to have the same signature.
The only reason I see to have a different form is to avoid the param_count field, but then again, if a function already uses a signature with no params then we could simply reuse it.

I'm in the middle of implementing block type annotations throughout V8 on
top of the stack machine branch, and the changes aren't trivial. Overall I
haven't seen any code savings yet, either. Instead it makes decoding of
blocks more complex and there are several gotcha special cases.

I don't really understand how having the signature ahead of time would make decoding more complicated.
Right now, in Chakra, we already have to keep track of the type of the block for br yielding values. Plus, when we enter the block, because we don't know the type nor how many (0 or 1 right now) values might be yielded, we have to pre-reserve a space for that possible value.
I don't even know, with our current model, how we'll handle multi-value yields.
Having the signature would definitely make that easier.

For the binary space, one could argue that if we keep arity on br then 1 block with multiple branches would have to spend 1 byte on every branch, whereas having that byte on the block would cost 1 byte for all the branches. Plus, if we really want to save more space, we could have the *0 opcodes to mean there are no values.

titzer · 2016-08-19T10:52:17Z

On Thu, Aug 18, 2016 at 7:12 PM, Luke Wagner notifications@github.com
wrote:

I'll report more when I am finished.

Great, and thanks for the experimentation. We're also experimenting with
how this works so we'll be able to compare notes in a bit.

I think requiring a types section just to use typed blocks (even if it's
only for multi-values) is a no-go.

It's not a types section, it'd just be a new form of entry in the
existing types section. Especially with GC types, there will be quite a few
new forms so if the objection is the annoyance of going from 1->2 (which
does turn the associated internal array from homogeneous to heterogeneous),
then that will happen regardless with the current design and trajectory.

No, I understand that part. I just don't think it makes sense to require a
types section entry for doing local control flow, even if it would just be
for multi-value blocks. It makes more sense to have the block type inline
in all cases, thus not requiring a non-local section for what is something
completely local.

lukewagner · 2016-08-19T17:46:22Z

@titzer Ok; it'd just be a size optimization, similar to not including the signature of a call_indirect inline (esp. now that they are structurally-matched and thus the particular index has no meaning). But if we did what you're suggesting for the MVP, we could make that decision post-MVP.

rossberg · 2016-08-23T11:16:30Z

@Cellule, the size argument you are making is exactly the one I was making when I originally proposed this. However, it is invalidated by two things: (1) br_table which is a single branch to many blocks (IIRC, we have some with 100K targets), and more importantly, (2) if, which typically has no branches. When I proposed the change I still assumed we would remove labels from if, so that it wouldn't affect the equation (IIRC, AngryBots has like 140K uses of if).

However, if we want to go this way (I still prefer we don't) I like the suggestion to reuse function signatures. That may in fact be more future-proof, because we might want to allow block operators to also consume values from the stack eventually.

rossberg · 2016-08-23T12:00:54Z

@lukewagner, getting a bit OT, but ever since we moved to structural types, and after looking at the design space for managed types, I have wondered about the distinction we are making between inline types and defined types. Once you add more type structure, it becomes increasingly ad-hoc.

Is there any particular reason left why we cannot make it uniform? That is, you can specify all types inline, but you can also name all types in the type section? Then the type section would just be a means of abbreviating common type expressions to save space. We could use the positive/negative index space for ids vs constructors that we discussed at some point to keep the encoding maximally compact (and fast).

lukewagner · 2016-08-23T14:49:25Z

@rossberg-chromium Actually, function signatures may be the right thing here: in the MVP, input arity would be constrained to be 0, but post-MVP, we were realizing (as were you, in a recent email :) that there is no reason that blocks can't be given arguments in a consistent way (representing values on the stack that are popped by the block). Arguments even make sense for loop (the values of a branch would be the loop's arguments), which has a nice sort of symmetry.

titzer · 2016-08-23T15:49:12Z

Just some numbers for context from the AngryBots.wasm and BananaBread. Removing the arity from branches and introducing type signatures (typically one byte inline as I described in a previous comment), we end up with:
angrybots bananabread
before 12072057 bytes 2444163 bytes
br -40097 -10313
br_if -16267 -6066
br_table -2093 -852
block +148224 +36292
if +163429 +41173
loop +17490 +5662
net +270686 +65896
2.2% 2.7%

(apologies for the formatting)

binji edit: tried to MD this:

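| | angrybots | bananabread |
|---|---|---|
| before | 12072057 bytes | 2444163 bytes |
| br | -40097 | -10313 |
| br_if | -16267 | -6066 |
| br_table | -2093 | -852 |
| block | +148224 | +36292 |
| if | +163429 | +41173 |
| loop | +17490 | +5662 |
| net | +270686 | +65896 |
| net (% of before) | 2.2% | 2.7% |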
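
The net rows can be re-derived from the per-opcode deltas. The short Python sketch below only restates the figures from the table; the dictionary layout and function names are illustrative.

```python
# Per-opcode size deltas in bytes, copied from the table above.
deltas = {
    "angrybots": {
        "br": -40097, "br_if": -16267, "br_table": -2093,
        "block": 148224, "if": 163429, "loop": 17490,
    },
    "bananabread": {
        "br": -10313, "br_if": -6066, "br_table": -852,
        "block": 36292, "if": 41173, "loop": 5662,
    },
}
# Module sizes before the change, in bytes.
before = {"angrybots": 12072057, "bananabread": 2444163}


def net_delta(name):
    # Bytes saved on branches plus bytes added on block/if/loop.
    return sum(deltas[name].values())


def percent_growth(name):
    # Net delta relative to the original module size.
    return 100 * net_delta(name) / before[name]


for name in deltas:
    print(name, net_delta(name), f"{percent_growth(name):.1f}%")
```

In both modules the `if` row alone is larger than the combined savings on branches, which matches the earlier point that `if` typically has no branches targeting it.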

AndrewScheidecker · 2016-08-29T13:39:42Z

I'm in favor of declaring a signature for blocks up front for the simplification in validator state.

Just some numbers for context from the AngryBots.wasm and BananaBread.

There may be a bunch of superfluous blocks in these test cases: if this is just moving arity from br to block, you would expect the size (excluding if) to increase only if there are blocks without corresponding branches to them. I can't quantify it, but I do see these superfluous blocks in binaryen output.

Actually, function signatures may be the right thing here: in the MVP, input arity would be constrained to be 0, but post-MVP, we were realizing (as were you, in a recent email :) that there is no reason that blocks can't be given arguments in a consistent way (representing values on the stack that are popped by the block). Arguments even make sense for loop (the values of a branch would be the loop's arguments), which has a nice sort of symmetry.

I like this idea (I would use it for loops), but it makes me wonder: is there a good reason for function arguments to remain local variables, or should they just be implicitly pushed onto the operand stack on function entry?

titzer · 2016-08-30T09:55:59Z

After having implemented this, I've come around to this idea. If we can move the block types inline for the common cases of void, i32, i64, f32, and f64, then this will LGTM.

AndrewScheidecker · 2016-08-30T11:12:41Z

Thinking about #778, I realized that with block signatures, end can have the same effect on the stack as br 0, which means you can save the drops to clean up values that aren't yielded by a block.

ghost · 2016-08-30T12:03:48Z

@AndrewScheidecker Right, so we are back to the old (current) block semantics, the fall through unwinds the stack, and some values can remain on the block value stack which might avoid some uses of drop and also enables pick style block stack constants.
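
A small sketch of this unwinding behaviour, assuming the block records the value-stack height at entry and declares its result arity up front. The function name `end_block` and the list-based stack are illustrative only.

```python
def end_block(value_stack, entry_height, arity):
    """Model `end` behaving like `br 0` for a block with a known arity.

    Keeps the top `arity` values as the block's results and discards any
    other values pushed since the block was entered (no explicit drops).
    value_stack is mutated in place and also returned for convenience.
    """
    if len(value_stack) - entry_height < arity:
        raise ValueError("not enough values on the stack for the block result")
    # Slice from an explicit index so that arity == 0 yields an empty list.
    results = value_stack[len(value_stack) - arity:]
    del value_stack[entry_height:]
    value_stack.extend(results)
    return value_stack
```

For example, a block entered at height 1 that leaves three values behind but declares one result ends with only its topmost value above the entry height.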

ghost · 2016-08-30T21:24:51Z

@rossberg-chromium Can't both allow block starts to be anywhere and have breaks unwind. Having blocks unwind matches the restriction inherent in the text format lexical blocks, so at this point I would strongly object to removing the 'unwind' semantics of blocks, and it seems a key feature of the wasm design, something we are doing 'better' differently to CIL and many other VMs. But it does mean that the following are very different:

block<1>
  i32.const 1
  i32.neg
end

versus

i32.const 1
block<1>
  i32.neg  ; Can't pop its argument.
end

Even with the suggested drop restrictions, the block start position is important.

@sunfishcode Yeh, the drop plan was a dead end, drop it.

rossberg · 2016-08-31T05:39:06Z

@JSStats, I didn't say that you can insert blocks anywhere. That was never the case, regardless of drop.

That being said, there is the idea that block annotations should be function types, in which case they could later be allowed to consume stack, like e.g. in your example:

i32.const 1
block (param i32) (result i32)
  i32.neg
end

This may be useful for macrofication and other things.
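
To make the difference between the two forms concrete, here is a hedged Python sketch of how a validator might enter a block whose annotation is a function type. The type stack as a list of type names and the function name `enter_block` are assumptions for illustration, and block parameters are a post-MVP idea in this thread.

```python
def enter_block(value_stack, params):
    """Move a block's parameters from the outer stack into the block.

    value_stack: outer stack of value types, top of stack last (assumed model).
    params: the parameter types from the block's function-type annotation.
    Returns the block's initial inner stack; the outer stack is mutated.
    """
    split = len(value_stack) - len(params)
    if split < 0 or tuple(value_stack[split:]) != tuple(params):
        raise ValueError("stack does not match block parameters")
    inner = value_stack[split:]
    del value_stack[split:]
    return inner
```

With `params = ("i32",)` the block starts with the `i32` on its own stack, so `i32.neg` can pop it. With no parameters the inner stack starts empty, which models the earlier example where `i32.neg` cannot pop its argument.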

Moving arities to blocks has the nice property of giving implementations useful information up front; however, some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front.

sunfishcode · 2016-09-09T03:21:08Z

Per feedback, this PR is now updated to use inline signature immediates for the common cases of void, i32, i64, f32, and f64.

ghost · 2016-09-09T09:12:48Z

BinaryEncoding.md

-| `block` | `0x01` |  | begin a sequence of expressions, the last of which yields a value |
-| `loop` | `0x02` |  | begin a block which can also form control flow loops |
-| `if` | `0x03` | | begin if expression |
+| `block` | `0x01` | sig : `inline_signature_type` | begin a sequence of expressions, yielding 0 or 1 values |


There does not appear to be any compelling reason to restrict blocks to returning only '0 or 1 values' in the MVP. It is probably a small matter for the runtimes to be able to return multiple values here? But don't let that hold this up, can always revisit how it is going.

Also no mention if a function signature with arguments is usable in the MVP? Also seems a small matter, just not sure if it is really necessary yet.

rossberg · 2016-09-09T09:35:46Z

Yeah, it's still not clear to me how we gonna future-proof this encoding for multiple values etc. But I guess we can still rectify that in 0xd. So to move forward for now: lgtm modulo br_table oversight.

sunfishcode · 2016-09-09T15:56:42Z

Fixed the br_table oversight.

https://bugs.webkit.org/show_bug.cgi?id=161778

Reviewed by Michael Saboff.

This patch makes some major changes to the way that the WASM function parser works. First, the control stack has been moved from the parser's context to the parser itself. This simplifies the way that the parser works and allows us to make the decoder iterative rather than recursive. Since the control stack has been moved to the parser, any context operation that refers to some block now receives that block by reference.

For any if block, regardless of whether or not it is an if-then-else, we will allocate the entire control flow diamond. This is not a major issue in the if-then case since B3 will immediately clean up these blocks. In order to support if-then and if-then-else we needed to be able to distinguish what the type of the top block on the control stack is. This will be necessary when validating the else opcode in the future. In the B3 IR generator we decide the type of the block strictly by its shape.

Currently, if blocks don't handle passed and returned stack values correctly. I plan to fix this when I add support for the block signatures. See: WebAssembly/design#765

* testWASM.cpp: (runWASMTests):
* wasm/WASMB3IRGenerator.cpp: (dumpProcedure): (JSC::WASM::parseAndCompile):
* wasm/WASMB3IRGenerator.h:
* wasm/WASMFunctionParser.h: (JSC::WASM::FunctionParser<Context>::parseBlock): (JSC::WASM::FunctionParser<Context>::parseExpression): (JSC::WASM::FunctionParser<Context>::parseUnreachableExpression):
* wasm/WASMOps.h:

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@205769 268f45cc-cd09-0410-ab3c-d52691b4dbfc

titzer · 2016-09-12T11:17:38Z

The actual change here looks good to me, but can we simplify the text to leave out the discussion of arguments? We can always add that later when it becomes relevant.

titzer · 2016-09-12T11:18:26Z

BinaryEncoding.md

@@ -56,6 +56,14 @@ A single-byte unsigned integer indicating a [value type](AstSemantics.md#types).
 * `3` indicating type `f32` 
 * `4` indicating type `f64`

+### `inline_signature_type`
+A single-byte unsigned integer indicating a signature. These types are encoded as:


This will be simpler to explain in terms of a single result type, since the signatures won't be relevant until some future post-MVP version.
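
For readers following along, a rough sketch of what decoding the single-byte immediate could look like. The byte values are partly an assumption: `3` and `4` mirror the `value_type` lines quoted in the excerpt above, while `0` for "no result", `1` for `i32`, and `2` for `i64` are guesses made for this sketch, not text from the PR. `decode_block_header` is a hypothetical helper; the `0x01` opcode for `block` comes from the BinaryEncoding.md excerpt earlier in the thread.

```python
# Assumed table: 3 and 4 follow the quoted value_type lines; 0, 1 and 2
# are assumptions made for this sketch.
INLINE_SIGNATURE_TYPES = {
    0: (),          # no result
    1: ("i32",),
    2: ("i64",),
    3: ("f32",),
    4: ("f64",),
}

OP_BLOCK = 0x01  # `block` opcode from the BinaryEncoding.md excerpt


def decode_block_header(code, pos):
    """Return (result_types, next_pos) for a `block` starting at code[pos]."""
    if code[pos] != OP_BLOCK:
        raise ValueError("not a block opcode")
    if pos + 1 >= len(code):
        raise ValueError("truncated block: missing signature byte")
    sig = code[pos + 1]
    if sig not in INLINE_SIGNATURE_TYPES:
        raise ValueError(f"unknown inline_signature_type {sig}")
    # The result type is known before any of the block body is decoded.
    return INLINE_SIGNATURE_TYPES[sig], pos + 2


result_types, next_pos = decode_block_header(bytes([0x01, 0x03]), 0)
```

The point of the sketch is only that one byte after the opcode is enough to know the block's result type up front, with no lookup into the Type section.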

titzer · 2016-09-12T15:32:29Z

lgtm

sunfishcode · 2016-09-14T19:07:21Z

Merging with lgtms above.

Implements WebAssembly/design#765; specifically:
- Adds block signatures (syntax: (block i32 ...) etc)
- Removes arities from branches
- Also simplifies if syntax: the label is on if now instead of the children, in order to be consistent with the signature
- Adjusts typing
- Adapts all tests (phew...)
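
As an illustration of why branches no longer need an arity immediate once blocks carry signatures, here is a small sketch. `branch_arity` and the tuple-per-block control stack are hypothetical, not code from the spec interpreter, and the sketch only models `block` labels (loops are left out).

```python
def branch_arity(control_stack, depth):
    """Number of values a `br depth` carries, read from the target block.

    `control_stack` holds one result-type tuple per enclosing block,
    innermost last. Relative depth 0 names the innermost block.
    """
    if not 0 <= depth < len(control_stack):
        raise IndexError("branch depth out of range")
    return len(control_stack[-1 - depth])


# Roughly (block (block i32 ... )): outer block has no result, inner has i32.
control_stack = [(), ("i32",)]
inner_arity = branch_arity(control_stack, 0)
outer_arity = branch_arity(control_stack, 1)
```

Here a branch to the inner block carries one value and a branch to the outer block carries none, both determined from the block signatures alone.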

* Clarify that wasm may be viewed as either an AST or a stack machine. (#686)
* Clarify that wasm may be viewed as either an AST or a stack machine.
* Reword the introductory paragraph.
* Add parens, remove "typed".
* Make opcode 0x00 `unreachable`. (#684) Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode. All-zeros is one of the more common patterns of corrupted data. This change makes it more likely that code that is accidentally zeroed, in whole or in part, will be noticed when executed rather than silently running through a nop slide. Obviously, this doesn't matter when an opcode table is present, but if there is a default opcode table, it would presumably use the opcodes defined here.
* BinaryEncoding.md changes implied by #682
* Fix thinko in import section
* Rename definition_kind to external_kind for precision
* Rename resizable_definition to resizable_limits
* Add opcode delimiter to init_expr
* Add Elem section to ToC and move it before Data section to reflect Table going before Memory
* Add missing init_expr to global variables and undo the grouped representation of globals
* Note that only immutable globals can be exported
* Change the other 'mutability' flag to 'varuint1'
* Give 'anyfunc' its own opcode
* Add note about immutable global import requirement
* Remove explicit 'default' flag; make memory/table default by default
* Change (get|set)_global opcodes
* Add end opcode to functions
* Use section codes instead of section names (rebasing onto 0xC instead of master) This PR proposes using section codes for known sections, which is more compact and easier to check in a decoder. It allows for user-defined sections that have string names to be encoded in the same manner as before. The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being.
* Use LEB for br_table (#738)
* Describe operand order of call_indirect (#758)
* Remove arities from call/return (#748)
* Limit varint sizes in Binary Encoding. (#764)
* Global section (#771) global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections.
* Make name section a user-string section.
* Update BinaryEncoding.md
* Update BinaryEncoding.md
* Use positive section code byte
* Remove specification of name strings for unknown sections
* Update BinaryEncoding.md
* Remove repetition in definition of var(u)int types (#768)
* Fix typo (#781)
* Move the element section before the code section (#779)
* Binary format identifier is out of date (#785)
* Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800)
* Add string back
* Block signatures (#765)
* Replace branch arities with block and if signatures. Moving arities to blocks has the nice property of giving implementations useful information up front, however some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front.
* Remove the arity operand from br_table too.
* Remove mentions of "arguments".
* Make string part of the payload
* Remove references to post-order AST in BinaryEncoding.md (#801)
* Simplify loop by removing its exit label. This removes loop's bottom label.
* Move description of `return` to correct column (#804)
* type correction and missing close quote (#805)
* Remove more references to AST (#806)
* Remove reference to AST in JS.md Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now.
* Use "instruction" instead of "AST operator"
* Update rationale for stack machine
* Update Rationale.md
* Update discussion of expression trees
* Update MVP.md
* Update Rationale.md
* Update Rationale.md
* Remove references to expressions
* Update Rationale.md
* Update Rationale.md
* Address review comments
* Address review comments
* Address review comments
* Delete h

sunfishcode mentioned this pull request Sep 1, 2016

Terminator operators that imply "end" #778

Closed

titzer mentioned this pull request Sep 6, 2016

Move arities from branches to blocks #741

Closed

sunfishcode force-pushed the block-sigs branch from 7b474d7 to 66311e0 Compare September 9, 2016 03:18

sunfishcode force-pushed the block-sigs branch from 66311e0 to fb1f010 Compare September 9, 2016 03:19

sunfishcode added this to the MVP milestone Sep 9, 2016

ghost reviewed Sep 9, 2016

Remove the arity operand from br_table too.

dcd2a3a

rossberg mentioned this pull request Sep 9, 2016

Implement block signatures WebAssembly/spec#336

Merged

titzer reviewed Sep 12, 2016

Remove mentions of "arguments".

2bee76f

sunfishcode mentioned this pull request Sep 13, 2016

Remove loop's result value(s). #742

Closed

sunfishcode added the binary format label Sep 13, 2016

sunfishcode merged commit 477a2a2 into binary_0xc Sep 14, 2016

sunfishcode deleted the block-sigs branch September 14, 2016 19:09

taralx mentioned this pull request May 30, 2020

Type annotations on new instructions WebAssembly/function-references#27

Closed

RossTate mentioned this pull request Jun 11, 2020

Block-type annotations for branch-unrelated block-like instructions are redundant #1352

Open
