-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block signatures #765
Block signatures #765
Conversation
What I like about this change is:
|
lgtm, this is exactly what I wanted! |
Could some of the common high frequency block signatures be given specified indexes. For example could block signature index 0 be zero result values, and could there be specified indexes for the single value core types, so index 1 be (i32), index 2 be (i64), etc. Further could it be invalid to have a duplicate signature in this table, to force a canonical encoding. This might help the text format a lot because there would be canonical encodings for the common cases, and the text format might then not even need to encode the signature index in most cases and might still derive the signature from the text syntax. |
If the signatures could also be required to be in a sorted order this might be even better. Then there is only one canonical table for a set of signatures and the validator can check this by simply checking that they are in order. |
Maybe there is an encoding trick that we could use to obviate the need for signature entries until the multi-value era. E.g. what if the immediate is an inline value type (i.e. void, i32, i64, f32, f64), with one encoding being an index into the type table to be used when we have multi-values? |
If indexes are predefined and specified for the block types used in the MVP then the table will not be needed for the MVP, and there will be a canonical index for them. Perhaps consider sorting these in an order that will scale to multiple-values. |
Ben and I have discussed the design space more extensively. There are 3 main proposals on the table now. Here are the pros and cons as we see them:
Overall, Ben and I still think option (1) remains the most attractive. We have implemented it, understand its pros and cons best, and don't need to worry about unforeseen implications late in the game. The trade-offs of the others are less obvious, and their advantages appear small (and biased towards consumers). |
I'd also like to point out one further advantage of the status quo that I just ran into: for the linear text format (like will be visible in a debugger) having arities on branches is somewhat more user-friendly, because it is directly apparent how many values a branch operator consumes -- no need to search for the target (or rely on tooling). The information is where it belongs, that is, where it is used. |
If the changes are quite simple, I don't think the bullet "Requires changing implementations" should be considered a con given that we are in the middle of changing this all anyway for 0xc.
@titzer's idea above seems fine as a way to avoid the indirection, but again the cost here is low, especially since we know we're adding more forms to the type section anyway (that's the point of the
If there was a sixth, lowest consituency, I think this would be it.
I can't see any reason why we wouldn't want that for the same reason we want them for the current set of block constructs. Also, other than, it sounds like, a try/finally block for which the type annotations seem fine, are we really anticipating a bunch more blocks?
Are there any concrete examples of such a transformation that is valuable and can't be bothered to understand types? We put types on practically everything else, it seems hard to get away with being type-oblivious for anything non-trivial.
How is this any bigger than adding arity? |
@rossberg-chromium Branch-with-value / block-with-signature is no different than call / function-signature. For both, devtools could provide tooltips, highlighted, etc to make the information more immediate. |
For the text format, there may be as much advantage as disadvantage: to the reader there may be some value in seeing up front what type a block will produce, even when there are no branches out. And type inference can be supported for anyone writing wasm, just as we're looking at supporting infix |
@rossberg-chromium One key point, and perhaps even a show stopper, seems to be missing in the list of pros and cons: We still need arity annotations on the fall-though for multi-value support. The strategy of just returning all the values remaining on the block at the end of the block will significantly frustrate code using multiple values, or with |
@qwertie I agree that some text formats might want to annotate the block with their return type, but it would only frustrate the text format if they were required to annotate them all with the index too which would be required to be lossless. Thus my suggesting that they be required to be in sorted order so there is always a canonical order and so the text format does not need to annotate them with the signature index too even if it annotates them with the type. |
@rossberg-chromium I agree that a text format will be more familiar and readable if the number of result values are on the branches and even using |
On 17 August 2016 at 18:09, Luke Wagner notifications@github.com wrote:
Requires extending type section?
It's not a proper type, though, so would represent an anomaly in the type Requires changing text format and many tests.
Since you cite that hierarchy it's worth noting that it also puts "users" Also, the text format actually is user-facing. Forces annotations on all future blockish constructs.
It's extra redundancy, and it puts the cost on the wrong side: the
No, that's not right. We don't have type annotations on any generic There are e.g. many kind of peephole optimisations that don't require Least size-efficient without additional measures.
|
I'm saying it's in the area that is being changed anyway, so it's a change one way or another. Anyhow, it's very little work, mostly involves simplifying existing code, so this shouldn't be a deterrent regardless.
Not every type in the types section will be able to show up in every context where an index into the types section is specified so I don't think it makes sense to say it's not a "proper type". It's not like we're representing some cartesian closed category here ;)
Agreed and desired, for the same reason as adding block signatures in the first place.
If size is an actual concern, we can have the specialized opcodes. But I expect layer 1 compression would obviate the need.
What I mean is that the vast majority of ops are not generic.
Possibly, but I don't think we should constrain ourselves based on this level of speculation.
Since we're aiming at low-level clarity, not brevity, nor optimizing ease of writing by hand, with the current linear text format, I think it's actually useful to have the types returned by a block listed up front (just like having local types listed up front). For the experimental sugared syntaxes, it seems natural to drop these annotations when they are implied by the types of branches/fallthrough.
Let's not consider inlining the entire type vector into each block. A few bytes of type section entries are negligible (and avoidable in the MVP). That leaves the size effectively equivalent to arities-on-blocks. We can measure the overall delta, though. |
On Thu, Aug 18, 2016 at 5:28 PM, Luke Wagner notifications@github.com
So one alternative, arities-only, seems to be out. That leaves type-annotated blockish things (our set is now block, if, I'm in the middle of implementing block type annotations throughout V8 on
There are a couple of negatives that I already ran into, e.g. that one Other weird things are that with multi-values, br_table is actually Before there was a nice bottleneck in the decoder where all merge-y
|
Great, and thanks for the experimentation. We're also experimenting with how this works so we'll be able to compare notes in a bit.
It's not a types section, it'd just be a new |
Re: 'br_table is actually polymorphic', it seems appropriate to 'require that the arity of all blocks targeted in a br_table match.' Perhaps it is a the little extra work for the interpreter, to look at the target block's number of required values rather than for this to be an immediate argument, but this would have been necessary anyway for the fall-through, or the I see problems if the index space of the block types is shared with all other types, it would make having a canonical order even more problematic, and it might have a negative impact on code compression, so a separate table seems needed. |
For the type section, we could share the same type as Function types, but with no params. That way we could share an entry for blocks and functions if they happen to have the same signature.
I don't really understand how having the signature ahead of time would make decoding more complicated. For the binary space, one could argue that if we keep arity on br then 1 block with multiple branches would have to spend 1 byte on every branches where as having that byte on the block would cost 1 byte for all the branches. Plus, if we really want to save more space, we could have the *0 opcodes to means there are no values. |
On Thu, Aug 18, 2016 at 7:12 PM, Luke Wagner notifications@github.com
|
@titzer Ok; it'd just be a size optimization, similar to not including the signature of a |
@Cellule, the size argument you are making is exactly the one I was making when I originally proposed this. However, it is invalidated by two things: (1) However, if we want to go this way (I still prefer we don't) I like the suggestion to reuse function signatures. That may in fact be more future-proof, because we might want to allow block operators to also consume values from the stack eventually. |
@lukewagner, getting a bit OT, but ever since we moved to structural types, and after looking at the design space for managed types, I have wondered about the distinction we are making between inline types and defined types. Once you add more type structure, it becomes increasingly ad-hoc. Is there any particular reason left why we cannot make it uniform? That is, you can specify all types inline, but you can also name all types in the type section? Then the type section would just be a means of abbreviating common type expressions to save space. We could use the positive/negative index space for ids vs constructors that we discussed at some point to keep the encoding maximally compact (and fast). |
@rossberg-chromium Actually, function signatures may be the right thing here: in the MVP, input arity would be constrained to be 0, but post-MVP, we were realizing (as were you, in a recent email :) that there is no reason that blocks can't be given arguments in a consistent way (representing values on the stack that are popped by the block). Arguments even make sense for |
Just some numbers for context from the AngryBots.wasm and BananaBread. Removing the arity from branches and introducing type signatures (typically one byte inline as I described in a previous comment), we end up with: (apologies for the formatting) binji edit: tried to MD this:
|
I'm in favor of declaring a signature for blocks up front for the simplification in validator state.
There may be a bunch of superfluous blocks in these test cases, since if this is just moving arity from
I like this idea (I would use it for loops), but it makes me wonder: is there a good reason for function arguments to remain local variables, or should they just be implicitly pushed onto the operand stack on function entry? |
After having implemented this, I've come around to this idea. If we can move the block types inline for the common cases of void, i32, i64, f32, and f64, then this will LGTM. |
Thinking about #778, I realized that with block signatures, |
@AndrewScheidecker Right, so we are back to the old (current) block semantics, the fall through unwinds the stack, and some values can remain on the block value stack which might avoid some uses of |
@rossberg-chromium Can't both allow block starts to be anywhere and have breaks unwind. Having blocks unwind matches the restriction inherent in the text format lexical blocks so at this point I would strongly object to removing the 'unwind' semantics of blocks and it seems a key feature of the wasm design, something we are doing 'better' differently to CIL and may other VMs. But it does mean that the following are very different:
versus
Even with the suggested @sunfishcode Yeh, the |
@JSStats, I didn't say that you can insert blocks anywhere. That was never the case, regardless of That being said, there is the idea that block annotations should be function types, in which case they could later be allowed to consume stack, like e.g. in your example:
This may be useful for macrofication and other things. |
7b474d7
to
66311e0
Compare
Moving arities to blocks has the nice property of giving implementations useful information up front, however some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front.
66311e0
to
fb1f010
Compare
Per feedback, this PR is now updated to use inline signature immediates for the common cases of void, i32, i64, f32, and f64. |
| `block` | `0x01` | | begin a sequence of expressions, the last of which yields a value | | ||
| `loop` | `0x02` | | begin a block which can also form control flow loops | | ||
| `if` | `0x03` | | begin if expression | | ||
| `block` | `0x01` | sig : `inline_signature_type` | begin a sequence of expressions, yielding 0 or 1 values | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There does not appear to be any compelling reason to restrict blocks to returning only '0 or 1 values` in the MVP. It is probably a small matter for the runtimes to be able to return multiple values here? But don't let that hold this up, can always revisit how it is going.
Also no mention if a function signature with arguments is usable in the MVP? Also seems a small matter, just not sure if it is really necessary yet.
Yeah, it's still not clear to me how we gonna future-proof this encoding for multiple values etc. But I guess we can still rectify that in 0xd. So to move forward for now: lgtm modulo br_table oversight. |
Fixed the |
https://bugs.webkit.org/show_bug.cgi?id=161778 Reviewed by Michael Saboff. This patch makes some major changes to the way that the WASM function parser works. First, the control stack has been moved from the parser's context to the parser itself. This simplifies the way that the parser works and allows us to make the decoder iterative rather than recursive. Since the control stack has been moved to the parser, any context operation that refers to some block now receives that block by reference. For any if block, regardless of whether or not it is an if-then-else or not, we will allocate both the entire control flow diamond. This is not a major issue in the if-then case since B3 will immediately cleanup these blocks. In order to support if-then and if-then-else we needed to be able to distinguish what the type of the top block on the control stack is. This will be necessary when validating the else opcode in the future. In the B3 IR generator we decide to the type of the block strictly by the shape. Currently, if blocks don't handle passed and returned stack values correctly. I plan to fix this when I add support for the block signatures. See: WebAssembly/design#765 * testWASM.cpp: (runWASMTests): * wasm/WASMB3IRGenerator.cpp: (dumpProcedure): (JSC::WASM::parseAndCompile): * wasm/WASMB3IRGenerator.h: * wasm/WASMFunctionParser.h: (JSC::WASM::FunctionParser<Context>::parseBlock): (JSC::WASM::FunctionParser<Context>::parseExpression): (JSC::WASM::FunctionParser<Context>::parseUnreachableExpression): * wasm/WASMOps.h: git-svn-id: http://svn.webkit.org/repository/webkit/trunk@205769 268f45cc-cd09-0410-ab3c-d52691b4dbfc
The actual change here looks good to me, but can we simplify the text to leave out the discussion of arguments? We can always add that later when it becomes relevant. |
@@ -56,6 +56,14 @@ A single-byte unsigned integer indicating a [value type](AstSemantics.md#types). | |||
* `3` indicating type `f32` | |||
* `4` indicating type `f64` | |||
|
|||
### `inline_signature_type` | |||
A single-byte unsigned integer indicating a signature. These types are encoded as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be simpler to explain in terms of a single result time, since the signatures won't be relevant until some future post-MVP version.
lgtm |
Merging with lgtms above. |
Implements WebAssembly/design#765; specifically: - Adds block signatures (syntax: (block i32 ...) etc) - Removes arities from branches - Also simplifies if syntax: the label is on if now instead of the children, in order to be consistent -with the signature - Adjusts typing - Adapts all tests (phew...)
* Clarify that wasm may be viewed as either an AST or a stack machine. (#686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed". * Make opcode 0x00 `unreachable`. (#684) Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode. All-zeros is one of the more common patterns of corrupted data. This change makes it more likely that code that is accidentally zeroed, in whole or in part, will be noticed when executed rather than silently running through a nop slide. Obviously, this doesn't matter when an opcode table is present, but if there is a default opcode table, it would presumably use the opcodes defined here. * BinaryEncoding.md changes implied by #682 * Fix thinko in import section * Rename definition_kind to external_kind for precision * Rename resizable_definition to resizable_limits * Add opcode delimiter to init_expr * Add Elem section to ToC and move it before Data section to reflect Table going before Memory * Add missing init_expr to global variables and undo the grouped representation of globals * Note that only immutable globals can be exported * Change the other 'mutability' flag to 'varuint1' * Give 'anyfunc' its own opcode * Add note about immutable global import requirement * Remove explicit 'default' flag; make memory/table default by default * Change (get|set)_global opcodes * Add end opcode to functions * Use section codes instead of section names (rebasing onto 0xC instead of master) This PR proposes uses section codes for known sections, which is more compact and easier to check in a decoder. It allows for user-defined sections that have string names to be encoded in the same manner as before. The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being. * Use LEB for br_table (#738) * Describe operand order of call_indirect (#758) * Remove arities from call/return (#748) * Limit varint sizes in Binary Encoding. (#764) * Global section (#771) global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections. * Make name section a user-string section. * Update BinaryEncoding.md * Update BinaryEncoding.md * Use positive section code byte * Remove specification of name strings for unknown sections * Update BinaryEncoding.md * Remove repetition in definition of var(u)int types (#768) * Fix typo (#781) * Move the element section before the code section (#779) * Binary format identifier is out of date (#785) * Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800) * Add string back * Block signatures (#765) * Replace branch arities with block and if signatures. Moving arities to blocks has the nice property of giving implementations useful information up front, however some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front. * Remove the arity operand from br_table too. * Remove mentions of "arguments". * Make string part of the payload * Remove references to post-order AST in BinaryEncoding.md (#801) * Simplify loop by removing its exit label. This removes loop's bottom label. * Move description of `return` to correct column (#804) * type correction and missing close quote (#805) * Remove more references to AST (#806) * Remove reference to AST in JS.md Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now. * Use "instruction" instead of "AST operator" * Update rationale for stack machine * Update Rationale.md * Update discussion of expression trees * Update MVP.md * Update Rationale.md * Update Rationale.md * Remove references to expressions * Update Rationale.md * Update Rationale.md * Address review comments * Address review comments * Address review comments * Delete h
https://bugs.webkit.org/show_bug.cgi?id=161778 Reviewed by Michael Saboff. This patch makes some major changes to the way that the WASM function parser works. First, the control stack has been moved from the parser's context to the parser itself. This simplifies the way that the parser works and allows us to make the decoder iterative rather than recursive. Since the control stack has been moved to the parser, any context operation that refers to some block now receives that block by reference. For any if block, regardless of whether or not it is an if-then-else or not, we will allocate both the entire control flow diamond. This is not a major issue in the if-then case since B3 will immediately cleanup these blocks. In order to support if-then and if-then-else we needed to be able to distinguish what the type of the top block on the control stack is. This will be necessary when validating the else opcode in the future. In the B3 IR generator we decide to the type of the block strictly by the shape. Currently, if blocks don't handle passed and returned stack values correctly. I plan to fix this when I add support for the block signatures. See: WebAssembly/design#765 * testWASM.cpp: (runWASMTests): * wasm/WASMB3IRGenerator.cpp: (dumpProcedure): (JSC::WASM::parseAndCompile): * wasm/WASMB3IRGenerator.h: * wasm/WASMFunctionParser.h: (JSC::WASM::FunctionParser<Context>::parseBlock): (JSC::WASM::FunctionParser<Context>::parseExpression): (JSC::WASM::FunctionParser<Context>::parseUnreachableExpression): * wasm/WASMOps.h: Canonical link: https://commits.webkit.org/179983@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@205769 268f45cc-cd09-0410-ab3c-d52691b4dbfc
#741 proposed moving the arity immediates from branches to block/if, which has the advantage of giving decoders more information up front, however the arities alone aren't enough for some use cases. This PR implements @MikeHolman's suggestion to provide full type information up front.
This proposes a new "block signature" type to the Type section, so
block
/if
only need a single index immediate to specify their full type, even if they are extended to returning multiple values post-MVP.This gives us a greater benefit than #741 did at approximately the same code size cost -- one immediate per
block
andif
. And if code size of the immediates becomes a concern, we can introduce anif0
(and similar opcodes for future block-like opcodes).In my experiments, the decode time cost of indirecting into the type table to obtain the signature data was not significant.