Text format: type definition abbreviations #333

tlively · 2022-11-01T22:06:13Z

Since we write a lot of text format tests in Binaryen, we've implemented a number of abbreviations to make type definitions shorter. Here's the grammar I've implemented in the new Binaryen parser, which combines the text format used in this repo with the abbreviations used in Binaryen.

deftype ::= '(' 'rec' subtype* ')'
          | subtype

subtype ::= '(' 'type' id? '(' 'sub' typeidx? strtype ')' ')'
          | '(' 'type' id? strtype ')'

strtype ::= functype
          | structtype
          | arraytype

functype ::= ... ;; same as MVP

structtype ::= '(' 'struct' field* ')'

arraytype ::= '(' 'array' field ')'

field ::= '(' 'field' id? fieldtype ')'
        | '(' 'field' fieldtype* ')'

fieldtype ::= storagetype
            | '(' 'mut' storagetype ')'

storagetype ::= valtype | packedtype

packedtype ::= 'i8' | 'i16'

Comments and suggested edits welcome. Once it looks good, we should document it in the MVP doc. For the actual spec, we'll have to extract the abbreviations from the grammar and turn them into rewrite rules.
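
One way to read "turn them into rewrite rules" is as a desugaring pass that expands each abbreviated definition into the maximal form. The Python sketch below is illustrative only: `parse`, `show`, `expand_fields`, `expand_subtype`, and `expand_deftype` are hypothetical names, not Binaryen or spec code. It assumes input without comments and does not model finality.

```python
import re

def parse(text):
    """Parse one s-expression into nested Python lists of string atoms."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing ")"
        return items

    return read()

def show(node):
    """Render nested lists back into s-expression text."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(show(n) for n in node) + ")"

def expand_fields(fields):
    """Split (field t1 t2 ...) into one (field t) per type; keep named fields."""
    out = []
    for f in fields:
        body = f[1:]
        if body and isinstance(body[0], str) and body[0].startswith("$"):
            out.append(f)  # named field such as (field $x i32)
        else:
            out.extend(["field", t] for t in body)
    return out

def expand_subtype(node):
    """Rewrite (type id? strtype) into (type id? (sub strtype))."""
    assert node[0] == "type"
    rest = node[1:]
    ident = []
    if isinstance(rest[0], str):  # optional $id
        ident, rest = [rest[0]], rest[1:]
    body = rest[0]
    if body[0] != "sub":
        body = ["sub", body]  # make the (empty) supertype list explicit
    strtype = body[-1]
    if strtype[0] in ("struct", "array"):
        strtype = [strtype[0]] + expand_fields(strtype[1:])
    return ["type"] + ident + [body[:-1] + [strtype]]

def expand_deftype(node):
    """Wrap a lone subtype in a singleton rec group; expand each member."""
    if node[0] == "rec":
        return ["rec"] + [expand_subtype(s) for s in node[1:]]
    return ["rec", expand_subtype(node)]
```

For example, `show(expand_deftype(parse("(type (struct (field i32 i64)))")))` yields `(rec (type (sub (struct (field i32) (field i64)))))`.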


sbc100 · 2022-11-01T22:16:12Z

What is strtype? Presumably nothing to do with strings, but it certainly reads like it is. Maybe a better name?

tlively · 2022-11-01T22:44:38Z

strtype is short for "structural type" and comes directly from the proposal docs: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions. I agree it would be nice to find a clearer name or abbreviation for those, but I do want to keep the explanation of the text format in sync with the abstract syntax.

rossberg · 2022-11-09T09:42:13Z

Actually, strtype stands for "structured type", i.e., a compound type like structs, funcs, arrays, as opposed to scalar types like ints and floats.

tlively · 2022-11-10T18:23:22Z

There is also the issue of using symbolic references to struct fields, as discussed here: #193 (comment). I don't believe Binaryen's parser currently allows symbolic field references, but I agree they would be nice to have. Since we ended up keeping the type annotations on struct.get and friends, we could keep the dependent name lookup behavior where each struct type gets its own index space of types. @rossberg, wdyt? What if separate definitions of the same type contain conflicting field names?

rossberg · 2022-11-22T11:23:06Z

We could, but personally, I don't find it worth the added complexity. Currently, conflicting field symbols are simply an error, so you prefix them to disambiguate, like in assembly and the good old C days. : )

tlively · 2022-11-22T14:23:43Z

We should consider support for un-prefixed names a requirement for making debugging in browser devtools feasible. Binaryen supports this today (although it doesn’t allow using field names symbolically in the text), so losing this capability would regress the debugging experience. As we get closer to productionizing WasmGC, debugging has become an important area of user concern.

rossberg · 2022-11-22T14:28:52Z

> Binaryen supports this today (although it doesn’t allow using field names symbolically in the text)

I'm confused, where does it support them then?

tlively · 2022-11-22T14:59:38Z

Field names in type definitions are parsed and end up in the names section, but the parser doesn’t support using those names as field indices in instructions.

titzer · 2022-11-22T15:46:37Z

Now that we've settled on keeping an immediate for both the struct index and the field index for struct.get and struct.set, it seems logical to me for the text format to reflect that. In particular, a standard two-level namespace of struct names and field names e.g. $S.$f in the text format seems convenient and would have less surprise for most users than requiring field name disambiguation via manual prefixing.
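
The two lookup schemes discussed here can be contrasted with a rough Python model. Everything in it is an assumption made for illustration: `structs`, `build_flat_namespace`, `resolve_dependent`, and `FieldNameError` are hypothetical names, and real parsers may organize their symbol tables differently.

```python
class FieldNameError(Exception):
    """Raised when a field name cannot be used unambiguously."""

def build_flat_namespace(structs):
    """Single module-wide namespace: a repeated field name is an error."""
    flat = {}
    for struct_name, fields in structs.items():
        for index, field_name in enumerate(fields):
            if field_name in flat:
                raise FieldNameError(f"duplicate field name {field_name}")
            flat[field_name] = index
    return flat

def resolve_dependent(structs, struct_name, field_name):
    """Two-level lookup: resolve the field within the annotated struct type."""
    return structs[struct_name].index(field_name)

# Two struct types that both use the field name $x.
structs = {"$point": ["$x", "$y"], "$size": ["$w", "$x"]}
```

With the two-level scheme, `resolve_dependent(structs, "$size", "$x")` finds index 1 and `resolve_dependent(structs, "$point", "$x")` finds index 0. Under the flat scheme, `build_flat_namespace(structs)` raises, so the names would need prefixes such as `$point_x`.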

tlively · 2023-03-22T20:12:54Z

I filed #361 to continue the discussion about whether we should have dependent field name lookup.

As promised, here are some examples of the abbreviations I suggested allowing in the opening post. Most are probably uncontroversial.

;; maximal type definition
(rec (type (sub (struct (field i32) (field i64)))))

;; abbreviate out singleton rec group
(type (sub (struct (field i32) (field i64))))

;; abbreviate out declaration of zero supertypes
(type (struct (field i32) (field i64)))

;; combine field declarations
(type (struct (field i32 i64)))

;; abbreviate out 'field' -- note: no longer allowed
(type (struct i32 i64))

rossberg · 2023-03-22T20:33:45Z

These all sound good to me and are already supported by the interpreter. The only exception is the last one, which I'm a bit skeptical about. It has no real parallel anywhere else so far, and it might become a debt when we later want to add more elements to struct declarations (e.g. descriptors or static fields like we discussed).

tlively · 2023-03-22T21:51:15Z

Fair enough. I would be fine dropping the last one.

eqrion · 2023-04-06T16:06:56Z

@tlively Could you post an updated grammar for abbreviations?

Your example (sub (struct (field i32) (field i64))) is not possible in the grammar you posted at the beginning.

Also, your third and fourth abbreviations, which use (struct ...) without wrapping it in (type), seem odd to me. We couldn't allow that for function types, or else they'd be confused with function definitions. So allowing that in structs/arrays would be asymmetrical.

tlively · 2023-04-06T16:27:58Z

I've updated the grammar in the opening post to remove the abbreviation that allows having a list of field types without (field ...). I've also updated the examples a few posts up to add the missing (type ...) declarations, which are not intended to be possible to remove.

eqrion · 2023-04-06T16:35:58Z

Okay, the updated grammar LGTM.

Your previous comment with examples still doesn't seem to match the grammar, though, and I just want to make sure I understand correctly. My understanding of the proposed grammar is that the keyword nesting always goes 'rec', 'type', and then 'sub', where 'rec' and 'sub' may be omitted but 'type' is always present.

tlively · 2023-04-06T16:49:31Z

Yes, your understanding is correct and my examples were wrong (again). I fixed the examples now.

eqrion · 2023-05-08T16:05:04Z

Another issue here is the 'final' keyword. My understanding from MVP.md is that for pre-existing definitions of the form (type ...) we default to final=true. But otherwise there is an optional final flag in sub: (sub final? ...).

This is sort of weird, because to define a final type definition that has no super type you need to declare it like:

(sub final (type (struct)))

And that makes the sub keyword just noise, this type is not declaring itself as a subtype. It also means that wrapping (sub) around an existing type definition without also adding final into it is actually a semantic change to the type definition.

This makes me wonder if the 'final' keyword should be inverted so the absence of sub matches the absence of specifying the flag in sub. I'm not sure what a good keyword would be though, nofinal is not great.

rossberg · 2023-05-09T08:34:58Z

@eqrion, unfortunately, we cannot change the meaning of the existing text format, plus we want to treat preexisting type declarations as final (in order to keep call_indirect unaffected). That pretty much necessitates the current syntax and interpretation, which also more or less matches what happens in the binary format.

Edit: After rereading your comment I realised that I did not address your actual suggestion. Yes, we could invert the keyword, but that would add a lot of noise to the vastly more common case. And it would be the opposite of how it is presented in all other languages. So I'm not sure we really want to do that.

If it helps, my suggested reading of sub is "participates in the subtype relation". A special case of that is "final-sub", which just happens to be written sub final.

tlively · 2023-05-09T14:56:55Z

> This is sort of weird, because to define a final type definition that has no super type you need to declare it like:
> (sub final (type (struct)))

Can't this be shortened to (type (struct)), since the default is final=true? IIUC, the weird case is the opposite, where you define a non-final type with no supertype, which would be (sub (type (struct))). That's less verbose and is amenable to the interpretation of "sub" @rossberg gave, so it seems fine to me.
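
The finality rules being debated can be summarized in a small Python sketch. It is only a model of the interpretation described in this thread, not interpreter or Binaryen code: `describe` is a hypothetical helper, definitions are given as pre-parsed nested lists, and the nesting follows the grammar from the opening post, with `sub` inside `type`.

```python
def describe(typedef):
    """Return (is_final, supertypes) for a ['type', id?, body] definition."""
    body = typedef[-1]
    if body[0] != "sub":
        return True, []  # plain (type X): final, no supertypes
    flags_and_supers = body[1:-1]  # everything between 'sub' and the strtype
    is_final = "final" in flags_and_supers
    supers = [s for s in flags_and_supers if s != "final"]
    return is_final, supers
```

Under this model, `describe(["type", ["struct"]])` and `describe(["type", ["sub", "final", ["struct"]]])` both give `(True, [])`, while wrapping in a bare sub, `describe(["type", ["sub", ["struct"]]])`, gives `(False, [])`.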

eqrion · 2023-05-16T20:23:00Z

The odd thing to me is that wrapping a plain type definition (type (struct)) in an empty sub (sub (type (struct))) results in inverting the final attribute. Even though it looks like a no-op or at least a change completely unrelated to 'final'. From a text syntax perspective, that's confusing.

I agree though that final=false will probably be the common case for 'sub', so inverting it would be verbose. I'm not sure that's an issue though, the text format is already very verbose. And the advantage is that the absence of 'nofinal' or 'extensible' would syntactically match the plain type definition syntax we have.

bvisness · 2023-07-19T16:05:34Z

The current text syntax has been causing some confusion over here at Mozilla. In particular, we had confusion over why a particular type was not extensible and why an empty (sub would do anything at all, in situations like this for example:

(type $s1 (sub (struct))) ;; me an hour ago: "is this a subtype of (struct) somehow...?"
(type $s2 (sub $s1 (struct (field i32))))

In my opinion the current syntax conflates two independent concepts: finality and base types. It's not clear that sub changes whether a type is final or not, which makes the existence of sub final surprising and the use of a bare sub even more surprising. (I have always read sub $x as "subtype of $x", making a bare sub nonsensical.) These two concepts are quite reasonably combined in the binary encoding, but I think in the text format they ought to be treated separately.

I would propose the following syntax for type definitions instead:

(type $s1 (struct))                             ;; final with no parents
(type $s2 open (struct))                        ;; non-final with no parents
(type $s3 (sub $s2) (struct (field i32)))       ;; final with one parent
(type $s4 open (sub $s2) (struct (field i32)))  ;; non-final with one parent

You can see I suggest the term "open" for "non-final" - this naturally follows from familiar phrases like "open to extension", and is the terminology used by Kotlin (which is also final-by-default).

I also suggest that (sub) not wrap the type's structure definition. This is to make it clearly independent and obviously optional. It also more naturally handles the potential future where types can have multiple base types, which the binary encoding already supports: (sub $t1 $t2 ...)

With this scheme, types would remain final by default. It's true that this would add noise to GC type hierarchy definitions, but I think it's better to be slightly more explicit than to have finality flip-flopping back and forth as you add sub and then sub final.

Curious for your thoughts.

tlively · 2023-07-19T18:36:29Z

I had never heard of "open" used for this before and I don't think it's immediately clear what that means without further explanation. I'd be open to it, though, and at least there is precedent.

Having the (sub ...) wrap the type in the fully expanded text format has the benefit of matching the currently specified abstract syntax and concrete binary syntax (including in the case of 0-length supertype vectors), so I would prefer not to change that. I guess the smallest possible change would be to keep (sub final ...) as-is and change the current (sub ...) to (sub open ...) so that sub never appears on its own. What would folks think of that?

rossberg · 2023-07-20T11:17:28Z

Agreed with @tlively, changing the syntactic structure would create a mismatch with abstract and binary syntax, which the text format is supposed to reflect.

Not too excited about requiring open, but I could be convinced if there is overwhelming enthusiasm for that. But somebody would need to volunteer to make all the changes to spec, interpreter, tests, etc. ;)

bvisness · 2023-07-21T16:09:31Z

In that case, I think it would still be an improvement to use (sub open instead of (sub final. To my knowledge, that would align equally well with the binary encoding, but would avoid the surprising problem of requiring an empty (sub to change the finality of a type. It also would probably be an easier transition from the syntax used today, if that matters.

So to be clear, that would look like:

(type $s1 (struct))                             ;; final with no parents
(type $s2 (sub open (struct)))                  ;; non-final with no parents
(type $s3 (sub $s2 (struct (field i32))))       ;; final with one parent
(type $s4 (sub open $s2 (struct (field i32))))  ;; non-final with one parent

In this world, an empty (sub would not be useful, which is fine, because I think it is inherently confusing. The presence of the word open makes it clear that something about the type is changing even if no base types are specified.
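
To make the transition concrete, here is a minimal Python sketch that rewrites a definition from the `sub final` style into the `sub open` style proposed in this comment. It is a sketch of the proposal as stated here, not of any shipped tool: `to_open_syntax` is a hypothetical name, and definitions are assumed to be pre-parsed nested lists.

```python
def to_open_syntax(typedef):
    """Rewrite ['type', id?, body] from `sub final` style to `sub open` style."""
    *prefix, body = typedef
    if body[0] != "sub":
        return typedef  # plain (type X) stays final in both styles
    rest = body[1:]
    if rest and rest[0] == "final":
        # (sub final ...) becomes (sub ...): final is the default in the new style
        return prefix + [["sub"] + rest[1:]]
    # a non-final (sub ...) must now say so explicitly
    return prefix + [["sub", "open"] + rest]
```

For instance, `["type", "$s2", ["sub", ["struct"]]]` becomes `["type", "$s2", ["sub", "open", ["struct"]]]`, which matches the `$s2` line in the example above.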

If others agreed with this, I would be happy to try and update the spec, interpreter, tests, etc. myself, with the disclaimer that it would be my first time doing such things 🙂

tlively · 2023-07-21T16:17:49Z

Oh interesting, I was thinking of having (sub open ...) and (sub final ...) be the two forms, but you have (sub open ...) and (sub ...). I think your way makes sense 👍

A PR updating everything for this change would be very welcome.

Also note that it should still be valid to have this:

(type $s5 (sub (struct))) ;; final with no parents, non-abbreviated form of $s1

tlively · 2023-07-26T16:17:16Z

@bvisness, are you still planning to take a stab at implementing this? If you need any help or cannot prioritize this, let me know and I can help out.

bvisness · 2023-07-27T01:59:22Z

Yes, I was planning to do this roughly next week or so. If it's holding anything up, though, I'd be happy to jump on it sooner.

Another thing I just realized - while I would prefer to have (sub for final and (sub open for non-final, this does invert the current meaning of (sub. Is that a problem at this stage? If so, I think it would have to be (sub final and (sub open so that there is no ambiguity...but I hope that the text format for GC is still not really in wide enough use for that to be a concern.

tlively · 2023-07-27T15:08:16Z

No, it's not that urgent, next week would be fine. We just need to make sure we finish up well before September.

This change should be fine regarding back compat. Anyone depending on the text format would be using it as an input to Binaryen, and I can handle coordinating with users and updating Binaryen separately from this spec update.

tlively added the spec notes label Nov 15, 2022

tlively mentioned this issue Mar 22, 2023

Text format: field names #361

Closed

tlively added requires discussion and removed spec notes labels Mar 22, 2023

tlively added spec notes and removed requires discussion labels May 2, 2023

tlively mentioned this issue Jul 24, 2023

Call for agenda for July 25 GC subgroup meeting #410

Closed

bvisness mentioned this issue Jul 28, 2023

Use open instead of final in the text format #413

Closed

bvisness mentioned this issue Aug 10, 2023

Update wast to align with GC spec tests bytecodealliance/wasm-tools#1140

Merged

rossberg closed this as completed Sep 12, 2023
