Text format: type definition abbreviations #333

Closed
tlively opened this issue Nov 1, 2022 · 28 comments

@tlively
Member

tlively commented Nov 1, 2022

Since we write a lot of text format tests in Binaryen, we've implemented a number of abbreviations to make type definitions shorter. Here's the grammar I've implemented in the new Binaryen parser, which combines the text format used in this repo with the abbreviations used in Binaryen.

deftype ::= '(' 'rec' subtype* ')'
          | subtype

subtype ::= '(' 'type' id? '(' 'sub' typeidx? strtype ')' ')'
          | '(' 'type' id? strtype ')'

strtype ::= functype
          | structtype
          | arraytype

functype ::= ... ;; same as MVP

structtype ::= '(' 'struct' field* ')'

arraytype ::= '(' 'array' field ')'

field ::= '(' 'field' id? fieldtype ')'
        | '(' 'field' fieldtype* ')'

fieldtype ::= storagetype
            | '(' 'mut' storagetype ')'

storagetype ::= valtype | packedtype

packedtype ::= 'i8' | 'i16'

Comments and suggested edits welcome. Once it looks good, we should document it in the MVP doc. For the actual spec, we'll have to extract the abbreviations from the grammar and turn them into rewrite rules.
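
As a rough illustration of the field-related productions in the grammar above, here is a minimal sketch of a module that uses named fields, `mut`, the packed storage types, and the combined anonymous-field form. The type and field names (`$pixel`, `$r`, `$pair`, and so on) are made up for this example, and whether a given tool accepts every form depends on its GC support.

```wat
(module
  ;; field ::= '(' 'field' id? fieldtype ')'
  ;; One named field per (field ...), using packed and mutable storage types.
  (type $pixel (struct
    (field $r (mut i8))
    (field $g (mut i8))
    (field $b (mut i8))
    (field $depth i16)))

  ;; field ::= '(' 'field' fieldtype* ')'
  ;; Several anonymous fields combined in a single (field ...).
  (type $pair (struct (field i32 (mut i64))))
)
```

Both definitions omit `rec` and `sub`, so each one stands for a singleton recursion group declaring no supertypes.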

@sbc100
Member

sbc100 commented Nov 1, 2022

What is strtype? Presumably nothing to do with strings, but it certainly reads like it is. Maybe a better name?

@tlively
Member Author

tlively commented Nov 1, 2022

strtype is short for "structural type" and comes directly from the proposal docs: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions. I agree it would be nice to find a clearer name or abbreviation for those, but I do want to keep the explanation of the text format in sync with the abstract syntax.

@rossberg
Member

rossberg commented Nov 9, 2022

Actually, strtype stands for "structured type", i.e., a compound type like structs, funcs, arrays, as opposed to scalar types like ints and floats.

@tlively
Member Author

tlively commented Nov 10, 2022

There is also the issue of using symbolic references to struct fields, as discussed here: #193 (comment). I don't believe Binaryen's parser currently allows symbolic field references, but I agree they would be nice to have. Since we ended up keeping the type annotations on struct.get and friends, we could keep the dependent name lookup behavior where each struct type gets its own index space of field names. @rossberg, wdyt? What if separate definitions of the same type contain conflicting field names?

@rossberg
Member

We could, but personally, I don't find it worth the added complexity. Currently, conflicting field symbols are simply an error, so you prefix them to disambiguate, like in assembly and the good old C days. : )
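
A minimal sketch of the prefixing convention described above, assuming a single flat namespace for field symbols. The type, field, and function names are invented for illustration.

```wat
(module
  ;; Each field symbol carries its struct's name as a prefix, so no two
  ;; field symbols in the module collide.
  (type $point (struct (field $point_x i32) (field $point_y i32)))
  (type $size  (struct (field $size_w i32)  (field $size_h i32)))

  (func $get_x (param $p (ref $point)) (result i32)
    ;; The field is referenced symbolically by its prefixed name.
    (struct.get $point $point_x (local.get $p)))
)
```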

@tlively
Member Author

tlively commented Nov 22, 2022

We should consider making support for un-prefixed names a requirement, since it makes debugging in browser devtools more feasible. Binaryen supports this today (although it doesn’t allow using field names symbolically in the text), so losing this capability would regress the debugging experience. As we get closer to productionizing WasmGC, debugging has become an important area of user concern.

@rossberg
Member

> Binaryen supports this today (although it doesn’t allow using field names symbolically in the text)

I'm confused, where does it support them then?

@tlively
Member Author

tlively commented Nov 22, 2022

Field names in type definitions are parsed and end up in the names section, but the parser doesn’t support using those names as field indices in instructions.
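
To make the distinction concrete, here is a small sketch of what that looks like in practice: the type definition names its fields, but the instruction refers to the field by numeric index. The names `$pair`, `$first`, `$second`, and `$get_second` are hypothetical.

```wat
(module
  ;; The field names can be recorded for debugging purposes.
  (type $pair (struct (field $first i32) (field $second i64)))

  (func $get_second (param $p (ref $pair)) (result i64)
    ;; Field index 1 is the field declared as $second above.
    (struct.get $pair 1 (local.get $p)))
)
```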

@titzer
Contributor

titzer commented Nov 22, 2022

Now that we've settled on keeping an immediate for both the struct index and the field index for struct.get and struct.set, it seems logical to me for the text format to reflect that. In particular, a standard two-level namespace of struct names and field names, e.g. $S.$f in the text format, seems convenient and would be less surprising to most users than requiring field name disambiguation via manual prefixing.
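
A sketch of what a two-level namespace could allow, assuming field names are resolved relative to the struct type named in the instruction. Under the flat namespace described earlier in the thread, the repeated `$x` and `$y` would presumably be rejected as conflicting symbols. All names here are invented, and the two-immediate spelling is used instead of a dotted `$S.$f` form, which is only an illustrative notation.

```wat
(module
  ;; Both structs reuse the field names $x and $y.
  (type $point (struct (field $x i32) (field $y i32)))
  (type $vec3  (struct (field $x f32) (field $y f32) (field $z f32)))

  (func $point_x (param $p (ref $point)) (result i32)
    ;; $x is looked up among the fields of $point.
    (struct.get $point $x (local.get $p)))

  (func $vec3_x (param $v (ref $vec3)) (result f32)
    ;; $x is looked up among the fields of $vec3.
    (struct.get $vec3 $x (local.get $v)))
)
```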

@tlively
Member Author

tlively commented Mar 22, 2023

I filed #361 to continue the discussion about whether we should have dependent field name lookup.

As promised, here are some examples of the abbreviations I suggested allowing in the opening post. Most are probably uncontroversial.

;; maximal type definition
(rec (type (sub (struct (field i32) (field i64)))))

;; abbreviate out singleton rec group
(type (sub (struct (field i32) (field i64))))

;; abbreviate out declaration of zero supertypes
(type (struct (field i32) (field i64)))

;; combine field declarations
(type (struct (field i32 i64)))

;; abbreviate out 'field' -- note: no longer allowed
(type (struct i32 i64))

@rossberg
Member

These all sound good to me and are already supported by the interpreter. The only exception is the last one, which I'm a bit skeptical about. It has no real parallel anywhere else so far, and it might become a debt when we later want to add more elements to struct declarations (e.g. descriptors or static fields like we discussed).

@tlively
Member Author

tlively commented Mar 22, 2023

Fair enough. I would be fine dropping the last one.

@eqrion
Contributor

eqrion commented Apr 6, 2023

@tlively Could you post an updated grammar for abbreviations?

Your example (sub (struct (field i32) (field i64))) is not possible in the grammar you posted at the beginning.

Also, your 3rd and 4th abbreviations, which use (struct ...) without wrapping it in (type), seem odd to me. We couldn't allow that for function types, or else they'd be confused with function definitions. So allowing that in structs/arrays would be asymmetrical.

@tlively
Member Author

tlively commented Apr 6, 2023

I've updated the grammar in the opening post to remove the abbreviation that allows having a list of field types without (field ...). I've also updated the examples a few posts up to add the missing (type ...) declarations, which are not intended to be possible to remove.

@eqrion
Contributor

eqrion commented Apr 6, 2023

Okay, the updated grammar LGTM.

Your previous comment with examples still seems not to match the grammar, though, and I just want to make sure I understand correctly. My understanding of the proposed grammar is that the keyword nesting always goes: 'rec', 'type', and then 'sub'. 'rec' and 'sub' can be omitted, but 'type' is always present.

@tlively
Member Author

tlively commented Apr 6, 2023

Yes, your understanding is correct and my examples were wrong (again). I fixed the examples now.

@eqrion
Contributor

eqrion commented May 8, 2023

Another issue here is the 'final' keyword. My understanding from MVP.md is that for pre-existing definitions of the form (type ...) we default to final=true. But otherwise there is an optional final flag in sub: (sub final? ...).

This is sort of weird, because to define a final type definition that has no super type you need to declare it like:

(type (sub final (struct)))

And that makes the sub keyword just noise, since this type is not declaring itself as a subtype. It also means that wrapping (sub) around an existing type definition without also adding final is actually a semantic change to the type definition.

This makes me wonder if the 'final' keyword should be inverted so the absence of sub matches the absence of specifying the flag in sub. I'm not sure what a good keyword would be though, nofinal is not great.

@rossberg
Member

rossberg commented May 9, 2023

@eqrion, unfortunately, we cannot change the meaning of the existing text format, plus we want to treat preexisting type declarations as final (in order to keep call_indirect unaffected). That pretty much necessitates the current syntax and interpretation, which also more or less matches what happens in the binary format.

Edit: After rereading your comment I realised that I did not address your actual suggestion. Yes, we could invert the keyword, but that would add a lot of noise to the vastly more common case. And it would be the opposite of how it is presented in all other languages. So I'm not sure we really want to do that.

If it helps, my suggested reading of sub is "participates in the subtype relation". A special case of that is "final-sub", which just happens to be written sub final.
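
For reference, a sketch of how the forms line up under the interpretation described in the comments up to this point, where a plain (type ...) is final, (sub ...) is non-final, and (sub final ...) is final. The type names are invented for illustration.

```wat
(module
  (type $a (struct))                             ;; final, no supertype (plain form)
  (type $b (sub final (struct)))                 ;; final, no supertype (spelled out)
  (type $c (sub (struct)))                       ;; non-final, no supertype
  (type $d (sub $c (struct (field i32))))        ;; non-final subtype of $c
  (type $e (sub final $c (struct (field i32))))  ;; final subtype of $c
)
```

Here `$a` and `$b` declare the same thing, while `$c` differs from `$a` only in finality, which is the case the surrounding comments find surprising.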

@tlively
Member Author

tlively commented May 9, 2023

> This is sort of weird, because to define a final type definition that has no super type you need to declare it like:
>
> (type (sub final (struct)))

Can't this be shortened to (type (struct)), since the default is final=true? IIUC, the weird case is the opposite, where you define a non-final type with no supertype, which would be (type (sub (struct))). That's less verbose and is amenable to the interpretation of "sub" @rossberg gave, so it seems fine to me.

@eqrion
Contributor

eqrion commented May 16, 2023

The odd thing to me is that adding an empty sub to a plain type definition (type (struct)), giving (type (sub (struct))), inverts the final attribute, even though it looks like a no-op or at least a change completely unrelated to 'final'. From a text syntax perspective, that's confusing.

I agree though that final=false will probably be the common case for 'sub', so inverting it would be verbose. I'm not sure that's an issue though, since the text format is already very verbose. And the advantage is that the absence of 'nofinal' or 'extensible' would syntactically match the plain type definition syntax we have.

@bvisness
Contributor

The current text syntax has been causing some confusion over here at Mozilla. In particular, we had confusion over why a particular type was not extensible and why an empty (sub would do anything at all, in situations like this for example:

(type $s1 (sub (struct))) ;; me an hour ago: "is this a subtype of (struct) somehow...?"
(type $s2 (sub $s1 (struct (field i32))))

In my opinion the current syntax conflates two independent concepts: finality and base types. It's not clear that sub changes whether a type is final or not, which makes the existence of sub final surprising and the use of a bare sub even more surprising. (I have always read sub $x as "subtype of $x", making a bare sub nonsensical.) These two concepts are quite reasonably combined in the binary encoding, but I think in the text format they ought to be treated separately.

I would propose the following syntax for type definitions instead:

(type $s1 (struct))                             ;; final with no parents
(type $s2 open (struct))                        ;; non-final with no parents
(type $s3 (sub $s2) (struct (field i32)))       ;; final with one parent
(type $s4 open (sub $s2) (struct (field i32)))  ;; non-final with one parent

You can see I suggest the term "open" for "non-final" - this naturally follows from familiar phrases like "open to extension", and is the terminology used by Kotlin (which is also final-by-default).

I also suggest that (sub) not wrap the type's structure definition. This is to make it clearly independent and obviously optional. It also more naturally handles the potential future where types can have multiple base types, which the binary encoding already supports: (sub $t1 $t2 ...)

With this scheme, types would remain final by default. It's true that this would add noise to GC type hierarchy definitions, but I think it's better to be slightly more explicit than to have finality flip-flopping back and forth as you add sub and then sub final.

Curious for your thoughts.

@tlively
Member Author

tlively commented Jul 19, 2023

I had never heard of "open" used for this before and I don't think it's immediately clear what that means without further explanation. I'd be open to it, though, and at least there is precedent.

Having the (sub ...) wrap the type in the fully expanded text format has the benefit of matching the currently specified abstract syntax and concrete binary syntax (including in the case of 0-length supertype vectors), so I would prefer not to change that. I guess the smallest possible change would be to keep (sub final ...) as-is and change the current (sub ...) to (sub open ...) so that sub never appears on its own. What would folks think of that?

@rossberg
Member

Agreed with @tlively, changing the syntactic structure would create a mismatch with abstract and binary syntax, which the text format is supposed to reflect.

Not too excited about requiring open, but I could be convinced if there is overwhelming enthusiasm for that. But somebody would need to volunteer to make all the changes to spec, interpreter, tests, etc. ;)

@bvisness
Contributor

bvisness commented Jul 21, 2023

In that case, I think it would still be an improvement to use (sub open instead of (sub final. To my knowledge, that would align equally well with the binary encoding, but would avoid the surprising problem of requiring an empty (sub to change the finality of a type. It also would probably be an easier transition from the syntax used today, if that matters.

So to be clear, that would look like:

(type $s1 (struct))                             ;; final with no parents
(type $s2 (sub open (struct)))                  ;; non-final with no parents
(type $s3 (sub $s2 (struct (field i32))))       ;; final with one parent
(type $s4 (sub open $s2 (struct (field i32))))  ;; non-final with one parent

In this world, an empty (sub would not be useful, which is fine, because I think it is inherently confusing. The presence of the word open makes it clear that something about the type is changing even if no base types are specified.

If others agreed with this, I would be happy to try and update the spec, interpreter, tests, etc. myself, with the disclaimer that it would be my first time doing such things 🙂

@tlively
Member Author

tlively commented Jul 21, 2023

Oh interesting, I was thinking of having (sub open ...) and (sub final ...) be the two forms, but you have (sub open ...) and (sub ...). I think your way makes sense 👍

A PR updating everything for this change would be very welcome.

Also note that it should still be valid to have this:

(type $s5 (sub (struct))) ;; final with no parents, non-abbreviated form of $s1

@tlively
Member Author

tlively commented Jul 26, 2023

@bvisness, are you still planning to take a stab at implementing this? If you need any help or cannot prioritize this, let me know and I can help out.

@bvisness
Contributor

Yes, I was planning to do this roughly next week or so. If it's holding anything up, though, I'd be happy to jump on it sooner.

Another thing I just realized - while I would prefer to have (sub for final and (sub open for non-final, this does invert the current meaning of (sub. Is that a problem at this stage? If so, I think it would have to be (sub final and (sub open so that there is no ambiguity...but I hope that the text format for GC is still not really in wide enough use for that to be a concern.

@tlively
Member Author

tlively commented Jul 27, 2023

No, it's not that urgent, next week would be fine. We just need to make sure we finish up well before September.

This change should be fine regarding back compat. Anyone depending on the text format would be using it as an input to Binaryen, and I can handle coordinating with users and updating Binaryen separately from this spec update.
