Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move RTTs to Post-MVP #306

Merged
merged 3 commits into from Jul 14, 2022
Merged

Move RTTs to Post-MVP #306

merged 3 commits into from Jul 14, 2022

Conversation

rossberg
Copy link
Member

@rossberg rossberg commented Jun 8, 2022

This PR removes explicit RTTs and moves them to Post-MVP. Updates design docs, interpreter, and tests.

The rtt $t type and the rtt.canon instruction are removed and all RTT-consuming instructions (allocations and casts) are replaced with "canonical" versions that embed the semantics of rtt.canon. RTT-dependent cast instructions now need a static type as an additional immediate.

Obviously, this is a breaking change across the board, so I did not attempt to maintain backwards compat in details such as opcode assignments this time.

@rossberg rossberg requested a review from tlively June 8, 2022 15:24
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

My one big comment is that I think it would be better for the MVP instructions not to contain _canon in their names. Understanding what _canon means would require knowing some of the context of the development of the proposal, but ideally the proposal would not require such context to fully understand. Also, having _canon in the name seems to imply that the instructions will always work for any implicit rtt.canon, but I think it would make sense to leave ourselves space to consider whether the sugar should be restricted to only a subset of possible rtt.canon cases. I would prefer to differentiate the post-MVP instructions by adding _rtt to their names instead.

br_on_non_null <var>
br_on_non_<castop> <var>
br_on_cast_fail <var>
br_on_cast_canon_fail <var> (type <var>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is (type <var>) here? Why isn't it added to other instructions like struct.new below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the type to cast to, as it can no longer be derived from the RTT operand. Most other instructions already had a (previously redundant) explicit type (cf the discussion regarding type annotations).

The syntax is chosen to mirror other cases like call_indirect, and to avoid confusion with the branch label.

I'd be fine with switching to (type _) syntax everywhere, it would be more consistent – but also more verbose. I justified the difference to myself by the prevalent choice that indices associated with the primary object kind of an instruction do not have a sort prefix (e.g., the global index in global.get, the table index in call_indirect, the label index in branches) whereas secondary ones do (e.g., the type index in call_indirect and br_on_cast, the memory index in data). And one can argue that the type in GC instructions represents the primary object kind, if that makes sense. :)

@@ -127,7 +127,7 @@ type vec_storeop = (vec_type, unit) memop
type vec_laneop = (vec_type, pack_size) memop * int

type initop = Explicit | Implicit
type castop = NullOp | I31Op | DataOp | ArrayOp | FuncOp | RttOp
type castop = NullOp | I31Op | DataOp | ArrayOp | FuncOp | RttOp of int32 Source.phrase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this new int32 for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, this is the now necessary type immediate.

* `rtt <typeidx>` is a new heap type that is a runtime representation of the static type `<typeidx>`
- `heaptype ::= ... | rtt <typeidx>`
- `rtt t ok` iff `t ok`
- the reference type `(rtt $t)` is a shorthand for `(ref (rtt $t))`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a text-only shorthand? It seems slightly inconsistent that this shorthand corresponds to a non-nullable reference while our other shorthands correspond to nullable references, but I don't think that's a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what we previously had, including in the binary format. I agree it's a minor inconsistency, but you rarely ever want nullable RTTs.

```
With extensions like [type parameters](#type-parameters), this instruction will become more nuanced and will involve additional RTT operands if `$t` has type paramters.

Correspondingly, all allocation and cast instructions will get counter parts taking an explicit RTT operand. The MVP's canonical versions can be reinterpreted as the combination of the explicit ones whose RTT operand is created with `rtt.canon`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the presence of type parameters, would the MVP instructions also gain additional parameters for types where rtt.canon would have additional parameters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not, as that would create a lot of extra complexity in the instruction set and also be somewhat inconsistent (the primary RTT operand is implicit but the others are not?).

But it is this kind of potential extra complexity and non-trivial warts that makes me so uneasy. ;)

Copy link
Member Author

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the generic mnemonics ought to be reserved for the instructions that can actually handle the general case. I’m not wed to the particular choice of suffix (better suggestions welcome!), but given that we cannot change them later, I do think that the mnemonics need to reflect that these instructions are limited to a special case in the future.

br_on_non_null <var>
br_on_non_<castop> <var>
br_on_cast_fail <var>
br_on_cast_canon_fail <var> (type <var>)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the type to cast to, as it can no longer be derived from the RTT operand. Most other instructions already had a (previously redundant) explicit type (cf the discussion regarding type annotations).

The syntax is chosen to mirror other cases like call_indirect, and to avoid confusion with the branch label.

I'd be fine with switching to (type _) syntax everywhere, it would be more consistent – but also more verbose. I justified the difference to myself by the prevalent choice that indices associated with the primary object kind of an instruction do not have a sort prefix (e.g., the global index in global.get, the table index in call_indirect, the label index in branches) whereas secondary ones do (e.g., the type index in call_indirect and br_on_cast, the memory index in data). And one can argue that the type in GC instructions represents the primary object kind, if that makes sense. :)

@@ -127,7 +127,7 @@ type vec_storeop = (vec_type, unit) memop
type vec_laneop = (vec_type, pack_size) memop * int

type initop = Explicit | Implicit
type castop = NullOp | I31Op | DataOp | ArrayOp | FuncOp | RttOp
type castop = NullOp | I31Op | DataOp | ArrayOp | FuncOp | RttOp of int32 Source.phrase
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, this is the now necessary type immediate.

* `rtt <typeidx>` is a new heap type that is a runtime representation of the static type `<typeidx>`
- `heaptype ::= ... | rtt <typeidx>`
- `rtt t ok` iff `t ok`
- the reference type `(rtt $t)` is a shorthand for `(ref (rtt $t))`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what we previously had, including in the binary format. I agree it's a minor inconsistency, but you rarely ever want nullable RTTs.

```
With extensions like [type parameters](#type-parameters), this instruction will become more nuanced and will involve additional RTT operands if `$t` has type paramters.

Correspondingly, all allocation and cast instructions will get counter parts taking an explicit RTT operand. The MVP's canonical versions can be reinterpreted as the combination of the explicit ones whose RTT operand is created with `rtt.canon`:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not, as that would create a lot of extra complexity in the instruction set and also be somewhat inconsistent (the primary RTT operand is implicit but the others are not?).

But it is this kind of potential extra complexity and non-trivial warts that makes me so uneasy. ;)

@jakobkummerow
Copy link
Contributor

Regarding opcodes, since we've had the RTT-free instructions implemented in both V8 and Binaryen for a long time now, we also have an existing set of opcode assignments (that don't overlap with the other opcodes), see here. It would be nice to maintain them for now? Not a big issue though, we'll need to make some backwards-incompatible changes soon anyway and could batch up the opcode changes with those.

The state in that doc is in part due to this proposal's history: you may recall that in days long gone we actually had both struct.new and struct.new_with_rtt, then at some point dropped the former, then (in V8/Binaryen at least) reintroduced it, now we're dropping the latter. This history also explains the textual names used there: for the allocating opcodes, .new is with implicit RTT and .new_with_rtt is with explicit RTT. We've never renamed them to preserve backwards compatibility of the text format. For the cast/test instructions, there is no such history, so we picked the _static (as in: "statically typed") suffix when we added the implicit-RTT versions. (I don't care too much about renaming some or all of the instructions, and it would certainly be nice to make them consistent; I'm just explaining how we arrived at what that doc is saying today.)


A possible (follow-up?) simplification could be to merge ref.cast_canon, ref.test_canon, br_on_cast_canon and br_on_cast_canon_fail into ref.as <ht>, ref.is <ht>, br_on <ht> and br_on_not <ht>, respectively, as issue #274 sketches out. I don't see a reason to keep them separate, and merging them might help assuage concerns people have about the overall size of the instruction set once we re-introduce explicit-RTT versions later. That said, I don't feel strongly about it, and if there are reasons to keep them separate, we can certainly do that too.
(The fact that the *_canon group currently takes a dataref rather than an anyref as input is not providing any benefits, as discussed at length on that issue, so that's not a reason to keep the distinction. Effectively, to cast to type $T, you need to use ref.as iff $T is generic and ref.cast_canon iff $T is non-generic, and I don't see this distinction providing any practical benefits. There is no $T for which ref.as $T and ref.cast_canon $T are both valid, so no expressiveness is lost if we merge them.)

@rossberg
Copy link
Member Author

Yes, refactoring the cast operators is a follow-up. But it's gonna be affected by the anyref/funcref/externref subtyping questions, so it makes more sense to wait with it until we have resolves that.

eqrion added a commit to eqrion/wasm-tools that referenced this pull request Jul 8, 2022
This commit:
  * Adds new instructions that have been added to the proposal
  * Removes RTT producing instructions
  * Modifies instructions that consumed RTT values to no longer
    take the RTT

The last point is challenging because there are no agreed upon names
or encodings for the modified instructions yet [1]. I tried to choose
something reasonable, and will come back to fix them once this is
figured out.

[1] WebAssembly/gc#306
alexcrichton pushed a commit to bytecodealliance/wasm-tools that referenced this pull request Jul 11, 2022
* wast: Add explicit recursion groups for type definitions

The GC proposal adds the ability to define multiple types inside
of a recursion group [1]. There is no text syntax for this yet, so
I tried to pick a reasonable starting point.

e.g.

(rec (type (struct)) (type (array)))

[1] https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions

* wast: Add declared subtyping to type definitions

The GC proposal type system has declared subtyping where
a definition indicates the parent type(s) it is a subtype
of [1]. For now a type is restricted to just one parent type.

There is no text format for this yet, so I tried to pick a
reasonable starting point.

e.g.

(sub $parent (type (struct)))

[1] https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions

* wast: Remove rtt value type

* wast: Update GC instructions

This commit:
  * Adds new instructions that have been added to the proposal
  * Removes RTT producing instructions
  * Modifies instructions that consumed RTT values to no longer
    take the RTT

The last point is challenging because there are no agreed upon names
or encodings for the modified instructions yet [1]. I tried to choose
something reasonable, and will come back to fix them once this is
figured out.

[1] WebAssembly/gc#306

* wast: Add arrayref heap type

This heap type was added upstream in the GC proposal.

* wast: Update GC tests

* wast: Update array instructions.

 * array.init is now array.new_fixed
 * array.init_from_data is now array.new_from_data
 * array.new_from_elem was added for creating an array
     from an elem segment
@rossberg
Copy link
Member Author

To avoid conflicts with followup PRs, taking the liberty to land this. We can bikeshed the mnemonics later.

@rossberg rossberg merged commit 79ae4ce into main Jul 14, 2022
@tlively tlively mentioned this pull request Jul 21, 2022
webkit-early-warning-system pushed a commit to takikawa/WebKit that referenced this pull request Jul 25, 2022
https://bugs.webkit.org/show_bug.cgi?id=243079

Reviewed by Justin Michaud.

This patch removes RTT support from the Wasm GC implementation. This
part of the spec was postponed to later post-MVP phases in recent
decisions.

  PR for RTT removal in spec: WebAssembly/gc#306
  Discussion issue: WebAssembly/gc#275

Instead of using RTTs, MVP instructions will use the type index for the
information needed to allocate values or do casts. For example,

  array.new_canon $t : [t' i32] -> [(ref $t)]

instead of

  array.new $t : [t' i32 (rtt $t)] -> [(ref $t)]

* JSTests/wasm/gc/rtt.js: Removed.
* JSTests/wasm/wasm.json:
* Source/JavaScriptCore/bytecode/BytecodeList.rb:
* Source/JavaScriptCore/llint/WebAssembly.asm:
* Source/JavaScriptCore/wasm/WasmAirIRGenerator.cpp:
(JSC::Wasm::AirIRGenerator::gRef):
(JSC::Wasm::AirIRGenerator::tmpForType):
(JSC::Wasm::AirIRGenerator::emitCCall):
(JSC::Wasm::AirIRGenerator::AirIRGenerator):
(JSC::Wasm::AirIRGenerator::gRtt): Deleted.
(JSC::Wasm::AirIRGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:
(JSC::Wasm::B3IRGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmCallingConvention.h:
(JSC::Wasm::WasmCallingConvention::marshallLocation const):
* Source/JavaScriptCore/wasm/WasmFormat.h:
(JSC::Wasm::isValueType):
(JSC::Wasm::isValidHeapTypeKind):
* Source/JavaScriptCore/wasm/WasmFunctionParser.h:
(JSC::Wasm::FunctionParser<Context>::parseExpression):
(JSC::Wasm::FunctionParser<Context>::parseUnreachableExpression):
* Source/JavaScriptCore/wasm/WasmLLIntGenerator.cpp:
(JSC::Wasm::LLIntGenerator::callInformationForCaller):
(JSC::Wasm::LLIntGenerator::callInformationForCallee):
(JSC::Wasm::LLIntGenerator::addArguments):
(JSC::Wasm::LLIntGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmOperations.cpp:
* Source/JavaScriptCore/wasm/WasmOperations.h:
* Source/JavaScriptCore/wasm/WasmParser.h:
(JSC::Wasm::Parser<SuccessType>::parseValueType):
* Source/JavaScriptCore/wasm/WasmSlowPaths.cpp:
* Source/JavaScriptCore/wasm/WasmSlowPaths.h:
* Source/JavaScriptCore/wasm/js/WasmToJS.cpp:
(JSC::Wasm::wasmToJS):
* Source/JavaScriptCore/wasm/wasm.json:

Canonical link: https://commits.webkit.org/252788@main
Jarred-Sumner pushed a commit to oven-sh/WebKit that referenced this pull request Jul 27, 2022
https://bugs.webkit.org/show_bug.cgi?id=243079

Reviewed by Justin Michaud.

This patch removes RTT support from the Wasm GC implementation. This
part of the spec was postponed to later post-MVP phases in recent
decisions.

  PR for RTT removal in spec: WebAssembly/gc#306
  Discussion issue: WebAssembly/gc#275

Instead of using RTTs, MVP instructions will use the type index for the
information needed to allocate values or do casts. For example,

  array.new_canon $t : [t' i32] -> [(ref $t)]

instead of

  array.new $t : [t' i32 (rtt $t)] -> [(ref $t)]

* JSTests/wasm/gc/rtt.js: Removed.
* JSTests/wasm/wasm.json:
* Source/JavaScriptCore/bytecode/BytecodeList.rb:
* Source/JavaScriptCore/llint/WebAssembly.asm:
* Source/JavaScriptCore/wasm/WasmAirIRGenerator.cpp:
(JSC::Wasm::AirIRGenerator::gRef):
(JSC::Wasm::AirIRGenerator::tmpForType):
(JSC::Wasm::AirIRGenerator::emitCCall):
(JSC::Wasm::AirIRGenerator::AirIRGenerator):
(JSC::Wasm::AirIRGenerator::gRtt): Deleted.
(JSC::Wasm::AirIRGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:
(JSC::Wasm::B3IRGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmCallingConvention.h:
(JSC::Wasm::WasmCallingConvention::marshallLocation const):
* Source/JavaScriptCore/wasm/WasmFormat.h:
(JSC::Wasm::isValueType):
(JSC::Wasm::isValidHeapTypeKind):
* Source/JavaScriptCore/wasm/WasmFunctionParser.h:
(JSC::Wasm::FunctionParser<Context>::parseExpression):
(JSC::Wasm::FunctionParser<Context>::parseUnreachableExpression):
* Source/JavaScriptCore/wasm/WasmLLIntGenerator.cpp:
(JSC::Wasm::LLIntGenerator::callInformationForCaller):
(JSC::Wasm::LLIntGenerator::callInformationForCallee):
(JSC::Wasm::LLIntGenerator::addArguments):
(JSC::Wasm::LLIntGenerator::addRttCanon): Deleted.
* Source/JavaScriptCore/wasm/WasmOperations.cpp:
* Source/JavaScriptCore/wasm/WasmOperations.h:
* Source/JavaScriptCore/wasm/WasmParser.h:
(JSC::Wasm::Parser<SuccessType>::parseValueType):
* Source/JavaScriptCore/wasm/WasmSlowPaths.cpp:
* Source/JavaScriptCore/wasm/WasmSlowPaths.h:
* Source/JavaScriptCore/wasm/js/WasmToJS.cpp:
(JSC::Wasm::wasmToJS):
* Source/JavaScriptCore/wasm/wasm.json:

Canonical link: https://commits.webkit.org/252788@main
code-terror pushed a commit to code-terror/wasm-tools that referenced this pull request Aug 24, 2022
* wast: Add explicit recursion groups for type definitions

The GC proposal adds the ability to define multiple types inside
of a recursion group [1]. There is no text syntax for this yet, so
I tried to pick a reasonable starting point.

e.g.

(rec (type (struct)) (type (array)))

[1] https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions

* wast: Add declared subtyping to type definitions

The GC proposal type system has declared subtyping where
a definition indicates the parent type(s) it is a subtype
of [1]. For now a type is restricted to just one parent type.

There is no text format for this yet, so I tried to pick a
reasonable starting point.

e.g.

(sub $parent (type (struct)))

[1] https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md#type-definitions

* wast: Remove rtt value type

* wast: Update GC instructions

This commit:
  * Adds new instructions that have been added to the proposal
  * Removes RTT producing instructions
  * Modifies instructions that consumed RTT values to no longer
    take the RTT

The last point is challenging because there are no agreed upon names
or encodings for the modified instructions yet [1]. I tried to choose
something reasonable, and will come back to fix them once this is
figured out.

[1] WebAssembly/gc#306

* wast: Add arrayref heap type

This heap type was added upstream in the GC proposal.

* wast: Update GC tests

* wast: Update array instructions.

 * array.init is now array.new_fixed
 * array.init_from_data is now array.new_from_data
 * array.new_from_elem was added for creating an array
     from an elem segment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants