funcref <: anyref, or not? #293

jakobkummerow · 2022-04-26T20:21:28Z

In the "reference types" proposal, after loooong discussions we decided to drop the previously-planned funcref <: anyref relation, mostly due to a lack of use cases and concerns about limiting implementation flexibility (and, as a consequence, reachable performance).

The GC proposal currently proposes re-introducing this subtyping relation. I haven't seen any discussion of newly discovered use cases for it. Have I missed it? Has any other reasoning changed that makes this relation desirable now?

Meanwhile, I have recently come upon a concrete performance concern.
The background is that JavaScript and Wasm have different needs from their function/funcref objects; so to make pure-Wasm calls as fast as possible, we use different internal representations for the "JS view" and the "Wasm view" onto the same function reference. (They are both representation-compatible with anyref.) That means that on the boundary between both languages, a conversion/"unwrapping" step is required when a function reference is passed as a parameter or return value of a function call. Additionally, preparing a function to actually be callable from both worlds is a nontrivial amount of work (that's only performed when necessary, for obvious reasons).
The resulting situation is that when an exported Wasm function that takes an anyref parameter is called from JavaScript, and another function F is passed as this parameter, we have two options:

(1) We can perform a relatively complex check whether F is a function that could be called from Wasm (because it originated there, or was prepared for it via an import/export cycle or the "Type Reflection" proposal's new WebAssembly.Function constructor), and if so, "unwrap" it to its Wasm representation, and otherwise pass it along unchanged.
The drawback is that JS-to-Wasm calls get more expensive whenever they have anyref-typed parameters. When the value being passed is a function of any kind, the overhead increases further.

(2) We can unconditionally pass the pointer along, without attempting to unwrap it.
The drawback is that a function passed to Wasm this way always becomes an opaque reference on the Wasm side (which is not out of place for an anyref), even if it originally was a Wasm function. In particular, a ref.is_func check on it would return 0, and it could not be cast to either funcref or a more specific signature.
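
The difference between options (1) and (2) can be modelled roughly as follows. This is only a toy sketch: the type names, `toWasmEager`, `toWasmOpaque`, and `refIsFunc` are hypothetical stand-ins, not V8 internals, and the real check in option (1) is considerably more involved.

```typescript
// Toy model only: none of these names correspond to real engine internals.
type WasmFunc = { kind: "wasm-func"; name: string };        // the "Wasm view"
type JsFunction = { kind: "js-func"; wasmView?: WasmFunc }; // the "JS view"
type HostValue = JsFunction | { kind: "other"; payload: unknown };
type AnyRef = WasmFunc | HostValue;

// Option (1): inspect every anyref argument and unwrap Wasm-callable functions.
function toWasmEager(v: HostValue): AnyRef {
  if (v.kind === "js-func" && v.wasmView !== undefined) {
    return v.wasmView; // extra check plus unwrap on the JS-to-Wasm call path
  }
  return v; // everything else is passed along unchanged
}

// Option (2): pass the pointer along without looking at it.
function toWasmOpaque(v: HostValue): AnyRef {
  return v;
}

// Stand-in for ref.is_func: true only for the Wasm representation.
function refIsFunc(v: AnyRef): boolean {
  return v.kind === "wasm-func";
}

// A function that was prepared for Wasm, and one that was not.
const exported: JsFunction = {
  kind: "js-func",
  wasmView: { kind: "wasm-func", name: "f" },
};
const plain: JsFunction = { kind: "js-func" };
```

In this model, `refIsFunc(toWasmEager(exported))` is true while `refIsFunc(toWasmOpaque(exported))` is false, which is the asymmetry described for option (2).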

My inclination is that (2) is preferable, because anyref is in particular useful for round-tripping opaque host references, and performance matters. However, I concede that this behavior is somewhat displeasing, in particular when considering that the same "can't downcast it back" limitation would not apply to structs/arrays coming back from a similar roundtrip through JavaScript. (That's because we're investing a lot of effort to make sure structs/arrays can be passed around with maximum efficiency, i.e. just passing the pointers along -- they don't have to be callable so that doesn't cause performance overhead elsewhere.)

If we decided not to reintroduce the funcref <: anyref relation, then the question wouldn't pose itself, and (2) would be the obvious way to go.

tlively · 2022-04-26T22:18:54Z

I do think it would be very strange (as long as funcref <: anyref) for structs/arrays passed back to Wasm as anyref to be able to be downcast back to their original types but for functions passed back that way to not be able to be downcast.

@lars-t-hansen, would SpiderMonkey hit this same performance problem? If so, are the potential trade offs similar to those in V8?

lars-t-hansen · 2022-04-27T07:06:17Z

@tlively, we're heading down path (1) and would be facing the exact same problem. Currently our "wasm view" is just a JSFunction*, and we have been able to live with this because we don't have call_ref yet, but it's not workable in the long run - we'll need to have a specialized representation inside wasm code. We were resolved to just eat the complexity cost of the unwrapping in the hope that "anyref" is not the preferred type for anything performance-sensitive.

jakobkummerow · 2022-04-27T10:56:41Z

@lars-t-hansen, I agree we could hope that, but for the use case of round-tripping opaque host references through Wasm, there isn't really an alternative to anyref, is there? I don't know how performance-sensitive that use case is in general, though.

(FWIW, in V8 we also used to have a single JSFunction* representation, until the desire to squeeze more and more performance out of call_ref eventually made us introduce the distinct "wasm view".)

rossberg · 2022-04-27T14:52:00Z

Just to check my understanding, isn't there an option (2b), where you pass a function pointer along unmodified as well, but implement respective unwrapping logic in down casts to funcref (ref.is/as_func and friends), which perform it by need? Symmetrically, a call in JS could handle raw Wasm functions and implicitly wrap them. That way, the penalty would only be on the rare operations. (In fact, both directions seem independent, so an engine could independently choose between eager or lazy un/wrapping anyref functions on the way in vs out.)

That said, I can see that none of these options is a pleasure to implement.

AFAICS, plain option (2) is only possible if func </: any. Otherwise you'd need something like (2b), or the semantics would be very incoherent.
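
A rough sketch of option (2b), where the boundary crossing does nothing and the downcast performs the unwrapping on demand. All names here (`passToWasm`, `refAsFunc`, `unwrapCount`) are hypothetical and chosen for illustration; `refAsFunc` returns `null` where a real `ref.as_func` would trap.

```typescript
// Toy model; all names are hypothetical and not taken from any engine.
type WasmFunc = { kind: "wasm-func"; name: string };
type JsFunction = { kind: "js-func"; wasmView?: WasmFunc };
type Opaque = { kind: "other" };
type AnyRef = WasmFunc | JsFunction | Opaque;

let unwrapCount = 0; // counts how often the slow path runs

// Boundary crossing: no check and no conversion.
function passToWasm(v: JsFunction | Opaque): AnyRef {
  return v;
}

// Stand-in for a downcast to funcref that normalises by need.
function refAsFunc(v: AnyRef): WasmFunc | null {
  if (v.kind === "wasm-func") {
    return v; // already in the Wasm representation
  }
  if (v.kind === "js-func" && v.wasmView !== undefined) {
    unwrapCount++; // slow path, paid only when a downcast actually happens
    return v.wasmView;
  }
  return null; // not a Wasm-callable function
}

const f: JsFunction = {
  kind: "js-func",
  wasmView: { kind: "wasm-func", name: "f" },
};
const r = passToWasm(f); // unwrapCount is still 0 here
const asFunc = refAsFunc(r); // the unwrap happens at this point
```

The cost moves from every anyref-typed call to the (presumably rarer) casts, at the price of every such cast having to handle both representations.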

titzer · 2022-05-04T18:25:23Z

I think we should keep funcref <: anyref. A key use case is implementing type erasure for polymorphic languages with first-class functions. I know @rossberg decided to roll his own closures for Wob, but other languages will be fine using function references with or without func.bind. For those languages, they would erase to anyref in polymorphic code and would want polymorphic code to store and traffic in function references. Otherwise they'd have to box.

lukewagner · 2022-05-04T20:58:18Z

I think we should have funcref </: anyref. In addition to the JS engine performance concerns listed above, there are also modest Wasmtime perf concerns, so this is a consistent theme. Moreover, I expect the problems @rossberg hit that prevented funcref from being the native representation type for closures will be the common case, so there would be zero benefit from funcref <: anyref in practice in exchange for this distributed performance (and complexity, in trying to optimize it away) cost.

If, in the future, we want to have a native wasm GC type for optimally implementing source-language closures, I think the path forward is to have some new heap type specifically designed for compilation from closures (by, e.g., allowing access to the closure fields before calling the closure) and references to this new closure-heap-type would be subtypes of anyref and thus naturally participate in the type erasure scheme.

rossberg · 2022-05-05T09:09:40Z

I see some of the arguments for removing the subtyping. In particular, if we postpone func.bind, the use case is much weaker. As Luke suggests, we could later introduce a new type closureref that would be a subtype of anyref.

Of course, we can also go the opposite way, leave in the subtyping and later introduce a rawfuncref type that's not an anyref.

So it comes down to the relative benefits and disadvantages of introducing the type hierarchy separation now.

The pro clearly is that it gives more leeway to implementations in representing funcrefs. If at least one engine could demonstrate a benefit in practice, I'd be convinced.

The main downside is more irregularity and complexity in the type system (even before we consider introducing a separate closureref type). In particular, we'd have:

- two disjoint subtype hierarchies for heap types,
- which implies two distinct null values (another reason why introducing a variant of ref.null without immediate could be a bad move ;) ),
- and moreover, two distinct bottom heap types which subtyping has to deal with,
- duplicated typing rules for constructs that are currently typed in terms of anyref.

That is not necessarily hard but somewhat unpleasant.

tlively · 2022-05-12T20:08:30Z

I added this to the agenda for our meeting on Tuesday.

manoskouk · 2022-05-18T06:02:26Z

Another approach that maintains the subtyping relation and also lets JS API users pass opaque references without cost would be to introduce another type in the hierarchy, below anyref and above all wasm types. Let's call it wasmref.
wasmref would, by definition, be the type of all values that can be cast to any of the wasm reference types. I.e., ref.is_wasm v == ref.is_i31 v || ref.is_data v || ref.is_func v. Then a value can be passed from JS as anyref without cost; of course the wasm module would have to pay this cost later if it decides to downcast the value.
Passing a value as a more concrete type would also incur a smaller cost, just as it does now.

rossberg · 2022-05-18T07:22:16Z

@manoskouk, such a type (basically the complement of what we previously had as externref) might be useful. But it would not solve this specific problem, because like all subtyping, the relation wasmref <: anyref would have to be coercion-free, i.e., downcasting from anyref to wasmref cannot require a representation change. If it did, it would break any efficient implementation of subtyping.

Consider this example:

(type $t (array anyref))
(type $u (sub $t (array wasmref)))

If going from anyref to wasmref required a representation change, then casting from $t to $u would require a full copy of the array. Besides the unbounded cost and the fact that it breaks identity, this is even impossible when state is involved. Consider:

(type $st (struct (ref $t) (mut i32)))
(type $su (sub $st (struct (ref $u) (mut i32))))

Casting from $st to $su would likewise require a copy, since we have to copy the contained array field. But we cannot do that, since that would silently duplicate the stateful i32 field.

So, no, unfortunately, this does not work. If we allow non-normalised references anywhere in the subtype hierarchy, we have to allow them everywhere and handle them in the corresponding elimination forms.

If you wanted to distinguish normalised and non-normalised types then they cannot be in a subtype relation.
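
The state problem can be illustrated with a small model. This is a sketch under assumptions: `Cell` loosely plays the role of `$st`, and `castInPlace` / `castByCopy` are hypothetical names for a coercion-free cast and a cast that has to change representation.

```typescript
// Toy model: a "cast" that must change representation has to copy.
// All names are hypothetical.
type Cell = { items: unknown[]; counter: number }; // loosely plays the role of $st

// A coercion-free cast returns the very same object.
function castInPlace(c: Cell): Cell {
  return c;
}

// A cast that needs a representation change must build a new object,
// including a converted copy of the contained array.
function castByCopy(c: Cell): Cell {
  return { items: c.items.map((x) => x), counter: c.counter };
}

const original: Cell = { items: ["a", "b"], counter: 0 };

const alias = castInPlace(original);
alias.counter += 1; // visible through `original`

const copy = castByCopy(original);
copy.counter += 1; // NOT visible through `original`: the state was duplicated
```

After the copying cast, updates made through one reference are no longer seen through the other, and the two references no longer compare equal, which is why subtyping cannot involve such a conversion.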

manoskouk · 2022-05-18T08:20:44Z

@rossberg I see, thanks.
It seems to me that the question is more general than funcref: Since the JS-Wasm boundary will most likely require representation change or extra checks for some values (including i31ref as was mentioned in some discussions), then anyref in particular becomes problematic for the reasons mentioned above, regardless of its relation with funcref specifically. Presumably we can tolerate the representation change for more concrete types.
In that case, a solution would be to disallow (or discourage) anyref at the boundary, restore externref outside the type hierarchy, and introduce explicit coercions between externref and wasm reference types. Then anyref is guaranteed to be a wasm value (like wasmref above), which will simplify implementation of downcasts within Wasm.

An additional question: you mentioned before that we would need separate null values if we have disjoint type hierarchies. Could you elaborate on that? Why can't we have a single nullref type at the bottom of the hierarchy?

rossberg · 2022-05-18T09:20:27Z

> It seems to me that the question is more general than funcref

Yes, absolutely!

> Then anyref is guaranteed to be a wasm value (like wasmref above)

That would defeat the purpose of anyref, though. Its intended role is to be able to import something without constraining it to be either host or Wasm, so that e.g. host things can be freely virtualised in Wasm and vice versa.

The only thing we could do is remove the subtyping between anyref and anything below it, and always require explicit conversions (which then could perform a representation change). I'm not sure if that's desirable, though.

> you mentioned before that we would need separate null values if we have disjoint type hierarchies. Could you elaborate on that?

The point of disjoint hierarchies is that they can use different internal representations (including different sizes). Consequently, their null values have to be assumed to have different representations as well. And because of that, the type system must not allow mixing them up. For the same reason, we'd need separate bottom types.

manoskouk · 2022-05-18T10:12:59Z

> The only thing we could do is remove the subtyping between anyref and anything below it, and always require explicit conversions

Yes, this is the same thing I am suggesting: add a type that can be a wasm reference (with the host's representation) or a host reference, and that requires explicit conversions to and from wasm types (you call it anyref, I am calling it externref in my post).
In my post I also add a type for all wasm types (i.e. the union of funcref and eqref), which I called anyref.
This way we would not need an additional null type, since the host's null value can be included as a value in the host reference type (as it is now).

rossberg · 2022-05-18T10:33:52Z

> you call it anyref, I am calling it externref in my post

Ah, I see.

> In my post I also add a type for all wasm types (i.e. the union of funcref and eqref), which I called anyref. This way we would not need an additional null type, since the host's null value can be included as a value in the host reference type (as it is now).

If funcref and dataref are put in separate hierarchies for the sake of allowing different representations, then you cannot allow a union between them either – like subtyping, a union requires a compatible representation. Imagine datarefs being implemented with one word, funcrefs with two (closures as fat pointers). Clearly, their null values would be incompatible as well.
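
As a purely illustrative sketch of that last point, suppose datarefs are one word and funcrefs are two-word fat pointers. The encodings, the use of 0 for null, and all names below are assumptions made for the example, not anything specified by the proposal.

```typescript
// Toy model of two reference representations; sizes and encodings are assumptions.
type DataRef = [number];         // one word: object address, 0 meaning null
type FuncRef = [number, number]; // two words: code pointer plus environment

const NULL_DATA: DataRef = [0];
const NULL_FUNC: FuncRef = [0, 0];

function isNullData(r: DataRef): boolean {
  return r[0] === 0;
}

function isNullFunc(r: FuncRef): boolean {
  return r[0] === 0 && r[1] === 0;
}

// Number of words a storage slot of each type has to reserve.
const dataSlotWords: number = NULL_DATA.length;
const funcSlotWords: number = NULL_FUNC.length;
```

A slot sized for a `DataRef` cannot hold `NULL_FUNC`, so a shared null value (or a union of the two types) would force both hierarchies back onto one representation.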

manoskouk · 2022-05-18T10:54:18Z

> If funcref and dataref are put in separate hierarchies for the sake of allowing different representations

Right, but it was my impression from the discussion that the representation problems arise mostly at the wasm/JS boundary.
Regardless of whether we decide to have a common supertype for funcref and dataref, we can still have a separate host reference type with explicit coercions.

rossberg · 2022-05-18T11:13:50Z

> Right, but it was my impression from the discussion that the representation problems arise mostly at the wasm/JS boundary.

Well, that's only one class of problem that could motivate splitting the type hierarchies. In the original discussions, the ability to represent function refs completely differently was another. If we go for the separation, it makes most sense to carry it all the way through, so that all such use cases are covered.

> Regardless of whether we decide to have a common supertype for funcref and dataref, we can still have a separate host reference type with explicit coercions.

Right.

tlively · 2022-07-21T17:31:10Z

This conversation was resolved in #307

tlively mentioned this issue May 12, 2022

Ensure symmetric results in PossibleConstantValues WebAssembly/binaryen#4662

Merged

dcodeIO mentioned this issue May 28, 2022

Update binaryen + improve codegen of instantiation AssemblyScript/assemblyscript#2302

Merged

2 tasks

manoskouk mentioned this issue Jun 10, 2022

Suggestion: Combine opaque external references with explicit wasm-extern casts #307

Closed

tlively closed this as completed Jul 21, 2022
