funcref <: anyref, or not? #293

Closed
jakobkummerow opened this issue Apr 26, 2022 · 17 comments

Comments

@jakobkummerow
Contributor

In the "reference types" proposal, after loooong discussions we decided to drop the previously-planned funcref <: anyref relation, mostly due to a lack of use cases and concerns about limiting implementation flexibility (and, as a consequence, achievable performance).

The GC proposal currently re-introduces this subtyping relation. I haven't seen any discussion of newly discovered use cases for it. Have I missed it? Has any other reasoning changed that would make this relation desirable in the meantime?

Meanwhile, I have recently come upon a concrete performance concern.
The background is that JavaScript and Wasm have different needs from their function/funcref objects; so to make pure-Wasm calls as fast as possible, we use different internal representations for the "JS view" and the "Wasm view" onto the same function reference. (They are both representation-compatible with anyref.) That means that on the boundary between both languages, a conversion/"unwrapping" step is required when a function reference is passed as a parameter or return value of a function call. Additionally, preparing a function to actually be callable from both worlds is a nontrivial amount of work (that's only performed when necessary, for obvious reasons).
The resulting situation is that when an exported Wasm function that takes an anyref parameter is called from JavaScript, and another function F is passed as this parameter, we have two options:

(1) We can perform a relatively complex check whether F is a function that could be called from Wasm (because it originated there, or was prepared for it via an import/export cycle or the "Type Reflection" proposal's new WebAssembly.Function constructor), and if so, "unwrap" it to its Wasm representation, and otherwise pass it along unchanged.
The drawback is that JS-to-Wasm calls get more expensive whenever they have anyref-typed parameters. When the value being passed is a function of any kind, the overhead increases further.

(2) We can unconditionally pass the pointer along, without attempting to unwrap it.
The drawback is that a function passed to Wasm this way always becomes an opaque reference on the Wasm side (which is not out of place for an anyref), even if it originally was a Wasm function. In particular, a ref.is_func check on it would return 0, and it could not be cast to either funcref or a more specific signature.
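The difference between options (1) and (2) can be illustrated with a small model. This is only a sketch: the types `WasmFuncView`, `JsFuncView`, and `HostObject` and the functions `toAnyrefEager`, `toAnyrefOpaque`, and `refIsFunc` are hypothetical names invented for illustration and do not correspond to any engine's internals.

```typescript
// Hypothetical model; none of these names are real engine internals.

// "Wasm view" of a function: the representation used for pure-Wasm calls.
interface WasmFuncView {
  kind: "wasm-func";
  name: string;
}

// "JS view" of a function. `wasmView` is present only if the function
// originated in Wasm or was prepared for Wasm calls.
interface JsFuncView {
  kind: "js-func";
  name: string;
  wasmView?: WasmFuncView;
}

// Any other host value that may be passed as anyref.
interface HostObject {
  kind: "host-object";
}

type JsValue = JsFuncView | HostObject;
type AnyrefValue = JsValue | WasmFuncView;

// Option (1): check every anyref argument and unwrap it when possible.
function toAnyrefEager(v: JsValue): AnyrefValue {
  if (v.kind === "js-func" && v.wasmView !== undefined) {
    return v.wasmView;
  }
  return v;
}

// Option (2): pass the pointer along unchanged, with no check.
function toAnyrefOpaque(v: JsValue): AnyrefValue {
  return v;
}

// Model of ref.is_func: only the Wasm view counts as a funcref.
function refIsFunc(v: AnyrefValue): boolean {
  return v.kind === "wasm-func";
}

const exported: JsFuncView = {
  kind: "js-func",
  name: "f",
  wasmView: { kind: "wasm-func", name: "f" },
};

console.log(refIsFunc(toAnyrefEager(exported)));  // true
console.log(refIsFunc(toAnyrefOpaque(exported))); // false
```

In this model, option (1) pays for the check on every anyref parameter, while option (2) makes a function that originated in Wasm fail `refIsFunc` after the round trip.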

My inclination is that (2) is preferable, because anyref is in particular useful for round-tripping opaque host references, and performance matters. However, I concede that this behavior is somewhat displeasing, in particular when considering that the same "can't downcast it back" limitation would not apply to structs/arrays coming back from a similar roundtrip through JavaScript. (That's because we're investing a lot of effort to make sure structs/arrays can be passed around with maximum efficiency, i.e. just passing the pointers along -- they don't have to be callable so that doesn't cause performance overhead elsewhere.)

If we decided not to reintroduce the funcref <: anyref relation, then the question wouldn't pose itself, and (2) would be the obvious way to go.

@tlively
Member

tlively commented Apr 26, 2022

I do think it would be very strange (as long as funcref <: anyref) for structs/arrays passed back to Wasm as anyref to be able to be downcast back to their original types but for functions passed back that way to not be able to be downcast.

@lars-t-hansen, would SpiderMonkey hit this same performance problem? If so, are the potential trade offs similar to those in V8?

@lars-t-hansen
Contributor

@tlively, we're heading down path (1) and would be facing the exact same problem. Currently our "wasm view" is just a JSFunction*, and we have been able to live with this because we don't have call_ref yet, but it's not workable in the long run - we'll need to have a specialized representation inside wasm code. We were resolved to just eat the complexity cost of the unwrapping in the hope that "anyref" is not the preferred type for anything performance-sensitive.

@jakobkummerow
Contributor Author

@lars-t-hansen, I agree we could hope that, but for the use case of round-tripping opaque host references through Wasm, there isn't really an alternative to anyref, is there? I don't know how performance-sensitive that use case is in general, though.

(FWIW, in V8 we also used to have a single JSFunction* representation, until the desire to squeeze more and more performance out of call_ref eventually made us introduce the distinct "wasm view".)

@rossberg
Member

Just to check my understanding, isn't there an option (2b), where you pass a function pointer along unmodified as well, but implement respective unwrapping logic in down casts to funcref (ref.is/as_func and friends), which perform it by need? Symmetrically, a call in JS could handle raw Wasm functions and implicitly wrap them. That way, the penalty would only be on the rare operations. (In fact, both directions seem independent, so an engine could independently choose between eager or lazy un/wrapping anyref functions on the way in vs out.)
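Option (2b) can be sketched as follows, under the assumption that the downcast itself performs the unwrapping. All names here (`RawWasmFunc`, `WrappedJsFunc`, `OpaqueHostRef`, `passAsAnyref`, `refAsFuncLazy`, `unwrapCount`) are hypothetical and chosen only for illustration; returning `null` stands in for the trap a real failed cast would produce.

```typescript
// Hypothetical model; names are illustrative, not engine internals.
interface RawWasmFunc {
  tag: "raw-wasm-func";
  id: number;
}

interface WrappedJsFunc {
  tag: "wrapped-js-func";
  id: number;
  raw?: RawWasmFunc; // present only for functions callable from Wasm
}

interface OpaqueHostRef {
  tag: "opaque-host";
}

type LazyAnyref = RawWasmFunc | WrappedJsFunc | OpaqueHostRef;

// Counts how often the (comparatively expensive) unwrapping actually runs.
let unwrapCount = 0;

// Boundary crossing: no check and no conversion, as in option (2).
function passAsAnyref(v: WrappedJsFunc | OpaqueHostRef): LazyAnyref {
  return v;
}

// Model of a downcast to funcref that unwraps by need.
// Returns null where a real cast would trap.
function refAsFuncLazy(v: LazyAnyref): RawWasmFunc | null {
  if (v.tag === "raw-wasm-func") {
    return v;
  }
  if (v.tag === "wrapped-js-func" && v.raw !== undefined) {
    unwrapCount++;
    return v.raw;
  }
  return null;
}

const wrapped: WrappedJsFunc = {
  tag: "wrapped-js-func",
  id: 1,
  raw: { tag: "raw-wasm-func", id: 1 },
};

const passed = passAsAnyref(wrapped);
console.log(unwrapCount); // 0: crossing the boundary cost nothing extra
const asFunc = refAsFuncLazy(passed);
console.log(unwrapCount); // 1: the cost is paid only at the downcast
```

The sketch covers only the JS-to-Wasm direction; the symmetric case (a JS call implicitly wrapping a raw Wasm function) would follow the same pattern.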

That said, I can see that none of these options is a pleasure to implement.

AFAICS, plain option (2) is only possible if func </: any. Otherwise you'd need at least something like (2b), or the semantics would be very incoherent.

@titzer
Contributor

titzer commented May 4, 2022

I think we should keep funcref <: anyref. A key use case is implementing type erasure for polymorphic languages with first-class functions. I know @rossberg decided to roll his own closures for Wob, but other languages will be fine using function references with or without func.bind. For those languages, they would erase to anyref in polymorphic code and would want polymorphic code to store and traffic in function references. Otherwise they'd have to box.

@lukewagner
Member

I think we should have funcref </: anyref. In addition to the JS engine performance concerns listed above, there are also modest Wasmtime perf concerns, so this is a consistent theme. Moreover, I expect the problems @rossberg hit that prevented funcref from being the native representation type for closures will be the common case, so there would be zero benefit from funcref <: anyref in practice in exchange for this distributed performance (and complexity, in trying to optimize it away) cost.

If, in the future, we want to have a native wasm GC type for optimally implementing source-language closures, I think the path forward is to have some new heap type specifically designed for compilation from closures (by, e.g., allowing access to the closure fields before calling the closure) and references to this new closure-heap-type would be subtypes of anyref and thus naturally participate in the type erasure scheme.

@rossberg
Member

rossberg commented May 5, 2022

I see some of the arguments for removing the subtyping. In particular, if we postpone func.bind, the use case is much weaker. As Luke suggests, we could later introduce a new type closureref that would be a subtype of anyref.

Of course, we can also go the opposite way, leave in the subtyping and later introduce a rawfuncref type that's not an anyref.

So it comes down to the relative benefits and disadvantages of introducing the type hierarchy separation now.

The pro clearly is that it gives more leeway to implementations in representing funcrefs. If at least one engine could demonstrate a benefit in practice, I'd be convinced.

The main downside is more irregularity and complexity in the type system (even before we consider introducing a separate closureref type). In particular, we'd have:

  • two disjoint subtype hierarchies for heap types,
  • which implies two distinct null values (another reason why introducing a variant of ref.null without immediate could be a bad move ;) ),
  • and moreover, two distinct bottom heap types which subtyping has to deal with,
  • duplicated typing rules for constructs that are currently typed in terms of anyref.

That is not necessarily hard but somewhat unpleasant.

@tlively
Member

tlively commented May 12, 2022

I added this to the agenda for our meeting on Tuesday.

@manoskouk
Contributor

Another approach to maintain the subtyping relation and also give JS API users the possibility to pass opaque references without cost, would be to introduce another type in the hierarchy below anyref and above all wasm types. Let's call it wasmref.
wasmref would, by definition, be the type of all values that can be cast to any of the wasm reference types. I.e., ref.is_wasm v == ref.is_i31 v || ref.is_data v || ref.is_func v. Then a value can be passed from JS as anyref without cost; of course the wasm module would have to pay this cost later if it decides to downcast the value.
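The membership test for the proposed wasmref can be written out as a small sketch. The tagged `TaggedRef` type and the helper names are hypothetical; the sketch models only the predicate itself, not the cost of recognizing a function that is still in its host representation.

```typescript
// Hypothetical tagged model of reference values; the tags are illustrative.
type RefKind = "i31" | "data" | "func" | "host";

interface TaggedRef {
  kind: RefKind;
}

const refIsI31 = (v: TaggedRef): boolean => v.kind === "i31";
const refIsData = (v: TaggedRef): boolean => v.kind === "data";
const refIsFuncKind = (v: TaggedRef): boolean => v.kind === "func";

// ref.is_wasm v == ref.is_i31 v || ref.is_data v || ref.is_func v
const refIsWasm = (v: TaggedRef): boolean =>
  refIsI31(v) || refIsData(v) || refIsFuncKind(v);

console.log(refIsWasm({ kind: "func" })); // true
console.log(refIsWasm({ kind: "host" })); // false
```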
Passing a value as a more concrete type would also incur a smaller cost, just as it does now.

@rossberg
Member

@manoskouk, such a type (basically the complement of what we previously had as externref) might be useful. But it would not solve this specific problem, because like all subtyping, the relation wasmref <: anyref would have to be coercion-free, i.e., downcasting from anyref to wasmref cannot require a representation change. If it did, it would break any efficient implementation of subtyping.

Consider this example:

(type $t (array anyref))
(type $u (sub $t (array wasmref)))

If going from anyref to wasmref required a representation change, then casting from $t to $u would require a full copy of the array. Besides the unbounded cost and the fact that it breaks identity, this is even impossible when state is involved. Consider:

(type $st (struct (ref $t) (mut i32)))
(type $su (sub $st (struct (ref $u) (mut i32))))

Casting from $st to $su would likewise require a copy, since we have to copy the contained array field. But we cannot do that, since that would silently duplicate the stateful i32 field.
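The state-duplication problem can be made concrete with a small model. This is a sketch under stated assumptions: `HostRep` and `WasmRep` stand in for two incompatible representations, `coerceElem` stands in for the representation change, and `StructSt`, `StructSu`, and `coercingCast` loosely mirror $st, $su, and a hypothetical coercing cast between them.

```typescript
// Hypothetical model of a cast that needs a representation change.
type HostRep = { rep: "host"; payload: string };
type WasmRep = { rep: "wasm"; payload: string };

// Stand-in for the representation change from anyref to wasmref.
function coerceElem(e: HostRep): WasmRep {
  return { rep: "wasm", payload: e.payload };
}

// Loosely models $st: an array field plus a mutable i32 field.
interface StructSt {
  elems: HostRep[];
  counter: number;
}

// Loosely models $su: the same shape with the refined element type.
interface StructSu {
  elems: WasmRep[];
  counter: number;
}

// A coercing "cast" cannot reuse the original object; it has to copy.
function coercingCast(s: StructSt): StructSu {
  return { elems: s.elems.map(coerceElem), counter: s.counter };
}

const original: StructSt = {
  elems: [{ rep: "host", payload: "a" }],
  counter: 0,
};
const casted = coercingCast(original);

original.counter = 1; // write through the original reference

console.log(casted.counter);              // 0: the copy did not see the write
console.log(Object.is(original, casted)); // false: identity is lost
```

A coercion-free cast would return the same object, so both references would observe the same `counter`.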

So, no, unfortunately, this does not work. If we allow non-normalised references anywhere in the subtype hierarchy, we have to allow them everywhere and handle them in the corresponding elimination forms.

If you wanted to distinguish normalised and non-normalised types then they cannot be in a subtype relation.

@manoskouk
Contributor

@rossberg I see, thanks.
It seems to me that the question is more general than funcref: since the JS-Wasm boundary will most likely require a representation change or extra checks for some values (including i31ref, as was mentioned in some discussions), anyref in particular becomes problematic for the reasons mentioned above, regardless of its relation to funcref specifically. Presumably we can tolerate the representation change for more concrete types.
In that case, a solution would be to disallow (or discourage) anyref at the boundary, restore externref outside the type hierarchy, and introduce explicit coercions between externref and wasm reference types. Then anyref is guaranteed to be a wasm value (like wasmref above), which will simplify implementation of downcasts within Wasm.

An additional question: you mentioned before that we would need separate null values if we have disjoint type hierarchies. Could you elaborate on that? Why can't we have a single nullref type at the bottom of the hierarchy?

@rossberg
Member

It seems to me that the question is more general than funcref

Yes, absolutely!

Then anyref is guaranteed to be a wasm value (like wasmref above)

That would defeat the purpose of anyref, though. Its intended role is to be able to import something without constraining it to be either host or Wasm, so that e.g. host things can be freely virtualised in Wasm and vice versa.

The only thing we could do is remove the subtyping between anyref and anything below it, and always require explicit conversions (which could then perform a representation change). I'm not sure if that's desirable, though.

you mentioned before that we would need separate null values if we have disjoint type hierarchies. Could you elaborate on that?

The point of disjoint hierarchies is that they can use different internal representations (including different sizes). Consequently, their null values have to be assumed to have different representations as well. And because of that, the type system must not allow them to be mixed up. For the same reason, we'd need separate bottom types.

@manoskouk
Contributor

The only thing we could do is remove the subtyping between anyref and anything below it, and always require explicit conversions

Yes, this is the same thing as I am suggesting: Add a type which can be a wasm reference (with the host's representation) or a host reference, and requires conversions to be cast to and from wasm types (you call it anyref, I am calling it externref in my post).
In my post I also add a type for all wasm types (i.e. the union of funcref and eqref), which I called anyref.
This way we would not need an additional null type, since the host's null value can be included as a value in the host reference type (as it is now).

@rossberg
Member

you call it anyref, I am calling it externref in my post

Ah, I see.

In my post I also add a type for all wasm types (i.e. the union of funcref and eqref), which I called anyref. This way we would not need an additional null type, since the host's null value can be included as a value in the host reference type (as it is now).

If funcref and dataref are put in separate hierarchies for the sake of allowing different representations, then you cannot allow a union between them either – like subtyping, a union requires a compatible representation. Imagine datarefs being implemented with one word, funcrefs with two (closures as fat pointers). Clearly, their null values would be incompatible as well.
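The one-word versus two-word scenario can be sketched with tuple types of different lengths. The names `DataRefWords`, `FuncRefWords`, `NULL_DATAREF`, and `NULL_FUNCREF` are hypothetical, and the all-zero null encoding is an assumption made only for this illustration.

```typescript
// Hypothetical representations: one machine word vs. two (a fat pointer).
type DataRefWords = readonly [number];         // [pointer]
type FuncRefWords = readonly [number, number]; // [code pointer, environment]

// Assumed null encodings for the sketch: all words zero.
const NULL_DATAREF: DataRefWords = [0];
const NULL_FUNCREF: FuncRefWords = [0, 0];

function isNullDataref(r: DataRefWords): boolean {
  return r[0] === 0;
}

function isNullFuncref(r: FuncRefWords): boolean {
  return r[0] === 0 && r[1] === 0;
}

// The two nulls cannot stand in for each other. The TypeScript compiler
// rejects the following line because the tuple lengths differ:
// isNullDataref(NULL_FUNCREF);

console.log(NULL_DATAREF.length, NULL_FUNCREF.length); // 1 2
```

A union of the two types would force every consumer to distinguish the two layouts first, which is the kind of cost that coercion-free subtyping is meant to avoid.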

@manoskouk
Contributor

If funcref and dataref are put in separate hierarchies for the sake of allowing different representations

Right, but it was my impression from the discussion that the representation problems arise mostly at the wasm/JS boundary.
Regardless of whether we decide to have a common supertype for funcref and dataref, we can still have a separate host reference type with explicit coercions.

@rossberg
Member

Right, but it was my impression from the discussion that the representation problems arise mostly at the wasm/JS boundary.

Well, that's only one class of problem that could motivate splitting the type hierarchies. In the original discussions, the ability to represent function refs completely differently was another. If we go for the separation, it makes most sense to carry it all the way through, so that all such use cases are covered.

Regardless of whether we decide to have a common supertype for funcref and dataref, we can still have a separate host reference type with explicit coercions.

Right.

@tlively
Member

tlively commented Jul 21, 2022

This conversation was resolved in #307

@tlively tlively closed this as completed Jul 21, 2022