i31 as a ref type dimension like null #130
I see that WASM already has 64-bit integers, which suggests that, eventually, WASM might be usable as a 64-bit compilation target. We are very interested in a spec that allows 63-bit unboxed scalars in addition to 31-bit unboxed scalars. Do you think it will be possible to import/export these?
I'm generally supportive of this suggestion, in particular because it allows modules that don't need/use i31ref to avoid some i31 checks (they may be cheap, but not doing them at all is even cheaper). I'm not sure how many checks would be avoidable in practice. I assume that the type lattice will have to be:
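The lattice itself is elided from this capture; with `null` and `i31` as independent dimensions on the `ref` constructor, it presumably looked roughly like this (my reconstruction, not the original diagram):

```
        (ref null i31 $T)
         /             \
  (ref null $T)    (ref i31 $T)
         \             /
          (ref $T)
```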
So in particular, downcasts from
@sabine: I would have assumed that all …
Ah, sorry about derailing. If all …
I just realized that #76 creates an interesting wrinkle here. If we have to disallow …

So on the one hand, that would imply that (contrary to my previous post) downcasts from …

One solution would be to introduce more general union types. Another solution would be to introduce another built-in type that sits between …

Am I overlooking anything?
Nice catch, @jakobkummerow! To rephrase, no GC proposal with …
It seems like the simplest and least intrusive change would be to split …
You would also have to either disallow imported types from representing …
That boxing would have to happen only if …
@rossberg any thoughts? If
Basically, the low-order two bits indicate the two orthogonal dimensions of the type: whether …

Also, currently …
What would be the non-canonical type of
Given the
disallowing mixing …

Makes me wonder, how does …
I'm not sure what is meant by the "non-canonical type" of …

I am also skeptical about the claim that …

I think it's closer to the truth that a Wasm engine generally knows something about the encoding of …
Hmm, interesting, so the closest type is …

Makes me wonder about solving the …
I am skeptical about …
This is not a bad idea. It is one step further in the direction of a union type -- really, …

In principle, any such refinement is useful. However, union types have rather problematic properties in terms of type-checking complexity and worst cases, which can explode quickly, in particular when also combined with intersection types, forms of which have also been suggested. And it's always possible to fall back to …

OTOH, maybe i31 is enough of a common case to justify tracking it specially (like null).
In the meantime, I've heard specific feedback that making i31 a ref type dimension would be more efficient/useful than the current proposal for surface languages that have primitive integers: they would typically represent such source-level integers as …
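To illustrate that use case, here is a sketch in the GC proposal's text format (instruction names such as `i31.new` and `i31.get_s` follow the MVP drafts of the time; treat this as approximate, not normative syntax): a small source-level integer stays unboxed as an `i31ref`, with a boxed struct only as a fallback for values that don't fit in 31 bits.

```wasm
(module
  ;; hypothetical boxed fallback for integers wider than 31 bits
  (type $boxed-int (struct (field i64)))

  (func $tag (param $x i32) (result i31ref)
    ;; no allocation: the value is packed into the reference itself
    (i31.new (local.get $x)))

  (func $untag (param $r i31ref) (result i32)
    ;; sign-extending read of the 31-bit payload
    (i31.get_s (local.get $r))))
```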
Yes, that's one obvious use case. Although you can easily find similar use cases for other kinds of small unions, so that alone isn't necessarily enough to justify special-casing i31. One problem I realised with making i31 a dimension unlike other reftypes is that that creates a problem for type imports/exports (and possibly generics): naturally, you want to be able to instantiate a type import with i31, but that wouldn't be possible if it was segregated into a separate dimension. Proper union types would provide a coherent solution, but suffer from complexity (in particular, quadratic subtyping). Maybe we need to look for some middle ground.
Is it still quadratic if we don't support width subtyping of unions? The use cases I've gathered don't need it.
@taralx, do you mean that …
A|B wouldn't be a subtype of A|B|C, but A and B would be subtypes of both.
That wouldn't compose, though. Imagine you have …
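The example is elided here, but taralx's restriction from the previous comment makes the problem easy to reconstruct (hypothetical `union` syntax, my illustration): `A` and `B` would each be subtypes of both `A|B` and `A|B|C`, yet `A|B` itself would not be a subtype of `A|B|C`, so a value typed `A|B` could not flow to a consumer of `A|B|C`.

```wasm
(module
  (type $A (struct (field i32)))
  (type $B (struct (field i64)))
  (type $C (struct (field f64)))

  ;; hypothetical union-type syntax, not part of any Wasm proposal
  (func $consume (param (ref (union $A $B $C))))

  (func $produce (param $x (ref (union $A $B)))
    ;; ill-typed without width subtyping of unions:
    ;; (ref (union $A $B)) is not a subtype of (ref (union $A $B $C))
    (call $consume (local.get $x))))
```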
Let me think a bit more about it, but that might be ok. It probably requires coercive subtyping, though, which we've been avoiding so far.
Subtyping for arbitrary union types is quadratic without recursive types. Subtyping for equi-recursive types is quadratic without arbitrary union types. The combination of these features is not quadratic and in fact may not even be polynomial. At least for the subtyping rules that would enable the standard automata-based subtyping algorithms, subtyping is PSPACE-complete. And that's without taking into account the contravariance that would be introduced by function types.
I'm left feeling like the correct thing is just to bound the "size" of types to make the problem tractable rather than restricting functionality to linear or near-linear complexity.
Unfortunately, low-level structural heap types tend to be quite large. If you think of a language like Java, a class's structural type includes not only all of its fields but also all of its overridable methods (as they are part of the v-table) as well as its meta-data (e.g. the data structures used for interface-method dispatch, casting, and reflection) and then recursively the same components of any classes referenced by those fields or methods.
@askeksa-google, a type export applies to a type definition from the type section. That contains func, struct, or array definitions – and the aforementioned TODO would add i31 as a definition. This is orthogonal to dimensions in a reftype. So turning i31 into a dimension would move it to the wrong category, making it incompatible with exports. Minimal toy example of a client module:
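The toy client module is elided from this capture. Based on the surrounding argument, it presumably imported an abstract type `$B` together with functions producing and consuming it, roughly like this (type-import syntax as in the type-imports proposal; all names are my own, hypothetical):

```wasm
(module
  ;; import an abstract exported type and functions over it
  (import "M" "B" (type $B))
  (import "M" "make" (func $make (param i32) (result (ref $B))))
  (import "M" "use" (func $use (param (ref $B)) (result i32)))

  (func (export "run") (result i32)
    (call $use (call $make (i32.const 42)))))
```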
Now, by basic principles of abstraction, we need both the following implementations for B to be possible (among others):
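The first elided implementation presumably defines `B` as an ordinary struct (my reconstruction, with simplified allocation/access instruction names):

```wasm
(module
  (type $B (struct (field i32)))
  (export "B" (type $B))  ;; hypothetical type-export syntax

  (func (export "make") (param i32) (result (ref $B))
    (struct.new $B (local.get 0)))
  (func (export "use") (param (ref $B)) (result i32)
    (struct.get $B 0 (local.get 0))))
```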
or
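The second elided implementation presumably defines `B` to *be* i31 -- which is precisely what requires i31 to live in the type-definition category rather than in a reference dimension (again my reconstruction, hypothetical syntax for an i31 type definition):

```wasm
(module
  ;; hypothetical: the aforementioned TODO would allow i31 as a definition
  (type $B i31)
  (export "B" (type $B))

  (func (export "make") (param i32) (result (ref $B))
    (i31.new (local.get 0)))
  (func (export "use") (param (ref $B)) (result i32)
    (i31.get_s (local.get 0))))
```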
(For the sake of a simple example, I'm assuming that the module's contract implies that the parameter has only, say, 16 significant bits.)
@rossberg The above reason is why I have always thought that type imports and exports should be types, and not heap type definitions. Then the constraint on a type import is just a subtype bound, which can specify any type, including our simple union type mechanism.
It would be nice if we could put off figuring out how to make this work with type imports and exports. Without considering that proposal, it seems that we would all agree that adding a bottom type and making i31 a type dimension would be a good solution for the GC proposal.
@titzer, I know you do :), but that is a way more complex and much less obvious feature, and one that requires fundamental changes to Wasm's compilation model, as you know (also, to its type indexing model). To be honest, I still would not know how to design all that concretely. It also is no substitute for exporting heap types. When a module exports some data representation, then clients may need to be able to, for example, form both nullable and non-nullable references to it (e.g., to define locals/globals/tables). Hence, requiring that choice to be made on the exporting side typically is the wrong factorisation (even if there may be use cases for that as well). Also note that null is very different from i31 in that regard.

@tlively, we can't put it off, since the suggestion would paint us into the wrong corner. What the above shows is that making i31 a property of the pointer instead of a representation type is a category mistake, even though the unboxing optimisation for i31 refs can easily mislead us into thinking otherwise. Or to put it differently: there is a fine line between designing semantics that enables certain optimisations and making optimisations the semantics itself. :)

The motivating argument of wanting to express a certain kind of type union to save some checks applies equally to other type unions. That suggests that we should approach this from a principled angle.
While thinking more about how to pull off union types, I reread the OP of this issue and realised that some of the assumptions are actually off. Sorry that I had not noticed this earlier!
Ah, the type
I don't think there are nearly as many redundant checks as you seem to assume. Let's walk through a few representative examples.
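The worked examples A-C themselves are elided from this capture. Judging from the follow-up comments, Example C -- the only case the i31 dimension actually helps -- was a binary union of i31 with a single struct type; a rough reconstruction (simplified cast syntax; `br_on_i31` as in the MVP drafts of the time):

```wasm
(module
  (type $box (struct (field i64)))

  ;; Example C (reconstructed): a value that is either an unboxed i31
  ;; or exactly one struct type $box
  (func $unwrap (param $v anyref) (result i64)
    (block $is-i31 (result i31ref)
      (br_on_i31 $is-i31 (local.get $v))
      ;; not an i31: with only a single other case, one cast remains --
      ;; the cast a type like (ref i31 $box) would let engines elide
      (return (struct.get $box 0 (ref.cast $box (local.get $v)))))
    (i64.extend_i32_s (i31.get_s))))
```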
So, the only case where an i31 dimension has a real benefit is a binary union like Example C, where there is only a single other case besides the i31. For all other examples, the MVP type hierarchy can already express essentially the same information. (Moreover, it would be straightforward to extend the hierarchy such that this remains so even if we extended …)

The biggest cost in the examples is actually the unnecessary …
Obviously, this is not essential for the MVP. But if we wanted to optimise union representations better eventually, then that's the more useful route. And that requires no change to the MVP. Does that make sense?
Ok, I did misunderstand this. But this means that if …

If …

As for the examples above, I think example C will be far more common than examples A or B, and I do think it's an important win. Otherwise, we are introducing a requirement that engines represent …

As a path forward, we could design the general mechanism of the …
I'm not sure I follow. Currently, there is no type with i31 representation that inhabits any …
Hm, I'm still lost. The union of …
I have to respectfully disagree. AFAIAA, example C has come up in none of the compilers that have been prototyped so far. They all needed either A or B. I would expect B to be the most common case by far; basically every managed language will use it in some form or the other. I can imagine very few scenarios where C would come up and not be a subset of some larger B. Do you have a specific scenario in mind?
Yes, it comes up when, e.g., packing algebraic data types. For example, if some cases can be packed into 31 bits and others need to be boxed. I will use this in my Virgil compiler, first for native targets (including Wasm sans GC), and then when I retool it to target Wasm iso-recursive types. It also comes up when, ironically enough, implementing dynamic languages like JavaScript. There, you might want …

I know we need to be focused on use cases and not unnecessarily drag the discussion towards generalities, but the compilers prototyped so far have taken only one approach to implementing polymorphism, which is erasure and dynamic typing. The inclusion of …

Forgive my slight misunderstanding above. I did actually implement all of this in Wizard, but that was a year ago. In the last part I wrote:
AFAICT that still holds.
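titzer's ADT scenario can be sketched like this (my illustration, with hypothetical types and simplified instruction names): constructors whose payloads fit in 31 bits are packed into an i31, while constructors that carry references stay boxed as structs.

```wasm
(module
  ;; boxed case: carries a reference, so it cannot be packed
  (type $big-case (struct (field anyref) (field i32)))

  ;; packed case, hypothetical layout: bits 0-1 constructor tag,
  ;; bits 2-30 immediate payload
  (func $make-small (param $tag i32) (param $payload i32) (result i31ref)
    (i31.new
      (i32.or
        (i32.shl (local.get $payload) (i32.const 2))
        (local.get $tag)))))
```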
Yes, that's the one example I could think of. But it only applies to the edge case where you have (a) exactly one non-nullary constructor, and (b) more than one nullary constructor (otherwise you could presumably use null). Not exactly the most common case. Why is optimising that case more pressing than any other?
Hm, any JS implementation I have seen requires more than two cases – there are boxed numbers, strings, null, etc., none of which are JS objects. So JS rather is an example of a language that would not benefit.
Well, as I said, the B case comes up in practically all managed languages, and is far more frequent than the C case. C hardly ever comes up without B occurring as well. Even your ADT example is a case in point. I remain unconvinced that special semantics is justified for optimising such edge cases. If we did special-case that, we'd essentially be designing an inherent performance cliff into the type system. That seems undesirable.
I didn't respond to that earlier, because I don't understand what that would achieve. As mentioned above, the type …

Also, any path forward should be generalisable to proper unions somehow. It's not clear to me how this would be. AFAICS, it would be a design dead end.

And let's not forget, it likewise remains completely unclear how to reconcile it with type exports.
I think this is very well put. A union between …

For instance, in Dart, the redundant cast (that would be avoided by the union) occurs mainly in the following situations:
Since an …
The canonical use case is the union between i31 and a set of structs. The single-struct case is no more special than the 2-structs case is. ;) Does Dart not use a larger union? Of course, there will be places where you have narrowed it down to a binary union, but aren't there equally places where you have narrowed it down to a ternary union? Or a binary union not involving i31?
Now I'm confused. Wouldn't it be more significant if it was not cheap?
I suppose
What I mean is that having an …

To be clear: having …
@askeksa-google, the primary motivation for i31 is saving space, allocations, and garbage, which is quite significant. Cheaper checks are more like a nice by-product. Example C is the only one for which an i31 dimension would allow a cheaper check.
Yes, that's one side of the trade-off. The other side is that using …
Not sure I understand what you are referring to. Compared to what? If we did not have i31, then these cases would be even more expensive, right?
I'll illustrate with some concrete examples of what various situations would look like in …

Boxing an …
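The concrete examples themselves are elided from this capture. The "boxing" situation presumably looked something like the following (my sketch, hypothetical names and simplified instruction syntax): a value kept as an i31 must be re-boxed into a struct when it flows into a context that requires a struct type.

```wasm
(module
  ;; hypothetical boxed integer representation
  (type $BoxedInt (struct (field i64)))

  (func $box-i31 (param $r i31ref) (result (ref $BoxedInt))
    ;; allocation required when leaving the i31-typed world
    (struct.new $BoxedInt
      (i64.extend_i32_s (i31.get_s (local.get $r))))))
```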
@rossberg The term "edge case" is inherently subjective so we might get stuck in another holding pattern debating it, and that gets long with technical details, so I'll have to be brief.
Of course I mean implementation-level object; boxed numbers,
Just because a constructor isn't nullary doesn't mean its arguments can't be packed into 31 bits, e.g. if they are small primitives. A good example is the ADT I use to represent operands in the backend of the Virgil compiler. I specifically chose a limit of, e.g., 1 million vregs so that the VREG number, plus a tag, plus a hint, fits into a small number of bits, so it can be packed. But other operand cases might have a reference to something, so they are big and boxed.

ADTs are the example of the source-level construct that benefits most from having …
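As an illustration of that packing (the field widths below are my assumption, not Virgil's actual layout): a 20-bit vreg number (the "1 million vregs" limit), a small tag, and a hint all fit comfortably within 31 bits.

```wasm
;; hypothetical layout: | hint:4 | tag:4 | vreg:20 | packed into an i31
(func $pack-operand (param $vreg i32) (param $tag i32) (param $hint i32)
                    (result i31ref)
  (i31.new
    (i32.or
      (i32.or
        (local.get $vreg)
        (i32.shl (local.get $tag) (i32.const 20)))
      (i32.shl (local.get $hint) (i32.const 24)))))
```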
Thanks for the concrete examples, @askeksa-google! What would it take to do the benchmarking to figure out the relative costs you mention in your conclusion?
@askeksa-google, thanks for the concrete examples. They match what I've seen as well. So the main effect of an i31 dimension would be one or two extra casts saved in places where you are only expecting an int64, as an instance of my Example C. As @tlively says, it would be great to know how these affect overall performance in order to justify special features enabling these casts' elimination in the MVP, because, of course, the MVP requires many undesirable casts.

@titzer, okay, let's call it a special case, that's perhaps more neutral. Objectively, it is just one of many similar cases. I think we can agree that it does not differ from the others qualitatively; the main question is how much it differs quantitatively. Do you agree?
Well, in the proposal, the analogue to a map is an RTT. The analogue to the type of heap objects with a map is …

(I don't think a VM targeting Wasm would want to introduce custom maps and a custom supertype on top of RTTs and dataref. The only reason to do so would be if you wanted to do some form of table dispatch on those maps. AFAIK, not even V8 does that. And the space overhead would be so substantial that I doubt it is a viable implementation strategy. If such dispatch was needed, then Wasm ought to introduce suitable primitives for it.)
True, but still this is only a (rather incidental) fragment of a general set of cases of algebraic data types. Why do you think it deserves special attention in the MVP, where we have already accepted redundant casts in many other places?

Our criteria for putting some special handling into the MVP should be that (a) it can be generalised to the general case in the future, (b) it is forward-compatible with other anticipated features, and (c) the win is significant enough that missing out on it would put the MVP's success at risk. But AFAICS, the dimension idea is quite contrary to (a) and (b) as I argued above, and so far we lack evidence for (c).
Echoing @rossberg, my feeling is also that special-casing unions specifically with …
As a first step towards some quantitative data on this, I did an experiment on …

One hurdle I ran into is that …

What is the reasoning behind the …

Anyway, it turned out that even though extra casts were inserted in many places, the overall code size increase from these was very small (around 0.1%), maybe because the change of the types also causes casts to be removed in other places. In terms of performance, I tried various benchmarks, but did not observe any significant change.

Next step would be adding the actual …
Based on the discussion today, my position is now that I think we can accomplish most of the goals of this proposal by adding a …
Also, just to explicitly record here, I'm supportive of a type constructor for special-cased unions with …
After a lot of discussion of this topic, we've settled on keeping i31 as it is (although there are still questions about whether it is well-motivated enough to include in the MVP at all). In post-MVP, we expect to have some sort of general variant type that can contain i31 as well. Since we will not have i31 as a type dimension, I'll close this issue. |
In the course of discussion #100, @gasche proposed making `i31ref` not a type, but a dimension in the `ref` type constructor. I'd like to pull this idea out for a dedicated discussion because it is not entirely clear to me if we can add this later or must add this up front. It also seems like it can make a difference in code generation, so we should probably think it through.

Concretely, instead of (or perhaps, in addition to) a single `i31ref` type which is in the middle of the hierarchy between `anyref` and `(ref $T)`, we have, for each type `$T`:

- `(ref $T)`, which is a subtype of `(ref i31 $T)`
- `(ref i31 $T)`, a subtype of `(ref null i31 $T)`

It is thus another independent dimension, much the same way as `null` is a dimension for this type constructor.
When thinking about the code that would be generated by an engine for various reference operations, some engines may (already) need to generate these null checks:

- `struct.get`, `struct.set`, and corresponding array operations

Of course, most engines will hopefully be able to do these implicit null checks via virtual memory tricks, but not all can.
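To make the implicit check concrete (my illustration, using `ref.as_non_null` from the function-references proposal): on a nullable operand, the two forms below behave the same, and an engine that cannot rely on virtual-memory traps effectively compiles the first into the second.

```wasm
;; implicit: struct.get on a (ref null $T) operand traps on null
(struct.get $T 0 (local.get $r))

;; explicit equivalent of what the engine must guarantee
(struct.get $T 0 (ref.as_non_null (local.get $r)))
```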
We've been discussing the additional checks that would be needed for `i31` tag checks in the context of downcasts from `i31ref` to a more specific reference type. In particular, they would be implicitly needed because `i31ref` is proposed as a subtype of `anyref` above all declared heap types (AFAICT).

I'd like to point out that tag checks can be avoided if we think of i31 values as just a big group of nulls: engines can elide the checks based on static information, which is exactly the same information provided in the `null` ref type dimension.

It also gives more expressive power to programs, because it is a very limited union type. In particular, using `(ref i31 $T)`, a module could perform a `(struct.get $T i)` that has an implicit tag check that would trap on an i31 value in much the same way as it would trap on `null`. Without this, a program would have to use `i31ref` types everywhere and downcast everywhere explicitly; a potentially more expensive check than just a tag check.
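Under the proposed (never-adopted) dimension syntax, that implicit tag check might look like this (my sketch; cast syntax simplified):

```wasm
;; with the i31 dimension: one implicit tag check, trapping on an i31
;; value exactly as it would trap on null
(func $first (param $r (ref i31 $T)) (result i32)
  (struct.get $T 0 (local.get $r)))

;; without it: the program must downcast explicitly at every access
(func $first-mvp (param $r i31ref) (result i32)
  (struct.get $T 0 (ref.cast $T (local.get $r))))
```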