First go at MVP proposal #34

Merged
merged 14 commits into from Aug 23, 2018

9 participants
@rossberg
Member

rossberg commented May 14, 2018

A more concrete suggestion for an MVP feature set. Still a number of todos, and instruction names diverged from Overview, should fix.

@lukewagner, WDYT?

@rossberg rossberg requested a review from lukewagner May 14, 2018

@stedolan

stedolan commented May 17, 2018

(Not @lukewagner, but hope you don't mind an unsolicited review!)

The MVP looks nice, and seems very reasonable both to target and to implement. It seems complete enough to express most things with reasonable efficiency, although it's missing a nice means of packaging up data with code pointers that know the data's type (for both ML-style closures and Java-style single dispatch OO). That might be a post-MVP feature, though!

The only bit that looks worrying in the current proposal is ref.cast. I think it is always a mistake to reflect the subtyping relation into something that can be dynamically tested at runtime, for several reasons:

  • you must either have full RTTI on all values, or complicate the type system with castable/uncastable types.

  • the algorithm for checking subtypes can be slow. With hashconsing, you can optimise the case where the type being cast to is exactly the runtime type, but if you do that then innocuous changes like adding a struct field can silently change many casts from O(1) to O(number and size of struct fields). It seems like a lot of work to hide behind a single instruction!

  • if wasm ever grows generics/parametric polymorphism, then keeping full RTTI rapidly becomes unmanageable, and the differences between the subtyping relation and the runtime test soon cause really weird behaviour (e.g. Java, more Java) or subtle soundness bugs (e.g. Scala, more Scala, Hack).

I would much prefer the runtime casts to be based on explicit tags. Here's an attempt at a proposal:

Add to modules a section containing a list of "tag definitions", which are given a type:

deftag ::= $A : tag <t>
    iff t <: anyref

Change struct.new to have a tag parameter as well as a type:

struct.new $t $A : [t*] -> [(ref $t)]
    iff $t = struct (mut t)*
        $A : tag <t'>
        $t <: t'

The tag's type must be a supertype of the structure type, but need not be exactly the same. Programs that don't care about downcasts can just pass some tag of type anyref (in fact, predefining one such tag might be useful).

Then, ref.cast also uses a tag parameter:

ref.cast $A : [anyref] -> [t]
    iff $A : tag<t>

and traps if the operand does not have tag $A at runtime.

The implementation can represent tags as distinct integers, and then the implementation of ref.cast is always just an integer comparison. If there's a non-trapping version of ref.cast, then this also becomes useful for compiling pattern-matching on sums/variants/algebraic data types.

Also, private types can be implemented by not exporting a tag definition. (Roughly, this is like having structural subtyping with nominal downcasts).
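
A minimal sketch of the tag scheme above, written in Python as a model of the semantics only (not an engine implementation). The names `Tag`, `Struct`, `CastTrap`, and `ref_test` are hypothetical; the static side condition `$t <: t'` is a validation-time check and is not modelled here.

```python
import itertools

_next_tag_id = itertools.count(1)


class CastTrap(Exception):
    """Models the trap raised by a failing ref.cast."""


class Tag:
    """Hypothetical model of a module-level tag definition."""

    def __init__(self, name):
        self.name = name
        self.id = next(_next_tag_id)  # each definition gets a distinct integer


class Struct:
    """A struct value that remembers the tag it was allocated with."""

    def __init__(self, tag, fields):
        self.tag_id = tag.id
        self.fields = list(fields)


def struct_new(tag, *fields):
    return Struct(tag, fields)


def ref_cast(tag, value):
    # One integer comparison, regardless of how many fields the struct has.
    if value.tag_id != tag.id:
        raise CastTrap(f"value does not have tag {tag.name}")
    return value


def ref_test(tag, value):
    # Non-trapping variant, usable for matching on sums/variants.
    return value.tag_id == tag.id
```

In this model, adding a field to a struct does not change the cost of `ref_cast`, which is the property the proposal is after.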

Finally, a couple of minor notes on the subtyping relation:

  • "Greatest fixpoint of the reflexive transitive closure" doesn't work. If you have a coinductive definition with a transitivity rule, then everything is a subtype of everything else, as witnessed by an infinite proof that applies transitivity forever. You need to either define the relation by simultaneous induction/coinduction (roughly, there may only be finitely many appeals to transitivity before you get to an actual type constructor rule), or, more simply, leave out reflexivity and transitivity and prove them as theorems about the coinductive definition afterwards.

  • I like the definition of eqref, but please consider restricting it to only include those struct types that have at least one mutable field. Immutable objects allow lots of nice GC techniques that do non-atomic copies of objects, knowing that it doesn't matter whether the program sees a mixture of old and new references. These techniques are broken by pointer comparison on immutable structures.

  • There's a difference between "immutable" fields (like in ML, known to never change, implementations may cache their values in registers across arbitrary code) and "readonly" fields (like C++ const pointers, may not be mutated using this reference but might change by aliases). It looks like this proposal goes for "immutable" fields rather than "readonly" (since there's no subtyping relation between const and var fields), but it's a subtle enough issue that it's worth pointing out explicitly.

@rossberg

Member Author

rossberg commented May 17, 2018

@stedolan:

I would much prefer the runtime casts to be based on explicit tags.

I think you are right, and in fact I have been thinking along a similar direction -- thus the cryptic question about distinguishing "castable types" and RTTI at the very end of the doc.

However, I think that static tags like you suggest would be too inflexible, at least if there also was interesting polymorphism. What I thought was dynamic "type representation" values like in some literature on type erasure -- essentially your tags but first-class. Also, I was thinking that it might make sense to make the runtime type information on objects themselves completely optional. So my idea was:

  • Introduce a family of opaque types of the form rep $t that are inhabited by the runtime representation of the respective deftype $t (i.e., they are singleton types).

  • An instruction rep $t : [] -> [(rep $t)] to create such a value.

  • Distinguish castable from non-castable references, e.g., via some attribute on reference types. Say ref rep? $t. Only references with the rep attribute are known to carry runtime type information. (But I suppose ref rep $t <: ref $t should hold for all $t.)

  • The cast instruction would require both a castable reference and a rep for the target type. So

    cast $t : [(rep $t) (ref rep $t')] -> [(ref rep $t)]
      iff $t <: $t'
    
  • Instead of allowing subtyping on the rep like you propose, one could have two versions of each new instruction, one that takes an additional rep operand and one that doesn't. For example:

    struct.new $t : [t*] -> [(ref $t)]
      iff $t = struct (mut t)*
    
    struct.new_rep $t : [t* (rep $t)] -> [(ref rep $t)]
      iff $t = struct (mut t)*
    

    But maybe your idea of allowing subtyping is a simpler alternative, I had not thought of that. In that case, however, we would need to introduce either a new form of deftype for any and friends, or respective instructions for creating such reps.
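
A rough Python model of the `rep` idea, assuming exact-type matching only (the `$t <: $t'` side condition is static and is left out). The helper names `rep`, `Ref`, `NotCastable`, and `CastTrap` are hypothetical; in the proposal, casting a non-castable reference would be rejected by validation rather than at runtime.

```python
class CastTrap(Exception):
    """Models a failing cast at runtime."""


class NotCastable(Exception):
    """Stands in for a validation error: the reference carries no rep."""


class Rep:
    """Runtime representation of one defined type."""

    def __init__(self, type_name):
        self.type_name = type_name


_reps = {}


def rep(type_name):
    # `rep $t` always yields the same value for the same type (singleton).
    if type_name not in _reps:
        _reps[type_name] = Rep(type_name)
    return _reps[type_name]


class Ref:
    def __init__(self, fields, rep_value=None):
        self.fields = list(fields)
        self.rep = rep_value  # None models a plain, non-castable `ref $t`


def struct_new(fields):
    return Ref(fields)


def struct_new_rep(fields, rep_value):
    return Ref(fields, rep_value)


def cast(rep_value, ref):
    if ref.rep is None:
        raise NotCastable("reference carries no runtime type information")
    if ref.rep is not rep_value:
        raise CastTrap("runtime rep does not match the target rep")
    return ref
```

Objects created with `struct_new` pay nothing for runtime type information in this model; only `struct_new_rep` objects can be the operand of `cast`.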

  • "Greatest fixpoint of the reflexive transitive closure" doesn't work.

Right. In my defense, I added this sentence in a hurry. :)

  • I like the definition of eqref, but please consider restricting it to only include those struct types that have at least one mutable field.

Ah, good point, the doc does not currently say anything about that. The idea was that the users can somehow explicitly choose to forbid eq for types they define, orthogonal to mutability. But I haven't thought this through yet, i.e., I don't have an idea yet what the right way to declare this would be.

It looks like this proposal goes for "immutable" fields rather than "readonly".

Yes, I should make that more explicit.

Thanks for the comments!

@stedolan

stedolan commented May 17, 2018

However, I think that static tags like you suggest would be too inflexible, at least if there also was interesting polymorphism. What I thought was dynamic "type representation" values like in some literature on type erasure -- essentially your tags but first-class.

I agree completely. I think static tags are enough to replace the downcasts supported by ref.cast in the proposal, but for more interesting uses I'd love to see tag<t> move from being a static definition to an ordinary value on the stack, although I wouldn't be surprised if that occurred post-MVP. I think the most important thing to ensure from the beginning is that cast is defined in terms of tags / reps rather than the subtyping relation.

Also, I was thinking that it might make sense to make the runtime type information on objects themselves completely optional.

Right, OK. Is this for efficiency? I was assuming that implementations would have a type tag on every object anyway (in order to scan it during GC), but maybe there's another approach where values without RTTI can be more efficiently implemented?

The idea was that the users can somehow explicitly choose to forbid eq for types they define, orthogonal to mutability.

I don't think that mutability and eq are actually orthogonal. Given two mutable values, I can decide their equality by mutating one and seeing if the other changes. Given two immutable values, if I can decide their equality then eta-contraction and GCs that copy nonatomically become invalid.
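
The mutable half of that argument can be sketched in a few lines of Python. `Cell` and `same_cell` are hypothetical names, and the cell is assumed to hold an integer; the point is only that identity is observable through writes, without any pointer comparison.

```python
class Cell:
    """A struct with a single mutable integer field."""

    def __init__(self, value):
        self.value = value


def same_cell(a, b):
    """Decide whether a and b are the same object using only reads and writes."""
    saved = a.value
    before = b.value
    a.value = before + 1         # write something b does not currently hold
    aliased = b.value != before  # b changed, so the write went through b too
    a.value = saved              # restore the original contents
    return aliased
```

No such test exists for immutable values, which is why exposing pointer equality on them adds observable behaviour that a copying GC would have to preserve.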

@rossberg

Member Author

rossberg commented May 17, 2018

I wouldn't be surprised if that occurred post-MVP.

Hm, I think it will be difficult (or at least ugly) to change later, so it'd be better to start right.

OK. Is this for efficiency? I was assuming that implementations would have a type tag on every object anyway (in order to scan it during GC), but maybe there's another approach where values without RTTI can be more efficiently implemented?

The information needed for GC might be more low-level, so simpler. I could imagine some implementation schemes where you need almost none of it, though that may not be the most efficient. But it's a fair question, I'm not sure whether it is worth supporting.

I don't think that mutability and eq are actually orthogonal.

I fully agree that they aren't orthogonal semantically, but in terms of features it makes sense to allow combining them in all ways. For example, you sometimes want fast equality checks for immutable data structures. Likewise, you might want to hide equality for mutable data, e.g., when you already hide the type's representation (which may be a future extension).

@brion

brion commented May 17, 2018

Great start! Couple questions:

Is there any way to apply resource limits to GC'd objects created by a WebAssembly module in this design? I'm researching using WebAssembly as a plugin sandbox for web apps, and the various ways user-supplied code could be adversarial are of interest. :)

Linear memory can be fixed to a maximum size at compile time, but I don't see any way here to control the amount of memory used by reference types: as with JavaScript, you could allocate a large number of arrays and use multiple gigabytes until the system goes into swap hell or crashes.

Additionally, I can see some use for finalizers which I don't see here; being able to reference heap data in linear memory from a GC'd struct, and then being able to free that data when the GC handle dies, would be very useful. Certainly for things like emscripten's 'embind' C++ bindings, I would love to not have to manually call a delete method on the exposed JS bindings. Is that something that might come later?

@rossberg

Member Author

rossberg commented May 17, 2018

@brion, good questions. Unfortunately, I don't have good answers.

We haven't really talked about ways of limiting resource usage of a Wasm engine. I can't claim to have a good answer, but I assume that would most likely happen at the embedding level?

As for finalisers: yes, fairly far down on the future feature list. It's probably the toughest problem of everything related to GC. There simply are too many competing semantics for such mechanisms out there, and we don't know yet how to reduce that zoo to some reasonably generic (if slow) set of primitives that can be implemented in all engines and is useful for all languages.

@brion

brion commented May 17, 2018

@rossberg yeah the tricky part of the resource limit use case is that I'm targeting plugins running in a web app, so the app doesn't control the browser's Wasm engine... the GC would likely treat the plugin module's objects as belonging to the web app if it applies any limits at all.

It may be that I just have to work within linear memory and roll my own internal GC for this use case, which should be fine for the narrow API I'm envisioning between the module and the host web app. Or assume a little more trust with a curated plugin list and some way to manually remove a troublesome plugin.

I'll keep an eye out for future discussions on finalizers. Thanks!

@rossberg

Member Author

rossberg commented May 17, 2018

@brion, what I meant is that this could be some functionality in the embedding API, such as the JS one. As long as you can call JavaScript you could access it. But we haven't discussed anything like that yet, and I don't have a clear idea what the right design would be (and what could be implemented in current engines).

It is worth noting that on browsers memory usage is capped anyway. This is a limitation that most browsers already put in place for JavaScript, and it will simply be inherited by Wasm. It is not a threshold you can control, but it certainly will stop any app long before it can decline into swapping hell.

@brion

brion commented May 17, 2018

@rossberg hmm, there may be some differences between engines. Allocating typed arrays and writing some memory in them to make sure they're really allocated, Firefox lets me allocate at least 64 gigabytes of data on a 16GB MacBook Pro, using tons of swap and compressed memory:

[Screenshot, 2018-05-17, 11:14 am]

On my Linux PC with a spinning disk, the same runs for a few gigabytes and then the system becomes unresponsive, requiring a reboot.

Safari cuts it off around 16 GB of allocations, with a warning that the page is using a lot of memory, so that's not too bad.

Haven't tested Chrome or Edge.

I'll add an issue to propose an addition for per-module memory limits and continue over there. :)

@brion

brion commented May 18, 2018

I'm a bit unclear on the semantics of the intref type:

  • what is the limit on representable values? ("traps when value is not representable?")
  • are intrefs with the same value guaranteed to compare equal, or do they have to be unboxed before comparing?
@rossberg

Member Author

rossberg commented May 18, 2018

@brion, interesting, I wasn't aware that FF would allow that. Pretty sure Chrome caps at a gig or so, though there might be special rules for array buffers.

The exact semantics of intref is still TBD, thus the question mark. The choices boil down to:

  1. intrefs are 31 (or less?) bits, guaranteed to be unboxed and fast
  2. intrefs are 32 bits, may be boxed, may require hidden allocation
  3. provide both, e.g. int31ref and int32ref (plus perhaps int64ref and even int63ref?)

1 is generally more efficient (no implicit branches). And a producer can build 2 out of 1, but not the other way round. However, 2 enables engines to unbox even high 32 bit values on 64 bit systems, which a producer cannot do themselves given only 1.

@lukewagner and I have been discussing it recently, and I think we ended up tending towards starting with int31ref for the MVP. But this will take implementation and producer feedback to finalise.

An int31ref could potentially be made an eqref, the other sizes couldn't. However, it may be more portable (and more uniform) to not allow eq for any of them. It should be easy for engines to optimise the untag-compare combination anyway.
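
The claim that a producer can build option 2 out of option 1 can be sketched as follows, in Python as a model only. `Int31Ref` stands in for the unboxed reference and `BoxedInt32` for a producer-defined struct with one i32 field; both names and the helper functions are hypothetical.

```python
from dataclasses import dataclass

INT31_MIN = -(1 << 30)
INT31_MAX = (1 << 30) - 1


@dataclass(frozen=True)
class Int31Ref:
    """Models an unboxed 31-bit integer reference."""
    value: int


@dataclass(frozen=True)
class BoxedInt32:
    """Models a heap-allocated struct holding a full i32."""
    value: int


def int32_to_ref(n):
    """Producer-side 'int32ref': unboxed when possible, boxed otherwise."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("not a 32-bit integer")
    if INT31_MIN <= n <= INT31_MAX:
        return Int31Ref(n)   # fast path, no allocation
    return BoxedInt32(n)     # explicit allocation, visible to the producer


def ref_to_int32(r):
    return r.value
```

The branch in `int32_to_ref` is the cost that option 1 makes explicit in producer code and that option 2 would hide inside the engine.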

@rossberg

Member Author

rossberg commented May 18, 2018

Pushed some refinements on intrefs, settling on int31 for now.

@brion

brion commented May 18, 2018

@rossberg thanks, that's clearer!

Should be possible to build on int31ref to avoid boxing the 31-bit subset of integers in a universal type representation per the note in Overview.md; JavaScript style types might look like:

  • undefined, null, true, false -> refs to global sigil instances
  • object ref types -> references to an object struct
  • ints in the 31-bit range -> int31ref
  • doubles and ints outside 31-bit range -> ref to box struct

Makes reasonable sense to me. :)

I'd prefer full int32ref, but understand that could be more painful on 32-bit arch. (I'm assuming int31ref on 32-bit arch is envisioned as a 32-bit word with a tag in the lowest bit and a bit-shift to get the stored value?)
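
The encoding assumed in that parenthetical could look like the sketch below: a 32-bit word with the low bit set for integers and clear for (even-aligned) pointers. This is one common tagging scheme, written in Python for illustration; the thread does not say which encoding engines would use, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def tag_int31(n):
    """Encode a signed 31-bit integer into a 32-bit word with low bit 1."""
    if not -(1 << 30) <= n < (1 << 30):
        raise ValueError("not representable in 31 bits")
    return ((n << 1) | 1) & MASK32


def is_int31(word):
    """Pointers are assumed to be at least 2-byte aligned, so their low bit is 0."""
    return (word & 1) == 1


def untag_int31(word):
    """Decode with a sign-preserving (arithmetic) shift right by one."""
    signed = word - (1 << 32) if word & 0x80000000 else word
    return signed >> 1
```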

@brion

brion commented May 19, 2018

Looking in more detail at how to implement, say, equality checking with a JS-style universal type representation, I'm a bit lost. The ref.cast operator is spec'd as trapping if a cast isn't possible, so I can't speculatively test a cast. But there's no is_subclass or is_castable operator, so I'm unsure how to tell if a given anyref is an int31ref (in which case I have to get the integer and compare) or an eqref (in which case I can check object identity... but then I have to check subtypes to see if they're strings or boxed floats, which can compare equal when they're different instances).

If there was a ref.is_castable operator, might look something like this?

(func $equal_check (param $a anyref) (param $b anyref) (result i32)
  (if
    (i32.and
      (ref.is_castable eqref (get_local $a))
      (ref.is_castable eqref (get_local $b))
    )
    (block
      ;; Check object identity first.
      (if
        (ref.equal
          (ref.cast eqref (get_local $a))
          (ref.cast eqref (get_local $b))
        )
        (block
          (return (i32.const 1))
        )
      )
    )
    ;; If not the same object, may still be a boxed float or a string
    ;; so cast it down and make a method call for the comparison.
    ;; ... todo  ...
  )
  ;; Not an eqref? Probably an int31ref.
  (if
    (i32.and
      (ref.is_castable int31ref (get_local $a))
      (ref.is_castable int31ref (get_local $b))
    )
    (block
      (return
        (i32.eq
          (int31ref.get_s (ref.cast int31ref (get_local $a)))
          (int31ref.get_s (ref.cast int31ref (get_local $b)))
        )
      )
    )
  )
  ;; Not equal.
  (return (i32.const 0))
)
@lukewagner
Member

lukewagner left a comment

Great start! Sorry for taking so long to review, but I had been thinking and reading about the dynamic cast/nominal/structural interaction which, quite coincidentally, you and @stedolan have already been discussing. This ultimately led me to the same conclusion of "structural static"/"nominal dynamic" so I'm a big fan of explicit tags and want to pursue that direction more.


* `deftype` is the new category of types that can occur as type definitions
- `deftype ::= <functype> | <structtype> | <arraytype>`
- `module ::= {..., types vec(<deftype>)}`

@lukewagner

lukewagner May 21, 2018

Member

There's already a types section containing functypes which I think we're extending with other constructor "forms", right? (The wording sounds like this is a new section, so just checking.)

@rossberg

rossberg May 22, 2018

Author Member

Yes, deftype is the syntax of type definitions, which previously was just function types. (I'm avoiding saying "type section" here since in my mind, sections are rather an encoding detail of the binary format.)

@lukewagner

lukewagner May 22, 2018

Member

Ok, makes sense. Perhaps tweak wording to say "generalizes the existing types component of modules"?

@rossberg

rossberg May 23, 2018

Author Member

Done.


* `type <typetype>` is an import description with an upper bound
- `importdesc ::= ... | type <reftype>`
- Note: `type` may get additional parameters in the future

@lukewagner

lukewagner May 21, 2018

Member

Could you explain the reasoning behind having the type have a reftype as opposed to a deftype? E.g., into which index space does a type import belong? It seems like it should be in the type definition index space (so it can be used in function types, exports, etc). I was imagining that maybe, as a special case (to play well with section ordering), type imports would go into the type section (using a special import form followed by the type constructor's form). Then a type import would be prepended to both the type definition index space and the import index space. Kinda weird, but seems forced by the type/import circularity.

@rossberg

rossberg May 22, 2018

Author Member

The reason the bound is a reftype is that you need to be able to use anyref (or eqref) as a bound.

Yes, type imports are in the type index space. Lumping them in with the type section rather than with imports would be inconsistent with how we handle other imports. It would create a mess with both the type index space (which should have imports first) and the "argument list" of a module instantiation (which currently simply reflects the import list). If we want to separate type imports from other types, then I think we instead should have a separate type import section. But I don't think it's strictly necessary.

@lukewagner

lukewagner May 22, 2018

Member

Hm, still not sure reftype makes sense as the bound, but I guess I'll wait to see what you propose for nominal types + type imports. In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.

Ah ok, so then type imports are appended to the type section so that (1) signatures can refer to type imports by just using the appropriate index (2) the type section can only be validated together with the import section. Yes? It'd be good to mention these things.

@rossberg

rossberg May 23, 2018

Author Member

Well, yes, except that type imports are still meant to be prepended to the type index space. (The ordering of binary sections doesn't have to imply the ordering of index spaces. As you said, they need to be validated together anyway.)

Added comments.
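
A small Python sketch of the ordering described here, assuming nothing beyond what the comment states: type imports occupy the lowest indices of the type index space, followed by the module's own definitions, independent of where the binary sections sit. The function name and the tuple encoding are hypothetical.

```python
def build_type_index_space(type_imports, type_defs):
    """Return the type index space: imported types first, then local definitions."""
    space = [("import", bound) for bound in type_imports]
    space += [("def", deftype) for deftype in type_defs]
    return space


# Two imported types (each with a reftype bound) and two local definitions.
space = build_type_index_space(["anyref", "eqref"], ["func", "struct"])
for index, (kind, what) in enumerate(space):
    print(index, kind, what)
```

A signature can then refer to an imported type by a plain index (0 or 1 above), which is why the type and import declarations have to be validated together.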

@rossberg

rossberg May 23, 2018

Author Member

Regarding the bound: In most cases, it will just be anyref, i.e., the type is fully abstract. How would you express that if it wasn't a reftype?

FWIW, I'm not sure I understood what you meant by:

In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.

Why not?

@lukewagner

lukewagner May 25, 2018

Member

Ok, I think I understand the proposal a bit better now. I also see the benefit in that, by using a reference-type bound (instead of using a struct/array bound more-directly), one has the more-expressive capability of declaring whether the import is equality-comparable, nullable, etc.


* `eqref` is a subtype of `anyref`
- `eqref <: anyref`
- Note: `int31ref` and `anyfunc` are *not* subtypes of `eqref`, i.e., those types do not expose reference equality

@lukewagner

lukewagner May 21, 2018

Member

The anyfunc in tables is already nullable...

@rossberg

rossberg May 22, 2018

Author Member

Hm, I don't disagree, but how does that relate to this item?

@lukewagner

lukewagner May 22, 2018

Member

Oops, I put this comment on the wrong line: I meant to comment on the "nullref is not a subtype of anyfunc" line below.

@rossberg

rossberg May 23, 2018

Author Member

Ah, I see. Yes, typo.


* `ref.func` creates a function reference from a function index
- `ref.func $f : [] -> (ref $t)`
- iff `$f : $t`

@lukewagner

lukewagner May 21, 2018

Member

I had been on the edge of whether function references were needed for the GC MVP, but the previous discussion about passing closures to and from JS without wrapping each time seemed to boost the priority. Given that, could we also have the bind operator which can be used to manufacture closures?

@rossberg

rossberg May 22, 2018

Author Member

Oh, but closures are different from mere function references, since they have a different runtime representation. Hence they imply a new type, so I considered them post-MVP.

@lukewagner

lukewagner May 22, 2018

Member

Ah, I had assumed that they would have the same type and that they would also have the same representation. That is, even without a bound dynamic environment, ref.func is still a closure because it entrains the WebAssembly.Instance. We represent these as JSFunction objects internally in SM. I had entertained the idea of representing function references as raw function pointers, but I think this gets pretty hairy when GC and multi-instance linking enter the picture.

@rossberg

rossberg May 23, 2018

Author Member

I added it to the question section.

- `structtype ::= struct <fieldtype>*`

* `arraytype` describes an array with dynamically indexed fields
- `arraytype ::= array <fieldtype>`

@lukewagner

lukewagner May 21, 2018

Member

What about the older idea of giving structtypes a trailing, dynamic-length array tail? I'd be surprised if multiple source-languages didn't need to store one or two extra bookkeeping fields to implement their source-language array, so it feels like a GC MVP thing and it seems like, with that ability, a separate arraytype isn't even needed.

@aardappel

aardappel May 21, 2018

I agree that this would really extend languages' abilities to find efficient representations for their data types. Engines can still treat structs without a tail specially.

@rossberg

rossberg May 22, 2018

Author Member

This is enabled by allowing nesting of aggregates (structs/arrays), which is currently left for post-MVP. For MVP it seems okay to use an indirection.

@lukewagner

lukewagner May 25, 2018

Member

Fair enough


* A reference value type is defaultable if it is not of the form `ref $t`

* Locals must have a type that is defaultable.

@lukewagner

lukewagner May 21, 2018

Member

Without adding a nullable type constructor, it seems like any source language with ubiquitous nullable types will be forced to use tons of otherwise-unnecessary anyrefs and casts. Nullability seems like an important thing not to bolt on later too, so can we add it?

@aardappel

aardappel May 21, 2018

I think it could be added, but there is a risk though that people will be lazy and default to using optref instead of anyref since it gives "easier" interop between all sorts of languages, which will make languages that are null-safe pay the price for having nulls. A world where non-null is default and languages that are liberal with null pay the price makes more sense to me, and will promote more robust software ecosystems built on wasm.

@rossberg

rossberg May 22, 2018

Author Member

I am on the fence here. The main benefit of a nullable ref type over anyref is that a downcast to the non-nullable concrete ref type is gonna be slightly cheaper. But you'd still need it. (Or are you suggesting that all ref instructions should also work with nullable refs directly?)

@lukewagner

lukewagner May 22, 2018

Member

A downcast from a nullref $t to ref $t is going to be significantly cheaper than a downcast from anyref; the former is a single null check; the latter several instructions and memory access (several, if TLS is used to avoid baking in pointers to the code).

@rossberg

rossberg May 23, 2018

Author Member

Okay, added optref (for a short name). (I abandoned the idea for a separate nullable type constructor because it actually introduces more anomalies than benefits.)

@lukewagner

lukewagner May 25, 2018

Member

Thanks, looks good!

@lukewagner

lukewagner May 25, 2018

Member

Oh: perhaps also mention table element kinds (or maybe this is solved by a validation constraint on table.grow and adding a new table.grow_init that takes a value to initialize with?)

@rossberg

rossberg Jun 5, 2018

Author Member

I think table.grow should take an initialisation operand right away, no point in having two separate instructions there. (I just realised that this instruction is not currently part of any proposal, so I opened an issue for the reference types proposal.)

But there is another constraint we need: a table definition of non-zero initial size must have a defaultable element type. Added that.
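
The two rules in this comment can be modelled roughly as below, in Python. Reference types are encoded as plain strings or `(constructor, type_index)` tuples, which is an assumption of the sketch, as is having `table_grow` return the previous size; the thread does not specify a result for the instruction.

```python
def is_defaultable(reftype):
    """A reference type is defaultable unless it has the form `ref $t`."""
    return not (isinstance(reftype, tuple) and reftype[0] == "ref")


def validate_table(elem_type, initial_size):
    """A table with non-zero initial size needs a defaultable element type."""
    return initial_size == 0 or is_defaultable(elem_type)


def table_grow(table, delta, init):
    """Grow by `delta` entries, filling each new slot with the given value."""
    old_size = len(table)
    table.extend([init] * delta)
    return old_size
```

Because `table_grow` always receives an initial value, a table of non-nullable `ref $t` elements can start empty and still grow without ever holding a default.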

@lukewagner

Member

lukewagner commented May 21, 2018

@rossberg @stedolan What if we start with having the type tags be a new definition kind (like @stedolan first suggested) and then support the use cases wanting first-class tags with the definition reference types extension? This is good because it allows the type tags to be imported (like any definition kind) which has multiple benefits.

Also: should the type tags we're talking about here be unified with those being discussed for exceptions?

@sjrd

sjrd commented May 21, 2018

I'm slowly catching up with this discussion. Here are some comments, from the point of view of compiling Scala.js to Wasm.

int31ref

int31ref is ... weird, IMO. Wasm is generally based on i32. int31ref is only relevant if there are producers whose languages have 31-bit integers in their specification, which AFAIK is not common (OCaml comes to mind, but that's all). If not for values fitting in 31 bits a priori, there is no use case for int31ref. Even the use case of unboxing an integer when possible (small enough at run-time) would require tests in the generated code, and unpredictable boxing, which an int32ref would more elegantly solve, and in ways that actually take advantage of the run-time bit-width (i.e., a 64-bit interpreter will be able to perform better).

I would advocate for int32ref based on this observation, or even no int-ref at all in the MVP, and let experience with different producers drive the right choice in the future.

nullable ref types

As already mentioned by @lukewagner, completely removing the ability to have nullable references other than the completely untyped anyref would be a killer for languages that have nullable reference types by default, which includes the entire Java family. On the other hand, having non-nullable reference types by default is clearly the future, and not exposing those in Wasm would mean that more modern languages might have suboptimal performance as a result.

I suggest the addition of another type constructor nullableref $t (or nref for short) which is, well, a nullable reference, i.e., it is equivalent to ref $t | nullref. The typing rules are obviously

  • nullref <: nullableref $t
  • ref $t <: nullableref $t
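To make the two rules concrete, here is a minimal Python sketch of a checker for just this fragment. The tuple encoding of types and the name `is_subtype` are hypothetical, and subtyping between different `$t` is deliberately ignored, so this is an illustration of the rules above rather than of the proposal's full relation.

```python
def is_subtype(sub, sup):
    """Return True if `sub <: sup` under the two rules listed above.

    Types are modelled as tuples (an assumption of this sketch):
    ("nullref",), ("ref", name), ("nullableref", name).
    """
    if sub == sup:  # reflexivity
        return True
    if sup[0] == "nullableref":
        if sub == ("nullref",):      # nullref <: nullableref $t
            return True
        if sub == ("ref", sup[1]):   # ref $t <: nullableref $t
            return True
    return False
```

Note that the relation is one-way: a `nullableref $t` is not accepted where a `ref $t` is expected, which is where an explicit null check would be needed.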

any type

I would really like to see an any type, i.e., one that can hold any JavaScript value (including primitives) and any anyref. Or alternatively, allow anyref to contain all those values. In Scala.js, values of type Any can really contain anything, and due to generic type erasure, they are very common. If there is no equivalent type in Wasm, I am not sure how I would compile them. I would essentially need a union (à la C) of different types, which is not really provided by Wasm either. The best I can come up with right now would be a ref struct with no fields, and then "subclasses" ref struct i32, ref struct anyref, etc. that I could downcast to. Not fun.

Yet another way to look at this would be to add boolref, undefinedref and float64ref to the mix. Then every JS value becomes representable in anyref.

casts

In the current text, the constraint t <: t' <: anyref is too strong to implement even the downcast of Java. Consider the following snippet:

interface A {}
interface B {}
class X extends Object implements A, B {}
class Y extends Object implements A, B {}

A a = ...;
B b = (B) a;

the last instruction may succeed, but there is no way to encode it as the proposed cast instruction, because neither A <: B nor B <: A. In fact, we can encode it by introducing a temporary variable Object a1:

A a = ...;
Object a1 = a;
B b = (B) a1;

but that's just silly.

I suggest dropping the constraint altogether.

I haven't examined the tag-based approach enough to determine whether it allows for such things.

- `reftype ::= ... | ref <typeidx>`
- `ref $t ok` iff `$t` is defined in the context

* `int31ref` is a new reference type

@aardappel

aardappel May 21, 2018

While I am sure this has its uses, and is nice to have from a "why not" perspective, I feel this is such an implementation-specific optimization (it will not apply to languages that guarantee 32-bit integers, and it may not be a great fit for certain engines/hosts either, e.g. if they need more than 1 bit) that I think we'd be better off not having it, or guaranteeing the full 32 bits.

@rossberg

rossberg May 22, 2018

Author Member

What @stedolan said. Don't think of this as a source-level type, it's just a way to represent tagged pointers/integers, which is a mechanism that many languages need. 32 bit integer references can be implemented in user space using this primitive. See also my recent comment on the respective issue.

@aardappel

aardappel Jun 1, 2018

Not sure what comment of @stedolan you are referring to, I don't see him refer to int31ref at all?

I see "many languages" needing 32-bit, and 31-bit being rather specific to some language implementations. I don't see how wasm as a whole benefits from this bias, especially since there will be some cost to supporting it (there will be places where this bit needs to be tested if integer values are a possibility). The alternative, if int32ref is not practical, is to not have integers part of reftype at all, since at least that would give engines the benefit of being able to assume a reftype is always a pointer.

@rossberg

rossberg Jun 4, 2018

Author Member

I was referring to this comment.

To clarify, int31ref is primarily a primitive for making pointer tagging available, in the most general form that we can still guarantee to be directly implementable across all relevant platforms and engines. Such tagging is a mechanism that probably a majority of GC'ed language implementations rely on in some form (almost all dynamic languages, functional languages, logic languages). The GC proposal will effectively be useless to them without it. While certainly limited, the proposed type hopefully is general enough that a rather large subset of these use cases can be mapped to it directly.

Int32ref is in a different category. It is not applicable to the same use cases, since it can involve unpredictable and substantial hidden cost on some engines (allocation and branching in particular). For example, in V8 on 32 bit platforms ;). Also, an int32ref can already be implemented with the current proposal, in at least two ways: either (1) as a ref to an immutable struct with a single int32 field (which an engine should be able to optimise just like a proper int32ref), or (2) as the union of an int31ref and the aforementioned struct, doing the necessary boxing/unboxing of large values in user space (in case you don't trust the engine to do a good enough job).
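For illustration only, option (2) could look roughly like the following Python sketch. The class and function names (`I31`, `BoxedI32`, `to_ref`, `from_ref`) are hypothetical stand-ins, and Python objects obviously do not model the actual unboxed representation; the point is just where the range check and the boxing happen in user space.

```python
from dataclasses import dataclass

I31_MIN = -(1 << 30)       # smallest value that fits in 31 signed bits
I31_MAX = (1 << 30) - 1    # largest value that fits in 31 signed bits


@dataclass(frozen=True)
class I31:
    """Stands in for an unboxed int31ref (no allocation in a real engine)."""
    value: int


@dataclass(frozen=True)
class BoxedI32:
    """Stands in for a ref to an immutable struct with a single i32 field."""
    value: int


def to_ref(n):
    """Turn a 32-bit integer into a reference, boxing only when needed."""
    if not -(1 << 31) <= n < (1 << 31):
        raise OverflowError("not a 32-bit integer")
    if I31_MIN <= n <= I31_MAX:
        return I31(n)          # small value: stays unboxed
    return BoxedI32(n)         # large value: explicit user-space boxing


def from_ref(ref):
    """Recover the integer; the isinstance tests play the role of downcasts."""
    if isinstance(ref, I31):
        return ref.value
    if isinstance(ref, BoxedI32):
        return ref.value
    raise TypeError("not an integer reference")
```

With this split, the cost of the large-value path is visible in the producer's own code instead of being hidden inside the engine.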

@aardappel

aardappel Jun 4, 2018

What about cost to languages that don't use it? The mere possibility of int31ref values being present may require engines to emit code that test the bit, right?

@rossberg

rossberg Jun 4, 2018

Author Member

Only at the point where they perform a downcast from anyref, which is an explicit operation and which in practice will require that check anyway, because existing engines tend to use tagging already (like V8).

- `structtype ::= struct <fieldtype>*`

* `arraytype` describes an array with dynamically indexed fields
- `arraytype ::= array <fieldtype>`

@aardappel

aardappel May 21, 2018

I agree that this would really extend languages' ability to find efficient representations for their data types. Engines can still treat structs without a tail specially.

- `arraytype ::= array <fieldtype>`

* `fieldtype` describes a struct or array field and whether it is mutable
- `fieldtype ::= <mutability> <storagetype>`

@aardappel

aardappel May 21, 2018

Missing definition of mutability?
Can a field also be a reftype?
Can a field be a reftype that is stored in-line? Or is that GC v2? :)

@rossberg

rossberg May 22, 2018

Author Member

Mutability is in the spec already. A field can be a valtype, which includes reftypes per the reference types proposal. Nested aggregates are post-MVP, since they imply introducing inner pointers.

- `eqref <: anyref`
- Note: `int31ref` and `anyfunc` are *not* subtypes of `eqref`, i.e., those types do not expose reference equality

* `nullref` is a subtype of `eqref`

@aardappel

aardappel May 21, 2018

Would be good to have a hint somewhere (if not here, then in the GC proposal) why we need nullref at all, since none of the current types are nullable. Is this just to be able to type locals between function start and first assignment?

@rossberg

rossberg May 22, 2018

Author Member

It's from the reference types proposal. Anyref and anyfunc are nullable, so before you have cast down to a concrete ref type successfully it could still be null.


* A reference value type is defaultable if it is not of the form `ref $t`

* Locals must have a type that is defaultable.

@aardappel

aardappel May 21, 2018

I think it could be added, but there is a risk though that people will be lazy and default to using optref instead of anyref since it gives "easier" interop between all sorts of languages, which will make languages that are null-safe pay the price for having nulls. A world where non-null is default and languages that are liberal with null pay the price makes more sense to me, and will promote more robust software ecosystems built on wasm.

@stedolan

stedolan commented May 22, 2018

@sjrd @aardappel
Specifying 31-bit integers as a source-language type is indeed pretty unusual. The only examples that come to mind are OCaml and Haskell (which specifies Int as a fixed-width type of at least 30 bits, although GHC uses 32-bit Int).

However, it's fairly common for dynamic language runtimes to use 31-bit ints to represent small integers, to avoid having to allocate on every arithmetic operation. Ruby does this, as does PyPy (optionally, although standard CPython does not do this and uses a separate (cached) allocation for every integer value) and Racket. Several Prolog implementations do this as well (e.g. SICStus uses 29-bit/61-bit ints, and SWI-Prolog uses 25/57-bit ints). None of these languages expose the word size at source level: they support arbitrary-precision arithmetic, but use tagged integers as an optimisation for small ints.

@rossberg

Member Author

rossberg commented May 22, 2018

@brion, I added a br_on_cast instruction to enable expressing type-switching.

@sjrd

sjrd commented May 22, 2018

However, it's fairly common for dynamic language runtimes to use 31-bit ints to represent small integers, to avoid having to allocate on every arithmetic operation.

I understand that. My point, which probably was not clear at all, is this: for the implementation of a language with dynamic small integers versus large integers, we can advantageously replace int31ref by int32ref in 100% of the cases. On a 32-bit implementation, the run-time can still avoid allocation for integers fitting in 31 bits. And on a 64-bit implementation, the run-time can avoid allocation of all 32-bit integers. Compare that with int31ref, where a 64-bit implementation would have to allocate as much as a 32-bit one.

@rossberg

Member Author

rossberg commented May 22, 2018

@sjrd:

int31ref

int31ref is ... weird, IMO. Wasm is generally based on i32. int31ref is only relevant if there are producers whose languages have 31-bit integers in their specification, which AFAIK is not common (OCaml comes to mind, but that's all). If not for values fitting in 31 bits a priori, there is no use case for int31ref.

That is not quite correct. See my other comments for why int31ref is not just an integer type but a more low-level primitive for pointer tagging. There are plenty of language implementations that use tagged pointers/integers of some form, not just OCaml. Stephen enumerated a few -- practically all impls using a uniform representation. In all these cases, 32 bit ints with their possibility of hidden allocation and indirection and branching could be significantly more costly. Ultimately, I think we want both, but for the MVP, the lower-level primitive seems more relevant than int32ref (whose only benefit is better performance on 64 bit).

nullable ref types

As already mentioned by @lukewagner, completely removing the ability to have nullable references other than the completely untyped anyref would be a killer for languages that have nullable reference types by default, which includes the entire Java family. On the other hand, having non-nullable reference types by default is clearly the future, and not exposing those in Wasm would mean that more modern languages might have suboptimal performance as a result.

I suggest the addition of another type constructor nullableref $t

All reasonable, and I agree that we want such a type. I was just leaving it out to avoid MVP feature creep, see my reply to @lukewagner. But happy to discuss this further if folks think it is essential to have it now.

I would really like to see an any type, i.e., one that can hold any JavaScript value (including primitives) and any anyref. Or alternatively, allow anyref to contain all those values.

This topic and these very options are being discussed in the context of the reference types proposal (and quite extensively offline between Luke and me). I think we almost have concluded to go with the latter.

casts

In the current text, the constraint t <: t' <: anyref is too strong to implement even the downcast of Java. Consider the following snippet:

interface A {}
interface B {}
class X extends Object implements A, B {}
class Y extends Object implements A, B {}

A a = ...;
B b = (B) a;
the last instruction may succeed, but there is no way to encode it as the proposed cast instruction, because neither A <: B nor B <: A. In fact, we can encode it by introducing a temporary variable Object a1:

A a = ...;
Object a1 = a;
B b = (B) a1;
but that's just silly.

I suggest dropping the constraint altogether.

Hm, be careful not to confuse levels. The Wasm casts are very much a low-level mechanism concerned with concrete low-level (structural a.t.m.) representations. On that level, a sideways cast never makes sense, AFAICS, as it cannot possibly succeed. I don't think a source-level cast between interfaces like in your example will map directly to a Wasm-level cast like you seem to suggest. What Wasm representation for Java interfaces do you have in mind?

@lars-t-hansen

lars-t-hansen commented May 22, 2018

With a little hesitation:

I think if we want to do tagged pointers we should have a broader discussion about concrete needs, and not just assume that 31-bit ints with the high bit discarded on boxing is the sweet spot. Though Racket evidently uses 31-bit fixnums, neither Chez Scheme nor Larceny (both native Scheme implementations) do, preferring instead to use three-bit tags with low tagging where tags 000 and 100 are fixnum values (ie, they have 30-bit signed fixnums) and arithmetic can be performed directly on tagged values. For some dynamic languages, not needing an indirection to access an object's major type class is important. Also see comments about Prologs above. The int31ref design opens up a path for tagged pointers but is IMO strongly biased in favor of statically typed ones.
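To show what "arithmetic directly on tagged values" means, here is a small Python sketch of low tagging on a 32-bit word. It assumes the scheme described above, where three-bit tags 000 and 100 both denote fixnums, so a fixnum is any word whose low two bits are zero. The helper names are hypothetical, and overflow into bignums is ignored.

```python
WORD_MASK = 0xFFFFFFFF  # emulate a 32-bit machine word


def tag_fixnum(n):
    """Encode a 30-bit signed integer; the low two bits end up as 00."""
    if not -(1 << 29) <= n < (1 << 29):
        raise OverflowError("does not fit in a 30-bit fixnum")
    return (n << 2) & WORD_MASK


def is_fixnum(word):
    """Tags 000 and 100 both have their low two bits clear."""
    return (word & 0b11) == 0


def untag_fixnum(word):
    """Sign-extend from 32 bits, then shift the tag bits away."""
    if word & 0x80000000:
        word -= 1 << 32
    return word >> 2


def add_tagged(a, b):
    """Add two tagged fixnums without untagging: (x<<2) + (y<<2) == (x+y)<<2."""
    return (a + b) & WORD_MASK
```

The same trick works for subtraction and comparison, which is why this layout is attractive to dynamic language runtimes.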

(There was a longish discussion about tagging schemes here: WebAssembly/design#919, where I proposed a boxing scheme that allowed for more tag bits and more efficient boxing and unboxing. I'm not saying that that is fully baked, but it represents a different view.)

@rossberg

Member Author

rossberg commented May 22, 2018

@lukewagner:

What about we start with having the type tags be a new definition kind (like @stedolan first suggested) and then support the use cases wanting first-class tags with the definition reference types extension? This is good because it allows the type tags to be imported (like any definition kind) which has multiple benefits.

I agree that im/export is essential. I was assuming that the type rep instructions can be regarded as constant instructions, so could be used in global definitions, which would also allow them to be imp/exported. I strongly believe that declaration-based type tags are gonna be too restrictive long-term.

Also: should the type tags we're talking about here be unified with those being discussed for exceptions?

I believe they are different mechanisms, which is why I avoided the term "tag". For example, exceptions don't have subtyping (although they could). OTOH, they could have return types and other attributes having to do with control flow. Also, wasn't it a stated goal to keep the exception proposal independent of the GC extensions?

@rossberg

Member Author

rossberg commented May 22, 2018

@lars-t-hansen, yes, I think a more general tagging mechanism would be very useful. I still think this would need to have the form of proper variant types to be sufficiently hardware-independent. However, I don't think any such mechanism will be able to provide the same performance and portability guarantee, i.e., anything but a single tag bit might require an indirection and will force specific implementation schemes on engines.

In that sense, while somewhat odd-looking, int31ref is much simpler and has reliable performance. It hence isn't subsumed by a general tagging mechanism, AFAICS. It is making a different trade-off between generality and reliable performance.

Also note that producers are free to use more integer bits for their own tagging purposes. They just cannot have more pointer bits, because that's not portably achievable. (For one, because existing engines already differ in the polarity of their tagging scheme.)

@brion

brion commented May 22, 2018

I did a little more prototyping using an int31ref-like scheme for a JS-like language. While it works nicely for in-range integers the overhead of boxing onto heap objects is really high, so if > 31 bits are needed things slow down fast. (For instance a mandelbrot fractal calculation is 10x slower with f64s boxed compared to using 64-bit NaN-boxing.)

If a float64ref isn't available, I'd have to consider a fat-pointer scheme pairing a NaN-boxed value in an i64 (for i32 and f64 values) and a separate reference (for objects and specials). That's 128 bits per value, and I have to manage the pairing -- as two arguments, as struct tuples in arrays, and ... either require the multival proposal for return values or do some kind of out-value struct.

I do though understand that anything over 31 bits is tough to guarantee across different architectures and embeddings. (NaN boxing as we know it might not survive future increases of native address space beyond 48 bits anyway!)
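For readers unfamiliar with the technique, a minimal Python sketch of the NaN-boxed half of such a scheme might look like this. The tag value `0xFFF9` and all function names are assumptions made for the example, not the layout of any particular engine; real NaNs are canonicalised on boxing so they cannot collide with the tag.

```python
import math
import struct

TAG_MASK = 0xFFFF << 48        # top 16 bits select the kind of value
INT32_TAG = 0xFFF9 << 48       # hypothetical tag inside the NaN space
CANONICAL_NAN = 0x7FF8 << 48   # the only NaN bit pattern we ever store


def box_f64(x):
    """Store a double as its raw 64 bits, canonicalising NaNs."""
    if math.isnan(x):
        return CANONICAL_NAN
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_i32(n):
    """Store a 32-bit integer in the payload of a tagged NaN."""
    return INT32_TAG | (n & 0xFFFFFFFF)


def is_i32(bits):
    return (bits & TAG_MASK) == INT32_TAG


def unbox_i32(bits):
    """Extract the low 32 bits and sign-extend them."""
    n = bits & 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n


def unbox_f64(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

In the fat-pointer variant described above, a 64-bit word like this would travel alongside a separate reference slot for objects and specials.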

@lukewagner

Member

lukewagner commented May 22, 2018

(Probably it's a good idea to fork off a separate issue for int31ref and simply put a big TENTATIVE marker in MVP.md for the purposes of this MVP.)


* Any function reference type is a subtype of `anyfunc`
- `ref $t <: anyfunc`
- iff `$t = <functype>`

@lukewagner

lukewagner May 25, 2018

Member

Also optref $t, yes?

@rossberg

rossberg Jun 5, 2018

Author Member

Yes, that is implied by the previous bullet.

- and all `t*` are defaultable

* `struct.get <typeidx> <fieldidx>` reads field `$x` from a structure
- `struct.get $t i : [(ref $t)] -> [t]`

@lukewagner

lukewagner May 25, 2018

Member

For languages with ubiquitous optrefs, get/set taking a ref will have a major codesize impact with all the casts. It seems like get/set could take an optref instead (with trap semantics) and it would just be a trivial local analysis that removes the null check when the operand had static type ref.

@rossberg

rossberg Jun 5, 2018

Author Member

Changed. But note that this relies on subsumption, i.e., subtyping being implicit. If we were to require an explicit instruction for upcasts like we have considered recently then you would have to inversely upcast every ref to optref -- or introduce all ref instructions in two variants, or introduce overloading.

@sunfishcode

Member

sunfishcode commented Jun 4, 2018

Int32ref is in a different category. It is not applicable to the same use cases, since it can involve unpredictable and substantial hidden cost on some engines (allocation and branching in particular). For example, in V8 on 32 bit platforms ;).

It would seem that int32ref could be implemented on a 32-bit platform as two i32 values, not unlike how i64 is often implemented on 32-bit platforms. To be sure, this would take extra registers and have a cost, but it wouldn't be the cost of allocation and branching. On the other hand it would also avoid the need for 31-bit overflow checking.

As an unrelated question, should int32ref etc. be called i32ref etc., for consistency with wasm's existing integer types?

- `struct.get $t i : [(optref $t)] -> [t]`
- iff `$t = struct (mut1 t1)^i (mut ti) (mut2 t2)*`
- and `t = unpacked(ti)`
- traps on `null`

@lars-t-hansen

lars-t-hansen Jun 8, 2018

This confuses me: the static type of the referred-to object appears to be restricted to $t exactly, but there is no widening / upcast operator that I can see. Are you expecting widening casts to be handled by LET in some fashion or by changes to type compatibility for eg SET_LOCAL? Or is it just a matter of the missing prose in the section on Value Conversions?

I'd also be curious to hear about why you feel is a desirable operand here, since the verifier will know the static type of the operand.

@rossberg

rossberg Jun 8, 2018

Author Member

This assumes the usual declarative style of having a separate subsumption rule. For the Wasm type system it would be built into the sequencing rule (see the spec draft for the reference types proposal for details; concretely, the last rule in this subsection). With this approach, individual rules can always "assume" the correct type.

Your second question seems to be missing a word? Did you mean the type immediate? A stated design criterion for Wasm was to always have all operationally relevant type information explicit in the instructions (except where the semantics is completely parametric in a type), e.g. no overloading. So this is just following that.

@lars-t-hansen

lars-t-hansen Jun 8, 2018

OK, that's fine. So the subtyping rules a little further up here really extend the `match` rule there.

(For the second para I did indeed mean the type immediate; github helpfully removed a word in angle brackets from my comment.)

@rossberg

Member Author

rossberg commented Jun 8, 2018

@sunfishcode, the point of having int refs is that they are freely interchangeable with regular references, e.g. in the anyref type. That requires that they are no larger than pointers. (Unless we want to blow up all references to 2 words on 32 bit.)

But you are absolutely right about the naming, changed!

@sunfishcode

Member

sunfishcode commented Jun 8, 2018

@sunfishcode, the point of having int refs is that they are freely interchangeable with regular references, e.g. in the anyref type. That requires that they are no larger than pointers. (Unless we want to blow up all references to 2 words on 32 bit.)

Are you referring to specific implementations, or are there present or anticipated GC features that would require/oblige all implementations to work this way?

@rossberg

Member Author

rossberg commented Jun 8, 2018

@sunfishcode, there isn't too much leeway wrt implementation techniques when e.g. a structure field of type anyref must be able to hold both a regular reference and an int ref without extra boxing.

@rossberg rossberg referenced this pull request Aug 16, 2018

Open

Binary Format #37

@Horcrux7

Horcrux7 commented Aug 21, 2018

Want to represent instances as structures, whose first field is the method table.

Then every instance will be 4 bytes larger because of this method table field, which will consume a lot of extra memory. I think an instance should only contain the type/struct id. An optional method table could then be declared for every type, either in the type section or in an extra section.

@lukewagner
Member

lukewagner left a comment

Oops, I didn't mean to leave this permanently in "requesting changes" limbo! This MVP.md is sufficiently sprinkled with TODOs, and the PR's discussion sufficiently long that I think we should merge and file separate issues/PRs to sort out the individual TODOs. Two small requests before merging:


#### Integer references

Tentatively, support a type of guaranteed unboxed scalars.

@lukewagner

lukewagner Aug 23, 2018

Member

Could you add a "TODO: this particular i31 design choice is tentative" here?

@rossberg

rossberg Aug 23, 2018

Author Member

Done.

@@ -0,0 +1,314 @@
# GC v1 Extensions

See [overview](Overview.md) for background.

@lukewagner

lukewagner Aug 23, 2018

Member

Could you add a big bold "Note: this design is still in flux, even outside of TODOs below" or some such to the top here?

@rossberg

rossberg Aug 23, 2018

Author Member

Done.

rossberg added some commits Aug 23, 2018

@rossberg rossberg merged commit d9f966a into master Aug 23, 2018

@rossberg rossberg deleted the mvp branch Aug 23, 2018

@rossberg

Member Author

rossberg commented Aug 23, 2018

@Horcrux7, hardwiring method tables as such into the language would essentially build in an OO-specific object model, a bias that Wasm is meant to avoid. There are ideas for a more general notion of per-type fields that would hang off the internal type tag, but any mechanism like that would be post-MVP.
