Why is nullref not allowed in binary and text format? #60

aheejin · 2019-11-14T00:36:39Z

This makes writing transformation passes in toolchain somewhat tricky. I'm not necessarily arguing for allowing it at this point, but I don't precisely remember what the reason for disallowing that was.

rossberg · 2019-11-14T07:05:37Z

In the first version of the proposal nullref was available. Removing it was a form of compromise, because there was some dislike of having this type. I am not sure what the arguments were. If you have a good use case, we can discuss bringing it back. It would be an easy change.

aheejin · 2019-11-14T09:33:28Z

It is mostly that block/loop/if/try doesn't have self-sufficient type information within itself. For example,

(block (type ???)
  (ref.null)
)

When we create this block, we don't know which type to assign to this block, because we cannot assign nullref. We cannot arbitrarily promote this to exnref/funcref/anyref before we see the the parent node (in wast format) / the next instruction. In real transformations we wouldn't create a block like that out of nowhere, but transformations can split/merge blocks and create/delete instructions within them, and we don't have the ability to calculate the type of block/loop/... only by examining it and its child nodes anymore. It's not that this restriction makes transformations not possible, but they can be trickier.

RossTate · 2019-11-15T14:14:40Z

For a number of tooling purposes, you should ensure that every instruction has a principal output type for any given input type. For ref.null, that seems to necessitate the nullref type. So the options seem to be either to add nullref or to parameterize ref.null with an argument that indicates which reference type the output type should be.

sjrd · 2019-11-15T14:20:35Z

Some languages, among which Scala, even have a user-space Null type, which can be the declared result type of a function. For example in Scala the following code is valid:

class Foo
class Bar

def giveMeNull(): Null = null

val x: Foo = giveMeNull()
val y: Bar = giveMeNull()

When compiling giveMeNull() to wasm, the only valid result type for giveMeNull() would be nullref. If we cannot express it in the binary or text format, we have to resort to weird encodings.

RossTate · 2019-11-15T14:26:46Z

Yep, though that's more relevant to the GC proposal (where it's planned to be addressed). Nonetheless, it's a good point that something like nullref will be needed eventually anyways, and the option to parameterize ref.null will likely be insuffucient down the line.

rossberg · 2019-11-15T15:46:27Z

Just to clarify, the proposal already includes a nullref type, for the reasons @RossTate mentions. The question @aheejin is asking is whether it should be expressible in concrete syntax. As mentioned above, that was the case originally, but there was some resistance, so it was removed. I'm all for bringing it back if we can get consensus on that.

aheejin · 2019-11-16T09:51:16Z

I found some relevant previous discussion threads: thread1 thread2

@lukewagner Are there advantages specific to allowing nullref but making it not representable in text or binary format? The threads discuss mostly about whether we need nullref or not and whether we should have a separate null instruction for each reference type, but I'm not sure if I fully understand the advantages of making nullref not writable while having it as a validation type.

lukewagner · 2019-11-18T05:35:52Z

Keeping nullref out of the binary format isn't really the goal; the goal was to avoid having to deal with the annoyance of having to add nullref support for locals, globals, params, returns, tables and, in the future, elems and fields. It's all doable, of course, but kindof annoying, especially if you start wondering whether we'll eventually need to "optimize" nullref (instead of simply representing it as a null pointer).

aheejin · 2019-11-18T09:37:29Z

@lukewagner

the goal was to avoid having to deal with the annoyance of having to add nullref support for locals, globals, params, returns, tables and, in the future, elems and fields.

I don't understand this part very well, maybe because I'm not familiar with VM implementations. Locals/globals/params/... can have all other types already, and why would allowing them to have nullref type be more work? Wouldn't categorizing types into two (one that can be local/global/... type and one that can't) rather be more work?

It's all doable, of course, but kindof annoying, especially if you start wondering whether we'll eventually need to "optimize" nullref (instead of simply representing it as a null pointer).

I don't understand what you mean by optimize nullref. Could you elaborate?

lukewagner · 2019-11-18T19:20:19Z

There tends to be case analysis (one case per type) for the operations on all these kinds of types (in codegen, in stub generation, in JS API, in the runtime functions). Again, it's not a ton, it's just annoying to have to add all these for a type that should never be meaningfully used in all these contexts.

The "optimization" for nulltype would be for it to take no space (and take no instructions when passing around) since it has only one value. Noone would do this initially, but maybe someone writes a silly benchmark one day and then someone does a silly optimization for it (e.g., as happened with the computation-on-undefined "optimizations" that arose b/c of Kraken back in the day).

I mean, it's not a big deal, but the origin for all this was that it seemed simpler to not have a ref.null that produced a new nullref type, but rather have a ref.null $T that produced a ref T, which would avoid having a nullref type altogether. The "nullref won't show up except during operand-stack validation" rule was a compromise to avoid (most of) the downsides of adding nullref.

aheejin · 2019-11-18T21:17:01Z

I see. I understand there are many parts (codegen, in stub generation, in JS API, ...) in the VM that has to have handling routines for nullref. But for the optimization, I think it's OK to not have it at this point and we can consider adding it at some point later if it is really necessary.

I think at this point the problem is where we put the complexities. Currently in tools (specifically Binaryen) we can't figure out a node's type independently in some cases without looking at its parent node. Examples include

(block (type ???)
  ...
  (ref.null)
)

(select (type ???)
  (ref.null)
  (ref.null)
)

So we either need to take special care in Binaryen not to produce those kind of code (which has been proven to be very difficult) or do a special separate transformation to remove nullref at the end of the pipeline. So I was wondering, if the complexity to handle nullref in the VM side is not too bad, would it be feasible to add back nullref as a normal type.

We can possibly hear more opinions in the CG meeting too.

kripken · 2019-11-19T00:17:42Z

I agree with @aheejin and @rossberg - it seems simplest to just consider nullref as a normal type. @lukewagner I get your point that special-casing it allows us to avoid some possible optimization work in the future, but the special-casing seems kind of awkward.

In particular, it just seems odd that nullref is special-cased to not be directly storeable to a local that has its own type:

(local $foo nullref)
(local.set $foo
  (ref.null)
)

I think we should allow that. It makes the spec simpler, more regular, and easier to understand.

lukewagner · 2019-11-19T00:50:22Z

@aheejin Ah, that's a good point; I can see how that's problematic. If we consider changing the proposal to allow nullref in more places, I'd also like to consider not having nullref at all since I don't see how it's really buying us anything here.

aheejin · 2019-11-19T01:03:41Z

@lukewagner If we don't have nullref, what do we initialize locals with reference types? And what should the result type of ref.null be?

dschuff · 2019-11-19T01:25:53Z

I think the alternative would be just to have a different null-generating instruction for each nullable type, right? Just like i32.const 0 is different from f64.const 0

lukewagner · 2019-11-19T01:26:19Z

The type of ref.null <consttype> (where <consttype> is an immediate which could only be any or func for now) would produce a (ref <consttype>) (where anyref is just an abbreviation for (ref any) and likewise funcref for (ref func), as already proposed in the function references proposal). Thus you could create a null anyref with (ref.null any) and a null funcref with (ref.null func).

FWIW, this change would remove ~100 lines of code in FF at the cost of adding maybe 5 lines in the decoding/validation of ref.null. Not a big deal, because it's trivial code, but enough to ask why have this rather-useless supertype.

aheejin · 2019-11-19T02:07:16Z

Unless we want to merge this proposal with the function reference proposal, we either need nullref to initialize locals or ref.null for each reference type. I think the former is less of a change to the current proposal than the latter..?

rossberg · 2019-11-19T06:44:49Z

@lukewagner:

There tends to be case analysis (one case per type) for the operations on all these kinds of types (in codegen, in stub generation, in JS API, in the runtime functions).

I am puzzled by this. Soon (with the typed reference proposal) there will be an infinite number of reference types, so doing individual case distinctions seems like the wrong approach already. After validation, there ought to be very little reason for distinguishing the different ref types in generic constructs -- they all have the exact same behaviour. Otherwise, things like type imports cannot possibly work in the future, since they can be any reference type.

Sure, I can imagine a few representation optimisations for single-value types like nullref -- but those would apply to other singleton types as well (e.g. if we had variant types), and don't have anything to do with nullref per se. And I honestly see no practical gain or reason why a VM should ever bother with such optimisations. Either way, their optimisability seems like an odd argument against the possibility of single-value types.

This adds support for the reference type proposal. This adds support for all reference types (anyref, funcref(=anyfunc), and nullref) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This does not include table instructions yet. This also does not include wasm2js support. This does not pass `fuzz_opt.py` yet because there are remaining bugs related to `nullref` handling, because `nullref` is not allowed to be written to text or binary format. We are discussing whether we should allow it to be written in WebAssembly/reference-types#60 now. This is not ready for review yet because it does not pass fuzzer now; I'm uploading this just to show how the current a bit ad-hoc `nullref` handling looks like. So don't spend too much time on reviewing each detail for now. I apprciate some high-level comments on how we should handle `nullref` or other stuff.

This adds support for the reference type proposal. This adds support for all reference types (anyref, funcref(=anyfunc), and nullref) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. This does not pass `fuzz_opt.py` yet because there are remaining bugs related to `nullref` handling, because `nullref` is not allowed to be written to text or binary format. We are discussing whether we should allow it to be written in WebAssembly/reference-types#60 now. This is not ready for review yet because it does not pass fuzzer now; I'm uploading this just to show how the current a bit ad-hoc `nullref` handling looks like. So don't spend too much time on reviewing each detail for now. I apprciate some high-level comments on how we should handle `nullref` or other stuff.

fgmccabe · 2019-11-19T17:05:09Z

Forgive the intrusion ... but, as the function reference type proposal makes clear, not all reference types are nullable. IMO, it would have been better not to split reftype from function reftype proposal.

lukewagner · 2019-11-19T18:18:57Z

@aheejin We don't need to merge it with the function references, just borrow the a tiny subset of the encoding of <consttype>. Referring to the function references proposal was just a way of showing that this type-immediate approach is not a weird/unique detail of ref.null; it'll show up elsewhere in the future.

@rossberg It's not a case per reference type, but per type constructor. So there is one case for (ref T) (for all T). But nullref is distinct.

As I see it, both nullref and ref.null <consttype> are ways to solve the problem that there are an unbounded number of reference types we might want to generate null for, but the latter keeps this a local detail of the instruction while the former shows up everywhere where we have a wasm valtype, which is a lot of places. It shows up trivially, but we're still talking 1 vs. N.

kripken · 2019-11-19T23:06:51Z

@lukewagner Makes sense, I like ref.null <consttype> as a solution here. This seems to simplify the proposals, and after talking with @aheejin I understand it would also simplify the implementation in binaryen and likely elsewhere.

aheejin · 2019-11-19T23:29:18Z

@lukewagner Do you mean you want to make ref.null take a type parameter so that we don't even need to have a nullref as an internal type? Either that or promoting nullref to a writable type will solve our problem I think, so I'm fine with either.

But is (ref T) the instruction that generates null value for each type? The function reference proposal says

anyref, funcref, nullref are reinterpreted as abbreviations for (ref any), (ref func), (ref null), respectively

This sounds like (ref T) is not an instruction but a rather a type. Could you clarify?

So the problem for toolchain is not we have to do more implementation work, but it's tricky to transform code in the presence of the subtle differences between types, in which all other types are allowed to be written in binary/text but not nullref.

lukewagner · 2019-11-19T23:32:44Z

@lukewagner Do you mean you want to make ref.null take a type parameter so that we don't even need to have a nullref as an internal type?

Correct. I've written this as ref.null <consttype> above.

rossberg · 2019-11-20T08:27:13Z

@lukewagner:

It's not a case per reference type, but per type constructor. So there is one case for (ref T) (for all T). But nullref is distinct.

Sorry, I'm still not getting it. You have to distinguish between numeric and reference types. But all reference types are treated the same. There are already multiple type constructors for reference types, and there will almost certainly be more in the future. Whether this subgroup has one more entry shouldn't make any significant difference.

FWIW, with the function reference proposal that @aheejin pointed to above, all reference types will explicitly turn into the single case ref <constype>. That factorisation is necessary due to bounds on type imports etc (you even were the one who proposed it, IIRC :) ). Should we make it now for clarity?

lukewagner · 2019-11-20T15:16:47Z

@rossberg It's just an impl detail: there's an enum with a case for nulltype, just like in v8, and we switch on it, to get compiler warnings if we miss a case. Maybe it does go away if we replace it with (ref null), though.

But as a non-impl-detail example, should we support new WebAssembly.Global({type:'nullref'})? It'd be nice if we didn't even have to ask this question.

skuzmich · 2019-11-20T17:07:05Z

For languages with a proper "null" type, nullref would be useful for non-coercive subtyping of fields, parameters and return types.
Is there a way to do it without having nullref as a value type?

rossberg · 2019-11-21T13:11:30Z

@lukewagner, the nuisance of not having nullref would be much bigger:

We would need to invent a future-proof syntax and encoding for those "constypes" (the things that can be referenced), which are not currently used anywhere else. I consciously tried to avoid that before advancing to typed references. Changing that would clearly delay this proposal.
It would adversely affect (and require changes to) the bulk ops proposal, because ref.null is used for element expressions in table segments. Every single null element in a segment would have a redundant type annotation then!
Having more type annotations increases the cost of type checking. It is cheaper to check whether a type is a supertype of nullref than to compare two nontrivial types downstream of the instruction. For that reason, we should generally avoid type annotations where they aren't necessary, especially on generic operations.
Nullref actually can be a useful type in practice. For example, it allows a producer to create a single generic empty array object for use as a default for all (immutable) vector types over any nullable reference type. FWIW, C++11 added nullptr_t, and that's not the only such example.

lukewagner · 2019-11-21T18:27:00Z

Bullets 1 and 3 seem like filler arguments (the encoding of <consttype> for the two cases of any and func should be easy; it's arguable whether there would actually be any additional branching necessary, and, even if there was, it would be insignificant (Amdahl's law)).

For bullet 4, compilers should be able to deal with this easily by just "const-propagating" the null to where it's used, making this another of the classic "but what about toy compilers?" arguments.

But bullet 2 is a new and potentially compelling point :) Maybe that's enough?

rossberg · 2019-11-22T07:01:31Z

@lukewagner, to clarify, bullet 4 was alluding to possible examples like this:

type $dummy_vec = (array nullref)
global $dummy_vec : (ref $dummy_vec) = (array.new $dummy_array (i32.const 0))
...
type $s1_vec = (array (optref $type1))
type $s2_vec = (array (optref $type2))
global $vec1 : (ref $s1_vec) = (global.get $dummy_vec)
global $vec2 : (ref $s2_vec) = (global.get $dummy_vec)

Const-propagating null is unrelated. There is no actual null value, only an empty array that is shared across multiple types. (Perhaps not the most compelling example, but hopefully conveys the idea. Things like this pop up occasionally.)

It's also useful to avoid problems with compiling languages that indeed have the equivalent of a nullref type themselves, as @skuzmich pointed out above. That actually is a very strong point that hadn't even occurred to me.

(Of course, all this is primarily relevant in the presence of the GC proposal. But if the likelihood is that we're gonna want it anyway, then we should utilise it where we can.)

RossTate · 2019-11-22T14:46:39Z

@skuzmich's example pertains more to the gc proposal, and actually I've found that, while the use case does need something like nullref, it actually needs a variant with an additional parameter. I've worked out detailed strategies for a number of languages, and I'm trying to write them up as fast as I can so that y'all can discuss.

lukewagner · 2019-11-23T00:02:03Z

@rossberg Ah, I see. Sorry, I skimmed that bullet too quickly. Yes, I suppose as part of a compound type nullref can be useful as well.

So I suppose I'm fine with giving nullref a binary/text encoding and letting it show up everywhere.

aheejin · 2019-11-23T00:57:44Z

cc @gahaas for possible V8 implementation changes
For FF should I cc someone other than @lukewagner?

rossberg · 2019-11-23T08:42:11Z

Thanks @lukewagner! In order to unblock @aheejin fast, I created #66.

lars-t-hansen · 2019-11-25T07:31:07Z

cc @gahaas for possible V8 implementation changes
For FF should I cc someone other than @lukewagner?

@aheejin, yes, you should cc me :-)

This adds support for the reference type proposal. This adds support for all reference types (anyref, funcref(=anyfunc), and nullref) and four new instructions: `ref.null`, `ref.is_null`, `ref.func`, and new typed `select`. This also adds subtype relationship support between reference types. This does not include table instructions yet. This also does not include wasm2js support. This does not pass `fuzz_opt.py` yet because there are remaining bugs related to `nullref` handling, because `nullref` is not allowed to be written to text or binary format. We are discussing whether we should allow it to be written in WebAssembly/reference-types#60 now. This is not ready for review yet because it does not pass fuzzer now; I'm uploading this just to show how the current a bit ad-hoc `nullref` handling looks like. So don't spend too much time on reviewing each detail for now. I apprciate some high-level comments on how we should handle `nullref` or other stuff.

rossberg · 2020-01-16T16:15:47Z

This has been fixed, though #69 may require revisiting it.

lars-t-hansen mentioned this issue Nov 15, 2019

Reference types for phase 4? #61

Closed

aheejin mentioned this issue Nov 19, 2019

Add support for reference types proposal WebAssembly/binaryen#2451

Merged

rossberg pushed a commit that referenced this issue Nov 20, 2019

[Interpreter] Implement memory.copy, memory.fill (#60)

d91b74e

rossberg mentioned this issue Nov 23, 2019

[spec/interpreter/test] Allow nullref type externally #66

Merged

RossTate mentioned this issue Jan 9, 2020

Make funcref not a subtype of anyref #69

Closed

rossberg closed this as completed Jan 16, 2020

Why is nullref not allowed in binary and text format? #60

Why is nullref not allowed in binary and text format? #60

Comments

aheejin commented Nov 14, 2019

rossberg commented Nov 14, 2019

aheejin commented Nov 14, 2019 • edited Loading

RossTate commented Nov 15, 2019

sjrd commented Nov 15, 2019

RossTate commented Nov 15, 2019

rossberg commented Nov 15, 2019

aheejin commented Nov 16, 2019

lukewagner commented Nov 18, 2019

aheejin commented Nov 18, 2019 • edited Loading

lukewagner commented Nov 18, 2019

aheejin commented Nov 18, 2019 • edited Loading

kripken commented Nov 19, 2019

lukewagner commented Nov 19, 2019

aheejin commented Nov 19, 2019

dschuff commented Nov 19, 2019 • edited Loading

lukewagner commented Nov 19, 2019 • edited Loading

aheejin commented Nov 19, 2019 • edited Loading

rossberg commented Nov 19, 2019

fgmccabe commented Nov 19, 2019

lukewagner commented Nov 19, 2019 • edited Loading

kripken commented Nov 19, 2019

aheejin commented Nov 19, 2019 • edited Loading

lukewagner commented Nov 19, 2019

rossberg commented Nov 20, 2019

lukewagner commented Nov 20, 2019

skuzmich commented Nov 20, 2019

rossberg commented Nov 21, 2019

lukewagner commented Nov 21, 2019

rossberg commented Nov 22, 2019 • edited Loading

RossTate commented Nov 22, 2019

lukewagner commented Nov 23, 2019

aheejin commented Nov 23, 2019

rossberg commented Nov 23, 2019

lars-t-hansen commented Nov 25, 2019

rossberg commented Jan 16, 2020

aheejin commented Nov 14, 2019 •

edited

Loading

aheejin commented Nov 18, 2019 •

edited

Loading

aheejin commented Nov 18, 2019 •

edited

Loading

dschuff commented Nov 19, 2019 •

edited

Loading

lukewagner commented Nov 19, 2019 •

edited

Loading

aheejin commented Nov 19, 2019 •

edited

Loading

lukewagner commented Nov 19, 2019 •

edited

Loading

aheejin commented Nov 19, 2019 •

edited

Loading

rossberg commented Nov 22, 2019 •

edited

Loading