Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output type of local.tee? #201

Closed
askeksa-google opened this issue Mar 3, 2021 · 30 comments
Closed

Output type of local.tee? #201

askeksa-google opened this issue Mar 3, 2021 · 30 comments

Comments

@askeksa-google
Copy link

With the introduction of static subtyping, the input to a local.tee can be a subtype of the type of the local. This gives two obvious, sound choices for the output type:

  1. The type of the local. This is what follows from the typing rules for local.tee (same input, output and local type) combined with implicit upcasting of inputs.
  2. The actual type of the input. This is more useful, since it is more precise. On the other hand, it means that local.tee is not equivalent to local.set followed by local.get, and doing the substitution can break validation.

Currently, as far as I can tell, Binaryen implements (1) and V8 implements (2).

What is the intended behavior?

@kripken
Copy link
Member

kripken commented Mar 3, 2021

Previous discussion: WebAssembly/reference-types#55

Looks like the status there is that we agreed to keep things as what you call option 1.

@jakobkummerow
Copy link
Contributor

OK, we'll fix that in V8 then.

(Though IMHO it's sad to discard perfectly valid type information, e.g. about non-null-ness in a sequence like (struct.new) (local.tee) (struct.set)...)

@askeksa-google
Copy link
Author

Thanks for the pointer.

This ties in with the discussion of non-nullable locals in WebAssembly/function-references#44, in that when locals can't be non-nullable, any value stored through local.tee is forced to discard its nullability, leading to more inserted ref.as_non_null checks in practice.

@kripken
Copy link
Member

kripken commented Mar 3, 2021

I'm not opposed to revisiting this topic, btw, I was just pointing out the previous info. Data on code size downsides could be important, I agree.

@taralx
Copy link

taralx commented Mar 3, 2021

At the time we were concerned about losing the equivalence. Do we have any data on that equivalence being used?

@kripken
Copy link
Member

kripken commented Mar 19, 2021

I explored this in Binaryen, WebAssembly/binaryen#3704 (comment) Overall there is some work to be done in the optimizer there to handle losing that equivalence, but it seems practical.

Now that we have an almost-fully working pipeline I think it's easier to make a decision on this. I'd support using the more specific type of the value, so my previous concerns are no longer valid. Perhaps we can make the change then, maybe in the next GC prototype version?

@kripken
Copy link
Member

kripken commented Mar 31, 2021

Noticed in WebAssembly/binaryen#3767 that br_if has a similar situation, where the spec says the type is that of the target block/loop/etc. and not the value flowing through it, which means that like with local.tee we are missing out on more specific type info.

It probably doesn't matter (as mentioned there, I've never seen br_if's value used except in fuzz code), but maybe we should change it for consistency if we change local.tee.

@titzer
Copy link
Contributor

titzer commented Aug 5, 2021

Hmm, I find it odd that br_if loses type information by virtue of a possible control flow transfer to label with a different type. Since it leaves the value on the operand stack when not transferring (i.e. falling through), it makes more sense to me that it shouldn't lose type information.

FWIW I recently made a similar change implementing a source language so that the expression (a = e), i.e. a nested assignment, has the type of e rather than a. I made that change for similar reasons to WebAssembly/reference-types#55.

@tlively
Copy link
Member

tlively commented Aug 5, 2021

I agree with @kripken that this would be a good candidate change for the next prototype spec (both for local.tee and for br_if). Are there any other places where we are unnecessarily losing type information?

@titzer
Copy link
Contributor

titzer commented Aug 5, 2021

I also noticed that my implementation in https://github.com/titzer/wizard-engine is actually wrong for br_if and preserves the type. I guess there are no spec tests for this corner case of br_if.

@tlively
Copy link
Member

tlively commented Jun 16, 2022

This seems like a simple change we should be able to agree on quickly. @kripken, does preserving the more specific type on local.tee and br_if still sound reasonable to you? Would anyone else object to this change?

@kripken
Copy link
Member

kripken commented Jun 16, 2022

I think this makes sense to do. I'll update the binaryen code and run the fuzzer on it to check for any new issues since last we looked at it.

@kripken
Copy link
Member

kripken commented Jun 16, 2022

Testing now, the fuzzer found an issue that is making me rethink things.

Consider this optimization by Vacuum that removes a block:

(local.tee $x                    ;; type = (ref null any)
  (block (result (ref null any)) ;; type = (ref null any)
    (call $foo)))                ;; type = (ref any)  -  call never returns null

=> (vacuum pass) =>

(local.tee $x  ;; type = (ref any)  -  changed!
  (call $foo)) ;; type = (ref any)

After removing the block in the Vacuum pass we need to update the type of the local.tee. Concretely, that happens in TypeUpdater::propagateTypesUp() which currently assumes that when a child changes the only thing we need to propagate to parent types is unreachability, not subtyping - which is a new requirement with this change.

The key issue is that in this change, local.tee (and br_if) would copy the type of their child. That's already possible before, as ref.as_non_null does it for example, but this would make it more common, I guess, since the fuzzer just caught this now.

The concrete impact is that Binaryen will need to propagate types upwards in more places (e.g. using ReFinalize()). That will add at least some complexity and overhead to several passes. Doable, but maybe we should consider if other tools might be affected too.

@tlively
Copy link
Member

tlively commented Jun 16, 2022

Does this change affect the result of optimizing a j2cl module at all?

@askeksa-google
Copy link
Author

askeksa-google commented Jun 16, 2022

The most prominent consequence of this imprecision is that a non-nullable value stored by local.tee inevitably becomes nullable. So if we choose to support non-nullable locals, this issue becomes a lot less important. It will then only lead to imprecision when storing a value to a local that for some other reason has to have a less precise type.

@kripken
Copy link
Member

kripken commented Jun 16, 2022

@tlively Good question, that can help prioritize. Checking now, it's a 5.5 K difference on 13.42 MB, so it's far smaller than 1% (around 0.04%).

@askeksa-google Ah, interesting point... yes, that makes sense, and indeed I see that most of the change in the binary is removing ref.as operations. And when I test with non-nullable locals enabled, the 5.5 K difference shrinks to just 1 K.

In more detail: We have 90 K ref.as operations normally. That shrinks to 84 K with this change to local.tee. But if we enable non-nullable locals then we have just 211 and 67 of them, respectively.

@tlively
Copy link
Member

tlively commented Jun 16, 2022

So it's not a huge difference, but it's nonzero. I think we should go ahead and do this and we can try to optimize our type propagation (among other things) in Binaryen at some later point.

@kripken
Copy link
Member

kripken commented Jun 16, 2022

Doing this would mean we have to update all the relevant passes etc. now - we can't defer that for later. That's more than one day of work, but I'm not sure how much more.

@kripken
Copy link
Member

kripken commented Jun 16, 2022

Another factor to consider is that the current benefits might overlap with GUFA. That is, current benefits are because there is information that we know but then lose in the wasm type system. With this change, we'd be able to put more in the wasm type system. But GUFA will let us keep such information around even without that.

However, GUFA wouldn't help the VM - if this change would help on the VM side, there is no alternative to that.

Overall I'm not sure either way. I'd be ok to spend the time working on updating the passes etc. But I feel there might be more important things to prioritize.

@tlively
Copy link
Member

tlively commented Jun 16, 2022

Yeah, there are definitely more important things to work on. It just seems silly to me to unnecessarily lose type information. Since we can't make this change in the spec without updating Binaryen, this won't be as quick to resolve as I'd hoped. Let's keep this issue open and come back to it later once more important questions have been answered.

@rossberg
Copy link
Member

rossberg commented Jun 20, 2022

FWIW, it's not entirely silly. This would be the only instance where the result type of an instruction depends on the concrete input type, while also being constrained by the label local type. That would amount to a new form of instruction polymorphism that we do not currently have in Wasm. I believe the difficulties Alon encountered have to do with that.

Currently, we only have two instructions where the result type of an instruction depends on the input type at all, namely drop and select. But at least those are fully polymorphic. Yet, we already regretted that choice for the latter.

So, I'd be slightly hesitant. Constrained polymorphism is more tricky and could introduce complications in other tools as well. For example, if some analysis needs to perform a backwards propagation of types.

@jakobkummerow
Copy link
Contributor

Viewed differently, local.tee-with-this-update would be just as "polymorphic" as nop: it would leave the value stack unchanged.

(I'm fine with postponing this; non-nullable locals might well eat most of this proposal's lunch anyway.)

@rossberg
Copy link
Member

@jakobkummerow, no, it's not, unfortunately. Because the type is constrained by the local's type. That makes it a very different category.

@tlively
Copy link
Member

tlively commented Jun 21, 2022

Just be be clear, the status quo (at least as far as the V8 and tools teams know) is that local.tee takes the type of the local and the proposed change is to make it preserve the type from the stack. @rossberg, I believe you had been interpreting it the other way around.

@rossberg
Copy link
Member

@tlively, I haven't. What I'm saying is that it's not just taking the type from the stack. Because that type also has to match the local's. That's fairly easy when you're doing forward validation on complete programs. But it can be complicated when you're doing other forms of type analysis, transformations, or for other reasons, need to handle incomplete program fragments.

As a simple example, consider this code fragment:

(local.tee $x)
(local.tee $y)
(drop)

Imagine some analysis that needs to compute what type of operand must be on the stack before this fragment. Currently, it's simply the type of $x, the last encountered consumer of the operand when walking backwards. In contrast, under the proposed change, the analysis would have to collect and record all types the operand is consumed at. Or compute their greatest lower bound – but that may not be unique either given that we now have explicit subtyping declarations.

@conrad-watt
Copy link
Contributor

It sounds like everyone is willing to accept the status quo, so maybe we can pause the discussion for now and agree to revive it if we later want the refined local.tee type. IIUC this could even be a backwards-compatible post-MVP change?

@titzer
Copy link
Contributor

titzer commented Jun 21, 2022

@rossberg Would call_ref not qualify as being polymorphic in its input type? It doesn't have an explicit type annotation, so its result type(s) depend entirely on the signature of the function argument.

@tlively
Copy link
Member

tlively commented Jun 21, 2022

I'm happy to defer refining local.tee's type to post-MVP (or never) if that's what will be easiest for now. Do we want to defer similarly for br_if, @rossberg?

@rossberg
Copy link
Member

@tlively, yes, br_if is in the same boat.

@titzer, right, closely related, but somewhat different, because it does not inject a type intersection. More importantly, whether call_ref remains that way is still an open question, depending on how we resolve the question of type annotations (WebAssembly/function-references#27). If we keep the type annotations in the GC proposal, then call_ref (and a few other instructions) should be changed to also have one. My example of backwards type propagation indeed only applies in the presence of such annotations.

@tlively
Copy link
Member

tlively commented Jul 21, 2022

Looks like we have consensus not to refine the output type of local.tee and br_if, at least right now, so I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants