Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

folded syntax of let #20

Closed
fgmccabe opened this issue Apr 7, 2020 · 24 comments
Closed

folded syntax of let #20

fgmccabe opened this issue Apr 7, 2020 · 24 comments

Comments

@fgmccabe
Copy link

fgmccabe commented Apr 7, 2020

I have been using let fairly extensively recently (at least hand-writing programs that use it).

  1. Question: what is the appropriate folded version of a let instruction?
  2. If the order of were reversed, then it would better match the structure of a function (IMO).
@binji
Copy link
Member

binji commented Apr 7, 2020

  1. I'd guess similar to other folded block instructions:
(let (param)* (result)* (local)*
  (instr)*
)

EDIT: oh, but this doesn't quite make sense w.r.t. consuming the locals. @lukewagner had a good idea that this could be defined inside the let instead, e.g.:

(let
  (local i32 (i32.const 5))
  ...
) 
  1. I thought so too at first, but there are some good reasons not to do that, see my comment here. Since we allow params (in addition to locals) to a let instruction, they should probably work the same way that they do for other block instructions.

@lukewagner
Copy link
Member

Ah, interesting, I didn't know params were even possible. Given that the type of a let instruction

let bt (local t)* instr* end

has type:

[t* t1*] -> [t2*] 

where

bt = [t1*] -> [t2*]

and given the left-to-right order of locals and parameters, putting the locals before the blocktype would line up the order of local/param declarations in the text format with the order of [t* t1*] in the instruction type. Thus, the instruction grammer would be:

let (local <valtype>)* <blocktype> <instr>* end

Allowing us to write locals and results in the order we were originally expecting.

WDYT?

@binji
Copy link
Member

binji commented Apr 8, 2020

Oh, good idea! :) That makes sense to me.

So then a function:

(func (param t1)* (result t2)* (local t3)* <instr>*)

inlined at a callsite:

(call (t1.value)*)

would become:

(t1.value)*
(t3.default)*
let (local t1)* (local t3)* (result t2)*
  <instr>*
end

Since the stack is always empty at the beginning of the function (so there are no params).

(This is another place where it would have looked more natural if WebAssembly passed params on the stack instead of in locals, but eh!)

@RossTate
Copy link

RossTate commented Apr 8, 2020

If we changed let to not be the target of a branch instruction, could we forgo its block type? It seems like let is serving two orthogonal purposes: variable declaration+initialization and control flow.

@binji
Copy link
Member

binji commented Apr 8, 2020

Interesting idea. I guess in that case let t* would have the signature [t*] -> [] and would not have a matching end. Instead it would scope these locals to the enclosing block.

(If we kept let with a matching end, I think we'd need to keep the blocktype to indicate the expected signature for the entire block.)

Given this new let' design, if you did want to use the same block scope for control flow, you'd need to pass the locals through along with the block parameters. For example, given the following snippet using let:

(t1.value)*  ;; locals
(t2.value)*  ;; params
let (local t1)* (param t2)* (result t3)*
  ;; top of stack is (t2.value)*
  <instr>*
end

would become the following snippet using let'. But this would put the block parameters in a different place on the stack, since the let' instruction would consume the values on top:

(t2.value)*  ;; params
(t1.value)*  ;; locals
block (param t2)* (param t1)* (result t3)*
  let' (local t1)*
  ;; top of stack is (t2.value)*
  <instr>*
end

(The function inlining example in my previous comment would work like this too, I believe.)

If we did want to mimic the current let behavior, I think we'd have to consume the entire thing as locals, and use local.get to put them back on the stack. But maybe we don't care much about that.

As another interesting side effect -- this would mean we could create locals more than once per block. Not sure if that's desirable:

(func
  (local $a i32) ;; $a => index 0
  (i32.const 111)
  (i32.const 222)
  let' (local $b i32) (local $c i32) ;; $b => index 0, $c => 1, $a => 2
  (i32.const 333)
  (i32.const 444)
  let' (local $d i32) (local $e i32) ;; $d => index 0, $e => 1, $b => 2, $c => 3, $a => 4
  ...
)

@RossTate
Copy link

RossTate commented Apr 8, 2020

(If we kept let with a matching end, I think we'd need to keep the blocktype to indicate the expected signature for the entire block.)

In our work on reducing program size and improving type-checking performance, we found that type annotations were mostly only useful at merge points of control flow, and otherwise we found they would typically both increase program size (because of the annotation) and make type-checking slower (because you had to check that the incoming type matched the annotation, and then that the annotation was sufficient for the subsequent instruction, rather than just directly checking that the unique incoming type was sufficient for the subsequent instruction).

So if you can't branch to a let, how would it help to have a block type?

As a separate note, along this line, if we had a notion of immutable local variables, then there actually would be no benefit for let-immutable to specify the type of the local variable. Instead, that type would simply be derived from the type of the stack slot corresponding to the variable. Not suggesting we add that, just giving an example to further illustrate where type annotations are and are not useful. (For mutable local variables, you need the type annotation because later local.sets might set the variable to a value of a supertype of what the variable was initialized to—a supertype that is still sufficient enough to type-check the program.)

@binji
Copy link
Member

binji commented Apr 8, 2020

Oh, I see what you mean. It would be the first case where we have an end instruction that isn't a control merge point, which is a bit strange. It almost makes me think we'd want a separate instruction (or at least text form) to represent the end in that case to make it clearer (tel a la bash? 😄). In any case, you're right, there's no value in having a blocktype at that point.

@RossTate
Copy link

RossTate commented Apr 9, 2020

(Heads up, I was about to make this comment lightly, then decided to do some digging and just discovered that it's relevant to multi-value. However, it is relevant in a backwards-compatible manner. So there is no urgency here, just something to ponder.)

Well, actually, if ever y'all want to do dig into reducing type annotations and speeding up type-checking, the input types on block and if are similarly useless hindrances, as are the output types on loop—you only branch to the end of block and if and to the start of loop.

@rossberg
Copy link
Member

rossberg commented Apr 20, 2020

@lukewagner, in the presence of multi values,

let bt (local t)* instr* end

actually has type:

[t1* t*] -> [t2*] 

We could do it the other way round, but that's much less useful, because you could no longer bind locals "locally" without first emptying the operand stack.

@RossTate, @binji, that ship has sailed. We had an extensive discussion in the early days whether only block and loop should bind labels or all block-like instructions. Personally, I would have preferred the former for semantic simplicity/orthogonality, but the CG went with the latter. The main motivation was that it allows decoders to look up labels simply by indexing into the control stack.

OTOH, I don't think let without end works well either, because then the scope of the variables would be terminated by an unrelated construct, which is rather messy, and more tricky to handle in program transformations.

@RossTate
Copy link

RossTate commented Apr 20, 2020

The main motivation was that it allows decoders to look up labels simply by indexing into the control stack.

Can you clarify this, please? I appreciate the historical insight. It sounds like it's an argument for having the control structure correspond with the grammatical structure. That's something I've also come to appreciate because I found it really useful for stack primitives. But I don't get what that has to do with removing the type annotations that are unrelated to control (or maybe you thought I was suggesting removing end instead)?

@rossberg
Copy link
Member

I was referring to the idea of let not binding a label. A typical decoder/validator pushes all block-like instructions onto the same control stack to handle both branches and matching end. See e.g. the pseudo code for the validator in the spec appendix, and how it handles branch instructions by indexing that stack. Folks deemed the simplicity and efficiency of this important.

@RossTate
Copy link

Thanks for the link! There's no need to sacrifice simplicity, though. Do the following:

First, change control frames to reflect that they may or may not associate a control label and that the end may or may not need to match that label:

type ctrl_frame = {
  opcode : opcode
  start_types : list(val_type)
  opt_label_types : option(list(val_type))
  restrict_out : bool // check that outgoing operands match label type?
  height : nat
  unreachable : bool
}

and change push_ctrl accordingly, change label_types to err if opt_label_types is none, and change pop_ctrl to check the operands on the stack against the label type only if restict_out is true and otherwise leave the operands there. Plus add the following helper:

pop_opds(num : int) : list(val_type) = ... // pop off and return num types from stack

Then do the following changes in validate (the other cases remain as is):

case (block in_size t2*) // just need to know size of t1*
  push_ctrl(block, pop_opds(in_size), Some [t2*], true)
case (loop t1*)
  pop_opds([t1*])
  push_ctrl(loop, [t1*], None, false)
case (if in_size t2*) // just need to know size of t1*
  pop_opd(I32)
  push_ctrl(if, pop_opds(in_size), Some [t2*], true)
case (end)
  let frame = pop_ctrl()
  if (frame.restrict_out)
    push_opds(label_types(frame))
case (else)
  let frame = pop_ctrl()
  error_if(frame.opcode =/= if)
  push_ctrl(else, frame.start_types, frame.opt_label_types, true)

This pretty much changes block and if to simply use the incoming types on the stack, and changes loop to simply use the outgoing types on the stack. So the type annotations aren't making anything substantially simpler; they're just slowing type-checking down by imposing subtyping checks.

@rossberg
Copy link
Member

The actual point I was getting at was the case for br and friends. If only some control frames bound a label, then lookup of label indices would no longer be constant time or require another stack. (Unless we change the indexing scheme for labels to allow holes, which is another option we discussed and rejected early on.)

I agree that we could probably have used more reduced type annotations when we designed blocks. But again, this ship has sailed long ago.

@RossTate
Copy link

The code for br and friends is exactly the same as before. It is still constant time without requiring another stack. label_types will just report an error if the indexed control frame has no associated label.

@rossberg
Copy link
Member

As I mentioned in the parens, that would imply that the label index space has holes, which was rejected as a solution.

@RossTate
Copy link

Oops, missed that! Sorry. Why was that rejected? It seems y'all have painted yourself into a corner for reasons I don't understand. Note that this problem is only going to get worse and worse. For example, now the various stack primitives in WebAssembly/exception-handling#105 like mark-within will need type annotations, unnecessarily making the binary size substantially larger.

@RossTate
Copy link

Also, it really doesn't complicate that code much to have a separate stack for just labels. The code for pop_ctrl gets slightly more complicated, but the rest of the code stays at pretty much the same complexity.

@rossberg
Copy link
Member

I don't remember in detail, but I for one would find that too hacky.

But I don't think it matters either way. Most block instructions (except block itself) are highly unlikely to occur frequent enough that the difference ever matters in practice.

@RossTate
Copy link

RossTate commented Apr 21, 2020

The natural construct for C++ destructors does not need any type annotations, and C++ destructors seem common enough that it seems premature to ignore them. Similarly, if we use stack marks to enable self-managed memory or self-managed stack tracing, those will be extremely frequent in some modules. None of these need type annotations.

A nice design would be to have a separate stack in the validator code for each stack-based indexing mode in WebAssembly. There would be a stack for labels (that let would not touch). There would be a stack for local variables (which let would operate on, but block and loop would ignore). There would also be a master stack tracking the nesting of constructs. end then just pops things off the appropriate specialized stack(s) according to the construct that ended.

(Sorry, I realized you were probably commenting on the hackiness of holes in the label index. I prefer this multi-stack solution anyways.)

@lukewagner
Copy link
Member

@rossberg Ah, you might want to update the let section in the Overview.md then; it says [t* t1*] -> [t2*]. I hadn't actually considered which order was more useful (none of our examples mess with (param)s), but I think you're right that [t1* t*] -> [t2*] is.

So given that, I agree it doesn't make sense to put (local)s before the blocktype. But it still seems syntactically backwards to write let (result i32) (local i32) ... end. What about keeping the abstract syntax let bt vec(valtype) instr* end but tweaking text format parse rules for let so that you could stick the (local)s between the (param)s and (result)s?

Considering that the instruction:
let (param i32) (local f64) (result i32) ... end
has type
[i32 f64] -> [i32]
this tweak would actually let the text format more-directly reflect the operand stack type.

@rossberg
Copy link
Member

@lukewagner, ah, thanks, fixed Overview.

As for putting locals between params and results: I agree that the relation between type annotation and actual instruction type is less direct for let, but paving over that by bending syntax may be worse. Syntactically, the combined param/result list is a "typeuse", e.g., sugar for a single (type $t). It would be inconsistent with other occurrences to special-case that here and splice in a different element into the inlined form.

Also, the notational symmetry between

(func (param t) (result t)
  (local t)
)

and

(let (param t) (result t)
  (local $x t)
)

seems desirable and the latter more what you want to write in practice, especially if we add folded initialiser sugar like @binji mentioned above:

(let (param t) (result t)
  (local $x t (call $f (i32.const 4)))
)

@lukewagner
Copy link
Member

Ah right, I forgot about the folded form. That makes sense.

@tlively
Copy link
Member

tlively commented Jul 22, 2022

We've decided to replace let over in #44, so we should close this issue. (I don't seem to have permissions to do so. @rossberg, could you give me write permissions to this repo?)

@rossberg
Copy link
Member

@tlively, done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants