Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary encoding of 'let' (discussion topic) #7

Closed
lars-t-hansen opened this issue May 29, 2019 · 11 comments
Closed

Binary encoding of 'let' (discussion topic) #7

lars-t-hansen opened this issue May 29, 2019 · 11 comments

Comments

@lars-t-hansen
Copy link
Contributor

One thing that I'd like to be sure of is that a streaming compiler can generate code for allocating the locals area in the stack frame at once when it first encounters a function and not have to scan the function code to find out how many locals it might need. So ideally the information about let-bound locals would be present in the function header. For example, the function could in principle have a fixed set of locals; some locals can then become activated and deactivated as control enter various regions of the code.

@rossberg
Copy link
Member

Hm, I'm thinking of let mostly as naming slots on the operand stack. Wasm doesn't declare the operand stack, so what is different here?

@lars-t-hansen
Copy link
Contributor Author

In that case, local.get either means "access a predeclared local slot or some operand stack slot that I have allowed you to reference (by means of labeling it with let) within this code section". At the binary level, where we can't fake the label we use for local.get, how were you thinking that the label in the local.get would be encoded?

(From an engineering perspective in our streaming compiler, conflating locals and stack slots is a mess, since they are addressed differently; presumably this can be fixed, but it would be nice if there was a good reason for doing so.)

@titzer
Copy link
Contributor

titzer commented May 31, 2019

Is let even necessary if we have dup/pick?

@rossberg
Copy link
Member

rossberg commented Jun 4, 2019

@lars-t-hansen, I was assuming relative addressing similar to labels. That is, 0 is the first variable of the inner-most let, N for the first of the second inner-most one, assuming the first defined N variables, and so on. The actual locals are like an outer-most let. This has the nice property that it is composable in a way that it enables trivial Wasm-level function inlining: you can directly replace a call instruction with the function's body + a suitable let around it without having to change the body.

I first thought of having a separate let.get instruction, but couldn't see any particular advantage in that. And the above property seems nice. Would that be hard to handle?

@titzer, pick alone does not subsume let, which can express operand reordering and removing operands below the top of the stack. You'd also need swap for that.

I think both complement each other: pick works well for a one-off access into the stack. But for accessing the same slot multiple times, or multiple related accesses, or operand reordering, stack twiddling with pick and swap is more difficult to write, more difficult to read and debug, and I suspect often less compact.

@skuzmich
Copy link

skuzmich commented Nov 6, 2019

Could we allow plain locals without default value if we could prove that they can't be read while uninitialized? Something like a validation rule that each local.get must have a local.set or local.tee that dominates it.

@rossberg
Copy link
Member

rossberg commented Nov 6, 2019

@skuzmich, that would require a form of control-flow analysis, which is significantly more complexity than we have been willing to accept for validation so far.

@skuzmich
Copy link

skuzmich commented Nov 6, 2019

@rossberg I'm curious, wouldn't a simple control flow analysis that covers the use cases of let be of the same complexity as let validation?

It could be phrased like:

  • local.set and local.tee of a non-default local variable form an implicit let block until the end of current block.
  • Nested sets and tees of the same local are ignored since variable is already initialized.

I can see some encoding density benefits of this approach:

  • only localidx is encoded instead of both blocktype and end
  • variable can be reused in multiple places
  • all local.tee benefits, unless we introduce "tee" variant of let

Additionally, in the future we could make this analysis smarter if needed.

@rossberg
Copy link
Member

rossberg commented Nov 6, 2019

Well, okay, structurally a version so heavily restricted may be similar, though it still requires maintaining additional context information. But of course, such a heavy limitation feels very arbitrary and would immediately put us on a slippery slope to something more general and complicated.

And it would not have the same expressive power: it does not subsume pick and swap like stack twiddling operations, nor does it have the composability properties of a local naming mechanism.

@skuzmich
Copy link

skuzmich commented Nov 7, 2019

Thank you for the explanation.

Would let be a primary way to declare local variables without default value?
I'm worried, that let block declaration with blocktype and end would bring a code size overhead compared to declaring regular locals.

We could still use block-local variable indices while reusing scopes of existing blocks.

While composabilty is a nice property, it wouldn't be hard for compilers to work around the lack of it.
Do you have in mind other use cases of tying a block creation to a local variable declaration?


Kotlin encourages non-nullable references, and thus has a lot of variables that would not have default values in Wasm. Some stats gathered from standard library:

~95% of local variables are immediately initialized
~50% of those hold objects references
~75% of those have non-nullable types

Proportions in the source code are similar to those of generated code. Most of the nullable variables are there due to generic algorithms. I would imagine them to not be as prevalent in a typical application code.

@rossberg
Copy link
Member

Yes, I imagine many languages (and compilers) would want to use safe refs in many or most places.

Note that a single let can bind arbitrarily many locals, so the difference in code size may be fairly small. Also, we happily generate N blocks for every case or switch, so I wonder how relevant the overhead really is in comparison.

I wouldn't say that we can never add flow-based validation, but it would definitely be preferable to avoid that can of worms if we can. Since the only advantage seems to be code size, it would probably be premature optimisation to add it to the design before more concrete measurements across a multitude of use cases.

@rossberg
Copy link
Member

Let was removed, so this is obsolete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants