-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate Expression/Statement distinction in AST? #223
Comments
While I haven't worked this out in detail, I know from designing Enhanced C# (unfinished) and LES that languages in the Algol tradition (C/C++/Java etc) can be viewed as purely expression-based languages like LISP or Haskell. As I was saying elsewhere, I think all executable code (if not more of the system) should be expression-based.
Merging statements and expressions into a single concept called 'expressions' seems to produce a strict superset of the current plan. I think it would be a simpler and more capable model than what is currently proposed, and it won't substantially alter code generation for the Algol branch of languages (or, if it turns out that it does, perhaps the language could be tweaked until it doesn't) The translation to asm.js could be more complicated, admittedly, but remember that this is a short-term problem--we won't use a polyfill forever. ( |
Yes. It doesn't seem to provide much at the moment. |
I'd like to be able to write code like this:
or even:
(The latter, while a bit ugly, is shorter than the typical way you'd write that code, so it agrees with the design goal of "size efficiency") Things like this are possible in LISP and Haskell (except that Haskell doesn't allow mutable variables) and I'm sure I've seen it in at least one other (otherwise traditional) imperative language. |
@qwertie I think this is a pretty cool idea, but I'm concerned about polyfill to js and added complexity to VMs (it seems like you would have to build up a flowgraph for type checking). |
I agree that such syntax is nice if someone is writing wasm files manually. I don't think that's a goal of WebAssembly, though. I nonetheless think that there could be advantages to what you suggest:
Before we discuss this further, can you please confirm that you're a member of the W3C community group? "@qwertie" isn't quite a name I can find there! The WebAssembly effort really needs to operate under the W3C's umbrella, and significant contributions require membership. |
FWIW, @sunfishcode have also been talking about having all statements be expressions for the purpose of avoiding |
@jfbastien Yes, I'm a member of the W3C group (real name: David Piepgrass) IMO expression-based languages are 'the future', because even if one doesn't like them stylistically, they have other advantages. For example, languages may have higher-level constructs that are naturally lowered into "an expression-with-a-loop-inside": e.g. list comprehensions, function inlining. |
@lukewagner that sounds interesting, but I don't quite see it, can you please elaborate? |
It sounds like we agree it would be nice to have, but only if data shows that it'll be useful. Style is great, but we're not trying to optimize for that. |
The only reason I suggested it is that it reduces the tree depth of a typical AST somewhat, which could improve compression. I've yet to test this theory out in depth other than doing the change in one specific case and seeing my test files get considerably smaller. (from |
I have given this some more thought and I realize that I way overestimated the complexity for WebAssembly VMs, and from that perspective I'm fine with this. And I can believe that there are some modest wins in AST size (even considering macro layer), so that does give some motivation. However, I still can't imagine a good way to translate this to js. The only thing I can think of which works in the general case is outlining the flow expressions into function expressions/calls, but that is obviously too costly. I'm sure we could hoist them into separate statements in many cases (and assign the result to a new local for value producing flow expressions), but that would require the polyfill to do dependence analysis... Unless someone has a better idea for polyfill, I think we should leave this as a possible future feature for a time when the polyfill isn't as important. |
My implication here is that the distinction doesn't entirely go away, but it isn't encoded into the AST. For example, if you're in a block and find an expression, it's obviously a free-standing statement. Much like how we've been leaning towards using context to infer things about operations (we know this expression is an integral one because it's enclosed inside an int setlocal operation), we can often make robust inferences about whether something is an expression or a statement in a given context. There are only a few cases where it really matters, though. BlockStatement -> Block and ExpressionStatement -> Expression can both be pretty profound in terms of making the AST shallower and slightly smaller to encode. Best-case, each one has a type indicator (of some kind) for the 'statement' (block, expression) and then a type indicator for the inner expression in the case of the ExpressionStatement. It's possible there are other ways to optimize out these costs, of course. This might be something that's best tackled at the encoding layer much like macros. |
I'm all for making the AST simpler and smaller, this had advantages in several areas as noted. Is the only real blocker finding an efficient transpile target for the polyfill? I think with @kg's idea most expressions used as statements will be obvious and won't need wrapping in functions. I would love to see some concrete examples of the AST in question so I could mull it over. |
@kg I definitely think for non-value producing expressions (e.g. set_local, or void function calls) we can remove the need for My concern is with what @qwertie is talking about, where we do completely remove the distinction between them. |
Right now, the only way a local value can be live across a basic block boundary (ignoring control flow embedded in expressions) is to live in a local variable. Making statements be expressions would mean that some values could flow out of blocks without having to go through local variables, which would theoretically reduce the size due to get_local/set_local that @lukewagner mentioned. An interesting twist is that we could extend this from allowing a single value to flow out of a block to allowing multiple values to flow out of a block, by allowing expressions to return multiple values (something which would have other uses besides). For sanity, we don't want to deal with tuples floating through the language in general, so we'd probably require multiple return values to be immediately unpacked and named, so there'd still be some set_local/get_local overhead, but it'd be reduced. Either way, it would be interesting to see how much this would help compression. One question I have though is: do we really want expressions with embedded control flow at all? The current design mentions comma and short-circuit ?: but do we even want those? Embeddable control flow makes it somewhat more complex to construct a CFG from the AST, which could complicate some baseline JIT strategies. If the compression wins aren't great, this may be worth considering. |
The question of whether we want embedded control flow at all is a very important one. I'll have to think about it in depth, but I personally don't like the comma operator or short-circuit |
On Thu, Jun 25, 2015 at 2:45 AM, Dan Gohman notifications@github.com
For reference, both ?: and comma in the v8-native-prototype decoder were an
|
I guess that's true, since you form the CFG while decoding. I was thinking about forming the CFG based on an in-memory AST, but that's not the common case here for baseline JITs. |
I strongly second eliminating statements. This would be particularly useful for translating higher-level languages into wasm in the future, which will often need to create control structure inside expression contexts. Even though supporting HL languages is not an initial goal, the statement distinction is a fundamental design decision that cannot easily be reverted later. As for the JS polyfill, there is a proposal for adding "do-expressions" to the next version of JavaScript (which I am a co-champion of). The idea is to allow "do { <statement-list> }" as an expression. Once that arrives (hopefully next year), translation to JS would become straightforward. |
For the record, Ben asked me whether eliminating statements would necessitate an explicit operator for discarding values. Replicating my reply here: I think that's not needed. Basically, you can always tell from the syntactic context where a value is discarded (off-hand: left of a comma/semicolon, at the end of a loop body, and recursively in the arm of any branch construct whose value is discarded). Another way to spec this easily is via the type system. Essentially, some expressions produce type void (e.g. loops), some expressions require their subexpressions to have type void (e.g. the left operand of a semicolon), and there is an implicit conversion from any type to void, whose operational interpretation is discarding the value. This nicely generalises the situation of function calls that return void being expressions already. |
Eventually people will compile different languages to wasm. There is hardly much benefit to keep distinction between statements and expressions even for C/C++. Such distinction will very likely become a real disadvantage for handling other languages. Now would be the best time to eliminate the distinction entirely. |
I don't think the polyfill can use any new, not implemented, js features (including modules for that matter). That would remove all utility from it. |
It sounds like there are a few good reasons to want to eliminate the expression/statement distinction. However, we do need a good solution for polyfilling to current JS, as @MikeHolman said, so I think it would be useful to see an experiment showing that this change would not prevent that. Modifying @lukewagner's prototype might be one way. |
I'm not exactly sure how relevant this is given that WebAssembly doesn't really have subtyping. One useful representation for the type of jump expressions such as break and return is a "bottom" type. (A type that is a subtype of all other types and has no instances.) This would also be useful when exceptions are implemented. |
@kripken If we assume that the polyfill can inject temporary locals (something it already does to handle one unfortunate corner case with mixed f32/f64 typed array stores), I don't see a fundamental problem with this. The one challenge is keeping everything single-pass: if you've started emitting an expression tree and encounter a loop, it's too late. For this, the separate issue I just mentioned, and a few other twiddly details that I current cheat on in the polyfill prototype, I'm thinking we'll really end up wanting a polyfill-optimized specific layer (non-standard, so we can do whatever for short-term polyfill considerations, and then change before/if standardizing). Also, while temporary variables could in theory result in much worse code, the specific layer could be designed (in cooperation with the wasm-compiler) to effectively never need them. |
Sounds plausible, but I think the real question is how much overhead that would add, compared to the much more straightforward situation we have investigated so far. That's why I'd be curious to see experiments here. |
I was initially against statements == expressions due to implementation concerns, but in the intervening time have implemented blocks, ifs, loops, break, and continue in terms of expressions, finding the implementation burden to be NBD. There are nice advantages to having statements == expressions, like getting rid of |
A lot of work has been done since this issue was opened. The current |
I also support stmts=exprs. There are a couple of viable polyfill strategies. If we integrate the polyfill with the specific layer as suggested above, then the specific layer binary stream could contain a marker in front of expression trees that had problematic stmt nesting (from an asm.js pov) so that the polyfill could conservatively emit that expression tree in three address code. |
I'm also in favour (surprise!). I can see several ways of polyfilling it, starting from dumb solutions like creating auxiliary functions to ones introducing minimal numbers of auxiliary variables at the statement/expression boundary. |
This would remove the need for a final |
Early returns can be done by making the body of the function a block and On Fri, Oct 23, 2015 at 2:35 PM, Alon Zakai notifications@github.com
|
@kripken, and what titzer describes is how the spec desugars |
A break could achieve the same result, I agree. But then a concern is that the break would need to specify the nesting level. That adds some entropy that I would guess compresses more poorly. I don't see a way to do an early return that doesn't have downsides compared to Overall I don't think we have clarity on whether this improves or degrades code size? I think we've seen arguments both ways: having statements == expressions lets you return a value in some cases "for free", avoiding a temp var, but for the most part statements like
where bar returns a value, used in other places calling it, but that we want to ignore here. Overall, my current understanding is that
Seems like a mix, so far? |
On Fri, Oct 23, 2015 at 5:20 PM, Alon Zakai notifications@github.com
|
I'm mostly fine with this, except that "we don't know" != "neutral". Optimally we could take the time to investigate this carefully. In any case, I do think that this - and possibly other changes, as you said, but this seems the biggest to me - makes the polyfill kind of pointless, in the sense that it's going to be less efficient to (a) make 1 wasm build and convert that to JS on the client when necessary, vs. (b) make 1 wasm build and 1 JS build. I suspect people will end up doing (b). Does that worry anyone? |
It does worry me. I don't really see any benefit other than "it feels good", and the potential impact on the polyfill is large. The polyfill is not a hugely important scenario for me, but I think it is useful for the health of the project. And it seems like we would be obliterating it in order to add a superfluous feature. |
I assume we actually do want to keep I don't agree that this makes the polyfill pointless. I also don't think that it makes it less efficient. It does make it slightly more interesting, but not in infeasible ways. |
Don't assume that all compilers will want to go through a complex pipeline like LLVM. In fact, I'm pretty sure the vast majority won't. Somewhere else I was pointing to the long list of compilers to JS -- if only a fraction of those ever get ported to Wasm (and I dearly hope so!) then we already have plenty of compilers that will benefit from expressions, some significantly. |
We will try to answer the polyfill question definitively in the next two months, as @kg will be working on it. |
@rossberg-chromium: yes, I completely agree, making it easier for all those languages is a very good thing. I worry about compromises towards that goal, but probably too much ;) Regarding polyfill efficiency: If the polyfill needs to generate out of line functions as @rossberg-chromium suggested, or perform a more complex analysis to place auxiliary variables, or use three address code based on markers as @lukewagner suggested, then all of those can work, but they will necessarily reduce efficiency of either polyfill time or throughput of the generated code. Possibly substantially, in large codebases. For example, if there is a significant increase in the number of variables in a large function (e.g. three address code temporaries), then experience has shown that that can make a massive impact on performance. Emscripten works extremely hard to minimize the total number of variables, spending much of its compile time on it, because we know it is worth it. In fact, in large-enough projects, often unoptimized or partially optimized builds (having the # of variables not fully reduced) are completely unusable. The complexity of such variable reduction is similar to that of register allocation; the problems are closely related. In small codebases, this won't matter. But in large ones, I strongly suspect it will. I'm not saying this is in itself a big enough issue that we should consider avoiding statements == expressions. I just want to point out that we may be moving to a model where we can't polyfill well enough on the client, compared to people making two builds. Two builds will just be better: no polyfill time, fully optimized code. I hope this doesn't sound too pessimistic. I'm glad to hear of plans to work on the polyfill, and would like to help out. |
Haven't had time to catch up on this thread, but I wanted to point out that the entropy concerns re: break vs return are pretty minor. If it turns out to be an issue we can address it, but I really don't think it will (especially since we can now omit many returns entirely) |
If polyfilling is slow in certain situations (like when there is a loop inside an "if" condition), perhaps the code could be restructured in advance with a Wasm=>Wasm transform, rather than asking the polyfill do it. Presumably this will increase the code size slightly and decrease native Wasm performance slightly. Developers can stop using the restructuring option once most browsers have native Wasm. |
It occurred to me that there are alternatives to "single build of wasm + polyfill to JS" and "2 builds, wasm and JS". For example, a single build that is not wasm, but something else that is easy to convert on the client to either wasm or asm.js. For example, the single build could be @lukewagner's asm.js polyfill prototype, or it could just be asm.js itself. This is practical because while we are making Clearly this is not optimal, and has a bunch of obvious downsides. But it seems worth exploring, even if just as a fallback plan. And actually such a converter could have other benefits, like offline upgrading of legacy asm.js builds to wasm. I'll investigate. @qwertie: Interesting idea! |
The reasoning that convinces me in favor of stmts=exprs is that comma and conditional (which we know we want and are useful for building fatter exprs) are just |
This has been implemented for some time now. |
No description provided.
The text was updated successfully, but these errors were encountered: