Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement (sort of) GOTO and COMPUTED GOTO in asm.js #80

Open
mnieper opened this Issue Jan 4, 2014 · 17 comments

Comments

Projects
None yet
5 participants
@mnieper
Copy link

mnieper commented Jan 4, 2014

As ECMAScript doesn’t have a goto statement, asm.js doesn’t have either, of course. However, asm.js is intended to be a compilation target and as such a goto or even a computed goto as in gcc would be quite handy (think of an interpreter loop or a trampoline for languages with guaranteed tail call elimination).

The way to (at least partially) emulate this is of course a huge switch statement wrapped inside a forever running while loop:

loop: while (1) { switch (L) { case 0: ... ... } }

where the variable L, holds the current label. For a simple goto, the corresponding asm.js code would be L = [constant signed]; continue loop; for a computed goto, the code would be L = [computed signed] ; continue loop.

A general compilation of this while/switch construct into machine language wouldn’t yield very efficient code (I guess, this is why Emscripten needs all this relooping). Therefore, it would be nice if the asm.js spec (or OdinMonkey) defined a convention of coding such a while/switch construct such that OdinMonkey (or any other ams.js AOT compiler) is able to recognize such a structure and compiles the goto workarounds into direct and indirect jumps in machine code.

This would bring asm.js nearer to real machine code (which is a goal of asm.js as I understand it) while keeping the compatibility with ECMAScript engines without asm.js. As a target for Emscripten, asm.js probably doesn't need optimized gotos because of Emscripten's relooper, but a general compiler with an asm.js backend won't have a relooper or the resulting code may simply be irreducible by the relooper.

I am not sure whether this proposal belongs to the asm.js spec (it would be a statement about complexity in running time and occupied space of the while/switch construct) or just to an implementation like OdinMonkey. In the latter case, however, all writers of asm.js-compilers (like OdinMonkey) will have to agree on a certain programming idiom that make these while/switch constructs recognizable (much in the manner of the 'use asm' string itself).

In any case, I am willing to help implementing such an optimization.

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Jan 5, 2014

The reason I was opposed to this in the past was that it is something that can be specially optimized very well, but runs slowly in current JS engines. asm.js tried to maintain or improve speed on current JS engines, while also in additional being specially optimizable. The issue is that it would be wrong to suggest people use a tool that is better on one browser but slower on another - that fragments the web and feels unfair to the slower browsers. And specifically here, we already had relooped-type code in the wild, generated by emscripten and mandreel, so moving to a switch-loop would in fact be a regression for all browsers not optimizing it specifically.

The relooper is open source and very modular, btw, so any compiler can use it (and from multiple languages as well, currently C++ and JS). So I think there is not much of a problem in practice, and relooped code is also smaller to boot.

There are some interpreter loops and things like indirectbr that are hard to reloop, that's true. Have you hit a specific testcase with a problem?

@mnieper

This comment has been minimized.

Copy link
Author

mnieper commented Jan 5, 2014

I understand your point that asm.js should not suggest a programming style that won't work well on browsers without special support. On the other hand, it is intended that browsers knowing about asm.js are able to run asm.js better than browsers without any special support (otherwise, the whole point of asm.js would be moot, wouldn't it?).

The while/switch idiom I am proposing respects these two points: Consider a long switch statement with N cases coded in asm.js. An ECMAScript engine (with or without asm.js support) could do three things when running this switch: 1) Check cases one after the other. This yields a running time of O(N). 2) Do a binary search. This gives O(log N). 3) Compile the switch statement into a jump table. This yields a running time of O(1).

How should an asm.js-aware engine AOT-compile such a switch statement? It could very well choose 3) (if sensible by the length of the switch statement and the case numbers). I don't think it should generate suboptimal code by choosing case 1) (or even 2)) just because there might be unoptimized non-asm.js-aware ECMAScript engines out there that choose case 1) and not 3) (which has nothing to do with AOT or JIT-compilation).

Given that an asm.js-engine chooses 3), any extra special optimization for the while/switch idiom I am proposing would be simply a speed gain from O(1) to O(1), that is a constant. But this is what asm.js is about: allowing to AOT compile code to gain a speed increase by some factor (without changing the runtime complexity). And other engines (even without special asm.js support like V8) have regularly followed to improve their backends to be able to compete with OdinMonkey, when it receives another optimization.

I agree that relooping (while probably unaesthetic from a pure mathematical point of view) is very clever technique that has proved to be enough for code for ALGOL-like languages (e.g. C++). In order to be able to effectively compile a language like Scheme with call/cc and guaranteed tail call elimination to asm.js, one has to do jumping and branching instead of calling functions as long as asm.js does not guarantee TCE, so in first order the output of such a compiler will be a huge while/switch construct. I don't see how this can be effectively relooped.

P.S.: I am not suggesting to change Emscripten to not use the relooper but to rely on an optimized while/switch construct; I am suggesting to make asm.js a suitable target for languages like Scheme.

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Jan 5, 2014

I see your point that while-switch code would be the same speed, but could be much faster with specific optimization. And that makes sense. However, why couldn't scheme use the relooper? If there is no reason it cannot, then the choice is between scheme doing relooping, or not, and then relooping would be much faster on current JS engines.

Is there a fundamental reason scheme couldn't use relooping?

@mnieper

This comment has been minimized.

Copy link
Author

mnieper commented Jan 5, 2014

Consider the following example program in JS:

function g(x, y, z, w) {
  w(x + y, z);
}
function f1(x, y) {
  if (x >= 100000) {
    y("finished");
  }
  g(x, 1, y, f2);
}
function f2(x, y) {
  g(x, 1, y, f1);
}
f1(0, function(v) { print(v); })

If we want guaranteed tail call elimination (like we have to in Scheme), we have to recode the program as follows:

var pc = 1; 
var x = 0;
var y = 3;
loop:
while (1) {
  switch (pc) (
  case 1: // f1
    if (x >= 100000) {
      x = "finished";
      pc = 4;
      continue loop;
    } 
    x = x;
    z = y;
    y = 1;
    w = 2;
    pc = 3;
    continue loop;
  case 2: // f2
    x = x;
    z = y;
    y = 1;
    w = 1;
    pc = 3;
    continue loop;
  case 3: // g
    x = x + y;
    y = z;
    pc = w;
    continue loop;    
  case 4: // exit
    print(x); 
    break loop;
  }
}

How can such a construct be relooped? Three gotos (in case 1, case 2 and case 4) are direct; the goto in case 3 is a computed one.

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Jan 5, 2014

I see, yes, this is hard to reloop. Given that input, the relooper would in fact generate a while-switch for blocks 1-3, and move block 4 to the outside, which is not much different.

That assumes btw that the entire program is compiled to a single huge function, not separate individual functions? That's tricky for other reasons in JS.

@mnieper

This comment has been minimized.

Copy link
Author

mnieper commented Jan 5, 2014

Yes, I am assuming that the entire program is compiled to a single huge function. An alternative would be to code each case block in a separate function and to let each function return a function pointer, e.g.:

f = start();
while (1) {
  f = f_table[f()];
}

(Exiting the whole construct could be simulated with a throw.) However, experiments showed that the performance of that is inferior to that of the switch construct. And like the switch construct, it also generates quite a lot of overhead compared to direct-style programming, but looks harder to optimize by an asm.js-engine.

Of course, the proper way out of the dilemma is that asm.js gets guaranteed tail call elimination. In the long run, it will have to as ECMAScript 6 will most likely demand it, but I have no idea how hard it would be to implement it right now in OdinMonkey. In any case, it would become an even better target for compiler writers.

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Jan 5, 2014

Oh right, proper tail calls are coming in ES6. Overall I think the way forward here should be

  1. Optimize the while-switch construct in JS backends (IonMonkey, v8, DFG, etc.), that is, not in asm.js. They already emit jump tables in some cases, so this would be an incremental improvement on that.
  2. Figure out how to eventually do proper tail calls in asm.js once JS gets them through ES6. (Also in particular, figure out how to compile those from LLVM in compilers like emscripten.)
@mnieper

This comment has been minimized.

Copy link
Author

mnieper commented Jan 6, 2014

Could you briefly explain how to implement 1. or 2. in principle? (Or point to a hacker's guide for IonMonkey/OdinMonkey?) So far I have been thinking that OdinMonkey compiles every asm.js module to machine code. How would an optimized while-switch construct in the ordinary JS backend (IonMonkey) help here?

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Jan 6, 2014

@andhow can answer that much better than me. But overall, OdinMonkey compiles asm.js into the JS engine's internal IR, not machine code directly. Then the JS engine's IR optimizer is run, which does things like licm, register allocation, and all the other standard compiler optimizations, as well as various JS-specific stuff. So an extra IR optimization there could handle such switches, is what I am thinking.

@ghost

This comment has been minimized.

Copy link

ghost commented Jan 6, 2014

For (1), I think you underestimate the difficulty of this optimization; it's not simple like jump tables. The while/switch optimization significantly affects phi node placement; effectively you lose the structured control flow which makes phi placement so simple in current OdinMonkey/IonBuilder. Doing the while/switch optimization at a late IR stage seems possible, but expensive as you must effectively re-do phi placement for the entire while/switch body using the more expensive and general phi-placement algorithm. Thus, for efficiency and simplicity (re-wiring SSA/CFGs is also very error prone, ask Niko) compile time, it seems like the right place to do all this is in OdinMonkey/IonBuilder, when generating the MIR and placing phi nodes in the first place. Also, given that the optimization would only be applied under very particular circumstances (asm.js or not), it seems appropriate to encode the target pattern in asm.js.

In addition to mnieper's point as to why relooping was necessary (irreducible control flow), the other main reason I've heard from potential asm.js users is JITs written in JS (requiring any such JIT to either generate structured control flow or package the relooper seems unfortunate). This use case seems particular exciting for efficient combinator libraries and EDSLs.

I will say, though, that none of this yet feels like priority 1. However, I've added it to the odinmonkey list of priority 2 projects [1]. On the Emscripten side this seems like no work at all (as mnieper already said, you'd still try to reloop since it's more efficient on all browsers and it's more compact). If OdinMonkey makes the optimization, it only seems fair to include the pattern in asm.js with appropriate (non-normative) "intended implementation" notes.

[1] https://wiki.mozilla.org/Javascript:SpiderMonkey:OdinMonkey#Possible_asm.js_extensions_that_don.27t_extend_JS

@dcecile

This comment has been minimized.

Copy link

dcecile commented Sep 29, 2014

It may be worth noting that once tail-call optimization works with regular JS, this can be leveraged by asm.js users as a legitimate way to compile GOTOs. Basically, each jump destinations get converted to a function, and the jump itself is a tail call. (Guy Steele's "Lambda: the Ultimate GOTO" has a more in-depth explanation of this transformation.)

If this would be the official asm.js solution, asm.js implementations might not even need any special logic for this pattern; code that looks like this would just be treated as a plain old tail calls.

@mnieper

This comment has been minimized.

Copy link
Author

mnieper commented Sep 29, 2014

Before this technique could be employed in the wild, however, one has to
make sure that all relevant Javascript environments support proper TCO. It
won't be enough if Mozilla's OdinMonkey supports TCO as the program would
yield stack overflows in non-ES6-compliant Javascript environments. Given
that implementing proper TCO means most likely to change some inner
workings of the Javascript environments, I guess that this will be one of
the features of ES6 that will be adopted last.

In the (very) long run, of course, computed GOTOs are less important, of
course. See also: #88

2014-09-29 5:16 GMT+02:00 Dan Cecile notifications@github.com:

It may be worth noting that once tail-call optimization works with regular
JS, this can be leveraged by asm.js users as a legitimate way to compile
GOTOs. Basically, each jump destinations get converted to a function, and
the jump itself is a tail call. (Guy Steele's "Lambda: the Ultimate GOTO"
http://dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443.pdf has a
more in-depth explanation of this transformation.)

If this would be the official asm.js solution, asm.js implementations
might not even need any special logic for this pattern; code that looks
like this would just be treated as a plain old tail calls.


Reply to this email directly or view it on GitHub
#80 (comment).

@cnlohr

This comment has been minimized.

Copy link

cnlohr commented Sep 30, 2017

I know this is an ancient, and closed issue, however, I cannot find anywhere that there are people who won't downvote me into oblivion for this... I'm currently retargeting the TinyCC C compiler for output to Javascript. Specifically, I'm trying to keep it as asm.js as possible. I am just not sure how to proceed.

Compilers have their own views on how to string together comparisons, jumps and conditionals. I can't seem to transform it out of the goto paradigm... At least not without making the output code absolutely atrocious.

Was there any progress with this or regular implementation of tail-call optimization? I'm really hoping to stick to human-readable javascript.

Thanks,
Charles.

@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Sep 30, 2017

The relooper as mentioned above can generate very human-readable javascript. It'll create ifs, whiles, etc., from the gotos that you provide it.

@cnlohr

This comment has been minimized.

Copy link

cnlohr commented Sep 30, 2017

I could be wrong, but as @mnieper points out, doesn't relooper behave poorly in some irreducible control situations? Additionally, relooper is in C++ and doesn't follow conventions similar to TCC... so to use it as a critical component in the conversion of C to JS, that it's kinda dead-in-the-water if TCC can't cross-compile to its own target.

Surely sometime in the last several years this boondoggle must have been improved!

@cnlohr

This comment has been minimized.

Copy link

cnlohr commented Sep 30, 2017

EDIT After playing with the relooper tool a good bit, it does seem to perform better than the few references I can find online complain about it. I think, though sad having to spend most of my time/effort reimplementing it in C, it might be the best path.

EDIT2 Turns out you can get around this, though it violently breaks the asm.js performance. Putting all the code in a giant statement of the form:

switch( state ) {
case 0x0000000:
   //do stuff
   if( condition ) state = 1; break;
   //do more stuff that would be skipped
case 0x00000001:
   //Keep doing stuff
   break outerloop;
} } while(true);```
@kripken

This comment has been minimized.

Copy link
Collaborator

kripken commented Sep 30, 2017

If you do reuse/reimplement the relooper, the cleanest and most optimized implementation is now in binaryen, https://github.com/WebAssembly/binaryen/blob/master/src/cfg/Relooper.h

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.