WebAssembly binary format in relation to LLVM-IR #188
I'm guessing you are unfamiliar with PNaCl. This is more or less the approach taken by PNaCl; i.e. use LLVM as the starting point for a wire format. It turns out that LLVM IR/bitcode by itself is neither portable nor stable enough to be used for this purpose, and because it is designed for compiler optimizations, it has a huge surface area, much more than is needed for this purpose. PNaCl solves these problems by defining a portable target triple (an architecture called "le32" used instead of e.g. i386 or arm), a subset of LLVM IR, and a stable frozen wire format based on LLVM's bitcode. So this approach (while not as simple as "use LLVM-IR directly") does work. However LLVM's IR and bitcode formats were designed (respectively) for use as a compiler IR and for temporary file serialization for link-time optimization. They were not designed for the goals we have, in particular a small compressed distribution format and fast decoding. We think we can do much better for wasm, with the experience we've gained from PNaCl.
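To make the "stable frozen wire format" idea concrete, here is a minimal illustrative sketch in Python. The magic bytes, version number, and header layout are invented for illustration (they are not PNaCl's or wasm's actual encoding); the point is that a frozen format pins the byte layout forever and rejects versions it does not know, rather than reinterpreting them.

```python
import struct

MAGIC = b"PEXE"      # hypothetical magic bytes, for illustration only
VERSION = 2          # frozen: readers reject any version they don't know

def encode_module(payload: bytes) -> bytes:
    # A frozen wire format fixes the header layout permanently:
    # 4-byte magic, 4-byte little-endian version, then the payload.
    return MAGIC + struct.pack("<I", VERSION) + payload

def decode_module(data: bytes) -> bytes:
    magic = data[:4]
    (version,) = struct.unpack("<I", data[4:8])
    if magic != MAGIC:
        raise ValueError("not a module")
    if version != VERSION:
        # Stability guarantee: never silently reinterpret bytes
        # produced under a different version of the format.
        raise ValueError(f"unsupported version {version}")
    return data[8:]
```

A compiler IR like LLVM bitcode has no such guarantee: its encoding is free to change between releases, which is exactly why PNaCl had to freeze its own derivative.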
@dschuff I'm familiar with PNaCl, and also passively subscribed to llvmdev (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/086881.html) I agree the IR is not a perfect fit outside of its target environment. Given the parallels, it would be advantageous to leverage much of the necessary work that has already been done in the area of virtual ISAs. Considering the same toolset will be used, it ought to be an appropriate suggestion. In particular, if the IR bitcode could benefit from a small/compressed distribution representation or fast decoding characteristics, even deriving from LLVM-IR would provide a good starting point.
Hi Jay,

Two important goals of the binary format are that it is easily polyfillable […]
Even for WebKit, which uses LLVM for its top tier optimizing JIT, an LLVM IR-based assembly as input would be pretty bad. We want to be able to baseline JIT or interpret wasm at a reasonable level of performance for cold start situations while the optimizer is still spooling up, and our experience shows that LLVM IR is prohibitively expensive to interpret or baseline compile. I like that wasm has baseline JITability as a first class goal, and I don't think that would be achievable if we used LLVM as a starting point. -Fil
I actually don't consider "easy" polyfilling to JS to be a goal of the binary format. The polyfill will be important for maybe a year; the binary format much longer. (Clearly LLVM IR can be acceptably polyfilled because emscripten does exactly that already). But as @pizlonator said, acceptably fast compilation/interpretation is another significant problem with LLVM IR (a problem for PNaCl as well that I missed in my first list) because it's very low-level. Most of our exploration for the binary format has been with AST-style IR; I would actually like to see a little more exploration of a CFG-style IR before finalizing something. It's possible that there's something in the CFG-style space that's higher-level (e.g. more machine operations per VM operation) and so can compile fast enough, and can also compress well and express the constructs that a variety of non-JS-style languages need.
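The AST-style versus CFG-style distinction can be sketched with a toy example. Below, the same expression `(a + b) * c` is encoded once as a nested AST (structure implicit in the nesting, as in wasm's eventual structured encoding) and once as a flat register-style instruction list with explicit temporaries (closer to LLVM IR). The tuple encodings and evaluators are invented for illustration, not any real format.

```python
# AST-style: one nested node per source-level expression.
ast = ("mul", ("add", ("local", 0), ("local", 1)), ("local", 2))

# CFG/register-style: flat instructions writing explicit temporaries.
cfg = [
    ("add", "t0", "l0", "l1"),
    ("mul", "t1", "t0", "l2"),
]

def eval_ast(node, locals_):
    # Recursive walk: the tree shape *is* the evaluation order.
    op = node[0]
    if op == "local":
        return locals_[node[1]]
    lhs, rhs = eval_ast(node[1], locals_), eval_ast(node[2], locals_)
    return lhs + rhs if op == "add" else lhs * rhs

def eval_cfg(instrs, locals_):
    # Linear sweep over instructions, with a named-temporary environment.
    env = {f"l{i}": v for i, v in enumerate(locals_)}
    for op, dst, a, b in instrs:
        env[dst] = env[a] + env[b] if op == "add" else env[a] * env[b]
    return env[instrs[-1][1]]
```

The trade-off hinted at in the comment: the AST form needs no temporary names (better compression), while the flat form is closer to what a baseline compiler emits directly.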
This is a great question to answer on the FAQ.
Not my field, but as someone interested in bringing a large codebase (www.godotengine.org) to this platform, and based on my experiments with asm.js and PNaCl, I would say that having a bytecode that can be compiled AOT, as fast as possible, with as little resource usage and generating as few internal structures as possible, would be very desirable. JIT sounds nice in theory, but you want your high performance code to run with as few stalls and as deterministically as possible. This should be much more of a priority than having your code "start quickly". TL;DR: Please go the AOT route; for many applications it's much better to wait a few seconds at first than to have random stalls later.
We expect that the choice of JIT vs AOT will not be specified but be up to the implementation, as it is today for both asm.js and PNaCl. If I recall correctly, Firefox and Chrome compile asm.js code AOT (and PNaCl in the case of Chrome), whereas Safari does not. Not sure about Edge. |
@dschuff That's fine, and I understand it's a fair requirement for having something out of the door as soon as possible, but my point is that whatever bytecode format is chosen should be as close to native as possible (in the number of steps and resources required to generate code from it), so implementations can eventually do AOT efficiently.
Correction: TurboFan (the engine V8 uses for asm.js) has no baseline JIT or interpreter, but it doesn't compile the whole module AOT, just function-by-function on-demand. So in that sense it's really a JIT. |
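Function-by-function on-demand compilation, as described for TurboFan here, can be sketched as a lazy cache keyed by function name. This toy uses Python source and `exec` as a stand-in for actual machine-code generation; the class and its API are invented for illustration.

```python
class LazyModule:
    """Compile each function on first call, then reuse the result."""

    def __init__(self, sources):
        self.sources = sources   # name -> source text (stand-in for bytecode)
        self.compiled = {}       # name -> callable, filled on demand

    def call(self, name, *args):
        if name not in self.compiled:
            # Compilation happens at first call, not at module load:
            # JIT-style scheduling even without a baseline tier.
            env = {}
            exec(self.sources[name], env)
            self.compiled[name] = env[name]
        return self.compiled[name](*args)
```

Functions that are never called are never compiled, which is the sense in which such an engine "is really a JIT" even though each function, once compiled, is fully optimized.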
@dschuff ok, I can understand if it's too much of a task to adapt existing JIT engines right now to AOT for it to be included in MVP, but my point stands in the sense that you might have to switch to this approach in the future. I think the Firefox guys already made their point about how well AOT works, and I'm sure that heavy users of wasm (i.e., Unity or Unreal) will eventually demand this (compile times as fast as possible, and eventually AOT) in order for their software to work smoothly from the start.
Wasm is very close to the metal and is deliberately designed to enable rapid conversion to a low level compiler IR. In this regard it is better than other representations, including LLVM IR. LLVM IR is only a good low level representation if you have a specific CPU in mind and the implementation uses LLVM as the compiler. Wasm is more portable and easily supports multiple CPUs. It's also very easy to convert back into LLVM IR if your compiler is LLVM based. So, on that point, you're preaching to the choir. We've already engineered it to fit your requirements.

An entirely separate question is the scheduling and caching policy that the implementation uses for compiling wasm. This is independent of the format; even if we chose LLVM IR or any other extant format, the implementations could still choose from a bunch of different strategies. This is largely out of scope of the wasm discussion, and I don't think that the spec should mandate anything in particular.

As an aside, the bet that WebKit is making is that a low power JIT for fast start up, paired with a heavy optimizing JIT once a function eats up more than a handful of microseconds of CPU time, is exactly what you need if you care about applications loading quickly and running efficiently. It will be fun to see how this compares to the approaches that the other engines take. -Filip
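The tier-up bet described above ("low power JIT first, heavy optimizing JIT once a function eats up more than a handful of microseconds") can be sketched as a simple policy object. The class, threshold, and mechanism here are invented for illustration; real engines count samples or bytecode executions rather than timing every call.

```python
import time

class TieredFunction:
    """Run a cheap baseline version until accumulated runtime crosses
    a budget, then switch permanently to an optimized version."""

    def __init__(self, baseline, optimized, budget_s=1e-5):
        self.baseline, self.optimized = baseline, optimized
        self.budget_s = budget_s     # "handful of microseconds" of CPU time
        self.spent = 0.0
        self.tier = "baseline"

    def __call__(self, *args):
        if self.tier == "baseline":
            start = time.perf_counter()
            result = self.baseline(*args)
            self.spent += time.perf_counter() - start
            if self.spent > self.budget_s:
                self.tier = "optimized"   # hot enough: tier up
            return result
        return self.optimized(*args)
```

Cold functions never pay optimization cost; hot functions pay it once, which is the start-up-versus-throughput trade this subthread is debating.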
#194 codifies the ideas mentioned here. |
Hi Juan,

[…]
As for WebAssembly, we're still prototyping and want to leave some room for —
With #194 merged, seems like we can close this out. |
After talking to some other developers, and before it's too late on this matter: I realize that for some implementations JIT makes more sense, because you get a much better initial response time, but for others AOT makes much more sense (games and audio). I honestly am not so sure it's possible to use a hybrid methodology (AOT first, optimize later) that works flawlessly as described above. I have never seen this implemented and used in real-time scenarios, despite the claims. I know it sounds like it will work, but I have my doubts. So, would it be possible to add to the WebAssembly specification a hint about the preferred compilation method, AOT or JIT? Implementations are free to ignore it, but at least developers can tell how they prefer their software to work. EDIT: Will file a separate issue about this, so I don't flood this unrelated one.
Yes, actually we have toyed with the idea of adding such hints at a per-function level. For example, you can imagine using PGO you might determine hot functions that should be AOT compiled. I suppose under such a scheme you could hint to AOT compile everything. |
We've also discussed hints for hot / cold code. We could even avoid compiling or downloading cold code! |
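The per-function hint idea from the last two comments can be sketched as a planning pass an engine might run over a module. The hint names, the `plan` function, and the example functions are all hypothetical; as noted above, an engine would be free to ignore the hints entirely.

```python
# Hypothetical per-function compile hints, e.g. emitted by a PGO tool.
# Not part of any actual wasm specification.
HINTS = {"decode_frame": "hot", "error_page": "cold", "main": "hot"}

def plan(functions, hints):
    """Partition a module's functions by compile strategy."""
    ahead_of_time, on_demand, skip_download = [], [], []
    for name in functions:
        hint = hints.get(name, "unknown")
        if hint == "hot":
            ahead_of_time.append(name)    # compile eagerly (AOT)
        elif hint == "cold":
            skip_download.append(name)    # maybe don't even fetch it yet
        else:
            on_demand.append(name)        # default: lazy/JIT compile
    return ahead_of_time, on_demand, skip_download
```

Hinting everything "hot" would recover the pure-AOT behavior requested earlier in the thread, while "cold" enables the avoid-downloading idea in the comment above.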
According to the MVP, the WebAssembly binary format has a striking similarity to the LLVM-IR.
Quotes for comparison from the sources given above:
WebAssembly Minimum Viable Product:
LLVM Language Reference:
The LLVM bitcode accomplishes the goals of the MVP binary format. The LLVM human readable assembly language accomplishes the goals of the MVP text format.
Another point to mention is that LLVM-IR can already be compiled to native code for dozens of platforms using any (already existing) LLVM backend, including JavaScript, which accomplishes the Polyfill's goals.
That said, what are the pure advantages (if any) of a new binary and text format over an existing spec and implementation? And second, if there are any compelling advantages at all, how are the two related? Are they in direct competition?
I looked around for any prior recommendations for using LLVM-IR directly, and came up empty. So here it is: Use LLVM-IR directly! It's already built, already compiles into native code for dozens of platforms, already has several front ends including C/C++, it's free, covers many or all of the desired goals, and has a huge community of developers supporting it.