
Why do we need Binaryen-IR? #1520

Closed
Becavalier opened this issue Apr 27, 2018 · 4 comments


@Becavalier

Why do we really need Binaryen IR, since all those optimizations could be done in the LLVM WebAssembly backend?

@kripken
Member

kripken commented Apr 27, 2018

Reasonable question. I'd say that in theory the LLVM WebAssembly backend could do the optimizations Binaryen does, but in practice we do still need Binaryen very much. First, the goals of the projects are different:

  • Not all compilers use LLVM; many compile directly to wasm (like Go, Cheerp, etc.). Binaryen can improve the output of all compilers targeting wasm, not just LLVM-based ones.
  • The LLVM wasm backend compiles LLVM IR to wasm, while Binaryen can also read wasm. That makes it convenient to build tools in Binaryen like ctor-eval and metadce, and to do things like the Souper integration, etc.

Second, the practical optimizations are different as well:

  • LLVM codegen takes much more memory and is much slower than Binaryen's optimizations, because of how the two are designed. For example, LLVM maintains use lists and so forth - those have big benefits, but they also take memory and time. LLVM is also single-threaded while Binaryen optimizes functions in parallel. This is "just" efficiency of the compiler itself, but it can be a big issue when optimizing very large wasm files, and optimizing such files is crucial for code size (whole-program dead code elimination and optimization really helps). Running all of Binaryen's optimizations on a big 10-20MB wasm file takes just seconds, so it can be run frequently as part of a build process.
  • The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is. As a result, some wasm-specific optimizations are harder to do in LLVM: for example, the LLVM wasm backend doesn't represent wasm if instructions, so it neither emits them nor performs if-specific optimizations. It also can't do optimizations like Binaryen's ctor-eval (which uses Binaryen's internal wasm interpreter), etc.
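To illustrate the point about if instructions: an IR with a first-class `if` node can emit wasm's structured instruction directly, while an IR that only models flat branches has to lower the same diamond to blocks and `br_if`s. A hypothetical sketch in wasm text format (the label names are made up for illustration):

```wat
;; With an `if` node in the IR, the structured instruction falls out directly:
(if (result i32)
  (local.get $cond)
  (then (i32.const 1))
  (else (i32.const 2)))

;; An IR that only models branches would instead lower the same diamond to
;; blocks and br_ifs, which is equivalent but larger:
(block $done (result i32)
  (block $not_taken
    (br_if $not_taken (i32.eqz (local.get $cond)))
    (br $done (i32.const 1)))
  (i32.const 2))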

@binji
Member

binji commented Apr 27, 2018

> The LLVM wasm backend compiles LLVM IR to wasm, while Binaryen can also read wasm.

Then why don't we make a tool that reads wasm into LLVM IR?

> LLVM codegen takes much more memory and is much slower than Binaryen's optimizations, because of how the two are designed.

This is definitely an important consideration, but remember that we are talking about an offline tool run by the developer. WebAssembly necessarily depends on the wasm producer to optimize its output so the consumer can be simpler. So it's important to bias toward spending more time optimizing on the developer's machine rather than pushing that time onto the client.

> The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is.

That's mostly true for now, but for how long? We can represent multi-value in LLVM IR via SSA renaming, but how do we handle it in Binaryen IR?

> It also can't do optimizations like Binaryen's ctor-eval

Is there a fundamental limitation in LLVM that prevents this? Or is it just that no one has done the work?

@kripken
Member

kripken commented Apr 27, 2018

> why don't we make a tool that reads wasm into LLVM IR?

That might be worth doing! :) See also WAVM. But it would not make sense for some of the examples given, like metadce (LLVM IR is overkill) or ctor-eval (LLVM can't do that).

> The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is.
>
> That's mostly true for now, but for how long? We can represent multi-value in LLVM IR via SSA renaming, but how do we handle it in Binaryen IR?

Yeah, this might change. And even right now Binaryen IR can't represent some stacky code, as another example.

Overall my intention is to modify Binaryen IR as necessary to enable generating compact wasm. If something like stacky code or multi-value returns doesn't enable significant new compactness, I think we just need to update the Binaryen wasm reader/writer code, and not change the IR.

About multi-value, I don't know the answer yet. Possibly it justifies IR changes, but possibly not: for function returns it's not an issue, and for blocks and ifs my data from years ago suggested it was not likely to help much (I counted things like phis; 1 was by far the most common, and that case is already handled by block/if return values in wasm, which Binaryen utilizes).
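For the curious, the common single-merged-value case mentioned above looks like this in wasm text format (a made-up function for illustration): where LLVM IR would merge the two arms of a diamond with one phi, wasm expresses that one merged value as the result of the `if` itself, with no IR extension needed.

```wat
(func $sign (param $x i32) (result i32)
  ;; In LLVM IR the two arms would merge in a single phi;
  ;; in wasm that one merged value is simply the if's result:
  (if (result i32)
    (i32.lt_s (local.get $x) (i32.const 0))
    (then (i32.const -1))
    (else (i32.const 1))))
```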

For stacky code, I've been meaning to look into an additional optional IR for Binaryen for it. But I don't think I've seen clear data yet that justifies that effort.

> It also can't do optimizations like Binaryen's ctor-eval
>
> Is there a fundamental limitation in LLVM that prevents this? Or is it just that no one has done the work?

It's somewhat awkward to fit into LLVM, I think. The natural place for such things is on the final executable format, to maximize the chance of the code being runnable, which means wasm (or asm.js for that matter) and not LLVM IR. And an interpreter for wasm seems like an odd thing to add to LLVM.

(A full interpreter for LLVM IR could help, though, and that is work that in theory could be done.)

@Becavalier
Author

Thanks all!
