
Why do we need Binaryen-IR? #1520

Closed
Becavalier opened this issue Apr 27, 2018 · 4 comments


@Becavalier

Why do we really need Binaryen IR, since all those optimizations could be done in the LLVM WebAssembly backend?

@kripken
Member

kripken commented Apr 27, 2018

Reasonable question. I'd say that in theory the LLVM WebAssembly backend could do the optimizations Binaryen does, but in practice we do still need Binaryen very much. First, the goals of the projects are different:

  • Not all compilers use LLVM; many compile directly to wasm (like Go, Cheerp, etc.). Binaryen can improve the output of all compilers targeting wasm, not just LLVM-based ones.
  • The LLVM wasm backend compiles LLVM IR to wasm, while Binaryen can also read wasm. That makes it convenient to build tools in Binaryen like ctor-eval and metadce, and to do things like the Souper integration, etc.

Second, the practical optimizations are different as well:

  • LLVM codegen takes much more memory and is much slower than Binaryen's optimizations, because of how the two are designed. For example, LLVM maintains use lists and so forth - those have big benefits, but they also take memory and time. LLVM is also single-threaded while Binaryen optimizes functions in parallel. This is "just" efficiency of the compiler itself, but it can be a big issue when optimizing very large wasm files, and optimizing such files is crucial for code size (whole-program dead code elimination and optimization really helps). Running all of Binaryen's optimizations on a big 10-20MB wasm file takes just seconds, so it can be run frequently as part of a build process.
  • The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is. As a result, some wasm-specific optimizations are harder to do in LLVM: for example, the LLVM wasm backend doesn't represent wasm if instructions, so it neither emits them nor performs if-specific optimizations. It also can't do optimizations like Binaryen's ctor-eval (which uses Binaryen's internal wasm interpreter), etc.
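To illustrate the point about if instructions: an IR with a first-class `if` node can emit wasm's structured instruction directly, while an IR that only models flat branches has to lower the same diamond to blocks and `br_if`s. A hypothetical sketch in wasm text format (the label names are made up for illustration):

```wat
;; With an `if` node in the IR, the structured instruction falls out directly:
(if (result i32)
  (local.get $cond)
  (then (i32.const 1))
  (else (i32.const 2)))

;; An IR that only models branches would instead lower the same diamond to
;; blocks and br_ifs, which is equivalent but larger:
(block $done (result i32)
  (block $not_taken
    (br_if $not_taken (i32.eqz (local.get $cond)))
    (br $done (i32.const 1)))
  (i32.const 2))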

@binji
Member

binji commented Apr 27, 2018

> The LLVM wasm backend compiles LLVM IR to wasm, while Binaryen can also read wasm.

Then why don't we make a tool that reads wasm into LLVM IR?

> LLVM codegen takes much more memory and is much slower than Binaryen's optimizations, because of how the two are designed.

This is definitely an important consideration, but remember that we are talking about an offline tool run by the developer. WebAssembly necessarily depends on the wasm producer to optimize its output so the consumer can be simpler. So it's important to bias toward spending more time optimizing on the developer's machine rather than pushing that time onto the client.

> The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is.

That's mostly true for now, but for how long? We can represent multi-value in LLVM IR via SSA renaming, but how do we handle it in Binaryen IR?

> It also can't do optimizations like Binaryen's ctor-eval

Is there a fundamental limitation in LLVM that prevents this? Or is it just that no one has done the work?

@kripken
Member

kripken commented Apr 27, 2018

> why don't we make a tool that reads wasm into LLVM IR?

That might be worth doing! :) See also WAVM. But it would not make sense for some of the examples given, like metadce (LLVM IR is overkill) or ctor-eval (LLVM can't do that).

> The LLVM wasm backend's internal IRs are less close to wasm than Binaryen's IR is.
>
> That's mostly true for now, but for how long? We can represent multi-value in LLVM IR via SSA renaming, but how do we handle it in Binaryen IR?

Yeah, this might change. And even right now Binaryen IR can't represent some stacky code, as another example.

Overall my intention is to modify Binaryen IR as necessary to enable generating compact wasm. If something like stacky code or multi-value returns doesn't enable significant new compactness, I think we just need to update the Binaryen wasm reader/writer code, and not change the IR.

About multi-value, I don't know the answer yet. Possibly it justifies IR changes, but possibly not: for function returns it's not an issue, and for blocks and ifs my data from years ago suggested it was not likely to help much (I counted things like phis; 1 was by far the most common, and that case is already handled by block/if return values in wasm, which Binaryen utilizes).
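For the curious, the common single-merged-value case mentioned above looks like this in wasm text format (a made-up function for illustration): where LLVM IR would merge the two arms of a diamond with one phi, wasm expresses that one merged value as the result of the `if` itself, with no IR extension needed.

```wat
(func $sign (param $x i32) (result i32)
  ;; In LLVM IR the two arms would merge in a single phi;
  ;; in wasm that one merged value is simply the if's result:
  (if (result i32)
    (i32.lt_s (local.get $x) (i32.const 0))
    (then (i32.const -1))
    (else (i32.const 1))))
```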

For stacky code, I've been meaning to look into an additional optional IR for Binaryen for it. But I don't think I've seen clear data yet that justifies that effort.

> It also can't do optimizations like Binaryen's ctor-eval
>
> Is there a fundamental limitation in LLVM that prevents this? Or is it just that no one has done the work?

It's somewhat awkward to fit into LLVM, I think. The natural place for such things is on the final executable format, to maximize the chance of the code being runnable, which means wasm (or asm.js for that matter) and not LLVM IR. And an interpreter for wasm seems like an odd thing to add to LLVM.

(A full interpreter for LLVM IR could help, though, and that is work that in theory could be done.)

@Becavalier
Author

Thanks all!
