Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run (basic) StackIR optimizations in all binary writes? #6509

Open
kripken opened this issue Apr 17, 2024 · 4 comments
Open

Run (basic) StackIR optimizations in all binary writes? #6509

kripken opened this issue Apr 17, 2024 · 4 comments

Comments

@kripken
Copy link
Member

kripken commented Apr 17, 2024

Atm we enable StackIR optimizations in -O3 and -Os and above. This made sense because their benefit is usually fairly small, around 1-2%, and we didn't want to slow down builds just for that. However, maybe it is worth changing that, for two reasons:

  1. Our past worry about slowing down builds is perhaps not relevant today: Most and perhaps all toolchains using Binaryen are not running it in debug builds. Emscripten for example stopped running wasm-opt in debug builds and even in -O1 (it only runs in -O2+), and other toolchains likewise have fast iteration/debug builds that just skip wasm-opt entirely. If wasm-opt is only run when it is meant to optimize, then there is little harm in running StackIR opts.
  2. StackIR opts improve roundtripping in some cases, which is actually the immediate reason that made me think about this. Things like multivalue end up adding more locals and sets/gets in some cases, and StackIR opts can get rid of a bunch of those. Ideally we'd get to a point where roundtripping a file only shrinks it or keeps it the same size (which may require more than StackIR, but StackIR would be an important part of it). That is, better roundtripping is an additional goal here, beyond the 1-2% that StackIR normally helps.
  3. It would be simpler to just always run StackIR (at least the basic, non-costly parts) all the time, rather than the current system where we have passes to generate and optimize it, and there are various corner cases like what happens if you generate it but then modify BinaryenIR, etc. We can avoid that complexity by always running StackIR in binary writing.

StackIR does have some slower optimizations, which could be enabled only when the user requests a higher optimization level, which the binary writer would check.

(context: #6390 and another approach I am trying to fix that same problem as that PR may end up adding more roundtrip artifacts in rare cases, so I was wondering about ways to mitigate that.)

@tlively
Copy link
Member

tlively commented Apr 18, 2024

This makes sense to me. Unconditionally running StackIR could also motivate us to pay more attention to it.

@kripken
Copy link
Member Author

kripken commented Apr 29, 2024

One issue I noticed now is that the StackIR binary writing path does not support DWARF and source maps. Making it support DWARF looks trivial since that information is only used as a single bool, but I don't know how much work source maps would be.

edit: this has been fixed in #6564

kripken added a commit that referenced this issue May 1, 2024
Helping #6509, this fixes debugging support for StackIR, which makes it more
possible to use StackIR in more places.

The fix is basically just to pass around some more state, and then to call the
parent with "please write debug info" at the correct times, mirroring the
similar calls in BinaryenIRWriter.

The relevant Emscripten tests pass, and the source map test modified
here produces identical output in StackIR and non-StackIR modes (the
test is also simplified to remove --new-wat-parser which is no longer
needed, after which the test can clearly show that StackIR has the same
output as BinaryenIR).
@kripken
Copy link
Member Author

kripken commented Jul 12, 2024

If we want to do this, we'd need to first improve the roundtripping of stacky code, which StackIR opts emit. As noted in emscripten-core/emscripten#22218 it turned out that running StackIR opts early in Emscripten added a few bytes to code size, since we didn't undo the extra code due to stackiness - subsequent loads of the wasm added locals etc., but we weren't doing enough optimizations to remove those.

The particular situation there was that we added several extra locals. If we didn't coalesce them then we'd keep the multiple locals around forever. --coalesce-locals is enough to fix that, but perhaps we can find a way to avoid adding so many locals in the first place (e.g. the binary reading code could reuse temp locals in some cases).

@tlively
Copy link
Member

tlively commented Jul 13, 2024

#6614 will help a bit here; The IRBuilder generates better IR from stacky code than the current binary parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants