CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies #4314

kripken · 2021-11-08T16:54:07Z

The old algorithm can be summarized as: In each basic block, start at the beginning.
Each pair of live locals there might interfere with each other, as they might arrive from
different entry blocks with different values. Afterwards, go through the block and find
overlapping live ranges, and mark interferences there as well.

This is non-linear because at the start of the block we do a double-loop over all
pairs of live locals, which in general can be O(N^2) (N - number of locals). It also
has the downside of ignoring copies: if two locals have overlapping live ranges but
they must have identical values on those ranges, they do not actually interfere,
for example

x = 10;
y = x;
.. // live ranges overlap here
foo(x, y); // live ranges end here.

We can ignore this overlap since the copy shows they are identical there, but the
pass did not take this into account. To some extent other passes can remove such
copies (SimplifyLocals, MergeLocals, RedundantSetElimination), but in general
this was a weak spot for the optimizer.

I realized there is a solution to both these problems: In Wasm, given that we have
a default value for all locals, if a local is live at the start of a block then it must be
live at the end of all the blocks reaching it. That is so because the liveness will
extend backwards all the way to some set of the local, possibly all the way to
the zero-initialization at the start of the function, and it extends that way through
all predecessor blocks. A consequence of this is that there are no interferences
between locals that only occur during a merge: The live ranges include the
predecessor blocks, and theirs, and so forth, until we reach a block where one
of the locals is assigned a value different than the other. That is a necessary and
sufficient condition for intererence, and therefore when processing a block we
only need to look at its contents, and can ignore the merging of control flow,
which allows us to be linear.

More details on this and on the new algorithm in comments in the source, but
the basic idea is that it simply goes through each block in a linear way, finding
which values are assigned to each local (using a numbering of unique values),
and noting which are live at each time. If two locals are live and one is assigned
a value that is not the same as the value in the other, mark them as interfering.

This is of substantial benefit to j2wasm output, I believe because it is common
there to find local subexpression elimination opportunities after inlining, and
each time we find one we add a local. If we inline different functions into the
same target, we may end up with copied locals for each of them. (This was
not noticed in the past because it is very rare on LLVM output, which has
already had inlining and GVN etc. done.) I measured the following benefits:

29% reduction in the total number of vars.
32% reduction in local.gets.
43% reduction in local.sets.
16% reduction in binary size.
96% reduction (that is, 22x less!) struct.sets.

The last is because these extra copies were standing in the way of
optimizations like #4244 which require a very particular pattern in
order to safely fold a struct.set into a struct.new.

There is a small benefit to LLVM output as well, though just a few
percent at best. However, it is enough to be noticeable on some of
the code size tests (which we will need to update after this lands,
fyi @sbc100 ).

This is also faster than the previous pass. It's normally not noticeable
as this pass is not one of the slowest anyhow, but I found some real-world
codebases where the pass becomes 50% faster. I have not found any
case where it is slower than the old algorithm.

Fuzzed over several days to be sure this is correct, and also verified
on the emscripten test suite.

(Note for code review: This causes some large updates to one of the
large existing tests. It's unfortunate, but any change to local indexes
can lead to that, so that test is not that useful here. Might be worth
skimming, but I wouldn't ask that you review that one in depth.)

MaxGraey · 2021-11-09T13:08:54Z

test/lit/passes/Oz.wast

  ;; CHECK-NEXT:   (local.tee $0
-  ;; CHECK-NEXT:    (local.tee $1
+  ;; CHECK-NEXT:    (local.tee $0


Looks like this:

(local.tee $0 (local.tee $0 ... ) )

Also could be optimized as duplicate (redundant) operation

@MaxGraey Yes, see the TODO here: https://github.com/WebAssembly/binaryen/pull/4314/files#diff-4b82a7f223b291ea9808e718d45e91593dc75d9bf684d42386e0f0f480132a16R518 I didn't want to add this in this PR to keep it smaller, but that's a followup.

Thanks, now I see

tlively

Nice! The code here is easier to follow than I would have expected.

src/passes/CoalesceLocals.cpp

tlively · 2021-11-09T19:26:55Z

src/passes/CoalesceLocals.cpp

+      // This is a set. Find the value being assigned to the local.
+      auto* set = (*action.origin)->cast<LocalSet>();
+      Index newValue;
+      if (set->value->is<LocalGet>() || set->value->is<LocalSet>()) {


Could we get the fallthrough value here? Separately, could we consider the case where the value is the default value for the relevant type and set the value back to zeroInit?

Ah, good ideas, adding as TODOs for myself.

Co-authored-by: Thomas Lively <7121787+tlively@users.noreply.github.com>

…set (#4318) Tiny followup to #4314 Also updates some function types in test output, fixing breakage on main after racing landings.

These gains come from WebAssembly/binaryen#4314

trzecieu · 2021-11-10T21:56:32Z

Any chance that this will address also: #3646 ?

kripken · 2021-11-10T22:17:17Z

@trzecieu Hmm, I'm afraid not, this helps with nonlinear runtimes but that issue is about nonlinear memory usage in the case with huge amounts of locals, which would require a different fix (I see there are some ideas in that issue, but I'm not aware of anyone working on them atm).

This removes the old hardcoded value numbering in that pass and makes it use the new code that was split into helper code. The immediate benefit of this is to make the code aware of identical constants: if two locals have the same constant then they do not interfere. Future improvements to numbering will also automatically help here. This changes some constants in existing tests so that they keep testing what they were testing before, and adds new tests for the new benefit here. This implements a proposed TODO from #4314

These gains come from WebAssembly/binaryen#4314

kripken added 28 commits November 7, 2021 07:46

start

893f09f

more

07af19d

work

17eaf4d

work

0e96860

builds

d4a4df0

tests

f35c2a5

tst

5bc5ac2

note

070e119

cleanup

5fa2e82

more

0c3eb31

TODO

176e582

fix

c3b5d30

fix

6028c46

test

ef5dbb3

test

4900688

test

37a4df7

test

538ce73

test

2a745b6

test

d7ca0e9

cleanup/optimize

739ce7b

comment

929a501

comments

d2f3967

comments

1287a12

format

75c32b4

cleanup

ad9a90b

test

ecbbc78

cleanup

7d6a2a1

test

6fabff8

kripken requested review from aheejin and tlively November 8, 2021 16:54

update wasm2js tests too

729ac46

MaxGraey reviewed Nov 9, 2021

View reviewed changes

tlively approved these changes Nov 10, 2021

View reviewed changes

Update src/passes/CoalesceLocals.cpp

652b0ae

Co-authored-by: Thomas Lively <7121787+tlively@users.noreply.github.com>

kripken enabled auto-merge (squash) November 10, 2021 00:42

kripken merged commit c301526 into main Nov 10, 2021

kripken deleted the coacopy branch November 10, 2021 01:01

kripken mentioned this pull request Nov 10, 2021

CoalesceLocals: Remove a redundant tee of the same local as a parent set #4318

Merged

kripken added a commit that referenced this pull request Nov 10, 2021

CoalesceLocals: Remove a redundant tee of the same local as a parent …

7075d2e

…set (#4318) Tiny followup to #4314 Also updates some function types in test output, fixing breakage on main after racing landings.

sbc100 added a commit to emscripten-core/emscripten that referenced this pull request Nov 10, 2021

Rebase codesize tests after binaryen change. NFC

32ce380

These gains come from WebAssembly/binaryen#4314

sbc100 mentioned this pull request Nov 10, 2021

Rebase codesize tests after binaryen change. NFC emscripten-core/emscripten#15467

Merged

sbc100 added a commit to emscripten-core/emscripten that referenced this pull request Nov 10, 2021

Rebase codesize tests after binaryen change. NFC (#15467)

83225a5

These gains come from WebAssembly/binaryen#4314

This was referenced Nov 19, 2021

[NFC] Move value numbering code to a header #4345

Merged

CoalesceLocals: Use ValueNumbering #4355

Merged

mmarczell-graphisoft pushed a commit to GRAPHISOFT/emscripten that referenced this pull request Jan 5, 2022

Rebase codesize tests after binaryen change. NFC (emscripten-core#15467)

c7e74b9

These gains come from WebAssembly/binaryen#4314

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies #4314

CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies #4314

Uh oh!

kripken commented Nov 8, 2021

Uh oh!

MaxGraey Nov 9, 2021 •

edited

Loading

Uh oh!

kripken Nov 9, 2021

Uh oh!

MaxGraey Nov 9, 2021

Uh oh!

tlively left a comment

Uh oh!

Uh oh!

tlively Nov 9, 2021

Uh oh!

kripken Nov 10, 2021

Uh oh!

trzecieu commented Nov 10, 2021

Uh oh!

kripken commented Nov 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies #4314

CoalesceLocals: Rewrite the algorithm to be linear and to ignore copies #4314

Uh oh!

Conversation

kripken commented Nov 8, 2021

Uh oh!

MaxGraey Nov 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kripken Nov 9, 2021

Choose a reason for hiding this comment

Uh oh!

MaxGraey Nov 9, 2021

Choose a reason for hiding this comment

Uh oh!

tlively left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tlively Nov 9, 2021

Choose a reason for hiding this comment

Uh oh!

kripken Nov 10, 2021

Choose a reason for hiding this comment

Uh oh!

trzecieu commented Nov 10, 2021

Uh oh!

kripken commented Nov 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

MaxGraey Nov 9, 2021 •

edited

Loading