Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge-similar-functions size optimization pass, which merges similar functions #4414

Merged
merged 17 commits into from Mar 3, 2022

Conversation

kateinoigakukun
Copy link
Contributor

@kateinoigakukun kateinoigakukun commented Dec 26, 2021

Merge similar functions that only differs constant values (like immediate
operand of const and call insts) by parameterization.
Performing this pass at post-link time can merge more functions across
objects. Inspired by Swift compiler's optimization which is derived from
LLVM's one:
https://github.com/apple/swift/blob/main/lib/LLVMPasses/LLVMMergeFunctions.cpp
https://github.com/llvm/llvm-project/blob/main/llvm/docs/MergeFunctions.rst

The basic ideas here are constant value parameterization and direct callee
parameterization by indirection.

Constant value parameterization is like below:

;;  Before
(func $big-const-42 (result i32)
  [[many instr 1]]
  (i32.const 44)
  [[many instr 2]]
)
(func $big-const-43 (result i32)
  [[many instr 1]]
  (i32.const 45)
  [[many instr 2]]
)

;; After
(func $byn$mgfn-shared$big-const-42 (result i32)
  [[many instr 1]]
  (local.get $0) ;; parameterized!!
  [[many instr 2]]
)
(func $big-const-42 (result i32)
  (call $byn$mgfn-shared$big-const-42
   (i32.const 42)
  )
)
(func $big-const-43 (result i32)
  (call $byn$mgfn-shared$big-const-42
   (i32.const 43)
  )
)

Direct callee parameterization is similar to the constant value parameterization,
but it parameterizes callee function i by ref.func instead. Therefore it is enabled
only when reference-types and typed-function-references features are enabled.

I saw more 1 ~ 2 % reduction for SwiftWasm binary and Ruby's wasm port
using wasi-sdk, and 3 ~ 4.5% reduction for Unity WebGL binary when Oz.

Oz Benchmark

  • original: The original .wasm file (dwarf sections and name section are stripped)
  • old: Optimized by the latest version (104) of wasm-opt
  • new: Optimized by wasm-opt built from this branch
  • new-with-features: Optimized by wasm-opt built from this branch and direct callee parameterization enabled
    (with --enable-reference-types --enable-typed-function-references)

Code size

original (B) old new new-with-features
rust_game_of_life_bg.wasm 36765 36150 (-1.67%) 35855 (-2.48%) 35855 (-2.48%)
swift-hello.wasm 4825211 2968719 (-38.47%) 2928767 (-39.3%) 2907414 (-39.75%)
chibi-link.wasm (by Swift) 21873215 9133106 (-58.25%) 8907066 (-59.28%) 8809156 (-59.73%)
ruby.wasm 8592839 8428193 (-1.92%) 8302506 (-3.38%) 8270963 (-3.75%)
unity-3d-webgl-template.wasm 7986610 7725181 (-3.27%) 7482455 (-6.31%) 7374878 (-7.66%)

unity-3d-webgl-template.wasm is built from Unity's 3D project template targeting WebGL.

This histogram shows the frequency of diff of reduction (%) compared between old and new-with-features, targeting over 1000 npm packages.

npm_hist (1)

Optimizer performance

old (ms) new (ms)
rust_game_of_life_bg.wasm 59.8 ± 1.2 57.4 ± 1.5
swift-hello.wasm 5.176 ± 0.160 4.892 ± 0.074
ruby.wasm 13.510 ± 0.278 12.557 ± 0.335
unity-3d-webgl-template.wasm 9.801 ± 0.179 9.291 ± 0.064

Measured on Mac mini (M1 2020)

Runtime performance

old (ms) new (ms)
swift-hello.wasm 55.4 ± 0.7 56.6 ± 0.5
ruby.wasm 87.3 ± 0.6 90.8 ± 0.9

Measured on Mac mini (M1 2020) and wasmtime 0.31.0

@kripken
Copy link
Member

kripken commented Jan 4, 2022

Interesting, thanks for the PR @kateinoigakukun ! I have not looked at the code yet, but some initial thoughts:

  • Have you seen Explore Similar Function Elimination (SFE) optimization #1142 ? There is some discussion and investigation there of related things. The numbers there are significantly different than here, so I'm curious how the approach (or the data) differs.
  • Is there data on how good the Swift one is?
  • This could be done as a change to the existing DuplicateFunctionElimination pass perhaps: extending the hashing and adding more replacement there. I'm not sure if that's better or not, just curious if you've considered it?

@kateinoigakukun
Copy link
Contributor Author

@kripken

Have you seen Explore Similar Function Elimination (SFE) optimization #1142 ?

I haven't seen it yet, thank you for sharing!
I'm not familiar with asm.js, but as far as I read the code of the tool, my approach does almost the same things.
One possible reason for the regression of my approach is that it accepts only the difference of literals and callees, but the SFE tool does for literals and identifiers. So it gets more merge opportunities, I guess. (I'm not sure what kinds of things can appear as identifiers)

Is there data on how good the Swift one is?

According to the PR, which introduces the pass, it gives ~7% reduction in Swift stdlib. apple/swift#2462

This could be done as a change to the existing DuplicateFunctionElimination pass perhaps: extending the hashing and adding more replacement there. I'm not sure if that's better or not, just curious if you've considered it?

Yes, I've considered it before. DuplicateFunctionElimination is always beneficial from performance and code size aspects. but MergeFunctions is not beneficial in the performance aspect due to thunk overhead, so I decided to create a new pass.

@kripken
Copy link
Member

kripken commented Jan 6, 2022

(I'm not sure what kinds of things can appear as identifiers)

Hmm, I think for asm.js probably function names and global names were relevant.

Yes, I've considered it before. DuplicateFunctionElimination is always beneficial from performance and code size aspects. but MergeFunctions is not beneficial in the performance aspect due to thunk overhead, so I decided to create a new pass.

Makes sense. I agree they should be run at different times/contexts. But I think we can still share code between them if that is useful. See for example the inlining pass which has an "optimize" parameter, and the simplify-locals pass has several. But I'm not sure how much code could be shared, so it's just a thought.

@kateinoigakukun
Copy link
Contributor Author

But I think we can still share code between them if that is useful.

I took a look at DuplicateFunctionElimination again, but I can't see a sharable code because DFE is pretty simple, I think.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done reading yet but here are some comments so far.

src/passes/pass.cpp Show resolved Hide resolved
test/passes/merge-functions.wast Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
@kateinoigakukun kateinoigakukun force-pushed the katei/merge-funcs branch 2 times, most recently from aa446d6 to afa90fe Compare January 14, 2022 01:53
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please measure the speed impact of these hashing changes. Probably fine, but just to be sure. You can try --duplicate-function-elimination on a large wasm file (let me know if you need a link to one) as that will hash a lot.

test/lit/passes/merge-functions.wast Outdated Show resolved Hide resolved
test/lit/passes/merge-functions_all-features.wast Outdated Show resolved Hide resolved
test/lit/passes/merge-functions_all-features.wast Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
@kateinoigakukun kateinoigakukun force-pushed the katei/merge-funcs branch 2 times, most recently from b6e51b6 to 6beec1b Compare February 13, 2022 15:12
@kateinoigakukun
Copy link
Contributor Author

@kripken Sorry for late reply.

Please measure the speed impact of these hashing changes. Probably fine, but just to be sure. You can try --duplicate-function-elimination on a large wasm file (let me know if you need a link to one) as that will hash a lot.

OK, I measured DFE performance on chibi-link.wasm (which contains 63135 functions), and doesn't see big regression 👍

Command Mean [s] Min [s] Max [s] Relative
wasm-opt.new --duplicate-function-elimination chibi-link.wasm -o /dev/null 1.595 ± 0.010 1.584 1.618 1.00 ± 0.01
wasm-opt.old --duplicate-function-elimination chibi-link.wasm -o /dev/null 1.595 ± 0.012 1.586 1.626 1.00

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about MergeFunctions => MergeSimilarFunctions? I think that would be less ambiguous as it clarifies this is about similarity (otherwise it's not clear how this differs from the pass for duplicate function elimination, which also merges functions).

src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
// Canonicalize locals indices to make comparison easier.
PassRunner runner(module);
runner.setIsNested(true);
runner.add("reorder-locals");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not needed as the pass is scheduled in the global post passes, so it is after this function pass has been run. I guess it might help if the pass is run manually by itself, but I don't think it's worth the extra wasted work in the normal case for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. And I confirmed no size regression after removing reorder-locals in this pass.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good! Sorry for the late review.

src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
if (primaryFunction->imported()) {
return false;
}
DeepValueIterator primaryIt(&primaryFunction->body);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why gives the first function a special named iterator? Can't we have a vector of iterators for them all, including the primary?

If it is to make the code a little simpler, then I think I can see that. Maybe the comment on 380 could mention that then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a simpler way 🤔

src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
src/passes/MergeFunctions.cpp Outdated Show resolved Hide resolved
@kateinoigakukun kateinoigakukun changed the title merge-functions size optimization pass, which merges similar functions merge-similar-functions size optimization pass, which merges similar functions Mar 2, 2022
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Code looks good to me. The last thing is to fuzz this carefully. Basically you can run ./scripts/fuzz_opt.py for several hours to see if it finds anything. I'll also run some fuzzing locally.

src/passes/MergeSimilarFunctions.cpp Outdated Show resolved Hide resolved
@kateinoigakukun
Copy link
Contributor Author

@kripken Yeah, I've also started fuzz test 👍

@kripken
Copy link
Member

kripken commented Mar 2, 2022

This is looking pretty good to me. I've fuzzed 40,000 iterations, and also ran the emscripten -Oz and other test suites. Maybe worth leaving the the fuzzer overnight as a last check.

@kateinoigakukun
Copy link
Contributor Author

Update: I've fuzzed 120,000 iterations for now, and no bug related to this was found

@kripken
Copy link
Member

kripken commented Mar 3, 2022

Thanks, looks good in my fuzzing as well. Landing, thanks @kateinoigakukun ! This is a very nice feature for -Oz builds.

@kripken kripken merged commit 6247e7c into WebAssembly:main Mar 3, 2022
@kateinoigakukun
Copy link
Contributor Author

Thank you @kripken !

@kateinoigakukun kateinoigakukun deleted the katei/merge-funcs branch March 4, 2022 03:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants