Transition to regalloc2 #3942
Comments
(Can we add spidermonkey.wasm and clang.wasm to Sightglass?)
We could perhaps, yeah, with some hackery (building a toplevel harness mostly). In the SpiderMonkey case we need to add a WASI directory capability and feed in a JS file, and in the clang case we need a way to tell the infra that it's compile-only (I don't know how to run it). For now it's not too bad to run by hand though :-)
A little more benchmarking -- taking most of the modules from #911 and compiling with baseline and regalloc2:
In almost all cases things got faster, sometimes significantly so (3.18s -> 0.61s, 1.2s -> 0.26s, 2.7s -> 0.061s (!)). This tracks with my understanding of some of the bottlenecks I saw in profiling before, and with the efforts in regalloc2 to keep away from quadratic explosions and nonlinear behavior in general as far as possible. Some of the smaller modules see some increases (0.137s -> 0.365s, 0.057s -> 0.284s); I haven't conclusively resolved what's going on in those, but it wouldn't surprise me if this comes from splitting heuristics being a little more aggressive. In any case nothing immediately jumps out in the profile.
This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942:

```
Benchmark         Compilation (wallclock)   Execution (wallclock)
blake3-scalar     25% faster                28% faster
blake3-simd       no diff                   no diff
meshoptimizer     19% faster                17% faster
pulldown-cmark    17% faster                no diff
bz2               15% faster                no diff
SpiderMonkey,     21% faster                2% faster
  fib(30)
clang.wasm        42% faster                N/A
```
This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.
The major tasks remaining are:
The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)
The nature of the changes to Cranelift is such that we do have to do the transition atomically and remove regalloc.rs support at the same time; the whole MachInst infrastructure is basically built up around the regalloc abstractions, so swapping it out has a large effect. Fortunately, though, I think there is not too much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review) -- performance numbers look good.
Here is a current snapshot of some benchmark results:
with full details here:
Benchmark methodology and raw output

`wasmtime run` once to ensure compiled
`wasmtime compile` 5x, take best of five

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from Mar 16) against my internal regalloc2 branch 9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17), which last synced with Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).
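For anyone who wants to repeat the by-hand "5x, take best of five" measurement, here is a minimal sketch of how it could be scripted. It is not the script used for the numbers in this issue: the helper names `best_of` and `compile_module` and the module path are made up for illustration, and it assumes a `wasmtime` binary is on `PATH`.

```python
import subprocess
import time
from typing import Callable


def best_of(runs: int, action: Callable[[], None]) -> float:
    """Run `action` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    # Taking the minimum filters out runs disturbed by other system activity.
    return min(timings)


def compile_module(wasm_path: str) -> None:
    """Invoke `wasmtime compile` on one module (hypothetical helper)."""
    subprocess.run(["wasmtime", "compile", wasm_path], check=True)


# Example use, with a placeholder path:
#   seconds = best_of(5, lambda: compile_module("path/to/module.wasm"))
#   print(f"best of five: {seconds:.3f}s")
```

Running the same script once with the baseline build and once with the regalloc2 build gives two best-of-five times to compare.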
Raw output of Sightglass below (instantiation excluded, not interesting).
compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm
Δ = 121531866.00 ± 51042761.18 (confidence = 99%)
new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!
[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so
compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm
Δ = 31981472.00 ± 13432120.92 (confidence = 99%)
new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!
[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so
execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm
Δ = 36931.50 ± 3272.72 (confidence = 99%)
new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!
[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so
execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm
Δ = 140341.60 ± 12437.21 (confidence = 99%)
new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!
[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so
compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so
compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so
execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so
execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so
compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 483775336.20 ± 24646158.96 (confidence = 99%)
new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!
[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so
compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 127275628.40 ± 6480546.57 (confidence = 99%)
new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!
[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so
execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)
new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!
[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so
execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 891020039.40 ± 119694835.02 (confidence = 99%)
new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!
[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so
compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm
Δ = 213252595.20 ± 29303757.92 (confidence = 99%)
new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!
[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so
compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm
Δ = 56118120.00 ± 7711578.76 (confidence = 99%)
new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!
[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so
execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so
execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so
compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm
Δ = 58684068.80 ± 36909440.37 (confidence = 99%)
new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!
[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so
compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm
Δ = 15436153.00 ± 9714229.01 (confidence = 99%)
new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!
[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so
execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so
execution :: cycles :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
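As a reading aid for the output above: the "1.14x to 1.34x faster" ranges can be reproduced from the printed means and the Δ ± half-width. The sketch below is inferred from the numbers in this issue, not taken from Sightglass's source, so treat the formulas as an assumption; the function names are made up for illustration.

```python
def speedup_range(new_mean: float, old_mean: float, half_width: float) -> tuple[float, float]:
    """Range for 'new is Nx faster than old', relative to the new mean."""
    delta = old_mean - new_mean  # the Δ printed by Sightglass
    low = 1 + (delta - half_width) / new_mean
    high = 1 + (delta + half_width) / new_mean
    return low, high


def inverse_range(new_mean: float, old_mean: float, half_width: float) -> tuple[float, float]:
    """Range for 'old is Nx faster than new', relative to the old mean."""
    delta = old_mean - new_mean
    low = 1 - (delta + half_width) / old_mean
    high = 1 - (delta - half_width) / old_mean
    return low, high


# compilation :: cycles :: blake3-scalar, values copied from the output above
new_mean = 501410277.40
old_mean = 622942143.40
half_width = 51042761.18

low, high = speedup_range(new_mean, old_mean, half_width)
print(f"new.so is {low:.2f}x to {high:.2f}x faster than old.so")  # 1.14x to 1.34x

low_inv, high_inv = inverse_range(new_mean, old_mean, half_width)
print(f"old.so is {low_inv:.2f}x to {high_inv:.2f}x faster than new.so")  # 0.72x to 0.89x
```

A ratio below 1 on the second line just restates that old.so is slower. Where the interval around Δ includes zero, the output instead reports "No difference in performance."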