Add weval support to PBL (#48)
I configured this repo to run the GitHub workflows (they were disabled for some reason) and now we have our automated tests running again 🥳
This PR modifies the main PBL interpreter to support specialization by
partial evaluation using the weval tool [1], producing compiled
functions for JS function bodies and IC bodies.
Because partial evaluation allows us to reuse the interpreter body as
the definition of the compiler's output, the changes are fairly
self-contained: we "register" specialization requests that create
new functions from the combination of the interpreter and some bytecode,
and we use the specialized function when it exists.
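The overall idea (the first Futamura projection) can be illustrated with a toy Python sketch. The interpreter, bytecode format, and `specialize` helper below are invented for illustration and are not PBL's or weval's actual API:

```python
# Toy illustration of interpreter specialization (first Futamura
# projection); the bytecode and all names here are invented, not PBL's.

def interp(bytecode, arg):
    """Generic stack interpreter: runs any bytecode on one argument."""
    stack = [arg]
    for op in bytecode:
        if op == "dup":
            stack.append(stack[-1])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif isinstance(op, int):   # push an integer constant
            stack.append(op)
    return stack.pop()

def specialize(bytecode):
    """'Register a specialization request': pair the generic interpreter
    with one fixed bytecode, yielding a dedicated function for it."""
    def specialized(arg):
        # A real partial evaluator would unroll the dispatch loop here
        # and residualize straight-line code; we just close over bytecode.
        return interp(bytecode, arg)
    return specialized

double_plus_one = specialize(["dup", "add", 1, "add"])
assert double_plus_one(10) == 21
```

weval performs the analogous transformation on the compiled Wasm interpreter loop itself, so the residual function contains no dispatch.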
As an optimization, we also modify the macros used in the interpreter
body to make use of some weval intrinsics, allowing weval to more
efficiently support the operand stack and some other details. These
optimizations are unnecessary for correctness, but provide much better
performance in some cases.
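One way to picture what the intrinsics buy: once the bytecode is fixed, every operand-stack slot has a statically known depth, so push/pop traffic through memory can be renamed to direct values. A toy Python sketch of that renaming (invented, not weval's actual intrinsics):

```python
# Toy sketch (invented): abstract-interpret the operand stack
# symbolically, so each stack slot becomes an expression instead of
# memory traffic -- roughly what a known-depth stack enables.

def compile_to_expressions(bytecode):
    stack = ["arg"]                     # symbolic stack of expressions
    for op in bytecode:
        if op == "dup":
            stack.append(stack[-1])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} + {b})")
        elif isinstance(op, int):
            stack.append(str(op))
    body = stack.pop()                  # e.g. "((arg + arg) + 1)"
    return eval(f"lambda arg: {body}")

f = compile_to_expressions(["dup", "add", 1, "add"])
assert f(10) == 21
```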
This PR, when used to perform ahead-of-time compilation, provides quite
significant speedups over interpreted PBL:
```plain
               generic interp   old PBL   new PBL   wevaled PBL (this PR)
Richards                  166       214       263                     729
DeltaBlue                 169       254       287                     686
Crypto                    412       456       410                    1255
RayTrace                  525       660       761                    1315
EarleyBoyer               728       941      1227                    2561
RegExp                    271       301       358                     461
Splay                    1262      1664      1889                    3258
NavierStokes              656       623       601                    2255
PdfJS                    2182      2055      2423                    5991
Mandreel                  166       189       211                     503
Gameboy                  1357      1548      1552                    4659
CodeLoad                19417     18644     17350                   17488
Box2D                     927       995       978                    3745
-------------------------------------------------------------------------
Geomean                   821       943      1039                    2273
```
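As a quick sanity check on the table (Octane scores, higher is better), the geomean row works out to roughly a 2.2x improvement over interpreted new PBL and 2.8x over the generic interpreter:

```python
# Ratios computed from the Geomean row of the table above.
wevaled, new_pbl, generic = 2273, 1039, 821

assert round(wevaled / new_pbl, 2) == 2.19   # vs. interpreted new PBL
assert round(wevaled / generic, 2) == 2.77   # vs. generic interpreter
```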
[1]: https://github.com/cfallin/weval
Rebased and ready for review!

The intermittent jit-test failure from #47 is back; I still can't reproduce it locally, even when running in a loop to try to hit any nondeterministic issues. I've pushed a commit to disable the `gc/bug-1791975.js` jit-test under PBL for now. The failure involves OOM handling and incremental GC (which we don't make use of in single-threaded Wasm builds), and the original patch that the test covers doesn't seem to touch anything that PBL does. It's always possible I'm missing something here, but in the meantime the risk seems low to me...
This pulls in work from:

- bytecodealliance/gecko-dev#45 (Add ahead-of-time ICs.)
- bytecodealliance/gecko-dev#46 (JS shell on WASI: add basic Wizer integration for standalone testing.)
- bytecodealliance/gecko-dev#47 (Update PBL for performance and in preparation for applying weval.)
- bytecodealliance/gecko-dev#48 (Add weval support to PBL.)

as originally PR'd onto a SpiderMonkey v124.0.2 branch, then rebased to v127.0.2 in bytecodealliance/gecko-dev#51.
This PR pulls in my work to use "weval", the WebAssembly partial evaluator, to perform ahead-of-time compilation of JavaScript using the PBL interpreter we previously contributed to SpiderMonkey. This work has been merged into the BA fork of SpiderMonkey in bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46, bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48, bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52, bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54, bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot` flag to be passed to `js-compute-runtime-cli.js`. This requires a separate build of the engine Wasm module to be used when the flag is passed.

This should still be considered experimental until it is tested more widely. The PBL+weval combination passes all jit-tests and jstests in SpiderMonkey, and all integration tests in StarlingMonkey; however, it has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks) are in the 3x-5x range. This is roughly equivalent to the speedup that a native JS engine's "baseline JIT" compiler tier gets over its interpreter, and it uses the same basic techniques -- compiling all polymorphic operations (all basic JS operators) to inline-cache sites that dispatch to stubs depending on types. Further speedups can be obtained eventually by inlining stubs from warmed-up IC chains, but that requires warmup.

Important to note is that this compilation approach is *fully ahead-of-time*: it requires no profiling or observation or warmup of user code, and compiles the JS directly to Wasm that does not do any further codegen/JIT at runtime. Thus, it is suitable for the per-request isolation model (new Wasm instance for each request, with no shared state).
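The inline-cache dispatch described above can be sketched in a few lines of Python. This is a highly simplified invented illustration, not SpiderMonkey's actual CacheIR machinery:

```python
# Toy inline-cache site for a polymorphic '+' operator (invented
# illustration): the site caches one stub per observed type pair and
# dispatches on operand types, so hot calls skip generic handling.

class AddICSite:
    def __init__(self):
        self.stubs = {}   # (type, type) -> specialized stub

    def call(self, a, b):
        key = (type(a), type(b))
        stub = self.stubs.get(key)
        if stub is None:
            # Miss: install a stub for this type pair. (The ahead-of-time
            # ICs work in #45 pre-attaches such stubs before execution.)
            if key == (int, int):
                stub = lambda x, y: x + y   # integer fast path
            elif key == (str, str):
                stub = lambda x, y: x + y   # string concatenation path
            else:
                raise TypeError(f"no stub for {key}")
            self.stubs[key] = stub
        return stub(a, b)

site = AddICSite()
assert site.call(1, 2) == 3          # first call installs the int stub
assert site.call("a", "b") == "ab"   # a separate stub for strings
```

In Python the two lambdas are identical, but they stand in for distinct machine-code stubs; the point is the per-type dispatch table at the call site.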
I just read your blog post at https://cfallin.org/blog/2024/08/27/aot-js/. It sounds like I can just take any JS project and convert it to Wasm for a possible speed increase. Is this true, and are there any more specific instructions to use this tech for projects?

@kungfooman with a lot of nuances: