
Add weval support to PBL. #48

Merged
cfallin merged 3 commits into bytecodealliance:fastly/ff-124-0-2 from cfallin:cfallin/pbl-weval-ff124 on Jul 19, 2024

Conversation

@cfallin
Member

@cfallin cfallin commented Jul 18, 2024

This PR modifies the main PBL interpreter to support specialization by partial evaluation using the weval tool (https://github.com/cfallin/weval), producing compiled functions for JS function bodies and IC bodies.

Because partial evaluation allows us to reuse the interpreter body as the definition of the compiler output, the changes are fairly self-contained: we "register" the "specialization requests" to create new functions from the combination of the interpreter and some bytecode, and we use that specialized function when it exists.

As an optimization, we also modify the macros used in the interpreter body to make use of some weval intrinsics, allowing weval to more efficiently support the operand stack and some other details. These optimizations are unnecessary for correctness, but provide much better performance in some cases.
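To illustrate what the operand-stack intrinsics buy (a hand-written before/after, not actual weval output or intrinsic names): once the bytecode is a known constant, the stack depth at every program point is also a constant, so each stack slot can be renamed to a local and the memory traffic disappears.

```cpp
#include <cassert>
#include <cstdint>

// Generic shape of a PUSH/PUSH/ADD sequence: every stack operation
// goes through memory via a stack pointer known only at run time.
static int64_t GenericAdd() {
  int64_t stack[8];
  int sp = 0;
  stack[sp++] = 2;            // PUSH 2
  stack[sp++] = 3;            // PUSH 3
  int64_t rhs = stack[--sp];  // ADD: pop rhs...
  stack[sp - 1] += rhs;       // ...add into lhs slot
  return stack[sp - 1];       // RETURN
}

// After specialization on constant bytecode, each slot becomes a
// local (an SSA value in the compiled Wasm); the loads, stores, and
// stack-pointer arithmetic are gone.
static int64_t SpecializedAdd() {
  int64_t slot0 = 2;     // stack slot 0, promoted to a local
  int64_t slot1 = 3;     // stack slot 1, promoted to a local
  return slot0 + slot1;  // ADD folded onto locals
}
```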

This PR, when used to perform ahead-of-time compilation, provides quite significant speedups over interpreted PBL (Octane scores; higher is better):

```plain
              generic interp  old PBL     new PBL   wevaled PBL (this PR)
Richards        166             214         263       729
DeltaBlue       169             254         287       686
Crypto          412             456         410      1255
RayTrace        525             660         761      1315
EarleyBoyer     728             941        1227      2561
RegExp          271             301         358       461
Splay          1262            1664        1889      3258
NavierStokes    656             623         601      2255
PdfJS          2182            2055        2423      5991
Mandreel        166             189         211       503
Gameboy        1357            1548        1552      4659
CodeLoad      19417           18644       17350     17488
Box2D           927             995         978      3745
----
Geomean         821             943        1039      2273
```

@cfallin cfallin requested a review from JakeChampion July 18, 2024 05:21
@JakeChampion JakeChampion force-pushed the cfallin/pbl-weval-ff124 branch from 15fbe7a to 58a892a on July 19, 2024 11:04
@JakeChampion
Collaborator

I configured this repo to run the GitHub workflows (they had been disabled for some reason), and now we have our automated tests running again 🥳

cfallin added 2 commits July 19, 2024 08:44
@cfallin cfallin force-pushed the cfallin/pbl-weval-ff124 branch from 58a892a to a96b3b6 on July 19, 2024 15:45
@cfallin
Member Author

cfallin commented Jul 19, 2024

Rebased and ready for review!

@cfallin
Member Author

cfallin commented Jul 19, 2024

The intermittent jit-test failure from #47 is back; I still can't reproduce it locally, even when running in a loop to flush out any nondeterministic issues. I've pushed a commit to disable the gc/bug-1791975.js jit-test under PBL for now. The failure involves OOM handling and incremental GC (which we don't make use of in single-threaded Wasm builds), and the original patch that the test covers doesn't seem to touch anything that PBL does. It's always possible I'm missing something here, but in the meantime the risk seems low to me...

@cfallin cfallin merged commit 707d85a into bytecodealliance:fastly/ff-124-0-2 Jul 19, 2024
@cfallin cfallin deleted the cfallin/pbl-weval-ff124 branch July 19, 2024 20:27
cfallin added a commit to cfallin/spidermonkey-wasi-embedding that referenced this pull request Jul 25, 2024
This pulls in work from

- bytecodealliance/gecko-dev#45 (Add ahead-of-time ICs.)
- bytecodealliance/gecko-dev#46 (JS shell on WASI: add basic Wizer
  integration for standalone testing.)
- bytecodealliance/gecko-dev#47 (Update PBL for performance and in
  preparation for applying weval.)
- bytecodealliance/gecko-dev#48 (Add weval support to PBL.)

as originally PR'd onto a SpiderMonkey v124.0.2 branch then rebased to
v127.0.2 in bytecodealliance/gecko-dev#51.
cfallin added a commit to cfallin/spidermonkey-wasi-embedding that referenced this pull request Jul 26, 2024
cfallin added a commit to cfallin/js-compute-runtime that referenced this pull request Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial
evaluator, to perform ahead-of-time compilation of JavaScript using the
PBL interpreter we previously contributed to SpiderMonkey. This work has
been merged into the BA fork of SpiderMonkey in
bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46,
bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48,
bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52,
bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54,
bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey
in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot`
flag to be passed to `js-compute-runtime-cli.js`. This requires a
separate build of the engine Wasm module to be used when the flag is
passed.

This should still be considered experimental until it is tested more
widely. The PBL+weval combination passes all jit-tests and jstests in
SpiderMonkey, and all integration tests in StarlingMonkey; however, it
has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks)
are in the 3x-5x range. This is roughly equivalent to the speedup that a
native JS engine's "baseline JIT" compiler tier gets over its
interpreter, and it uses the same basic techniques -- compiling all
polymorphic operations (all basic JS operators) to inline-cache sites
that dispatch to stubs depending on types. Further speedups can be
obtained eventually by inlining stubs from warmed-up IC chains, but that
requires warmup.

Important to note is that this compilation approach is *fully
ahead-of-time*: it requires no profiling or observation or warmup of
user code, and compiles the JS directly to Wasm that does not do any
further codegen/JIT at runtime. Thus, it is suitable for the per-request
isolation model (new Wasm instance for each request, with no shared
state).
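A minimal model of such an inline-cache site follows. All names here are hypothetical, and a real engine's value representation, stub attachment, and slow path are far more involved; this only shows the dispatch-down-a-stub-chain shape:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// A JS-like value: either an int32 or a string.
using Value = std::variant<int32_t, std::string>;

// A stub handles one type combination; it returns false when its type
// guard fails so the IC site can try the next stub in the chain.
using StubFn = bool (*)(const Value&, const Value&, Value* out);

static bool AddInt32(const Value& a, const Value& b, Value* out) {
  if (!std::holds_alternative<int32_t>(a) ||
      !std::holds_alternative<int32_t>(b)) {
    return false;  // type guard failed; fall through
  }
  *out = std::get<int32_t>(a) + std::get<int32_t>(b);
  return true;
}

static bool AddString(const Value& a, const Value& b, Value* out) {
  if (!std::holds_alternative<std::string>(a) ||
      !std::holds_alternative<std::string>(b)) {
    return false;
  }
  *out = std::get<std::string>(a) + std::get<std::string>(b);
  return true;
}

// An IC site for the `+` operator: dispatch down the stub chain, with
// a slow generic path as the final fallback (in a warming JIT, the
// slow path may also attach a new stub for the observed types).
struct AddICSite {
  StubFn stubs_[2] = {AddInt32, AddString};

  Value Call(const Value& a, const Value& b) {
    Value out;
    for (StubFn stub : stubs_) {
      if (stub(a, b, &out)) return out;
    }
    // Slow path: coerce mixed operands to string, as JS `+` would.
    auto str = [](const Value& v) {
      return std::holds_alternative<std::string>(v)
                 ? std::get<std::string>(v)
                 : std::to_string(std::get<int32_t>(v));
    };
    return Value{str(a) + str(b)};
  }
};
```

The fully ahead-of-time property described above corresponds to the stub chain being fixed at compile time: dispatch still happens per call, but no new code is generated at runtime.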
@kungfooman

I just read your blog post at https://cfallin.org/blog/2024/08/27/aot-js/

It sounds like I can just take any JS project and compile it to Wasm for a possible speed increase. Is this true, and are there any more specific instructions for using this tech in projects?

@cfallin
Member Author

cfallin commented Aug 28, 2024

@kungfooman It's true, with a lot of nuances:

  • This is applicable in a context where one is running JS in a Wasm module outside the web already (e.g., server-side Wasm, or plugin Wasm, on a WASI platform), where the state of the art today is running a JS interpreter. If you have "web JS", this isn't applicable.
  • This has been integrated into StarlingMonkey, the Bytecode Alliance's JS runtime built around SpiderMonkey for such use-cases. There, the --aot flag to componentize.sh invokes the compilation flow.
  • This is definitely not a supported product on its own today, outside of the contexts in which it's been integrated. It's more in a "play with it if you know what you're doing" state, with slow rollout in more specific contexts.
