PBL/weval: value-specialize on two-way conditional inputs to avoid issues with if-conversion. #54
Merged
cfallin merged 1 commit into ff-127-0-2 on Jul 31, 2024
Conversation
weval processing requires that a conditional update of the constant
"specialization context" PC value be written like
```
if (cond) {
    weval_update_context(pc1);
    pc = pc1;
    goto dispatch;
} else {
    weval_update_context(pc2);
    pc = pc2;
    goto dispatch;
}
```
However, LLVM is very smart (some would say too smart!), and in a recent
perturbation of my build/debug environment, it started if-converting this
to
```
next_pc = cond ? pc1 : pc2;
weval_update_context(next_pc);
pc = next_pc;
goto dispatch;
```
This is problematic because in the abstract interpretation, we merge two
"constant" lattice values to the bottom element ("runtime") only; we
have no symbolic representation of the "conditional"/"select" value.
Some attempts to prevent the if-conversion by altering one of the paths
(with e.g. a `weval_assert_const32`) were not successful; in any case, we
shouldn't depend on LLVM performing (or skipping) particular
optimizations and emitting one particular form of output.
Rather, this PR updates the logic to use the same solution that
`TableSwitch` (which is fundamentally value-dependent control flow)
uses: invoke the weval intrinsic that "splits" the context on possible
values (here `0` and `1` only). We can then constant-propagate the
`select` (the ternary `?:` above) and recover
statically-constant-at-abstract-interpretation PC values and contexts.
Rebasing note: this should squash into the base "PBL enhancements"
series, not the weval-specific patch.
cfallin
added a commit
to cfallin/spidermonkey-wasi-embedding
that referenced
this pull request
Jul 31, 2024
Two issues discovered while debugging the testsuite in StarlingMonkey:
- bytecodealliance/gecko-dev#54: fixes handling of conditionals by weval when LLVM applies if-conversion in a new place; use of the weval intrinsic for value-specialization / subcontext splitting should make this more robust.
- bytecodealliance/gecko-dev#55: fixes missing interpreter-PC values in stack frames up the stack during unwind, caused by a too-aggressive optimization trick in weval'd bodies.

With these two changes, this version of SpiderMonkey allows the StarlingMonkey integration test suite to pass in bytecodealliance/StarlingMonkey#91.
JakeChampion
approved these changes
Jul 31, 2024
Collaborator
JakeChampion
left a comment
There was a problem hiding this comment.
Thank you for the thorough explanation and the note about how to rebase this 🙇
cfallin
added a commit
to cfallin/js-compute-runtime
that referenced
this pull request
Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial evaluator, to perform ahead-of-time compilation of JavaScript using the PBL interpreter we previously contributed to SpiderMonkey. This work has been merged into the BA fork of SpiderMonkey in bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46, bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48, bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52, bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54, bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot` flag to be passed to `js-compute-runtime-cli.js`. This requires a separate build of the engine Wasm module to be used when the flag is passed. This should still be considered experimental until it is tested more widely. The PBL+weval combination passes all jit-tests and jstests in SpiderMonkey, and all integration tests in StarlingMonkey; however, it has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks) are in the 3x-5x range. This is roughly equivalent to the speedup that a native JS engine's "baseline JIT" compiler tier gets over its interpreter, and it uses the same basic techniques -- compiling all polymorphic operations (all basic JS operators) to inline-cache sites that dispatch to stubs depending on types. Further speedups can be obtained eventually by inlining stubs from warmed-up IC chains, but that requires warmup.

Important to note is that this compilation approach is *fully ahead-of-time*: it requires no profiling or observation or warmup of user code, and compiles the JS directly to Wasm that does not do any further codegen/JIT at runtime. Thus, it is suitable for the per-request isolation model (new Wasm instance for each request, with no shared state).