
PBL/weval: value-specialize on two-way conditional inputs to avoid issues with if-conversion. #54

Merged
cfallin merged 1 commit into ff-127-0-2 from
cfallin/fix-if-conversion-on-pbl-branches
Jul 31, 2024

Conversation

@cfallin
Member

@cfallin cfallin commented Jul 31, 2024

weval processing requires that a conditional update of the constant "specialization context" PC value be written like

```
if (cond) {
    weval_update_context(pc1);
    pc = pc1;
    goto dispatch;
} else {
    weval_update_context(pc2);
    pc = pc2;
    goto dispatch;
}
```

However, LLVM is very smart (some would say too smart!), and in a recent perturbation of my build/debug environment, started if-converting this to

```
next_pc = cond ? pc1 : pc2;
weval_update_context(next_pc);
pc = next_pc;
goto dispatch;
```

This is problematic because, in the abstract interpretation, we can only merge two differing "constant" lattice values to the bottom element ("runtime"); we have no symbolic representation of a "conditional"/"select" value.
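The lattice behavior described here can be sketched as follows (illustrative types only; `AbsValue` and `meet` are not weval's actual names):

```cpp
#include <cstdint>
#include <optional>

// Sketch of the constant-propagation lattice described above: a value is
// either a known constant or "Runtime" (the bottom element). There is no
// symbolic "select" element in between.
struct AbsValue {
    std::optional<uint32_t> constant; // nullopt == Runtime (bottom)

    static AbsValue Const(uint32_t v) { return AbsValue{v}; }
    static AbsValue Runtime() { return AbsValue{std::nullopt}; }

    // Meet at a control-flow join (or a select): two differing constants
    // collapse straight to Runtime, which is why an if-converted
    // `cond ? pc1 : pc2` loses the constant PC.
    static AbsValue meet(const AbsValue& a, const AbsValue& b) {
        if (a.constant && b.constant && *a.constant == *b.constant)
            return a;
        return Runtime();
    }
};
```

With this meet, `meet(Const(pc1), Const(pc2))` is Runtime whenever `pc1 != pc2`, exactly the situation the if-converted select creates.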

Some attempts to prevent the if-conversion by altering one of the paths (e.g., with a `weval_assert_const32`) were not successful; in any case, we shouldn't depend on LLVM performing or skipping particular optimizations and producing one particular form of output.

Instead, this PR updates the logic to use the same solution that `TableSwitch` (which is fundamentally value-dependent control flow) uses: invoke the weval intrinsic that "splits" the context on the possible values (here `0` and `1` only). We can then constant-propagate the select (the ternary `?:` above) and recover statically-constant-at-abstract-interpretation PC values and contexts.
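The resulting pattern might look like the following sketch. The intrinsic name `weval_specialize_value` is hypothetical (stubbed here as an identity so the sketch runs natively); the point is that once the context is split on the two possible values, the select is a constant within each sub-context:

```cpp
#include <cstdint>

// Hypothetical stand-in for the weval context-splitting intrinsic: in the
// real tool, this tells the specializer to fork the abstract context for
// each listed possible value. Stubbed as identity for native execution.
static inline uint32_t weval_specialize_value(uint32_t v) { return v; }

// After splitting on the two possible values of `cond` (0 and 1), the
// select `c ? pc1 : pc2` constant-folds within each sub-context, so the
// specializer recovers a constant PC even if LLVM if-converts the branch.
uint32_t next_pc(bool cond, uint32_t pc1, uint32_t pc2) {
    uint32_t c = weval_specialize_value(cond ? 1u : 0u); // split on {0, 1}
    return c ? pc1 : pc2;
}
```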

Rebasing note: this should squash into the base "PBL enhancements" series, not the weval-specific patch.

@cfallin cfallin requested a review from JakeChampion July 31, 2024 03:19
cfallin added a commit to cfallin/spidermonkey-wasi-embedding that referenced this pull request Jul 31, 2024
Two issues discovered while debugging the testsuite in StarlingMonkey:

- bytecodealliance/gecko-dev#54: fixes handling of conditionals by weval
  when LLVM applies if-conversion in a new place; use of weval intrinsic
  for value-specialization / subcontext splitting should make this more
  robust.

- bytecodealliance/gecko-dev#55: fixes missing interpreter-PC values in
  stack frames up the stack during unwind because of too-aggressive
  optimization trick in weval'd bodies.

With these two changes, this version of SpiderMonkey allows the
StarlingMonkey integration test suite to pass in
bytecodealliance/StarlingMonkey#91.
Collaborator

@JakeChampion JakeChampion left a comment

Thank you for the thorough explanation and the note about how to rebase this 🙇

@cfallin cfallin merged commit b44d44f into ff-127-0-2 Jul 31, 2024
@cfallin cfallin deleted the cfallin/fix-if-conversion-on-pbl-branches branch July 31, 2024 16:42
cfallin added a commit to cfallin/js-compute-runtime that referenced this pull request Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial
evaluator, to perform ahead-of-time compilation of JavaScript using the
PBL interpreter we previously contributed to SpiderMonkey. This work has
been merged into the BA fork of SpiderMonkey in
bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46,
bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48,
bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52,
bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54,
bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey
in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot`
flag to be passed to `js-compute-runtime-cli.js`. This requires a
separate build of the engine Wasm module to be used when the flag is
passed.

This should still be considered experimental until it is tested more
widely. The PBL+weval combination passes all jit-tests and jstests in
SpiderMonkey, and all integration tests in StarlingMonkey; however, it
has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks)
are in the 3x-5x range. This is roughly equivalent to the speedup that a
native JS engine's "baseline JIT" compiler tier gets over its
interpreter, and it uses the same basic techniques -- compiling all
polymorphic operations (all basic JS operators) to inline-cache sites
that dispatch to stubs depending on types. Further speedups can be
obtained eventually by inlining stubs from warmed-up IC chains, but that
requires warmup.

Important to note is that this compilation approach is *fully
ahead-of-time*: it requires no profiling or observation or warmup of
user code, and compiles the JS directly to Wasm that does not do any
further codegen/JIT at runtime. Thus, it is suitable for the per-request
isolation model (new Wasm instance for each request, with no shared
state).
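The inline-cache dispatch described in the commit message above can be sketched roughly as follows (a minimal illustration only; SpiderMonkey's actual CacheIR stubs are far more elaborate, and all names here are hypothetical):

```cpp
// Each stub guards on the operand type tags and either handles the
// operation or declines, passing control to the next stub in the chain.
enum class Tag { Int32, Double };

struct Value { Tag tag; double num; };

using Stub = bool (*)(const Value&, const Value&, double&);

static bool addInt32Stub(const Value& a, const Value& b, double& out) {
    if (a.tag != Tag::Int32 || b.tag != Tag::Int32) return false; // guard miss
    out = a.num + b.num;
    return true;
}

static bool addDoubleStub(const Value& a, const Value& b, double& out) {
    if (a.tag != Tag::Double || b.tag != Tag::Double) return false;
    out = a.num + b.num;
    return true;
}

// An IC site for `+` tries each attached stub in order; a real engine
// would fall back to a generic path (and possibly attach a new stub)
// when every guard misses.
double icAdd(const Value& a, const Value& b) {
    static const Stub chain[] = {addInt32Stub, addDoubleStub};
    double out = 0;
    for (Stub s : chain)
        if (s(a, b, out)) return out;
    return a.num + b.num; // generic fallback
}
```

This stub-per-type-combination dispatch, compiled ahead of time, is the same basic technique a baseline JIT uses, which is why the speedups land in a similar range.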
