PBL/weval: value-specialize on two-way conditional inputs to avoid issues with if-conversion. #54
Merged
cfallin merged 1 commit into ff-127-0-2 on Jul 31, 2024
Conversation
weval processing requires that a conditional update of the constant
"specialization context" PC value be written like
```
if (cond) {
    weval_update_context(pc1);
    pc = pc1;
    goto dispatch;
} else {
    weval_update_context(pc2);
    pc = pc2;
    goto dispatch;
}
```
However, LLVM is very smart (some would say too smart!), and in a recent
perturbation of my build/debug environment, it started if-converting this
to
```
next_pc = cond ? pc1 : pc2;
weval_update_context(next_pc);
pc = next_pc;
goto dispatch;
```
This is problematic because in the abstract interpretation, we merge two
"constant" lattice values to the bottom element ("runtime") only; we
have no symbolic representation of the "conditional"/"select" value.
Some attempts to prevent the if-conversion by altering one of the paths
(with e.g. a `weval_assert_const32`) were not successful; in any case, we
shouldn't depend on LLVM performing (or skipping) particular
optimizations and emitting one particular form of output.
Rather, this PR updates the logic to use the same solution that
`TableSwitch` (which is fundamentally value-dependent control flow)
uses: invoke the weval intrinsic that "splits" the context on possible
values (here `0` and `1` only). We can then constant-propagate the
`select` (the ternary `?:` above) and recover
statically-constant-at-abstract-interpretation PC values and contexts.
Rebasing note: this should squash into the base "PBL enhancements"
series, not the weval-specific patch.
cfallin
added a commit
to cfallin/spidermonkey-wasi-embedding
that referenced
this pull request
Jul 31, 2024
Two issues discovered while debugging the testsuite in StarlingMonkey:
- bytecodealliance/gecko-dev#54: fixes handling of conditionals by weval when LLVM applies if-conversion in a new place; use of the weval intrinsic for value-specialization / subcontext splitting should make this more robust.
- bytecodealliance/gecko-dev#55: fixes missing interpreter-PC values in stack frames up the stack during unwind, caused by a too-aggressive optimization trick in weval'd bodies.

With these two changes, this version of SpiderMonkey allows the StarlingMonkey integration test suite to pass in bytecodealliance/StarlingMonkey#91.
JakeChampion
approved these changes
Jul 31, 2024
Collaborator
JakeChampion
left a comment
There was a problem hiding this comment.
Thank you for the thorough explanation and the note about how to rebase this 🙇
cfallin
added a commit
to cfallin/js-compute-runtime
that referenced
this pull request
Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial evaluator, to perform ahead-of-time compilation of JavaScript using the PBL interpreter we previously contributed to SpiderMonkey. This work has been merged into the BA fork of SpiderMonkey in bytecodealliance/gecko-dev#45, bytecodealliance/gecko-dev#46, bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48, bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52, bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54, bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot` flag to be passed to `js-compute-runtime-cli.js`. This requires a separate build of the engine Wasm module to be used when the flag is passed. This should still be considered experimental until it is tested more widely. The PBL+weval combination passes all jit-tests and jstests in SpiderMonkey, and all integration tests in StarlingMonkey; however, it has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks) are in the 3x-5x range. This is roughly equivalent to the speedup that a native JS engine's "baseline JIT" compiler tier gets over its interpreter, and it uses the same basic techniques -- compiling all polymorphic operations (all basic JS operators) to inline-cache sites that dispatch to stubs depending on types. Further speedups can be obtained eventually by inlining stubs from warmed-up IC chains, but that requires warmup.

Important to note is that this compilation approach is *fully ahead-of-time*: it requires no profiling or observation or warmup of user code, and compiles the JS directly to Wasm that does not do any further codegen/JIT at runtime. Thus, it is suitable for the per-request isolation model (new Wasm instance for each request, with no shared state).