Switch Cranelift over to regalloc2. #3989

cfallin · 2022-04-02T05:18:39Z

This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.

All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.

There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).

Closes #3942.

~~Requires bytecodealliance/regalloc2#38 and subsequent crate version bump/release.~~ (done.)

cranelift/codegen/meta/src/shared/settings.rs

cfallin · 2022-04-04T21:03:06Z

@fitzgen I've split this into somewhat more reasonably-sized chunks; I think this should be more manageable, but let me know if you want me to try to factor the core changes more finely still.

I'll switch this out of 'draft' mode once bytecodealliance/regalloc2#38 is reviewed and merged and I can switch the dep back to a crate version here.

cfallin · 2022-04-05T19:50:11Z

@fitzgen I've split the work into more-or-less separate chunks, and cleaned up the last CI nits, so I think this is ready for review now!

cranelift/codegen/src/machinst/buffer.rs

fitzgen

LGTM! r=me with comments addressed. Thanks Chris!

cfallin · 2022-04-13T19:49:10Z

Ideally `VCode::emit` would take a `&self`. Right now it consumes the `VCode` (takes `self`) only because the `ABICallee` saves some state when generating the prologue (it only computes clobbers and hence frame size at that point) and I didn't want to play tricks with cells, or clone it or whatnot, to make this work.

Can you file a follow-up issue for this?

Yep, filed #4024.
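
The ownership constraint discussed in this thread can be illustrated with a minimal sketch. Every name here (`ToyAbi`, `ToyVCode`, `gen_prologue`, `gen_epilogue`, and the 8-bytes-per-clobber frame computation) is a hypothetical stand-in, not Cranelift's actual types or signatures. The sketch only shows why a prologue step that records state pushes `emit` toward taking `self` rather than `&self`.

```rust
/// Hypothetical stand-in for the ABI object. It only learns the frame size
/// while generating the prologue, so prologue generation must mutate it.
struct ToyAbi {
    clobbers: Vec<u8>,
    frame_size: Option<u32>,
}

impl ToyAbi {
    fn gen_prologue(&mut self) -> Vec<String> {
        // Assumption for illustration: 8 bytes of frame per clobbered register.
        let size = 8 * self.clobbers.len() as u32;
        self.frame_size = Some(size);
        vec![format!("sub sp, sp, #{}", size)]
    }

    fn gen_epilogue(&self) -> Vec<String> {
        // Relies on the state saved by `gen_prologue`.
        let size = self.frame_size.expect("prologue must be generated first");
        vec![format!("add sp, sp, #{}", size), "ret".to_string()]
    }
}

/// Hypothetical stand-in for the container that owns the ABI object.
struct ToyVCode {
    abi: ToyAbi,
    body: Vec<String>,
}

impl ToyVCode {
    /// Takes `self` by value: `gen_prologue` needs mutable access to `abi`,
    /// which a `&self` receiver could not provide without interior
    /// mutability (`Cell`/`RefCell`) or cloning.
    fn emit(mut self) -> Vec<String> {
        let mut out = self.abi.gen_prologue();
        out.append(&mut self.body);
        out.extend(self.abi.gen_epilogue());
        out
    }
}

fn main() {
    let vcode = ToyVCode {
        abi: ToyAbi { clobbers: vec![19, 20], frame_size: None },
        body: vec!["nop".to_string()],
    };
    for line in vcode.emit() {
        println!("{}", line);
    }
}
```

In this toy model, making `emit` take `&self` would require computing the frame size before emission (so the prologue step no longer writes to the ABI object), which is roughly the direction the follow-up issue title suggests.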

We are currently (bytecodealliance/wasmtime#3989) switching over to a new register allocator in Cranelift/wasmtime. This PR switches our fuzzing setup to start fuzzing the new allocator instead of the old one.

cfallin · 2022-04-14T01:06:54Z

I believe I've resolved all pending review comments now, with a few followup issues filed as well. Thanks @fitzgen for the speedy and helpful review!

I'll hold off on merging this until my morning (PDT) so I'm around just out of caution, but given all of the fuzzing and testing I am cautiously optimistic this should be uneventful...

…Buffer. (#4038)

Following the merge of regalloc2 support, this became slower because we are stricter about the critical-edge invariant, generating a separate edge block for every out-edge even if two or more out-edges go to the same successor (this is significant in cases of `br_table` with many entries having the same target block, for example). Many of those edge blocks are empty and end up collapsed by the MachBuffer, which leads to a large set of aliased labels.

The invariant validation will dutifully iterate over all the data structures at every step, validating all of our conditions. But this gets way slower in the new context, to the point that we'll probably have some fuzz timeouts. This was pointed out in [1] but I missed removing this in #3989.

Given that `MachBuffer` has been around for nearly two years now, has been fuzzed continuously with the invariant validation for that time, and also has a correctness proof in the comments, it's probably reasonable to remove this high (recently increased) cost from the fuzzing-specific compilation configuration.

[1] #3989 (comment)
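
The label-aliasing effect described in this commit message can be sketched in a few lines. This is a toy model under stated assumptions, not Cranelift's `MachBuffer`: `EdgeBlock`, `split_out_edges`, and `aliased_labels` are hypothetical names, and block targets are plain integers.

```rust
use std::collections::BTreeMap;

/// Hypothetical edge block: one is created per out-edge of a branch.
struct EdgeBlock {
    label: usize, // label of the edge block itself
    target: u32,  // successor block the edge leads to
}

/// Create a separate edge block for every out-edge, with no deduplication,
/// even when several out-edges share the same successor.
fn split_out_edges(targets: &[u32]) -> Vec<EdgeBlock> {
    targets
        .iter()
        .enumerate()
        .map(|(label, &target)| EdgeBlock { label, target })
        .collect()
}

/// Assume every edge block is empty and gets collapsed into its successor:
/// each edge-block label then becomes an alias of the successor's label.
/// Returns successor -> labels aliased to it.
fn aliased_labels(edges: &[EdgeBlock]) -> BTreeMap<u32, Vec<usize>> {
    let mut aliases: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for edge in edges {
        aliases.entry(edge.target).or_default().push(edge.label);
    }
    aliases
}

fn main() {
    // A `br_table`-like branch with six entries but only two distinct targets.
    let targets = [7, 7, 7, 9, 7, 7];
    let edges = split_out_edges(&targets);
    for (succ, labels) in &aliased_labels(&edges) {
        println!("block{} has {} aliased edge labels", succ, labels.len());
    }
}
```

In this model the alias set for a target grows with the number of table entries pointing at it, so any check that walks the full alias set at every step does more work as tables get larger.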

github-actions bot added the cranelift, cranelift:area:aarch64, and cranelift:meta labels Apr 2, 2022

bjorn3 reviewed Apr 2, 2022

cranelift/codegen/meta/src/shared/settings.rs

cfallin force-pushed the regalloc2-clean branch 4 times, most recently from 7f56594 to b3e1e8a Compare April 4, 2022 21:01

cfallin force-pushed the regalloc2-clean branch from b3e1e8a to 63e5c5b Compare April 4, 2022 23:43

cfallin marked this pull request as ready for review April 4, 2022 23:44

cfallin added 9 commits April 4, 2022 16:45

Switch from regalloc.rs to regalloc2 dependency.

df47799

Remove regalloc algorithm setting (we now only have one option, regalloc2).

f6c5bd7

Add Reg/VirtualReg/RealReg abstractions on top of regalloc2's types.

52d7f2c

Update core VCode and MachInst definitions, VCode implementation, and lowering machinery for regalloc2.

a985b41

Remove DeferredDisplay (no longer needed with new disassembly strategy).

519c12e

ABI-code changes for regalloc2.

ef2dff7

Minor updates to ISLE rules and glue, and regeneration of ISLE generated code.

747edaf

aarch64 Inst-abstraction updates

c5c248c

s390x Inst-abstraction updates

62a0cf2

cfallin force-pushed the regalloc2-clean branch 3 times, most recently from 4c22cf2 to be9ee18 Compare April 5, 2022 18:47

cfallin added 4 commits April 5, 2022 12:09

x64 Inst-abstraction updates

b9a1f72

Unwind updates for regalloc2.

44ac283

Test updates.

4efc608

Add doc describing design changes for regalloc2 integration.

d4a5f0e

cfallin force-pushed the regalloc2-clean branch from be9ee18 to d4a5f0e Compare April 5, 2022 19:09

bjorn3 reviewed Apr 6, 2022

cranelift/codegen/src/machinst/buffer.rs

fitzgen approved these changes Apr 13, 2022

cfallin added 3 commits April 13, 2022 12:27

Remove vreg aliases from disassembly: no vregs should appear anyway, and removes needless churn in precise-output tests.

1d5ea7b

Update filetests.

4bc1ad6

Add serde support back to cranelift-codegen types, using regalloc2 0.1.1 with serde support.

76993b3

cfallin mentioned this pull request Apr 13, 2022

Cranelift: VCode: complete the transition to fully immutable code-emission design #4024

Open

Hide the pinned-register indexing scheme a bit better, to guard against baking in assumptions in the rest of the code.

a84b122

cfallin mentioned this pull request Apr 13, 2022

wasmtime: update regalloc fuzzer to use regalloc2. google/oss-fuzz#7568

Merged

cfallin added 7 commits April 13, 2022 16:41

Implement From conversions for register types and use them.

05f5f4c

Use enum for VCode build direction.

ade240b

Remove MachInst::get_clobbers() and implement `OperandCollector::reg_clobber()` instead.

3dafbd3

Rename MachInst::type_for_rc() to canonical_type_for_rc().

e09b71f

Address remainder of code-review feedback.

9aa5dda

Remove VCodeBuildDirection::Forward as it is dead code

d5fe840

fix typo, and remove integration doc

15aef15

Fix some pretty-printing AllocationConsumer misalignments caught by new assert.

ac3863b

cfallin merged commit a0318f3 into bytecodealliance:main Apr 14, 2022

cfallin deleted the regalloc2-clean branch April 14, 2022 17:28

cfallin mentioned this pull request Apr 15, 2022

Cranelift: remove slow invariant validation in cfg(fuzzing) from MachBuffer. #4038

Merged

alexcrichton mentioned this pull request Jun 3, 2022

wasmtime 0.37 has much smaller call stack limit than 0.36 #4214

Closed

alexcrichton mentioned this pull request Jun 21, 2022

memory usage change in 0.37.0? bytecodealliance/wasmtime-go#132

Closed
