Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch Cranelift over to regalloc2. #3989

Merged
merged 26 commits into from
Apr 14, 2022

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Apr 2, 2022

This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.

All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.

There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).

Closes #3942.

Requires bytecodealliance/regalloc2#38 and subsequent crate version bump/release. (done.)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:meta Everything related to the meta-language. labels Apr 2, 2022
@cfallin cfallin force-pushed the regalloc2-clean branch 4 times, most recently from 7f56594 to b3e1e8a Compare April 4, 2022 21:01
@cfallin
Copy link
Member Author

cfallin commented Apr 4, 2022

@fitzgen I've split this into somewhat more reasonably-sized chunks; I think this should be more manageable, but let me know if you want me to try to factor the core changes more finely still.

I'll switch this out of 'draft' mode once bytecodealliance/regalloc2#38 is reviewed and merged and I can switch the dep back to a crate version here.

@cfallin cfallin marked this pull request as ready for review April 4, 2022 23:44
@cfallin cfallin force-pushed the regalloc2-clean branch 3 times, most recently from 4c22cf2 to be9ee18 Compare April 5, 2022 18:47
@cfallin
Copy link
Member Author

cfallin commented Apr 5, 2022

@fitzgen I've split the work into more-or-less separate chunks, and cleaned up the last CI nits, so I think this is ready for review now!

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! r=me with comments addressed. Thanks Chris!

@cfallin
Copy link
Member Author

cfallin commented Apr 13, 2022

Ideally VCode::emit would take a &self. Right now it consumes the VCode (takes self) only because the ABICallee saves some state when generating the prologue (it only computes clobbers and hence frame size at that point) and I didn't want to play tricks with cells, or clone it or whatnot, to make this work.

Can you file a follow up issue for this?

Yep, filed #4024.

…st baking in assumptions in the rest of the code.
cfallin added a commit to cfallin/oss-fuzz that referenced this pull request Apr 13, 2022
We are currently (bytecodealliance/wasmtime#3989) switching over to a
new register allocator in Cranelift/wasmtime. This PR switches our
fuzzing setup to start fuzzing the new allocator instead of the old one.
@cfallin
Copy link
Member Author

cfallin commented Apr 14, 2022

I believe I've resolved all pending review comments now, with a few followup issues filed as well. Thanks @fitzgen for the speedy and helpful review!

I'll hold off on merging this until my morning (PDT) so I'm around just out of caution, but given all of the fuzzing and testing I am cautiously optimistic this should be uneventful...

jonathanmetzman pushed a commit to google/oss-fuzz that referenced this pull request Apr 14, 2022
We are currently (bytecodealliance/wasmtime#3989) switching over to a
new register allocator in Cranelift/wasmtime. This PR switches our
fuzzing setup to start fuzzing the new allocator instead of the old one.
@cfallin cfallin merged commit a0318f3 into bytecodealliance:main Apr 14, 2022
@cfallin cfallin deleted the regalloc2-clean branch April 14, 2022 17:28
cfallin added a commit to cfallin/wasmtime that referenced this pull request Apr 15, 2022
…Buffer.

Following the merge of regalloc2 support, this became slower because we
are stricter about the critical-edge invariant, generating a separate
edge block for every out-edge even if two or more out-edges go to the
same successor (this is significant in cases of `br_table` with many
entries having the same target block, for example).

Many of those edge blocks are empty and end up collapsed by the
MachBuffer, which leads to a large set of aliased labels.

The invariant validation will dutifully iterate over all the data
structures at every step, validating all of our conditions. But this
gets way slower in the new context, to the point that we'll probably
have some fuzz timeouts.

This was pointed out in [1] but I missed removing this in bytecodealliance#3989. Given
that `MachBuffer` has been around for nearly two years now, has been
fuzzed continuously with the invariant validation for that time, and
also has a correctness proof in the comments, it's probably reasonable
to remove this high (recently increased) cost from the fuzzing-specific
compilation configuration.

[1]
bytecodealliance#3989 (comment)
alexcrichton pushed a commit that referenced this pull request Apr 15, 2022
…Buffer. (#4038)

Following the merge of regalloc2 support, this became slower because we
are stricter about the critical-edge invariant, generating a separate
edge block for every out-edge even if two or more out-edges go to the
same successor (this is significant in cases of `br_table` with many
entries having the same target block, for example).

Many of those edge blocks are empty and end up collapsed by the
MachBuffer, which leads to a large set of aliased labels.

The invariant validation will dutifully iterate over all the data
structures at every step, validating all of our conditions. But this
gets way slower in the new context, to the point that we'll probably
have some fuzz timeouts.

This was pointed out in [1] but I missed removing this in #3989. Given
that `MachBuffer` has been around for nearly two years now, has been
fuzzed continuously with the invariant validation for that time, and
also has a correctness proof in the comments, it's probably reasonable
to remove this high (recently increased) cost from the fuzzing-specific
compilation configuration.

[1]
#3989 (comment)
MartinPetkov pushed a commit to MartinPetkov/oss-fuzz that referenced this pull request Aug 15, 2022
We are currently (bytecodealliance/wasmtime#3989) switching over to a
new register allocator in Cranelift/wasmtime. This PR switches our
fuzzing setup to start fuzzing the new allocator instead of the old one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transition to regalloc2
3 participants