Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: add support for the Mac aarch64 calling convention #2742

Merged
merged 1 commit into from
Mar 22, 2021

Conversation

bnjbvr
Copy link
Member

@bnjbvr bnjbvr commented Mar 19, 2021

This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).

Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.

Fixes #2734.

(Note: more work will likely be needed to accommodate non-wasm uses: sign- or zero- extension of < 32 bits arguments + i128 registers proper register passing. Happy to try PRs and confirm they work or not here.)

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:meta Everything related to the meta-language. labels Mar 19, 2021
Cargo.lock Outdated
@@ -127,9 +127,9 @@ checksum = "23b62fc65de8e4e7f52534fb52b0f3ed04746ae267519eef2a83941e8085068b"

[[package]]
name = "async-trait"
version = "0.1.42"
version = "0.1.48"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bumps more than target-lexicon. Could the other updates be split into a new PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just ran cargo update. I'm happy to do so, or revert the updates unrelated to target-lexicon. But in general patch bumps shouldn't bring in breaking changes (that's unfortunately not the case for this target-lexicon patch bump, which brings a breaking API change), what advantages do you see in doing so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 0.x.y, y bumps may still add features. Only breaking changes are not allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened bytecodealliance/target-lexicon#71 to discuss this, since in fact I think that adding the new enum variant is a breaking API change (this may break users' match statements, since the enum isn't marked non-exhaustive), but it's subtle. In any case, I've reverted the other packages updates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've bumped target-lexicon to 0.12.0, which is the new pre-major release for this new enum variant.

let size = std::cmp::max(size, 8);
// Align.

let size = if call_conv != isa::CallConv::AppleAarch64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When depending on new features of a dependency could you also update the dependency requirement in Cargo.toml? That would be useful for users like cg_clif that only update a single crate at a time and keep the rest pinned using Cargo.lock.

// MacOS aarch64 allows stack slots with sizes less than 8
// bytes. They still need to be properly aligned on their
// natural data alignment, though.
size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the fast and cold call conv use this too? They are unstable anyway and this saves stack usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? It's a bit out of scope for this PR, so I'll try not to accidentally introduce new changes there. Plus, I am not sure if these conventions are used; I seem to recall that there fast implies the default calling convention in some cases, and if we're not being careful that might mean subtly breaking other calling conventions. We should probably audit the "fast"/"cold" calling conventions at some point, and re-design them from the ground up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 -- I actually think we should do something about the "fast" and "cold" conventions, but we should be a little more explicit in designing them, and probably give them better names. I'm not a huge fan of generic terms like "fast ABI" because the ambiguity is confusing -- both on the user/embedder side ("what guarantees does this have? how fast is fast? when can I use it?") and on the implementer side (do we just choose an array of features that lead to better speed, and add more as we think of them? IMHO an evolving ABI is a recipe for subtle bugs as we mutate invariants over time).

So I'd rather design a "fast internal Cranelift ABI v1", implement it, and then keep that as a first-class, well-defined ABI alongside the others, and retire names like "fast" and "cold", just for clarity's sake. But, that's a deeper discussion for another day, I think!

@github-actions github-actions bot added the cranelift:area:x64 Issues related to x64 codegen label Mar 19, 2021
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Overall looks good and I'm very happy to see macOS/aarch64 gain support!

One small thing, in addition to the comment below: could you document the parts of the ABI that we don't do yet (the extension behavior specifically, and i128 differences?), perhaps linking to an issue, so we don't lose track of it?

@@ -171,6 +171,9 @@ impl ABIMachineSpec for AArch64MachineDeps {
let has_baldrdash_tls = call_conv == isa::CallConv::Baldrdash2020;

// See AArch64 ABI (https://c9x.me/compile/bib/abi-arm64.pdf), sections 5.4.
// MacOS aarch64 is slightly different, see also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed here: "If the total number of bytes for stack-based arguments is not a multiple of 8 bytes, insert padding on the stack to maintain the 8-byte alignment requirements." However below we align the final stack-arg area size up to a 16-byte alignment.

(Related to this, my understanding is that the trap-on-not-16-aligned-SP behavior of aarch64 is configurable with a mode bit as well; maybe this means Apple runs with only an 8-aligned stack?)

I think this is OK as it should be fine to reserve extra space at a callsite, but we should document that we diverge and why it's OK (and verify that it is!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it maintains our invariant that the stack is always allocated in 16-bytes chunked, thus always aligned, which is nice. (Otherwise would require more changes when generating prologues and epilogues.)

cranelift/codegen/src/isa/call_conv.rs Outdated Show resolved Hide resolved
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).

Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
@github-actions github-actions bot added cranelift:module cranelift:wasm fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself labels Mar 19, 2021
@github-actions
Copy link

Subscribe to Label Action

cc @fitzgen, @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "cranelift:meta", "cranelift:module", "cranelift:wasm", "fuzzing", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing
  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@bnjbvr bnjbvr merged commit 6e6713a into bytecodealliance:main Mar 22, 2021
@bnjbvr bnjbvr deleted the mac-aarch64 branch March 22, 2021 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:module cranelift:wasm cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mac/aarch64: incorrect argument values when calling host with 8+ arguments
4 participants