Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor AArch64 ABI support to extract common bits for shared impl with x64. #2128

Merged
merged 1 commit into from
Aug 15, 2020

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Aug 13, 2020

We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.

This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.

This should be completely neutral with respect to compile time and
generated code performance.

@cfallin cfallin added cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. labels Aug 13, 2020
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Aug 13, 2020
@github-actions
Copy link

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst"

Thus the following users have been cc'd because of the following labels:

  • bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

cranelift/codegen/src/machinst/helpers.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/helpers.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/helpers.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/abi_impl.rs Show resolved Hide resolved
enum ArgsOrRets {
Args,
Rets,
impl Into<MemArg> for StackMemArg {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my top-level comments regarding MemArg.

Copy link
Contributor

@julian-seward1 julian-seward1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see this being tidied up. I expect this will work out well, and pretty simply, for almost all targets that the new backend could plausibly be ported to.

From a logic point of view it looks OK, to the limited extent to which I can assess it. I do have come concerns about naming and commenting, though.


It concerns me that there is both MemArg and StackMemArg. If an argument is in memory, it must be on the stack; where else could it be?

Later .. I see that MemArg is an AArch64-specific thing, an address mode. How would you feel about renaming it to Amode? "MemArg" isn't really standard terminology AFAICS. Hennessy and Patterson use the term "address mode" consistently to refer to those parts of an instruction which together specify the address of an in-memory operand. Our new x64 port also uses Amode (well, actually SyntheticAmode, but still) and I think for overall code consistency it would be better for the aarch64 code to use it too.

Also, the current naming gives potential confusion .. MemArg just refers to some operand in memory, so the Arg is "argument to this insn", whereas the Arg in StackMemArg means "formal parameter to this function".


Word sizes: I commented in-line that it would be helpful to mark all known places where this is 32-bit unclean. I should also add that maybe the top level block comment should make it clear that, at least for now, this is for 64-bit targets only, and is likely not to work unmodified for 32-bit targets.


Comments: there's an in-line comment about comments being too tied to aarch64/x64.

Copy link
Contributor

@julian-seward1 julian-seward1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading to r+ modulo renaming of aarch64-specific MemArg.

@cfallin
Copy link
Member Author

cfallin commented Aug 14, 2020

Thanks for the detailed review!

…ith x64.

We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.

This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.

This should be completely neutral with respect to compile time and
generated code performance.
@cfallin cfallin merged commit ac6539a into bytecodealliance:main Aug 15, 2020
@uweigand
Copy link
Member

I just noticed that the new interface makes it difficult to implement the IBM Z ABI (which I already had working with the old interface), because it is now (it seems) overly prescriptive in how the stack has to be layed out.

Specifically, the split of gen_prologue into gen_prologue_frame_setup and gen_clobber_save is not a good idea on Z. This seems to enforce the idea that clobbered registers must be saved below stackslots and spillslots. In the Z ABI, however, the caller provides a register save area (via a biased stack pointer) on function entry that the callee can (and should) use to save clobbered registers. So the sequence is usually: 1) save clobbers (via a single STORE MULTIPLE instruction, which also includes the old SP value, so that the LOAD MULTIPLE in the epilog will simultaneously restore clobbers and de-allocate the stack frame) 2) allocate stack frame by decrementing SP.

This doesn't work with the new interface any more. In gen_prologue_frame_setup I cannot save the clobbers, because I don't even get the list here. But in gen_clobber_save, common code has already allocated the stack frame, so I cannot save the original SP any more.

Some of the other refactoring in the patch makes sense also on Z, but I do believe a generic "gen_prologue" interface that allows the target freedom in how to allocate its stack frame is still necessary to handle a variety of targets.

(As an aside, I'm not sure that saving clobbers below stackslots and spillslots is a good idea on any platform; neither GCC nor LLVM do it this way. In general, you want the frequently accessed areas at short distances off the stack pointer so they can be accessed with small displacement instructions, on ISAs where this is an issue. The clobbered register save areas are usually the least frequently accessed slots ...)

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 17, 2020

The ABIBody trait is not changed. This PR only introduces a new ABIBodyImpl struct that can help with getting an ABIBody impl when the ABI follows certain rules. If it doesn't follow these rules, you can still manually implement ABIBody.

@uweigand
Copy link
Member

Yes, I got that. Still, it would be nice to be able to make use of the new ABIBodyImpl, since a lot of that refactoring is in fact useful on Z as well. It just goes a little bit too far with the prolog/epilog details.

@cfallin
Copy link
Member Author

cfallin commented Aug 17, 2020

@uweigand Thanks for the feedback -- indeed, I'm happy to refactor things as needed to support other architectures. (The layout issue you note with clobbers is also preexisting and had to do with constraints on what we know at what time -- though I think we can do better now that we track nominal-SP.)

In more detail, this was mostly motivated by the fact that the x64 ABI had copied most of the AArch64 ABI verbatim, so this was a long-overdue refactor to fix that particular problem. As additional architectures come online, we can shift things around to be more flexible!

@uweigand
Copy link
Member

@cfallin I've now managed to move the Z backend to the new ABI helper while still generating the same code. I've had to change a number of things:

  • Pass call_conv and clobbers to the gen_prologue_frame_setup and gen_epilogue_frame_restore helpers as well.
  • In addition, pass the frame size to gen_epilogue_frame_restore (this could possibly be avoided, but it seems logical to have the size available here, given that this routine is responsible for deallocating the frame).
  • Allow the machine to specify a different stack alignment than 16.
  • Fix a bug in the argument extension code.

As to the that last bug in argument extension: The current code, when handling a ABIArg::Stack argument requiring extension, will perform an in-register extension of the argument value to I64, and then generate a store using the original type. This means the extension is in fact just ignored ... To fix this, the store needs to happen in I64 instead of the original type. This bug is present both in gen_copy_reg_to_retval and emit_copy_reg_to_arg.

@cfallin
Copy link
Member Author

cfallin commented Aug 17, 2020

Thanks! I'm happy to review a PR that does all of that, if you'd like. I'll plan to fix the argument-extension bug right now. (In general, for small design changes like passing additional args, I don't really think we need to have a design discussion first; I can just r+ a PR right away.)

cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 18, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 18, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 20, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 20, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 20, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Aug 20, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Sep 9, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Sep 9, 2020
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
@uweigand
Copy link
Member

uweigand commented Nov 2, 2020

Thanks! I'm happy to review a PR that does all of that, if you'd like. I'll plan to fix the argument-extension bug right now. (In general, for small design changes like passing additional args, I don't really think we need to have a design discussion first; I can just r+ a PR right away.)

@cfallin I've now opened two PRs:
#2345
#2346

The other changes discussed above are no longer necessary in the current code base.

cfallin added a commit that referenced this pull request Nov 30, 2020
…h AArch64.

Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants