Refactor AArch64 ABI support to extract common bits for shared impl with x64. #2128

cfallin · 2020-08-13T04:06:58Z

We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.

This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.

This should be completely neutral with respect to compile time and
generated code performance.

github-actions · 2020-08-13T04:27:56Z

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst"

Thus the following users have been cc'd because of the following labels:

bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

cranelift/codegen/src/machinst/helpers.rs

cranelift/codegen/src/machinst/abi_impl.rs

julian-seward1 · 2020-08-14T06:12:42Z

cranelift/codegen/src/isa/aarch64/abi.rs

-enum ArgsOrRets {
-    Args,
-    Rets,
+impl Into<MemArg> for StackMemArg {


See my top-level comments regarding MemArg.

cranelift/codegen/src/isa/aarch64/abi.rs

cranelift/codegen/src/machinst/abi_impl.rs

cranelift/codegen/src/machinst/helpers.rs

cranelift/codegen/src/machinst/abi_impl.rs

julian-seward1

It's nice to see this being tidied up. I expect this will work out well, and pretty simply, for almost all targets that the new backend could plausibly be ported to.

From a logic point of view it looks OK, to the limited extent to which I can assess it. I do have come concerns about naming and commenting, though.

It concerns me that there is both MemArg and StackMemArg. If an argument is in memory, it must be on the stack; where else could it be?

Later .. I see that MemArg is an AArch64-specific thing, an address mode. How would you feel about renaming it to Amode? "MemArg" isn't really standard terminology AFAICS. Hennessy and Patterson use the term "address mode" consistently to refer to those parts of an instruction which together specify the address of an in-memory operand. Our new x64 port also uses Amode (well, actually SyntheticAmode, but still) and I think for overall code consistency it would be better for the aarch64 code to use it too.

Also, the current naming gives potential confusion .. MemArg just refers to some operand in memory, so the Arg is "argument to this insn", whereas the Arg in StackMemArg means "formal parameter to this function".

Word sizes: I commented in-line that it would be helpful to mark all known places where this is 32-bit unclean. I should also add that maybe the top level block comment should make it clear that, at least for now, this is for 64-bit targets only, and is likely not to work unmodified for 32-bit targets.

Comments: there's an in-line comment about comments being too tied to aarch64/x64.

julian-seward1

Upgrading to r+ modulo renaming of aarch64-specific MemArg.

cfallin · 2020-08-14T21:38:26Z

Thanks for the detailed review!

…ith x64. We have observed that the ABI implementations for AArch64 and x64 are very similar; in fact, x64's implementation started as a modified copy of AArch64's implementation. This is an artifact of both a similar ABI (both machines pass args and return values in registers first, then the stack, and both machines give considerable freedom with stack-frame layout) and a too-low-level ABI abstraction in the existing design. For machines that fit the mainstream or most common ABI-design idioms, we should be able to do much better. This commit factors AArch64 into machine-specific and machine-independent parts, but does not yet modify x64; that will come next. This should be completely neutral with respect to compile time and generated code performance.

uweigand · 2020-08-17T13:11:28Z

I just noticed that the new interface makes it difficult to implement the IBM Z ABI (which I already had working with the old interface), because it is now (it seems) overly prescriptive in how the stack has to be layed out.

Specifically, the split of gen_prologue into gen_prologue_frame_setup and gen_clobber_save is not a good idea on Z. This seems to enforce the idea that clobbered registers must be saved below stackslots and spillslots. In the Z ABI, however, the caller provides a register save area (via a biased stack pointer) on function entry that the callee can (and should) use to save clobbered registers. So the sequence is usually: 1) save clobbers (via a single STORE MULTIPLE instruction, which also includes the old SP value, so that the LOAD MULTIPLE in the epilog will simultaneously restore clobbers and de-allocate the stack frame) 2) allocate stack frame by decrementing SP.

This doesn't work with the new interface any more. In gen_prologue_frame_setup I cannot save the clobbers, because I don't even get the list here. But in gen_clobber_save, common code has already allocated the stack frame, so I cannot save the original SP any more.

Some of the other refactoring in the patch makes sense also on Z, but I do believe a generic "gen_prologue" interface that allows the target freedom in how to allocate its stack frame is still necessary to handle a variety of targets.

(As an aside, I'm not sure that saving clobbers below stackslots and spillslots is a good idea on any platform; neither GCC nor LLVM do it this way. In general, you want the frequently accessed areas at short distances off the stack pointer so they can be accessed with small displacement instructions, on ISAs where this is an issue. The clobbered register save areas are usually the least frequently accessed slots ...)

bjorn3 · 2020-08-17T13:27:47Z

The ABIBody trait is not changed. This PR only introduces a new ABIBodyImpl struct that can help with getting an ABIBody impl when the ABI follows certain rules. If it doesn't follow these rules, you can still manually implement ABIBody.

uweigand · 2020-08-17T14:08:35Z

Yes, I got that. Still, it would be nice to be able to make use of the new ABIBodyImpl, since a lot of that refactoring is in fact useful on Z as well. It just goes a little bit too far with the prolog/epilog details.

cfallin · 2020-08-17T15:47:08Z

@uweigand Thanks for the feedback -- indeed, I'm happy to refactor things as needed to support other architectures. (The layout issue you note with clobbers is also preexisting and had to do with constraints on what we know at what time -- though I think we can do better now that we track nominal-SP.)

In more detail, this was mostly motivated by the fact that the x64 ABI had copied most of the AArch64 ABI verbatim, so this was a long-overdue refactor to fix that particular problem. As additional architectures come online, we can shift things around to be more flexible!

uweigand · 2020-08-17T19:59:24Z

@cfallin I've now managed to move the Z backend to the new ABI helper while still generating the same code. I've had to change a number of things:

Pass call_conv and clobbers to the gen_prologue_frame_setup and gen_epilogue_frame_restore helpers as well.
In addition, pass the frame size to gen_epilogue_frame_restore (this could possibly be avoided, but it seems logical to have the size available here, given that this routine is responsible for deallocating the frame).
Allow the machine to specify a different stack alignment than 16.
Fix a bug in the argument extension code.

As to the that last bug in argument extension: The current code, when handling a ABIArg::Stack argument requiring extension, will perform an in-register extension of the argument value to I64, and then generate a store using the original type. This means the extension is in fact just ignored ... To fix this, the store needs to happen in I64 instead of the original type. This bug is present both in gen_copy_reg_to_retval and emit_copy_reg_to_arg.

cfallin · 2020-08-17T20:24:36Z

Thanks! I'm happy to review a PR that does all of that, if you'd like. I'll plan to fix the argument-extension bug right now. (In general, for small design changes like passing additional args, I don't really think we need to have a design discussion first; I can just r+ a PR right away.)

…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue.

…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.

uweigand · 2020-11-02T12:28:18Z

Thanks! I'm happy to review a PR that does all of that, if you'd like. I'll plan to fix the argument-extension bug right now. (In general, for small design changes like passing additional args, I don't really think we need to have a design discussion first; I can just r+ a PR right away.)

@cfallin I've now opened two PRs:
#2345
#2346

The other changes discussed above are no longer necessary in the current code base.

…h AArch64. Previously, in #2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.

cfallin added cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. labels Aug 13, 2020

cfallin requested a review from julian-seward1 August 13, 2020 04:06

github-actions bot added the cranelift Issues related to the Cranelift code generator label Aug 13, 2020

bjorn3 suggested changes Aug 13, 2020

View reviewed changes