-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor AArch64 ABI support to extract common bits for shared impl with x64. #2128
Refactor AArch64 ABI support to extract common bits for shared impl with x64. #2128
Conversation
Subscribe to Label Actioncc @bnjbvr
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
enum ArgsOrRets { | ||
Args, | ||
Rets, | ||
impl Into<MemArg> for StackMemArg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my top-level comments regarding MemArg
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to see this being tidied up. I expect this will work out well, and pretty simply, for almost all targets that the new backend could plausibly be ported to.
From a logic point of view it looks OK, to the limited extent to which I can assess it. I do have come concerns about naming and commenting, though.
It concerns me that there is both MemArg
and StackMemArg
. If an argument is in memory, it must be on the stack; where else could it be?
Later .. I see that MemArg
is an AArch64-specific thing, an address mode. How would you feel about renaming it to Amode
? "MemArg" isn't really standard terminology AFAICS. Hennessy and Patterson use the term "address mode" consistently to refer to those parts of an instruction which together specify the address of an in-memory operand. Our new x64 port also uses Amode
(well, actually SyntheticAmode
, but still) and I think for overall code consistency it would be better for the aarch64 code to use it too.
Also, the current naming gives potential confusion .. MemArg
just refers to some operand in memory, so the Arg
is "argument to this insn", whereas the Arg
in StackMemArg
means "formal parameter to this function".
Word sizes: I commented in-line that it would be helpful to mark all known places where this is 32-bit unclean. I should also add that maybe the top level block comment should make it clear that, at least for now, this is for 64-bit targets only, and is likely not to work unmodified for 32-bit targets.
Comments: there's an in-line comment about comments being too tied to aarch64/x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrading to r+ modulo renaming of aarch64-specific MemArg
.
36f1af7
to
2f395d6
Compare
Thanks for the detailed review! |
…ith x64. We have observed that the ABI implementations for AArch64 and x64 are very similar; in fact, x64's implementation started as a modified copy of AArch64's implementation. This is an artifact of both a similar ABI (both machines pass args and return values in registers first, then the stack, and both machines give considerable freedom with stack-frame layout) and a too-low-level ABI abstraction in the existing design. For machines that fit the mainstream or most common ABI-design idioms, we should be able to do much better. This commit factors AArch64 into machine-specific and machine-independent parts, but does not yet modify x64; that will come next. This should be completely neutral with respect to compile time and generated code performance.
2f395d6
to
5cf3fba
Compare
I just noticed that the new interface makes it difficult to implement the IBM Z ABI (which I already had working with the old interface), because it is now (it seems) overly prescriptive in how the stack has to be layed out. Specifically, the split of gen_prologue into gen_prologue_frame_setup and gen_clobber_save is not a good idea on Z. This seems to enforce the idea that clobbered registers must be saved below stackslots and spillslots. In the Z ABI, however, the caller provides a register save area (via a biased stack pointer) on function entry that the callee can (and should) use to save clobbered registers. So the sequence is usually: 1) save clobbers (via a single STORE MULTIPLE instruction, which also includes the old SP value, so that the LOAD MULTIPLE in the epilog will simultaneously restore clobbers and de-allocate the stack frame) 2) allocate stack frame by decrementing SP. This doesn't work with the new interface any more. In gen_prologue_frame_setup I cannot save the clobbers, because I don't even get the list here. But in gen_clobber_save, common code has already allocated the stack frame, so I cannot save the original SP any more. Some of the other refactoring in the patch makes sense also on Z, but I do believe a generic "gen_prologue" interface that allows the target freedom in how to allocate its stack frame is still necessary to handle a variety of targets. (As an aside, I'm not sure that saving clobbers below stackslots and spillslots is a good idea on any platform; neither GCC nor LLVM do it this way. In general, you want the frequently accessed areas at short distances off the stack pointer so they can be accessed with small displacement instructions, on ISAs where this is an issue. The clobbered register save areas are usually the least frequently accessed slots ...) |
The |
Yes, I got that. Still, it would be nice to be able to make use of the new ABIBodyImpl, since a lot of that refactoring is in fact useful on Z as well. It just goes a little bit too far with the prolog/epilog details. |
@uweigand Thanks for the feedback -- indeed, I'm happy to refactor things as needed to support other architectures. (The layout issue you note with clobbers is also preexisting and had to do with constraints on what we know at what time -- though I think we can do better now that we track nominal-SP.) In more detail, this was mostly motivated by the fact that the x64 ABI had copied most of the AArch64 ABI verbatim, so this was a long-overdue refactor to fix that particular problem. As additional architectures come online, we can shift things around to be more flexible! |
@cfallin I've now managed to move the Z backend to the new ABI helper while still generating the same code. I've had to change a number of things:
As to the that last bug in argument extension: The current code, when handling a ABIArg::Stack argument requiring extension, will perform an in-register extension of the argument value to I64, and then generate a store using the original type. This means the extension is in fact just ignored ... To fix this, the store needs to happen in I64 instead of the original type. This bug is present both in gen_copy_reg_to_retval and emit_copy_reg_to_arg. |
Thanks! I'm happy to review a PR that does all of that, if you'd like. I'll plan to fix the argument-extension bug right now. (In general, for small design changes like passing additional args, I don't really think we need to have a design discussion first; I can just r+ a PR right away.) |
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
…h AArch64. Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
@cfallin I've now opened two PRs: The other changes discussed above are no longer necessary in the current code base. |
…h AArch64. Previously, in #2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue. This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.
This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.
This should be completely neutral with respect to compile time and
generated code performance.