[AArch64][SME] Fix accessing the emergency spill slot with hazard padding #142190

Merged: 2 commits, Jun 2, 2025
Changes from 1 commit
18 changes: 16 additions & 2 deletions llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -18,6 +18,7 @@
#include "AArch64Subtarget.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "MCTargetDesc/AArch64InstPrinter.h"
#include "Utils/AArch64SMEAttributes.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/CodeGen/LiveRegMatrix.h"
@@ -632,14 +633,26 @@ bool AArch64RegisterInfo::hasBasePointer(const MachineFunction &MF) const {
return true;

auto &ST = MF.getSubtarget<AArch64Subtarget>();
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
if (ST.hasSVE() || ST.isStreaming()) {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
// Frames that have variable sized objects and scalable SVE objects,
// should always use a basepointer.
if (!AFI->hasCalculatedStackSizeSVE() || AFI->getStackSizeSVE())
return true;
}

// Frames with hazard padding can have a large offset between the frame
// pointer and GPR locals, which includes the emergency spill slot. If the
// emergency spill slot is not within range of the load/store instructions
// (which have a signed 9-bit range), we will fail to compile if it is used.
// Since hasBasePointer() is called before we know if we have hazard padding
// or an emergency spill slot, we need to enable the base pointer
// conservatively.
if (AFI->hasStackHazardSlotIndex() ||
Collaborator:

What's the point of the hasStackHazardSlotIndex() check? As the comment notes, we have to be conservative, so if we have a stack hazard, we must have already emitted a base pointer.

Member (author):

It's a much cheaper check than parsing all the SME attributes and hasBasePointer() is called again repeatedly after we know we have hazard padding (e.g. when resolving frame indexes).

Collaborator:

If parsing the SME attributes is expensive, maybe we should store the parsed SME attributes in AFI?

Member (author):

That'd probably be sensible 👍 (the parsed attributes are just a single int), I can do that as a follow-up NFC.

Member (author):

PR: #142362

!SMEAttrs(MF.getFunction()).hasNonStreamingInterfaceAndBody()) {
return true;
}

// Conservatively estimate whether the negative offset from the frame
// pointer will be sufficient to reach. If a function has a smallish
// frame, it's less likely to have lots of spills and callee saved
@@ -764,7 +777,8 @@ AArch64RegisterInfo::useFPForScavengingIndex(const MachineFunction &MF) const {
assert((!MF.getSubtarget<AArch64Subtarget>().hasSVE() ||
AFI->hasCalculatedStackSizeSVE()) &&
"Expected SVE area to be calculated by this point");
return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE();
return TFI.hasFP(MF) && !hasStackRealignment(MF) && !AFI->getStackSizeSVE() &&
!AFI->hasStackHazardSlotIndex();
}

bool AArch64RegisterInfo::requiresFrameIndexScavenging(
@@ -0,0 +1,99 @@
# RUN: llc -mtriple=aarch64-none-linux-gnu -aarch64-stack-hazard-size=1024 -run-pass=prologepilog %s -o - | FileCheck %s
--- |

define void @stack_hazard_streaming_compat() "aarch64_pstate_sm_compatible" { entry: unreachable }
define void @stack_hazard_streaming_compat_emergency_spill_slot() "aarch64_pstate_sm_compatible" { entry: unreachable }

...

# +------------------+
# | GPR callee-saves |
# +------------------+ <- FP
# | <hazard padding> |
# +------------------+
# | FPR locals |
# | %stack.1 |
# +------------------+
# | <hazard padding> |
# +------------------+
# | GPR locals |
# | %stack.2 |
# | <emergency spill>|
# +------------------+ <- BP
# | <VLA> |
# +------------------+ <- SP (can't be used due to VLA)

# In this case, without the base pointer, we'd need the emergency spill slot to
# access both %stack.1 and %stack.2. With the base pointer we can reach both
# without spilling.

name: stack_hazard_streaming_compat
# CHECK-LABEL: name: stack_hazard_streaming_compat
# CHECK: bb.0:
# CHECK: STRDui $d0, $x19, 131
# CHECK-NEXT: STRXui $x0, $x19, 1
# CHECK: bb.1:
tracksRegLiveness: true
frameInfo:
isFrameAddressTaken: true
stack:
- { id: 0, type: variable-sized, alignment: 1 }
- { id: 1, size: 8, alignment: 8 }
- { id: 2, size: 8, alignment: 8 }
body: |
bb.0:
liveins: $x0, $x8, $d0
$x9 = LDRXui $x0, 0 :: (load (s64))
STRDui $d0, %stack.1, 0 :: (store (s64) into %stack.1)
STRXui $x0, %stack.2, 0 :: (store (s64) into %stack.2)
B %bb.1
bb.1:
liveins: $x1, $x2, $x3, $x4, $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27, $x28, $lr
RET_ReallyLR implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26, implicit $x27, implicit $x28, implicit $lr
...
---
# +------------------+
# | GPR callee-saves |
# +------------------+ <- FP
# | <hazard padding> |
# +------------------+
# | FPR locals |
# | %stack.1 |
# +------------------+
# | <hazard padding> |
# +------------------+
# | GPR locals |
# | %stack.2 | (very large)
# | <emergency spill>|
# +------------------+ <- BP
# | <VLA> |
# +------------------+ <- SP (can't be used due to VLA)

# In this case we need to use the emergency spill slot to access %stack.1, as
# it is too far from the frame pointer and the base pointer to address
# directly.
# Note: This also tests that the <emergency spill> is located near the SP/BP.

name: stack_hazard_streaming_compat_emergency_spill_slot
# CHECK-LABEL: name: stack_hazard_streaming_compat_emergency_spill_slot
# CHECK: bb.0:
# CHECK: STRXui killed $[[SCRATCH:x[0-9]+]], $x19, 0
# CHECK-NEXT: $[[SCRATCH]] = ADDXri $x19, 1056, 0
# CHECK-NEXT: STRDui $d0, killed $[[SCRATCH]], 4095
# CHECK-NEXT: $[[SCRATCH]] = LDRXui $x19, 0
# CHECK: bb.1:
tracksRegLiveness: true
frameInfo:
isFrameAddressTaken: true
stack:
- { id: 0, type: variable-sized, alignment: 1 }
- { id: 1, size: 8, alignment: 8 }
- { id: 2, size: 32761, alignment: 8 }
body: |
bb.0:
liveins: $x0, $x8, $d0
$x9 = LDRXui $x0, 0 :: (load (s64))
STRDui $d0, %stack.1, 0 :: (store (s64) into %stack.1)
B %bb.1
bb.1:
liveins: $x0, $x1, $x2, $x3, $x4, $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x18, $x19, $x20, $x21, $x22, $x23, $x24, $x25, $x26, $x27, $x28, $lr
RET_ReallyLR implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26, implicit $x27, implicit $x28, implicit $lr
27 changes: 15 additions & 12 deletions llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
@@ -98,6 +98,7 @@ define i64 @streaming_agnostic_caller_nonstreaming_private_za_callee(i64 %v) nou
; CHECK-NEXT: mov x0, x9
; CHECK-NEXT: add x29, sp, #64
; CHECK-NEXT: stp x20, x19, [sp, #96] // 16-byte Folded Spill
; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: mov x8, x0
; CHECK-NEXT: bl __arm_sme_state_size
; CHECK-NEXT: sub sp, sp, x0
@@ -145,51 +146,53 @@ define i64 @streaming_compatible_agnostic_caller_nonstreaming_private_za_callee(
; CHECK-NEXT: stp d9, d8, [sp, #48] // 16-byte Folded Spill
; CHECK-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill
; CHECK-NEXT: bl __arm_get_current_vg
; CHECK-NEXT: str x0, [sp, #80] // 8-byte Folded Spill
; CHECK-NEXT: stp x0, x21, [sp, #80] // 16-byte Folded Spill
; CHECK-NEXT: mov x0, x9
; CHECK-NEXT: add x29, sp, #64
; CHECK-NEXT: stp x20, x19, [sp, #96] // 16-byte Folded Spill
; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: mov x8, x0
; CHECK-NEXT: bl __arm_sme_state_size
; CHECK-NEXT: sub sp, sp, x0
; CHECK-NEXT: mov x19, sp
; CHECK-NEXT: mov x0, x19
; CHECK-NEXT: mov x20, sp
; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_save
; CHECK-NEXT: bl __arm_sme_state
; CHECK-NEXT: and x20, x0, #0x1
; CHECK-NEXT: tbz w20, #0, .LBB5_2
; CHECK-NEXT: and x21, x0, #0x1
; CHECK-NEXT: tbz w21, #0, .LBB5_2
; CHECK-NEXT: // %bb.1:
; CHECK-NEXT: smstop sm
; CHECK-NEXT: .LBB5_2:
; CHECK-NEXT: mov x0, x8
; CHECK-NEXT: bl private_za_decl
; CHECK-NEXT: mov x2, x0
; CHECK-NEXT: tbz w20, #0, .LBB5_4
; CHECK-NEXT: tbz w21, #0, .LBB5_4
; CHECK-NEXT: // %bb.3:
; CHECK-NEXT: smstart sm
; CHECK-NEXT: .LBB5_4:
; CHECK-NEXT: mov x0, x19
; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_restore
; CHECK-NEXT: mov x0, x19
; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_save
; CHECK-NEXT: bl __arm_sme_state
; CHECK-NEXT: and x20, x0, #0x1
; CHECK-NEXT: tbz w20, #0, .LBB5_6
; CHECK-NEXT: and x21, x0, #0x1
; CHECK-NEXT: tbz w21, #0, .LBB5_6
; CHECK-NEXT: // %bb.5:
; CHECK-NEXT: smstop sm
; CHECK-NEXT: .LBB5_6:
; CHECK-NEXT: mov x0, x2
; CHECK-NEXT: bl private_za_decl
; CHECK-NEXT: mov x1, x0
; CHECK-NEXT: tbz w20, #0, .LBB5_8
; CHECK-NEXT: tbz w21, #0, .LBB5_8
; CHECK-NEXT: // %bb.7:
; CHECK-NEXT: smstart sm
; CHECK-NEXT: .LBB5_8:
; CHECK-NEXT: mov x0, x19
; CHECK-NEXT: mov x0, x20
; CHECK-NEXT: bl __arm_sme_restore
; CHECK-NEXT: mov x0, x1
; CHECK-NEXT: sub sp, x29, #64
; CHECK-NEXT: ldp x20, x19, [sp, #96] // 16-byte Folded Reload
; CHECK-NEXT: ldr x21, [sp, #88] // 8-byte Folded Reload
; CHECK-NEXT: ldp x29, x30, [sp, #64] // 16-byte Folded Reload
; CHECK-NEXT: ldp d9, d8, [sp, #48] // 16-byte Folded Reload
; CHECK-NEXT: ldp d11, d10, [sp, #32] // 16-byte Folded Reload