Skip to content

[ARM,AArch64] Don't put BTI at asm goto branch targets #141562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 3, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions clang/docs/LanguageExtensions.rst
Original file line number Diff line number Diff line change
@@ -2042,6 +2042,17 @@ references can be used instead of numeric references.
return -1;
}

ASM Goto versus Branch Target Enforcement
=========================================

Some target architectures implement branch target enforcement, by requiring
indirect (register-controlled) branch instructions to jump only to locations
marked by a special instruction (such as AArch64 ``bti``).

The assembler code inside an ``asm goto`` statement is expected not to use a
branch instruction of that kind to transfer control to any of its destination
labels. Therefore, using a label in an ``asm goto`` statement does not cause
clang to put a ``bti`` or equivalent instruction at the label.

Constexpr strings in GNU ASM statements
=======================================
5 changes: 5 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
@@ -41,6 +41,11 @@ Potentially Breaking Changes
- For ARM targets when compiling assembly files, the features included in the selected CPU
or Architecture's FPU are included. If you wish not to use a specific feature,
the relevant ``+no`` option will need to be amended to the command line option.
- When compiling with branch target enforcement, ``asm goto``
statements will no longer guarantee to place a ``bti`` or
``endbr64`` instruction at the labels named as possible branch
destinations, so it is not safe to use a register-controlled branch
instruction to branch to one. (In line with gcc.)

C/C++ Language Potentially Breaking Changes
-------------------------------------------
8 changes: 8 additions & 0 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
@@ -9596,6 +9596,14 @@ may not be equal to the address provided for the same block to this
instruction's ``indirect labels`` operand. The assembly code may only transfer
control to addresses provided via this instruction's ``indirect labels``.

On target architectures that implement branch target enforcement by requiring
indirect (register-controlled) branch instructions to jump only to locations
marked by a special instruction (such as AArch64 ``bti``), the called code is
expected not to use such an indirect branch to transfer control to the
locations in ``indirect labels``. Therefore, including a label in the
``indirect labels`` of a ``callbr`` does not require the compiler to put a
``bti`` or equivalent instruction at the label.

Arguments:
""""""""""

6 changes: 6 additions & 0 deletions llvm/docs/ReleaseNotes.md
Original file line number Diff line number Diff line change
@@ -70,6 +70,12 @@ Changes to the LLVM IR
`llvm.load.relative`.
* Inline asm calls no longer accept ``label`` arguments. Use ``callbr`` instead.

* Updated semantics of the `callbr` instruction to clarify that its
'indirect labels' are not expected to be reached by indirect (as in
register-controlled) branch instructions, and therefore are not
guaranteed to start with a `bti` or `endbr64` instruction, where
those exist.

Changes to LLVM infrastructure
------------------------------

17 changes: 16 additions & 1 deletion llvm/include/llvm/CodeGen/MachineBasicBlock.h
Original file line number Diff line number Diff line change
@@ -275,18 +275,33 @@ class MachineBasicBlock
/// of a terminator, exception-handling target, or jump table. This is
/// either the result of an IR-level "blockaddress", or some form
/// of target-specific branch lowering.
///
/// The name of this function `hasAddressTaken` implies that the address of
/// the block is known and used in a general sense, but not necessarily that
/// the address is used by an indirect branch instruction. So branch target
/// enforcement need not put a BTI instruction (or equivalent) at the start
/// of a block just because this function returns true. The decision about
/// whether to add a BTI can be more subtle than that, and depends on the
/// more detailed checks that this function aggregates together.
bool hasAddressTaken() const {
return MachineBlockAddressTaken || AddressTakenIRBlock;
return MachineBlockAddressTaken || AddressTakenIRBlock ||
IsInlineAsmBrIndirectTarget;
}

/// Test whether this block is used as something other than the target of a
/// terminator, exception-handling target, jump table, or IR blockaddress.
/// For example, its address might be loaded into a register, or
/// stored in some branch table that isn't part of MachineJumpTableInfo.
///
/// If this function returns true, it _does_ mean that branch target
/// enforcement needs to put a BTI or equivalent at the start of the block.
bool isMachineBlockAddressTaken() const { return MachineBlockAddressTaken; }

/// Test whether this block is the target of an IR BlockAddress. (There can
/// more than one MBB associated with an IR BB where the address is taken.)
///
/// If this function returns true, it _does_ mean that branch target
/// enforcement needs to put a BTI or equivalent at the start of the block.
bool isIRBlockAddressTaken() const { return AddressTakenIRBlock; }

/// Retrieves the BasicBlock which corresponds to this MachineBasicBlock.
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
Original file line number Diff line number Diff line change
@@ -4330,6 +4330,8 @@ void AsmPrinter::emitBasicBlockStart(const MachineBasicBlock &MBB) {
OutStreamer->emitLabel(Sym);
} else if (isVerbose() && MBB.isMachineBlockAddressTaken()) {
OutStreamer->AddComment("Block address taken");
} else if (isVerbose() && MBB.isInlineAsmBrIndirectTarget()) {
OutStreamer->AddComment("Inline asm indirect target");
}

// Print some verbose block comments.
11 changes: 9 additions & 2 deletions llvm/lib/CodeGen/BasicBlockPathCloning.cpp
Original file line number Diff line number Diff line change
@@ -121,14 +121,21 @@ bool IsValidCloning(const MachineFunction &MF,
}
if (PathBB->isMachineBlockAddressTaken()) {
// Avoid cloning blocks which have their address taken since we can't
// rewire branches to those blocks as easily (e.g., branches within
// inline assembly).
// rewire branches to those blocks as easily.
WithColor::warning()
<< "block #" << BBID
<< " has its machine block address taken in function "
<< MF.getName() << "\n";
return false;
}
if (PathBB->isInlineAsmBrIndirectTarget()) {
// Similarly for branches to the block within an asm goto.
WithColor::warning()
<< "block #" << BBID
<< " is a branch target of an 'asm goto' in function "
<< MF.getName() << "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't handle anonymous functions correctly, in this context probably should use the mangled name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was surely true already of the "has its machine block address taken" clause just above, which I duplicated to make this version with the changed wording?

Are you asking me to do that unrelated fix as part of this same commit? It seems to me that if it needs fixing at all it should be done separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous warning was broken, and this one is too. We probably should remove or rename getName, it's usually not appropriate to use outside of debugger contexts. Can fix separately

return false;
}
}

if (I != ClonePath.size() - 1 && !PathBB->empty() &&
6 changes: 5 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Original file line number Diff line number Diff line change
@@ -3399,7 +3399,11 @@ void SelectionDAGBuilder::visitCallBr(const CallBrInst &I) {
BasicBlock *Dest = I.getIndirectDest(i);
MachineBasicBlock *Target = FuncInfo.getMBB(Dest);
Target->setIsInlineAsmBrIndirectTarget();
Target->setMachineBlockAddressTaken();
// If we introduce a type of asm goto statement that is permitted to use an
// indirect call instruction to jump to its labels, then we should add a
// call to Target->setMachineBlockAddressTaken() here, to mark the target
// block as requiring a BTI.

Target->setLabelMustBeEmitted();
// Don't add duplicate machine successors.
if (Dests.insert(Dest).second)
3 changes: 2 additions & 1 deletion llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
Original file line number Diff line number Diff line change
@@ -99,7 +99,8 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {

// If the block itself is address-taken, it could be indirectly branched
// to, but not called.
if (MBB.hasAddressTaken() || JumpTableTargets.count(&MBB))
if (MBB.isMachineBlockAddressTaken() || MBB.isIRBlockAddressTaken() ||
JumpTableTargets.count(&MBB))
CouldJump = true;

if (CouldCall || CouldJump) {
3 changes: 2 additions & 1 deletion llvm/lib/Target/ARM/ARMBranchTargets.cpp
Original file line number Diff line number Diff line change
@@ -77,7 +77,8 @@ bool ARMBranchTargets::runOnMachineFunction(MachineFunction &MF) {
// modes. These modes do not support PACBTI. As a result, BTI instructions
// are not added in the destination blocks.

if (IsFirstBB || MBB.hasAddressTaken() || MBB.isEHPad()) {
if (IsFirstBB || MBB.isMachineBlockAddressTaken() ||
MBB.isIRBlockAddressTaken() || MBB.isEHPad()) {
addBTI(TII, MBB, IsFirstBB);
MadeChange = true;
}
40 changes: 40 additions & 0 deletions llvm/test/CodeGen/AArch64/callbr-asm-label-bti.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Test function which compares two integers and returns the value of
; the overflow flag, by using an asm goto to make the asm block branch
; based on that flag, and then a phi to set the return value based on
; whether the branch was taken.
define i32 @overflow(i64 %a, i64 %b) #0 {
asm:
callbr void asm sideeffect "cmp $0, $1 \0A\09 b.vs ${2:l}",
"r,r,!i,~{cc}"(i64 %a, i64 %b)
to label %fallthrough [label %indirect]

indirect:
br label %fallthrough

fallthrough:
; Return 1 if we came via the 'indirect' block (because the b.vs was
; taken), and 0 if we came straight from the asm block (because it
; was untaken).
%retval = phi i32 [0, %asm], [1, %indirect]
ret i32 %retval
}

; CHECK: overflow:
; CHECK-NEXT: .cfi_startproc
; CHECK-NEXT: // %bb.{{[0-9]+}}:
; CHECK-NEXT: bti c
; CHECK-NEXT: //APP
; CHECK-NEXT: cmp x0, x1
; CHECK-NEXT: b.vs [[LABEL:\.[A-Za-z0-9_]+]]
; CHECK-NEXT: //NO_APP
; CHECK-NEXT: // %bb.{{[0-9]+}}:
; CHECK-NEXT: mov w0, wzr
; CHECK-NEXT: ret
; CHECK-NEXT: [[LABEL]]:
; CHECK-NOT: bti
; CHECK: mov w0, #1
; CHECK-NEXT: ret

attributes #0 = { "branch-target-enforcement" "target-features"="+bti" }
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AArch64/callbr-asm-label.ll
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ define i32 @test1() {
; CHECK: .word b
; CHECK-NEXT: .word .LBB0_2
; CHECK: // %bb.1:
; CHECK: .LBB0_2: // Block address taken
; CHECK: .LBB0_2: // Inline asm indirect target
entry:
callbr void asm sideeffect "1:\0A\09.word b, ${0:l}\0A\09", "!i"()
to label %cleanup [label %indirect]
@@ -31,7 +31,7 @@ entry:
if.then:
; CHECK: .word b
; CHECK-NEXT: .word .LBB1_3
; CHECK: .LBB1_3: // Block address taken
; CHECK: .LBB1_3: // Inline asm indirect target
callbr void asm sideeffect "1:\0A\09.word b, ${0:l}\0A\09", "!i"()
to label %if.then4 [label %if.end6]

@@ -46,7 +46,7 @@ if.end6:
br i1 %phitmp, label %if.end10, label %if.then9

if.then9:
; CHECK: .LBB1_5: // Block address taken
; CHECK: .LBB1_5: // Inline asm indirect target
callbr void asm sideeffect "", "!i"()
to label %if.end10 [label %l_yes]

30 changes: 15 additions & 15 deletions llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@ define i32 @test0() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %5
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.indirect_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.indirect_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %5
@@ -35,7 +35,7 @@ define i32 @test0() {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %7
; CHECK-NEXT: B %bb.4
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3.direct.indirect_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.3.direct.indirect_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %7
@@ -87,7 +87,7 @@ define i32 @dont_split0() {
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY $wzr
; CHECK-NEXT: $w0 = COPY [[COPY]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -116,7 +116,7 @@ define i32 @dont_split1() {
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: $w0 = COPY %1
; CHECK-NEXT: RET_ReallyLR implicit $w0
entry:
@@ -147,7 +147,7 @@ define i32 @dont_split2() {
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY $wzr
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY [[COPY1]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.0, [[COPY2]], %bb.1
; CHECK-NEXT: $w0 = COPY [[PHI]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -174,7 +174,7 @@ define i32 @dont_split3() {
; CHECK-NEXT: bb.1.x:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.v (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.v (inlineasm-br-indirect-target):
; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 42
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
@@ -198,7 +198,7 @@ define i32 @split_me0() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.y_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.y_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
@@ -248,7 +248,7 @@ define i32 @split_me1(i1 %z) {
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %5
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.w.v_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.w.v_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %5
@@ -301,7 +301,7 @@ define i32 @split_me2(i1 %z) {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %6
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.w.v_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.w.v_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %6
@@ -349,7 +349,7 @@ define i32 @dont_split4() {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
@@ -383,7 +383,7 @@ define i32 @dont_split5() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.y (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
@@ -414,7 +414,7 @@ define i32 @split_me3() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.out_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.out_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
@@ -460,7 +460,7 @@ define i32 @dont_split6(i32 %0) {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %4
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.loop.loop_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.loop.loop_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %4
@@ -495,7 +495,7 @@ define i32 @split_me4() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.same_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.same_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
@@ -526,7 +526,7 @@ define i32 @split_me5() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.same_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.same_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@ define void @strncpy_from_kernel_nofault_count() {
; CHECK-NEXT: bb.2.Efault:
; CHECK-NEXT: BLR8 implicit $lr8, implicit $rm
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3.Efault.split (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.3.Efault.split (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: STB %1, 0, $zero8 :: (store (s8) into `ptr null`)
Loading
Oops, something went wrong.
Loading
Oops, something went wrong.