Skip to content

[ARM,AArch64] Don't put BTI at asm goto branch targets #141562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 3, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions clang/docs/LanguageExtensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2042,6 +2042,17 @@ references can be used instead of numeric references.
return -1;
}

ASM Goto versus Branch Target Enforcement
=========================================

Some target architectures implement branch target enforcement, by requiring
indirect (register-controlled) branch instructions to jump only to locations
marked by a special instruction (such as AArch64 ``bti``).

The assembler code inside an ``asm goto`` statement is expected not to use a
branch instruction of that kind to transfer control to any of its destination
labels. Therefore, using a label in an ``asm goto`` statement does not cause
clang to put a ``bti`` or equivalent instruction at the label.

Constexpr strings in GNU ASM statements
=======================================
Expand Down
5 changes: 5 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,11 @@ Potentially Breaking Changes
- For ARM targets when compiling assembly files, the features included in the selected CPU
or Architecture's FPU are included. If you wish not to use a specific feature,
the relevant ``+no`` option will need to be amended to the command line option.
- When compiling with branch target enforcement, ``asm goto``
statements will no longer guarantee to place a ``bti`` or
``endbr64`` instruction at the labels named as possible branch
destinations, so it is not safe to use a register-controlled branch
instruction to branch to one. (In line with gcc.)

C/C++ Language Potentially Breaking Changes
-------------------------------------------
Expand Down
8 changes: 8 additions & 0 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9596,6 +9596,14 @@ may not be equal to the address provided for the same block to this
instruction's ``indirect labels`` operand. The assembly code may only transfer
control to addresses provided via this instruction's ``indirect labels``.

On target architectures that implement branch target enforcement by requiring
indirect (register-controlled) branch instructions to jump only to locations
marked by a special instruction (such as AArch64 ``bti``), the called code is
expected not to use such an indirect branch to transfer control to the
locations in ``indirect labels``. Therefore, including a label in the
``indirect labels`` of a ``callbr`` does not require the compiler to put a
``bti`` or equivalent instruction at the label.

Arguments:
""""""""""

Expand Down
6 changes: 6 additions & 0 deletions llvm/docs/ReleaseNotes.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,12 @@ Changes to the LLVM IR
`llvm.load.relative`.
* Inline asm calls no longer accept ``label`` arguments. Use ``callbr`` instead.

* Updated semantics of the `callbr` instruction to clarify that its
'indirect labels' are not expected to be reached by indirect (as in
register-controlled) branch instructions, and therefore are not
guaranteed to start with a `bti` or `endbr64` instruction, where
those exist.

Changes to LLVM infrastructure
------------------------------

Expand Down
17 changes: 16 additions & 1 deletion llvm/include/llvm/CodeGen/MachineBasicBlock.h
Original file line number Diff line number Diff line change
Expand Up @@ -275,18 +275,33 @@ class MachineBasicBlock
/// of a terminator, exception-handling target, or jump table. This is
/// either the result of an IR-level "blockaddress", or some form
/// of target-specific branch lowering.
///
/// The name of this function `hasAddressTaken` implies that the address of
/// the block is known and used in a general sense, but not necessarily that
/// the address is used by an indirect branch instruction. So branch target
/// enforcement need not put a BTI instruction (or equivalent) at the start
/// of a block just because this function returns true. The decision about
/// whether to add a BTI can be more subtle than that, and depends on the
/// more detailed checks that this function aggregates together.
bool hasAddressTaken() const {
return MachineBlockAddressTaken || AddressTakenIRBlock;
return MachineBlockAddressTaken || AddressTakenIRBlock ||
IsInlineAsmBrIndirectTarget;
}

/// Test whether this block is used as something other than the target of a
/// terminator, exception-handling target, jump table, or IR blockaddress.
/// For example, its address might be loaded into a register, or
/// stored in some branch table that isn't part of MachineJumpTableInfo.
///
/// If this function returns true, it _does_ mean that branch target
/// enforcement needs to put a BTI or equivalent at the start of the block.
bool isMachineBlockAddressTaken() const { return MachineBlockAddressTaken; }

/// Test whether this block is the target of an IR BlockAddress. (There can
/// more than one MBB associated with an IR BB where the address is taken.)
///
/// If this function returns true, it _does_ mean that branch target
/// enforcement needs to put a BTI or equivalent at the start of the block.
bool isIRBlockAddressTaken() const { return AddressTakenIRBlock; }

/// Retrieves the BasicBlock which corresponds to this MachineBasicBlock.
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4330,6 +4330,8 @@ void AsmPrinter::emitBasicBlockStart(const MachineBasicBlock &MBB) {
OutStreamer->emitLabel(Sym);
} else if (isVerbose() && MBB.isMachineBlockAddressTaken()) {
OutStreamer->AddComment("Block address taken");
} else if (isVerbose() && MBB.isInlineAsmBrIndirectTarget()) {
OutStreamer->AddComment("Inline asm indirect target");
}

// Print some verbose block comments.
Expand Down
11 changes: 9 additions & 2 deletions llvm/lib/CodeGen/BasicBlockPathCloning.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -121,14 +121,21 @@ bool IsValidCloning(const MachineFunction &MF,
}
if (PathBB->isMachineBlockAddressTaken()) {
// Avoid cloning blocks which have their address taken since we can't
// rewire branches to those blocks as easily (e.g., branches within
// inline assembly).
// rewire branches to those blocks as easily.
WithColor::warning()
<< "block #" << BBID
<< " has its machine block address taken in function "
<< MF.getName() << "\n";
return false;
}
if (PathBB->isInlineAsmBrIndirectTarget()) {
// Similarly for branches to the block within an asm goto.
WithColor::warning()
<< "block #" << BBID
<< " is a branch target of an 'asm goto' in function "
<< MF.getName() << "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't handle anonymous functions correctly, in this context probably should use the mangled name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was surely true already of the "has its machine block address taken" clause just above, which I duplicated to make this version with the changed wording?

Are you asking me to do that unrelated fix as part of this same commit? It seems to me that if it needs fixing at all it should be done separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous warning was broken, and this one is too. We probably should remove or rename getName, it's usually not appropriate to use outside of debugger contexts. Can fix separately

return false;
}
}

if (I != ClonePath.size() - 1 && !PathBB->empty() &&
Expand Down
6 changes: 5 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3399,7 +3399,11 @@ void SelectionDAGBuilder::visitCallBr(const CallBrInst &I) {
BasicBlock *Dest = I.getIndirectDest(i);
MachineBasicBlock *Target = FuncInfo.getMBB(Dest);
Target->setIsInlineAsmBrIndirectTarget();
Target->setMachineBlockAddressTaken();
// If we introduce a type of asm goto statement that is permitted to use an
// indirect call instruction to jump to its labels, then we should add a
// call to Target->setMachineBlockAddressTaken() here, to mark the target
// block as requiring a BTI.

Target->setLabelMustBeEmitted();
// Don't add duplicate machine successors.
if (Dests.insert(Dest).second)
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,8 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {

// If the block itself is address-taken, it could be indirectly branched
// to, but not called.
if (MBB.hasAddressTaken() || JumpTableTargets.count(&MBB))
if (MBB.isMachineBlockAddressTaken() || MBB.isIRBlockAddressTaken() ||
JumpTableTargets.count(&MBB))
CouldJump = true;

if (CouldCall || CouldJump) {
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Target/ARM/ARMBranchTargets.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,8 @@ bool ARMBranchTargets::runOnMachineFunction(MachineFunction &MF) {
// modes. These modes do not support PACBTI. As a result, BTI instructions
// are not added in the destination blocks.

if (IsFirstBB || MBB.hasAddressTaken() || MBB.isEHPad()) {
if (IsFirstBB || MBB.isMachineBlockAddressTaken() ||
MBB.isIRBlockAddressTaken() || MBB.isEHPad()) {
addBTI(TII, MBB, IsFirstBB);
MadeChange = true;
}
Expand Down
40 changes: 40 additions & 0 deletions llvm/test/CodeGen/AArch64/callbr-asm-label-bti.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s

; Test function which compares two integers and returns the value of
; the overflow flag, by using an asm goto to make the asm block branch
; based on that flag, and then a phi to set the return value based on
; whether the branch was taken.
define i32 @overflow(i64 %a, i64 %b) #0 {
asm:
callbr void asm sideeffect "cmp $0, $1 \0A\09 b.vs ${2:l}",
"r,r,!i,~{cc}"(i64 %a, i64 %b)
to label %fallthrough [label %indirect]

indirect:
br label %fallthrough

fallthrough:
; Return 1 if we came via the 'indirect' block (because the b.vs was
; taken), and 0 if we came straight from the asm block (because it
; was untaken).
%retval = phi i32 [0, %asm], [1, %indirect]
ret i32 %retval
}

; CHECK: overflow:
; CHECK-NEXT: .cfi_startproc
; CHECK-NEXT: // %bb.{{[0-9]+}}:
; CHECK-NEXT: bti c
; CHECK-NEXT: //APP
; CHECK-NEXT: cmp x0, x1
; CHECK-NEXT: b.vs [[LABEL:\.[A-Za-z0-9_]+]]
; CHECK-NEXT: //NO_APP
; CHECK-NEXT: // %bb.{{[0-9]+}}:
; CHECK-NEXT: mov w0, wzr
; CHECK-NEXT: ret
; CHECK-NEXT: [[LABEL]]:
; CHECK-NOT: bti
; CHECK: mov w0, #1
; CHECK-NEXT: ret

attributes #0 = { "branch-target-enforcement" "target-features"="+bti" }
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AArch64/callbr-asm-label.ll
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ define i32 @test1() {
; CHECK: .word b
; CHECK-NEXT: .word .LBB0_2
; CHECK: // %bb.1:
; CHECK: .LBB0_2: // Block address taken
; CHECK: .LBB0_2: // Inline asm indirect target
entry:
callbr void asm sideeffect "1:\0A\09.word b, ${0:l}\0A\09", "!i"()
to label %cleanup [label %indirect]
Expand All @@ -31,7 +31,7 @@ entry:
if.then:
; CHECK: .word b
; CHECK-NEXT: .word .LBB1_3
; CHECK: .LBB1_3: // Block address taken
; CHECK: .LBB1_3: // Inline asm indirect target
callbr void asm sideeffect "1:\0A\09.word b, ${0:l}\0A\09", "!i"()
to label %if.then4 [label %if.end6]

Expand All @@ -46,7 +46,7 @@ if.end6:
br i1 %phitmp, label %if.end10, label %if.then9

if.then9:
; CHECK: .LBB1_5: // Block address taken
; CHECK: .LBB1_5: // Inline asm indirect target
callbr void asm sideeffect "", "!i"()
to label %if.end10 [label %l_yes]

Expand Down
30 changes: 15 additions & 15 deletions llvm/test/CodeGen/AArch64/callbr-asm-outputs-indirect-isel.ll
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ define i32 @test0() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %5
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.indirect_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.indirect_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %5
Expand All @@ -35,7 +35,7 @@ define i32 @test0() {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %7
; CHECK-NEXT: B %bb.4
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3.direct.indirect_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.3.direct.indirect_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %7
Expand Down Expand Up @@ -87,7 +87,7 @@ define i32 @dont_split0() {
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY $wzr
; CHECK-NEXT: $w0 = COPY [[COPY]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
Expand Down Expand Up @@ -116,7 +116,7 @@ define i32 @dont_split1() {
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: $w0 = COPY %1
; CHECK-NEXT: RET_ReallyLR implicit $w0
entry:
Expand Down Expand Up @@ -147,7 +147,7 @@ define i32 @dont_split2() {
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY $wzr
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY [[COPY1]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.0, [[COPY2]], %bb.1
; CHECK-NEXT: $w0 = COPY [[PHI]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
Expand All @@ -174,7 +174,7 @@ define i32 @dont_split3() {
; CHECK-NEXT: bb.1.x:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.v (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.v (inlineasm-br-indirect-target):
; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 42
; CHECK-NEXT: $w0 = COPY [[MOVi32imm]]
; CHECK-NEXT: RET_ReallyLR implicit $w0
Expand All @@ -198,7 +198,7 @@ define i32 @split_me0() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.y_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.y_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down Expand Up @@ -248,7 +248,7 @@ define i32 @split_me1(i1 %z) {
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %5
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.w.v_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.w.v_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %5
Expand Down Expand Up @@ -301,7 +301,7 @@ define i32 @split_me2(i1 %z) {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %6
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.w.v_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.w.v_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %6
Expand Down Expand Up @@ -349,7 +349,7 @@ define i32 @dont_split4() {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.y (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down Expand Up @@ -383,7 +383,7 @@ define i32 @dont_split5() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.y (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.y (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down Expand Up @@ -414,7 +414,7 @@ define i32 @split_me3() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.out_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.out_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down Expand Up @@ -460,7 +460,7 @@ define i32 @dont_split6(i32 %0) {
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr32all = COPY %4
; CHECK-NEXT: B %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.loop.loop_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.2.loop.loop_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr32all = COPY %4
Expand Down Expand Up @@ -495,7 +495,7 @@ define i32 @split_me4() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.same_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.same_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down Expand Up @@ -526,7 +526,7 @@ define i32 @split_me5() {
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32all = COPY %3
; CHECK-NEXT: B %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.entry.same_crit_edge (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.1.entry.same_crit_edge (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32all = COPY %3
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ define void @strncpy_from_kernel_nofault_count() {
; CHECK-NEXT: bb.2.Efault:
; CHECK-NEXT: BLR8 implicit $lr8, implicit $rm
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3.Efault.split (machine-block-address-taken, inlineasm-br-indirect-target):
; CHECK-NEXT: bb.3.Efault.split (inlineasm-br-indirect-target):
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: STB %1, 0, $zero8 :: (store (s8) into `ptr null`)
Expand Down
Loading
Loading