Skip to content

[SelectionDAG] Not issue TRAP node if naked function #132147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

yonghong-song
Copy link
Contributor

@yonghong-song yonghong-song commented Mar 20, 2025

In [1], Nikita Popov suggested that during lowering 'unreachable' insn should not generate extra code for naked functions, and this applies to all architectures. Note that for naked functions, 'unreachable' insn is necessary in IR since the basic block needs a terminator to end.

This patch checked whether a function is naked function or not. If it is a naked function, 'unreachable' insn will not generate ISD::TRAP.

[1] #131731

@yonghong-song yonghong-song added the llvm:SelectionDAG SelectionDAGISel as well label Mar 20, 2025
@yonghong-song yonghong-song requested a review from nikic March 20, 2025 04:26
@yonghong-song
Copy link
Contributor Author

cc @4ast @eddyz87

@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: None (yonghong-song)

Changes

In [1], Nikita Popov suggested that during lowering 'unreachable' insn should not generate extra code for naked functions, and this applies to all architectures. Note that for naked functions, 'unreachable' insn is necessary in IR since the basic block needs a terminator to end.

This patch checked whether a function is naked function or not. If it is a naked function, 'unreachable' insn will not generate ISD::TRAP.

I put the RFC in the subject as I am not sure how to find a test which can validate the change in FastISel.cpp. Any advice is welcome.

[1] #131731


Full diff: https://github.com/llvm/llvm-project/pull/132147.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/FastISel.cpp (+3)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+3)
  • (added) llvm/test/CodeGen/X86/naked-fn-with-unreachable-trap.ll (+11)
diff --git a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
index 33dd039c4ab2a..1bbb704fc0774 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
@@ -1862,6 +1862,9 @@ bool FastISel::selectOperator(const User *I, unsigned Opcode) {
           return true;
       }
 
+      if (cast<Instruction>(I)->getFunction()->hasFnAttribute(Attribute::Naked))
+        return true;
+
       return fastEmit_(MVT::Other, MVT::Other, ISD::TRAP) != 0;
     }
     return true;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 14bb1d943d2d6..6480135a3b929 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3514,6 +3514,9 @@ void SelectionDAGBuilder::visitUnreachable(const UnreachableInst &I) {
       return;
   }
 
+  if (I.getFunction()->hasFnAttribute(Attribute::Naked))
+    return;
+
   DAG.setRoot(DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, DAG.getRoot()));
 }
 
diff --git a/llvm/test/CodeGen/X86/naked-fn-with-unreachable-trap.ll b/llvm/test/CodeGen/X86/naked-fn-with-unreachable-trap.ll
new file mode 100644
index 0000000000000..94274fcb1c160
--- /dev/null
+++ b/llvm/test/CodeGen/X86/naked-fn-with-unreachable-trap.ll
@@ -0,0 +1,11 @@
+; RUN: llc -o - %s -mtriple=x86_64-linux-gnu -trap-unreachable | FileCheck %s
+; RUN: llc -o - %s -mtriple=x86_64-linux-gnu -trap-unreachable -fast-isel | FileCheck %s
+
+define dso_local void @foo() #0 {
+entry:
+  tail call void asm sideeffect "movl 3,%eax", "~{dirflag},~{fpsr},~{flags}"()
+  unreachable
+}
+; CHECK-NOT: ud2
+
+attributes #0 = { naked }

@yonghong-song yonghong-song force-pushed the naked-func-no-unreachable-isel branch from 6b2399b to 63e3e60 Compare March 20, 2025 05:45
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code also exists in GlobalISel (IRTranslator::translateUnreachable). I think we have reached the point where we should extract these 3 implementations of the same checks into a helper, to make sure it stays consistent.

@nikic nikic requested review from arsenm and efriedma-quic March 20, 2025 10:18
@yonghong-song
Copy link
Contributor Author

This code also exists in GlobalISel (IRTranslator::translateUnreachable). I think we have reached the point where we should extract these 3 implementations of the same checks into a helper, to make sure it stays consistent.

Good point. I missed GlobalISel. I will try to have a common helper for all these three cases.

@yonghong-song yonghong-song force-pushed the naked-func-no-unreachable-isel branch from 63e3e60 to c75fc1b Compare March 20, 2025 18:06
@yonghong-song yonghong-song changed the title [RFC][SelectionDAG] Not issue TRAP node if naked function [SelectionDAG] Not issue TRAP node if naked function Mar 20, 2025
Copy link

github-actions bot commented Mar 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -4497,6 +4497,9 @@ class UnreachableInst : public Instruction {
return isa<Instruction>(V) && classof(cast<Instruction>(V));
}

// Whether to do target lowering in SelectionDAG.
bool allowISelLowering(bool TrapUnreachable, bool NoTrapAfterNoreturn) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really confusing name... why not shouldLowerToTrap(), or something like that?

Otherwise LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Will change the func name as suggested.

In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies
to all architectures. Note that for naked functions, 'unreachable'
insn is necessary in IR since the basic block needs a terminator to end.

This patch checked whether a function is naked function or not.
If it is a naked function, 'unreachable' insn will not generate
ISD::TRAP.

  [1] llvm#131731
@yonghong-song yonghong-song force-pushed the naked-func-no-unreachable-isel branch from c75fc1b to b395721 Compare March 20, 2025 20:18
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yonghong-song yonghong-song merged commit 0ffe83f into llvm:main Mar 21, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 21, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18081

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/gpupgo/pgo2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# note: command had no output on stdout or stderr
# RUN: at line 2
env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
llvm-profdata show --all-functions --counts      pgo2.c.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="LLVM-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-HOST
# note: command had no output on stdout or stderr
# RUN: at line 7
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.llvm.profraw      | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="LLVM-DEVICE"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 11
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-instr-generate
# note: command had no output on stdout or stderr
# RUN: at line 12
env LLVM_PROFILE_FILE=pgo2.c.clang.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.clang.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 14
llvm-profdata show --all-functions --counts      pgo2.c.clang.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="CLANG-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-HOST
# note: command had no output on stdout or stderr
# RUN: at line 17
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.clang.profraw |      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="CLANG-DEV"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.clang.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=CLANG-DEV
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c:101:15: error: CLANG-DEV: expected string not found in input
# | // CLANG-DEV: Block counts: [11]
# |               ^
# | <stdin>:5:19: note: scanning from here
# |  Function count: 0
...

@aheejin
Copy link
Member

aheejin commented Mar 27, 2025

Sorry for the late comment. I don't think we should do this for WebAssembly. See the comments here:

// WebAssembly type-checks instructions, but a noreturn function with a return
// type that doesn't match the context will cause a check failure. So we lower
// LLVM 'unreachable' to ISD::TRAP and then lower that to WebAssembly's
// 'unreachable' instructions which is meant for that case. Formerly, we also
// needed to add checks to SP failure emission in the instruction selection
// backends, but this has since been tied to TrapUnreachable and is no longer
// necessary.
this->Options.TrapUnreachable = true;

So if we change the return type of naked to i32,

define dso_local i32 @naked() naked "frame-pointer"="all" {
  call void @main()
  unreachable
}

With this PR, the generated .s or .o will not be valid.

$ llc -mtriple wasm32 naked-fn-with-frame-pointer.ll

$ llvm-mc -triple=wasm32-unknown-unknown naked-fn-with-frame-pointer.s
...
naked-fn-with-frame-pointer.s:12:2: error: type mismatch, expected [i32] but got []
        end_function
        ^
...

The reason is, whether the function returns or not, Wasm's validation rule ensures the function's return type matches the type of the value left on the Wasm value stack at the function end (or anytime there is a return instruction). And here naked's return type is changed to i32 but the value stack does not have an i32 on it. But if we have an unreachable instruction at the end, the stack become polymorphic, which allows any types of values be popped. (See https://webassembly.github.io/spec/core/valid/instructions.html about polymorphic stack)

I think this PR should have an exception for Wasm target.

@arsenm
Copy link
Contributor

arsenm commented Mar 27, 2025

I think this PR should have an exception for Wasm target.

I have a couple issues with the current implementation / existence of the TrapOnUnreachable option. First, it's one of the ugly TargetOptions which isn't part of the IR module. In AMDGPUAttributor we try to detect whether any traps are used to enable ABI inputs required for traps on some subtargets, but this option hides them. We can only semi-reliably check for the option there and detect unreachable, as it's not part of the module it depends on consistency of flags between initial optimize and codegen. Plus it assumes later passes will not introduce new unreachables (although if the setting was reliably determinable, we could at least make a conservatively correct check).

Second, this bypasses the ordinary ABI handling you would get for a return. For consistency we should really go through something that looks like LowerReturn for unreachable

@efriedma-quic
Copy link
Collaborator

So if we change the return type of naked to i32,

define dso_local i32 @naked() naked "frame-pointer"="all" {
  call void @main()
  unreachable
}

Can we just say you're not allowed to do this? Anything other than an inline asm statement in a naked function has dubious semantics. Even on x86, the generated code for the given example is broken.

I have a couple issues with the current implementation / existence of the TrapOnUnreachable option

I don't disagree, but everything you're describing is existing issues.

@nikic
Copy link
Contributor

nikic commented Apr 2, 2025

So if we change the return type of naked to i32,

define dso_local i32 @naked() naked "frame-pointer"="all" {
  call void @main()
  unreachable
}

Can we just say you're not allowed to do this? Anything other than an inline asm statement in a naked function has dubious semantics. Even on x86, the generated code for the given example is broken.

Yes, I don't think this is legal code and should really be rejected by the IR verifier. I know that both Rust and Clang essentially limit the contents of naked functions to inline asm. See:

if (FD && FD->hasAttr<NakedAttr>()) {
for (const Stmt *S : Body->children()) {
// Allow local register variables without initializer as they don't
// require prologue.
bool RegisterVariables = false;
if (auto *DS = dyn_cast<DeclStmt>(S)) {
for (const auto *Decl : DS->decls()) {
if (const auto *Var = dyn_cast<VarDecl>(Decl)) {
RegisterVariables =
Var->hasAttr<AsmLabelAttr>() && !Var->hasInit();
if (!RegisterVariables)
break;
}
}
}
if (RegisterVariables)
continue;
if (!isa<AsmStmt>(S) && !isa<NullStmt>(S)) {
Diag(S->getBeginLoc(), diag::err_non_asm_stmt_in_naked_function);
Diag(FD->getAttr<NakedAttr>()->getLocation(), diag::note_attribute);
FD->setInvalidDecl();
break;
}
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants