-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stack protector boot failure in android-4.14-stable #1815
Comments
This is not reproducible without LTO. My LLVM bisect landed on
|
I can repro the report.
https://reviews.llvm.org/D139254 mentions skipping the check for noreturn calls. I didn't see any such calls in the IR, though perhaps LTO is able to deduce calls to functions outside of the TU are noreturn even when they weren't explicitly forward declared as such. Adding I don't see any difference in control flow between LLVM commits (26330e5f5dc7 and d656ae280957). |
enabling KASAN disables LTO, so not reproducible. |
There appears to be a stack guard check in This smells like |
There are actually 2 stack guard checks inserted into |
We discussed adding the attribute before:
Specifically, it looks like perhaps it's the calls to
The comment block in kernel/sched/idle.c
(introduced by commit cf37b6b ("sched/idle: Move cpu/idle.c to sched/idle.c"))
|
diff --git a/init/main.c b/init/main.c
index d50ea3c3473e..3320c7ed2f6a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -535,6 +535,7 @@ static void __init mm_init(void)
pti_init();
}
+__attribute__((no_stack_protector))
asmlinkage __visible void __init start_kernel(void)
{
char *command_line; still panics with another stack check failure:
Probably need the attribute on |
The call to |
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 66f2a950935a..821a9ed2d339 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -210,6 +210,7 @@ static int enable_start_cpu0;
/*
* Activate a secondary processor.
*/
+__attribute__((no_stack_protector))
static void notrace start_secondary(void *unused)
{
/*
diff --git a/init/main.c b/init/main.c
index d50ea3c3473e..3320c7ed2f6a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -535,6 +535,7 @@ static void __init mm_init(void)
pti_init();
}
+__attribute__((no_stack_protector))
asmlinkage __visible void __init start_kernel(void)
{
char *command_line; allows us to boot (though this should get wired up properly in include/linux/compiler_attributes.h). I'll need to think hard about whether we want to remove |
Here's my understanding of the bug (and a proto commit message): If LLVM can see the definition of clang-16 will now check the stack protector before calling noreturn functions. This behavior differs from GCC. When Back during the discussion that ultimately led to commit a9a3ed1 ("x86: Fix early boot crash on gcc-10, third try") it was obvious that a true solution required marking each caller of Now that this option exists in both toolchains, we should start looking at how best to use it to better express intent, thus rolling back commit a9a3ed1 ("x86: Fix early boot crash on gcc-10, third try") at least when newer toolchains are in use. We might want to do this in separate steps in order to fix this on branches of stable. Another potential cleanup here is that it appears that Cc: stable@vger.kernel.org |
commit d4e5268 ("x86,objtool: Mark cpu_startup_entry() __noreturn") marked When building mainline with LTO, I only see the call to panic as noreturn. I wonder why TODO: write a test for ...
902 ; Function Attrs: fn_ret_thunk_extern noinline noredzone noreturn nounwind null_pointer_is_valid sspstrong
903 define hidden void @rest_init() local_unnamed_addr #6 section ".ref.text" align 16 {
...
1230 ; Function Attrs: cold fn_ret_thunk_extern noredzone nounwind null_pointer_is_valid optsize sspstrong
1231 define weak hidden void @arch_call_rest_init() local_unnamed_addr #4 section ".init.text" align 16 {
1232 entry:
1233 tail call void @rest_init() #29
1234 unreachable
1235 } in llvm/unittests/Transforms/IPO/AttributorTest.cpp for |
Yi Kong mentions there's |
Seems it's |
Looks like because |
Yeah, if I remove the check for |
https://reviews.llvm.org/D147177 All that said, it would be weird and bad if a weak definition was noreturn but the strong one was not. Perhaps Testing
Disassembling |
That seems to be only with https://reviews.llvm.org/D147177 applied though. |
It appears that https://lore.kernel.org/cover.1680912057.git.jpoimboe@kernel.org/ will expose this on mainline:
|
…functions https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 introduced a additional checks before calling noreturn functions in response to this security paper related to Catch Handler Oriented Programming (CHOP): https://download.vusec.net/papers/chop_ndss23.pdf See also: https://bugs.chromium.org/p/llvm/issues/detail?id=30 This causes stack canaries to be inserted in C code which was unexpected; we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself). ClangBuiltLinux/linux#1815 There is no point checking the stack canary like this when exceptions are disabled (-fno-exceptions or function is marked noexcept) or for C code. The GCC patch for this issue does something similar: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7 Android measured a 2% regression in RSS as a result of d656ae2 and undid it globally: https://android-review.googlesource.com/c/platform/build/soong/+/2524336 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D147975
…functions https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 introduced a additional checks before calling noreturn functions in response to this security paper related to Catch Handler Oriented Programming (CHOP): https://download.vusec.net/papers/chop_ndss23.pdf See also: https://bugs.chromium.org/p/llvm/issues/detail?id=30 This causes stack canaries to be inserted in C code which was unexpected; we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself). ClangBuiltLinux/linux#1815 There is no point checking the stack canary like this when exceptions are disabled (-fno-exceptions or function is marked noexcept) or for C code. The GCC patch for this issue does something similar: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7 Android measured a 2% regression in RSS as a result of d656ae2 and undid it globally: https://android-review.googlesource.com/c/platform/build/soong/+/2524336 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D147975 (cherry picked from commit fc4494d)
…functions https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 introduced a additional checks before calling noreturn functions in response to this security paper related to Catch Handler Oriented Programming (CHOP): https://download.vusec.net/papers/chop_ndss23.pdf See also: https://bugs.chromium.org/p/llvm/issues/detail?id=30 This causes stack canaries to be inserted in C code which was unexpected; we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself). ClangBuiltLinux/linux#1815 There is no point checking the stack canary like this when exceptions are disabled (-fno-exceptions or function is marked noexcept) or for C code. The GCC patch for this issue does something similar: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7 Android measured a 2% regression in RSS as a result of d656ae2 and undid it globally: https://android-review.googlesource.com/c/platform/build/soong/+/2524336 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D147975 (cherry picked from commit fc4494d)
The llvm-16 build is still failing because the llvm-16 build is from April 4. The backport was merged April 17 (linked above): Leaving open until we confirm clang-16 boots are fixed. Hopefully the kernel patches get picked up too, but that's not a blocker for closing this out. |
apt.llvm.org is only a couple of days behind, I assume we will get a new rebuild here soon.
We may have to request an explicit update to |
On both aarch64 and x86_64:
I have requested that TuxMake and TuxSuite be updated to explicitly include this: https://gitlab.com/Linaro/tuxmake/-/issues/202 I think this can be closed now but I'll let someone else do that if they agree with my assessment. |
TuxMake containers are updated so our builds should go back to green shortly: https://gitlab.com/Linaro/tuxmake/-/issues/202#note_1361045827 Closing this out. |
Linux v6.4-rc1 built with Clang versions <= 16 with stack protector enabled panic with the following stack trace: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0xd8a/0xd90 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc1-00042-g9ea7e6b62c2b-dirty google#106 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 dump_stack_lvl+0x1bc/0x250 lib/dump_stack.c:106 dump_stack+0x1e/0x20 lib/dump_stack.c:113 panic+0x4cd/0xc10 kernel/panic.c:340 __stack_chk_fail+0x18/0x20 kernel/panic.c:759 start_kernel+0xd8a/0xd90 init/main.c:? x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:556 x86_64_start_kernel+0x118/0x120 arch/x86/kernel/head64.c:537 secondary_startup_64_no_verify+0xcf/0xdb arch/x86/kernel/head_64.S:358 </TASK> ClangBuiltLinux/linux#1815 describes the problem, which is fixed on the Clang side (https://reviews.llvm.org/D147975), but before the fix reaches syzbot we'll have to keep the stack protector disabled.
Linux v6.4-rc1 built with Clang versions <= 16 with stack protector enabled panic with the following stack trace: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0xd8a/0xd90 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc1-00042-g9ea7e6b62c2b-dirty #106 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 dump_stack_lvl+0x1bc/0x250 lib/dump_stack.c:106 dump_stack+0x1e/0x20 lib/dump_stack.c:113 panic+0x4cd/0xc10 kernel/panic.c:340 __stack_chk_fail+0x18/0x20 kernel/panic.c:759 start_kernel+0xd8a/0xd90 init/main.c:? x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:556 x86_64_start_kernel+0x118/0x120 arch/x86/kernel/head64.c:537 secondary_startup_64_no_verify+0xcf/0xdb arch/x86/kernel/head_64.S:358 </TASK> ClangBuiltLinux/linux#1815 describes the problem, which is fixed on the Clang side (https://reviews.llvm.org/D147975), but before the fix reaches syzbot we'll have to keep the stack protector disabled.
…functions https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 introduced a additional checks before calling noreturn functions in response to this security paper related to Catch Handler Oriented Programming (CHOP): https://download.vusec.net/papers/chop_ndss23.pdf See also: https://bugs.chromium.org/p/llvm/issues/detail?id=30 This causes stack canaries to be inserted in C code which was unexpected; we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself). ClangBuiltLinux/linux#1815 There is no point checking the stack canary like this when exceptions are disabled (-fno-exceptions or function is marked noexcept) or for C code. The GCC patch for this issue does something similar: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7 Android measured a 2% regression in RSS as a result of d656ae2 and undid it globally: https://android-review.googlesource.com/c/platform/build/soong/+/2524336 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D147975
Need https://gitlab.com/Linaro/tuxmake/-/merge_requests/306 to update the android containers, too. |
very strange, there's a similar-ish boot failure: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4977144432/jobs/8906720088#step:5:4501 but:
|
llvm/llvm-project-release-prs@cbd175f was the fix merged May 1, so 16.0.3 from 4/19 is not new enough. |
https://gitlab.com/Linaro/tuxmake/-/issues/202#note_1392060006 mentions that clang-16 tuxmake containers have been upgraded to 16.0.4. Will close this out once verified in CI. |
Linux v6.4-rc1 built with Clang versions <= 16 with stack protector enabled panic with the following stack trace: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0xd8a/0xd90 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc1-00042-g9ea7e6b62c2b-dirty google#106 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 dump_stack_lvl+0x1bc/0x250 lib/dump_stack.c:106 dump_stack+0x1e/0x20 lib/dump_stack.c:113 panic+0x4cd/0xc10 kernel/panic.c:340 __stack_chk_fail+0x18/0x20 kernel/panic.c:759 start_kernel+0xd8a/0xd90 init/main.c:? x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:556 x86_64_start_kernel+0x118/0x120 arch/x86/kernel/head64.c:537 secondary_startup_64_no_verify+0xcf/0xdb arch/x86/kernel/head_64.S:358 </TASK> ClangBuiltLinux/linux#1815 describes the problem, which is fixed on the Clang side (https://reviews.llvm.org/D147975), but before the fix reaches syzbot we'll have to keep the stack protector disabled.
…functions https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 introduced a additional checks before calling noreturn functions in response to this security paper related to Catch Handler Oriented Programming (CHOP): https://download.vusec.net/papers/chop_ndss23.pdf See also: https://bugs.chromium.org/p/llvm/issues/detail?id=30 This causes stack canaries to be inserted in C code which was unexpected; we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself). ClangBuiltLinux/linux#1815 There is no point checking the stack canary like this when exceptions are disabled (-fno-exceptions or function is marked noexcept) or for C code. The GCC patch for this issue does something similar: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7 Android measured a 2% regression in RSS as a result of d656ae2 and undid it globally: https://android-review.googlesource.com/c/platform/build/soong/+/2524336 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D147975
CI reports:
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4423392922/jobs/7758026839
It can be reproduced at
llvmorg-16.0.0-rc4
.This appears to be an LLVM 16 regression, as it is not seen with LLVM 15. It could also be related to LTO but I have not tried that yet since I just wanted to get the initial report out there.
The text was updated successfully, but these errors were encountered: