Clang Full LTO broken on Linux 5.14.6 with LLVM/Clang 14-git 84b07c9b #1460
Can you provide the current LLVM SHA that's known bad, and the prior known good version? That should give us a regression window that might be highly relevant to minimize the time spent bisecting, and get something reverted upstream. You should be able to use |
Alright, I spent a ton of time yesterday trying to bisect LLVM before concluding that this appears to be a latent issue that was only recently exposed by a commit going into stable: I could reproduce it with LLVM builds going back to July, so I believe your LLVM update is not much of a factor. It might be helpful in future initial reports to isolate which update truly caused the issue. For the record, these are the changed flags:
diff --git a/Makefile b/Makefile
index f9c8bbf8cf71..f68b9602ceaf 100644
--- a/Makefile
+++ b/Makefile
@@ -433,13 +433,13 @@ HOSTCXX = g++
endif
export KBUILD_USERCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes \
- -O2 -fomit-frame-pointer -std=gnu89
-export KBUILD_USERLDFLAGS :=
+ -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=24 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-opt-fusion=max -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -std=gnu2x
+export KBUILD_USERLDFLAGS := -Wl,-O3 -Wl,--as-needed -Wl,-Bsymbolic-functions
KBUILD_HOSTCFLAGS := $(KBUILD_USERCFLAGS) $(HOST_LFS_CFLAGS) $(HOSTCFLAGS)
-KBUILD_HOSTCXXFLAGS := -Wall -O2 $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS)
-KBUILD_HOSTLDFLAGS := $(HOST_LFS_LDFLAGS) $(HOSTLDFLAGS)
-KBUILD_HOSTLDLIBS := $(HOST_LFS_LIBS) $(HOSTLDLIBS)
+KBUILD_HOSTCXXFLAGS := -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=24 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-opt-fusion=max -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS)
+KBUILD_HOSTLDFLAGS := -Wl,-O3 -Wl,--as-needed -Wl,-Bsymbolic-functions $(HOST_LFS_LDFLAGS) $(HOSTLDFLAGS)
+KBUILD_HOSTLDLIBS := -Wl,-O3 -Wl,--as-needed -Wl,-Bsymbolic-functions $(HOST_LFS_LIBS) $(HOSTLDLIBS)
# Make variables (CC, etc...)
CPP = $(CC) -E
@@ -484,10 +484,10 @@ ZSTD = zstd
CHECKFLAGS := -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ \
-Wbitwise -Wno-return-void -Wno-unknown-attribute $(CF)
NOSTDINC_FLAGS :=
-CFLAGS_MODULE =
+CFLAGS_MODULE = -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=24 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-opt-fusion=max -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition
AFLAGS_MODULE =
LDFLAGS_MODULE =
-CFLAGS_KERNEL =
+CFLAGS_KERNEL = -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=24 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-opt-fusion=max -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition
AFLAGS_KERNEL =
LDFLAGS_vmlinux =
@@ -766,6 +766,7 @@ ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
KBUILD_CFLAGS += -O2
else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
KBUILD_CFLAGS += -O3
+KBUILD_CFLAGS += $(call cc-option, -fno-tree-loop-vectorize)
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
endif
They are necessary for reproducing this issue; I do not see it without them.
It does appear that LTO is not strictly necessary to reproduce an issue:
I was only able to reproduce the initial issue with an LLVM build without assertions. If I enable assertions, it fails much earlier:
I'll reduce that in a bit. |
|
I hope that, thanks to Nathan's detective work, that is no longer necessary, as I would need to guess anyway. For future bug reports, I'll keep in mind to note the exact version I am testing with to save you some time. |
Sorry, was running out the door for lunch when I wrote that, I should have clarified that we are talking about two different drivers here and two different errors but with the same config and similar flags so the issues might be related? I can spin off this part of the report into a separate issue if you want but I will post it here for now. The reproducer does definitely have
|
As I have been using Polly to build my Kernel for quite some time now (it works fine, with some exceptions as mentioned in #1396), I am more than willing to give you my Tested-by if it helps the upstream process. To throw some numbers into the mix: my heavily tweaked Kernel provides a 2.38x performance improvement over the default distro Kernel in my standard game benchmark (Company of Heroes 2, which is heavily CPU-bound; built-in benchmark, 90.5 fps vs. 38 fps, tested on current Manjaro, but testing on Ubuntu and openSUSE Tumbleweed revealed similar numbers). But Polly is only a smaller part of the equation: I can also get most of my improvements on a mainline Kernel with my custom config and some more modest CFLAGS, even with GCC. Hence I suspect that my configuration tweaks and march=native provide most of the benefits, but I haven't had the time for a deeper performance analysis yet. |
This looks like a compiler bug specific to polly ( |
Should I file a bug report on the LLVM bugtracker for this? I just got an account and I've got a couple of other LTO/Polly related findings to share there... |
Sure; I'd recommend familiarizing yourself with creduce or cvise to provide concise reproducers before you go and file bugs. Bug reports that ask toolchain devs to clone a bunch of sources probably won't get much feedback otherwise. For instance, Nathan's reproducer above is really good. |
Thanks for your help, but that is getting into programmer territory which I lack the time to dig deeper into - I filed https://llvm.org/pr51960 and provided two of Nathan's cvise outputs to make the Polly devs aware of the issue and hope that someone finds a fix. |
From Michael Kruse: "I am working on a fix on Number 1. The assembly string is handled as local value and therefore passed as an argument to the outlined function, but the asm string must be known at the call location at compile-time." I've filed https://llvm.org/pr51964 for the second issue regarding simple-card-utils |
Number 1 got fixed by https://reviews.llvm.org/rG9820dd970c1b72c7f77fad647b762053e2f60e31 |
With LLVM/Clang-14 9820dd970c1b72c7f77fad647b762053e2f60e31 and Linux-Xanmod-Edge 5.14.7, slightly different changes in the Makefile (attached), with FullLTO, the build now breaks with the following modpost errors.
ERROR: modpost: "__kmpc_fork_call" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
Michael Kruse suggested to not use |
If ThinLTO 'worked' it just means that Polly did not find anything worth parallelizing, and thus did not emit calls to
Parallelism in the kernel works very differently than in user mode. You'd need kthreads instead of pthreads. Since OpenMP (
In summary, even without parallelization, Polly is not intended for low-level, already hand-optimized code such as you find in an OS kernel. I'd be surprised if you can measure any amount of performance improvement. Of course, it should still not crash or miscompile, hence thanks @ms178 for the bug reports. |
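To illustrate the point about runtime calls: the modpost error quoted earlier names `__kmpc_fork_call`, an entry point of the LLVM OpenMP runtime, which is not available to kernel modules. The following is a minimal sketch, not part of the original thread, that scans a build log for undefined symbols of this kind. The function names and the regular expression are illustrative assumptions, modeled only on the single error line shown in this issue.

```python
import re

# Pattern modeled on the one modpost line quoted in this issue (an assumption;
# other kernel versions may format the message differently).
MODPOST_UNDEFINED = re.compile(
    r'ERROR: modpost: "(?P<symbol>[^"]+)" \[(?P<module>[^\]]+)\] undefined!'
)


def undefined_symbols(log: str) -> list[tuple[str, str]]:
    """Return (symbol, module) pairs for every modpost 'undefined' error."""
    return [(m["symbol"], m["module"]) for m in MODPOST_UNDEFINED.finditer(log)]


def openmp_runtime_symbols(log: str) -> list[tuple[str, str]]:
    """Keep only symbols with the LLVM OpenMP runtime's __kmpc_ prefix."""
    return [(s, mod) for s, mod in undefined_symbols(log) if s.startswith("__kmpc_")]


if __name__ == "__main__":
    sample = (
        'ERROR: modpost: "__kmpc_fork_call" '
        "[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!"
    )
    for symbol, module in openmp_runtime_symbols(sample):
        print(f"{module}: needs OpenMP runtime symbol {symbol}")
```

Any hit from `openmp_runtime_symbols` suggests that the compiler parallelized a loop in that module, so the parallelization flags are the first thing to check.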
@ms178 nice! Thanks for filing those bugs. Looks like both of those are now closed/fixed. Can you rebase your build of LLVM to TOT and report if these issues are all now fixed? |
@nickdesaulniers I am very happy that we could solve this together, thank you all for your work! As mentioned in my last post, following Michael Kruse's suggestion not to use Polly's auto-parallelization flags for the Kernel, I can report that my flto-build succeeded even with the first fix only. As I could not reproduce the second issue at that time, I think that implies that we can close this issue now, right?! Of course I will try out a newer LLVM build sometime later this week, as my day job keeps me busy during the weekdays. I hope you don't mind leaving it open for a couple of days longer, as I've seen that the second fix is responsible for some test failures (see: https://reviews.llvm.org/rG91f46bb77e6d56955c3b96e9e844ae6a251c41e9). For the exact flags used this time, here is the Makefile for reference. It boils down to: ' -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-vectorizer=stripmine -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-opt-fusion=max -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition' I've since noticed that polly-opt-fusion is now obsolete. |
@nickdesaulniers Testing with d104db531ee6d821d94ffd4bb48fa3929ab01235 revealed no surprises. I get the same modpost errors if compiled with the original build flags I used, but the build finishes successfully without polly-parallel-related flags, hence I am closing this issue. |
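For readers who want to derive the working flag set from the original one, here is a minimal sketch, not taken from the thread. It drops the four options that appear in the original Makefile diff but not in the later working list: `-polly-parallel`, `-fopenmp`, `-polly-omp-backend` and `-polly-num-threads`. The function name and the choice of exactly these four are assumptions based on comparing the two flag lists quoted in this issue.

```python
# Options passed via "-mllvm <opt>" that are assumed to belong to Polly's
# auto-parallelization (present in the original diff, absent from the
# working flag list quoted later in the thread).
PARALLEL_MLLVM_PREFIXES = (
    "-polly-parallel",
    "-polly-omp-backend",
    "-polly-num-threads",
)
# Stand-alone driver flags to drop.
PARALLEL_PLAIN_FLAGS = {"-fopenmp"}


def strip_polly_parallel(flags: str) -> str:
    """Return `flags` without the parallelization-related options."""
    tokens = flags.split()
    kept = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # "-mllvm <opt>" is a two-token pair; drop both when <opt> matches.
        if tok == "-mllvm" and i + 1 < len(tokens):
            opt = tokens[i + 1]
            if not opt.startswith(PARALLEL_MLLVM_PREFIXES):
                kept.extend([tok, opt])
            i += 2
            continue
        if tok not in PARALLEL_PLAIN_FLAGS:
            kept.append(tok)
        i += 1
    return " ".join(kept)


if __name__ == "__main__":
    original = (
        "-O3 -march=native -mllvm -polly -mllvm -polly-parallel -fopenmp "
        "-mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=24 "
        "-mllvm -polly-vectorizer=stripmine"
    )
    print(strip_polly_parallel(original))
```

Handling `-mllvm` and its argument as a pair matters here: removing only the argument would leave a dangling `-mllvm` that swallows the next flag.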
thanks for verifying and reporting back, @ms178 ! |
polly-opt-fusion=max *referance :- (1) llvm/llvm-project@cb879d0#diff-ee4a4d2d0f10b5f8ac34888ffbb8ec1f638dff5cfdd01a6ae7e5d0e5488c46dc (2) ClangBuiltLinux/linux#1460 Signed-off-by: vijaymalav564 <jaymalav10@gmail.com>
Hi,
with the attached config and Makefile, I can no longer successfully compile the Linux-Xanmod Kernel 5.14.6 with Full LTO. This is a recent regression: it used to work with 5.14.5 and an LLVM/Clang 14 build one week older than the one mentioned in the title.
The issue occurs when building the modules.
Makefile.txt
config.txt
This is the output I get:
Cannot take the address of an inline asm!
store i32 (i32)* asm "# ALT: oldnstr\0A661:\0A\09call __sw_hweight32\0A662:\0A# ALT: padding\0A.skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90\0A663:\0A.pushsection .altinstructions,\22a\22\0A .long 661b - .\0A .long 6641f - .\0A .word ( 4*32+23)\0A .byte 663b-661b\0A .byte 6651f-6641f\0A.popsection\0A.pushsection .altinstr_replacement, \22ax\22\0A# ALT: replacement 1\0A6641:\0A\09popcntl $1, $0\0A6651:\0A.popsection\0A", "={ax},{di},~{dirflag},~{fpsr},~{flags}", i32 (i32)** LTO [M] drivers/input/mouse/elan_i2c.lto.o%36, align 8
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
P.S: I also get the following two stack frame size issues with that build:
ld.lld: warning: stack frame size (1288) exceeds limit (1024) in function 'HUF_readCTable'
ld.lld: warning: stack frame size (1080) exceeds limit (1024) in function 'rtnl_newlink'