Commit
Kbuild, lto: Add Link Time Optimization support
With LTO, gcc does whole-program optimization for the whole kernel and for each module. This increases compile time and makes incremental builds slower, but it can generate faster and smaller code and allows the compiler to do some global checking: gcc can now complain about type mismatches for symbols between different files.

The main advantage is that it allows cross-file inlining, which enables a range of new optimizations. It also allows the compiler to throw away unused functions, which typically shrinks the kernel somewhat, and it enables a range of advanced and future optimizations in the compiler.

Unlike earlier versions, this version doesn't require special binutils but relies on THIN_ARCHIVES instead.

This adds the basic Kbuild plumbing for LTO:

- In Kbuild, add a new scripts/Makefile.lto that checks the tool chain and, when the tests pass, sets the LTO options. We enable it only for gcc 5.0+ and reasonably new binutils.
- Add a new LDFINAL variable that controls the final link for vmlinux or a module. With LTO, it calls gcc-ld instead of ld to run the LTO step.
- Kconfigs: LTO with allyesconfig needs more than 4G of memory (~8G) and can make people's systems swap to death. Smaller configs typically work with 4G. I used a nested config that ensures that a simple allyesconfig disables LTO; it has to be explicitly enabled.
- This version runs modpost on the LTO object files. This currently breaks MODVERSIONS, causes some warnings, and requires disabling the module resolution checks, so MODVERSIONS is excluded with LTO here. The solution would be to reorganize the linking step to do an LDFINAL -r link on all modules before running modpost.
- Since this kernel version links the final kernel two to three times for kallsyms, all optimization steps are done multiple times.
Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther, Don Zickus, Changlong Xie, Gleb Schukin, who helped with this project (and probably some more whom I forgot, sorry).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Andi Kleen committed on Nov 27, 2017 (commit 4c3b1f8, 1 parent: 0822ed4)
Showing 7 changed files with 252 additions and 10 deletions.
@@ -0,0 +1,76 @@

Link time optimization (LTO) for the Linux kernel

This is an experimental feature.

Link Time Optimization allows the compiler to optimize the complete program
instead of just each file.

The compiler can inline functions between files and do various other global
optimizations, like specializing functions for common parameters,
determining when global variables are clobbered, making functions pure/const,
propagating constants globally, removing unneeded data and others.

It will also drop unused functions, which can make the kernel
image smaller in some circumstances, in particular for small kernel
configurations.

For small monolithic kernels it can throw away unused code very effectively
(especially when modules are disabled) and usually shrinks
the code size.

Build time and memory consumption at build time will increase, depending
on the size of the largest binary. Modular kernels are less affected.
With LTO, incremental builds are less incremental, because the whole
binary always needs to be re-optimized (but not re-parsed).

Oopses can be somewhat more difficult to read due to the more aggressive
inlining (scripts/faddr2line helps).

Normal "reasonable" builds work with less than 4GB of RAM, but very large
configurations like allyesconfig typically need more memory. The actual
memory needed depends on the available memory (gcc sizes its garbage
collector pools based on that, or on the ulimit -m limit) and
the compiler version.

Configuration:
- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
  This two-step setup mainly exists so that allyesconfig does not
  enable LTO by default.

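For illustration, a .config fragment that opts in as described above might look like the sketch below. It assumes the Kconfig entries added by this commit (not shown on this page) use exactly the option names given in the Configuration note; all other LTO-related options are left at their defaults.

```make
# .config fragment (sketch): explicitly opt in to LTO.
# Option names are taken from the Configuration note above.
CONFIG_LTO_MENU=y
# CONFIG_LTO_DISABLE is not set
```

After editing .config by hand, `make olddefconfig` fills in any remaining options with their defaults; `make menuconfig` reaches the same state interactively.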
Requirements:
- Enough memory: 4GB for a standard build, more for allyesconfig.
  The peak memory usage happens in a single-threaded phase (when lto-wpa
  merges types), so reducing the -j level will not help much.

A 32-bit compiler is unlikely to work due to the memory requirements.
You can, however, build a kernel targeted at 32-bit on a 64-bit host.

FAQs:

Q: I get a section type conflict error.
A: This is usually caused by declaring data const __initdata (it should be
const __initconst) or const __read_mostly (it should be just const).
Check both symbols reported by gcc.

Q: What's up with the .XXXXX numeric suffixes on symbol names?
A: They appear because LTO turns (nearly) all symbols static.
gcc 4.9 and later avoid them in most cases, and they are also filtered
out in kallsyms. Some .lto_priv suffixes are still left.

References:

Presentation on Kernel LTO
(note: the performance numbers and details are outdated; in particular,
gcc 4.9 fixed most of the build time problems):
http://halobates.de/kernel-lto.pdf

Generic gcc LTO:
http://www.ucw.cz/~hubicka/slides/labs2013.pdf
http://www.hipeac.net/system/files/barcelona.pdf

Somewhat outdated too:
http://gcc.gnu.org/projects/lto/lto.pdf
http://gcc.gnu.org/projects/lto/whopr.pdf

Happy Link-Time-Optimizing!

Andi Kleen
@@ -0,0 +1,95 @@

```make
#
# Support for gcc link time optimization
#

DISABLE_LTO :=
LTO_CFLAGS :=

export DISABLE_LTO
export LTO_CFLAGS

ifdef CONFIG_LTO
ifdef CONFIG_UBSAN
ifeq ($(call cc-ifversion,-lt,0600,y),y)
# work around compiler asserts due to UBSAN
$(warning Disabling LTO for gcc 5.x because UBSAN is active)
undefine CONFIG_LTO
endif
endif
endif

ifdef CONFIG_LTO
# 4.7 works mostly, but it sometimes loses symbols on large builds
# This can be worked around by marking those symbols visible,
# but that is fairly ugly and the problem is gone with 4.8
# 4.8 was very slow
# 4.9 was missing __attribute__((noreorder)) for ordering initcalls,
# and needed -fno-toplevel-reorder, which can lead to missing symbols
# so only support 5.0+
ifeq ($(call cc-ifversion, -ge, 0500,y),y)
# is the compiler compiled with LTO?
ifneq ($(call cc-option,${LTO_CFLAGS},n),n)
# binutils before 2.27 has various problems with plugins
ifeq ($(call ld-ifversion,-ge,227000000,y),y)

LTO_CFLAGS := -flto $(DISABLE_TL_REORDER)
LTO_FINAL_CFLAGS := -fuse-linker-plugin

# would be needed to support < 5.0
# LTO_FINAL_CFLAGS += -fno-toplevel-reorder

LTO_FINAL_CFLAGS += -flto=jobserver

# don't compile everything twice
# requires plugin ar
LTO_CFLAGS += -fno-fat-lto-objects

# Used to disable LTO for specific files (e.g. vdso)
DISABLE_LTO := -fno-lto

# shut up lots of warnings for the compat syscalls
LTO_CFLAGS += $(call cc-disable-warning,attribute-alias,)

LTO_FINAL_CFLAGS += ${LTO_CFLAGS} -fwhole-program

# most options are passed through implicitly in the LTO
# files per function, but not all.
# should not pass any that may need to be disabled for
# individual files.
LTO_FINAL_CFLAGS += $(filter -pg,${KBUILD_CFLAGS})
LTO_FINAL_CFLAGS += $(filter -fno-strict-aliasing,${KBUILD_CFLAGS})

ifdef CONFIG_LTO_DEBUG
LTO_FINAL_CFLAGS += -fdump-ipa-cgraph -fdump-ipa-inline-details
# add for debugging compiler crashes:
# LTO_FINAL_CFLAGS += -dH -save-temps
endif
ifdef CONFIG_LTO_CP_CLONE
LTO_FINAL_CFLAGS += -fipa-cp-clone
LTO_CFLAGS += -fipa-cp-clone
endif

KBUILD_CFLAGS += ${LTO_CFLAGS}

LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld \
	${LTO_FINAL_CFLAGS}

# LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
# it's easy to drive the machine OOM. Use the object directory
# instead.
TMPDIR ?= $(objtree)
export TMPDIR

# use plugin aware tools
AR = $(CROSS_COMPILE)gcc-ar
NM = $(CROSS_COMPILE)gcc-nm
else
$(warning WARNING old binutils. LTO disabled)
endif
else
$(warning "WARNING: Compiler/Linker does not support LTO/WHOPR with linker plugin. CONFIG_LTO disabled.")
endif
else
$(warning "WARNING: GCC $(call cc-version) too old for LTO/WHOPR. CONFIG_LTO disabled")
endif
endif
```
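DISABLE_LTO is exported so that individual Kbuild Makefiles can opt files out of LTO (the Makefile comment mentions the vdso). A minimal sketch for a subdirectory Makefile follows. foo.o and bar.o are hypothetical object names, and the sketch assumes, as the DISABLE_LTO comment implies, that a -fno-lto appearing after -flto on the gcc command line wins.

```make
# Hypothetical subdirectory Makefile (Kbuild syntax).
obj-y += foo.o bar.o

# Kbuild appends per-object flags after KBUILD_CFLAGS, so this -fno-lto
# follows the -flto added via LTO_CFLAGS and turns LTO off for foo.o only.
CFLAGS_foo.o += $(DISABLE_LTO)

# Directory-wide alternative: applies to every object built here.
# ccflags-y += $(DISABLE_LTO)
```

When CONFIG_LTO is off, DISABLE_LTO expands to nothing, so the line is harmless in non-LTO builds.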
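To see what the two `$(filter ...)` lines in the LTO makefile do, the standalone GNU make sketch below runs them against a made-up KBUILD_CFLAGS value: only -pg and -fno-strict-aliasing are copied, and a flag that is absent simply contributes nothing. The file name filter-demo.mk and the flag list are hypothetical.

```make
# Hypothetical standalone demo; run with: make -f filter-demo.mk
# The flag list below is invented for illustration.
KBUILD_CFLAGS := -O2 -pg -Wall -fno-strict-aliasing -fno-common

# Same pattern as the LTO makefile: copy these flags only if present.
LTO_FINAL_CFLAGS := $(filter -pg,$(KBUILD_CFLAGS))
LTO_FINAL_CFLAGS += $(filter -fno-strict-aliasing,$(KBUILD_CFLAGS))

# Printed while make parses the file:
# LTO_FINAL_CFLAGS = -pg -fno-strict-aliasing
$(info LTO_FINAL_CFLAGS = $(LTO_FINAL_CFLAGS))

# Empty recipe so the file has a default goal.
all: ;
```

This also illustrates the makefile's warning about not passing options that may need to be disabled per file: anything copied into LTO_FINAL_CFLAGS applies to the whole final link step.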