compiler code model for kernel #275

nickdesaulniers · 2018-11-19T18:12:55Z (edited by nathanchance)

~~http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/613617.html~~

nickdesaulniers · 2019-01-28T18:44:26Z

other related threads:

https://lkml.org/lkml/2018/4/20/636
https://lkml.org/lkml/2018/3/16/476
https://patchwork.kernel.org/patch/10060381/
http://lkml.iu.edu/hypermail/linux/kernel/1711.0/02817.html

nickdesaulniers · 2020-02-08T09:32:05Z

@pcc might be interested in it.

nickdesaulniers · 2020-02-12T23:53:14Z

-fno-semantic-interposition might be of interest.

nickdesaulniers · 2021-01-30T06:57:52Z

Sorry, @ardbiesheuvel, all of my links for this look irrelevant. Do you have a lore link or something written up for your idea for a kernel code model for AArch64? I think some of the aggressive optimizations @MaskRay has been working on for x86 might tie in with some of your ideas for AArch64.

ardbiesheuvel · 2021-01-30T15:53:37Z

I don't have any links at hand, but I can provide some background.

This issue came up when I discussed the assumption in the Linux/arm64 build system that AArch64 code generated by GCC without the -fpic or -fpie flags is suitable for linking with -pie, so that the linker emits dynamic relocations into the bare metal binary, which the kernel can then use to relocate itself at boot for KASLR.
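
For readers unfamiliar with the self-relocation step, here is a minimal C sketch of what applying the R_AARCH64_RELATIVE entries produced by a -pie link amounts to. It is not the kernel's actual implementation: the helper name `apply_relative_relocs` is hypothetical, and the "image" is simulated with an ordinary buffer whose address plays the role of the run-time base.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef R_AARCH64_RELATIVE
#define R_AARCH64_RELATIVE 1027 /* relocation type number from the AArch64 ELF ABI */
#endif

/*
 * Hypothetical helper: the image was linked to run at link_base but now sits
 * at `image` (its run-time base). Each R_AARCH64_RELATIVE entry says "the
 * 64-bit slot at link-time address r_offset must hold link-time address
 * r_addend"; both only need shifting by the load displacement.
 */
static void apply_relative_relocs(const Elf64_Rela *rela, size_t count,
                                  uint64_t link_base, unsigned char *image)
{
    uint64_t displacement = (uint64_t)(uintptr_t)image - link_base;

    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_AARCH64_RELATIVE)
            continue; /* a real loader would treat other types as an error */

        uint64_t value = (uint64_t)rela[i].r_addend + displacement;
        /* memcpy avoids assuming the slot is suitably aligned in this toy */
        memcpy(image + (rela[i].r_offset - link_base), &value, sizeof(value));
    }
}
```

With a displacement of zero every slot simply keeps its link-time value.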

Ramana (who is [still] at ARM but no longer works on GCC, so I won't pull him into this discussion) pointed out that this is risky, and that it would be better to generate -fpic code. However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary. This means that emitting GOT indirections is pointless, but inhibiting them under -fpic is cumbersome: -fvisibility=hidden only affects definitions, not declarations, and the visibility pragma (which does affect declarations too) can only be applied from source, i.e. from a .h file that needs to be pulled into every translation unit using -include, etc.
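
A minimal sketch of the header-based workaround, assuming a forced-include header (called `hidden.h` here purely for illustration) passed to every compilation with `-include hidden.h`. Unlike -fvisibility=hidden, the GCC visibility pragma (also accepted by Clang) applies to extern declarations as well, which is what allows -fpic/-fpie code to reach those symbols PC-relatively instead of through the GOT. `boot_counter` and `read_boot_counter` are hypothetical names.

```c
/* ---- hidden.h (hypothetical name), force-included via -include ---- */
/* From here on, every declaration, including extern declarations of
 * symbols defined in other objects, gets STV_HIDDEN visibility. */
#pragma GCC visibility push(hidden)

/* ---- an ordinary translation unit ---- */
extern int boot_counter;      /* declaration: hidden because of the pragma */
int read_boot_counter(void);  /* function declarations are affected too */

int boot_counter = 41;        /* in a real build this could live in another object */

int read_boot_counter(void)
{
    return boot_counter + 1;
}

/* Restore default visibility so that declarations after this point
 * are no longer forced to STV_HIDDEN. */
#pragma GCC visibility pop
```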

So this is when we first discussed introducing -mcmodel=kernel for AArch64, which could imply whichever internal options we need to get small model code but without all the GOT and .so stuff.

MaskRay · 2021-01-30T20:00:44Z

For a STB_GLOBAL/STB_WEAK symbol,

STV_DEFAULT: both compiler & linker need to assume such symbols can be preempted in -fpic mode. The compiler emits GOT indirection by default.
GCC -fno-semantic-interposition uses local aliases on defined non-weak function symbols for x86 (unimplemented in other architectures).
Clang -fno-semantic-interposition uses local aliases on defined non-weak symbols (both function and data) for x86.

STV_PROTECTED: GCC -fpic uses GOT indirection for data symbols, regardless of defined or undefined. This pessimization is to make a misfeature "copy relocation on protected data symbol" work (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses). Clang code generation treats STV_PROTECTED the same way as STV_HIDDEN.

STV_HIDDEN: non-preemptible, regardless of defined or undefined. The compiler suppresses GOT indirection, unless undefined STB_WEAK.

For defined symbols, -fno-pic/-fpie can avoid GOT indirection for STV_DEFAULT (and GCC STV_PROTECTED).
-fvisibility=hidden can change visibility.

For undefined symbols, -fpie/-fpic use GOT indirection by default. Clang -fdirect-access-external-data (discussed in my article) can avoid GOT indirection. If you use -fpic -fdirect-access-external-data with ld -shared, you'll need additional linker options to make the linker know that defined non-STB_LOCAL STV_DEFAULT symbols are non-preemptible.

> However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

The use case is similar to a userspace static no-pie executable (-fno-pic -no-pie) or static pie (-fpie -pie).

> and it would be better to generate -fpic code.

Why is -fpie risky?

nickdesaulniers · 2021-01-31T04:05:57Z (edited)

Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):

> PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code

It also seems to cause excessive growth in the number of relocations in debug info sections, which accounts for a significant increase in the size of the binary (when debug info is not stripped or produced separately). The change in file size of vmlinux from enabling CONFIG_RELOCATABLE can be ~95% attributed to growth in .rela.debug_* sections, at least on x86 with DWARFv4.

> there is no ELF symbol preemption

Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

> text relocations are not a problem

Right, hence -Wl,-z,notext, or are there additional problems?

> executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

I understand; does this result in suboptimal codegen, in your experience?

Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).

Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

nathanchance · 2021-01-31T18:52:34Z

I believe this is the link in the first comment:

https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8LchCWB-b+cDmStwrNf5qDs6ZMQ4drv2HMk-8OyjbzNQ@mail.gmail.com/

Here are lore links for all of the other posts:

https://lore.kernel.org/r/CAKv+Gu_tuYcikQ07QKP-N+rd+DpoucSYn6TG+OJ-jm9CVGaDxg@mail.gmail.com
https://lore.kernel.org/r/26a25069-ea1d-5fb3-549c-ab653f454a30@arm.com
https://lore.kernel.org/r/20171115213428.22559-7-samitolvanen@google.com
https://lore.kernel.org/r/20171103192634.u25go4tu7lgzl6ja@lakrids.cambridge.arm.com/

ardbiesheuvel · 2021-02-01T08:00:43Z

> Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):

> ...

>> there is no ELF symbol preemption

> Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

I don't see a difference with -fno-semantic-interposition, either on GCC or Clang. In both cases, a reference to an undefined symbol is emitted using an entry in the GOT.

>> text relocations are not a problem

> Right, hence -Wl,-z,notext, or are there additional problems?

Not to my knowledge, no.

>> executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

> I understand; does this result in sub optimal code gen, in your experience?

Yes, through the generation of GOT entries. With a GOT, all relocated quantities are close together, which reduces the footprint of pages that are CoW'ed due to relocation processing. Without CoW, this GOT just takes up more space and results in more memory accesses, but without the benefit.
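
A toy C model of the cost being described (it does not reproduce real code generation, and the names are hypothetical): with GOT indirection the symbol's address is first loaded from a data slot, and only that slot needs a relocation at boot, whereas direct access computes the address PC-relatively. In a kernel image the slot is pure overhead: an extra 8-byte entry and an extra dependent load per access. The assembly in the comments is the rough AArch64 shape, not captured compiler output.

```c
int kernel_var = 7;

/* Stand-in for a GOT entry: a data word holding the absolute address of
 * kernel_var. In a -pie link, a word like this is what receives the dynamic
 * (R_AARCH64_RELATIVE) relocation; the instructions stay unrelocated. */
static int *const got_entry_kernel_var = &kernel_var;

/* Direct access, roughly:
 *     adrp x8, kernel_var
 *     ldr  w0, [x8, :lo12:kernel_var] */
int load_direct(void)
{
    return kernel_var;
}

/* GOT access, roughly:
 *     adrp x8, :got:kernel_var
 *     ldr  x8, [x8, :got_lo12:kernel_var]
 *     ldr  w0, [x8]          (one extra dependent load) */
int load_via_got(void)
{
    return *got_entry_kernel_var;
}
```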

> Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).

The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker.
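
The position independence of ADRP/ADD pairs can be checked with a little arithmetic. The C sketch below models what the linker resolves for the two relocations involved (R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADD_ABS_LO12_NC, per the AArch64 ELF ABI) and what the CPU computes at run time; the helper names and sample addresses are made up for illustration. As long as the whole image moves by a multiple of 4 KiB, the same encoded instructions still yield the right address, so no relocation has to be applied to them.

```c
#include <stdint.h>

/* Page(): the 4 KiB page containing an address (low 12 bits cleared). */
static uint64_t page(uint64_t addr)
{
    return addr & ~UINT64_C(0xfff);
}

/* What the linker resolves for the pair
 *     adrp x0, sym              // R_AARCH64_ADR_PREL_PG_HI21: Page(sym) - Page(pc)
 *     add  x0, x0, :lo12:sym    // R_AARCH64_ADD_ABS_LO12_NC:  sym & 0xfff
 * Neither field depends on where the image is loaded, only on the distance
 * between instruction and symbol and on the symbol's offset within its page. */
struct adrp_add {
    int64_t page_delta; /* encoded in ADRP as a signed 21-bit count of pages */
    uint64_t lo12;      /* encoded in ADD as a 12-bit immediate */
};

static struct adrp_add link_adrp_add(uint64_t pc, uint64_t sym)
{
    struct adrp_add insn = { (int64_t)(page(sym) - page(pc)), sym & 0xfff };
    return insn;
}

/* What the CPU computes when the ADRP actually executes at address pc. */
static uint64_t execute_adrp_add(struct adrp_add insn, uint64_t pc)
{
    return page(pc) + (uint64_t)insn.page_delta + insn.lo12;
}
```

Moving the image by a displacement that is not a multiple of 4 KiB breaks the pair; that constraint comes from ADRP itself, not from any -fpic/-fpie code generation.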

Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).

> Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

Both accept it for AArch64 targets, but I don't see any difference in the generated code.

MaskRay · 2021-02-01T08:35:57Z

>> there is no ELF symbol preemption

> Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

My previous comment mentioned the semantics.

> The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker.

> Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).

clang -fpie -fdirect-access-external-data meets your needs. GCC feature request: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

-fpie means defined symbols are non-preemptible. -fdirect-access-external-data references undefined symbols via direct relocation types.
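
To see the difference described here, a translation unit that only declares an external variable is enough. The comments give the AArch64 sequences one would expect from clang -fpie with and without -fdirect-access-external-data, based on the semantics stated above; they are an expectation to check with `llvm-objdump -dr`, not captured output. `ext_counter` and `read_ext_counter` are hypothetical names.

```c
/* Only declared here; defined in some other object. */
extern int ext_counter;

int read_ext_counter(void)
{
    /*
     * clang -fpie: the address is loaded from a GOT entry, e.g.
     *     adrp x8, :got:ext_counter              // R_AARCH64_ADR_GOT_PAGE
     *     ldr  x8, [x8, :got_lo12:ext_counter]   // R_AARCH64_LD64_GOT_LO12_NC
     *     ldr  w0, [x8]
     *
     * clang -fpie -fdirect-access-external-data: a direct PC-relative
     * reference, e.g.
     *     adrp x8, ext_counter                   // R_AARCH64_ADR_PREL_PG_HI21
     *     ldr  w0, [x8, :lo12:ext_counter]       // R_AARCH64_LDST32_ABS_LO12_NC
     *
     * A call to an undefined function (bl ext_func) uses R_AARCH64_CALL26
     * either way; the option only concerns data references.
     */
    return ext_counter;
}
```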

> Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

No, as my previous comment mentioned.

nickdesaulniers · 2021-02-02T23:27:48Z

> clang -fpie -fdirect-access-external-data meets your needs.

Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?

> and that absolute references are only emitted when strictly needed (i.e., not for jump tables)

Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?

MaskRay · 2021-02-03T00:32:35Z

>> clang -fpie -fdirect-access-external-data meets your needs.

> Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?

-fdirect-access-external-data is supported by most LLVM-supported targets.

The opposite, -fno-pic -fno-direct-access-external-data, has triggered an x86 fastisel bug and an arm fastisel bug.

>> and that absolute references are only emitted when strictly needed (i.e., not for jump tables)

> Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?

(Fixing a typo: -pie => -fpie. -pie is a linker mode.)
Yes. Absolute references should only be produced for -fno-pic code. (Technically, if a symbol is SHN_ABS, absolute references can be used in -fpie/-fpic mode as well. LLVM IR has !absolute_symbol metadata for this, but clang does not emit it.)
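
To make the jump-table point concrete, here is a toy C model of the two ways a table can encode its targets; addresses are plain integers and the function names are hypothetical. Absolute entries each need a dynamic relocation once the image moves, while entries stored as offsets from the table itself stay valid, which is why -fpie/-fpic code has no need for absolute references here. How a particular compiler lays out its tables (entry width, scaling, anchor label) is not modeled.

```c
#include <stdint.h>

/* Absolute table: entries are link-time code addresses. Under -pie each
 * one needs a dynamic relocation (R_AARCH64_RELATIVE) before it is used. */
static uint64_t jump_target_absolute(const uint64_t *table, unsigned idx)
{
    return table[idx];
}

/* Relative table: entries are (target - table base) at link time; the lookup
 * adds the table's run-time address, so moving the image changes nothing. */
static uint64_t jump_target_relative(const int32_t *table,
                                     uint64_t table_runtime_addr, unsigned idx)
{
    return table_runtime_addr + (uint64_t)(int64_t)table[idx];
}
```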

nickdesaulniers · 2021-02-03T01:30:40Z

$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 -j72 KCFLAGS=-fdirect-access-external-data

built, booted (in QEMU), and no one died (this time)(I think).

Checking the object files' relocations: which references to undefined symbols should still go through the GOT? On the caller's side, should bl be using relocations other than R_AARCH64_CALL26 when -fno-direct-access-external-data is not passed? Or is there something I need to change in Kbuild first? I do have CONFIG_RELOCATABLE=y set.

nickdesaulniers added the "[BUG] llvm" label (A bug that should be fixed in upstream LLVM) Nov 19, 2018

nickdesaulniers self-assigned this Nov 19, 2018

nickdesaulniers added the "feature-request" label (Not a bug per-se) Nov 21, 2018

nickdesaulniers mentioned this issue Dec 18, 2018: arm64 kvm built with clang doesn't boot #11 (Closed)

nickdesaulniers removed their assignment Jan 28, 2019

nickdesaulniers added the "polish" label (Not a correctness bug, but will improve code performance or size) Feb 8, 2020

nickdesaulniers mentioned this issue Nov 10, 2020: ld.lld: error: section: .exit.data is not contiguous with other relro sections #1189 (Closed)
