compiler code model for kernel #275

Open
nickdesaulniers opened this issue Nov 19, 2018 · 13 comments
Labels
[BUG] llvm A bug that should be fixed in upstream LLVM feature-request Not a bug per-se polish Not a correctness bug, but will improve code performance or size

Comments

@nickdesaulniers
Member

nickdesaulniers commented Nov 19, 2018

http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/613617.html

Updated links: #275 (comment)

@nickdesaulniers nickdesaulniers added the [BUG] llvm A bug that should be fixed in upstream LLVM label Nov 19, 2018
@nickdesaulniers nickdesaulniers self-assigned this Nov 19, 2018
@nickdesaulniers nickdesaulniers added the feature-request Not a bug per-se label Nov 21, 2018
@nickdesaulniers nickdesaulniers removed their assignment Jan 28, 2019
@nickdesaulniers
Member Author

@pcc might be interested in it.

@nickdesaulniers nickdesaulniers added the polish Not a correctness bug, but will improve code performance or size label Feb 8, 2020
@nickdesaulniers
Member Author

-fno-semantic-interposition might be of interest.

@nickdesaulniers
Member Author

Sorry, @ardbiesheuvel, all of my links for this look irrelevant. Do you have a lore link or something written up for your idea for a kernel code model for AArch64? I think some of the aggressive optimizations @MaskRay has been working on for x86 might play in with some of your ideas for AArch64.

@ardbiesheuvel

I don't have any links at hand, but I can provide some background.

This issue came up when I discussed the assumption in the Linux/arm64 build system that AArch64 code generated by GCC without the -fpic or -fpie flags set is suitable for linking with -pie, so that we can emit dynamic relocations into the bare metal binary, which it can use to self-relocate at boot for KASLR.

Ramana (who is [still] at ARM but no longer works on GCC so I won't pull him into this discussion) pointed out that this is risky, and it would be better to generate -fpic code. However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary. This means that emitting GOT indirections is pointless, but inhibiting that with -fpic is cumbersome: -fvisibility=hidden only affects definitions, not declarations, and the visibility pragma (which does affect declarations too) can only be emitted via a .h file, which needs to be pulled in using -include, and so on.
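
The declaration-versus-definition point can be sketched in C. This is only a minimal illustration, assuming GCC or Clang targeting ELF; the names boot_flags and read_boot_flags are hypothetical, and in a real build the pragma would sit in a small header that is force-included with -include rather than written inline.

```c
/* Sketch: give *declarations* hidden visibility, which -fvisibility=hidden
 * alone does not do. All names here are hypothetical. */

/* In a real build the push/pop pair would live in a header that is
 * injected ahead of every translation unit via -include. */
#pragma GCC visibility push(hidden)

extern int boot_flags;      /* declaration only: marked STV_HIDDEN, so a
                               -fpic/-fpie compiler may skip the GOT */
int read_boot_flags(void);  /* same treatment for function declarations */

#pragma GCC visibility pop

/* Defined here only so the snippet links on its own; normally the
 * definition would be in another translation unit. */
int boot_flags = 3;

int read_boot_flags(void)
{
    return boot_flags;
}
```

The awkward part described above is the delivery mechanism, not the pragma itself: every translation unit has to see it before any declaration it should affect.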

So this is when we first discussed introducing -mcmodel=kernel for AArch64, which could imply whichever internal options we need to get small model code but without all the GOT and .so stuff.

@MaskRay
Member

MaskRay commented Jan 30, 2021

For an STB_GLOBAL/STB_WEAK symbol:

- STV_DEFAULT: both the compiler and the linker need to assume such symbols can be preempted in -fpic mode. The compiler emits GOT indirection by default.
  - GCC -fno-semantic-interposition uses local aliases on defined non-weak function symbols for x86 (unimplemented in other architectures).
  - Clang -fno-semantic-interposition uses local aliases on defined non-weak symbols (both function and data) for x86.
- STV_PROTECTED: GCC -fpic uses GOT indirection for data symbols, regardless of defined or undefined. This pessimization is to make a misfeature "copy relocation on protected data symbol" work (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses). Clang code generation treats STV_PROTECTED the same way as STV_HIDDEN.
- STV_HIDDEN: non-preemptible, regardless of defined or undefined. The compiler suppresses GOT indirection, unless undefined STB_WEAK.


For defined symbols, -fno-pic/-fpie can avoid GOT indirection for STV_DEFAULT (and GCC STV_PROTECTED).
-fvisibility=hidden can change visibility.

For undefined symbols, -fpie/-fpic use GOT indirection by default. Clang -fdirect-access-external-data (discussed in my article) can avoid GOT indirection. If you use -fpic -fdirect-access-external-data with ld -shared, you'll need additional linker options so that the linker knows defined non-STB_LOCAL STV_DEFAULT symbols are non-preemptible.
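
As a reading aid, the rules in this comment can be written down as a small decision function. This is a simplified model rather than compiler source: the type and function names are hypothetical, it covers only -fpie and -fpic, it follows Clang's treatment of STV_PROTECTED, it takes -fdirect-access-external-data to mean "no GOT for undefined symbols" as in Clang's option description, and it assumes the undefined-weak exception applies at every visibility although the comment states it only for STV_HIDDEN.

```c
#include <stdbool.h>

enum pic_mode   { MODE_PIE, MODE_PIC };
enum visibility { VIS_DEFAULT, VIS_PROTECTED, VIS_HIDDEN };

struct symbol_ref {
    enum visibility vis;
    bool defined;   /* defined in this translation unit */
    bool weak;      /* STB_WEAK rather than STB_GLOBAL */
};

/* Returns true when this simplified model expects a GOT-indirect access. */
bool model_uses_got(enum pic_mode mode, bool direct_access_external_data,
                    struct symbol_ref sym)
{
    /* Assumption: an undefined weak symbol keeps GOT indirection
     * regardless of visibility. */
    if (!sym.defined && sym.weak)
        return true;

    /* Hidden is non-preemptible; Clang treats protected the same way. */
    if (sym.vis == VIS_HIDDEN || sym.vis == VIS_PROTECTED)
        return false;

    /* STV_DEFAULT, defined: preemptible only in -fpic mode. */
    if (sym.defined)
        return mode == MODE_PIC;

    /* STV_DEFAULT, undefined: GOT unless direct access was requested. */
    return !direct_access_external_data;
}
```

For example, an undefined STV_DEFAULT symbol under -fpie gives true without the flag and false with it. The x86-only local-alias behaviour of -fno-semantic-interposition is deliberately left out of the model.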

> However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

The use case is similar to a userspace static no-pie executable (-fno-pic -no-pie) or static pie (-fpie -pie).

> and it would be better to generate -fpic code.

Why is -fpie risky?

@nickdesaulniers
Member Author

nickdesaulniers commented Jan 31, 2021

Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):

> PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code

It seems to also produce an excessive growth in the number of relocations in debug info sections, which accounts for a significant growth in size of the binary (when debug info is not stripped or produced separately). The change in file size of vmlinux from enabling CONFIG_RELOCATABLE can be ~95% attributed to growth in .rela.debug_* sections, at least on x86 and DWARFv4.

> there is no ELF symbol preemption

Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

> text relocations are not a problem

Right, hence -Wl,-z,notext, or are there additional problems?

> executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

I understand; does this result in suboptimal code gen, in your experience?

Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).

Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

@ardbiesheuvel

> Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):

> ...

>> there is no ELF symbol preemption

> Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

I don't see a difference with -fno-semantic-interposition, either on GCC or Clang. In both cases, a reference to an undefined symbol is emitted using an entry in the GOT.

>> text relocations are not a problem

> Right, hence -Wl,-z,notext, or are there additional problems?

Not to my knowledge, no.

>> executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

> I understand; does this result in suboptimal code gen, in your experience?

Yes, through the generation of GOT entries. With a GOT, all relocated quantities are close together, which reduces the footprint of pages that are CoW'ed due to relocation processing. Without CoW, this GOT just takes up more space and results in more memory accesses, but without the benefit.
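
The extra memory access can be pictured with a small C analogy. This is only a sketch: got_slot is a hypothetical stand-in for a real GOT entry, kernel_value is a made-up symbol, and the AArch64 sequences in the comments are the typical small code model forms rather than output captured from a particular compiler.

```c
/* Analogy only: models what a GOT-indirect load costs compared with a
 * direct PC-relative load. All names are hypothetical. */

int kernel_value = 42;

/* Stands in for a GOT slot: one pointer-sized word holding the address.
 * volatile keeps the compiler from folding the indirection away. */
static int *volatile got_slot = &kernel_value;

/* Direct access. On AArch64 this is typically:
 *     adrp x8, kernel_value
 *     ldr  w0, [x8, :lo12:kernel_value]
 */
int load_direct(void)
{
    return kernel_value;
}

/* GOT-style access: load the address from the slot, then load the value.
 * On AArch64 this is typically:
 *     adrp x8, :got:kernel_value
 *     ldr  x8, [x8, :got_lo12:kernel_value]
 *     ldr  w0, [x8]
 */
int load_via_got(void)
{
    int *addr = got_slot;   /* the extra memory access */
    return *addr;
}
```

Both functions return the same value; the second pays for one more load and for the pointer-sized slot, which is the overhead described above when nothing is gained from keeping relocated words together.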

> Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).

The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker.

Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).
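
Why an ADRP/ADD pair needs no fixup when the image moves can be checked with a little arithmetic. The sketch below models the two instructions in C under stated assumptions: the struct and function names are hypothetical, the ADRP immediate range (roughly ±4 GiB) is not checked, and the load offset is assumed to be a multiple of 4 KiB, which any page-aligned KASLR slide satisfies.

```c
#include <stdint.h>

#define LOW12 (UINT64_C(0xfff))

/* What the linker encodes into the pair: a page delta for ADRP and the
 * low 12 bits of the symbol address for ADD. */
struct adrp_add {
    int64_t  page_delta;
    uint64_t lo12;
};

/* Link-time step, computed from link-time addresses only. */
struct adrp_add link_adrp_add(uint64_t insn_addr, uint64_t sym_addr)
{
    struct adrp_add enc;
    enc.page_delta = (int64_t)((sym_addr & ~LOW12) - (insn_addr & ~LOW12));
    enc.lo12 = sym_addr & LOW12;
    return enc;
}

/* Run-time step: what the CPU computes from wherever the code now runs. */
uint64_t exec_adrp_add(uint64_t pc, struct adrp_add enc)
{
    uint64_t page = (pc & ~LOW12) + (uint64_t)enc.page_delta; /* ADRP */
    return page + enc.lo12;                                   /* ADD  */
}
```

If code and data are both displaced by the same page-aligned amount, exec_adrp_add returns the displaced symbol address from an unchanged encoding, so only absolute references, such as pointers stored in data, need the dynamic relocations that -pie emits.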

> Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

Both accept it for AArch64 targets, but I don't see any difference in the generated code.

@MaskRay
Member

MaskRay commented Feb 1, 2021

>> Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):

>> ...

>>> there is no ELF symbol preemption

>> Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?

My previous comment mentioned the semantics.

>>> text relocations are not a problem

>> Right, hence -Wl,-z,notext, or are there additional problems?

> Not to my knowledge, no.

>>> executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.

>> I understand; does this result in suboptimal code gen, in your experience?

> Yes, through the generation of GOT entries. With a GOT, all relocated quantities are close together, which reduces the footprint of pages that are CoW'ed due to relocation processing. Without CoW, this GOT just takes up more space and results in more memory accesses, but without the benefit.

>> Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).

> The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker.

> Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).

clang -fpie -fdirect-access-external-data meets your needs. GCC feature request: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

-fpie means defined symbols are non-preemptible. -fdirect-access-external-data references undefined symbols via direct relocation types.

>> Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?

No, as my previous comment mentioned.

@nickdesaulniers
Member Author

> clang -fpie -fdirect-access-external-data meets your needs.

Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?

> and that absolute references are only emitted when strictly needed (i.e., not for jump tables)

Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?

@MaskRay
Member

MaskRay commented Feb 3, 2021

>> clang -fpie -fdirect-access-external-data meets your needs.

> Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?

-fdirect-access-external-data is supported by most LLVM-supported targets.

The opposite, -fno-pic -fno-direct-access-external-data, has triggered an x86 FastISel bug and an ARM FastISel bug.

>> and that absolute references are only emitted when strictly needed (i.e., not for jump tables)

> Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?

(Fixing a typo: -pie => -fpie. -pie is a linker mode.)
Yes. Absolute references should only be produced for -fno-pic code. (Technically, if a symbol is SHN_ABS, absolute references can be used in -fpie/-fpic mode as well. LLVM IR can express this with !absolute_symbol, but Clang does not emit it.)

@nickdesaulniers
Member Author

$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 -j72 KCFLAGS=-fdirect-access-external-data

built, booted (in QEMU), and no one died (this time)(I think).

Checking the object files' relocations, which undefined symbols use relocations that reference the GOT? This is on the caller's side that bl should be using non R_AARCH64_CALL26 relocations, without -fno-direct-access-external-data? Or is there something I need to change in Kbuild first? I do have CONFIG_RELOCATABLE=y set.

4 participants