kexec_file is broken with LLVM 18 and newer on x86_64 #2016

Open
0n-s opened this issue Apr 15, 2024 · 20 comments
Labels
[ARCH] x86_64: This bug impacts ARCH=x86_64
[BUG] linux: A bug that should be fixed in the mainline kernel.
[PATCH] Accepted: A submitted patch has been accepted upstream

Comments

@0n-s

0n-s commented Apr 15, 2024

Don't really think there's much to explain here. Very nice & simple reproducer & splat:

# kexec -s -l /mnt/vmlinuz-6.8.6 --reuse-cmdline
<4>[  319.553852] ------------[ cut here ]------------
<4>[  319.555068] WARNING: CPU: 0 PID: 1501 at kernel/kexec_file.c:936 kexec_load_purgatory+0x3b1/0x4a0
<4>[  319.557254] Modules linked in: 9p netfs drbd lru_cache libcrc32c crc32c_generic af_packet scsi_transport_iscsi cfg80211 qrtr rfkill uinput edac_core intel_rapl_msr intel_rapl_common kvm_intel sr_mod cdrom kvm ahci libahci irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel polyval_clmulni polyval_generic libata gf128mul ghash_clmulni_intel sha512_ssse3 psmouse sha256_ssse3 input_leds iTCO_wdt sha1_ssse3 atkbd intel_pmc_bxt iTCO_vendor_support vivaldi_fmap ppdev libps2 watchdog rapl evdev snd_pcm i8042 parport_pc bochs 9pnet_virtio drm_vram_helper snd_timer drm_ttm_helper 9pnet parport rtc_cmos serio e1000e snd ttm lpc_ich i2c_i801 soundcore button pcspkr i2c_smbus ksmbd rdma_cm iw_cm ib_cm ib_core dm_multipath dm_mod fuse nls_ucs2_utils cifs_arc4 efi_pstore scsi_mod scsi_common dmi_sysfs qemu_fw_cfg virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev autofs4 aesni_intel crypto_simd cryptd
<4>[  319.576191] CPU: 0 PID: 1501 Comm: kexec Tainted: G        W          6.8.6 #1 191f8fecad84ce150bee56a123c22a930e1c0594
<4>[  319.578826] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-stable202302-for-qemu 03/01/2023
<4>[  319.581167] RIP: 0010:kexec_load_purgatory+0x3b1/0x4a0
<4>[  319.582449] Code: 54 24 0c 48 89 c8 48 29 d0 0f 82 4a ff ff ff 49 03 54 24 1c 48 39 d1 0f 83 3c ff ff ff 49 8b 17 48 39 4a 18 0f 84 0e ff ff ff <0f> 0b e9 28 ff ff ff 66 85 c9 74 13 48 8b 5a 28 48 01 d3 45 31 e4
<4>[  319.586837] RSP: 0018:ffffb4e0c0e13bc0 EFLAGS: 00010206
<4>[  319.588123] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000000000
<4>[  319.589883] RDX: ffff975f440c7400 RSI: 0000000000000010 RDI: ffffb4e0c027d0c0
<4>[  319.591604] RBP: 0000000000000002 R08: 0000003d8b4c0000 R09: cc0000000025ff00
<4>[  319.593359] R10: 0000003d8b4c0000 R11: cc0000000025ff00 R12: ffffb4e0c0139084
<4>[  319.595079] R13: 00000002bfffe000 R14: ffff975f440c76e0 R15: ffffb4e0c0e13c50
<4>[  319.596833] FS:  00005a2dc5157740(0000) GS:ffff975fb6c00000(0000) knlGS:0000000000000000
<4>[  319.598763] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  319.600202] CR2: 00005a2dc5364370 CR3: 0000000181be4002 CR4: 0000000000060ef0
<4>[  319.601928] Call Trace:
<4>[  319.602597]  <TASK>
<4>[  319.603198]  ? __warn+0xcf/0x1e0
<4>[  319.604038]  ? kexec_load_purgatory+0x3b1/0x4a0
<4>[  319.605170]  ? report_bug+0x154/0x210
<4>[  319.606121]  ? handle_bug+0x3d/0x90
<4>[  319.607037]  ? exc_invalid_op+0x1a/0x60
<4>[  319.608012]  ? asm_exc_invalid_op+0x1a/0x20
<4>[  319.609077]  ? kexec_load_purgatory+0x3b1/0x4a0
<4>[  319.610232]  bzImage64_load+0x1c7/0x6e0
<4>[  319.611222]  kexec_image_load_default+0x57/0x80
<4>[  319.612354]  __se_sys_kexec_file_load+0x57c/0x720
<4>[  319.613570]  do_syscall_64+0x90/0x150
<4>[  319.614511]  ? syscall_exit_work+0x109/0x1a0
<4>[  319.615587]  ? syscall_exit_to_user_mode+0x96/0xc0
<4>[  319.616819]  ? do_syscall_64+0x9c/0x150
<4>[  319.617794]  ? do_user_addr_fault+0x37d/0x6a0
<4>[  319.618891]  ? syscall_exit_to_user_mode+0x96/0xc0
<4>[  319.620118]  entry_SYSCALL_64_after_hwframe+0x78/0x80
<4>[  319.621372] RIP: 0033:0x5a2dc5229469
<4>[  319.622302] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 19 0b 00 f7 d8 64 89 01 48
<4>[  319.626711] RSP: 002b:000077d8310cac88 EFLAGS: 00000246 ORIG_RAX: 0000000000000140
<4>[  319.628531] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00005a2dc5229469
<4>[  319.630282] RDX: 000000000000001c RSI: 00000000ffffffff RDI: 0000000000000003
<4>[  319.632001] RBP: 0000000000000003 R08: 0000000000000004 R09: 000077d8310cb453
<4>[  319.633755] R10: 00005a2dc6c84010 R11: 0000000000000246 R12: 00005a2dc5362080
<4>[  319.635476] R13: 00005a2dc5362120 R14: 000077d8310cb008 R15: 0000000000000004
<4>[  319.637234]  </TASK>
<4>[  319.637835] ---[ end trace 0000000000000000 ]---
<3>[  319.638986] kexec: Overflow in relocation type 10 value 0x2bfffb7f0
<3>[  319.643562] kexec-bzImage64: Loading purgatory failed

vmlinuz-6.8.6 is a bzImage with EFI_STUB enabled, if that matters.

Unsurprisingly trying to kexec -e here will result in a hang.

@nathanchance
Member

ThinLTO and CFI are red herrings. This reproduces for me with a standard distribution configuration (Arch Linux) with LLVM 18 and newer. It is only reproducible on x86_64; aarch64 works fine for me.

I bisected LLVM to see what change introduced this and I landed on llvm/llvm-project@239a41e. I have not tried to compare the diff of kernel/kexec_file.o before and after that change to see what is actually changing here. I'll see if I can come up with a minimal reproducer based on that.

@nathanchance nathanchance changed the title kexec_file is broken with thinLTO+CFI on x86_64 kexec_file is broken with LLVM 18 and newer on x86_64 Apr 16, 2024
@nathanchance nathanchance added the [ARCH] x86_64 This bug impacts ARCH=x86_64 label Apr 16, 2024
@0n-s
Author

0n-s commented Apr 16, 2024

> I bisected LLVM to see what change introduced this and I landed on llvm/llvm-project@239a41e. I have not tried to compare the diff of kernel/kexec_file.o before and after that change to see what is actually changing here. I'll see if I can come up with a minimal reproducer based on that.

I would suggest mainly looking at arch/x86/purgatory first. That code is extremely sensitive & the vast majority of its Makefile is just removing all instrumentation & anything else that might make it unsuitable for kexec.

& OFC inspecting the comment above the line that triggers a warning has useful clues as to the unmet requirements: https://github.com/gregkh/linux/blob/486067966cd308763a6ec811296e2ed4581f4af6/kernel/kexec_file.c#L922

@nathanchance
Member

Thanks for the tip; I realized that once I started digging a little bit. There are two distinct but related issues here. I think I understand the first one, but I am not sure I understand the second one...

The first issue is the warning.

[    1.264240] ------------[ cut here ]------------
[    1.264647] WARNING: CPU: 0 PID: 96 at kernel/kexec_file.c:945 kexec_load_purgatory+0x2c8/0x3c0
[    1.265322] Modules linked in:
[    1.265565] CPU: 0 PID: 96 Comm: kexec Not tainted 6.9.0-rc4-00031-g96fca68c4fbf #1 eae91b3fe699ecba2dd0a886471788e49eb36ac0
[    1.266403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    1.267268] RIP: 0010:kexec_load_purgatory+0x2c8/0x3c0
[    1.267661] Code: 54 24 0c 48 89 c8 48 29 d0 0f 82 5d ff ff ff 49 03 54 24 1c 48 39 d1 0f 83 4f ff ff ff 49 8b 17 48 39 4a 18 0f 84 30 ff ff ff <0f> 0b e9 3b ff ff ff 66 85 c9 74 18 48 8b 5a 28 48 01 d3 45 31 e4
[    1.269052] RSP: 0018:ffffbe28007cfb50 EFLAGS: 00010206
[    1.269447] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000000000
[    1.269982] RDX: ffff988c8174d000 RSI: 0000000000000010 RDI: ffffbe2801d940c0
[    1.270527] RBP: 0000000000000002 R08: 0000003d8b4c0000 R09: cc0000000025ff00
[    1.271063] R10: 0000003d8b4c0000 R11: cc0000000025ff00 R12: ffffbe28000d5084
[    1.271603] R13: 000000013ffff000 R14: ffff988c8174d000 R15: ffffbe28007cfbe0
[    1.272140] FS:  00007fec73535740(0000) GS:ffff988cbbc00000(0000) knlGS:0000000000000000
[    1.272744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.273178] CR2: 00007fec736b1390 CR3: 0000000101a24000 CR4: 0000000000350ef0
[    1.273732] Call Trace:
[    1.273929]  <TASK>
[    1.274100]  ? __warn+0xc9/0x1c0
[    1.274356]  ? kexec_load_purgatory+0x2c8/0x3c0
[    1.274704]  ? report_bug+0x139/0x1e0
[    1.274998]  ? handle_bug+0x42/0x70
[    1.275269]  ? exc_invalid_op+0x1a/0x50
[    1.275574]  ? asm_exc_invalid_op+0x1a/0x20
[    1.275900]  ? kexec_load_purgatory+0x2c8/0x3c0
[    1.276251]  bzImage64_load+0x1c1/0x6a0
[    1.276556]  kexec_image_load_default+0x49/0x60
[    1.276907]  __se_sys_kexec_file_load+0x606/0x790
[    1.277280]  ? arch_exit_to_user_mode_prepare+0x6e/0x70
[    1.277675]  do_syscall_64+0x90/0x170
[    1.277955]  ? srso_return_thunk+0x5/0x5f
[    1.278265]  ? __count_memcg_events+0x50/0xc0
[    1.278597]  ? srso_return_thunk+0x5/0x5f
[    1.278901]  ? handle_mm_fault+0xb18/0x11c0
[    1.279218]  ? vfs_read+0x2c8/0x2f0
[    1.279498]  ? srso_return_thunk+0x5/0x5f
[    1.279802]  ? do_user_addr_fault+0x4d2/0x690
[    1.280138]  ? srso_return_thunk+0x5/0x5f
[    1.280449]  ? srso_return_thunk+0x5/0x5f
[    1.280755]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    1.281136] RIP: 0033:0x7fec7363e88d
[    1.281411] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 14 0d 00 f7 d8 64 89 01 48
[    1.282789] RSP: 002b:00007ffd136f4808 EFLAGS: 00000246 ORIG_RAX: 0000000000000140
[    1.283354] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fec7363e88d
[    1.283893] RDX: 00000000000000c5 RSI: 0000000000000005 RDI: 0000000000000003
[    1.284427] RBP: 0000000000000003 R08: 0000000000000000 R09: 00005628517eef10
[    1.284966] R10: 00005628580a75f0 R11: 0000000000000246 R12: 0000000000000003
[    1.285500] R13: 00005628517f89a8 R14: 00007ffd136f4b98 R15: 0000000000000004
[    1.286036]  </TASK>
[    1.286210] ---[ end trace 0000000000000000 ]---

As you alluded to, this warning is caused by there now being two text sections, .text and .ltext, when building with -mcmodel=large, which arch/x86/purgatory has used since commit e16c2983fba0 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors").
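To make the warning concrete, here is a minimal Python model of the entry-point check in kexec_purgatory_setup_sechdrs() (the WARN_ON under the comment linked earlier in the thread). It is a simplified paraphrase for illustration, not kernel code: it only considers allocatable executable sections, and both the load address (0x13fffb000, borrowed from the LLVM 17 debug log further down) and e_entry (0) are assumptions. The section sizes come from the llvm-readelf output that follows. Because every section of a relocatable object has sh_addr 0, an entry point inside the first 0xd0 bytes falls inside both .text and .ltext.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    executable: bool  # SHF_EXECINSTR set
    sh_addr: int      # 0 for every section of a relocatable (ld -r) object
    sh_size: int
    align: int = 16   # sh_addralign


def align_up(x: int, a: int) -> int:
    return (x + a - 1) & ~(a - 1)


def place_purgatory(sections, e_entry, mem):
    """Simplified model of how image->start is derived from e_entry.

    Returns (image_start, warned): warned lists every section that would
    hit WARN_ON(image->start != e_entry) in the real loop.
    """
    image_start = e_entry
    offset = 0
    warned = []
    for s in sections:
        offset = align_up(offset, s.align)
        if s.executable and s.sh_addr <= e_entry < s.sh_addr + s.sh_size:
            if image_start != e_entry:
                # image->start was already rebased by an earlier executable
                # section that also covered e_entry.
                warned.append(s.name)
            else:
                image_start = image_start - s.sh_addr + mem + offset
        offset += s.sh_size
    return image_start, warned


MEM = 0x13FFFB000  # hypothetical load address, borrowed from the LLVM 17 log
ENTRY = 0          # hypothetical e_entry, inside the first 0xd0 bytes

llvm17 = [Section(".text", True, 0, 0x1683)]
llvm18 = [Section(".text", True, 0, 0xD0), Section(".ltext", True, 0, 0x15A6)]

for label, secs in (("LLVM 17", llvm17), ("LLVM 18", llvm18)):
    start, warned = place_purgatory(secs, ENTRY, MEM)
    print(label, hex(start), warned)
# LLVM 17 0x13fffb000 []
# LLVM 18 0x13fffb000 ['.ltext']
```

Under these assumptions the computed start address is the same in both cases. The WARN_ON fires only because a second executable section also covers e_entry after image->start has already been rebased.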

With LLVM 17:

$ llvm-readelf -S arch/x86/purgatory/purgatory.ro
There are 18 section headers, starting at offset 0x43a0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 001683 00  AX  0   0 16
  [ 2] __patchable_function_entries PROGBITS 0000000000000000 0016c8 0000c0 00 WAL  1   0  8
  [ 3] .rela.text        RELA            0000000000000000 003250 000468 18   I 15   1  8
  [ 4] .rela__patchable_function_entries RELA 0000000000000000 0036b8 000240 18   I 15   2  8
  [ 5] .kexec-purgatory  PROGBITS        0000000000000000 001790 000120 00  WA  0   0 16
  [ 6] .comment          PROGBITS        0000000000000000 0038f8 000076 01  MS  0   0  1
  [ 7] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 00396e 000003 00   E  0   0  1
  [ 8] .data             PROGBITS        0000000000000000 002000 001000 00  WA  0   0 4096
  [ 9] .rodata           PROGBITS        0000000000000000 003000 0001e0 00   A  0   0 16
  [10] .rela.rodata      RELA            0000000000000000 003978 000030 18   I 15   9  8
  [11] .bss              NOBITS          0000000000000000 0031e0 001000 00  WA  0   0 4096
  [12] .modinfo          PROGBITS        0000000000000000 0031e0 000039 00   A  0   0  1
  [13] .rodata.str1.1    PROGBITS        0000000000000000 003219 000032 01 AMS  0   0  1
  [14] .note.GNU-stack   PROGBITS        0000000000000000 0039a8 000000 00      0   0  1
  [15] .symtab           SYMTAB          0000000000000000 0039a8 0006f0 18     17  45  8
  [16] .shstrtab         STRTAB          0000000000000000 004098 0000db 00      0   0  1
  [17] .strtab           STRTAB          0000000000000000 004173 00022d 00      0   0  1

With LLVM 18:

$ llvm-readelf -S arch/x86/purgatory/purgatory.ro
There are 21 section headers, starting at offset 0x4770:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 0000d0 00  AX  0   0 16
  [ 2] .ltext            PROGBITS        0000000000000000 000110 0015a6 00 AXl  0   0 16
  [ 3] __patchable_function_entries PROGBITS 0000000000000000 0016b8 0000c0 00 WAL  2   0  8
  [ 4] .rela.ltext       RELA            0000000000000000 003258 0005b8 18   I 18   2  8
  [ 5] .rela__patchable_function_entries RELA 0000000000000000 003810 000240 18   I 18   3  8
  [ 6] .kexec-purgatory  PROGBITS        0000000000000000 001780 000120 00  WA  0   0 16
  [ 7] .comment          PROGBITS        0000000000000000 003a50 00007a 01  MS  0   0  1
  [ 8] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 003aca 000003 00   E  0   0  1
  [ 9] .data             PROGBITS        0000000000000000 002000 001000 00  WA  0   0 4096
  [10] .rela.text        RELA            0000000000000000 003ad0 000228 18   I 18   1  8
  [11] .rodata           PROGBITS        0000000000000000 003000 0000e0 00   A  0   0 16
  [12] .rela.rodata      RELA            0000000000000000 003cf8 000030 18   I 18  11  8
  [13] .bss              NOBITS          0000000000000000 0030e0 001000 00  WA  0   0 4096
  [14] .modinfo          PROGBITS        0000000000000000 0030e0 000039 00   A  0   0  1
  [15] .lrodata          PROGBITS        0000000000000000 003120 000100 00  Al  0   0 16
  [16] .rodata.str1.1    PROGBITS        0000000000000000 003220 000032 01 AMSl  0   0  1
  [17] .note.GNU-stack   PROGBITS        0000000000000000 003d28 000000 00      0   0  1
  [18] .symtab           SYMTAB          0000000000000000 003d28 000720 18     20  47  8
  [19] .shstrtab         STRTAB          0000000000000000 004448 0000f7 00      0   0  1
  [20] .strtab           STRTAB          0000000000000000 00453f 00022d 00      0   0  1

Resulting in the following section diff:

diff --git a/tmp/.psub.bySdEduC68 b/tmp/.psub.dAUlPpwKwM
index 5399ef7a64b6..254217201609 100644
--- a/tmp/.psub.bySdEduC68
+++ b/tmp/.psub.dAUlPpwKwM
@@ -1,17 +1,20 @@
 Name
 NULL
 .text
+.ltext
 __patchable_function_entries
-.rela.text
+.rela.ltext
 .rela__patchable_function_entries
 .kexec-purgatory
 .comment
 .llvm_addrsig
 .data
+.rela.text
 .rodata
 .rela.rodata
 .bss
 .modinfo
+.lrodata
 .rodata.str1.1
 .note.GNU-stack
 .symtab

This behavior change is caused by LLVM commit llvm/llvm-project@d8a0439, which adds the use of .ltext. As far as I am aware, there is no way to opt out of this optimization. The only way to resolve it will be with a linker script for purgatory.ro. I came up with the following diff based on arch/s390/purgatory/purgatory.lds.S and Ricardo Ribalda's previous attempt for a similar reason, which works with both GCC 13 and LLVM 17 for me.

diff --git a/arch/x86/purgatory/.gitignore b/arch/x86/purgatory/.gitignore
index d2be1500671d..71bd99d98906 100644
--- a/arch/x86/purgatory/.gitignore
+++ b/arch/x86/purgatory/.gitignore
@@ -1 +1,2 @@
 purgatory.chk
+purgatory.lds
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index bc31863c5ee6..d4669644137f 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 OBJECT_FILES_NON_STANDARD := y
 
-purgatory-y := purgatory.o stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o
+purgatory-y := purgatory.o purgatory.lds stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o
 
 targets += $(purgatory-y)
 PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -25,9 +25,9 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO),$(KBUILD_CFLAGS))
 
 # When linking purgatory.ro with -r unresolved symbols are not checked,
 # also link a purgatory.chk binary without -r to check for unresolved symbols.
-PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
-LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS)
-LDFLAGS_purgatory.chk := $(PURGATORY_LDFLAGS)
+PURGATORY_LDFLAGS := -z nodefaultlib
+LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS) -T
+LDFLAGS_purgatory.chk := -e purgatory_start $(PURGATORY_LDFLAGS)
 targets += purgatory.ro purgatory.chk
 
 # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
@@ -80,7 +80,7 @@ CFLAGS_string.o			+= $(PURGATORY_CFLAGS)
 
 asflags-remove-y		+= $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
 
-$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+$(obj)/purgatory.ro: $(obj)/purgatory.lds $(PURGATORY_OBJS) FORCE
 		$(call if_changed,ld)
 
 $(obj)/purgatory.chk: $(obj)/purgatory.ro FORCE
diff --git a/arch/x86/purgatory/purgatory.lds.S b/arch/x86/purgatory/purgatory.lds.S
new file mode 100644
index 000000000000..0898a221722f
--- /dev/null
+++ b/arch/x86/purgatory/purgatory.lds.S
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm-generic/vmlinux.lds.h>
+#include <asm/cache.h>
+
+OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
+
+#undef i386
+
+#ifdef CONFIG_X86_64
+OUTPUT_ARCH(i386:x86-64)
+#else
+OUTPUT_ARCH(i386)
+#endif
+
+ENTRY(purgatory_start)
+
+SECTIONS
+{
+	. = 0;
+
+	.kexec-purgatory : {
+		*(.kexec-purgatory)
+	}
+
+	.text : {
+		_text = .;
+		*(.text .text.*)
+		_etext = .;
+	}
+
+	.rodata : {
+		_rodata = .;
+		*(.rodata .rodata.*)
+		_erodata = .;
+	}
+
+	.data : {
+		_data = .;
+		*(.data .data.*)
+		_edata = .;
+	}
+
+	. = ALIGN(L1_CACHE_BYTES);
+	.bss : {
+		_bss = .;
+		*(.bss .bss.*)
+		*(COMMON)
+		. = ALIGN(8);	/* For convenience during zeroing */
+		_ebss = .;
+	}
+	_end = .;
+
+	ELF_DETAILS
+
+	DISCARDS
+	/DISCARD/ : {
+		*(.note.GNU-stack .note.gnu.property)
+		*(.llvm_addrsig)
+	}
+}
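Whichever linker script ends up being used, one way to confirm that purgatory.ro is back to a single executable section (the assumption behind the kexec_file.c warning) is to count sections with SHF_EXECINSTR set. Here is a minimal stdlib-only Python sketch. It assumes a little-endian ELF64 file without extended section numbering, and it is not an existing kernel or kbuild script.

```python
import struct
import sys

SHF_EXECINSTR = 0x4

# Little-endian Elf64_Ehdr and Elf64_Shdr layouts (64 bytes each).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
SHDR = struct.Struct("<IIQQQQIIQQ")


def executable_sections(data: bytes) -> list:
    """Return the names of all SHF_EXECINSTR sections, in header order."""
    ehdr = EHDR.unpack_from(data, 0)
    ident = ehdr[0]
    # EI_CLASS must be ELFCLASS64 (2) and EI_DATA must be ELFDATA2LSB (1).
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]

    shdrs = [SHDR.unpack_from(data, e_shoff + i * e_shentsize) for i in range(e_shnum)]
    # sh_offset and sh_size of the section-name string table
    str_off, str_size = shdrs[e_shstrndx][4], shdrs[e_shstrndx][5]
    strtab = data[str_off:str_off + str_size]

    def name(off: int) -> str:
        return strtab[off:strtab.index(b"\0", off)].decode()

    # sh_name is field 0 and sh_flags is field 2 of Elf64_Shdr.
    return [name(sh[0]) for sh in shdrs if sh[2] & SHF_EXECINSTR]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        names = executable_sections(f.read())
    print(f"{sys.argv[1]}: {len(names)} executable section(s): {', '.join(names)}")
    sys.exit(0 if len(names) == 1 else 1)
```

If the sketch is saved under a hypothetical name such as count_exec.py, running `python3 count_exec.py arch/x86/purgatory/purgatory.ro` should list only .text for the LLVM 17 object and both .text and .ltext for the LLVM 18 one, going by the X flags in the llvm-readelf output above. The exit status is 0 only when exactly one executable section is present.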

The second issue is the "Overflow in relocation" error, which is what actually causes the kexec load to fail due to the -ENOEXEC return in arch_kexec_apply_relocations_add().

[    1.286577] kexec: Overflow in relocation type 10 value 0x13fffc760
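Type 10 is R_X86_64_32 in the x86-64 psABI, i.e. a zero-extended 32-bit absolute field. The following Python sketch paraphrases the overflow checks for the absolute relocation types that arch_kexec_apply_relocations_add() in arch/x86/kernel/machine_kexec_64.c handles. It is a simplified model based on reading that function, with PC-relative types left out, not the kernel implementation.

```python
R_X86_64_64 = 1    # direct 64-bit
R_X86_64_32 = 10   # direct 32-bit, zero-extended
R_X86_64_32S = 11  # direct 32-bit, sign-extended

U64_MASK = (1 << 64) - 1


def _overflow(rtype: int, value: int) -> OverflowError:
    # Message shaped like the kernel's pr_err() seen in the log.
    return OverflowError(f"Overflow in relocation type {rtype} value {value:#x}")


def apply_rela(rtype: int, value: int) -> bytes:
    """Return the little-endian bytes that would be written at the site."""
    value &= U64_MASK  # the kernel computes the value as an unsigned 64-bit int
    if rtype == R_X86_64_64:
        return value.to_bytes(8, "little")
    if rtype == R_X86_64_32:
        # Truncating to u32 must not lose any bits.
        if value > 0xFFFFFFFF:
            raise _overflow(rtype, value)
        return value.to_bytes(4, "little")
    if rtype == R_X86_64_32S:
        # Truncating to s32 and sign-extending back must give the same s64.
        signed = value - (1 << 64) if value >> 63 else value
        if not -(1 << 31) <= signed < (1 << 31):
            raise _overflow(rtype, value)
        return (value & 0xFFFFFFFF).to_bytes(4, "little")
    raise NotImplementedError(f"relocation type {rtype} is not modeled here")


try:
    apply_rela(R_X86_64_32, 0x13FFFC760)
except OverflowError as err:
    print(err)  # Overflow in relocation type 10 value 0x13fffc760

print(apply_rela(R_X86_64_64, 0x13FFFC760).hex())  # 60c7ff3f01000000
```

The same address is fine as an R_X86_64_64 relocation; it is only the 32-bit field that cannot represent anything at or above 4 GiB.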

Using dyndbg="func arch_kexec_apply_relocations_add +p" to turn on the pr_debug() statements in arch_kexec_apply_relocations_add() shows...

LLVM 17:

[    1.271177] kernel: 00000000e4bf285d kernel_size: 0xc54200
[    1.271184] PEFILE: Unsigned PE binary
[    1.279398] kexec: Applying relocate section .rela.text to 1
[    1.279404] kexec: Symbol: purgatory_sha_regions info: 11 shndx: 05 value=20 size: 100
[    1.279406] kexec: Symbol: sha256_update info: 12 shndx: 01 value=2b0 size: eb
[    1.279408] kexec: Symbol: sha256_final info: 12 shndx: 01 value=bb0 size: 15
[    1.279410] kexec: Symbol: purgatory_sha256_digest info: 11 shndx: 05 value=0 size: 20
[    1.279412] kexec: Symbol: memcmp info: 12 shndx: 01 value=e80 size: f
[    1.279414] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279415] kexec: Symbol: .bss info: 03 shndx: 0b value=0 size: 0
[    1.279417] kexec: Symbol: purgatory info: 12 shndx: 01 value=10 size: 237
[    1.279419] kexec: Symbol: entry64 info: 10 shndx: 01 value=dd0 size: 9f
[    1.279420] kexec: Symbol: memcpy info: 12 shndx: 01 value=15c0 size: c3
[    1.279422] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279424] kexec: Symbol: memcpy info: 12 shndx: 01 value=15c0 size: c3
[    1.279425] kexec: Symbol: memset info: 12 shndx: 01 value=14d0 size: 22
[    1.279427] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279429] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279430] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279432] kexec: Symbol: memset info: 12 shndx: 01 value=14d0 size: 22
[    1.279433] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279435] kexec: Symbol: memset info: 12 shndx: 01 value=14d0 size: 22
[    1.279436] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279438] kexec: Symbol: memset info: 12 shndx: 01 value=14d0 size: 22
[    1.279439] kexec: Symbol: memset info: 12 shndx: 01 value=14d0 size: 22
[    1.279441] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279443] kexec: Symbol: sha256_update info: 12 shndx: 01 value=2b0 size: eb
[    1.279444] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279446] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279447] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279449] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279450] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279452] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279453] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279455] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279457] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279458] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279460] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279461] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279463] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279464] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279466] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279467] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279469] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279470] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279472] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279473] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279475] kexec: Symbol: kstrtoull info: 12 shndx: 01 value=1360 size: 137
[    1.279477] kexec: Symbol: .rodata.str1.1 info: 03 shndx: 0d value=0 size: 0
[    1.279478] kexec: Symbol: warn info: 12 shndx: 01 value=260 size: 5
[    1.279480] kexec: Applying relocate section .rela__patchable_function_entries to 2
[    1.279481] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279483] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279484] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279486] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279487] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279489] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279490] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279492] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279494] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279495] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279497] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279498] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279500] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279501] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279503] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279504] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279506] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279507] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279509] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279510] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279512] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279513] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279515] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279516] kexec: Symbol: .text info: 03 shndx: 01 value=0 size: 0
[    1.279518] kexec: Applying relocate section .rela.rodata to 9
[    1.279519] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279521] kexec: Symbol: .rodata info: 03 shndx: 09 value=0 size: 0
[    1.279522] Loaded purgatory at 0x13fffb000
[    1.279525] Loaded boot_param, command line and misc at 0x13fff9000 bufsz=0x11c0 memsz=0x2000
[    1.279527] Loaded 64bit kernel at 0x13c600000 bufsz=0xc4f200 memsz=0x3810000
[    1.279674] Loaded initrd at 0x13bca2000 bufsz=0x95d54a memsz=0x95d54a
[    1.279676] Final command line is: ...
[    1.279679] E820 memmap:
[    1.279680] 0000000000000000-000000000009fbff (1)
[    1.279681] 000000000009fc00-000000000009ffff (2)
[    1.279683] 00000000000f0000-00000000000fffff (2)
[    1.279684] 0000000000100000-00000000bffdffff (1)
[    1.279685] 00000000bffe0000-00000000bfffffff (2)
[    1.279686] 00000000feffc000-00000000feffffff (2)
[    1.279687] 00000000fffc0000-00000000ffffffff (2)
[    1.279689] 0000000100000000-000000013fffffff (1)
[    1.279690] 000000fd00000000-000000ffffffffff (2)
[    1.496093] nr_segments = 4
[    1.496098] segment[0]: buf=0x00000000e8e38c55 bufsz=0x4000 mem=0x13fffb000 memsz=0x5000
[    1.496107] segment[1]: buf=0x000000003584eebb bufsz=0x11c0 mem=0x13fff9000 memsz=0x2000
[    1.496110] segment[2]: buf=0x000000003eff47d6 bufsz=0xc4f200 mem=0x13c600000 memsz=0x3810000
[    1.505094] segment[3]: buf=0x0000000085949574 bufsz=0x95d54a mem=0x13bca2000 memsz=0x95e000
[    1.506887] kexec_file_load: type:0, start:0x13fffb270 head:0x106457002 flags:0x0

LLVM 18:

[    1.307628] kernel: 000000000dad29b7 kernel_size: 0xc54200
[    1.307635] PEFILE: Unsigned PE binary
[    1.315591] ------------[ cut here ]------------
[    1.315969] WARNING: CPU: 1 PID: 95 at kernel/kexec_file.c:945 kexec_load_purgatory+0x2c8/0x3c0
...
[    1.335401] ---[ end trace 0000000000000000 ]---
[    1.335752] kexec: Applying relocate section .rela.ltext to 2
[    1.335754] kexec: Symbol: purgatory_sha_regions info: 11 shndx: 06 value=20 size: 100
[    1.335757] kexec: Overflow in relocation type 10 value 0x13fffc760
[    1.336223] kexec-bzImage64: Loading purgatory failed

It appears that the relocation to purgatory_sha_regions is now R_X86_64_32 instead of R_X86_64_32 but it is too far away to fit into that relocation type so there is an overflow? This appears to be coming from purgatory() in arch/x86/purgatory/purgatory.c.
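The exact value in that message can be reproduced from the section table. As far as can be told from reading kernel/kexec_file.c, the allocatable sections are copied back to back, each aligned to its sh_addralign, with sh_addr = kbuf->mem + offset, and the relocation value is st_value + sh_addr + r_addend. The following Python sketch redoes that arithmetic with the sizes from the LLVM 18 llvm-readelf output. Two inputs are assumptions, since the LLVM 18 log stops before printing them: the load address (0x13fffb000, taken from the LLVM 17 run) and an addend of 0.

```python
# (name, sh_size, sh_addralign) for the SHF_ALLOC PROGBITS sections that
# precede .kexec-purgatory in the LLVM 18 purgatory.ro, in header order.
LLVM18_ALLOC = [
    (".text", 0xD0, 16),
    (".ltext", 0x15A6, 16),
    ("__patchable_function_entries", 0xC0, 8),
    (".kexec-purgatory", 0x120, 16),
]


def align_up(x: int, a: int) -> int:
    return (x + a - 1) & ~(a - 1)


def section_addresses(sections, mem: int) -> dict:
    """Pack sections back to back, mimicking sh_addr = kbuf->mem + offset."""
    addrs, offset = {}, 0
    for name, size, align in sections:
        offset = align_up(offset, align)
        addrs[name] = mem + offset
        offset += size
    return addrs


MEM = 0x13FFFB000  # assumption: same load address as in the LLVM 17 run
ST_VALUE = 0x20    # purgatory_sha_regions "value=20" from the debug log
ADDEND = 0         # assumption: plain `movl $purgatory_sha_regions, %r15d`

addrs = section_addresses(LLVM18_ALLOC, MEM)
value = addrs[".kexec-purgatory"] + ST_VALUE + ADDEND
print(hex(value), value > 0xFFFFFFFF)  # 0x13fffc760 True
```

With those assumptions the result lands exactly on 0x13fffc760, just under 5 GiB, so a relocation that only has 32 bits for an absolute address cannot hold it.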

At llvm/llvm-project@e692d08 (the parent of the previously fingered change, llvm/llvm-project@239a41e):

arch/x86/purgatory/purgatory.ro:   file format elf64-x86-64

Disassembly of section .ltext:

0000000000000010 <purgatory>:
      10: f3 0f 1e fa                   endbr64
      14: 55                            pushq   %rbp
      15: 48 89 e5                      movq    %rsp, %rbp
      18: 41 57                         pushq   %r15
      1a: 41 56                         pushq   %r14
      1c: 53                            pushq   %rbx
      1d: 48 83 e4 f0                   andq    $-0x10, %rsp
      21: 48 81 ec 90 00 00 00          subq    $0x90, %rsp
      28: 48 c7 44 24 18 00 00 00 00    movq    $0x0, 0x18(%rsp)
      31: 48 c7 44 24 10 00 00 00 00    movq    $0x0, 0x10(%rsp)
      3a: 48 c7 44 24 08 00 00 00 00    movq    $0x0, 0x8(%rsp)
      43: 48 c7 04 24 00 00 00 00       movq    $0x0, (%rsp)
      4b: 48 c7 84 24 80 00 00 00 00 00 00 00   movq    $0x0, 0x80(%rsp)
      57: 48 c7 44 24 78 00 00 00 00    movq    $0x0, 0x78(%rsp)
      60: 48 c7 44 24 70 00 00 00 00    movq    $0x0, 0x70(%rsp)
      69: 48 c7 44 24 68 00 00 00 00    movq    $0x0, 0x68(%rsp)
      72: 48 c7 44 24 60 00 00 00 00    movq    $0x0, 0x60(%rsp)
      7b: 48 c7 44 24 58 00 00 00 00    movq    $0x0, 0x58(%rsp)
      84: 48 c7 44 24 50 00 00 00 00    movq    $0x0, 0x50(%rsp)
      8d: 48 c7 44 24 48 00 00 00 00    movq    $0x0, 0x48(%rsp)
      96: 48 b8 67 e6 09 6a 85 ae 67 bb movabsq $-0x4498517a95f61999, %rax # imm = 0xBB67AE856A09E667
      a0: 48 89 44 24 20                movq    %rax, 0x20(%rsp)
      a5: 48 b8 72 f3 6e 3c 3a f5 4f a5 movabsq $-0x5ab00ac5c3910c8e, %rax # imm = 0xA54FF53A3C6EF372
      af: 48 89 44 24 28                movq    %rax, 0x28(%rsp)
      b4: 48 b8 7f 52 0e 51 8c 68 05 9b movabsq $-0x64fa9773aef1ad81, %rax # imm = 0x9B05688C510E527F
      be: 48 89 44 24 30                movq    %rax, 0x30(%rsp)
      c3: 48 b8 ab d9 83 1f 19 cd e0 5b movabsq $0x5be0cd191f83d9ab, %rax # imm = 0x5BE0CD191F83D9AB
      cd: 48 89 44 24 38                movq    %rax, 0x38(%rsp)
      d2: 48 c7 44 24 40 00 00 00 00    movq    $0x0, 0x40(%rsp)
      db: 49 bf 00 00 00 00 00 00 00 00 movabsq $0x0, %r15
        00000000000000dd:  R_X86_64_64  purgatory_sha_regions
      e5: 49 8b 37                      movq    (%r15), %rsi
      e8: 41 8b 57 08                   movl    0x8(%r15), %edx
      ec: 49 be 00 00 00 00 00 00 00 00 movabsq $0x0, %r14
        00000000000000ee:  R_X86_64_64  sha256_update
...

At llvm/llvm-project@239a41e:

arch/x86/purgatory/purgatory.ro:    file format elf64-x86-64

Disassembly of section .ltext:

0000000000000010 <purgatory>:
      10: f3 0f 1e fa                   endbr64
      14: 55                            pushq   %rbp
      15: 48 89 e5                      movq    %rsp, %rbp
      18: 41 57                         pushq   %r15
      1a: 41 56                         pushq   %r14
      1c: 53                            pushq   %rbx
      1d: 48 83 e4 f0                   andq    $-0x10, %rsp
      21: 48 81 ec 90 00 00 00          subq    $0x90, %rsp
      28: 48 c7 44 24 18 00 00 00 00    movq    $0x0, 0x18(%rsp)
      31: 48 c7 44 24 10 00 00 00 00    movq    $0x0, 0x10(%rsp)
      3a: 48 c7 44 24 08 00 00 00 00    movq    $0x0, 0x8(%rsp)
      43: 48 c7 04 24 00 00 00 00       movq    $0x0, (%rsp)
      4b: 48 c7 84 24 80 00 00 00 00 00 00 00   movq    $0x0, 0x80(%rsp)
      57: 48 c7 44 24 78 00 00 00 00    movq    $0x0, 0x78(%rsp)
      60: 48 c7 44 24 70 00 00 00 00    movq    $0x0, 0x70(%rsp)
      69: 48 c7 44 24 68 00 00 00 00    movq    $0x0, 0x68(%rsp)
      72: 48 c7 44 24 60 00 00 00 00    movq    $0x0, 0x60(%rsp)
      7b: 48 c7 44 24 58 00 00 00 00    movq    $0x0, 0x58(%rsp)
      84: 48 c7 44 24 50 00 00 00 00    movq    $0x0, 0x50(%rsp)
      8d: 48 c7 44 24 48 00 00 00 00    movq    $0x0, 0x48(%rsp)
      96: 48 b8 67 e6 09 6a 85 ae 67 bb movabsq $-0x4498517a95f61999, %rax # imm = 0xBB67AE856A09E667
      a0: 48 89 44 24 20                movq    %rax, 0x20(%rsp)
      a5: 48 b8 72 f3 6e 3c 3a f5 4f a5 movabsq $-0x5ab00ac5c3910c8e, %rax # imm = 0xA54FF53A3C6EF372
      af: 48 89 44 24 28                movq    %rax, 0x28(%rsp)
      b4: 48 b8 7f 52 0e 51 8c 68 05 9b movabsq $-0x64fa9773aef1ad81, %rax # imm = 0x9B05688C510E527F
      be: 48 89 44 24 30                movq    %rax, 0x30(%rsp)
      c3: 48 b8 ab d9 83 1f 19 cd e0 5b movabsq $0x5be0cd191f83d9ab, %rax # imm = 0x5BE0CD191F83D9AB
      cd: 48 89 44 24 38                movq    %rax, 0x38(%rsp)
      d2: 48 c7 44 24 40 00 00 00 00    movq    $0x0, 0x40(%rsp)
      db: 41 bf 00 00 00 00             movl    $0x0, %r15d
        00000000000000dd:  R_X86_64_32  purgatory_sha_regions
      e1: 49 8b 37                      movq    (%r15), %rsi
      e4: 41 8b 57 08                   movl    0x8(%r15), %edx
      e8: 49 be 00 00 00 00 00 00 00 00 movabsq $0x0, %r14
        00000000000000ea:  R_X86_64_64  sha256_update
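The two listings differ only in how the address of purgatory_sha_regions is materialized. A tiny Python encoder for exactly these two mov forms (hypothetical helper names, covering only an immediate moved into r15) shows why. In both, the immediate starts two bytes into the instruction, so both relocations sit at 0xdd. However, movl has a 4-byte immediate that the CPU zero-extends into %r15, so it can only hold addresses below 4 GiB, unlike the 0x13fffc760 from the overflow message. The shorter encoding also explains why every later instruction moves up by 4 bytes (0xe5 becomes 0xe1).

```python
def movabsq_r15(imm: int) -> bytes:
    """movabsq $imm64, %r15: REX.W+REX.B (0x49), opcode 0xB8+7, 8-byte immediate."""
    if not 0 <= imm < (1 << 64):
        raise OverflowError(f"{imm:#x} does not fit an imm64")
    return b"\x49\xbf" + imm.to_bytes(8, "little")


def movl_r15d(imm: int) -> bytes:
    """movl $imm32, %r15d: REX.B (0x41), opcode 0xB8+7, 4-byte immediate.

    A write to a 32-bit register zero-extends into the full 64-bit
    register, so this form can only produce values below 4 GiB.
    """
    if not 0 <= imm <= 0xFFFFFFFF:
        raise OverflowError(f"{imm:#x} does not fit a zero-extended imm32")
    return b"\x41\xbf" + imm.to_bytes(4, "little")


# With a zero placeholder immediate, as in the unrelocated object file:
print(movabsq_r15(0).hex())  # 49bf0000000000000000
print(movl_r15d(0).hex())    # 41bf00000000
```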

I am not really sure what is going wrong here though. I attempted moving .kexec-purgatory as close as I could to .text but that did not resolve it. @MaskRay @aeubanks is the kernel doing something wrong here or is there a problem with that LLVM change (llvm/llvm-project@239a41e)?

I suspect ChromeOS will hit this eventually, so we should definitely look into this.

@aeubanks

About .ltext and llvm/llvm-project@d8a0439, -mcmodel=large should cause all text sections to be .ltext, unless there are explicit sections on functions. Can you more concisely explain the problem with this?

@aeubanks

The relocation overflow seems like a real issue. Do you have a reduced repro/godbolt link that shows the 32-bit relocation with the large code model?

@nathanchance

> About .ltext and llvm/llvm-project@d8a0439, -mcmodel=large should cause all text sections to be .ltext, unless there are explicit sections on functions. Can you more concisely explain the problem with this?

I'm not sure there really is a problem with this; it is just something the kernel now has to account for, because it expects the purgatory object to have only one .text section. See https://git.kernel.org/linus/8652d44f466ad5772e7d1756e9457046189b0dfc for the introduction of the warning that we are hitting here, which covers it to some extent.
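A simplified C model of the condition behind that warning may help. It assumes, without quoting the kernel source, that the loader looks for the executable (SHF_EXECINSTR) section whose address range contains the ELF entry point and warns if more than one section matches. In the relocatable purgatory every section has address 0 (see the llvm-readelf output in the commit message further down), so a separate .ltext section makes the match ambiguous. The helper name `count_entry_sections` is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (a simplified model, not kernel code): count the
 * executable sections whose [sh_addr, sh_addr + sh_size) range contains
 * the entry point. More than one hit means the loader cannot tell which
 * section holds the entry code.
 */
size_t count_entry_sections(const Elf64_Shdr *shdrs, size_t nr,
			    Elf64_Addr entry)
{
	size_t hits = 0;

	assert(shdrs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		const Elf64_Shdr *sh = &shdrs[i];

		/* Only sections marked executable (.text, .ltext, ...) count. */
		if (!(sh->sh_flags & SHF_EXECINSTR))
			continue;
		if (entry >= sh->sh_addr && entry - sh->sh_addr < sh->sh_size)
			hits++;
	}
	return hits;
}
```

With the sizes from that llvm-readelf output (.text of 0xd0 bytes and .ltext of 0x15a6 bytes, both at address 0), an entry point at an arbitrary example offset of 0x10 is matched twice.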

The bigger issue is the relocation overflow error that we hit after llvm/llvm-project@239a41e, which I describe above in the paragraph starting with "The second issue is" (it may not make much sense). I am a little out of my wheelhouse here, but I can try to come at it from another angle tomorrow if it does turn out to be nonsense.

@nathanchance

> The relocation overflow seems like a real issue. Do you have a reduced repro/godbolt link that shows the 32-bit relocation with the large code model?

No, I have not been able to reduce anything yet. I'll see if I can come up with something tomorrow.

@nickdesaulniers

> It appears that the relocation to purgatory_sha_regions is now R_X86_64_32 instead of R_X86_64_32

I think you meant "now R_X86_64_32 instead of R_X86_64_64".

arch_kexec_apply_relocations_add in arch/x86/kernel/machine_kexec_64.c looks like it should be able to relocate to either.

The definition of purgatory_sha_regions in arch/x86/purgatory/purgatory.c looks like:

22:struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(".kexec-purgatory");

perhaps that section attribute is forcing a 32b relocation somehow (dunno how to get it back to a 64b relocation).

@nathanchance your linker script doesn't seem to mention .ltext. Intentional?

@nathanchance

>> It appears that the relocation to purgatory_sha_regions is now R_X86_64_32 instead of R_X86_64_32
>
> I think you meant "now R_X86_64_32 instead of R_X86_64_64".

Heh, yes.

> arch_kexec_apply_relocations_add in arch/x86/kernel/machine_kexec_64.c looks like it should be able to relocate to either.

Right, it is actually that function that is failing when relocating R_X86_64_32.
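A small C model of why the 32-bit form fails at load time. It follows the usual x86_64 ELF semantics (R_X86_64_64 stores all 64 bits, R_X86_64_32 is zero-extended, R_X86_64_32S is sign-extended) and reports overflow when the truncated value would not extend back to the original. It is a sketch, not the code of arch_kexec_apply_relocations_add, and the helper name `apply_abs_reloc` is made up.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store 'value' (already computed as S + A) at
 * 'loc' using the given relocation type. Returns false on overflow,
 * i.e. when the stored field would not extend back to 'value'.
 */
bool apply_abs_reloc(void *loc, uint32_t type, uint64_t value)
{
	assert(loc != NULL);

	switch (type) {
	case R_X86_64_64:
		memcpy(loc, &value, sizeof(value));
		return true;
	case R_X86_64_32: {
		uint32_t v32 = (uint32_t)value;	/* zero-extended when used */

		if ((uint64_t)v32 != value)
			return false;
		memcpy(loc, &v32, sizeof(v32));
		return true;
	}
	case R_X86_64_32S: {
		int32_t v32 = (int32_t)value;	/* sign-extended when used */

		if ((int64_t)v32 != (int64_t)value)
			return false;
		memcpy(loc, &v32, sizeof(v32));
		return true;
	}
	default:
		return false;	/* other types are not modeled here */
	}
}
```

For instance, with a hypothetical load address of 0x123456000 for purgatory_sha_regions, only the R_X86_64_64 form succeeds, which is why switching that reference from a 64-bit to a 32-bit relocation surfaces as an overflow.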

> The definition of purgatory_sha_regions in arch/x86/purgatory/purgatory.c looks like:
>
> 22:struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(".kexec-purgatory");
>
> perhaps that section attribute is forcing a 32b relocation somehow (dunno how to get it back to a 64b relocation).

Yeah, not entirely sure yet. I want to see whether replacing the LLVM 18-built arch/x86/purgatory/purgatory.o with an LLVM 17-built one resolves the relocation error. If so, that file is pretty small, so I suspect coming up with a trivial reproducer should not be too hard.

> @nathanchance your linker script doesn't seem to mention .ltext. Intentional?

Whoops, forgot this diff, which was a later commit in my series:

diff --git a/arch/x86/purgatory/purgatory.lds.S b/arch/x86/purgatory/purgatory.lds.S
index 0898a221722f..4fb155942642 100644
--- a/arch/x86/purgatory/purgatory.lds.S
+++ b/arch/x86/purgatory/purgatory.lds.S
@@ -26,12 +26,14 @@ SECTIONS
        .text : {
                _text = .;
                *(.text .text.*)
+               *(.ltext .ltext.*)
                _etext = .;
        }

        .rodata : {
                _rodata = .;
                *(.rodata .rodata.*)
+               *(.lrodata .lrodata.*)
                _erodata = .;
        }

which resolves these orphan section warnings:

ld.lld: warning: arch/x86/purgatory/purgatory.o:(.ltext) is being placed in '.ltext'
ld.lld: warning: arch/x86/purgatory/sha256.o:(.ltext) is being placed in '.ltext'
ld.lld: warning: arch/x86/purgatory/sha256.o:(.lrodata) is being placed in '.lrodata'
ld.lld: warning: arch/x86/purgatory/string.o:(.ltext) is being placed in '.ltext'

@aeubanks

ah yes, an explicit section on the global and no-PIC large code model does this: https://godbolt.org/z/9x1vhxYYb

there are a couple of things in play here:

with the medium code model, we have a small/large split for globals. small globals can be considered "near", meaning in the bottom 2GB of the address space. we decided to treat any globals with explicit sections as small: if multiple globals share an explicit section and some are small while others are large, you end up with mixed small/large data in one section. that is bad because there's no proper location to place such a section in the binary to avoid relocation overflows, since 32-bit relocations are used for small data, but we want to put large data in a place that doesn't contribute to relocation pressure.

for consistency, we did this for the large code model too with the assumption that the user won't have small data sections that sum to over 2GB (causing relocation overflows at that point). so no-PIC large code model functions theoretically could use 32-bit relocations to address small data.

however, IIRC the kernel doesn't load these binaries at the low end of the address space. we can make functions in the large code model always use 64-bit relocations regardless of the global, that shouldn't be an issue. I'll work on a patch to do that
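A hypothetical reduced test case in the spirit of the linked godbolt example (its exact contents are not reproduced here): a global with an explicit section, modeled on purgatory_sha_regions, read from a function. Going by the explanation above, compiling it with something like `clang -O2 -mcmodel=large -fno-pic -c repro.c` and inspecting `llvm-objdump -dr repro.o` would be expected to show an R_X86_64_32 relocation for the global with an affected LLVM 18, and R_X86_64_64 once large code model functions always use 64-bit relocations. The names `region`, `regions`, `set_region` and `first_region_start` are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kexec_sha_region. */
struct region {
	unsigned long start;
	unsigned long len;
};

/*
 * Explicit section, as on purgatory_sha_regions. Per the discussion,
 * LLVM treats explicitly sectioned globals as "small" data.
 */
struct region regions[16] __attribute__((section(".kexec-purgatory")));

void set_region(size_t idx, unsigned long start, unsigned long len)
{
	assert(idx < sizeof(regions) / sizeof(regions[0]));
	regions[idx].start = start;
	regions[idx].len = len;
}

/* The address of 'regions' in this load is where the relocation shows up. */
unsigned long first_region_start(void)
{
	return regions[0].start;
}
```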

@nathanchance

> we can make functions in the large code model always use 64-bit relocations regardless of the global, that shouldn't be an issue. I'll work on a patch to do that

I'd be happy to test it to make sure this issue gets resolved properly once it is available, just ping me.

aeubanks added a commit to aeubanks/llvm-project that referenced this issue Apr 17, 2024
This matches other types of relocations, e.g. to constant pool.

Some users of the large code model may not place small data in the lower 2GB of the address space (e.g. ClangBuiltLinux/linux#2016), so just unconditionally use 64-bit relocations in the large code model.
@aeubanks

> I'd be happy to test it to make sure this issue gets resolved properly once it is available, just ping me.

can you test llvm/llvm-project#89101?

@nathanchance

> can you test llvm/llvm-project#89101?

Thanks, I can confirm that the relocation overflow error is resolved with this change and systemctl kexec does boot into the new kernel without any apparent issues. Please ensure this gets applied to release/18.x if possible.

@nathanchance nathanchance added [BUG] linux A bug that should be fixed in the mainline kernel. [BUG] llvm A bug that should be fixed in upstream LLVM [PATCH] Submitted A patch has been submitted for review labels Apr 17, 2024
aeubanks added a commit to llvm/llvm-project that referenced this issue Apr 17, 2024
This matches other types of relocations, e.g. to constant pool. And
makes things more consistent with PIC large code model.

Some users of the large code model may not place small data in the lower
2GB of the address space (e.g.
ClangBuiltLinux/linux#2016), so just
unconditionally use 64-bit relocations in the large code model.

So now functions in a section not marked large will use 64-bit
relocations to reference everything when using the large code model.

This also fixes some lldb tests broken by #88172
(https://lab.llvm.org/buildbot/#/builders/68/builds/72458).
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Apr 17, 2024
…89101)

(cherry picked from commit 6cea7c4)
@nathanchance

@0n-s would you like a Reported-by: tag for this? If so, please provide a Name <email> that I can use. If not, no worries, I've included a link to this issue regardless.

@0n-s

0n-s commented Apr 17, 2024

> @0n-s would you like a Reported-by: tag for this? If so, please provide a Name <email> that I can use. If not, no worries, I've included a link to this issue regardless.

Sure.

Reported-by: ns <0n-s@users.noreply.github.com>

@nathanchance

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Apr 17, 2024
Commit 8652d44 ("kexec: support purgatories with .text.hot
sections") added a warning when the purgatory has more than one .text
section, which is unsupported. A couple of changes have been made to the
x86 purgatory's Makefile to prevent the compiler from splitting the
.text section as a result:

  97b6b9c ("x86/purgatory: remove PGO flags")
  75b2f7e ("x86/purgatory: Remove LTO flags")

Unfortunately, there may be compiler optimizations that add other text
sections that cannot be disabled. For example, starting with LLVM 18,
large text is emitted in '.ltext', which happens for the purgatory due
to commit e16c298 ("x86/purgatory: Change compiler flags from
-mcmodel=kernel to -mcmodel=large to fix kexec relocation errors"), but
there are out of line assembly files that use '.text'.

  $ llvm-readelf -S arch/x86/purgatory/purgatory.ro | rg ' .[a-z]?text'
    [ 1] .text             PROGBITS        0000000000000000 000040 0000d0 00  AX  0   0 16
    [ 2] .ltext            PROGBITS        0000000000000000 000110 0015a6 00 AXl  0   0 16

To avoid the runtime warning when the purgatory has been built with LLVM
18, add a linker script that explicitly describes the sections of the
purgatory.ro and use it to merge '.ltext' and '.lrodata' back into
'.text' and '.rodata' to match the behavior of GCC and LLVM prior to the
optimization, as the distinction between small and large text is not
important in this case. This results in no warnings with
'--orphan-handling=warn' with either GNU or LLVM toolchains and the
resulting kernels can properly kexec other kernels.

This linker script is based on arch/s390/purgatory/purgatory.lds.S and
Ricardo Ribalda's prior attempt to add one for arch/x86 [1].

As a consequence of this change, the aforementioned flag changes can be
reverted because the '.text.*' sections generated by those options will
be combined properly by the linker script, which avoids the only reason
they were added in the first place. kexec continues to work with LTO
enabled.

[1]: https://lore.kernel.org/20230321-kexec_clang16-v5-2-5563bf7c4173@chromium.org/

Reported-by: ns <0n-s@users.noreply.github.com>
Closes: ClangBuiltLinux#2016
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
@0n-s

0n-s commented Apr 18, 2024

> Thanks, submitted: https://lore.kernel.org/20240417-x86-fix-kexec-with-llvm-18-v1-1-5383121e8fb7@kernel.org/

Thanks! I humbly request that this be backported to stable if & when it's accepted.

@nathanchance

The solution has shifted to moving away from -mcmodel=large for the x86 purgatory altogether:

https://lore.kernel.org/20240418201705.3673200-2-ardb+git@google.com/
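The small PIC model relies on proximity rather than absolute placement. As a rough C sketch (hypothetical helper names and addresses, relocation addends ignored): the kernel code model needs each absolute address to survive sign extension from 32 bits, while a RIP-relative reference only needs the target within roughly ±2 GiB of the referencing instruction, wherever the image is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Kernel code model: a 32-bit absolute field is sign-extended to 64 bits,
 * so the full address must lie in the bottom or top 2 GiB.
 */
bool fits_sext32(uint64_t addr)
{
	return (int64_t)(int32_t)addr == (int64_t)addr;
}

/*
 * PIC small code model: a RIP-relative field holds target - next_ip
 * (addend ignored), which must fit in a signed 32-bit displacement.
 */
bool fits_rip_rel32(uint64_t next_ip, uint64_t target)
{
	int64_t disp = (int64_t)(target - next_ip);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}

/*
 * Hypothetical check for one reference inside an image placed at 'base':
 * the instruction ends at offset 'insn_end_off' and the symbol sits at
 * offset 'sym_off' (both relative to the start of the image).
 */
bool ref_fits_at(uint64_t base, uint64_t insn_end_off, uint64_t sym_off,
		 bool rip_relative)
{
	/* Offsets must not wrap around the top of the address space. */
	assert(base + insn_end_off >= base && base + sym_off >= base);

	if (rip_relative)
		return fits_rip_rel32(base + insn_end_off, base + sym_off);
	return fits_sext32(base + sym_off);
}
```

Placing the same image at a hypothetical base of 0x123450000 breaks the absolute form, while the RIP-relative form still fits, because only the distance between instruction and symbol matters.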

@0n-s

0n-s commented Apr 19, 2024

Compiled 6.8.7 with the above patch applied & one flashbang later (I wish GPU drivers didn't set brightness to max by default), I have managed to successfully kexec -s on my laptop with no regressions. 👍

tmatheson-arm pushed a commit to tmatheson-arm/llvm-project that referenced this issue Apr 22, 2024
…89101)

@nathanchance

The patch has been accepted into -tip, although the plan is to merge it during the 6.10 cycle and, I guess, manually backport it to stable, because the Cc was stripped when the patch was applied.

https://git.kernel.org/tip/cba786af84a0f9716204e09f518ce3b7ada8555e

@nathanchance nathanchance added [PATCH] Accepted A submitted patch has been accepted upstream and removed [BUG] llvm A bug that should be fixed in upstream LLVM [PATCH] Submitted A patch has been submitted for review labels Apr 22, 2024
tstellar pushed a commit to llvmbot/llvm-project that referenced this issue Apr 23, 2024
…89101)

(cherry picked from commit 6cea7c4)
srcres258 pushed a commit to srcres258/linux-doc that referenced this issue Apr 24, 2024
On x86, the ordinary, position dependent small and kernel code models
only support placement of the executable in 32-bit addressable memory,
due to the use of 32-bit signed immediates to generate references to
global variables. For the kernel, this implies that all global variables
must reside in the top 2 GiB of the kernel virtual address space, where
the implicit address bits 63:32 are equal to sign bit 31.

This means the kernel code model is not suitable for other bare metal
executables such as the kexec purgatory, which can be placed arbitrarily
in the physical address space, where its address may no longer be
representable as a sign extended 32-bit quantity. For this reason,
commit

  e16c298 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")

switched to the large code model, which uses 64-bit immediates for all
symbol references, including function calls, in order to avoid relying
on any assumptions regarding proximity of symbols in the final
executable.

The large code model is rarely used, clunky and the least likely to
operate in a similar fashion when comparing GCC and Clang, so it is best
avoided. This is especially true now that Clang 18 has started to emit
executable code in two separate sections (.text and .ltext), which
triggers an issue in the kexec loading code at runtime.

The SUSE bugzilla fixes tag points to gcc 13 having issues with the
large model too and that perhaps the large model should simply not be
used at all.

Instead, use the position independent small code model, which makes no
assumptions about placement but only about proximity, where all
referenced symbols must be within -/+ 2 GiB, i.e., in range for a
RIP-relative reference. Use hidden visibility to suppress the use of a
GOT, which carries absolute addresses that are not covered by static ELF
relocations, and is therefore incompatible with the kexec loader's
relocation logic.

  [ bp: Massage commit message. ]

Fixes: e16c298 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1211853
Closes: ClangBuiltLinux/linux#2016
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/all/20240417-x86-fix-kexec-with-llvm-18-v1-0-5383121e8fb7@kernel.org/
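The hidden-visibility point in that commit message can be illustrated with a short translation unit. This is a hypothetical sketch: the struct and symbol names are made up, and how the actual patch applies hidden visibility is not shown here. Built with something like `clang -O2 -fpic -c`, the hidden array would be expected to appear in `llvm-objdump -dr` output with a RIP-relative R_X86_64_PC32 relocation. Dropping the attribute and moving the definition to another file would typically produce a GOT-relative relocation (R_X86_64_GOTPCREL or the relaxable R_X86_64_REX_GOTPCRELX) instead, and per the commit message the GOT's absolute addresses are not covered by the kexec loader's relocation logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kexec_sha_region. */
struct region {
	unsigned long start;
	unsigned long len;
};

#define NR_REGIONS 4

/*
 * Hidden visibility promises the symbol is defined inside the same linked
 * image, so under -fpic the compiler can address it RIP-relative instead
 * of loading its address from a GOT slot.
 */
extern struct region sha_regions[NR_REGIONS] __attribute__((visibility("hidden")));

void set_len(size_t i, unsigned long len)
{
	assert(i < NR_REGIONS);
	sha_regions[i].len = len;
}

unsigned long total_len(void)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < NR_REGIONS; i++)
		sum += sha_regions[i].len;
	return sum;
}

/* Normally defined in another file; kept here so the sketch links alone. */
struct region sha_regions[NR_REGIONS] __attribute__((visibility("hidden")));
```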