This repository has been archived by the owner on Apr 13, 2024. It is now read-only.

Enabling LLD for x86_64 #59

Merged
merged 2 commits into from
Apr 22, 2019

Conversation

@nickdesaulniers
Member

  LD      arch/x86/realmode/rm/realmode.elf
ld.lld-8: error: arch/x86/realmode/rm/realmode.lds:203: unknown output format name: "elf32-i386"
>>> OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
>>>               ^

@tpimh
Contributor Author

tpimh commented Nov 15, 2018

I noticed it and I think it was already fixed: llvm-mirror/lld@eaa12bb
The LLVM repo that we are using was last updated on 29 Oct. But I haven't seen any "elf32-i386" tests with lld, and @rui314 said about 6 months ago that even if OUTPUT_FORMAT support was added to LLD, elf32-i386 would not be recognized as a valid format. I will build LLD from source and check if it's still the case. And if OUTPUT_FORMAT behavior matches ld.bfd, I can drop one of the patches and update the pull request after the LLVM repo is updated.

@nickdesaulniers
Member

How is llvm-mirror/lld@eaa12bb related to https://bugs.llvm.org/show_bug.cgi?id=37432? Seems like that llvm bug would have to have been fixed before that commit you linked to would?

> The LLVM repo that we are using was last updated on 29 Oct.

Red flag! I thought these were daily builds?

@tpimh
Contributor Author

tpimh commented Nov 15, 2018

> How is llvm-mirror/lld@eaa12bb related to https://bugs.llvm.org/show_bug.cgi?id=37432?

It is not.
I think it's failing now because of the quotes (but I'll check if my guess is correct). The bug was actually fixed with llvm-mirror/lld@98c7ed5 and llvm-mirror/lld@eaa12bb improved handling of quoted arguments. The first one was pushed before 29 Oct and the second one was after.

@nickdesaulniers
Member

nickdesaulniers commented Nov 15, 2018

sorry, it's late so this might not be as clear to me as it should be:

llvm-mirror/lld@98c7ed5 looks like a fix for https://bugs.llvm.org/show_bug.cgi?id=37432, which is still open. So it's probably time to close:

Yes?

> The first one was pushed before 29 Oct and the second one was after.

oh, good find.

.travis.yml Outdated
@@ -10,6 +10,8 @@ matrix:
env: ARCH=ppc64le
- name: "ARCH=x86_64"
env: ARCH=x86_64
- name: "ARCH=x86_64 LD=ld.lld"
env: ARCH=x86_64 LD=ld.lld-8
Member

Once these are green on presubmit (and you can point to a build number that shows they're green), let's demote ARCH=x86_64 and ARCH=x86_64 LD=ld.lld REPO=linux-next to cron (keeping LLD in presubmit).

@tpimh
Contributor Author

tpimh commented Nov 15, 2018

I will first test if it really works as intended, then close the bugs. Looks like the fix is there, but I want to make sure.

@nathanchance
Member

> > The LLVM repo that we are using was last updated on 29 Oct.
>
> Red flag! I thought these were daily builds?

The Stretch binaries haven't been updated since Oct 29 because they have been erroring, but we switched to Sid in b7ee15e, which has definitely been built nightly.

I noted the failure here; I guess it should be reported to the LLVM bug tracker under packaging.

@nathanchance
Member

You'll need to rebase your branch on top of the latest master to get the correct toolchain; Travis won't pass currently.

@nathanchance
Member

Doesn't look like ClangBuiltLinux/linux#218 was addressed with this patch set: https://travis-ci.com/nathanchance/continuous-integration/jobs/158894300

@tpimh
Contributor Author

tpimh commented Nov 15, 2018

It was not. I actually never hit this bug on my local machine, because I was always getting the "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel" error regardless of the actual CONFIG_UNWINDER_ORC setting. I'll take a look at what I can do about it. Looks like the alignment in question is ALIGN(6), which is indeed not a power of two and is probably coming from ORC_UNWIND_TABLE. I will check if this value is different for ld.bfd, or maybe ld.bfd doesn't require it to be a power of two.
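
For reference, the power-of-two condition discussed here can be checked with a small shell function. This is only an illustrative sketch: `is_power_of_two` is a hypothetical helper name, not something taken from LLD or the kernel tree, and it says nothing about how either linker actually validates `ALIGN()`.

```shell
# Hypothetical helper: succeed (exit status 0) if $1 is a positive power of two.
is_power_of_two() {
    n=$1
    # A power of two has exactly one bit set, so n & (n - 1) is 0.
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# Compare the alignment from the error against two neighbouring values.
for align in 4 6 8; do
    if is_power_of_two "$align"; then
        echo "ALIGN($align): power of two"
    else
        echo "ALIGN($align): not a power of two"
    fi
done
```

Only the value 6 is reported as not being a power of two, which matches the ALIGN(6) observation above.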

@tpimh
Contributor Author

tpimh commented Nov 15, 2018

@nathanchance
Member

Although we didn't actually make it to a shell:

[    3.928185] BUG: unable to handle kernel paging request at ffffffff822f9018
[    3.928481] PGD b220067 P4D b220067 PUD b221063 PMD 0 
[    3.928481] Oops: 0000 [#1] SMP NOPTI
[    3.928481] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc2+ #1
[    3.928481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[    3.928481] RIP: 0010:acpi_run_osc+0x27/0x1c0
[    3.928481] Code: 1f 40 00 55 48 89 e5 41 57 41 56 41 54 53 48 83 e4 f0 48 81 ec a0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 <48> 8b 04 25 18 90 2f 82 48 89 44 24 08 48 8b 04 25 10 90 0f 91 48
[    3.928481] RSP: 0018:ffffa576400d3c40 EFLAGS: 00000282
[    3.928481] RAX: b07c7d786573c900 RBX: ffffffff91cfffac RCX: 0000000000000000
[    3.928481] RDX: ffffffff902d0e99 RSI: ffffa576400d3d18 RDI: ffff9c919ec9a050
[    3.928481] RBP: ffffa576400d3d00 R08: 0000000000000000 R09: 0000000000000000
[    3.928481] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    3.928481] R13: ffffffff916254d0 R14: ffffffff91d0005c R15: ffffffff91c27b2e
[    3.928481] FS:  0000000000000000(0000) GS:ffff9c919f200000(0000) knlGS:0000000000000000
[    3.928481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.928481] CR2: ffffffff822f9018 CR3: 000000000b21e000 CR4: 00000000000006f0
[    3.928481] Call Trace:
[    3.928481]  ? acpi_os_signal_semaphore+0x29/0x30
[    3.928481]  ? acpi_ut_release_mutex+0x3b/0x6d
[    3.928481]  ? acpi_ns_get_node+0x3d/0x46
[    3.928481]  ? acpi_get_handle+0x81/0xb2
[    3.928481]  ? acpi_subsystem_init+0x42/0x42
[    3.928481]  acpi_bus_init+0xd1/0x1e9
[    3.928481]  acpi_init+0x56/0xaf
[    3.928481]  do_one_initcall+0x197/0x380
[    3.928481]  ? lock_acquire+0x1ce/0x210
[    3.928481]  ? _raw_spin_unlock_irqrestore+0x45/0x60
[    3.928481]  ? repair_env_string+0x13/0x5b
[    3.928481]  ? kernel_init+0x6/0x2d0
[    3.928481]  ? kernel_init+0x6/0x2d0
[    3.928481]  do_initcall_level+0xa7/0xb6
[    3.928481]  do_basic_setup+0x25/0x2e
[    3.928481]  kernel_init_freeable+0x122/0x1cb
[    3.928481]  ? rest_init+0x1f0/0x1f0
[    3.928481]  kernel_init+0x6/0x2d0
[    3.928481]  ? rest_init+0x1f0/0x1f0
[    3.928481]  ret_from_fork+0x3a/0x50
[    3.928481] Modules linked in:
[    3.928481] CR2: ffffffff822f9018
[    3.928481] ---[ end trace 1a62da58d4ac7795 ]---
[    3.928481] RIP: 0010:acpi_run_osc+0x27/0x1c0
[    3.928481] Code: 1f 40 00 55 48 89 e5 41 57 41 56 41 54 53 48 83 e4 f0 48 81 ec a0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 <48> 8b 04 25 18 90 2f 82 48 89 44 24 08 48 8b 04 25 10 90 0f 91 48
[    3.928481] RSP: 0018:ffffa576400d3c40 EFLAGS: 00000282
[    3.928481] RAX: b07c7d786573c900 RBX: ffffffff91cfffac RCX: 0000000000000000
[    3.928481] RDX: ffffffff902d0e99 RSI: ffffa576400d3d18 RDI: ffff9c919ec9a050
[    3.928481] RBP: ffffa576400d3d00 R08: 0000000000000000 R09: 0000000000000000
[    3.928481] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    3.928481] R13: ffffffff916254d0 R14: ffffffff91d0005c R15: ffffffff91c27b2e
[    3.928481] FS:  0000000000000000(0000) GS:ffff9c919f200000(0000) knlGS:0000000000000000
[    3.928481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.928481] CR2: ffffffff822f9018 CR3: 000000000b21e000 CR4: 00000000000006f0
[    3.928481] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[    3.928481] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
[    3.928481] INFO: lockdep is turned off.
[    3.928481] irq event stamp: 41906
[    3.928481] hardirqs last  enabled at (41905): [<ffffffff90ac2285>] _raw_spin_unlock_irqrestore+0x45/0x60
[    3.928481] hardirqs last disabled at (41906): [<ffffffff8fe01b7a>] trace_hardirqs_off_thunk+0x1a/0x1c
[    3.928481] softirqs last  enabled at (41898): [<ffffffff8fe74fc7>] irq_exit+0x107/0x120
[    3.928481] softirqs last disabled at (41891): [<ffffffff8fe74fc7>] irq_exit+0x107/0x120
[    3.928481] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           4.20.0-rc2+ #1
[    3.928481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[    3.928481] Call Trace:
[    3.928481]  dump_stack+0xa7/0x10b
[    3.928481]  ___might_sleep+0x222/0x240
[    3.928481]  exit_signals+0x2e/0x2f0
[    3.928481]  do_exit+0xa9/0x910
[    3.928481]  ? rest_init+0x1f0/0x1f0
[    3.928481]  rewind_stack_do_exit+0x17/0x20
[    3.928994] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    3.929481] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---

@nickdesaulniers
Member

> Although we didn't actually make it to a shell:

What happens if we turn off the self tests (additional kernel configs above defconfig)? Maybe one of those is causing the panic?

@nathanchance
Member

We could disable them and see, but that would certainly point to a deeper issue.

@nickdesaulniers
Member

Sorry, I meant "if disabling all of them allows it to boot cleanly, then this is not a boot issue, but an issue from one (or more) of the tests, which we can then bisect configs to see which test(s) is(/are) problematic."

@nathanchance
Member

Yeah sorry that's what I was thinking as well. I just meant that if a test is failing, it could mean a deeper issue in the kernel or lld but we absolutely want to see what test is failing specifically.

@nathanchance
Member

Nope, with just the defconfig and image built (commit), the boot still fails: https://travis-ci.com/nathanchance/continuous-integration/builds/91771985

@tpimh
Contributor Author

tpimh commented Nov 18, 2018

Too bad we can't get binaries from travis. I will try to replicate the setup locally and reproduce the issue. Maybe we can also make the kernel binary reproducible and print the checksum after it's compiled to check against local build.
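
A rough sketch of that checksum idea is below. `KBUILD_BUILD_TIMESTAMP`, `KBUILD_BUILD_USER`, and `KBUILD_BUILD_HOST` are the variables kbuild documents for reproducible builds; the values chosen here, the `image_checksum` helper name, and the image path in the usage comment are assumptions for illustration. Identical checksums would also depend on using the same sources, config, and toolchain on both sides.

```shell
# Pin values that kbuild would otherwise take from the build environment.
# The concrete values are arbitrary; they only need to match on both machines.
export KBUILD_BUILD_TIMESTAMP='2018-11-18 00:00:00 UTC'
export KBUILD_BUILD_USER=ci
export KBUILD_BUILD_HOST=ci

# Hypothetical helper: print only the SHA-256 digest of the file given as $1.
image_checksum() {
    sha256sum "$1" | cut -d' ' -f1
}

# Usage after a build (the path is an assumption, adjust it to the real output):
#   image_checksum arch/x86/boot/bzImage
```

Printing this digest at the end of the CI job and running the same helper on a local build would show whether the two binaries are identical.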

@nathanchance
Member

@tpimh use my Docker image (nathanchance/cbl), you should be able to clone the CI repo then run ARCH=x86_64 LD=ld.lld-8 ./driver.sh and reproduce it.

@nickdesaulniers
Member

patches 2 and 3 (from this PR) are very clearly functional changes (and I'm curious now if they are the problem). I wonder if those plus linkage with bfd cause the same issues? acpi_run_osc might have more information about alignment requirements (or a bug), too.

@nathanchance
Member

> I wonder if those plus linkage with bfd cause the same issues?

Travis shows no, regular x86_64 passes fine with those patches.

@tpimh tpimh changed the title Enabling LLD for x86_64 Enabling LLD for x86_64 (WIP) Nov 23, 2018
@nathanchance nathanchance added the WIP Work in progress label Dec 4, 2018
@nathanchance nathanchance changed the title Enabling LLD for x86_64 (WIP) Enabling LLD for x86_64 Dec 4, 2018
@nathanchance
Member

@tpimh the LLD fix for the boot failure should be in the next apt.llvm.org toolchain build (hopefully in the next day or two), which was the last blocker to turning this on. Time to rebase?

@nickdesaulniers
Member

oh, we're that close?

@samitolvanen and @Ajs1984 were just asking me about it for Android Cuttlefish. Sorry I haven't been keeping up to date as much lately (it's not my intention).

@nathanchance
Member

> oh, we're that close?

Yes. Locally, I can use this script with a version of LLD that contains https://reviews.llvm.org/rL357885 and it links and boots successfully on mainline.

PATH=/mnt/build/llvm/bin:${PATH} ./driver.sh ARCH=x86_64 LD=ld.lld

We will need some backports to support the older LTS branches, including 4.19, because most of the fixes went into 4.20/5.0; I can look into that tonight.

@nathanchance
Member

As it turns out, most of the LLD patches in the kernel have already been backported, thanks to Sasha Levin's AUTOSEL work. The fixes for ClangBuiltLinux/linux#29, ClangBuiltLinux/linux#30, and ClangBuiltLinux/linux#218 have all been merged as of the latest LTS releases.

The only missing commit is the fix for ClangBuiltLinux/linux#31, which is commit ac3e233d29f7 ("x86/vdso: Drop implicit common-page-size linker flag") upstream. For 4.9/4.14, commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") is needed for a clean pick, which should probably be merged anyways since kernel/common is carrying an older version in commit 35b779802c2e ("x86: vdso: Fix leaky vdso linker with CC=clang.").

I can send an email to Greg and Sasha with this info (unless someone else wants to do it); all of those commits are clean picks for me.

I've tested locally with this CI patch on top of #147 and the following for loop:

for VER in 4.9 4.14 4.19; do
    ./driver.sh ARCH=x86_64 LD=ld.lld REPO=common-${VER} || break
done

@tpimh
Contributor Author

tpimh commented Apr 10, 2019

I'm pretty sure this will require some extra work, but as I can see most patches went upstream, I will try to build and test ASAP.

@nickdesaulniers
Member

backports will make it easier for other Android OEMs to make use of LLD. 🚀 🌔

@nathanchance
Member

I have prepared mbox files for 4.4 to 4.19 for x86_64. I am working on arm64 right now (only 4.9 and 4.14 need backports) but I will need to go to work shortly so I won't be able to finish until tonight.

@nathanchance
Member

It looks like the only two commits that are needed for arm64 on 4.9 are 9b990e62aee5 ("arm64: ensure the kernel is compiled for LP64") and c931d34ea085 ("arm64: build with baremetal linker target instead of Linux when available"). The former is the most critical; without it, this error occurs:

ld.lld: error: target emulation unknown: -m or at least one .o file required

The latter is a follow-up fix; see 38fc42486775 ("arm64: Use aarch64elf and aarch64elfb emulation mode variants") and its revert 96f95a17c1cf ("Revert "arm64: Use aarch64elf and aarch64elfb emulation mode variants"") for more information. However, it introduces a recursive variable error on 4.9 at least, which I don't understand because it is used just like that in arch/x86/boot/compressed/Makefile:

make[1]: Entering directory '/mnt/build/kernel'
arch/arm64/Makefile:63: *** Recursive variable 'LDFLAGS' references itself (eventually).  Stop.
make[1]: *** [/home/nathan/cbl/linux-stable/Makefile:534: __build_one_by_one] Error 2
make[1]: Leaving directory '/mnt/build/kernel'
make: *** [Makefile:152: sub-make] Error 2

@nickdesaulniers
Member

let's keep arm64 as a separate work item. KASLR in particular is broken on arm64, which is not on in the defconfig but is on Pixel devices.

@nathanchance
Member

Great, will make my life easier :) I'll send out the patches in a few hours with you, @Ajs1984, and @samitolvanen on CC.

@tpimh
Contributor Author

tpimh commented Apr 11, 2019

Here are the results of test build (no patches applied yet):

@nathanchance
Member

Patches sent and accepted by Sasha: https://lore.kernel.org/stable/20190411143924.GK11568@sasha-vm/

We still need https://reviews.llvm.org/rL357885 on the LLVM side to fix booting with KASLR. I don't know when the next Buster refresh will be as it doesn't appear to be on a timer in Jenkins just yet: https://llvm-jenkins.debian.net/job/llvm-toolchain-buster-8-binaries/

@nathanchance
Member

@sylvestre could we get the Buster clang-9 build refreshed? We'd like to turn this on but we need an LLD patch that isn't in the current version of clang-9.

@sylvestre

Sure, builds restarted!

@nathanchance
Member

4.4.171 will have all of the patches we need for this, but that might not get released for a bit. I think we can ship everything but 4.4 now, then deal with 4.4 when it's ready to avoid carrying any patches.

@tpimh
Contributor Author

tpimh commented Apr 20, 2019

Why not just add necessary patches now and remove later when they start failing?

@nathanchance
Member

I guess either way it requires a pull request to enable it in Travis when ready or remove the patches when it starts failing. It's up to you then.

@tpimh
Contributor Author

tpimh commented Apr 20, 2019

Good enough?

@tpimh tpimh requested a review from nathanchance April 22, 2019 19:49
@tpimh
Contributor Author

tpimh commented Apr 22, 2019

Travis shows green, but there was a segfault with x86_64 LLD:

Stopping network: Segmentation fault
FAIL

@nathanchance
Member

I don't see that? https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/194715072

It happens on arm64 but it's always been that way (and it happens with GCC so it isn't a clang issue: https://gist.github.com/nathanchance/85343c5e4c23c360e389b332c052890e).

@tpimh
Contributor Author

tpimh commented Apr 22, 2019

Sorry, I was checking the arm64 Travis log; everything is fine then.

Member

@nathanchance nathanchance left a comment

@nickdesaulniers
Member

nickdesaulniers commented Apr 22, 2019

String dump of section '.comment':
[     0] Linker: LLD 9.0.0

🆒
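
The dump above has the shape of `readelf -p .comment` output. A minimal sketch for pulling the linker identification out of such a dump follows; `linker_from_comment` is a hypothetical helper name, and the sample input is simply the text quoted in this comment rather than output captured from a real build.

```shell
# Hypothetical helper: read a `readelf -p .comment` style dump on stdin and
# print whatever follows "Linker:" on the matching string-table line.
linker_from_comment() {
    sed -n 's/^ *\[ *[0-9a-f]*] *Linker: *//p'
}

# Usage against a real image would look like:
#   readelf -p .comment vmlinux | linker_from_comment
# Here the dump quoted above is fed in directly instead.
printf "String dump of section '.comment':\n  [     0]  Linker: LLD 9.0.0\n" |
    linker_from_comment
```

A CI job could compare the printed string against the expected linker to confirm that `LD=ld.lld` actually took effect.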

Member

@nickdesaulniers nickdesaulniers left a comment

I feel like I want to turn off bfd builds soon. Maybe not now, but soon.

@tpimh tpimh merged commit 5df0988 into ClangBuiltLinux:master Apr 22, 2019
@tpimh tpimh deleted the x86_64-lld branch April 22, 2019 22:20
@tpimh tpimh removed the WIP Work in progress label Oct 4, 2019