[BOLT][AArch64] Fix PREL Relocs on RHEL8 #144505

Open
wants to merge 1 commit into
base: main

paschalis-mpeis
Member

No description provided.

@paschalis-mpeis
Member Author

Hey folks,

We ran into issues with this test on RHEL8. See details below.

Where do these recent lit failures come from?

You may have recently seen a few patches adjusting tests; a few more may follow. So here are their origins:

Arm recently released Arm Toolchain for Linux (ATfL), which bundles compilers and tools for several platforms. Many colleagues have contributed to this effort, led by @pawosm-arm.

BOLT will be packaged into ATfL, and the goal is to run nightly builds and tests on those platforms.
This expands testing for BOLT, which is great. It also surfaces test failures that we need to address.


Details of the failure

There appears to be a typo on this line:

add x0, x0, :lo12:datable

This has essentially the same effect as:

add x0, x0, :lo12:non_existing_symbol

What happens on non-RHEL8:

On distributions other than RHEL8, lld proceeds as normal, ignores the missing symbol and emits:

00000000000002a0 <_start>:
    adrp x0, 0x0 <datable>
    add x0, x0, #0x0
    mov x0, #0x0 // =0
    ret

Then llvm-bolt succeeds, the PC-relative relocations for the 16/32/64-bit words are found, and the test passes.
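For readers less familiar with the PREL relocations this test checks, here is a minimal Python sketch of how their values are computed (X = S + A - P, with the overflow ranges given in the AArch64 ELF ABI). The function name and the place address 0x404 are made up for illustration; only the _start address 0x2a0 comes from the disassembly above.

```python
# Sketch of R_AARCH64_PREL{16,32,64}: X = S + A - P
#   S = symbol address, A = addend, P = address of the place being relocated.
# Overflow ranges follow the AArch64 ELF ABI; PREL64 has no overflow check.
PREL_RANGES = {
    16: (-(1 << 15), 1 << 16),
    32: (-(1 << 31), 1 << 32),
    64: None,
}


def prel_value(sym_addr, addend, place, bits):
    """Return the field value written for a PREL<bits> relocation."""
    x = sym_addr + addend - place
    limits = PREL_RANGES[bits]
    if limits is not None:
        lo, hi = limits
        if not (lo <= x < hi):
            raise OverflowError(f"PREL{bits} value {x:#x} out of range")
    # The value is stored truncated to the width of the field.
    return x & ((1 << bits) - 1)


# _start is at 0x2a0 in the disassembly; 0x404 is a hypothetical place in .data.
print(hex(prel_value(0x2A0, 0, 0x404, 32)))  # 0xfffffe9c
```

The negative result simply reflects that, in this made-up layout, the data word sits after the function it refers to.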

What happens on RHEL8:

On RHEL8, the linker aborts with the error:

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'datable'; recompile with -fPIC

Fixing the typo lets the linker succeed on any platform. However, because datatable is within ±1MiB of the instruction, lld optimizes the sequence to:

00000000000002a0 <_start>:
    nop
    adr x0, 0x400 <datatable>
    mov x0, #0x0 // =0
    ret

For more, see PC-relative addressing (last bullet point) in the 5.7.9 Relocation optimization section of our ABI.
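As a rough illustration of the range condition behind this relaxation, the Python sketch below checks whether a target is reachable by a single ADR. It is not lld's code; the helper name is made up, and the assumption that the adr sits at 0x2a4 follows from _start being at 0x2a0 with 4-byte instructions.

```python
# ADR encodes a signed 21-bit byte offset, so it reaches [-1MiB, +1MiB).
ADR_RANGE = 1 << 20


def fits_adr(pc, target):
    """True if `target` is reachable by an ADR instruction located at `pc`."""
    delta = target - pc
    return -ADR_RANGE <= delta < ADR_RANGE


# adr at 0x2a4 (second instruction of _start), datatable at 0x400.
print(fits_adr(0x2A4, 0x400))              # True
# A target exactly 1MiB ahead is just out of range.
print(fits_adr(0x2A4, 0x2A4 + (1 << 20)))  # False
```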

This triggers ADRRelaxation to essentially undo what lld did. We can't skip this pass, as the ±1MiB range is too small for the bolted binary, causing JITLink errors. When the pass runs, it fails with:
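To make the "undo" concrete, here is a small Python sketch of how an address splits into the ADRP page delta and the :lo12: part used by ADD. This is only arithmetic for illustration, not BOLT's ADRRelaxation pass; the function names and the relocated code address 0x600000 are hypothetical.

```python
PAGE_SHIFT = 12  # ADRP works on 4KiB pages


def adrp_add_operands(pc, target):
    """Split `target` into (ADRP page delta, ADD :lo12: immediate) for code at `pc`."""
    page_delta = (target >> PAGE_SHIFT) - (pc >> PAGE_SHIFT)
    lo12 = target & 0xFFF
    # ADRP has a signed 21-bit page immediate, i.e. roughly +/-4GiB of reach.
    if not (-(1 << 20) <= page_delta < (1 << 20)):
        raise OverflowError("target out of ADRP range")
    return page_delta, lo12


def resolve(pc, page_delta, lo12):
    """Address produced by executing the ADRP + ADD pair at `pc`."""
    return (((pc >> PAGE_SHIFT) + page_delta) << PAGE_SHIFT) + lo12


# Hypothetical: code moved to 0x600000, datatable still at 0x400.
# 0x400 - 0x600000 = -0x5ffc00 is far outside ADR's +/-1MiB reach.
page_delta, lo12 = adrp_add_operands(0x600000, 0x400)
print(page_delta, hex(lo12))                          # -1536 0x400
print(hex(resolve(0x600000, page_delta, lo12)))       # 0x400
```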

BOLT-ERROR: cannot relax ADR in non-simple function _start(*2)

This happens because buildCFG wants to postProcessIndirectBranches, which involves validateExternallyReferencedOffsets. That validation tries to map external references to known jump tables. Since our data table does not correspond to any jump table, those references remain unclaimed, making the function non-simple and causing the error.
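The logic described above can be pictured with a toy Python model. This is a deliberate simplification of the description in this comment, not BOLT's actual implementation; the function names, data shapes, and offsets are all made up.

```python
# Toy model: a function stays "simple" only if every offset referenced from
# outside the function is claimed by some known jump table.
def unclaimed_references(external_ref_offsets, jump_table_targets):
    """Offsets referenced externally that no jump table accounts for."""
    return sorted(set(external_ref_offsets) - set(jump_table_targets))


def is_simple(external_ref_offsets, jump_table_targets):
    return not unclaimed_references(external_ref_offsets, jump_table_targets)


# Hypothetical: the data table references offset 0x0 of _start,
# and the function has no jump tables at all.
print(unclaimed_references([0x0], []))  # [0]
print(is_simple([0x0], []))             # False
```

In this picture, the data table's references to _start are never claimed, which is the condition that leads to the "non-simple function" error.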


Next Steps:

First, I'd like to confirm whether this was indeed a typo. If it is, we can correct it and disable the lld optimization with -Wl,--no-relax. Would that be acceptable?

Also, should we extend ADRRelaxation to handle the above case better?

(@maksfb, @yavtuk, @smithp35)

@smithp35
Collaborator

Out of interest, what flags are being passed to lld?

I am surprised that lld ignores the missing symbol unless the driver is passing lld something like -z undefs or --shared. For example on my clang command line:

clang --target=aarch64-linux-gnu bolt.s -fpie  -fuse-ld=lld -nostartfiles -nostdlib -Wl,-q -Wl,-z,max-page-size=4
ld.lld: error: undefined symbol: datable
>>> referenced by /tmp/bolt-162e01.o:(.text+0x4)

LLD (built from main branch alongside clang) is invoked as

ld.lld -EL -z relro --hash-style=gnu --eh-frame-hdr -m aarch64linux -pie -dynamic-linker /lib/ld-linux-aarch64.so.1 -o a.out -L/usr/lib/gcc-cross/aarch64-linux-gnu/11 -L/usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../lib64 -L/lib/aarch64-linux-gnu -L/lib/../lib64 -L/usr/lib/aarch64-linux-gnu -L/usr/lib64 -L/usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib -L/lib -L/usr/lib /tmp/bolt-249963.o -q -z max-page-size=4

The -L paths just happen to be those of a GCC cross-compilation system on my x86_64 Ubuntu machine.

I have to pass -Wl,-zundefs to get lld to link correctly. I don't think datable is weak either.

If I add --shared I get:

ld.lld: error: relocation R_AARCH64_ADD_ABS_LO12_NC cannot be used against symbol 'datable'; recompile with -fPIC
>>> defined in /tmp/bolt-eec5d0.o
>>> referenced by /tmp/bolt-eec5d0.o:(.text+0x4)

ld.lld: error: relocation R_AARCH64_PREL32 cannot be used against symbol '_start'; recompile with -fPIC
>>> defined in /tmp/bolt-eec5d0.o
>>> referenced by /tmp/bolt-eec5d0.o:(.data+0x4)

ld.lld: error: relocation R_AARCH64_PREL64 cannot be used against symbol '_start'; recompile with -fPIC
>>> defined in /tmp/bolt-eec5d0.o
>>> referenced by /tmp/bolt-eec5d0.o:(.data+0x8)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@paschalis-mpeis
Member Author

Hey Peter,

Yes, your assumption is correct, -Wl,--unresolved-symbols=ignore-all is passed. Full command line:

clang  --target=aarch64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -Wl,--build-id=none -pie --target=aarch64-unknown-linux-gnu -nostartfiles -nostdlib -ffreestanding -nostartfiles -nostdlib bolt/test/AArch64/r_aarch64_prelxx.s -o build/tools/bolt/test/AArch64/Output/r_aarch64_prelxx.s.tmp.exe -mlittle-endian      -Wl,-q -Wl,-z,max-page-size=4

Do you think there is an issue on RHEL8 not respecting --unresolved-symbols?
Unsure if -Wl,-zundefs would have worked there.

@yavtuk
Contributor

yavtuk commented Jun 17, 2025

@paschalis-mpeis based on the test header, this test only has to check relocations, so -Wl,--no-relax is acceptable.
Thanks for fixing this typo; I think it was my fault.

@smithp35
Collaborator

> Hey Peter,
>
> Yes, your assumption is correct, -Wl,--unresolved-symbols=ignore-all is passed. Full command line:
>
> clang  --target=aarch64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -Wl,--build-id=none -pie --target=aarch64-unknown-linux-gnu -nostartfiles -nostdlib -ffreestanding -nostartfiles -nostdlib bolt/test/AArch64/r_aarch64_prelxx.s -o build/tools/bolt/test/AArch64/Output/r_aarch64_prelxx.s.tmp.exe -mlittle-endian      -Wl,-q -Wl,-z,max-page-size=4
>
> Do you think there is an issue on RHEL8 not respecting --unresolved-symbols? Unsure if -Wl,-zundefs would have worked there.

The code path will be the same, so I don't think it would have made a difference. Using an older ld.lld on my system (lld from llvm-14) rather than the one I've just built, I get the same error message as on RHEL8. I think it is just an older lld being used.

--no-relax was added in 2021. That's old enough for my llvm-14 lld to accept it. It may be worth checking the lld on RHEL8, as that might predate the option.

This does highlight a potential problem with these tests, which you may be able to solve by requiring an lld built from source at the same time as BOLT, or by setting a REQUIRES minimum lld version. For example, let's say a new lld version does some additional optimisations under a new flag that older lld versions do not accept, so the flag can't be used to turn them off; then you may find that there's no way to stop different linker output from affecting your tests.
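As a rough idea of what a minimum-version gate could look like, here is a Python sketch (lit configuration files are Python) that parses the text printed by `ld.lld --version`. The helper names, the sample strings, and the (15, 0, 0) threshold are assumptions for illustration only; in particular, the threshold is not a claim about which lld release changed behaviour.

```python
import re


def parse_lld_version(version_output):
    """Extract (major, minor, patch) from `ld.lld --version` text, or None."""
    match = re.search(r"LLD (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def lld_at_least(version_output, minimum):
    """True if the parsed version is known and not older than `minimum`."""
    version = parse_lld_version(version_output)
    return version is not None and version >= minimum


# Sample string is an assumption about the usual output format.
sample = "LLD 14.0.0 (compatible with GNU linkers)"
print(parse_lld_version(sample))          # (14, 0, 0)
print(lld_at_least(sample, (15, 0, 0)))   # False
```

A lit config could call something like this on the linker's output and add a feature (with a name of the project's choosing) to `config.available_features`, so that tests can list it under REQUIRES.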
