Workaround for QEMU bugs causing TLS issues on AArch32 #55

Merged 31 commits on Jul 21, 2020

@langston-barrett langston-barrett commented Jun 30, 2020

Fixes #52

Based on #48 and #54

langston-barrett and others added 19 commits June 30, 2020 16:08

This range is much larger now that we implemented long jumps by storing pointers
in the instruction stream.

For the x86 and PowerPC backends this didn't matter - we constructed the partial
blocks and then threw them away since we couldn't rewrite them. This became a
problem with the AArch32 backend, which actually inspects block terminators and
panics when it can't figure out how to interpret failures.

This commit just changes the logic to avoid even trying to translate incomplete
blocks, which was useless work anyway.

For some reason, the mac build is using a version of the base package from ghc 8.10...

@langston-barrett

Good news: This causes QEMU user-mode to calculate the correct value for AT_PHDR. Bad news: The binary still segfaults in TLS-related code, with this backtrace:

0x0004bd24 in brk ()
#0  0x0004bd24 in brk ()
#1  0x00029810 in sbrk ()
#2  0x00010920 in __renovate___libc_setup_tls ()
#3  0x000104d4 in __renovate___libc_start_main ()
#4  0x00010288 in _start ()

The binary:
refurbished-workaround.zip

@langston-barrett

This is almost certainly another discrepancy between Linux and QEMU user-mode. On real ARM hardware running Linux, the binary doesn't crash at that point. In QEMU, the argument (r0) to brk is 0x89d08. In the native case, it's 0xa4d08. The program header segment is at 0xa3000, and I assume you'd want the program break to be after the last loadable segment.

@langston-barrett changed the title from "WIP: Workaround for QEMU bug causing TLS issues on AArch32" to "WIP: Workaround for QEMU bugs causing TLS issues on AArch32" on Jul 2, 2020
@langston-barrett

QEMU bug report no. 2: https://bugs.launchpad.net/qemu/+bug/1886097

tl;dr: Linux calculates the program break based on the virtual address and size of the LOAD segment with the highest virtual address, whereas QEMU calculates it based on the virtual address and size of the writable LOAD segment with the highest virtual address. Hopefully, we can work around this by just sticking a small writable LOAD segment just after PHDR.
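
To make the difference concrete, here is a minimal Haskell sketch of the two rules exactly as worded in the tl;dr above. Everything in it is made up for illustration: the `LoadSeg` type, the function names, and the segment values are hypothetical, neither the kernel nor QEMU is written this way, and page alignment of the break is ignored.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)
import Data.Word (Word64)

-- Hypothetical, stripped-down view of a LOAD program header.
data LoadSeg = LoadSeg
  { segVAddr    :: Word64  -- p_vaddr
  , segMemSize  :: Word64  -- p_memsz
  , segWritable :: Bool    -- PF_W set?
  } deriving (Show, Eq)

-- End address of the highest-addressed segment that satisfies the predicate.
breakFrom :: (LoadSeg -> Bool) -> [LoadSeg] -> Maybe Word64
breakFrom keep segs =
  case filter keep segs of
    [] -> Nothing
    xs -> let top = maximumBy (comparing segVAddr) xs
          in Just (segVAddr top + segMemSize top)

-- Rule attributed to Linux above: consider every LOAD segment.
linuxBreak :: [LoadSeg] -> Maybe Word64
linuxBreak = breakFrom (const True)

-- Rule attributed to QEMU user-mode above: consider writable LOAD segments only.
qemuBreak :: [LoadSeg] -> Maybe Word64
qemuBreak = breakFrom segWritable

-- Invented layout: text, data, then a read-only segment holding the moved PHDR.
original :: [LoadSeg]
original =
  [ LoadSeg 0x10000 0x5000 False
  , LoadSeg 0x20000 0x1000 True
  , LoadSeg 0x30000 0x200  False
  ]

-- The proposed workaround: a small writable LOAD segment just after PHDR.
withWorkaround :: [LoadSeg]
withWorkaround = original ++ [LoadSeg 0x31000 0x10 True]
```

In GHCi, `linuxBreak original` and `qemuBreak original` disagree (the latter stops at the end of the data segment, below the relocated PHDR), while both agree on `withWorkaround`, which is the effect the extra segment is meant to have.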

@langston-barrett

langston-barrett commented Jul 8, 2020

Status update: This causes the PPC64 tests to fail:

$ ~/qemu/ppc64-linux-user/qemu-ppc64 refurbished.exe
refurbished.exe: Invalid argument

I've isolated the problem to load_elf_image in elfload.c in QEMU, specifically to the target_mmap which allocates the virtual memory space. Here are the arguments to that mmap:

Thread 1 "qemu-ppc64" hit Breakpoint 2, target_mmap (start=start@entry=251047880, len=len@entry=302604344, prot=prot@entry=0, flags=16434, 
    fd=fd@entry=-1, offset=offset@entry=0) at /home/galois/qemu/linux-user/mmap.c:365
365	{
start = 251047880
len = 302604344
prot = 0
flags = 16434
fd = -1
offset = 0
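
For reference, the `flags` value can be decoded with a few lines of Haskell. The bit values below are the generic Linux ones used on x86-64, and treating this trace as coming from such a host is an assumption; the `mapFlags` table and the `decodeFlags` and `leftover` helpers are illustrative names, not part of QEMU or Renovate.

```haskell
import Data.Bits ((.&.), complement)

-- mmap flag bits as defined for x86-64 Linux (assumed host for this trace).
mapFlags :: [(String, Int)]
mapFlags =
  [ ("MAP_PRIVATE",   0x02)
  , ("MAP_FIXED",     0x10)
  , ("MAP_ANONYMOUS", 0x20)
  , ("MAP_NORESERVE", 0x4000)
  ]

-- Names of all known flags that are set in the given value.
decodeFlags :: Int -> [String]
decodeFlags fl = [name | (name, bit) <- mapFlags, fl .&. bit /= 0]

-- Bits that are set but not covered by the table above.
leftover :: Int -> Int
leftover fl = foldl (\acc (_, bit) -> acc .&. complement bit) fl mapFlags
```

Under that assumption, `decodeFlags 16434` names all four flags and `leftover 16434` is 0, so this is a fixed-address, private, anonymous reservation (with `prot = 0`, i.e. no access permissions), which is where a misaligned `start` would matter.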

Here's where it hits the error:
https://github.com/qemu/qemu/blob/8796c64ecdfd34be394ea277aaaaa53df0c76996/linux-user/mmap.c#L481-L485

Looks like maybe an alignment issue?

Here's the program headers for the rewritten binary:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr           FileSiz            MemSiz              Flags  Align
  PHDR           0x000000000012a000 0x000000000f095000 0x000000000f095000 0x0000000000000230 0x0000000000000230  R      0x1000
  LOAD           0x000000000012a000 0x000000000f095000 0x000000000f095000 0x0000000000000230 0x0000000000000230  R      0x1000
  LOAD           0x0000000000000000 0x0000000010000000 0x0000000010000000 0x00000000000b2a11 0x00000000000b2a11  R E    0x10000
  LOAD           0x00000000000b9c80 0x00000000100c9c80 0x00000000100c9c80 0x0000000000009b48 0x0000000000009b48  RW     0x10000
  NOTE           0x0000000000000158 0x0000000010000158 0x0000000010000158 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x00000000000b9c80 0x00000000100c9c80 0x00000000100c9c80 0x0000000000000020 0x0000000000000064  R      0x8
  GNU_RELRO      0x00000000000b9c80 0x00000000100c9c80 0x00000000100c9c80 0x0000000000006380 0x0000000000006380  R      0x1
  LOAD           0x00000000000d5000 0x000000000f096000 0x000000000f096000 0x0000000000052e74 0x0000000000052e74  R E    0x1000
  LOAD           0x0000000000128000 0x0000000020000000 0x0000000020000000 0x0000000000001000 0x0000000000001000  RW     0x1000
  LOAD           0x000000000012b038 0x000000000f096000 0x000000000f096000 0x0000000000000000 0x0000000000000000   W     0x1000
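
One way to test the alignment theory against the table above is the sketch below. It assumes QEMU reserves address space starting at the lowest `p_vaddr - p_offset` over all LOAD segments (an assumption about `load_elf_image`, not something quoted in this thread) and a 4 KiB page size. The `Load` type and helper names are hypothetical; the offsets and addresses are copied from the LOAD rows of the table.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Offset and VirtAddr of one LOAD row from the program header table.
data Load = Load { ldOffset :: Word64, ldVAddr :: Word64 }

-- The six LOAD entries, in table order.
loads :: [Load]
loads =
  [ Load 0x12a000 0x0f095000
  , Load 0x000000 0x10000000
  , Load 0x0b9c80 0x100c9c80
  , Load 0x0d5000 0x0f096000
  , Load 0x128000 0x20000000
  , Load 0x12b038 0x0f096000  -- the zero-sized W segment
  ]

-- Lowest (vaddr - offset); assumed to be the address passed as `start`.
-- Partial: errors on an empty list.
mapStart :: [Load] -> Word64
mapStart = minimum . map (\l -> ldVAddr l - ldOffset l)

-- Assumed page size for the alignment check.
pageSize :: Word64
pageSize = 0x1000

isPageAligned :: Word64 -> Bool
isPageAligned addr = addr .&. (pageSize - 1) == 0
```

With all six entries, `mapStart loads` is 0x0ef6afc8, which is 251047880 in decimal and matches the `start` argument in the gdb output. That minimum comes from the final zero-sized W segment (0x0f096000 - 0x12b038) and is not page aligned. Dropping that entry (`mapStart (init loads)`) gives 0x0ef6b000, which is.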

@langston-barrett

One thing to note: I introduced a number of calls to fail in doRewrite. Some of them indicate malformed input; others indicate either malformed input or an internal bug. When Renovate prints these, though, they are all prefixed with "User error:", whichever kind they are. I could do a separate PR that adds better error handling to the ElfRewriter monad if that's desirable.
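
As a rough idea of what that separate PR could look like, here is a small sketch that tags each failure with its kind instead of going through a string-only fail. The `RewriteError` type, `requireLoadSegments`, and `renderError` are hypothetical names invented for this example; nothing here is Renovate's actual API, and plain `Either` stands in for whatever the ElfRewriter monad would really use.

```haskell
-- Hypothetical error type separating the two situations described above.
data RewriteError
  = MalformedInput String  -- the input ELF file is at fault
  | InternalBug String     -- an invariant of the rewriter was violated
  deriving (Show, Eq)

-- Example check: rewriting needs at least one LOAD segment to work with.
requireLoadSegments :: [a] -> Either RewriteError [a]
requireLoadSegments [] = Left (MalformedInput "no LOAD segments in program header table")
requireLoadSegments xs = Right xs

-- Render an error with a prefix that reflects who is likely at fault.
renderError :: RewriteError -> String
renderError (MalformedInput msg) = "Malformed input: " ++ msg
renderError (InternalBug msg)    = "Internal error (please report): " ++ msg
```

The point of the split is only that the printed prefix can then depend on the constructor, rather than every failure reading as a user error.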

@langston-barrett

Okay, this is cleaned up enough to be ready for review.

@langston-barrett changed the title from "WIP: Workaround for QEMU bugs causing TLS issues on AArch32" to "Workaround for QEMU bugs causing TLS issues on AArch32" on Jul 16, 2020
renovate/src/Renovate/BinaryFormat/ELF/Internal.hs (outdated)
then Nothing
else
let best = L.minimumBy (O.comparing (\addr -> abs (addr - phdrOffset))) validCandidates
in if let m = minimum (fmap (\segInfo -> abs (pVAddr segInfo - pOffset segInfo)) segInfos)
Contributor

Is this essentially the calculation that QEMU usermode is doing?

Contributor Author

Yep, exactly. This is a last-ditch check to ensure that this function doesn't return a Just addr where addr would be unacceptable to QEMU.

@GaloisInc GaloisInc deleted a comment from travitch Jul 20, 2020
@langston-barrett

@travitch What's your preference on how we merge this? I don't think cleaning up the git history is super feasible, and there's a lot of content... Should I squash? Rebase and try to get halfway coherent commits?

@travitch

Good question. I'm happy with a squash commit or, alternatively, you could do a local rebase (possibly into one commit?), force push to the branch, and then do a normal merge?

@langston-barrett langston-barrett merged commit 35b4598 into master Jul 21, 2020
@langston-barrett langston-barrett deleted the lb/fix-aarch32-tls branch July 21, 2020 02:42
Successfully merging this pull request may close these issues.

Revisit ordering of segments in the program header table on AArch32
2 participants