Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patchelf 0.10 corrupts some shared libraries #170

Closed
sjackman opened this issue May 17, 2019 · 10 comments · Fixed by #202
Closed

patchelf 0.10 corrupts some shared libraries #170

sjackman opened this issue May 17, 2019 · 10 comments · Fixed by #202

Comments

@sjackman
Copy link

sjackman commented May 17, 2019

After using patchelf 0.10 to patch a shared library, ldd reports Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != NULL' failed!.

Steps to reproduce:

$ curl -L https://linuxbrew.bintray.com/bottles/glib-2.60.2.x86_64_linux.bottle.tar.gz | tar xz
$ patchelf --print-needed glib/2.60.2/lib/libglib-2.0.so.0.6000.2
libpcre.so.1
libpthread.so.0
libc.so.6
$ patchelf --print-rpath glib/2.60.2/lib/libglib-2.0.so.0.6000.2
@@HOMEBREW_PREFIX@@/lib
$ ldd glib/2.60.2/lib/libglib-2.0.so.0.6000.2
	linux-vdso.so.1 (0x00007ffce67eb000)
	libpcre.so.1 => not found
	libpthread.so.0 => /gsc/btl/linuxbrew/Cellar/glibc/2.23/lib/libpthread.so.0 (0x00007f36cfc68000)
	libc.so.6 => /gsc/btl/linuxbrew/Cellar/glibc/2.23/lib/libc.so.6 (0x00007f36cf8cc000)
	/gsc/btl/linuxbrew/Cellar/glibc/2.23/lib64/ld-linux-x86-64.so.2 (0x00007f36d01be000)
$ chmod +w glib/2.60.2/lib/libglib-2.0.so.0.6000.2
$ patchelf --set-rpath /home/linuxbrew/.linuxbrew/lib glib/2.60.2/lib/libglib-2.0.so.0.6000.2
$ patchelf --print-needed glib/2.60.2/lib/libglib-2.0.so.0.6000.2
$ patchelf --print-rpath glib/2.60.2/lib/libglib-2.0.so.0.6000.2

$ ldd glib/2.60.2/lib/libglib-2.0.so.0.6000.2
Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != NULL' failed!
@sjackman
Copy link
Author

In case it's relevant, this shared library was previously patched using patchelf 0.9 to change the RPATH, and now it's being patched using patchelf 0.10 to change the RPATH again.

@iMichka
Copy link

iMichka commented May 17, 2019

There is a comment about a similar bug here: #158 (comment)

@sjackman
Copy link
Author

In this case, rebuilding this binary package using patchelf 0.10 (rather than patchelf 0.9) allows the resulting binary to be correctly patched patchelf 0.10. It seems there's an incompatibility between patchelf 0.9 and 0.10

@vcunat
Copy link
Member

vcunat commented Sep 22, 2019

I had to abort switching patchelf default from 0.9 to 0.10 in nixpkgs due to some corruptions like this.

Example I studied: ${vscode}/lib/vscode/code gets corrupted. It works in NixOS/nixpkgs@f8a8fc6c7 for example and gets broken by updating patchelf for the single derviation:

nix-build -E 'with (import ./. {}); vscode.overrideAttrs (attrs: { nativeBuildInputs = attrs.nativeBuildInputs ++ [ patchelfUnstable ]; })'

When I look at binary diff (via vsbindiff in my case), among the changes I see that large section of the file gets overwritten by 0x58 bytes 🤷‍♀️

Switching the default rebuilds everything, so it's a bit difficult to play with that, but fortunately it's enough to trigger the bug by overrides like above. Just in case, cache.nixos.org now does have almost full binary set for nixpkgs with patchelf 0.10, e.g. commit NixOS/nixpkgs@2c3dcbb https://hydra.nixos.org/eval/1543924 – might be useful.

@vcunat
Copy link
Member

vcunat commented Sep 22, 2019

One doable thing is to bisect the above test over patchelf versions – that should be relatively fast, and it might help to narrow down the problem.

@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/why-is-nixos-patchelf-still-at-0-9/6006/2

@domenkozar
Copy link
Member

Offending commit is c4deb5e, according to #192 (comment)

Stripping after patchelfing was one of the main reasons why 0.10 was released.

domenkozar referenced this issue Jun 2, 2020
The current approach to changing sections in ET_DYN executables is to move
the INTERP section to the end of the file. +This means changing PT_PHDR to
add an extra PT_LOAD section so that the new section is mmaped into memory
by the elf loader in the kernel. In order to extend PHDR, this means moving
it to the end of the file.

Its documented in BUGS there is a kernel 'bug' which means that if you have holes
in memory between the base load address and the PT_LOAD segment that contains PHDR,
it will pass an incorrect PHDR address to ld.so and fail to load the binary, segfaulting.

To avoid this, the code currently inserts space into the binary to ensure that when
loaded into memory there are no holes between the PT_LOAD sections. This inflates the
binaries by many MBs in some cases. Whilst we could make them sparse, there is a second
issue which is that strip can fail to process these binaries:

$ strip fixincl
Not enough room for program headers, try linking with -N
[.note.ABI-tag]: Bad value

This turns out to be due to libbfd not liking the relocated PHDR section either
(#10).

Instead this patch implements a different approach, leaving PHDR where it is but extending
it in place to allow addition of a new PT_LOAD section. This overwrites sections in the
binary but those get moved to the end of the file in the new PT_LOAD section.

This is based on patches linked from the above github issue, however whilst the idea
was good, the implementation wasn't correct and they've been rewritten here.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
@rpurdie
Copy link
Contributor

rpurdie commented Jun 2, 2020

From the output of objdump -x on this library, the top of the .dynamic section is getting trashed:

0x600000001 0x0000000000429000
0x335000 0x0000000000335000
0xc740 0x000000000000c740
0x1000 0x0000000000009098
SONAME libglib-2.0.so.0

and with some debugging its the PT_LOAD section injection which does it. That code starts with the comment:
/* Add a segment that maps the replaced sections into memory. */

I've run out of time to figure out why the PT_LOAD section is clobbering .dynamic today but its at least a good start at figuring out the problem.

@rpurdie
Copy link
Contributor

rpurdie commented Jun 2, 2020

Tracked down a possible fix which is adding:
wri(hdr->e_phoff, sizeof(Elf_Ehdr));
under
/* Add a segment that maps the replaced sections into memory. */

The issue is that if the program headers were previously relocated to the end of the file, this new code assumes they're located after the elf header. The above line forces them back there which is where the code has made space for them.

I can sort out a proper patch/pull request in due course but thought I'd at least post this now since I found a potential fix. It explains why Yocto Project doesn't see this since we tend only to edit newly compiled binaries.

@vcunat
Copy link
Member

vcunat commented Jun 3, 2020

Do you think it's already ripe for firing a Hydra.nixos.org jobset to test this on whole NixPkgs? (say, x86_64 only) I haven't tried to really understand this issue so far.

rpurdie added a commit to rpurdie/patchelf that referenced this issue Jun 3, 2020
When running patchelf on some existing patchelf'd binaries to change to longer
RPATHS, ldd would report the binaries as invalid. The output of objdump -x on
those libraryies should show the top of the .dynamic section is getting trashed,
something like:

0x600000001 0x0000000000429000
0x335000 0x0000000000335000
0xc740 0x000000000000c740
0x1000 0x0000000000009098
SONAME libglib-2.0.so.0

(which should be RPATH and DT_NEEDED entries)

This was tracked down to the code which injects the PT_LOAD section.

The issue is that if the program headers were previously relocated to the end
of the file which was how patchelf operated previously, the relocation code
wouldn't work properly on a second run as it now assumes they're located after
the elf header. This change forces them back to immediately follow the elf
header which is where the code has made space for them.

Should fix NixOS#170
and NixOS#192

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants