Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared libraries with transitions aren't always available where expected in runtime #13819

Closed
quval opened this issue Aug 9, 2021 · 16 comments
Closed
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: bug

Comments

@quval
Copy link
Contributor

quval commented Aug 9, 2021

Description of the problem:

When an executable has dynamically-linked dependencies that have transitions, shared libraries aren't always placed where the binary expects them in runtime. This means, for example, that such tests cannot be run in remote execution.

The rpath where shared libraries with transitions are looked up is $ORIGIN/_solib_k8/../../../k8-fastbuild-ST-<...>/bin/_solib_k8, but this path is not always available:

  1. Locally, bazel-out/k8-fastbuild/bin/_solib_k8/ may not always exist (e.g. if all shared libraries have transitions). ld doesn't seem to follow ../ from nonexistent paths.
  2. In remote execution (or when deploying a target by copying its runfiles directory), the configuration directory k8-fastbuild-ST-<...> isn't available at all. The right files exist under _solib_k8, but because of the rpath, ld doesn't look for them there.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Run bazel test :b on this: https://gist.github.com/quval/5077b2edc902e9dc0bcad2311a245ffa
(Or change cc_test to cc_binary with linkstatic = 0 and use bazel run).

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

4.1.0 (also happens at master)

Have you found anything relevant by searching the web?

Issue #9357 seems to be related (though Bazel doesn't crash here).

Any other information, logs, or outputs that you want to share?

After a90c408 the same library may not be linked more than once, so a solution may be to always add the default solib directory to the rpath. (When running locally, the wrong version may then be loaded for libraries that have been built for more than one configuration – this is already a potential issue; the default solib may have to go last if added as a fallback – or the configuration could be encoded in the mangled name of the library to prevent this altogether.) I'll be happy to send a PR if any of this makes sense.

@oquenchil oquenchil added P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: bug labels Aug 20, 2021
@quval
Copy link
Contributor Author

quval commented Aug 23, 2021

Here's a draft: quval@2e63f5a. Doesn't create the default solib directory, but does the rest. Should I send this as a PR?

@quval
Copy link
Contributor Author

quval commented Aug 30, 2021

@oquenchil: sorry about the ping – sort of a general question, though: is it be better to just send my patch as a PR and iterate on it, or to discuss the general approach on the ticket before sending a PR?

@oquenchil
Copy link
Contributor

Hi @quval , let's discuss first what you plan to do.

@quval
Copy link
Contributor Author

quval commented Aug 31, 2021

@oquenchil: Thank you!

The gist of the problem is that sometimes libraries will be looked for under the default solib dir even if the libraries themselves have transitions (this is the case e.g. in remote execution), but unless there's some non-transitioned library linked in, the default solib dir isn't added to the search path. I'm proposing to add an rpath entry to default solib dir last, as a fallback, if it hasn't been added before. When running under the execroot, all the libraries will be found in earlier rpath entries; otherwise, all the libraries will be found in that last rpath entry.

This in itself solves the issue, but while thinking about this, I've noticed a potential problem: say our binary is linking libA.so with no transitions and libB.so with transitions; the default solib will be added first to the rpath so that libA.so can be found, but if libB.so is also present there, say because it was compiled without transitions for some other target, it will be erroneously picked up from the default solib dir. Essentially, the problem is that libraries built with configuration transitions still have the same name. To fix this, my patch also adds the transition mnemonic to the mangled name of the output library. This can only happen if running under the execroot; in remote execution, only the correct version of the library will be present.

One last thing is that the default solib directory should always exist for rpath lookups to work. On a clean execroot, when building a binary that only depends on libraries with transitions, all rpath lookups will fail: rpaths are relative to the default solib directory and can't work if it doesn't exist. This will only rarely be an actual problem. For completeness, I'd be happy to send a patch for this too, only I haven't been able to figure out how to do this.

I'd appreciate any insights. In the meantime, I've put together a draft implementation (for the first two parts) at quval@2e63f5a.

@quval
Copy link
Contributor Author

quval commented Sep 13, 2021

@oquenchil: friendly ping :)

@oquenchil
Copy link
Contributor

I wonder if the fix should actually be to do the same equality comparison here: https://source.corp.google.com/piper///depot/google3/third_party/bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/LibrariesToLinkCollector.java;rcl=325437765;l=270

but extracting the configuration fragment from both values before comparing, which would then trigger line 222.

I didn't understand why you added the root last while line 222 does it at the beginning.

@quval
Copy link
Contributor Author

quval commented Sep 17, 2021

@oquenchil: thanks!

I've actually been debating about the same, but if I remember correctly, I ultimately decided on this approach because of rpath search order – if we first look in the root, we may either pick up the wrong library (unless we mangle in the configuration fragment, which I think I added later) or waste time looking up the SO in a directory that's potentially large (I wanted to preserve the current behaviour given the comment on line 220, though as far as I understand, this isn't an issue when the code is deployed, only when running under the execroot). If you think this is the right way to do this, I'll revise the patch and send as a PR.

Do you know of a way to ensure the root solib dir exists even if all output libraries are in configured solib dirs? My guess is that the root almost always exists, but rpath lookup won't work. I'd be happy to get this fixed along the way, but wasn't able to figure out how.

@oquenchil
Copy link
Contributor

So I don't know the answer to your last question. This is an area of the code that I haven't touched that much. I will try to find some time next week to run it under the debugger and gain a better idea.

@quval
Copy link
Contributor Author

quval commented Sep 19, 2021

Thank you!

In the meantime, I've just opened a PR for the first part (#14011).

@quval quval mentioned this issue Oct 5, 2021
9 tasks
bazel-io pushed a commit that referenced this issue Oct 7, 2021
When an executable has dynamically-linked dependencies that have transitions, shared libraries are looked up under rpaths like this:

```
$ORIGIN/(../)*_solib_k8/../../../k8-fastbuild-ST-[0-9a-f]+/bin/_solib_k8
```

Unless running under the execroot, the transitioned solib directory may not be available at all; the right files will be present under the default solib directory, so it should be on the rpath.

Adding the default solib directory to the rpath may cause the wrong version of a shared library to be loaded, e.g. if it has been compiled in the default configuration. To prevent this, we also add the transition mnemonic to the mangled name of the library (adding the default solib last to the rpath won't really help, because it can legitimately be added first).

#13819

Closes #14011.

PiperOrigin-RevId: 401521899
quval added a commit to quval/bazel that referenced this issue Nov 14, 2021
PR bazelbuild#14011 took care of cc_libraries, this fixes the same issue for cc_imports.

Work towards bazelbuild#13819.
quval added a commit to quval/bazel that referenced this issue Nov 14, 2021
PR bazelbuild#14011 took care of cc_libraries, this fixes the same issue for cc_imports.

Work towards bazelbuild#13819.
@quval
Copy link
Contributor Author

quval commented Nov 14, 2021

This issue seems to affect cc_imports as well – transitioned cc_imports only have the configured root added to the rpath. I'm sending a PR for this as well – @oquenchil, can you please take a look when you get the chance?

fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Jan 28, 2022
When all dynamic deps in a build are built in transitioned
configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that
does not exist, even when followed by "../".

Before this commit, all rpath entries would consist of the relative
path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was
missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and
thus contain no references to non-existing directories assuming the
normalized path itself exists.

Work towards bazelbuild#13819.
fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Jan 28, 2022
When all dynamic deps in a build are built in transitioned
configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that
does not exist, even when followed by "../".

Before this commit, all rpath entries would consist of the relative
path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was
missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and
thus contain no references to non-existing directories assuming the
normalized path itself exists.

Work towards bazelbuild#13819.
@fmeum
Copy link
Collaborator

fmeum commented Jan 28, 2022

One last thing is that the default solib directory should always exist for rpath lookups to work. On a clean execroot, when building a binary that only depends on libraries with transitions, all rpath lookups will fail: rpaths are relative to the default solib directory and can't work if it doesn't exist. This will only rarely be an actual problem. For completeness, I'd be happy to send a patch for this too, only I haven't been able to figure out how to do this.

@quval I also got hit by this issue and came up with the following attempt at a simple fix: #14660. Could you perhaps take a look? I don't think I would be able to spot the more subtle issues this change could cause.

fmeum added a commit to fmeum/rules_meta that referenced this issue Jan 28, 2022
@quval
Copy link
Contributor Author

quval commented Jan 28, 2022

@fmeum: seems like a great way to work around the issue - thanks! I'm far from an expert myself, but normalising paths seems safe enough to me.

bazel-io pushed a commit that referenced this issue Feb 8, 2022
PR #14011 took care of cc_libraries. This fixes the same issue for cc_imports.

Work towards #13819.

Closes #14272.

PiperOrigin-RevId: 427137675
brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Feb 8, 2022
PR bazelbuild#14011 took care of cc_libraries. This fixes the same issue for cc_imports.

Work towards bazelbuild#13819.

Closes bazelbuild#14272.

PiperOrigin-RevId: 427137675
(cherry picked from commit 8734ccf)
fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Feb 9, 2022
When all dynamic deps in a build are built in transitioned
configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that
does not exist, even when followed by "../".

Before this commit, all rpath entries would consist of the relative
path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was
missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and
thus contain no references to non-existing directories assuming the
normalized path itself exists.

Work towards bazelbuild#13819.
Wyverald pushed a commit that referenced this issue Feb 9, 2022
#14757)

PR #14011 took care of cc_libraries. This fixes the same issue for cc_imports.

Work towards #13819.

Closes #14272.

PiperOrigin-RevId: 427137675
(cherry picked from commit 8734ccf)

Co-authored-by: Yuval K <yuvalk@gmail.com>
bazel-io pushed a commit that referenced this issue Mar 1, 2022
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`.

Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists.

Work towards #13819.

Closes #14660.

PiperOrigin-RevId: 431671888
fmeum added a commit to fmeum/bazel that referenced this issue Mar 1, 2022
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`.

Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists.

Work towards bazelbuild#13819.

Closes bazelbuild#14660.

PiperOrigin-RevId: 431671888
fmeum added a commit to fmeum/bazel that referenced this issue Mar 2, 2022
When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`.

Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists.

Work towards bazelbuild#13819.

Closes bazelbuild#14660.

PiperOrigin-RevId: 431671888
Wyverald pushed a commit that referenced this issue Mar 2, 2022
…14929)

When all dynamic deps in a build are built in transitioned configurations, the default solib dir is not created. However, while
resolving paths, the dynamic linker stops at the first directory that does not exist, even when followed by `../`.

Before this commit, all rpath entries would consist of the relative path to the default solib dir followed by the relative path to the
particular library's solib dir. Thus, if the default solib dir was missing, the dynamic linker wouldn't resolve any of these paths.

This commit ensures that the relative path entries are normalized and thus contain no references to non-existing directories assuming the normalized path itself exists.

Work towards #13819.

Closes #14660.

PiperOrigin-RevId: 431671888
@fmeum
Copy link
Collaborator

fmeum commented Dec 20, 2022

@quval Just trying to wrap my head around what works and what doesn't: Did our fixes, taken together, fully fix this issue or are you aware of any remaining problems?

@quval
Copy link
Contributor Author

quval commented Dec 20, 2022

@fmeum I may be missing something (it has been a while), but I think the issue is pretty much resolved.

@fmeum
Copy link
Collaborator

fmeum commented Dec 20, 2022

I don't see anything else either. Given that a number of issues link to this one, it's probably better to mark it closed.

@sgowroji Could you close this issue?

@sgowroji
Copy link
Member

Thank you for the above updates. Closing this issue now. Feel free to reach us if you still have any specific queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: bug
Projects
None yet
Development

No branches or pull requests

4 participants