Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the error of making symlink in setupCompilerEnvironmentPhase. #84985

Conversation

@junjihashimoto
Copy link

junjihashimoto commented Apr 11, 2020

Motivation for this change

Fix the error of making symlink in setupCompilerEnvironmentPhase.

This pr is to rebase the branch from master to haskell-update.
Original PR is here.
#83605

Test ends after 2 hours.
https://github.com/hasktorch/hasktorch/runs/578941364?check_suite_focus=true

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
peti and others added 3 commits Apr 11, 2020
This update was generated by hackage2nix v2.15.1 from Hackage revision
commercialhaskell/all-cabal-hashes@53eef6b.
* ghcHEAD: bump to 8.11.20200403

* ghcHead: reduce diff vs. 8.10.1

dontAddExtraLibs was removed by accident (IMO) in ea19a8e

* ghcHEAD: add ability to use system libffi

- enable nixpkgs' libffi
- minimise diffs against 8.10.1
- remove patching

* remove configure warning about --with-curses-includes

configure: WARNING: unrecognized options: --with-curses-includes
for d in $(grep '^dynamic-library-dirs:' "$packageConfDir"/* | cut -d' ' -f2- | tr ' ' '\n' | sort -u); do
for lib in "$d/"*.{dylib,so}; do
ln -s "$lib" "$dynamicLinksDir"
makeDynamicLinks () {

This comment has been minimized.

Copy link
@cdepillabout

cdepillabout Apr 11, 2020

Member

@junjihashimoto Is there a reason for putting this in a function?

Also, could you add some more comments to this code? It is not clear what it is doing or why it is needed. (I know it didn't have any comments to start out with, but it would be nice to get some comments now, hopefully since you have a good idea of what it is doing.)

This comment has been minimized.

Copy link
@junjihashimoto

junjihashimoto Apr 11, 2020

Author

Ok. Sorry for bothering you.

https://github.com/hasktorch/hasktorch/runs/578941364?check_suite_focus=true
This version using makeDynamicLinks fails.

https://github.com/hasktorch/hasktorch/runs/541868184
The test using ln -sf passes.

I seem to misunderstand the problem.

@nh2 nh2 mentioned this pull request Apr 11, 2020
0 of 10 tasks complete
echo >&2 " source link: $lib"
echo >&2 " source following link recursively: $(readlink -f "$lib")"
echo >&2 " existing link: $linkPath"
echo >&2 " existing following link recursively: $(readlink -f "$linkPath")"

This comment has been minimized.

Copy link
@junjihashimoto

junjihashimoto Apr 12, 2020

Author

@cdepillabout
After all, the error in our build was that multiple libraries were using the same file name, but this PR improves the error message and makes debugging easier.
I apologize for the inconvenience.

Copy link
Member

cdepillabout left a comment

I'd like to merge this in, since it is adding much-needed error messages, and it appears to be fixing a blocking problem for @junjihashimoto. But I still don't quite get what the problem is, why this solution seems somewhat complicated, and why it only appears on osx.

Maybe we could get another review from someone familiar with the Haskell stuff? Maybe @Infinisil or @peti?

@peti
Copy link
Member

peti commented Apr 12, 2020

I have the same problem. The PR looks fine and interesting and it's probably a very good idea to merge it ... but I don't understand at all what it does, what problem it fixes, and why that problem appears to be specific to Darwin. Even more so, I wonder why no other Darwin users have ever complained about this -- seemingly rather serious -- issue.

@cdepillabout
Copy link
Member

cdepillabout commented Apr 12, 2020

@peti

I wonder why no other Darwin users have ever complained about this -- seemingly rather serious -- issue.

I think the answer to this question is that this issue was introduced by @Infinisil in #78738 a few months ago. Maybe it is a niche enough issue to not occur that much.

Although it would be good to have the comment from the body of #78738 actually in the nix code explaining why it is needed.


I'd also appreciate if someone like @Infinisil or @Ericson2314 could review this fix from @junjihashimoto.

@peti peti force-pushed the NixOS:haskell-updates branch 2 times, most recently from 5c17368 to d6aedf7 Apr 17, 2020
@peti peti force-pushed the NixOS:haskell-updates branch 6 times, most recently from 5fb9d0d to 51c1d58 Apr 24, 2020
@peti peti force-pushed the NixOS:haskell-updates branch from d8e9519 to 1d4449c May 1, 2020
@cdepillabout
Copy link
Member

cdepillabout commented May 5, 2020

@junjihashimoto Have you worked on this at all recently?

I'm not sure what should be done about this.

As I said above, it seems like this issue was caused by @Infinisil in #78738, but @Infinisil hasn't responded.

@junjihashimoto Would reverting the commit from #78738 (55a8169 or 8d4509e) fix this problem for you? If so, maybe we should do that until @Infinisil has time to respond here.

@peti peti force-pushed the NixOS:haskell-updates branch 2 times, most recently from 2da18dc to fc573e8 May 5, 2020
@junjihashimoto
Copy link
Author

junjihashimoto commented May 5, 2020

@cdepillabout
Thank you for your ping.
For now, I do not need this pr, because I fixed the library having the problem.
stites/pytorch-world@aec46d9
Perhaps the problem may occur with other libraries too, so I think revertering the commit or merging this pr is helpfull.

Copy link
Member

Infinisil left a comment

Having read through this, I think I get the problem: Before #78738, only a subset of the dynamic libraries would be linked to $out/lib/links, and only those could have filename collisions, therefore overriding each other. Now with the changes from that PR, all dynamic libraries are put into that directory, therefore all of their filenames can now override each other.

In @junjihashimoto's case, two different packages were providing the same library name (both of them have libiomp5.so), and previously not both of these were symlinked to $out/lib/links, but now they are, and it seems that the incorrect one got priority.

The current state of this PR only adds a warning when such a conflict occurs. I think a proper fix for this would be to only link libraries to $out/lib/links that have non-colliding names, while leaving the others in the dynamic-library-dirs of each individual package. But for now, just having the warning from this PR is better than nothing.

Reverting #78738 is certainly not the right thing to do though, because without it, the whole macOS workaround thing won't work for probably at least GHC 8.8.*, possibly breaking all Darwin binaries that need a bunch of dynamic libraries. When I made that PR, GHC 8.6.5 was still the default, so nobody would've really noticed 8.8.* stuff breaking, but since then 8.8.* has become the default.

for d in $(grep '^dynamic-library-dirs:' "$packageConfDir"/* | cut -d' ' -f2- | tr ' ' '\n' | sort -u); do
for lib in "$d/"*.{dylib,so} ; do
# When '*.{dylib,so}' does not match anything, '*.{dylib,so}' becomes '*.dylib or *.so' including '*'
# [ -f "$lib" ] checks if the matched path is available.

This comment has been minimized.

Copy link
@Infinisil

Infinisil May 5, 2020

Member

You can use shopt -s nullglob to prevent this from happening and needing this condition

if [ -e "$linkPath" ]; then
# link already exists! but maybe that's OK
if [ ! -L "$linkPath" ] || [ "$(readlink -f "$linkPath")" != "$(readlink -f "$lib")" ]; then
echo >&2 "error: failed to create symbolic link $linkPath: file exists"

This comment has been minimized.

Copy link
@Infinisil

Infinisil May 5, 2020

Member

I suggest explicitly not making this error look like a standard file exists error, this only seems misleading. How about "Not attempting to link from <x> to <y> because <reason>" instead

done
}
makeDynamicLinks

This comment has been minimized.

Copy link
@Infinisil

Infinisil May 5, 2020

Member

Doesn't need to be a function, you can just inline it here

@peti peti force-pushed the NixOS:haskell-updates branch 5 times, most recently from 33a27ef to 2a60d72 May 7, 2020
@peti peti force-pushed the NixOS:haskell-updates branch 3 times, most recently from d768d3d to a4282b6 May 15, 2020
@cdepillabout
Copy link
Member

cdepillabout commented May 17, 2020

This was sort of fixed in #87881, but the following comment from @Infinisil was never addressed:

I think a proper fix for this would be to only link libraries to $out/lib/links that have non-colliding names, while leaving the others in the dynamic-library-dirs of each individual package.

I'm going to go ahead and close this PR, since it was sort of fixed in #87881, but if anyone wants to pick it up and fix it as recommended by @Infinisil, please send another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.