Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the error of making symlink in setupCompilerEnvironmentPhase. #84985

Conversation

junjihashimoto
Copy link
Member

@junjihashimoto junjihashimoto commented Apr 11, 2020

Motivation for this change

Fix the error of making symlink in setupCompilerEnvironmentPhase.

This pr is to rebase the branch from master to haskell-update.
Original PR is here.
#83605

Test ends after 2 hours.
https://github.com/hasktorch/hasktorch/runs/578941364?check_suite_focus=true

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

peti and others added 3 commits April 11, 2020 02:31
This update was generated by hackage2nix v2.15.1 from Hackage revision
commercialhaskell/all-cabal-hashes@53eef6b.
* ghcHEAD: bump to 8.11.20200403

* ghcHead: reduce diff vs. 8.10.1

dontAddExtraLibs was removed by accident (IMO) in ea19a8e

* ghcHEAD: add ability to use system libffi

- enable nixpkgs' libffi
- minimise diffs against 8.10.1
- remove patching

* remove configure warning about --with-curses-includes

configure: WARNING: unrecognized options: --with-curses-includes
for d in $(grep '^dynamic-library-dirs:' "$packageConfDir"/* | cut -d' ' -f2- | tr ' ' '\n' | sort -u); do
for lib in "$d/"*.{dylib,so}; do
ln -s "$lib" "$dynamicLinksDir"
makeDynamicLinks () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junjihashimoto Is there a reason for putting this in a function?

Also, could you add some more comments to this code? It is not clear what it is doing or why it is needed. (I know it didn't have any comments to start out with, but it would be nice to get some comments now, hopefully since you have a good idea of what it is doing.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Sorry for bothering you.

https://github.com/hasktorch/hasktorch/runs/578941364?check_suite_focus=true
This version using makeDynamicLinks fails.

https://github.com/hasktorch/hasktorch/runs/541868184
The test using ln -sf passes.

I seem to misunderstand the problem.

@nh2 nh2 mentioned this pull request Apr 11, 2020
10 tasks
echo >&2 " source link: $lib"
echo >&2 " source following link recursively: $(readlink -f "$lib")"
echo >&2 " existing link: $linkPath"
echo >&2 " existing following link recursively: $(readlink -f "$linkPath")"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdepillabout
After all, the error in our build was that multiple libraries were using the same file name, but this PR improves the error message and makes debugging easier.
I apologize for the inconvenience.

Copy link
Member

@cdepillabout cdepillabout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to merge this in, since it is adding much-needed error messages, and it appears to be fixing a blocking problem for @junjihashimoto. But I still don't quite get what the problem is, why this solution seems somewhat complicated, and why it only appears on osx.

Maybe we could get another review from someone familiar with the Haskell stuff? Maybe @infinisil or @peti?

@peti
Copy link
Member

peti commented Apr 12, 2020

I have the same problem. The PR looks fine and interesting and it's probably a very good idea to merge it ... but I don't understand at all what it does, what problem it fixes, and why that problem appears to be specific to Darwin. Even more so, I wonder why no other Darwin users have ever complained about this -- seemingly rather serious -- issue.

@cdepillabout
Copy link
Member

cdepillabout commented Apr 12, 2020

@peti

I wonder why no other Darwin users have ever complained about this -- seemingly rather serious -- issue.

I think the answer to this question is that this issue was introduced by @infinisil in #78738 a few months ago. Maybe it is a niche enough issue to not occur that much.

Although it would be good to have the comment from the body of #78738 actually in the nix code explaining why it is needed.


I'd also appreciate if someone like @infinisil or @Ericson2314 could review this fix from @junjihashimoto.

@peti peti force-pushed the haskell-updates branch 2 times, most recently from 5c17368 to d6aedf7 Compare April 17, 2020 18:51
@peti peti force-pushed the haskell-updates branch 6 times, most recently from 5fb9d0d to 51c1d58 Compare May 1, 2020 19:30
@cdepillabout
Copy link
Member

@junjihashimoto Have you worked on this at all recently?

I'm not sure what should be done about this.

As I said above, it seems like this issue was caused by @infinisil in #78738, but @infinisil hasn't responded.

@junjihashimoto Would reverting the commit from #78738 (55a8169 or 8d4509e) fix this problem for you? If so, maybe we should do that until @infinisil has time to respond here.

@peti peti force-pushed the haskell-updates branch 2 times, most recently from 2da18dc to fc573e8 Compare May 5, 2020 14:25
@junjihashimoto
Copy link
Member Author

junjihashimoto commented May 5, 2020

@cdepillabout
Thank you for your ping.
For now, I do not need this pr, because I fixed the library having the problem.
stites/pytorch-world@aec46d9
Perhaps the problem may occur with other libraries too, so I think revertering the commit or merging this pr is helpfull.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having read through this, I think I get the problem: Before #78738, only a subset of the dynamic libraries would be linked to $out/lib/links, and only those could have filename collisions, therefore overriding each other. Now with the changes from that PR, all dynamic libraries are put into that directory, therefore all of their filenames can now override each other.

In @junjihashimoto's case, two different packages were providing the same library name (both of them have libiomp5.so), and previously not both of these were symlinked to $out/lib/links, but now they are, and it seems that the incorrect one got priority.

The current state of this PR only adds a warning when such a conflict occurs. I think a proper fix for this would be to only link libraries to $out/lib/links that have non-colliding names, while leaving the others in the dynamic-library-dirs of each individual package. But for now, just having the warning from this PR is better than nothing.

Reverting #78738 is certainly not the right thing to do though, because without it, the whole macOS workaround thing won't work for probably at least GHC 8.8.*, possibly breaking all Darwin binaries that need a bunch of dynamic libraries. When I made that PR, GHC 8.6.5 was still the default, so nobody would've really noticed 8.8.* stuff breaking, but since then 8.8.* has become the default.

for d in $(grep '^dynamic-library-dirs:' "$packageConfDir"/* | cut -d' ' -f2- | tr ' ' '\n' | sort -u); do
for lib in "$d/"*.{dylib,so} ; do
# When '*.{dylib,so}' does not match anything, '*.{dylib,so}' becomes '*.dylib or *.so' including '*'
# [ -f "$lib" ] checks if the matched path is available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use shopt -s nullglob to prevent this from happening and needing this condition

if [ -e "$linkPath" ]; then
# link already exists! but maybe that's OK
if [ ! -L "$linkPath" ] || [ "$(readlink -f "$linkPath")" != "$(readlink -f "$lib")" ]; then
echo >&2 "error: failed to create symbolic link $linkPath: file exists"
Copy link
Member

@infinisil infinisil May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest explicitly not making this error look like a standard file exists error, this only seems misleading. How about "Not attempting to link from <x> to <y> because <reason>" instead

done
}

makeDynamicLinks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be a function, you can just inline it here

@peti peti force-pushed the haskell-updates branch 5 times, most recently from 33a27ef to 2a60d72 Compare May 8, 2020 19:12
@peti peti force-pushed the haskell-updates branch 3 times, most recently from d768d3d to a4282b6 Compare May 15, 2020 19:19
@cdepillabout
Copy link
Member

This was sort of fixed in #87881, but the following comment from @infinisil was never addressed:

I think a proper fix for this would be to only link libraries to $out/lib/links that have non-colliding names, while leaving the others in the dynamic-library-dirs of each individual package.

I'm going to go ahead and close this PR, since it was sort of fixed in #87881, but if anyone wants to pick it up and fix it as recommended by @infinisil, please send another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants