cannot find libnccl.so.2 #331

Closed
lixiangMindSpore opened this issue Oct 26, 2021 · 7 comments
Labels
question Further information is requested

Comments

@lixiangMindSpore

Describe the bug
A clear and concise description of what the bug is.
image

Environment

  • Your operating system and version: Ubuntu 18.04
  • Your python version: 3.8
  • Your PyTorch version: 11.0
  • How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: conda create -n torch17 python=3.8
  • Have you tried using latest bagua master (python3 -m pip install git+https://github.com/BaguaSys/bagua.git -f https://repo.arrayfire.com/python/wheels/3.8.0/ )?: I use 0.8.1.post1

Reproducing

Please provide a minimal working example. This means the runnable code.

Please also write what exact commands are required to reproduce your results.

Additional context
Add any other context about the problem here.

@NOBLES5E
Member

NOBLES5E commented Oct 26, 2021

Thanks for opening the issue. In this case, Bagua cannot find an NCCL installation on your system. Have you tried following the instruction in the error message and running import bagua_core; bagua_core.install_deps() in your Python interpreter? It will install the required system libraries.
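
One quick way to confirm this diagnosis is to ask the dynamic loader directly whether it can open libnccl.so.2. The sketch below is only a minimal check built on Python's standard ctypes module; the helper name can_load_shared_library is made up for illustration and is not part of Bagua.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded
        return False


if __name__ == "__main__":
    if can_load_shared_library("libnccl.so.2"):
        print("libnccl.so.2 can be loaded")
    else:
        print("libnccl.so.2 cannot be loaded; NCCL may be missing or not on the loader search path")
```

This only probes the default loader search path. Whether Bagua looks in exactly the same places is an assumption, so treat a negative result as a hint rather than proof.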

@NOBLES5E changed the title from "Use bagua with the problem as follows:" to "cannot find libnccl.so.2" on Oct 26, 2021
@lixiangMindSpore
Author

I ran bagua_install_deps.py and that solved the problem. Thank you so much!

@NOBLES5E
Member

You're welcome :)

@NOBLES5E added the question label (Further information is requested) on Oct 26, 2021
@Godricly

Python 3.8.0 (default, Feb 25 2021, 22:10:10) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import bagua
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-b6bb5bf6d045> in <module>
----> 1 import bagua

~/python38/lib/python3.8/site-packages/bagua/__init__.py in <module>
     10 """
     11 
---> 12 import bagua_core  # noqa: F401
     13 from .version import __version__  # noqa: F401

~/python38/lib/python3.8/site-packages/bagua_core/__init__.py in <module>
      2 
      3 _environment._preload_libraries()
----> 4 from .bagua_core import *  # noqa: F401,E402,F403
      5 from .bagua_install_deps import install_deps  # noqa: F401,E402,F403

ImportError: libnccl.so.2: cannot open shared object file: No such file or directory

I got the same error with bagua-cuda116 in a virtualenv. Running bagua_install_deps.py also failed for me:

bagua_install_deps.py 
import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `platform' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `shutil' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `tempfile' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `pathlib' @ error/constitute.c/WriteImage/1037.
from: too many arguments
/home/xxx/python38/bin/bagua_install_deps.py: line 10: _nccl_records: command not found
/home/xxx/python38/bin/bagua_install_deps.py: line 11: library_records: command not found
/home/xxx/python38/bin/bagua_install_deps.py: line 14: syntax error near unexpected token `('
/home/xxx/python38/bin/bagua_install_deps.py: line 14: `class DownloadProgressBar(tqdm):'

@Godricly

bagua-cuda116 was built differently from the other CUDA releases.

bagua-cuda116                 0.8.3.dev215

@woqidaideshi
Contributor

@Godricly Which Python version did you use to run bagua_install_deps.py?

Maybe you can try running it as: python3 bagua_install_deps.py
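
The messages in the log above (import-im6.q16: not authorized ..., from: too many arguments) are what a shell prints when it tries to execute Python source itself: import resolves to ImageMagick's import command, and the remaining lines are parsed as shell syntax. Handing the script to a Python interpreter avoids that. Here is a minimal sketch, assuming bagua_install_deps.py is on PATH as in that log; the helper names build_install_command and run_install_script are hypothetical and not part of Bagua.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def build_install_command(script_name: str = "bagua_install_deps.py") -> Optional[List[str]]:
    """Build a command that runs the script with the current Python interpreter."""
    script_path = shutil.which(script_name)  # look the script up on PATH
    if script_path is None:
        return None
    # Prefixing with sys.executable makes Python, not the shell, interpret the file
    return [sys.executable, script_path]


def run_install_script() -> int:
    """Run the install script and return its exit code (1 if it is not on PATH)."""
    command = build_install_command()
    if command is None:
        print("bagua_install_deps.py was not found on PATH")
        return 1
    return subprocess.run(command).returncode
```

Calling run_install_script() from inside the same virtualenv should be roughly equivalent to typing python3 followed by the full path of the script.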

@Godricly

Godricly commented Sep 3, 2022

I tried on another machine with cuda113 and NCCL installed, and it works well for me.
I think the problem is that NCCL is not installed. Also, the bagua-cuda116 version should be updated.
