Could not load dynamic library 'libcudart.so.11.0' #39

Closed
vickerse1 opened this issue May 8, 2023 · 11 comments

Comments

@vickerse1

Hi,

When I install in conda for linux/GPU the environment doesn't show up. Then, when I install with pip for linux/GPU I get the following error in jupyter notebook when I try to run "import keypoint_moseq as kpms":


2023-05-08 15:05:56.736333: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-08 15:05:56.773764: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-05-08 15:05:56.776625: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Do you have any suggestions?

Thanks,

Evan

@vickerse1
Author

ah yes, then the kernel dies and restarts. thanks, evan

@vickerse1
Author

...and, more details of the error appeared in the command line window:


2023-05-08 16:07:24.091515: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-05-08 16:07:24.191301: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

@calebweinreb
Contributor

Hmm seems like there are a couple things to sort out:

  • When you say the env doesn't show up with conda install, do you mean in jupyter? If so, are you using jupyterlab or a jupyter notebook? The env will only show up if you use jupyterlab and launch it from within the conda env. To make it show up more generally (e.g. for notebooks) you need to run
python -m ipykernel install --user --name=keypoint_moseq
  • The last error about ptxas seems to be related to conda installation, according to the JAX install guide. Did you install cuda/cudnn directly (from NVIDIA) before you tried installing keypoint-MoSeq via conda? If so, you may need to delete the conda env and start over before pip installation to ensure that it uses your system-wide installation of cuda/cudnn. Speaking of which, what version of cuda do you have?
nvcc --version
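A quick way to collect the information asked for above is to probe for the CUDA pieces named in the logs. The following is a minimal sketch using only the Python standard library; the function name `probe_cuda_toolchain` is hypothetical, and a library that loads from Python is only a hint (not a guarantee) that JAX will find it too.

```python
import ctypes
import shutil


def probe_cuda_toolchain(lib_name="libcudart.so.11.0"):
    """Report whether the CUDA pieces named in the logs above can be found."""
    report = {
        # Full path to each executable if it is on PATH, else None.
        "ptxas": shutil.which("ptxas"),
        "nvcc": shutil.which("nvcc"),
    }
    try:
        # Ask the dynamic loader for the library by name.
        ctypes.CDLL(lib_name)
        report["cudart_loadable"] = True
    except OSError:
        # Raised when the loader cannot find or open the shared object.
        report["cudart_loadable"] = False
    return report


if __name__ == "__main__":
    for name, value in probe_cuda_toolchain().items():
        print(f"{name}: {value}")
```

If `ptxas` comes back as None, that would be consistent with the "Couldn't invoke ptxas --version" message; if `cudart_loadable` is False, it would be consistent with the libcudart warning.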

@vickerse1
Author

vickerse1 commented May 9, 2023 via email

@calebweinreb
Contributor

CUDA issue

Some thoughts...

  • This seems like an issue with JAX rather than keypoint-moseq per se, so the JAX docs might help
  • Recent versions of JAX seem not to work with cudnn < 8.6. What version of cudnn do you have? We pinned jax==0.3.22 in the install docs so that it would be compatible with cudnn 8.2. Did you install 0.3.22?
  • You could try explicitly specifying the cuda path. To do that, run the following before importing jax:
import os
cuda_path = os.environ['CUDA_PATH'] # or maybe cuda_path='/usr/local/cuda-11.0'
os.environ['XLA_FLAGS'] = '--xla_gpu_cuda_data_dir='+cuda_path
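A slightly more defensive variant of the snippet above might look like the following sketch. It assumes a standard Linux toolkit layout with ptxas under `<cuda_path>/bin`; the helper name `build_xla_flags` and the fallback path are illustrative, not part of keypoint-moseq or JAX.

```python
import os


def build_xla_flags(environ, fallback="/usr/local/cuda-11.0"):
    """Return (flags, has_ptxas) for pointing XLA at a CUDA install."""
    # CUDA_PATH may be unset, so fall back instead of raising KeyError.
    cuda_path = environ.get("CUDA_PATH") or fallback
    # Assumption: ptxas lives in <cuda_path>/bin (standard Linux layout).
    has_ptxas = os.path.isfile(os.path.join(cuda_path, "bin", "ptxas"))
    new_flag = "--xla_gpu_cuda_data_dir=" + cuda_path
    # Keep any flags that were already set rather than overwriting them.
    existing = environ.get("XLA_FLAGS", "")
    flags = (existing + " " + new_flag).strip()
    return flags, has_ptxas


flags, has_ptxas = build_xla_flags(os.environ)
if not has_ptxas:
    print("warning: no bin/ptxas under the chosen CUDA directory")
os.environ["XLA_FLAGS"] = flags  # must happen before `import jax`
```

The warning is only a hint: if it fires, the chosen directory is probably not the toolkit root, and the ptxas error from earlier in the thread would likely persist.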

mistune issue

First I should note that the calibration step can safely be skipped. Second, it seems like pinning mistune to an earlier version might be a workaround? Maybe pip install -U mistune==0.8.4?

@vickerse1
Author

vickerse1 commented May 9, 2023 via email

@calebweinreb
Contributor

So did pip install -U mistune==0.8.4 solve the calibration issue?

Hmm I don't have experience with WSL. Is there a reason you can't just do everything using the Windows OS? We've gotten Windows+GPU working fine and I think a number of other users have as well.

Regarding CUDA+JAX+WSL, according to this stackoverflow post, they seemed to get things working using the nightly build of jax:

python3 -m pip install git+https://github.com/google/jax 
pip install jaxlib --pre -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_cuda_releases.html
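Whichever install route is used, it can help to confirm which jax and jaxlib versions actually ended up in the environment, since the thread pins jax==0.3.22 for cudnn 8.2 while the nightly route installs something newer. A minimal sketch using the standard library (the helper name `installed_version` is hypothetical):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("jax", "jaxlib"):
    print(name, installed_version(name))
```

Run this inside the same conda env or kernel that the notebook uses; a None here would suggest the notebook is running in a different environment than the one the packages were installed into.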

Given that this is a WSL-specific issue, I'm going to sign off at this point, but please post if you figure out a solution in case others have this issue!

@vickerse1
Author

vickerse1 commented May 9, 2023 via email

@calebweinreb
Contributor

@wingillis mentioned that pytorch packages its own private cuda/cudnn when installed on WSL, whereas JAX requires dynamic linking to the system-wide (Windows) install. So the success with pytorch may not translate.

@vickerse1
Author

vickerse1 commented May 9, 2023 via email

@calebweinreb
Contributor

Closing for now...
