Error occurring in release 1.14.4: load library failed: libnvidia-ml.so.1: cannot open shared object file #305
Comments
I will be able to check what the source of this could be on Monday. For now, you should be able to downgrade by specifying the versions of all packages:
@bawee could you provide more information on your setup? How are you running containers? How is the NVIDIA Container Toolkit installed and configured to be used with Docker?
Hi @elezar,
The NVIDIA toolkit was installed as follows with instructions from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Then configured using:
The containers are run using a Nextflow pipeline documented here: https://labs.epi2me.io/workflows/wf-basecalling/

Rolling back to the previous version of nvidia-container-toolkit using your instructions above did not help; the error still appeared even with v1.14.3-1. My identical machine running v1.14.3, which I set up last week, is still working fine.

I also tried completely removing Docker and reinstalling it using

I hope that is at all helpful. Please let me know if I need to provide more info. Thank you!
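On the question of how the toolkit is configured for Docker: one way to confirm that the configuration step took effect is to check whether Docker's daemon configuration registers an `nvidia` runtime. This is a minimal Python sketch, assuming the default Linux path `/etc/docker/daemon.json` and that the configuration step adds a runtime named `nvidia` under the `runtimes` key; the helper `nvidia_runtime_configured` is hypothetical.

```python
import json
from pathlib import Path

# Assumed default location of Docker's daemon configuration on Linux.
DEFAULT_DAEMON_JSON = Path("/etc/docker/daemon.json")


def nvidia_runtime_configured(path: Path = DEFAULT_DAEMON_JSON) -> bool:
    """Return True if the daemon config registers a runtime named "nvidia".

    Hypothetical helper: it only reads the config file. It does not check
    that the runtime binary exists or that the host driver is installed.
    """
    try:
        config = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        # A missing, unreadable, or malformed file counts as "not configured".
        return False
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes")
    return isinstance(runtimes, dict) and "nvidia" in runtimes


if __name__ == "__main__":
    print("nvidia runtime registered:", nvidia_runtime_configured())
```

A registered runtime only shows that Docker knows about the NVIDIA runtime; it says nothing about whether the NVIDIA driver is installed on the host.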
Note that Would you be able to install
Hi Evan,

Thank you for pointing that out. I had not seen that. I followed the instructions on https://docs.docker.com/engine/install/ubuntu/ and replaced docker.io with docker-ce. The error message is still the same, unfortunately:
Here is some more information:
Hi @elezar, it turns out the solution was to run

Thanks for helping to troubleshoot, and apologies: I did not see the requirement for the driver in the documentation https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
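The file named in this error, libnvidia-ml.so.1, is the NVML library that ships with the NVIDIA driver, not with the container toolkit, so a quick host-side check is whether the dynamic loader can open it at all. This is a minimal Python sketch using `ctypes`; the helper name `nvml_loadable` is hypothetical, and the check only approximates the library lookup that `nvidia-container-cli` performs.

```python
import ctypes


def nvml_loadable(name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the dynamic loader can find and open `name`.

    Roughly approximates the "load library" step that fails in this issue;
    libnvidia-ml.so.1 is provided by the NVIDIA driver installation.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    if nvml_loadable():
        print("libnvidia-ml.so.1 found: the NVIDIA driver appears to be installed")
    else:
        print("libnvidia-ml.so.1 not found: the NVIDIA driver is likely missing on the host")
```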
Ah, thanks. Yes, we should definitely include the output of

Can we close this issue then?
Yes, thank you very much.
I had this same issue, btw, and fixed it with the same solution (downloading the drivers based on the docs here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation ). I had this problem because I thought I had already installed the drivers: the CUDA download instructions strongly imply that they are installed in the last step, and I went with the legacy (non-open) "flavor". However, the "download" instructions do not seem to mention "sudo apt-get install cuda-drivers-535" from the "CUDA installation guide". I am a bit new to CUDA and eager to just get machine learnin' on my newly rented GPU server, so I can't say I fully understand the difference between cuda-drivers-535 from the "installation guide" instructions and "sudo apt-get install -y cuda-drivers" from the CUDA download instructions. @bawee, thanks for posting this question; it saved me a lot of time!
Hi @elezar, I am trying to run an application on Ubuntu 24.04 that needs the NVIDIA Container Toolkit, but it fails with the same issue. I have no NVIDIA hardware installed, and I wish there were a solution such as GPU simulation.
No comments, @elezar? How can I run the application without a GPU?
Thanks, I ran into the same problem and solved it the same way.
After reinstalling the CUDA drivers, the CUDA toolkit, and the container toolkit, I was still getting the error. The issue was resolved by simply reinstalling docker-ce using
Hello, I'm getting a "load library failed" error, as in a previous issue (unsure whether it's related, hence the new issue), when running a Nextflow pipeline with Docker that uses the nvidia-container-toolkit. The error seems to be present only in the new version of nvidia-container-toolkit (1.14.4); it does not occur on an identical computer running version 1.14.3, which I had set up only a few days prior.
Command error: docker: Error response from daemon: failed to create task for container: failed to create a shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown. time="2024-01-24T14:05:51Z" level=error msg="error waiting for container: "
nvidia-container-toolkit was installed using apt on Ubuntu 22.04.3, following the instructions from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
I tried installing the old version instead:

sudo apt install nvidia-container-toolkit=1.14.3-1

but it was unsuccessful due to an unavailable dependency. Thanks in advance!
Originally posted by @bawee in #302 (comment)
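On the "unavailable dependency" error when pinning only nvidia-container-toolkit: a likely cause (an assumption, not confirmed in this thread) is that its companion packages stay at the newer version, which the older package does not accept. The usual workaround is to pin every toolkit package to the same version, as the reply above about specifying the versions of all packages suggests. This is a minimal Python sketch that only builds and prints such an `apt-get` command; the four package names are assumed from NVIDIA's current install guide (check which ones are actually installed before running anything), and the helper `pinned_install_command` is hypothetical.

```python
import shlex

# Assumed package set for an apt-based toolkit installation; verify against
# the packages actually installed on your machine.
TOOLKIT_PACKAGES = (
    "nvidia-container-toolkit",
    "nvidia-container-toolkit-base",
    "libnvidia-container-tools",
    "libnvidia-container1",
)


def pinned_install_command(version: str) -> list[str]:
    """Build an apt-get command that pins every toolkit package to `version`.

    Pinning only nvidia-container-toolkit can leave its dependencies at a
    newer version that apt will not combine with the older package.
    """
    return ["sudo", "apt-get", "install"] + [
        f"{pkg}={version}" for pkg in TOOLKIT_PACKAGES
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(pinned_install_command("1.14.3-1")))
```

`-y` is left out on purpose: `apt-get` refuses to downgrade packages under `-y` unless `--allow-downgrades` is also given, and reviewing the proposed changes is worthwhile anyway. In this thread, downgrading did not fix the error; installing the NVIDIA driver on the host did.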