[ERROR] nvidia-container-cli: ldcache error: process /usr/sbin/ldconfig failed with error code: 1 #177
Comments
This looks like an issue with libnvidia-container. Did you try with docker and GPU support?
Thanks @flx42
Yes, I can start the container with docker normally and execute nvidia-smi.
nvidia-container-cli can execute the following commands normally. But when I execute the following command, I encounter an error:
Thanks @3XX0
When I export NVIDIA_DEBUG_LOG=1, the fatal error is as follows:
The full output is as follows:
@SolenoidWGT would you be able to check whether the NVIDIA Container CLI (…). Also, could you check the permissions on the …
@klueska this may be caused by https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/141 which was already included in the v1.9.0 release.
Meaning this fix broke something for @SolenoidWGT
Sorry for some mistakes in my reply; I will update it later.
Thanks @elezar
Yes, I installed version 1.9.0 of the NVIDIA Container CLI by compiling from source, but the same error is still reported. All of the following commands are executed as root.
-- WARNING, the following logs are for debugging purposes only --
I also checked the permissions on the /dev/nvidia devices; everything seems to be OK.
Here are the versions of libnvidia-container-tools. Although I compiled and installed nvidia-container-runtime from container-toolkit at tag v1.9.0, its version is still displayed as 1.1.2. I'm not sure whether this is a misunderstanding on my part.
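To double-check the permissions on the /dev/nvidia device nodes mentioned above, here is a minimal sketch (assuming Python is available on the host; `nvidia_device_permissions` and the glob pattern are illustrative names, not part of any NVIDIA tooling):

```python
import glob
import os
import stat


def nvidia_device_permissions(pattern="/dev/nvidia*"):
    """Return {path: "mode uid:gid"} for each node matching the pattern."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        # stat.filemode renders the mode like ls -l, e.g. "crw-rw-rw-"
        result[path] = "%s %d:%d" % (stat.filemode(st.st_mode), st.st_uid, st.st_gid)
    return result


if __name__ == "__main__":
    perms = nvidia_device_permissions()
    if not perms:
        print("no /dev/nvidia* devices found")
    for path, info in perms.items():
        print(path, info)
```

On a working setup the nodes are typically character devices readable and writable by the user starting the container; anything more restrictive is worth noting in the report.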
@SolenoidWGT I have not yet been able to reproduce this locally. Our current hypothesis is that this is caused by the code here which is triggered when changing the root and was added to the 1.9.0 release to address an issue on some Debian systems. Would you be able to:
@flx42 @3XX0 would you be able to provide access to a centos7 system so that I can debug this further?
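To help narrow down whether ldconfig itself fails when run against the container root, here is a hedged sketch of a standalone reproduction. It assumes glibc's ldconfig (whose real `-r` flag changes to the given directory before processing); `run_ldconfig_in_root` and the rootfs path are placeholders, and this only approximates what libnvidia-container does when refreshing the ld.so cache:

```python
import shutil
import subprocess


def run_ldconfig_in_root(rootfs, ldconfig="/sbin/ldconfig"):
    """Run ldconfig against a container rootfs and report its exit code.

    A non-zero code here would reproduce the
    "process /usr/sbin/ldconfig failed with error code: 1" symptom
    outside the container runtime.
    """
    if shutil.which(ldconfig) is None:
        return None  # ldconfig not available at this path
    proc = subprocess.run(
        [ldconfig, "-r", rootfs],  # -r: use rootfs as the root directory
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        print(proc.stderr)
    return proc.returncode


if __name__ == "__main__":
    code = run_ldconfig_in_root("/path/to/container/rootfs")
    print("ldconfig exit code:", code)
```

Running this as root against the extracted rootfs, and comparing against the host's other ldconfig binaries (e.g. /sbin/ldconfig.real where it exists), could confirm whether the failure is in ldconfig or in how the CLI invokes it.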
Hi, when I follow the documentation to test:
# enroot create --name cuda nvidia+cuda+10.0-base.sqsh
# enroot start --root --rw cuda sh -c 'pwd'
enroot reported an ERROR:
The error seems to be GPU related; if I start an ubuntu image there is no error.
I'm not quite sure what's causing this error and I'm looking forward to your reply.
Here is some information about my system:
enroot version is 3.4.0
OS is CentOS Linux release 7.6.1810 (Core)
./enroot-check_*.run --verify
nvidia-container-cli -V
Here is nvidia-container-cli output:
# nvidia-container-cli -k -d /dev/tty info
Thanks