Trouble Running NVIDIA GPU Containers, ldconfig failed
#1516
Comments
Hi,
You can see the tests we run on meta-tegra images in the test spreadsheet.
Hi,
The issue is, I need to use
Hi @Nauman3S, could you please use … Please share any findings when you are able to test with …
Hi @Nauman3S
Closing this issue since no updates were provided.
I'm having trouble running NVIDIA GPU containers: any container that uses the GPU fails to start.
Issue Reproduction Steps:
Configuring the container runtime:
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd
Pulling images for testing:
sudo ctr images pull docker.io/nvidia/cuda:12.0.0-runtime-ubuntu20.04
sudo ctr images pull docker.io/nvidia/cuda:12.0.0-runtime-ubi8
sudo ctr images pull docker.io/nvidia/cuda:12.0.0-base-ubuntu20.04
sudo ctr images pull docker.io/nvidia/cuda:12.0.0-base-ubi8
Running a container with GPU:
sudo ctr run --rm --gpus 0 --runtime io.containerd.runc.v1 --privileged docker.io/nvidia/cuda:12.0.0-runtime-ubuntu20.04 test nvidia-smi
Error Message:
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1: unknown
This error occurs with every NVIDIA image pulled above (the non-Ubuntu images show the same error, but with /sbin/ldconfig instead of /sbin/ldconfig.real). However, non-GPU containers (e.g., docker.io/macabees/neofetch:latest) run without issues.
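To confirm in one pass that the failure does not depend on the image, the reproduction command can be repeated for each of the four pulled images. This is a minimal POSIX sh sketch: the function name `try_gpu_images`, the `CTR` override, and the `gpu-test-N` container IDs are hypothetical, while the `ctr run` flags are copied unchanged from the command above.

```shell
#!/bin/sh
# Re-run the failing `ctr run` command once per image and report PASS/FAIL.
# CTR is a hypothetical override for dry runs (e.g. CTR=true);
# by default the command is executed as `sudo ctr`.
try_gpu_images() {
  n=0
  for img in \
    docker.io/nvidia/cuda:12.0.0-runtime-ubuntu20.04 \
    docker.io/nvidia/cuda:12.0.0-runtime-ubi8 \
    docker.io/nvidia/cuda:12.0.0-base-ubuntu20.04 \
    docker.io/nvidia/cuda:12.0.0-base-ubi8
  do
    n=$((n + 1))
    # Same flags as the reproduction step; each run gets its own
    # (hypothetical) container ID so the runs never share a name.
    if ${CTR:-sudo ctr} run --rm --gpus 0 --runtime io.containerd.runc.v1 \
        --privileged "$img" "gpu-test-$n" nvidia-smi; then
      echo "PASS $img"
    else
      echo "FAIL $img"
    fi
  done
}

# Usage on the affected host:
#   try_gpu_images
```

Leaving `${CTR:-sudo ctr}` unquoted is deliberate: field splitting turns the default into the two words `sudo` and `ctr`.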
Further Details:
Running ldconfig -p reports 264 libraries in the cache, including various NVIDIA libraries, and running ldconfig itself completes without error.
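The 264 figure is the total that `ldconfig -p` prints on its header line. A small filter can report that total alongside the number of NVIDIA entries. This is a minimal sketch: the function name `summarize_ldcache` is hypothetical, the `libnvidia|libcuda` name pattern is an assumption about which libraries matter, and the result reflects only the host's cache.

```shell
#!/bin/sh
# Summarize an `ldconfig -p` listing read on stdin: the total from the
# header line and the number of entries that look NVIDIA/CUDA-related.
summarize_ldcache() {
  awk '
    NR == 1 { total = $1; next }        # header, e.g. 264 libs found in cache ...
    /libnvidia|libcuda/ { nvidia++ }    # matches libnvidia-*, libcuda, libcudart, ...
    END { printf "total=%s nvidia=%d\n", total, nvidia }
  '
}

# Usage on the host:
#   ldconfig -p | summarize_ldcache
```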
The output from
sudo nvidia-container-cli -k -d /dev/tty info
includes warnings about missing libraries and missing compat32 libraries, although nvidia-smi recognizes the GPU correctly.
Attempted Solutions:
Verified that all NVIDIA driver and toolkit components are correctly installed.
Ensured that the ldconfig cache is current and includes the paths to the NVIDIA libraries, and that /sbin/ldconfig.real is a symlink to /sbin/ldconfig.
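The symlink check from the attempted solutions can be scripted so it is easy to repeat. This is a minimal sketch, assuming a `readlink` that supports `-f` (GNU coreutils or BusyBox; the flag is not POSIX). The function name `check_ldconfig_link` is hypothetical, and the function inspects only the given directory on the host filesystem.

```shell
#!/bin/sh
# Check that <dir>/ldconfig.real is a symlink that resolves to the same
# file as <dir>/ldconfig. <dir> defaults to /sbin.
check_ldconfig_link() {
  dir=${1:-/sbin}
  # -L is true for any symlink, including a dangling one.
  if [ ! -L "$dir/ldconfig.real" ]; then
    echo "missing-or-not-symlink"
    return 1
  fi
  # Fully resolve both paths so chained or relative links compare equal.
  target=$(readlink -f "$dir/ldconfig.real")
  expected=$(readlink -f "$dir/ldconfig")
  if [ "$target" = "$expected" ]; then
    echo "ok"
  else
    echo "points-elsewhere: $target"
    return 1
  fi
}

# Usage on the host:
#   check_ldconfig_link /sbin
```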
Despite these efforts, the error persists and GPU containers still fail to start. I'd appreciate advice on resolving this ldcache error during container initialization so that NVIDIA GPU containers can run.