nerdctl --gpus all containers miss libnvidia-ml.so.1 symlink compared to docker #4621

@ChengyuZhu6

Description

When the same GPU-enabled image is run with docker and with nerdctl, both using --gpus all and the command nvidia-smi, only nerdctl fails, because libnvidia-ml.so cannot be found:

# nerdctl run --rm -it --gpus all env12.com/cuda13.0.1-cudnn9-py3.12-torch2.9.0:251031 nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

nerdctl containers (with --gpus all) only have the real library file; the libnvidia-ml.so.1 symlink is missing unless ldconfig is run manually inside the container:

# nerdctl run --rm -it --gpus all env12.com/cuda13.0.1-cudnn9-py3.12-torch2.9.0:251031 find / -name libnvidia-ml.so*
/usr/local/cuda-13.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/lib64/libnvidia-ml.so.535.261.03
# docker run --rm -it --gpus all env12.com/cuda13.0.1-cudnn9-py3.12-torch2.9.0:251031 find / -name libnvidia-ml.so*
/usr/local/cuda-13.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/lib64/libnvidia-ml.so.1
/usr/lib64/libnvidia-ml.so.535.261.03

Docker works because nvidia-container-runtime-hook reads /etc/nvidia-container-runtime/config.toml and passes --ldconfig=@/sbin/ldconfig to nvidia-container-cli configure. nerdctl calls nvidia-container-cli directly, so it needs to pass an explicit --ldconfig argument for ldconfig to run inside the container and create the symlinks.
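Until nerdctl passes --ldconfig itself, one possible stopgap is to run ldconfig inside the container before the real command, as noted above. The sketch below wraps that in a small function. The name gpu_run is hypothetical and not part of nerdctl, and it assumes the image ships sh and an ldconfig on PATH and that the container runs as root.

```shell
# Hypothetical helper (not part of nerdctl): start a GPU container, refresh
# the dynamic linker cache inside it, then exec the requested command.
# Assumes the image provides sh and ldconfig and runs as root.
gpu_run() {
    runtime="$1"   # e.g. nerdctl
    image="$2"     # image reference to run
    shift 2
    # The trailing "sh" becomes $0 of the inner shell; the remaining
    # arguments become "$@" and are exec'd once ldconfig succeeds.
    "$runtime" run --rm --gpus all "$image" \
        sh -c 'ldconfig && exec "$@"' sh "$@"
}
```

For example, `gpu_run nerdctl env12.com/cuda13.0.1-cudnn9-py3.12-torch2.9.0:251031 nvidia-smi` would run ldconfig first and then nvidia-smi. This only works around the missing symlink per invocation; it does not change how nerdctl calls nvidia-container-cli.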

Steps to reproduce the issue

Describe the results you received and expected

Expected: Containers started with nerdctl run --gpus all should expose the same NVIDIA libraries and SONAME symlinks as docker run --gpus all, including:

  • /usr/lib64/libnvidia-ml.so.<full-version>
  • /usr/lib64/libnvidia-ml.so.1 (symlink created by ldconfig)

Received: docker containers (with --gpus all) have both the real library file and the symlink (libnvidia-ml.so.1). nerdctl containers (with --gpus all) only have the real library file; the symlink is missing unless ldconfig is run manually inside the container.
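The expected and received states can be compared with a small check run inside each container. This is a minimal sketch: the function name check_nvml_links is hypothetical, and the default directory /usr/lib64 is taken from the find output in this report and may differ on other images.

```shell
# Hypothetical helper: succeed only if DIR holds both a versioned
# libnvidia-ml.so.<full-version> file and a libnvidia-ml.so.1 entry.
# DIR defaults to /usr/lib64, the location seen in this report.
check_nvml_links() {
    dir="${1:-/usr/lib64}"
    real=""
    # Match names such as libnvidia-ml.so.535.261.03 but not libnvidia-ml.so.1.
    for f in "$dir"/libnvidia-ml.so.*.*; do
        if [ -e "$f" ]; then
            real="$f"
            break
        fi
    done
    if [ -z "$real" ]; then
        echo "no versioned libnvidia-ml.so.<full-version> in $dir"
        return 2
    fi
    if [ ! -e "$dir/libnvidia-ml.so.1" ]; then
        echo "libnvidia-ml.so.1 missing in $dir (found $real)"
        return 1
    fi
    echo "ok: libnvidia-ml.so.1 present alongside $real"
}
```

Based on the listings above, the check would be expected to return 0 in the docker container and 1 in the nerdctl container until ldconfig has been run there.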

What version of nerdctl are you using?

nerdctl 2.2.0

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

Metadata

Labels: bug (Something isn't working)
