This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1 #1760

Closed
9 tasks done
jacksonsshen opened this issue Jun 17, 2023 · 1 comment


jacksonsshen commented Jun 17, 2023


1. Issue or feature description

On my old system, both regular Docker images and NVIDIA Docker images worked normally. After switching to a new system, the old Docker images still work, but the old NVIDIA Docker images no longer start; I have to download the NVIDIA Docker images again to get them working.

2. Steps to reproduce the issue

For example:
docker run -it --gpus all nvidia/cuda:10.1-base
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1: unknown.
ERRO[0000] error waiting for container:

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
    I0617 07:33:47.698166 22846 nvc.c:376] initializing library context (version=1.13.1, build=6f4aea0fca16aaff01bab2567adb34ec30847a0e)
    I0617 07:33:47.698211 22846 nvc.c:350] using root /
    I0617 07:33:47.698220 22846 nvc.c:351] using ldcache /etc/ld.so.cache
    I0617 07:33:47.698228 22846 nvc.c:352] using unprivileged user 1000:1000
    I0617 07:33:47.698249 22846 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
    I0617 07:33:47.698439 22846 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
    W0617 07:33:47.699341 22846 nvc.c:258] failed to detect NVIDIA devices
    W0617 07:33:47.699591 22847 nvc.c:273] failed to set inheritable capabilities
    W0617 07:33:47.699640 22847 nvc.c:274] skipping kernel modules load due to failure
    I0617 07:33:47.699981 22848 rpc.c:71] starting driver rpc service
    I0617 07:33:47.709075 22849 rpc.c:71] starting nvcgo rpc service
    I0617 07:33:47.710040 22846 nvc_info.c:796] requesting driver information with ''
    I0617 07:33:47.711269 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.510.73.05
    I0617 07:33:47.711388 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.510.73.05
    I0617 07:33:47.711443 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.510.73.05
    I0617 07:33:47.711484 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.510.73.05
    I0617 07:33:47.711527 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.510.73.05
    I0617 07:33:47.711605 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.510.73.05
    I0617 07:33:47.711665 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.510.73.05
    I0617 07:33:47.711705 22846 nvc_info.c:176] skipping /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4.0.0
    I0617 07:33:47.711746 22846 nvc_info.c:176] skipping /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4.0.0
    I0617 07:33:47.711788 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.510.73.05
    I0617 07:33:47.711827 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.510.73.05
    I0617 07:33:47.711887 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.510.73.05
    I0617 07:33:47.711926 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.510.73.05
    I0617 07:33:47.711965 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.510.73.05
    I0617 07:33:47.712005 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.510.73.05
    I0617 07:33:47.712067 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.510.73.05
    I0617 07:33:47.712124 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.510.73.05
    I0617 07:33:47.712162 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.510.73.05
    I0617 07:33:47.712202 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.510.73.05
    I0617 07:33:47.712261 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.510.73.05
    I0617 07:33:47.712323 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.510.73.05
    I0617 07:33:47.712506 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.510.73.05
    I0617 07:33:47.712633 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.510.73.05
    I0617 07:33:47.712675 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.510.73.05
    I0617 07:33:47.712717 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.510.73.05
    I0617 07:33:47.712761 22846 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.510.73.05
    I0617 07:33:47.712808 22846 nvc_info.c:174] selecting /usr/lib32/vdpau/libvdpau_nvidia.so.510.73.05
    I0617 07:33:47.712853 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-tls.so.510.73.05
    I0617 07:33:47.712891 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-ptxjitcompiler.so.510.73.05
    I0617 07:33:47.712941 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-opticalflow.so.510.73.05
    I0617 07:33:47.712994 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-opencl.so.510.73.05
    I0617 07:33:47.713031 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-ml.so.510.73.05
    I0617 07:33:47.713080 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-glvkspirv.so.510.73.05
    I0617 07:33:47.713117 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-glsi.so.510.73.05
    I0617 07:33:47.713152 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-glcore.so.510.73.05
    I0617 07:33:47.713189 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-fbc.so.510.73.05
    I0617 07:33:47.713242 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-encode.so.510.73.05
    I0617 07:33:47.713293 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-eglcore.so.510.73.05
    I0617 07:33:47.713328 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-compiler.so.510.73.05
    I0617 07:33:47.713366 22846 nvc_info.c:174] selecting /usr/lib32/libnvidia-allocator.so.510.73.05
    I0617 07:33:47.713417 22846 nvc_info.c:174] selecting /usr/lib32/libnvcuvid.so.510.73.05
    I0617 07:33:47.713472 22846 nvc_info.c:174] selecting /usr/lib32/libcuda.so.510.73.05
    I0617 07:33:47.713527 22846 nvc_info.c:174] selecting /usr/lib32/libGLX_nvidia.so.510.73.05
    I0617 07:33:47.713563 22846 nvc_info.c:174] selecting /usr/lib32/libGLESv2_nvidia.so.510.73.05
    I0617 07:33:47.713618 22846 nvc_info.c:174] selecting /usr/lib32/libGLESv1_CM_nvidia.so.510.73.05
    I0617 07:33:47.713656 22846 nvc_info.c:174] selecting /usr/lib32/libEGL_nvidia.so.510.73.05
    W0617 07:33:47.713679 22846 nvc_info.c:400] missing library libnvidia-nscq.so
    W0617 07:33:47.713688 22846 nvc_info.c:400] missing library libcudadebugger.so
    W0617 07:33:47.713696 22846 nvc_info.c:400] missing library libnvidia-fatbinaryloader.so
    W0617 07:33:47.713705 22846 nvc_info.c:400] missing library libnvidia-pkcs11.so
    W0617 07:33:47.713713 22846 nvc_info.c:400] missing library libnvidia-nvvm.so
    W0617 07:33:47.713722 22846 nvc_info.c:400] missing library libnvidia-ifr.so
    W0617 07:33:47.713731 22846 nvc_info.c:400] missing library libnvidia-cbl.so
    W0617 07:33:47.713738 22846 nvc_info.c:404] missing compat32 library libnvidia-cfg.so
    W0617 07:33:47.713747 22846 nvc_info.c:404] missing compat32 library libnvidia-nscq.so
    W0617 07:33:47.713757 22846 nvc_info.c:404] missing compat32 library libcudadebugger.so
    W0617 07:33:47.713762 22846 nvc_info.c:404] missing compat32 library libnvidia-fatbinaryloader.so
    W0617 07:33:47.713783 22846 nvc_info.c:404] missing compat32 library libnvidia-pkcs11.so
    W0617 07:33:47.713791 22846 nvc_info.c:404] missing compat32 library libnvidia-nvvm.so
    W0617 07:33:47.713800 22846 nvc_info.c:404] missing compat32 library libnvidia-ngx.so
    W0617 07:33:47.713808 22846 nvc_info.c:404] missing compat32 library libnvidia-ifr.so
    W0617 07:33:47.713817 22846 nvc_info.c:404] missing compat32 library libnvidia-rtcore.so
    W0617 07:33:47.713826 22846 nvc_info.c:404] missing compat32 library libnvoptix.so
    W0617 07:33:47.713835 22846 nvc_info.c:404] missing compat32 library libnvidia-cbl.so
    I0617 07:33:47.714350 22846 nvc_info.c:300] selecting /usr/bin/nvidia-smi
    I0617 07:33:47.714376 22846 nvc_info.c:300] selecting /usr/bin/nvidia-debugdump
    I0617 07:33:47.714399 22846 nvc_info.c:300] selecting /usr/bin/nvidia-persistenced
    I0617 07:33:47.714439 22846 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-control
    I0617 07:33:47.714464 22846 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-server
    W0617 07:33:47.714565 22846 nvc_info.c:426] missing binary nv-fabricmanager
    I0617 07:33:47.714643 22846 nvc_info.c:486] listing firmware path /lib/firmware/nvidia/510.73.05/gsp.bin
    I0617 07:33:47.714674 22846 nvc_info.c:559] listing device /dev/nvidiactl
    I0617 07:33:47.714680 22846 nvc_info.c:559] listing device /dev/nvidia-uvm
    I0617 07:33:47.714689 22846 nvc_info.c:559] listing device /dev/nvidia-uvm-tools
    I0617 07:33:47.714700 22846 nvc_info.c:559] listing device /dev/nvidia-modeset
    W0617 07:33:47.714728 22846 nvc_info.c:350] missing ipc path /var/run/nvidia-persistenced/socket
    W0617 07:33:47.714754 22846 nvc_info.c:350] missing ipc path /var/run/nvidia-fabricmanager/socket
    W0617 07:33:47.714776 22846 nvc_info.c:350] missing ipc path /tmp/nvidia-mps
    I0617 07:33:47.714784 22846 nvc_info.c:852] requesting device information with ''
    I0617 07:33:47.720756 22846 nvc_info.c:743] listing device /dev/nvidia0 (GPU-24dda873-8a7f-d02f-a9ad-52f56259f944 at 00000000:17:00.0)
    I0617 07:33:47.726745 22846 nvc_info.c:743] listing device /dev/nvidia1 (GPU-a4211eb1-75b7-df20-7a29-4bd4fd504774 at 00000000:73:00.0)
    NVRM version: 510.73.05
    CUDA version: 11.6

    Device Index: 0
    Device Minor: 0
    Model: NVIDIA GeForce RTX 2080 Ti
    Brand: GeForce
    GPU UUID: GPU-24dda873-8a7f-d02f-a9ad-52f56259f944
    Bus Location: 00000000:17:00.0
    Architecture: 7.5

    Device Index: 1
    Device Minor: 1
    Model: NVIDIA GeForce RTX 2080 Ti
    Brand: GeForce
    GPU UUID: GPU-a4211eb1-75b7-df20-7a29-4bd4fd504774
    Bus Location: 00000000:73:00.0
    Architecture: 7.5
    I0617 07:33:47.726793 22846 nvc.c:434] shutting down library context
    I0617 07:33:47.726874 22849 rpc.c:95] terminating nvcgo rpc service
    I0617 07:33:47.727439 22846 rpc.c:135] nvcgo rpc service terminated successfully
    I0617 07:33:47.729964 22848 rpc.c:95] terminating driver rpc service
    I0617 07:33:47.730098 22846 rpc.c:135] driver rpc service terminated successfully

  • Kernel version from uname -a
    Linux shen 5.10.14 #1 SMP PREEMPT_RT Fri May 5 15:37:12 CST 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
  • Driver information from nvidia-smi -a

    ==============NVSMI LOG==============

    Timestamp : Sat Jun 17 15:35:58 2023
    Driver Version : 510.73.05
    CUDA Version : 11.6

    Attached GPUs : 2
    GPU 00000000:17:00.0
        Product Name : NVIDIA GeForce RTX 2080 Ti
        Product Brand : GeForce
        Product Architecture : Turing
        Display Mode : Disabled
        Display Active : Disabled
        Persistence Mode : Disabled
        MIG Mode
            Current : N/A
            Pending : N/A
        Accounting Mode : Disabled
        Accounting Mode Buffer Size : 4000
        Driver Model
            Current : N/A
            Pending : N/A
        Serial Number : N/A
        GPU UUID : GPU-24dda873-8a7f-d02f-a9ad-52f56259f944
        Minor Number : 0
        VBIOS Version : 90.02.17.40.9A
        MultiGPU Board : No
        Board ID : 0x1700
        GPU Part Number : N/A
        Module ID : 0
        Inforom Version
            Image Version : G001.0000.02.04
            OEM Object : 1.1
            ECC Object : N/A
            Power Management Object : N/A
        GPU Operation Mode
            Current : N/A
            Pending : N/A
        GSP Firmware Version : N/A
        GPU Virtualization Mode
            Virtualization Mode : None
            Host VGPU Mode : N/A
        IBMNPU
            Relaxed Ordering Mode : N/A
        PCI
            Bus : 0x17
            Device : 0x00
            Domain : 0x0000
            Device Id : 0x1E0410DE
            Bus Id : 00000000:17:00.0
            Sub System Id : 0x1E0410DE
            GPU Link Info
                PCIe Generation
                    Max : 3
                    Current : 1
                Link Width
                    Max : 16x
                    Current : 16x
            Bridge Chip
                Type : N/A
                Firmware : N/A
            Replays Since Reset : 0
            Replay Number Rollovers : 0
            Tx Throughput : 0 KB/s
            Rx Throughput : 0 KB/s
        Fan Speed : 35 %
        Performance State : P8
        Clocks Throttle Reasons
            Idle : Active
            Applications Clocks Setting : Not Active
            SW Power Cap : Not Active
            HW Slowdown : Not Active
                HW Thermal Slowdown : Not Active
                HW Power Brake Slowdown : Not Active
            Sync Boost : Not Active
            SW Thermal Slowdown : Not Active
            Display Clock Setting : Not Active
        FB Memory Usage
            Total : 11264 MiB
            Reserved : 244 MiB
            Used : 5 MiB
            Free : 11013 MiB
        BAR1 Memory Usage
            Total : 256 MiB
            Used : 4 MiB
            Free : 252 MiB
        Compute Mode : Default
        Utilization
            Gpu : 0 %
            Memory : 0 %
            Encoder : 0 %
            Decoder : 0 %
        Encoder Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        FBC Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        Ecc Mode
            Current : N/A
            Pending : N/A
        ECC Errors
            Volatile
                SRAM Correctable : N/A
                SRAM Uncorrectable : N/A
                DRAM Correctable : N/A
                DRAM Uncorrectable : N/A
            Aggregate
                SRAM Correctable : N/A
                SRAM Uncorrectable : N/A
                DRAM Correctable : N/A
                DRAM Uncorrectable : N/A
        Retired Pages
            Single Bit ECC : N/A
            Double Bit ECC : N/A
            Pending Page Blacklist : N/A
        Remapped Rows : N/A
        Temperature
            GPU Current Temp : 28 C
            GPU Shutdown Temp : 94 C
            GPU Slowdown Temp : 91 C
            GPU Max Operating Temp : 89 C
            GPU Target Temperature : 84 C
            Memory Current Temp : N/A
            Memory Max Operating Temp : N/A
        Power Readings
            Power Management : Supported
            Power Draw : 12.61 W
            Power Limit : 250.00 W
            Default Power Limit : 250.00 W
            Enforced Power Limit : 250.00 W
            Min Power Limit : 100.00 W
            Max Power Limit : 310.00 W
        Clocks
            Graphics : 300 MHz
            SM : 300 MHz
            Memory : 405 MHz
            Video : 540 MHz
        Applications Clocks
            Graphics : N/A
            Memory : N/A
        Default Applications Clocks
            Graphics : N/A
            Memory : N/A
        Max Clocks
            Graphics : 2100 MHz
            SM : 2100 MHz
            Memory : 7000 MHz
            Video : 1950 MHz
        Max Customer Boost Clocks
            Graphics : N/A
        Clock Policy
            Auto Boost : N/A
            Auto Boost Default : N/A
        Voltage
            Graphics : N/A
        Processes
            GPU instance ID : N/A
            Compute instance ID : N/A
            Process ID : 2544
                Type : G
                Name : /usr/lib/xorg/Xorg
                Used GPU Memory : 4 MiB

    GPU 00000000:73:00.0
        Product Name : NVIDIA GeForce RTX 2080 Ti
        Product Brand : GeForce
        Product Architecture : Turing
        Display Mode : Disabled
        Display Active : Disabled
        Persistence Mode : Disabled
        MIG Mode
            Current : N/A
            Pending : N/A
        Accounting Mode : Disabled
        Accounting Mode Buffer Size : 4000
        Driver Model
            Current : N/A
            Pending : N/A
        Serial Number : N/A
        GPU UUID : GPU-a4211eb1-75b7-df20-7a29-4bd4fd504774
        Minor Number : 1
        VBIOS Version : 90.02.30.40.90
        MultiGPU Board : No
        Board ID : 0x7300
        GPU Part Number : N/A
        Module ID : 0
        Inforom Version
            Image Version : G001.0000.02.04
            OEM Object : 1.1
            ECC Object : N/A
            Power Management Object : N/A
        GPU Operation Mode
            Current : N/A
            Pending : N/A
        GSP Firmware Version : N/A
        GPU Virtualization Mode
            Virtualization Mode : None
            Host VGPU Mode : N/A
        IBMNPU
            Relaxed Ordering Mode : N/A
        PCI
            Bus : 0x73
            Device : 0x00
            Domain : 0x0000
            Device Id : 0x1E0710DE
            Bus Id : 00000000:73:00.0
            Sub System Id : 0x37181028
            GPU Link Info
                PCIe Generation
                    Max : 3
                    Current : 1
                Link Width
                    Max : 16x
                    Current : 16x
            Bridge Chip
                Type : N/A
                Firmware : N/A
            Replays Since Reset : 0
            Replay Number Rollovers : 0
            Tx Throughput : 0 KB/s
            Rx Throughput : 0 KB/s
        Fan Speed : 18 %
        Performance State : P8
        Clocks Throttle Reasons
            Idle : Active
            Applications Clocks Setting : Not Active
            SW Power Cap : Not Active
            HW Slowdown : Not Active
                HW Thermal Slowdown : Not Active
                HW Power Brake Slowdown : Not Active
            Sync Boost : Not Active
            SW Thermal Slowdown : Not Active
            Display Clock Setting : Not Active
        FB Memory Usage
            Total : 11264 MiB
            Reserved : 246 MiB
            Used : 15 MiB
            Free : 11001 MiB
        BAR1 Memory Usage
            Total : 256 MiB
            Used : 4 MiB
            Free : 252 MiB
        Compute Mode : Default
        Utilization
            Gpu : 0 %
            Memory : 0 %
            Encoder : 0 %
            Decoder : 0 %
        Encoder Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        FBC Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        Ecc Mode
            Current : N/A
            Pending : N/A
        ECC Errors
            Volatile
                SRAM Correctable : N/A
                SRAM Uncorrectable : N/A
                DRAM Correctable : N/A
                DRAM Uncorrectable : N/A
            Aggregate
                SRAM Correctable : N/A
                SRAM Uncorrectable : N/A
                DRAM Correctable : N/A
                DRAM Uncorrectable : N/A
        Retired Pages
            Single Bit ECC : N/A
            Double Bit ECC : N/A
            Pending Page Blacklist : N/A
        Remapped Rows : N/A
        Temperature
            GPU Current Temp : 27 C
            GPU Shutdown Temp : 94 C
            GPU Slowdown Temp : 91 C
            GPU Max Operating Temp : 89 C
            GPU Target Temperature : 84 C
            Memory Current Temp : N/A
            Memory Max Operating Temp : N/A
        Power Readings
            Power Management : Supported
            Power Draw : 15.36 W
            Power Limit : 250.00 W
            Default Power Limit : 250.00 W
            Enforced Power Limit : 250.00 W
            Min Power Limit : 100.00 W
            Max Power Limit : 280.00 W
        Clocks
            Graphics : 300 MHz
            SM : 300 MHz
            Memory : 405 MHz
            Video : 540 MHz
        Applications Clocks
            Graphics : N/A
            Memory : N/A
        Default Applications Clocks
            Graphics : N/A
            Memory : N/A
        Max Clocks
            Graphics : 2100 MHz
            SM : 2100 MHz
            Memory : 7000 MHz
            Video : 1950 MHz
        Max Customer Boost Clocks
            Graphics : N/A
        Clock Policy
            Auto Boost : N/A
            Auto Boost Default : N/A
        Voltage
            Graphics : N/A
        Processes
            GPU instance ID : N/A
            Compute instance ID : N/A
            Process ID : 2544
                Type : G
                Name : /usr/lib/xorg/Xorg
                Used GPU Memory : 9 MiB
            GPU instance ID : N/A
            Compute instance ID : N/A
            Process ID : 4182
                Type : G
                Name : /usr/bin/gnome-shell
                Used GPU Memory : 4 MiB

  • Docker version from docker version
    Client: Docker Engine - Community
     Version: 24.0.2
     API version: 1.43
     Go version: go1.20.4
     Git commit: cb74dfc
     Built: Thu May 25 21:52:22 2023
     OS/Arch: linux/amd64
     Context: default

    Server: Docker Engine - Community
     Engine:
      Version: 24.0.2
      API version: 1.43 (minimum version 1.12)
      Go version: go1.20.4
      Git commit: 659604f
      Built: Thu May 25 21:52:22 2023
      OS/Arch: linux/amd64
      Experimental: false
     containerd:
      Version: 1.6.21
      GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
     runc:
      Version: 1.1.7
      GitCommit: v1.1.7-0-g860f061
     docker-init:
      Version: 0.19.0
      GitCommit: de40ad0

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  • NVIDIA container library version from nvidia-container-cli -V
    cli-version: 1.13.1
    lib-version: 1.13.1
    build date: 2023-04-24T12:23+00:00
    build revision: 6f4aea0fca16aaff01bab2567adb34ec30847a0e
    build compiler: x86_64-linux-gnu-gcc-7 7.5.0
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • NVIDIA container library logs (see troubleshooting)
  • Docker command, image and tag used
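The nvidia-container-cli -k -d /dev/tty info log above interleaves informational (I) lines with warnings (W) such as "failed to detect NVIDIA devices" and "failed to set inheritable capabilities". A minimal Python sketch for pulling out only the warning, error and fatal lines, assuming every log line follows the glog-style prefix seen here (level letter, MMDD date, time, thread ID, file:line]). The name warnings_from_log is a hypothetical helper, not part of libnvidia-container.

```python
import re
from typing import Iterable, List, Tuple

# glog-style prefix, as it appears in the nvidia-container-cli debug log above:
#   <level><MMDD> <hh:mm:ss.micros> <thread id> <file>:<line>] <message>
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) "
    r"(?P<src>[^\s:]+):(?P<lineno>\d+)\] "
    r"(?P<msg>.*)$"
)


def warnings_from_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (file:line, message) for every warning, error or fatal line.

    Lines without the glog-style prefix (for example the "Device Index: 0"
    summary that nvidia-container-cli info prints) are skipped.
    """
    found = []
    for raw in lines:
        # strip() drops the indentation and the trailing newline.
        match = LOG_LINE.match(raw.strip())
        if match and match["level"] in ("W", "E", "F"):
            found.append((f"{match['src']}:{match['lineno']}", match["msg"]))
    return found
```

With the log saved to a file (the name nvc-debug.log is only an example), warnings_from_log(open("nvc-debug.log")) returns pairs such as ("nvc.c:258", "failed to detect NVIDIA devices").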

jacksonsshen commented Jun 25, 2023

  • Debugging with strace shows some permission problems.


  • The following commands show the debugging steps: the container fails to start with --gpus all, but starts normally without it.

shen@shen:~$ docker run -it --gpus all nvidia/cuda:10.1-base
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy': unknown.
ERRO[0000] error waiting for container:

shen@shen:~$ docker run -it nvidia/cuda:10.1-base
root@fd4fbecdbd68:/# ll
total 88
drwxr-xr-x 1 root root 4096 Jun 25 02:02 ./
drwxr-xr-x 1 root root 4096 Jun 25 02:02 ../
-rwxr-xr-x 1 root root 0 Jun 25 02:02 .dockerenv*
-rw-r--r-- 1 1000 1000 16047 Jul 2 2021 NGC-DL-CONTAINER-LICENSE
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 bin/
drwxr-xr-x 2 1000 1000 4096 Apr 24 2018 boot/
drwxr-xr-x 5 root root 360 Jun 25 02:02 dev/
drwxr-xr-x 1 1000 1000 4096 Jun 25 02:02 etc/
drwxr-xr-x 2 1000 1000 4096 Apr 24 2018 home/
drwxr-xr-x 1 1000 1000 4096 May 23 2017 lib/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 lib64/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 media/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 mnt/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 opt/
dr-xr-xr-x 867 root root 0 Jun 25 02:02 proc/
drwx------ 2 1000 1000 4096 Jun 15 2021 root/
drwxr-xr-x 5 1000 1000 4096 Jun 15 2021 run/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 sbin/
drwxr-xr-x 2 1000 1000 4096 Jun 15 2021 srv/
dr-xr-xr-x 13 root root 0 Jun 25 02:02 sys/
drwxrwxrwt 2 1000 1000 4096 Jun 15 2021 tmp/
drwxr-xr-x 1 1000 1000 4096 Jun 15 2021 usr/
drwxr-xr-x 1 1000 1000 4096 Jun 15 2021 var/
root@fd4fbecdbd68:/# exit
exit

shen@shen:~$
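In the listing above, most top-level entries of nvidia/cuda:10.1-base (such as bin/, etc/, lib/ and usr/) are owned by 1000:1000, while .dockerenv, dev/, proc/ and sys/ are owned by root. Whether or not this ownership is related to the ldconfig failure, it can be checked without starting a container by exporting the container filesystem as a tar archive and inspecting the owner IDs recorded in the tar headers. A minimal Python sketch, assuming such an archive is available; non_root_members and scan_archive are hypothetical helper names.

```python
import tarfile
from typing import Iterable, List, Tuple


def non_root_members(
    members: Iterable[tarfile.TarInfo], max_depth: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (path, uid, gid) for members not owned by root:root.

    Only paths at most max_depth components deep are reported, so a full
    root filesystem yields a short, top-level summary.
    """
    flagged = []
    for member in members:
        # Normalise "./etc/" or "etc/" to "etc".
        name = member.name
        if name.startswith("./"):
            name = name[2:]
        name = name.strip("/")
        if not name or name == ".":
            continue
        if name.count("/") + 1 > max_depth:
            continue
        if member.uid != 0 or member.gid != 0:
            flagged.append((name, member.uid, member.gid))
    return sorted(flagged)


def scan_archive(tar_path: str, max_depth: int = 1) -> List[Tuple[str, int, int]]:
    """Open a (possibly compressed) tar archive and report non-root members."""
    with tarfile.open(tar_path) as tar:  # default mode "r" auto-detects compression
        return non_root_members(tar.getmembers(), max_depth)
```

For example, docker create nvidia/cuda:10.1-base prints a container ID, docker export -o rootfs.tar <container-id> writes that container's filesystem to rootfs.tar, and scan_archive("rootfs.tar") then lists the top-level entries that are not owned by root.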
