
Issue running Clara on WSL2+Ubuntu 20.4+Docker: merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown. #287

Open

akemisetti opened this issue Oct 12, 2021 · 3 comments

@akemisetti

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.


1. Issue or feature description

Clara v4.0 does not run on WSL2 + Ubuntu 20.04 + Docker; launching the container fails with the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/c78f7e8d06e54ac4efaf4d12915bbf305449899fd5a3f2a40126f16f26a8f54c/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

2. Steps to reproduce the issue

Launch the Docker container; it errors out.
I followed the steps to configure NVIDIA Docker on WSL described in https://docs.nvidia.com/cuda/wsl-user-guide/index.html, and the setup completed without errors.
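For reference, the failing launch can be sketched as below. The `command -v` guard and `|| true` are my additions so the snippet is safe to run anywhere; on the affected WSL2 host the `docker run` itself fails with the mount error quoted above.

```shell
# Reproduction sketch. REPRO_IMAGE is the image from this report.
# On the affected host, docker run fails with:
#   nvidia-container-cli: mount error: file creation failed:
#   .../merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists
REPRO_IMAGE=nvcr.io/nvidia/clara-train-sdk:v4.0
if command -v docker >/dev/null 2>&1; then
  docker run --rm --gpus all "$REPRO_IMAGE" nvidia-smi || true
fi
```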

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
    WARNING, the following logs are for debugging purposes only --

I1013 02:45:29.787970 606 nvc.c:372] initializing library context (version=1.5.1, build=4afad130c4c253abd3b2db563ffe9331594bda41)
I1013 02:45:29.787993 606 nvc.c:346] using root /
I1013 02:45:29.787995 606 nvc.c:347] using ldcache /etc/ld.so.cache
I1013 02:45:29.787997 606 nvc.c:348] using unprivileged user 1000:1000
I1013 02:45:29.788008 606 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1013 02:45:29.803470 606 dxcore.c:227] Creating a new WDDM Adapter for hAdapter:40000000 luid:2d610d
I1013 02:45:29.809296 606 dxcore.c:210] Core Nvidia component libcuda.so.1.1 not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.810241 606 dxcore.c:210] Core Nvidia component libcuda_loader.so not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.811035 606 dxcore.c:210] Core Nvidia component libnvidia-ptxjitcompiler.so.1 not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.811743 606 dxcore.c:210] Core Nvidia component libnvidia-ml.so.1 not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.812464 606 dxcore.c:210] Core Nvidia component libnvidia-ml_loader.so not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.813196 606 dxcore.c:210] Core Nvidia component nvidia-smi not found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
I1013 02:45:29.813246 606 dxcore.c:215] No Nvidia component found in /usr/lib/wsl/drivers/iigd_dch.inf_amd64_e6610765cda2bce8
E1013 02:45:29.813249 606 dxcore.c:261] Failed to query the core Nvidia libraries for the adapter. Skipping it.
I1013 02:45:29.813252 606 dxcore.c:227] Creating a new WDDM Adapter for hAdapter:40000040 luid:2e596b
I1013 02:45:29.820258 606 dxcore.c:268] Adding new adapter via dxcore hAdapter:40000040 luid:2e596b wddm version:3000
I1013 02:45:29.820294 606 dxcore.c:326] dxcore layer initialized successfully
W1013 02:45:29.820582 606 nvc.c:397] skipping kernel modules load on WSL
I1013 02:45:29.820755 607 driver.c:101] starting driver service
I1013 02:45:29.865297 606 nvc_info.c:758] requesting driver information with ''
I1013 02:45:29.958605 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-opticalflow.so.1
I1013 02:45:29.959908 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-ml.so.1
I1013 02:45:29.961220 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvidia-encode.so.1
I1013 02:45:29.962268 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libnvcuvid.so.1
I1013 02:45:29.962374 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libdxcore.so
I1013 02:45:29.962416 606 nvc_info.c:197] selecting /usr/lib/wsl/lib/libcuda.so.1
W1013 02:45:29.962482 606 nvc_info.c:397] missing library libnvidia-cfg.so
W1013 02:45:29.962502 606 nvc_info.c:397] missing library libnvidia-nscq.so
W1013 02:45:29.962506 606 nvc_info.c:397] missing library libnvidia-opencl.so
W1013 02:45:29.962508 606 nvc_info.c:397] missing library libnvidia-ptxjitcompiler.so
W1013 02:45:29.962510 606 nvc_info.c:397] missing library libnvidia-fatbinaryloader.so
W1013 02:45:29.962512 606 nvc_info.c:397] missing library libnvidia-allocator.so
W1013 02:45:29.962514 606 nvc_info.c:397] missing library libnvidia-compiler.so
W1013 02:45:29.962515 606 nvc_info.c:397] missing library libnvidia-ngx.so
W1013 02:45:29.962517 606 nvc_info.c:397] missing library libvdpau_nvidia.so
W1013 02:45:29.962519 606 nvc_info.c:397] missing library libnvidia-eglcore.so
W1013 02:45:29.962521 606 nvc_info.c:397] missing library libnvidia-glcore.so
W1013 02:45:29.962523 606 nvc_info.c:397] missing library libnvidia-tls.so
W1013 02:45:29.962525 606 nvc_info.c:397] missing library libnvidia-glsi.so
W1013 02:45:29.962527 606 nvc_info.c:397] missing library libnvidia-fbc.so
W1013 02:45:29.962528 606 nvc_info.c:397] missing library libnvidia-ifr.so
W1013 02:45:29.962530 606 nvc_info.c:397] missing library libnvidia-rtcore.so
W1013 02:45:29.962532 606 nvc_info.c:397] missing library libnvoptix.so
W1013 02:45:29.962534 606 nvc_info.c:397] missing library libGLX_nvidia.so
W1013 02:45:29.962536 606 nvc_info.c:397] missing library libEGL_nvidia.so
W1013 02:45:29.962538 606 nvc_info.c:397] missing library libGLESv2_nvidia.so
W1013 02:45:29.962553 606 nvc_info.c:397] missing library libGLESv1_CM_nvidia.so
W1013 02:45:29.962557 606 nvc_info.c:397] missing library libnvidia-glvkspirv.so
W1013 02:45:29.962559 606 nvc_info.c:397] missing library libnvidia-cbl.so
W1013 02:45:29.962578 606 nvc_info.c:401] missing compat32 library libnvidia-ml.so
W1013 02:45:29.962594 606 nvc_info.c:401] missing compat32 library libnvidia-cfg.so
W1013 02:45:29.962601 606 nvc_info.c:401] missing compat32 library libnvidia-nscq.so
W1013 02:45:29.962616 606 nvc_info.c:401] missing compat32 library libcuda.so
W1013 02:45:29.962634 606 nvc_info.c:401] missing compat32 library libnvidia-opencl.so
W1013 02:45:29.962639 606 nvc_info.c:401] missing compat32 library libnvidia-ptxjitcompiler.so
W1013 02:45:29.962642 606 nvc_info.c:401] missing compat32 library libnvidia-fatbinaryloader.so
W1013 02:45:29.962658 606 nvc_info.c:401] missing compat32 library libnvidia-allocator.so
W1013 02:45:29.962679 606 nvc_info.c:401] missing compat32 library libnvidia-compiler.so
W1013 02:45:29.962684 606 nvc_info.c:401] missing compat32 library libnvidia-ngx.so
W1013 02:45:29.962687 606 nvc_info.c:401] missing compat32 library libvdpau_nvidia.so
W1013 02:45:29.962690 606 nvc_info.c:401] missing compat32 library libnvidia-encode.so
W1013 02:45:29.962693 606 nvc_info.c:401] missing compat32 library libnvidia-opticalflow.so
W1013 02:45:29.962698 606 nvc_info.c:401] missing compat32 library libnvcuvid.so
W1013 02:45:29.962700 606 nvc_info.c:401] missing compat32 library libnvidia-eglcore.so
W1013 02:45:29.962702 606 nvc_info.c:401] missing compat32 library libnvidia-glcore.so
W1013 02:45:29.962704 606 nvc_info.c:401] missing compat32 library libnvidia-tls.so
W1013 02:45:29.962706 606 nvc_info.c:401] missing compat32 library libnvidia-glsi.so
W1013 02:45:29.962708 606 nvc_info.c:401] missing compat32 library libnvidia-fbc.so
W1013 02:45:29.962710 606 nvc_info.c:401] missing compat32 library libnvidia-ifr.so
W1013 02:45:29.962712 606 nvc_info.c:401] missing compat32 library libnvidia-rtcore.so
W1013 02:45:29.962714 606 nvc_info.c:401] missing compat32 library libnvoptix.so
W1013 02:45:29.962729 606 nvc_info.c:401] missing compat32 library libGLX_nvidia.so
W1013 02:45:29.962733 606 nvc_info.c:401] missing compat32 library libEGL_nvidia.so
W1013 02:45:29.962735 606 nvc_info.c:401] missing compat32 library libGLESv2_nvidia.so
W1013 02:45:29.962737 606 nvc_info.c:401] missing compat32 library libGLESv1_CM_nvidia.so
W1013 02:45:29.962739 606 nvc_info.c:401] missing compat32 library libnvidia-glvkspirv.so
W1013 02:45:29.962741 606 nvc_info.c:401] missing compat32 library libnvidia-cbl.so
W1013 02:45:29.962744 606 nvc_info.c:401] missing compat32 library libdxcore.so
I1013 02:45:29.964192 606 nvc_info.c:277] selecting /usr/lib/wsl/drivers/nvdmwi.inf_amd64_53b6a0a2497c9235/nvidia-smi
W1013 02:45:30.230696 606 nvc_info.c:423] missing binary nvidia-debugdump
W1013 02:45:30.230729 606 nvc_info.c:423] missing binary nvidia-persistenced
W1013 02:45:30.230733 606 nvc_info.c:423] missing binary nv-fabricmanager
W1013 02:45:30.230735 606 nvc_info.c:423] missing binary nvidia-cuda-mps-control
W1013 02:45:30.230737 606 nvc_info.c:423] missing binary nvidia-cuda-mps-server
I1013 02:45:30.230755 606 nvc_info.c:437] skipping path lookup for dxcore
I1013 02:45:30.230764 606 nvc_info.c:520] listing device /dev/dxg
W1013 02:45:30.230793 606 nvc_info.c:347] missing ipc path /var/run/nvidia-persistenced/socket
W1013 02:45:30.230804 606 nvc_info.c:347] missing ipc path /var/run/nvidia-fabricmanager/socket
W1013 02:45:30.230832 606 nvc_info.c:347] missing ipc path /tmp/nvidia-mps
I1013 02:45:30.230851 606 nvc_info.c:814] requesting device information with ''
I1013 02:45:30.243786 606 nvc_info.c:686] listing dxcore adapter 0 (GPU-db227b23-d556-36af-b5b5-1de7cb718915 at 00000000:01:00.0)
NVRM version: 510.06
CUDA version: 11.6

Device Index: 0
Device Minor: 0
Model: Quadro RTX 3000
Brand: Unknown
GPU UUID: GPU-db227b23-d556-36af-b5b5-1de7cb718915
Bus Location: 00000000:01:00.0
Architecture: 7.5
I1013 02:45:30.243858 606 nvc.c:423] shutting down library context
I1013 02:45:30.245250 607 driver.c:163] terminating driver service
I1013 02:45:30.247501 606 driver.c:203] driver service terminated successfully

  • Kernel version from uname -a
    Linux akemisetti 5.10.60.1-microsoft-standard-WSL2 #1 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Any relevant kernel output lines from dmesg
  • Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Tue Oct 12 19:40:40 2021
Driver Version : 510.06
CUDA Version : 11.6

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : Quadro RTX 3000
Product Brand : Quadro RTX
Product Architecture : Turing
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-db227b23-d556-36af-b5b5-1de7cb718915
Minor Number : N/A
VBIOS Version : 90.06.39.00.6f
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1F3610DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x09261028
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 27000 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Used : 790 MiB
Free : 5354 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : N/A
Memory : N/A
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
Max Power Limit : N/A
Clocks
Graphics : 48 MHz
SM : 48 MHz
Memory : 130 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : Unknown Error
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None

===================================

  • Docker version from docker version

Client:
Version: 20.10.7
API version: 1.41
Go version: go1.13.8
Git commit: 20.10.7-0ubuntu1~20.04.2
Built: Fri Oct 1 14:07:06 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:52:31 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-============-============-=====================================================
un libgldispatch0-nvidia (no description available)
ii libnvidia-container-tools 1.5.1-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.5.1-1 amd64 NVIDIA container runtime library
ii nvidia-container-runtime 3.5.0-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook (no description available)
ii nvidia-container-toolkit 1.5.1-1 amd64 NVIDIA container runtime hook
un nvidia-docker (no description available)
ii nvidia-docker2 2.6.0-1 all nvidia-docker CLI wrapper

  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.5.1
    build date: 2021-09-20T14:30+00:00
    build revision: 4afad130c4c253abd3b2db563ffe9331594bda41
    build compiler: x86_64-linux-gnu-gcc-7 7.5.0
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used
    nvcr.io/nvidia/clara-train-sdk:v4.0

@elezar
Member

elezar commented Oct 13, 2021

This seems to be a duplicate of #289. I will check the nvcr.io/nvidia/clara-train-sdk:v4.0 image to see whether it already contains the files as discussed there.

@akemisetti
Author

akemisetti commented Oct 14, 2021

@elezar Thanks for pointing to the existing issue.

The solution suggested in #289 worked for me. Copying it here.

1. docker run --privileged the image
2. then umount and rm to get rid of the libnvidia* and libcuda* files
3. then docker commit to save a new image
4. when I run this new image with the --gpus all --runtime=nvidia options, it no longer gives the error
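In shell form, the steps above are roughly the following. This is a sketch under assumptions: the throwaway container name `clara-fix` and the `-wsl2-fix` tag are mine, and the guard plus `|| true` make the snippet a no-op where Docker (or the image) is unavailable.

```shell
# Workaround sketch (from #289): delete the stale driver stubs baked into
# the image that collide with the files the NVIDIA runtime hook mounts.
IMAGE=nvcr.io/nvidia/clara-train-sdk:v4.0
FIXED_IMAGE=clara-train-sdk:v4.0-wsl2-fix
if command -v docker >/dev/null 2>&1; then
  # 1. Start the image privileged and remove the conflicting libraries.
  docker run --privileged --name clara-fix "$IMAGE" \
    sh -c 'rm -f /usr/lib/x86_64-linux-gnu/libnvidia* /usr/lib/x86_64-linux-gnu/libcuda*' || true
  # 2. Save the cleaned container as a new image, then drop the container.
  docker commit clara-fix "$FIXED_IMAGE" || true
  docker rm clara-fix || true
  # 3. Run the new image with GPU support; the mount error should be gone.
  docker run --rm --gpus all --runtime=nvidia "$FIXED_IMAGE" nvidia-smi || true
fi
```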

@Opdoop

Opdoop commented Jul 24, 2022

To build a new image, more specifically:

FROM <the image you care about>

RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia* /usr/lib/x86_64-linux-gnu/libcuda*

Then run docker run -it --gpus all [the new image:tag]; the container uses the GPU successfully.
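Putting the two-line Dockerfile and the run command together, a build-and-run sketch might look like this. The directory name and the `clara-train-sdk:v4.0-fixed` tag are illustrative assumptions, and the guard plus `|| true` keep the snippet harmless where Docker is unavailable.

```shell
# Write the two-line Dockerfile from the comment above.
mkdir -p clara-fix-build
cat > clara-fix-build/Dockerfile <<'EOF'
FROM nvcr.io/nvidia/clara-train-sdk:v4.0
RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia* /usr/lib/x86_64-linux-gnu/libcuda*
EOF
if command -v docker >/dev/null 2>&1; then
  # Build the cleaned image and run it with GPU support.
  docker build -t clara-train-sdk:v4.0-fixed clara-fix-build || true
  docker run --rm --gpus all clara-train-sdk:v4.0-fixed nvidia-smi || true
fi
```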
