WSL2 container mount issue for Modulus: lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown #1699
It seems as if the …
Assuming that you're talking about removing the libnvidia-ml.so.1, libcuda.so.1, etc. files: I already tried the solution linked in the original post. Creating a new image which removes those files results in an image that isn't able to detect any CUDA-capable devices and as a result doesn't use the GPU. As a side note, I'm not sure if the issue is with how I'm running containers; I have no issues with the PyTorch container, which I load and run the same way.
@elezar It seems I'm not alone in finding that the proposed solution results in a container which isn't able to utilize the GPU. Several people on the developer forums experienced a similar issue with different images and attempted the same solution, only to end up with a container that was unable to use the GPU.
I also faced exactly the same problem (failed to run …). As mentioned in NVIDIA/nvidia-container-toolkit#289, … After further struggle I finally found a true workaround, using the following Dockerfile:

FROM modulus:22.09
RUN rm -rf \
/usr/lib/x86_64-linux-gnu/libcuda.so* \
/usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
/usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
/usr/lib/firmware \
  /usr/local/cuda/compat/lib

and the GPU is enabled now :)

$ sudo docker build -t modulus:22.09.0hotfix - <./Dockerfile
...
$ sudo docker run -it --rm --gpus all modulus:22.09.0hotfix python3
=============
== PyTorch ==
=============
NVIDIA Release 22.08 (build 42105213)
PyTorch Version 1.13.0a0+d321be6
Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

It seems that … Also, I think the layer that executes …
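As a side note on the workaround above: before rebuilding, it can be useful to check whether an image still carries the user-space driver libraries that collide with the WSL2-mounted ones. The following is just a sketch (not part of the original report); the pattern list simply mirrors the `rm -rf` paths in the Dockerfile above, and the `root` parameter is a hypothetical convenience for pointing the check at an unpacked image filesystem:

```python
import glob
import os

# Library patterns removed by the workaround above; if any of these match
# inside the image, the WSL2 driver mount may fail with "file exists".
CONFLICT_PATTERNS = [
    "/usr/lib/x86_64-linux-gnu/libcuda.so*",
    "/usr/lib/x86_64-linux-gnu/libnvcuvid.so*",
    "/usr/lib/x86_64-linux-gnu/libnvidia-*.so*",
]

def find_conflicts(root="/"):
    """Return paths matching the known conflict patterns under `root`."""
    hits = []
    for pattern in CONFLICT_PATTERNS:
        # Re-root each pattern so the check can also run against an
        # image filesystem unpacked somewhere other than /.
        hits.extend(glob.glob(os.path.join(root, pattern.lstrip("/"))))
    return sorted(hits)

if __name__ == "__main__":
    for path in find_conflicts():
        print(path)
```

If this prints nothing inside the rebuilt image, the problematic files are gone and the runtime should be able to mount its own copies.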
Thank you @atomicky! This fixed the problem for me, and thus I'm marking this issue as closed.
thank you !!!
1. Issue or feature description
The Docker container version of Modulus 22.09 doesn't run on WSL2 with Ubuntu 20.04; it yields the following error: lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown
Please note that I believe this is the same issue encountered in NVIDIA/nvidia-container-toolkit#289 and NVIDIA/nvidia-container-toolkit#287. As in each of those issues, running

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

works without any issues. I have attempted the suggested solution and created a new image which removes the problematic files. However, doing so results in a container that fails to detect any CUDA-capable devices, and any executed code fails to utilize the GPU. Running the container with the --runtime nvidia --gpus all flags results in the container running without error, but yields the same issue of being unable to utilize the GPU. This issue has been previously mentioned on the Modulus developer forums, and the response seems to suggest the problem is with nvidia-docker, not the container itself.

2. Steps to reproduce the issue

docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  --runtime nvidia -v ${PWD}/examples:/examples \
  -it --rm modulus:22.09 bash
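A framework-independent way to reproduce the "no CUDA-capable device" symptom inside the container is to probe the driver library directly. This is only a sketch added for illustration (not part of the original report); it assumes libcuda.so.1 is the library the runtime is expected to inject, and mirrors roughly what torch.cuda.is_available() checks under the hood:

```python
import ctypes

def cuda_driver_usable():
    """Return True only if libcuda.so.1 loads AND cuInit(0) succeeds.

    cuInit returns 0 (CUDA_SUCCESS) when a usable CUDA device is found;
    an OSError on load means the driver library was never injected into
    the container at all.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False
    return libcuda.cuInit(0) == 0

if __name__ == "__main__":
    print("CUDA driver usable:", cuda_driver_usable())
```

Running this inside the affected image distinguishes "library missing" (load fails) from "library present but initialization fails", which is useful when deciding whether removing the duplicate libraries went too far.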
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I1026 21:07:24.462934 4665 nvc.c:376] initializing library context (version=1.11.0, build=c8f267be0bac1c654d59ad4ea5df907141149977)
I1026 21:07:24.462974 4665 nvc.c:350] using root /
I1026 21:07:24.462990 4665 nvc.c:351] using ldcache /etc/ld.so.cache
I1026 21:07:24.462993 4665 nvc.c:352] using unprivileged user 1000:1000
I1026 21:07:24.463004 4665 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1026 21:07:24.481144 4665 dxcore.c:227] Creating a new WDDM Adapter for hAdapter:40000000 luid:9e89c3
I1026 21:07:24.489958 4665 dxcore.c:268] Adding new adapter via dxcore hAdapter:40000000 luid:9e89c3 wddm version:2700
I1026 21:07:24.489988 4665 dxcore.c:325] dxcore layer initialized successfully
W1026 21:07:24.490253 4665 nvc.c:405] skipping kernel modules load on WSL
I1026 21:07:24.490393 4666 rpc.c:71] starting driver rpc service
I1026 21:07:24.524009 4667 rpc.c:71] starting nvcgo rpc service
I1026 21:07:24.524588 4665 nvc_info.c:766] requesting driver information with ''
I1026 21:07:24.606205 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libnvidia-opticalflow.so.1
I1026 21:07:24.606880 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libnvidia-ml.so.1
I1026 21:07:24.607533 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libnvidia-encode.so.1
I1026 21:07:24.608182 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libnvcuvid.so.1
I1026 21:07:24.608257 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libdxcore.so
I1026 21:07:24.608292 4665 nvc_info.c:199] selecting /usr/lib/wsl/lib/libcuda.so.1
W1026 21:07:24.608345 4665 nvc_info.c:399] missing library libnvidia-cfg.so
W1026 21:07:24.608361 4665 nvc_info.c:399] missing library libnvidia-nscq.so
W1026 21:07:24.608364 4665 nvc_info.c:399] missing library libcudadebugger.so
W1026 21:07:24.608390 4665 nvc_info.c:399] missing library libnvidia-opencl.so
W1026 21:07:24.608393 4665 nvc_info.c:399] missing library libnvidia-ptxjitcompiler.so
W1026 21:07:24.608396 4665 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W1026 21:07:24.608397 4665 nvc_info.c:399] missing library libnvidia-allocator.so
W1026 21:07:24.608399 4665 nvc_info.c:399] missing library libnvidia-compiler.so
W1026 21:07:24.608401 4665 nvc_info.c:399] missing library libnvidia-pkcs11.so
W1026 21:07:24.608402 4665 nvc_info.c:399] missing library libnvidia-ngx.so
W1026 21:07:24.608404 4665 nvc_info.c:399] missing library libvdpau_nvidia.so
W1026 21:07:24.608406 4665 nvc_info.c:399] missing library libnvidia-eglcore.so
W1026 21:07:24.608420 4665 nvc_info.c:399] missing library libnvidia-glcore.so
W1026 21:07:24.608424 4665 nvc_info.c:399] missing library libnvidia-tls.so
W1026 21:07:24.608426 4665 nvc_info.c:399] missing library libnvidia-glsi.so
W1026 21:07:24.608428 4665 nvc_info.c:399] missing library libnvidia-fbc.so
W1026 21:07:24.608430 4665 nvc_info.c:399] missing library libnvidia-ifr.so
W1026 21:07:24.608432 4665 nvc_info.c:399] missing library libnvidia-rtcore.so
W1026 21:07:24.608433 4665 nvc_info.c:399] missing library libnvoptix.so
W1026 21:07:24.608435 4665 nvc_info.c:399] missing library libGLX_nvidia.so
W1026 21:07:24.608437 4665 nvc_info.c:399] missing library libEGL_nvidia.so
W1026 21:07:24.608438 4665 nvc_info.c:399] missing library libGLESv2_nvidia.so
W1026 21:07:24.608440 4665 nvc_info.c:399] missing library libGLESv1_CM_nvidia.so
W1026 21:07:24.608441 4665 nvc_info.c:399] missing library libnvidia-glvkspirv.so
W1026 21:07:24.608443 4665 nvc_info.c:399] missing library libnvidia-cbl.so
W1026 21:07:24.608445 4665 nvc_info.c:403] missing compat32 library libnvidia-ml.so
W1026 21:07:24.608447 4665 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W1026 21:07:24.608462 4665 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W1026 21:07:24.608465 4665 nvc_info.c:403] missing compat32 library libcuda.so
W1026 21:07:24.608467 4665 nvc_info.c:403] missing compat32 library libcudadebugger.so
W1026 21:07:24.608469 4665 nvc_info.c:403] missing compat32 library libnvidia-opencl.so
W1026 21:07:24.608471 4665 nvc_info.c:403] missing compat32 library libnvidia-ptxjitcompiler.so
W1026 21:07:24.608473 4665 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W1026 21:07:24.608474 4665 nvc_info.c:403] missing compat32 library libnvidia-allocator.so
W1026 21:07:24.608476 4665 nvc_info.c:403] missing compat32 library libnvidia-compiler.so
W1026 21:07:24.608478 4665 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W1026 21:07:24.608479 4665 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W1026 21:07:24.608481 4665 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W1026 21:07:24.608484 4665 nvc_info.c:403] missing compat32 library libnvidia-encode.so
W1026 21:07:24.608511 4665 nvc_info.c:403] missing compat32 library libnvidia-opticalflow.so
W1026 21:07:24.608514 4665 nvc_info.c:403] missing compat32 library libnvcuvid.so
W1026 21:07:24.608530 4665 nvc_info.c:403] missing compat32 library libnvidia-eglcore.so
W1026 21:07:24.608546 4665 nvc_info.c:403] missing compat32 library libnvidia-glcore.so
W1026 21:07:24.608548 4665 nvc_info.c:403] missing compat32 library libnvidia-tls.so
W1026 21:07:24.608564 4665 nvc_info.c:403] missing compat32 library libnvidia-glsi.so
W1026 21:07:24.608567 4665 nvc_info.c:403] missing compat32 library libnvidia-fbc.so
W1026 21:07:24.608569 4665 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W1026 21:07:24.608571 4665 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W1026 21:07:24.608573 4665 nvc_info.c:403] missing compat32 library libnvoptix.so
W1026 21:07:24.608575 4665 nvc_info.c:403] missing compat32 library libGLX_nvidia.so
W1026 21:07:24.608589 4665 nvc_info.c:403] missing compat32 library libEGL_nvidia.so
W1026 21:07:24.608591 4665 nvc_info.c:403] missing compat32 library libGLESv2_nvidia.so
W1026 21:07:24.608606 4665 nvc_info.c:403] missing compat32 library libGLESv1_CM_nvidia.so
W1026 21:07:24.608621 4665 nvc_info.c:403] missing compat32 library libnvidia-glvkspirv.so
W1026 21:07:24.608624 4665 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
W1026 21:07:24.608626 4665 nvc_info.c:403] missing compat32 library libdxcore.so
I1026 21:07:24.610022 4665 nvc_info.c:279] selecting /usr/lib/wsl/drivers/nv_dispui.inf_amd64_f2b06cc19dadc00f/nvidia-smi
W1026 21:07:25.541539 4665 nvc_info.c:425] missing binary nvidia-debugdump
W1026 21:07:25.541567 4665 nvc_info.c:425] missing binary nvidia-persistenced
W1026 21:07:25.541570 4665 nvc_info.c:425] missing binary nv-fabricmanager
W1026 21:07:25.541571 4665 nvc_info.c:425] missing binary nvidia-cuda-mps-control
W1026 21:07:25.541573 4665 nvc_info.c:425] missing binary nvidia-cuda-mps-server
I1026 21:07:25.541591 4665 nvc_info.c:441] skipping path lookup for dxcore
I1026 21:07:25.541598 4665 nvc_info.c:529] listing device /dev/dxg
W1026 21:07:25.541623 4665 nvc_info.c:349] missing ipc path /var/run/nvidia-persistenced/socket
W1026 21:07:25.541649 4665 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W1026 21:07:25.541669 4665 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I1026 21:07:25.541685 4665 nvc_info.c:822] requesting device information with ''
I1026 21:07:25.552505 4665 nvc_info.c:694] listing dxcore adapter 0 (GPU-dbbd71f6-7bf3-4280-3674-7d2f6ce7558e at 00000000:65:00.0)
NVRM version: 522.06
CUDA version: 11.8
Device Index: 0
Device Minor: 0
Model: Quadro RTX 5000
Brand: QuadroRTX
GPU UUID: GPU-dbbd71f6-7bf3-4280-3674-7d2f6ce7558e
Bus Location: 00000000:65:00.0
Architecture: 7.5
I1026 21:07:25.552580 4665 nvc.c:434] shutting down library context
I1026 21:07:25.552637 4667 rpc.c:95] terminating nvcgo rpc service
I1026 21:07:25.553008 4665 rpc.c:135] nvcgo rpc service terminated successfully
I1026 21:07:25.554398 4666 rpc.c:95] terminating driver rpc service
I1026 21:07:25.556557 4665 rpc.c:135] driver rpc service terminated successfully
uname -a
Linux dpr5820-009 5.10.102.1-microsoft-standard-WSL2
dmesg
nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Wed Oct 26 14:09:05 2022
Driver Version : 522.06
CUDA Version : 11.8
Attached GPUs : 1
GPU 00000000:65:00.0
Product Name : Quadro RTX 5000
Product Brand : Quadro RTX
Product Architecture : Turing
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : 1562621002428
GPU UUID : GPU-dbbd71f6-7bf3-4280-3674-7d2f6ce7558e
Minor Number : N/A
VBIOS Version : 90.04.99.00.03
MultiGPU Board : No
Board ID : 0x6500
GPU Part Number : 900-5G180-0100-032
Module ID : 0
Inforom Version
Image Version : G180.0500.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x65
Device : 0x00
Domain : 0x0000
Device Id : 0x1EB010DE
Bus Id : 00000000:65:00.0
Sub System Id : 0x129F1028
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 34000 KB/s
Rx Throughput : 60000 KB/s
Fan Speed : 49 %
Performance State : P2
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 16384 MiB
Reserved : 214 MiB
Used : 9679 MiB
Free : 6490 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 91 %
Memory : 70 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 74 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : 89 C
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 151.91 W
Power Limit : 230.00 W
Default Power Limit : 230.00 W
Enforced Power Limit : 230.00 W
Min Power Limit : 125.00 W
Max Power Limit : 230.00 W
Clocks
Graphics : 1844 MHz
SM : 1844 MHz
Memory : 6494 MHz
Video : 1711 MHz
Applications Clocks
Graphics : 1620 MHz
Memory : 7001 MHz
Default Applications Clocks
Graphics : 1620 MHz
Memory : 7001 MHz
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 711
Type : C
Name : /python3.8
Used GPU Memory : Not available in WDDM driver model
docker version
Client: Docker Engine - Community
Version: 20.10.20
API version: 1.41
Go version: go1.18.7
Git commit: 9fdeb9c
Built: Tue Oct 18 18:20:23 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.20
API version: 1.41 (minimum version 1.12)
Go version: go1.18.7
Git commit: 03df974
Built: Tue Oct 18 18:18:12 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
dpkg -l '*nvidia*'
or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-============-============-=====================================================
un libgldispatch0-nvidia (no description available)
ii libnvidia-container-tools 1.11.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.11.0-1 amd64 NVIDIA container runtime library
un nvidia-container-runtime (no description available)
un nvidia-container-runtime-hook (no description available)
ii nvidia-container-toolkit 1.11.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.11.0-1 amd64 NVIDIA Container Toolkit Base
un nvidia-docker (no description available)
ii nvidia-docker2 2.11.0-1 all nvidia-docker CLI wrapper
nvidia-container-cli -V
cli-version: 1.11.0
lib-version: 1.11.0
build date: 2022-09-06T09:21+00:00
build revision: c8f267be0bac1c654d59ad4ea5df907141149977
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
nvcr.io/nvidia/modulus/modulus:22.09