This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Nvidia-docker issues with jupyter-notebooks #1042

Closed
7 tasks
kapara-jpg opened this issue Aug 8, 2019 · 1 comment

Comments

kapara-jpg commented Aug 8, 2019

1. Issue or feature description

When trying to deploy jupyter-notebook with jupyterhub I get this error:

2019-08-08 10:02:17+00:00 [Warning] Error: failed to start container "notebook": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=no-gpu-has-1MiB-to-run --compute --utility --require=cuda>=10.1 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 --pid=13507 /var/lib/docker/overlay2/7e1caede1d313bdd0a23dbc1c841b130067c512115c9d2a15af64263d8c12c1e/merged]\\\\nnvidia-container-cli: device error: unknown device id: no-gpu-has-1MiB-to-run\\\\n\\\"\"": unknown

I'm using the gpushare-device-plugin by Aliyun (Alibaba Cloud) Container Service (link).

This issue happens only when deploying the notebook through Kubernetes (k8s).
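
The error above also carries a `--require=` string, so it is worth ruling out a driver or CUDA mismatch before looking at the device id. The following is a minimal sketch, not the actual libnvidia-container logic. It assumes the documented convention for `NVIDIA_REQUIRE_*` values (space-separated constraints are ORed, comma-separated constraints are ANDed) and compares versions as plain integer tuples. The helper names `term_holds` and `requirement_holds` are hypothetical; the host values are taken from the `nvidia-smi` output in this report.

```python
import re
from typing import Dict, Tuple

# Requirement string copied from the error message in this report.
REQUIRE = ("cuda>=10.1 brand=tesla,driver>=384,driver<385 "
           "brand=tesla,driver>=396,driver<397 "
           "brand=tesla,driver>=410,driver<411")

# Host facts copied from the nvidia-smi output in this report.
HOST = {"cuda": "10.1", "driver": "418.67", "brand": "Quadro"}

_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "=": lambda a, b: a == b,
}
_TERM = re.compile(r"^(\w+)(>=|<=|!=|>|<|=)(.+)$")


def _version(text: str) -> Tuple[int, ...]:
    """Turn '418.67' into (418, 67) for a simple tuple comparison (assumption)."""
    return tuple(int(part) for part in text.split("."))


def term_holds(term: str, host: Dict[str, str]) -> bool:
    """Evaluate one constraint such as 'driver>=410' against the host facts."""
    key, op, wanted = _TERM.match(term).groups()
    have = host[key]
    if key in ("cuda", "driver"):
        return _OPS[op](_version(have), _version(wanted))
    # Non-version keys such as 'brand' are compared case-insensitively.
    return _OPS[op](have.lower(), wanted.lower())


def requirement_holds(require: str, host: Dict[str, str]) -> bool:
    """Space-separated groups are ORed; comma-separated terms are ANDed."""
    return any(
        all(term_holds(term, host) for term in group.split(","))
        for group in require.split()
    )


print(requirement_holds(REQUIRE, HOST))  # True: the first group, cuda>=10.1, holds
```

Under these assumptions the requirement is satisfied by the `cuda>=10.1` group alone, which suggests the failure is not caused by the driver or CUDA version.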

2. Steps to reproduce the issue

This happens every time I try to create a new notebook.

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I0808 10:14:55.825632 21204 nvc.c:281] initializing library context (version=1.0.2, build=ff40da533db929bf515aca59ba4c701a65a35e6b)
I0808 10:14:55.825762 21204 nvc.c:255] using root /
I0808 10:14:55.825781 21204 nvc.c:256] using ldcache /etc/ld.so.cache
I0808 10:14:55.825810 21204 nvc.c:257] using unprivileged user 65534:65534
I0808 10:14:55.828359 21205 nvc.c:191] loading kernel module nvidia
I0808 10:14:55.828999 21205 nvc.c:203] loading kernel module nvidia_uvm
I0808 10:14:55.829430 21205 nvc.c:211] loading kernel module nvidia_modeset
I0808 10:14:55.830193 21206 driver.c:133] starting driver service
I0808 10:14:55.854227 21204 nvc_info.c:434] requesting driver information with ''
I0808 10:14:55.854658 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.418.67
I0808 10:14:55.854777 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.67 over /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.418.67
I0808 10:14:55.854873 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.67
I0808 10:14:55.855004 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.67
I0808 10:14:55.855135 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.67
I0808 10:14:55.855224 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.67
I0808 10:14:55.855360 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.67
I0808 10:14:55.855487 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.67
I0808 10:14:55.855577 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.67
I0808 10:14:55.855667 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.67
I0808 10:14:55.855791 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.67
I0808 10:14:55.855881 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.67
I0808 10:14:55.856004 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.67
I0808 10:14:55.856189 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.67
I0808 10:14:55.856287 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.67
I0808 10:14:55.856420 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67
I0808 10:14:55.856717 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.418.67
I0808 10:14:55.856902 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.67
I0808 10:14:55.857001 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.67
I0808 10:14:55.857093 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.67
I0808 10:14:55.857180 21204 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.67
W0808 10:14:55.857241 21204 nvc_info.c:299] missing library libvdpau_nvidia.so
W0808 10:14:55.857260 21204 nvc_info.c:303] missing compat32 library libnvidia-ml.so
W0808 10:14:55.857282 21204 nvc_info.c:303] missing compat32 library libnvidia-cfg.so
W0808 10:14:55.857301 21204 nvc_info.c:303] missing compat32 library libcuda.so
W0808 10:14:55.857323 21204 nvc_info.c:303] missing compat32 library libnvidia-opencl.so
W0808 10:14:55.857338 21204 nvc_info.c:303] missing compat32 library libnvidia-ptxjitcompiler.so
W0808 10:14:55.857357 21204 nvc_info.c:303] missing compat32 library libnvidia-fatbinaryloader.so
W0808 10:14:55.857378 21204 nvc_info.c:303] missing compat32 library libnvidia-compiler.so
W0808 10:14:55.857394 21204 nvc_info.c:303] missing compat32 library libvdpau_nvidia.so
W0808 10:14:55.857416 21204 nvc_info.c:303] missing compat32 library libnvidia-encode.so
W0808 10:14:55.857435 21204 nvc_info.c:303] missing compat32 library libnvidia-opticalflow.so
W0808 10:14:55.857457 21204 nvc_info.c:303] missing compat32 library libnvcuvid.so
W0808 10:14:55.857471 21204 nvc_info.c:303] missing compat32 library libnvidia-eglcore.so
W0808 10:14:55.857488 21204 nvc_info.c:303] missing compat32 library libnvidia-glcore.so
W0808 10:14:55.857510 21204 nvc_info.c:303] missing compat32 library libnvidia-tls.so
W0808 10:14:55.857526 21204 nvc_info.c:303] missing compat32 library libnvidia-glsi.so
W0808 10:14:55.857547 21204 nvc_info.c:303] missing compat32 library libnvidia-fbc.so
W0808 10:14:55.857566 21204 nvc_info.c:303] missing compat32 library libnvidia-ifr.so
W0808 10:14:55.857588 21204 nvc_info.c:303] missing compat32 library libGLX_nvidia.so
W0808 10:14:55.857602 21204 nvc_info.c:303] missing compat32 library libEGL_nvidia.so
W0808 10:14:55.857619 21204 nvc_info.c:303] missing compat32 library libGLESv2_nvidia.so
W0808 10:14:55.857641 21204 nvc_info.c:303] missing compat32 library libGLESv1_CM_nvidia.so
I0808 10:14:55.858158 21204 nvc_info.c:229] selecting /usr/bin/nvidia-smi
I0808 10:14:55.858213 21204 nvc_info.c:229] selecting /usr/bin/nvidia-debugdump
I0808 10:14:55.858270 21204 nvc_info.c:229] selecting /usr/bin/nvidia-persistenced
I0808 10:14:55.858323 21204 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-control
I0808 10:14:55.858376 21204 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-server
I0808 10:14:55.858436 21204 nvc_info.c:366] listing device /dev/nvidiactl
I0808 10:14:55.858454 21204 nvc_info.c:366] listing device /dev/nvidia-uvm
I0808 10:14:55.858475 21204 nvc_info.c:366] listing device /dev/nvidia-uvm-tools
I0808 10:14:55.858495 21204 nvc_info.c:366] listing device /dev/nvidia-modeset
I0808 10:14:55.858569 21204 nvc_info.c:270] listing ipc /run/nvidia-persistenced/socket
W0808 10:14:55.858612 21204 nvc_info.c:274] missing ipc /tmp/nvidia-mps
I0808 10:14:55.858628 21204 nvc_info.c:490] requesting device information with ''
I0808 10:14:55.864650 21204 nvc_info.c:520] listing device /dev/nvidia0 (GPU-d5951a2f-baab-0503-82b1-920531e013bc at 00000000:02:00.0)
NVRM version:   418.67
CUDA version:   10.1

Device Index:   0
Device Minor:   0
Model:          Quadro M4000
Brand:          Quadro
GPU UUID:       GPU-d5951a2f-baab-0503-82b1-920531e013bc
Bus Location:   00000000:02:00.0
Architecture:   5.2
I0808 10:14:55.864729 21204 nvc.c:318] shutting down library context
I0808 10:14:55.865073 21206 driver.c:192] terminating driver service
I0808 10:14:55.873420 21204 driver.c:233] driver service terminated successfully
  • Kernel version from uname -a
    Linux gpu-server 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Driver information from nvidia-smi -a
    `==============NVSMI LOG==============

Timestamp : Thu Aug 8 10:16:40 2019
Driver Version : 418.67
CUDA Version : 10.1

Attached GPUs : 1
GPU 00000000:02:00.0
Product Name : Quadro M4000
Product Brand : Quadro
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0322316028845
GPU UUID : GPU-d5951a2f-baab-0503-82b1-920531e013bc
Minor Number : 0
VBIOS Version : 84.04.88.00.61
MultiGPU Board : No
Board ID : 0x200
GPU Part Number : N/A
Inforom Version
Image Version : G400.0501.01.03
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x13F110DE
Bus Id : 00000000:02:00.0
Sub System Id : 0x1153103C
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 46 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 8124 MiB
Used : 1 MiB
Free : 8123 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 4 MiB
Free : 252 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 3 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 37 C
GPU Shutdown Temp : 104 C
GPU Slowdown Temp : 99 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 20.11 W
Power Limit : 120.00 W
Default Power Limit : 120.00 W
Enforced Power Limit : 120.00 W
Min Power Limit : 10.00 W
Max Power Limit : 120.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 324 MHz
Video : 405 MHz
Applications Clocks
Graphics : 772 MHz
Memory : 3005 MHz
Default Applications Clocks
Graphics : 772 MHz
Memory : 3005 MHz
Max Clocks
Graphics : 772 MHz
SM : 772 MHz
Memory : 3005 MHz
Video : 710 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : On
Auto Boost Default : On
Processes : None`

  • Docker version from docker version
    `Client:
    Version: 18.09.8
    API version: 1.39
    Go version: go1.10.8
    Git commit: 0dd43dd87f
    Built: Wed Jul 17 17:40:56 2019
    OS/Arch: linux/amd64
    Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.8
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 0dd43dd
Built: Wed Jul 17 17:07:25 2019
OS/Arch: linux/amd64
Experimental: false
`

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                  Version         Architecture    Description
    +++-=====================-===============-===============-================================================
    un  libgldispatch0-nvidia <none>          <none>          (no description available)
    ii  libnvidia-cfg1-418:am 418.67-0ubuntu1 amd64           NVIDIA binary OpenGL/GLX configuration library
    un  libnvidia-cfg1-any    <none>          <none>          (no description available)
    un  libnvidia-common      <none>          <none>          (no description available)
    ii  libnvidia-common-418  418.67-0ubuntu1 all             Shared files used by the NVIDIA libraries
    ii  libnvidia-compute-418 418.67-0ubuntu1 amd64           NVIDIA libcompute package
    ii  libnvidia-container-t 1.0.2-1         amd64           NVIDIA container runtime library (command-line t
    ii  libnvidia-container1: 1.0.2-1         amd64           NVIDIA container runtime library
    un  libnvidia-decode      <none>          <none>          (no description available)
    ii  libnvidia-decode-418: 418.67-0ubuntu1 amd64           NVIDIA Video Decoding runtime libraries
    un  libnvidia-encode      <none>          <none>          (no description available)
    ii  libnvidia-encode-418: 418.67-0ubuntu1 amd64           NVENC Video Encoding runtime library
    un  libnvidia-fbc1        <none>          <none>          (no description available)
    ii  libnvidia-fbc1-418:am 418.67-0ubuntu1 amd64           NVIDIA OpenGL-based Framebuffer Capture runtime
    un  libnvidia-gl          <none>          <none>          (no description available)
    ii  libnvidia-gl-418:amd6 418.67-0ubuntu1 amd64           NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and V
    un  libnvidia-ifr1        <none>          <none>          (no description available)
    ii  libnvidia-ifr1-418:am 418.67-0ubuntu1 amd64           NVIDIA OpenGL-based Inband Frame Readback runtim
    un  nvidia-304            <none>          <none>          (no description available)
    un  nvidia-340            <none>          <none>          (no description available)
    un  nvidia-384            <none>          <none>          (no description available)
    un  nvidia-390            <none>          <none>          (no description available)
    ii  nvidia-compute-utils- 418.67-0ubuntu1 amd64           NVIDIA compute utilities
    ii  nvidia-container-runt 3.0.0-1         amd64           NVIDIA container runtime
    ii  nvidia-container-runt 1.4.0-1         amd64           NVIDIA container runtime hook
    ii  nvidia-dkms-418       418.67-0ubuntu1 amd64           NVIDIA DKMS package
    un  nvidia-dkms-kernel    <none>          <none>          (no description available)
    un  nvidia-docker         <none>          <none>          (no description available)
    ii  nvidia-docker2        2.1.0-1         all             nvidia-docker CLI wrapper
    ii  nvidia-driver-418     418.67-0ubuntu1 amd64           NVIDIA driver metapackage
    un  nvidia-driver-binary  <none>          <none>          (no description available)
    un  nvidia-kernel-common  <none>          <none>          (no description available)
    ii  nvidia-kernel-common- 418.67-0ubuntu1 amd64           Shared files used with the kernel module
    un  nvidia-kernel-source  <none>          <none>          (no description available)
    ii  nvidia-kernel-source- 418.67-0ubuntu1 amd64           NVIDIA kernel source package
    un  nvidia-legacy-340xx-v <none>          <none>          (no description available)
    ii  nvidia-modprobe       418.67-0ubuntu1 amd64           Load the NVIDIA kernel driver and create device
    un  nvidia-opencl-icd     <none>          <none>          (no description available)
    un  nvidia-persistenced   <none>          <none>          (no description available)
    ii  nvidia-prime          0.8.8.2         all             Tools to enable NVIDIA's Prime
    ii  nvidia-settings       418.67-0ubuntu1 amd64           Tool for configuring the NVIDIA graphics driver
    un  nvidia-settings-binar <none>          <none>          (no description available)
    un  nvidia-smi            <none>          <none>          (no description available)
    un  nvidia-utils          <none>          <none>          (no description available)
    ii  nvidia-utils-418      418.67-0ubuntu1 amd64           NVIDIA driver support binaries
    un  nvidia-vdpau-driver   <none>          <none>          (no description available)
    ii  xserver-xorg-video-nv 418.67-0ubuntu1 amd64           NVIDIA binary Xorg driver
    dpkg-query: no packages found matching *nvidia*rpm
    dpkg-query: no packages found matching -qa
  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.0.2
    build date: 2019-03-26T03:58+00:00
    build revision: ff40da533db929bf515aca59ba4c701a65a35e6b
    build compiler: x86_64-linux-gnu-gcc-7 7.3.0
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • Docker command, image and tag used
    `FROM nvidia/cuda
    ENV NVIDIA_VISIBLE_DEVICES all
    ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
    RUN apt update
    RUN apt install -y software-properties-common
    RUN add-apt-repository ppa:deadsnakes/ppa
    RUN apt install -y python3.7
    RUN apt-get -y install python3-pip
    RUN apt-get update
    RUN pip3 install jupyter cupy-cuda101

CMD jupyter notebook --ip=0.0.0.0 --allow-root

`

@RenaudWasTaken
Contributor

Hello!

This is an issue with the plugin, feel free to open an issue with them :)
It seems to try to isolate a nonexistent GPU: device=no-gpu-has-1MiB-to-run.

My guess is that you have an error when trying to use that plugin :)
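
To make the diagnosis above concrete, here is a small sketch that pulls the `--device=` value out of a prestart hook error and checks whether it looks like a GPU identifier at all. It assumes that a usable value is a GPU index, a GPU UUID, or `all`; the helper names `extract_devices` and `looks_like_gpu_id` are hypothetical and not part of any NVIDIA tool. The error text is an abbreviated excerpt of the message in this report.

```python
import re
from typing import List

# Assumed shapes of a usable device identifier: an index, a GPU UUID, or "all".
_INDEX = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^GPU-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE
)


def extract_devices(hook_error: str) -> List[str]:
    """Return the comma-separated values passed via --device= in the error text."""
    match = re.search(r"--device=(\S+)", hook_error)
    return match.group(1).split(",") if match else []


def looks_like_gpu_id(value: str) -> bool:
    """True when the value has one of the assumed identifier shapes."""
    return value == "all" or bool(_INDEX.match(value) or _UUID.match(value))


# Abbreviated excerpt of the hook error quoted in this issue.
error = (
    "exec command: [/usr/bin/nvidia-container-cli --load-kmods configure "
    "--ldconfig=@/sbin/ldconfig.real --device=no-gpu-has-1MiB-to-run "
    "--compute --utility"
)

for device in extract_devices(error):
    verdict = "ok" if looks_like_gpu_id(device) else "not a GPU identifier"
    print(f"{device}: {verdict}")
```

For the error in this report, the only extracted value is `no-gpu-has-1MiB-to-run`, which matches none of the assumed shapes, whereas the UUID reported by `nvidia-container-cli info` (`GPU-d5951a2f-baab-0503-82b1-920531e013bc`) would pass. That is consistent with the value having been supplied by the device plugin rather than by nvidia-docker.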
