
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available #464

Closed
lxzh opened this issue Sep 11, 2017 · 15 comments

Comments

@lxzh commented Sep 11, 2017

After installing Docker and nvidia-docker, I tried to run nvidia-smi through the NVIDIA Caffe image, but there is a warning:
The NVIDIA Driver was not detected. GPU functionality will not be available.
The full log:

nvidia-docker run --rm 1e07735bc788 nvidia-smi

==================
== NVIDIA Caffe ==
==================

NVIDIA Release 17.03 (build 12375)

Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

/usr/local/bin/nvidia_entrypoint.sh: line 31: exec: nvidia-smi: not found
@3XX0 (Member) commented Sep 11, 2017

Does it work with the nvidia/cuda image?
Did you modify the original 17.03 image? It might be similar to #457.
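
For reference, a quick sanity check with the stock CUDA image usually looks like this (a sketch; pick whatever image tag matches your setup):

# pull the plain CUDA base image and check that the driver is visible inside the container
docker pull nvidia/cuda
nvidia-docker run --rm nvidia/cuda nvidia-smi

If the driver is healthy, you should see the host's nvidia-smi table printed from inside the container.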

@lxzh (Author) commented Sep 11, 2017

I haven't modified the original 17.03 image.
It does not work with the nvidia/cuda image either:

nvidia-docker run --rm nvidia/cuda nvidia-smi
container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH"
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH".

@3XX0 (Member) commented Sep 12, 2017

What's the output of sudo journalctl -u nvidia-docker?

@lxzh (Author) commented Sep 12, 2017

My OS is Ubuntu 14.04.3, which does not support journalctl (journalctl: command not found).
This is the latest content of nvidia-docker.log, obtained with:
cat /var/log/upstart/nvidia-docker.log

/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:01 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:01 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Error: nvml: Unknown Error
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Error: nvml: Unknown Error
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Error: nvml: Unknown Error
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA unified memory
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Loading NVIDIA management library
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Discovering GPU devices
/usr/bin/nvidia-docker-plugin | 2017/09/12 10:52:02 Error: nvml: Unknown Error

@3XX0 (Member) commented Sep 12, 2017

That's not good. Is nvidia-smi working on the host? If not, your driver installation might have gone wrong.

Can you stop the daemon (sudo service nvidia-docker stop) and run it manually to get a debug trace as instructed here?
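
On an Upstart-based system such as Ubuntu 14.04, that roughly amounts to the following (a sketch; NV_DEBUG=1 turns on the plugin's debug output):

# stop the Upstart job, then run the plugin in the foreground with debug logging
sudo service nvidia-docker stop
sudo NV_DEBUG=1 nvidia-docker-plugin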

@lxzh (Author) commented Sep 12, 2017

The nvidia-smi command is working fine on my host.
There is something wrong with stopping the nvidia-docker service:

root@kirin:/etc/default# sudo service stop nvidia-docker
stop: unrecognized service
root@kirin:/etc/default# sudo service nvidia-docker stop
stop: Unknown instance: 
root@kirin:/etc/default# sudo service nvidia-docker restart
stop: Unknown instance: 
nvidia-docker start/post-stop, process 5198
root@kirin:/etc/default# sudo service nvidia-docker stop
stop: Unknown instance: 

The whole log is below (I'm sorry, I can't attach it as a file):

@lxzh (Author) commented Sep 12, 2017

nvidia-smi log:

root@kirin:/etc/default# nvidia-smi
Tue Sep 12 11:28:58 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  ERR!                Off  | 0000:4B:00.0      On |                  N/A |
| 23%   37C    P8    10W / 250W |     43MiB / 11170MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1440    G   /usr/bin/X                                      40MiB |
+-----------------------------------------------------------------------------+

@lxzh (Author) commented Sep 12, 2017

nvidia-smi -q log:
root@kirin:/etc/default# nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Tue Sep 12 11:30:23 2017
Driver Version : 375.26

Attached GPUs : 1
GPU 0000:4B:00.0
Product Name : Unknown Error
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0321117105335
GPU UUID : GPU-0cdc2da2-a7b7-737c-6551-4514fa7e0070
Minor Number : 0
VBIOS Version : 86.02.39.00.01
MultiGPU Board : No
Board ID : 0x4b00
GPU Part Number : 900-1G611-0050-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x4B
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 0000:4B:00.0
Sub System Id : 0x85E21043
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 23 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
Unknown : Not Active
FB Memory Usage
Total : 11170 MiB
Used : 43 MiB
Free : 11127 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 2 %
Encoder : 0 %
Decoder : 0 %
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 37 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
Power Readings
Power Management : Supported
Power Draw : 10.28 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1708 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1440
Type : G
Name : /usr/bin/X
Used GPU Memory : 40 MiB

@lxzh (Author) commented Sep 12, 2017

root@kirin:/etc/default# sudo nvidia-docker-plugin
nvidia-docker-plugin | 2017/09/12 11:31:15 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/09/12 11:31:15 Loading NVIDIA management library
nvidia-docker-plugin | 2017/09/12 11:31:15 Discovering GPU devices
nvidia-docker-plugin | 2017/09/12 11:31:15 Error: nvml: Unknown Error
root@kirin:/etc/default# dmesg | grep -i nvidia
[ 9.900668] nvidia: module license 'NVIDIA' taints kernel.
[ 9.903857] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 9.906932] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 9.906948] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016 (using threaded interrupts)
[ 9.910522] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 375.26 Thu Dec 8 18:04:14 PST 2016
[ 9.911811] [drm] [nvidia-drm] [GPU ID 0x00004b00] Loading driver
[ 9.982120] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 246
[ 10.445858] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:4b:00.1/sound/card1/input14
[ 10.445913] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.0/0000:4b:00.1/sound/card1/input15
[ 10.445956] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.0/0000:4b:00.1/sound/card1/input16
[ 10.446000] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.0/0000:4b:00.1/sound/card1/input17
[ 10.780878] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[ 10.904484] init: nvidia-docker main process (777) terminated with status 1
[ 10.904493] init: nvidia-docker main process ended, respawning
[ 11.488958] init: nvidia-prime main process (1183) terminated with status 127
[ 11.673292] init: nvidia-docker main process (1108) terminated with status 1
[ 11.673301] init: nvidia-docker main process ended, respawning
[ 12.439932] init: nvidia-docker main process (1395) terminated with status 1
[ 12.439941] init: nvidia-docker main process ended, respawning
[ 13.083939] nvidia-modeset: Allocated GPU:0 (GPU-0cdc2da2-a7b7-737c-6551-4514fa7e0070) @ PCI:0000:4b:00.0
[ 13.084377] init: nvidia-docker main process (1542) terminated with status 1
[ 13.084385] init: nvidia-docker main process ended, respawning
[ 13.097214] init: nvidia-docker main process (1556) terminated with status 1
[ 13.097221] init: nvidia-docker main process ended, respawning
[ 13.105832] init: nvidia-docker main process (1570) terminated with status 1
[ 13.105839] init: nvidia-docker respawning too fast, stopped
[ 2270.745125] init: nvidia-docker main process (5186) terminated with status 1
[ 2270.745148] init: nvidia-docker main process ended, respawning
[ 2270.762785] init: nvidia-docker main process (5200) terminated with status 1
[ 2270.762805] init: nvidia-docker main process ended, respawning
[ 2270.779454] init: nvidia-docker main process (5215) terminated with status 1
[ 2270.779473] init: nvidia-docker main process ended, respawning
[ 2270.797194] init: nvidia-docker main process (5229) terminated with status 1
[ 2270.797214] init: nvidia-docker main process ended, respawning
[ 2270.812970] init: nvidia-docker main process (5243) terminated with status 1
[ 2270.812990] init: nvidia-docker main process ended, respawning
[ 2270.829314] init: nvidia-docker main process (5258) terminated with status 1
[ 2270.829331] init: nvidia-docker respawning too fast, stopped

@lxzh (Author) commented Sep 12, 2017

root@kirin:/etc/default# NV_DEBUG=1 nvidia-docker-plugin
nvidia-docker-plugin | 2017/09/12 11:31:59 Loading NVIDIA unified memory
nvidia-docker-plugin | 2017/09/12 11:31:59 Loading NVIDIA management library
nvidia-docker-plugin | 2017/09/12 11:31:59 Discovering GPU devices
nvidia-docker-plugin | 2017/09/12 11:31:59 Error: nvml: Unknown Error
nvidia-docker-plugin | 2017/09/12 11:31:59 /go/src/github.com/NVIDIA/nvidia-docker/src/nvidia-docker-plugin/main.go:48 (0x404790)
/usr/local/go/src/runtime/asm_amd64.s:437 (0x468cce)
/usr/local/go/src/runtime/panic.go:423 (0x438b89)
/usr/local/go/src/log/log.go:334 (0x48bf41)
/go/src/github.com/NVIDIA/nvidia-docker/src/nvidia-docker-plugin/main.go:38 (0x40461a)
/go/src/github.com/NVIDIA/nvidia-docker/src/nvidia-docker-plugin/main.go:75 (0x404e0c)
/usr/local/go/src/runtime/proc.go:111 (0x43b070)
/usr/local/go/src/runtime/asm_amd64.s:1721 (0x46b021)
root@kirin:/etc/default# ltrace -f nvidia-docker-plugin
[pid 5450] __libc_start_main(0x46bf60, 1, 0x7fff690fe1d8, 0x752bd0 <unfinished ...>
[pid 5450] pthread_once(0xd003a8, 0x728db0, 0x7fff690fe1e8, 0 <unfinished ...>
[pid 5450] malloc(104) = 0x25ac010
[pid 5450] pthread_mutexattr_init(0x7fff690fe030, 0x25ac070, 0x25ac010, 0x7fb9adc83760) = 0
[pid 5450] pthread_mutexattr_settype(0x7fff690fe030, 1, 0x25ac010, 0x7fb9adc83760) = 0
[pid 5450] pthread_mutex_init(0xd008c0, 0x7fff690fe030, 0, 0x7fb9adc83760) = 0
[pid 5450] pthread_mutexattr_destroy(0x7fff690fe030, 0x7fff690fe030, 0, 0) = 0
[pid 5450] pthread_mutexattr_init(0x7fff690fe040, 0x7fff690fe030, 0, 0) = 0
[pid 5450] pthread_mutexattr_settype(0x7fff690fe040, 1, 0, 0) = 0
[pid 5450] pthread_mutex_init(0xd00900, 0x7fff690fe040, 0, 0) = 0
[pid 5450] pthread_mutexattr_destroy(0x7fff690fe040, 0x7fff690fe040, 0, 0) = 0
[pid 5450] __cxa_atexit(0x72a7f0, 0, 0, 0) = 0
[pid 5450] <... pthread_once resumed> ) = 0
[pid 5450] __cxa_atexit(0x72a830, 0, 0xcbbaa8, -1) = 0
[pid 5450] pthread_attr_init(0x7fff690fe080, 0x46a640, 0xbfebfbff, 0xcdbce0) = 0
[pid 5450] pthread_attr_getstacksize(0x7fff690fe080, 0x7fff690fe078, 0xbfebfbff, 0) = 0
[pid 5450] pthread_attr_destroy(0x7fff690fe080, 1, 0x800000, 0) = 0
[pid 5450] malloc(24) = 0x25ac080
[pid 5450] sigfillset(<31-32>) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fdec0, 0x7fff690fdf40, 0x7fb9adc83760) = 0
[pid 5450] pthread_attr_init(0x7fff690fde80, 0x7fff690fdec0, 0, 0) = 0
[pid 5450] pthread_attr_getstacksize(0x7fff690fde80, 0x7fff690fde78, 0, 0) = 0
[pid 5450] pthread_create(0x7fff690fde70, 0x7fff690fde80, 0x711c40, 0x25ac080) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fdf40, 0, -1 <unfinished ...>
[pid 5451] free(0x25ac080 <unfinished ...>
[pid 5450] <... pthread_sigmask resumed> ) = 0
[pid 5451] <... free resumed> ) =
[pid 5450] malloc(24) = 0x25ac080
[pid 5450] sigfillset(
<31-32>) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fddf0, 0x7fff690fde70, 0x7fb9adc83760) = 0
[pid 5450] pthread_attr_init(0x7fff690fddb0, 0x7fff690fddf0, 0, 0) = 0
[pid 5450] pthread_attr_getstacksize(0x7fff690fddb0, 0x7fff690fdda8, 0, 0) = 0
[pid 5450] pthread_create(0x7fff690fdda0, 0x7fff690fddb0, 0x711c40, 0x25ac080) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fde70, 0, -1 <unfinished ...>
[pid 5452] free(0x25ac080 <unfinished ...>
[pid 5450] <... pthread_sigmask resumed> ) = 0
[pid 5452] <... free resumed> ) =
[pid 5450] malloc(24) = 0x25ac080
[pid 5450] sigfillset(<31-32>) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fdca0, 0x7fff690fdd20, 0x7fb9adc83760) = 0
[pid 5450] pthread_attr_init(0x7fff690fdc60, 0x7fff690fdca0, 0, 0) = 0
[pid 5452] malloc(24 <unfinished ...>
[pid 5450] pthread_attr_getstacksize(0x7fff690fdc60, 0x7fff690fdc58, 0, 0) = 0
[pid 5450] pthread_create(0x7fff690fdc50, 0x7fff690fdc60, 0x711c40, 0x25ac080) = 0
[pid 5450] pthread_sigmask(2, 0x7fff690fdd20, 0, -1 <unfinished ...>
[pid 5453] free(0x25ac080 <unfinished ...>
[pid 5452] <... malloc resumed> ) = 0x7fb9a80008c0
[pid 5450] <... pthread_sigmask resumed> ) = 0
[pid 5453] <... free resumed> ) =
[pid 5452] sigfillset(
<31-32>) = 0
[pid 5452] pthread_sigmask(2, 0x7fb9ad082c30, 0x7fb9ad082cb0, 0x7fb9a8000020) = 0
[pid 5452] pthread_attr_init(0x7fb9ad082bf0, 0x7fb9ad082c30, 0, 0) = 0
[pid 5452] pthread_attr_getstacksize(0x7fb9ad082bf0, 0x7fb9ad082be8, 0, 0) = 0
[pid 5452] pthread_create(0x7fb9ad082be0, 0x7fb9ad082bf0, 0x711c40, 0x7fb9a80008c0 <unfinished ...>
[pid 5454] free(0x7fb9a80008c0 <unfinished ...>
[pid 5452] <... pthread_create resumed> ) = 0
[pid 5454] <... free resumed> ) =
[pid 5452] pthread_sigmask(2, 0x7fb9ad082cb0, 0, -1) = 0
[pid 5450] pthread_mutex_lock(0xcffe20, 0xcdbce0, 0xc820057f18, 0) = 0
[pid 5450] pthread_cond_broadcast(0xcffe60, 0, 0, 0) = 0
[pid 5450] pthread_mutex_unlock(0xcffe20, 1, 0, 0) = 0
nvidia-docker-plugin | 2017/09/12 11:32:36 Loading NVIDIA unified memory
[pid 5455] --- Called exec() ---
[pid 5455] __libc_start_main(0x4017a0, 3, 0x7ffc562295e8, 0x401290 <unfinished ...>
[pid 5455] __strdup(0x7ffc5622a76c, 0x7ffc562295e8, 0x7ffc562295e8, 0) = 0xded010
[pid 5455] free(0xded010) =
[pid 5455] __strdup(0x7ffc5622a76f, 0x7ffc562295e8, 0x7ffc562295e8, 0) = 0xded010
[pid 5455] __strtol_internal("0", 0x7ffc56229360, 0) = 0
[pid 5455] free(0xded010) =
[pid 5455] fopen("/proc/modules", "r") = 0xded090
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, -1) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 118) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 105) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 105) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 102) = 1

@lxzh (Author) commented Sep 12, 2017

[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 98) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 98) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 115) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 108) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 111) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 105) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 105) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 120) = 1
[pid 5455] fscanf(0xded090, 0x405a6c, 0x7ffc56229340, 115) = 1
[pid 5455] fclose(0xded090) = 0
[pid 5455] fopen("/proc/devices", "r") = 0xded090
[pid 5455] fgets("Character devices:\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] ferror(0xded090) = 0
[pid 5455] fgets(" 1 mem\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 1 mem\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 4 /dev/vc/0\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 4 /dev/vc/0\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 4 tty\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 4 tty\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 4 ttyS\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 4 ttyS\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 5 /dev/tty\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 5 /dev/tty\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 5 /dev/console\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 5 /dev/console\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 5 /dev/ptmx\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 5 /dev/ptmx\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 5 ttyprintk\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 5 ttyprintk\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 6 lp\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 6 lp\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 7 vcs\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 7 vcs\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 10 misc\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 10 misc\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 13 input\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 13 input\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 21 sg\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 21 sg\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 29 fb\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 29 fb\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 89 i2c\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 89 i2c\n", "nvidia-uvm") = nil
[pid 5455] fgets(" 99 ppdev\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr(" 99 ppdev\n", "nvidia-uvm") = nil
[pid 5455] fgets("108 ppp\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("108 ppp\n", "nvidia-uvm") = nil
[pid 5455] fgets("116 alsa\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("116 alsa\n", "nvidia-uvm") = nil
[pid 5455] fgets("128 ptm\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("128 ptm\n", "nvidia-uvm") = nil
[pid 5455] fgets("136 pts\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("136 pts\n", "nvidia-uvm") = nil
[pid 5455] fgets("180 usb\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("180 usb\n", "nvidia-uvm") = nil
[pid 5455] fgets("189 usb_device\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("189 usb_device\n", "nvidia-uvm") = nil
[pid 5455] fgets("195 nvidia-frontend\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("195 nvidia-frontend\n", "nvidia-uvm") = nil
[pid 5455] fgets("216 rfcomm\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("216 rfcomm\n", "nvidia-uvm") = nil
[pid 5455] fgets("226 drm\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("226 drm\n", "nvidia-uvm") = nil
[pid 5455] fgets("246 nvidia-uvm\n", 255, 0xded090) = 0x7ffc56229260
[pid 5455] strstr("246 nvidia-uvm\n", "nvidia-uvm") = "nvidia-uvm\n"
[pid 5455] sscanf(0x7ffc56229260, 0x405b3b, 0x7ffc5622936c, 0) = 1
[pid 5455] fclose(0xded090) = 0
[pid 5455] __xstat(1, "/dev/nvidia-uvm", 0x7ffc56229160) = 0
[pid 5455] __xstat(1, "/dev/nvidia-uvm-tools", 0x7ffc56229160) = 0
[pid 5455] +++ exited (status 0) +++
[pid 5450] --- SIGCHLD (Child exited) ---
nvidia-docker-plugin | 2017/09/12 11:32:36 Loading NVIDIA management library
[pid 5450] setenv("CUDA_DISABLE_UNIFIED_MEMORY", "1", 1) = 0
[pid 5450] setenv("CUDA_CACHE_DISABLE", "1", 1) = 0
[pid 5450] unsetenv("@\352\n \310") =
[pid 5450] dlopen("libnvidia-ml.so.1", 257) = 0x25ac650
[pid 5450] nvmlInit_v2(0x7fb9ae6dd968, 0x25ac5f8, 0x25ac650, 0) = 0
nvidia-docker-plugin | 2017/09/12 11:32:36 Discovering GPU devices
[pid 5450] nvmlDeviceGetCount_v2(0xc82007cf50, 0xcdbce0, 0xc820057b40, 0xc820057ba8) = 0
[pid 5450] nvmlDeviceGetHandleByIndex_v2(0, 0xc82008c048, 0xc82008c048, 0xc820057a98) = 0
[pid 5450] nvmlDeviceGetName(0x7fb9a7fe67a8, 0xc820078580, 64, 0xc820078580) = 999
[pid 5450] nvmlErrorString(999, 0xcdbce0, 0xc820057978, 0xc8200579e8) = 0x7fb9a7dbe074
nvidia-docker-plugin | 2017/09/12 11:32:36 Error: nvml: Unknown Error
[pid 5450] nvmlShutdown(0xc820057af0, 0xcdbce0, 0xc820057a98, 0xc820057af0) = 0
[pid 5450] dlclose(0x25ac650) = 0
[pid 5454] +++ exited (status 1) +++
[pid 5453] +++ exited (status 1) +++
[pid 5452] +++ exited (status 1) +++
[pid 5451] +++ exited (status 1) +++
[pid 5450] +++ exited (status 1) +++
root@kirin:/etc/default# nvidia-smi topo -m
GPU0 CPU Affinity
GPU0 X 0-11

Legend:

X = Self
SOC = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. QPI)
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks

@lxzh (Author) commented Sep 12, 2017

I'm sorry, I can't post much text at one time due to our company's information security policy.

@3XX0 (Member) commented Sep 12, 2017

It's clearly a driver issue, as shown by nvidia-smi (ERR!). Can you try upgrading to the latest stable driver (i.e. 384.XX)?
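
On Ubuntu 14.04 one possible upgrade path is via the graphics-drivers PPA (a sketch only; the nvidia-384 package name is an assumption about what the PPA provides, and a reboot is needed afterwards):

# add the PPA, install a newer driver package, then reboot so the new kernel module loads
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384
sudo reboot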

@lxzh (Author) commented Sep 12, 2017

Success after upgrading the driver to 384.69. Thank you very much.

nvidia-docker run --rm 1e07735bc788 nvidia-smi

==================
== NVIDIA Caffe ==
==================

NVIDIA Release 17.03 (build 12375)

Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

Tue Sep 12 06:25:42 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:4B:00.0  On |                  N/A |
| 23%   37C    P8    10W / 250W |     53MiB / 11171MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvidia-docker run --rm nvidia/cuda:latest nvidia-smi

Tue Sep 12 06:29:17 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:4B:00.0  On |                  N/A |
| 23%   37C    P8    10W / 250W |     53MiB / 11171MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvidia-docker run -it --name ljf_caffe -v /home:/home nvcr.io/nvidia/caffe:17.03 /bin/bash

==================
== NVIDIA Caffe ==
==================

NVIDIA Release 17.03 (build 12375)

Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

@lxzh closed this as completed on Sep 12, 2017
@paolorota commented:

I feel obliged to reopen this; with the latest upgrades I have the same problem on 5/5 machines.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use 'nvidia-docker run' to start this container; see https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker

So basically Docker does not see the drivers, and therefore TensorFlow no longer works.

I am using the following image: nvcr.io/nvidia/tensorflow:19.08-py3, and I have installed NVIDIA driver 390.116.

Docker version:
Docker version 19.03.1, build 74b1e89

Can anyone help? I tried purging the drivers and reinstalling a different version, but the result is the same. nvidia-docker is not found, so I run docker normally as described in the guide.
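
For reference, with Docker 19.03 the nvidia-docker wrapper is no longer required; GPU access is requested with the native --gpus flag (a sketch, assuming the NVIDIA Container Toolkit is installed):

# request all GPUs via Docker 19.03's built-in flag and check driver visibility in the container
docker run --rm --gpus all nvcr.io/nvidia/tensorflow:19.08-py3 nvidia-smi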
