RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid #11

najmehs · 2021-01-13T22:58:21Z

Hi,
Thanks for sharing the images and comprehensive instructions! They are very helpful!

When I use the image datamachines/cudnn_tensorflow_opencv:10.2_2.3.1_4.5.0-20201204, TensorFlow fails. For example, running the following two lines from Python:

    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())

results in this error:

```
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/test_util.py", line 131, in gpu_device_name
    for x in device_lib.list_local_devices():
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
```
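To tell this failure apart from a plain "no GPU" result in scripts, the `device_lib.list_local_devices()` call can be wrapped so the `RuntimeError` is reported instead of raised. This is a minimal sketch: `tf_gpu_report` is a hypothetical name, and `device_lib` is TensorFlow's internal `tensorflow.python.client` module (the same one used in the snippet above), not a stable public API.

```python
import importlib


def tf_gpu_report():
    """Hypothetical helper (not part of the container's test scripts):
    list devices like the snippet above, but report a CUDA initialization
    failure instead of crashing."""
    try:
        tf = importlib.import_module("tensorflow")
        device_lib = importlib.import_module("tensorflow.python.client.device_lib")
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "devices": [], "error": "tensorflow not installed"}

    try:
        devices = device_lib.list_local_devices()
    except RuntimeError as err:
        # In the failing image, this is where
        # "CUDA runtime implicit initialization on GPU:0 failed" surfaces.
        return {"tensorflow": tf.__version__, "devices": [], "error": str(err)}

    return {
        "tensorflow": tf.__version__,
        "devices": [(d.name, d.device_type) for d in devices],
        "error": None,
    }


if __name__ == "__main__":
    print(tf_gpu_report())
```

With the broken image, the `error` field would carry the message shown in the traceback above while the script keeps running.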


mmartial · 2021-02-17T22:40:16Z

I am able to reproduce this issue. It seems related to tensorflow/tensorflow#41990, but the workaround proposed there (tensorflow/tensorflow#41990 (comment)) did not solve the problem from within the container.

I am in the process of releasing updated versions (20210211), and this error no longer occurs with their later TF version.
The test/tf_hw.py script runs similar commands:

docker run --rm -it --gpus all -v `pwd`:/dmc datamachines/cudnn_tensorflow_opencv:10.2_2.4.1_3.4.13-20210211 python3 /dmc/test/tf_hw.py
2021-02-17 22:01:54.242594: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
*** Tensorflow version   :  2.4.1
*** Tensorflow Keras     :  2.4.0
*** TF Builf with cuda   :  True
*** TF compile flags     :  ['-I/usr/local/lib/python3.8/dist-packages/tensorflow/include', '-D_GLIBCXX_USE_CXX11_ABI=1']
*** TF include           :  /usr/local/lib/python3.8/dist-packages/tensorflow/include
*** TF lib               :  /usr/local/lib/python3.8/dist-packages/tensorflow
*** TF link flags        :  ['-L/usr/local/lib/python3.8/dist-packages/tensorflow', '-l:libtensorflow_framework.so.2']
*** Keras version        :  2.4.3
*** PyTorch version      :  1.7.1
*** pandas version       :  1.2.2
*** scikit-learn version :  0.24.1

(!! the following is build device specific, and here only to confirm hardware availability, ignore !!)
2021-02-17 22:01:55.537022: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-17 22:01:55.538548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-17 22:01:55.551334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:01:55.551894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-02-17 22:01:55.551911: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-02-17 22:01:55.552995: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-02-17 22:01:55.553019: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-02-17 22:01:55.554014: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-17 22:01:55.554186: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-17 22:01:55.555275: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-17 22:01:55.555782: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-02-17 22:01:55.557944: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-02-17 22:01:55.558043: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:01:55.558610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:01:55.559106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-17 22:01:55.559127: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-02-17 22:07:15.635601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-17 22:07:15.635630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-02-17 22:07:15.635640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-02-17 22:07:15.635835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.636365: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.636862: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.637338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 21897 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:08:00.0, compute capability: 8.6)
--- All seen hardware    :
 [name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9698607629912424656
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 22960798464
locality {
  bus_id: 1
  links {
  }
}
incarnation: 645222317647545331
physical_device_desc: "device: 0, name: GeForce RTX 3090, pci bus id: 0000:08:00.0, compute capability: 8.6"
]
2021-02-17 22:07:15.637877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.638993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-02-17 22:07:15.639030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-02-17 22:07:15.639079: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-02-17 22:07:15.639115: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-02-17 22:07:15.639143: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-17 22:07:15.639170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-17 22:07:15.639189: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-02-17 22:07:15.639206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-02-17 22:07:15.639223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-02-17 22:07:15.639326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.640391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-17 22:07:15.641351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
--- TF GPU Available     :
 [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

The only glitch is that the run stalls for about five minutes right after the "Successfully opened dynamic library libcudart.so.10.2" step (note the jump from 22:01:55 to 22:07:15 in the timestamps above), which seems to be a separate problem: tensorflow/tensor2tensor#1643.
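The length of that stall can be read directly from the log. A small sketch, assuming every log line of interest starts with TensorFlow's `YYYY-MM-DD HH:MM:SS.ffffff:` prefix as in the output above; `largest_gap` is a hypothetical helper and lines without a timestamp (device properties, the device list) are skipped:

```python
from datetime import datetime

# Two consecutive lines copied from the log above (any TF log text works).
LOG = """\
2021-02-17 22:01:55.559127: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-02-17 22:07:15.635601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # the first 26 characters of a TF log line


def largest_gap(log_text):
    """Return (seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines."""
    stamped = []
    for line in log_text.splitlines():
        try:
            ts = datetime.strptime(line[:26], TS_FORMAT)
        except ValueError:
            continue  # not a timestamped log line
        stamped.append((ts, line))

    best = (0.0, None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best


gap, before, after = largest_gap(LOG)
print(f"longest pause: {gap:.1f} s")  # prints: longest pause: 320.1 s
```

Feeding it the full console output points at the same pair of lines, which is how the stall can be pinned to the step after `libcudart.so.10.2` is opened.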

That script is also run at the end of each build to confirm the container "works". Note that the check passes on a timeout (as opposed to a crash).
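One way such a build-time check could treat a hang differently from a crash is sketched below. This is our reading of the note above, not the repository's actual build logic: `run_check` and the timeout values are hypothetical. A non-zero exit counts as a failure, while a command still running when the timeout expires is killed and counted as a pass.

```python
import subprocess
import sys


def run_check(cmd, timeout_s):
    """Hypothetical build-time check: 'pass' on a clean exit, 'pass (timeout)'
    if the command is still running after timeout_s seconds, 'fail' if it
    exits with an error."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child; a hang (like the multi-minute
        # stall above) is not treated as a crash.
        return "pass (timeout)"
    return "pass" if result.returncode == 0 else "fail"


if __name__ == "__main__":
    # Inside the container this might be, e.g.:
    #   run_check([sys.executable, "/dmc/test/tf_hw.py"], timeout_s=60)
    print(run_check([sys.executable, "-c", "print('ok')"], timeout_s=10))
```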

Note that the soon-to-be-released CUDA 11-based container (datamachines/cudnn_tensorflow_opencv:11.2.0_2.4.1_3.4.13-20210211) does not have this hang.

mmartial · 2021-02-17T23:41:55Z

It took less time than I expected: the latest containers are now available, so I am closing this issue.
From my testing, datamachines/cudnn_tensorflow_opencv:11.2.0_2.4.1_3.4.13-20210211 appears to be functional.
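The three image tags in this thread differ in their CUDA, TensorFlow, and OpenCV components. Assuming the tags follow a `<CUDA>_<TensorFlow>_<OpenCV>-<YYYYMMDD>` layout (inferred from the tags themselves and from the versions printed by test/tf_hw.py, not from any documentation), a small parser makes the comparison explicit: the working image keeps TensorFlow 2.4.1 and moves from CUDA 10.2 to 11.2.0. `parse_tag` and `ImageTag` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple

# Assumed tag layout: <CUDA>_<TensorFlow>_<OpenCV>-<YYYYMMDD>
TAG_RE = re.compile(
    r"^(?P<cuda>[\d.]+)_(?P<tf>[\d.]+)_(?P<opencv>[\d.]+)-(?P<date>\d{8})$"
)


class ImageTag(NamedTuple):
    cuda: Tuple[int, ...]
    tensorflow: Tuple[int, ...]
    opencv: Tuple[int, ...]
    build_date: str


def _version(text):
    """Turn '3.4.13' into (3, 4, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_tag(tag):
    """Split a cudnn_tensorflow_opencv tag into version tuples."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return ImageTag(_version(m["cuda"]), _version(m["tf"]),
                    _version(m["opencv"]), m["date"])


for tag in ["10.2_2.3.1_4.5.0-20201204",
            "10.2_2.4.1_3.4.13-20210211",
            "11.2.0_2.4.1_3.4.13-20210211"]:
    t = parse_tag(tag)
    print(tag, "-> CUDA", t.cuda, "TF", t.tensorflow, "built", t.build_date)
```

Comparing tuples rather than strings avoids lexical surprises such as "3.4.13" sorting before "3.4.2".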

mmartial closed this as completed Feb 17, 2021
