Blas SGEMM launch failed when predicting with GPU #4

VolkerH · 2020-02-12T03:48:41Z

Hi,
I can successfully run prediction on the test dataset when using TensorFlow on the CPU, although it is very slow.
When I try running with tensorflow-gpu=1.9, I run into the following error:

/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2020-02-12 14:37:21.233352: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/vhil0002/Github/TRAILMAP/data/model-weights/trailmap_model.hdf5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2020-02-12 14:37:21.377533: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-02-12 14:37:21.636131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:65:00.0
totalMemory: 10.73GiB freeMemory: 9.98GiB
2020-02-12 14:37:21.636165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-02-12 14:37:22.148132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-12 14:37:22.148167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2020-02-12 14:37:22.148173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2020-02-12 14:37:22.148399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9637 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5)
Name: 
[                                        ]   0%       ETA: Pending        2020-02-12 14:37:25.992729: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "segment_brain_batch.py", line 40, in <module>
    segment_brain(input_folder, output_folder, model)
  File "/home/vhil0002/Github/TRAILMAP/inference/segment_brain.py", line 127, in segment_brain
    section_seg = helper_segment_section(model, section_vol)
  File "/home/vhil0002/Github/TRAILMAP/inference/segment_brain.py", line 237, in helper_segment_section
    output = np.squeeze(model.predict(batch_input)[:, :, :, :, [0]])
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1478, in predict
    self, x, batch_size=batch_size, verbose=verbose, steps=steps)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 363, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 2897, in __call__
    fetched = self._callable_fn(*array_vals)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=699840, n=1, k=64
	 [[Node: conv3d_14/Conv3D = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](batch_normalization_13/batchnorm/add_1, conv3d_14/Conv3D/ReadVariableOp)]]

I googled some of the error messages, and some Stack Overflow posts suggested this may be caused by a lack of video memory. However, the 2080 Ti card should have the same amount of video memory as the 1080 Ti that you mention in the paper. Some video memory is used by the desktop, but not much:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:65:00.0  On |                  N/A |
| 41%   41C    P8    28W / 260W |    327MiB / 10986MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2278      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      2321      G   /usr/bin/gnome-shell                          17MiB |
|    0      6186      G   /usr/lib/xorg/Xorg                           181MiB |
|    0      6329      G   /usr/bin/gnome-shell                          98MiB |
+-----------------------------------------------------------------------------+

Do you have any suggestions?


albert597 · 2020-02-12T04:07:16Z

Could you try adding the config.gpu_options settings from the Stack Overflow post below?
https://stackoverflow.com/questions/38303974/tensorflow-running-error-with-cublas

One of the comments here (qqwweee/keras-yolo3#332 (comment)) mentioned that he was able to run TensorFlow fine on his 1080 Ti, but the 2080 Ti gives the same error you received. It seems to have something to do with the CUDA version. Which version of cuDNN are you using?

Edit: This may also be useful. Could you try adding those lines and checking your cuda/cudnn versions? tensorflow/tensorflow#25403 (comment)

VolkerH · 2020-02-12T23:59:53Z

Hi,
thanks for your suggestions. It seems that your Google search landed on slightly different Stack Overflow posts than mine did.

I tried setting the TensorFlow config to allow growth and to use only a 0.9 fraction (90%) of the available memory, but that did not help.
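For reference, those two settings look roughly like the following in TensorFlow 1.x. This is a minimal sketch rather than the exact code from my script: the helper name `limit_gpu_memory` is made up for illustration, and it assumes the TF 1.x session API (`tf.ConfigProto`, `tf.Session`) and a model built with `tf.keras`.

```python
def limit_gpu_memory(fraction=0.9):
    """Install a Keras session with capped, on-demand GPU memory (TF 1.x only).

    `limit_gpu_memory` is a hypothetical helper name, not part of TRAILMAP.
    """
    # Imported here so the helper can be defined on machines without TensorFlow.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory as needed instead of grabbing it all up front.
    config.gpu_options.allow_growth = True
    # Upper bound on the share of each visible GPU's memory this process may use.
    config.gpu_options.per_process_gpu_memory_fraction = fraction

    session = tf.Session(config=config)
    # Make tf.keras use this session for subsequent model loading and predict calls.
    tf.keras.backend.set_session(session)
    return session
```

The call would have to happen before the model is loaded, e.g. `limit_gpu_memory(0.9)` near the top of the script.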

However, some of the comments seemed to indicate that a newer TensorFlow version is required (one that supports more recent CUDA/cuDNN versions). I am using conda as a package manager, and I originally installed tensorflow=1.9 as per your README. After uninstalling tensorflow and installing

conda install tensorflow-gpu=1.12

the above error went away.
However, not all is well... I'm now running into some other problems. I will dig into these some more and open another issue once I have a better grip on what's happening.

VolkerH · 2020-02-13T04:26:13Z

Sorted out the other issue (see #5) and can now confirm that upgrading tensorflow-gpu to 1.12 fixed the tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed error on the RTX 2080 Ti.

albert597 · 2020-02-13T21:14:37Z

Thanks for figuring this out. I had only tested with tensorflow-gpu 1.9, which worked fine on our 1080 Ti. I will check whether tensorflow-gpu 1.12 also works well with our 1080 Ti. If it does, I will update the README to specify 1.12.

auesro · 2020-05-15T14:35:32Z

I have successfully used an RTX 2080 Ti with tensorflow-gpu=2.1. You just need to correctly match the NVIDIA driver, CUDA version, cuDNN version and a TensorFlow version that supports them.
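As a starting point for checking that alignment, here is a small sketch that pulls the driver and CUDA versions out of the `nvidia-smi` banner. The function name `parse_smi_header` is made up for illustration, and the sample line is the banner posted earlier in this thread. Note that the "CUDA Version" reported by `nvidia-smi` is the highest CUDA version the driver supports, not necessarily the toolkit that TensorFlow was built against.

```python
import re


def parse_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None.

    `parse_smi_header` is a hypothetical helper name used only in this sketch.
    """
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    return match.groups() if match else None


# Banner line copied from the nvidia-smi output earlier in this thread.
header = "| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |"
print(parse_smi_header(header))  # ('410.104', '10.0')
```

On a live machine the text could come from `subprocess.check_output(["nvidia-smi"], universal_newlines=True)`. The resulting versions can then be compared against the tested build configurations listed in the TensorFlow install documentation.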
