Blas SGEMM launch failed when predicting with GPU #4

Open
VolkerH opened this issue Feb 12, 2020 · 5 comments

Comments

@VolkerH

VolkerH commented Feb 12, 2020

Hi,
I can successfully run prediction on the test dataset with the CPU build of TensorFlow, although it is very slow.
When I try running with tensorflow-gpu=1.9, I get the following error:

/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2020-02-12 14:37:21.233352: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/vhil0002/Github/TRAILMAP/data/model-weights/trailmap_model.hdf5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2020-02-12 14:37:21.377533: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-02-12 14:37:21.636131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:65:00.0
totalMemory: 10.73GiB freeMemory: 9.98GiB
2020-02-12 14:37:21.636165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-02-12 14:37:22.148132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-12 14:37:22.148167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2020-02-12 14:37:22.148173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2020-02-12 14:37:22.148399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9637 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5)
Name: 
[                                        ]   0%       ETA: Pending
2020-02-12 14:37:25.992729: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "segment_brain_batch.py", line 40, in <module>
    segment_brain(input_folder, output_folder, model)
  File "/home/vhil0002/Github/TRAILMAP/inference/segment_brain.py", line 127, in segment_brain
    section_seg = helper_segment_section(model, section_vol)
  File "/home/vhil0002/Github/TRAILMAP/inference/segment_brain.py", line 237, in helper_segment_section
    output = np.squeeze(model.predict(batch_input)[:, :, :, :, [0]])
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1478, in predict
    self, x, batch_size=batch_size, verbose=verbose, steps=steps)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 363, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 2897, in __call__
    fetched = self._callable_fn(*array_vals)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1454, in __call__
    self._session._session, self._handle, args, status, None)
  File "/home/vhil0002/anaconda3/envs/trailmap/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=699840, n=1, k=64
	 [[Node: conv3d_14/Conv3D = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](batch_normalization_13/batchnorm/add_1, conv3d_14/Conv3D/ReadVariableOp)]]

I googled some of the error messages, and some Stack Overflow posts suggested this may be caused by a lack of video memory. However, the 2080 Ti card should have the same amount of video memory as the 1080 Ti that you mention in the paper. Some video memory is used by the desktop, but not much.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:65:00.0  On |                  N/A |
| 41%   41C    P8    28W / 260W |    327MiB / 10986MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2278      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      2321      G   /usr/bin/gnome-shell                          17MiB |
|    0      6186      G   /usr/lib/xorg/Xorg                           181MiB |
|    0      6329      G   /usr/bin/gnome-shell                          98MiB |
+-----------------------------------------------------------------------------+
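For scale, the log says the Conv3D node conv3d_14 was executed as a single-precision GEMM with m=699840, n=1, k=64. A back-of-the-envelope sketch of how much memory that call's operands need, assuming they are just the three float32 matrices of C = A·B (A: m×k, B: k×n, C: m×n) and ignoring any workspace cuBLAS may allocate; `sgemm_operand_bytes` is a hypothetical helper:

```python
# Dimensions reported in the log: "Blas SGEMM launch failed : m=699840, n=1, k=64"
M, N, K = 699_840, 1, 64
FLOAT32_BYTES = 4  # SGEMM operates on single-precision (float32) values
MIB = 2**20


def sgemm_operand_bytes(m: int, n: int, k: int, itemsize: int = FLOAT32_BYTES):
    """Return the sizes in bytes of A (m x k), B (k x n) and C (m x n) for C = A @ B."""
    return m * k * itemsize, k * n * itemsize, m * n * itemsize


a_bytes, b_bytes, c_bytes = sgemm_operand_bytes(M, N, K)
total = a_bytes + b_bytes + c_bytes
print(f"A: {a_bytes / MIB:.1f} MiB, B: {b_bytes} bytes, C: {c_bytes / MIB:.2f} MiB")
print(f"total: {total / MIB:.1f} MiB")
```

This comes to roughly 174 MiB, under 2% of the 9637 MB that TensorFlow reserved on the card, which fits the observation that the desktop's small share of video memory is unlikely to be the problem.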

Do you have any suggestions?

@albert597
Owner

albert597 commented Feb 12, 2020

Could you try adding the config.gpu_options settings from the Stack Overflow post below?
https://stackoverflow.com/questions/38303974/tensorflow-running-error-with-cublas

One of the comments here (qqwweee/keras-yolo3#332 (comment)) mentions that he was able to run TensorFlow fine on his 1080 Ti, but the 2080 Ti gave the same error you received. It seems to have something to do with the CUDA version. What version of cuDNN are you using?

Edit: This may also be useful. Could you try adding those lines and checking your CUDA/cuDNN versions? tensorflow/tensorflow#25403 (comment)
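In the TensorFlow 1.x API this project uses (tensorflow-gpu 1.9/1.12), the config.gpu_options approach from those posts looks roughly like the sketch below. It assumes the model runs through tf.keras, as the traceback above suggests, and that this code executes before the model is built or loaded; the 0.9 memory fraction is only an example value.

```python
import tensorflow as tf  # TensorFlow 1.x API: ConfigProto and Session are not top-level in 2.x

# Build a session config that changes how TensorFlow claims GPU memory.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all up front
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # example cap: at most 90% of the card

# Make tf.keras use this session; do this before building or loading the model.
tf.keras.backend.set_session(tf.Session(config=config))
```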

@VolkerH
Author

VolkerH commented Feb 12, 2020

Hi,
thanks for your suggestions. It seems your Google search landed on slightly different Stack Overflow posts than mine.

I tried setting the TensorFlow config to allow growth and to use only 90% (a fraction of 0.9) of the available memory, but that did not help.

However, some of the comments seemed to indicate that a newer TensorFlow version is required (one that supports more recent CUDA/cuDNN versions). I use conda as my package manager and originally installed tensorflow=1.9 as per your README. After uninstalling tensorflow and installing

conda install tensorflow-gpu=1.12

the above error went away.
However, not all is well... I'm now running into some other problems. I will dig into these some more and open another issue once I have a better grip on what's happening.

@VolkerH
Author

VolkerH commented Feb 13, 2020

Sorted out the other issue (see #5) and can now confirm that upgrading tensorflow-gpu to 1.12 fixed the tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed error on the RTX 2080 Ti.
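Since the fix came down to the installed TensorFlow version, a script could check it up front. A minimal sketch: the (1, 12) threshold is only what worked on the RTX 2080 Ti in this thread, not an official requirement, and `parse_version`/`is_recent_enough` are hypothetical helpers. Versions are compared as integer tuples because plain string comparison orders them wrongly.

```python
import re

# Hypothetical threshold: the version that fixed the SGEMM error on an RTX 2080 Ti in this
# thread, not an official TensorFlow requirement.
MIN_TF_VERSION = (1, 12)


def parse_version(version: str) -> tuple:
    """Turn '1.12.0' or '1.12.0-rc1' into (1, 12, 0) so versions compare numerically."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def is_recent_enough(version: str, minimum: tuple = MIN_TF_VERSION) -> bool:
    """Compare only as many components as the minimum specifies."""
    return parse_version(version)[: len(minimum)] >= minimum


# String comparison is misleading: '9' > '1' at the third character.
print("1.9.0" > "1.12.0")          # True, even though 1.9 is older
print(is_recent_enough("1.9.0"))   # False
print(is_recent_enough("1.12.0"))  # True
```

In the project itself this would be called as `is_recent_enough(tf.__version__)` after `import tensorflow as tf`.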

@albert597
Owner

Thanks for figuring this out. I had only tested on tensorflow-gpu 1.9, which worked fine on our 1080 Ti. I will check whether tensorflow-gpu 1.12 works well with our 1080 Ti. If it does, I will update the README to 1.12.

@auesro

auesro commented May 15, 2020

I have successfully used an RTX 2080 Ti with tensorflow-gpu=2.1. You just need to correctly align the NVIDIA driver, the CUDA version, the cuDNN version and a supported TensorFlow version.
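One piece of that alignment can be checked mechanically: the "CUDA Version" field in the nvidia-smi header (10.0 for driver 410.104 in the output earlier in this thread) is the newest CUDA version that driver supports, so, at least for the CUDA 10.x releases discussed here, the toolkit a TensorFlow build links against should not exceed it. A minimal sketch, assuming you already know which CUDA toolkit version your TensorFlow build expects; `driver_max_cuda` and `toolkit_supported` are hypothetical helpers:

```python
import re


def driver_max_cuda(nvidia_smi_header: str) -> tuple:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header line as (X, Y)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_header)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported(toolkit: tuple, driver_max: tuple) -> bool:
    """True if the driver is new enough for the given CUDA toolkit version."""
    return toolkit <= driver_max


# Header line copied from the nvidia-smi output earlier in this thread.
header = "| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |"
driver_max = driver_max_cuda(header)
print(driver_max)                              # (10, 0)
print(toolkit_supported((10, 0), driver_max))  # True
print(toolkit_supported((10, 1), driver_max))  # False: would need a newer driver
```

With this driver, a TensorFlow build that links against CUDA 10.1 would therefore need a driver upgrade first.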
