InternalError (see above for traceback): Blas SGEMM launch failed : m=802816, n=64, k=32 #224

Open
to-be-snail opened this issue Feb 12, 2019 · 23 comments

Comments

@to-be-snail

When I perform channel pruning on MobileNet with the ILSVRC-12 dataset, this error occurs. Pruning with the CIFAR-10 dataset, however, completes normally.

@jiaxiang-wu
Contributor

@to-be-snail
Author

Maybe something related to the GPU memory?
https://stackoverflow.com/questions/37337728/tensorflow-internalerror-blas-sgemm-launch-failed

My machine has a GTX2080 with 8 GB of GPU memory. I don't know if I can finish the pruning...

@jiaxiang-wu
Contributor

jiaxiang-wu commented Feb 12, 2019

Could you try solutions provided in the above stack-overflow link, and see if anything helps?

@to-be-snail
Author

I'm sure I am running only one TensorFlow program at a time, and I have reinstalled tensorflow-gpu. It didn't work.

@jiaxiang-wu
Contributor

Maybe this one? https://stackoverflow.com/a/43130779/10611647

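import tensorflow as tf  # TensorFlow 1.x API

# Cap this process at 30% of GPU memory; gpu_options only takes effect if passed to ConfigProto.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
  gpu_options=gpu_options, allow_soft_placement=True, log_device_placement=True))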

@to-be-snail
Author

Maybe this one? https://stackoverflow.com/a/43130779/10611647
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True))

I have tried it, although I'm not sure where to put it.
[image]

@jiaxiang-wu
Contributor

How many GPU cards do you have?

@to-be-snail
Author

How many GPU cards do you have?

only one...

@jiaxiang-wu
Contributor

Try to reduce the batch size?

@to-be-snail
Author

Try to reduce the batch size?

I have reduced batch_size_eval to 1.

@jiaxiang-wu
Contributor

If the error occurs in the training process, then you should reduce FLAGS.batch_size instead of FLAGS.batch_size_eval.
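
To see why the training batch size matters here, the SGEMM dimensions in the error can be read as a convolution lowered to a matrix multiply. The sketch below rests on an assumption that nothing in this thread confirms: that the failing op is a 1x1 convolution over 112x112 feature maps with 32 input and 64 output channels, as in an early MobileNet layer. Under that assumption m = batch_size * height * width, so m=802816 would correspond to a batch of 64. The helper name `sgemm_footprint` and the 112x112 resolution are illustrative, not taken from PocketFlow.

```python
# Sketch: interpret "Blas SGEMM launch failed : m=802816, n=64, k=32".
# Assumption (not stated in the thread): the op is a 1x1 convolution, so
#   m = batch_size * height * width, k = input channels, n = output channels.

BYTES_PER_FLOAT32 = 4


def sgemm_footprint(m, n, k):
    """Return the float32 sizes in bytes of the (m x k) input and (m x n) output."""
    return m * k * BYTES_PER_FLOAT32, m * n * BYTES_PER_FLOAT32


m, n, k = 802816, 64, 32   # values copied from the error message
height = width = 112       # assumed feature-map resolution

batch_size = m // (height * width)
input_bytes, output_bytes = sgemm_footprint(m, n, k)

print("implied batch size:", batch_size)
print("input  MiB:", input_bytes / 2**20)
print("output MiB:", output_bytes / 2**20)

# Halving the batch size halves m, and with it both buffers.
half_in, half_out = sgemm_footprint(m // 2, n, k)
print("output MiB at half the batch:", half_out / 2**20)
```

These two buffers are only a small part of what the whole training graph needs, so the numbers say nothing certain about whether 8 GB is enough. They only illustrate that m, and the memory behind it, scales linearly with FLAGS.batch_size.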

@to-be-snail
Author

If the error occurs in the training process, then you should reduce FLAGS.batch_size instead of FLAGS.batch_size_eval.

It didn't work...

@jiaxiang-wu
Contributor

Any updates? Still not working?

@ShuteLee

Hey bro, have you figured it out? I ran into the same issue.

@0113bernoyoun

Please, if you solve this problem, let me know how you solved it.

@Donald-Su

Donald-Su commented Aug 7, 2019

I encountered the same issue when running my code on a machine with GTX2080 cards (8 GB of memory per GPU, two cards in total). The error output is as follows:

InternalError (see above for traceback): Blas SGEMM launch failed : m=53290, n=80, k=64
	 [[node while/AdvInceptionV3/AdvInceptionV3/Conv2d_3b_1x1/Conv2D (defined at /home/suy/.pyenv/versions/mypython3.6/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1057)  = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](while/AdvInceptionV3/AdvInceptionV3/MaxPool_3a_3x3/MaxPool, while/AdvInceptionV3/AdvInceptionV3/Conv2d_3b_1x1/kernel/Regularizer/l2_regularizer/L2Loss/Enter)]]
	 [[{{node while/Exit/_791}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4223_while/Exit", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

However, I could run the same code on another machine with GTX2080 cards (10 GB of memory per GPU, two cards in total).

I still don't know why.

@ShuteLee

ShuteLee commented Aug 7, 2019

I fixed this issue just by installing the patches of CUDA_Toolkit @Donald-Su @0113bernoyoun

@Donald-Su

I fixed this issue just by installing the patches of CUDA_Toolkit @Donald-Su @0113bernoyoun

Hi ShuteLee, the machine has the CUDA Toolkit installed, but I still have the issue:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
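
The output above can also be checked programmatically. This is a minimal sketch, assuming the text comes from `nvcc --version`; `parse_nvcc_release` is a hypothetical helper, not part of any library. It reports only the compiler release, so on its own it may not show whether separately distributed patches have been applied.

```python
import re

# Sample text copied from the comment above.
NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
"""


def parse_nvcc_release(text):
    """Return (release, full_version) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_nvcc_release(NVCC_OUTPUT))
```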

@ShuteLee

ShuteLee commented Aug 7, 2019

Please make sure that you have installed all four PATCHES:

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

@Donald-Su

There is no package for my OS, Ubuntu 18.04.

@ShuteLee

ShuteLee commented Aug 8, 2019

So maybe CUDA Toolkit 9.0 is not fully compatible with your Ubuntu 18.04. You could choose a more recent version.

@bryanbocao

Have you made sure that TensorFlow is at version 1.12.0, as mentioned in main.sh?

pip install tensorflow-gpu==1.12.0
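
To confirm which version is actually installed in the interpreter you run the pruning with, something like the following could be used. It is only a sketch: it assumes pip is available for the current interpreter, and the helper names `parse_pip_show_version` and `installed_version` are made up for illustration. It avoids importing TensorFlow itself, so it should also work when the import is what fails.

```python
import subprocess
import sys

EXPECTED_VERSION = "1.12.0"  # version suggested in the comment above


def parse_pip_show_version(text):
    """Extract the value of the 'Version:' line from `pip show` output."""
    for line in text.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None


def installed_version(package):
    """Ask pip (for the current interpreter) which version is installed."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", package],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
    if result.returncode != 0:  # package not installed, or pip unavailable
        return None
    return parse_pip_show_version(result.stdout)


if __name__ == "__main__":
    version = installed_version("tensorflow-gpu")
    print("tensorflow-gpu:", version, "| matches:", version == EXPECTED_VERSION)
```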

6 participants