
RuntimeError: CUDNN_STATUS_BAD_PARAM in loss.backward() #2

Open · manmanCover opened this issue Jan 21, 2019 · 10 comments

@manmanCover

Thank you for your wonderful code! Have you met this problem before and do you know how to solve it?

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3021f006d740>", line 3, in <module>
    runfile('/home/Sarah/project/main_gpu.py', args=[---], wdir='/home/Sarah/project')
  File "/home/Sarah/pycharm/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/Sarah/project/main_gpu.py", line 244, in <module>
    main()
  File "/home/Sarah/project/main_gpu.py", line 212, in main
    loss = train(imgL_crop, imgR_crop, disp_crop_L)
  File "/home/Sarah/project/main_gpu.py", line 161, in train
    loss.backward()
  File "/home/Sarah/py40/local/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/Sarah/py40/local/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDNN_STATUS_BAD_PARAM
cfzd (Owner) commented Jan 21, 2019

Since the problem occurred in the backward step, the network must have finished the forward step, so I suspect you are running out of memory.
The spatial attention method can cost a huge amount of memory. You can try placing the PSANet module on the last conv layer, or reducing the size of the input features.
If it is not an out-of-memory problem, please tell me more about your environment: PyTorch version, CUDA version, cuDNN version, etc.
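
A quick way to check whether memory is the bottleneck is to print the per-device usage right before loss.backward(). A minimal sketch (assuming PyTorch >= 0.4, where torch.cuda.memory_allocated and torch.cuda.memory_cached are available):

import torch

# Print how much memory the forward pass has already consumed on each GPU.
for i in range(torch.cuda.device_count()):
    print('GPU %d: %.0f MiB allocated, %.0f MiB cached' % (
        i,
        torch.cuda.memory_allocated(i) / 1024.0 ** 2,
        torch.cuda.memory_cached(i) / 1024.0 ** 2))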

@manmanCover (Author)

@cfzd
Hi, I think I found the problem.
In PSANetFunc.py, the backward methods of both PSANetCollectFunction and PSANetDistributeFunction contain:

b1_grad_n = mask_grad.shape[0]
b1_grad_c = (2 * mask_grad.shape[2] - 1) * (2 * mask_grad.shape[3] - 1)
b1_grad_h = mask_grad.shape[2]
b1_grad_w = mask_grad.shape[3]
bottom1_grad = torch.zeros(b1_grad_n, b1_grad_c, b1_grad_w, b1_grad_h).cuda()

b1_grad_w and b1_grad_h should be swapped in the bottom1_grad initialization, so that the tensor is allocated in (n, c, h, w) order.
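
For reference, the corrected allocation in standard NCHW order would be (a one-line sketch, reusing the variable names from the snippet above):

# Allocate the gradient buffer as (n, c, h, w) so it matches the layout cuDNN expects.
bottom1_grad = torch.zeros(b1_grad_n, b1_grad_c, b1_grad_h, b1_grad_w).cuda()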

cfzd (Owner) commented Jan 21, 2019

@manmanCover Sorry about that; I have fixed the code. It's puzzling that I never met this problem myself and my experiments still achieved good performance. I think I need to release a benchmark soon.

@manmanCover (Author)

@cfzd Maybe your samples are all square; swapping h and w is a no-op when they are equal, so the bug would never show up.
By the way, have you noticed that the memory consumption of your implementation is unbalanced? When I ran the project on 2 GPUs, one of them was almost fully occupied:

sarah@Battlebox2:~$ nvidia-smi
Mon Jan 21 13:18:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 49%   80C    P2   190W / 250W |   7799MiB / 12196MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 56%   84C    P2   186W / 250W |  12044MiB / 12188MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12251      C   python                                      7789MiB |
|    1     12251      C   python                                      7693MiB |
+-----------------------------------------------------------------------------+

cfzd (Owner) commented Jan 22, 2019

@manmanCover I tested my implementation with moderate and aggressive memory settings. The memory consumption is slightly unbalanced, but neither of my GPUs is fully occupied:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 0000:05:00.0     Off |                  N/A |
| 52%   71C    P2   101W / 250W |   4879MiB / 11170MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 0000:06:00.0     Off |                  N/A |
| 60%   79C    P2   108W / 250W |   5485MiB / 11172MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     17789    C   python                                        4877MiB |
|    1     17789    C   python                                        5483MiB |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 0000:05:00.0     Off |                  N/A |
| 53%   71C    P2    70W / 250W |   8999MiB / 11170MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 0000:06:00.0     Off |                  N/A |
| 62%   81C    P2    83W / 250W |   9785MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18510    C   python                                        8995MiB |
|    1     18510    C   python                                        9781MiB |
+-----------------------------------------------------------------------------+

@manmanCover (Author)

@cfzd Thank you for your test. I checked that my input feature size is [32, 64, 128]; what is yours?
By the way, must the input images from the training dataset and the test dataset keep the same size?

@manmanCover (Author)

@cfzd By the way, does your implementation also use sliding windows? It seems not...

cfzd (Owner) commented Jan 29, 2019

@manmanCover
64x128 is a large feature size for a spatial attention method. In this case the "over-complete map" would have (2*64-1)*(2*128-1) = 127*255 = 32385 channels, i.e. a feature map of shape [32385, 64, 128]. In my implementation I always keep the channel size of the "over-complete map" below 10000, because the spatial attention information doesn't have to be that precise.
As for multi-scale testing, you can place an adaptive pooling module before this attention module.
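
For illustration, a minimal sketch of that idea (assuming PyTorch >= 0.4.1 for F.interpolate; the PooledPSA wrapper name and the 50x50 pooled size are my own choices, picked so that (2*50-1)*(2*50-1) = 9801 stays below 10000):

import torch.nn as nn
import torch.nn.functional as F

class PooledPSA(nn.Module):
    """Downsample features before a spatial attention module so the
    over-complete map stays below ~10000 channels, then upsample back."""
    def __init__(self, psa_module, pooled_size=(50, 50)):
        super(PooledPSA, self).__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)  # (2*50-1)*(2*50-1) = 9801 channels
        self.psa = psa_module  # wrapped attention module, assumed (N,C,H,W) -> (N,C,H,W)

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        y = self.psa(self.pool(x))
        # Restore the original spatial size for the layers that follow.
        return F.interpolate(y, size=(h, w), mode='bilinear', align_corners=False)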

cfzd (Owner) commented Jan 29, 2019

I didn't find any description of sliding windows in the paper, and I don't see any reason to use sliding windows, as it is an attention module.

manmanCover (Author) commented Jan 29, 2019

@cfzd Yeah, adaptive pooling is also an option. The author of PSANet said they use sliding windows with different input sizes (hszhao/PSANet#11 (comment)).
Here's how they use sliding windows: https://github.com/hszhao/PSANet/blob/master/evaluation/scale_process.m
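
For readers who don't want to dig through the MATLAB script, here is a minimal Python sketch of the sliding-window idea (the model interface, crop size, stride, and class count are my own assumptions; the linked scale_process.m additionally averages predictions over multiple scales):

import torch

def _positions(full, window, stride):
    # Start offsets that cover [0, full) with the given window size and stride.
    if full <= window:
        return [0]
    pos = list(range(0, full - window, stride))
    pos.append(full - window)  # make sure the last window reaches the border
    return pos

def sliding_window_predict(model, image, crop_size=512, stride=256, num_classes=19):
    """Run the model on overlapping crops and average the overlapping logits.
    image: (1, C, H, W) tensor; returns (1, num_classes, H, W) logits."""
    _, _, h, w = image.shape
    logits = torch.zeros(1, num_classes, h, w, device=image.device)
    counts = torch.zeros(1, 1, h, w, device=image.device)
    for top in _positions(h, crop_size, stride):
        for left in _positions(w, crop_size, stride):
            bottom = min(top + crop_size, h)
            right = min(left + crop_size, w)
            crop = image[:, :, top:bottom, left:right]
            logits[:, :, top:bottom, left:right] += model(crop)
            counts[:, :, top:bottom, left:right] += 1
    return logits / counts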
