
RuntimeError: CUDNN_STATUS_BAD_PARAM in loss.backward() #2

Open · manmanCover opened this issue Jan 21, 2019 · 10 comments

@manmanCover

Thank you for your wonderful code! Have you met this problem before and do you know how to solve it?

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-3021f006d740>", line 3, in <module>
    runfile('/home/Sarah/project/main_gpu.py', args=[---], wdir='/home/Sarah/project')
  File "/home/Sarah/pycharm/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/Sarah/project/main_gpu.py", line 244, in <module>
    main()
  File "/home/Sarah/project/main_gpu.py", line 212, in main
    loss = train(imgL_crop, imgR_crop, disp_crop_L)
  File "/home/Sarah/project/main_gpu.py", line 161, in train
    loss.backward()
  File "/home/Sarah/py40/local/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/Sarah/py40/local/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDNN_STATUS_BAD_PARAM
cfzd (Owner) commented Jan 21, 2019

Since the problem occurred in the backward step, the network must have finished the forward step, so I suspect you are running out of memory.
The spatial attention method can cost a huge amount of memory. You can try placing the PSANet module on the last conv layer, or reducing the size of the input features.
If it is not an out-of-memory problem, please tell me more about your environment: PyTorch version, CUDA version, cuDNN version, etc.
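
A quick way to check whether memory is the bottleneck is to print the per-device usage right before loss.backward(). A minimal sketch (assuming PyTorch >= 0.4, where torch.cuda.memory_allocated and torch.cuda.memory_cached are available):

import torch

# Print how much memory the forward pass has already consumed on each GPU.
for i in range(torch.cuda.device_count()):
    print('GPU %d: %.0f MiB allocated, %.0f MiB cached' % (
        i,
        torch.cuda.memory_allocated(i) / 1024.0 ** 2,
        torch.cuda.memory_cached(i) / 1024.0 ** 2))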

@manmanCover (Author)

@cfzd
Hi, I think I found the problem.
In PSANetFunc.py, the backward methods of both PSANetCollectFunction and PSANetDistributeFunction contain:

b1_grad_n = mask_grad.shape[0]
b1_grad_c = (2 * mask_grad.shape[2] - 1) * (2 * mask_grad.shape[3] - 1)
b1_grad_h = mask_grad.shape[2]
b1_grad_w = mask_grad.shape[3]
bottom1_grad = torch.zeros(b1_grad_n, b1_grad_c, b1_grad_w, b1_grad_h).cuda()

b1_grad_w and b1_grad_h should be swapped in the bottom1_grad initialization, so that the tensor is allocated in (n, c, h, w) order.
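
For reference, the corrected allocation in standard NCHW order would be (a one-line sketch, reusing the variable names from the snippet above):

# Allocate the gradient buffer as (n, c, h, w) so it matches the layout cuDNN expects.
bottom1_grad = torch.zeros(b1_grad_n, b1_grad_c, b1_grad_h, b1_grad_w).cuda()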

cfzd (Owner) commented Jan 21, 2019

@manmanCover Sorry about that; I have fixed the code. It's puzzling that I never met this problem myself and my experiments still achieved good performance. I think I need to release a benchmark soon.

@manmanCover (Author)

@cfzd Maybe your samples are all square; swapping h and w is a no-op when they are equal, so the bug would never show up.
By the way, have you noticed that the memory consumption of your implementation is unbalanced? When I ran the project on 2 GPUs, one of them was almost fully occupied:

sarah@Battlebox2:~$ nvidia-smi
Mon Jan 21 13:18:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 49%   80C    P2   190W / 250W |   7799MiB / 12196MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 56%   84C    P2   186W / 250W |  12044MiB / 12188MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12251      C   python                                      7789MiB |
|    1     12251      C   python                                      7693MiB |
+-----------------------------------------------------------------------------+

cfzd (Owner) commented Jan 22, 2019

@manmanCover I tested my implementation with moderate and aggressive memory settings. The memory consumption is slightly unbalanced, but neither of my GPUs is fully occupied:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 0000:05:00.0     Off |                  N/A |
| 52%   71C    P2   101W / 250W |   4879MiB / 11170MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 0000:06:00.0     Off |                  N/A |
| 60%   79C    P2   108W / 250W |   5485MiB / 11172MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     17789    C   python                                        4877MiB |
|    1     17789    C   python                                        5483MiB |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 0000:05:00.0     Off |                  N/A |
| 53%   71C    P2    70W / 250W |   8999MiB / 11170MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 0000:06:00.0     Off |                  N/A |
| 62%   81C    P2    83W / 250W |   9785MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18510    C   python                                        8995MiB |
|    1     18510    C   python                                        9781MiB |
+-----------------------------------------------------------------------------+

@manmanCover (Author)

@cfzd Thank you for your test. I checked that my input feature size is [32, 64, 128]; what is yours?
By the way, must the input images from the training dataset and the test dataset keep the same size?

@manmanCover (Author)

@cfzd By the way, does your implementation also use sliding windows? It seems not...

cfzd (Owner) commented Jan 29, 2019

@manmanCover
64x128 is a large feature size for a spatial attention method. In this case the "over-complete map" would have (2*64-1)*(2*128-1) = 127*255 = 32385 channels, i.e. a feature map of shape [32385, 64, 128]. In my implementation I always keep the channel size of the "over-complete map" below 10000, because the spatial attention information doesn't have to be that precise.
As for multi-scale testing, you can place an adaptive pooling module before this attention module.
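
For illustration, a minimal sketch of that idea (assuming PyTorch >= 0.4.1 for F.interpolate; the PooledPSA wrapper name and the 50x50 pooled size are my own choices, picked so that (2*50-1)*(2*50-1) = 9801 stays below 10000):

import torch.nn as nn
import torch.nn.functional as F

class PooledPSA(nn.Module):
    """Downsample features before a spatial attention module so the
    over-complete map stays below ~10000 channels, then upsample back."""
    def __init__(self, psa_module, pooled_size=(50, 50)):
        super(PooledPSA, self).__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)  # (2*50-1)*(2*50-1) = 9801 channels
        self.psa = psa_module  # wrapped attention module, assumed (N,C,H,W) -> (N,C,H,W)

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        y = self.psa(self.pool(x))
        # Restore the original spatial size for the layers that follow.
        return F.interpolate(y, size=(h, w), mode='bilinear', align_corners=False)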

cfzd (Owner) commented Jan 29, 2019

I didn't find any description of sliding windows in the paper, and I don't see any reason to use sliding windows, as it is an attention module.

manmanCover (Author) commented Jan 29, 2019

@cfzd Yeah, adaptive pooling is also an option. The author of PSANet said they use sliding windows with different input sizes (hszhao/PSANet#11 (comment)).
Here's how they use sliding windows: https://github.com/hszhao/PSANet/blob/master/evaluation/scale_process.m
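
For readers who don't want to dig through the MATLAB script, here is a minimal Python sketch of the sliding-window idea (the model interface, crop size, stride, and class count are my own assumptions; the linked scale_process.m additionally averages predictions over multiple scales):

import torch

def _positions(full, window, stride):
    # Start offsets that cover [0, full) with the given window size and stride.
    if full <= window:
        return [0]
    pos = list(range(0, full - window, stride))
    pos.append(full - window)  # make sure the last window reaches the border
    return pos

def sliding_window_predict(model, image, crop_size=512, stride=256, num_classes=19):
    """Run the model on overlapping crops and average the overlapping logits.
    image: (1, C, H, W) tensor; returns (1, num_classes, H, W) logits."""
    _, _, h, w = image.shape
    logits = torch.zeros(1, num_classes, h, w, device=image.device)
    counts = torch.zeros(1, 1, h, w, device=image.device)
    for top in _positions(h, crop_size, stride):
        for left in _positions(w, crop_size, stride):
            bottom = min(top + crop_size, h)
            right = min(left + crop_size, w)
            crop = image[:, :, top:bottom, left:right]
            logits[:, :, top:bottom, left:right] += model(crop)
            counts[:, :, top:bottom, left:right] += 1
    return logits / counts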
