CUDA running out of memory #2

Closed
silakanveli opened this issue Nov 28, 2017 · 3 comments

silakanveli commented Nov 28, 2017

Hi

Thanks for this wonderful script. It is really helpful when testing various models!
I am running out of GPU memory. I know this is not exactly a bug in the script; it is a CUDA memory issue.

Is there any way to reduce GPU memory usage? I only have 2 GB on my GeForce GTX 1050.

It only happens when training from scratch and when training deep.

This is the error:

[29, 30] loss: nan [0.0044375000000000005]
[30, 30] loss: nan [0.0043333333333333392]
[31, 30] loss: nan [0.0011041666666666609]
[32, 30] loss: nan [0.0041250000000000002]
Finished Training
Evaluating...
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "retrain.py", line 380, in <module>
    CLR=use_clr)
  File "retrain.py", line 322, in train_eval
    stats_eval = evaluate_stats(net, testloader)
  File "retrain.py", line 304, in evaluate_stats
    outputs = net(Variable(images))
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 81, in forward
    x = self.Conv2d_2b_3x3(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 325, in forward
    x = self.bn(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/usr/lib64/python3.6/site-packages/torch/nn/functional.py", line 639, in batch_norm
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66
[tomppa@localhost pytorch-retraining]$

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0  On |                  N/A |
| 54%   58C    P0    N/A /  75W |   1942MiB /  1998MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1405      G   /usr/libexec/Xorg                             18MiB |
|    0      1444      G   /usr/bin/gnome-shell                          42MiB |
|    0      1776      G   /usr/libexec/Xorg                            114MiB |
|    0      1870      G   /usr/bin/gnome-shell                          87MiB |
|    0      6652      G   gnome-control-center                           1MiB |
|    0      7139      C   python3                                     1665MiB |
+-----------------------------------------------------------------------------+
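
As a rough reading of the nvidia-smi output above, the following sketch tallies the per-process figures to show how the 1998 MiB are split between the desktop session (type G) and the training process (type C). The numbers are copied from the table; the variable names are only illustrative, and whether stopping the desktop session would actually return that memory to the training run is an assumption, not something the output proves.

```python
# Per-process GPU memory in MiB, copied from the nvidia-smi table above.
# Tuple layout: (process name, type, usage in MiB).
processes = [
    ("/usr/libexec/Xorg", "G", 18),
    ("/usr/bin/gnome-shell", "G", 42),
    ("/usr/libexec/Xorg", "G", 114),
    ("/usr/bin/gnome-shell", "G", 87),
    ("gnome-control-center", "G", 1),
    ("python3", "C", 1665),
]

total_mib = 1998  # card capacity reported by nvidia-smi
used_mib = 1942   # total usage reported by nvidia-smi

# Memory held by graphics (desktop) processes versus compute processes.
graphics_mib = sum(mib for _, kind, mib in processes if kind == "G")
compute_mib = sum(mib for _, kind, mib in processes if kind == "C")
free_mib = total_mib - used_mib

print(f"graphics: {graphics_mib} MiB, compute: {compute_mib} MiB, free: {free_mib} MiB")
```

On these figures the desktop holds a little over a tenth of the card, so the training process has noticeably less than 2 GB to work with.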

CUDA version:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

@ahirner
Owner

ahirner commented Nov 28, 2017

Yes, try to decrease the batch size first. I'm not sure how low you have to go for a 2gig card though.
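
Since the batch size that fits is not known in advance, one way to find it is to halve the batch size until a trial step stops failing with the "out of memory" RuntimeError shown in the traceback. The sketch below is a minimal, framework-free illustration: `find_fitting_batch_size`, `try_batch`, and `fake_step` are hypothetical names, not part of retrain.py or PyTorch, and the 12-sample limit in `fake_step` is made up to simulate a small card.

```python
def find_fitting_batch_size(try_batch, start=64, minimum=1):
    """Halve the batch size until try_batch(batch_size) no longer runs out of memory.

    try_batch is a caller-supplied function (hypothetical here) that runs one
    step at the given batch size and raises RuntimeError on failure.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)
            return batch_size
        except RuntimeError as err:
            # Only treat out-of-memory errors as "too large"; re-raise the rest.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError("even the minimum batch size does not fit in memory")


def fake_step(batch_size):
    """Stand-in for a real training or evaluation step (assumed limit: 12 samples)."""
    if batch_size > 12:
        raise RuntimeError("cuda runtime error (2) : out of memory")


fitting = find_fitting_batch_size(fake_step, start=64)
print(fitting)
```

In a real run, `try_batch` would build a data loader with the candidate batch size and push one batch through the network; that part depends on the script and is left out here.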

@silakanveli
Author

That solved it, thanks!

Looks like a 2 GB card does not leave much room to play with.

@ahirner
Owner

ahirner commented Nov 29, 2017

Great. Keep in mind that the optimal learning rate is affected by batch size. Good luck.
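
One common starting point for this adjustment is the linear scaling heuristic, which changes the learning rate in proportion to the batch size. The sketch below assumes that heuristic; the thread does not say which rule the script's author has in mind, the function name is hypothetical, and the example values (0.01 at batch size 32, reduced to 8) are invented for illustration. The result is a first guess to tune from, not a guaranteed optimum.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep the ratio lr / batch_size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical values: a learning rate tuned at batch size 32,
# rescaled for a batch size of 8 that fits on a smaller card.
new_lr = scale_learning_rate(base_lr=0.01, base_batch_size=32, new_batch_size=8)
print(new_lr)
```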
