CUDA running out of memory #2

silakanveli · 2017-11-28T17:26:07Z

Hi

Thanks for this wonderful script. It is really helpful when testing various models!
I am running out of GPU memory. I know this is not exactly a bug; it is a CUDA memory issue.

Is there any way to reduce GPU memory usage? I only have 2 GB on my GeForce GTX 1050.

It only happens when training from scratch and when training deep.

This is the error:

[29, 30] loss: nan [0.0044375000000000005]
[30, 30] loss: nan [0.0043333333333333392]
[31, 30] loss: nan [0.0011041666666666609]
[32, 30] loss: nan [0.0041250000000000002]
Finished Training
Evaluating...
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "retrain.py", line 380, in <module>
    CLR=use_clr)
  File "retrain.py", line 322, in train_eval
    stats_eval = evaluate_stats(net, testloader)
  File "retrain.py", line 304, in evaluate_stats
    outputs = net(Variable(images))
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 81, in forward
    x = self.Conv2d_2b_3x3(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.6/site-packages/torchvision/models/inception.py", line 325, in forward
    x = self.bn(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/usr/lib64/python3.6/site-packages/torch/nn/functional.py", line 639, in batch_norm
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66
[tomppa@localhost pytorch-retraining]$

nvidia-smi

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1405      G   /usr/libexec/Xorg                             18MiB |
|    0      1444      G   /usr/bin/gnome-shell                          42MiB |
|    0      1776      G   /usr/libexec/Xorg                            114MiB |
|    0      1870      G   /usr/bin/gnome-shell                          87MiB |
|    0      6652      G   gnome-control-center                           1MiB |
|    0      7139      C   python3                                     1665MiB |
+-----------------------------------------------------------------------------+

CUDA version:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

ahirner · 2017-11-28T22:47:35Z

Yes, try to decrease the batch size first. I'm not sure how low you have to go for a 2gig card though.
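
If guessing a batch size by hand gets tedious, one option is to probe for it automatically. The following is only a minimal sketch, not code from retrain.py: `find_fitting_batch_size`, `run_batch`, and `fake_run_batch` are hypothetical names, and it assumes an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory", as in the traceback above.

```python
def find_fitting_batch_size(run_batch, start=32, minimum=1):
    """Halve the batch size until run_batch(batch_size) no longer runs out of memory.

    run_batch is a hypothetical callable that performs one forward/backward
    pass at the given batch size and raises RuntimeError on failure.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_batch(batch_size)
            return batch_size
        except RuntimeError as err:
            # Only swallow out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fits in GPU memory" % minimum)


def fake_run_batch(batch_size):
    """Stand-in for a real training step: pretends anything above 8 is too large."""
    if batch_size > 8:
        raise RuntimeError("cuda runtime error (2) : out of memory")


print(find_fitting_batch_size(fake_run_batch, start=32))  # prints 8
```

In real use, `run_batch` would wrap one training or evaluation step on a batch of the given size. Evaluation can run out of memory even when training fits, so it is probably worth probing both.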

silakanveli · 2017-11-29T11:20:34Z

That solved it, thanks!

Looks like a 2 GB card is not much to play with.

ahirner · 2017-11-29T11:48:22Z

Great. Keep in mind that the optimal learning rate is affected by batch size. Good luck.
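
As a rough starting point, a common heuristic is to scale the learning rate linearly with the batch size. This is only a sketch under that assumption, not something retrain.py does; `scale_learning_rate` and the example values are hypothetical, and the best rate may still need tuning.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep lr / batch_size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical values, not taken from retrain.py.
new_lr = scale_learning_rate(base_lr=0.01, base_batch_size=32, new_batch_size=8)
print(new_lr)
```

Cutting the batch size from 32 to 8 under this rule divides the learning rate by four.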

silakanveli closed this as completed Nov 29, 2017
