
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66 #13

Closed
clu5 opened this issue Jun 27, 2017 · 10 comments


@clu5

clu5 commented Jun 27, 2017

I get this error trying to run the MNIST example. I have a Titan X GPU, so I don't think I should be running out of memory on MNIST. I'm using PyTorch version 0.1.12_2 and Python 3.

Generator (
  (block1): Sequential (
    (0): ConvTranspose2d(256, 128, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU (inplace)
  )
  (block2): Sequential (
    (0): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU (inplace)
  )
  (deconv_out): ConvTranspose2d(64, 1, kernel_size=(8, 8), stride=(2, 2))
  (preprocess): Sequential (
    (0): Linear (128 -> 4096)
    (1): ReLU (inplace)
  )
  (sigmoid): Sigmoid ()
)
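As a sanity check on the generator above, the spatial sizes can be traced with the standard transposed-convolution output formula. This is a minimal pure-Python sketch; the assumption that the 4096-dim preprocess output is reshaped to 256 channels of 4×4 is mine (4096 = 256·4·4), not stated in the printout:

```python
def conv_transpose_out(n, kernel, stride, padding=0):
    """Output spatial size of a ConvTranspose2d layer for input size n."""
    return (n - 1) * stride - 2 * padding + kernel

# Assumed: the preprocess block's 4096 features are reshaped to 256 x 4 x 4.
size = 4
for kernel, stride in [(5, 1), (5, 1), (8, 2)]:  # block1, block2, deconv_out
    size = conv_transpose_out(size, kernel, stride)
print(size)  # 4 -> 8 -> 12 -> 30
```

The final 30×30 map is slightly larger than MNIST's 28×28, so the example presumably trims or crops it; either way, none of these shapes are anywhere near memory-breaking.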
Discriminator (
  (main): Sequential (
    (0): Linear (784 -> 4096)
    (1): ReLU (inplace)
    (2): Linear (4096 -> 4096)
    (3): ReLU (inplace)
    (4): Linear (4096 -> 4096)
    (5): ReLU (inplace)
    (6): Linear (4096 -> 4096)
    (7): ReLU (inplace)
    (8): Linear (4096 -> 4096)
    (9): ReLU (inplace)
    (10): Linear (4096 -> 1)
  )
)
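For scale, the discriminator's weights are nowhere near Titan X capacity. A quick parameter count taken directly from the printed layer sizes (pure Python; fp32 at 4 bytes per parameter assumed) puts them under 300 MiB:

```python
def linear_params(n_in, n_out):
    """Parameter count of a Linear layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# Sizes from the Discriminator printout: 784 -> 4096, four 4096 -> 4096, 4096 -> 1.
layers = [(784, 4096)] + [(4096, 4096)] * 4 + [(4096, 1)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)                      # 70344705 parameters
print(total * 4 / 2**20)          # ~268 MiB in fp32
```

So the model itself easily fits in 12 GB; the failure is more likely a broken install or environment than a genuinely oversized network.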

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-c32a204873f5> in <module>()
      4 print(net_D)
      5 if use_cuda:
----> 6     net_D = net_D.cuda()
      7     net_G = net_G.cuda()
      8 opt_D = optim.Adam(net_D.parameters(), lr=1e04, betas=(0.5, 0.9))

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     63         else:
     64             new_type = getattr(torch.cuda, self.__class__.__name__)
---> 65             return new_type(self.size()).copy_(self, async)
     66 
     67 
@caogang
Owner

caogang commented Jun 27, 2017

I figured out that you should use the latest PyTorch master version from GitHub, not the release version that you install directly via conda. Also, the latest version of wgan-gp uses nn.Conv2d instead of nn.Linear.
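To see why that switch matters, compare parameter counts. The conv-stack dimensions below are an assumption on my part (a typical DCGAN-style discriminator, not necessarily the exact wgan-gp code), but the gap is dramatic either way:

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out          # weight + bias

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # kernel weights + bias

# The MLP discriminator from this issue: 784 -> five 4096 layers -> 1.
mlp = linear_params(784, 4096) + 4 * linear_params(4096, 4096) + linear_params(4096, 1)

# Assumed conv stack: 1 -> 64 -> 128 -> 256 channels with 5x5 kernels,
# followed by a linear head over a 256 x 4 x 4 feature map.
conv = (conv_params(1, 64, 5) + conv_params(64, 128, 5)
        + conv_params(128, 256, 5) + linear_params(256 * 4 * 4, 1))
print(mlp, conv)  # ~70.3M vs ~1.0M parameters
```

Convolutions share their kernel weights across the whole image, so the conv version is roughly 70x smaller here despite seeing the same input.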

@caogang
Owner

caogang commented Jun 27, 2017

Well, I am using Python 2.7; maybe there is something wrong with the print function in your environment.

@clu5
Author

clu5 commented Jun 27, 2017

I changed all the Python 2 syntax to Python 3, but I'll try updating my PyTorch version and see if that works.

@clu5
Author

clu5 commented Jun 27, 2017

I updated PyTorch, but now torch can't find the attribute cuda.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-b2a093d98c8b> in <module>()
     22 
     23 if use_cuda:
---> 24     netD = netD.cuda(gpu)
     25     netG = netG.cuda(gpu)
     26 

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     55         if device is None:
     56             device = -1
---> 57     with torch.cuda.device(device):
     58         if self.is_sparse:
     59             new_type = getattr(torch.cuda.sparse, self.__class__.__name__)

AttributeError: module 'torch' has no attribute 'cuda'

@caogang
Owner

caogang commented Jun 28, 2017

Did you install PyTorch from source correctly? Run python setup.py clean before you execute python setup.py install.
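A quick diagnostic for the AttributeError above (a hedged sketch using only the standard library): an importable torch that lacks torch.cuda usually means a broken or partial install, and one classic cause is launching Python from inside the pytorch source checkout, so the local torch/ directory shadows the installed package:

```python
import importlib
import importlib.util
import os

def check_torch_install():
    """Rough check: is torch importable, and does it expose torch.cuda?"""
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return "torch is not installed"
    torch = importlib.import_module("torch")
    # Report where torch was loaded from -- if this is the source checkout's
    # own torch/ directory, the compiled extensions (including cuda) are missing.
    origin = os.path.dirname(spec.origin or "")
    if not hasattr(torch, "cuda"):
        return "broken install: torch.cuda missing (loaded from %s)" % origin
    return "ok"

print(check_torch_install())
```

If this reports a broken install, try running the same check from a different working directory before rebuilding.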

@LukasMosser

I also get this issue after a few iterations depending on the size of the network.
It seems there might be some variables not being garbage collected here.
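For tracking down objects that survive between iterations, here is a framework-agnostic sketch using only the standard library's gc module (in this thread the leak turned out to be inside cuDNN's BatchNorm rather than Python-level garbage, but a live-object count like this is a cheap first check):

```python
import gc

class FakeTensor:                  # stand-in for a tensor-like object
    def __init__(self):
        self.data = [0.0] * 1024

def count_instances(cls):
    """Count live objects of a class after forcing a collection pass."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

def train_steps(keep_history):
    history = []
    for _ in range(3):
        t = FakeTensor()
        if keep_history:
            history.append(t)      # bug: keeps every step's tensor alive
    return history

refs = train_steps(keep_history=True)
print(count_instances(FakeTensor))  # 3 -- objects pile up across steps
refs = train_steps(keep_history=False)
print(count_instances(FakeTensor))  # 0 -- nothing retained
```

If the count grows with every training iteration, something is holding references (often a list of losses or a retained graph); if it stays flat, the leak is below Python, as it was here.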

@caogang
Owner

caogang commented Sep 1, 2017

Which code are you running, @LukasMosser @vishwakftw? And can you show me the related memory cost? I will check whether it costs the same memory for me.

@vishwakftw

I am sorry about that. I had a BatchNorm2d layer, which caused a known memory leak, so I retract my comment. @LukasMosser, can you check if you have a BatchNorm2d layer?

@LukasMosser

Yes, I have a BatchNorm2d layer as well. Is there an existing issue on pytorch/pytorch that you could link?
Thanks for pointing that out; I wouldn't have come across it myself.

@vishwakftw

@LukasMosser This is the issue: pytorch/pytorch#2287. There seems to be a patch ready for it, but you may have to recompile PyTorch from source.

@caogang caogang closed this as completed Sep 2, 2017