
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66 #13

Closed
clu5 opened this issue Jun 27, 2017 · 10 comments


@clu5

clu5 commented Jun 27, 2017

I get this error trying to run the MNIST example. I have a Titan X GPU, so I don't think I should be running out of memory on MNIST. I'm using PyTorch version 0.1.12_2 and Python 3.

Generator (
  (block1): Sequential (
    (0): ConvTranspose2d(256, 128, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU (inplace)
  )
  (block2): Sequential (
    (0): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU (inplace)
  )
  (deconv_out): ConvTranspose2d(64, 1, kernel_size=(8, 8), stride=(2, 2))
  (preprocess): Sequential (
    (0): Linear (128 -> 4096)
    (1): ReLU (inplace)
  )
  (sigmoid): Sigmoid ()
)
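As a sanity check on the generator above, the spatial sizes can be traced with the standard transposed-convolution output formula. This is a minimal pure-Python sketch; the assumption that the 4096-dim preprocess output is reshaped to 256 channels of 4×4 is mine (4096 = 256·4·4), not stated in the printout:

```python
def conv_transpose_out(n, kernel, stride, padding=0):
    """Output spatial size of a ConvTranspose2d layer for input size n."""
    return (n - 1) * stride - 2 * padding + kernel

# Assumed: the preprocess block's 4096 features are reshaped to 256 x 4 x 4.
size = 4
for kernel, stride in [(5, 1), (5, 1), (8, 2)]:  # block1, block2, deconv_out
    size = conv_transpose_out(size, kernel, stride)
print(size)  # 4 -> 8 -> 12 -> 30
```

The final 30×30 map is slightly larger than MNIST's 28×28, so the example presumably trims or crops it; either way, none of these shapes are anywhere near memory-breaking.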
Discriminator (
  (main): Sequential (
    (0): Linear (784 -> 4096)
    (1): ReLU (inplace)
    (2): Linear (4096 -> 4096)
    (3): ReLU (inplace)
    (4): Linear (4096 -> 4096)
    (5): ReLU (inplace)
    (6): Linear (4096 -> 4096)
    (7): ReLU (inplace)
    (8): Linear (4096 -> 4096)
    (9): ReLU (inplace)
    (10): Linear (4096 -> 1)
  )
)
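For scale, the discriminator's weights are nowhere near Titan X capacity. A quick parameter count taken directly from the printed layer sizes (pure Python; fp32 at 4 bytes per parameter assumed) puts them under 300 MiB:

```python
def linear_params(n_in, n_out):
    """Parameter count of a Linear layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# Sizes from the Discriminator printout: 784 -> 4096, four 4096 -> 4096, 4096 -> 1.
layers = [(784, 4096)] + [(4096, 4096)] * 4 + [(4096, 1)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)                      # 70344705 parameters
print(total * 4 / 2**20)          # ~268 MiB in fp32
```

So the model itself easily fits in 12 GB; the failure is more likely a broken install or environment than a genuinely oversized network.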

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-c32a204873f5> in <module>()
      4 print(net_D)
      5 if use_cuda:
----> 6     net_D = net_D.cuda()
      7     net_G = net_G.cuda()
      8 opt_D = optim.Adam(net_D.parameters(), lr=1e04, betas=(0.5, 0.9))

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     63         else:
     64             new_type = getattr(torch.cuda, self.__class__.__name__)
---> 65             return new_type(self.size()).copy_(self, async)
     66 
     67 
@caogang
Owner

caogang commented Jun 27, 2017

I figured out that you should use the latest PyTorch master version from GitHub, not the release version that you install directly via conda. Also, the latest version of wgan-gp uses nn.Conv2d instead of nn.Linear.
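To see why that switch matters, compare parameter counts. The conv-stack dimensions below are an assumption on my part (a typical DCGAN-style discriminator, not necessarily the exact wgan-gp code), but the gap is dramatic either way:

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out          # weight + bias

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # kernel weights + bias

# The MLP discriminator from this issue: 784 -> five 4096 layers -> 1.
mlp = linear_params(784, 4096) + 4 * linear_params(4096, 4096) + linear_params(4096, 1)

# Assumed conv stack: 1 -> 64 -> 128 -> 256 channels with 5x5 kernels,
# followed by a linear head over a 256 x 4 x 4 feature map.
conv = (conv_params(1, 64, 5) + conv_params(64, 128, 5)
        + conv_params(128, 256, 5) + linear_params(256 * 4 * 4, 1))
print(mlp, conv)  # ~70.3M vs ~1.0M parameters
```

Convolutions share their kernel weights across the whole image, so the conv version is roughly 70x smaller here despite seeing the same input.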

@caogang
Owner

caogang commented Jun 27, 2017

Well, I am using Python 2.7; maybe there is something wrong with the print function in your environment.

@clu5
Author

clu5 commented Jun 27, 2017

I changed all the Python 2 syntax to Python 3, but I'll try updating my PyTorch version and see if that works.

@clu5
Author

clu5 commented Jun 27, 2017

I updated PyTorch, but now torch can't find the attribute cuda.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-b2a093d98c8b> in <module>()
     22 
     23 if use_cuda:
---> 24     netD = netD.cuda(gpu)
     25     netG = netG.cuda(gpu)
     26 

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

/home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

/home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
     55         if device is None:
     56             device = -1
---> 57     with torch.cuda.device(device):
     58         if self.is_sparse:
     59             new_type = getattr(torch.cuda.sparse, self.__class__.__name__)

AttributeError: module 'torch' has no attribute 'cuda'

@caogang
Owner

caogang commented Jun 28, 2017

Did you install PyTorch from source correctly? Run python setup.py clean before you execute python setup.py install.
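A quick diagnostic for the AttributeError above (a hedged sketch using only the standard library): an importable torch that lacks torch.cuda usually means a broken or partial install, and one classic cause is launching Python from inside the pytorch source checkout, so the local torch/ directory shadows the installed package:

```python
import importlib
import importlib.util
import os

def check_torch_install():
    """Rough check: is torch importable, and does it expose torch.cuda?"""
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return "torch is not installed"
    torch = importlib.import_module("torch")
    # Report where torch was loaded from -- if this is the source checkout's
    # own torch/ directory, the compiled extensions (including cuda) are missing.
    origin = os.path.dirname(spec.origin or "")
    if not hasattr(torch, "cuda"):
        return "broken install: torch.cuda missing (loaded from %s)" % origin
    return "ok"

print(check_torch_install())
```

If this reports a broken install, try running the same check from a different working directory before rebuilding.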

@LukasMosser

I also get this issue after a few iterations depending on the size of the network.
It seems there might be some variables not being garbage collected here.
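For tracking down objects that survive between iterations, here is a framework-agnostic sketch using only the standard library's gc module (in this thread the leak turned out to be inside cuDNN's BatchNorm rather than Python-level garbage, but a live-object count like this is a cheap first check):

```python
import gc

class FakeTensor:                  # stand-in for a tensor-like object
    def __init__(self):
        self.data = [0.0] * 1024

def count_instances(cls):
    """Count live objects of a class after forcing a collection pass."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

def train_steps(keep_history):
    history = []
    for _ in range(3):
        t = FakeTensor()
        if keep_history:
            history.append(t)      # bug: keeps every step's tensor alive
    return history

refs = train_steps(keep_history=True)
print(count_instances(FakeTensor))  # 3 -- objects pile up across steps
refs = train_steps(keep_history=False)
print(count_instances(FakeTensor))  # 0 -- nothing retained
```

If the count grows with every training iteration, something is holding references (often a list of losses or a retained graph); if it stays flat, the leak is below Python, as it was here.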

@caogang
Owner

caogang commented Sep 1, 2017

Which code are you running, @LukasMosser @vishwakftw? And can you show me the related memory cost? I will check whether it costs the same memory for me.

@vishwakftw

I am sorry about that. I had a BatchNorm2d layer, which caused a known memory leak, so I retract my comment. @LukasMosser, can you check if you have a BatchNorm2d layer?

@LukasMosser

Yes, I have a BatchNorm2d layer as well. Is there an existing issue on pytorch/pytorch that you could link?
Thanks for pointing that out; I wouldn't have come across it myself.

@vishwakftw

@LukasMosser This is the issue: pytorch/pytorch#2287. There seems to be a patch ready for it, but you may have to recompile PyTorch from source.

@caogang caogang closed this as completed Sep 2, 2017