
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 124.19 MiB free; 9.68 GiB reserved in total by PyTorch) #2

Closed
caixh39 opened this issue Mar 11, 2021 · 2 comments


caixh39 commented Mar 11, 2021

Hi, I hit "RuntimeError: CUDA out of memory" when trying to reproduce your results with `python -m torch.distributed.launch --nproc_per_node=2 --master_port 20011 train.py --gpu 0,1`; my environment uses RTX 2080 Ti GPUs.
Could you tell me which NVIDIA devices you used?
Thank you very much!


caixh39 commented Mar 11, 2021

Detailed error info:

```
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 122, in forward
    x1_1, x2_1, x3_1, encoder_output, intmd_encoder_outputs, auxillary_output_layers
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 229, in decode
    y2 = self.DeBlock2(y2)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 310, in forward
    x1 = self.conv2(x1)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 567, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 124.19 MiB free; 9.68 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
```
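For context, the error message itself explains the failure: PyTorch asked for a 128.00 MiB block while only 124.19 MiB was free on the 10.76 GiB card. A quick back-of-the-envelope check shows that a single fp32 decoder activation at a plausible volume size is exactly that big (the shape below is hypothetical, not taken from the TransBTS code):

```python
def tensor_mib(shape, bytes_per_elem=4):
    """Size in MiB of a dense tensor (float32 by default)."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20

# A hypothetical activation of shape (1, 128, 64, 64, 64) in fp32
# matches the 128.00 MiB block the error message reports.
requested = tensor_mib((1, 128, 64, 64, 64))
free = 124.19  # MiB free, from the error message
print(f"requested {requested:.2f} MiB, free {free:.2f} MiB, OOM: {requested > free}")
# → requested 128.00 MiB, free 124.19 MiB, OOM: True
```

So the card is essentially full, and even one more medium-sized intermediate tensor tips it over.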

Rubics-Xuan (Owner) commented

We use 4 NVIDIA Titan RTX GPUs (24 GB memory each) for training the model. You can check the Implementation Details section of the paper for more details.
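If you still want to try fitting the model on 11 GB cards, mixed-precision training with `torch.cuda.amp` (available since PyTorch 1.6) roughly halves activation memory. A minimal sketch of one training step, using a placeholder model and random data rather than the actual TransBTS pipeline:

```python
import torch
import torch.nn as nn

# Placeholder 3D model and data; substitute the real model/loader.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv3d(4, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv3d(8, 4, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# GradScaler rescales the loss so fp16 gradients don't underflow;
# it degrades to a no-op when CUDA is unavailable.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(1, 4, 16, 16, 16, device=device)
target = torch.randn(1, 4, 16, 16, 16, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    # Forward pass runs eligible ops in fp16, saving activation memory.
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(float(loss))
```

Other common options are lowering the per-GPU batch size or crop size, or trading compute for memory with `torch.utils.checkpoint`.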
