
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 124.19 MiB free; 9.68 GiB reserved in total by PyTorch) #2

Closed
caixh39 opened this issue Mar 11, 2021 · 2 comments


caixh39 commented Mar 11, 2021

Hi, I hit "RuntimeError: CUDA out of memory" when trying to reproduce your results with `python -m torch.distributed.launch --nproc_per_node=2 --master_port 20011 train.py --gpu 0,1`; my environment uses RTX 2080 Ti GPUs.
Could you tell me which NVIDIA devices you used?
Thank you very much!


caixh39 commented Mar 11, 2021

Detailed error info:

```
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 122, in forward
    x1_1, x2_1, x3_1, encoder_output, intmd_encoder_outputs, auxillary_output_layers
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 229, in decode
    y2 = self.DeBlock2(y2)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//project/segmentation/TransBTS/models/TransBTS/TransBTS_downsample8x_skipconnection.py", line 310, in forward
    x1 = self.conv2(x1)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 567, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 10.76 GiB total capacity; 9.65 GiB already allocated; 124.19 MiB free; 9.68 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home//env/tbrats/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
```
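For context, the error message itself explains the failure: PyTorch asked for a 128.00 MiB block while only 124.19 MiB was free on the 10.76 GiB card. A quick back-of-the-envelope check shows that a single fp32 decoder activation at a plausible volume size is exactly that big (the shape below is hypothetical, not taken from the TransBTS code):

```python
def tensor_mib(shape, bytes_per_elem=4):
    """Size in MiB of a dense tensor (float32 by default)."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20

# A hypothetical activation of shape (1, 128, 64, 64, 64) in fp32
# matches the 128.00 MiB block the error message reports.
requested = tensor_mib((1, 128, 64, 64, 64))
free = 124.19  # MiB free, from the error message
print(f"requested {requested:.2f} MiB, free {free:.2f} MiB, OOM: {requested > free}")
# → requested 128.00 MiB, free 124.19 MiB, OOM: True
```

So the card is essentially full, and even one more medium-sized intermediate tensor tips it over.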

Rubics-Xuan (Owner) commented

We use 4 NVIDIA Titan RTX GPUs (24 GB memory each) for training the model. You can check the Implementation Details section of the paper for more details.
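If you still want to try fitting the model on 11 GB cards, mixed-precision training with `torch.cuda.amp` (available since PyTorch 1.6) roughly halves activation memory. A minimal sketch of one training step, using a placeholder model and random data rather than the actual TransBTS pipeline:

```python
import torch
import torch.nn as nn

# Placeholder 3D model and data; substitute the real model/loader.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv3d(4, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv3d(8, 4, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# GradScaler rescales the loss so fp16 gradients don't underflow;
# it degrades to a no-op when CUDA is unavailable.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(1, 4, 16, 16, 16, device=device)
target = torch.randn(1, 4, 16, 16, 16, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    # Forward pass runs eligible ops in fp16, saving activation memory.
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(float(loss))
```

Other common options are lowering the per-GPU batch size or crop size, or trading compute for memory with `torch.utils.checkpoint`.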
