
OOM error #8

Closed
taineleau opened this issue Aug 20, 2021 · 9 comments

@taineleau

Hi, I tried the seq2seq prefix-tuning and got:

RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)

I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!

@XiangLi1999
Owner

Hi,

I used a 32GB GPU for the XSUM experiments. You could either switch to a GPU with more memory, or reduce the bsz and increase gradient_accumulation_steps (the idea is sketched below).
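
For what it's worth, here is a minimal PyTorch sketch of the gradient-accumulation idea; the model, data, and optimizer are placeholders, not this repo's training loop:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model / data / optimizer -- not the repo's objects.
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loader = [(torch.randn(4, 512, device=device),
           torch.randn(4, 512, device=device)) for _ in range(8)]

accum_steps = 4  # effective batch size = per-step bsz (4) * accum_steps (4) = 16

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()      # scale so accumulated grads match one big batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()                 # one optimizer update per accum_steps micro-batches
        optimizer.zero_grad()
```

Memory per step is set by the small per-step batch, while the optimizer still sees gradients equivalent to the larger effective batch.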

@StevenTang1998

Hi, I used one GPU (Tesla V100 SXM2, 32GB) and ran the command from the homepage.
However, I still hit the OOM problem with bsz=16 or bsz=12; if I set bsz to 8, the OOM disappears.
So, is the command on the homepage the one used to reproduce the paper?

@XiangLi1999
Owner

Hi,

It's the command to reproduce the paper's results.

Could you check that you have --fp16 yes and whether it actually turns on half precision? It should, and with half precision bsz=16 should fit.

Side note: I used a single GPU on AWS (I think it was an A100) to run all the XSUM experiments.

@StevenTang1998

I do have --fp16 yes; how can I check whether it actually turns on half precision?

Side note: the A100 on AWS has 40GB of GPU memory rather than 32GB.

@XiangLi1999
Owner

Maybe check whether your stdout contains this line: Using native 16bit precision.
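
(For reference, "native" 16-bit precision here means PyTorch's own automatic mixed precision. A rough sketch of what it amounts to, assuming a CUDA GPU and using a placeholder model/optimizer rather than this repo's code:)

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(16, 512, device="cuda")
y = torch.randn(16, 512, device="cuda")

with torch.cuda.amp.autocast():          # forward pass and loss run in fp16 where safe
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()            # loss scaling avoids fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```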

@StevenTang1998

Thanks!
My stdout does contain: Using native 16bit precision.
It's probably just the GPU memory, so I'll reduce the bsz to train the model.

By the way, what does --mid_dim mean?

@XiangLi1999
Owner

It's the dimension of the MLP's middle layer! (We use an MLP for re-parametrization; see the sketch below.)
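
Roughly, the prefix parameters are pushed through a small MLP whose hidden width is mid_dim. This is only a sketch with illustrative sizes, not the exact module in this repo:

```python
import torch
import torch.nn as nn

preseqlen, n_embd, mid_dim = 5, 1024, 512   # illustrative sizes

prefix_tokens = nn.Embedding(preseqlen, n_embd)   # small trainable prefix
reparam_mlp = nn.Sequential(
    nn.Linear(n_embd, mid_dim),   # --mid_dim sets the width of this middle layer
    nn.Tanh(),
    nn.Linear(mid_dim, n_embd),
)

prefix = reparam_mlp(prefix_tokens(torch.arange(preseqlen)))
print(prefix.shape)   # torch.Size([5, 1024])
```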

@StevenTang1998

Got it! Thanks for your answer!

@taineleau
Author

Thanks! My mistake.
I just figured out that I wasn't changing the bsz in the right place, which is why I still got OOM even with bsz=1.
