
What GPU is needed to finetune the Large version? #27

Closed
Rai220 opened this issue Nov 10, 2020 · 4 comments


Rai220 commented Nov 10, 2020

I have a 16 GB GPU and I get a CUDA out-of-memory error (with batch size = 1!):

RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.76 GiB total capacity; 13.25 GiB already allocated; 21.44 MiB free; 13.84 GiB reserved in total by PyTorch)

Is this really not enough memory to train the Large version? Are there any tips for reducing memory usage during training? I am using the following parameters:

    --per_gpu_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --overwrite_cache \
    --num_train_epochs 2 \
    --save_steps 1000 \
    --block_size 256 \
    --fp16
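
For a rough sense of why 16 GB fills up even at batch size 1, here is a back-of-the-envelope estimate (my own numbers, not from the repo; the ~760M parameter count is an assumption about the Large model, and activations and framework overhead are ignored):

    # Rough memory estimate for finetuning a ~760M-parameter model with Adam
    # in mixed precision. All figures are approximations.
    params = 760_000_000                 # assumed parameter count of the Large model

    fp16_weights = params * 2            # half-precision working weights
    fp32_master  = params * 4            # fp32 master copy kept by mixed precision
    adam_moments = params * 4 * 2        # two fp32 moment buffers (m and v)
    fp16_grads   = params * 2            # gradients

    total = fp16_weights + fp32_master + adam_moments + fp16_grads
    print(f"~{total / 2**30:.1f} GiB before any activations")   # roughly 11 GiB

Even before activations and PyTorch's cached allocations, that is already close to the figure reported in the error above, which is why the suggestions below focus on moving optimizer state off the GPU and on gradient checkpointing.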

OzoneReloaded commented Nov 11, 2020

Hello! I've managed to run finetuning on an 11 GB GPU with:

gpt_options="
--hidden-size 1024
--seq-length 1024
--cpu-optimizer
--cpu_torch_adam
"

Hope it helps. @Rai220
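
For context, here is a conceptual sketch of what offloading the optimizer to the CPU buys you. This is my own illustration of the general technique, not the code behind --cpu-optimizer / --cpu_torch_adam in this repo; a tiny linear layer stands in for the model:

    import torch

    # Keep Adam and its moment buffers in CPU RAM; only the weights, activations
    # and gradients stay on the GPU.
    model = torch.nn.Linear(1024, 1024).cuda()
    cpu_params = [p.detach().cpu().clone().requires_grad_(True) for p in model.parameters()]
    optimizer = torch.optim.Adam(cpu_params, lr=1e-4)   # Adam state lives on the CPU

    def cpu_offloaded_step():
        # copy gradients to the CPU shadow parameters, step there, push weights back
        for gpu_p, cpu_p in zip(model.parameters(), cpu_params):
            cpu_p.grad = gpu_p.grad.detach().cpu()
        optimizer.step()
        optimizer.zero_grad()
        for gpu_p, cpu_p in zip(model.parameters(), cpu_params):
            gpu_p.data.copy_(cpu_p.data)

    x = torch.randn(8, 1024, device="cuda")
    model(x).pow(2).mean().backward()
    cpu_offloaded_step()

The trade-off is extra host-to-device copies per step in exchange for freeing the several gigabytes that Adam's fp32 state would otherwise occupy on the GPU.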


fen0s commented Nov 12, 2020


Apparently, optimization level O3 helps, but I haven't quite figured out how to make it generate samples; it just outputs negative probabilities for some reason. Also, the answer above is for GPT-3 Large, not GPT-2 Large, so...
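
For reference, the O3 level is normally selected through NVIDIA apex's amp API (this is apex's documented interface, not necessarily the exact call used by this repo's scripts; the small layer below stands in for the model):

    import torch
    from apex import amp   # requires NVIDIA apex to be installed

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # O3 casts the whole model to fp16 ("pure" half precision): the largest
    # memory saving, but with no fp32 master weights or dynamic loss scaling,
    # so it is numerically fragile, which is a plausible cause of the broken
    # sample probabilities mentioned above.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O3")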


fen0s commented Nov 12, 2020

Basically, what's needed is gradient checkpointing, which was added in one of the later transformers library versions. I'm not sure I can implement it, especially considering that an old version of the transformers library is used here...
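
For anyone landing here later, a minimal sketch of enabling gradient checkpointing, assuming a transformers version new enough to expose the flag on GPT2Config (the checkpoint name below is the published ruGPT-3 Large model and is my assumption, not taken from this thread):

    from transformers import GPT2Config, GPT2LMHeadModel

    model_name = "sberbank-ai/rugpt3large_based_on_gpt2"   # assumed checkpoint name

    config = GPT2Config.from_pretrained(model_name)
    config.gradient_checkpointing = True   # recompute activations in the backward pass
    config.use_cache = False               # the generation cache conflicts with checkpointing

    model = GPT2LMHeadModel.from_pretrained(model_name, config=config)

Activations are then recomputed during the backward pass instead of being stored for every layer, trading extra compute for a large cut in GPU memory during finetuning.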

TatianaShavrina (Collaborator) commented

Hey @Rai220 @fen0s, the organizers have given participants the opportunity to get access to Cristofari. To request access, please send a brief description of your project to AIJ_ruGPT-3@sberbank.ru. We will review your request and get back to you. Please note that the number of such accesses is limited, so if you need one, please submit your request as early as possible.
