What GPU is needed to finetune the Large version? #27
Hello! I've managed to run finetuning on an 11 GB GPU with:
Hope it helps. @Rai220
Apparently, optimization level O3 helps, but I haven't quite figured out how to make it generate samples; it just outputs negative probabilities for some reason. The above answer is for GPT-3 large, not GPT-2 large, so...
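For a rough sense of why half precision (what pure-fp16 opt levels like O3 use) matters here, halving the bytes per parameter halves the footprint of the weights alone. A back-of-the-envelope sketch, assuming a ~760M-parameter model (an illustrative figure, not an exact count for this repo's models), and ignoring activations, gradients, and optimizer state, which add several times more:

```python
def model_memory_gb(n_params, bytes_per_param):
    """Memory footprint of the raw weights alone, in GiB.
    Excludes activations, gradients, and optimizer state (for Adam
    those roughly triple or quadruple this figure)."""
    return n_params * bytes_per_param / 1024**3

n_params = 760_000_000  # assumed parameter count, for illustration only

fp32 = model_memory_gb(n_params, 4)  # full precision
fp16 = model_memory_gb(n_params, 2)  # half precision (pure-fp16, O3-style)

print(f"fp32 weights: {fp32:.2f} GiB")  # ~2.83 GiB
print(f"fp16 weights: {fp16:.2f} GiB")  # ~1.42 GiB
```

This is only the weights; the OOM errors below come largely from activations, which is where gradient checkpointing helps.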
Basically, what's needed is the gradient checkpointing that was added in one of the transformers library versions. Not sure if I can implement it, especially considering that an old version of the transformers library is used here...
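To make the idea concrete: gradient checkpointing stores activations only at every k-th layer and recomputes the rest on demand during the backward pass, trading extra compute for memory. A pure-Python toy (not the transformers or PyTorch API; plain functions stand in for layers and recomputation stands in for autograd):

```python
def forward(x, layers, checkpoint_every=2):
    """Run layers, caching activations only at every k-th boundary."""
    cache = {0: x}  # layer boundary index -> activation at that point
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % checkpoint_every == 0:
            cache[i + 1] = x
    return x, cache

def activation_at(cache, layers, i):
    """Recompute the activation entering layer i from the nearest
    earlier checkpoint, instead of having stored every activation."""
    start = max(k for k in cache if k <= i)
    x = cache[start]
    for j in range(start, i):
        x = layers[j](x)
    return x

# Toy "layers" that each add a constant.
layers = [lambda v, a=a: v + a for a in (1, 2, 3, 4)]
out, cache = forward(0, layers, checkpoint_every=2)
print(out)                              # 10 = 0+1+2+3+4
print(sorted(cache))                    # [0, 2, 4] -- only checkpoints kept
print(activation_at(cache, layers, 3))  # 6, recomputed from checkpoint 2
```

With n layers and checkpoints every sqrt(n) layers, stored activations drop from O(n) to O(sqrt(n)) at the cost of roughly one extra forward pass, which is why it can squeeze a large model onto an 11-16 GB card.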
Hey @Rai220 @fen0s The organizers gave participants the opportunity to get access to Cristofari. To get access, please send your request with brief information about your project to AIJ_ruGPT-3@sberbank.ru. We will review your request and get back to you. Please note that the number of such accesses is limited; if necessary, please leave your request as early as possible.
I have a 16 GB GPU and get a CUDA out-of-memory error (with batch size = 1!):
RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.76 GiB total capacity; 13.25 GiB already allocated; 21.44 MiB free; 13.84 GiB reserved in total by PyTorch)
Is this memory really not enough to train the large version? Maybe there are some tips to reduce memory usage during pretraining? I'm using the following parameters:
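When even batch size 1 barely fits, a common workaround is gradient accumulation: run several micro-batches, sum their (scaled) gradients, and only then take an optimizer step, so the effective batch size grows while peak memory stays that of one micro-batch. A framework-free sketch; `compute_grad` and `apply_step` are hypothetical placeholders, not functions from this repo:

```python
def train_with_accumulation(micro_batches, accum_steps, compute_grad, apply_step):
    """Accumulate gradients over accum_steps micro-batches so the
    effective batch size is accum_steps * micro_batch_size."""
    accumulated = 0.0
    steps_taken = 0
    for i, batch in enumerate(micro_batches, start=1):
        # Scale each micro-gradient so the sum equals the average.
        accumulated += compute_grad(batch) / accum_steps
        if i % accum_steps == 0:
            apply_step(accumulated)  # one optimizer step per accum window
            accumulated = 0.0
            steps_taken += 1
    return steps_taken

# Toy usage: the "gradient" of a batch is just its value.
updates = []
n = train_with_accumulation([1.0, 2.0, 3.0, 4.0], accum_steps=2,
                            compute_grad=lambda b: b,
                            apply_step=updates.append)
print(n)        # 2 optimizer steps
print(updates)  # [1.5, 3.5] -- averaged gradient per step
```

In a real training loop the accumulator would be each parameter's `.grad` buffer rather than a float, but the control flow is the same.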