remove model from accelerate prepare and add precision argument #61

Merged 9 commits on Apr 21, 2023

@loubnabnl (Collaborator) commented Apr 20, 2023

Passing both the model and the dataloader to accelerate's prepare uses unnecessary memory, as noticed by @RaymondLi0, and causes OOM errors for large models.
This is because the model gets wrapped in the DistributedDataParallel class, which reserves memory for the gradients used in training (issue). We now wrap only the dataloader, and we add a precision argument to load the model properly in bf16 or fp16. (The mixed-precision argument in the accelerate config is meant for mixed-precision training and will load two copies of the model.)
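A minimal sketch of the pattern described above, not the actual diff from this PR: the model is loaded directly in the requested dtype and moved to the device by hand, and only the dataloader goes through prepare. The `--precision` flag name and its choices, along with the helper names `DTYPE_NAMES`, `parse_precision`, and `load_for_inference`, are assumptions made for illustration.

```python
import argparse

# Hypothetical mapping from a --precision flag to torch dtype attribute names.
# The flag's real name and choices are not shown in the PR description.
DTYPE_NAMES = {"fp32": "float32", "fp16": "float16", "bf16": "bfloat16"}


def parse_precision(argv=None):
    """Parse the (assumed) --precision flag and return its string value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--precision", choices=sorted(DTYPE_NAMES), default="fp32")
    return parser.parse_args(argv).precision


def load_for_inference(model_name, dataset, precision="bf16", batch_size=1):
    """Load one copy of the model in `precision`; prepare only the dataloader."""
    # Heavy imports stay local so the helpers above work without them installed.
    import torch
    from accelerate import Accelerator
    from torch.utils.data import DataLoader
    from transformers import AutoModelForCausalLM

    accelerator = Accelerator()
    # Load the weights directly in the requested dtype (a single model copy).
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=getattr(torch, DTYPE_NAMES[precision])
    )
    # Move the model to this process's device manually instead of passing it
    # to prepare(), so it is not wrapped in DistributedDataParallel.
    model = model.to(accelerator.device).eval()
    # Only the dataloader is prepared, so batches are split across processes.
    dataloader = accelerator.prepare(DataLoader(dataset, batch_size=batch_size))
    return accelerator, model, dataloader
```

Under these assumptions, a script would call `load_for_inference(name, dataset, precision=parse_precision())` and run generation under `torch.no_grad()`, since no gradients are needed for evaluation.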

TODO: add the CPU case.

Review thread on lm_eval/generation.py (outdated, resolved)
@loubnabnl loubnabnl merged commit 705b007 into main Apr 21, 2023
@loubnabnl loubnabnl deleted the handle-large-model branch June 12, 2023 14:52
2 participants