
Loading over multiple GPUs in 8-bit and 4-bit with the transformers loader #5

Open
RandomInternetPreson opened this issue Mar 28, 2024 · 4 comments

Comments

@RandomInternetPreson

I can load the instruct model with the transformers loader and 8-bit bitsandbytes, and it loads evenly across multiple GPUs.

However, I cannot load the model in 4-bit precision over multiple GPUs. I managed to get the model to fill one 24GB GPU and start loading onto a second GPU of equivalent size, but it will not move on to any of the remaining GPUs (7 in total). It OOMs on the second GPU while the others sit empty.

I've loaded other transformers-based models in 4-bit and never experienced this heavily unbalanced loading before.
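
For reference, a minimal sketch of the kind of 4-bit multi-GPU load this describes, assuming transformers with accelerate installed; the model id and the per-GPU memory cap are placeholders, not values from this issue:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder model id -- substitute the instruct model being loaded.
model_id = "org/instruct-model"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Capping per-GPU memory can push accelerate's "auto" placement to spread
# weights across all visible cards instead of filling the first ones and OOMing.
max_memory = {i: "20GiB" for i in range(torch.cuda.device_count())}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    max_memory=max_memory,
)
```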

@jzwilliams07

How do I load the model in 8-bit?

@lhl

lhl commented Mar 28, 2024

I have this same issue: I get an OOM, or only a single GPU is used, if I try to use bitsandbytes (load_in_8bit or load_in_4bit)...

@huhuhu5798

How do I load the model in 8-bit or 4-bit?

@RandomInternetPreson (Author)

The bitsandbytes library and a lot of RAM.
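
Concretely, a minimal 8-bit sketch with transformers and bitsandbytes (the model id below is a placeholder); device_map="auto" lets accelerate shard layers across whatever GPUs are visible:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/instruct-model"  # placeholder -- use the actual model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # shard layers across available GPUs via accelerate
)
```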
