
Running Bloom #52

Open
kamalkraj opened this issue Jul 7, 2022 · 5 comments
Comments

@kamalkraj

What kind of machine is required to just run the inference on the 176B model? https://huggingface.co/bigscience/bloom

@aljungberg commented Aug 22, 2022

Right now, 8×80 GB A100s or 16×40 GB A100 GPUs. With the "accelerate" library you get offloading, though, so as long as you have enough CPU RAM, or even just disk space, for the ~300 GB of weights, you're good to go (just slower). See the loading sketch at the end of this comment.

Source: https://www.infoq.com/news/2022/07/bigscience-bloom-nlp-ai/

According to this post you can run it on consumer hardware at 3 minutes/token.

According to this post, even on pretty good GPU hardware it can take 90 seconds/token. It seems you need a really upper-range system to run it quickly.
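For reference, a minimal loading sketch assuming the `transformers` + `accelerate` stack; `device_map="auto"` and `offload_folder` are the offloading knobs mentioned above, and the folder name is just a placeholder:

```python
# Sketch: load BLOOM with accelerate's automatic placement/offloading.
# Assumes transformers and accelerate are installed; weights that don't fit
# on the GPUs spill to CPU RAM and then to the offload folder on disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # let accelerate place layers on GPU / CPU / disk
    torch_dtype=torch.bfloat16,  # BLOOM was trained in bfloat16
    offload_folder="offload",    # placeholder path for disk spill
)

inputs = tokenizer("BLOOM is", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation works as usual afterwards; the speed penalty comes from copying offloaded layers onto the GPU at every forward pass.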

@celsofranssa

For inference only, what are the minimum requirements for CPU RAM and GPU memory?

@aljungberg

About 350 GB of GPU RAM (~200 GB if you quantise to int8).
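A rough sketch of the int8 path, assuming `bitsandbytes` is installed alongside `transformers` and `accelerate` (the `load_in_8bit` flag quantises the linear layers at load time, roughly halving the footprint versus bf16):

```python
# Sketch: load BLOOM in int8 via bitsandbytes (~350 GB -> ~200 GB for 176B).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # weights still need to be spread across available GPUs
    load_in_8bit=True,   # quantise linear layers to int8 at load time
)
```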

@celsofranssa

> About 350 GB of GPU RAM (~200 GB if you quantise to int8).

For inference only?

@aljungberg

Yep, you need to get all those parameters into GPU RAM to run inference. Like I mentioned, you can use the accelerate framework to do "swapping" from CPU RAM to GPU RAM, which lets you run with much less GPU RAM at a ridiculous speed penalty. A sketch of that setup is below.
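Something like this, where the per-device limits in `max_memory` are made-up numbers (not recommendations) just to force most of the weights off the GPU:

```python
# Sketch: cap GPU memory so accelerate "swaps" the remaining weights to
# CPU RAM (and disk). Runs with far less GPU RAM, but each offloaded layer
# is copied to the GPU on every forward pass, hence the slowdown.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_memory={0: "40GiB", "cpu": "250GiB"},  # placeholder caps per device
    offload_folder="offload",                  # anything over the caps goes to disk
)
```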
