
Running Bloom #52

Open
kamalkraj opened this issue Jul 7, 2022 · 5 comments
Comments

@kamalkraj

What kind of machine is required to just run the inference on the 176B model? https://huggingface.co/bigscience/bloom

@aljungberg commented Aug 22, 2022

Right now, 8×80 GB A100s or 16×40 GB A100 GPUs. With the "accelerate" library you get offloading, though, so as long as you have enough CPU RAM, or even just disk space, for the ~300 GB of weights, you're good to go (just slower). See the loading sketch at the end of this comment.

Source: https://www.infoq.com/news/2022/07/bigscience-bloom-nlp-ai/

According to this post you can run it on consumer hardware at 3 minutes/token.

According to this post, even on pretty good GPU hardware it can take 90 seconds/token. It seems you need a really upper-range system to run it quickly.
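For reference, a minimal loading sketch assuming the `transformers` + `accelerate` stack; `device_map="auto"` and `offload_folder` are the offloading knobs mentioned above, and the folder name is just a placeholder:

```python
# Sketch: load BLOOM with accelerate's automatic placement/offloading.
# Assumes transformers and accelerate are installed; weights that don't fit
# on the GPUs spill to CPU RAM and then to the offload folder on disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # let accelerate place layers on GPU / CPU / disk
    torch_dtype=torch.bfloat16,  # BLOOM was trained in bfloat16
    offload_folder="offload",    # placeholder path for disk spill
)

inputs = tokenizer("BLOOM is", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation works as usual afterwards; the speed penalty comes from copying offloaded layers onto the GPU at every forward pass.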

@celsofranssa

For inference only, what are the minimum requirements for CPU RAM and GPU memory?

@aljungberg

About 350 GB of GPU RAM (~200 GB if you quantise to int8).
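A rough sketch of the int8 path, assuming `bitsandbytes` is installed alongside `transformers` and `accelerate` (the `load_in_8bit` flag quantises the linear layers at load time, roughly halving the footprint versus bf16):

```python
# Sketch: load BLOOM in int8 via bitsandbytes (~350 GB -> ~200 GB for 176B).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # weights still need to be spread across available GPUs
    load_in_8bit=True,   # quantise linear layers to int8 at load time
)
```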

@celsofranssa

> About 350 GB of GPU RAM (~200 GB if you quantise to int8).

For inference only?

@aljungberg

Yep, you need to get all those parameters into GPU RAM to run inference. Like I mentioned, you can use the accelerate framework to do "swapping" from CPU RAM to GPU RAM, which lets you run with much less GPU RAM at a ridiculous speed penalty. A sketch of that setup is below.
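Something like this, where the per-device limits in `max_memory` are made-up numbers (not recommendations) just to force most of the weights off the GPU:

```python
# Sketch: cap GPU memory so accelerate "swaps" the remaining weights to
# CPU RAM (and disk). Runs with far less GPU RAM, but each offloaded layer
# is copied to the GPU on every forward pass, hence the slowdown.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_memory={0: "40GiB", "cpu": "250GiB"},  # placeholder caps per device
    offload_folder="offload",                  # anything over the caps goes to disk
)
```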
