Description
I tried to use 8xA100 to run BLOOM, but I cannot get `load_in_8bit` to work. Following the instructions here, I load the model with `model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', load_in_8bit=True, max_memory=max_memory)`. Basically, if I don't pass `max_memory=max_memory`, most of the memory goes to gpu:0 and I get a CUDA out-of-memory error. If I do pass `max_memory=max_memory`, it throws "8-bit operations are not supported under CPU".

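For reference, here is a minimal sketch of the loading call, assuming the `bigscience/bloom` checkpoint and a 40GiB budget per A100 (both the checkpoint name and the per-GPU cap are my assumptions); the key point is that `max_memory` lists only GPU devices, with no `"cpu"` entry, so the auto device map should not try to offload any int8 weights to CPU:

```python
# Minimal sketch of the setup, assuming the bigscience/bloom checkpoint
# and 8x A100-40GB; the "40GiB" per-GPU budget is a guess at a safe cap.
from transformers import AutoModelForCausalLM

model_name = "bigscience/bloom"

# Budget every GPU explicitly and omit a "cpu" entry, so that
# device_map="auto" does not place int8 weights on CPU (which is what
# triggers the "8-bit operations are not supported under CPU" error).
max_memory = {i: "40GiB" for i in range(8)}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # shard layers across the visible GPUs
    load_in_8bit=True,    # bitsandbytes int8 quantization
    max_memory=max_memory,
)
```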