Description
I tried to use 8xA100 to run BLOOM, but I cannot get `load_in_8bit` to work. Following the instructions here, I load the model with `model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', load_in_8bit=True, max_memory=max_memory)`. Basically, if I don't pass `max_memory=max_memory`, most of the memory goes to gpu:0 and I get a CUDA out-of-memory error. If I do pass `max_memory=max_memory`, it throws "8-bit operations are not supported under CPU".

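For reference, here is a minimal sketch of the loading call, assuming the `bigscience/bloom` checkpoint and a 40GiB budget per A100 (both the checkpoint name and the per-GPU cap are my assumptions); the key point is that `max_memory` lists only GPU devices, with no `"cpu"` entry, so the auto device map should not try to offload any int8 weights to CPU:

```python
# Minimal sketch of the setup, assuming the bigscience/bloom checkpoint
# and 8x A100-40GB; the "40GiB" per-GPU budget is a guess at a safe cap.
from transformers import AutoModelForCausalLM

model_name = "bigscience/bloom"

# Budget every GPU explicitly and omit a "cpu" entry, so that
# device_map="auto" does not place int8 weights on CPU (which is what
# triggers the "8-bit operations are not supported under CPU" error).
max_memory = {i: "40GiB" for i in range(8)}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # shard layers across the visible GPUs
    load_in_8bit=True,    # bitsandbytes int8 quantization
    max_memory=max_memory,
)
```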