Hello, thanks for your great work making AutoGPTQ more usable.
I want to load a GPTQ model and run inference with the BitBLAS backend.
The corresponding model loading code is:
```python
import torch
from gptqmodel import GPTQModel
# quant_config and get_backend are defined/imported elsewhere in my script

model = GPTQModel.from_quantized(
    args.model,
    device_map='auto',
    torch_dtype=torch.float16,
    quantize_config=quant_config,
    backend=get_backend('BITBLAS'),
)
```
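For reference, the inference side is just a plain `generate` call on the returned model, roughly like the sketch below (the prompt, tokenizer path, and generation settings are illustrative, not the exact script I run):

```python
from transformers import AutoTokenizer

# Tokenizer is loaded from the same path as the quantized model.
tokenizer = AutoTokenizer.from_pretrained(args.model)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```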
This works for small models (4-bit 7B or 2-bit 13B), but it fails with larger models such as 4-bit 13B. The raised error is:
