Hello, thanks for your great work making AutoGPTQ more usable.
I want to load a GPTQ model and run inference with the BitBLAS backend.
The corresponding model loading code is:
```python
import torch
from gptqmodel import GPTQModel
# quant_config and get_backend are defined/imported elsewhere in my script

model = GPTQModel.from_quantized(
    args.model,
    device_map='auto',
    torch_dtype=torch.float16,
    quantize_config=quant_config,
    backend=get_backend('BITBLAS'),
)
```
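For reference, the inference side is just a plain `generate` call on the returned model, roughly like the sketch below (the prompt, tokenizer path, and generation settings are illustrative, not the exact script I run):

```python
from transformers import AutoTokenizer

# Tokenizer is loaded from the same path as the quantized model.
tokenizer = AutoTokenizer.from_pretrained(args.model)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```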
This works for small models (4-bit 7B or 2-bit 13B), but it fails with larger models such as 4-bit 13B. The raised error is:
