Can I use tensor_parallel to inference for a GPTQ quantized model? #131

minlik · 2023-11-15T09:40:48Z

What should I do if I want to use tensor_parallel to run inference with a GPTQ-quantized model (Llama-2-7b-Chat-GPTQ, for example) on 2 or more GPUs?

Currently, I am using AutoGPTQ to load the quantized model and then calling tp.tensor_parallel to distribute the tensors across different devices. But I am getting the following error: TypeError: cannot pickle 'module' object
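
For context, Python raises this message whenever a module object is pickled or deep-copied. My guess (an assumption, I have not traced it) is that the quantized model holds a reference to a Python module somewhere and that something in the sharding path copies it. Below is a minimal stdlib-only sketch of that failure mode. FakeQuantLayer and try_copy are hypothetical names made up for illustration; they are not part of AutoGPTQ or tensor_parallel.

```python
import copy
import math


class FakeQuantLayer:
    """Hypothetical stand-in for a layer that stores a module as an attribute."""

    def __init__(self):
        self.backend = math  # a module object, which cannot be deep-copied
        self.bits = 4        # ordinary data copies without trouble


def try_copy(obj):
    """Return the TypeError message raised by deepcopy, or None on success."""
    try:
        copy.deepcopy(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Plain data deep-copies fine, so this prints None.
print(try_copy({"bits": 4}))

# The module attribute makes the copy fail; this prints the error message.
print(try_copy(FakeQuantLayer()))
```

If this is what is happening, finding the attribute that holds the module would at least narrow down where the incompatibility is.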

Do you have any suggestions on this? Thanks.
