Support sharded quantized model files in from_quantized
#319
Comments
Try changing your MODEL_BASENAME to
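For reference, here is a minimal sketch of that suggestion; the directory and basename below are placeholders, not the actual values from this thread:

```python
from auto_gptq import AutoGPTQForCausalLM

# model_basename is the checkpoint filename without its extension.
# The directory and basename here are placeholders.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model-dir",
    model_basename="gptq_model-4bit-32g",  # must match <basename>.safetensors on disk
    use_safetensors=True,
    device="cuda:0",
)
```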
I met the same problem when running inference with Qwen-VL-Chat-Int4 via AutoGPTQForCausalLM.from_quantized. I checked, and the quantized model files are named model-00001-of-00005.safetensors, model-00002-of-00005.safetensors, and so on. How should I solve this?
Hi @shakealeg, the model loading was improved in #383. Please pass the argument
@CrazyBrick There is a WIP PR to support sharded models in AutoGPTQ: #364
Hi @fxmarty, thank you for your reply. I changed the code according to PR #364 (mainly in ``), but it doesn't work.
It still won't find the whole model through a single shard name. The following is the code I am using:
What should I do?
Hi @CrazyBrick, the model you are trying to use has sharded checkpoints. This is unfortunately not currently supported in AutoGPTQ; there is a WIP PR open for it: #364
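Until that lands, one workaround sometimes used is to merge the shards into a single safetensors file that from_quantized can then load via model_basename. This is only a sketch under that assumption: whether it works depends on the shard tensor names matching what AutoGPTQ expects, and the paths below are placeholders.

```python
import glob
from safetensors.torch import load_file, save_file

# Merge sharded weights (model-0000x-of-0000y.safetensors) into one file.
# Paths are placeholders; adjust to the actual checkpoint directory.
shard_paths = sorted(glob.glob("path/to/quantized-model-dir/model-*-of-*.safetensors"))

merged = {}
for shard in shard_paths:
    # Each shard holds a disjoint set of tensors, so a plain update() merges them.
    merged.update(load_file(shard))

save_file(merged, "path/to/quantized-model-dir/gptq_model-merged.safetensors")
# Afterwards, pass model_basename="gptq_model-merged" to from_quantized.
```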
Did you solve the bug? I met the same situation.
I've been using 0cc4m's GPTQ for a while and it's been smooth sailing, no errors. But when I try AutoGPTQ, it just doesn't work. I'm trying to load Pygmalion 7B 4bit 32g from TehVenom, and I get the following error:
My file structure looks like this:
My code is the following:
Overall, I'm confused about how to fix this and how AutoGPTQ is meant to be used. If anyone could help, that would be appreciated.
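One common cause of failures like this is a model_basename that does not match the checkpoint filename on disk. As a rough sketch (the directory and filename here are hypothetical, not taken from this thread), the basename can be derived from the .safetensors file itself:

```python
import os
from auto_gptq import AutoGPTQForCausalLM

# model_basename must match the quantized checkpoint's filename minus its
# extension. The directory and filename below are hypothetical examples.
model_dir = "models/pygmalion-7b-4bit-32g"

# e.g. if the directory contains "4bit-32g.safetensors", the basename is "4bit-32g"
ckpt = next(f for f in os.listdir(model_dir) if f.endswith(".safetensors"))
basename = ckpt[: -len(".safetensors")]

model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename=basename,
    use_safetensors=True,
    device="cuda:0",
)
```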