[BUG] Fine-tuning Qwen-7B-Chat-Int4 fails: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU #385
Comments
Same question. |
I also faced this error when I tried to quantize the model |
You just need to add |
|
Have you solved it? I faced the same question. |
Hello, I've tried the following code to change this config, but the error still remains:

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                    trust_remote_code=True)
config.disable_exllama = True
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                             config=config,
                                             device_map="cpu",
                                             trust_remote_code=True).eval()

Could you tell me if I'm doing something wrong?

Edit: the following code skips this error, but as JustinLin610 said, int4 does not work on CPU:

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                    trust_remote_code=True)
config.quantization_config["disable_exllama"] = True
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                             config=config,
                                             device_map="cpu",
                                             trust_remote_code=True).eval() |
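For reference, a minimal sketch of the same override passed at load time instead of mutating the loaded config. It assumes a transformers version whose GPTQConfig still accepts disable_exllama (newer releases spell it use_exllama); treat the version behavior as an assumption and check your installed release.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Override the checkpoint's stored quantization config at load time.
# disable_exllama=True keeps the GPTQ int4 weights but skips the exllama kernels,
# which is what the error asks for when some modules land on cpu/disk.
gptq_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
).eval()
```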
Change the config.json in your model folder and add "disable_exllama": true to the quantization_config section. |
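For illustration, a minimal sketch of that edit done programmatically; the local path is hypothetical, so point it at wherever the checkpoint was actually downloaded.

```python
import json
from pathlib import Path

# Hypothetical local path to the downloaded Qwen-7B-Chat-Int4 checkpoint.
config_path = Path("Qwen-7B-Chat-Int4") / "config.json"

config = json.loads(config_path.read_text())
# Add the flag to the existing quantization_config section.
config.setdefault("quantization_config", {})["disable_exllama"] = True
config_path.write_text(json.dumps(config, indent=2))
```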
I think you guys are using int4 models on CPU. It is not supported! If you would like to use it on CPU, I advise you to check our new project qwen.cpp |
Same problem. Just add disable_exllama=True to the quantization_config field of the config.json file. |
I also face this problem. I added disable_exllama: true to 7B-int4/quantize_config.json, but the problem is still there. How can I fix this? |
Sorry, I made a mistake in the file name. I guess adding disable_exllama: true to the quantization_config section of config.json is the right way to fix this. Am I right? |
This setting makes the whole inference much slower. The correct approach is: AutoGPTQ/AutoGPTQ#406 |
Everyone, several different problems are mixed together here:
|
Hello, chat-int4 runs fine on an A100, but on a Tesla T4 it fails with this error whether I use a single GPU or multiple GPUs.
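As a diagnostic sketch (not from the thread): since the error complains about modules landing on cpu/disk, it can help to print where accelerate actually placed each module after loading with a device map.

```python
from transformers import AutoModelForCausalLM

# Load with an automatic device map and inspect where each module ended up.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True,
).eval()

# Any entry mapped to "cpu" or "disk" here is what triggers the exllama error.
print(model.hf_device_map)
```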
How do I export the int4 weights? When I load the model and print it, the weights are all float16.
They should be in the model folder.
Then if I want to fine-tune the int4 model on a GPU, which part should I modify? I can't find where to change it in finetune.py.
I tried modifying quantization_config in config.json to set "use_exllama": false, which solved the problem. |
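A minimal sketch of the equivalent override on newer transformers releases, where GPTQConfig exposes use_exllama instead of disable_exllama; the exact version cutoff is an assumption, so check the GPTQConfig signature you have installed.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# use_exllama=False is the newer spelling of disable_exllama=True.
gptq_config = GPTQConfig(bits=4, use_exllama=False)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
).eval()
```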
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
Fine-tuning Qwen-7B-Chat-Int4 fails with the error below. What is the cause?
File "/home/llm/qwen/fine-tune/finetune.py", line 353, in
train()
File "/home/llm/qwen/fine-tune/finetune.py", line 294, in train
model = transformers.AutoModelForCausalLM.from_pretrained(
File "/home/.conda/envs/qwen_env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 511, in from_pretrained
return model_class.from_pretrained(
File "/home/.conda/envs/qwen_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3161, in from_pretrained
model = quantizer.post_init_model(model)
File "/home/jing.yu/.conda/envs/qwen_env/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting
disable_exllama=True
in the quantization config object期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
No response
运行环境 | Environment
备注 | Anything else?
No response