
[BUG] Qwen-7B-Chat-Int4 fine-tuning error: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU #385

Closed
2 tasks done
studyhardstudyhard opened this issue Sep 28, 2023 · 18 comments

Comments

@studyhardstudyhard

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched the FAQ

Current Behavior

Fine-tuning Qwen-7B-Chat-Int4 fails with the following error. What is the cause?
File "/home/llm/qwen/fine-tune/finetune.py", line 353, in
train()
File "/home/llm/qwen/fine-tune/finetune.py", line 294, in train
model = transformers.AutoModelForCausalLM.from_pretrained(
File "/home/.conda/envs/qwen_env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 511, in from_pretrained
return model_class.from_pretrained(
File "/home/.conda/envs/qwen_env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3161, in from_pretrained
model = quantizer.post_init_model(model)
File "/home/jing.yu/.conda/envs/qwen_env/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

@Moemu

Moemu commented Sep 29, 2023

Same issue here.

@omega-accelr

I also faced this error when I tried to quantize the model

@Tejaswgupta

You just need to add disable_exllama=True in the config.json

@zhuqiangqiangqiang

You just need to add disable_exllama=True in the config.json
May I ask where config.json is located?

@csuer411

csuer411 commented Oct 4, 2023

Have you solved it? I'm facing the same issue.

@dragove

dragove commented Oct 5, 2023

You just need to add disable_exllama=True in the config.json

Hello, I've tried the following code to change this config, but the error still remains.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                    trust_remote_code=True)
config.disable_exllama = True
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                             config=config,
                                             device_map="cpu",
                                             trust_remote_code=True).eval()

Could you tell me if I'm doing something wrong?

Edit: the following code skips this error, but as JustinLin610 said, int4 does not work on CPU.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                    trust_remote_code=True)
# set the flag inside the quantization config rather than on the top-level config
config.quantization_config["disable_exllama"] = True
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat-Int4",
                                             config=config,
                                             device_map="cpu",
                                             trust_remote_code=True).eval()

@ilovesouthpark

Change config.json in your model folder and add "disable_exllama": true to its quantization_config section, for example as in the excerpt below.
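
For illustration only, a minimal sketch of what that edit might look like; the model's existing quantization_config keys are elided as "......", and note that newer transformers versions read "use_exllama": false instead, as discussed further down in this thread:

{
  ......
  "quantization_config": {
    ......
    "disable_exllama": true
  },
  ......
}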

@JustinLin610
Member

I think you guys are using int4 models on CPU. It is not supported! If you would like to use it on CPU, I advise you to check our new project qwen.cpp

@jklj077 jklj077 changed the title from "[BUG] <title>" to "[BUG] Qwen-7B-Chat-Int4 fine-tuning error: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU" Oct 8, 2023
@x1ngzai

x1ngzai commented Oct 8, 2023

Same problem. Just add disable_exllama=True in the quantization_config field of the config.json file.

@sjhm131

sjhm131 commented Oct 26, 2023

I also face this problem. I added disable_exllama: true to 7B-int4/quantize_config.json, but the problem is still there. How can I fix this?

@sjhm131

sjhm131 commented Oct 26, 2023

I also face this problem. I added disable_exllama: true to 7B-int4/quantize_config.json, but the problem is still there. How can I fix this?

Sorry, I made a mistake in the file name. I guess adding disable_exllama: true to the quantization_config section of config.json is the right way to fix this. Am I right?

@tigerinus

Same problem. Just add disable_exllama=True in the quantization_config field of the config.json file.

This setting makes the whole inference much slower.

The correct approach is: AutoGPTQ/AutoGPTQ#406

@jklj077
Contributor

jklj077 commented Nov 8, 2023

Everyone, several different problems are mixed together here:

  1. Quantized models on CPU: older AutoGPTQ (< 0.5.0) does not support CPU inference at all; newer AutoGPTQ has experimental support for it.
  2. Quantized models on GPU, but exllama raises an error:
    • exllama provides an efficient kernel implementation. It only supports int4 models quantized with GPTQ on modern GPUs, and it requires all model parameters to be on the GPU. Older AutoGPTQ supports the exllama kernel, and the newer version (0.5.0) supports the exllama v2 kernel; it can be toggled on or off, which affects speed and GPU memory usage, see AutoGPTQ's [benchmark](https://github.com/huggingface/optimum/tree/main/tests/benchmark#batch-size--1).
    • Some of our example code uses device_map="auto" to relieve RAM/VRAM pressure when loading large models, which may place part of the parameters in CPU memory (check the model's hf_device_map to confirm the placement is reasonable). Change it to device_map="cuda:0" so that everything is loaded onto the first GPU (or use a similar approach; the point is to keep the whole model on GPU, see the sketch after this list). If the hardware is supported and the software versions match, this resolves the "Found modules on cpu/disk" error.
      • The device_map argument is supported via Hugging Face Accelerate; its semantics are not equivalent to device, and it is not easy to move the model parameters afterwards.
      • If your RAM/VRAM really is not enough, keep device_map="auto" and turn exllama off.
    • If exllama is not supported in your setup, for example int8 models or older GPUs, turn it off; inference still runs on GPU. Set it in the quantization_config field of config.json or in code: older transformers use disable_exllama=True, newer transformers use use_exllama=False, depending on your transformers version.
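
Not official example code, just a minimal sketch of the two options above; it reuses the loading pattern from the earlier snippets in this thread and assumes a single visible GPU (cuda:0):

from transformers import AutoConfig, AutoModelForCausalLM

# Option 1: put every module on one GPU so the exllama kernels can be used.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    device_map="cuda:0",      # instead of "auto"; nothing ends up on cpu/disk
    trust_remote_code=True,
).eval()
print(model.hf_device_map)    # verify that all modules were placed on cuda:0

# Option 2: if VRAM is insufficient, keep device_map="auto" and turn exllama off.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int4", trust_remote_code=True)
config.quantization_config["disable_exllama"] = True    # older transformers
# config.quantization_config["use_exllama"] = False     # newer transformers
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    config=config,
    device_map="auto",
    trust_remote_code=True,
).eval()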

@jklj077 jklj077 closed this as completed Nov 8, 2023
@CrazyBrick

Everyone, several different problems are mixed together here: […]

Hello, Chat-Int4 runs fine on an A100, but on a Tesla T4 it fails with "no kernel image is available for execution on the device", regardless of whether I use one GPU or several. What could be the cause? (Both machines use torch 2.0.1, the drivers support CUDA 11.7, and torch itself works.) The error only occurs on the T4.

@ColdCodeCool

How can I export the int4 weights? When I load the model and print the weights, they are all float16.

@PlanetesDDH

May I ask where config.json is located?

It should be in the model folder.

@danjuan-77

Everyone, several different problems are mixed together here: […]

If I want to fine-tune the int4 model on GPU, which part should I modify? I can't find where to change this in finetune.py.

@feb-cloud

I tried modifying quantization_config in config.json to set {"use_exllama": false}, which solved the problem.
In config.json the default value of use_exllama is true; change it to false as in the following example.

{
  ......
  "quantization_config": {
    "use_exllama": false
  },
  ......
}
