[BUG] 'ZeRO3 is incompatible with LoRA when finetuning on base model.' #1104

Closed
hxhcreate opened this issue Feb 29, 2024 · 5 comments

@hxhcreate

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Expected Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

I would like to know why the code needs the following block. Could someone explain?

if (
    training_args.use_lora
    and not lora_args.q_lora
    and deepspeed.is_deepspeed_zero3_enabled()
    and not is_chat_model
):
    raise RuntimeError(
        'ZeRO3 is incompatible with LoRA when finetuning on base model.'
    )
@jklj077
Contributor

jklj077 commented Feb 29, 2024

Due to incompatibilities, DeepSpeed ZeRO3 and LoRA cannot be used together when fine-tuning a base model. Kindly refer to the README file for further explanation, as this issue has already been addressed there.

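Given the check quoted above, the combinations that do pass for a base model are ZeRO stage 2 with LoRA, or ZeRO stage 3 with Q-LoRA. As a minimal illustrative sketch (standard DeepSpeed keys only; the values are placeholders, not a copy of the repo's own config files), a ZeRO-2 setup looks roughly like this:

# Illustrative ZeRO stage-2 config: optimizer states and gradients are sharded,
# but parameters are not, so the parameter-partitioning issue discussed below
# does not arise.
ds_config_zero2 = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

A dict like this can be passed to the Hugging Face Trainer via TrainingArguments(deepspeed=ds_config_zero2), or saved as JSON and handed to the launcher with --deepspeed <path>.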

@jklj077 jklj077 closed this as completed Feb 29, 2024
@hxhcreate
Author

Thanks for your help! I have read the README.

But my question is mainly about why "if we have these parameters trainable, it is not available to use ZeRO 3".

Thanks very much

@jklj077
Contributor

jklj077 commented Mar 1, 2024

The peft library employs a distinctive technique to render parameters trainable, as evident in its implementation of ModulesToSaveWrapper. This approach has been known to hinder ZeRO Stage 3's parameter partitioning under specific configurations. However, this issue appears to have been addressed in a recent pull request on the huggingface/peft repository (huggingface/peft#1450). We strongly encourage you to review this update.
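To make the failure mode concrete, here is an illustrative sketch (not a verbatim excerpt from finetune.py; is_chat_model, model, and the Qwen(1.0) module names are taken from the surrounding script) of how the base-model branch sets up LoRA:

from peft import LoraConfig, get_peft_model

# When finetuning a base model, the embedding and output layers also need full
# training, so they are listed in modules_to_save. peft wraps each of them in
# ModulesToSaveWrapper, which creates a trainable copy of the wrapped module;
# under ZeRO-3 those weights are already partitioned across ranks, so the copy
# is made from sharded (locally empty) tensors -- the breakage the RuntimeError
# above guards against, and what huggingface/peft#1450 addresses.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # LoRA adapters on attention/MLP projections
    modules_to_save=None if is_chat_model else ["wte", "lm_head"],  # fully trained only for base models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)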

Please note that we have previously emphasized that the Qwen(1.0) codebase and models are no longer subject to further updates. Therefore, for access to the latest features and ongoing support, we advise users to migrate their work to Qwen1.5.

@hxhcreate
Author

✅ Got it, thanks for your kind reply.

@1424153694

> Due to incompatibilities, DeepSpeed ZeRO3 and LoRA cannot be used together when fine-tuning a base model. Kindly refer to the README file for further explanation, as this issue has already been addressed there.

Hello, I am doing LoRA finetuning of the Qwen-14B-Chat model with ZeRO-3 and still run into this problem. Also, when I do LoRA finetuning on 8 RTX 4090 GPUs with ZeRO-2, GPU memory overflows. How is the finetuning memory footprint calculated in the multi-GPU case? Looking forward to your reply.
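As a rough back-of-envelope sketch of the memory question (assumptions: roughly 14B parameters for Qwen-14B, 2-byte bf16 weights, and ZeRO-2 semantics, where optimizer states and gradients are sharded across GPUs but the parameters are not):

# Each GPU keeps a full copy of the frozen base model under ZeRO-2; only the
# optimizer states and gradients (here mostly the small LoRA adapters) are sharded.
params = 14.2e9                 # approx. parameter count of Qwen-14B (assumption)
bytes_per_param = 2             # bf16 / fp16
weights_gib = params * bytes_per_param / 1024**3
print(f"base weights per GPU: ~{weights_gib:.1f} GiB")   # ~26.4 GiB, vs. 24 GB on an RTX 4090
# Activations and framework overhead come on top of this, so a 24 GB card can
# run out of memory even with 8-way data parallelism unless the parameters
# themselves are sharded (ZeRO-3) or offloaded to CPU.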
