[BUG] 'ZeRO3 is incompatible with LoRA when finetuning on base model.' #1104

Closed
hxhcreate opened this issue Feb 29, 2024 · 5 comments

@hxhcreate

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Expected Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

I would like to know why the code needs the following block. Could someone explain?

if (
    training_args.use_lora
    and not lora_args.q_lora
    and deepspeed.is_deepspeed_zero3_enabled()
    and not is_chat_model
):
    raise RuntimeError(
        'ZeRO3 is incompatible with LoRA when finetuning on base model.'
    )
@jklj077
Contributor

jklj077 commented Feb 29, 2024

Due to incompatibilities, DeepSpeed ZeRO3 and LoRA cannot be used together when fine-tuning a base model. Kindly refer to the README file for further explanation, as this issue has already been addressed there.

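Given the check quoted above, the combinations that do pass for a base model are ZeRO stage 2 with LoRA, or ZeRO stage 3 with Q-LoRA. As a minimal illustrative sketch (standard DeepSpeed keys only; the values are placeholders, not a copy of the repo's own config files), a ZeRO-2 setup looks roughly like this:

# Illustrative ZeRO stage-2 config: optimizer states and gradients are sharded,
# but parameters are not, so the parameter-partitioning issue discussed below
# does not arise.
ds_config_zero2 = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

A dict like this can be passed to the Hugging Face Trainer via TrainingArguments(deepspeed=ds_config_zero2), or saved as JSON and handed to the launcher with --deepspeed <path>.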

@jklj077 jklj077 closed this as completed Feb 29, 2024
@hxhcreate
Author

Thanks for your help! I have read the README.

But my question is mainly about why "if we have these parameters trainable, it is not available to use ZeRO 3".

Thanks very much

@jklj077
Contributor

jklj077 commented Mar 1, 2024

The peft library employs a distinctive technique to render parameters trainable, as evident in its implementation of ModulesToSaveWrapper. This approach has been known to hinder ZeRO Stage 3's parameter partitioning under specific configurations. However, this issue appears to have been addressed in a recent pull request on the huggingface/peft repository (huggingface/peft#1450). We strongly encourage you to review this update.
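To make the failure mode concrete, here is an illustrative sketch (not a verbatim excerpt from finetune.py; is_chat_model, model, and the Qwen(1.0) module names are taken from the surrounding script) of how the base-model branch sets up LoRA:

from peft import LoraConfig, get_peft_model

# When finetuning a base model, the embedding and output layers also need full
# training, so they are listed in modules_to_save. peft wraps each of them in
# ModulesToSaveWrapper, which creates a trainable copy of the wrapped module;
# under ZeRO-3 those weights are already partitioned across ranks, so the copy
# is made from sharded (locally empty) tensors -- the breakage the RuntimeError
# above guards against, and what huggingface/peft#1450 addresses.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # LoRA adapters on attention/MLP projections
    modules_to_save=None if is_chat_model else ["wte", "lm_head"],  # fully trained only for base models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)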

Please note that we have previously emphasized that the Qwen(1.0) codebase and models are no longer subject to further updates. Therefore, for access to the latest features and ongoing support, we advise users to migrate their work to Qwen1.5.

@hxhcreate
Author

✅ Got it, thanks for your kind reply.

@1424153694

> Due to incompatibilities, DeepSpeed ZeRO3 and LoRA cannot be used together when fine-tuning a base model. Kindly refer to the README file for further explanation, as this issue has already been addressed there.

Hello, I am doing LoRA finetuning of the Qwen-14B-Chat model with ZeRO-3 and still run into this problem. Also, when I do LoRA finetuning on 8 RTX 4090 GPUs with ZeRO-2, GPU memory overflows. How is the finetuning memory footprint calculated in the multi-GPU case? Looking forward to your reply.
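As a rough back-of-envelope sketch of the memory question (assumptions: roughly 14B parameters for Qwen-14B, 2-byte bf16 weights, and ZeRO-2 semantics, where optimizer states and gradients are sharded across GPUs but the parameters are not):

# Each GPU keeps a full copy of the frozen base model under ZeRO-2; only the
# optimizer states and gradients (here mostly the small LoRA adapters) are sharded.
params = 14.2e9                 # approx. parameter count of Qwen-14B (assumption)
bytes_per_param = 2             # bf16 / fp16
weights_gib = params * bytes_per_param / 1024**3
print(f"base weights per GPU: ~{weights_gib:.1f} GiB")   # ~26.4 GiB, vs. 24 GB on an RTX 4090
# Activations and framework overhead come on top of this, so a 24 GB card can
# run out of memory even with 8-way data parallelism unless the parameters
# themselves are sharded (ZeRO-3) or offloaded to CPU.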
