Added QLoRA support for Decoder transformers with tune_strategy "Normal" #613
Conversation
I don't know why the error "Action failed with: The process '/usr/bin/git' failed with exit code 128" occurs. It seems to be a setting in the OptimalScale/LMFlow repository.
…and prepare_model_for_kbit_training if qlora is enabled
I have further added some code to improve performance (gradient checkpointing and peft prepare_model_for_kbit_training). Need to include these...
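For context, here is a minimal sketch of how these two calls are typically wired up with a 4-bit model in transformers/peft; this is illustrative only (the model name is a placeholder), not the exact LMFlow code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Illustrative sketch: load the base model in 4-bit, then prepare it for k-bit
# training and enable gradient checkpointing to reduce activation memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",          # placeholder model name
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
```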
How do I install LMFlow on Colab? I have tried the attached notebook, but it gets stuck at 'Running setup.py develop for lmflow'.
That's strange. Normally it should be OK with the attached notebook. Which notebook are you working on? Could you share the link with me?
Bugs: src/lmflow/args.py, lines 168, 172, and 177.
Errors: maybe you need to find proper versions of transformers, deepspeed, peft, and bitsandbytes.
use_qlora: bool = field(default=False, metadata={"help": "Whether to use qlora."})
Here is the Colab link (available to everyone with the link) where I am trying to install the lmflow dependencies and test. I have been trying on an A100 machine but burned 100 compute units just waiting for setup.py to finish. Now trying again on a V100... https://colab.research.google.com/drive/1rcD2OnTGZ_dz8BLn49XiaKB9JEUY9Aoz?usp=sharing
Hi, I don't have a GPU machine to try with, so I've been trying on Colab, but as you can see from my previous messages, I've been unable to install. I would greatly appreciate your help in testing this, please.
Hi, Yes, @yaoguany will test it and get back to you in a day. Thanks
Appreciate the support and help! I am very keen to continue using and developing with LMFlow!
That's nice! Thank you so much for your contribution, which means a lot to us!
Errors: when using multi-GPU training it throws this error; maybe you can google it to find a solution. Single GPU is OK.
…be good for multi GPU usage
Hi @yaoguany, thanks so much for the test. The device_map={"":0} from earlier was the issue, I suspect. I have added code to use the local_rank environment variable to set the device map, and otherwise leave it at "auto", which is better. Could you please do a quick try on multi-GPU again with the updated code? Greatly appreciate your kind support! Thanks
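For reference, a small sketch of the device_map logic described above, assuming the LOCAL_RANK environment variable set by the distributed launcher (the model name is a placeholder, not the LMFlow code itself):

```python
import os
import torch
from transformers import AutoModelForCausalLM

# Under a distributed launcher, LOCAL_RANK tells this process which GPU it owns,
# so the whole model is pinned there; otherwise fall back to "auto" and let
# accelerate place the weights.
local_rank = os.environ.get("LOCAL_RANK")
device_map = {"": int(local_rank)} if local_rank is not None else "auto"

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",        # placeholder model name
    torch_dtype=torch.bfloat16,
    device_map=device_map,
)
```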
The code runs well now, but we need to update the lmflow requirements before merging this branch.
OK, sure. Is that something I need to work on? I have updated lmflow requirements.txt to the below - numpy==1.24.2
Hi, Thank you for your contribution. I encountered a bug while training with your QLoRA code, specifically when using the --save_aggregated_lora=1 flag, which is intended for merging the trained LoRA with the base model. The error message indicates that merging LoRA with an int4 model isn't possible. Could you provide a script to facilitate the merging of the LoRA model with the base model?
Hi,
I knew this bug would appear beforehand, since it isn't possible to use model.merge_and_unload() for quantised models.
We have two options:
1. Print out a helpful message to tell the user that it isn't possible to aggregate LoRA adapters with quantised models if use_qlora is True.
2. Reload the base model again in bf16 without quantisation and do the model.merge_and_unload().
I'll go with number 2 for now (see the sketch below).
Thanks
Ankit Pasi
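A minimal sketch of option 2, assuming the LoRA adapter was saved with PEFT during training; the model name and paths below are placeholders, not LMFlow's actual script:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Option 2: reload the base model WITHOUT 4-bit quantisation (bf16 here), attach the
# trained LoRA adapter, merge it into the base weights, then save the merged model.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",                # placeholder: same base model used for training
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "output/lora_adapter")  # placeholder path
merged_model = model.merge_and_unload()   # possible here because the base is not int4
merged_model.save_pretrained("output/merged_model")                   # placeholder path
```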
…rained using QLoRA. The script reloads the model in torch_dtype and then calls merge_and_unload() on the peft model generated from training
Hi,
Question: shouldn't the following code
be this instead?
Because backend_model_full refers to the model without PEFT and backend_model refers to the PeftModel?
…and fixed requirements.txt to work
…_dtype=='float16' else (torch.bfloat16 if torch_dtype=='bfloat16' else torch.float32))
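The commit fragment above appears to describe the dtype selection below; this is only a reconstruction under that assumption, with an illustrative variable name:

```python
import torch

# Map the torch_dtype string from the model arguments to a real torch dtype,
# defaulting to float32 when it is neither 'float16' nor 'bfloat16'.
torch_dtype_str = "bfloat16"  # e.g. taken from the model arguments
torch_dtype = (
    torch.float16 if torch_dtype_str == "float16"
    else (torch.bfloat16 if torch_dtype_str == "bfloat16" else torch.float32)
)
```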
Hi,
This had been done, per my previous comment. It is now able to train using QLoRA and then merge the LoRA adapters with the base model (tested on a multi-GPU setup).
Hi, Thanks for your interest.
QLoRA is an important feature of large language model training. We want to express our deepest gratitude for your outstanding contribution in implementing QLoRA in LMFlow. Your contributions are always welcome and will undoubtedly help us shape a more successful future for LMFlow. Thanks!
LGTM. Merged.
Thank you! I wanted to express my gratitude for the opportunity to contribute and learn along the way. I just had a follow-up question regarding the code below, and I'm not sure where to put it -
The code above saves backend_model_full if save_full_model=True, but in the code from hf_decoder_model.py (lines 279-312) below, self.backend_model_full actually doesn't include the LoRA adapters, correct? The model that includes the LoRA adapters and is merged is self.backend_model (not self.backend_model_full) -
Apologies for this, but it's just ringing in my mind, so I would appreciate it if these code snippets could be validated.
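To make the question concrete, here is a simplified illustration of the relationship being asked about. This is not the actual hf_decoder_model.py code, just the usual transformers/peft pattern with a placeholder model name:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Simplified illustration (not the real LMFlow source): backend_model_full is the plain
# HF model object, while backend_model is the PeftModel returned by get_peft_model.
# get_peft_model injects the LoRA modules into the wrapped model, and merge_and_unload()
# is later called on the PeftModel, which is the distinction the question above hinges on.
backend_model_full = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder
lora_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32, lora_dropout=0.1)
backend_model = get_peft_model(backend_model_full, lora_config)
```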
Hi, Thank you for your careful investigation. In fact, you can regard…
I have added arguments under model_args to enable QLoRA support. Namely, the arguments are:
If model_args.use_qlora is set to 1/True, it also sets model_args.lora to True so the entire pipeline works.
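Since the argument list itself isn't reproduced in this thread, the following is only a rough sketch of how such model_args fields and the use_qlora-to-lora coupling might look; apart from use_qlora and lora, which appear above, the field names are assumptions:

```python
from dataclasses import dataclass, field

# Rough sketch only: `use_qlora` and `lora` are mentioned in this thread; the
# quantization field below is an assumed name, not the actual LMFlow argument.
@dataclass
class ModelArguments:
    lora: bool = field(
        default=False, metadata={"help": "Whether to use LoRA."}
    )
    use_qlora: bool = field(
        default=False, metadata={"help": "Whether to use qlora."}
    )
    qlora_bits: int = field(
        default=4, metadata={"help": "Quantization bit width for QLoRA (assumed name)."}
    )

    def __post_init__(self):
        # QLoRA implies LoRA, so the existing LoRA pipeline is reused unchanged.
        if self.use_qlora:
            self.lora = True
```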