[FEATURE] Allow merge and unload of PEFT models into base models. #260
I am not sure if this is possible. It would be a miracle if quantized models could have a LoRA merged into them. It would save having to download the full HF weights.
Theoretically I think it's possible. I also believe that supporting merging LoRA weights back into a quantized base model is very valuable for industrial applications; however, a lot of experiments should be done first.
Thanks @PanQiWei and @Ph0rk0z. I'm unsure now if my request was clear. I'm asking about merging the LoRA back into the base GPTQ-quantized model, which should be a much easier task. The base model I'm referring to above is the GPTQ-quantized one. However, merging the LoRA adapter isn't working. Yes, being able to merge back into the root model would be useful, and industrially valuable. And it makes sense, Pan, that extra testing would be required to see whether it even makes sense from a perplexity standpoint.
From what I've heard, GGUF models can be merged with their adapters, so I guess merging adapters into GPTQ models could be possible as well.
In principle, yes, but the codebase doesn't allow for that with GPTQ. Actually, as of very recently, I believe it is now possible to merge bnb NF4 adapters into the base model.
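For illustration, here is a minimal sketch of what merging a LoRA adapter into a bitsandbytes NF4-quantized base looks like, assuming recent versions of transformers, peft, and bitsandbytes; the model and adapter IDs are placeholders, not from this thread:

```python
# Hedged sketch: merge a LoRA adapter into a bitsandbytes NF4-quantized base model.
# "my-base-model" and "my-lora-adapter" are placeholder IDs.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "my-base-model",                # placeholder base model ID
    quantization_config=bnb_config,
    device_map="auto",
)

model = PeftModel.from_pretrained(base, "my-lora-adapter")  # placeholder adapter ID
merged = model.merge_and_unload()  # folds the LoRA weights into the 4-bit base layers
```

Whether the merged 4-bit weights retain acceptable quality is exactly the perplexity question raised above.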
Is your feature request related to a problem? Please describe.
Yes. The problem is that a PEFT model cannot be merged with its base model and pushed to the Hub.
After trying:
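A minimal sketch of the kind of attempt that fails, assuming a GPTQ-quantized base checkpoint and a LoRA adapter trained with peft; the model, adapter, and repo IDs below are placeholders:

```python
# Hedged sketch of the failing path: GPTQ base + LoRA adapter, then merge and push.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "my-model-GPTQ",               # placeholder: a GPTQ-quantized checkpoint
    device_map="auto",
)

model = PeftModel.from_pretrained(base, "my-lora-adapter")  # placeholder adapter ID

# At the time of this issue, GPTQ quantized layers do not support merging,
# so this call errors out instead of producing a standalone merged model.
merged = model.merge_and_unload()
merged.push_to_hub("my-merged-model")  # placeholder repo name; never reached
```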
Describe the solution you'd like
Allow merging and unloading, just as with bitsandbytes-quantized models.
Describe alternatives you've considered
Otherwise, the base model always needs to be referenced when running inference.
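For reference, a hedged sketch of that alternative: keep the adapter separate and attach it to the quantized base at inference time (placeholder IDs again):

```python
# Hedged sketch of the workaround: reference the base model at inference time
# and load the LoRA adapter on top, instead of shipping a single merged model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("my-model-GPTQ", device_map="auto")
model = PeftModel.from_pretrained(base, "my-lora-adapter")  # adapter stays separate
tokenizer = AutoTokenizer.from_pretrained("my-model-GPTQ")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```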