
[FEATURE] Allow merge and unload of PEFT models into base models. #260

Closed
RonanKMcGovern opened this issue Aug 14, 2023 · 5 comments
Labels: enhancement (New feature or request)

Comments

@RonanKMcGovern

Is your feature request related to a problem? Please describe.
Yes. The problem is that a PEFT adapter cannot be merged into a GPTQ-quantized base model and pushed to the Hub. The following error is raised:

Cannot merge LORA layers when the model is gptq quantized

after running:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# model_id is the GPTQ-quantized base model; adapter_model_name is the trained LoRA adapter
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # device_map must be "auto", cannot be "cpu"

# load the PEFT model with the new adapters on top of the quantized base
model = PeftModel.from_pretrained(
    model,
    adapter_model_name,
)

model = model.merge_and_unload()  # merge the adapters into the base model; this call raises the error above

Describe the solution you'd like
Allow merging and unloading, just like for bitsandbytes-quantized models.

Describe alternatives you've considered
Otherwise, the base model and the adapter always need to be loaded separately and referenced together when running inference; see the sketch below.
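
For reference, here is a minimal sketch of that workaround: loading the GPTQ base model and attaching the adapter at inference time without merging. The model IDs are placeholders, and loading a GPTQ checkpoint this way assumes optimum and auto-gptq are installed.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "TheBloke/Llama-2-7B-GPTQ"   # placeholder GPTQ base model
adapter_model_id = "my-org/my-lora-adapter"  # placeholder trained LoRA adapter

# load the quantized base model (requires optimum + auto-gptq)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# attach the adapter on top; both the base and the adapter must be referenced at inference time
model = PeftModel.from_pretrained(model, adapter_model_id)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))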

RonanKMcGovern added the enhancement label Aug 14, 2023
@Ph0rk0z (Contributor) commented Aug 14, 2023

I am not sure if this is possible. It would be a miracle if quantized models could have a LoRA merged into them. It would save having to download the full HF weights.

@PanQiWei (Collaborator) commented
Theoretically I think it's possible. I also believe that supporting merging LoRA weights back into a quantized base model is very valuable for industrial applications; however, a lot of experiments would need to be done first.

@RonanKMcGovern (Author) commented

Thanks @PanQiWei and @Ph0rk0z .

I'm unsure now if my request was clear.

I'm asking about merging the LoRA back into the base GPTQ-quantized model - which should be a much easier task. The base model I'm referring to above is the GPTQ-quantized one. However, merging the LoRA adapter isn't working.

Yes, being able to merge back into the root model would be useful - and industrially valuable. And it makes sense, Pan, that extra testing would be required to see whether it even makes sense from a perplexity standpoint.
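
A rough sketch of one way to run that perplexity check: compute token-level perplexity for the adapter-attached model and for a merged model on the same held-out text and compare. The models and text below are placeholders, not part of any existing test suite.

import torch

def perplexity(model, tokenizer, text, max_length=1024):
    # token-level perplexity over a single text chunk
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss  # causal-LM cross-entropy
    return torch.exp(loss).item()

# e.g. compare perplexity(adapter_model, tokenizer, sample_text)
#      against perplexity(merged_model, tokenizer, sample_text)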

@prp-e commented Dec 24, 2023

From what I've heard, GGUF models can be merged with their adapters, so it may be possible to merge adapters into GPTQ models as well.

@RonanKMcGovern (Author) commented

From what I've heard, GGUF models can be merged with their adapters, so it may be possible to merge adapters into GPTQ models as well.

In principle yes, but the codebase doesn't allow for that with GPTQ. Actually, as of very recently, I believe it is now possible to merge bnb nf4 adapters into the base model.
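
For context, a minimal sketch of what that bnb nf4 merge looks like, assuming a recent peft release whose merge_and_unload supports bitsandbytes 4-bit layers; the model names are placeholders.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# load the base model quantized with bitsandbytes nf4
base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",  # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)

# attach the LoRA adapter and merge it back into the base weights
model = PeftModel.from_pretrained(base, "adapter-id")  # placeholder adapter repo
merged = model.merge_and_unload()

merged.save_pretrained("merged-model")  # whether the merged 4-bit model can be saved/pushed directly depends on the transformers/bitsandbytes versions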
