Cannot merge LORA layers when the model is loaded in 8-bit mode #29
Comments
Just FYI, decapoda-research is extremely out of date. Please use huggyllama instead.
Did you solve this? It's the same result with huggyllama.
@bodaay what is the size of the adapter.bin you are getting? Mine is only bytes in size. Btw, I just commented out the …
Remove "model = model.merge_and_unload()", and it works.
When I re-save the model, it's a proper 3.2 GB.
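Note that with merge_and_unload() removed, what a PeftModel's save_pretrained writes is the adapter, not a merged checkpoint. A minimal sketch of that flow (model name, LoRA settings, and output path are placeholders, not taken from anyone's script above):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "huggyllama/llama-7b"  # placeholder base model

# Load the base model in 8-bit and wrap it with a LoRA adapter for training
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, load_in_8bit=True, device_map="auto")
base = prepare_model_for_kbit_training(base)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=8))

# ... training loop / Trainer goes here ...

# Without merge_and_unload(), save_pretrained writes the adapter files
# (adapter_model.bin + adapter_config.json) rather than a merged checkpoint.
peft_model.save_pretrained("output/adapter")
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained("output/adapter")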
@yangjianxin1, based on the code snippet you provided, it seems that you are loading the model with quantization enabled. To address this issue, we recommend the following approach:
By removing the …
How can I obtain a single model that I can use in llama.cpp if I can't merge them? Do you have any idea how to make the fine-tuned model usable inside llama.cpp? Any help is highly appreciated.
Removing merge_and_unload() is not the solution!
You can see a workaround here: https://github.com/substratusai/model-falcon-7b-instruct/blob/430cf5dfda02c0359122d4ef7f9b6d0c01bb3b39/src/train.ipynb Effectively, I reload the base model in 16-bit to work around the issue. It works fine for my use case.
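For readers who cannot open the notebook, a minimal sketch of that reload-in-16-bit-then-merge approach (the model name and adapter path below are placeholders, not the ones from the linked notebook):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "tiiuae/falcon-7b-instruct"  # placeholder base model
ADAPTER_PATH = "path/to/lora-adapter"     # placeholder adapter directory

# Reload the base model in fp16 (no 8-bit/4-bit quantization) so merging is allowed
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the trained LoRA adapter and fold its weights into the base model
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
merged = model.merge_and_unload()

# The result is a plain transformers model that can be saved and reloaded without peft
merged.save_pretrained("falcon-7b-instruct-merged")
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained("falcon-7b-instruct-merged")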
Has anyone found a way to solve this without loading the model in 16-bit? My GPU cannot load the whole Falcon-40B in 16-bit, using …
I hope in the future this code could work... it's more natural:
model = peft_model.merge_and_unload()
model.save_pretrained("/model/trained")
Any updates on this?
The link is broken.
Same here, the link is broken. Can you please re-share it?
Thank you so much!
May I ask another baby question: this is the training and model-saving code. After the model is saved, did you test loading the saved model, running inference, and checking whether the generated results are good? If so, do you have a separate inference/generation script? Thanks a lot in advance.
Yes, in substratus.ai we separate model loading, finetuning, and serving into separate images. I did check whether the finetuned model produced different results, and it did. In the notebook that I linked, the following paths are used:
Thank you so much, I will try it on my side and let you know.
I am worried about whether this quick fix could harm the model's ability. Is there any other way to fix this problem?
I found a way that works in my case, and I hope it works for you too. The problems of working with Llama2 in terms of training time and inference time, when you only have one GPU without much memory, can be split into three different parts, which I will go through one by one.
While not the most elegant solution, @hamidahmadian's solution works for me.
I have an issue with this approach when I add a special token. Has anyone figured out a way to do that? The code I'm using:
Error observed:
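Without seeing the exact error this is only a guess, but a frequent cause of merge failures after adding special tokens is a vocabulary-size mismatch, fixed by resizing the reloaded base model's embeddings before attaching the adapter. A sketch under that assumption, with placeholder names:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "huggyllama/llama-7b"     # placeholder
ADAPTER_PATH = "path/to/lora-adapter"  # placeholder; the tokenizer with the added special token(s) was saved here

# Use the tokenizer that already contains the extra special token(s)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_PATH)

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Grow embed_tokens / lm_head to the enlarged vocabulary *before* loading the
# adapter, so the checkpoint shapes line up with the model.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, ADAPTER_PATH)
merged = model.merge_and_unload()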
On my laptop with 16GB RAM + 16GB VRAM, @hamidahmadian's solution allows me to load the models, but still gives me an OOM error when doing merge_and_unload(). However, the following works for me:
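If anyone else hits OOM at the merge step, one option (an assumption on my part, not necessarily the snippet referred to above) is to do the fp16 reload and the merge entirely on CPU, so only system RAM is used; a sketch with placeholder paths:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "huggyllama/llama-7b"     # placeholder
ADAPTER_PATH = "path/to/lora-adapter"  # placeholder

# Keep everything in system RAM: no quantization, no GPU placement
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},
)
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
merged = model.merge_and_unload()  # runs on CPU, so GPU memory is not a limit
merged.save_pretrained("merged-model")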
This should be the correct way to fix the issue:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

peft_model = model  # at this point `model` is still the quantized PeftModel from training
# When you execute the commonly used `model = model.merge_and_unload()`, the error
# `Cannot merge LORA layers when the model is loaded in 8-bit mode` occurs, because the
# base model was loaded quantized (4-bit here). The base model must therefore be
# reloaded in 16-bit before merging.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    load_in_8bit=False,
    device_map="auto",
    trust_remote_code=True,
)
peft_model = PeftModel.from_pretrained(model, NEW_MODEL_PATH)
merged_model = peft_model.merge_and_unload()
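If you also want the merged weights on disk (output path below is a placeholder), the usual follow-up is:

merged_model.save_pretrained("merged-model")

after which the checkpoint loads with a plain AutoModelForCausalLM.from_pretrained call and no longer needs peft.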
When I load the model as follows, it throws the error: Cannot merge LORA layers when the model is loaded in 8-bit mode.
How can I load the model in 4-bit for inference?
model_path = 'decapoda-research/llama-30b-hf'
adapter_path = 'timdettmers/guanaco-33b'
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4'
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    load_in_4bit=True,
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map='auto'
)
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()
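One way to sidestep the merge entirely (a sketch, not an official answer from this thread) is to keep the adapter attached to the 4-bit base model and run generation directly; merging is only needed if you want a single standalone checkpoint. The base model name below follows the earlier advice to prefer huggyllama over decapoda-research:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_path = "huggyllama/llama-30b"       # swapped in per the advice earlier in the thread
adapter_path = "timdettmers/guanaco-33b"  # adapter from the question above

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)  # no merge_and_unload() needed for inference

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))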