
Model size doubles after .merge_and_unload() and .save_pretrained() #137

Open

anudeep-peela opened this issue Sep 5, 2023 · 4 comments

@anudeep-peela

My System Info

peft==0.4.0
accelerate==0.18.0
transformers==4.28.0
Python 3.10

Reproduction

After training, I merge the PEFT weights with the base model using:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

model_ft = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(
        base_model_path,
        return_dict=True,
        torch_dtype='auto',
        use_cache=True,
    ),
    peft_path,
    torch_dtype=torch.float16,
).merge_and_unload()

Then, for inference as a standalone model, I save it to disk using:

model_ft.save_pretrained(destination_path)
tokenizer.save_pretrained(destination_path)

And later load it back whenever needed using:

inference_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    return_dict=True,
    torch_dtype=torch.float16,
    use_cache=True,
    device_map="auto",
)

Expected behavior

I am training StarCoder 7B, which initially has a size of around 15 GB. I began training with specific LoRA rank and alpha parameters. To experiment with different combinations of these parameters, I stopped the training process a few times and performed a merge_and_unload operation. Afterward, I restarted training with a new combination of LoRA rank and alpha values on top of the latest stored model. This approach worked well up to approximately 500-600 steps. However, after that point, I noticed an issue: when I saved my model after merging, its disk size unexpectedly ballooned to 30 GB, even though my adapter .bin file is only around 400 MB. I am not sure why the model size increased.
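
A quick way to see whether the merged model was silently upcast is to check the parameter dtypes and total byte footprint before saving. This is only a diagnostic sketch, reusing model_ft from the snippet above:

from collections import Counter

# Count how many parameters live in each dtype, and the total in-memory size in bytes.
dtype_counts = Counter(p.dtype for p in model_ft.parameters())
total_bytes = sum(p.numel() * p.element_size() for p in model_ft.parameters())

print(dtype_counts)                   # Counter({torch.float32: ...}) would explain the doubled checkpoint
print(f"{total_bytes / 1e9:.1f} GB")  # roughly 15 GB in fp16, roughly 30 GB if everything is fp32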

@SankhaSubhra commented Sep 15, 2023

I am having the same issue with Falcon 1B. The original model is about 2.3 GB on disk, while the adapter is about 40 MB. After merging, the saved model takes about 4.5 GB on disk. I checked whether the number of parameters stays constant, and it does. Using safetensors also did not reduce the model size after merging.
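
For context, safetensors only changes the serialization container, not the stored dtype, so weights that were upcast to float32 during merging are written at the same size. A rough sketch of such a call (merged_model here stands for the result of merge_and_unload()):

# safe_serialization=True writes .safetensors files but keeps whatever dtype the
# weights already have, so it will not shrink a checkpoint that was upcast to float32.
merged_model.save_pretrained(destination_path, safe_serialization=True)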

I am using:
Hugging Face Transformers 4.30
PEFT 0.5.0

@kiamesdavies

Same issue with Llama 2 models, both 7B and 13B.

@SankhaSubhra commented Oct 22, 2023

Try with torch_dtype=torch.bfloat16 (i.e., during model load for merging, assuming the original model and the LoRA are already in half precision); that solved the issue for me. I believe the model loads in torch.float32 by default, which explains the doubling in size.
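
Concretely, a minimal sketch of the merge with the base model loaded directly in bfloat16 (same placeholder paths as in the original snippet; exact behavior may differ by version):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model explicitly in bfloat16 so the merged weights stay in half
# precision instead of the float32 default.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter and fold it into the base weights.
merged = PeftModel.from_pretrained(base_model, peft_path).merge_and_unload()

# The saved checkpoint should now be roughly the size of the original base model.
merged.save_pretrained(destination_path)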
