
Add load_in_16bit Parameter and Fix 8-bit Quantization Config #2022

Open · wants to merge 4 commits into base: nightly
Conversation

marcelodiaz558

This pull request introduces a new parameter, load_in_16bit, across our model loading functions and fixes an issue with 8-bit quantization configuration.

Current Issues Addressed:

  1. No 16-bit LoRA Support: Currently there is no way to train a model in 16-bit precision with the FastModel class, because the code automatically falls back to QLoRA (4-bit) when none of load_in_4bit, load_in_8bit, or full_finetuning is set to True (see the sketch after this list). This is a significant limitation for users who want 16-bit LoRA finetuning.

  2. 8-bit Quantization Config Bug: The code checked only load_in_4bit when setting the quantization_config parameter, so 8-bit finetuning was not configured correctly even when load_in_8bit=True was specified.
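The pre-PR behavior looks roughly like the sketch below. This is a minimal illustration with simplified names (the function, its arguments, and the bare BitsAndBytesConfig calls are assumptions, not the actual unsloth source); it only mirrors the two issues described above.

```python
from transformers import BitsAndBytesConfig

def resolve_quantization(load_in_4bit=False, load_in_8bit=False,
                         full_finetuning=False, **kwargs):
    # Issue 1: with every flag left False, the loader silently falls back
    # to 4-bit QLoRA, so plain 16-bit LoRA is unreachable.
    if not (load_in_4bit or load_in_8bit or full_finetuning):
        load_in_4bit = True

    bnb_config = None
    if load_in_4bit:
        bnb_config = BitsAndBytesConfig(load_in_4bit=True)
    elif load_in_8bit:
        bnb_config = BitsAndBytesConfig(load_in_8bit=True)

    # Issue 2: the config is attached only on the 4-bit path, so
    # load_in_8bit=True never reaches transformers' from_pretrained.
    if load_in_4bit:
        kwargs["quantization_config"] = bnb_config
    return kwargs
```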

Key Changes:

• Added load_in_16bit parameter to FastBaseModel.from_pretrained, FastModel.from_pretrained, and FastLanguageModel.from_pretrained with a default value of False.

• Fixed the quantization config logic to set kwargs["quantization_config"] = bnb_config when either load_in_4bit or load_in_8bit is True; previously only load_in_4bit was checked (see the sketch after this list).

• Implemented logic to check for conflicting loading options (load_in_4bit, load_in_8bit, load_in_16bit, and full_finetuning) so that only one can be enabled at a time.

• Added code to remove load_in_16bit from kwargs before calling the Transformers library's from_pretrained to avoid passing an invalid parameter to Transformers.

• Updated the fallback logic to consider the new load_in_16bit parameter before defaulting to QLoRA.
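Taken together, the changed logic looks roughly like the sketch below. The function and variable names are illustrative assumptions, not copied from the PR diff; only the shape of the checks is meant to match the bullet points above.

```python
from transformers import BitsAndBytesConfig

def resolve_quantization_fixed(load_in_4bit=False, load_in_8bit=False,
                               load_in_16bit=False, full_finetuning=False,
                               **kwargs):
    # Only one loading mode may be enabled at a time.
    enabled = [load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning]
    if sum(bool(flag) for flag in enabled) > 1:
        raise ValueError(
            "Choose only one of load_in_4bit, load_in_8bit, "
            "load_in_16bit, or full_finetuning."
        )

    bnb_config = None
    if load_in_4bit:
        bnb_config = BitsAndBytesConfig(load_in_4bit=True)
    elif load_in_8bit:
        bnb_config = BitsAndBytesConfig(load_in_8bit=True)

    # Fixed condition: attach the config for either 4-bit or 8-bit,
    # not just the 4-bit path.
    if load_in_4bit or load_in_8bit:
        kwargs["quantization_config"] = bnb_config

    # load_in_16bit is not a transformers argument, so it must be stripped
    # from kwargs before transformers' from_pretrained is called.
    kwargs.pop("load_in_16bit", None)
    return kwargs
```

With load_in_16bit=True and the other flags False, no quantization config is attached and the model loads in 16-bit, which is the plain-LoRA path that was previously unreachable.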

Benefits:

• Enables explicit 16-bit LoRA finetuning without falling back to 4-bit quantization.

• Fixes 8-bit quantization configuration, ensuring proper setup when users select 8-bit training.

• Provides a clearer and more flexible API for users who wish to load models in different precision formats (a usage sketch follows this list).
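For illustration, a call using the new flag might look like the following once the PR is merged; the checkpoint name and max_seq_length value are placeholders, not something the PR prescribes.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b",  # example checkpoint only
    max_seq_length=2048,
    load_in_4bit=False,
    load_in_8bit=False,
    load_in_16bit=True,  # new flag from this PR: 16-bit LoRA, no QLoRA fallback
)
```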

- Add load_in_16bit parameter with default value of False
- Add validation to prevent conflicting loading options
- Add support for loading models in 16-bit precision (float16/bfloat16)
- Update error messages to include the new 16-bit option
- Update condition to assign quantization_config to kwargs when either load_in_4bit or load_in_8bit is True
danielhanchen changed the base branch from main to nightly on March 18, 2025, 04:47
danielhanchen (Contributor)

Appreciate it! Actually I did notice that if load_in_4bit, load_in_8bit, and full_finetuning are all False, it should do 16-bit LoRA, but instead it used 4-bit QLoRA! I added a fix for it yesterday!

But I like load_in_16bit for LoRA actually! There are some merge conflicts, but happy to add load_in_16bit!
