Add load_in_16bit Parameter and Fix 8-bit Quantization Config #2022
base: nightly
Conversation
- Add load_in_16bit parameter with default value of False
- Add validation to prevent conflicting loading options
- Add support for loading models in 16-bit precision (float16/bfloat16)
- Update error messages to include the new 16-bit option
Update condition to assign quantization_config to kwargs when either load_in_4bit or load_in_8bit is True
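A rough sketch of what these commits describe; the helper name, bnb_config, and kwargs are assumptions based on the commit messages, not the exact diff:

```python
# Hypothetical helper illustrating the described behaviour, not the exact diff.
def _check_loading_flags(load_in_4bit, load_in_8bit, load_in_16bit,
                         full_finetuning, bnb_config, kwargs):
    # Only one loading mode may be enabled at a time.
    if sum([load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning]) > 1:
        raise ValueError(
            "Please enable only one of load_in_4bit, load_in_8bit, "
            "load_in_16bit or full_finetuning."
        )
    # Previously only load_in_4bit attached the BitsAndBytes config;
    # the fix covers load_in_8bit as well.
    if load_in_4bit or load_in_8bit:
        kwargs["quantization_config"] = bnb_config
    return kwargs
```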
Appreciate it! Actually I did notice that if load_in_4bit, load_in_8bit and full_finetuning are all False, it should do 16-bit LoRA, but instead it used 4-bit QLoRA! I added a fix for that yesterday! But I do like load_in_16bit for LoRA! There are some merge conflicts, but happy to add it.
Awesome, @danielhanchen! I just resolved the merge conflict, and the default is now 16-bit LoRA, as in the fix you added last week.
load_in_4bit = True,
load_in_8bit = False,
load_in_16bit = False,
Should we have an arg called load_dtype which would take the values 4bit, 8bit, 16bit instead of having these three args? Makes things cleaner and simpler, I guess?
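For illustration only, such a signature might look like this (a hypothetical sketch, not code from the PR):

```python
# Hypothetical alternative API, not code from the PR.
def from_pretrained(model_name, load_dtype="4bit", full_finetuning=False, **kwargs):
    if load_dtype not in ("4bit", "8bit", "16bit"):
        raise ValueError("load_dtype must be one of '4bit', '8bit' or '16bit'.")
    ...
```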
Hey @Datta0! Thanks a lot for reviewing the PR. I believe that load_in_4bit and load_in_8bit are good arguments because they match the transformers.BitsAndBytesConfig names and are accepted directly by auto_model.from_pretrained when you pass them as kwargs, so keeping those two arguments is consistent with the Transformers implementation.
With the latest fix by @danielhanchen changing the fallback precision from 4-bit to 16-bit, merging this pull request is no longer crucial: to train with 16-bit LoRA you can now simply set load_in_4bit and load_in_8bit to False (that was not possible when I made these changes, since the default was always 4-bit QLoRA, so the PR was essential back then). Still, load_in_16bit would add some extra verbosity, and it is an argument that users might try intuitively after seeing that two parameters already exist for 4-bit and 8-bit training; another benefit is that if the load_in_16bit argument is included in the sample notebooks, users will know right away that training with 16-bit precision is possible.
Furthermore, commit bf3ca8e might be important for training models with 8-bit precision, as currently we only pass the quantization_config keyword argument for 4-bit QLoRA, not for 8-bit.
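For reference, a minimal example of the naming overlap mentioned above (the model id is just a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# The PR's flag names match the BitsAndBytesConfig arguments directly.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",           # placeholder model id
    quantization_config=bnb_config,  # the kwarg the 8-bit fix now passes through
)
```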
I believe that load_in_4bit and load_in_8bit are good arguments because they match transformers.BitsAndBytesConfig names and are accepted directly by auto_model.from_pretrained
Now that you put it that way, it makes sense.
LGTM
This pull request introduces a new parameter, load_in_16bit, across our model loading functions and fixes an issue with the 8-bit quantization configuration.
Current Issues Addressed:
• No 16-bit LoRA support: Currently there is no way to train a model with 16-bit precision using the FastModel class, because the code automatically falls back to QLoRA (4-bit) if none of load_in_4bit, load_in_8bit, or full_finetuning is set to True. This is a significant limitation for users who want 16-bit LoRA finetuning.
• 8-bit quantization config bug: The code only checked load_in_4bit when setting the quantization_config parameter, so 8-bit finetuning was not configured correctly even when load_in_8bit=True was specified.
Key Changes:
• Added a load_in_16bit parameter to FastBaseModel.from_pretrained, FastModel.from_pretrained, and FastLanguageModel.from_pretrained, with a default value of False.
• Fixed the quantization config logic to set kwargs["quantization_config"] = bnb_config when either load_in_4bit or load_in_8bit is True; before, only load_in_4bit was checked.
• Implemented a check for conflicting loading options (load_in_4bit, load_in_8bit, load_in_16bit, and full_finetuning) so that only one can be enabled at a time.
• Added code to remove load_in_16bit from kwargs before calling the Transformers library's from_pretrained, to avoid passing an invalid parameter to Transformers.
• Updated the fallback logic to consider the new load_in_16bit parameter before defaulting to QLoRA (see the sketch after this list).
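A sketch of the 16-bit path described in the last three bullets; the helper name _prepare_16bit_kwargs and the dtype selection are assumptions, not the exact code in the PR:

```python
import torch

# Hypothetical sketch of the 16-bit path; "kwargs" stands for the keyword
# arguments that will be forwarded to transformers' from_pretrained.
def _prepare_16bit_kwargs(kwargs):
    # Transformers' from_pretrained does not accept load_in_16bit, so strip it first.
    load_in_16bit = kwargs.pop("load_in_16bit", False)
    if load_in_16bit:
        # Load in half precision instead of silently falling back to 4-bit QLoRA.
        kwargs["torch_dtype"] = (
            torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        )
    return kwargs
```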
Benefits:
• Enables explicit 16-bit LoRA finetuning without falling back to 4-bit quantization.
• Fixes 8-bit quantization configuration, ensuring proper setup when users select 8-bit training.
• Provides a clearer and more flexible API for users who wish to load models in different precision formats.