Add load_in_16bit Parameter and Fix 8-bit Quantization Config #2022

Open · wants to merge 4 commits into base: nightly
Changes from 1 commit
Add load_in_16bit parameter to FastBaseModel.from_pretrained
- Add load_in_16bit parameter with default value of False
- Add validation to prevent conflicting loading options
- Add support for loading models in 16-bit precision (float16/bfloat16)
- Update error messages to include the new 16-bit option
marcelodiaz558 committed Mar 14, 2025
commit a4d67ba40ad0273f7d24a8131e3812ca706af0ac
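
For reference, a minimal usage sketch of the new flag. It assumes the public FastVisionModel wrapper forwards these arguments to FastBaseModel.from_pretrained unchanged; the model name and the dtype comment are illustrative and not part of this PR.

from unsloth import FastVisionModel

# Hypothetical call: load the model in plain 16-bit precision, with no
# bitsandbytes quantization config, using the flag added by this commit.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",  # illustrative model name
    load_in_4bit    = False,
    load_in_8bit    = False,
    load_in_16bit   = True,   # new parameter from this PR
    full_finetuning = False,
    dtype           = None,   # None lets Unsloth pick float16 or bfloat16
)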
15 changes: 11 additions & 4 deletions unsloth/models/vision.py
@@ -158,6 +158,7 @@ def from_pretrained(
    dtype = None,
    load_in_4bit = True,
    load_in_8bit = False,
+   load_in_16bit = False,
    full_finetuning = False,
    token = None,
    device_map = "sequential",
@@ -240,15 +241,18 @@
            break
    pass

+   # Check for conflicting loading options
+   loading_options = sum([load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning])
+   if loading_options > 1:
+       raise RuntimeError("Unsloth: Can only use one of load_in_4bit, load_in_8bit, load_in_16bit, or full_finetuning!")
+
    bnb_config = None
    if full_finetuning and (load_in_4bit or load_in_8bit):
        print("Unsloth: You selected full finetuning support, but 4bit / 8bit is enabled - disabling LoRA / QLoRA.")
        load_in_4bit = False
        load_in_8bit = False
    pass

-   if load_in_4bit and load_in_8bit:
-       raise RuntimeError("Unsloth: Can only load in 4bit or 8bit, not both!")
    if load_in_4bit:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
@@ -262,8 +266,11 @@
            load_in_8bit = True,
            llm_int8_skip_modules = SKIP_QUANTIZATION_MODULES.copy(),
        )
-   elif not load_in_4bit and not load_in_8bit and not full_finetuning:
-       print("Unsloth: LoRA, QLoRA and full finetuning all not selected. Switching to QLoRA.")
+   elif load_in_16bit:
+       print("Unsloth: Loading model in 16-bit precision.")
+       # No bnb_config needed for 16-bit, we'll use torch_dtype directly
+   elif not load_in_4bit and not load_in_8bit and not load_in_16bit and not full_finetuning:
+       print("Unsloth: LoRA, QLoRA, 16-bit, and full finetuning all not selected. Switching to QLoRA.")
        load_in_4bit = True
        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
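
Note on the validation added above: Python booleans are integers, so sum() over the four flags counts how many loading modes were requested, and anything above one is rejected. A standalone sketch of the same check, with an illustrative helper name that is not part of the patch:

def _check_loading_options(load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning):
    # bools count as 0 or 1, so the sum is the number of enabled modes
    if sum([load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning]) > 1:
        raise RuntimeError(
            "Unsloth: Can only use one of load_in_4bit, load_in_8bit, "
            "load_in_16bit, or full_finetuning!"
        )

_check_loading_options(True, False, False, False)     # OK: only 4-bit requested
try:
    _check_loading_options(False, True, True, False)  # 8-bit and 16-bit together
except RuntimeError as err:
    print(err)                                        # conflict is rejected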