100%|████████████████████████████████████████████████████████████████████████| 200/200 [1:20:46<00:00, 21.49s/it]
Traceback (most recent call last):
File "/home/llmadmin/lawrence/autoWSB/lora_train.py", line 188, in <module>
train()
File "/home/llmadmin/lawrence/autoWSB/lora_train.py", line 171, in train
trainer.train()
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/transformers/trainer.py", line 2062, in _inner_training_loop
self._load_best_model()
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/transformers/trainer.py", line 2238, in _load_best_model
load_result = model.load_state_dict(state_dict, False)
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2027, in load_state_dict
load(self, state_dict)
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 4 more times]
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2009, in load
module._load_from_state_dict(
File "/home/llmadmin/anaconda3/envs/trainer/lib/python3.10/site-packages/bitsandbytes-0.38.1-py3.10.egg/bitsandbytes/nn/modules.py", line 298, in _load_from_state_dict
raise RuntimeError("Loading a quantized checkpoint into non-quantized Linear8bitLt is "
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()
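Since the traceback shows the error firing inside `Trainer._load_best_model()` (i.e. when the trainer tries to reload the best checkpoint's state dict), one possible workaround, not confirmed in this thread, is to disable best-model reloading entirely. A minimal sketch, assuming a standard `transformers` setup (`output_dir` is a placeholder):

```python
# Hypothetical workaround sketch: skip Trainer._load_best_model(), which is
# where the traceback above shows load_state_dict() being called.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",              # placeholder path
    load_best_model_at_end=False,  # avoids the load_state_dict call that raises
)
```

The error message itself suggests the other direction: ensure the quantized module is moved to the GPU (`module.cuda()`) before `load_state_dict()` is called on it, since `Linear8bitLt` only accepts a quantized state dict once it is in its quantized (CUDA) form.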
I can confirm the issue doesn't arise when using older versions of the repo.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Looking at the relevant chunk of code in bitsandbytes/bitsandbytes/nn/modules.py (line 293 at commit 9e7cdc9):
My 8-bit PEFT LoRA training code seems to throw this as soon as it tries to save a checkpoint step, at least with the following setup:
Full traceback: