**Problem:** with the latest `peft` and `auto_gptq` versions, using `get_gptq_peft_model` does not replace GPTQ-quantized linear layers with the correct LoRA wrappers (`GPTQLoraLinear`/`GPTQSVDLinear`); `peft`'s built-in `QuantLinear` is used instead. This leads to wrong data types in the forward-pass calculations and to bad LoRA initialization values that cause NaN losses.

**Detailed:**
In this library, we have the function `get_gptq_peft_model`, which does some hijacking on top of `peft`'s `get_peft_model(model.model, peft_config)`.
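The hijack can be pictured roughly like this (a minimal sketch, not the library's actual code; the `PEFT_TYPE_TO_MODEL_MAPPING` import path is taken from the `peft` versions of that era and may differ):

```python
# Minimal sketch of the hijack idea: temporarily point peft's LoRA/AdaLoRA
# model types at the GPTQ-aware subclasses, then delegate to peft's regular
# get_peft_model. The real implementation in auto_gptq differs in details.
from peft import get_peft_model
from peft.peft_model import PEFT_TYPE_TO_MODEL_MAPPING
from peft.utils import PeftType

from auto_gptq.utils.peft_utils import GPTQAdaLoraModel, GPTQLoraModel


def get_gptq_peft_model_sketch(model, peft_config):
    originals = {
        PeftType.LORA: PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA],
        PeftType.ADALORA: PEFT_TYPE_TO_MODEL_MAPPING[PeftType.ADALORA],
    }
    PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA] = GPTQLoraModel
    PEFT_TYPE_TO_MODEL_MAPPING[PeftType.ADALORA] = GPTQAdaLoraModel
    try:
        # model is an auto_gptq quantized wrapper; peft works on model.model
        return get_peft_model(model.model, peft_config)
    finally:
        PEFT_TYPE_TO_MODEL_MAPPING.update(originals)
```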
However, inside the code we need to replace `auto_gptq`'s quant linear layers with custom LoRA layers, such as `GPTQLoraLinear`/`GPTQSVDLinear`. I see the corresponding code in `GPTQLoraModel::_find_and_replace` (and the same method for `GPTQAdaLoraModel`); a sketch of that pattern follows.
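Schematically (a condensed sketch of the old-style `_find_and_replace` hook; the matching logic, the `GeneralQuantLinear` import path, and the `GPTQLoraLinear` constructor arguments are all illustrative, not copied from the library):

```python
# Condensed sketch of the pre-refactor peft replacement hook.
from auto_gptq.nn_modules.qlinear import GeneralQuantLinear  # path assumed
from peft.tuners.lora import LoraModel


class GPTQLoraModel(LoraModel):
    def _find_and_replace(self, adapter_name):
        lora_config = self.peft_config[adapter_name]
        for key, target in list(self.model.named_modules()):
            # peft matches list-style target_modules by name suffix
            if not any(key.endswith(t) for t in lora_config.target_modules):
                continue
            if isinstance(target, GeneralQuantLinear):
                # Swap the quantized layer for the GPTQ-aware LoRA wrapper
                # (GPTQLoraLinear is defined elsewhere in peft_utils).
                new_module = GPTQLoraLinear(
                    adapter_name, target,
                    r=lora_config.r, lora_alpha=lora_config.lora_alpha,
                )
                parent_name, _, child_name = key.rpartition(".")
                setattr(self.model.get_submodule(parent_name), child_name, new_module)
```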
However, the latest versions of `peft` use another system of LoRA module initialization (`peft`'s `LoraModel::_create_and_replace`), where `_create_new_module` contains code along the following lines.
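Paraphrased from `peft`'s `LoraModel._create_new_module` around the 0.5 releases (not an exact quote; see `peft/tuners/lora` for the real code):

```python
# Paraphrase of the GPTQ branch in peft's LoraModel._create_new_module.
from peft.tuners.lora import QuantLinear  # peft's built-in GPTQ LoRA wrapper
from peft.utils import get_auto_gptq_quant_linear


def _create_new_module(lora_config, adapter_name, target, **kwargs):
    gptq_quantization_config = kwargs.get("gptq_quantization_config", None)
    # peft resolves ONE quant linear class from the quantization config...
    AutoGPTQQuantLinear = get_auto_gptq_quant_linear(gptq_quantization_config)
    if AutoGPTQQuantLinear is not None and isinstance(target, AutoGPTQQuantLinear):
        # ...and, when the isinstance check passes, wraps the layer in peft's
        # own QuantLinear rather than GPTQLoraLinear/GPTQSVDLinear.
        new_module = QuantLinear(adapter_name, target, **kwargs)
        target.weight = target.qweight
        return new_module
    ...  # other branches: bnb layers, Embedding, Conv2d, plain Linear
```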
The problem is:

1. `peft` resolves a single `AutoGPTQQuantLinear` class from the quantization config, which may not be the quant linear class actually present in the loaded model (the loaded module contains `GeneralQuantLinear`s, for instance, but `AutoGPTQQuantLinear` is `QuantLinearCuda`), so the `isinstance` check fails to match the layers.
2. Even when the layers do match, `peft` wraps them in its own `QuantLinear` instead of `GPTQSVDLinear`/`GPTQLoraLinear`, leading to type issues during the forward pass, at least for fp16 computations.
3. `peft`'s `QuantLinear` relies on `LoraLayer.reset_lora_parameters`, which in the case of GPTQ Llama models produces initialization values that cause NaN losses, unlike `GPTQLoraLinear`'s own initialization.
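For reference, `peft`'s default scheme is roughly the following (paraphrased from `LoraLayer.reset_lora_parameters`; the library's `GPTQLoraLinear` overrides this with its own initialization):

```python
import math

import torch.nn as nn


# Paraphrased default from peft's LoraLayer.reset_lora_parameters:
# kaiming-uniform lora_A, zeroed lora_B. Through a GPTQ-quantized fp16
# forward pass, this is what ends up producing the NaN losses described above.
def reset_lora_parameters(self, adapter_name):
    if adapter_name in self.lora_A.keys():
        nn.init.kaiming_uniform_(self.lora_A[adapter_name].weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B[adapter_name].weight)
```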
**What does this patch do:**

- Overrides the `_create_new_module` method inside `GPTQLoraModel` and `GPTQAdaLoraModel`, so that `auto_gptq`'s quant linear layers are again replaced with `GPTQLoraLinear`/`GPTQSVDLinear`.
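A minimal sketch of the override (the `GeneralQuantLinear` check and the constructor arguments are illustrative; `GPTQAdaLoraModel` gets the analogous override returning `GPTQSVDLinear`, and the patch itself is the source of truth):

```python
# Sketch of the fix: have _create_new_module hand GPTQ layers to the
# library's own LoRA wrappers instead of peft's QuantLinear.
from auto_gptq.nn_modules.qlinear import GeneralQuantLinear  # path assumed
from peft.tuners.lora import LoraModel


class GPTQLoraModel(LoraModel):
    @staticmethod
    def _create_new_module(lora_config, adapter_name, target, **kwargs):
        # Match against the quant linear class actually present in the loaded
        # model, not the single class peft resolves from the quantization config.
        if isinstance(target, GeneralQuantLinear):
            # GPTQLoraLinear is this library's wrapper; arguments illustrative.
            return GPTQLoraLinear(adapter_name, target, **kwargs)
        # Everything else falls back to peft's default behaviour.
        return LoraModel._create_new_module(lora_config, adapter_name, target, **kwargs)
```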