partial implementation of lqlora #8324
@@ -0,0 +1,57 @@
from .lora_model import LoRAModel
from .lora_layers import LoRALinear

import paddle
from paddlenlp.quantization.qlora import qlora_weight_quantize_dequantize

Review comment: Consider wrapping the LQ-LoRA initialization in an lqlora_init function, pass a flag through lora_config to control whether LQ-LoRA is used, and apply this lqlora_init to the lora_module before line 621: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L621 (a minimal sketch follows.)
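A minimal sketch of the suggested structure (hedged: the lqlora_init name and the lora_config.lqlora flag are assumptions for illustration, not code from this PR; transform_lora_layers is the function defined in the diff below):

def lqlora_init(model, lora_config):
    # Hypothetical wrapper: run the LQ-LoRA decomposition only when the
    # (assumed) lora_config.lqlora flag is set; no-op otherwise.
    if getattr(lora_config, "lqlora", False):
        transform_lora_layers(model)

# Hypothetical call site, before line 621 of lora_model.py as suggested:
# lqlora_init(self, self.lora_config)
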
def transform_lora_layers(
    model: LoRAModel,
    num_iterations: int = 100
) -> None:
    if not isinstance(model, LoRAModel):
        raise NotImplementedError(f"Unknown model type: {type(model)}")
    for name, submodule in model.named_sublayers():
        if isinstance(submodule, LoRALinear):
            num_ranks = submodule.r
            W = submodule.weight

            # Remember the original dtype and upcast fp16 weights before
            # the SVD; the dtype is restored after the loop.
            if W.dtype in [paddle.float16]:
                old_dtype = W.dtype
                W = paddle.cast(W, dtype=paddle.float32)
Review comment: What is the reason for casting to fp32?
            else:
                old_dtype = None

            # LQ-LoRA alternation: fix the quantized part Q, take the best
            # rank-r approximation of W - Q, then re-quantize the residual.
            Q = paddle.zeros_like(W)
            last_error = paddle.to_tensor(float("inf"), dtype=W.dtype)
            for i in range(num_iterations):
                A = W - Q
                if A.ndim != 2:
                    raise ValueError(f"Expected a 2D matrix, but got {A.ndim}D.")

                # Truncated SVD of the residual; the singular values are
                # split evenly (as square roots) between the two factors.
                U, S, Vh = paddle.linalg.svd(A, full_matrices=False)
                Ur = U[:, :num_ranks]
                Sr = S[:num_ranks]
                Vhr = Vh[:num_ranks]

                lora_A = Ur @ paddle.diag(paddle.sqrt(Sr))
Review comment: The configuration needs to account for lora scaling; as written, lora scaling can only be forced to 1. (A sketch of one way to absorb a non-unit scaling follows the diff.)
                lora_B = paddle.diag(paddle.sqrt(Sr)) @ Vhr

                # Quantize-dequantize the residual left after removing the
                # low-rank part.
                Q = qlora_weight_quantize_dequantize(W - lora_A @ lora_B, double_quant=True)
Review comment: double_quant=True should be an adjustable parameter, and the same goes for the other arguments of qlora_weight_quantize_dequantize. (See the parameterized sketch at the end of this page.)
                # Stop early once the Frobenius reconstruction error of
                # W ≈ Q + lora_A @ lora_B stops improving.
                W_ = Q + lora_A @ lora_B
                error = paddle.norm(W - W_, p="fro")

                if error > last_error:
                    print("break.")
                    break
                last_error = error

            if old_dtype is not None:
                lora_A = paddle.cast(lora_A, dtype=old_dtype)
                lora_B = paddle.cast(lora_B, dtype=old_dtype)
                Q = paddle.cast(Q, dtype=old_dtype)

            submodule.lora_A.set_value(lora_A)
            submodule.lora_B.set_value(lora_B)
            submodule.weight.set_value(Q)
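On the scaling comment above: a small hedged sketch of how a non-unit LoRA scaling could be absorbed into the two factors, so that scaling * lora_A @ lora_B still reproduces the unscaled product the decomposition fits (the scaling attribute name on LoRALinear is an assumption):

def absorb_lora_scaling(lora_A, lora_B, scaling: float):
    # Divide each factor by sqrt(scaling); the forward pass multiplies the
    # product by scaling, so the overall reconstruction is unchanged.
    if scaling != 1.0:
        root = scaling ** 0.5
        lora_A = lora_A / root
        lora_B = lora_B / root
    return lora_A, lora_B

# Inside the loop above, before the set_value calls (attribute name assumed):
# lora_A, lora_B = absorb_lora_scaling(lora_A, lora_B, getattr(submodule, "scaling", 1.0))
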
Review comment: Control this via an lqlora option passed in through lora_config.
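Following the two remaining comments, a hedged sketch of how the hard-coded knobs could be threaded through (the extra parameters and the lora_config field names are assumptions, not part of this PR):

def transform_lora_layers_configurable(
    model: LoRAModel,
    num_iterations: int = 100,
    double_quant: bool = True,  # assumed: surfaced instead of hard-coded
    **quant_kwargs,             # assumed: forwarded to the quantizer
) -> None:
    # Identical to transform_lora_layers above, except the quantization
    # call site becomes:
    # Q = qlora_weight_quantize_dequantize(
    #     W - lora_A @ lora_B, double_quant=double_quant, **quant_kwargs
    # )
    ...

# Hypothetical call site, gated by the config flag suggested above:
# if lora_config.lqlora:
#     transform_lora_layers_configurable(model, double_quant=lora_config.lqlora_double_quant)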