add loralinear #10385
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #10385      +/-   ##
===========================================
- Coverage    49.08%    48.95%    -0.13%
===========================================
  Files          763       767        +4
  Lines       125673    126153      +480
===========================================
+ Hits         61689     61764       +75
- Misses       63984     64389      +405
        self.disable_lora = False
        if mp_moe or is_distributed:
            for p in self.parameters():
                p.is_distributed = is_distributed
This is for EP (expert parallelism): `is_distributed` marks parameters that must not be synchronized at the start of training, and `mp_moe` is used by unified checkpoint (uc).
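For context, a minimal sketch (not the PR's trainer code) of how an `is_distributed` flag is typically consumed: parameters carrying it are skipped when initial weights are broadcast across ranks, so each expert-parallel rank keeps its own expert weights. The helper name `sync_initial_params` is hypothetical.

```python
import paddle
import paddle.distributed as dist

def sync_initial_params(model):
    """Hypothetical helper: broadcast initial weights from rank 0,
    skipping expert-parallel parameters marked is_distributed=True."""
    for p in model.parameters():
        if getattr(p, "is_distributed", False):
            # EP parameters differ per rank and must not be overwritten.
            continue
        dist.broadcast(p, src=0)
```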
              level=self.args.fp16_opt_level,
              dtype=self.amp_dtype,
-             excluded_layers=[QuantizationLinear] + self._decorate_exclude_layers(model),
+             excluded_layers=[QuantizationLinear, ColumnParallelQuantizationLinear, RowParallelQuantizationLinear]
This prevents the fp32 quantization scales from being cast to bf16.
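For reference, a minimal sketch of how `excluded_layers` keeps a layer's fp32 parameters out of AMP O2 casting, assuming a recent Paddle release where `paddle.amp.decorate` accepts `excluded_layers`; the model and excluded layer type here are placeholders, not the PR's trainer setup.

```python
import paddle
from paddle.nn import Linear

model = paddle.nn.Sequential(Linear(16, 16))

# Layers listed in excluded_layers keep their parameters (e.g. fp32
# quantization scales) in float32 instead of being cast to bf16 under O2.
model = paddle.amp.decorate(
    models=model,
    level="O2",
    dtype="bfloat16",
    excluded_layers=[Linear],  # in the PR: QuantizationLinear and its parallel variants
)
```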
      # Optimize for skip unused shard files for supper large model
-     if sharded_metadata is not None and quantization_linear_list is None:
+     if sharded_metadata is not None:
Skip the parameter shards that don't need to be read, to speed up loading.
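A minimal sketch of the idea, assuming the usual sharded-checkpoint index layout where `weight_map` maps parameter names to shard file names; this is not the exact PaddleNLP loader code.

```python
def select_needed_shards(sharded_metadata, expected_keys):
    """Return only the shard files that contain parameters the model expects,
    so unused shard files of a very large checkpoint are never opened."""
    weight_map = sharded_metadata["weight_map"]  # param name -> shard file name
    needed_files = {weight_map[k] for k in expected_keys if k in weight_map}
    return sorted(needed_files)
```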
  @@ -1,4 +1,4 @@
- # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
This one probably doesn't need to change.
ok
        new_weight += self.lora_A @ self.lora_B * self.scaling
        self.quantize_weight(new_weight)
        self.merged = True
        mp_moe = getattr(self.quant_weight, "mp_moe", False)
I don't quite understand why the MoE parameters must be flagged here.
This is needed by unified checkpoint.
  @@ -0,0 +1,154 @@
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
+ #
Where does hadamard_utils.py come from?
It came from the Slim folks; the parts we don't need have now been removed.
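For reference, one common way to build a randomized Hadamard matrix is a normalized Sylvester Hadamard matrix with random ±1 column signs; the PR's `random_hadamard_matrix` in hadamard_utils.py may differ in construction and supported sizes, so treat this as a sketch only.

```python
import numpy as np
import paddle

def random_hadamard_matrix(size, dtype="float32", seed=0):
    """Sketch: normalized Sylvester Hadamard matrix with random column signs.
    Assumes size is a power of two."""
    assert size & (size - 1) == 0, "sketch assumes a power-of-two size"
    h = np.array([[1.0]])
    while h.shape[0] < size:
        # Sylvester construction: H_{2n} = [[H_n, H_n], [H_n, -H_n]]
        h = np.block([[h, h], [h, -h]])
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=size)
    # Normalize so the transform is orthonormal, then randomize the signs.
    h = (h / np.sqrt(size)) * signs[None, :]
    return paddle.to_tensor(h, dtype=dtype)
```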
  from .hadamard_utils import random_hadamard_matrix


  def quantize_tensorwise(x, quantization_config=None, bit_length=8, state=0, training=False, act_scale=None):
Do the quantization methods used by QAT really need a separate file here? Wouldn't it be better to keep all the quantization methods together?
The QAT methods are fairly complex and quite a bit more will be added later, so they go into a separate qat_utils.
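A minimal sketch of tensor-wise abs-max fake quantization, to show the core of what such a helper does; the PR's `quantize_tensorwise` additionally handles the Hadamard rotation, running activation scales, QAT state and training mode, all omitted here.

```python
import paddle

def quantize_tensorwise_sketch(x, bit_length=8):
    """Sketch: quantize a whole tensor with a single abs-max scale."""
    qmax = 2 ** (bit_length - 1) - 1          # 127 for int8
    scale = paddle.max(paddle.abs(x)) / qmax  # one scale for the whole tensor
    q = paddle.clip(paddle.round(x / scale), -qmax, qmax)
    return q, scale

# Dequantize with q * scale to recover an approximation of x.
```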
paddlenlp/quantization/qat_utils.py (Outdated)
        if quantization_config.apply_hadamard:
            target_x = x @ infohub.hadamard[x.shape[-1]][0]
        else:
            target_x = x.clone()
What is the reason for the clone here?
Removed.
        input_grad = None

        if not quant_weight.stop_gradient:
            weight_grad = paddle.einsum("bsh,bsd->hd", x, grad_output)
Paddle's einsum has pitfalls in some scenarios; please check whether einsum is the right choice here.
It did have a problem! einsum is much slower than matmul, so I switched to matmul.
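For illustration, the two equivalent formulations of the weight gradient: the original einsum contracts the batch and sequence dims, while the matmul version flattens them first (the faster path the author switched to). Shapes here are illustrative only.

```python
import paddle

x = paddle.randn([2, 16, 64])             # [batch, seq, hidden]
grad_output = paddle.randn([2, 16, 32])   # [batch, seq, out_dim]

# einsum: contract batch and sequence dims directly.
grad_einsum = paddle.einsum("bsh,bsd->hd", x, grad_output)

# matmul: flatten batch*seq, then compute x^T @ grad_output.
grad_matmul = paddle.matmul(
    x.reshape([-1, x.shape[-1]]),                      # [batch*seq, hidden]
    grad_output.reshape([-1, grad_output.shape[-1]]),  # [batch*seq, out_dim]
    transpose_x=True,                                  # -> [hidden, out_dim]
)
# Both produce the same [hidden, out_dim] weight gradient.
```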
wawltor left a comment
LGTM
PR types
New features

PR changes
APIs

Description
loralinear
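Since the description is brief, here is a minimal sketch of the LoRA linear idea the PR implements: a frozen base weight plus a trainable low-rank update `lora_A @ lora_B` scaled by `lora_alpha / r`. This is a simplified illustration, not the PR's actual class (which also covers quantized weights and parallel layouts).

```python
import paddle
import paddle.nn as nn

class LoRALinearSketch(nn.Layer):
    def __init__(self, in_features, out_features, r=8, lora_alpha=16):
        super().__init__()
        self.weight = self.create_parameter([in_features, out_features])
        self.weight.stop_gradient = True   # base weight stays frozen
        self.lora_A = self.create_parameter([in_features, r])
        # lora_B starts at zero so the initial output matches the base layer.
        self.lora_B = self.create_parameter(
            [r, out_features], default_initializer=nn.initializer.Constant(0.0)
        )
        self.scaling = lora_alpha / r

    def forward(self, x):
        return x @ self.weight + (x @ self.lora_A @ self.lora_B) * self.scaling
```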