
[LLM] Add sequence_parallel support for qwen #8558

Merged
11 commits merged on Jun 21, 2024

Conversation

@Difers Difers (Contributor) commented Jun 6, 2024

PR types

Performance optimization

PR changes

Models

Description

Add sequence_parallel support to the Qwen model.

Self-test records:
With sequence_parallel disabled:
workerlog_disable_sp.0.log
With sequence_parallel enabled:
workerlog_sp.0.log
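For context, sequence parallelism shards the activations along the sequence dimension across tensor-parallel ranks, so each rank only holds and normalizes its own slice of tokens. A minimal NumPy sketch of the scatter/gather idea (function names and shapes are illustrative, not PaddleNLP's actual API):

```python
import numpy as np

def split_sequence(hidden, rank, world_size):
    """Scatter a [seq_len, batch, hidden] activation along the sequence
    axis so each rank holds seq_len // world_size tokens."""
    seq_len = hidden.shape[0]
    assert seq_len % world_size == 0, "seq_len must divide evenly across ranks"
    chunk = seq_len // world_size
    return hidden[rank * chunk:(rank + 1) * chunk]

def gather_sequence(chunks):
    """All-gather the per-rank chunks back to the full sequence,
    e.g. before attention, which needs to see every token."""
    return np.concatenate(chunks, axis=0)

seq_len, batch, hidden_size, world_size = 8, 2, 4, 2
x = np.arange(seq_len * batch * hidden_size, dtype=np.float32)
x = x.reshape(seq_len, batch, hidden_size)

shards = [split_sequence(x, r, world_size) for r in range(world_size)]
assert shards[0].shape == (4, 2, 4)          # each rank holds half the tokens
assert np.array_equal(gather_sequence(shards), x)  # lossless round trip
```

In the real implementation the split/gather are collective communication ops inside the model, not plain slicing, but the shape bookkeeping is the same.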

paddle-bot bot commented Jun 6, 2024

Thanks for your contribution!

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 40.35088% with 34 lines in your changes missing coverage. Please review.

Project coverage is 55.80%. Comparing base (5619cc3) to head (396bdc9).
Report is 1 commit behind head on develop.

Files Patch % Lines
paddlenlp/transformers/qwen/modeling.py 45.09% 28 Missing ⚠️
paddlenlp/transformers/qwen/modeling_pp.py 0.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8558      +/-   ##
===========================================
- Coverage    55.81%   55.80%   -0.02%     
===========================================
  Files          620      620              
  Lines        96599    96642      +43     
===========================================
+ Hits         53917    53928      +11     
- Misses       42682    42714      +32     


@@ -476,15 +525,22 @@ def get_tensor_parallel_split_mappings(num_hidden_layers):
base_actions = {
# Column Linear
"lm_head.weight": partial(fn, is_column=True),
"qwen.h.0.mlp.w2.weight": partial(fn, is_column=True),
"qwen.h.0.mlp.w1.weight": partial(fn, is_column=True),
"qwen.h.0.attn.c_attn.weight": partial(fn, is_column=True, is_naive_3fuse=True),
"qwen.h.0.attn.c_attn.bias": partial(fn, is_column=True, is_naive_3fuse=True),
# Row Linear
"qwen.wte.weight": partial(fn, is_column=False),
"qwen.h.0.mlp.c_proj.weight": partial(fn, is_column=False),
"qwen.h.0.attn.c_proj.weight": partial(fn, is_column=False),
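The `is_column` flag in these mappings decides which axis of each weight is sharded across tensor-parallel ranks: column-parallel layers split the output dimension, row-parallel layers split the input dimension. A hypothetical sketch of that split rule (the real `fn` in PaddleNLP does more, e.g. handling bias and fusion):

```python
import numpy as np

def split_weight(w, is_column, rank, world_size):
    # For a [in_features, out_features] weight, column-parallel layers
    # shard the output dim (axis 1); row-parallel layers shard the
    # input dim (axis 0), whose partial outputs are later all-reduced.
    axis = 1 if is_column else 0
    return np.split(w, world_size, axis=axis)[rank]

w = np.arange(4 * 6, dtype=np.float32).reshape(4, 6)  # [in=4, out=6]
col_shard = split_weight(w, is_column=True, rank=0, world_size=2)
row_shard = split_weight(w, is_column=False, rank=0, world_size=2)
assert col_shard.shape == (4, 3)  # output dim halved
assert row_shard.shape == (2, 6)  # input dim halved
```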
Contributor:
c_proj->o_proj

Contributor:
By the way, why was c_proj renamed to o_proj? Could that affect loading of the model weights? Please double-check this.

Contributor Author:
It was just a naming habit; I'll change it back...

input_is_parallel=True,
)
else:
self.c_attn = nn.Linear(config.hidden_size, 3 * self.projection_size, bias_attr=True)
self.c_proj = nn.Linear(config.hidden_size, self.projection_size, bias_attr=not config.no_bias)
self.o_proj = nn.Linear(
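The `is_naive_3fuse=True` flag in the mappings above exists because `c_attn` fuses Q, K, and V into one weight: a tensor-parallel shard must take its slice from each of the three segments rather than one contiguous block. A hypothetical NumPy sketch of that split:

```python
import numpy as np

def split_fused_qkv(w, rank, world_size):
    """Naive 3-fuse split: the fused c_attn weight concatenates Q, K, V
    along the output dim, so each tp rank takes its slice from each of
    the three segments rather than one contiguous chunk."""
    q, k, v = np.split(w, 3, axis=-1)
    parts = [np.split(m, world_size, axis=-1)[rank] for m in (q, k, v)]
    return np.concatenate(parts, axis=-1)

hidden = 4
w = np.arange(hidden * 3 * hidden, dtype=np.float32).reshape(hidden, 3 * hidden)
shard = split_fused_qkv(w, rank=0, world_size=2)
# Each rank ends up with its Q, K, V slices re-fused side by side.
assert shard.shape == (hidden, 3 * hidden // 2)
```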
Contributor:
Have the changes here been tested?

@Difers Difers (Contributor Author) commented Jun 12, 2024

@DesmonDay Hi, could you take another look?

@@ -252,18 +284,26 @@ def forward(
encoder_hidden_states=None,
encoder_attention_mask=None,
output_attentions=False,
alibi=None,
Contributor:
Why was alibi added?

Contributor Author:
Fixed.

)
else:
attn_output, attn_weight = self._attn(query, key, value, attention_mask)
context_layer = self._merge_heads(attn_output, self.num_heads, self.head_dim)
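For reference, `_merge_heads` is the inverse of head splitting: it folds the per-head outputs back into a single hidden dimension. A minimal sketch of the shape transformation (illustrative, not the exact PaddleNLP code):

```python
import numpy as np

def merge_heads(x, num_heads, head_dim):
    """[batch, num_heads, seq, head_dim] -> [batch, seq, num_heads * head_dim].
    Transpose brings the head axis next to head_dim so the reshape
    concatenates heads without mixing tokens."""
    batch, _, seq, _ = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, seq, num_heads * head_dim)

x = np.zeros((2, 4, 8, 16))          # batch=2, heads=4, seq=8, head_dim=16
out = merge_heads(x, num_heads=4, head_dim=16)
assert out.shape == (2, 8, 64)       # hidden = 4 * 16
```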
Contributor:
Why was this removed? Has it been tested?

Contributor Author:
This step was moved into _attn; the self-test records have been added to the Description.

@DesmonDay (Contributor):

According to the self-test records, the loss does not match between runs with sequence_parallel enabled and disabled; the precision needs to be aligned.
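One way to check such alignment is to extract the per-step losses from the two worker logs and compare them within a tolerance. A small sketch, assuming a hypothetical `loss: <float>` log format (the actual workerlog format may differ):

```python
import re

def extract_losses(log_text):
    """Pull 'loss: <float>' entries out of a worker log.
    The regex is an assumption about the log format."""
    return [float(m) for m in re.findall(r"loss: ([0-9.]+)", log_text)]

def losses_aligned(a, b, atol=1e-5):
    """True if both runs produced the same number of steps and every
    pair of losses agrees within the absolute tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= atol for x, y in zip(a, b))

base_log = "step 1, loss: 2.3051\nstep 2, loss: 2.1040\n"
sp_log   = "step 1, loss: 2.3051\nstep 2, loss: 2.1040\n"
assert losses_aligned(extract_losses(base_log), extract_losses(sp_log))
```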

@DrownFish19 DrownFish19 changed the title Add sequence_parallel support for qwen [llm] Add sequence_parallel support for qwen Jun 20, 2024
@DrownFish19 DrownFish19 changed the title [llm] Add sequence_parallel support for qwen [LLM] Add sequence_parallel support for qwen Jun 20, 2024
@DrownFish19 (Collaborator):

LGTM

DrownFish19 previously approved these changes Jun 20, 2024
DesmonDay previously approved these changes Jun 20, 2024
@DesmonDay DesmonDay (Contributor) left a comment:

LGTM

Comment on lines 43 to 45
tensor_parallel_output=True,
sequence_parallel=False,
fuse_sequence_parallel_allreduce=False,
Collaborator:
Remove all of them; this is already built in.

Contributor Author:
done

@Difers Difers dismissed stale reviews from DesmonDay and DrownFish19 via 396bdc9 June 20, 2024 09:41
@ZHUI ZHUI (Collaborator) left a comment:

LGTM

@ZHUI ZHUI merged commit 65e721e into PaddlePaddle:develop Jun 21, 2024
8 of 11 checks passed