[Optimization] Refine row-parallel bias, nranks, and MoE all_reduce #5247
Conversation
Thanks for your contribution!
Pull request overview
This PR refactors tensor parallelism-related code by standardizing variable naming and improving bias handling for distributed training. The changes focus on code consistency and correctness without altering the core functionality.
- Standardizes variable naming from `nranks` to `tp_size` across multiple modules for better clarity
- Introduces special handling for row-parallel bias division in tensor parallelism
- Removes unused variables and simplifies conditional logic in RowParallelLinear
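The row-parallel bias handling above can be illustrated with a small sketch (a hypothetical simulation, not FastDeploy's actual implementation): in a RowParallelLinear, each rank holds a shard of the contraction dimension and the partial outputs are summed via all_reduce, so if every rank adds the full bias locally, the bias is counted `tp_size` times. Dividing the bias by `tp_size` on each rank restores the correct result. The function and variable names here are illustrative only.

```python
import numpy as np

def row_parallel_forward(x_shards, w_shards, bias, tp_size):
    """Simulate a tensor-parallel RowParallelLinear on one process.

    Each "rank" computes a partial matmul plus the pre-divided bias;
    summing the partials stands in for the all_reduce.
    """
    local_bias = bias / tp_size  # the fix: scale the bias by 1/tp_size
    partials = [x @ w + local_bias for x, w in zip(x_shards, w_shards)]
    return sum(partials)  # stands in for all_reduce(op=SUM)

# Single-device reference computation.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((8, 4))
b = rng.standard_normal(4)
tp_size = 2

# Shard the contraction (hidden) dimension across two "ranks".
x_shards = np.split(x, tp_size, axis=1)
w_shards = np.split(w, tp_size, axis=0)

out = row_parallel_forward(x_shards, w_shards, b, tp_size)
assert np.allclose(out, x @ w + b)  # matches the unsharded result
```

Without the `bias / tp_size` scaling, the all_reduce would produce `x @ w + tp_size * b`.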
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/utils.py | Adds tp_row_bias attribute handling to divide bias by tensor parallel size during weight loading |
| fastdeploy/model_executor/models/qwen3moe.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/models/qwen3.py | Renames nranks to tp_size for consistency |
| fastdeploy/model_executor/models/qwen2.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/layers/mtp_linear.py | Renames self.nranks to self.tp_size |
| fastdeploy/model_executor/layers/lm_head.py | Renames self.nranks to self.tp_size |
| fastdeploy/model_executor/layers/linear.py | Renames variables, removes unused field, adds bias attribute handling, and simplifies logic |
| fastdeploy/model_executor/layers/backends/intel_hpu/attention/hpu_attn_backend.py | Renames self.nranks to self.tp_size |
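Based on the file summary above, the `tp_row_bias` attribute added in `utils.py` can be sketched roughly as follows. This is a hypothetical stand-in (the real FastDeploy weight-loader signature and parameter class differ); it only shows the idea of tagging a parameter as a row-parallel bias so the loader divides it by `tp_size` at load time.

```python
import numpy as np

class Param:
    """Minimal stand-in for a framework parameter object."""
    def __init__(self, tp_row_bias=False):
        self.tp_row_bias = tp_row_bias  # tag set on row-parallel biases
        self.data = None

def load_weight(param, loaded_weight, tp_size):
    """Load a checkpoint tensor, scaling row-parallel biases by 1/tp_size."""
    if getattr(param, "tp_row_bias", False):
        # Each rank will add this bias before the all_reduce, so pre-divide
        # it to keep the summed output equal to the single-device bias.
        loaded_weight = loaded_weight / tp_size
    param.data = loaded_weight
    return param

bias = np.ones(4)
p = Param(tp_row_bias=True)
load_weight(p, bias, tp_size=2)
assert np.allclose(p.data, 0.5)  # each of the 2 ranks holds half the bias
```

Doing the division once at weight-loading time keeps the forward pass free of per-step scaling.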
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #5247   +/-   ##
==========================================
  Coverage         ?   59.92%
==========================================
  Files            ?      317
  Lines            ?    38774
  Branches         ?     5843
==========================================
  Hits             ?    23234
  Misses           ?    13703
  Partials         ?     1837
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
```diff
@@ -211,7 +211,7 @@ def __init__(
     self.speculate_max_draft_token_num: int = llm_config.speculative_config.num_speculative_tokens
     self.keep_pd_step_flag: bool = llm_config.speculative_config.model_type == "mtp"
     self.rank: int = llm_config.parallel_config.tensor_parallel_rank
```
The name `self.rank` is also inaccurate; it should simply be `self.tp_rank`.
Motivation
Modifications
Usage or Command

None.

Accuracy Tests

None.
Checklist

- Add a PR tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a `release` branch PR, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the [Cherry-Pick] PR tag.