[Common][PyTorch] Fix normalization for fused_score_for_moe_aux_loss (#2720)
yaox12 merged 2 commits into NVIDIA:main
fused_score_for_moe_aux_loss
Signed-off-by: tongliu <tongliu@nvidia.com>
Greptile Summary
Fixed a bug where sigmoid and sqrtsoftplus score normalization was incorrectly conditional on the topk value. For MoE auxiliary-loss computation, scores must always be normalized to form a proper probability distribution across all experts, regardless of the topk parameter.
The fix ensures consistent normalization behavior for the sigmoid and sqrtsoftplus activation functions, which, unlike softmax, do not naturally sum to one and therefore require explicit normalization.
Confidence Score: 5/5
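As a minimal sketch of the rule the summary describes (this is illustrative only, not the actual Transformer Engine fused kernel; the function name and signature here are hypothetical):

```python
import torch

def scores_for_aux_loss(logits: torch.Tensor,
                        score_function: str = "sigmoid") -> torch.Tensor:
    """Hypothetical sketch of routing scores for the MoE aux loss.

    Softmax already yields a distribution over experts; sigmoid-style
    activations do not, so they must be normalized across the expert
    dimension unconditionally -- for every topk, including topk=1.
    """
    if score_function == "softmax":
        return torch.softmax(logits, dim=-1)
    if score_function == "sigmoid":
        scores = torch.sigmoid(logits)
        # The fix: normalize regardless of topk (no `if topk > 1` guard).
        return scores / scores.sum(dim=-1, keepdim=True)
    raise ValueError(f"unsupported score_function: {score_function}")

logits = torch.randn(4, 8)           # [num_tokens, num_experts]
probs = scores_for_aux_loss(logits)  # every row sums to 1
```

With the old topk-conditional branch, a topk=1 configuration would have skipped the division and handed unnormalized sigmoid scores to the aux loss.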
Important Files Changed
Last reviewed commit: 94d708e
Additional Comments (2)
The PR description explicitly states it fixes the case
/te-ci pytorch
Signed-off-by: tongliu <tongliu@nvidia.com>
/te-ci pytorch
NVIDIA#2720
* fix topk=1
  Signed-off-by: tongliu <tongliu@nvidia.com>
* add topk=1 ut
  Signed-off-by: tongliu <tongliu@nvidia.com>
Description
The scores for the aux loss should always be normalized, for any topk value.
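To see why normalization matters for any topk, here is a hedged sketch of a standard load-balancing auxiliary loss (Switch Transformer style; the helper name is hypothetical, not Transformer Engine's API). If per-token scores do not sum to one, the mean-probability term is mis-scaled and the loss loses its calibration:

```python
import torch

def load_balancing_aux_loss(probs: torch.Tensor, topk_idx: torch.Tensor,
                            num_experts: int) -> torch.Tensor:
    """Standard load-balancing aux loss: num_experts * sum_i(f_i * P_i).

    P_i: mean (normalized) routing probability for expert i.
    f_i: fraction of token-to-expert assignments going to expert i.
    Requires rows of `probs` to sum to 1; with unnormalized sigmoid
    scores the loss scale would silently drift with the logits.
    """
    P = probs.mean(dim=0)  # [num_experts]
    counts = torch.bincount(topk_idx.flatten(), minlength=num_experts)
    f = counts.float() / topk_idx.numel()
    return num_experts * torch.sum(f * P)

# topk=1 example: normalized sigmoid scores, argmax routing.
logits = torch.randn(16, 4)
scores = torch.sigmoid(logits)
probs = scores / scores.sum(dim=-1, keepdim=True)  # always normalize
topk_idx = probs.argmax(dim=-1, keepdim=True)      # topk=1 routing
loss = load_balancing_aux_loss(probs, topk_idx, num_experts=4)
```

With normalized scores, perfectly uniform routing yields a loss of exactly 1, which is the baseline the loss is calibrated against.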
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
* Always normalize sigmoid/sqrtsoftplus scores for the aux loss, fixing the topk=1 case.
* Add a topk=1 unit test.
Checklist: