GRPO MCORE Path: Improve MoE Training MFU #985
Closed
2 of 2 issues completed
Labels:
- Performance: Related to improving performance
- deepseek: Related to deepseek 671b
- qa_rcca_done: When RCCA is finished for the issue, QA marks it with this label.
- t-mcore
Opening this issue to compare and align the MFU of GRPO mcore training in RL with the pre-training MFU in Megatron-LM for MoE models: DeepSeek V3, Qwen 30B, and Qwen 235B.
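
For reference when comparing the two numbers, here is a minimal sketch of how MFU (model FLOPs utilization) is commonly estimated. It is not taken from this repository or from Megatron-LM: the function names are hypothetical, and it assumes the usual rough approximation of 6 FLOPs per active parameter per token for forward plus backward, ignoring attention FLOPs. For MoE models, only the parameters activated per token are counted. All numeric inputs are placeholders, not measurements.

```python
def model_flops_per_token(active_params: float) -> float:
    """Approximate training FLOPs per token (forward + backward).

    Assumption: 6 FLOPs per *active* parameter per token; attention FLOPs
    are ignored. For MoE, pass only the parameters used per token.
    """
    return 6.0 * active_params


def estimate_mfu(
    tokens_per_second: float,
    active_params: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Return MFU as a fraction: achieved model FLOP/s over aggregate peak FLOP/s."""
    achieved_flops_per_second = tokens_per_second * model_flops_per_token(active_params)
    peak_flops_per_second = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # Placeholder values for illustration only.
    mfu = estimate_mfu(
        tokens_per_second=1_000.0,
        active_params=1e9,
        num_gpus=1,
        peak_flops_per_gpu=6e13,
    )
    print(f"MFU: {mfu:.1%}")
```

Using the same formula and the same peak-FLOPs figure for both the RL training step and the pre-training run keeps the comparison like for like.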