Insights: NVIDIA/Megatron-LM
Overview
- 0 Active pull requests
- 0 Merged pull requests
- 0 Open pull requests
- 2 Closed issues
- 2 New issues
2 Issues closed by 2 people
- [QUESTION] why is pre_mlp_layernorm an IdentityOp if num_experts is None (#1362, closed Jan 21, 2025)
- [BUG] state[p]['master_weight'] become bf16 (#1359, closed Jan 18, 2025)
2 Issues opened by 2 people
- [QUESTION] How can I train a model from hugging face (#1364, opened Jan 22, 2025)
8 Unresolved conversations
Conversations sometimes continue on older items that are not yet closed. Below is a list of all Issues and Pull Requests with unresolved conversations.
- [QUESTION] found NaN in local grad norm in backward pass before data-parallel communication collective (#780, commented on Jan 19, 2025, 0 new comments)
- [QUESTION] deepseek v2 compatility? (#1295, commented on Jan 20, 2025, 0 new comments)
- [BUG] When trying to convert llama2-7b model from HF format to megatron format (#1348, commented on Jan 21, 2025, 0 new comments)
- [BUG] Using fp16 uses more memory than using fp32 (#1349, commented on Jan 23, 2025, 0 new comments)
- [BUG] can't load saved fp8 checkpoint when resume training (#1350, commented on Jan 23, 2025, 0 new comments)
- Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining (#1262, commented on Jan 21, 2025, 0 new comments)
- [Update] Print training log in rank0 (#1296, commented on Jan 23, 2025, 0 new comments)
- Add Mamba TRTLLM support (#1320, commented on Jan 23, 2025, 0 new comments)