rename confusing arg #2
Closed
yaroslavvb wants to merge 1 commit into NVIDIA:master from
Conversation
WORLD_SIZE usually refers to the total number of GPUs across all nodes, while --nproc_per_node should be set to the number of GPUs per node.
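For context, a minimal sketch of how these quantities relate under torch.distributed-style launching; the node and GPU counts below are illustrative and not taken from this PR:

```python
# Illustrative only: assumes 2 nodes with 8 GPUs each (not values from this PR).
import os

gpus_per_node = 8                        # what --nproc_per_node should be set to
num_nodes = 2                            # what --nnodes should be set to
world_size = num_nodes * gpus_per_node   # WORLD_SIZE: total GPUs/processes across all nodes

# torch.distributed.launch / torchrun export WORLD_SIZE and RANK to each process
# (torchrun also exports LOCAL_RANK), so training code typically reads the total
# process count from the environment:
world_size_env = int(os.environ.get("WORLD_SIZE", world_size))
assert world_size_env == num_nodes * gpus_per_node
```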
raulpuric (Contributor) suggested changes on Apr 25, 2019:
These variables should be named correctly in our latest version, so this has been applied manually. Thanks for your help!
punitkoura pushed a commit to punitkoura/Megatron-LM that referenced this pull request on Jan 26, 2022
deepakn94 referenced this pull request in stanford-futuredata/Megatron-LM on Jan 26, 2023: Pipeline parallelism for Switch and MoB models
jon-barker pushed a commit that referenced this pull request on Jul 19, 2023
jon-barker pushed a commit that referenced this pull request on Jul 19, 2023: Test #2: Memory, timing. See merge request ADLR/megatron-lm!677
chelseajohn referenced this pull request in OpenGPTX/Megatron-LM on Jul 24, 2023
janEbert pushed a commit to janEbert/Megatron-LM that referenced this pull request on Jul 25, 2023: Resolves NVIDIA#2.
haidark pushed a commit to haidark/Megatron-LM that referenced this pull request on Mar 8, 2024: …ar_lr_hyp_tune patch workers.
Edenzzzz pushed a commit to Edenzzzz/Megatron-LM that referenced this pull request on Aug 20, 2024: minor change on auto schedule
shjwudp referenced this pull request in shjwudp/Megatron-LM on Nov 8, 2024: add pre-allocation for each cpu grad and overlap CPU/CUDA step
ko3n1g added a commit that referenced this pull request on Sep 3, 2025: …compatibility from 0.14.0) (Followed up on !3945) Author: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
shjwudp referenced this pull request in shjwudp/Megatron-LM on Nov 6, 2025:
* add forward-mainloop and bwd_partial_dlogits kernel
* skip TestFusedLinearCrossEntropyOnGptModel for single GPU
* added unit-test for linear_cross_entropy on dp
Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
copy-pr-bot Bot pushed a commit that referenced this pull request on Nov 13, 2025: Unit and functional test for PP
shjwudp referenced this pull request in shjwudp/Megatron-LM on Nov 21, 2025 (same commit message as the Nov 6, 2025 entry above)
nanz-nv pushed a commit to nanz-nv/Megatron-LM that referenced this pull request on Feb 5, 2026
copy-pr-bot Bot pushed a commit that referenced this pull request on Feb 16, 2026: …lpers
- Use BooleanOptionalAction for --inference-dynamic-batching-prefix-caching (comment #2)
- Inline 5 trivial helper methods in BlockAllocator per reviewer feedback (comments #4-8): set_block_hash, get_block_hash, lookup_block_by_hash, increment_ref_count, decrement_ref_count
- Update all call sites in dynamic_context.py, dynamic_engine.py, and tests
- Apply autoformat (black, isort)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
guapisolo pushed a commit to guapisolo/Megatron-LM that referenced this pull request on Feb 19, 2026: Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
guapisolo pushed a commit to guapisolo/Megatron-LM that referenced this pull request on Feb 25, 2026: Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
peter-ni-noob pushed a commit to peter-ni-noob/Megatron-LM that referenced this pull request on Feb 27, 2026: add citation and readme_zh
guapisolo pushed a commit to guapisolo/Megatron-LM that referenced this pull request on Mar 2, 2026: Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
copy-pr-bot Bot pushed a commit that referenced this pull request on Mar 11, 2026: …ward Fix the fix dummy forward to avoid picking up a cudagraph
parthmannan pushed a commit to parthmannan/Megatron-LM that referenced this pull request on Mar 31, 2026: Fix mtp_detach_heads
copy-pr-bot Bot pushed a commit that referenced this pull request on Apr 7, 2026: Fix row-parallel bias TP mode detection as replicated
Connor-XY added a commit to Connor-XY/Megatron-LM that referenced this pull request on May 4, 2026:
Resolves all eleven comments on the PR thread:
* Rename CheckpointManager → CheckpointWithoutOutputManager and update the docstring; the class strictly manages CheckpointWithoutOutput instances, so the new name avoids the broader "checkpoint" overloading. Updates all importers and tests. (NVIDIA#1)
* Document why subtracting the per-row max in SinkhornKnopp.forward is benign — Sinkhorn's first row-normalization cancels any per-row scalar, so the shifted and unshifted exp produce the same fixed point and gradient. (NVIDIA#2)
* Use NotImplementedError for the mhc + fine_grained_activation_offloading block — it's a known unimplemented interaction, not a config error. (NVIDIA#3)
* Drop the new __call__ override and backward_dw_cudagraph from base TransformerLayer; the mHC kwarg extraction now lives on HyperConnectionTransformerLayer.__call__, with _mhc_recompute_manager initialized in __init__ so forward() reads it directly without a getattr fallback. cuda_graphs.py reads is_decode_only() directly, so dropping the dynamic_inference_decode_only injection is safe. (NVIDIA#4, NVIDIA#5, NVIDIA#10)
* Rename the FineGrainedActivationOffloadingInterface alias off_interface → offload_interface in transformer_layer.py for clarity. (NVIDIA#6)
* Extract a _run_mlp helper on TransformerLayer that owns the MLP-call branching (recompute / chunked-prefill / fp8-fp4 / plain-mlp); both base and HC _forward_mlp call it, eliminating the previous ~80-line duplication. The MoE-cudagraph early-return remains in base _forward_mlp after the helper call (HC is guarded against MoE). (NVIDIA#8)
* Raise NotImplementedError at HyperConnectionTransformerLayer.__init__ when is_moe_layer is True and point users at HyperConnectionHybridLayer; drop the dead MoE branch in _get_submodules_under_cudagraphs. (NVIDIA#9)
* No code change for the MoE composition / extensibility comment (NVIDIA#7) — see the PR thread reply for the rationale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Connor-XY added a second commit to Connor-XY/Megatron-LM that referenced this pull request on May 4, 2026 (same commit message as above)