How to set NVTE_FWD/BWD_LAYERNORM_SM_MARGIN?

Hi!

I noticed in the TransformerEngine source code there are two environment variables related to LayerNorm/RMSNorm:

- `NVTE_FWD_LAYERNORM_SM_MARGIN`
- `NVTE_BWD_LAYERNORM_SM_MARGIN`

In NVIDIA’s MLPerf Training v4.1 submission, both variables are set to 8. The accompanying comment says this improves p2p overlap on H100 GPUs and should not affect A100:

```bash
# source: https://github.com/mlcommons/training_results_v4.1/blob/8821c7037ffd06e3775398fd39361a4c591d2235/NVIDIA/benchmarks/gpt3/implementations/eos-dfw_n1452_ngc24.04_nemo/config_common.sh#L9

# This is to improve p2p overlap on H100, and it shouldn't affect A100:
export NVTE_FWD_LAYERNORM_SM_MARGIN=8
export NVTE_BWD_LAYERNORM_SM_MARGIN=8
```
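In case it helps frame the tuning question, here is a minimal sketch of how I would compare values empirically. The helper name `run_with_margin` and the candidate values `0 4 8 16` are my own assumptions, not anything taken from TransformerEngine or the MLPerf config, and the `sh -c 'echo ...'` command is only a placeholder for a real training or benchmark launch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with both LayerNorm SM margins set to
# the same value, without changing the environment of the calling shell.
# Usage: run_with_margin <margin> <command> [args...]
run_with_margin() {
  local margin="$1"
  shift
  env NVTE_FWD_LAYERNORM_SM_MARGIN="$margin" \
      NVTE_BWD_LAYERNORM_SM_MARGIN="$margin" \
      "$@"
}

# Candidate values are arbitrary; 8 is the value used in the MLPerf config.
for margin in 0 4 8 16; do
  # Placeholder command: replace with the actual training/benchmark launch
  # and compare step time across the runs.
  run_with_margin "$margin" sh -c \
    'echo "fwd=$NVTE_FWD_LAYERNORM_SM_MARGIN bwd=$NVTE_BWD_LAYERNORM_SM_MARGIN"'
done
```

The sketch sets the forward and backward margins together, as the MLPerf config does; whether they should be tuned independently is part of what I am asking.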

Could you please clarify which kind of P2P communication these variables affect: P2P in pipeline parallelism, P2P in context parallelism, or P2P in TP overlap? Also, could you provide some tuning recommendations for these parameters on H800 and H20 GPUs?

Thanks!

How to set NVTE_FWD/BWD_LAYERNORM_SM_MARGIN? #1459
