[PyTorch] Use consistent API for fused norm kernels by timmoon10 · Pull Request #1560 · NVIDIA/TransformerEngine

timmoon10 · 2025-03-12T00:30:12Z

Description

There are multiple redundant code paths for suppressing fused norm kernels:

The tex norm functions check an envvar to suppress cuDNN MXFP8 norm kernels
The Python wrapper around the tex norm function checks an envvar to suppress cuDNN MXFP8 norm kernels
LayerNormLinear and LayerNormMLP disable FP8 norm kernels if FP8 current-scaling is enabled

This PR consolidates this logic into the tex functions.
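
To illustrate the consolidation, here is a minimal sketch of a single decision point replacing the three checks listed above. The names `Quantization` and `should_fuse_quantized_norm` are hypothetical and do not come from the Transformer Engine code base. The sketch also assumes that "suppressing" a fused kernel means running the norm in high precision and quantizing in a separate step.

```python
from enum import Enum


class Quantization(Enum):
    """Hypothetical labels for the quantization schemes named in this PR."""

    NONE = "none"
    FP8_DELAYED_SCALING = "fp8_delayed_scaling"
    FP8_CURRENT_SCALING = "fp8_current_scaling"
    MXFP8 = "mxfp8"


def should_fuse_quantized_norm(quantization: Quantization, cudnn_norm_enabled: bool) -> bool:
    """Decide in one place whether the norm kernel should emit quantized output.

    Assumption: returning False means "compute the norm in high precision,
    then quantize separately".
    """
    if quantization is Quantization.NONE:
        # No quantized output requested, so there is nothing to fuse.
        return False
    if quantization is Quantization.FP8_CURRENT_SCALING:
        # Previously handled in LayerNormLinear and LayerNormMLP.
        return False
    if quantization is Quantization.MXFP8:
        # Previously checked in both the tex function and its Python wrapper.
        return cudnn_norm_enabled
    return True
```

With one function owning the decision, callers such as LayerNormLinear and LayerNormMLP would no longer need their own special cases.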

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Remove redundant logic for suppressing cuDNN MXFP8 norm kernels
Control cuDNN MXFP8 norm kernels with NVTE_NORM_FWD_USE_CUDNN environment variable
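
As a rough sketch of what controlling the kernels with a single environment variable could look like, the helper below reads `NVTE_NORM_FWD_USE_CUDNN` once. The helper name `cudnn_norm_fwd_requested` is hypothetical. The accepted value (`"1"` enables) and the default (disabled) are assumptions, not details confirmed by this PR.

```python
import os
from typing import Mapping, Optional


def cudnn_norm_fwd_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the cuDNN forward norm kernels are requested.

    Assumption: the variable is treated as a flag where "1" means enabled
    and anything else (including unset) means disabled.
    """
    env = os.environ if env is None else env
    return env.get("NVTE_NORM_FWD_USE_CUDNN", "0") == "1"
```

Under these assumptions, exporting `NVTE_NORM_FWD_USE_CUDNN=1` in the shell before launching a training script would opt in to the cuDNN kernels, and leaving it unset would keep them off.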

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>


timmoon10 · 2025-03-12T05:26:37Z

/te-ci pytorch

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2025-03-14T01:02:14Z

/te-ci pytorch

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2025-03-15T02:47:05Z

/te-ci pytorch

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 · 2025-03-22T00:12:52Z

/te-ci pytorch

* Do not suppress MXFP8 norm in Python wrapper func
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Support FP8 current scaling in tex norm functions
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use single envvar to enable cuDNN MXFP8 norm kernels
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* Debug compilation error
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix compilation error
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix full-tile requirement for MXFP8 norm kernels
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove unused imports
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add missing imports
Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

timmoon10 added 3 commits March 11, 2025 23:35

Do not suppress MXFP8 norm in Python wrapper func

22177c9

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Support FP8 current scaling in tex norm functions

e09ed7e

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Use single envvar to enable cuDNN MXFP8 norm kernels

01a4d04

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 added the bug Something isn't working label Mar 12, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

5051c4f

for more information, see https://pre-commit.ci

timmoon10 added 6 commits March 12, 2025 17:56

Debug compilation error

12f1aa8

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into debug-mxfp8-norms

08af0d1

Merge branch 'main' into debug-mxfp8-norms

1ef163c

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix compilation error

c82bdb6

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix full-tile requirement for MXFP8 norm kernels

f618856

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Remove unused imports

be0afe5

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 marked this pull request as ready for review March 14, 2025 01:05

timmoon10 requested a review from ksivaman March 14, 2025 01:15

timmoon10 removed the bug Something isn't working label Mar 14, 2025

timmoon10 changed the title ~~[PyTorch] Debug MXFP8 norms~~ [PyTorch] Use consistent API for fused norm kernels Mar 14, 2025

timmoon10 and others added 2 commits March 14, 2025 19:35

Merge branch 'main' into debug-mxfp8-norms

d1d53fc

Add missing imports

192f4de

Signed-off-by: Tim Moon <tmoon@nvidia.com>

ptrendx approved these changes Mar 21, 2025


Merge branch 'main' into debug-mxfp8-norms

2d13d38

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 merged commit e80fbd7 into NVIDIA:main Mar 22, 2025

timmoon10 deleted the debug-mxfp8-norms branch March 24, 2025 20:47

timmoon10 mentioned this pull request Mar 25, 2025

[PyTorch] Debug LayerNormLinear with Userbuffers and FP8 current scaling #1608

Closed

13 tasks

hungryGeek16 mentioned this pull request May 31, 2026

fix unfused padding causal sdpa #3063

Open

timmoon10 merged 13 commits into NVIDIA:main from timmoon10:debug-mxfp8-norms
