[JAX] Refactor + MXFP8 + GroupedGEMM by phu0ngng · Pull Request #1627 · NVIDIA/TransformerEngine

phu0ngng · 2025-03-31T19:37:02Z

Description

Introduced the ScaledTensor and Quantizer classes, plus code refactoring
Added MXFP8 support
Removed the old non-FFI custom calls
Added GroupedGEMM
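
For readers unfamiliar with MXFP8, the block-scaling idea behind it can be sketched in plain Python. This is a minimal illustration of the general microscaling recipe (32 elements share one power-of-two scale, elements target the FP8 E4M3 range), not the code in this PR: it does not use TransformerEngine's ScaledTensor or Quantizer API, the function names `block_scale_exponent`, `quantize_mx_blocks`, and `dequantize_mx_blocks` are hypothetical, and the scaled values are only clamped to the E4M3 range rather than rounded to real 8-bit values. The scale rounding used by TransformerEngine may differ from the floor-based rule assumed here.

```python
import math

BLOCK_SIZE = 32            # elements that share one scale (assumed MX block size)
E4M3_MAX = 448.0           # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8              # exponent of the largest E4M3 power of two (2**8)
E8M0_MIN, E8M0_MAX = -127, 127  # representable power-of-two scale exponents


def block_scale_exponent(block):
    """Return the shared power-of-two exponent for one block (hypothetical helper)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    exponent = (e - 1) - E4M3_EMAX
    return max(E8M0_MIN, min(E8M0_MAX, exponent))


def quantize_mx_blocks(values):
    """Split values into blocks and scale each block by its own power of two.

    Returns (exponents, scaled). Assumption: scaled values are clamped to the
    E4M3 range but NOT rounded to the FP8 grid, to keep the sketch stdlib-only.
    """
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    exponents, scaled = [], []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        exp = block_scale_exponent(block)
        exponents.append(exp)
        for v in block:
            s = math.ldexp(v, -exp)  # v / 2**exp, exact for power-of-two scales
            scaled.append(max(-E4M3_MAX, min(E4M3_MAX, s)))
    return exponents, scaled


def dequantize_mx_blocks(exponents, scaled):
    """Undo the per-block scaling: value = scaled * 2**exponent."""
    return [math.ldexp(s, exponents[i // BLOCK_SIZE]) for i, s in enumerate(scaled)]
```

Because every scale is a power of two, scaling and unscaling are exact in floating point for values that are not clamped; the precision loss in real MXFP8 comes from rounding the scaled elements to 8 bits, which this sketch leaves out.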

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by: Jeremy Berchtold <jberchtold@nvidia.com>

for more information, see https://pre-commit.ci

phu0ngng · 2025-03-31T20:03:55Z

/te-ci jax L1

phu0ngng · 2025-03-31T23:27:28Z

/te-ci jax L1

phu0ngng · 2025-04-01T01:20:34Z

/te-ci jax L1

* refactor + mxfp8
* added grouped gemm
* rename linear to dense
* added cublas init phase for groupedGemm
* relax the tol of test encoder multiprocessing mxfp8 by 0.001

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Co-authored-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Jeremy Berchtold <jberchtold@nvidia.com>

* refactor + mxfp8
* added grouped gemm
* rename linear to dense
* added cublas init phase for groupedGemm
* relax the tol of test encoder multiprocessing mxfp8 by 0.001

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Co-authored-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Jeremy Berchtold <jberchtold@nvidia.com>
Signed-off-by: Peter Dykas <wdykas@nvidia.com>

phu0ngng force-pushed the branch_for_25.04 branch from 096355c to 0a238fa Compare March 31, 2025 19:47

refactor + mxfp8

faa5ea1

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by: Jeremy Berchtold <jberchtold@nvidia.com>

phu0ngng force-pushed the branch_for_25.04 branch from 8829cbd to faa5ea1 Compare March 31, 2025 19:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

d275dba

for more information, see https://pre-commit.ci

phu0ngng and others added 2 commits March 31, 2025 13:31

enabled test_multiprocessing for mxfp8

f5314d1

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

e055312

for more information, see https://pre-commit.ci

phu0ngng requested review from jberchtold-nvidia, ptrendx and timmoon10 March 31, 2025 20:33

jberchtold-nvidia approved these changes Mar 31, 2025

phu0ngng force-pushed the branch_for_25.04 branch from 3632315 to 024cad3 Compare March 31, 2025 21:54

jberchtold-nvidia reviewed Mar 31, 2025

tests/jax/test_custom_call_compute.py (outdated, resolved)

phu0ngng force-pushed the branch_for_25.04 branch from 024cad3 to 3fffed4 Compare March 31, 2025 21:56

phu0ngng added 2 commits March 31, 2025 15:47

added grouped gemm

7d8f790

Signed-off-by: Hua Huang <huah@nvidia.com>

rename linear to dense

31b4f40

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng force-pushed the branch_for_25.04 branch from 3fffed4 to 31b4f40 Compare March 31, 2025 22:47

pre-commit-ci bot and others added 4 commits March 31, 2025 22:47

[pre-commit.ci] auto fixes from pre-commit.com hooks

3173d98

for more information, see https://pre-commit.ci

fix lint

d8396cb

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

0a3f1fe

for more information, see https://pre-commit.ci

Merge branch 'main' into branch_for_25.04

48ba20c

phu0ngng and others added 3 commits March 31, 2025 18:19

reorder test_custom_call_compute

1773ff1

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

added cublas init phase for groupedGemm

bdbf440

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

52e69ea

for more information, see https://pre-commit.ci

relax the tol of test encoder multiprocessing mxfp8 by 0.001

a5ed3f6

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

phu0ngng merged commit cf9a7c2 into NVIDIA:main Apr 1, 2025
11 checks passed

phu0ngng deleted the branch_for_25.04 branch April 1, 2025 02:49

KshitijLakhani added a commit that referenced this pull request Apr 1, 2025

Revert "[JAX] Refactor + MXFP8 + GroupedGEMM (#1627)"

4924444

This reverts commit b27283a.

phu0ngng mentioned this pull request Apr 1, 2025

[JAX] Backward compatible Fixes #1631

Merged

13 tasks

This was referenced Apr 4, 2025

[JAX] Flatten_axis for quantization and Sharding propagation fixes #1644

Merged

Removing NVTE_NO_SCALING #1650

Merged

phu0ngng mentioned this pull request Apr 17, 2025

[JAX] Deprecate Praxis layers #1694

Merged

13 tasks

phu0ngng merged 14 commits into NVIDIA:main from phu0ngng:branch_for_25.04
