
[JAX] TensorUsage + FP8 GEMM with all layouts handling on BW #1844

Merged
phu0ngng merged 6 commits into NVIDIA:main from phu0ngng:tensor_usage on Jun 18, 2025
Conversation

@phu0ngng (Collaborator) commented Jun 3, 2025

Description

This PR adds TensorUsage and FP8 GEMM with all layout combinations handled in the backward pass (BW).

Verified that JAX introduces no NT enforcement, i.e. no additional transposes.
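As a rough illustration of the TensorUsage idea (the names below are a sketch inferred from this discussion, not the exact Transformer Engine API), each quantized tensor is requested by how it will be consumed in a GEMM:

```python
from enum import Enum, auto

class TensorUsage(Enum):
    # Hypothetical sketch of the usage tags discussed in this PR:
    # which operand slot (and layout) of a GEMM the tensor will feed.
    LHS = auto()        # x in the forward GEMM: y = x @ kernel
    LHS_TRANS = auto()  # transposed x, consumed by the wgrad GEMM in BW
    RHS = auto()        # kernel in the forward GEMM
    RHS_TRANS = auto()  # transposed kernel, consumed by the dgrad GEMM in BW
```

Tagging tensors by usage, rather than by physical layout, lets the quantization layer decide which layouts actually need to be materialized on a given architecture.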

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Comment thread: transformer_engine/jax/quantize/device_utils.py (outdated)
@jberchtold-nvidia (Collaborator) left a comment

Overall looks good; a few small comments. I like the TensorUsage concept.

Collaborator

Should we keep the check for whether we're training or not? It would be something like:

casted_ln_out.get_tensor(TensorUsage.LHS_TRANS) if quantizer_set.x.is_2x2x() else None

because on Hopper this layout wouldn't exist when doing 1x for inference, right?

@phu0ngng (Collaborator, Author) commented Jun 4, 2025

No, we don't need to do that here.
Whether it's training or not should be handled inside the get_tensor() method; currently, we don't support that yet.

Collaborator

Ah, I see; that will make it easier if it's automatic. So the idea is: based on whether the tensor is x or the kernel, we know which usage (LHS/RHS) is needed in the forward and backward passes, so we can automatically drop the unused layout when doing inference (not supported currently, but we have all the required info to support it in the future)?
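The automatic selection discussed above could look roughly like this (a hypothetical sketch under the assumption that inference-time 1x quantization never materializes the transposed layout; `select_layout` is an illustrative helper, not the actual get_tensor() implementation):

```python
def select_layout(materialized, usage, is_training):
    """Hypothetical helper: return the tensor for `usage`, or None when the
    transposed layout was never materialized (1x quantization / inference).

    materialized: dict mapping usage name -> tensor data.
    """
    if usage.endswith("_TRANS") and not is_training:
        # Inference (1x) skips the transposed/colwise copy entirely,
        # so there is nothing to return for this usage.
        return None
    return materialized.get(usage)

# Training (2x2x): both layouts exist, so the transposed copy is returned.
casted = {"LHS": "rowwise_data", "LHS_TRANS": "colwise_data"}
train_result = select_layout(casted, "LHS_TRANS", is_training=True)   # colwise_data
infer_result = select_layout(casted, "LHS_TRANS", is_training=False)  # None
```

With this shape, callers never need an explicit `is_2x2x()` check; the usage tag plus the training flag determine whether a layout exists.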

Comment thread: transformer_engine/jax/quantize/scaling_modes.py (outdated)
@phu0ngng (Collaborator, Author)

Adding @huanghua1994 to review the changes in grouped_gemm.

@phu0ngng (Collaborator, Author)

/te-ci JAX L0

@jberchtold-nvidia (Collaborator) left a comment

LGTM

@phu0ngng (Collaborator, Author)

/te-ci JAX L0

@phu0ngng force-pushed the tensor_usage branch 3 times, most recently from b411402 to 0644636, on June 16, 2025 at 18:21
@phu0ngng (Collaborator, Author)

/te-ci JAX L0

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@phu0ngng (Collaborator, Author)

/te-ci JAX L0

1 similar comment
@phu0ngng (Collaborator, Author)

/te-ci JAX L0

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@phu0ngng (Collaborator, Author)

/te-ci JAX L0

@phu0ngng (Collaborator, Author)

/te-ci JAX L0

@phu0ngng (Collaborator, Author)

/te-ci JAX L0

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
@phu0ngng (Collaborator, Author)

/te-ci JAX L0

@phu0ngng merged commit 3a298e6 into NVIDIA:main on Jun 18, 2025
23 checks passed
@phu0ngng phu0ngng deleted the tensor_usage branch June 18, 2025 11:47
KshitijLakhani pushed a commit that referenced this pull request Jun 27, 2025
* TensorUsage + FP8 GEMM with all layouts handling on BW

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>


---------

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

3 participants