[JAX] Flatten_axis for quantization and Sharding propagation fixes #1644
phu0ngng merged 34 commits into NVIDIA:main from
Conversation
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
for more information, see https://pre-commit.ci
quantizer.q_layout = QuantizeLayout.ROWWISE_COLWISE
if flatten_axis < 0:
    flatten_axis += rowwise.data.ndim
assert 0 < flatten_axis < rowwise.data.ndim, "flatten_axis is out of bounds"
This is `0 < flatten_axis` rather than `0 <= flatten_axis` because we need at least one axis to the left of the flatten axis, so that after flattening we still have two axes to give to TE, right?
Yes, TE requires the data to be 2D for MXFP8, so we can't accept `flatten_axis = 0` and mistakenly flatten the data to 1D.
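For illustration, here is a minimal sketch of the normalization and bounds check being discussed, applied to an actual flatten. This is a hypothetical helper using NumPy, not TE's internals:

```python
import numpy as np

def flatten_to_2d(data, flatten_axis):
    """Merge dims before flatten_axis into rows, the rest into columns.

    Hypothetical helper: the assert mirrors the one in the diff above,
    requiring at least one axis on each side of flatten_axis so the
    result is always 2D, never 1D.
    """
    if flatten_axis < 0:
        flatten_axis += data.ndim
    assert 0 < flatten_axis < data.ndim, "flatten_axis is out of bounds"
    rows = int(np.prod(data.shape[:flatten_axis]))
    cols = int(np.prod(data.shape[flatten_axis:]))
    return data.reshape(rows, cols)
```

For example, a `(2, 3, 4)` tensor with `flatten_axis=-1` becomes `(6, 4)`, while `flatten_axis=0` is rejected because it would yield a 1D result.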
/te-ci jax L1
    n_scale_blocks //= mid
    scale_shape = (n_scale_blocks,) + scale_shape
else:
    scale_shape = (n_scale_blocks,)
Do we support quantizing 1D tensors at all? I thought we required them to be at least 2D. But if we do support 1D tensors, I agree this is the correct scale shape, so I'm okay with this change.
No, we don't. But this function gets called for each part of the 2D-flattened tensor here: https://github.com/NVIDIA/TransformerEngine/pull/1644/files#diff-0158638e30529db0bb268ae65eb085b2d22b52e6e0ff4891fe2c7ea9959eea79R209-R214
Therefore, the data shape can be 1D. Perhaps I should rename it to `partial_data_shape`.
Oh I see now, that makes sense. Thanks!
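To make the 1D case concrete, here is a hedged sketch of why a 1D partial data shape yields a one-element scale shape under block scaling. The names `partial_scale_shape` and `block_size` are illustrative; TE's actual helper also handles padding and flattening, so this only shows the 1D-vs-N-D branching:

```python
def partial_scale_shape(partial_data_shape, block_size=32):
    """Sketch: one scale per block_size elements along the last axis.

    Illustrative only; the real TE function differs in detail, but the
    branch on 1D vs N-D input matches the diff above.
    """
    *lead, last = partial_data_shape
    assert last % block_size == 0, "last dim must divide into blocks"
    n_scale_blocks = last // block_size
    if lead:
        return (*lead, n_scale_blocks)
    return (n_scale_blocks,)
```

So a 2D shape like `(4, 64)` gives `(4, 2)`, while a 1D partial shape `(64,)` gives the 1-tuple `(2,)`, matching the `else` branch.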
  if is_2x:
      if scaling_mode == ScalingMode.NVTE_DELAYED_TENSOR_SCALING.value:
-         colwise_x_spec = multidim_transpose(x_spec)
+         colwise_x_spec = multidim_transpose(x_spec, transpose_axis=-2)
Note for @jreiffers, change in signature of this function
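A rough sketch of what the new `transpose_axis` parameter could mean for a partition spec. This is purely illustrative; the actual `multidim_transpose` in TE may differ:

```python
def multidim_transpose_sketch(spec, transpose_axis=-1):
    """Rotate the spec so dims from transpose_axis onward come first.

    With the default transpose_axis=-1 only the last dim moves to the
    front; transpose_axis=-2 moves the last two. Illustrative only, not
    TE's actual implementation.
    """
    transpose_axis %= len(spec)
    return (*spec[transpose_axis:], *spec[:transpose_axis])
```

For example, `('dp', None, 'tp')` with `transpose_axis=-2` becomes `(None, 'tp', 'dp')`.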
- out_spec = (*x_spec[:-2], None, x_spec[-2])
+ out_spec = (*x_spec[:-2], x_spec[-1])
  scale_spec = get_padded_spec(arg_infos[1])
Note for @jreiffers, how the signature of this primitive changed in this PR
…VIDIA#1644)

* rename QuantizeAxis to QuantizeLayout, get_layout to get_data_layout, q_axis to q_layout
* add flatten_axis option
* added gated act to test encoder
* sharding constraint fixes
* fix padding when flattening first dim needs to be padded
* update test sizes so that padding is tested
* rm output sharding as it can be done in the flax module
* sharding scale_inv for mxfp8

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Signed-off-by: Peter Dykas <wdykas@nvidia.com>
Description
In #1627, we enforced that all tensors be flattenable to a 2D tensor with `axis=-1`. This required additional reshaping in JAX to merge dimensions, which loses sharding information.
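As a sketch of that `axis=-1` restriction (hypothetical helper, not TE code): every leading dimension had to be merged into one, and the reshape implied by this merge is where the sharding of the collapsed dimensions is lost:

```python
import math

def merged_2d_shape(shape):
    """2D shape after flattening everything except the last axis.

    Illustrative: a (batch, seq, hidden) input collapses to
    (batch * seq, hidden), discarding the per-axis sharding of the
    merged leading dims.
    """
    return (math.prod(shape[:-1]), shape[-1])
```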
In this PR, we introduce `flatten_axis`, which allows flattening the tensor to 2D along any axis. With this, merging axes is no longer needed, so the sharding information can be propagated correctly.

Type of change
Checklist: