[PyTorch] Add support for FP8 current scaling in operation-based API#1858
ksivaman merged 11 commits into NVIDIA:main
Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
/te-ci pytorch L1
```python
if isinstance(
    input_quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)
):
```
Suggested change:
```diff
-if isinstance(
-    input_quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)
-):
+if input_quantizer._get_compatible_recipe().float8_per_tensor_scaling():
```
I think this fix adds overhead and makes the code less clear. _get_compatible_recipe (added in #1724) returns a type since it's used for type comparisons. If we want to use float8_per_tensor_scaling, then we need to instantiate a new DelayedScaling or Float8CurrentScaling that doesn't match the current recipe. Alternatively, we can compare classes with is or isinstance, which is basically what we're already doing.
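The class-comparison approach the comment favors can be sketched in plain Python (a hedged illustration with dummy stand-in classes, not the real Transformer Engine quantizers):

```python
# Dummy stand-ins for the TE quantizer classes (illustration only).
class Float8Quantizer: ...
class Float8CurrentScalingQuantizer: ...
class MXFP8Quantizer: ...

def uses_per_tensor_scaling(quantizer):
    # Both delayed scaling and current scaling are per-tensor schemes,
    # so a single isinstance check over both classes dispatches on them
    # without instantiating a recipe object just to query it.
    return isinstance(quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer))

print(uses_per_tensor_scaling(Float8CurrentScalingQuantizer()))  # True
print(uses_per_tensor_scaling(MXFP8Quantizer()))  # False
```

The design point is that `isinstance` is a direct type comparison, whereas going through `_get_compatible_recipe()` would require constructing a recipe instance only to ask it a question the class identity already answers.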
```python
if isinstance(
    weight_quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)
) and isinstance(weight, Float8TensorBase):
```
Suggested change:
```diff
-if isinstance(
-    weight_quantizer, (Float8Quantizer, Float8CurrentScalingQuantizer)
-) and isinstance(weight, Float8TensorBase):
+if weight_quantizer._get_compatible_recipe().float8_per_tensor_scaling() and
+    isinstance(weight, Float8TensorBase):
```
I don't think _get_compatible_recipe benefits us here.
```python
# Recipe-specific configuration
recipe = FP8GlobalStateManager.get_fp8_recipe()
if recipe.float8_current_scaling():
```
Note: It's unfortunate that this is required here. Most of what I would suggest is probably out of scope for this PR.
I don't think there's a good way around this. The amax reduction logic is specific to linear layers with FP8 current scaling, so there's no other logical place to put it.
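For illustration, the amax-reduction-then-scale step can be sketched in plain Python (a hedged sketch with made-up rank values; `reduce_amax` is a hypothetical stand-in for a `torch.distributed` max all-reduce, not a TE function):

```python
# Hypothetical sketch of the amax reduction discussed above: with FP8
# current scaling, each rank computes a local amax, the ranks take a
# max-reduction, and the shared quantization scale is derived from the
# global amax. Plain lists stand in for torch.distributed here.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def reduce_amax(local_amaxes):
    # Stand-in for an all-reduce with op=MAX across data-parallel ranks.
    return max(local_amaxes)

local_amaxes = [1.5, 4.0, 0.25]         # per-rank amax values (made up)
global_amax = reduce_amax(local_amaxes)  # 4.0
scale = E4M3_MAX / global_amax           # 112.0
print(global_amax, scale)
```

Because the reduction must happen before the scale is computed, and only linear-layer inputs/weights need it, the logic ends up tied to the linear-layer code path as the comment notes.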
```python
elif quantization == "fp8_current_scaling":
    quantizer = Float8CurrentScalingQuantizer(
        fp8_dtype=tex.DType.kFloat8E4M3,
        device=test_device,
```
Orthogonal, but why does this device arg exist here?
Mostly for completeness. We have a ref_device option since I prefer computing the reference implementation on CPU, which helps catch CUDA-related bugs.
```python
# Import utility functions
_current_file = pathlib.Path(__file__).resolve()
sys.path.append(str(_current_file.parent))
```
Not a fan of this, but I see we currently do this for other tests as well, so it's OK for this PR. In a separate refactor we should convert the tests dir to also be a module.
I agree. Ideally we would use relative imports like from .utils import dtype_tols, but Python doesn't allow that if you are running the script directly, which is how Pytest runs the tests.
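The pattern being discussed can be demonstrated in isolation (a hedged sketch: the temporary directory and the `helper_utils` module name are made up here to simulate the tests directory):

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical sketch of the sys.path pattern above: a test script appends
# its own directory to sys.path so it can import a sibling helper module
# even when run as a top-level script (which is how pytest executes it).
# A temporary directory stands in for the tests dir.
with tempfile.TemporaryDirectory() as d:
    tests_dir = pathlib.Path(d)
    (tests_dir / "helper_utils.py").write_text(
        "def dtype_tols():\n    return {'rtol': 1e-3}\n"
    )
    # Mirrors sys.path.append(str(_current_file.parent)) in the test file.
    sys.path.append(str(tests_dir))
    importlib.invalidate_caches()
    helper_utils = importlib.import_module("helper_utils")
    print(helper_utils.dtype_tols())
```

With a proper package, this would instead be `from .utils import dtype_tols`, which fails for directly-executed scripts because they have no package context.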
Force-pushed from 5e2490c to 0496d1f
/te-ci pytorch L1
/te-ci pytorch L1
Description
This PR makes some minor changes needed for te.Sequential to support FP8 current scaling. It also modifies the te.Sequential unit tests to handle quantization schemes other than FP8 delayed scaling more gracefully.
Type of change
Changes
Checklist: