[PyTorch] Recipe heuristics for initializing quantized weights #1827
timmoon10 wants to merge 5 commits into NVIDIA:main from
Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
/te-ci pytorch
Signed-off-by: Tim Moon <tmoon@nvidia.com>
```python
update_rowwise_usage = True if quantizer.rowwise_usage else None
update_columnwise_usage = True if quantizer.columnwise_usage else None
tensor.update_usage(
    rowwise_usage=update_rowwise_usage,
    columnwise_usage=update_columnwise_usage,
)
```
Destroying unnecessary usages was causing problems when alternating between training steps (column-wise data needed) and validation steps (column-wise data not needed). See #1832 (comment).
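To make the failure mode concrete, here is a minimal toy sketch (not the TE implementation; all class and function names are illustrative) of why eagerly destroying an unneeded usage breaks a later training step:

```python
# Toy model of a quantized tensor that tracks row-wise data (forward GEMM)
# and column-wise data (dgrad GEMM). Illustrative only, not TE's API.
class ToyQuantizedTensor:
    def __init__(self):
        self.rowwise_data = "rowwise"        # needed by forward GEMM
        self.columnwise_data = "columnwise"  # needed by dgrad GEMM

    def update_usage(self, rowwise_usage=None, columnwise_usage=None):
        # None means "leave this usage alone"; False destroys the data.
        if rowwise_usage is False:
            self.rowwise_data = None
        if columnwise_usage is False:
            self.columnwise_data = None

def validation_step(tensor, destroy_unneeded):
    # Validation only runs the forward GEMM.
    if destroy_unneeded:
        tensor.update_usage(columnwise_usage=False)  # too aggressive
    return tensor.rowwise_data is not None

def training_step(tensor):
    # Training needs both forward and dgrad GEMMs.
    return tensor.rowwise_data is not None and tensor.columnwise_data is not None

t = ToyQuantizedTensor()
assert training_step(t)                           # training works
assert validation_step(t, destroy_unneeded=True)  # validation works, but...
assert not training_step(t)                       # ...column-wise data is gone
```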
TBH this issue only exists because the optimizer is not doing the right job with quantizing. If we made it use the quantizer, we would not need this part at all.
The layers will configure the quantizer to avoid unnecessary allocations:
TransformerEngine/transformer_engine/pytorch/module/linear.py, lines 225 to 233 in 855fa65
This is what we want when allocating new buffers, but it is overly aggressive when dealing with an existing QuantizedTensor. We could remove this logic from get_weight_workspace, but I don't like how it would ignore the configuration within the quantizer.
/te-ci pytorch
```python
"""Configuration for quantization scheme."""

# Recipe-specific heuristics (options: "performance", "inference")
heuristic: str = "performance"
```
The name is not the best - wouldn't you want performance during inference?
Not necessarily if you're memory constrained.
Perhaps a naming scheme like `"training_performance"`, `"inference_performance"`, `"training_memory"`, `"inference_memory"` would be more precise?
```python
# Quantize to FP8
assert self._quantizer is not None, "Can't quantize without a quantizer"
self._quantizer.internal = False
```
This makes me think that `internal` should maybe be an option to `tex.quantize` rather than a member of the quantizer.
I have mixed opinions.
- The layers know which tensors can be internal tensors and which must be PyTorch tensors. `internal` seems like a usage hint, just like whether a tensor needs row-wise/column-wise data.
- We override `internal` multiple times, enough to make it feel redundant. These are usually special cases outside a layer's normal operation (when primary weights are quantized, when setting `tensor.data`).

Maybe `tex.quantize` should have an option to force `internal=False`, but otherwise respect the quantizer's config? This seems a little overcomplicated though.
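A minimal sketch of the semantics being proposed here, with toy stand-ins for the quantizer and `tex.quantize` (all names are illustrative, not TE's actual API):

```python
# Hypothetical sketch: a quantize() call that defers to the quantizer's
# `internal` hint unless the caller forces an external (PyTorch-visible)
# tensor, e.g. when setting tensor.data. Not TE's real interface.
from dataclasses import dataclass

@dataclass
class ToyQuantizer:
    internal: bool = True  # usage hint, like row-wise/column-wise usage

def quantize(data, quantizer, force_external=False):
    # Respect the quantizer's config unless the caller needs a
    # non-internal tensor for this specific call.
    internal = quantizer.internal and not force_external
    return {"data": data, "internal": internal}

q = ToyQuantizer()
assert quantize([1.0], q)["internal"] is True
assert quantize([1.0], q, force_external=True)["internal"] is False
```

The point of the sketch is that the per-call override leaves the quantizer's own configuration untouched, instead of layers repeatedly mutating `quantizer.internal`.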
#1847 handles the main issue this PR was addressing: unnecessary memory usage when initializing quantized weights for use in inference. This API is more general though, and we may revisit it in the future for specialized use-cases, e.g. when it is worth sacrificing performance for reduced memory usage.
Description
When initializing a model with quantized weights, the required data differs between training and inference: training requires row-wise data for the forward GEMM and column-wise data for the dgrad GEMM, while inference only requires row-wise data. This PR adds a `heuristic` option to the quantization recipes, with support for `"performance"` and `"inference"`. In the future, we may want to consider a `"memory"` heuristic that prioritizes memory usage over performance. This is similar to the heuristic API in #1300.
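As a rough sketch of how such a recipe-level option could drive weight allocation (the recipe class and helper below are illustrative; only the `"performance"`/`"inference"` values come from this PR):

```python
# Toy recipe carrying the `heuristic` option, and a helper that picks
# which weight usages to allocate. Illustrative names, not TE's API.
from dataclasses import dataclass

@dataclass
class ToyRecipe:
    heuristic: str = "performance"

def weight_usages(recipe):
    if recipe.heuristic == "inference":
        # Only the forward GEMM runs, so skip the dgrad (column-wise) data.
        return {"rowwise": True, "columnwise": False}
    # "performance": allocate everything training needs up front.
    return {"rowwise": True, "columnwise": True}

assert weight_usages(ToyRecipe())["columnwise"] is True
assert weight_usages(ToyRecipe(heuristic="inference"))["columnwise"] is False
```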
Type of change
Changes
Checklist: