[Common] Fix incorrect amax initialization in non-RHT NVFP4 C++ tests by Oleg-Goncharov · Pull Request #2943 · NVIDIA/TransformerEngine

Oleg-Goncharov · 2026-04-30T12:00:11Z

Description

This PR fixes the C++ test infrastructure for the non-RHT NVFP4 path.

Previously, the test flow populated the scale field, which the non-RHT NVFP4 path does not use, instead of amax. This caused incorrect CPU/GPU comparisons in the non-RHT tests. This change updates the test setup to initialize amax correctly.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Fix incorrect amax initialization
No kernel changes

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

greptile-apps · 2026-04-30T12:03:48Z

Greptile Summary

This PR fixes the NVFP4 non-RHT C++ test path where the output tensor's scale field was incorrectly populated instead of amax, causing CPU/GPU reference comparisons to fail. The fix realigns the NVTE_NVFP4_1D_SCALING path in test_common to allocate and sync rowwise and columnwise amax buffers (rather than a scale buffer), and updates performTest to set a stable golden amax (448 × 6 × 8 = 21504) via the new setters. In addition, the destructor now frees the previously leaked amax, columnwise_amax, and scale GPU allocations.
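The golden amax is built from format limits. As a hedged sketch: 448 is the largest finite FP8 E4M3 value (the NVFP4 per-block scale format) and 6 is the largest finite FP4 E2M1 value (the element format). Assuming the per-tensor encode scale is computed as (448 × 6) / amax, the golden value yields a scale of exactly 1/8, a power of two that is exact in FP32. The constant names and `global_encode_scale` are hypothetical, and the formula is an assumption rather than a quote of the TransformerEngine source.

```cpp
#include <cassert>

// Largest finite magnitudes of the two formats NVFP4 combines.
constexpr float kFp8E4M3Max = 448.0f;  // FP8 E4M3, the per-block scale format
constexpr float kFp4E2M1Max = 6.0f;    // FP4 E2M1, the element format

// Golden amax from the summary: 448 x 6 x 8.
constexpr float kGoldenAmax = kFp8E4M3Max * kFp4E2M1Max * 8.0f;
static_assert(kGoldenAmax == 21504.0f, "448 * 6 * 8 must equal 21504");

// ASSUMPTION: per-tensor encode scale = (E4M3 max * E2M1 max) / amax.
// The function name and formula are illustrative, not the library's API.
inline float global_encode_scale(float amax) {
  assert(amax > 0.0f && "amax must be positive");
  return (kFp8E4M3Max * kFp4E2M1Max) / amax;
}
```

Under this assumption, `global_encode_scale(kGoldenAmax)` evaluates to 0.125, since 2688 / 21504 is exactly 1/8.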

Confidence Score: 5/5

Safe to merge: targeted test-infrastructure fix with no kernel changes and correct memory lifecycle.

All three files contain narrow, well-scoped changes. The allocation/free symmetry between the new constructor and destructor is correct: NVTE_DELAYED_TENSOR_SCALING allocates and frees amax and scale; NVTE_NVFP4_1D_SCALING allocates and frees amax and columnwise_amax; scale is null for NVFP4, so there is no double free. The from_cpu/to_cpu paths for NVFP4 correctly gate on the rowwise_/columnwise_ flags. No production code is touched.
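The allocation/free symmetry can be illustrated with a small CPU-only sketch. Everything here is hypothetical: `FakeDevice` stands in for cudaMalloc/cudaFree and counts live allocations, and `TestTensor` is a simplified stand-in for the test Tensor class, not its actual code. The destructor releases all three buffers unconditionally; a buffer that a scaling mode never allocates stays null, and releasing null is a no-op (as `cudaFree(nullptr)` is), so nothing is freed twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc/cudaFree so the sketch runs on the CPU.
// It counts live allocations, which makes leaks visible.
struct FakeDevice {
  static int live;
  static void* alloc(std::size_t bytes) {
    ++live;
    return std::malloc(bytes);
  }
  static void release(void* p) {
    if (p == nullptr) return;  // Like cudaFree, releasing a null pointer is a no-op.
    assert(live > 0 && "more releases than allocations");
    --live;
    std::free(p);
  }
};
int FakeDevice::live = 0;

// Two of the scaling modes named in the review; the enum itself is hypothetical.
enum class ScalingMode { kDelayedTensorScaling, kNvfp41DScaling };

// Hypothetical, simplified tensor that owns only its amax/scale buffers.
class TestTensor {
 public:
  explicit TestTensor(ScalingMode mode) {
    amax_ = static_cast<float*>(FakeDevice::alloc(sizeof(float)));
    if (mode == ScalingMode::kDelayedTensorScaling) {
      scale_ = static_cast<float*>(FakeDevice::alloc(sizeof(float)));
    } else {
      // NVFP4 1D scaling: a columnwise amax instead of a scale buffer.
      amax_columnwise_ = static_cast<float*>(FakeDevice::alloc(sizeof(float)));
    }
  }

  // Release every buffer; any buffer this mode never allocated is still null.
  ~TestTensor() {
    FakeDevice::release(amax_);
    FakeDevice::release(amax_columnwise_);
    FakeDevice::release(scale_);
  }

  // Non-copyable, so two objects never free the same buffer.
  TestTensor(const TestTensor&) = delete;
  TestTensor& operator=(const TestTensor&) = delete;

  bool has_scale() const { return scale_ != nullptr; }
  bool has_columnwise_amax() const { return amax_columnwise_ != nullptr; }

 private:
  float* amax_ = nullptr;
  float* amax_columnwise_ = nullptr;
  float* scale_ = nullptr;
};
```

Constructing and destroying a tensor in either mode should return the live count to zero; deleting the copy operations keeps two objects from owning the same buffer.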

No files require special attention.

Important Files Changed

Filename	Overview
tests/cpp/operator/test_cast_nvfp4_transpose.cu	Replaces the dynamic amax computation and set_scale call with a hardcoded golden amax set via set_tensor_amax/set_tensor_amax_columnwise, fixing the root cause of the CPU/GPU comparison mismatch.
tests/cpp/test_common.cu	Switches NVTE_NVFP4_1D_SCALING from allocating a scale buffer to allocating rowwise and columnwise amax buffers, and adds proper CPU↔GPU sync paths for them in to_cpu()/from_cpu().
tests/cpp/test_common.h	Adds amax_cpu_data_columnwise_ member, amax_columnwise() accessor, set_tensor_amax/set_tensor_amax_columnwise setters, and extends the destructor to free amax, columnwise_amax, and scale GPU allocations (fixing a pre-existing memory leak).
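The to_cpu()/from_cpu() sync described for test_common.cu can be sketched as follows. This is an illustration under stated assumptions, not the repository code: `AmaxMirror` is a hypothetical class, the "device" values are plain host floats, and `std::memcpy` stands in for cudaMemcpy. The setter and accessor names mirror those listed above, but their signatures here are guesses. Each direction copies only the amax buffers whose layout flag (rowwise or columnwise) is set.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical simplified tensor: "device" buffers are host floats and
// std::memcpy stands in for cudaMemcpy, so the sketch runs without a GPU.
class AmaxMirror {
 public:
  AmaxMirror(bool rowwise, bool columnwise) : rowwise_(rowwise), columnwise_(columnwise) {
    assert((rowwise || columnwise) && "tensor needs at least one layout");
  }

  // Setters write the host-side copies only; from_cpu() pushes them to the "device".
  void set_tensor_amax(float v) { amax_cpu_ = v; }
  void set_tensor_amax_columnwise(float v) { amax_cpu_columnwise_ = v; }

  // Host -> device, gated on the layout flags.
  void from_cpu() {
    if (rowwise_) std::memcpy(&amax_dev_, &amax_cpu_, sizeof(float));
    if (columnwise_) std::memcpy(&amax_dev_columnwise_, &amax_cpu_columnwise_, sizeof(float));
  }

  // Device -> host, gated on the same flags.
  void to_cpu() {
    if (rowwise_) std::memcpy(&amax_cpu_, &amax_dev_, sizeof(float));
    if (columnwise_) std::memcpy(&amax_cpu_columnwise_, &amax_dev_columnwise_, sizeof(float));
  }

  float amax() const { return amax_cpu_; }
  float amax_columnwise() const { return amax_cpu_columnwise_; }
  // What a kernel would read on the "device" side.
  float device_amax() const { return amax_dev_; }
  float device_amax_columnwise() const { return amax_dev_columnwise_; }

 private:
  bool rowwise_;
  bool columnwise_;
  float amax_cpu_ = 0.0f;
  float amax_cpu_columnwise_ = 0.0f;
  float amax_dev_ = 0.0f;
  float amax_dev_columnwise_ = 0.0f;
};
```

With only the rowwise layout enabled, from_cpu() leaves the columnwise device value untouched, which mirrors the gating on the rowwise_/columnwise_ flags.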

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Tensor constructor\nNVTE_NVFP4_1D_SCALING] --> B[cudaMalloc amax\ncudaMalloc amax_columnwise]
    B --> C[tensor_.set_amax\ntensor_.set_columnwise_amax]
    C --> D[set_tensor_amax\nset_tensor_amax_columnwise]
    D --> E[from_cpu\ncopies amax to GPU]
    E --> F[GPU kernel runs\nreads amax, writes scale_inv & data]
    F --> G[to_cpu\ncopies amax, scale_inv & data back]
    G --> H[compare_nvfp4_tensors\nCPU vs GPU reference check]
```

Reviews (4). Last reviewed commit: "Fixed memory leakage"

timmoon10

This is fine as a quick fix. I think the C++ testing infrastructure has gotten to the point where it is too messy to really trust. The recipe-specific logic is complicated and has many unhandled edge cases.

Oleg-Goncharov · 2026-04-30T18:12:13Z

/te-ci

Oleg-Goncharov and others added 5 commits April 30, 2026 00:02

Patch for NVFP4 test suite

9265aa0

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

C++ tests fix

89b1f3b

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

Cleanup

cc4e85d

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

Merge branch 'main' into pr_nvfp4_cpp_tests_fix

2522544

[pre-commit.ci] auto fixes from pre-commit.com hooks

3af351c

for more information, see https://pre-commit.ci

Oleg-Goncharov added 2 commits April 30, 2026 12:12

Removed dead code

8b21b76

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

Set the golden value for amax in tests

6d31962

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

Oleg-Goncharov requested review from ptrendx and timmoon10 and removed request for ptrendx April 30, 2026 17:18

timmoon10 previously approved these changes Apr 30, 2026

Comment thread tests/cpp/test_common.cu

Fixed memory leakage

8f84455

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

Oleg-Goncharov dismissed timmoon10’s stale review via 8f84455 April 30, 2026 18:06

timmoon10 approved these changes Apr 30, 2026

timmoon10 merged commit 4fafdf2 into NVIDIA:main May 1, 2026
11 of 14 checks passed

timmoon10 mentioned this pull request May 6, 2026 in "Refactor tensor class in C++ unit tests #2962" (open, 13 tasks).

timmoon10 merged 8 commits into NVIDIA:main from Oleg-Goncharov:pr_nvfp4_cpp_tests_fix