[Common] Fix incorrect amax initialization in non-RHT NVFP4 C++ tests#2943
timmoon10 merged 8 commits into NVIDIA:main
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Greptile Summary
This PR fixes the NVFP4 non-RHT C++ test path where the output tensor's ...
Confidence Score: 5/5
Safe to merge: a targeted test-infrastructure fix with no kernel changes and a correct memory lifecycle.
All three files contain narrow, well-scoped changes. The allocation/free symmetry in the new constructor and destructor is correct:
- NVTE_DELAYED_TENSOR_SCALING gets amax + scale freed;
- NVTE_NVFP4_1D_SCALING gets amax + columnwise_amax freed;
- scale is null for NVFP4, so there is no double-free.
The from_cpu/to_cpu paths for NVFP4 correctly gate on the rowwise_/columnwise_ flags. No production code is touched. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Tensor constructor\nNVTE_NVFP4_1D_SCALING] --> B[cudaMalloc amax\ncudaMalloc amax_columnwise]
B --> C[tensor_.set_amax\ntensor_.set_columnwise_amax]
C --> D[set_tensor_amax\nset_tensor_amax_columnwise]
D --> E[from_cpu\ncopies amax to GPU]
E --> F[GPU kernel runs\nreads amax, writes scale_inv & data]
F --> G[to_cpu\ncopies amax, scale_inv & data back]
G --> H[compare_nvfp4_tensors\nCPU vs GPU reference check]
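The allocation and copy steps in the flowchart can be sketched on the host as follows. This is a minimal mock, not the test suite's actual Tensor class: `MockTensor`, `ScalingMode`, and every method name here are hypothetical, and `std::calloc`/`std::free`/`std::memcpy` stand in for `cudaMalloc`/`cudaFree`/`cudaMemcpy` so the sketch runs without a GPU. It only illustrates the per-mode allocation and free symmetry described in the review summary.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for NVTE_DELAYED_TENSOR_SCALING and NVTE_NVFP4_1D_SCALING.
enum class ScalingMode { DelayedTensorScaling, Nvfp4OneDScaling };

// Host-only mock of a test tensor wrapper (assumption: the real wrapper
// allocates these buffers on the device instead).
class MockTensor {
 public:
  explicit MockTensor(ScalingMode mode) {
    amax_ = alloc_float();               // both modes own an amax buffer
    if (mode == ScalingMode::DelayedTensorScaling) {
      scale_ = alloc_float();            // delayed scaling: amax + scale
    } else {
      columnwise_amax_ = alloc_float();  // NVFP4: amax + columnwise amax, scale stays null
    }
  }

  ~MockTensor() {
    // Freeing a null pointer is a no-op, so each mode releases exactly
    // the buffers it allocated and nothing is freed twice.
    std::free(amax_);
    std::free(scale_);
    std::free(columnwise_amax_);
  }

  MockTensor(const MockTensor&) = delete;
  MockTensor& operator=(const MockTensor&) = delete;

  // Stage a value on the "CPU" side, as the test setup would.
  void set_amax(float value) { cpu_amax_ = value; }

  // Copy the staged amax into the "device" buffer before the kernel runs.
  void from_cpu() {
    if (amax_ != nullptr) std::memcpy(amax_, &cpu_amax_, sizeof(float));
  }

  // Read back what the kernel would see.
  float device_amax() const {
    float value = 0.0f;
    if (amax_ != nullptr) std::memcpy(&value, amax_, sizeof(float));
    return value;
  }

  bool has_scale() const { return scale_ != nullptr; }
  bool has_columnwise_amax() const { return columnwise_amax_ != nullptr; }

 private:
  static float* alloc_float() {
    // Zero-initialized, mirroring a freshly cleared device buffer.
    return static_cast<float*>(std::calloc(1, sizeof(float)));
  }

  float cpu_amax_ = 0.0f;
  float* amax_ = nullptr;
  float* scale_ = nullptr;
  float* columnwise_amax_ = nullptr;
};
```

In this mock, a value passed to `set_amax` only becomes visible to the "device" side after `from_cpu`, which is the step the flowchart places right before the kernel launch.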
Reviews (4): Last reviewed commit: "Fixed memory leakage"
/te-ci
Description
This PR fixes the C++ test infrastructure for the non-RHT NVFP4 path.
Previously, the test flow populated the unused scale field instead of amax, which caused incorrect CPU/GPU comparisons in the non-RHT coverage. This change updates the test setup to initialize amax correctly.
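To illustrate why populating scale instead of amax skews the comparison, here is a small sketch of how a global encode scale might be derived from amax. The formula is an assumption about the usual two-level NVFP4 scheme (FP8 E4M3 block scales over FP4 E2M1 values) and is not taken from this PR; `global_encode_scale` is a hypothetical helper, not a Transformer Engine function.

```cpp
#include <algorithm>
#include <limits>

// Largest finite magnitudes of the two formats involved.
constexpr float kFp4E2m1Max = 6.0f;
constexpr float kFp8E4m3Max = 448.0f;

// Hypothetical helper: derive a global encode scale from the tensor amax.
// Assumption: scale = (FP8 max * FP4 max) / amax, with a fallback of 1.0
// when amax is zero and a clamp to the largest finite float.
inline float global_encode_scale(float amax) {
  if (amax == 0.0f) {
    return 1.0f;  // an unset (zero) amax silently falls back to 1.0
  }
  const float scale = (kFp8E4m3Max * kFp4E2m1Max) / amax;
  // A very small amax can overflow the division to infinity; clamp it.
  return std::min(scale, std::numeric_limits<float>::max());
}
```

Under these assumptions, a test that leaves amax at zero gets the fallback scale on one side of the comparison regardless of the tensor's real range, so the CPU reference and the GPU result can disagree even when the kernel is correct.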
Fixes # (issue)