Modifying the NVFP4 block quantization kernel to handle the single level quantization correctly #5680

jjsjann123 merged 3 commits into main from

Conversation
Review updated until commit c9b7cf1

Description

Relevant files: Bug fix
PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Numerical Stability

`1.0f / scaled_max` could cause numerical issues if `scaled_max` is zero or very close to zero. Consider adding bounds checking or using a safe reciprocal function to prevent division by zero or overflow.
Test failures

- (Medium, 1) CUDA OOM in nvFuser `TmaPointwiseTestF.SplitGridDim2D` on H100 (runner dlcluster_h100) ❌
Greptile Summary

This PR fixes a mathematical bug in the NVFP4 block quantization kernel for single-level quantization (when no global scale is provided).

Key Changes:

Technical Context:

The bug was that step 4 was missing for the single-level case, while it was correctly implemented for the two-level case (line 132). This PR adds the symmetrical logic for single-level quantization.

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Input as Input Data (FP32/BF16)
participant Kernel as block_quantize_to_nvfp4
participant MaxCalc as Max Calculation
participant FP8Conv as FP8 Conversion
participant Scale as Scale Computation
participant Quant as Quantization
Input->>Kernel: Array of values (ITEMS_PER_THREAD)
Kernel->>MaxCalc: Convert to float & compute local max
MaxCalc->>MaxCalc: Reduce across threads (16/ITEMS_PER_THREAD)
MaxCalc-->>Kernel: block_max
alt USE_GLOBAL_SCALE = true
Kernel->>FP8Conv: scaled_max = block_max * global_scale * (1/6)
else USE_GLOBAL_SCALE = false
Kernel->>FP8Conv: scaled_max = block_max * (1/6)
end
FP8Conv->>FP8Conv: clamped_max_fp8 = __float2e4m3(scaled_max)
FP8Conv->>FP8Conv: scaled_max = __e4m32float(clamped_max_fp8)
FP8Conv-->>Scale: quantized block scale
alt USE_GLOBAL_SCALE = true
Scale->>Scale: scaled_max = global_scale / scaled_max
else USE_GLOBAL_SCALE = false (NEW)
Scale->>Scale: scaled_max = 1.0 / scaled_max
end
Scale-->>Quant: scaling factor (scaled_max)
Quant->>Quant: Write block_scales to global memory
Quant->>Quant: Scale input values by scaled_max
Quant->>Quant: Convert to FP4: __float2e2m1
Quant-->>Kernel: Quantized FP4 output
!test

!test

!test
The quantization to NVFP4 when there's no global scale needs a minor correction in the math.

This PR also updates the existing C++ test to reflect that.

We can't test this against TE yet, since there's no implementation without the global scale.