Skip to content

Support configurable number of philox rounds for stochastic rounding#2751

Merged
ksivaman merged 2 commits intoNVIDIA:mainfrom
ksivaman:configurable_philox_rounds_sr
Mar 11, 2026
Merged

Support configurable number of philox rounds for stochastic rounding#2751
ksivaman merged 2 commits intoNVIDIA:mainfrom
ksivaman:configurable_philox_rounds_sr

Conversation

@ksivaman
Copy link
Copy Markdown
Member

Description

Support configurable number of philox rounds for stochastic rounding.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add envvar NVTE_BUILD_NUM_PHILOX_ROUNDS to support configurable number of philox rounds for stochastic rounding.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman ksivaman requested a review from ptrendx March 10, 2026 22:22
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

This PR introduces a build-time configurable Philox round count for stochastic rounding kernels, replacing the previously hardcoded value of 10 across all relevant CUDA/CUDA headers with the NVTE_BUILD_NUM_PHILOX_ROUNDS preprocessor macro. The change is non-breaking and well-structured, with validation at both the CMake level (regex check for positive integer) and the C++ level (static_assert in common.h).

Key changes:

  • New NVTE_BUILD_NUM_PHILOX_ROUNDS environment variable read at CMake configure time, validated with a regex, and passed as a compile definition to the transformer_engine target (with PUBLIC scope so all consumers inherit it).
  • A #ifndef fallback in common.h ensures a safe default of 10 when the macro is not defined by the build system (e.g., IDE environments).
  • A static_assert(NVTE_BUILD_NUM_PHILOX_ROUNDS > 0, ...) in common.h acts as a compile-time safety net.
  • All 10 kernel files that instantiate philox4x32_native_state<10> have been consistently updated — no hardcoded occurrences remain.
  • Documentation added to docs/envvars.rst in the correct "Build Configuration" section.

One issue was found: in CMakeLists.txt, the emptiness check for the environment variable uses if (NOT NVTE_BUILD_NUM_PHILOX_ROUNDS_STR), which treats the string "0" as falsy in CMake's boolean evaluation. This causes the value 0 to silently fall through to the default of 10 instead of triggering the "must be a positive integer" error on the subsequent regex check.

Confidence Score: 4/5

  • Safe to merge; the one issue found is a minor CMake quirk that causes an invalid value ("0") to silently default to 10 rather than error, with the static_assert acting as a compile-time backstop.
  • The change is mechanically straightforward and consistently applied across all affected files. The validation strategy is sound (CMake + static_assert layering). The only issue is a CMake boolean evaluation edge case for the value "0" that causes a silent fallback rather than an error message. This does not affect correctness at compile time (the static_assert would catch 0 if it were somehow passed directly), but could mislead a user debugging unexpected behavior. All other aspects of the PR — substitution completeness, default value, documentation, and compile-time guards — are correct.
  • transformer_engine/common/CMakeLists.txt — the emptiness check for the env var should use STREQUAL "" instead of the boolean falsy check to avoid silently ignoring "0".

Important Files Changed

Filename Overview
transformer_engine/common/CMakeLists.txt Adds CMake logic to read, validate, and propagate NVTE_BUILD_NUM_PHILOX_ROUNDS. The regex validation correctly rejects non-positive integers, but the emptiness check uses a CMake boolean evaluation that silently ignores the value "0" (treats it as unset) rather than raising an error.
transformer_engine/common/common.h Adds a fallback #define for NVTE_BUILD_NUM_PHILOX_ROUNDS (defaulting to 10) and a static_assert ensuring the value is positive. Correct placement and good compile-time safety net.
transformer_engine/common/cast/nvfp4/core_nvfp4.cuh Replaces hardcoded philox4x32_native_state<10> with the configurable NVTE_BUILD_NUM_PHILOX_ROUNDS macro. Clean, correct substitution.
transformer_engine/common/cast/nvfp4/quantize_transpose_nvfp4.cuh Two occurrences of hardcoded <10> replaced with NVTE_BUILD_NUM_PHILOX_ROUNDS. Both substitutions are correct.
transformer_engine/common/transpose/quantize_transpose_vector_blockwise_fp4.cu Two occurrences (function signature and kernel body) updated to use NVTE_BUILD_NUM_PHILOX_ROUNDS. Correct and consistent.
docs/envvars.rst Documents the new NVTE_BUILD_NUM_PHILOX_ROUNDS build-time environment variable in the correct "Build Configuration" section with type, default, and description.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Build: NVTE_BUILD_NUM_PHILOX_ROUNDS env var set?"] -->|"No (empty string)"| B["Default to 10"]
    A -->|"Yes"| C{"Matches regex\n^[1-9][0-9]*$?"}
    C -->|"No"| D["CMake FATAL_ERROR"]
    C -->|"Yes"| E["Pass as -DNVTE_BUILD_NUM_PHILOX_ROUNDS=N\nto compiler (PUBLIC)"]
    B --> E
    E --> F{"common.h:\n#ifndef NVTE_BUILD_NUM_PHILOX_ROUNDS"}
    F -->|"Macro not defined\n(non-CMake build)"| G["#define NVTE_BUILD_NUM_PHILOX_ROUNDS 10"]
    F -->|"Macro already defined\nby compiler flag"| H["Use provided value"]
    G --> I["static_assert > 0"]
    H --> I
    I -->|"Pass"| J["philox4x32_native_state&lt;NVTE_BUILD_NUM_PHILOX_ROUNDS&gt;\nin all kernels"]
    I -->|"Fail"| K["Compile error"]
    style D fill:#f55,color:#fff
    style K fill:#f55,color:#fff
    style J fill:#5a5,color:#fff
Loading

Last reviewed commit: 92caffc

Comment on lines +306 to +308
if (NOT NVTE_BUILD_NUM_PHILOX_ROUNDS_STR)
set(NVTE_BUILD_NUM_PHILOX_ROUNDS_STR "10")
endif()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent default for value "0" due to CMake falsy evaluation

In CMake, if(NOT variable) evaluates the boolean truthiness of the variable's string value. The string "0" is treated as false in CMake boolean contexts, so NOT "0" is true. This means if a user explicitly sets NVTE_BUILD_NUM_PHILOX_ROUNDS=0, the check on line 306 will silently reset the value to "10" — the regex validation on line 309 is never reached with the original "0", and no error is emitted.

The intent is clearly to default only when the variable is unset (empty), not when it is set to any falsy value. Using STREQUAL "" makes this explicit:

Suggested change
if (NOT NVTE_BUILD_NUM_PHILOX_ROUNDS_STR)
set(NVTE_BUILD_NUM_PHILOX_ROUNDS_STR "10")
endif()
if (NVTE_BUILD_NUM_PHILOX_ROUNDS_STR STREQUAL "")
set(NVTE_BUILD_NUM_PHILOX_ROUNDS_STR "10")
endif()

@ptrendx
Copy link
Copy Markdown
Member

ptrendx commented Mar 10, 2026

/te-ci

@ksivaman ksivaman merged commit f6001c4 into NVIDIA:main Mar 11, 2026
36 of 42 checks passed
@ksivaman ksivaman deleted the configurable_philox_rounds_sr branch March 11, 2026 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants