mnnvl guard by francesco-bertolotti · Pull Request #3013 · NVIDIA/TransformerEngine

francesco-bertolotti · 2026-05-19T14:09:57Z

I am trying to compile TE on the CINECA Leonardo cluster and encountered a compilation issue that required a small fix to work around. Since this may also affect other environments with mixed CUDA header/toolkit versions, I am submitting this PR in case it is useful more broadly.

I should also mention that some TE tests are currently failing in my environment. From what I can tell so far, these failures appear unrelated to this change and are more likely tied to the system configuration. I still need to investigate them further, but any feedback or insight on that side would be appreciated.

Thank you for your work on TE and your time with this PR!

Description

This fixes a build failure when compiling TE in environments where the CUDA headers provided by pip-installed nvidia-cuda-runtime-cu12 are newer than the system CUDA toolkit.

I encountered this on the CINECA Leonardo cluster (RHEL 8, A100 SXM4, driver 535, system CUDA 12.2, pip nvidia-cuda-runtime-cu12 12.6), but the issue is not cluster-specific and can occur on any system with mixed CUDA header versions.

The failure looks like:

transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers-host.cpp:123:5:
error: 'nvmlGpuFabricInfoV_t' was not declared in this scope
    nvmlGpuFabricInfoV_t fabricInfo = {};

Root cause

The build currently relies on CUDA_VERSION to determine whether the newer NVML fabric APIs are available:

#if CUDA_VERSION < 12040

However, in mixed environments there are effectively two independent version sources:

| Source | Controlled by | Example version |
| --- | --- | --- |
| CUDA_VERSION | pip nvidia-cuda-runtime-cu12 headers | ≥ 12.4 |
| nvml.h | system CUDA toolkit | 12.2 |

If the pip CUDA headers are newer than the system toolkit:

- CUDA_VERSION reports ≥ 12.4
- the #else branch is enabled
- but the system nvml.h does not define:
  - nvmlGpuFabricInfoV_t
  - nvmlGpuFabricInfo_v2

which causes compilation to fail.

This can happen on HPC clusters, shared cloud nodes, or developer systems where Python CUDA packages are updated independently from the system CUDA installation.
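The mismatch described above can be reproduced in miniature without any CUDA headers. The sketch below is only an illustration: `DEMO_CUDA_VERSION` and `DEMO_nvmlGpuFabricInfo_v2` are hypothetical stand-ins for the real `CUDA_VERSION` and `nvmlGpuFabricInfo_v2` macros, chosen to mimic pip headers at 12.6 next to a system `nvml.h` at 12.2.

```cpp
// Hypothetical stand-ins for the two independent header sources.
// DEMO_CUDA_VERSION mimics what a pip-installed cuda.h (12.6) would report.
#define DEMO_CUDA_VERSION 12060
// A 12.2-era nvml.h would not define the v2 macro, so
// DEMO_nvmlGpuFabricInfo_v2 is deliberately left undefined.

// Version-based guard: trusts the CUDA version macro.
#if DEMO_CUDA_VERSION < 12040
constexpr bool kVersionGuardTakesNewPath = false;
#else
constexpr bool kVersionGuardTakesNewPath = true;   // would hit the missing type
#endif

// Capability-based guard: asks whether the symbol itself exists.
#if !defined(DEMO_nvmlGpuFabricInfo_v2)
constexpr bool kSymbolGuardTakesNewPath = false;   // safe fallback
#else
constexpr bool kSymbolGuardTakesNewPath = true;
#endif
```

With these stand-in values the two guards disagree: the version check selects the newer code path, while the symbol check selects the fallback.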

Fix

Replace the version-based guard with a capability-based guard:

- #if CUDA_VERSION < 12040
+ #if !defined(nvmlGpuFabricInfo_v2)

nvmlGpuFabricInfo_v2 is introduced in CUDA 12.4 and is the actual API feature required by the code below. Checking for the symbol directly avoids assuming that all CUDA-related headers come from the same toolkit version.

This makes the logic resilient to mismatched header installations while preserving existing behavior.
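As a rough sketch of the same idea outside of TE, the snippet below includes `nvml.h` only if the compiler can find it and then keys off the `nvmlGpuFabricInfo_v2` macro. The names `DEMO_HAVE_NVML_HEADER`, `kFabricV2Available`, and `demo_can_query_fabric_v2` are hypothetical and are not part of TE or NVML; the optional include is an assumption made so that the example compiles on machines without NVML installed.

```cpp
// Include nvml.h only when it is actually on the include path
// (__has_include is standard since C++17).
#if defined(__has_include)
#  if __has_include(<nvml.h>)
#    include <nvml.h>
#    define DEMO_HAVE_NVML_HEADER 1
#  endif
#endif

// Capability check: the macro is visible only if the nvml.h that was
// really included provides the v2 fabric info struct version.
#if !defined(nvmlGpuFabricInfo_v2)
constexpr bool kFabricV2Available = false;
#else
constexpr bool kFabricV2Available = true;
#endif

// Hypothetical helper: reports whether the v2 fabric query could be compiled.
inline bool demo_can_query_fabric_v2() { return kFabricV2Available; }
```

Because the check depends only on the header that was included, it does not matter which package supplied `CUDA_VERSION`.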

Behaviour after this change

CUDA ≥ 12.4 toolkit:
- nvmlGpuFabricInfo_v2 is defined
- existing MNNVL detection path is compiled
CUDA < 12.4 toolkit:
- nvmlGpuFabricInfo_v2 is not defined
- function returns false as before
Mixed-header environments:
- CUDA_VERSION may report ≥ 12.4
- but nvmlGpuFabricInfo_v2 is absent
- code correctly falls back to returning false
- build succeeds

This fallback is also semantically correct, since MNNVL support is only relevant on newer H100/GH200-class systems and should return false on A100-era systems regardless.

Testing

Verified on CINECA Leonardo:

A100 SXM4
driver 535
system CUDA 12.2
pip nvidia-cuda-runtime-cu12 12.6

Unrelated: I had to set NVIDIA_TF32_OVERRIDE=0 for test_numerics.py; otherwise some tests failed because of small numerical mismatches in the layer norms. This has also been done for test_mhc.py.

greptile-apps · 2026-05-19T14:13:41Z

Greptile Summary

Replaces #if CUDA_VERSION < 12040 with #if !defined(nvmlGpuFabricInfo_v2) in has_mnnvl_fabric so that the MNNVL-detection path is gated on whether the symbol is actually present in the installed NVML headers rather than on a version macro that may come from a different (pip-installed) source.

The one-line guard change is logically correct: nvmlGpuFabricInfo_v2 is a macro introduced in the CUDA 12.4 NVML headers, so defined() reliably detects its availability at compile time and survives mixed-header environments.
A pre-existing debug message in the false branch ("since it was not built with CUDA version >= 12.4") no longer accurately describes the new guard condition; an earlier review comment flagged this and a corrected string was proposed there.

Confidence Score: 5/5

Safe to merge; the one-line guard change is correct and the fallback behavior is preserved on all header configurations.

The fix correctly switches from a version macro that can originate from a different package than the NVML headers to a symbol that is defined directly in nvml.h, making the guard self-consistent. The false branch still returns false and the true branch is unchanged, so no existing behavior is altered on a standard single-version CUDA installation. The only open item is a stale debug string that was already flagged in a prior review pass.

No files require special attention beyond the already-flagged debug message on line 99.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers-host.cpp | Replaces CUDA_VERSION-based guard with capability-based `!defined(nvmlGpuFabricInfo_v2)` guard in `has_mnnvl_fabric`; the debug message in the false branch still references the old CUDA-version framing (flagged in a prior review comment). |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["has_mnnvl_fabric(device_id)"] --> B{"nvmlGpuFabricInfo_v2 defined?\n(compile-time check)"}
    B -- "Not defined" --> C["Log debug message\nReturn false"]
    B -- "Defined\n(NVML headers >= 12.4)" --> D{"cudart_version() >= 12040?\n(run-time check)"}
    D -- "No" --> E["Log debug message\nReturn false"]
    D -- "Yes" --> F{"fabric handle\nsupported on device?"}
    F -- "No" --> G["Return false"]
    F -- "Yes" --> H["Query nvmlGpuFabricInfoV_t\nvia nvmlDeviceGetGpuFabricInfoV"]
    H --> I{"fabric state COMPLETED\n& non-zero clusterUuid?"}
    I -- "No" --> J["Return false"]
    I -- "Yes" --> K["Return true\n(MNNVL supported)"]
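
The decision flow in the chart can be sketched as a pure function over already-collected inputs. This is a simplified model, not the TE implementation: the `FabricProbe` struct, its field names, and `demo_has_mnnvl_fabric` are hypothetical, and the real code obtains these values from the CUDA runtime and NVML rather than from a struct.

```cpp
#include <array>
#include <cstdint>

// Hypothetical bundle of the facts the flowchart branches on.
struct FabricProbe {
  bool headers_have_fabric_v2;   // result of the compile-time symbol check
  int cudart_version;            // runtime version, e.g. 12040 for 12.4
  bool fabric_handle_supported;  // device supports fabric handles
  bool fabric_state_completed;   // fabric state reported as COMPLETED
  std::array<std::uint8_t, 16> cluster_uuid;  // all zeros means "no cluster"
};

// Mirrors the chart: every failed check returns false; only a completed
// fabric state with a non-zero cluster UUID reports MNNVL support.
inline bool demo_has_mnnvl_fabric(const FabricProbe& p) {
  if (!p.headers_have_fabric_v2) return false;
  if (p.cudart_version < 12040) return false;
  if (!p.fabric_handle_supported) return false;
  if (!p.fabric_state_completed) return false;
  for (std::uint8_t byte : p.cluster_uuid) {
    if (byte != 0) return true;
  }
  return false;
}
```

In this model the mixed-header case from the PR description corresponds to `headers_have_fabric_v2 == false`, which returns false before any runtime query is attempted.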

Reviews (4): Last reviewed commit: "reverting NVIDIA_TF32_OVERRIDE=0"

ptrendx · 2026-05-19T20:33:10Z

Thank you for the PR. I am a little hesitant about the TF32 override, though. Could you split this PR into two? We could merge the MNNVL guard right away and then look into the TF32 changes.

francesco-bertolotti · 2026-05-20T05:02:01Z

Hi @ptrendx, thank you for the quick reply!

I have reverted the test modification and moved it to a separate PR, #3014.

ptrendx · 2026-05-20T06:52:09Z

/te-ci

Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>

francesco-bertolotti force-pushed the f14-mnnvl-guard branch from 5f1a6ca to 5180ff4 on May 19, 2026 14:15

ptrendx added the community-contribution label (PRs from external contributors outside the core maintainers, representing community-driven work) on May 19, 2026

francesco-bertolotti mentioned this pull request May 20, 2026

adding NVIDIA_TF32_OVERRIDE=0 to test_numerics.py #3014

Open

ptrendx approved these changes May 20, 2026

francesco-bertolotti added 3 commits May 20, 2026 09:09

guarding nvmlGpuFabricInfo_v2

aa3334f

Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>

precision errors

850057b

Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>

reverting NVIDIA_TF32_OVERRIDE=0

e07592a

Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>

francesco-bertolotti force-pushed the f14-mnnvl-guard branch from b435798 to e07592a on May 20, 2026 07:10

francesco-bertolotti wants to merge 3 commits into NVIDIA:main from francesco-bertolotti:f14-mnnvl-guard