Build fixes by ktangsali · Pull Request #1640 · NVIDIA/physicsnemo

ktangsali · 2026-05-12T23:50:30Z

PhysicsNeMo Pull Request

Description

This PR includes fixes for two issues found in the latest builds:

Adds a small compat shim (physicsnemo/nn/module/_nvfuser_compat.py) so PhysicsNeMo's fused SiLU path works with both the legacy nvfuser package and the newer nvfuser_direct package (the one shipped in the 26.04 PyTorch container). Also makes the import resilient: orphan .dist-info directories or partial installs now fall back to non-fused SiLU instead of crashing every GNN model on import. fused_silu.py and gnn_layers/mesh_graph_mlp.py now import from the shim; behavior is unchanged where legacy nvfuser already works.
natten dispatches na{1,2,3}d through torch.nn.attention.flex_attention, which raises NotImplementedError on CPU when any of q/k/v has requires_grad=True. The test_backward cases in test/nn/functional/test_natten.py therefore fail under the shared device=["cpu", "cuda:0"] fixture. Adds a small _skip_if_cpu_backward(device) helper and calls it at the top of the three backward tests so the CPU rows skip with an accurate reason while CUDA coverage is unchanged. No production code touched.
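For readers who want a picture of the import-resilience idea in the first item, here is a minimal sketch of a find_spec-guarded backend selection. It is not the code in _nvfuser_compat.py: the function name load_first_available and the candidate ordering are assumptions made for illustration, and only the standard library is used.

```python
import importlib
import importlib.util
import logging

logger = logging.getLogger(__name__)


def load_first_available(candidates):
    """Return (module, name) for the first importable candidate, else (None, None).

    Hypothetical helper for illustration; not part of PhysicsNeMo.
    """
    for name in candidates:
        # find_spec returns None when no top-level package of that name
        # can be located, so such candidates are skipped without importing.
        if importlib.util.find_spec(name) is None:
            continue
        try:
            return importlib.import_module(name), name
        except ImportError as exc:
            # A spec was found but the import itself failed (e.g. a partial
            # install); log it and move on to the next candidate.
            logger.warning(
                "Found %s but failed to import (%s); trying next backend.",
                name,
                exc,
            )
    return None, None


# Assumed ordering: try the newer package first, then the legacy one.
_backend, _backend_name = load_first_available(("nvfuser_direct", "nvfuser"))

# Callers can branch on this flag and use a non-fused code path when False.
NV_FUSER_AVAILABLE = _backend is not None
```

With neither package installed, NV_FUSER_AVAILABLE is simply False and nothing raises at import time, which is the behavior the description asks for.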

Full build logs: https://gitlab-master.nvidia.com/modulus/modulus-release-build-guide/-/jobs/316606693

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

copy-pr-bot · 2026-05-12T23:50:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

pzharrington · 2026-05-12T23:53:29Z

@ktangsali the natten issue should be taken care of by #1634

greptile-apps · 2026-05-12T23:53:36Z

Greptile Summary

This PR contains two targeted build fixes: an nvfuser compatibility shim (_nvfuser_compat.py) that makes the fused SiLU path work with both the legacy nvfuser and newer nvfuser_direct packages (and gracefully falls back when neither is importable), and a pytest helper that skips natten backward tests on CPU where flex_attention raises NotImplementedError.

nvfuser shim (_nvfuser_compat.py): Adds find_spec-guarded import logic, reimplements compute_contiguity, and provides unified FusionDefinition/DataType exports; fused_silu.py and mesh_graph_mlp.py are updated to import from the shim and replace define_constant calls with define_scalar.
natten test fix (test_natten.py): Adds _skip_if_cpu_backward(device) called at the top of the three backward test methods; CUDA coverage is unchanged.
Version bump: physicsnemo/__init__.py and CHANGELOG.md updated from 2.1.0a0 to 2.1.0.

Important Files Changed

Filename	Overview
physicsnemo/nn/module/_nvfuser_compat.py	New compat shim that unifies legacy nvfuser and nvfuser_direct behind one import surface; module docstring incorrectly claims define_constant is reimplemented.
physicsnemo/nn/module/fused_silu.py	Migrates nvfuser imports to compat shim; replaces define_constant with define_scalar; clean refactor with no logic changes.
physicsnemo/nn/module/gnn_layers/mesh_graph_mlp.py	Removes inline nvfuser import boilerplate in favour of compat shim; no logic changes.
test/nn/functional/test_natten.py	Adds _skip_if_cpu_backward helper; correctly skips 1D/2D/3D backward tests on CPU where FlexAttention raises NotImplementedError.
physicsnemo/__init__.py	Version bump from 2.1.0a0 to 2.1.0.
CHANGELOG.md	Version and date stamped to 2.1.0 / 2026-05-26; empty Removed and Security sections cleaned up.

Reviews (1): Last reviewed commit: "fix natten tests"

greptile-apps · 2026-05-12T23:53:39Z

+"""Compatibility shim for the legacy ``nvfuser`` package and the newer
+``nvfuser_direct`` package.
+
+The nvFuser Python frontend is split into two distributions: the legacy
+``nvfuser`` package (older PyTorch containers) and ``nvfuser_direct`` (newer
+containers). This module hides the difference behind a single import surface
+and reimplements the two helpers that exist only in the legacy package
+(``compute_contiguity`` and ``define_constant``) so the rest of PhysicsNeMo
+can target either backend without conditionals.


Inaccurate module docstring for define_constant

The docstring states this shim "reimplements the two helpers that exist only in the legacy package (compute_contiguity and define_constant)," but define_constant is never reimplemented here. The callers in fused_silu.py are instead updated to call fd.define_scalar() directly. Readers looking for a define_constant wrapper will be misled; the docstring should describe the actual migration strategy used.

greptile-apps · 2026-05-12T23:53:40Z

+        try:
+            return importlib.import_module(name), name
+        except ImportError as e:
+            logger.warning(
+                "Found %s on sys.path but failed to import (%s); "
+                "trying next backend.",
+                name,
+                e,
+            )


Narrow exception catch may miss non-ImportError failures

The except ImportError block covers the orphan-dist-info scenario described in the PR, but some container environments raise RuntimeError (or OSError) when a CUDA-compiled extension fails to load due to a driver/library mismatch. Those exceptions would propagate uncaught and crash every downstream import (fused_silu, mesh_graph_mlp), defeating the resilience goal. Consider broadening the catch to except (ImportError, RuntimeError) and logging a warning, since the fallback behaviour (NV_FUSER_AVAILABLE = False) is safe.
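A minimal sketch of the broadened catch suggested here, under stated assumptions: the helper name try_import and its injectable importer parameter are hypothetical and exist only so the behavior can be exercised without a real CUDA extension. Whether OSError should be included alongside RuntimeError is a judgment call for the maintainers.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# Exception types treated as "backend unusable". ImportError is what the PR
# already catches; RuntimeError and OSError are the additions discussed in
# this review comment.
_LOAD_ERRORS = (ImportError, RuntimeError, OSError)


def try_import(name, importer=importlib.import_module):
    """Import ``name`` and return the module, or None if loading fails.

    ``importer`` is a hypothetical hook so the failure path can be simulated.
    """
    try:
        return importer(name)
    except _LOAD_ERRORS as exc:
        logger.warning(
            "Failed to import %s (%s: %s); falling back.",
            name,
            type(exc).__name__,
            exc,
        )
        return None
```

Any other exception type still propagates, so genuine bugs in the shim are not silently swallowed.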


ktangsali · 2026-05-13T00:49:07Z

/blossom-ci

mnabian

LGTM

* add fixes for the nvfuser bug
* test(natten): narrow CPU-backward skip to FlexAttention NotImplementedError

  Cherry-picked test/nn/functional/test_natten.py from upstream commit 7f2451a ("Ci deps group (#1634)"). The previous device == "cpu" early-skip was too broad; this wraps the forward call and only skips on the specific NotImplementedError raised by FlexAttention's CPU-backward guard. If natten picks a different backend (or FlexAttention ever supports CPU backward), the test will run.
* black formatting

---------

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>
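The narrowed skip described in the commit message can be sketched roughly as follows. This is an illustration rather than the cherry-picked test code: run_or_skip_unsupported is a hypothetical name, and unittest.SkipTest from the standard library stands in for pytest.skip (pytest reports it as a skip as well).

```python
import unittest


def run_or_skip_unsupported(forward, *args, **kwargs):
    """Call ``forward`` and convert NotImplementedError into a test skip.

    Hypothetical wrapper for illustration only.
    """
    try:
        return forward(*args, **kwargs)
    except NotImplementedError as exc:
        # Only this one exception type becomes a skip; any other error
        # propagates and fails the test as usual.
        raise unittest.SkipTest(
            f"backend does not support this configuration: {exc}"
        ) from exc
```

Compared with skipping on device == "cpu" up front, this keeps the test live: if the call stops raising NotImplementedError, the assertions after it run without any change to the test.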

add fixes for the nvfuser bug

287223d

ktangsali requested review from loliverhennigh and mnabian as code owners May 12, 2026 23:50

ktangsali requested a review from pzharrington May 12, 2026 23:50

greptile-apps Bot reviewed May 12, 2026


ktangsali changed the base branch from main to 2.1.0-rc May 12, 2026 23:53

ktangsali force-pushed the build-fixes branch from 7038b4f to 8ce79b1 Compare May 13, 2026 00:45

black formatting

e70f1e1

mnabian approved these changes May 13, 2026


ktangsali merged commit 23b6848 into NVIDIA:2.1.0-rc May 13, 2026
1 check passed

ktangsali deleted the build-fixes branch May 13, 2026 05:21

ktangsali merged 3 commits into NVIDIA:2.1.0-rc from ktangsali:build-fixes