Modernize docker builds for 2.0 release #1428

Merged
ktangsali merged 9 commits into NVIDIA:main from ktangsali:docker-updates-2.0
Feb 19, 2026

ktangsali (Collaborator) commented on Feb 18, 2026

PhysicsNeMo Pull Request

Description

This PR cleans up the Dockerfile and makes the following changes:

  1. Uses uv to install packages (system-wide, not in a virtual env).
  2. Leverages the dependency groups specified in the toml file as much as possible, to keep the two approaches consistent.
  3. Simplifies the stages. The builder stage contains all the if-else logic. The deploy stage is the builder stage minus mlflow and wandb (these two are removed because they typically have CVEs); other packages can also be removed from the builder here, depending on license or security guidance. The CI stage is the builder stage plus the dev group and a few other dependencies.
  4. Locks the version of cupy. cupy-cuda13x==14.0.0 (released on Feb 16, 2026) introduces a bug (see the traceback below). Locking the version fixes it.
name = 'cupy_backends.cuda.libs.torch', import_ = <function _gcd_import at 0x7c5fb53000e0>

>   ???
E   torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
E   ModuleNotFoundError: No module named 'cupy_backends.cuda.libs.torch'
E   
E   Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

<frozen importlib._bootstrap>:1324: BackendCompilerFailed
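
One way to express the lock described in item 4 is a constraints file that excludes only the affected release. The sketch below is an assumption, not the PR's actual change: the file path, the `RUN_INSTALL` switch, and the `!=14.0.0` specifier are hypothetical, and the Dockerfile may pin a different version in a different way.

```shell
#!/usr/bin/env bash
# Sketch: record a constraint that keeps cupy-cuda13x off the release
# reported as broken in this PR, then optionally install with uv.
set -euo pipefail

# Hypothetical path; the PR does not say where the pin is recorded.
constraints_file="${CONSTRAINTS_FILE:-/tmp/cupy-constraints.txt}"

# Exclude only the release named in the PR description (assumed specifier).
printf '%s\n' 'cupy-cuda13x!=14.0.0' > "$constraints_file"

# Install only on request and only when uv is present, so the script
# is harmless on machines without uv or without network access.
if [ "${RUN_INSTALL:-0}" = "1" ] && command -v uv >/dev/null 2>&1; then
    uv pip install --system -c "$constraints_file" cupy-cuda13x
fi
```

Keeping the exclusion in a constraints file rather than in the install command means the same file can be reused by every later install step, which may help if cupy is pulled in transitively.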

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

ktangsali changed the title from "Modernize docker builds" to "Modernize docker builds for 2.0 release" on Feb 18, 2026
greptile-apps bot (Contributor) commented on Feb 18, 2026

Greptile Summary

This PR modernizes the Docker build by replacing pip with uv for package installation, bumping the base image from pytorch:25.09-py3 to pytorch:26.01-py3, simplifying the stage hierarchy (builder -> ci/deploy -> docs), moving torch_cluster installation from CI-only to the builder stage, and consolidating many explicit dependency installs into uv pip install ".[cu13,utils-extras,...]" leveraging pyproject.toml extras. The Makefile is updated to rename MODULUS_GIT_HASH to PHYSICSNEMO_GIT_HASH.

  • Build-breaking bug: uv pip install --group dev on line 217 is invalid -- --group is not a recognized flag for uv pip install (it's only available on uv sync/uv run). The CI image build will fail at this step.
  • Editable install + source deletion: The CI stage installs physicsnemo in editable mode then deletes /physicsnemo/, which breaks import physicsnemo unless the CI runner remounts the source. This should either be documented or changed to a non-editable install.
  • UV_CONSTRAINT may point to nonexistent file: The env var is set unconditionally, but the constraint file may not exist in all base images, which would cause all subsequent uv pip install calls to fail.
  • Unpinned uv:latest: Using ghcr.io/astral-sh/uv:latest risks non-reproducible builds; consider pinning a specific version.
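
If the `UV_CONSTRAINT` concern above applies, one possible guard is to export the variable only when the file exists. This is a minimal sketch under stated assumptions: the helper name `set_uv_constraint`, the `CONSTRAINT_FILE` variable, and the default path are hypothetical and should be checked against the base image actually in use.

```shell
#!/usr/bin/env bash
# Sketch: point uv at a constraint file only when that file is present.

set_uv_constraint() {
    # $1: path to a candidate constraint file
    if [ -f "$1" ]; then
        export UV_CONSTRAINT="$1"
    else
        # Leave uv unconstrained rather than pointing it at a missing file.
        unset UV_CONSTRAINT
        echo "constraint file not found, skipping: $1" >&2
    fi
}

# Hypothetical default location; verify the real path in the base image.
set_uv_constraint "${CONSTRAINT_FILE:-/etc/pip/constraint.txt}"
```

Because each Dockerfile `RUN` instruction starts a fresh shell, a guard like this would need to run in the same `RUN` step as the install commands it is meant to protect, or in a script those steps source.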

Important Files Changed

| Filename | Overview |
|----------|----------|
| Dockerfile | Major Dockerfile modernization using uv, but contains a build-breaking bug (uv pip install --group dev is invalid syntax), a logic issue with editable install + source deletion, and a potential issue with UV_CONSTRAINT pointing to a nonexistent file. |
| Makefile | Simple rename of MODULUS_GIT_HASH to PHYSICSNEMO_GIT_HASH build arg to match the Dockerfile's ARG name. Correct and straightforward. |

Last reviewed commit: fb02afd

greptile-apps bot (Contributor) left a comment

2 files reviewed, 4 comments


ktangsali requested a review from NickGeneva on February 18, 2026, 22:46
NickGeneva (Collaborator) left a comment

LGTM, thanks for addressing the feedback.

ktangsali (Collaborator, Author) commented

/blossom-ci

ktangsali enabled auto-merge on February 19, 2026, 09:17
ktangsali added this pull request to the merge queue on Feb 19, 2026
Merged via the queue into NVIDIA:main with commit 70b06ed Feb 19, 2026
4 checks passed
2 participants