Temporarily disable latest pulp and mcore until we fix its nvidia-resiliency-ext dependency by kevalmorabia97 · Pull Request #1285 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-04-17T07:07:42Z

megatron-core==0.17.0 was released yesterday, and one of its imports requires a nightly version of nvidia-resiliency-ext. The version pre-installed in the DLFW PyTorch container is nvidia-resiliency-ext==0.5.0.
- Temporarily pin mcore<0.17.0 to unblock PRs from merging.
- Pin pulp<4.0, as it has some breaking changes and its release is imminent.

The correct fix is to use the nemo:26.04 container instead of the PyTorch container for megatron-based tests, since it always has the correct combination of packages needed for the megatron ecosystem. Done in #1286.
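To make the effect of the two upper-bound pins concrete, here is a minimal sketch of how a `<X.Y.Z` constraint filters versions. It is not how pip evaluates specifiers: it handles only plain numeric releases and ignores pre-releases and the rest of PEP 440 (real tooling would use something like `packaging.specifiers.SpecifierSet`). The helper names and the sample version numbers other than the two bounds are hypothetical.

```python
from __future__ import annotations

# Upper bounds taken from the PR description; everything else is illustrative.
PINS = {"megatron-core": "0.17.0", "pulp": "4.0"}


def parse_release(version: str) -> tuple[int, ...]:
    """Turn '0.17.0' into (0, 17, 0). Only plain numeric releases are supported."""
    return tuple(int(part) for part in version.split("."))


def below_upper_bound(version: str, bound: str) -> bool:
    """Return True if `version` satisfies the specifier `<bound`."""
    v, b = parse_release(version), parse_release(bound)
    # Pad the shorter tuple with zeros so that '4' and '4.0' compare as equal.
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v < b


def is_allowed(package: str, version: str) -> bool:
    """Check a candidate version against the pin recorded for `package`."""
    return below_upper_bound(version, PINS[package])


if __name__ == "__main__":
    # Hypothetical candidate versions, used only to exercise the check.
    for package, version in [("megatron-core", "0.16.1"), ("megatron-core", "0.17.0"),
                             ("pulp", "3.9.9"), ("pulp", "4")]:
        print(package, version, "allowed" if is_allowed(package, version) else "blocked")
```

The bound itself is excluded, so `megatron-core==0.17.0` is rejected while any earlier release is still accepted.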


coderabbitai · 2026-04-17T07:07:57Z

Walkthrough

Updated dependency constraints: tox.ini now installs megatron-core<0.17.0 in the cuda13-gpu-megatron test environment pre-install step, and pyproject.toml constrains pulp to pulp<4.0 in [project].dependencies.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Environment Configuration `tox.ini` | Changed `cuda13-gpu-megatron` pre-install from `pip install -U megatron-core` to `pip install 'megatron-core<0.17.0'`. |
| Project Dependencies `pyproject.toml` | Constrained `pulp` dependency from `pulp` to `pulp<4.0` under `[project].dependencies`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Security Anti-Patterns | ✅ Passed | PR introduces only version constraint changes to existing dependencies without adding new Python code or security anti-patterns. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: pinning both pulp and megatron-core to avoid incompatible versions until the nvidia-resiliency-ext dependency is resolved. |


github-actions · 2026-04-17T07:12:55Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-17 11:01 UTC

codecov · 2026-04-17T07:21:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.56%. Comparing base (7e82a5c) to head (0e9984b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1285      +/-   ##
==========================================
+ Coverage   72.74%   76.56%   +3.81%     
==========================================
  Files         459      459              
  Lines       48611    48611              
==========================================
+ Hits        35364    37218    +1854     
+ Misses      13247    11393    -1854
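The percentages in the diff above follow from the hit and line counts. The sketch below reproduces them with integer arithmetic. It assumes the report truncates to two decimals rather than rounding; that assumption is chosen only because it matches the figures shown (rounding would give 72.75% for the base) and is not taken from Codecov documentation. The function names are illustrative.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated rather than rounded."""
    return hits * 10_000 // lines


def fmt(hundredths: int) -> str:
    """Render a non-negative value in hundredths of a percent, e.g. 7274 -> '72.74%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Counts copied from the coverage diff: the line total is the same on both sides.
LINES = 48611
BASE_HITS, HEAD_HITS = 35364, 37218
BASE_MISSES, HEAD_MISSES = 13247, 11393

# Hits and misses must add up to the total number of lines.
assert BASE_HITS + BASE_MISSES == LINES
assert HEAD_HITS + HEAD_MISSES == LINES

base = coverage_hundredths(BASE_HITS, LINES)
head = coverage_hundredths(HEAD_HITS, LINES)
# The delta is computed from the raw hit difference, not from the truncated percentages.
delta = coverage_hundredths(HEAD_HITS - BASE_HITS, LINES)

if __name__ == "__main__":
    print(fmt(base), fmt(head), "+" + fmt(delta))
```

Subtracting the two truncated percentages would give 3.82, so the reported +3.81% is consistent with the delta being derived from the 1854 additional hits directly.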

| Flag | Coverage Δ | |
|---|---|---|
| examples | `41.38% <ø> (+1.94%)` | ⬆️ |
| gpu | `59.96% <ø> (+7.76%)` | ⬆️ |
| unit | `52.20% <ø> (-0.02%)` | ⬇️ |



…plify CI workflows (#1286)

### What does this PR do?

Type of change: New feature / infrastructure improvement

Follow-up to #1285 for correct CI test environment for megatron based tests

Replaces `tox` + `tox-current-env` with `nox` for all test, lint, docs, and wheel build sessions. The primary motivation was that `tox-current-env` is incompatible with uv venvs in NGC containers (e.g. NeMo's `/opt/venv`): it picks the system Python via `sys._base_executable` instead of the container's venv Python, which has megatron packages pre-installed.

Key changes:

- **`noxfile.py`** replaces `tox.ini` with GPU, CPU unit, partial-install, pre-commit, docs, and wheel sessions
- **GPU sessions** use `venv_backend="none"` (run directly in container env) and `python -m pip/pytest` to avoid PATH mismatches
- **uv** is set as the default venv backend (if available) for CPU sessions (faster installs)

Also includes CI workflow simplifications:

- **`_pr_gate.yml`** new reusable workflow centralizing file-change detection + linux-check wait logic (was duplicated across 3 workflow files)
- **Collapsed pr/non-pr job pairs** into single jobs with conditional `runs-on` in `gpu_tests.yml`, `example_tests.yml`, `regression_tests.yml`
- **Collapsed `multi-py` / `multi-torch` / `multi-transformers`** into a single `multi-version` matrix job in `unit_tests.yml`
- **PR path filtering** for unit test secondary jobs (multi-version, launcher, partial-install): skipped if no relevant files changed
- **Fixed schedule/workflow_dispatch skipping**: jobs with `needs: [pr-gate]` were incorrectly skipped when all pr-gate internal jobs were skipped; fixed by making the gate job always run
- **multi-version, launcher, partial-install** now also run on `schedule` / `workflow_dispatch`

### Usage

```bash
python -m pip install nox uv  # install nox and uv (once)
nox -l  # list all sessions
nox -s gpu_megatron  # run a GPU session (inside container)
nox -s "unit-3.12(torch_211, tf_latest)"  # run a specific unit test combination
nox -s "unit-3.12(torch_211, tf_latest)" -R  # force-recreate venv (e.g. after dep changes)
COVERAGE_PROCESS_START=pyproject.toml nox -s "unit-3.12(torch_211, tf_latest)"  # with coverage
```

### Testing

- Ran `nox -l` to verify all session names
- Ran `gpu_megatron` session locally inside NeMo container; confirmed it uses `/opt/venv/bin/python` correctly
- Manually triggered nightly-runs:
  - Unit: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608013657
  - GPU: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608018763
  - Examples: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608017322

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: N/A (CI infrastructure only)
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ (added `nox` and `uv` to `dev-test`, both Apache-2.0)
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A (no user-facing changes)

### Additional Information

Supersedes the tox-current-env workaround in the parent branch.

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
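The interpreter mismatch described in the commit message (a tool following `sys._base_executable` rather than the venv's Python) can be inspected with the standard `sys` module. This is a small diagnostic sketch, not code from the PR or from `tox-current-env`; the function name is hypothetical. Note that `sys._base_executable` is a private CPython attribute, so the sketch reads it defensively.

```python
import sys


def describe_interpreter() -> dict:
    """Collect the interpreter paths that matter when a virtual environment is active."""
    return {
        # Interpreter running this process (the venv's python when run from a venv).
        "executable": sys.executable,
        # Private CPython attribute; may be missing on other implementations.
        "base_executable": getattr(sys, "_base_executable", None),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Standard check: inside a venv, prefix differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run inside a container venv such as `/opt/venv`, a differing `executable` and `base_executable` would indicate the situation the commit message describes: packages installed into the venv are visible to the former but not necessarily to the latter.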

Temporarily disable latest mcore until we fix its nvidia-resiliency-e…

0a89da7

…xt dependency Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 requested a review from a team as a code owner April 17, 2026 07:07

kevalmorabia97 changed the title ~~Temporarily disable latest mcore until we fix its nvidia-resiliency-e…~~ Temporarily disable latest mcore until we fix its nvidia-resiliency-ext dependency Apr 17, 2026

kevalmorabia97 mentioned this pull request Apr 17, 2026

[CI] Replace tox with nox, use nemo:26.04 for megatron tests, and simplify CI workflows #1286

Merged

Update pyproject.toml

0e9984b

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 changed the title ~~Temporarily disable latest mcore until we fix its nvidia-resiliency-ext dependency~~ Temporarily disable latest pulp and mcore until we fix its nvidia-resiliency-ext dependency Apr 17, 2026

kevalmorabia97 merged commit 4e33368 into main Apr 17, 2026
59 of 61 checks passed

kevalmorabia97 deleted the kevalmorabia97-patch-1 branch April 17, 2026 11:01

kevalmorabia97 merged 2 commits into main from kevalmorabia97-patch-1
