
Temporarily disable latest pulp and mcore until we fix its nvidia-resiliency-ext dependency #1285

Merged
kevalmorabia97 merged 2 commits into main from kevalmorabia97-patch-1 on Apr 17, 2026

Conversation

@kevalmorabia97 (Collaborator) commented Apr 17, 2026

  • megatron-core==0.17.0, released yesterday, requires a nightly version of nvidia-resiliency-ext for one of its imports. The version pre-installed in the DLFW PyTorch container is nvidia-resiliency-ext==0.5.0.
    • Temporarily pin mcore<0.17.0 to unblock PR merges.
  • Pin pulp<4.0, as it has some breaking changes and its release is imminent.
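The following minimal sketch checks whether a local environment respects these temporary pins. It assumes the two upper bounds from this PR (`megatron-core<0.17.0`, `pulp<4.0`). The helper names `release_tuple`, `is_below` and `check_pins` are hypothetical, and the version parsing is deliberately simplified. For full PEP 440 semantics, `packaging.specifiers.SpecifierSet` is the usual tool.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical table of the temporary upper bounds pinned in this PR.
PINS = {"megatron-core": (0, 17, 0), "pulp": (4, 0)}


def release_tuple(ver: str) -> tuple:
    """Parse the leading numeric release, e.g. '0.16.1' -> (0, 16, 1).

    Simplified: parsing stops at the first non-numeric character, so
    '0.17.0rc1' -> (0, 17, 0). That is conservative for a '<0.17.0' pin,
    which under PEP 440 also excludes pre-releases of 0.17.0.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # hit a suffix such as 'rc1' or '+cu12'
            break
    return tuple(parts)


def is_below(ver: str, bound: tuple) -> bool:
    """True if `ver` is strictly less than `bound`, comparing zero-padded tuples."""
    release = release_tuple(ver)
    width = max(len(release), len(bound))
    padded = release + (0,) * (width - len(release))
    limit = tuple(bound) + (0,) * (width - len(bound))
    return padded < limit


def check_pins(pins=PINS) -> dict:
    """Map each pinned package to True/False, or None if it is not installed."""
    report = {}
    for name, bound in pins.items():
        try:
            report[name] = is_below(version(name), bound)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print(check_pins())
```

Zero-padding matters for the `pulp<4.0` bound. Without it, the version string `"4"` would parse to `(4,)`, which Python orders before `(4, 0)`.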

The correct fix is to use the nemo:26.04 container instead of the PyTorch container for Megatron-based tests. That container always ships a compatible combination of all packages the Megatron ecosystem needs. This is done in #1286.

…xt dependency

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 17, 2026 07:07
@kevalmorabia97 kevalmorabia97 changed the title Temporarily disable latest mcore until we fix its nvidia-resiliency-e… Temporarily disable latest mcore until we fix its nvidia-resiliency-ext dependency Apr 17, 2026
@coderabbitai Bot (Contributor) commented Apr 17, 2026

📝 Walkthrough

Updated dependency constraints: tox.ini now installs megatron-core<0.17.0 in the cuda13-gpu-megatron test environment pre-install step, and pyproject.toml constrains pulp to pulp<4.0 in [project].dependencies.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Environment Configuration (`tox.ini`) | Changed `cuda13-gpu-megatron` pre-install from `pip install -U megatron-core` to `pip install 'megatron-core<0.17.0'`. |
| Project Dependencies (`pyproject.toml`) | Constrained `pulp` dependency from `pulp` to `pulp<4.0` under `[project].dependencies`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Security Anti-Patterns | ✅ Passed | PR introduces only version constraint changes to existing dependencies without adding new Python code or security anti-patterns. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: pinning both pulp and megatron-core to avoid incompatible versions until the nvidia-resiliency-ext dependency is resolved. |


@github-actions Bot (Contributor) commented Apr 17, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-17 11:01 UTC

@codecov Bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.56%. Comparing base (7e82a5c) to head (0e9984b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1285      +/-   ##
==========================================
+ Coverage   72.74%   76.56%   +3.81%
==========================================
  Files         459      459
  Lines       48611    48611
==========================================
+ Hits        35364    37218    +1854
+ Misses      13247    11393    -1854
```
| Flag | Coverage Δ |
|---|---|
| examples | 41.38% <ø> (+1.94%) ⬆️ |
| gpu | 59.96% <ø> (+7.76%) ⬆️ |
| unit | 52.20% <ø> (-0.02%) ⬇️ |
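The headline percentages follow from the Hits and Lines rows of the diff. The small sketch below reproduces them. It assumes Codecov's default of two-decimal precision with rounding down, which is configurable in `codecov.yml`. `coverage_pct` and `coverage_delta` are hypothetical helpers.

```python
from decimal import ROUND_DOWN, Decimal

CENT = Decimal("0.01")


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Line coverage in percent, truncated (rounded down) to two decimals."""
    return (Decimal(hits) * 100 / Decimal(lines)).quantize(CENT, rounding=ROUND_DOWN)


def coverage_delta(base_hits: int, head_hits: int, lines: int) -> Decimal:
    """Change in coverage, computed before truncation (line count unchanged)."""
    return (Decimal(head_hits - base_hits) * 100 / Decimal(lines)).quantize(
        CENT, rounding=ROUND_DOWN
    )


if __name__ == "__main__":
    print(coverage_pct(35364, 48611))           # base (main): 72.74
    print(coverage_pct(37218, 48611))           # head (#1285): 76.56
    print(coverage_delta(35364, 37218, 48611))  # change: 3.81
```

The base ratio is 72.7489…%, so round-to-nearest would print 72.75. The reported 72.74 is consistent with truncation. In both columns, Hits plus Misses equals the 48611 tracked lines.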

Flags with carried forward coverage won't be shown.


Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 changed the title Temporarily disable latest mcore until we fix its nvidia-resiliency-ext dependency Temporarily disable latest pulp and mcore until we fix its nvidia-resiliency-ext dependency Apr 17, 2026
@kevalmorabia97 kevalmorabia97 merged commit 4e33368 into main Apr 17, 2026
59 of 61 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kevalmorabia97-patch-1 branch April 17, 2026 11:01
kevalmorabia97 added a commit that referenced this pull request Apr 18, 2026
…plify CI workflows (#1286)

### What does this PR do?

Type of change: New feature / infrastructure improvement

Follow-up to #1285 that sets up the correct CI test environment for
Megatron-based tests.

Replaces `tox` + `tox-current-env` with `nox` for all test, lint, docs,
and wheel build sessions. The primary motivation was that
`tox-current-env` is incompatible with uv venvs in NGC containers (e.g.
NeMo's `/opt/venv`): it picks the system Python via
`sys._base_executable` instead of the container's venv Python, which has
the Megatron packages pre-installed.
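The interpreter mismatch described above can be inspected from inside any Python process. The sketch below uses a hypothetical helper name, `interpreter_info`. It reports `sys.executable` alongside `sys.prefix` and `sys.base_prefix`, which differ only inside a virtual environment. It also reports the private CPython attribute `sys._base_executable`, read defensively with `getattr` because it is an implementation detail rather than public API.

```python
import sys


def interpreter_info() -> dict:
    """Describe the running interpreter and whether it lives in a venv."""
    return {
        # The interpreter actually running this code.
        "executable": sys.executable,
        # Private CPython attribute: the interpreter a venv was created from.
        # Read with getattr because it is not a public, guaranteed API.
        "base_executable": getattr(sys, "_base_executable", sys.executable),
        # sys.prefix differs from sys.base_prefix only inside a virtual env.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key:<16} {value}")
```

Run with `/opt/venv/bin/python` in a NeMo container, this should report `in_venv` as `True`. Per the description above, `base_executable` is the interpreter `tox-current-env` ended up selecting.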

Key changes:
- **`noxfile.py`** replaces `tox.ini` with GPU, CPU unit,
partial-install, pre-commit, docs, and wheel sessions
- **GPU sessions** use `venv_backend="none"` (run directly in container
env) and `python -m pip/pytest` to avoid PATH mismatches
- **uv** is set as the default venv backend (if available) for CPU
sessions (faster installs)
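The reason for `python -m pip/pytest` in the GPU sessions can be illustrated with the standard library alone. A bare `pytest` or `pip` runs whichever script appears first on PATH, and that script may belong to a different environment. `python -m` resolves the module from the running interpreter's own site-packages. The helpers `module_cmd`, `script_on_path` and `run_module` are hypothetical and are not taken from the PR's `noxfile.py`.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def module_cmd(module: str, *args: str) -> List[str]:
    """Command that runs `module` with the current interpreter (like `python -m`)."""
    return [sys.executable, "-m", module, *args]


def script_on_path(name: str) -> Optional[str]:
    """Script PATH would resolve for `name`; it may belong to another environment."""
    return shutil.which(name)


def run_module(module: str, *args: str) -> int:
    """Run `module` with the current interpreter and return its exit code."""
    return subprocess.run(module_cmd(module, *args), check=False).returncode


if __name__ == "__main__":
    print("pytest on PATH:", script_on_path("pytest"))
    print("module command:", module_cmd("pytest", "--version"))
    run_module("pip", "--version")
```

If `script_on_path("pytest")` points outside `sys.prefix`, a bare `pytest` call would test against a different set of installed packages than the interpreter in use.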

Also includes CI workflow simplifications:
- **`_pr_gate.yml`** new reusable workflow centralizing file-change
detection + linux-check wait logic (was duplicated across 3 workflow
files)
- **Collapsed pr/non-pr job pairs** into single jobs with conditional
`runs-on` in `gpu_tests.yml`, `example_tests.yml`,
`regression_tests.yml`
- **Collapsed `multi-py` / `multi-torch` / `multi-transformers`** into a
single `multi-version` matrix job in `unit_tests.yml`
- **PR path filtering** for unit test secondary jobs (multi-version,
launcher, partial-install): on PRs, these are skipped if no relevant
files changed
- **Fixed schedule/workflow_dispatch skipping**: jobs with `needs:
[pr-gate]` were incorrectly skipped when all pr-gate internal jobs were
skipped; fixed by making the gate job always run
- **multi-version, launcher, partial-install** now also run on
`schedule` / `workflow_dispatch`
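The skip-propagation bug follows from GitHub Actions' default rule. A job whose `needs` include a skipped or failed job is itself skipped, unless its `if:` uses a status-check function such as `always()`. The sketch below is a toy model of that rule, not the actual workflow files. The job names (`changes`, `pr-gate`, `gpu-tests`) and the `Job` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SUCCESS, SKIPPED, FAILURE = "success", "skipped", "failure"


@dataclass
class Job:
    name: str
    needs: List[str] = field(default_factory=list)
    condition: bool = True  # outcome of the job's own `if:` expression
    always: bool = False    # True if `if:` wraps the condition in always()
    fails: bool = False     # simulate a failing step


def resolve(jobs: List[Job]) -> Dict[str, str]:
    """Compute each job's result; `jobs` must be listed in dependency order."""
    results: Dict[str, str] = {}
    for job in jobs:
        deps_ok = all(results[dep] == SUCCESS for dep in job.needs)
        # Default rule: unmet `needs` skip the job unless always() is used.
        if not job.condition or (not deps_ok and not job.always):
            results[job.name] = SKIPPED
        else:
            results[job.name] = FAILURE if job.fails else SUCCESS
    return results


def scheduled_run(gate_always: bool) -> Dict[str, str]:
    """A schedule event: the PR-only change detection inside the gate is skipped."""
    return resolve([
        Job("changes", condition=False),  # PR-only job, skipped on schedule
        Job("pr-gate", needs=["changes"], always=gate_always),
        Job("gpu-tests", needs=["pr-gate"]),
    ])


if __name__ == "__main__":
    print("before fix:", scheduled_run(gate_always=False))
    print("after fix: ", scheduled_run(gate_always=True))
```

In the model, the skip cascades from `changes` through `pr-gate` to `gpu-tests` until the gate always runs. A real gate would still need its own logic for treating non-PR events as "run everything"; that part is omitted here.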

### Usage

```bash
python -m pip install nox uv                                                    # install nox and uv (once)
nox -l                                                                          # list all sessions
nox -s gpu_megatron                                                             # run a GPU session (inside container)
nox -s "unit-3.12(torch_211, tf_latest)"                                        # run a specific unit test combination
nox -s "unit-3.12(torch_211, tf_latest)" -R                                     # force-recreate venv (e.g. after dep changes)
COVERAGE_PROCESS_START=pyproject.toml nox -s "unit-3.12(torch_211, tf_latest)"  # with coverage
```

### Testing
- Ran `nox -l` to verify all session names
- Ran `gpu_megatron` session locally inside NeMo container — confirmed
it uses `/opt/venv/bin/python` correctly
- Manually triggered nightly runs:
  - Unit: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608013657
  - GPU: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608018763
  - Examples: https://github.com/NVIDIA/Model-Optimizer/actions/runs/24608017322

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: N/A — CI infrastructure only
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ (added `nox`
and `uv` to `dev-test`, both Apache-2.0)
- Did you write any new necessary tests?: N/A
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A — no user-facing changes

### Additional Information
Supersedes the tox-current-env workaround in the parent branch.

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant