
ci: exclude vllm_inference and megatron from nightly recipe CI#1554

Merged
pstjohn merged 1 commit into NVIDIA:main from svc-bionemo:svc-bionemo/fix-nightly-20260418-349dcee1
Apr 18, 2026

Conversation

@svc-bionemo
Collaborator

@svc-bionemo svc-bionemo commented Apr 18, 2026

Problem

The 26.03 container update brought transformers==5.5.4, which breaks the vllm_inference recipe CI:

vLLM 0.15.1 (current) + transformers 5.x

  • vLLM 0.15.1 hard-requires transformers < 5
  • install_vllm.sh installs vLLM (which pins transformers <5), then upgrades transformers back to 5.x
  • This creates a broken environment — vLLM is installed against transformers 4.x but runs with 5.x
  • Additionally, the nvidia/esm2_* Hub models have "tokenizer_class": "TokenizersBackend" in their tokenizer_config.json, a class that only exists in transformers 5.x. If vLLM downgrades to <5, tokenizer loading fails
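The core conflict above can be checked mechanically. A minimal sketch using the `packaging` library (the `<5` specifier stands in for vLLM 0.15.1's transformers pin, and 5.5.4 is the version the 26.03 container ships, both per the bullets above):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# vLLM 0.15.1 pins transformers < 5; the 26.03 container ships 5.5.4.
vllm_pin = SpecifierSet("<5")
container_transformers = Version("5.5.4")

# The shipped transformers does not satisfy vLLM's pin, so upgrading
# transformers back to 5.x after installing vLLM leaves the environment
# in an unsupported state.
print(container_transformers in vllm_pin)  # → False
```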

vLLM 0.19.1 (latest, supports transformers 5.x)

  • Requires transformers >= 5.5.1
  • But uses register_opaque_type(hoist=True) in vllm/utils/torch_utils.py, gated behind is_torch_equal_or_newer("2.11.0.dev")
  • The NGC 26.03 torch build (2.11.0a0+nv26.03) matches that version check, but does not have the upstream hoist parameter on register_opaque_type yet
  • Result: TypeError: register_opaque_type() got an unexpected keyword argument 'hoist' at import time
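The version-gate mismatch comes down to PEP 440 ordering: a `.dev` pre-release sorts before an `a0` alpha of the same release, so the NGC alpha build satisfies the `is_torch_equal_or_newer("2.11.0.dev")` gate even though it predates the upstream API. A minimal reproduction of the comparison with `packaging` (an assumption about how vLLM's helper resolves, not its actual implementation):

```python
from packaging.version import Version

# PEP 440 ordering: 2.11.0.dev < 2.11.0a0 < 2.11.0, and the local
# segment (+nv26.03) does not affect the comparison.
ngc_torch = Version("2.11.0a0+nv26.03")  # NGC 26.03 torch build
gate = Version("2.11.0.dev")             # vLLM's version gate

# The gate passes for the NGC build, so vLLM takes the code path that
# calls register_opaque_type(hoist=True), which this torch lacks.
print(ngc_torch >= gate)  # → True
```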

Megatron duplicates

  • Megatron recipes (eden_megatron, evo2_megatron) already have a dedicated CI workflow (unit-tests-mbridge-recipes.yaml) but were also running as duplicates in this recipes workflow on nightly

Fix

Exclude both vllm_inference and megatron from the ALL_DIRS nightly enumeration in unit-tests-recipes.yml:

  • vllm_inference: blocked until either NGC ships torch with the upstream hoist API (enabling vLLM 0.19.1) or vLLM releases a version compatible with both the NGC torch build and transformers 5.x
  • megatron: removes duplicate nightly runs (already covered by unit-tests-mbridge-recipes.yaml)

Both directories were already excluded from PR changed-files detection, but scheduled (nightly) runs bypass that filter.

Related

Failing CI run: https://github.com/NVIDIA/bionemo-framework/actions/runs/24601629903

@coderabbitai
Contributor

coderabbitai Bot commented Apr 18, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b77b361-f013-44e1-8d9c-92952ec4427e


@pstjohn pstjohn enabled auto-merge April 18, 2026 12:34
auto-merge was automatically disabled April 18, 2026 14:27

Head branch was pushed to by a user without write access

@pstjohn pstjohn enabled auto-merge April 18, 2026 14:37
auto-merge was automatically disabled April 18, 2026 14:50

Head branch was pushed to by a user without write access

@svc-bionemo svc-bionemo force-pushed the svc-bionemo/fix-nightly-20260418-349dcee1 branch from 02bc46d to 814266f on April 18, 2026
@svc-bionemo svc-bionemo changed the title from "fix(vllm_inference): pin transformers<5 for vLLM 0.15.x compat" to "fix(vllm_inference): upgrade vLLM to 0.19.1 for transformers 5.x compat" on Apr 18, 2026
@svc-bionemo svc-bionemo force-pushed the svc-bionemo/fix-nightly-20260418-349dcee1 branch from 814266f to 402d1d8 on April 18, 2026
@svc-bionemo svc-bionemo changed the title from "fix(vllm_inference): upgrade vLLM to 0.19.1 for transformers 5.x compat" to "ci: exclude vllm_inference from nightly and PR CI runs" on Apr 18, 2026
vllm_inference: vLLM 0.15.1 requires transformers<5 but the 26.03
container ships transformers 5.x. Excluded until container catches up.

megatron recipes: already run via the dedicated mbridge-recipes workflow
(unit-tests-mbridge-recipes.yaml). Remove duplicate runs from the
recipes workflow to save CI resources.

Both were already excluded from PR changed-files detection. This also
excludes them from scheduled (nightly) ALL_DIRS enumeration.

Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
@svc-bionemo svc-bionemo force-pushed the svc-bionemo/fix-nightly-20260418-349dcee1 branch from 402d1d8 to 5c6fa63 on April 18, 2026
@svc-bionemo svc-bionemo changed the title from "ci: exclude vllm_inference from nightly and PR CI runs" to "ci: exclude vllm_inference and megatron from nightly recipe CI" on Apr 18, 2026
@pstjohn pstjohn added this pull request to the merge queue Apr 18, 2026
Merged via the queue into NVIDIA:main with commit 1faa6e8 Apr 18, 2026
17 checks passed