
cp: fix: Coerce plain-dict backend to BackendConfig in model init (1784) into r0.4.0#1803

Merged
akoumpa merged 2 commits into r0.4.0 from cherry-pick-1784-r0.4.0 on Apr 19, 2026

Conversation

@svcnvidia-nemo-ci
Contributor

beep boop [🤖]: Hi @adil-a 👋,

we've cherry-picked #1784 into r0.4.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

* fix: Coerce plain-dict backend to BackendConfig in model init

When backend is specified via CLI override (e.g. --model.backend.attn
sdpa) without a _target_ key in the YAML, the config system passes it
as a plain dict. This causes an AttributeError in model constructors that
access backend.rms_norm, backend.linear, etc.

Convert the dict to BackendConfig(**dict) in _init_model, which is
the single gateway between the config system and all model constructors.
This fixes the issue for all 17+ custom model implementations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* test: Add unit tests for dict-to-BackendConfig coercion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* style: Remove unused pytest import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* fix(test): Use environment-aware BackendConfig defaults in assertion

BackendConfig defaults for attn/linear depend on TE availability,
so hardcoding "torch" fails on GPU CI where TE is present. Compare
against BackendConfig() defaults instead.
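The test pattern described above can be sketched as below. This is an illustrative, hypothetical reconstruction: the TE probe, the field names, and `coerce` are assumptions, not the project's actual test code.

```python
from dataclasses import dataclass

# Probe for Transformer Engine, as the commit describes: defaults
# differ depending on whether TE is importable in this environment.
try:
    import transformer_engine  # noqa: F401
    _HAS_TE = True
except ImportError:
    _HAS_TE = False


@dataclass
class BackendConfig:
    # Hypothetical environment-dependent defaults.
    attn: str = "te" if _HAS_TE else "torch"
    linear: str = "te" if _HAS_TE else "torch"


def coerce(backend: dict) -> BackendConfig:
    return BackendConfig(**backend)


# Fragile assertion: hardcodes the CPU-only default, so it fails on
# GPU CI where TE is present:
#     assert coerce({}).attn == "torch"

# Robust assertion: compare against this environment's own defaults.
assert coerce({}) == BackendConfig()
```

Comparing against `BackendConfig()` keeps the test green on both CPU-only and TE-enabled runners, since both sides pick up the same environment-dependent defaults.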

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* fix(test): Compare inputs_embeds generate against input_ids generate

The old test compared cached generate(inputs_embeds) against manual
uncached decode (use_cache=False). Mamba uses different CUDA kernels
for cached vs uncached paths, causing bf16 divergence. Compare both
generate() paths instead, which both use cached kernels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci
Contributor Author

/ok to test 36d1923

@copy-pr-bot

copy-pr-bot Bot commented Apr 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa
Contributor

akoumpa commented Apr 19, 2026

/ok to test 44979df

@akoumpa akoumpa merged commit 38da59e into r0.4.0 Apr 19, 2026
52 of 54 checks passed
@akoumpa akoumpa deleted the cherry-pick-1784-r0.4.0 branch April 19, 2026 20:22

Labels

cherry-pick, Run CICD, Trigger Testing CICD


3 participants