Finetune Nemotron-Labs-Diffusion with NeMo Automodel #2274

zyzhou5 (Collaborator) · 2026-05-18T21:49:43Z

NeMo AutoModel now supports supervised fine-tuning (SFT) and inference for the Nemotron-Labs-Diffusion family of diffusion language models. The implementing PR is #2273.

What is Nemotron-Labs-Diffusion?

Nemotron-Labs-Diffusion is based on: Nemotron-Labs-Diffusion

Concretely, the model is a hybrid diffusion + autoregressive language model: the same backbone learns both a denoising objective on corrupted tokens and a standard next-token objective in a single forward pass.

The recipe in nemo_automodel/recipes/dllm/train_ft.py is shared between Nemotron-Labs-Diffusion and other diffusion LLMs via a strategy pattern. Two pieces are Nemotron-specific:

HybridStrategy (nemo_automodel/recipes/dllm/strategy.py):
- Corruption: uniform when dllm.block_size is null, otherwise blockwise via corrupt_blockwise.
- Batch shape: the model receives clean input_ids plus a masked_indices sidecar (the model applies masking internally). attention_mask and use_cache are popped; labels and skip_loss: True are added so that the loss is computed by the recipe, not the model.
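
To make the two bullets above concrete, here is a rough, dependency-free sketch of what the strategy does to a batch. It is not the code in strategy.py: the function names are hypothetical, the per-block masking ratio is my assumption about what blockwise corruption means, and setting labels to a copy of the clean input_ids is also an assumption.

```python
import random


def corrupt_uniform_sketch(seq_len, rng):
    """Draw one masking ratio, then mask each position independently."""
    ratio = rng.random()
    return [rng.random() < ratio for _ in range(seq_len)]


def corrupt_blockwise_sketch(seq_len, block_size, rng):
    """Assumed behaviour: every block of `block_size` tokens draws its own ratio."""
    mask = []
    for start in range(0, seq_len, block_size):
        length = min(block_size, seq_len - start)
        mask.extend(corrupt_uniform_sketch(length, rng))
    return mask


def prepare_batch_sketch(batch, block_size=None, seed=0):
    """Hypothetical stand-in for the batch preparation described above."""
    rng = random.Random(seed)
    out = dict(batch)  # shallow copy, the caller's dict is left untouched
    # These two keys are dropped before the forward pass.
    out.pop("attention_mask", None)
    out.pop("use_cache", None)
    masks = []
    for ids in out["input_ids"]:
        if block_size is None:
            masks.append(corrupt_uniform_sketch(len(ids), rng))
        else:
            masks.append(corrupt_blockwise_sketch(len(ids), block_size, rng))
    # input_ids stay clean; the mask travels alongside them as a sidecar.
    out["masked_indices"] = masks
    out["labels"] = [list(ids) for ids in out["input_ids"]]  # assumption
    out["skip_loss"] = True  # the recipe, not the model, computes the loss
    return out


batch = {
    "input_ids": [[11, 12, 13, 14, 15, 16]],
    "attention_mask": [[1] * 6],
    "use_cache": False,
}
prepared = prepare_batch_sketch(batch, block_size=2)
print(sorted(prepared))  # ['input_ids', 'labels', 'masked_indices', 'skip_loss']
```

Passing block_size=None takes the uniform path, mirroring dllm.block_size: null in the config.
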
HybridDiffusionLLMLoss (nemo_automodel/components/loss/dllm_loss.py):
- Accepts either a concatenated [diff_logits | causal_logits] tensor or separate causal_logits (the latter avoids an extra DTensor full-gather under TP).
- Returns DLLMLossOutput(total_loss=alpha*diff + ar, dllm_loss=alpha*diff) so the diffusion component is logged independently of the combined backward target.
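
The loss combination can be sketched the same way, for a single sequence and in plain Python. Everything here is illustrative rather than the contents of dllm_loss.py: LossOutputSketch stands in for DLLMLossOutput, the concatenated input is assumed to be split along the sequence axis, the diffusion term is a plain mean over masked positions with no noise-level weighting, and the autoregressive term assumes the usual shift-by-one targets.

```python
import math
from dataclasses import dataclass


@dataclass
class LossOutputSketch:
    """Stand-in for DLLMLossOutput with the two fields named above."""
    total_loss: float
    dllm_loss: float


def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def hybrid_loss_sketch(logits, labels, masked, alpha=1.0, causal_logits=None):
    """logits: per-position lists of vocab scores for one sequence."""
    if causal_logits is None:
        # Assumed layout: the first len(labels) rows are diffusion logits,
        # the remaining rows are causal logits.
        n = len(labels)
        diff_logits, causal_logits = logits[:n], logits[n:]
    else:
        diff_logits = logits
    # Diffusion term: cross-entropy on the masked positions only.
    masked_pos = [i for i, is_masked in enumerate(masked) if is_masked]
    diff = sum(token_nll(diff_logits[i], labels[i]) for i in masked_pos)
    diff /= max(len(masked_pos), 1)
    # Autoregressive term: position i predicts labels[i + 1].
    n_ar = len(labels) - 1
    ar = sum(token_nll(causal_logits[i], labels[i + 1]) for i in range(n_ar))
    ar /= max(n_ar, 1)
    return LossOutputSketch(total_loss=alpha * diff + ar, dllm_loss=alpha * diff)


labels = [0, 1, 2]
masked = [True, False, True]
flat = [[0.0, 0.0, 0.0] for _ in labels]  # uniform logits over a 3-token vocab

separate = hybrid_loss_sketch(flat, labels, masked, alpha=0.5, causal_logits=flat)
concatenated = hybrid_loss_sketch(flat + flat, labels, masked, alpha=0.5)
print(separate.total_loss, separate.dllm_loss)
```

With uniform logits every token costs ln 3, so total_loss is 1.5 ln 3 and dllm_loss is 0.5 ln 3, and both call styles return the same values. Logging dllm_loss next to total_loss shows how much of the backward target comes from the diffusion term.
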

Example finetune config

Detailed instructions can be found in the DLLM finetune guide (or in-repo at docs/guides/dllm/finetune.md).

Many thanks to @zyzhou5 @pthombre for all contributions!
