Finetune Nemotron-Labs-Diffusion with NeMo Automodel #2274
zyzhou5 started this conversation in Show and tell
NeMo AutoModel now supports SFT and inference for the Nemotron-Labs-Diffusion family of diffusion language models. The PR is #2273.
What is Nemotron-Labs-Diffusion?
Nemotron-Labs-Diffusion is based on: Nemotron-Labs-Diffusion
Concretely, the model is a hybrid diffusion + autoregressive language model — the same backbone learns both a denoising objective on corrupted tokens and a standard next-token objective in a single forward pass.
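To make the combined objective concrete, here is a minimal pure-Python sketch of how a denoising term and a next-token term might be summed. It is an illustration, not the code in `HybridDiffusionLLMLoss`: the helper names `cross_entropy` and `hybrid_loss` are hypothetical, the plain mean over masked positions is an assumption, and so is the one-position shift for the causal logits. Only the final combination, `total_loss = alpha*diff + ar` with `dllm_loss = alpha*diff`, is taken from this post.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def hybrid_loss(diff_logits, causal_logits, labels, masked, alpha=1.0):
    """Toy combination of a denoising loss and a next-token loss.

    diff_logits[i]   : logits predicting the clean token at position i
    causal_logits[i] : logits predicting the token at position i + 1 (assumed)
    labels[i]        : clean token id at position i
    masked[i]        : True where position i was corrupted
    """
    # Denoising term: only corrupted positions contribute (assumed plain mean).
    diff_terms = [
        cross_entropy(diff_logits[i], labels[i])
        for i in range(len(labels))
        if masked[i]
    ]
    diff = sum(diff_terms) / max(len(diff_terms), 1)

    # Autoregressive term: position i is scored against token i + 1.
    ar_terms = [
        cross_entropy(causal_logits[i], labels[i + 1])
        for i in range(len(labels) - 1)
    ]
    ar = sum(ar_terms) / max(len(ar_terms), 1)

    # Combination as described in this post: alpha scales only the diffusion part.
    return {"total_loss": alpha * diff + ar, "dllm_loss": alpha * diff}


if __name__ == "__main__":
    vocab = 4
    labels = [0, 1, 2, 3]
    masked = [False, True, True, False]
    uniform = [[0.0] * vocab for _ in labels]  # uninformative logits
    print(hybrid_loss(uniform, uniform, labels, masked, alpha=0.5))
```

Returning the scaled diffusion term separately from the total mirrors the logging motivation described in this post: the combined value is what would drive the backward pass, while the diffusion component can be tracked on its own.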
The recipe in `nemo_automodel/recipes/dllm/train_ft.py` is shared across other diffusion LLMs and Nemotron-Labs-Diffusion via a strategy pattern. The two pieces that are Nemotron-specific:

- `HybridStrategy` (`nemo_automodel/recipes/dllm/strategy.py`):
  - … `dllm.block_size: null`, otherwise blockwise via `corrupt_blockwise`.
  - … `input_ids` plus a `masked_indices` sidecar (the model applies masking internally). `attention_mask` and `use_cache` are popped; `labels` and `skip_loss: True` are added so loss is computed by the recipe, not the model.
- `HybridDiffusionLLMLoss` (`nemo_automodel/components/loss/dllm_loss.py`):
  - … `[diff_logits | causal_logits]` tensor or separate `causal_logits` (the latter avoids an extra DTensor full-gather under TP).
  - … `DLLMLossOutput(total_loss=alpha*diff + ar, dllm_loss=alpha*diff)` so the diffusion component is logged independently of the combined backward target.

Example finetune config
Detailed instructions can be found in the DLLM finetune guide (or in-repo at `docs/guides/dllm/finetune.md`).

Many thanks to @zyzhou5 and @pthombre for all their contributions!