
feat: implement NEFTune noisy embeddings for instruction fine-tuning#1686

Merged

akoumpa merged 3 commits into NVIDIA-NeMo:main from stanley1208:feat/implement-neftune on Apr 6, 2026


@stanley1208 (Contributor) commented Apr 6, 2026

What does this PR do?

Implement NEFTune (Noisy Embeddings Fine-Tuning) as a training component with recipe config support.

Fixes #1221.

What is NEFTune?

From the paper (Jain et al., "NEFTune: Noisy Embeddings Improve Instruction Finetuning", 2023): adding uniform random noise to the token embeddings during fine-tuning improves instruction-following quality, often by a large margin, with no additional compute or data overhead.
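For reference, the paper's noise rule can be written as below, where X is the L × d embedding matrix of one sequence, L is the sequence length, d is the hidden size, and α is the noise_alpha knob. The second line is a worked example with assumed sizes (α = 5, L = 512, d = 2048), showing how small the per-element noise bound gets for realistic shapes.

```latex
\varepsilon \sim \mathcal{U}(-1, 1)^{L \times d}, \qquad
X' = X + \frac{\alpha}{\sqrt{L d}}\,\varepsilon

\frac{\alpha}{\sqrt{L d}} = \frac{5}{\sqrt{512 \cdot 2048}} = \frac{5}{1024} \approx 0.00488
```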

Implementation

New component: nemo_automodel/components/training/neftune.py

  • NEFTune class with activate(model) / deactivate(model) methods
  • Uses register_forward_hook on the model's embedding layer
  • Adds uniform noise whose per-element magnitude is bounded by alpha / sqrt(seq_len * hidden_dim)
  • Noise is applied only during training (checks module.training)
  • _get_input_embeddings() helper locates the embedding layer via the Hugging Face get_input_embeddings() method or common attribute names
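A minimal PyTorch sketch of the hook-based approach described in the list above. NEFTuneSketch is a hypothetical name and this is not the PR's code: unlike the PR's activate(model) / deactivate(model), it takes the embedding module directly, and it assumes a negative alpha should raise ValueError.

```python
import math

import torch
from torch import nn


class NEFTuneSketch:
    """Hypothetical NEFTune helper (illustrative only, not the PR's class)."""

    def __init__(self, noise_alpha: float = 5.0):
        # Assumption: negative alpha is rejected up front.
        if noise_alpha < 0:
            raise ValueError("noise_alpha must be non-negative")
        self.noise_alpha = noise_alpha
        self._handle = None  # RemovableHandle from register_forward_hook

    @property
    def is_active(self) -> bool:
        return self._handle is not None

    def _hook(self, module: nn.Module, inputs, output: torch.Tensor) -> torch.Tensor:
        # Leave embeddings untouched in eval mode or when alpha is zero.
        if not module.training or self.noise_alpha == 0:
            return output
        seq_len, hidden_dim = output.shape[-2], output.shape[-1]
        mag = self.noise_alpha / math.sqrt(seq_len * hidden_dim)
        # alpha / sqrt(L * d) * U(-1, 1) == U(-mag, mag)
        noise = torch.empty_like(output).uniform_(-mag, mag)
        # Returning a value from a forward hook replaces the module's output.
        return output + noise

    def activate(self, embedding: nn.Module) -> None:
        if self.is_active:
            return  # a second activate() is a no-op, so the hook is never doubled
        self._handle = embedding.register_forward_hook(self._hook)

    def deactivate(self) -> None:
        if self._handle is not None:
            self._handle.remove()
            self._handle = None
```

Using a forward hook keeps the model class untouched: the noise lives entirely in a removable handle, which is what makes deactivation and the eval-mode check cheap.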

Recipe integration: nemo_automodel/recipes/llm/train_ft.py

  • Adds an optional neftune config section, read in setup()
  • NEFTune is activated automatically after the model is built and before the training loop starts

Usage in YAML config: Add neftune.noise_alpha: 5.0 to your training recipe.
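The embedding lookup and config-gated activation can be sketched roughly as follows. find_input_embeddings, maybe_enable_neftune, the attribute names in COMMON_EMBEDDING_ATTRS, and the plain-dict cfg (mirroring the YAML key neftune.noise_alpha) are all hypothetical; the recipe's actual setup() code may look different.

```python
import math

import torch
from torch import nn

# Assumed attribute names; the PR's helper may check a different list.
COMMON_EMBEDDING_ATTRS = ("embed_tokens", "wte", "word_embeddings")


def find_input_embeddings(model: nn.Module) -> nn.Module:
    """Hypothetical stand-in for the PR's _get_input_embeddings() helper."""
    getter = getattr(model, "get_input_embeddings", None)
    if callable(getter):
        try:
            emb = getter()  # Hugging Face models expose this method
        except NotImplementedError:
            emb = None
        if emb is not None:
            return emb
    # Fallback: scan submodules for a commonly used embedding attribute name.
    for name, module in model.named_modules():
        if name.split(".")[-1] in COMMON_EMBEDDING_ATTRS:
            return module
    raise ValueError("could not locate the input embedding layer")


def maybe_enable_neftune(model: nn.Module, cfg: dict):
    """Register a NEFTune hook if cfg has a 'neftune' section; return the handle or None."""
    neft_cfg = cfg.get("neftune")
    if not neft_cfg:
        return None
    alpha = float(neft_cfg["noise_alpha"])
    emb = find_input_embeddings(model)

    def hook(module, inputs, output):
        # Only add noise in training mode and for a positive alpha.
        if not module.training or alpha == 0:
            return output
        mag = alpha / math.sqrt(output.shape[-2] * output.shape[-1])
        return output + torch.empty_like(output).uniform_(-mag, mag)

    return emb.register_forward_hook(hook)
```

Calling maybe_enable_neftune(model, cfg) right after the model is built, before the first training step, matches the ordering described above; calling .remove() on the returned handle turns the noise off again.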

Tests

tests/unit_tests/training/test_neftune.py

  • 8 unit tests covering noise during training, no noise during eval, deactivation, zero alpha, negative alpha, double activation, and the is_active property
  • 3 tests for _get_input_embeddings helper

Related

CC @akoumpa

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor
@copy-pr-bot (Bot) commented Apr 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@stanley1208 (Contributor, Author)

@akoumpa Here's the NEFTune implementation: the component, a recipe config option, and 11 unit tests. Ready for review!

…oisy output

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor
@stanley1208 stanley1208 mentioned this pull request Apr 6, 2026
@akoumpa (Contributor) commented Apr 6, 2026

/ok to test cffa75e

@akoumpa (Contributor) commented Apr 6, 2026

@Stanley thanks a lot! It would be great if you could share a YAML file showing how to use this. It's already mentioned in the issue description; what I mean is to include a YAML file so that it's "click-and-run".

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor
Comment thread examples/llm_finetune/llama3_2/llama3_2_1b_squad_neftune.yaml
@akoumpa (Contributor) commented Apr 6, 2026

@akoumpa (Contributor) left a comment

Thanks a lot @stanley1208

@akoumpa merged commit 1c3944a into NVIDIA-NeMo:main on Apr 6, 2026
4 checks passed
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
…1686)

* feat: implement NEFTune noisy embeddings for instruction fine-tuning

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor

* fix: correct test_noise_applied_during_training to compare clean vs noisy output

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor

* feat: add example YAML config for NEFTune fine-tuning

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor

---------

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>

Development

Successfully merging this pull request may close these issues.

Implement NEFTune

3 participants