feat: enable TE Linear layers for PEFT/LoRA #1626
Merged
Conversation
Collaborator (Author) commented: /ok to test 72bc403
Force-pushed from 72bc403 to 61d9fa8
Collaborator (Author) commented: /ok to test 6525a57
akoumpa reviewed on Mar 30, 2026
When models use Transformer Engine (TE) Linear layers (`backend.linear=te`), LoRA adapters now use TE for all three linear operations: the base W·x computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA fell back to `F.linear()` for the base computation, bypassing TE kernels.

Changes:
- `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (using `transformer_engine.pytorch.Linear` directly, no cross-component import) when the base module is TE Linear
- `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward (see the sketch below)
- Add NemotronV3 PEFT+TE training config (`linear: te`)
- Add unit tests for TE Linear LoRA patching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
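A minimal sketch of the `super_fwd` mechanism described above, under simplified signatures; the actual `lora.py` handles dropout, alpha scaling, and dtype details that this omits, and `scale` here is an illustrative parameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearLoRA(nn.Linear):
    def forward(self, x):
        # Base W·x: use the TE forward captured before the __class__ swap
        # if one exists, otherwise fall back to plain F.linear().
        super_fwd = getattr(self, "super_fwd", None)
        base = super_fwd(x) if super_fwd is not None else F.linear(x, self.weight, self.bias)
        # Low-rank update: x -> lora_A -> lora_B, scaled.
        return base + self.lora_B(self.lora_A(x)) * self.scale


def patch_linear_module(orig, lora_A, lora_B, scale=1.0):
    is_te = "transformer_engine" in type(orig).__module__
    # Capture the bound TE forward *before* swapping __class__; afterwards
    # orig.forward would resolve to LinearLoRA.forward instead.
    orig.super_fwd = orig.forward if is_te else None
    orig.lora_A, orig.lora_B, orig.scale = lora_A, lora_B, scale
    orig.__class__ = LinearLoRA  # in-place patch, no weight copy
    return orig
```

With a plain `nn.Linear` base, `super_fwd` stays `None` and the `F.linear()` path is taken; with a TE base, the captured bound method keeps routing the base matmul through TE kernels for both forward and backward.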
Force-pushed from 6525a57 to 14f95c1
Collaborator (Author) commented: /ok to test 1adc4f2
akoumpa approved these changes on Apr 1, 2026
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026
* feat: enable TE Linear layers for PEFT/LoRA (#1011)

  When models use Transformer Engine (TE) Linear layers (`backend.linear=te`), LoRA adapters now use TE for all three linear operations: the base W·x computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA fell back to `F.linear()` for the base computation, bypassing TE kernels.

  Changes:
  - `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (using `transformer_engine.pytorch.Linear` directly, no cross-component import) when the base module is TE Linear
  - `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward
  - Add NemotronV3 PEFT+TE training config (`linear: te`)
  - Add unit tests for TE Linear LoRA patching

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix

  Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* fix

  Signed-off-by: adil-a <adil.asif2000@hotmail.com>

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

With `backend.linear=te`, all three linear operations in the LoRA forward pass (base W·x, `lora_A`, `lora_B`) use TE kernels for both forward and backward.

Changes

In `nemo_automodel/components/_peft/lora.py`:
- `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (via `initialize_linear_module`) when the base module is TE Linear (a sketch follows below)
- `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward

New files:
- `examples/llm_finetune/nemotron/nemotron_nano_v3_hellaswag_peft_te.yaml`: training config for NemotronV3 Nano with PEFT + TE linear
- `scripts/verify_peft_te_lora.py`: verification script for merge + HF PEFT load
- `tests/unit_tests/_peft/test_lora.py`: 4 new unit tests for TE Linear LoRA patching
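A hedged sketch of the adapter construction described above; the real code routes through `initialize_linear_module`, which this inlines, and the argument names are illustrative:

```python
import torch.nn as nn

try:
    import transformer_engine.pytorch as te
    HAVE_TE = True
except ImportError:
    HAVE_TE = False


def _init_adapter(base, dim_in, dim_out, rank):
    """Create (lora_A, lora_B) on the same linear backend as `base`."""
    use_te = HAVE_TE and isinstance(base, te.Linear)
    linear_cls = te.Linear if use_te else nn.Linear
    # lora_A projects down to `rank`; lora_B projects back up.
    lora_A = linear_cls(dim_in, rank, bias=False)
    lora_B = linear_cls(rank, dim_out, bias=False)
    nn.init.zeros_(lora_B.weight)  # zero-init so the adapter starts as a no-op
    return lora_A, lora_B
```

Keeping the adapters on the same backend as the base layer means all three matmuls in the LoRA forward run through TE when `backend.linear=te`, rather than mixing TE and eager kernels.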
Test plan

Tested on NemotronV3 Nano (`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`) with 8 GPUs:

- `PeftModel.from_pretrained()` loads the adapter successfully, and the forward pass produces valid logits
- `merge_and_unload()` succeeds (note: `save_pretrained` hits a pre-existing HF transformers bug with NemotronV3's `_tied_weights_keys`, unrelated to this PR)

🤖 Generated with Claude Code
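For reference, a hedged sketch of the load/merge verification flow from the test plan; the adapter path is illustrative, the real script is `scripts/verify_peft_te_lora.py`, and the model may additionally need `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
adapter_dir = "checkpoints/lora_adapter"  # illustrative path

base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(model_id)

# Load the trained LoRA adapter and sanity-check the forward pass.
model = PeftModel.from_pretrained(base, adapter_dir)
logits = model(**tok("Hello", return_tensors="pt")).logits
assert torch.isfinite(logits).all()

# Fold the adapter weights back into the base model.
merged = model.merge_and_unload()
```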