feat: enable TE Linear layers for PEFT/LoRA #1626

Merged

akoumpa merged 3 commits into main from feat/te-linear-peft on Apr 1, 2026

Conversation

Collaborator

adil-a commented Mar 30, 2026

Summary

Changes

  • nemo_automodel/components/_peft/lora.py:
    • _init_adapter(): create lora_A/lora_B as TE Linear (via initialize_linear_module) when the base module is TE Linear
    • patch_linear_module(): store super_fwd for TE Linear before the __class__ swap so LinearLoRA.forward() delegates the base computation to TE's forward (a minimal sketch of both changes follows this list)
  • examples/llm_finetune/nemotron/nemotron_nano_v3_hellaswag_peft_te.yaml: training config for NemotronV3 Nano with PEFT + TE linear
  • scripts/verify_peft_te_lora.py: verification script for merge + HF PEFT load
  • tests/unit_tests/_peft/test_lora.py: 4 new unit tests for TE Linear LoRA patching

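Below is a minimal, self-contained sketch of the two lora.py changes above. It mirrors the names from this PR (LinearLoRA, patch_linear_module, super_fwd, lora_A/lora_B), but the bodies, the optional-import fallback, and the rank/alpha defaults are illustrative assumptions, not the repository's actual implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

try:
    import transformer_engine.pytorch as te  # optional TE backend
    TELinear = te.Linear
except ImportError:
    TELinear = None  # sketch still runs without TE installed


class LinearLoRA(nn.Linear):
    """Class the patched Linear is re-classed into; forward() adds the LoRA path."""

    def forward(self, x):
        # super_fwd is the bound TE forward captured before the __class__ swap,
        # so the base W·x still runs through TE kernels when the base is TE Linear.
        super_fwd = getattr(self, "super_fwd", None)
        base = super_fwd(x) if super_fwd is not None else F.linear(x, self.weight, self.bias)
        return base + self.lora_B(self.lora_A(x)) * self.scale


def patch_linear_module(module, dim=8, alpha=32):
    """Attach LoRA adapters in place; adapters use the same backend as the base layer."""
    is_te = TELinear is not None and isinstance(module, TELinear)
    linear_cls = TELinear if is_te else nn.Linear

    # _init_adapter(): lora_A/lora_B are created as TE Linear when the base is TE Linear.
    module.lora_A = linear_cls(module.in_features, dim, bias=False)
    module.lora_B = linear_cls(dim, module.out_features, bias=False)
    nn.init.zeros_(module.lora_B.weight)  # standard LoRA: patching starts as a no-op
    module.scale = alpha / dim

    # Capture the bound forward *before* swapping __class__; after the swap,
    # a plain super().forward() would resolve to nn.Linear and bypass TE kernels.
    module.super_fwd = module.forward if is_te else None
    module.__class__ = LinearLoRA
    return module
```

Capturing the bound method before the swap works because Python binds module.forward at attribute-access time, so super_fwd keeps pointing at transformer_engine.pytorch.Linear.forward even after __class__ is reassigned.
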
Test plan

Tested on NemotronV3 Nano (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) with 8 GPUs:

  • Unit tests: all 16 pass (12 existing + 4 new TE Linear tests)
  • Training: 10 steps with PEFT + TE Linear — losses converge normally (2.76 → 2.62)
  • Checkpoint resume: resumed from step 5 checkpoint, step 5 loss matches exactly, steps 6-10 within ~0.2% (expected FP non-determinism from TE kernels)
  • HF PEFT load: PeftModel.from_pretrained() loads the adapter successfully and the forward pass produces valid logits (a load-and-merge check along these lines is sketched after this list)
  • Inference: generated coherent text outputs from the PEFT model (verified on 3 prompts)
  • Merge: merge_and_unload() succeeds (note: save_pretrained hits a pre-existing HF transformers bug with NemotronV3's _tied_weights_keys — unrelated to this PR)

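The HF PEFT load, forward, and merge checks above can be reproduced with a short standalone snippet along these lines. This is a sketch rather than the PR's scripts/verify_peft_te_lora.py: ADAPTER_DIR is a placeholder for your own adapter export and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"  # base model named in the test plan
ADAPTER_DIR = "/path/to/exported_lora_adapter"       # placeholder: point at your adapter export

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,  # add trust_remote_code=True if the checkpoint requires custom code
    device_map="auto",           # requires accelerate; drop it to load on a single device
)

# HF PEFT load: attach the trained LoRA adapter to the base model.
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

# Forward pass should produce finite logits.
inputs = tok("A man is sitting on a roof. He", return_tensors="pt").to(base.device)
with torch.no_grad():
    logits = model(**inputs).logits
assert torch.isfinite(logits).all()

# Merge: fold the LoRA deltas back into the base Linear weights.
merged = model.merge_and_unload()
```
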
🤖 Generated with Claude Code


copy-pr-bot Bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Collaborator Author

adil-a commented Mar 30, 2026

/ok to test 72bc403

Collaborator Author

adil-a commented Mar 30, 2026

/ok to test 6525a57

Comment thread on nemo_automodel/components/_peft/lora.py (outdated)
When models use Transformer Engine (TE) Linear layers (backend.linear=te),
LoRA adapters now use TE for all three linear operations: the base W·x
computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA
fell back to F.linear() for the base computation, bypassing TE kernels.

Changes:
- _init_adapter(): create lora_A/lora_B as TE Linear (using
  transformer_engine.pytorch.Linear directly, no cross-component import)
  when the base module is TE Linear
- patch_linear_module(): store super_fwd for TE Linear before __class__ swap so
  LinearLoRA.forward() delegates base computation to TE's forward
- Add NemotronV3 PEFT+TE training config (linear: te)
- Add unit tests for TE Linear LoRA patching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adil-a force-pushed the feat/te-linear-peft branch from 6525a57 to 14f95c1 on March 30, 2026 at 18:16
adil-a added 2 commits on March 30, 2026 at 12:03
  • fix (Signed-off-by: adil-a <adil.asif2000@hotmail.com>)
  • fix (Signed-off-by: adil-a <adil.asif2000@hotmail.com>)
Collaborator Author

adil-a commented Mar 30, 2026

/ok to test 1adc4f2

akoumpa merged commit 980f23d into main on Apr 1, 2026
70 of 72 checks passed
akoumpa deleted the feat/te-linear-peft branch on April 1, 2026 at 23:39
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026
* feat: enable TE Linear layers for PEFT/LoRA (#1011)

When models use Transformer Engine (TE) Linear layers (backend.linear=te),
LoRA adapters now use TE for all three linear operations: the base W·x
computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA
fell back to F.linear() for the base computation, bypassing TE kernels.

Changes:
- _init_adapter(): create lora_A/lora_B as TE Linear (using
  transformer_engine.pytorch.Linear directly, no cross-component import)
  when the base module is TE Linear
- patch_linear_module(): store super_fwd for TE Linear before __class__ swap so
  LinearLoRA.forward() delegates base computation to TE's forward
- Add NemotronV3 PEFT+TE training config (linear: te)
- Add unit tests for TE Linear LoRA patching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* fix

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

LoRA for linear modules doesn't work with Transformer Engine Linear Layers
