feat: enable TE Linear layers for PEFT/LoRA #1626
Merged
Conversation
Collaborator (Author) commented: /ok to test 72bc403
Force-pushed from 72bc403 to 61d9fa8
Collaborator (Author) commented: /ok to test 6525a57
akoumpa reviewed on Mar 30, 2026
When models use Transformer Engine (TE) Linear layers (`backend.linear=te`), LoRA adapters now use TE for all three linear operations: the base W·x computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA fell back to `F.linear()` for the base computation, bypassing TE kernels.

Changes:
- `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (using `transformer_engine.pytorch.Linear` directly, no cross-component import) when the base module is TE Linear
- `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward (see the sketch below)
- Add NemotronV3 PEFT+TE training config (`linear: te`)
- Add unit tests for TE Linear LoRA patching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
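A minimal sketch of the `super_fwd` mechanism described above, under simplified signatures; the actual `lora.py` handles dropout, alpha scaling, and dtype details that this omits, and `scale` here is an illustrative parameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearLoRA(nn.Linear):
    def forward(self, x):
        # Base W·x: use the TE forward captured before the __class__ swap
        # if one exists, otherwise fall back to plain F.linear().
        super_fwd = getattr(self, "super_fwd", None)
        base = super_fwd(x) if super_fwd is not None else F.linear(x, self.weight, self.bias)
        # Low-rank update: x -> lora_A -> lora_B, scaled.
        return base + self.lora_B(self.lora_A(x)) * self.scale


def patch_linear_module(orig, lora_A, lora_B, scale=1.0):
    is_te = "transformer_engine" in type(orig).__module__
    # Capture the bound TE forward *before* swapping __class__; afterwards
    # orig.forward would resolve to LinearLoRA.forward instead.
    orig.super_fwd = orig.forward if is_te else None
    orig.lora_A, orig.lora_B, orig.scale = lora_A, lora_B, scale
    orig.__class__ = LinearLoRA  # in-place patch, no weight copy
    return orig
```

With a plain `nn.Linear` base, `super_fwd` stays `None` and the `F.linear()` path is taken; with a TE base, the captured bound method keeps routing the base matmul through TE kernels for both forward and backward.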
Force-pushed from 6525a57 to 14f95c1
Collaborator (Author) commented: /ok to test 1adc4f2
akoumpa approved these changes on Apr 1, 2026
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026
* feat: enable TE Linear layers for PEFT/LoRA (#1011)

  When models use Transformer Engine (TE) Linear layers (`backend.linear=te`), LoRA adapters now use TE for all three linear operations: the base W·x computation, lora_A, and lora_B. Previously, patching a TE Linear with LoRA fell back to `F.linear()` for the base computation, bypassing TE kernels.

  Changes:
  - `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (using `transformer_engine.pytorch.Linear` directly, no cross-component import) when the base module is TE Linear
  - `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward
  - Add NemotronV3 PEFT+TE training config (`linear: te`)
  - Add unit tests for TE Linear LoRA patching

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix

  Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* fix

  Signed-off-by: adil-a <adil.asif2000@hotmail.com>

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

With `backend.linear=te`, all three linear operations in the LoRA forward pass (base W·x, `lora_A`, `lora_B`) use TE kernels for both forward and backward.

Changes

In `nemo_automodel/components/_peft/lora.py`:
- `_init_adapter()`: create `lora_A`/`lora_B` as TE Linear (via `initialize_linear_module`) when the base module is TE Linear (a sketch follows below)
- `patch_linear_module()`: store `super_fwd` for TE Linear before the `__class__` swap so `LinearLoRA.forward()` delegates base computation to TE's forward

New files:
- `examples/llm_finetune/nemotron/nemotron_nano_v3_hellaswag_peft_te.yaml`: training config for NemotronV3 Nano with PEFT + TE linear
- `scripts/verify_peft_te_lora.py`: verification script for merge + HF PEFT load
- `tests/unit_tests/_peft/test_lora.py`: 4 new unit tests for TE Linear LoRA patching
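A hedged sketch of the adapter construction described above; the real code routes through `initialize_linear_module`, which this inlines, and the argument names are illustrative:

```python
import torch.nn as nn

try:
    import transformer_engine.pytorch as te
    HAVE_TE = True
except ImportError:
    HAVE_TE = False


def _init_adapter(base, dim_in, dim_out, rank):
    """Create (lora_A, lora_B) on the same linear backend as `base`."""
    use_te = HAVE_TE and isinstance(base, te.Linear)
    linear_cls = te.Linear if use_te else nn.Linear
    # lora_A projects down to `rank`; lora_B projects back up.
    lora_A = linear_cls(dim_in, rank, bias=False)
    lora_B = linear_cls(rank, dim_out, bias=False)
    nn.init.zeros_(lora_B.weight)  # zero-init so the adapter starts as a no-op
    return lora_A, lora_B
```

Keeping the adapters on the same backend as the base layer means all three matmuls in the LoRA forward run through TE when `backend.linear=te`, rather than mixing TE and eager kernels.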
Test plan

Tested on NemotronV3 Nano (`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`) with 8 GPUs:

- `PeftModel.from_pretrained()` loads the adapter successfully, and the forward pass produces valid logits
- `merge_and_unload()` succeeds (note: `save_pretrained` hits a pre-existing HF transformers bug with NemotronV3's `_tied_weights_keys`, unrelated to this PR)

🤖 Generated with Claude Code
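For reference, a hedged sketch of the load/merge verification flow from the test plan; the adapter path is illustrative, the real script is `scripts/verify_peft_te_lora.py`, and the model may additionally need `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
adapter_dir = "checkpoints/lora_adapter"  # illustrative path

base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(model_id)

# Load the trained LoRA adapter and sanity-check the forward pass.
model = PeftModel.from_pretrained(base, adapter_dir)
logits = model(**tok("Hello", return_tensors="pt")).logits
assert torch.isfinite(logits).all()

# Fold the adapter weights back into the base model.
merged = model.merge_and_unload()
```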