8 changes: 8 additions & 0 deletions components.d/automodel.yml
@@ -0,0 +1,8 @@
name: NeMo-AutoModel
repo: NVIDIA-NeMo/Automodel
description: NeMo AutoModel — fine-tuning and training of HuggingFace-compatible models, including LoRA, PEFT, and MoE workflows.
skills:
  - path: skills/
    catalog_dir: NeMo-AutoModel
links:
security: false
201 changes: 201 additions & 0 deletions skills/Megatron-Bridge/megatron-bridge-lora-sft/SKILL.md
@@ -0,0 +1,201 @@
---
name: megatron-bridge-lora-sft
description: Configure and run LoRA, DoRA, and full SFT fine-tuning in Megatron-Bridge. Covers LoRA dataclass setup, target module wiring, normalize_moe_lora for MoE models, and adapter export via AutoBridge.export_adapter_ckpt. Use when applying LoRA or DoRA to any Bridge-supported model, setting up SFT datasets, or exporting fine-tuned adapters to HuggingFace PEFT format.
when_to_use: LoRA or DoRA fine-tuning, SFT recipe setup, normalize_moe_lora, MoE expert targeting, adapter export to HuggingFace, peft_scheme lora dora, dim alpha target_modules LoRA dataclass, torchrun recipe fine-tune, export_adapter_ckpt AutoBridge.
---

# LoRA / DoRA / SFT Fine-Tuning

Card: @skills/megatron-bridge-lora-sft/card.yaml

## Quick Decision

| Goal | peft_scheme | Min GPUs |
|---|---|---|
| LoRA on 1B model | `"lora"` | 1 |
| DoRA on 1B model | `"dora"` | 1 |
| Full SFT on 8B | none (SFT recipe) | 2 |
| Export adapter to HF PEFT | n/a (CPU-only export) | 0 |

## Enablement

### LoRA (minimal)

```python
from megatron.bridge.recipes.llama import llama32_1b_peft_config

config = llama32_1b_peft_config(peft_scheme="lora")

# Default target_modules: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
# Default dim=32, alpha=32

# Override rank and alpha:
config.peft.dim = 16
config.peft.alpha = 32
```

Launch:

```bash
torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py \
--pretrained-checkpoint /path/to/checkpoint
```

### DoRA

```python
config = llama32_1b_peft_config(peft_scheme="dora")
config.peft.dim = 16
config.peft.alpha = 64 # DoRA default alpha is 64
```

### MoE LoRA — expert layer targeting

For MoE models, make sure `target_modules` covers the expert projection layers
(the dense defaults may not match every model's expert module names) and enable
`normalize_moe_lora` to scale the expert rank down proportionally:

```python
from megatron.bridge.peft.lora import LoRA

lora = LoRA(
    target_modules=[
        "linear_qkv",   # attention
        "linear_proj",  # attention output
        "linear_fc1",   # MLP gate/up (dense fallback)
        "linear_fc2",   # MLP down (dense fallback)
    ],
    dim=32,
    alpha=32,
    normalize_moe_lora=True,  # dim // moe_router_topk for expert layers
)
```

With `normalize_moe_lora=True` (see the arithmetic sketch below):
- Expert linear layers: effective dim = `dim // moe_router_topk`
- Non-expert layers: effective dim = `dim` (unchanged)
- `dim` must be evenly divisible by `moe_router_topk`
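
As a quick arithmetic check (a sketch with hypothetical values; `moe_router_topk` comes from the model config):

```python
dim = 32
moe_router_topk = 4  # hypothetical router top-k taken from the model config

assert dim % moe_router_topk == 0, "dim must be evenly divisible by moe_router_topk"
expert_dim = dim // moe_router_topk  # expert linear layers -> effective rank 8
dense_dim = dim                      # attention and dense MLP layers keep rank 32
```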

### Adapter export to HuggingFace

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge(hf_model_path="/path/to/hf/model")

bridge.export_adapter_ckpt(
    peft_checkpoint="/checkpoints/lora_run",
    output_path="./my_adapter",
)
# produces: ./my_adapter/adapter_config.json
#           ./my_adapter/adapter_model.safetensors
```

Or via CLI script:

```bash
python examples/conversion/adapter/export_adapter.py \
--hf-model-path /path/to/hf/model \
--lora-checkpoint /checkpoints/lora_run \
--output ./my_adapter
```

The exported adapter loads directly with HuggingFace PEFT:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("/path/to/hf/model")
model = PeftModel.from_pretrained(base_model, "./my_adapter")
```

Export runs on CPU — no GPU required.

## Code Anchors

LoRA dataclass:

```python
# src/megatron/bridge/peft/lora.py
@dataclass
class LoRA(PEFT, ModuleMatcher):
    target_modules: List[str] = field(
        default_factory=lambda: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
    )
    dim: int = 32
    alpha: int = 32
    dropout: float = 0.0
    dropout_position: Literal["pre", "post"] = "pre"
    lora_A_init_method: str = "xavier"
    lora_B_init_method: str = "zero"
    a2a_experimental: bool = False
    lora_dtype: torch.dtype = None
    normalize_moe_lora: bool = False
```

DoRA dataclass:

```python
# src/megatron/bridge/peft/dora.py
@dataclass
class DoRA(PEFT, ModuleMatcher):
    target_modules: List[str] = field(
        default_factory=lambda: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
    )
    dim: int = 32
    alpha: int = 64  # DoRA default differs from LoRA default
```

Recipe function:

```python
# tutorials/recipes/llama/01_quickstart_finetune.py
from megatron.bridge.recipes.llama import llama32_1b_peft_config

config = llama32_1b_peft_config(peft_scheme="lora") # or "dora"
config.peft.dim = 16
config.peft.alpha = 32
```

Export:

```python
# examples/conversion/adapter/export_adapter.py
bridge = AutoBridge(hf_model_path=...)
bridge.export_adapter_ckpt(peft_checkpoint=..., output_path=...)
```

## Pitfalls

1. **MoE expert layers silently skipped without normalize_moe_lora or explicit targets**:
The default `target_modules` covers attention and MLP layers for dense models.
For MoE models, expert weights may not be covered; before launching a run, verify
that expert adapter parameters have `requires_grad=True` (see the sketch after this list).

2. **DoRA alpha convention**: DoRA default `alpha=64`, not 32. Check the `DoRA`
dataclass defaults before overriding.

3. **normalize_moe_lora requires evenly divisible dim**: `dim` must be divisible by
`moe_router_topk`. Indivisible `dim` values will error.

4. **Export produces HF PEFT adapter — no merge step needed**: Unlike some frameworks,
`export_adapter_ckpt` produces `adapter_config.json` + `adapter_model.safetensors`
which load directly via `PeftModel.from_pretrained`. No separate merge step is
required before HuggingFace use.

5. **TP > 1 with PEFT**: LoRA adapter shapes are sharded with the base layer when
`tensor_model_parallel_size > 1`. Adapter `dim` must be consistent across TP ranks.
Mismatched `dim` causes a shape error at initialization.
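
A minimal trainability check for pitfall 1 (a sketch; `model` stands for whatever PEFT-wrapped module the recipe builds, and the exact adapter/expert parameter names vary by model):

```python
import torch

def report_trainable(model: torch.nn.Module) -> None:
    """Print every parameter that will receive gradients after PEFT wrapping."""
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(f"trainable: {name}  shape={tuple(param.shape)}")

# For an MoE model, expert adapter weights (names typically containing an expert
# marker plus "lora") should show up here; if they do not, the expert layers were
# not matched by target_modules.
```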

## Verification

Smoke test LoRA on 1 GPU with mock data:

```bash
torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py \
--pretrained-checkpoint /path/to/checkpoint
```

Success criteria:

- Exit code 0
- Finite loss in logs
- Adapter files generated: `adapter_config.json` + `adapter_model.safetensors`
- `PeftModel.from_pretrained(base_model, output_path)` loads without error (see the check below)
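
A quick post-run check of the last two criteria (a sketch using the hypothetical paths from the examples above):

```python
from pathlib import Path
from transformers import AutoModelForCausalLM
from peft import PeftModel

out = Path("./my_adapter")
assert (out / "adapter_config.json").exists()
assert (out / "adapter_model.safetensors").exists()

base_model = AutoModelForCausalLM.from_pretrained("/path/to/hf/model")
model = PeftModel.from_pretrained(base_model, str(out))  # must load without shape errors
```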
80 changes: 80 additions & 0 deletions skills/Megatron-Bridge/megatron-bridge-lora-sft/card.yaml
@@ -0,0 +1,80 @@
title: megatron_bridge_lora_sft
validated_on: "2026-05-03"
summary: >
  Megatron-Bridge exposes LoRA and DoRA via the LoRA and DoRA dataclasses in
  src/megatron/bridge/peft/. Default target_modules cover attention and MLP dense
  layers. MoE expert rank normalization is via normalize_moe_lora=True (divides dim
  by moe_router_topk for expert layers). Adapter export to HuggingFace PEFT format
  uses AutoBridge.export_adapter_ckpt — produces adapter_config.json and
  adapter_model.safetensors compatible with PeftModel.from_pretrained.

validation_status:
  lora_dataclass:
    - code_verified
  dora_dataclass:
    - code_verified
  normalize_moe_lora:
    - code_verified
  recipe_function_llama32_1b:
    - code_verified
  export_adapter_ckpt:
    - code_verified
  peft_model_load_after_export:
    - code_verified
  tp_peft_sharding:
    - unclear
  end_to_end_moe_lora_finetune:
    - unclear

feature_meaning:
  lora_dataclass: >
    LoRA(target_modules, dim=32, alpha=32, normalize_moe_lora=False).
    Applied to model via peft_scheme="lora" in recipe functions.
  dora_dataclass: >
    DoRA(target_modules, dim=32, alpha=64). DoRA default alpha is 64, not 32.
    Applied via peft_scheme="dora".
  normalize_moe_lora: >
    When True, expert linear layers use dim // moe_router_topk instead of full dim.
    Non-expert layers keep full dim. dim must be evenly divisible by moe_router_topk.
  export_adapter_ckpt: >
    AutoBridge(hf_model_path).export_adapter_ckpt(peft_checkpoint, output_path).
    Generates adapter_config.json + adapter_model.safetensors. Runs on CPU.
    Output loads directly via PeftModel.from_pretrained(base_model, output_path).

recommended_path:
  lora_minimal:
    recipe: llama32_1b_peft_config(peft_scheme="lora")
    peft.dim: 16
    peft.alpha: 32
  dora:
    recipe: llama32_1b_peft_config(peft_scheme="dora")
    peft.dim: 16
    peft.alpha: 64
  moe_lora:
    peft.normalize_moe_lora: true
    peft.dim: 32
    note: dim must be divisible by moe_router_topk
  export:
    step_1: "bridge = AutoBridge(hf_model_path)"
    step_2: "bridge.export_adapter_ckpt(peft_checkpoint, output_path)"

known_constraints:
- DoRA default alpha is 64, not 32; overriding without checking defaults may produce incorrect scaling.
- normalize_moe_lora requires dim evenly divisible by moe_router_topk.
- TP > 1 with PEFT requires consistent adapter dim across all TP ranks; mismatch errors at init.
- export_adapter_ckpt produces HF PEFT adapter files — no separate merge step is needed before HF use.

known_limitations:
- End-to-end MoE LoRA fine-tune on a real MoE checkpoint not confirmed in CI.
- TP > 1 PEFT sharding behavior not fully validated from source review.

evidence:
- src/megatron/bridge/peft/lora.py
- src/megatron/bridge/peft/dora.py
- tutorials/recipes/llama/01_quickstart_finetune.py
- examples/conversion/adapter/export_adapter.py

follow_up_validation:
- Add a checked-in end-to-end LoRA adapter export round-trip CI test.
- Confirm normalize_moe_lora on a real MoE checkpoint (DeepSeek, Qwen3-MoE).
- Clarify whether TP > 1 PEFT is validated on current container versions.
@@ -0,0 +1,44 @@
[
  {
    "id": "lora-001-llama-dim-alpha-target-modules",
    "question": "How do I set up LoRA fine-tuning for a Llama model in Megatron-Bridge with rank 16 targeting attention and MLP layers?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Use llama32_1b_peft_config(peft_scheme='lora') as the starting recipe. Set config.peft.dim=16 and config.peft.alpha=32. The default target_modules already includes ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'] covering both attention and MLP layers. Launch with torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py --pretrained-checkpoint /path/to/checkpoint.",
    "expected_behavior": [
      "References llama32_1b_peft_config(peft_scheme='lora') as the recipe entry point",
      "Sets config.peft.dim=16 (not lora_rank)",
      "Sets config.peft.alpha=32",
      "Mentions that default target_modules covers linear_qkv, linear_proj, linear_fc1, linear_fc2",
      "Includes the torchrun launch command with --pretrained-checkpoint"
    ]
  },
  {
    "id": "lora-002-moe-normalize-moe-lora",
    "question": "I'm applying LoRA to a MoE model in Megatron-Bridge but want expert layers to use a smaller rank than attention layers. How do I do that?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Set normalize_moe_lora=True on the LoRA dataclass. With dim=32 and moe_router_topk=2, expert linear layers get effective dim = 32 // 2 = 16, while non-expert layers keep the full dim=32. dim must be evenly divisible by moe_router_topk. This is set directly on the LoRA dataclass: LoRA(dim=32, alpha=32, normalize_moe_lora=True, target_modules=[...]).",
    "expected_behavior": [
      "References normalize_moe_lora=True as the mechanism for per-layer rank reduction",
      "Explains that expert layer effective dim = dim // moe_router_topk",
      "Explains that non-expert layers keep the full dim",
      "Notes that dim must be evenly divisible by moe_router_topk",
      "Shows the LoRA dataclass usage directly, not a fabricated PEFTConfig field"
    ]
  },
  {
    "id": "lora-003-export-adapter-hf-peft",
    "question": "How do I export my LoRA adapter checkpoint from Megatron-Bridge to HuggingFace PEFT format?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Use AutoBridge(hf_model_path) and call bridge.export_adapter_ckpt(peft_checkpoint='/checkpoints/lora_run', output_path='./my_adapter'). This produces adapter_config.json and adapter_model.safetensors in the output directory. The export runs on CPU. The result loads directly with PeftModel.from_pretrained(base_model, './my_adapter'). Alternatively, use the CLI: python examples/conversion/adapter/export_adapter.py --hf-model-path ... --lora-checkpoint ... --output ...",
    "expected_behavior": [
      "References AutoBridge and export_adapter_ckpt as the export mechanism",
      "Shows the peft_checkpoint and output_path arguments",
      "States the export produces adapter_config.json and adapter_model.safetensors",
      "Mentions the export runs on CPU (no GPU needed)",
      "Shows PeftModel.from_pretrained as the consumption pattern, or the CLI alternative"
    ]
  }
]