8 changes: 8 additions & 0 deletions components.d/automodel.yml
@@ -0,0 +1,8 @@
name: NeMo-AutoModel
repo: NVIDIA-NeMo/Automodel
description: NeMo AutoModel — fine-tuning and training of HuggingFace-compatible models, including LoRA, PEFT, and MoE workflows.
skills:
  - path: skills/
    catalog_dir: NeMo-AutoModel
links:
security: false
201 changes: 201 additions & 0 deletions skills/Megatron-Bridge/megatron-bridge-lora-sft/SKILL.md
@@ -0,0 +1,201 @@
---
name: megatron-bridge-lora-sft
description: Configure and run LoRA, DoRA, and full SFT fine-tuning in Megatron-Bridge. Covers LoRA dataclass setup, target module wiring, normalize_moe_lora for MoE models, and adapter export via AutoBridge.export_adapter_ckpt. Use when applying LoRA or DoRA to any Bridge-supported model, setting up SFT datasets, or exporting fine-tuned adapters to HuggingFace PEFT format.
when_to_use: LoRA or DoRA fine-tuning, SFT recipe setup, normalize_moe_lora, MoE expert targeting, adapter export to HuggingFace, peft_scheme lora dora, dim alpha target_modules LoRA dataclass, torchrun recipe fine-tune, export_adapter_ckpt AutoBridge.
---

# LoRA / DoRA / SFT Fine-Tuning

Card: @skills/megatron-bridge-lora-sft/card.yaml

## Quick Decision

| Goal | peft_scheme | Min GPUs |
|---|---|---|
| LoRA on 1B model | `"lora"` | 1 |
| DoRA on 1B model | `"dora"` | 1 |
| Full SFT on 8B | none (SFT recipe) | 2 |
| Export adapter to HF PEFT | n/a (CPU-only export) | 0 |

## Enablement

### LoRA (minimal)

```python
from megatron.bridge.recipes.llama import llama32_1b_peft_config

config = llama32_1b_peft_config(peft_scheme="lora")

# Default target_modules: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
# Default dim=32, alpha=32

# Override rank and alpha:
config.peft.dim = 16
config.peft.alpha = 32
```

Launch:

```bash
torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py \
--pretrained-checkpoint /path/to/checkpoint
```

### DoRA

```python
config = llama32_1b_peft_config(peft_scheme="dora")
config.peft.dim = 16
config.peft.alpha = 64 # DoRA default alpha is 64
```

### MoE LoRA — expert layer targeting

For MoE models, make sure `target_modules` covers the expert projection layers
(the dense defaults may not match every model's expert module names) and enable
`normalize_moe_lora` to scale the expert rank down proportionally:

```python
from megatron.bridge.peft.lora import LoRA

lora = LoRA(
    target_modules=[
        "linear_qkv",   # attention
        "linear_proj",  # attention output
        "linear_fc1",   # MLP gate/up (dense fallback)
        "linear_fc2",   # MLP down (dense fallback)
    ],
    dim=32,
    alpha=32,
    normalize_moe_lora=True,  # dim // moe_router_topk for expert layers
)
```

With `normalize_moe_lora=True` (see the arithmetic sketch below):
- Expert linear layers: effective dim = `dim // moe_router_topk`
- Non-expert layers: effective dim = `dim` (unchanged)
- `dim` must be evenly divisible by `moe_router_topk`
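
As a quick arithmetic check (a sketch with hypothetical values; `moe_router_topk` comes from the model config):

```python
dim = 32
moe_router_topk = 4  # hypothetical router top-k taken from the model config

assert dim % moe_router_topk == 0, "dim must be evenly divisible by moe_router_topk"
expert_dim = dim // moe_router_topk  # expert linear layers -> effective rank 8
dense_dim = dim                      # attention and dense MLP layers keep rank 32
```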

### Adapter export to HuggingFace

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge(hf_model_path="/path/to/hf/model")

bridge.export_adapter_ckpt(
    peft_checkpoint="/checkpoints/lora_run",
    output_path="./my_adapter",
)
# produces: ./my_adapter/adapter_config.json
#           ./my_adapter/adapter_model.safetensors
```

Or via CLI script:

```bash
python examples/conversion/adapter/export_adapter.py \
--hf-model-path /path/to/hf/model \
--lora-checkpoint /checkpoints/lora_run \
--output ./my_adapter
```

The exported adapter loads directly with HuggingFace PEFT:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("/path/to/hf/model")
model = PeftModel.from_pretrained(base_model, "./my_adapter")
```

Export runs on CPU — no GPU required.

## Code Anchors

LoRA dataclass:

```python
# src/megatron/bridge/peft/lora.py
@dataclass
class LoRA(PEFT, ModuleMatcher):
    target_modules: List[str] = field(
        default_factory=lambda: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
    )
    dim: int = 32
    alpha: int = 32
    dropout: float = 0.0
    dropout_position: Literal["pre", "post"] = "pre"
    lora_A_init_method: str = "xavier"
    lora_B_init_method: str = "zero"
    a2a_experimental: bool = False
    lora_dtype: torch.dtype = None
    normalize_moe_lora: bool = False
```

DoRA dataclass:

```python
# src/megatron/bridge/peft/dora.py
@dataclass
class DoRA(PEFT, ModuleMatcher):
    target_modules: List[str] = field(
        default_factory=lambda: ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]
    )
    dim: int = 32
    alpha: int = 64  # DoRA default differs from LoRA default
```

Recipe function:

```python
# tutorials/recipes/llama/01_quickstart_finetune.py
from megatron.bridge.recipes.llama import llama32_1b_peft_config

config = llama32_1b_peft_config(peft_scheme="lora") # or "dora"
config.peft.dim = 16
config.peft.alpha = 32
```

Export:

```python
# examples/conversion/adapter/export_adapter.py
bridge = AutoBridge(hf_model_path=...)
bridge.export_adapter_ckpt(peft_checkpoint=..., output_path=...)
```

## Pitfalls

1. **MoE expert layers silently skipped without normalize_moe_lora or explicit targets**:
The default `target_modules` covers attention and MLP layers for dense models.
For MoE models, expert weights may not be covered; before launching a run, verify
that expert adapter parameters have `requires_grad=True` (see the sketch after this list).

2. **DoRA alpha convention**: DoRA default `alpha=64`, not 32. Check the `DoRA`
dataclass defaults before overriding.

3. **normalize_moe_lora requires evenly divisible dim**: `dim` must be divisible by
`moe_router_topk`. Indivisible `dim` values will error.

4. **Export produces HF PEFT adapter — no merge step needed**: Unlike some frameworks,
`export_adapter_ckpt` produces `adapter_config.json` + `adapter_model.safetensors`
which load directly via `PeftModel.from_pretrained`. No separate merge step is
required before HuggingFace use.

5. **TP > 1 with PEFT**: LoRA adapter shapes are sharded with the base layer when
`tensor_model_parallel_size > 1`. Adapter `dim` must be consistent across TP ranks.
Mismatched `dim` causes a shape error at initialization.
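
A minimal trainability check for pitfall 1 (a sketch; `model` stands for whatever PEFT-wrapped module the recipe builds, and the exact adapter/expert parameter names vary by model):

```python
import torch

def report_trainable(model: torch.nn.Module) -> None:
    """Print every parameter that will receive gradients after PEFT wrapping."""
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(f"trainable: {name}  shape={tuple(param.shape)}")

# For an MoE model, expert adapter weights (names typically containing an expert
# marker plus "lora") should show up here; if they do not, the expert layers were
# not matched by target_modules.
```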

## Verification

Smoke test LoRA on 1 GPU with mock data:

```bash
torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py \
--pretrained-checkpoint /path/to/checkpoint
```

Success criteria:

- Exit code 0
- Finite loss in logs
- Adapter files generated: `adapter_config.json` + `adapter_model.safetensors`
- `PeftModel.from_pretrained(base_model, output_path)` loads without error (see the check below)
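
A quick post-run check of the last two criteria (a sketch using the hypothetical paths from the examples above):

```python
from pathlib import Path
from transformers import AutoModelForCausalLM
from peft import PeftModel

out = Path("./my_adapter")
assert (out / "adapter_config.json").exists()
assert (out / "adapter_model.safetensors").exists()

base_model = AutoModelForCausalLM.from_pretrained("/path/to/hf/model")
model = PeftModel.from_pretrained(base_model, str(out))  # must load without shape errors
```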
80 changes: 80 additions & 0 deletions skills/Megatron-Bridge/megatron-bridge-lora-sft/card.yaml
@@ -0,0 +1,80 @@
title: megatron_bridge_lora_sft
validated_on: "2026-05-03"
summary: >
  Megatron-Bridge exposes LoRA and DoRA via the LoRA and DoRA dataclasses in
  src/megatron/bridge/peft/. Default target_modules cover attention and MLP dense
  layers. MoE expert rank normalization is via normalize_moe_lora=True (divides dim
  by moe_router_topk for expert layers). Adapter export to HuggingFace PEFT format
  uses AutoBridge.export_adapter_ckpt — produces adapter_config.json and
  adapter_model.safetensors compatible with PeftModel.from_pretrained.

validation_status:
  lora_dataclass:
    - code_verified
  dora_dataclass:
    - code_verified
  normalize_moe_lora:
    - code_verified
  recipe_function_llama32_1b:
    - code_verified
  export_adapter_ckpt:
    - code_verified
  peft_model_load_after_export:
    - code_verified
  tp_peft_sharding:
    - unclear
  end_to_end_moe_lora_finetune:
    - unclear

feature_meaning:
  lora_dataclass: >
    LoRA(target_modules, dim=32, alpha=32, normalize_moe_lora=False).
    Applied to model via peft_scheme="lora" in recipe functions.
  dora_dataclass: >
    DoRA(target_modules, dim=32, alpha=64). DoRA default alpha is 64, not 32.
    Applied via peft_scheme="dora".
  normalize_moe_lora: >
    When True, expert linear layers use dim // moe_router_topk instead of full dim.
    Non-expert layers keep full dim. dim must be evenly divisible by moe_router_topk.
  export_adapter_ckpt: >
    AutoBridge(hf_model_path).export_adapter_ckpt(peft_checkpoint, output_path).
    Generates adapter_config.json + adapter_model.safetensors. Runs on CPU.
    Output loads directly via PeftModel.from_pretrained(base_model, output_path).

recommended_path:
  lora_minimal:
    recipe: llama32_1b_peft_config(peft_scheme="lora")
    peft.dim: 16
    peft.alpha: 32
  dora:
    recipe: llama32_1b_peft_config(peft_scheme="dora")
    peft.dim: 16
    peft.alpha: 64
  moe_lora:
    peft.normalize_moe_lora: true
    peft.dim: 32
    note: dim must be divisible by moe_router_topk
  export:
    step_1: "bridge = AutoBridge(hf_model_path)"
    step_2: "bridge.export_adapter_ckpt(peft_checkpoint, output_path)"

known_constraints:
- DoRA default alpha is 64, not 32; overriding without checking defaults may produce incorrect scaling.
- normalize_moe_lora requires dim evenly divisible by moe_router_topk.
- TP > 1 with PEFT requires consistent adapter dim across all TP ranks; mismatch errors at init.
- export_adapter_ckpt produces HF PEFT adapter files — no separate merge step is needed before HF use.

known_limitations:
- End-to-end MoE LoRA fine-tune on a real MoE checkpoint not confirmed in CI.
- TP > 1 PEFT sharding behavior not fully validated from source review.

evidence:
- src/megatron/bridge/peft/lora.py
- src/megatron/bridge/peft/dora.py
- tutorials/recipes/llama/01_quickstart_finetune.py
- examples/conversion/adapter/export_adapter.py

follow_up_validation:
- Add a checked-in end-to-end LoRA adapter export round-trip CI test.
- Confirm normalize_moe_lora on a real MoE checkpoint (DeepSeek, Qwen3-MoE).
- Clarify whether TP > 1 PEFT is validated on current container versions.
@@ -0,0 +1,44 @@
[
  {
    "id": "lora-001-llama-dim-alpha-target-modules",
    "question": "How do I set up LoRA fine-tuning for a Llama model in Megatron-Bridge with rank 16 targeting attention and MLP layers?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Use llama32_1b_peft_config(peft_scheme='lora') as the starting recipe. Set config.peft.dim=16 and config.peft.alpha=32. The default target_modules already includes ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'] covering both attention and MLP layers. Launch with torchrun --nproc_per_node=1 tutorials/recipes/llama/01_quickstart_finetune.py --pretrained-checkpoint /path/to/checkpoint.",
    "expected_behavior": [
      "References llama32_1b_peft_config(peft_scheme='lora') as the recipe entry point",
      "Sets config.peft.dim=16 (not lora_rank)",
      "Sets config.peft.alpha=32",
      "Mentions that default target_modules covers linear_qkv, linear_proj, linear_fc1, linear_fc2",
      "Includes the torchrun launch command with --pretrained-checkpoint"
    ]
  },
  {
    "id": "lora-002-moe-normalize-moe-lora",
    "question": "I'm applying LoRA to a MoE model in Megatron-Bridge but want expert layers to use a smaller rank than attention layers. How do I do that?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Set normalize_moe_lora=True on the LoRA dataclass. With dim=32 and moe_router_topk=2, expert linear layers get effective dim = 32 // 2 = 16, while non-expert layers keep the full dim=32. dim must be evenly divisible by moe_router_topk. This is set directly on the LoRA dataclass: LoRA(dim=32, alpha=32, normalize_moe_lora=True, target_modules=[...]).",
    "expected_behavior": [
      "References normalize_moe_lora=True as the mechanism for per-layer rank reduction",
      "Explains that expert layer effective dim = dim // moe_router_topk",
      "Explains that non-expert layers keep the full dim",
      "Notes that dim must be evenly divisible by moe_router_topk",
      "Shows the LoRA dataclass usage directly, not a fabricated PEFTConfig field"
    ]
  },
  {
    "id": "lora-003-export-adapter-hf-peft",
    "question": "How do I export my LoRA adapter checkpoint from Megatron-Bridge to HuggingFace PEFT format?",
    "expected_skill": "megatron-bridge-lora-sft",
    "expected_script": null,
    "ground_truth": "Use AutoBridge(hf_model_path) and call bridge.export_adapter_ckpt(peft_checkpoint='/checkpoints/lora_run', output_path='./my_adapter'). This produces adapter_config.json and adapter_model.safetensors in the output directory. The export runs on CPU. The result loads directly with PeftModel.from_pretrained(base_model, './my_adapter'). Alternatively, use the CLI: python examples/conversion/adapter/export_adapter.py --hf-model-path ... --lora-checkpoint ... --output ...",
    "expected_behavior": [
      "References AutoBridge and export_adapter_ckpt as the export mechanism",
      "Shows the peft_checkpoint and output_path arguments",
      "States the export produces adapter_config.json and adapter_model.safetensors",
      "Mentions the export runs on CPU (no GPU needed)",
      "Shows PeftModel.from_pretrained as the consumption pattern, or the CLI alternative"
    ]
  }
]