# 🚀 NeMo AutoModel - LLM Fine-Tuning Tutorial

This notebook demonstrates how to fine-tune Large Language Models (LLMs) using NeMo AutoModel.

It implements the same functionality as:
```bash
python examples/llm_finetune/finetune.py --config examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

## What you'll learn:
1. How to load and customize configurations
2. How to set up the training recipe
3. How to run the training loop
4. How to customize training parameters


## 1. Setup and Imports


In [None]:
# Standard imports
import os
import sys
from pathlib import Path

# Set the working directory to the repo root if running from notebooks folder
REPO_ROOT = Path(".").resolve()
if REPO_ROOT.name == "notebooks":
    REPO_ROOT = REPO_ROOT.parent
    os.chdir(REPO_ROOT)

print(f"Working directory: {REPO_ROOT}")

# Add repo to path if needed
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))


In [None]:
# NeMo AutoModel imports
from nemo_automodel.components.config.loader import load_yaml_config
from nemo_automodel.recipes.llm.train_ft import TrainFinetuneRecipeForNextTokenPrediction

print("✅ NeMo AutoModel imported successfully!")


## 2. Load Configuration

NeMo AutoModel uses YAML configuration files to define all training parameters.
You can load a pre-defined config or create your own.


In [None]:
# Option 1: Load a pre-defined config from examples
CONFIG_PATH = "examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml"

# Load the YAML configuration
cfg = load_yaml_config(CONFIG_PATH)

print(f"Loaded config from: {CONFIG_PATH}")
print("\n" + "="*60)
print("Configuration Summary:")
print("="*60)
print(cfg)


## 3. Customize Configuration (Optional)

You can modify any configuration parameter programmatically.
This is equivalent to using `--key value` on the command line.
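
As a rough mental model, a dotted key addresses a leaf in a nested mapping. The sketch below uses a plain `dict` and a hypothetical helper named `set_dotted`; it is not part of NeMo AutoModel, and `cfg.set_by_dotted` may behave differently (for example, in how it treats missing sections).

```python
def set_dotted(config: dict, dotted_key: str, value) -> None:
    """Set config["a"]["b"]["c"] = value given the key "a.b.c".

    Hypothetical illustration only; intermediate sections are created on demand.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down one level, creating an empty section if it is missing.
        node = node.setdefault(name, {})
    node[leaf] = value


config = {"step_scheduler": {"num_epochs": 2}, "optimizer": {"lr": 1e-4}}
set_dotted(config, "step_scheduler.num_epochs", 1)
set_dotted(config, "optimizer.lr", 1e-5)
set_dotted(config, "compile.enabled", True)  # creates the "compile" section
print(config)
```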


In [None]:
# Example: Modify training parameters for a quick test run
# Uncomment and modify as needed:

# Change the model (use any HuggingFace model ID)
# cfg.set_by_dotted("model.pretrained_model_name_or_path", "meta-llama/Llama-3.2-1B")

# Reduce epochs for testing
# cfg.set_by_dotted("step_scheduler.num_epochs", 1)

# Change batch sizes
# cfg.set_by_dotted("step_scheduler.global_batch_size", 32)
# cfg.set_by_dotted("step_scheduler.local_batch_size", 4)

# Change learning rate
# cfg.set_by_dotted("optimizer.lr", 1e-5)

# Change validation frequency
# cfg.set_by_dotted("step_scheduler.val_every_steps", 50)

# Change checkpoint frequency
# cfg.set_by_dotted("step_scheduler.ckpt_every_steps", 500)

# Enable/disable torch.compile
# cfg.set_by_dotted("compile.enabled", True)

print("Configuration customization complete!")


## 4. Quick Test Configuration

For testing purposes, let's create a minimal configuration that runs quickly.


In [None]:
# Quick test settings: set QUICK_TEST to True below to enable them
QUICK_TEST = False  # Set to True for a quick test run

if QUICK_TEST:
    print("⚡ Quick test mode enabled!")
    
    # Reduce training to minimum
    cfg.set_by_dotted("step_scheduler.num_epochs", 1)
    cfg.set_by_dotted("step_scheduler.global_batch_size", 8)
    cfg.set_by_dotted("step_scheduler.local_batch_size", 2)
    cfg.set_by_dotted("step_scheduler.val_every_steps", 5)
    cfg.set_by_dotted("step_scheduler.ckpt_every_steps", 100)
    
    # Limit validation samples
    cfg.set_by_dotted("validation_dataset.limit_dataset_samples", 16)
    
    print("Quick test configuration applied!")
else:
    print("Using full training configuration")
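
The two batch-size settings are usually related through gradient accumulation. The sketch below assumes the common convention that the global batch is split across data-parallel ranks and then into micro-batches of `local_batch_size`. The helper name `grad_accumulation_steps` and the `dp_size` parameter are illustrative assumptions; the recipe's own step scheduler remains the authority on the value actually used.

```python
def grad_accumulation_steps(global_batch_size: int, local_batch_size: int, dp_size: int = 1) -> int:
    """Micro-batches each rank accumulates before one optimizer step.

    Assumes global_batch_size = local_batch_size * dp_size * accumulation steps.
    """
    per_step = local_batch_size * dp_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not divisible by "
            f"local_batch_size * dp_size = {per_step}"
        )
    return global_batch_size // per_step


# Quick-test settings (global 8, local 2) on a single GPU.
print(grad_accumulation_steps(8, 2, dp_size=1))   # 4
# Example values from section 3 (global 32, local 4) on 2 data-parallel ranks.
print(grad_accumulation_steps(32, 4, dp_size=2))  # 4
```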


## 5. View Final Configuration


In [None]:
# Display the key settings from the final configuration
print("\n" + "="*60)
print("Final Configuration:")
print("="*60)

# Key settings
print(f"\n📦 Model: {cfg.get('model.pretrained_model_name_or_path', 'N/A')}")
print(f"📊 Dataset: {cfg.get('dataset.dataset_name', 'N/A')}")
print(f"🔄 Epochs: {cfg.get('step_scheduler.num_epochs', 'N/A')}")
print(f"📏 Global Batch Size: {cfg.get('step_scheduler.global_batch_size', 'N/A')}")
print(f"📐 Local Batch Size: {cfg.get('step_scheduler.local_batch_size', 'N/A')}")
print(f"📈 Learning Rate: {cfg.get('optimizer.lr', 'N/A')}")
print(f"✅ Validation Every: {cfg.get('step_scheduler.val_every_steps', 'N/A')} steps")
print(f"💾 Checkpoint Every: {cfg.get('step_scheduler.ckpt_every_steps', 'N/A')} steps")


## 6. Initialize the Training Recipe

The `TrainFinetuneRecipeForNextTokenPrediction` class orchestrates the entire training process:
- Model loading and parallelization
- Dataset and dataloader creation
- Optimizer and scheduler setup
- Checkpointing
- Logging (WandB, MLflow, etc.)


In [None]:
# Create the recipe instance
recipe = TrainFinetuneRecipeForNextTokenPrediction(cfg)
print("✅ Recipe created!")


## 7. Setup the Recipe

The `setup()` method initializes all components:
- Distributed environment
- Model and optimizer
- Dataloaders
- Schedulers
- Checkpointer

⚠️ **Note**: This step may download the model and dataset if they are not already cached.


In [None]:
# Setup all components
print("Setting up recipe components...")
print("This may take a few minutes on first run (downloading model/dataset)\n")

recipe.setup()

print("\n" + "="*60)
print("✅ Recipe setup complete!")
print("="*60)


## 8. Inspect Recipe Components (Optional)

After setup, you can inspect the initialized components.


In [None]:
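# Inspect model. The ',' format spec only accepts numbers, so the 'N/A' fallback is handled separately.
def fmt_count(n):
    return f"{n:,}" if isinstance(n, int) else "N/A"

print("📊 Model Information:")
print(f"  - Trainable Parameters: {fmt_count(recipe.param_info.get('trainable_params'))}")
print(f"  - Total Parameters: {fmt_count(recipe.param_info.get('total_params'))}")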

# Inspect dataloader
print("\n📦 Dataloader:")
print(f"  - Training batches: {len(recipe.dataloader)}")
print(f"  - Validation datasets: {list(recipe.val_dataloaders.keys())}")

# Inspect step scheduler
print("\n⏱️ Training Schedule:")
print(f"  - Total epochs: {recipe.step_scheduler.num_epochs}")
print(f"  - Gradient accumulation steps: {recipe.step_scheduler.grad_acc_steps}")
print(f"  - Checkpoint every: {recipe.step_scheduler.ckpt_every_steps} steps")


## 9. Run Training

Execute the training loop. This will:
1. Iterate through epochs and batches
2. Perform forward/backward passes
3. Update model parameters
4. Run validation at specified intervals
5. Save checkpoints at specified intervals
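
The control flow behind these steps can be sketched without any model at all. The toy loop below only records which action would happen at each point. The function name `simulate_schedule` and its parameters are illustrative assumptions, and the real recipe handles many more details (distributed ranks, loss computation, logging).

```python
def simulate_schedule(num_epochs, batches_per_epoch, grad_acc_steps,
                      val_every_steps, ckpt_every_steps):
    """Return a list of (optimizer_step, event) tuples for a toy training run."""
    events = []
    step = 0           # number of optimizer steps taken so far
    accumulated = 0    # micro-batches seen since the last optimizer step
    for _epoch in range(num_epochs):
        for _batch in range(batches_per_epoch):
            # A forward/backward pass would run here and accumulate gradients.
            accumulated += 1
            if accumulated < grad_acc_steps:
                continue
            # Enough micro-batches: update parameters and reset the accumulator.
            accumulated = 0
            step += 1
            events.append((step, "optimizer_step"))
            if step % val_every_steps == 0:
                events.append((step, "validate"))
            if step % ckpt_every_steps == 0:
                events.append((step, "checkpoint"))
    return events


for step, event in simulate_schedule(num_epochs=1, batches_per_epoch=8, grad_acc_steps=2,
                                     val_every_steps=2, ckpt_every_steps=4):
    print(step, event)
```

With these toy settings, eight micro-batches produce four optimizer steps, with validation after steps 2 and 4 and a checkpoint after step 4.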

⚠️ **Warning**: Training can take a long time depending on your configuration!


In [None]:
# Run the training loop
print("🏋️ Starting training...")
print("="*60)

recipe.run_train_validation_loop()

print("="*60)
print("✅ Training complete!")


## 10. Training Results

After training completes, you can find:
- **Checkpoints**: In the `checkpoints/` directory (or as configured)
- **Training logs**: `training.jsonl` in the checkpoint directory
- **Validation logs**: `validation.jsonl` in the checkpoint directory
- **WandB/MLflow**: If configured, metrics are logged to these services
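
Files in JSON Lines format hold one JSON object per line, so the logs can be read with the standard library. The sketch below is a minimal reader. The field names `step` and `loss` in the demo records are hypothetical placeholders, since the exact keys written by the recipe are not listed here.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file with made-up records (hypothetical keys).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "training.jsonl"
    demo_path.write_text('{"step": 1, "loss": 2.5}\n{"step": 2, "loss": 2.1}\n', encoding="utf-8")
    demo_records = read_jsonl(demo_path)

print(demo_records[-1])
```

To read a real log, point `read_jsonl` at `training.jsonl` inside the checkpoint directory and inspect the keys of the first record before relying on any particular field.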


In [None]:
# Display checkpoint location
checkpoint_dir = recipe.checkpointer.config.checkpoint_dir
print(f"📁 Checkpoints saved to: {checkpoint_dir}")

# List checkpoint contents
import os
if os.path.exists(checkpoint_dir):
    print("\nCheckpoint directory contents:")
    for item in os.listdir(checkpoint_dir):
        print(f"  - {item}")
else:
    print("\n(Checkpoint directory not created yet)")


---

## 📚 Additional Examples

### PEFT (LoRA) Fine-Tuning

To use Parameter-Efficient Fine-Tuning with LoRA:


In [None]:
# Example: Load a PEFT config
# PEFT_CONFIG_PATH = "examples/llm_finetune/llama3_2/llama3_2_1b_hellaswag_peft.yaml"
# cfg_peft = load_yaml_config(PEFT_CONFIG_PATH)
# recipe_peft = TrainFinetuneRecipeForNextTokenPrediction(cfg_peft)
# recipe_peft.setup()
# recipe_peft.run_train_validation_loop()

print("See examples/llm_finetune/ for more PEFT configurations!")


### Custom Configuration from Scratch

You can also create a configuration entirely in Python:


In [None]:
# Example: Create config from scratch using ConfigNode
from nemo_automodel.components.config.loader import ConfigNode

custom_config_dict = {
    "step_scheduler": {
        "global_batch_size": 32,
        "local_batch_size": 4,
        "ckpt_every_steps": 500,
        "val_every_steps": 50,
        "num_epochs": 1,
    },
    "model": {
        "_target_": "nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained",
        "pretrained_model_name_or_path": "meta-llama/Llama-3.2-1B",
    },
    "optimizer": {
        "_target_": "torch.optim.Adam",
        "lr": 1e-5,
    },
    # ... add other required sections
}

# To use: cfg_custom = ConfigNode(custom_config_dict)
print("Custom configuration structure shown above!")
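
The `_target_` keys name the callable used to build each component. The sketch below shows one way such a dotted path could be resolved with `importlib`. The helper `instantiate_from_target` is hypothetical and is not NeMo AutoModel's implementation; a standard-library class stands in for the model and optimizer targets so the example runs without extra packages.

```python
import importlib


def instantiate_from_target(section: dict):
    """Call the object named by section["_target_"] with the remaining keys as kwargs."""
    kwargs = {k: v for k, v in section.items() if k != "_target_"}
    parts = section["_target_"].split(".")
    # Import the longest importable module prefix, then walk the remaining attributes
    # (this handles targets such as "pkg.Class.from_pretrained").
    for split in range(len(parts) - 1, 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ModuleNotFoundError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj(**kwargs)
    raise ImportError(f"Cannot resolve target {section['_target_']!r}")


demo_section = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
value = instantiate_from_target(demo_section)
print(value)  # 3/4
```

Keeping the target as a string is what lets a YAML file or a plain dictionary swap one component for another without code changes.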


---

## 🔗 Useful Links

- [NeMo AutoModel Documentation](https://docs.nvidia.com/nemo/automodel/latest/index.html)
- [GitHub Repository](https://github.com/NVIDIA-NeMo/Automodel)
- [LLM Fine-tuning Examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune)
- [VLM Fine-tuning Examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune)

## Available Config Files

Check out `examples/llm_finetune/` for many pre-built configurations:
- **Llama**: llama3_1, llama3_2, llama3_3
- **Mistral**: mistral, mixtral
- **Qwen**: qwen2.5, qwen3
- **Gemma**: gemma, gemma2, gemma3
- **Phi**: phi2, phi3, phi4
- And many more!
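
To see which configurations are present in a local checkout, the directory can be scanned for YAML files. The sketch below assumes the configs use the `.yaml` extension and sit in per-family subfolders of `examples/llm_finetune/`; the helper name `list_configs` is illustrative.

```python
from pathlib import Path


def list_configs(root="examples/llm_finetune"):
    """Return YAML config file names under root, grouped by parent folder name."""
    grouped = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        grouped.setdefault(path.parent.name, []).append(path.name)
    return grouped


# Run from the repo root; prints nothing if the directory is absent.
for family, files in list_configs().items():
    print(f"{family}: {len(files)} config(s)")
```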
