# 🚀 NeMo AutoModel - LLM Fine-Tuning Tutorial

This notebook demonstrates how to fine-tune Large Language Models (LLMs) using NeMo AutoModel.

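It follows the same workflow as the command-line script, shown here with the Llama 3.2 1B SQuAD config (the notebook itself loads a Gemma 3 270M config in section 2):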
```bash
python examples/llm_finetune/finetune.py --config examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml
```

## What you'll learn:
1. How to load and customize configurations
2. How to set up the training recipe
3. How to run the training loop
4. How to customize training parameters


## 1. Setup and Imports


In [None]:
# 1) Create a dedicated virtual environment
!/usr/bin/python3 -m venv /root/.venvs/nemo_automodel

# 2) Install tools + package into that venv
!/root/.venvs/nemo_automodel/bin/python -m pip install -U pip setuptools wheel ipykernel
!/root/.venvs/nemo_automodel/bin/python -m pip install -U nemo_automodel

# 3) Register the venv as a Jupyter kernel
!/root/.venvs/nemo_automodel/bin/python -m ipykernel install --user \
  --name nemo_automodel --display-name "Python (nemo_automodel)"


In [None]:
import site
import sys

# Confirm which interpreter and site configuration this kernel uses
print("exe:", sys.executable)
print("ENABLE_USER_SITE:", site.ENABLE_USER_SITE)

# Check whether the package is already installed
# (pip treats "-" and "_" in package names as equivalent)
!{sys.executable} -m pip show nemo-automodel || true

# Install the PyPI package into THIS kernel's venv
!{sys.executable} -m pip install -U --no-user nemo-automodel

# Verify install location + import
!{sys.executable} -m pip show nemo-automodel
!{sys.executable} -c "import nemo_automodel; print('OK:', nemo_automodel.__file__)"
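
The same check can be done from Python without shelling out to pip. The sketch below uses only the standard library; `describe_install` is a hypothetical helper name, not part of NeMo AutoModel.

```python
import importlib.util
import sys
from importlib import metadata


def describe_install(module_name: str, dist_name: str) -> dict:
    """Report whether a module is importable and which distribution version is installed."""
    spec = importlib.util.find_spec(module_name)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
        "version": version,
    }


# Import name uses an underscore; the distribution name on PyPI uses a hyphen.
print(describe_install("nemo_automodel", "nemo-automodel"))
```

If `importable` is `False` while pip reports the package as installed, the kernel is most likely running a different interpreter than the one pip installed into.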

In [None]:
# NeMo AutoModel imports
from nemo_automodel.components.config.loader import load_yaml_config
from nemo_automodel.recipes.llm.train_ft import TrainFinetuneRecipeForNextTokenPrediction

print("✅ NeMo AutoModel imported successfully!")


## 2. Load Configuration

NeMo AutoModel uses YAML configuration files to define all training parameters.
You can load a pre-defined config or create your own.


In [None]:
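import os

# Option 1: Load a pre-defined config from examples
CONFIG_PATH = "/opt/Automodel/examples/llm_finetune/gemma/gemma_3_270m_squad.yaml"

# Hugging Face access token, used when downloading gated models;
# replace the placeholder with your own token
os.environ["HF_TOKEN"] = "YOUR HF TOKEN"

# Load the YAML configuration
cfg = load_yaml_config(CONFIG_PATH)

# Optional: override a single value by its dotted key
# cfg.set_by_dotted("model.attn_implementation", "eager")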

print(f"Loaded config from: {CONFIG_PATH}")
print("\n" + "="*60)
print("Configuration Summary:")
print("="*60)
print(cfg)
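
The commented-out `cfg.set_by_dotted(...)` call above overrides one value by its dotted key. To illustrate the idea, here is a minimal sketch on a plain nested dict. It is not the library's implementation: `set_dotted` is a hypothetical helper, and every key and value in `toy_cfg` apart from `model.attn_implementation` is made up for the example.

```python
from typing import Any


def set_dotted(config: dict, dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"] = value for dotted_key "a.b", creating sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
        if not isinstance(node, dict):
            raise TypeError(f"{name!r} in {dotted_key!r} is not a config section")
    node[leaf] = value


# Hypothetical values for illustration only; they are not read from the Gemma YAML.
toy_cfg = {"model": {"attn_implementation": "sdpa"}}
set_dotted(toy_cfg, "model.attn_implementation", "eager")  # overwrite an existing value
set_dotted(toy_cfg, "optimizer.lr", 1e-5)                  # create a new section
print(toy_cfg)
```

Dotted keys keep one-off overrides short, which is convenient for trying a different setting without editing the YAML file.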



## 3. Initialize the Training Recipe

The `TrainFinetuneRecipeForNextTokenPrediction` class orchestrates the entire training process:
- Model loading and parallelization
- Dataset and dataloader creation
- Optimizer and scheduler setup
- Checkpointing
- Logging (WandB, MLflow, etc.)


In [None]:
# Create the recipe instance
recipe = TrainFinetuneRecipeForNextTokenPrediction(cfg)
print("✅ Recipe created!")


## 4. Setup the Recipe

The `setup()` method initializes all components:
- Distributed environment
- Model and optimizer
- Dataloaders
- Schedulers
- Checkpointer

⚠️ **Note**: This step may download the model and dataset if not cached.


In [None]:
# Setup all components
print("Setting up recipe components...")
print("This may take a few minutes on first run (downloading model/dataset)\n")

recipe.setup()

print("\n" + "="*60)
print("✅ Recipe setup complete!")
print("="*60)


## 5. Run Training

Execute the training loop. This will:
1. Iterate through epochs and batches
2. Perform forward/backward passes
3. Update model parameters
4. Run validation at specified intervals
5. Save checkpoints at specified intervals

⚠️ **Warning**: Training can take a long time depending on your configuration!
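
To make the five steps listed above concrete, here is a toy loop in plain Python that fits a single weight `w` so that `w * x` approximates `y`. It only mirrors the control flow (epochs, batches, periodic validation and checkpointing); it is not how NeMo AutoModel implements `run_train_validation_loop()`, and the names `val_every` and `ckpt_every` are assumptions made for this sketch.

```python
def train(data, val_data, epochs=3, batch_size=2, lr=0.05, val_every=2, ckpt_every=3):
    """Toy SGD loop: returns the final weight, a validation log, and checkpoints."""
    w = 0.0  # the single trainable parameter
    step = 0
    val_log, checkpoints = [], []
    for epoch in range(epochs):
        # 1. Iterate through epochs and batches
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # 2. Forward/backward: gradient of the mean squared error of w * x vs. y
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # 3. Update the parameter (plain SGD)
            w -= lr * grad
            step += 1
            # 4. Validate at a fixed step interval
            if step % val_every == 0:
                val_loss = sum((w * x - y) ** 2 for x, y in val_data) / len(val_data)
                val_log.append((step, val_loss))
            # 5. Save a checkpoint at a fixed step interval (kept in memory here)
            if step % ckpt_every == 0:
                checkpoints.append({"step": step, "epoch": epoch, "w": w})
    return w, val_log, checkpoints


train_pairs = [(1, 2), (2, 4), (3, 6), (4, 8)]  # samples of y = 2x
w, val_log, checkpoints = train(train_pairs, val_data=[(5, 10)])
print(f"final w = {w:.3f}, validations = {len(val_log)}, checkpoints = {len(checkpoints)}")
```

In the real recipe, the intervals and the number of epochs come from the YAML configuration rather than from function arguments.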


In [None]:
# Run the training loop
print("🏋️ Starting training...")
print("="*60)

recipe.run_train_validation_loop()

print("="*60)
print("✅ Training complete!")


## 6. Training Results

After training completes, you can find:
- **Checkpoints**: In the `checkpoints/` directory (or as configured)
- **Training logs**: `training.jsonl` in the checkpoint directory
- **Validation logs**: `validation.jsonl` in the checkpoint directory
- **WandB/MLflow**: If configured, metrics are logged to these services
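
The `.jsonl` logs hold one JSON object per line, so they can be read with the standard library alone. The sketch below writes a small stand-in file so that it runs anywhere; the field names `step` and `loss` are hypothetical, and the real logs may use different keys.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Stand-in log file; "step" and "loss" are hypothetical field names.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "training.jsonl"
    log_path.write_text('{"step": 1, "loss": 2.5}\n{"step": 2, "loss": 2.1}\n', encoding="utf-8")
    records = read_jsonl(log_path)

print(f"{len(records)} records; keys in the first one: {sorted(records[0])}")
```

After a real run, point `read_jsonl` at `training.jsonl` or `validation.jsonl` in your checkpoint directory and inspect the keys of the first record before plotting anything.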


In [None]:
# Display checkpoint location
checkpoint_dir = recipe.checkpointer.config.checkpoint_dir
print(f"📁 Checkpoints saved to: {checkpoint_dir}")

# List checkpoint contents
import os
if os.path.exists(checkpoint_dir):
    print("\nCheckpoint directory contents:")
    for item in os.listdir(checkpoint_dir):
        print(f"  - {item}")
else:
    print("\n(Checkpoint directory not created yet)")


---

## 🔗 Useful Links

- [NeMo AutoModel Documentation](https://docs.nvidia.com/nemo/automodel/latest/index.html)
- [GitHub Repository](https://github.com/NVIDIA-NeMo/Automodel)
- [LLM Fine-tuning Examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune)
- [VLM Fine-tuning Examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune)

## Available Config Files

Check out `examples/llm_finetune/` for many pre-built configurations:
- **Llama**: llama3_1, llama3_2, llama3_3
- **Mistral**: mistral, mixtral
- **Qwen**: qwen2.5, qwen3
- **Gemma**: gemma, gemma2, gemma3
- **Phi**: phi2, phi3, phi4
- And many more!
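
To see which configs your local checkout actually contains, a short script like the following can group the YAML files by directory. It assumes the repository lives at `/opt/Automodel`, as in the config path used earlier in this notebook; `list_configs` is a hypothetical helper.

```python
from pathlib import Path


def list_configs(root):
    """Group *.yaml files under root by the name of their parent directory."""
    root = Path(root)
    if not root.is_dir():
        return {}
    grouped = {}
    for path in sorted(root.rglob("*.yaml")):
        grouped.setdefault(path.parent.name, []).append(path.name)
    return grouped


# Adjust the path if your copy of the repository lives elsewhere.
for family, files in list_configs("/opt/Automodel/examples/llm_finetune").items():
    print(f"{family}: {len(files)} config(s)")
```

Any path found this way can be assigned to `CONFIG_PATH` in section 2.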
