# 🚀 OpenSloth Demo Training Notebook

This notebook demonstrates how to fine-tune large language models using opensloth's multi-GPU capabilities. It's equivalent to running:

```bash
opensloth-train examples/example_sharegpt_lora_2gpus.py
```

## What This Demo Does

- **Multi-GPU Training**: Uses multiple GPUs with NCCL synchronization (the configuration below sets `gpus=[0, 1, 2, 3]`)
- **Adaptive Batching**: Optimizes sequence sorting and padding
- **LoRA Fine-tuning**: Efficient parameter updates with Low-Rank Adaptation
- **Response-only Loss**: Calculates loss only on assistant responses
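
To make the adaptive batching bullet more concrete, the sketch below compares padding overhead when sequences are batched in arrival order versus after sorting by length. It is a toy illustration with made-up lengths, not opensloth's actual batching code; `padding_tokens` is a hypothetical helper introduced only for this example.

```python
def padding_tokens(lengths, batch_size):
    """Count pad tokens needed when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


# Made-up sequence lengths (in tokens), purely for illustration.
lengths = [12, 480, 35, 510, 20, 495, 40, 500]

unsorted_pad = padding_tokens(lengths, batch_size=2)
sorted_pad = padding_tokens(sorted(lengths), batch_size=2)

print(f"pad tokens, arrival order: {unsorted_pad}")
print(f"pad tokens, sorted by length: {sorted_pad}")
```

Grouping sequences of similar length means each batch is padded to a length close to its members, so fewer tokens are wasted on padding.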

## Prerequisites

1. opensloth installed: `pip install git+https://github.com/anhvth/opensloth.git`
2. At least 2 GPUs available (the configuration below uses `gpus=[0, 1, 2, 3]`; adjust the list to match your hardware)
3. Sufficient VRAM (reduce batch size if needed)
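
When VRAM is tight, a common approach is to lower the per-device batch size and raise gradient accumulation so the effective batch size stays the same. The sketch below shows that arithmetic. It assumes plain data-parallel training, where every GPU processes its own micro-batches and gradients are combined once per accumulation cycle; whether opensloth counts batches exactly this way is an assumption. `effective_batch_size` is a hypothetical helper and the numbers are illustrative.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_gpus):
    """Samples contributing to one optimizer step under data-parallel training."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


# Illustrative values: halve the per-device batch, double the accumulation steps.
baseline = effective_batch_size(per_device_batch_size=2, gradient_accumulation_steps=4, num_gpus=2)
reduced = effective_batch_size(per_device_batch_size=1, gradient_accumulation_steps=8, num_gpus=2)

print(baseline, reduced)
```

Both settings reach the same effective batch size, but the second holds only one sample per GPU in memory at a time.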

In [1]:
%%capture
%load_ext autoreload
%autoreload 2
# %cd ../

In [None]:
# !pip install -e ./

Obtaining file:///home/anhvth5/projects/opensloth
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: opensloth
  Building editable for opensloth (pyproject.toml) ... done
  Created wheel for opensloth: filename=opensloth-0.1.4-py3-none-any.whl size=6306 sha256=08abeb37cdd43ec6c0a9334b2e3a69992265f2809ad45c09dadb18643d066d6f
  Stored in directory: /tmp/pip-ephem-wheel-cache-th39hfnz/wheels/a8/37/ff/66b14b7b5edc7e9915d5497a6cd2507db184c8cffe07c2577b
Successfully built opensloth
Installing collected packages: opensloth
  Attempting uninstall: opensloth
    Found existing installation: opensloth 0.1.4
    Uninstalling opensloth-0.1.4:
      Successfully uninstalled opensloth-0.1.4
Successfully installed opensloth-0.1.4


In [8]:
# !pip list | grep open

In [None]:
# Import opensloth configuration classes
from opensloth.opensloth_config import *

# Check GPU availability
import torch
print(f'🔥 CUDA Available: {torch.cuda.is_available()}')
print(f'🔥 GPU Count: {torch.cuda.device_count()}')
for i in range(torch.cuda.device_count()):
    print(f'   GPU {i}: {torch.cuda.get_device_name(i)}')


ModuleNotFoundError: No module named 'opensloth'

## ⚙️ Configuration Setup

OpenSloth uses Pydantic models for type-safe configuration. We'll set up:

1. **Data Configuration**: Dataset and tokenization settings
2. **Training Configuration**: GPU allocation and loss calculation
3. **Model Configuration**: Base model and LoRA parameters
4. **Training Arguments**: Learning rate, batch size, and optimization settings
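
As a rough guide to what the LoRA parameters in the model configuration control, the sketch below counts the extra trainable parameters LoRA adds to a single linear layer and computes the standard scaling factor `lora_alpha / r`. The values `r=8` and `lora_alpha=16` match the configuration cell in this notebook; the 4096 x 4096 layer size is a hypothetical placeholder, not taken from any specific model, and `lora_extra_params` is a helper made up for this example.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    The update is factored as B (d_out x r) @ A (r x d_in).
    """
    return r * (d_in + d_out)


d_in = d_out = 4096  # hypothetical layer size, for illustration only
r, lora_alpha = 8, 16

full = d_in * d_out
extra = lora_extra_params(d_in, d_out, r)
scaling = lora_alpha / r  # standard LoRA scaling (not rank-stabilized)

print(f"frozen weights: {full:,}")
print(f"LoRA weights:   {extra:,} ({extra / full:.2%} of the layer)")
print(f"scaling factor: {scaling}")
```

A small rank keeps the number of trainable parameters at a fraction of a percent of each adapted layer, which is why LoRA fits alongside a 4-bit base model.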

In [None]:
from opensloth.opensloth_config import (
    FastModelArgs,
    HFDatasetConfig,
    LoraArgs,
    OpenSlothConfig,
    TrainingArgsConfig,
    TrainingConfig,
)
from opensloth.scripts.hp_trainer import run_mp_training, setup_envs

# Main configuration using Pydantic models
opensloth_config = OpenSlothConfig(
    data=HFDatasetConfig(
        dataset_name="mlabonne/FineTome-100k",
        split="train",
        tokenizer_name="Qwen/Qwen3-8B",  # exact checkpoint does not matter as long as it is from the same Qwen3 family
        num_samples=5000,
        instruction_part="<|im_start|>user\n",
        response_part="<|im_start|>assistant\n",
        chat_template="qwen3",
    ),
    training=TrainingConfig(
        gpus=[0, 1, 2, 3],
        loss_type="response_only",
    ),
    fast_model_args=FastModelArgs(
        model_name="unsloth/Qwen3-8B-bnb-4bit",
        max_seq_length=32000,
        load_in_4bit=True,
    ),
    lora_args=LoraArgs(
        r=8,
        lora_alpha=16,
        target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        lora_dropout=0,
        bias="none",
        use_rslora=False,
    ),
)

# Training arguments using Pydantic model
training_config = TrainingArgsConfig(
    output_dir="outputs/qwen3-8b-FineTome-4gpus/",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    logging_steps=3,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=5,
    save_total_limit=2,
    weight_decay=0.01,
    optim="adamw_8bit",
    seed=3407,
    report_to="tensorboard",  # tensorboard or wandb
)

setup_envs(opensloth_config, training_config)

run_mp_training(
    opensloth_config.training.gpus, opensloth_config, training_config
)
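
The `instruction_part` and `response_part` markers in the configuration above are what make `loss_type="response_only"` possible: they delimit which spans of each conversation belong to the assistant. The sketch below illustrates the idea at character level. It is a simplified stand-in and not opensloth's implementation; real training masks token ids rather than characters. `response_mask` is a hypothetical helper, and character codes stand in for token ids.

```python
INSTRUCTION_PART = "<|im_start|>user\n"
RESPONSE_PART = "<|im_start|>assistant\n"
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def response_mask(text):
    """Return one bool per character: True where the character counts toward the loss."""
    mask = [False] * len(text)
    pos = 0
    while True:
        start = text.find(RESPONSE_PART, pos)
        if start == -1:
            break
        start += len(RESPONSE_PART)  # skip the marker itself
        end = text.find(INSTRUCTION_PART, start)
        if end == -1:
            end = len(text)  # last assistant turn runs to the end
        for i in range(start, end):
            mask[i] = True
        pos = end
    return mask


chat = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>\n"
)
mask = response_mask(chat)

# Positions outside assistant responses get IGNORE_INDEX, so they add nothing to the loss.
labels = [ord(c) if keep else IGNORE_INDEX for c, keep in zip(chat, mask)]
trained = "".join(c for c, keep in zip(chat, mask) if keep)
print(repr(trained))
```

Only the assistant's reply (including its end-of-turn marker) keeps real labels; the user prompt and the role markers are ignored by the loss.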