# Instruction tuning - Supervised finetuning

Pre-training = reading a textbook
Supervised finetuning = Imitating

- SFT teaches the model how to apply the knowledge it learned during pre-training.
- SFT does not teach new facts. It teaches new behaviors.
- SFT shifts the model's probability distribution to favor the types of responses in the training dataset.
- The model mimics the patterns in the training data, so invest time in data curation.
- In SFT, the quality of the training data matters most. In pre-training, quantity matters.
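The idea that SFT shifts probability mass toward the training responses can be seen in a toy setting. The sketch below is not how TRL trains a model; it assumes a hypothetical three-token vocabulary with raw logits as the only parameters, and applies plain gradient descent on the cross-entropy loss for one target token. The helper names `softmax` and `sft_step` are made up for this illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_step(logits, target, lr=1.0):
    """One gradient-descent step on the loss -log p[target].

    For softmax + cross-entropy, the gradient with respect to the
    logits is (probs - one_hot(target)).
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

logits = [0.0, 0.0, 0.0]   # hypothetical starting point: uniform over 3 tokens
target = 2                 # the token the "training data" wants next

before = softmax(logits)[target]
for _ in range(5):
    logits = sft_step(logits, target)
after = softmax(logits)[target]

print(f"p(target) before: {before:.3f}, after: {after:.3f}")
```

No new token is added to the vocabulary here; the update only re-weights tokens the model could already produce, which mirrors the "new behaviors, not new facts" point above.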

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM

from datasets import load_dataset
import trackio as wandb
from trl import SFTTrainer, SFTConfig



In [2]:
wandb.init(project="smol-course", name="sft-runs")

* Trackio project initialized: smol-course
* Trackio metrics logged to: /Users/phani/.cache/huggingface/trackio
* View dashboard by running in your terminal:
[1m[93mtrackio show --project "smol-course"[0m
* or by running in Python: trackio.show(project="smol-course")


<trackio.run.Run at 0x179a13380>

In [3]:
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

In [4]:
data = load_dataset("roneneldan/TinyStories")
data

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 2119719
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 21990
    })
})

In [5]:
data['train']

Dataset({
    features: ['text'],
    num_rows: 2119719
})

In [6]:
config = SFTConfig(
    output_dir="./models",
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    max_steps=20,
    logging_steps=2,
    report_to="trackio"
)

In [7]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data['train'].select(range(100)),
    args=config
)

Adding EOS to train dataset: 100%|██████████| 100/100 [00:00<00:00, 20072.28 examples/s]
Tokenizing train dataset: 100%|██████████| 100/100 [00:00<00:00, 2415.11 examples/s]
Truncating train dataset: 100%|██████████| 100/100 [00:00<00:00, 28240.67 examples/s]
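The three progress bars above name the preprocessing stages the trainer reports: adding EOS, tokenizing, and truncating. The toy sketch below illustrates those stages in order. It is not TRL's implementation: the whitespace tokenizer, the `<eos>` string, the `max_length` value, and the function name `preprocess` are all assumptions chosen to keep the example self-contained.

```python
def preprocess(text, vocab, eos_token="<eos>", max_length=8):
    """Toy version of the three stages: add EOS, tokenize, truncate."""
    # 1. Add EOS so the model can learn where an example ends.
    text = text + " " + eos_token
    # 2. Tokenize (whitespace split stands in for a real subword tokenizer)
    #    and map each token to an integer id, growing the vocab as needed.
    tokens = text.split()
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
    # 3. Truncate to the maximum sequence length.
    return ids[:max_length]

vocab = {}
short_ids = preprocess("the cat sat", vocab)
long_ids = preprocess("a b c d e f g h i j", vocab)

print(short_ids)      # ids for: the, cat, sat, <eos>
print(len(long_ids))  # capped at max_length
```

Note that in this toy version truncation can cut off the EOS token of a long example, which is one reason the maximum length is worth checking against the dataset.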


In [8]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.


* Trackio project initialized: huggingface
* Trackio metrics logged to: /Users/phani/.cache/huggingface/trackio
* View dashboard by running in your terminal:
[1m[93mtrackio show --project "huggingface"[0m
* or by running in Python: trackio.show(project="huggingface")


Step,Training Loss
2,2.2452
4,2.1235
6,2.3847
8,2.2127
10,2.1763
12,2.2377
14,2.0138
16,1.9635
18,2.2754
20,2.1864


* Uploading logs to Trackio Space: http://127.0.0.1:7860/ (please wait...)


TrainOutput(global_step=20, training_loss=2.181929111480713, metrics={'train_runtime': 62.9966, 'train_samples_per_second': 0.635, 'train_steps_per_second': 0.317, 'total_flos': 21338005438464.0, 'train_loss': 2.181929111480713, 'epoch': 0.4})
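The numbers in `TrainOutput` can be reproduced from the config and the logged losses. The sketch below is a sanity check, assuming a single device and the default of no gradient accumulation (neither is stated in the notebook, but both are consistent with the reported values).

```python
# Values taken from the config and output above.
max_steps = 20
per_device_train_batch_size = 2
num_train_examples = 100          # data['train'].select(range(100))
train_runtime = 62.9966           # seconds
logged_losses = [2.2452, 2.1235, 2.3847, 2.2127, 2.1763,
                 2.2377, 2.0138, 1.9635, 2.2754, 2.1864]

# Assumption: one device, gradient_accumulation_steps = 1.
examples_seen = max_steps * per_device_train_batch_size
epoch = examples_seen / num_train_examples
samples_per_second = examples_seen / train_runtime
steps_per_second = max_steps / train_runtime
mean_logged_loss = sum(logged_losses) / len(logged_losses)

print(f"examples seen:       {examples_seen}")
print(f"epoch:               {epoch}")
print(f"samples per second:  {samples_per_second:.3f}")
print(f"steps per second:    {steps_per_second:.3f}")
print(f"mean of logged loss: {mean_logged_loss:.4f}")
```

So the run saw 40 of the 100 selected stories, which is why `epoch` is 0.4, and the reported `training_loss` is the average of the per-interval losses in the table. With only 20 steps, the step-to-step loss values are noisy and do not show a clear trend.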

## CLI

In [9]:
# ! trl sft \
#     --model_name_or_path Qwen/Qwen3-0.6B \
#     --dataset_name HuggingFaceTB/smoltalk2 \
#     --dataset_config SFT \
#     --output_dir ./Qwen3-0.6B-sft \
#     --per_device_train_batch_size 4 \
#     --gradient_accumulation_steps 4 \
#     --learning_rate 5e-4 \
#     --max_steps 1000 \
#     --logging_steps 50 \
#     --save_steps 200 \
#     --report_to trackio \
#     --push_to_hub \
#     --hub_model_id binga/Qwen3-0.6B-sft
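To read the CLI flags above as a training schedule, the arithmetic below works out the effective batch size and the logging and checkpoint cadence. It is a sketch that assumes a single device (`num_devices` is not part of the command and is introduced only for illustration).

```python
# Flag values from the command above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 1000
logging_steps = 50
save_steps = 200

num_devices = 1  # assumption: not specified in the command

# Gradients from 4 micro-batches of 4 are accumulated before each optimizer step.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
sequences_processed = effective_batch_size * max_steps
checkpoint_steps = list(range(save_steps, max_steps + 1, save_steps))
num_log_entries = max_steps // logging_steps

print(f"effective batch size: {effective_batch_size}")
print(f"sequences processed:  {sequences_processed}")
print(f"checkpoints at steps: {checkpoint_steps}")
print(f"log entries:          {num_log_entries}")
```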

In [18]:
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

lora_config = LoraConfig(
    r = 16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

sft_config = SFTConfig(
    output_dir="./models",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    packing=True,
)

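# Pass sft_config via `args`; without it the trainer ignores the settings above
# (num_train_epochs, batch size, packing) and uses its defaults.
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=data['train'].select(range(100)),
    peft_config=lora_config
)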
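To make the `r` and `lora_alpha` settings concrete, here is a minimal pure-Python sketch of the LoRA idea: the frozen weight `W` is left untouched and a low-rank update `B @ A`, scaled by `lora_alpha / r`, is added to its output. This is not PEFT's implementation. The 64x64 layer size and the helper names `matvec` and `lora_forward` are hypothetical; the zero initialization of `B` follows the standard LoRA setup, so the adapted layer starts out identical to the base layer.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x)."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

r, lora_alpha = 16, 32        # same values as lora_config above
d_in = d_out = 64             # hypothetical layer size for illustration

random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                     # trainable, starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

full_params = d_out * d_in          # parameters in W
lora_params = r * (d_in + d_out)    # parameters in A and B
scaling = lora_alpha / r

print(f"scaling factor:    {scaling}")
print(f"full-layer params: {full_params}")
print(f"LoRA params:       {lora_params}")
print("matches base at init:", lora_forward(W, A, B, x, r, lora_alpha) == matvec(W, x))
```

At this toy size the adapter is half as large as the layer, so the savings look modest; the ratio `r * (d_in + d_out) / (d_in * d_out)` shrinks quickly as the layer dimensions grow relative to `r`.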



In [None]:
trainer.train()



