<a href="https://colab.research.google.com/github/dhnanjay/HuggingFace/blob/main/Qwen3_Supervised_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Qwen3: Supervised Fine-Tuning with TRL

This notebook fine-tunes a Qwen3 model on the `allenai/tulu-3-sft-personas-code` dataset using Supervised Fine-Tuning (SFT) with the TRL library, optionally training LoRA adapters through PEFT instead of updating all weights.

## Install required libraries

In [None]:
# Quote the version specifiers so the shell does not treat ">" as output redirection
!pip install -U "transformers>=4.51.0" "torch>=2.7.0" "torchaudio>=2.7.0" torchvision "trl>=0.17.0" peft bitsandbytes accelerate datasets wandb
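
After installing, it can help to confirm that the installed versions meet the minimums requested above. The following is a minimal sketch using only the standard library; `parse_version`, `meets_minimum`, and `check_installed` are hypothetical helper names, and the simple parser ignores pre-release suffixes, so treat it as a rough check rather than a full version comparison.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.51.0' into (4, 51, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):  # drop local tags such as '+cu126'
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions taken from the pip command in this notebook
MINIMUMS = {"transformers": "4.51.0", "torch": "2.7.0", "trl": "0.17.0"}


def check_installed(minimums=MINIMUMS):
    """Return {package: True/False}, or None when the package is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(check_installed())
```

A value of `False` or `None` in the printed report suggests re-running the install cell and restarting the runtime.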

In [None]:
!wandb login

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin


## Import libraries

In [None]:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, setup_chat_format
from peft import LoraConfig

## Load dataset

In [None]:
dataset_name = "allenai/tulu-3-sft-personas-code"  # Example dataset

# Load dataset
dataset = load_dataset(dataset_name, split="train")
print(f"Dataset loaded: {dataset}")

# Let's look at a sample
print("\nSample data:")
print(dataset[0])

Dataset loaded: Dataset({
    features: ['id', 'prompt', 'messages'],
    num_rows: 34999
})

Sample data:
{'id': 'personas_code_t7x1g2nylbi8322og3tyz3yc', 'prompt': 'Write a python function to count the number of unique words in a given text file. As a local journalist who often reviews mime performances, I have a collection of text files containing my reviews. I want to analyze these reviews by counting how many distinct words I use across a single file.\n\nThe function should take the file path as input and return an integer representing the count of unique words. For this task, words should be considered case-insensitively, meaning "Mime" and "mime" should be considered the same word. Additionally, punctuation should be ignored, so "mime!" and "mime" are also considered the same word.\n\nInput:\n- A string representing the path to a text file.\n\nOutput:\n- An integer representing the count of unique words in the text file.\n\nExample:\nSuppose the content of the file at the given 

In [None]:
dataset = dataset.remove_columns("prompt")
dataset = dataset.train_test_split(test_size=0.2)
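
After dropping `prompt`, each row keeps an `id` and a `messages` column. The sketch below assumes that `messages` holds a list of `{"role", "content"}` dictionaries, which is the conversational layout the trainer output later in this notebook suggests. The helper `is_conversational` and the sample row are hypothetical and only illustrate a quick structural check.

```python
def is_conversational(example, column="messages"):
    """Return True if example[column] is a non-empty list of role/content dicts."""
    messages = example.get(column)
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict)
        and isinstance(m.get("role"), str)
        and isinstance(m.get("content"), str)
        for m in messages
    )


# Hypothetical row shaped like the dataset's rows (not copied from the dataset)
sample = {
    "id": "example-0",
    "messages": [
        {"role": "user", "content": "Write a python function to add two numbers."},
        {"role": "assistant", "content": "def add(a, b):\n    return a + b"},
    ],
}

print(is_conversational(sample))
```

In the notebook, the same check could be run on `dataset["train"][0]` before training.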

In [None]:
print(
    f"Train Samples: {len(dataset['train'])}\nTest Samples: {len(dataset['test'])}"
)

Train Samples: 27999
Test Samples: 7000
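
The split sizes follow from the row count and `test_size=0.2`. Here is a small sketch of the arithmetic, assuming the test share is rounded up and the remainder goes to training, which matches the counts printed above. `split_sizes` is a hypothetical helper, not part of `datasets`.

```python
import math


def split_sizes(num_rows, test_size):
    """Round the test share up and give the remaining rows to the train split."""
    n_test = math.ceil(num_rows * test_size)
    return num_rows - n_test, n_test


train_rows, test_rows = split_sizes(34999, 0.2)
print(train_rows, test_rows)  # 27999 7000
```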


## Configuration

Set up the configuration parameters for the fine-tuning process.

In [None]:
# Model configuration
model_name = "Qwen/Qwen3-1.7B"  # You can change this to any model you want to fine-tune

# # Other compatible Qwen3 models
# model_name = "Qwen/Qwen3-32B"
# model_name = "Qwen/Qwen3-14B"
# model_name = "Qwen/Qwen3-8B"
# model_name = "Qwen/Qwen3-4B"
# model_name = "Qwen/Qwen3-1.7B"
# model_name = "Qwen/Qwen3-0.6B"

# Training configuration
use_peft = True  # Set to True to use Parameter-Efficient Fine-Tuning (PEFT)
output_dir = "./output/sft-model"
num_train_epochs = 1
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
learning_rate = 2e-4 if use_peft else 2e-5  # Higher learning rate for PEFT
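
The batch settings above determine how many sequences contribute to each optimizer step. The sketch below shows that arithmetic, assuming a single device; `effective_batch_size` and `sequences_seen` are hypothetical helper names.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of sequences that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def sequences_seen(steps, per_device, accumulation_steps, num_devices=1):
    """Total sequences processed after a given number of optimizer steps."""
    return steps * effective_batch_size(per_device, accumulation_steps, num_devices)


# Values from the configuration cell; the step count matches max_steps set later
print(effective_batch_size(1, 1))   # 1
print(sequences_seen(1000, 1, 1))   # 1000
```

Raising `gradient_accumulation_steps` is one way to get a larger effective batch without increasing per-device memory use.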

## Load model and tokenizer

In [None]:
# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    use_cache=False,  # Disable KV cache during training
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# # Set up chat formatting (if the model doesn't have a chat template)
# if tokenizer.chat_template is None:
#     model, tokenizer = setup_chat_format(model, tokenizer, format="chatml")

# # Set padding token
# if tokenizer.pad_token is None:
#     tokenizer.pad_token = tokenizer.eos_token
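
The tokenizer's chat template turns a list of messages into one training string. The sketch below is a simplified illustration of the ChatML layout that appears in the formatted prompt printed near the end of this notebook. `render_chatml` is a hypothetical helper, and the real Qwen3 template handles more cases than this.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in a simplified ChatML layout."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_chatml(messages))
```

In practice, `tokenizer.apply_chat_template` should be preferred, since it applies the template shipped with the model.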

## Configure PEFT (if enabled)

In [None]:
# Set up PEFT configuration if enabled
peft_config = None
if use_peft:
    peft_config = LoraConfig(
        r=32,  # Rank
        lora_alpha=16,  # Alpha parameter for LoRA scaling
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules="all-linear",
    )
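
To give a feel for what `r=32` and `lora_alpha=16` mean, here is a small sketch of the standard LoRA arithmetic: the adapter update is scaled by `lora_alpha / r`, and each adapted linear layer gains `r * (d_in + d_out)` trainable parameters. The 2048 x 2048 layer shape is a hypothetical example, not read from the model config.

```python
def lora_scaling(r, lora_alpha):
    """Scaling factor applied to the low-rank update (standard LoRA)."""
    return lora_alpha / r


def lora_params(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


r, lora_alpha = 32, 16
d_in = d_out = 2048  # hypothetical layer shape, for illustration only

print(lora_scaling(r, lora_alpha))  # 0.5
print(lora_params(d_in, d_out, r))  # 131072
print(d_in * d_out)                 # 4194304 weights in the frozen layer
```

With these numbers the adapter is roughly 3% of the size of the layer it modifies, which is why PEFT training fits in much less memory.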

## Configure SFT Trainer

In [None]:
# Training arguments
training_args = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    learning_rate=learning_rate,
    gradient_checkpointing=True,
    logging_steps=25,
    save_strategy="epoch",
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_length=1024,
    packing=True,  # Enable packing to increase training efficiency
    eos_token=tokenizer.eos_token,
    bf16=True,
    fp16=False,
    max_steps=1000,  # Takes precedence over num_train_epochs
    report_to="wandb",  # Log metrics to Weights & Biases; use "none" to disable
)
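
With `warmup_ratio=0.1`, `max_steps=1000`, and a cosine scheduler, the learning rate ramps up over the first 100 steps and then decays toward zero. The sketch below approximates that shape in plain Python, assuming linear warmup followed by a half-cosine decay. `lr_at_step` is a hypothetical helper and not the scheduler implementation used by the trainer.

```python
import math


def lr_at_step(step, base_lr=2e-4, max_steps=1000, warmup_steps=100):
    """Linear warmup to base_lr, then half-cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# warmup_steps=100 corresponds to 10% of the 1000 training steps
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step):.2e}")
```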

## Initialize and run the SFT Trainer

In [None]:
# Create SFT Trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"] if "test" in dataset else None,
    peft_config=peft_config,
    processing_class=tokenizer,
)
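
Because `packing=True` and `max_length=1024` are set in the config, the trainer combines several tokenized examples into fixed-length sequences instead of padding each one. The sketch below is a simplified illustration of the idea, concatenating token lists and cutting the stream into blocks. It is not TRL's implementation, and `pack_sequences` is a hypothetical helper.

```python
def pack_sequences(token_lists, max_length):
    """Concatenate token lists and cut the stream into max_length blocks."""
    stream = [tok for tokens in token_lists for tok in tokens]
    return [stream[i:i + max_length] for i in range(0, len(stream), max_length)]


# Three short "tokenized examples" with hypothetical token ids
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_sequences(examples, max_length=4)
print(packed)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

Packing reduces wasted padding tokens, at the cost of examples sharing a sequence or being split across block boundaries.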

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# Train the model
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mbenjamin-burtenshaw[0m ([33msmartwithfood[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


| Step | Training Loss |
|------|---------------|
| 25 | 1.4207 |
| 50 | 0.8574 |
| 75 | 0.6935 |
| 100 | 0.6259 |
| 125 | 0.6025 |
| 150 | 0.5825 |
| 175 | 0.596 |
| 200 | 0.5856 |
| 225 | 0.5416 |
| 250 | 0.5807 |


TrainOutput(global_step=1000, training_loss=0.5781423273086548, metrics={'train_runtime': 416.2907, 'train_samples_per_second': 2.402, 'train_steps_per_second': 2.402, 'total_flos': 8868470766336000.0, 'train_loss': 0.5781423273086548})
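
The throughput figures in `TrainOutput` can be cross-checked from the step count and runtime. Here is a short sketch of that arithmetic, assuming a single device with the batch size and accumulation steps from the configuration cell. The token figure is only an upper bound that assumes every packed sequence is filled to `max_length`.

```python
global_step = 1000
train_runtime = 416.2907  # seconds, from TrainOutput
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
max_length = 1024

steps_per_second = global_step / train_runtime
samples_per_second = (
    steps_per_second * per_device_train_batch_size * gradient_accumulation_steps
)
max_tokens_seen = global_step * per_device_train_batch_size * max_length

print(round(steps_per_second, 3))    # 2.402
print(round(samples_per_second, 3))  # 2.402
print(max_tokens_seen)               # 1024000
```

With a batch size of 1 and no accumulation, steps and samples per second coincide, as in the reported metrics.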

## Save the fine-tuned model

In [None]:
# Save the model
trainer.save_model(output_dir)

## Test the fine-tuned model

In [None]:
# Load the fine-tuned model and tokenizer
if use_peft:
    from peft import PeftModel

    # Load the base model
    base_model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, torch_dtype=torch.bfloat16
    )

    # Load the fine-tuned PEFT model
    model = PeftModel.from_pretrained(base_model, output_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
else:
    # Load the full fine-tuned model
    model = AutoModelForCausalLM.from_pretrained(output_dir, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(output_dir, trust_remote_code=True)

In [None]:
# Test the model with an example
prompt = """Write a function called is_palindrome that takes a single string as input and returns True if the string is a palindrome, and False otherwise.

Palindrome Definition:

A palindrome is a word, phrase, number, or other sequence of characters that reads the same forward and backward, ignoring spaces, punctuation, and capitalization.

Example:
```
is_palindrome("racecar")  # Returns True
is_palindrome("hello")  # Returns False
is_palindrome("A man, a plan, a canal: Panama")  # Returns True
```
"""

# Format the chat prompt using the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(f"Formatted prompt: {formatted_prompt}")

# Generate response
model.eval()
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nGenerated Response:")
print(response)

Formatted prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a function called is_palindrome that takes a single string as input and returns True if the string is a palindrome, and False otherwise.

Palindrome Definition:

A palindrome is a word, phrase, number, or other sequence of characters that reads the same forward and backward, ignoring spaces, punctuation, and capitalization.

Example:
```
is_palindrome("racecar")  # Returns True
is_palindrome("hello")  # Returns False
is_palindrome("A man, a plan, a canal: Panama")  # Returns True
```
<|im_end|>
<|im_start|>assistant


Generated Response:
system
You are a helpful assistant.
user
Write a function called is_palindrome that takes a single string as input and returns True if the string is a palindrome, and False otherwise.

Palindrome Definition:

A palindrome is a word, phrase, number, or other sequence of characters that reads the same forward and backward, ignoring spaces, punctuation, and 
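
The generated response above starts with the prompt because `generate` returns the prompt tokens followed by the new tokens, and the whole sequence is decoded. The sketch below shows the slicing idea with plain lists; `strip_prompt` is a hypothetical helper and the token ids are made up.

```python
def strip_prompt(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [101, 102, 103]         # hypothetical prompt token ids
output_ids = prompt_ids + [7, 8, 9]  # generate() output: prompt + new tokens

print(strip_prompt(output_ids, len(prompt_ids)))  # [7, 8, 9]
```

In the notebook, the equivalent would be decoding `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.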

In [None]:
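# Push the model to the Hugging Face Hub.
# Note: the repository name refers to Qwen3-30B-A3B, while this notebook loads
# `model_name` (Qwen/Qwen3-1.7B by default); use a name that matches your model.
model.push_to_hub("burtenshaw/Qwen3-30B-A3B-python-code")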