<a href="https://colab.research.google.com/github/AliGreo/Text-Based-Projects/blob/main/finetune_qwen_3_4B_reasoning_math_dataset_unsloth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
%%capture
!pip install unsloth vllm trl

In [1]:
from google.colab import userdata
hf_token = userdata.get('hfr')

In [4]:
import unsloth
from unsloth import FastLanguageModel
import torch
import os

In [5]:
model_id = "Qwen/Qwen3-4B"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    load_in_4bit=True,
    full_finetuning=False,
    max_seq_length=2048,
    dtype=None,
    use_gradient_checkpointing="unsloth"
)

==((====))==  Unsloth 2025.6.5: Fast Qwen3 patching. Transformers: 4.52.4. vLLM: 0.8.5.post1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [6]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 8, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0,
    lora_alpha = 16, # LoRA scaling factor; commonly set to r or 2*r (here 2*r)
    use_gradient_checkpointing = "unsloth", # Reduces memory usage
    random_state = 3407,
)

Unsloth 2025.6.5 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
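
As a sanity check on this LoRA configuration, the number of trainable parameters can be estimated by hand: each adapted linear layer gains two small matrices, `A` (r x in_features) and `B` (out_features x r). The sketch below does that arithmetic. The layer dimensions are assumptions about the Qwen3-4B architecture (they are not stated in this notebook); the layer count of 36 and `r = 8` come from the cells above.

```python
# Assumed Qwen3-4B dimensions (not given in this notebook).
HIDDEN = 2560
INTERMEDIATE = 9728
Q_OUT = 32 * 128   # 32 attention heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128

NUM_LAYERS = 36    # from the "patched 36 layers" message
R = 8              # LoRA rank used above

# (in_features, out_features) of every projection listed in target_modules
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is r x in, B is out x r."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 16,515,072
```

Under these assumed dimensions the result agrees with the `Trainable parameters = 16,515,072` figure that the trainer prints later in the notebook. Doubling `r` doubles this count.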


### Fine-tune on the `bespokelabs/Bespoke-Stratos-17k` dataset

In [7]:
from datasets import load_dataset

dataset_name = "bespokelabs/Bespoke-Stratos-17k"

dataset = load_dataset(dataset_name, split="train")


In [8]:
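def format_function(row):
    # Each row holds a system prompt and a two-turn conversation:
    # conversations[0] is the user question, conversations[1] the answer.
    sys_prompt = row['system']
    conv = row['conversations']
    user_input = conv[0]['value']
    answer = conv[1]['value']

    # Join the three parts with newlines into one plain-text training example.
    prompt = f"{sys_prompt}\n{user_input}\n{answer}"
    return {"text": prompt}

dataset = dataset.map(format_function)

# Rename the dataset's reasoning markers to the <think>...</think> tags used by Qwen3.
dataset = dataset.map(lambda x: {"text": x['text'].replace("<|begin_of_thought|>", "<think>").replace("<|end_of_thought|>", "</think>")})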
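
To make the effect of these two `map` calls concrete, here is a minimal sketch that applies the same formatting to a single hand-written row without loading the dataset. The row contents and the helper name `format_row` are made up for illustration; only the field names `system`, `conversations`, and `value` are taken from the cell above.

```python
# Toy row shaped like the rows used above; the text itself is invented.
row = {
    "system": "You are a careful math assistant.",
    "conversations": [
        {"value": "What is 2 + 2?"},
        {"value": "<|begin_of_thought|>2 + 2 = 4<|end_of_thought|>\nThe answer is 4."},
    ],
}


def format_row(row: dict) -> str:
    """Combine system prompt, question, and answer, then rename the thought tags."""
    user_input = row["conversations"][0]["value"]
    answer = row["conversations"][1]["value"]
    text = f"{row['system']}\n{user_input}\n{answer}"
    return (
        text.replace("<|begin_of_thought|>", "<think>")
            .replace("<|end_of_thought|>", "</think>")
    )


text = format_row(row)
print(text)
```

Note that the resulting string contains no chat-role markers, so the training text is plain concatenation rather than the tokenizer's chat template.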


In [None]:
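# Optional: keep only the first 5,000 examples for a quicker run.
# This cell was not executed here; the training log below reports all 16,710 examples.
dataset = dataset.select(range(5000))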

In [9]:
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir= f"{model_id}-finetuned",
    max_steps=100,
    per_device_train_batch_size=2,
    max_seq_length=2048,
    dataset_text_field="text",
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    report_to="none",
    run_name="qwen-3-4b-finetuning",
    logging_steps=20,
    average_tokens_across_devices = False
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=args
)

trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 16,710 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 16,515,072/4,000,000,000 (0.41% trained)


| Step | Training Loss |
|------|---------------|
| 20   | 1.062         |
| 40   | 0.8616        |
| 60   | 0.7005        |
| 80   | 0.5539        |
| 100  | 0.5425        |


TrainOutput(global_step=100, training_loss=0.7440903282165527, metrics={'train_runtime': 2767.0628, 'train_samples_per_second': 0.289, 'train_steps_per_second': 0.036, 'total_flos': 3.547191147076608e+16, 'train_loss': 0.7440903282165527})
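
The throughput figures in this output follow directly from the batch settings. The short sketch below redoes the arithmetic using only numbers that appear in the config and logs above; the variable names are illustrative.

```python
per_device_batch = 2        # per_device_train_batch_size
grad_accum = 4              # gradient_accumulation_steps
num_gpus = 1
max_steps = 100
num_examples = 16_710       # "Num examples" in the training banner
train_runtime_s = 2767.0628 # 'train_runtime' in TrainOutput

# One optimizer step consumes batch x accumulation x GPUs examples.
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / num_examples

samples_per_second = examples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(effective_batch, examples_seen)   # 8 800
print(f"{fraction_of_epoch:.1%}")       # 4.8%
print(round(samples_per_second, 3))     # 0.289
print(round(steps_per_second, 3))       # 0.036
```

So although the banner shows `Num Epochs = 1`, the 100-step cap means the run touches only about 800 of the 16,710 examples.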

### Inference with the fine-tuned model
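
The inference cell that follows builds its prompt with `tokenizer.apply_chat_template`. That method has an `add_generation_prompt` flag which appends the opening of an assistant turn, so the model begins an answer rather than continuing the user's message. The sketch below imitates this with a simplified ChatML-style renderer. `render_chatml` is a hypothetical helper, and the real Qwen3 template handles more cases (system prompts, tools, thinking mode) than shown here.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Simplified ChatML rendering, for illustration only."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so generation starts with the model's reply.
        text += "<|im_start|>assistant\n"
    return text


messages = [{"role": "user", "content": "solve 3x - 10 = 28"}]
without_prompt = render_chatml(messages)
with_prompt = render_chatml(messages, add_generation_prompt=True)
print(with_prompt)
```

Without the flag the prompt ends right after the user turn, and the model has to guess that a new assistant turn should start.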

In [10]:
from transformers import TextStreamer

FastLanguageModel.for_inference(model)

# add_generation_prompt=True appends the assistant turn header to the prompt.
ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "solve 3x - 10 = 28"}],
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

# Stream the generated tokens to stdout as they are produced.
_ = model.generate(
    **ids,
    streamer=TextStreamer(tokenizer=tokenizer, skip_prompt=True, skip_special_tokens=True),
    max_new_tokens=4090,
)

Okay, so I need to solve the equation 3x minus 10 equals 28. Hmm, let me think. Alright, first step, I remember that to solve for x, I need to isolate it on one side of the equation. That means getting rid of the 10 and then dealing with the 3 that's multiplied by x. 

Let me write down the equation again to make sure I have it right: 3x - 10 = 28. Yep, that's correct. So the goal is to get x by itself. Let me recall the steps. Usually, you do the opposite operation to both sides to keep things balanced. So since there's a subtraction of 10, I should add 10 to both sides to undo that. Let me try that.

Adding 10 to both sides: 3x - 10 + 10 equals 28 + 10. Simplifying that, the -10 and +10 on the left side cancel each other out, so it's just 3x. On the right side, 28 plus 10 is 38. So now the equation is 3x = 38. Okay, that seems right. 

Now, the next step is to get rid of that 3 that's multiplied by x. To do that, I need to divide both sides by 3. Let me write that down. Dividing both