In [None]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "trl<0.9.0" xformers

In [2]:
import unsloth
from unsloth import FastLanguageModel
import torch

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
max_seq_length = 4096  # Qwen2.5 supports up to 128K, but using 4K for demo
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

==((====))==  Unsloth 2025.9.4: Fast Qwen2 patching. Transformers: 4.56.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.16G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/271 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/605 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r=32,  # Higher rank for multilingual adaptation
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.9.4 patched 28 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


In [6]:
model.print_trainable_parameters()

trainable params: 80,740,352 || all params: 7,696,356,864 || trainable%: 1.0491


In [7]:
from datasets import load_dataset
dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train[:2000]")

README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00001.parquet:   0%|          | 0.00/84.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/200035 [00:00<?, ? examples/s]

In [8]:
dataset

Dataset({
    features: ['question', 'answer'],
    num_rows: 2000
})

In [20]:
print(dataset['question'][100])

A barrel contains 12 liters (L) and 400 milliliters (㎖) of petroleum, and B barrel contains 7600 milliliters (㎖) of petroleum. How many liters (L) must be moved from A barrel to B barrel to equalize the amount of petroleum in A barrel and B barrel?


In [19]:
print(dataset['answer'][100])

First, let's convert all the measurements to the same unit to make the calculation easier. We'll convert everything to milliliters (㎖).

A barrel contains 12 liters and 400 milliliters. Since 1 liter is equal to 1000 milliliters, we can convert the 12 liters to milliliters:
12 liters = 12 * 1000 milliliters = 12000 milliliters

Now, add the 400 milliliters that are already in milliliters:
12000 milliliters + 400 milliliters = 12400 milliliters

So, A barrel contains a total of 12400 milliliters of petroleum.

B barrel contains 7600 milliliters of petroleum.

To equalize the amount of petroleum in both barrels, we need to find the average of the two amounts:
Total amount in both barrels = 12400 milliliters (A barrel) + 7600 milliliters (B barrel)
Total amount in both barrels = 20000 milliliters

Now, divide this total by 2 to find the equal amount for each barrel:
Equal amount for each barrel = 20000 milliliters / 2
Equal amount for each barrel = 10000 milliliters

Since A barrel curren

In [14]:
def format_qwen_chat(examples):
    """Format examples using Qwen2.5's specific chat template for math problems"""
    texts = []

    for i in range(len(examples["question"])):
        question = examples["question"][i]
        answer = examples["answer"][i]

        # Qwen2.5 uses specific chat template format
        # Create a structured conversation for mathematical reasoning
        messages = [
            {"role": "system", "content": "You are a helpful assistant that solves mathematical word problems step by step."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer}
        ]

        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=False,
        )

        texts.append(text)

    return {"text": texts}

In [15]:
formatted_dataset = dataset.map(
    format_qwen_chat,
    batched=True,
    remove_columns=dataset.column_names,
    desc="Formatting dataset for Qwen2.5 chat template"
)

Formatting dataset for Qwen2.5 chat template:   0%|          | 0/2000 [00:00<?, ? examples/s]

In [16]:
formatted_dataset

Dataset({
    features: ['text'],
    num_rows: 2000
})

In [18]:
print(formatted_dataset['text'][100])

<|im_start|>system
You are a helpful assistant that solves mathematical word problems step by step.<|im_end|>
<|im_start|>user
A barrel contains 12 liters (L) and 400 milliliters (㎖) of petroleum, and B barrel contains 7600 milliliters (㎖) of petroleum. How many liters (L) must be moved from A barrel to B barrel to equalize the amount of petroleum in A barrel and B barrel?<|im_end|>
<|im_start|>assistant
First, let's convert all the measurements to the same unit to make the calculation easier. We'll convert everything to milliliters (㎖).

A barrel contains 12 liters and 400 milliliters. Since 1 liter is equal to 1000 milliliters, we can convert the 12 liters to milliliters:
12 liters = 12 * 1000 milliliters = 12000 milliliters

Now, add the 400 milliliters that are already in milliliters:
12000 milliliters + 400 milliliters = 12400 milliliters

So, A barrel contains a total of 12400 milliliters of petroleum.

B barrel contains 7600 milliliters of petroleum.

To equalize the amount of p

In [21]:
from transformers import TrainingArguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,  # Smaller batch for 7B model
    gradient_accumulation_steps=8,   # Effective batch size: 8
    warmup_steps=10,
    max_steps=200,  # More steps for better mathematical reasoning
    learning_rate=1e-4,  # Lower LR for stable training
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=5,
    optim="adamw_8bit",  # Memory-efficient optimizer
    weight_decay=0.01,
    lr_scheduler_type="cosine",  # Cosine scheduler for better convergence
    seed=3407,
    output_dir="qwen25_math_tutor",
    save_strategy="steps",
    save_steps=100,
)

In [22]:
from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=formatted_dataset,
    args=training_args,
    packing=False,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
)

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

In [23]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,000 | Num Epochs = 1 | Total steps = 200
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 80,740,352 of 7,696,356,864 (1.05% trained)
  | |_| | '_ \/ _` / _` |  _/ -_)


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mabeshith[0m ([33mabeshith-dr-m-g-r-educational-and-research-institute[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
5,0.996
10,0.7646
15,0.4634
20,0.4198
25,0.3916
30,0.392
35,0.3999
40,0.3941
45,0.4168
50,0.3568


TrainOutput(global_step=200, training_loss=0.39247893810272216, metrics={'train_runtime': 2091.7055, 'train_samples_per_second': 0.765, 'train_steps_per_second': 0.096, 'total_flos': 1.8468014007066624e+16, 'train_loss': 0.39247893810272216, 'epoch': 0.8})

In [24]:
model.save_pretrained("qwen25_math_expert")
tokenizer.save_pretrained("qwen25_math_expert")

('qwen25_math_expert/tokenizer_config.json',
 'qwen25_math_expert/special_tokens_map.json',
 'qwen25_math_expert/chat_template.jinja',
 'qwen25_math_expert/vocab.json',
 'qwen25_math_expert/merges.txt',
 'qwen25_math_expert/added_tokens.json',
 'qwen25_math_expert/tokenizer.json')

In [None]:
FastLanguageModel.for_inference(model)

In [26]:
def test_qwen25_math(question, max_tokens=512):
    """Test the fine-tuned Qwen2.5 model with mathematical problems"""

    messages = [
        {"role": "system", "content": "You are a helpful assistant that solves mathematical word problems step by step."},
        {"role": "user", "content": question}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.3,  # Lower temperature for mathematical accuracy
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.1,
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the assistant's response
    assistant_response = response.split("<|im_start|>assistant")[-1].replace("<|im_end|>", "").strip()

    return assistant_response

In [27]:
test_problems = [
    "A train travels 240 miles in 4 hours. What is its average speed in miles per hour?",

    "Sarah has 15 apples. She gives 1/3 of them to her friend and eats 2 apples herself. How many apples does she have left?",

    "A rectangular garden has a length of 12 meters and a width of 8 meters. What is the area and perimeter of the garden?",

    "If a shirt originally costs $60 and is on sale for 25% off, what is the final price after the discount?",

    "A company's profit increased from $50,000 to $75,000. What is the percentage increase in profit?"
]

for i, problem in enumerate(test_problems, 1):
    print(f"\n🔢 Problem {i}:")
    print(f"Q: {problem}")
    solution = test_qwen25_math(problem, max_tokens=300)
    print(f"A: {solution}")
    print("-" * 60)


🔢 Problem 1:
Q: A train travels 240 miles in 4 hours. What is its average speed in miles per hour?
A: system
You are a helpful assistant that solves mathematical word problems step by step.
user
A train travels 240 miles in 4 hours. What is its average speed in miles per hour?
assistant
To find the average speed of the train, you divide the total distance traveled by the total time taken.

Average speed = Total distance / Total time

In this case:

Average speed = 240 miles / 4 hours

Average speed = 60 miles per hour

So, the train's average speed is 60 miles per hour.
------------------------------------------------------------

🔢 Problem 2:
Q: Sarah has 15 apples. She gives 1/3 of them to her friend and eats 2 apples herself. How many apples does she have left?
A: system
You are a helpful assistant that solves mathematical word problems step by step.
user
Sarah has 15 apples. She gives 1/3 of them to her friend and eats 2 apples herself. How many apples does she have left?
assistan

In [28]:
multilingual_problems = [
    "Un rectángulo tiene 10 metros de largo y 6 metros de ancho. ¿Cuál es su área?",  # Spanish
    "Une voiture parcourt 120 km en 2 heures. Quelle est sa vitesse moyenne?",  # French
    "一个圆的半径是5厘米，请计算它的面积。",  # Chinese
]

for i, problem in enumerate(multilingual_problems, 1):
    print(f"\n🌐 Multilingual Problem {i}:")
    print(f"Q: {problem}")
    solution = test_qwen25_math(problem, max_tokens=250)
    print(f"A: {solution}")
    print("-" * 50)


🌐 Multilingual Problem 1:
Q: Un rectángulo tiene 10 metros de largo y 6 metros de ancho. ¿Cuál es su área?
A: system
You are a helpful assistant that solves mathematical word problems step by step.
user
Un rectángulo tiene 10 metros de largo y 6 metros de ancho. ¿Cuál es su área?
assistant
El área de un rectángulo se calcula multiplicando la longitud por el ancho.

Área = Longitud x Ancho

En este caso:

Área = 10 m x 6 m = 60 m²

Por lo tanto, el área del rectángulo es de 60 metros cuadrados.
--------------------------------------------------

🌐 Multilingual Problem 2:
Q: Une voiture parcourt 120 km en 2 heures. Quelle est sa vitesse moyenne?
A: system
You are a helpful assistant that solves mathematical word problems step by step.
user
Une voiture parcourt 120 km en 2 heures. Quelle est sa vitesse moyenne?
assistant
La vitesse moyenne est calculée en divisant la distance totale par le temps total.

Dans ce cas, la distance totale est de 120 km et le temps total est de 2 heures.

Don

In [29]:
structured_prompt = """
Solve this problem and format your answer as JSON:
"A store sells 3 types of fruits: apples ($2 each), bananas ($1 each), and oranges ($3 each).
If someone buys 4 apples, 6 bananas, and 2 oranges, calculate the total cost."

Format the response as:
{
  "items": [...],
  "calculations": [...],
  "total_cost": ...
}
"""

structured_result = test_qwen25_math(structured_prompt, max_tokens=400)
print("🔧 Structured JSON Response:")
print(structured_result)

🔧 Structured JSON Response:
system
You are a helpful assistant that solves mathematical word problems step by step.
user

Solve this problem and format your answer as JSON:
"A store sells 3 types of fruits: apples ($2 each), bananas ($1 each), and oranges ($3 each). 
If someone buys 4 apples, 6 bananas, and 2 oranges, calculate the total cost."

Format the response as:
{
  "items": [...],
  "calculations": [...],
  "total_cost": ...
}

assistant
{
  "items": ["4 apples", "6 bananas", "2 oranges"],
  "calculations": ["(4 * $2) + (6 * $1) + (2 * $3) = $8 + $6 + $6 = $20"],
  "total_cost": "$20"
}
