#  A finetune of Qwen2.5-1.5B on a Brilliant.org Community dataset.

### Preparation
We start by importing all necessary libraries. Note that the finetuning library Unsloth requires a GPU with CUDA support.

In [1]:
import torch
from unsloth import FastLanguageModel
import re
from tqdm.auto import tqdm
import pandas as pd
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from datasets import Dataset

torch.cuda.get_device_name(0)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-01-27 08:59:34.536302: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1737964774.548661 2736272 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1737964774.552478 2736272 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 08:59:34.565284: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


🦥 Unsloth Zoo will now patch everything to make training faster!


'NVIDIA GeForce RTX 2080 Super with Max-Q Design'

Now we download the Qwen2.5 model, a state-of-the-art open-weight LLM developed by Alibaba. The technical background can be found [here](https://qwenlm.github.io/blog/qwen2.5-math/). Since the training process runs on a laptop, a small model with 1.5B parameters is chosen. Loading it with 4-bit quantization further reduces memory usage.

In [4]:
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-1.5B-bnb-4bit", # unsloth/Qwen2.5-Math-1.5B
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

==((====))==  Unsloth 2025.1.6: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: NVIDIA GeForce RTX 2080 Super with Max-Q Design. Max memory: 7.781 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
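To get a feeling for why 4-bit loading matters on a GPU with 7.781 GB of memory, the following rough sketch estimates the size of the weights alone. It assumes a nominal 1.5 billion parameters and ignores quantization constants, layers kept in higher precision, activations and optimizer state, so the numbers are only a lower bound. `weight_memory_gb` is a hypothetical helper and not part of Unsloth.

```python
# Rough estimate of the memory taken by the model weights alone.
# Assumption: a nominal 1.5e9 parameters, all stored at the same bit width.
NUM_PARAMS = 1.5e9


def weight_memory_gb(num_params, bits_per_param):
    """Return the weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
# 16-bit weights: 3.00 GB
# 4-bit weights: 0.75 GB
```

The saved memory is what leaves room for activations, the LoRA adapters and the optimizer state during training.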


We test the model on a simple example problem:

In [5]:
qwen_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    qwen_prompt.format(
        "Please reason step by step, and put your final answer within \\boxed{}.", # instruction
        "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$.", # input
        "", # output
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = max_seq_length, use_cache = True)
tokenizer.batch_decode(outputs)

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nPlease reason step by step, and put your final answer within \\boxed{}.\n\n### Input:\nFind the value of $x$ that satisfies the equation $4x+5 = 6x+7$.\n\n### Response:\nTo find the value of $x$ that satisfies the equation $4x+5 = 6x+7$, we can follow these steps:\n\n1. Subtract $4x$ from both sides of the equation to isolate the variable $x$ on one side:\n   $4x + 5 - 4x = 6x + 7 - 4x$\n   This simplifies to:\n   $5 = 2x + 7$\n\n2. Subtract $7$ from both sides of the equation to isolate the term with $x$:\n   $5 - 7 = 2x + 7 - 7$\n   This simplifies to:\n   $-2 = 2x$\n\n3. Divide both sides of the equation by $2$ to solve for $x$:\n   $-2 / 2 = 2x / 2$\n   This simplifies to:\n   $-1 = x$\n\nTherefore, the value of $x$ that satisfies the equation $4x+5 = 6x+7$ is $\\boxed{-1}$.<|endoftext|>']

Next we load the dataset of problems. It was created with `data-mining.py`, which processes the dump of the now-defunct Brilliant.org community questions. The dump was published by the VP of Brilliant [here](https://www.reddit.com/r/DataHoarder/comments/o0qrey/comment/h1zerf6/) and is around 10 GB in size.

In [6]:
df = pd.read_csv('brilliant-community.csv')
df_sample = df.sample(n=50, random_state=42)
df_train = df.drop(df_sample.index)
df

| Unnamed: 0 | question | answer |
|---|---|---|
| 0 | How many different 5 letter sequences can be m... | Number of total words - Number of “BAD” words:... |
| 1 | $\\large{ \\sum _{ n=1 }^{ \\infty }{ \\frac ... | Using the following property :∑n=1∞Fnxn=xx2−x... |
| 2 | Which of the following statements are true and... | Pick any values for S1 and S2.\nUse S1 to obta... |
| 3 | A cylinder of radius $R$ and length $l$ is flo... | Thus, the final answer is \\boxed{-1}. |
| 4 | $\\large \\begin{cases} x = a(t-\\sin t) \\\\ ... | {x=a(t−sin⁡t)⟹dxdt=a−acos⁡ty=a(1−cos⁡t)⟹dydt=a... |
| ... | ... | ... |
| 45277 | $\\log_{3} \\left (1 + \\dfrac13 \\right ) + \... | log⁡3(1+13)+log⁡3(1+14)+log⁡3(1+15)+⋯+log⁡3(1+... |
| 45278 | Calculate $1^3 + 2^3 + 3^3 + ... + 10^3$ witho... | 13+23=1+8=9=321^3 + 2^3 = 1 + 8 = 9 = 3^213+23... |
| 45279 | In $\\triangle ABC$ , if $\\dfrac{\\cos A}{\\s... | To solve this problem, we need 3 inequalities,... |
| 45280 | $\\lim_{x\\to0} \\dfrac{\\sin5x}{2x} = \\, ?$ | lim⁡x→0sin⁡5x2x=52.lim⁡5x→0sin⁡5x5x=52\\lim_{x... |


We start by benchmarking the unmodified model on `df_sample`, the 50 problems that are held out from the training data.

In [7]:
# evaluate if model found correct answer
def evaluate(answer, model):
    # Extract the last content inside \boxed{}
    answer_match = re.findall(r'\\boxed{([^{}]*)}', answer)
    model_match = re.findall(r'\\boxed{([^{}]*)}', model)
    
    # If no boxed content found, return False
    if not answer_match or not model_match:
        return False
    
    # Compare the last boxed content
    return answer_match[-1].strip() == model_match[-1].strip()

def benchmark(df, model):
    total_correct = 0
    total_processed = 0

    progress_bar = tqdm(total=len(df), desc="Processing Questions")

    results = []
    for index, row in df.iterrows():
        # Prepare input
        inputs = tokenizer(
            [qwen_prompt.format(
                "Please reason step by step, and put your final answer within \\boxed{}.",
                row['question'],  # input
                ""  # output empty since model should generate its own
            )], return_tensors = "pt").to("cuda")
        
        outputs = model.generate(**inputs, max_new_tokens = max_seq_length, use_cache = True)
        
        decoded_output = tokenizer.batch_decode(outputs)[0]
        print(decoded_output)
        
        is_correct = evaluate(row['answer'], decoded_output)
        
        total_processed += 1
        if is_correct:
            total_correct += 1
        
        # Update progress bar
        progress_bar.set_postfix({
            'Accuracy': f'{total_correct/total_processed:.2%}'
        })
        progress_bar.update(1)
        
        # Store result
        results.append({
            'question': row['question'],
            'ground_truth': row['answer'],
            'model_output': decoded_output,
            'correct': is_correct
        })

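    progress_bar.close()

    # Guard against an empty dataframe before computing the accuracy
    final_accuracy = total_correct / total_processed if total_processed else 0.0
    print(f"\nFinal Accuracy: {final_accuracy:.2%}")

    # Return the per-question records so they can be inspected after the run
    return results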
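One limitation of the regular expression used in `evaluate` is that `[^{}]*` cannot match nested braces, so an answer such as `\boxed{\frac{1}{2}}` is never extracted and the problem is counted as wrong. The following sketch shows one possible alternative that counts braces instead. `extract_last_boxed` and `evaluate_balanced` are hypothetical names and are not used in the benchmark runs reported here.

```python
import re

BOXED = "\\boxed{"  # the literal text \boxed{


def extract_last_boxed(text):
    """Return the content of the last complete \\boxed{...} in text, or None."""
    start = text.rfind(BOXED)
    while start != -1:
        depth = 1
        i = start + len(BOXED)
        # Walk forward until the brace opened by \boxed{ is closed again
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            return text[start + len(BOXED):i - 1].strip()
        # Unbalanced (for example truncated) box: fall back to the previous one
        start = text.rfind(BOXED, 0, start)
    return None


def evaluate_balanced(answer, model_output):
    """Compare the last boxed contents; empty or missing boxes count as wrong."""
    expected = extract_last_boxed(answer)
    predicted = extract_last_boxed(model_output)
    return bool(expected) and expected == predicted


nested = r"Therefore the result is \boxed{\frac{1}{2}}."
print(re.findall(r'\\boxed{([^{}]*)}', nested))  # the original pattern finds nothing
print(extract_last_boxed(nested))
```

Treating an empty box as missing also avoids matching the literal `\boxed{}` that is part of the instruction text in the decoded output.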

In [5]:
benchmark(df_sample, model)

Processing Questions:   0%|          | 0/50 [00:00<?, ?it/s]

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Please reason step by step, and put your final answer within \boxed{}.

### Input:
If $(1,x,y)$ is a geometric sequence and $(x,y,3)$ is an arithmetic sequence then find the maximum value of $x+y$ .

### Response:
To find the maximum value of $x+y$, we need to use the properties of geometric and arithmetic sequences.

First, let's recall the properties of geometric sequences. In a geometric sequence, the ratio between consecutive terms is constant. Let's denote this common ratio by $r$. Then, we have:

$$x = r \cdot 1$$
$$y = r \cdot x$$

Now, let's consider the properties of arithmetic sequences. In an arithmetic sequence, the difference between consecutive terms is constant. Let's denote this common difference by $d$. Then, we have:

$$y = a + d$$
$$3 = a + 2d$$

From the above equations, we can express $x$ and

We observe a low accuracy of around $10 \%$, which can plausibly be explained by the very small size of the model and the high difficulty of many problems.
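With only 50 benchmark problems, an accuracy of around $10 \%$ carries considerable statistical uncertainty. The sketch below computes a 95% Wilson score interval, assuming for illustration that exactly 5 of the 50 problems were solved (the exact count is not shown in the output above). `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Assumption: 5 correct answers out of the 50 held-out problems
low, high = wilson_interval(5, 50)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 4% to 21%, so differences of a few percentage points between two runs on this sample should be read with caution.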

### Finetuning

We attach LoRA adapters to the model, so that only a small fraction of the parameters is trained (18,464,768 here, as the training log below reports). The technical details behind LoRA can be found [here](https://arxiv.org/abs/2309.15223).

In [8]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2025.1.6 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
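The number of trainable parameters can be checked by hand: a LoRA adapter of rank `r` on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below applies this to the seven target modules in each of the 28 layers. Only the `q_proj` shape appears in the model printout of this notebook; the remaining shapes are assumptions about the Qwen2.5-1.5B architecture.

```python
RANK = 16        # r in get_peft_model
NUM_LAYERS = 28  # from the Unsloth patching message

# (in_features, out_features) per target module in one decoder layer.
# Assumption: all shapes except q_proj are not shown in this notebook.
SHAPES = {
    "q_proj": (1536, 1536),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (1536, 1536),
    "gate_proj": (1536, 8960),
    "up_proj": (1536, 8960),
    "down_proj": (8960, 1536),
}


def lora_params(in_features, out_features, rank):
    """Weights in the A (rank x in) and B (out x rank) matrices of one adapter."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 18,464,768
```

The result agrees with the `Number of trainable parameters = 18,464,768` line that Unsloth prints when training starts.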


Next, we format each training row with the prompt template, append the end-of-sequence token, and convert the dataframe into a Hugging Face dataset:

In [9]:
def formatting_prompts_func(df):
    texts = []
    for _, row in df.iterrows():
        # Columns used: question, answer (an optional per-row 'instruction' overrides the default)
        text = qwen_prompt.format(
            row.get('instruction', 'Please reason step by step, and put your final answer within \\boxed{}.'),
            row['question'],
            row['answer']
        ) + tokenizer.eos_token
        texts.append(text)
    
    return pd.DataFrame({'text': texts})

dataset = formatting_prompts_func(df_train)
dataset = Dataset.from_pandas(dataset)
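The formatting step can be illustrated without loading the tokenizer. The sketch below builds a single training text from the same template. It assumes that the end-of-sequence token is `<|endoftext|>`, as seen in the generated output earlier; `format_example` is a hypothetical helper and the question is made up for illustration.

```python
qwen_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|endoftext|>"  # assumption: the value of tokenizer.eos_token
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def format_example(question, answer, instruction=INSTRUCTION):
    """Fill the three template slots and append the end-of-sequence token."""
    return qwen_prompt.format(instruction, question, answer) + EOS_TOKEN


# Braces inside the arguments are left alone: str.format only parses the template
text = format_example(
    "Find $\\frac{1}{2} + \\frac{1}{2}$.",
    "Adding the fractions gives 1. Thus, the final answer is \\boxed{1}.",
)
print(text)
```

Appending the end-of-sequence token to every training text is what teaches the model to stop after the answer instead of generating until `max_new_tokens` is reached.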

Now we start the training process, which took about 6.2 hours (22,434 s) on this GPU:

In [10]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,  # full training run
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

Map (num_proc=2):   0%|          | 0/45232 [00:00<?, ? examples/s]

In [11]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 45,232 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 5,654
 "-____-"     Number of trainable parameters = 18,464,768


  0%|          | 0/5654 [00:00<?, ?it/s]

{'loss': 1.0961, 'grad_norm': 0.46392911672592163, 'learning_rate': 4e-05, 'epoch': 0.0}
{'loss': 1.4669, 'grad_norm': 0.7520129084587097, 'learning_rate': 8e-05, 'epoch': 0.0}
{'loss': 1.1691, 'grad_norm': 0.3953118920326233, 'learning_rate': 0.00012, 'epoch': 0.0}
{'loss': 0.9945, 'grad_norm': 0.4034992456436157, 'learning_rate': 0.00016, 'epoch': 0.0}
{'loss': 0.7904, 'grad_norm': 0.276416152715683, 'learning_rate': 0.0002, 'epoch': 0.0}
{'loss': 1.1238, 'grad_norm': 0.22802414000034332, 'learning_rate': 0.00019996459550362895, 'epoch': 0.0}
{'loss': 1.082, 'grad_norm': 0.2964746057987213, 'learning_rate': 0.00019992919100725792, 'epoch': 0.0}
{'loss': 1.359, 'grad_norm': 0.4739301800727844, 'learning_rate': 0.0001998937865108869, 'epoch': 0.0}
{'loss': 1.1807, 'grad_norm': 0.3060254454612732, 'learning_rate': 0.00019985838201451586, 'epoch': 0.0}
{'loss': 0.9768, 'grad_norm': 0.34633293747901917, 'learning_rate': 0.0001998229775181448, 'epoch': 0.0}
{'loss': 0.9504, 'grad_norm': 0.



{'loss': 0.8176, 'grad_norm': 0.2005707174539566, 'learning_rate': 0.0001824393697999646, 'epoch': 0.09}
{'loss': 0.7567, 'grad_norm': 0.2651562988758087, 'learning_rate': 0.00018240396530359355, 'epoch': 0.09}
{'loss': 0.8587, 'grad_norm': 0.193593367934227, 'learning_rate': 0.00018236856080722255, 'epoch': 0.09}
{'loss': 0.8512, 'grad_norm': 0.2551272213459015, 'learning_rate': 0.0001823331563108515, 'epoch': 0.09}
{'loss': 1.2736, 'grad_norm': 0.22416122257709503, 'learning_rate': 0.00018229775181448046, 'epoch': 0.09}
{'loss': 1.0359, 'grad_norm': 0.2738470137119293, 'learning_rate': 0.0001822623473181094, 'epoch': 0.09}
{'loss': 0.6352, 'grad_norm': 0.24015548825263977, 'learning_rate': 0.00018222694282173836, 'epoch': 0.09}
{'loss': 0.7356, 'grad_norm': 0.22841660678386688, 'learning_rate': 0.00018219153832536733, 'epoch': 0.09}
{'loss': 0.7827, 'grad_norm': 0.25998130440711975, 'learning_rate': 0.0001821561338289963, 'epoch': 0.09}
{'loss': 0.7521, 'grad_norm': 0.273446798324584



{'loss': 0.6655, 'grad_norm': 0.24759244918823242, 'learning_rate': 0.00016473712161444503, 'epoch': 0.18}
{'loss': 0.6099, 'grad_norm': 0.25295934081077576, 'learning_rate': 0.000164701717118074, 'epoch': 0.18}
{'loss': 0.6678, 'grad_norm': 0.2240295708179474, 'learning_rate': 0.00016466631262170296, 'epoch': 0.18}
{'loss': 0.8088, 'grad_norm': 0.2650209665298462, 'learning_rate': 0.00016463090812533193, 'epoch': 0.18}
{'loss': 1.1236, 'grad_norm': 0.2574915587902069, 'learning_rate': 0.00016459550362896087, 'epoch': 0.18}
{'loss': 0.9638, 'grad_norm': 0.22810406982898712, 'learning_rate': 0.00016456009913258984, 'epoch': 0.18}
{'loss': 0.7118, 'grad_norm': 0.2429683655500412, 'learning_rate': 0.0001645246946362188, 'epoch': 0.18}
{'loss': 0.7952, 'grad_norm': 0.18560190498828888, 'learning_rate': 0.00016448929013984778, 'epoch': 0.18}
{'loss': 0.8675, 'grad_norm': 0.23652635514736176, 'learning_rate': 0.00016445388564347672, 'epoch': 0.18}
{'loss': 0.9203, 'grad_norm': 0.247690424323



{'loss': 0.8498, 'grad_norm': 0.2456086277961731, 'learning_rate': 0.00014703487342892547, 'epoch': 0.27}
{'loss': 0.7344, 'grad_norm': 0.23290085792541504, 'learning_rate': 0.00014699946893255444, 'epoch': 0.27}
{'loss': 0.7642, 'grad_norm': 0.20329609513282776, 'learning_rate': 0.0001469640644361834, 'epoch': 0.27}
{'loss': 0.6411, 'grad_norm': 0.2410823106765747, 'learning_rate': 0.00014692865993981238, 'epoch': 0.27}
{'loss': 0.9118, 'grad_norm': 0.2228451520204544, 'learning_rate': 0.00014689325544344132, 'epoch': 0.27}
{'loss': 0.7991, 'grad_norm': 0.3148958384990692, 'learning_rate': 0.00014685785094707026, 'epoch': 0.27}
{'loss': 0.8972, 'grad_norm': 0.21027664840221405, 'learning_rate': 0.00014682244645069926, 'epoch': 0.27}
{'loss': 0.6111, 'grad_norm': 0.258932888507843, 'learning_rate': 0.0001467870419543282, 'epoch': 0.27}
{'loss': 0.6503, 'grad_norm': 0.20580044388771057, 'learning_rate': 0.00014675163745795717, 'epoch': 0.27}
{'loss': 0.7234, 'grad_norm': 0.2476489096879



{'loss': 0.7365, 'grad_norm': 0.23606492578983307, 'learning_rate': 0.00012933262524340592, 'epoch': 0.35}
{'loss': 0.7442, 'grad_norm': 0.29858458042144775, 'learning_rate': 0.0001292972207470349, 'epoch': 0.35}
{'loss': 1.0686, 'grad_norm': 0.3309692144393921, 'learning_rate': 0.00012926181625066386, 'epoch': 0.35}
{'loss': 0.8067, 'grad_norm': 0.2661139667034149, 'learning_rate': 0.0001292264117542928, 'epoch': 0.35}
{'loss': 0.6633, 'grad_norm': 0.20102441310882568, 'learning_rate': 0.00012919100725792177, 'epoch': 0.35}
{'loss': 0.6424, 'grad_norm': 0.3060685396194458, 'learning_rate': 0.0001291556027615507, 'epoch': 0.35}
{'loss': 0.5998, 'grad_norm': 0.18895766139030457, 'learning_rate': 0.0001291201982651797, 'epoch': 0.35}
{'loss': 0.6661, 'grad_norm': 0.23000478744506836, 'learning_rate': 0.00012908479376880864, 'epoch': 0.36}
{'loss': 0.8632, 'grad_norm': 0.22174608707427979, 'learning_rate': 0.0001290493892724376, 'epoch': 0.36}
{'loss': 0.7792, 'grad_norm': 0.2389176636934



{'loss': 0.6804, 'grad_norm': 0.2083280384540558, 'learning_rate': 0.00011163037705788635, 'epoch': 0.44}
{'loss': 0.562, 'grad_norm': 0.22783422470092773, 'learning_rate': 0.00011159497256151532, 'epoch': 0.44}
{'loss': 0.6954, 'grad_norm': 0.24710874259471893, 'learning_rate': 0.00011155956806514427, 'epoch': 0.44}
{'loss': 0.6193, 'grad_norm': 0.18318966031074524, 'learning_rate': 0.00011152416356877324, 'epoch': 0.44}
{'loss': 0.8396, 'grad_norm': 0.35866793990135193, 'learning_rate': 0.0001114887590724022, 'epoch': 0.44}
{'loss': 0.725, 'grad_norm': 0.21781031787395477, 'learning_rate': 0.00011145335457603117, 'epoch': 0.44}
{'loss': 0.681, 'grad_norm': 0.20789621770381927, 'learning_rate': 0.00011141795007966012, 'epoch': 0.44}
{'loss': 0.7947, 'grad_norm': 0.23209047317504883, 'learning_rate': 0.00011138254558328909, 'epoch': 0.44}
{'loss': 0.7888, 'grad_norm': 0.24139094352722168, 'learning_rate': 0.00011134714108691804, 'epoch': 0.44}
{'loss': 0.7503, 'grad_norm': 0.2531319260



{'loss': 0.6128, 'grad_norm': 0.2251749038696289, 'learning_rate': 9.39281288723668e-05, 'epoch': 0.53}
{'loss': 0.887, 'grad_norm': 0.2277165949344635, 'learning_rate': 9.389272437599575e-05, 'epoch': 0.53}
{'loss': 0.4767, 'grad_norm': 0.17773526906967163, 'learning_rate': 9.385731987962472e-05, 'epoch': 0.53}
{'loss': 0.5515, 'grad_norm': 0.205086812376976, 'learning_rate': 9.382191538325368e-05, 'epoch': 0.53}
{'loss': 0.7033, 'grad_norm': 0.20704199373722076, 'learning_rate': 9.378651088688264e-05, 'epoch': 0.53}
{'loss': 0.6033, 'grad_norm': 0.25107741355895996, 'learning_rate': 9.37511063905116e-05, 'epoch': 0.53}
{'loss': 0.5945, 'grad_norm': 0.24195027351379395, 'learning_rate': 9.371570189414057e-05, 'epoch': 0.53}
{'loss': 0.818, 'grad_norm': 0.2949829399585724, 'learning_rate': 9.368029739776952e-05, 'epoch': 0.53}
{'loss': 0.8531, 'grad_norm': 0.23077252507209778, 'learning_rate': 9.364489290139849e-05, 'epoch': 0.53}
{'loss': 0.7484, 'grad_norm': 0.2248276025056839, 'lear



{'loss': 0.8086, 'grad_norm': 0.24997824430465698, 'learning_rate': 7.622588068684723e-05, 'epoch': 0.62}
{'loss': 0.5649, 'grad_norm': 0.2368565797805786, 'learning_rate': 7.619047619047618e-05, 'epoch': 0.62}
{'loss': 0.6535, 'grad_norm': 0.20250937342643738, 'learning_rate': 7.615507169410515e-05, 'epoch': 0.62}
{'loss': 0.7053, 'grad_norm': 0.27611875534057617, 'learning_rate': 7.611966719773411e-05, 'epoch': 0.62}
{'loss': 0.6388, 'grad_norm': 0.2672882676124573, 'learning_rate': 7.608426270136308e-05, 'epoch': 0.62}
{'loss': 0.7386, 'grad_norm': 0.21487553417682648, 'learning_rate': 7.604885820499203e-05, 'epoch': 0.62}
{'loss': 0.6917, 'grad_norm': 0.2893073558807373, 'learning_rate': 7.6013453708621e-05, 'epoch': 0.62}
{'loss': 0.5603, 'grad_norm': 0.2019592523574829, 'learning_rate': 7.597804921224995e-05, 'epoch': 0.62}
{'loss': 0.749, 'grad_norm': 0.2201721966266632, 'learning_rate': 7.594264471587892e-05, 'epoch': 0.62}
{'loss': 0.7327, 'grad_norm': 0.25742068886756897, 'le



{'loss': 0.5564, 'grad_norm': 0.23265300691127777, 'learning_rate': 5.8523632501327675e-05, 'epoch': 0.71}
{'loss': 0.7101, 'grad_norm': 0.21547646820545197, 'learning_rate': 5.848822800495664e-05, 'epoch': 0.71}
{'loss': 0.6537, 'grad_norm': 0.3136346936225891, 'learning_rate': 5.84528235085856e-05, 'epoch': 0.71}
{'loss': 0.6824, 'grad_norm': 0.2742202579975128, 'learning_rate': 5.841741901221456e-05, 'epoch': 0.71}
{'loss': 0.611, 'grad_norm': 0.17679938673973083, 'learning_rate': 5.838201451584352e-05, 'epoch': 0.71}
{'loss': 0.8445, 'grad_norm': 0.22496432065963745, 'learning_rate': 5.834661001947248e-05, 'epoch': 0.71}
{'loss': 0.662, 'grad_norm': 0.22333352267742157, 'learning_rate': 5.8311205523101445e-05, 'epoch': 0.71}
{'loss': 0.5, 'grad_norm': 0.23228107392787933, 'learning_rate': 5.827580102673039e-05, 'epoch': 0.71}
{'loss': 0.7332, 'grad_norm': 0.23300538957118988, 'learning_rate': 5.8240396530359354e-05, 'epoch': 0.71}
{'loss': 0.7326, 'grad_norm': 0.26597335934638977, 



{'loss': 0.647, 'grad_norm': 0.24235528707504272, 'learning_rate': 4.0856788812179146e-05, 'epoch': 0.8}
{'loss': 0.683, 'grad_norm': 0.22401919960975647, 'learning_rate': 4.082138431580811e-05, 'epoch': 0.8}
{'loss': 0.7446, 'grad_norm': 0.23382307589054108, 'learning_rate': 4.078597981943707e-05, 'epoch': 0.8}
{'loss': 0.7729, 'grad_norm': 0.2826232314109802, 'learning_rate': 4.075057532306603e-05, 'epoch': 0.8}
{'loss': 0.5855, 'grad_norm': 0.16573365032672882, 'learning_rate': 4.071517082669499e-05, 'epoch': 0.8}
{'loss': 0.6149, 'grad_norm': 0.2068246454000473, 'learning_rate': 4.0679766330323954e-05, 'epoch': 0.8}
{'loss': 0.6292, 'grad_norm': 0.25116756558418274, 'learning_rate': 4.0644361833952915e-05, 'epoch': 0.8}
{'loss': 0.749, 'grad_norm': 0.24593499302864075, 'learning_rate': 4.060895733758188e-05, 'epoch': 0.8}
{'loss': 0.7501, 'grad_norm': 0.2518158257007599, 'learning_rate': 4.057355284121084e-05, 'epoch': 0.8}
{'loss': 0.6009, 'grad_norm': 0.2378339171409607, 'learnin



{'loss': 0.5359, 'grad_norm': 0.1733028143644333, 'learning_rate': 2.315454062665959e-05, 'epoch': 0.88}
{'loss': 0.8219, 'grad_norm': 0.23643222451210022, 'learning_rate': 2.311913613028855e-05, 'epoch': 0.88}
{'loss': 0.9079, 'grad_norm': 0.19090883433818817, 'learning_rate': 2.3083731633917508e-05, 'epoch': 0.88}
{'loss': 0.5926, 'grad_norm': 0.20389729738235474, 'learning_rate': 2.304832713754647e-05, 'epoch': 0.89}
{'loss': 0.6139, 'grad_norm': 0.22022655606269836, 'learning_rate': 2.301292264117543e-05, 'epoch': 0.89}
{'loss': 0.5927, 'grad_norm': 0.16876593232154846, 'learning_rate': 2.2977518144804393e-05, 'epoch': 0.89}
{'loss': 0.5636, 'grad_norm': 0.18037112057209015, 'learning_rate': 2.294211364843335e-05, 'epoch': 0.89}
{'loss': 0.9375, 'grad_norm': 0.29497581720352173, 'learning_rate': 2.2906709152062313e-05, 'epoch': 0.89}
{'loss': 0.8498, 'grad_norm': 0.2735775411128998, 'learning_rate': 2.2871304655691274e-05, 'epoch': 0.89}
{'loss': 0.5981, 'grad_norm': 0.224390044808



{'loss': 0.8214, 'grad_norm': 0.26235687732696533, 'learning_rate': 5.452292441140025e-06, 'epoch': 0.97}
{'loss': 0.6556, 'grad_norm': 0.27478864789009094, 'learning_rate': 5.416887944768986e-06, 'epoch': 0.97}
{'loss': 0.5252, 'grad_norm': 0.20414888858795166, 'learning_rate': 5.381483448397947e-06, 'epoch': 0.97}
{'loss': 0.7811, 'grad_norm': 0.24595856666564941, 'learning_rate': 5.346078952026907e-06, 'epoch': 0.97}
{'loss': 0.7354, 'grad_norm': 0.2255927324295044, 'learning_rate': 5.310674455655869e-06, 'epoch': 0.97}
{'loss': 0.7331, 'grad_norm': 0.2308746874332428, 'learning_rate': 5.275269959284829e-06, 'epoch': 0.97}
{'loss': 0.5281, 'grad_norm': 0.23222166299819946, 'learning_rate': 5.23986546291379e-06, 'epoch': 0.97}
{'loss': 0.57, 'grad_norm': 0.22308744490146637, 'learning_rate': 5.2044609665427516e-06, 'epoch': 0.97}
{'loss': 0.6595, 'grad_norm': 0.24802899360656738, 'learning_rate': 5.169056470171712e-06, 'epoch': 0.97}
{'loss': 0.7384, 'grad_norm': 0.2327650636434555, 



{'train_runtime': 22434.1296, 'train_samples_per_second': 2.016, 'train_steps_per_second': 0.252, 'train_loss': 0.7496720589341568, 'epoch': 1.0}
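The numbers in the training log can be reproduced from the training arguments. The sketch below recomputes the total number of optimizer steps and the learning rate under a linear warmup followed by a linear decay. `linear_lr` is a hypothetical re-implementation for illustration and not the scheduler code used by the trainer.

```python
import math

NUM_EXAMPLES = 45232
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
WARMUP_STEPS = 5
PEAK_LR = 2e-4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM             # 8
total_steps = math.ceil(NUM_EXAMPLES / effective_batch)     # divides evenly here


def linear_lr(step):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (total_steps - step) / (total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


print(total_steps)
for step in (1, 5, 6):
    print(step, linear_lr(step))

# Runtime in hours, taken from 'train_runtime' in the final log line
print(round(22434.1296 / 3600, 2))
```

The values for steps 1, 5 and 6 agree with the logged learning rates of `4e-05`, `0.0002` and `0.00019996459550362895`, and the step count matches the 5,654 steps reported by Unsloth.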


The finetuned model is then saved to disk. Since only the LoRA adapters were trained, the export contains just the adapter weights and is small enough to fit on GitHub.

In [12]:
model.save_pretrained("qwen-brilliant-1.5B")
tokenizer.save_pretrained("qwen-brilliant-1.5B")



('qwen-brilliant-1.5B/tokenizer_config.json',
 'qwen-brilliant-1.5B/special_tokens_map.json',
 'qwen-brilliant-1.5B/vocab.json',
 'qwen-brilliant-1.5B/merges.txt',
 'qwen-brilliant-1.5B/added_tokens.json',
 'qwen-brilliant-1.5B/tokenizer.json')

To avoid repeating the finetuning process, we can simply load the saved model from disk.

In [10]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "qwen-brilliant-1.5B",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

==((====))==  Unsloth 2025.1.6: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: NVIDIA GeForce RTX 2080 Super with Max-Q Design. Max memory: 7.781 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 1536, padding_idx=151665)
        (layers): ModuleList(
          (0-27): 28 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=1536, out_features=1536, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=1536, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=1536, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora

Finally, we benchmark the new model. Note that since we excluded `df_sample` from the training data, the model can't just memorize the correct answers.

In [11]:
#df_sample = df.sample(n=50, random_state=43)
benchmark(df_sample, model)

Processing Questions:   0%|          | 0/50 [00:00<?, ?it/s]

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Please reason step by step, and put your final answer within \boxed{}.

### Input:
If $(1,x,y)$ is a geometric sequence and $(x,y,3)$ is an arithmetic sequence then find the maximum value of $x+y$ .

### Response:
 Thus, the final answer is \\boxed{12}.<|endoftext|>
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Please reason step by step, and put your final answer within \boxed{}.

### Input:
Adam, Bob, Caleb, Dylan, Elaine, Francis, Gillian, and Hamilton all decided they wanted to play Mario Kart together. They played three races, and the winner was determined by a points system:
​ For getting first place in a single race, a player was awarded 8 points.​ For getting second place in a 

We observe that the finetuning did not lead to an improvement. One possible cause is that the model is unsuited for the task, since the problems are simply too hard for such a small model. Another is the low quality of the mined dataset: as the reasoning for each problem we automatically take the highest-upvoted comment, without any human check that it contains a sensible and correct proof. In the first example above, the finetuned model emits only a final boxed answer without any reasoning steps, which mirrors dataset entries whose answer consists of nothing but such a sentence.

(c) Mia Müßig