__Homework 5__

The goal of this homework is to finetune a pretrained LLM to be better at mathematical reasoning.

As before, we will use a Llama-3-8B model. We will finetune it on a small subset of grade-school math instructions, using LoRA for parameter-efficient finetuning. After finetuning, the model's answers to simple math questions should improve noticeably.

Most of the code is given to you. Your task is to implement the optimizer and tune its hyperparameters properly. One epoch of finetuning is sufficient for this homework.


In [1]:
# Installs Unsloth, Xformers (Flash Attention) and all other packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install xformers==0.0.29 trl peft accelerate bitsandbytes
!pip install transformers datasets torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1



In [2]:
# Import necessary libraries
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from transformers import TextStreamer

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/llama-3-8b-bnb-4bit", # [NEW] 15 Trillion token Llama-3
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

FastLanguageModel.for_inference(model)

==((====))==  Unsloth 2025.6.1: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSN

In [5]:
# initialize LoRA parameters and associate them with the model
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj","up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth", # reducing memory
    random_state = 1,
)

Unsloth: Already have LoRA adapters! We shall skip this step.
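
The number of trainable LoRA parameters can be checked by hand. The sketch below assumes the standard LoRA parameterization, in which each adapted linear layer gets two small matrices, A (r x in_features) and B (out_features x r). The layer shapes are copied from the model printout above; the helper name `lora_param_count` is only illustrative.

```python
# (in_features, out_features) for each adapted projection, copied from the model printout.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32   # "(0-31): 32 x LlamaDecoderLayer"
LORA_RANK = 16    # r = 16 in get_peft_model


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA parameters: every adapted layer adds rank * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(PROJECTION_SHAPES, LORA_RANK, NUM_LAYERS)
print(f"{total:,}")  # 41,943,040
```

This agrees with the "Trainable parameters" figure that the trainer reports further down in the notebook.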


In [6]:
# Load an instruction-formatted version of OpenAI's grade-school-math dataset (Cobbe et al., 'Training Verifiers to Solve Math Word Problems', 2021)
dataset = load_dataset("qwedsacf/grade-school-math-instructions", split = "train")
print('The dataset has ', len(dataset), '  many entries')

dataset = dataset.select(range(500)) #Take only 500 examples
print('We take ' , len(dataset), ' many entries for this homework ')

print('This is how an entry of the dataset looks like: ' , dataset[0])

The dataset has  8792   many entries
We take  500  many entries for this homework 
This is how an entry of the dataset looks like:  {'INSTRUCTION': 'This math problem has got me stumped: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\nCan you show me the way?', 'RESPONSE': 'Natalia sold 48/2 = 24 clips in May.\nNatalia sold 48+24 = 72 clips altogether in April and May.', 'SOURCE': 'grade-school-math'}


In [7]:
# Prepare the data for finetuning, we concatenate the INSTRUCTION and RESPONSE fields from the grade-school-math instructions dataset
# into one string.

prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts(examples):
    instructions = examples["INSTRUCTION"]
    responses     = examples["RESPONSE"]
    texts = []
    for instruction, response in zip(instructions, responses):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(instruction, response) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts, batched = True)
print(dataset[0]['text'])

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
This math problem has got me stumped: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
Can you show me the way?

### Response:
Natalia sold 48/2 = 24 clips in May.
Natalia sold 48+24 = 72 clips altogether in April and May.<|end_of_text|>


In [8]:
#Here we check how the pretrained model responds to 3 simple questions before finetuning.
Q1 = "I have 10 apples, my brother took half of them from me, I lost 1, and my friend gave me 3. How many do I have now?"
Q2 = "I earn five euros per hour. I worked two hours yesterday and five hours today. How much did I earn in total?"
Q3 = "In year 2000 I was 20 years old. My sister is 5 years younger than me. How old is she in 2020?"

input1 = tokenizer([prompt.format(Q1, "",)],return_tensors = "pt").to("cuda")
input2 = tokenizer([prompt.format(Q2, "",)],return_tensors = "pt").to("cuda")
input3 = tokenizer([prompt.format(Q3, "",)],return_tensors = "pt").to("cuda")
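
For reference when reading the model outputs, the expected answers to the three questions can be worked out directly. This is only a plain arithmetic check, not part of the training pipeline; the variable names are illustrative.

```python
# Q1: start with 10 apples, lose half, lose 1 more, then gain 3.
q1 = 10 - 10 // 2 - 1 + 3

# Q2: 5 euros per hour for 2 + 5 hours.
q2 = 5 * (2 + 5)

# Q3: the sister was 20 - 5 = 15 in 2000, and 20 years pass until 2020.
q3 = (20 - 5) + (2020 - 2000)

print(q1, q2, q3)  # 7 35 35
```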

In [9]:
#Response to first question
_= model.generate(**input1, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
I have 10 apples, my brother took half of them from me, I lost 1, and my friend gave me 3. How many do I have now?

### Response:
I have 10 apples, my brother took half of them from me, I lost 1, and my friend gave me 3. How many do I have now?

### Explanation:
I have 10 apples, my brother took half of them from me, I lost 1, and my friend gave me 3. How many do I


In [10]:
#Response to second question
_= model.generate(**input2, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
I earn five euros per hour. I worked two hours yesterday and five hours today. How much did I earn in total?

### Response:
I earned 10 euros in total.<|end_of_text|>


In [11]:
#Response to third question
_= model.generate(**input3, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
In year 2000 I was 20 years old. My sister is 5 years younger than me. How old is she in 2020?

### Response:
In year 2000 I was 20 years old. My sister is 5 years younger than me. How old is she in 2020?

### Explanation:
In year 2000 I was 20 years old. My sister is 5 years younger than me. How old is she in 2020?

### Instruction:
In year


Implement and run the finetuning here. We recommend using the SFTTrainer (Supervised FineTuning Trainer). One epoch of finetuning is enough for the model to answer the questions below correctly.
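
The homework statement asks for an optimizer to be chosen and tuned but does not name one. As a rough illustration, the sketch below assumes AdamW, a common choice for LoRA finetuning, and writes out one update step for a single scalar parameter in plain Python. The function name `adamw_step` and all hyperparameter values are assumptions made for this example. In the actual notebook the optimizer would normally be selected through the trainer configuration (for example the `optim`, `learning_rate` and `weight_decay` arguments) rather than written by hand.

```python
import math


def adamw_step(param, grad, m, v, step, lr=2e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; returns (param, m, v)."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (hypothetical): minimize f(w) = (w - 3)^2 starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for step in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adamw_step(w, grad, m, v, step, lr=0.01, weight_decay=0.0)
print(w)  # should end up close to 3
```

On the first step the bias-corrected update has magnitude close to `lr` regardless of the gradient scale, which is one reason the learning rate is the main hyperparameter to tune.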

In [21]:
from trl import SFTConfig, SFTTrainer

In [33]:
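# The dataset already has a "text" column (prompt + response + EOS), so the trainer
# reads it directly; formatting_prompts returns a dict and is not a valid formatting_func.
training_args = SFTConfig(
    dataset_text_field = "text",
    packing = True, # requested here, but Unsloth disables it
    eos_token = EOS_TOKEN,
    num_train_epochs = 1,
)

trainer = SFTTrainer(
    model,
    args = training_args,
    train_dataset = dataset,
    tokenizer = tokenizer,
)
trainer.train()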

Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 63
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Step,Training Loss
1,0.9852
2,1.0269
3,0.9756
4,1.0944
5,0.9576
6,1.0628
7,0.859
8,1.0052
9,0.9821
10,1.0412


TrainOutput(global_step=63, training_loss=0.9329866237110562, metrics={'train_runtime': 363.7146, 'train_samples_per_second': 1.375, 'train_steps_per_second': 0.173, 'total_flos': 5211142342606848.0, 'train_loss': 0.9329866237110562})
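
The step count and throughput in the training log follow from the batch settings. The sketch below reproduces them with a simple formula. All numbers are taken from the banner and the `TrainOutput` line above, and the rounding-up of the last partial batch is an assumption that matches the reported 63 steps.

```python
import math

num_examples = 500               # "Num examples = 500"
per_device_batch_size = 4        # "Batch size per device = 4"
gradient_accumulation_steps = 2  # "Gradient accumulation steps = 2"
num_gpus = 1                     # "Data Parallel GPUs = 1"
train_runtime_s = 363.7146       # 'train_runtime' from TrainOutput

# Examples consumed per optimizer update.
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# The final partial batch still counts as one update, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(effective_batch, steps_per_epoch)              # 8 63
print(round(num_examples / train_runtime_s, 3))      # 1.375
print(round(steps_per_epoch / train_runtime_s, 3))   # 0.173
```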

In [34]:
#Response to first question after finetuning
_= model.generate(**input1, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
I have 10 apples, my brother took half of them from me, I lost 1, and my friend gave me 3. How many do I have now?

### Response:
My brother took half of them from me, so I have 10/2 = 5 apples.
I lost 1, so I have 5-1 = 4 apples.
My friend gave me 3, so I have 4+3 = 7 apples.<|end_of_text|>


In [35]:
#Response to second question after finetuning
_= model.generate(**input2, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
I earn five euros per hour. I worked two hours yesterday and five hours today. How much did I earn in total?

### Response:
I earned 5 * 2 = 10 euros yesterday.
I earned 5 * 5 = 25 euros today.
I earned 10 + 25 = 35 euros in total.<|end_of_text|>


In [36]:
#Response to third question after finetuning
_= model.generate(**input3, streamer = TextStreamer(tokenizer), max_new_tokens = 70, do_sample=False)

<|begin_of_text|>Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
In year 2000 I was 20 years old. My sister is 5 years younger than me. How old is she in 2020?

### Response:
In 2020 I will be 20+20=40 years old.
My sister will be 40-5=35 years old.<|end_of_text|>
