# Fine Tuning the gemma-270m model

In this notebook we walk through a simple fine-tuning run with the gemma-270m model.
## Motivation
This model is extremely small and efficient for its size, which drastically reduces the compute cost of fine-tuning. I considered running the fine-tuning inside Bedrock, but since the main project already uses Bedrock for several tasks, I chose a different route in order to show a broader range of skills. I had also personally wanted to try fine-tuning this model since its release a few months ago.

In [1]:
# Needed on Google Colab; comment the line below out if the packages are already installed
!pip install unsloth datasets trl

Collecting unsloth
  Downloading unsloth-2025.9.11-py3-none-any.whl.metadata (55 kB)
Collecting trl
  Downloading trl-0.23.1-py3-none-any.whl.metadata (11 kB)
Collecting unsloth_zoo>=2025.9.13 (from unsloth)
  Downloading unsloth_zoo-2025.9.14-py3-none-any.whl.metadata (31 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.32.post2-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.1 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.48.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.32-py3-none-any.whl.metadata (11 kB)
Collecting datasets
  Downloading datasets-4.1.1-py3-none-any.whl.metadata (18 kB)
Collecting trl
  Downloading trl-0.23.0-py3-none-a

In [2]:
import torch
import pandas as pd
from datasets import load_dataset
from unsloth import FastModel
import json

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
max_seq_len = 2048

In [4]:
# https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(270M).ipynb#scrollTo=-Xbb0cuLzwgf
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-270m-it",
    max_seq_length = max_seq_len,
    load_in_4bit = False,
    load_in_8bit = False,
    full_finetuning = False,
)

==((====))==  Unsloth 2025.9.11: Fast Gemma3 patching. Transformers: 4.56.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.
Unsloth: Gemma3 does not support SDPA - switching to fast eager.
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.



In [5]:
model = FastModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 128,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: Making `model.base_model.model.model` require gradients
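
As a sanity check on the adapter size, the number of trainable LoRA parameters can be estimated by hand: each adapted linear layer of shape (in, out) gains two rank-`r` matrices, adding `r * (in + out)` weights. The sketch below assumes the published Gemma 3 270M dimensions (hidden size 640, MLP size 2048, 4 query heads and 1 key/value head of dimension 256, 18 layers). These values are not stated in this notebook and should be checked against the model's `config.json`. Under those assumptions the total agrees with the 30,375,936 trainable parameters reported in the training log further down.

```python
# Assumed Gemma 3 270M dimensions (not stated in this notebook; verify in config.json).
HIDDEN = 640
INTERMEDIATE = 2048
NUM_LAYERS = 18
Q_DIM = 4 * 256    # 4 query heads x head_dim 256
KV_DIM = 1 * 256   # 1 key/value head x head_dim 256

# (in_features, out_features) for every module listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA weights: A is (rank x in) and B is (out x rank) per module."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(rank=128)
print(f"{trainable:,} trainable LoRA parameters")
```

Because the count scales linearly with `r`, dropping to `r = 16` would cut the adapter to one eighth of this size.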


In [6]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma3",
)

In [7]:
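def processar_chatml(exemplo):
  """Turn one SQuAD row into a user/assistant conversation (the context passage is not used)."""
  prompt = f"Question: {exemplo['question']}"
  # exemplo['answers']['text'] is a list of strings, so the f-string embeds
  # its repr (e.g. "Answer: ['the Alps']"), as the decoded examples below show.
  # Index it with [0] to train on the plain answer string instead.
  completion = f"Answer: {exemplo['answers']['text']}"
  return {"conversations": [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": completion}
  ]}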


In [8]:
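def formatar_prompts(exemplos):
    """Render each conversation to one training string with the chat template."""
    convos = exemplos["conversations"]
    # Drop the leading <bos>; the tokenizer adds it back at tokenization time.
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>')
             for convo in convos]
    return { "text" : texts, }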
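
To make the training text format concrete, the sketch below renders a conversation by hand in the Gemma turn format seen in the decoded example later in this notebook. `render_gemma_turns` is a hypothetical helper for illustration only. It covers plain user/assistant turns and is not a substitute for `tokenizer.apply_chat_template`, which remains the source of truth.

```python
def render_gemma_turns(conversation, add_generation_prompt=False):
    """Hand-rolled approximation of the Gemma turn format (illustration only)."""
    # The template names the assistant role "model".
    role_names = {"user": "user", "assistant": "model"}
    parts = []
    for turn in conversation:
        role = role_names[turn["role"]]
        parts.append(f"<start_of_turn>{role}\n{turn['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Question: What's one of the most popular tourist destinations in the world?"},
    {"role": "assistant", "content": "Answer: ['the Alps']"},
]
print(repr(render_gemma_turns(example)))
```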


In [9]:

dataset = load_dataset("rajpurkar/squad")
dataset = dataset.map(processar_chatml)
dataset = dataset.map(formatar_prompts, batched = True)

dataset_misturado = dataset['train'].shuffle(seed=42)
train_dataset = dataset_misturado.select(range(10000))
test_dataset = dataset_misturado.select(range(10000, 15000))

print(f"Training set size: {len(train_dataset)}")
print(f"Test set size: {len(test_dataset)}")


Training set size: 10000
Test set size: 5000


In [10]:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = test_dataset, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1, # Use GA to mimic batch size!
        warmup_steps = 5,
        #num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 100,
        learning_rate = 5e-5, # Reduce to 2e-5 for long training runs
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir="outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Unsloth: Switching to float32 training since model cannot work with float16



In [11]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)
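
Conceptually, `train_on_responses_only` keeps the loss on the model's turns and masks everything else with the label `-100`, which the loss function ignores. The following is a minimal sketch of that idea on toy token IDs. `mask_prompt_tokens`, `find_subsequence`, and the IDs are hypothetical. It handles a single response only, whereas Unsloth's actual implementation works on real tokenizer output and multi-turn conversations.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def find_subsequence(sequence, pattern):
    """Return the first index where pattern occurs in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids, response_marker):
    """Copy input_ids to labels, masking everything up to and including the marker."""
    start = find_subsequence(input_ids, response_marker)
    if start == -1:
        # No response found: nothing to learn from this example.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(response_marker)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Toy IDs: 1 = <bos>, [7, 8] = user marker, [7, 9] = model marker.
input_ids = [1, 7, 8, 20, 21, 22, 7, 9, 30, 31, 32]
labels = mask_prompt_tokens(input_ids, response_marker=[7, 9])
print(labels)
```

The next two cells inspect the real result: the decoded inputs contain the full conversation, while the decoded labels keep only the answer.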


In [12]:
tokenizer.decode(trainer.train_dataset[100]["input_ids"])

"<bos><start_of_turn>user\nQuestion: What's one of the most popular tourist destinations in the world?<end_of_turn>\n<start_of_turn>model\nAnswer: ['the Alps']<end_of_turn>\n"

In [13]:
tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ").replace("<end_of_turn>", "")

"                         Answer: ['the Alps']\n"

In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,000 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8
 "-____-"     Trainable parameters = 30,375,936 of 298,474,112 (10.18% trained)


Step,Training Loss
10,3.8296
20,1.8312
30,2.0198
40,2.013
50,2.095
60,1.8357
70,1.7354
80,1.878
90,1.6832
100,1.6603


In [15]:
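# NOTE: both turns carry the same question. Training used no system turn, so
# this prompt differs from the training format; dropping the system entry
# would match it.
messages = [
    {'role': 'system','content':test_dataset['conversations'][10][0]['content']},
    {"role" : 'user', 'content' : test_dataset['conversations'][10][0]['content']}
]
# Reference answer, printed for comparison with the generation below
print(test_dataset['conversations'][10][1]['content'])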
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
).removeprefix('<bos>')

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 150,
    temperature = 1, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

Answer: ['Middle Way']
Answer: ['metropolitan']<end_of_turn>


In [16]:
model.save_pretrained("gemma-3")  # Local saving
tokenizer.save_pretrained("gemma-3")

('gemma-3/tokenizer_config.json',
 'gemma-3/special_tokens_map.json',
 'gemma-3/chat_template.jinja',
 'gemma-3/tokenizer.model',
 'gemma-3/added_tokens.json',
 'gemma-3/tokenizer.json')

In [17]:
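# NOTE: the user turn below uses index [1], the reference answer, not the
# question, so the prompt already contains the answer; use [0] to ask the
# question alone.
messages = [
    {'role': 'system','content':test_dataset['conversations'][10][0]['content']},
    {"role" : 'user', 'content' : test_dataset['conversations'][10][1]['content']}
]
print(test_dataset['conversations'][10][0]['content'])  # question
print(test_dataset['conversations'][10][1]['content'])  # reference answer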
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
).removeprefix('<bos>')

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 250,
    temperature = 1, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

Question: What is the path of moderation called he followed?
Answer: ['Middle Way']
Answer: ['of the middle way']<end_of_turn>
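
Eyeballing two generations says little about quality. A common way to score SQuAD-style answers is exact match and token-level F1 after light normalization (lowercasing, dropping punctuation and English articles). The sketch below is my own minimal version of those metrics, not the official evaluation script. `extract_answer` is a hypothetical helper that assumes the `Answer: [...]` format produced in this notebook.

```python
import ast
import re
import string
from collections import Counter


def extract_answer(text):
    """Pull the first answer string out of "Answer: ['...']<end_of_turn>"."""
    body = text.replace("<end_of_turn>", "").strip().removeprefix("Answer:").strip()
    try:
        parsed = ast.literal_eval(body)
    except (ValueError, SyntaxError):
        return body  # not a Python literal: treat it as plain text
    if isinstance(parsed, list):
        return str(parsed[0]) if parsed else ""
    return str(parsed)


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    return normalize(prediction) == normalize(reference)


def f1_score(prediction, reference):
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = extract_answer("Answer: ['of the middle way']<end_of_turn>")
reference = extract_answer("Answer: ['Middle Way']")
print(exact_match(prediction, reference), round(f1_score(prediction, reference), 2))
```

For the last generation above this gives no exact match but an F1 of 0.8. Averaging both metrics over the 5,000 held-out examples, with prompts that contain only the question, would give a more meaningful picture than a single sample.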
