# Fine-tune Gemma 3n for function calling
This notebook combines the [article](https://medium.com/@lucamassaron/fine-tuning-gemma-3-1b-for-function-calling-a-step-by-step-guide-66a613352f99) and [code](https://gist.github.com/lmassaron/7166f58912ff23de3fa627671fac07df) by Luca Massaron on fine-tuning the Gemma 3 1B model for function calling with the [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb) released by Unsloth for fine-tuning Gemma 3n models.

In [1]:
# Install latest transformers for Gemma 3N
# !pip install --no-deps "transformers>=4.53.1" # Only for Gemma 3N; quoted so the shell does not treat >= as a redirect
# !pip install --no-deps --upgrade timm # Only for Gemma 3N
# from huggingface_hub import login
# login()

In [2]:
# Get the model from Unsloth
from unsloth import FastModel
import torch

torch._dynamo.config.cache_size_limit = 64  # or higher  

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit", # Text only but it's for Ollama so it should be fine for now
    dtype = None, # None for auto detection
    max_seq_length = 2048, # Used for training for function call responses
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, 
    attn_implementation = "eager", # Unsloth switches Gemma 3N to eager attention (see the log below)
    # token = "hf_...", # use one if using gated models
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.1: Fast Gemma3N patching. Transformers: 4.55.0.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.034 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.31.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Gemma3N does not support SDPA - switching to eager!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [3]:
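# Attach LoRA adapters to the base model for fine-tuning
from peft import TaskType
lora_arguments = {
    "r": 16,               # rank of the low-rank update matrices
    "lora_alpha": 64,      # scaling numerator; the update is scaled by lora_alpha / r
    "lora_dropout": 0.05   # nonzero dropout limits Unsloth's fast patching (see the warning below)
}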
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # Turn off for just text!
    finetune_language_layers   = True,  # Should leave on!
    finetune_attention_modules = True,  # Attention good for GRPO
    finetune_mlp_modules       = True,  # Should leave on always!
    **lora_arguments,
    task_type = TaskType.CAUSAL_LM, 
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.


Unsloth: Making `model.base_model.model.model.language_model` require gradients
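
As a rough sketch of what the `lora_arguments` above imply: LoRA adds two low-rank matrices per adapted weight, and the resulting update is scaled by `lora_alpha / r` (the standard, non-rank-stabilized scaling). The helper names and the layer dimensions below are hypothetical placeholders for illustration, not Gemma 3n's actual shapes.

```python
def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Extra trainable parameters for one adapted weight of shape (d_out, d_in).

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values from lora_arguments above
print(lora_scaling(lora_alpha=64, r=16))

# Hypothetical 2048 x 2048 projection, for illustration only
print(lora_param_count(d_in=2048, d_out=2048, r=16))
```

With `r = 16` and `lora_alpha = 64`, the adapter output is scaled by a factor of 4 before it is added to the frozen layer's output.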


In [4]:
# Get the chat template
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma3n",
)

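# Override the template for training tool calls: role names (human, model, tool) are
# passed through verbatim, system messages are dropped, and each turn ends with <end_of_turn><eos>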
tokenizer.chat_template = (
    "{{ bos_token }}{% for message in messages %}{% if message['role'] != 'system' %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
)
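
To make the effect of the custom template easier to see, here is a minimal pure-Python sketch that mirrors its logic. The function name `render_tool_chat` and the sample messages are made up for illustration; the real rendering is done by `tokenizer.apply_chat_template` with the Jinja template above.

```python
def render_tool_chat(messages, bos_token="<bos>", add_generation_prompt=False):
    """Mirror of the Jinja template: skip system turns, trim content,
    and close every turn with <end_of_turn><eos>."""
    parts = [bos_token]
    for message in messages:
        if message["role"] == "system":
            continue  # the template drops system messages
        parts.append(
            "<start_of_turn>" + message["role"] + "\n"
            + message["content"].strip()
            + "<end_of_turn><eos>\n"
        )
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


# Hypothetical two-turn conversation using the dataset's role names
sample = [
    {"role": "system", "content": "ignored"},
    {"role": "human", "content": "  Analyze www.example.com  "},
    {"role": "model", "content": "<tool_call>\n{'name': 'analyze_website'}\n</tool_call>"},
]
print(render_tool_chat(sample))
```

Note that the role string is emitted as-is, so a dataset that uses `human` produces `<start_of_turn>human`, not `<start_of_turn>user`.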

In [5]:
# Load the dataset
from datasets import load_dataset
_dataset = load_dataset("lmassaron/hermes-function-calling-v1")
dataset = _dataset["train"]
eval_dataset = _dataset["test"]

In [6]:
# Convert the dataset to the correct format for fine-tuning
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    # Strip the leading <bos> so that only one remains after tokenization (checked further down)
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize = False, add_generation_prompt = False
        ).removeprefix("<bos>")
        for convo in convos
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)
eval_dataset = eval_dataset.map(formatting_prompts_func, batched = True)
dataset[100]["text"]

Map:   0%|          | 0/1042 [00:00<?, ? examples/s]

"<start_of_turn>human\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'analyze_website', 'description': 'Analyze the content and structure of a website', 'parameters': {'type': 'object', 'properties': {'url': {'type': 'string', 'description': 'The URL of the website to analyze'}}, 'required': ['url']}}}, {'type': 'function', 'function': {'name': 'calculate_bmi', 'description': 'Calculate the Body Mass Index (BMI)', 'parameters': {'type': 'object', 'properties': {'weight': {'type': 'number', 'description': 'The weight in kilograms'}, 'height': {'type': 'number', 'description': 'The height in meters'}}, 'required': ['weight', 'height']}}}] </tools>Use the following pydantic model json schema for each tool call you will

In [None]:
# Setup the fine-tuning trainer

training_arguments = {
    # Basic training configuration
    "num_train_epochs": 1,
    #"max_steps": 120,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 4,
    # Optimization settings
    "optim": "adamw_torch_fused",
    "learning_rate": 1e-4,
    "weight_decay": 0.1,
    "max_grad_norm": 1.0,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    # Memory optimization
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    # Evaluation and saving
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "save_total_limit": 2,
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,
    # Logging and output
    "logging_steps": 5,
    "report_to": None,
    "logging_dir": "logs/runs",
    "overwrite_output_dir": True,
    # Model sharing
    "push_to_hub": False,
    "hub_private_repo": False,
}

from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = eval_dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    packing = True,
    args = SFTConfig(**training_arguments),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/4167 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1042 [00:00<?, ? examples/s]

In [10]:
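# Train on the model responses only and mask the instruction turns out of the loss.
# The markers must match the role names emitted by the chat template;
# this dataset labels user turns as "human", not "user".
# "tool" turns have no marker of their own here, so they may remain unmasked.
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>human\n",
    response_part = "<start_of_turn>model\n",
)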

Map (num_proc=8):   0%|          | 0/1042 [00:00<?, ? examples/s]
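
The idea behind response-only training can be sketched in a few lines: every label outside a model response is set to `-100`, which the loss ignores. The function below is a simplified illustration on toy token IDs, not Unsloth's actual implementation; `mask_non_response` and the marker IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def mask_non_response(token_ids, instruction_ids, response_ids):
    """Keep labels only between a response marker and the next instruction marker."""
    labels = [IGNORE_INDEX] * len(token_ids)
    in_response = False
    i = 0
    while i < len(token_ids):
        if token_ids[i:i + len(response_ids)] == response_ids:
            in_response = True
            i += len(response_ids)  # the marker itself stays masked
            continue
        if token_ids[i:i + len(instruction_ids)] == instruction_ids:
            in_response = False
            i += len(instruction_ids)
            continue
        if in_response:
            labels[i] = token_ids[i]
        i += 1
    return labels


# Toy marker IDs: [1, 2] = "human" turn start, [1, 3] = "model" turn start,
# [1, 4] = "user" turn start (never present in the toy sequence)
HUMAN, MODEL, USER = [1, 2], [1, 3], [1, 4]
tokens = HUMAN + [10, 11] + MODEL + [20, 21] + HUMAN + [12] + MODEL + [22]

matching = mask_non_response(tokens, HUMAN, MODEL)
mismatched = mask_non_response(tokens, USER, MODEL)
print(matching)
print(mismatched)
```

In this toy setup, a matching instruction marker masks both human turns, while a marker that never occurs in the data leaves everything after the first model marker unmasked, including the second human turn.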

In [11]:
# Verify the chat template was applied correctly. Only 1 <bos> token should be present.
tokenizer.decode(trainer.train_dataset[100]["input_ids"])

"<bos><start_of_turn>human\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'analyze_website', 'description': 'Analyze the content and structure of a website', 'parameters': {'type': 'object', 'properties': {'url': {'type': 'string', 'description': 'The URL of the website to analyze'}}, 'required': ['url']}}}, {'type': 'function', 'function': {'name': 'calculate_bmi', 'description': 'Calculate the Body Mass Index (BMI)', 'parameters': {'type': 'object', 'properties': {'weight': {'type': 'number', 'description': 'The weight in kilograms'}, 'height': {'type': 'number', 'description': 'The height in meters'}}, 'required': ['weight', 'height']}}}] </tools>Use the following pydantic model json schema for each tool call you

In [12]:
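# Verify that instruction turns are masked (label -100 is rendered as a space).
# In the recorded output the "human" and "tool" turns are still visible, which suggests instruction_part did not match the data's role names.
labels = trainer.train_dataset[100]["labels"]
tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in labels]).replace(tokenizer.pad_token, " ")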

"                                                                                                                                                                                                                                                                                                                                                                                                                    Of course, I can help you with that. Could you please provide me with the URL of the website you want to analyze?<end_of_turn><eos>\n<start_of_turn>human\nSure, the website URL is www.example.com.<end_of_turn><eos>\n<start_of_turn>model\n<tool_call>\n{'name': 'analyze_website', 'arguments': {'url': 'www.example.com'}}\n</tool_call><end_of_turn><eos>\n<start_of_turn>tool\n<tool_response>\n{'status': 'success', 'message': 'Website analysis completed', 'data': {'structure': 'The website has a clear and intuitive structure with a navigation menu at the top. The homepage, about us, services, a

In [13]:
# Train the model
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,167 | Num Epochs = 1 | Total steps = 120
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 38,420,480 of 7,888,398,672 (0.49% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Epoch,Training Loss,Validation Loss
0,0.6237,0.507926


Unsloth: Not an error, but Gemma3nForConditionalGeneration does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
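
The figures in the training banner above can be checked by hand. The sketch below recomputes the total batch size and the trainable-parameter percentage from the logged numbers; the warmup step count assumes the scheduler uses the ceiling of `warmup_ratio` times the total steps, and the variable names are illustrative.

```python
import math

# Values taken from training_arguments and the training banner
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 1
total_steps = 120
warmup_ratio = 0.1
trainable_params = 38_420_480
total_params = 7_888_398_672

# Sequences contributing to each optimizer step
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Assumption: warmup steps = ceil(warmup_ratio * total steps)
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Share of parameters updated by LoRA, in percent
trainable_pct = round(100 * trainable_params / total_params, 2)

print(effective_batch_size, warmup_steps, trainable_pct)
```

The batch size and percentage agree with the banner's `Total batch size (1 x 4 x 1) = 4` and `0.49% trained`.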


In [14]:
model_name = "gemma3n_e4b_tools_test"
model_merged_hf_repo = "allisterb/gemma3n_e4b_tools_test"
model_gguf_hf_repo = model_merged_hf_repo + "-GGUF"

In [16]:
# Merge the LoRA adapters into the base weights and save the result locally in 16-bit
model.save_pretrained_merged(model_name, tokenizer)

Found HuggingFace hub cache directory: /home/eddsa-key-20250707/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Downloading safetensors index for unsloth/gemma-3n-e4b-it...


Unsloth: Merging weights into 16bit:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/3.08G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  25%|██▌       | 1/4 [00:38<01:54, 38.32s/it]

model-00002-of-00004.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  50%|█████     | 2/4 [01:53<02:00, 60.03s/it]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  75%|███████▌  | 3/4 [03:10<01:07, 67.77s/it]

model-00004-of-00004.safetensors:   0%|          | 0.00/2.66G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [03:43<00:00, 55.93s/it]


In [34]:
# Push the merged 16-bit model to the Hugging Face Hub (HF_TOKEN must be set in the environment)
import os
model.push_to_hub_merged(model_merged_hf_repo, tokenizer, token=os.environ["HF_TOKEN"])

No files have been modified since last commit. Skipping to prevent empty commit.

  /tmp/tmpp__c251d/tokenizer.model      : 100%|##########| 4.70MB / 4.70MB
  /tmp/tmpp__c251d/tokenizer.json       : 100%|##########| 33.4MB / 33.4MB

Found HuggingFace hub cache directory: /home/eddsa-key-20250707/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Downloading safetensors index for unsloth/gemma-3n-e4b-it...

Unsloth: Merging weights into 16bit:  25%|██▌       | 1/4 [01:08<03:26, 68.68s/it]
Unsloth: Merging weights into 16bit:  50%|█████     | 2/4 [03:07<03:16, 98.26s/it]
Unsloth: Merging weights into 16bit:  75%|███████▌  | 3/4 [05:21<01:54, 114.46s/it]
Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [06:27<00:00, 96.81s/it]


In [None]:
# Convert the merged model to GGUF (Q8_0) and push it to the Hub
model.push_to_hub_gguf(
    model_name,
    quantization_type = "Q8_0",
    repo_id = model_gguf_hf_repo,
    token = os.environ["HF_TOKEN"],
)

TypeError: save_to_gguf_generic() got an unexpected keyword argument 'tokenizer'

In [None]:
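# Upload Ollama template to Hugging Face Hub
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])

# upload_file commits the file to the repo and returns a CommitInfo
api.upload_file(
    path_or_fileobj="template",
    path_in_repo="template",
    repo_id=model_gguf_hf_repo,
)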

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/allisterb/gemma3n_e4b_tools_test-GGUF/commit/03dd2fddd3d881870ac6352d7c46ae5cc2fc0522', commit_message='Upload template with huggingface_hub', commit_description='', oid='03dd2fddd3d881870ac6352d7c46ae5cc2fc0522', pr_url=None, repo_url=RepoUrl('https://huggingface.co/allisterb/gemma3n_e4b_tools_test-GGUF', endpoint='https://huggingface.co', repo_type='model', repo_id='allisterb/gemma3n_e4b_tools_test-GGUF'), pr_revision=None, pr_num=None)